other Nov 27, 2023

Critique of Paper: “An astonishing regularity in student learning rate”

1) The reported learning rates are not actually as quantitatively similar as is suggested by the language used to describe them. 2) The learning rates are measured in a way that rests on a critical assumption that students learn nothing from the initial instruction preceding the practice problems – i.e., you can have one student who learns a lot more from the initial instruction and requires far fewer practice problems, and when you calculate their learning rate, it can come out the same as for a student who learns a lot less from the initial instruction and requires far more practice problems.

by Justin Skycak (@justinskycak) justinmath.com 10,774 words




The paper An astonishing regularity in student learning rate has been making the rounds in the news lately. Here are a couple of articles:

I’ve been asked about this paper multiple times. For most people, the result is counterintuitive and doesn’t pass the sniff test – but they’re not able to pinpoint a specific issue with the setup, data, or interpretation of the experiment. The purpose of this post is to suggest a specific issue.

Summary

This article has gotten pretty long, so here’s a summary. I have some major criticisms of this paper. Last year, this paper made a lot of headlines and I was asked about it enough that I took some time to dig into the analysis. After digging into the analysis, I came away with the following impressions:

  1. The reported learning rates are not actually as quantitatively similar as is suggested by the language used to describe them — the 75th percentile learners learn 2x as fast per opportunity as the 25th percentile, and even the 75th percentile is far lower than the kind of person we have in mind when we think of somebody who is shockingly good at math.
  2. The learning rates are measured in a way that rests on a critical assumption that students learn nothing from the initial instruction preceding the practice problems — i.e., you can have one student who learns a lot more from the initial instruction and requires far fewer practice problems, and when you calculate their learning rate, it can come out the same as for a student who learns a lot less from the initial instruction and requires far more practice problems.

To summarize these critiques:

Criticism #1: The 75th percentile students learn 2x as fast per opportunity as 25th percentile students. Is that really a “similar” learning rate? That seems like a pretty big difference to me.

If you measure in raw percents, as the paper does, the 75th percentile learners are found to increase their knowledge about 1.5x as fast as 25th percentile learners per problem. If you measure performance in log-odds, which is a more appropriate metric that accounts for the fact that it’s harder to increase performance when one’s performance is high to begin with, the multiplier rises from 1.5x to 2x. It’s debatable whether 2x is really a “similar” learning rate. Personally, I think it is not – not only does “learns twice as fast” feel like a substantial difference, but it is also only comparing the 25th and 75th percentiles, and even the 75th percentile is far lower than the kind of person we have in mind when we think of somebody who is shockingly good at math. For instance, math majors at elite universities tend to be well above the 99th percentile in math.

Criticism #2: You can have one student who learns a lot more from the initial instruction and requires far fewer practice problems, and when you calculate their learning rate per the methodology described in the paper, it can come out the same as for a student who learns a lot less from the initial instruction and requires far more practice problems.

Here’s a concrete illustration using numbers pulled directly from the paper (the 25th and 75th percentile students in Table 2). Suppose you’re teaching two students how to solve a type of math problem.

This clearly illustrates a difference in learning rates, right? Student A needed 3 or 4 questions. Student B needed 13. Student A learns faster, student B learns slower.

Well, in the study, the operational definition of “learning rate” is, to quote, “log-odds increase in performance per opportunity… to reach mastery after noninteractive verbal instruction (i.e., text or lecture).” Opportunities mean practice questions. Log-odds just means you take the performance $P$ and plug it into the formula $\ln \left( \frac{P}{1-P} \right).$

So… according to this definition of learning rate, students A and B learn at roughly the same rate, about 0.1 log odds per practice opportunity.

Full Analysis

Okay, here’s my full analysis. It begins with a more detailed walkthrough of the above and then dives into some personal experiences and research literature.

The Critique

The paper finds that students have wildly different baseline knowledge after initial instruction. Immediately after being shown how to do a problem, some students are almost at mastery right away and only need a couple practice problems. Other students need many more practice problems. However, during those practice problems, students’ knowledge increases at more similar rates (*). The paper interprets this to mean that students come in with wildly different prior knowledge, but learn at similar rates.

I don’t think that’s the right interpretation. It rests on a critical assumption that the amount of learning that occurs during initial instruction is zero or otherwise negligible. But that assumption just doesn’t make any sense to me, having worked with lots of kids across different ability levels. Even if they have all achieved a reasonably high bar for mastery on each prerequisite and have not yet seen the new topic, some kids just get it right away after you demonstrate just one instance of a new skill, whereas other kids need lots of different examples and plenty of practice with feedback before they really start to grok it.

(*) Note that if you measure in raw percents, as the paper does, the 75th percentile learners are found to increase their knowledge about 1.5x as fast as 25th percentile learners per problem. If you measure performance in log-odds, which is a more appropriate metric that accounts for the fact that it’s harder to increase performance when one’s performance is high to begin with, the multiplier rises from 1.5x to 2x. It’s debatable whether 2x is really a “similar” learning rate. Personally, I think it is not — not only does “learns twice as fast” feel like a substantial difference, but it is also only comparing the 25th and 75th percentiles, and even the 75th percentile is far lower than the kind of person we have in mind when we think of somebody who is shockingly good at math. For instance, math majors at elite universities tend to be well above the 99th percentile in math. However, this is not the focus of my critique. In this critique, I wish to highlight a more subtle methodological issue and demonstrate that even if the performance improvement per practice opportunity came out to be exactly the same for all students, this would still not be enough to conclude that all students learn at the same rate.
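To make the raw-percent versus log-odds point concrete, here is a minimal sketch. The two starting levels (50% and 85%) are hypothetical values chosen only for illustration; they are not taken from the paper. Log-odds here means $\ln \left( \frac{P}{1-P} \right)$ for a performance level $P.$

```python
import math

def log_odds(p: float) -> float:
    """Log-odds (logit) of a success probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))

# Two hypothetical students who each gain 10 percentage points.
low_gain = log_odds(0.60) - log_odds(0.50)   # starting from 50%
high_gain = log_odds(0.95) - log_odds(0.85)  # starting from 85%

print(f"50% -> 60%: {low_gain:.2f} log-odds")
print(f"85% -> 95%: {high_gain:.2f} log-odds")
```

The same 10-point gain is worth roughly three times as much in log-odds when it starts from 85% as when it starts from 50%, which is why a gap measured in raw percents can widen when it is re-expressed in log-odds.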

Concrete Illustration

Here’s a concrete illustration using numbers pulled directly from the paper (the 25th and 75th percentile students in Table 2). Suppose you’re teaching two students how to solve a type of math problem.

This clearly illustrates a difference in learning rates, right? Student A needed 3 or 4 questions. Student B needed 13. Student A learns faster, student B learns slower.

Well, in the study, the operational definition of “learning rate” is, to quote, “log-odds increase in performance per opportunity… to reach mastery after noninteractive verbal instruction (i.e., text or lecture).” Opportunities mean practice questions. Log-odds just means you take the performance $P$ and plug it into the formula $\ln \left( \frac{P}{1-P} \right).$

So… according to this definition of learning rate, students A and B learn at roughly the same rate, about 0.1 log odds per practice opportunity.
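Here is a rough sketch of that calculation. The question counts (3 or 4, taken as 3.5, and 13) come from the text above. The starting accuracies (75% and 55%) and the 80% mastery threshold are assumptions made for illustration, since they are not stated in this section, and the function names are hypothetical.

```python
import math

MASTERY = 0.80  # assumed mastery threshold, not stated in this section

def log_odds(p: float) -> float:
    """Log-odds of a success probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))

def learning_rate(start: float, opportunities: float, mastery: float = MASTERY) -> float:
    """Log-odds gained per practice opportunity between start and mastery."""
    return (log_odds(mastery) - log_odds(start)) / opportunities

# Assumed starting accuracies right after initial instruction.
rate_a = learning_rate(start=0.75, opportunities=3.5)  # student A: 3 or 4 questions
rate_b = learning_rate(start=0.55, opportunities=13)   # student B: 13 questions

print(f"Student A: {rate_a:.3f} log-odds per opportunity")
print(f"Student B: {rate_b:.3f} log-odds per opportunity")
```

Under these assumptions both rates land a little under 0.1 log-odds per opportunity, even though student B needs several times as many questions. All of the difference between the two students is absorbed by the starting level, not by the slope.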

Critique Stated More Precisely

Now that we’ve worked through an example with the metrics involved in the paper, I can phrase my critique more precisely. For simplicity, I’ll use “y-intercept” to refer to a student’s level of performance immediately after initial instruction and “slope” to refer to a student’s log-odds performance increase per opportunity.

While I was mildly surprised to read that the 75th percentile students don’t have that much greater slope than the 25th percentile students, it seems at least within the realm of possibility to me, and that’s not what I’m arguing against (though as I elaborated in the earlier section introducing The Critique, calling it “astonishing regularity” seems like hyperbole).

My critique is that I don’t think the y-intercept only measures differences in background knowledge. In my experience teaching and tutoring, I have noticed a second component that is independent from the student’s level of background knowledge, and this second component becomes increasingly relevant as you get up into higher levels of math.

I would loosely describe this second component as some kind of generalization ability. In my experience, individual differences in generalization ability create a phenomenon in which math gets really hard for different people at different levels.

While I would agree that most people have enough generalization ability to get through algebra and geometry with a reasonable amount of productive practice, I’ve noticed lots of students run into issues learning higher math (especially calculus and beyond) as the level of technicality and abstraction increases, even in a mastery-based learning environment where they engage in numerous practice opportunities with immediate explanatory feedback.

For this reason I don’t agree that the study results naturally extrapolate as suggested:

Counterexamples

Okay, now it’s time for me to back up my claims about this so-called “generalization ability.” I’ll provide some case studies from personal experience, in which students with the same background knowledge learned new material at very different rates due to differences in generalization ability.

Case Study 1

I spent several years teaching in a radically accelerated math sequence that used a fully individualized, mastery-based learning system. Kids came into the program in 6th grade with knowledge of arithmetic. Granted, at this point, they all had different background knowledge and were starting at different places in the curriculum (typically between 20% and 60% of the way through pre-algebra). But each time they were given a topic, they had mastered all the prerequisites leading up to that topic.

Students’ knowledge profiles were initially estimated through a diagnostic, so there was of course a bit of uncertainty in their knowledge profile when they first started on the system. But after a couple years of working on the system, their knowledge profiles were grounded very soundly in actual work they had completed on the system (not just an “estimate” like the initial diagnostic). The kids were not doing extra math outside of school – they did not do math “for fun” and they would even stop doing their homework if I didn’t stay on top of them.

So we’re talking about 8th graders who are taking calculus, who have completed pre-algebra / algebra / geometry / algebra 2 / precalculus using the same curriculum, and who have not been learning extra math outside of school. They are each moving through individualized learning paths so that whenever they are asked to learn something new, they have evidenced knowledge of all the prerequisites. So it seems more than reasonable to say that they have the same background knowledge.

When these kids hit calculus, everybody finds derivative computations (e.g. the power rule) to be fairly straightforward, but there’s a particular topic that throws some students for a loop: the idea that the derivative is the slope of the tangent line to a curve. Different kids pick up on this topic at wildly different rates. Some kids just get it right away because the difference quotient is an instance of the slope formula. Some kids don’t really make the connection until they see an animation of the secant line turning into the tangent line. Other kids don’t get it even after you show that animation to them, and they continually forget that “slope of tangent line” means the same thing as “derivative” – even though they can tell you what the slope of a line is, compute the slope given two points on a curve, tell you what a limit is, evaluate limits, etc.

The same thing happens in tests of convergence, in particular, the limit comparison test. If you take a really strong student who has learned all the prerequisites, and you show them the series $\sum \frac{1}{-1+n+n^2}$ and ask them to guess whether it converges by thinking of a similar series, they might correctly guess that it converges because it’s similar to the convergent series $\sum \frac{1}{n^2}.$ But if you do the same with a different student who has also learned all the prerequisites, they might incorrectly guess that it diverges because it’s similar to the divergent series $\sum \frac{1}{n}.$ Even though they’ve never encountered this material before, the really strong student intuitively knows what we mean by a “similar series” in the context of this question, while the other student needs many more practice problems to develop that understanding.
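For readers who want to see why the strong student's guess is the right one, here is a small numerical sketch of the limit comparison idea. The function names are made up for this illustration.

```python
def term(n: int) -> float:
    """n-th term of the series sum 1 / (-1 + n + n^2)."""
    return 1 / (-1 + n + n**2)

def ratio_to_inverse_square(n: int) -> float:
    """Ratio of the n-th term to 1/n^2."""
    return term(n) / (1 / n**2)

def ratio_to_harmonic(n: int) -> float:
    """Ratio of the n-th term to 1/n."""
    return term(n) / (1 / n)

for n in (10, 100, 1000, 10_000):
    print(n, round(ratio_to_inverse_square(n), 6), round(ratio_to_harmonic(n), 6))
```

The ratio to $\frac{1}{n^2}$ settles toward 1, a finite nonzero limit, so the limit comparison test says the series behaves like the convergent $\sum \frac{1}{n^2}.$ The ratio to $\frac{1}{n}$ shrinks toward 0, which means the terms are eventually much smaller than those of the divergent series, so that comparison tells us nothing about divergence.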

Case Study 2

To offer another case study with some more concrete numbers: in an extreme case, while teaching these radically accelerated courses, I was also mentoring another 7th grade student (let’s call him M for “mentoring”) who was learning calculus despite using far fewer practice opportunities. M was incredibly gifted at math and I had worked with him for 3 years beforehand, but not through any sort of structured curriculum, just chatting about math for an hour each weekend. He wasn’t taking any special math courses at school, just his grade-level courses. A couple times I tried to get him doing some more structured work out of some textbooks (I felt like he could be making a lot faster progress that way), but he and his parents weren’t really interested in it, and I didn’t want to push too hard.

Midway through 7th grade, M and his parents started thinking more about his future and came to agree that it would be a good idea for him to knock out the AP Calculus BC exam in 8th grade so that he could make a convincing case for enrolling in university courses the following year. But at the same time, he wasn’t going to take a separate calculus course, and he didn’t want to do a whole lot more work outside of our weekly discussions. The totality of M’s calculus instruction before the exam was limited to an hour-long chat each weekend and 5-10 homework problems per week for about 12 months, about 400 problems total, plus 3 or 4 practice exams.

Meanwhile, students that I taught in the radically accelerated school program did about 300 lessons (each with about 10 questions) and 300 reviews (each with about 3 questions) for a total of about 4000 questions, plus 6 practice exams. Not only did they solve an order of magnitude more problems than M, they also had more one-on-one time with me (there were only 5 students in the class), they were doing this every single school day for an hour, and they were also working from a far more scaffolded and comprehensive curriculum (whereas for M, I had to slim down the curriculum to the bare essentials, otherwise we wouldn’t get through it all). Yet, M thought the AP exam was pretty easy, came out of the exam quite confident that he got a 5 out of 5, and indeed he did – whereas in my class, the average score was 3.6 out of 5 (two 5’s, a 4, two 2’s), and even the students who ended up getting a 5 did not come out of the exam confident that they got a 5.
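The arithmetic behind those figures can be checked with a quick sketch. All inputs are the approximate numbers quoted above; the variable names are just for illustration.

```python
# Classroom students: approximate counts quoted above.
lessons, questions_per_lesson = 300, 10
reviews, questions_per_review = 300, 3
class_total = lessons * questions_per_lesson + reviews * questions_per_review  # 3900

# M: the "about 400 problems total" estimate quoted above.
m_total = 400
problem_ratio = class_total / m_total  # 9.75, roughly an order of magnitude

# Classroom AP scores: two 5's, a 4, two 2's.
class_scores = [5, 5, 4, 2, 2]
class_average = sum(class_scores) / len(class_scores)  # 3.6

print(class_total, problem_ratio, class_average)
```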

(For more context about the level of giftedness of those 4000-problem students: those students were 8th graders studying AP Calculus BC in a radically accelerated math program, but were around the same giftedness level as typical kids in a typical honors math class at a typical school. They started the program in 6th grade, taking Prealgebra. We got them all the way up through AP Calc BC on one class period’s worth of work per school day from 6th-8th grade – we were increasing efficiency, not workload. How they were selected: they scored at or above the 90th percentile on a middle school math placement exam typically taken by all fifth graders in the district in the spring. They were then invited to join the program. It’s a seventh-grade math skills test, so it requires a somewhat high skill level, but it’s not designed to identify math aptitude. This was in the Pasadena Unified School District, where about two-thirds of the student population qualifies for the federal free and reduced lunch program, and about 44% of all K-12 students are educated in private schools, compared to the California average of 11%. Four other students took AP Calculus BC on our system, unaffiliated with our Pasadena school program, completely independent of a classroom, and all but one of them scored a perfect 5 on the AP exam – the other one received a 4. More info here and here.)

Based on my experiences interacting with M and my classroom students, if I had to pick one defining trait that separated them, it would be the generalization ability I described earlier. On any topic, M would require very minimal explanation and he would naturally fill in most of the details. Most students will only absorb a fraction of information presented during initial instruction and will fill in the rest of their understanding as they solve problems that force them to grapple with things they hadn’t absorbed or hadn’t generalized. But M would typically absorb way more information from the initial explanation and generalize it much further. M would also retain it much longer after the initial practice – for instance, we could cover a new topic one week and then he’d be able to recall most of it a week or two later, whereas many students in my class would forget most of a new topic within a couple days of learning it if they did not receive additional practice. That said, M is not immune to forgetting, and it’s not like he’s “locking things into place” indefinitely in his brain. It’s just that his rate of forgetting is much slower.

I’ve worked with plenty of students who are well above average mathematically but not nearly to the extent of M. They are much slower to absorb new information, and even after they are able to consistently solve problems correctly, they will forget it almost entirely within a week or two. Imagine you’re writing code to develop some application, but you’re using some buggy version control where each day, 10% of your code is deleted. That’s how it feels working with these other students. It’s like writing in disappearing ink. On the other hand, for M, it’s like his code gets implemented in a more robust way, and less than 1% gets deleted each day.
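To put rough numbers on the disappearing-ink metaphor, here is a minimal sketch. It assumes the daily loss compounds on whatever is left (a simple geometric decay), which is one reading of the metaphor and not a measured forgetting curve.

```python
def fraction_remaining(daily_loss: float, days: int) -> float:
    """Fraction left after `days` if `daily_loss` of what remains vanishes each day."""
    return (1 - daily_loss) ** days

for days in (7, 14):
    typical = fraction_remaining(0.10, days)  # 10% deleted per day
    robust = fraction_remaining(0.01, days)   # 1% deleted per day
    print(f"after {days} days: {typical:.0%} vs {robust:.0%} remaining")
```

Under this assumption, about half is gone after a week at 10% per day, while more than 85% is still there after two weeks at 1% per day.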

More About Exceptionally Fast Learners

M and other exceptionally fast learners are very rare, but they do exist, and as you narrow down to world-class mathematicians, physicists, programmers, etc., they become a lot more common. It’s like how world-class basketball talent is very rare, but simultaneously common enough that the vast majority of people could not hope to become NBA players, or even minor league players, even if they engaged in optimal training for years.

Although rare, these exceptionally fast learners illustrate just how incorrect it is to assume that differences in problem-solving ability after initial instruction (or even before initial instruction) indicate differences in prior knowledge.

In my experience, they can sometimes pick up on things so quickly that if you just show them a question that is “beyond, but not too far beyond” for them, then they will figure it out on the fly without even seeing a demonstration.

As a specific example, back when I was chatting with M about algebra, pretty quickly I realized that the quickest way to get him through the content was to just ask him questions like that (“if $x$ represents a number, and $2x+3=11,$ what is $x$”) without even demonstrating anything.

With these kinds of students, you can often tell them facts and they immediately understand why the facts are true. For instance, if you tell them that $y = a(x - h)^2 + k$ is the general formula for a parabola with vertex at $(h,k),$ they immediately intuitively understand why that is: the graph turns when the $(x-h)^2$ part hits zero. (Almost all students need to be explicitly taught this intuition, and for most students, it does not fully “click” even after being taught it – they “kind of, sort of” get it, and while they can use the formula to solve problems, they have to be exposed to the intuition periodically into the future across a wider variety of settings/examples before the formula makes perfect intuitive sense to them.)

You can sometimes even teach formulas by having the students derive the formulas themselves. For example:

Again, these kinds of students are very rare, but they illustrate just how incorrect it is to assume that differences in problem-solving ability after initial instruction (or even before initial instruction) indicate differences in knowledge that the student was previously taught.

Why Care About Exceptionally Fast Learners if They’re So Rare?

I realize that I’m making a big fuss about a very small segment of the population – but I think it’s warranted because if a student wants to learn a subject to an exceptionally high level, enough to build a career around it and achieve a high level of success in their field, then these are the types of other people they’re going to have to compete against. Just like professional sports – at a high enough level, everybody who plays is a member of what was (at lower levels) a very small segment of the population.

Think about it this way. Suppose we ignore the really exceptionally fast learners because they’re rare and effectively invisible in aggregate statistics. For instance, the vast majority of people do not pick up on basketball exceptionally quickly, so suppose we ignore those people who do. We’ll make the (incorrect) assumption that, because they’re a minuscule segment of the population, they will have negligible impact on any sort of conclusion we make about talent development in basketball. Consequently, when we watch a professional basketball game on TV, we’ll come to the conclusion that anyone can become a professional basketball player if they just put in enough work. Do you see the logical flaw that’s happening here?

It may help to hear the famed Douglas Hofstadter (2012) recount the time when he realized that he did not have enough of that so-called “generalization ability” to stand out as a professional mathematician:

A Potential Way to Reconcile the Conclusions of the Paper: Tightening the Definition of “Favorable Learning Conditions”

There is one setting in which the conclusions of the paper might make sense to me. It involves tightening the definition of “favorable learning conditions” to the point that it becomes more theoretical than practical, and it doesn’t imply that students actually learn at similar absolute rates, but here it is.

The paper limits its conclusions to the context of “favorable learning conditions,” which it describes as follows:

Perhaps the definition of “favorable learning conditions” also needs to specify (in some more precise way) that the curriculum is sufficiently granular relative to most students’ comfortable “bite sizes” for learning new information, and includes sufficient review relative to their forgetting rates.

Under that definition, it would make more intuitive sense to me that (barring hard cognitive limits) such favorable learning conditions could to some extent factor out cognitive differences, causing learning rates to appear surprisingly regular. A metaphor:

Of course, under this definition, the regularity in learning rate would be a ceiling effect. The faster learners would be capable of learning faster, but the curriculum would be too granular and/or provide too much review relative to their needs, thereby creating a ceiling effect that prevents fast learners from learning at their top speed.

The definition would need to be amended once more to specify that the curriculum’s granularity is equal to the student’s bite size and rate of review is equal to the student’s rate of forgetting. The amended metaphor:

This amended definition would be free of ceiling effects – and, critically, equal bite rates would not imply equal rates of food volume intake.
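The last point is just multiplication, but a tiny sketch makes it explicit. The bite rate and bite sizes below are hypothetical numbers, and the function name is made up for this illustration.

```python
def volume_per_minute(bites_per_minute: float, bite_size: float) -> float:
    """Food volume taken in per minute: bite rate times bite size."""
    return bites_per_minute * bite_size

same_bite_rate = 10  # hypothetical: both eaters take 10 bites per minute

small_bites = volume_per_minute(same_bite_rate, bite_size=1.0)
large_bites = volume_per_minute(same_bite_rate, bite_size=3.0)

print(small_bites, large_bites)  # 10.0 30.0
```

Two learners can match on opportunities per unit time (bite rate) and still differ several-fold in how much they take in, because the size of each bite differs.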

This definition would also allow for anecdotes / case studies of math becoming hard for different students at different levels, because the following factors affect students differentially as they move up the levels of math:

It would even allow for the concept of soft and hard ceilings on the highest level of math that one can reach:

Defense Against Misinterpretation

I am NOT saying that anyone’s level of knowledge is set in stone.

I want to see every single student grow and learn as much as they can. But in order to support every student and maximize their learning, it’s necessary to provide some students with more practice than others. If a student is catching on slowly, and you don’t give them enough practice and instead move them on to the next thing before they are able to do the current thing, then you’ll soon push them so far out of their depth that they’ll just be struggling all the time and not actually learning anything, thereby stunting their growth.

Likewise, if a student picks up on something really quickly and you make them practice it for way longer than they need to instead of allowing them to move onward to more advanced material, that’s also stunting their growth.

I’m 100% in the camp of maximizing each individual student’s growth on each individual skill that they’re learning, giving them enough practice to achieve mastery and allowing them to move on immediately after mastery.

I am NOT saying that background knowledge is unimportant.

In hierarchical subjects like mathematics, background knowledge is one of the largest, if not the largest, determinants of whether a student will succeed in learning new material. And that’s obvious: how can a student learn something new if they do not know the prerequisites?

All I’m saying is that background knowledge is not the sole determinant. In other words, the reason why professional athletes, musicians, mathematicians, etc. are so good at their skill is typically not just that they started training earlier.

In most complex skill domains, there are typically other factors (cognitive, physical, dispositional, etc.) that come into play. Sometimes these other factors can be improved through extra training, but other times they can’t (e.g., height of basketball players). Even factors that can be improved often have soft limits to the range of improvement that can be accomplished in a reasonable amount of training time.

Of course, extra practice is a big advantage that can, to some extent, make up for a lower rate of skill acquisition. It’s often true that “hard work beats talent when talent doesn’t work hard.” And as a corollary, it’s often true that unreasonably hard work catches up to talent even when talent works reasonably hard.

But the catch is that you have to be working way harder than the people you’re trying to catch up to, and if your rate of skill acquisition is low then even the theoretical maximum possible amount of work you could put forth in your lifetime might not be enough to catch you up and make you competitive.

The good news is that in the early stages of these talent domains, the skills can be learned by virtually everybody. Virtually everybody can learn counting and basic arithmetic; virtually everybody can learn how to dribble a basketball and shoot a free throw; virtually everybody can learn how to play Hot Cross Buns or Ode to Joy on an instrument.

And more good news is that the vast majority of people (not all, but most) can learn way more than the basics. Most people can learn algebra and some basic calculus; most people can learn to dribble between the legs and sink three-pointers; most people can learn to play numerous pop songs on an instrument.

But the thing is – even though these seem like advanced skills to the general population, they’re not anywhere close to the skills that you need to become a successful professional in any of these domains, much less a world-class standout professional.

I am NOT saying that students can learn a lot without solving problems.

I was asked the following question about this critique:

Yes, I totally agree that actively solving practice questions is where the vast majority of the learning typically happens. I don’t mean to suggest otherwise.

When I think of the students I’ve worked with who I would characterize as having high generalization ability, they’re a minuscule segment of the total population, so I would not expect them to influence any aggregate metrics. I also don’t think that high y-intercept can be used as a proxy for high generalization ability. I would expect students with high generalization ability to have high y-intercept, but not the converse, because a high y-intercept will also be produced by a student having previously been taught the material.

Additionally, when I claim that there exist students who can learn a lot from text or lecture, I don’t mean that they are learning from passively watching the text or video. What I mean is this: when I’ve seen these students read a text or watch a video, they tend to actively relate it to concrete examples and prior knowledge. It’s like they self-construct their own active learning experience.

It’s totally different from typical students, who don’t think much beyond the words on the page, even if their teacher tries to get them to engage with it. Some part of that is likely due to interest/motivation, but I’d be surprised if cognitive differences (e.g. working memory capacity) didn’t play a role too.

Because these students take such an active role in constructing their own active learning experience, they often end up extrapolating large-scale implications well beyond the scope of what they are expected to learn. As a result, I suppose it might be technically true that these students come in with more prior knowledge, but not because it was actually taught to them, and not in a way that provides evidence for the last sentence in the abstract of the paper:

To be clear: I would agree that often, some portion of the difference in educational achievement across students comes from differences in access to learning opportunities. But I would not agree that all students would achieve the same if they all had the same access to such opportunities.

Additionally, as educational technology increases the degree to which individualized instruction is universally available, I would expect differences in achievement to shrink or grow depending on how achievement is measured. For instance:

Literature Search: Speed of Learning

Q&A #1

I received the following question about this critique:

To be clear: I would agree that many instances of “just get it right away” are “already had some exposure.” My claim is just that there are also instances to the contrary, where the student gets it right away but has not been previously introduced to it by an external entity (though maybe it’s possible they’ve thought about something similar internally when organizing information within their own mind).

I haven’t really dug into the literature surrounding the existence or non-existence of fast learners – having worked longitudinally with many students who I would call fast learners (not just “previously-exposed” learners) and many who I would not, I never realized there was even a debate about whether fast learners exist. Though now that I’m aware of this debate, I’ll put it on my reading list.

In the meantime, I’ve previously read about individual differences in working memory capacity (WMC) impacting speed of learning, so I can point to a couple references there. I realize “variation in speed of learning” does not necessarily imply the existence of “just get it right away,” but I think the heart of the question here is less about whether there is evidence some students learn fast in absolute terms, and more about whether there is evidence that some students learn fast relative to other students, due to some factor other than prior exposure to the task being learned.

McDaniel et al. (2014) summarize that multiple studies have linked individual differences in speed of learning and WMC:

These authors suggest that high WMC facilitates abstraction, that is, seeing “the forest for the trees” by learning underlying rules as opposed to memorizing example-specific details:

At the other end of the spectrum, Swanson & Siegel (2011) found that students with learning disabilities generally have lower WMC:

Q&A #2

This is a follow-up to Q&A #1 above.

I checked out the primary sources that McDaniel et al. (2014) referenced as reporting significant correlations between speed of learning and working memory capacity. In both of those primary sources, immediate feedback was provided but it was not explanatory. I also dug deeper into separate literature and did not see any studies of learning rate in the context of explanatory feedback. I agree that it would be interesting to see some of these studies re-run in the context of immediate explanatory feedback.

That said, I did come across some other studies that may be relevant here.

Forgetting Rate

I ran across a couple studies (Zerr et al., 2018; McDermott & Zerr, 2019; both in the context of non-explanatory feedback) highlighting that it is not only the learning rates that are variable, but also the forgetting rates – in particular, faster learners tend to be slower forgetters. This seems noteworthy to me because I would expect forgetting rate to depend less on the means by which the learning was acquired (i.e., favorable vs less favorable).

Additionally, in the context of An astonishing regularity in student learning rate, I wonder if individual differences in forgetting are also tangled up in the initial performance measurement, i.e., “forgotten background knowledge” being interpreted as “lack of background knowledge.”

Anecdotally, when managing learning courses that used a mastery-based learning system, I noticed that weaker students would more frequently need to refer back to reference material on prerequisites even if they had mastered those prerequisites recently, and they would also do worse on quizzes where they were unable to refer back to reference material. The effect of forgetting showed up more clearly in their quiz accuracy and raw amount of practice time than in their practice accuracy or number of practice opportunities.

Self-Explanations

Renkl (1997) found that when studying worked examples, self-explanation characteristics correlated with learning outcomes, even when controlling for study time and prerequisite knowledge. Here is their description of the most successful group:

And other groups:

I came across several other studies that found correlations between individual differences in metacognitive abilities and working memory capacity (WMC), which makes me suspect that WMC may be at least partially implicated in self-explanation characteristics.

For instance, Linderholm, Cong, & Zhao (2008) found that low-WMC readers were overconfident in their comprehension of a text, while high-WMC readers were similarly underconfident – I expect this would predispose high-WMC readers to analyze more deeply while reading.

More generally, Komori (2016) summarizes that differences in WMC can impact attentional control, which seems important for learning deep structure from a worked example:

Prat, Seo, & Yamasaki (2015) discuss all these things in more detail:

Anecdotally, when I recall tutoring students who seemed to generalize less from worked examples, one feature that sticks out is that they would often grasp incorrect structure from the worked example (they would typically over-simplify it), and only after struggling with a practice problem (i.e., getting it wrong or getting stuck and having to refer back to the worked example) would they notice some portion of the structure that they had failed to grasp. Gradually, as they struggled with more practice problems, they would chip away at enough of the structure in the worked example – not the full structure in its entirety, but enough to get them to the point of mastery. Students who generalized well, on the other hand, would excavate far more of the correct structure in a worked example.

Q&A #3

I recently added a section to The Math Academy Way, detailing my experience with both modes of counterexample. I’ll paste the section below:

References

Hofstadter, D., & Carter, K. (2012). Some Reflections on Mathematics from a Mathematical Non-mathematician. Mathematics in School, 41 (5), 2-4.

Koedinger, K. R., Carvalho, P. F., Liu, R., & McLaughlin, E. A. (2023). An astonishing regularity in student learning rate. Proceedings of the National Academy of Sciences, 120 (13), e2221311120.

Komori, M. (2016). Effects of working memory capacity on metacognitive monitoring: A study of group differences using a listening span test. Frontiers in Psychology, 7, 172995.

Linderholm, T., Cong, X., & Zhao, Q. (2008). Differences in low and high working-memory capacity readers’ cognitive and metacognitive processing patterns as a function of reading for different purposes. Reading Psychology, 29 (1), 61-85.

McDaniel, M. A., Cahill, M. J., Robbins, M., & Wiener, C. (2014). Individual differences in learning and transfer: stable tendencies for learning exemplars versus abstracting rules. Journal of Experimental Psychology: General, 143 (2), 668.

McDermott, K. B., & Zerr, C. L. (2019). Individual differences in learning efficiency. Current Directions in Psychological Science, 28 (6), 607-613.

Prat, C. S., Seo, R., & Yamasaki, B. L. (2015). The role of individual differences in working memory capacity on reading comprehension ability. In Handbook of Individual Differences in Reading (pp. 331-347). Routledge.

Renkl, A. (1997). Learning from worked-out examples: A study on individual differences. Cognitive Science, 21 (1), 1-29.

Swanson, H. L., & Siegel, L. (2011). Learning disabilities as a working memory deficit. Experimental Psychology, 49 (1), 5-28.

Zerr, C. L., Berg, J. J., Nelson, S. M., Fishell, A. K., Savalia, N. K., & McDermott, K. B. (2018). Learning efficiency: Identifying individual differences in learning rate and retention in healthy adults. Psychological Science, 29 (9), 1436-1450.

