Which Cognitive Psychology Findings are Solid, That Can Be Used to Help Students Learn Better?

There are numerous cognitive learning strategies that 1) can be used to massively improve learning, 2) have been reproduced so many times they might as well be laws of physics, and 3) connect all the way down to the mechanics of what’s going on in the brain.

Cross-posted from here.


In education research, there have been a number of instances where a purported scientific discovery turned out not to be true. “Learning styles,” anyone?

Unfortunately, but understandably, many educators have come to distrust scientific findings about education as a whole – and this is compounded by an ongoing replication crisis in psychology.

But here’s the thing: sure, many findings don’t hold up, but also, many findings do.

For instance: we know that actively solving problems produces more learning than passively watching a video/lecture or re-reading notes. This sort of thing has been tested scientifically, numerous times, and it is completely replicable. It might as well be a law of physics at this point. In fact, a highly-cited meta-analysis states, verbatim:

“…[C]alls to increase the number of students receiving STEM degrees could be answered, at least in part, by abandoning traditional lecturing in favor of active learning… Given our results, it is reasonable to raise concerns about the continued use of traditional lecturing as a control in future experiments.”

So there you go, that’s one cognitive psychology finding that holds up: active learning beats passive learning.

(To be clear: active learning doesn’t mean that students never watch and listen. It just means that students are actively solving problems as soon as possible following a minimum effective dose of initial explanation, and they spend the vast majority of their time actively solving problems. Also note that active learning does not imply unguided learning or group work – active learning is most effective when all information to be learned is explicitly communicated and all active practice is performed with corrective feedback and guidance. Ideally, over the course of a learning session, students will complete numerous cycles rapidly alternating between minimum effective doses of guided instruction and active practice.)

Another finding: if you don’t review information, you forget it. You can actually model this precisely, mathematically, using a forgetting curve. I’m not exaggerating when I refer to these things as laws of physics – the only real difference is that we’ve gone up several levels of scale and are dealing with noisier stochastic processes (that also have noisier underlying variables).
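To make the forgetting-curve idea a bit more concrete, here’s a minimal sketch in Python. It assumes the simplest textbook form, exponential decay R(t) = exp(-t / S), where S is a “stability” parameter measured in days. That specific formula, the function names, and the 90% threshold are illustrative assumptions, not a claim about which model any particular study or system uses.

```python
import math


def retention(days_since_review: float, stability_days: float) -> float:
    """Estimated recall probability under an assumed exponential forgetting curve."""
    if stability_days <= 0:
        raise ValueError("stability_days must be positive")
    if days_since_review < 0:
        raise ValueError("days_since_review must be non-negative")
    return math.exp(-days_since_review / stability_days)


def days_until(threshold: float, stability_days: float) -> float:
    """Days after a review at which estimated recall decays to `threshold`."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    if stability_days <= 0:
        raise ValueError("stability_days must be positive")
    # Solve exp(-t / S) = threshold for t.
    return -stability_days * math.log(threshold)


if __name__ == "__main__":
    # Hypothetical stabilities: a freshly learned fact vs. a well-practiced one.
    for stability in (2.0, 10.0):
        wait = days_until(0.9, stability)
        print(f"stability={stability} days: review after about {wait:.2f} days")
```

In this toy model, a memory with a larger stability decays more slowly, so the point at which it needs review arrives later.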

Okay, but aren’t these findings obvious? Yes, but…

Yes, but in education, obvious strategies often aren’t put into practice. For instance, plenty of classes still run on a pure lecture format and don’t review previously learned material unless it’s the day before a test.
Yes, but there are plenty of other findings that replicate just as well but are not so obvious.

Here are some less obvious findings.

The spacing effect: more long-term retention occurs when you space out your practice, even if it’s the same amount of total practice. As researcher Doug Rohrer states:
- “…[T]he spacing effect is arguably one of the largest and most robust findings in learning research, and it appears to have few constraints.”
Note: There are tons of more detailed scientific references/quotes I want to include, but I’m going to skip them so as not to keep blowing up the length of this already-gigantic post. If you want to see them, here’s a draft I’m working on that covers all these findings (and more) with hundreds of references and relevant quotes pulled out of those references.
A profound consequence of the spacing effect is that the more reviews are completed (with appropriate spacing), the longer the memory will be retained, and the longer one can wait until the next review is needed. This observation gives rise to a systematic method for reviewing previously-learned material called spaced repetition (or distributed practice). A “repetition” is a successful review at the appropriate time.
To maximize the amount by which your memory is extended when solving review problems, it’s necessary to avoid looking back at reference material unless you are totally stuck and cannot remember how to proceed. This is called the testing effect, also known as the retrieval practice effect: the best way to review material is to test yourself on it, that is, practice retrieving it from memory, unassisted.
The testing effect (retrieval practice effect) can be combined with spaced repetition to produce an even more potent learning technique known as spaced retrieval practice.
During review, it’s also best to spread minimal effective doses of practice across various skills. This is known as mixed practice or interleaving — it’s the opposite of “blocked” practice, which involves extensive consecutive repetition of a single skill. Blocked practice can give a false sense of mastery and fluency because it allows students to settle into a robotic rhythm of mindlessly applying one type of solution to one type of problem. Mixed practice, on the other hand, creates a “desirable difficulty” that promotes vastly superior retention and generalization, making it a more effective review strategy.
To free up mental processing power, it’s critical to practice low-level skills enough that they can be carried out without requiring conscious effort. This is known as automaticity. Think of a basketball player who is running, dribbling, and strategizing all at the same time — if they had to consciously manage every bounce and every stride, they’d be too overwhelmed to look around and strategize. The same is true in math. I wrote more about the importance of automaticity here.
The most effective type of active learning is deliberate practice, which consists of individualized training activities specially chosen to improve specific aspects of a student’s performance through repetition (effortful repetition, not mindless repetition) and successive refinement. However, because deliberate practice requires intense effort focused in areas beyond one’s repertoire, which tends to be more effortful and less enjoyable, people will tend to avoid it, instead opting to ineffectively practice within their level of comfort (which is never a form of deliberate practice, no matter what activities are performed). I wrote more about deliberate practice here.
Instructional techniques that promote the most learning in experts promote the least learning in beginners, and vice versa. This is known as the expertise reversal effect. An important consequence is that effective methods of practice for students typically should not emulate what experts do in the professional workplace (e.g., working in groups to solve open-ended problems). Beginners (i.e., students) learn most effectively through direct instruction. I wrote more about that here.
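To illustrate the spaced repetition idea from the list above (each successful review lets you wait longer before the next one), here’s a rough sketch. The doubling factor, the one-day starting interval, and the shorten-on-failure rule are made-up illustrative choices, not the schedule used by any particular system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewState:
    interval_days: float = 1.0  # how long to wait before the next review
    repetitions: int = 0        # count of successful, on-time reviews


def after_review(state: ReviewState, passed: bool,
                 growth: float = 2.0, min_interval: float = 1.0) -> ReviewState:
    """Return the next state after one unassisted review attempt.

    A pass lengthens the interval; a fail shortens it so the material
    comes back sooner. Both rules are assumptions for illustration.
    """
    if passed:
        return ReviewState(state.interval_days * growth, state.repetitions + 1)
    shorter = max(min_interval, state.interval_days / growth)
    return ReviewState(shorter, max(0, state.repetitions - 1))


def simulate(outcomes: list[bool]) -> list[float]:
    """Intervals (in days) produced by a sequence of pass/fail outcomes."""
    state = ReviewState()
    intervals = []
    for passed in outcomes:
        state = after_review(state, passed)
        intervals.append(state.interval_days)
    return intervals
```

Under these assumptions, `simulate([True, True, False, True])` returns `[2.0, 4.0, 2.0, 4.0]`: the gaps stretch out with each successful retrieval and pull back in after a miss.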

Why haven’t these findings transformed education?

Now, this might seem like a lot of new information – a common reaction is “Wow, the field of education is experiencing a revolution!”

But here’s the thing: most key findings have been known for many decades.

It’s just that they’re not widely known / circulated outside the niche fields of cognitive science & talent development, not even in seemingly adjacent fields like education.

These findings are not taught in school, and typically not even in credentialing programs for teachers themselves – no wonder they’re unheard of!

But if you just do a literature review on Google Scholar, all the research is right there – and it’s been around for many decades.

Naturally, this leads us to the following question: if cognitive psychology has found many effective learning strategies (like mastery learning, spaced repetition, the testing effect, and mixed practice), then why aren’t these key findings being leveraged in classrooms? Why do they remain relatively unknown?

Here are a handful of reasons that I’m aware of.

1. Leveraging them (at all) requires additional effort from both teachers and students.

In some way or another, each strategy increases the intensity of effort required from students and/or instructors, and the extra effort is then converted into an outsized gain in learning.

This theme is so well-documented in the literature that it even has a catchy name: a practice condition that makes the task harder, slowing down the learning process yet improving recall and transfer, is known as a desirable difficulty.

Desirable difficulties make practice more representative of true assessment conditions. Consequently, it is easy for students (and their teachers) to vastly overestimate their knowledge if they do not leverage desirable difficulties during practice, a phenomenon known as the illusion of comprehension.

However, the typical teacher is incentivized to maximize the immediate performance and/or happiness of their students, which biases them against introducing desirable difficulties and incentivizes them to promote illusions of comprehension.

Using desirable difficulties exposes the reality that students didn’t actually learn as much as they (and their teachers) “felt” they did under less effortful conditions. This reality is inconvenient to students and teachers alike; therefore, it is common to simply believe the illusion of learning and avoid activities that might present evidence to the contrary.

2. Leveraging cognitive learning strategies to their fullest extent requires an inhuman amount of effort from teachers.

Let’s imagine a classroom where these strategies are being used to their fullest extent.

Every individual student is fully engaged in productive problem-solving, with immediate feedback (including remedial support when necessary), on the specific types of problems, and in the specific types of settings (e.g., with vs without reference material, blocked vs interleaved, timed vs untimed), that will move the needle the most for their personal learning progress at that specific moment in time.
This is happening throughout the entirety of class time, the only exceptions being those brief moments when a student is introduced to a new topic and observes a worked example before jumping into active problem-solving.

Why is this an inhuman amount of work?

First of all, it’s at best extremely difficult, and at worst (and most commonly) impossible, to find a type of problem that is productive for all students in the class. Even if a teacher chooses a type of problem that is appropriate for what they perceive to be the “class average” knowledge profile, it will typically be too hard for many students and too easy for many others (an unproductive use of time for those students either way).
Additionally, to even know the specific problem types that each student needs to work on, the teacher has to separately track each student’s progress on each problem type, manage a spaced repetition schedule of when each student needs to review each topic, and continually update each schedule based on the student’s performance. That updating can be incredibly complicated: each time a student learns or reviews an advanced topic, they’re implicitly reviewing many simpler topics, all of whose repetition schedules need to be adjusted as a result, depending on how the student performed. This is an inhuman amount of bookkeeping and computation.
Furthermore, even on the rare occasion that a teacher manages to find a type of problem that is productive for all students in the class, different students will require different amounts of practice to master the solution technique. Some students will catch on quickly and be ready to move on to more difficult problems after solving just a couple problems of the given type, while other students will require many more attempts before they are able to solve problems of the given type successfully on their own. Additionally, some students will solve problems quickly while others will require more time.

In the absence of the proper technology, it is impossible for a single human teacher to deliver an optimal learning experience to a classroom of many students with heterogeneous knowledge profiles, who all need to work on different types of problems and receive immediate feedback on each attempt.
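To give a feel for the bookkeeping described above, here’s a toy sketch of tracking a separate review schedule for every (student, topic) pair, where passing an advanced topic also counts as a review of its prerequisites. Everything here is hypothetical: the topic names, the `ReviewTracker` class, the doubling rule, and especially the assumption that prerequisites get full credit. A real system would need something far more careful.

```python
# Hypothetical prerequisite graph: topic -> topics it directly builds on.
PREREQS = {
    "fractions": [],
    "linear_equations": ["fractions"],
    "systems_of_equations": ["linear_equations"],
}


def implicitly_reviewed(topic, prereqs):
    """All direct and indirect prerequisites of `topic`."""
    seen = set()
    stack = list(prereqs.get(topic, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(prereqs.get(current, []))
    return seen


class ReviewTracker:
    """Per-student, per-topic review schedule using a toy doubling rule."""

    def __init__(self, prereqs):
        self.prereqs = prereqs
        self.interval = {}  # (student, topic) -> current interval in days
        self.due = {}       # (student, topic) -> day the next review is due

    def record(self, student, topic, passed, today):
        # Assumption: a pass gives full credit to every prerequisite too;
        # a fail only affects the topic that was actually attempted.
        topics = {topic}
        if passed:
            topics |= implicitly_reviewed(topic, self.prereqs)
        for t in topics:
            key = (student, t)
            if passed:
                self.interval[key] = self.interval.get(key, 1) * 2
            else:
                self.interval[key] = 1
            self.due[key] = today + self.interval[key]

    def due_topics(self, student, today):
        """Topics this student should review on or before `today`."""
        return sorted(
            t for (s, t), day in self.due.items() if s == student and day <= today
        )
```

Even in this stripped-down version, a single answer from a single student can touch several schedules at once. Multiply that by a full class, hundreds of topics, and every problem attempted, and it becomes clear why nobody can do this by hand.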

3. Most edtech systems do not actually leverage the above findings.

If you pick any edtech system off the shelf and check whether it leverages each of the cognitive learning strategies I’ve described above, you’ll probably be surprised at how few it actually uses. For instance:

Tons of systems don’t scaffold their content into bite-sized pieces.
Tons of systems allow students to move on to more material despite not demonstrating knowledge of prerequisite material.
Tons of systems don’t do spaced review. (Moreover, tons of systems don’t do any review.)

Sometimes a system will appear to leverage some finding, but if you look more closely it turns out that this is actually an illusion that is made possible by cutting corners somewhere less obvious. For instance:

Tons of systems offer bite-sized pieces of content, but they accomplish this by watering down the content, cherry-picking the simplest cases of each problem type, and skipping lots of content that would reasonably be covered in a standard textbook.
Tons of systems make students do prerequisite lessons before moving on to more advanced lessons, but they don’t actually measure tangible mastery on prerequisite lessons. Simply watching a video and/or attempting some problems is not mastery. The student has to actually be getting problems right, and those problems have to be representative of the content covered in the lesson.
Tons of systems claim to help students when they’re struggling, but the way they do this is by lowering the bar for success on the learning task (e.g., by giving away hints). Really, what the system needs to do is take actions that are most likely to strengthen a student’s area of weakness and empower them to clear the bar fully and independently on their next attempt.

Now, I’m not saying that these issues apply to all edtech systems. I do think edtech is the way forward here – optimal teaching is an inhuman amount of work, and technology is needed. Heck, I personally developed all the quantitative software behind one system that properly handles the above challenges. All I’m saying is that you can’t just take these things at face value. Many edtech systems don’t really work from a learning standpoint, just as many psychology findings don’t hold up in replication – but at the same time, some edtech systems do work, shockingly well, just as some cognitive psychology findings do hold up and can be leveraged to massively increase student learning.

4. Even if you leverage the above findings, you still have to hold students accountable for learning.

Suppose you have the Platonic ideal of an edtech system that leverages all the above cognitive learning strategies to their fullest extent.

Can you just put a student on it and expect them to learn? Heck no! That would only work for exceptionally motivated students.

Most students are not motivated to learn the subject material. They need a responsible adult – such as a parent or a teacher – to incentivize them and hold them accountable for their behavior.

I can’t tell you how many times I’ve seen the following situation play out:

Adult puts a student on an edtech system.
Student goofs off doing other things instead (e.g., watching YouTube).
Adult checks in, realizes the student is not accomplishing anything, and asks the student what’s going on.
Student says that the system is too hard or otherwise doesn’t work.
Adult might take the student’s word at face value. Or, if the adult notices that the student hasn’t actually attempted any work and calls them out on it, the scenario repeats with the student putting forth as little effort as possible — enough to convince the adult that they’re trying, but not enough to really make progress.

In these situations, here’s what needs to happen:

The adult needs to sit down next to the student and force them to actually put forth the effort required to use the system properly.
Once it’s established that the student is able to make progress by putting forth sufficient effort, the adult needs to continue holding the student accountable for their daily progress. If the student ever stops making progress, the adult needs to sit down next to the student again and get them back on the rails.
To keep the student on the rails without having to sit down next to them all the time, the adult needs to set up an incentive structure. Even little things go a long way, like “if you complete all your work this week then we’ll go get ice cream on the weekend,” or “no video games tonight until you complete your work.” The incentive has to be centered around something that the student actually cares about, whether that be dessert, gaming, movies, books, etc.

Even if an adult puts a student on an edtech system that is truly optimal, if the adult clocks out and stops holding the student accountable for completing their work every day, then of course the overall learning outcome is going to be worse.

Connecting to mechanics within the brain

Before ending this post, I want to drive home the point that the cognitive learning strategies discussed here really do connect all the way down to the mechanics of what’s going on in the brain.

The goal of mathematical instruction is to increase the quantity, depth, retrievability, and generalizability of mathematical concepts and skills in the student’s long-term memory (LTM).

At a physical level, that amounts to creating strategic connections between neurons so that the brain can more easily, quickly, accurately, and reliably activate more intricate patterns of neurons. This process is known as consolidation.

Now, here’s the catch: before information can be consolidated into LTM, it has to pass through working memory (WM), which has severely limited capacity. The brain’s working memory capacity (WMC) represents the degree to which it can focus activation on relevant neural patterns and persistently maintain their simultaneous activation, a process known as rehearsal.

Most people can only hold about 7 digits (or more generally 4 chunks of coherently grouped items) simultaneously and only for about 20 seconds. And that assumes they don’t need to perform any mental manipulation of those items – if they do, then fewer items can be held due to competition for limited processing resources. (Note that this is an emergent behavior of a more complicated underlying mechanism: the actual WM limitation is not a fixed number of storage units, but rather, the ability to sustain relevant neural activity while suppressing interference from irrelevant activity.)

Limited capacity makes WMC a bottleneck in the transfer of information into LTM. When the cognitive load of a learning task exceeds a student’s WMC, the student experiences cognitive overload and is not able to complete the task. Even if a student does not experience full overload, a heavy load will decrease their performance and slow down their learning in a way that is NOT a desirable difficulty.

Additionally, different students have different WMC, and those with higher WMC are typically going to find it easier to “see the forest for the trees” by learning underlying rules as opposed to memorizing example-specific details. (This is unsurprising given that understanding large-scale patterns requires balancing many concepts simultaneously in WM.)

It’s expected that higher-WMC students will more quickly improve their performance on a learning task over the course of exposure, instruction, and practice on the task. However, once a student learns a task to a sufficient level of performance, the impact of WMC on task performance is diminished because the information processing that’s required to perform the task has been transferred into long-term memory, where it can be recalled by WM without increasing the actual load placed on WM.

So, for each concept or skill you want to teach:

it needs to be introduced after the prerequisites have been learned (so that the prerequisite knowledge can be pulled from long-term memory without taxing WM),
it needs to be broken down into bite-sized pieces small enough that no piece overloads any student’s WM, and
each student needs to be given enough practice to achieve mastery on each piece (and that amount of practice may vary depending on the particular student and the particular learning task).
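The first and third conditions above can be sketched in code: only offer a topic once its prerequisites are mastered, and only count a topic as mastered once the student is actually getting problems right. The topic names, the “4 of the last 5 attempts” rule, and the function names below are all hypothetical, chosen just to make the logic concrete.

```python
# Hypothetical prerequisite graph: topic -> topics it directly builds on.
PREREQS = {
    "counting": [],
    "addition": ["counting"],
    "multiplication": ["addition"],
    "fractions": ["multiplication"],
}


def has_mastered(results, window=5, required_correct=4):
    """True if at least `required_correct` of the last `window` attempts were right.

    `results` is a list of booleans, oldest attempt first. The threshold
    is an illustrative assumption, not an established standard.
    """
    recent = results[-window:]
    return len(recent) == window and sum(recent) >= required_correct


def ready_to_learn(prereqs, attempts):
    """Unmastered topics whose prerequisites have all been mastered."""
    mastered = {t for t in prereqs if has_mastered(attempts.get(t, []))}
    return sorted(
        topic
        for topic, required in prereqs.items()
        if topic not in mastered and all(r in mastered for r in required)
    )


if __name__ == "__main__":
    # One student's attempt history per topic (True = solved correctly).
    attempts = {
        "counting": [True] * 5,
        "addition": [True, False, True, True, True],
        "multiplication": [True, True],  # too few attempts to count as mastery
    }
    print(ready_to_learn(PREREQS, attempts))
```

Note that merely attempting a topic doesn’t unlock anything here: in the example, the student has started multiplication but hasn’t demonstrated mastery, so fractions stays locked.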

But also, even if you do all the above perfectly, you still have to deal with forgetting. The representations in LTM gradually, over time, decay and become harder to retrieve if they are not used, resulting in forgetting.

The solution to forgetting is review – and not just passively re-ingesting information, but actively retrieving it, unassisted, from LTM. Each time you successfully retrieve fuzzy information from LTM, you physically refresh and deepen the corresponding neural representation in your brain. But that doesn’t happen if you just passively re-ingest the information through your senses instead of actively retrieving it from LTM.

