other May 14, 2025

The Field of Education is Due For a Copernican Revolution

You’d think that teacher training programs would focus on the mechanics of learning, but instead they typically focus on ritualistic compliance. If we trained doctors like we do teachers, then we’d still be bloodletting. Teacher credentialing severely lacks rigor, and this lack of rigor leads to a massive loss in human potential. Students suffer for it, and it drives serious educators out of the profession. It attracts and supports the type of people who think it’s more important to practice sharing circles than to learn about the importance and implementation of spaced review. When you make it your mission to maximize student learning – including leveraging the learning-enhancing practice techniques that have been known, reproduced, and yet ignored by the education system for decades – you realize that there is a massive amount of human potential being left on the table. Students can be learning way, way, way more than they currently are.

by Justin Skycak (@justinskycak) justinmath.com 6,372 words



My Experience with Teacher Credentialing and Professional Development

Speaking as someone who had to suffer through a teacher credentialing program… it’s actually an anti-signal when someone references their teaching credential as a qualification to speak about how learning happens. It’s centered around political ideology rather than the science of learning.

There exist learning strategies that have been scientifically shown to improve student learning, such as mastery learning, spaced repetition, retrieval practice (the testing effect), and varied practice (interleaving).

These learning strategies have been researched extensively since the early to mid 1900s, with key findings being successfully reproduced over and over again since then.

Yet, when I completed my teaching credential from 2020-21 and attended numerous professional developments (PDs) from 2019-23, not once did I hear any mention of these learning strategies!

Instead, the focus was 100% on diversity, equity, & inclusion: readings (and an essay) on hegemonic heteronormativity, anti-racism training, “sharing circle” training, and even a presentation on the gender unicorn, complete with an extraordinarily complex gender classification flowchart, just to name a few examples.

Forget the science of learning – even the most obvious practical skills that a teacher needs every day, such as managing a rowdy classroom, communicating with parents, holding students accountable for their work, and dealing with academic dishonesty, were not covered at all in teacher credentialing or PD.

However, there was no shortage of pointless activities.

I vividly remember a virtual PD in which the first 30 minutes were spent going around the room of 30+ teachers, with the PD leader asking each teacher to describe the weather outside in their physical location and explain how their personal feelings that day related to the weather.

At another PD, there was an activity involving a circle of traffic cones, each cone with a different emoji taped to the top. The PD leader read through a list of words and asked the teachers to walk over to the cone that they felt matched their feelings in response to the word.

I wish that my experience with teacher credentialing was an edge case, a dysfunction that is not widespread. But if you look at the curricula of standard teacher credentialing programs, and even the schools of education within well-reputed universities, you’ll find the same phenomenon. The curricula are focused entirely on making education engaging, diverse, and unbiased, and there’s little to nothing about the science of learning.

This lack of rigor would not pass in other disciplines. Engineers are required to take plenty of rigorous courses on math and physics. Doctors are required to take plenty of rigorous courses on biology and chemistry. But educators are not required to take a single course on the science of learning, much less a rigorous one.

It’s Not Just Me Who Thinks Teacher Training Lacks Rigor — It’s a Known Phenomenon in the Literature

By the way, it’s not just me who’s pointing this out. There are numerous peer-reviewed academic studies about how the science of learning is missing from teacher education. Here are some entrypoints into the literature:

Teaching the science of learning (Weinstein, Madan, & Sumeracki, 2018)

How learning happens: Seminal works in educational psychology and what they mean in practice (Kirschner & Hendrick, 2024, p. 275)

Unanswered questions about spaced interleaved mathematics practice (Rohrer & Hartwig, 2020)

Lots of People in Education Disagree with the Premise of Maximizing Learning

Ironically, the teaching profession seems to drive out the people who are most interested in optimizing student learning. That’s one thing I didn’t expect when I entered the profession: lots of people in education disagree with the premise of maximizing learning.

For instance, “testing” and “repetition” have become dirty words in education.

However, practice testing and distributed practice (also known as spaced repetition) are widely understood by researchers to be two of the most effective practice techniques.

Moreover, deliberate practice – which has been shown to be one of the most prominent underlying factors responsible for individual differences in performance, even among highly talented elite performers – is centered around using repetitious training activities to refine whatever skills move the needle most on a student’s overall performance.

What gives? Why are there debates about scientifically proven learning techniques like testing and repetition?

Because lots of people in education disagree with the premise of maximizing learning. The debates aren’t about whether testing and repetition are effective learning techniques – the debates are about whether education should seek to maximize students’ learning.

There are plenty of students who would prefer for their education to maximize other things like fun and entertainment while, as a secondary concern, meeting some low bar for shallowly learning some surface-level basic skills.

And there are plenty of teachers who are incentivized to use easy, fun, low-accountability, hard-to-measure practice techniques that keep students, parents, and administrators off their back. (Unfortunately, these practice techniques tend to be ineffective.)

Why It’s Important to Take Learning Seriously, Especially in Math

The subfield within education that seeks to maximize learning is known as “talent development.”

In talent development, the optimization problem is clear: an individual’s performance is to be maximized, so the methods used during practice are those that most efficiently convert effort into performance improvements.

Practitioners of talent development tend to be found in hierarchical skill domains like sports and music, where each advanced skill requires many simpler skills to be applied in complex ways. This is because it’s hard to climb up the skill hierarchy without intentionally trying to do so.

To learn an advanced skill, you must be able to comfortably execute its prerequisite skills, and the prerequisite skills underlying those, and so on. Getting to the point of comfortable execution on any skill takes lots of practice over time – and even after you get there, you have to continue practicing to maintain your ability.

None of this happens naturally. If you don’t carefully manage the process, then you struggle. Nobody gets to be really good at a sport or instrument without taking their talent development seriously and intentionally trying to maximize their learning.

Success in sports and music requires talent development. But most students aren’t expected to achieve a high level of success in sports or music, so they can get away with de-prioritizing talent development. If every student in gym class were expected to be able to do a backflip by the end of the year, things would have to change – but the expectations are so low that meeting them does not require talent development.

But when it comes to math, de-prioritizing talent development becomes problematic. Like sports and music, math is an extremely hierarchical skill domain, so achieving a high level of success requires a dedication to talent development. However, unlike sports and music, most students are expected to achieve a relatively high level of success in math: many years of courses increasing in difficulty, culminating in at least algebra, typically pre-calculus, often calculus, and sometimes even higher than that.

As a result, in math, de-prioritizing talent development leads to major issues. When students do the mathematical equivalent of playing kickball during class, and then are expected to do the mathematical equivalent of a backflip at the end of the year, it’s easy to see how struggle and general negative feelings can arise.

Some Learning-Enhancing Practice Techniques So Replicable They Might As Well Be Laws of Physics

So what do we know about the techniques for optimizing student learning? I mentioned a few above: mastery learning, spaced repetition, retrieval practice, and interleaving. But here’s a more thorough list with descriptions.

All of these learning-enhancing practice strategies have been tested scientifically, numerous times, and are completely replicable. They might as well be laws of physics.

I’ll start with a few obvious findings, but there are plenty of less obvious findings later down the list.

(If they’re obvious, why cover them? Because in education, obvious strategies often aren’t put into practice. For instance, plenty of classes still run on a pure lecture format and don’t review previously learned material unless it’s the day before a test.)

Anyway, here we go:

Here are some less obvious findings.

There’s a lot more detail I want to include, along with hundreds of scientific references, but I’m going to skip it so as not to continue blowing up the length of this already-gigantic post. If you want to take the deep dive, here’s a draft I’m working on that covers all these findings (and more) with hundreds of references and relevant quotes pulled out of those references.

We’ve Known About These Techniques For Decades, So Why Aren’t They Being Taught/Used?

Now, this might seem like a lot of new information – a common reaction is “Wow, the field of education is experiencing a revolution!”

But here’s the thing: Most key findings have been known for many decades.

It’s just that they’re not widely known / circulated outside the niche fields of cognitive science & talent development, not even in seemingly adjacent fields like education.

These findings are not taught in school, and typically not even in credentialing programs for teachers themselves – no wonder they’re unheard of!

But if you just do a literature review on Google Scholar, all the research is right there – and it’s been around for many decades.

So why aren’t these key findings being leveraged in classrooms or even appearing in teacher credentialing / professional development? Why do they remain relatively unknown?

The biggest reason that I’m aware of:

Leveraging them (at all) requires additional effort from both teachers and students.

In some way or another, each strategy increases the intensity of effort required from students and/or instructors, and the extra effort is then converted into an outsized gain in learning.

This theme is so well-documented in the literature that it even has a catchy name: a practice condition that makes the task harder, slowing down the learning process yet improving recall and transfer, is known as a desirable difficulty.

Desirable difficulties make practice more representative of true assessment conditions. Consequently, it is easy for students (and their teachers) to vastly overestimate their knowledge if they do not leverage desirable difficulties during practice, a phenomenon known as the illusion of comprehension.

However, the typical teacher is incentivized to maximize the immediate performance and/or happiness of their students, which biases them against introducing desirable difficulties and incentivizes them to promote illusions of comprehension.

Using desirable difficulties exposes the reality that students didn’t actually learn as much as they (and their teachers) “felt” they did under less effortful conditions. This reality is inconvenient to students and teachers alike; therefore, it is common to simply believe the illusion of learning and avoid activities that might present evidence to the contrary.

These Techniques Connect All The Way Down to Mechanics In The Brain

The whole situation feels very similar to the Copernican Revolution. There are vested interests in keeping things the way they are, but enough evidence is compounding that it’s becoming impossible to ignore.

For instance, in addition to their effectiveness being so replicable that they might as well be laws of physics, the cognitive learning strategies discussed above actually connect all the way down to the mechanics of what’s going on in the brain.

The goal of learning is to increase the quantity, depth, retrievability, and generalizability of concepts and skills in your long-term memory (LTM). At a physical level, that amounts to creating strategic connections between neurons so that the brain can more easily, quickly, accurately, and reliably activate more intricate patterns of neurons. This process is known as consolidation.

Now, here’s the catch: before information can be consolidated into LTM, it has to pass through working memory (WM), which has severely limited capacity. The brain’s working memory capacity (WMC) represents the degree to which it can focus activation on relevant neural patterns and persistently maintain their simultaneous activation, a process known as rehearsal.

Most people can only hold about 7 digits (or more generally 4 chunks of coherently grouped items) simultaneously and only for about 20 seconds. And that assumes they aren’t needing to perform any mental manipulation of those items – if they do, then fewer items can be held due to competition for limited processing resources. (Note that this is an emergent behavior of a more complicated underlying mechanism: the actual WM limitation is not a fixed number of storage units, but rather, the ability to sustain relevant neural activity while suppressing interference from irrelevant activity.)

Limited capacity makes WMC a bottleneck in the transfer of information into LTM. When the cognitive load of a learning task exceeds a student’s WMC, the student experiences cognitive overload and is not able to complete the task. Even if a student does not experience full overload, a heavy load will decrease their performance and slow down their learning in a way that is NOT a desirable difficulty.

Additionally, different students have different WMC, and those with higher WMC are typically going to find it easier to “see the forest for the trees” by learning underlying rules as opposed to memorizing example-specific details. (This is unsurprising given that understanding large-scale patterns requires balancing many concepts simultaneously in WM.)

It’s expected that higher-WMC students will more quickly improve their performance on a learning task over the course of exposure, instruction, and practice on the task. However, once a student learns a task to a sufficient level of performance, the impact of WMC on task performance is diminished because the information processing that’s required to perform the task has been transferred into long-term memory, where it can be recalled by WM without increasing the actual load placed on WM.

So, for each concept or skill you want to teach:

  1. it needs to be introduced after the prerequisites have been learned (so that the prerequisite knowledge can be pulled from long-term memory without taxing WM),
  2. it needs to be broken down into bite-sized pieces small enough that no piece overloads any student’s WM, and
  3. each student needs to be given enough practice to achieve mastery on each piece (and that amount of practice may vary depending on the particular student and the particular learning task).
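The three requirements above can be sketched in code. This is only an illustration under stated assumptions, not Math Academy's actual system: the topic names, the PREREQS graph, the ready_to_learn and is_mastered helpers, and the "three correct answers in a row" mastery rule are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical prerequisite graph: topic -> topics that must be mastered first.
PREREQS: Dict[str, List[str]] = {
    "fractions": [],
    "linear_equations": ["fractions"],
    "quadratics": ["linear_equations"],
    "derivatives": ["quadratics", "linear_equations"],
}


def ready_to_learn(prereqs: Dict[str, List[str]], mastered: Set[str]) -> List[str]:
    """Return unmastered topics whose prerequisites have all been mastered."""
    return sorted(
        topic
        for topic, needs in prereqs.items()
        if topic not in mastered and all(p in mastered for p in needs)
    )


def is_mastered(results: List[bool], streak: int = 3) -> bool:
    """Assumed mastery rule: the last `streak` practice attempts were all correct.

    The amount of practice needed to reach the streak differs per student.
    """
    return len(results) >= streak and all(results[-streak:])


# A new student can only start at the bottom of the hierarchy.
print(ready_to_learn(PREREQS, set()))

# After mastering fractions, the next small piece unlocks.
print(ready_to_learn(PREREQS, {"fractions"}))

# One student reaches the streak quickly; another needs more attempts.
print(is_mastered([True, True, True]))
print(is_mastered([False, True, False, True, True]))
```

The point of the sketch is that a topic is never offered until everything beneath it is already in long-term memory, and that "enough practice" is decided per student by their results rather than by a fixed number of problems.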

But also, even if you do all the above perfectly, you still have to deal with forgetting. The representations in LTM gradually, over time, decay and become harder to retrieve if they are not used, resulting in forgetting.

The solution to forgetting is review – and not just passively re-ingesting information, but actively retrieving it, unassisted, from LTM. Each time you successfully actively retrieve fuzzy information from LTM, you physically refresh and deepen the corresponding neural representation in your brain. But that doesn’t happen if you just passively re-ingest the information through your senses instead of actively retrieving it from LTM.
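To make the effect of spaced retrieval concrete, here is a toy simulation. It is a rough sketch under assumptions that do not come from this post: memory is modeled with a simple exponential forgetting curve, and each successful active retrieval is assumed to multiply the memory's "stability" by a fixed factor. The function names, the starting stability of 2 days, and the boost factor of 2 are all hypothetical; real memory models are more nuanced.

```python
import math
from typing import Iterable


def retrievability(days_since_review: float, stability: float) -> float:
    """Assumed exponential forgetting curve: recall probability after a delay."""
    return math.exp(-days_since_review / stability)


def recall_at(horizon: float, review_days: Iterable[float],
              stability: float = 2.0, boost: float = 2.0) -> float:
    """Recall probability on day `horizon`, given days of successful active retrieval.

    Assumption: every successful retrieval multiplies stability by `boost`
    and resets the forgetting clock. Passive re-reading is not modeled.
    """
    last_review = 0.0  # day 0 is the initial learning session
    for day in sorted(review_days):
        if day > horizon:
            break
        stability *= boost
        last_review = day
    return retrievability(horizon - last_review, stability)


no_review = recall_at(30, [])
spaced = recall_at(30, [1, 4, 10])

print(f"no review:     {no_review:.6f}")
print(f"spaced review: {spaced:.3f}")
```

Under these assumptions, three spaced retrievals leave the memory far more recoverable a month later than learning once and never reviewing. The exact numbers mean nothing outside the toy model; the qualitative gap is what matters.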

“The Future Is Already Here, It’s Just Not Very Evenly Distributed”

William Gibson’s quote applies here. The science of learning has been applied in the classroom, with phenomenal results: 8th graders – who are NOT prodigies, but have talent on par with kids in a typical honors class at a typical school – getting a perfect 5 out of 5 on the AP Calculus BC exam (which is normally taken by honors 12th graders and can count for a year’s worth of calculus credit at most colleges).

This happened in Math Academy’s original school program. It’s the most accelerated math program in the USA, there have been plenty of other news articles written about it, and there’s plenty of other information straight from the horse’s mouth including a summary of events 2014-20 (from Sandy and Jason’s perspective), a summary of events 2019-23 (from my perspective, with a focus on teaching in the school program and getting the algorithms in place to turn it into a fully automated system), and another summary of events 2014-20 that I gave on Anna Stokke’s Chalk and Talk Podcast #42 (I’ll paste the relevant snippet below):

We really leaned into the automated system – it allowed us to leverage these learning-enhancing practice techniques to the fullest extent, selecting individualized learning tasks tailored to each student’s own knowledge profile, so that every student would always be working on the exact tasks that would move the needle most on their learning.

From my summary of events 2019-23:

There’s plenty more to this story – we also ran a quantitative CS course sequence within the original school program from 2020-23, where we scaffolded our high schoolers up from having little to no coding experience to doing masters/PhD-level coursework (reproducing academic research papers in artificial intelligence, building everything from scratch in Python). We called this the “Eurisko” program.

It’s still pretty early on, and as of spring 2025 the very first cohort is still in undergrad (it’s currently their junior year). However, there have already been some amazing student outcomes in terms of college admissions, accelerated graduate degrees, research publications, and science fairs.

Just to name some impressive stats on 4 of the 16 students in the Eurisko program:

We haven’t been systematically tracking this info or sending out alumni surveys or anything, so there’s probably even more interesting stuff going on that we haven’t heard about yet.

But my point is this: when you make it your mission to maximize student learning – including leveraging the learning-enhancing practice techniques that have been known, reproduced, and yet ignored by the education system for decades – you realize that there is a massive amount of human potential being left on the table. Students can be learning way, way, way more than they currently are.

Further Reading

I’ve written extensively on this topic and this post is just the tip of the iceberg. See the working draft of The Math Academy Way for more info and hundreds of scientific citations to back it up.

The citations are from a wide variety of researchers, but there’s one researcher in particular who has published a TON of papers on effective learning techniques, has all (or at least most) of those papers freely available on his personal site, and has a really engaging and “to the point” writing style, so I want to give him a shout-out. His name is Doug Rohrer. You can read his papers here: drohrer.myweb.usf.edu/pubs.htm

Similarly, there are amazing practical guides on retrievalpractice.org that not only describe these learning strategies but also talk about how to leverage them in the classroom. They’re easy reading yet also incredibly informative. Here are some of my favorites:

Another website worth checking out: learningscientists.org

As far as books, check out the following:

Follow-Up Questions

Q: What motivated you to get your teacher credential? I’ve also thought about getting my teacher credential, but decided against it for the same reasons as you’ve outlined. What made you decide to suffer through it?

A: I had to do a teaching credential in order to teach in Math Academy’s original school program, which operated in public schools (Pasadena Unified School District).

I.e., I worked as a public school teacher who exclusively taught Math Academy classes, and the district required all teachers to do credentials.

The credential itself was a complete waste of time, but I was so serious about Math Academy – even back then – that I was willing to put up with the suck and suffer through it (among other things).

In hindsight, I’m actually glad I did it because it gave me more firsthand experience with how bad things have gotten.

Some of these things sound so ridiculous and unbelievable that some people actually don’t believe it’s happening. But I can say “no, it’s real, I spent years right there in the train wreck.”

