Principle 1: Understanding human cognitive structure means building long-term memory and managing working memory - Five themes: Mental models and purposeful execution

Teach like a champion 3.0: 63 techniques that put students on the path to college - Lemov Doug 2021

Here is a simple model of the structure of human cognition as provided by Daniel Willingham in his outstanding book Why Don't Students Like School?

Schematic illustration of a simple model of the structure of human cognition.

Among the things it points out is the fact that working memory is the means through which we consciously interact with the world. Any thinking we're aware of doing, such as critical thinking, occurs here.

The power of working memory is prodigious. It has allowed humankind to discover penicillin, create the musical Hamilton, and conceptualize string theory. But beyond its immense power, the most dominant characteristic of working memory is its tiny capacity. We struggle to hold more than one or perhaps two ideas there at a time. Here's a way to test the limits of your own working memory. Reread the first two sentences of this paragraph. Then close the book and try to copy out the sentences verbatim on a piece of paper. You will likely struggle to remember even those two simple sentences. That's you coming up against the limits of your working memory. You just can't hold much information there at any given time. A version of this problem, cognition being constrained by the limits on working memory, occurs over and over for learners. If we try to keep too much information in working memory, we will fail to remember it.

If we persist in overloading working memory, we force ourselves to choose among the things we are trying to work on. If you are driving and also trying to use working memory for another task (say, having a conversation on the phone with your partner about things to pick up at the store), you are suddenly several times more likely to have an accident if you make a left turn across traffic. This has nothing to do with whether your phone use is “hands free.” The problem is not free hands but free working memory. A heavy load on working memory degrades your perception, and you are less able to judge the approach of other vehicles. You perceive less from the environment when your working memory is taxed and more when it is free. This by itself has profound implications for teaching, one of which I'll discuss in Chapter Two: the idea that effective preparation is designed to allow you to teach with less load on working memory. Conversely, if you haven't prepared well, your working memory will be hard at work trying to remember what comes next in the lesson and you will be less likely to see what's happening in the classroom accurately.

A well-developed long-term memory is the solution to the limitations of working memory. If a skill, a concept, a piece of knowledge, or a body of knowledge is encoded in long-term memory, your brain can use it without degrading other functions that also rely on working memory. And long-term memory is almost unlimited. If our knowledge is encoded well and we are able to retrieve it, we can draw on it to inform our thinking and make connections. The scourge of the new-age educator, facts, mere facts, lots of them, encoded carefully in long-term memory and easily recalled through practice, are in fact the foundation of higher forms of cognition. You begin to think consciously about something in working memory—a scene in a novel you are reading, say—and suddenly the connections from your long-term memory start to pour in. It's like another book you read; it's an example of a sociological theory; what you are reading is not historically accurate. These forms of critical thinking are relying on knowledge encoded in long-term memory. As Willingham writes, “Data from the last 30 years lead to a conclusion that is not scientifically challengeable: Thinking well requires knowing facts… . The very processes that teachers care about most—critical thinking processes like reasoning and problem solving—are intimately intertwined with factual knowledge that is in long-term memory.”5 “Much of the time when we see someone apparently engaged in logical thinking, he or she is actually engaged in memory retrieval,” Willingham continues.

This notion should inform every teacher's mental model. First, critical thinking and problem solving are not the opposite of factual knowledge. They rely on it. This is important to note because a great many educators are scornful of facts. Why teach them, the argument goes, when you can Google anything? We should teach critical thinking instead. The answer to that rhetorical question, as Willingham tells us, is that you can't teach critical thinking without facts. Problem solving is “domain specific”; for the most part you can have deep thoughts only about things you know something about.

In a recent workshop with school leaders I tried to extend Willingham's diagram to capture a bit more about what it tells us. I came up with something like this:

Schematic illustration of a slightly more detailed model of Willingham's diagram.

In my version I've tried to make working memory very small to remind you that its capacity is limited. But long-term memory is large. The dotted line suggests that as far as cognitive scientists know, it is all but unlimited. Not only does having more knowledge in long-term memory not make it harder to add something new there, it may make it easier. The more you know, the more connections you can perceive to new knowledge; this makes it easier to remember more of that new knowledge and gives you more connections to help you recall it. An expression among cognitive scientists is “neurons that fire together wire together.” If we think about things at the same time, recalling them will also happen in concert and, in an ideal case, remembering something from long-term memory will enhance recall of related concepts and ideas. The antidote to the argument that memory is merely isolated facts is, in part, to organize our memories so that knowledge is connected to other facts, insights, and observations. This is how initially isolated facts become something broader that we call knowledge. Remembering something requires both successful storage and successful retrieval, however, and the speed and ease with which you can find a piece of knowledge is the critical factor in your ability to use it. So, again, organized memories with many connections among many pieces of information also offer more ways to successfully recall the knowledge they contain.

I also added to my model the idea that perception is complex, because one of the things that working memory does most effectively, helping us to perceive the outside world, is much more complex and fallible than we think. Broadly, if working memory is overloaded, students will both perceive and remember less. The solution is to have knowledge encoded in long-term memory. Once information is stored there it can be used with very little load on working memory.

Of course, if working memory is underloaded there are also poor outcomes—boredom and reduced learning, for starters, but also lack of attention. The mind finds other things to do. So it's critical to attend to and manage the amount of new information young brains work with. We want them constantly engaged and interested but not overloaded with more than they can manage. The science behind this is known as “Cognitive Load Theory.” It's among the most important things for educators to know about. Sweller, Kirschner, and Clark, who are its foremost researchers, define learning as a change in long-term memory and observe that “The aim of all instruction is to alter long-term memory. If nothing has changed in long-term memory, nothing has been learned.”6 That's why forgetting is so important to think about. You'll find this concept in several of the new techniques in this book.

A last critical note on managing working memory: Sweller's guidance fading effect argues that experts and novices learn differently. Problem-solving environments where learners are tasked with inferring solutions rather than being provided with guided instruction work well for experts because they perceive these environments accurately and can quickly connect what they see to their vast background knowledge. For novice learners this does not occur. They are likely to perceive incorrectly or attend to low-value phenomena or use up their scarce working memory searching for the right information. With little knowledge on the topic in their long-term memory, they make far fewer connections. For novices, carefully guided instruction is far more effective. However, too few educators are aware of this distinction. They tend to presume what works for experts is therefore best for everybody. If it's how elite mathematicians learn, well then, let's give it to everyone. But in fact the guidance fading effect tells us that this is a mistake. “Students should initially be given lots of explicit guidance to reduce their working memory load, which assists in transferring knowledge to LTM,” Sweller writes. “Once students are more knowledgeable, that guidance is unnecessary and interferes with further development of expertise and should be faded out and replaced by problem solving.” Students in a K-12 setting are usually novices, although this definition is fluid. You can be an expert on Macbeth but a novice as soon as you start reading Hamlet. Or vice versa. Technique 21, Take the Steps, in particular discusses several issues raised by the interactions of working and long-term memory (“the curse of expertise” and the necessity of parsing out new information in steps, with practice interspersed, to address the capacity issues of working memory), but the theme will play a role throughout the book. For example, you'll want to use retrieval practice frequently to install knowledge in long-term memory and use Cold Call to ensure that everyone is getting the practice. You'll also want to ask students to write before discussions to reduce the strain on working memory of having to remember what they wanted to say, leaving them free to listen to one another's comments.

A final point about the importance of long-term memory comes from a glimpse at what's known as the forgetting curve, which demonstrates the rate at which the typical person forgets things they've learned.

Graph depicts the percentage of content remembered by time since learning.

The original forgetting curve was derived in the 1880s by the German psychologist Hermann Ebbinghaus and plotted the actual rate at which he was able to remember a series of nonsense syllables after learning them. Though your students aren't learning nonsense syllables, the rate at which they forget what they have learned after they learned it is captured here, and the principle is broadly accepted by cognitive psychologists. The forgetting curve tells us that:

· As soon as you learn something, you begin forgetting it almost immediately.

· The rate of forgetting is often shockingly high; a few hours after learning something, people routinely remember only a small fraction of it.

· Each time you practice recalling what you know, the rate and amount of forgetting are reduced somewhat.

· Retrieving something back into working memory slows the rate of forgetting, but how and when the retrieval happens is important. (I discuss the details of retrieval more in technique 7, Retrieval Practice.)

That's immensely useful information, but forgetting curves can't tell us everything. They cannot tell you exactly what the rate of retention will be for your students generally or for a specific student at time A or time B for a specific topic you've taught. There are individual differences and factors in the learning environment, like how much attention students are paying and how new the information was to them, so the curve in most cases is theoretical. But the theme is clear: We forget quickly and decisively as soon as we stop thinking about something, and this process is always at work. Left unchecked, its effects are massive.

One way this is especially relevant is that what students appear to be able to remember at the end of a lesson does not represent what they really know, because the knowledge is not yet in long-term memory and forgetting begins when teaching stops. Students will begin to forget the moment they walk away from the class. Yes, use Exit Tickets to assess at the end of class. But know also that, barring further review, this technique will give you a false signal.7 You will assume your students know how to add fractions with unlike denominators, but the test given the next week or at the end of the year will measure original learning minus subsequent forgetting and you likely won't see what you had hoped to see. Managing forgetting is as important as managing learning (but isn't as visible).

This is especially relevant because only knowledge in long-term memory can be used without reducing working memory available for other tasks or eroding perception. If you ask a higher-order question, such as “Can you find another way to solve this problem?” then the answer is likely to be no if working memory is required in service of the calculations. If you want higher-order thinking or greater perception from students, help them to free their working memory at the moment you want them engaging with those tasks by making the skills they're using in the moment more fluid. This is why reading fluency and automaticity with math facts are critical—they are necessary because we don't want students thinking about these things at crucial moments, and fluency is the only way around the problem of working memory. You cannot perceive the author's tone if your working memory must be engaged to parse the syntax of the passage you are reading. When the foundational skills are not fully automatic it is very difficult to have profound or insightful thoughts during reading; bright and eager children can thus fail to have much to say about a passage they have read because their working memory was spent figuring out the words. Background knowledge is similar. You cannot make a leap to connect the prime minister’s attitude to his predecessor a century before unless that knowledge is in your long-term memory. “Looking it up on Google” actually requires your working memory.8

So what's the ideal number of interactions required with content if we want to encode it in long-term memory? Research suggests three or four but with many caveats and a lot of unknowns. In The Hidden Lives of Learners, Graham Nuthall, for example, finds that whether students have had three interactions with material determines with 80 percent accuracy whether they will have learned it. That is, when he and his colleagues sorted through the things that were taught during a given class and determined whether students had encountered it and attended to it—either through the teacher's instruction or through some other interaction (with peers say)—they could predict with 80 percent accuracy whether students had learned the material. So predictive was this method that Nuthall hypothesizes it is at least possible that “other factors (such as the use of open-ended questions, feedback, advance organizers, relevant examples and analogies and the interest level of the material) … may not be relevant to student learning except to the extent that they enhance the likelihood that students will encounter [and attend to] relevant content.”9

But of course even if Nuthall's research were conclusive, the complexity and familiarity of content, never mind the quality of the presentation of the material and the attentiveness of students, could alter this number. So would the degree of fluency the content requires. “Remembering it” can mean different things. With some knowledge it’s fine if I need a few seconds to pull it out of long-term memory. There's no rush. But some things I need in the blink of an eye, and those, we can presume, require more iterations to ensure ease and speed of recall. Further, the idea that if you don't hear it all three times you won't learn it becomes especially important in light of research on the constancy of low-level distractions in many classrooms.

So how should this principle inform teaching decisions? You'll want to keep working memory free for students, so roll out new content in manageable chunks and be sure to constantly embed short sequences of practice and retrieval. Cold Call is a great tool for making everyone do the work of retrieval, even those whom you don't call on. You can also use Everybody Writes and other forms of writing to cause student thinking to be more durably encoded in memory. Remember that thinking hard about things encodes them in memory. A good adage to remember is that students remember what they think about, so get the ratio high and build habits of attention and focus. Constantly have an eye to building knowledge (knowledge organizers can be helpful) and reinforce reading fluency with FASE Reading.

But also don't forget your own working memory. Chapter Two will help you use lesson preparation to free it for perceiving what's happening with students while you teach. When gathering data on student mastery, remember that the data can quickly overload your working memory, so track it via Active Observation.

Online Lessons

Managing the limitations of working memory is one of the core challenges of teaching in any setting. Online, its challenges are magnified since we are always competing against potential distractions and because attention is fractured online. So while one of the silver linings of online instruction was how easy it was to gather data by, say, asking students to respond in the chat, the challenge was at times that this yielded too much data. The “velocity of data” was often too much for working memory: thirty student responses scrolling upward across the screen is more than teachers or students can process, and the result was at times everyone chatting and no one able to read or attend to the comments with adequate attentiveness. The video Harley and Clayton: Slowing the Data shows Rachel Harley and Hasan Clayton, two teachers at Nashville Classical Charter School, providing an elegant solution. They ask students to chat their responses only to them, not to “everyone,” then curate a few exceptional answers from the chat stream and post them where the class can read and reflect on them with more deliberate focus. There's really no reason teachers couldn't curate a set of interesting examples from students and present them to guide and inform discussion in a similar manner in brick-and-mortar classrooms as well.