What we got wrong about formative assessment
AfL was a brilliant idea that became a procedural mess. The original research is worth re-reading.
Published 2026-12-06
If you started teaching in the UK any time after 2000, you learned about Assessment for Learning. Probably as a checklist. Learning objectives written on the board. Success criteria displayed and recapped. Children using traffic-light cards or thumbs to indicate confidence. Peer marking with two stars and a wish. Plenaries that referred back to the success criteria.
These are the visible artefacts of formative assessment as it was implemented in primary schools. They are also, in most classrooms, theatre. The procedures get done. Whether they produce learning is another question.
The original research that AfL came from — Paul Black and Dylan Wiliam's 1998 paper *Inside the Black Box* — said something more substantial than the checklist that emerged. It's worth going back to.
What Black and Wiliam actually argued
Black and Wiliam reviewed about 250 studies on classroom assessment. Their conclusion was that formative assessment — checking, in real time, what children understand and adjusting teaching accordingly — produced some of the largest learning gains in education research.
The mechanism wasn't traffic lights or peer marking. It was simpler and harder. Teachers needed to:
1. Find out what children currently understand.
2. Identify the gap between current understanding and the learning goal.
3. Adjust their teaching to close that gap.
This is hard cognitive work. It requires teachers to have a deep model of what 'understanding fractions' looks like at each stage, recognise where each child currently is, and pivot in the moment. It cannot be reduced to a procedure.
But schools needed something they could roll out, train teachers in, and evidence in inspections. So the rich practice of formative assessment got compressed into a set of visible procedures: learning objectives, success criteria, mini-plenaries, traffic lights. The procedures could be ticked off. The cognitive work was harder to mandate.
Where it went wrong
Several things went wrong with the implementation.
**Learning objectives became performative.** A teacher writes 'WALT (We Are Learning To): identify subordinate clauses' on the board at the start of the lesson and recaps it at the end. The procedure is followed. But the LEARNING objective and the LESSON objective are different things. A learning objective says what children should be able to do at some point in their development. A lesson objective is what happens in these 50 minutes. Conflating the two produced lessons where children chanted 'I can identify subordinate clauses' before they could actually do it, then chanted it again at the end whether or not they could.
**Success criteria became a control mechanism.** A useful success criterion clarifies what good work looks like — 'a successful piece will use at least three of these five sentence types.' A useless one is a checklist of features the teacher wanted to see, presented to children in advance, which then constrains rather than guides. Children fill the checklist. The work doesn't get better; it gets more compliant.
**Traffic lights became performance.** Children showed green when they wanted to look confident, regardless of whether they understood. The genuine information — who is actually struggling — was buried under social compliance. A teacher who relied on traffic-light data without checking understanding got a misleadingly rosy picture.
**Peer marking became superficial.** Children with no expertise in a topic wrote 'two stars and a wish' on each other's work, often repeating the same comments every week ('use more adjectives'). The peer feedback didn't improve the work; it just took up time.
**Plenaries became a ritual.** 'Did we meet our learning objective?' Children agree. The lesson ends. Whether anyone actually learned anything is left ambiguous. The plenary became a way of WRAPPING UP the lesson rather than actually finding out what got learned.
None of these things were what Black and Wiliam suggested. They were the procedural shadow of the actual ideas.
What the actual ideas were
Read the original research and a different picture emerges. Black and Wiliam emphasised:
**Questioning quality matters more than questioning frequency.** A teacher who asks one good question, such as 'Why do you think the author chose this word here, rather than another similar word?', and waits for several children to think and respond produces more learning than a teacher who fires off twenty closed questions. The discussion that follows a good question is where the thinking happens. Quick-fire IRF (Initiation-Response-Feedback) exchanges, in which the teacher asks, one child answers and the teacher evaluates, are an enemy of formative assessment, not its friend.
**Wait time is critical.** When teachers ask a question and wait an average of 0.9 seconds for a response (the typical figure), only the fastest hand-raisers participate. When teachers wait 3-5 seconds, more children think, the answers get longer and more sophisticated, and a wider range of children contribute. Wait time is one of the most powerful single levers in formative assessment, and it costs nothing to introduce.
**Feedback should be specific and forward-looking.** Generic comments ('good work') and grades ('B+') don't change behaviour. Comments that tell a child SPECIFICALLY what they did well and what they could do next, with no grade attached, do. The presence of a grade actively undermines the comment: children look at the grade, not the comment. The studies Black and Wiliam reviewed found this again and again. Yet most schools' feedback policies still produce graded work alongside comments. The comments are wasted.
**Assessment information should change subsequent teaching.** This is the heart of it. If you find out children misunderstand fractions, the next lesson should address that. Not the lesson three weeks later. Not the next half term's review. The information should pivot teaching in real time. This requires the teacher to know enough about the topic to teach a different lesson on the spot — which is harder than following the lesson plan.
Why the procedures replaced the ideas
The ideas in the previous section ask a lot of teachers. They require:
- Teachers with deep subject knowledge.
- Permission to deviate from planned lessons.
- Comfort with not finishing what was planned.
- Confidence to ask hard questions and tolerate slow responses.
- Willingness to give comment-only feedback even when parents want grades.
These are demanding conditions. They take years to develop. They are hard to mandate or check.
By contrast, the procedures — write the LO on the board, get the traffic lights out, do peer marking — are easy to mandate, easy to check, and easy to evidence. So that's what schools rolled out. The shadow replaced the substance.
Twenty years on, most primary schools are doing the procedures, and most of the improvement that Black and Wiliam expected formative assessment to deliver hasn't materialised.
What to do about it
You probably can't unilaterally change your school's procedures. They're embedded in policies, observation forms, and parents' evening expectations. But you can practise the underlying ideas inside them.
**Increase wait time deliberately.** Ask a question, count to five silently, then take answers. The first time you do this, the silence will feel agonising. Children will start thinking. After a few weeks, the quality of answers transforms.
**Ask harder questions.** Replace 'what's a synonym for "happy"?' with 'why might an author choose "joyful" rather than "cheerful" — what's different about them?' The second question forces actual thinking. The first one tests memory.
**Find out what children don't yet understand, before deciding what to teach.** Even a 60-second exit ticket — 'write down one question from today's lesson you're not sure about' — can change tomorrow's teaching dramatically. Address the actual confusions, not the planned content.
**Give specific, forward-looking feedback. Drop the grade where you can.** A piece of writing with the comment 'your second paragraph drops the present-tense verbs you used in the first — bring them back, and the piece will hold together' is more useful than the same piece with a level descriptor at the bottom. The school may require the level. You can still focus the comment.
**Pivot teaching in real time when you can.** If you can see, mid-lesson, that the class doesn't understand the prerequisite, abandon the planned activity and reteach. Most schools say they want this but have built systems that punish it. Do it anyway, where you can.
The takeaway
Formative assessment is a brilliant idea that became a procedural mess. The procedures aren't the practice. The practice is the harder, slower work of finding out what children understand and changing your teaching accordingly. None of that is visible on a learning-objective board.
If you've ever felt that AfL added work without adding teaching, you weren't wrong. In most schools, the procedures are heavy and the substance is light. The fix isn't to add more procedures. It's to focus on the few moves that actually change things (wait time, questioning quality, comment-only feedback, in-the-moment pivots) and quietly let the rest become as light as you can get away with.
That's closer to what the original research said. It's worth going back to.
Going deeper
On formative assessment: going back to the source
Books we'd recommend on the topics raised in this article.
The original research
- Embedded Formative Assessment, Dylan Wiliam. Wiliam's later book setting out what AfL was actually meant to be.
- Embedding Formative Assessment: Practical Techniques for K-12 Classrooms, Dylan Wiliam and Siobhán Leahy
- Inside the Black Box: Raising Standards Through Classroom Assessment, Paul Black and Dylan Wiliam
- Making Good Progress?: The Future of Assessment for Learning, Daisy Christodoulou