What is CEFR?
Common European Framework of Reference (for Languages)
What does “C1” or “B2” really mean?
Todd (ELLLO.org) talks with Chris Haswell of Lost In Citations and shares behind-the-scenes insights into CEFR, AI level ratings, and the hidden flaws of listening tests. The discussion explores why some students score higher than expected, why good distractors are so hard to write, and how teachers can think more clearly about assessment.
Main Points from the Episode
Key ideas and moments from the discussion.
Hear more academic talks about ESL on Lost In Citations.
What this episode is
Lost In Citations introduces Todd (ELLLO.org) and frames the episode as an extended discussion about CEFR (variously pronounced “SEFR” or “CEFA” in the conversation).
What was meant to be a short mini-episode became a longer, practical talk on CEFR, testing, and assessment problems.
Why they recorded it
The host explains that this conversation matches the original purpose of the podcast: real talk between teaching professionals, plus advice for anyone asking, “What about CEFR?”
Setting and context
The discussion is recorded during the New Year break (January 2). Both speakers work in Japan, on the island of Kyushu.
The big question: Is CEFR useful or “hogwash”?
They introduce the controversy: some teachers treat CEFR as essential, while others think it is unreliable.
Todd points out a practical problem: AI-generated CEFR ratings often do not match his own professional judgment when reviewing transcripts.
CEFR is not a precise measuring tool
CEFR is explained as a framework of “can-do” statements, not a perfectly objective scoring system.
You cannot “jump levels” just by showing one advanced feature. Learners must consistently meet the core expectations of a level.
The “rubric roadblock”
They argue that CEFR is so detailed that it can overwhelm teachers and assessors.
It becomes a case of paralysis by analysis: too thorough to be practical in real rating situations.
Why CEFR became so large
The host suggests CEFR reflects its origins as a committee-driven, bureaucratic project, which led to “criteria creep.”
In contrast, profit-driven tests like IELTS are designed for efficiency and standardization across locations.
A defense of CEFR
Even with its flaws, they agree that a shared scale is still necessary.
At lower levels especially, clear “can-do” criteria make ratings easier to explain and justify.
Levels A1 to C2 (and the “native speaker” myth)
Todd outlines the basic scale from A1 (beginner) to C2 (top level).
The host challenges the idea that C2 means “native-level,” arguing many native speakers would not meet top-band criteria without being pushed into formal tasks.
Todd’s main complaint about high-level CEFR
Beyond C1, ratings often shift away from language difficulty and toward ideas, intelligence, and abstraction.
He argues CEFR and AI systems often ignore density and speed: many B2-level features packed tightly together can create C1-level difficulty.
What advanced proficiency really means
The host summarizes high-level performance with two ideas:
- Precision – accurate word choice and structure
- Flexibility – control under time pressure
Why top scores are rare
The host suggests that C2 or IELTS 9 requires not only language skill, but also strong cognitive control and task management.
Todd’s test story
Todd describes scoring very high in writing on the GRE, average in math, and low in logic/reading.
This shows how test design and motivation can distort results.
Four questions every test must answer
- Why are we testing?
- What are we testing?
- How are we testing?
- How are we grading?
If these are unclear, test scores become meaningless.
Why listening tests fail
Todd identifies four factors that distort listening scores:
- Language level
- Processing speed
- Memory
- Attention
A “pure” listening test
The host proposes listening-only, task-based instructions with minimal reading to avoid interference.
Why results look strange
Different learners catch different details.
This explains why teachers sometimes see weak students score higher than strong ones.
The real challenge: good questions
The hardest part of testing is not recording audio, but writing good questions and realistic distractors.
Multiple-choice problems
Bad distractors allow students to succeed without listening.
Good distractors should be:
- Very short
- Equal in length
- Equally plausible
- Aligned with the audio timeline
Memory limits
Asking learners to recall early details at the end of a long listening task is often unrealistic and unfair.
Teaching materials have the same problem
If content is not properly simplified, teachers may unknowingly push difficulty onto students.
Todd notes that only about half of submitted videos truly meet A1 standards.
Wrap-up
The interview ends early due to technical issues, and listeners are directed to ELLLO.org and future episodes.
Meet the Teacher
My name is Todd Beuckens and I am an ESL teacher in Japan. I created this site to provide teachers and students free audio lessons and learning materials not usually found in commercial textbooks.
About Lost in Citations
The academic podcast Lost in Citations is hosted by Chris Haswell and Jonathan Schacher. It is aimed at teachers and academics, but the episodes below may also be of interest to ESL teachers and advanced students.
Dr. Chris Haswell
A global model of English
Dr. Marc Helgesen
English teaching and the science of happiness
Dr. Aya Matsuda
English as an international language: A curriculum blueprint.
Dr. Rob Waring
Teaching extensive reading in another language.
Joshua Wedlock
Teaching about Taboo Language in EFL/ESL Classes: A Starting Point.
Dr. Geoff Jordan
We need to talk about course books.
Jonathan Schacher
Teaching reading skills in “surround sound.”
Todd Beuckens
Tech tools for teachers, students, and researchers.
