High-stakes tests offer a thin veneer of validity, yet we blindly trust them

A retired education professor says tests offer an easy, if unreliable and lazy, way to judge schools. He says tests are easy to administer (Kids, go to this room and sharpen your pencils), easy to score (Warm up that machine), and easy to compare (This school produces better test scores, so it’s a better school). (Dreamstime/TNS)

Credit: TNS

I recently read that a new wave of researchers has found early human settlements in Senegal, Cameroon and Malawi, contrary to what had been viewed as the settled science of anthropology. According to the article, no evidence of early human settlement in West and Central Africa had been found earlier because few people had looked there.

Researchers tended to focus on low-hanging fruit: areas of the continent where fieldwork was less arduous, rather than hot and humid West and Central Africa. That region’s climate degrades bones and DNA more quickly, and it is thickly forested and the site of long-standing conflicts.

By looking for answers a bit higher up the fruit tree, scientists found a different answer to the question of where human life originated: “Modern humans had lived at a site on the coast of Senegal 150,000 years ago” rather than 30,000 years ago, as previously assumed.

I’m reminded of the adage about looking for the keys where the light is best, known as the streetlight effect or the drunkard’s search.


Speaking of looking for the keys where the light is best, the easiest way to measure educational success is through test scores. Every kid gets tested, and the tests are presumed to be neutral and thus reliable, valid measures of achievement (or progress, or aptitude, or proficiency, or intelligence, or whatever the tests are claimed to measure). They are easy to administer (Kids, go to this room and sharpen your pencils), easy to score (Warm up that machine), and easy to compare (This school produces better test scores, so it’s a better school).

Testing thus takes on a veneer of validity, as if a single test tells us all we need to know about education. That presumed validity holds even when people don’t know how to read test data, as with the National Assessment of Educational Progress’ use of terms like “proficient” and the assumption that if everyone’s not above average, we have an educational crisis. Yet as a set of reading researchers described in a recent article:

“Proficiency levels created a custom-made crisis. Using the 2019 NAEP reading scores, a typical argument goes something like this: ‘Only 34% of fourth-grade students nationally scored at or above the proficient level in reading.’ That sounds alarming, suggesting that only about a third of readers are proficient. Some might even interpret this to mean that two thirds of students are hardly reading at all. But, if ‘basic’ means something closer to ‘average,’ which it does, and readers in that group are combined with ‘proficient’ or above … approximately two thirds of all fourth-grade students are reading at or near grade level, with slight increases over the year.”

The authors of this report are well-versed in reading test data, unlike those who quote the superficial statistics without understanding them. Rather than just looking under the streetlight, they expanded their search to a darker stretch of the street corner.

There are other ways of evaluating schools, but they take more time and care than simply waiting for a testing agency to post its latest results. In a study I will publish later this year, my co-author Stacia Long and I followed a teacher through nearly a decade of teaching, including her work in schools where everything was data driven. She found the weekly (or more frequent) data meetings useless, both in themselves and in the time they took away from doing her job. The only productive moments in these meetings came when teachers went off-task and talked about their teaching with their colleagues.

But who has time to listen to teachers and their insights and concerns?

What about listening to students? Well, that can be time-consuming. But it can be illuminating. Here’s what happened in Hall County when the school administration looked beyond test scores and into the quality of students’ experiences:

“As part of its mental health initiative, (the) school asks students what they think, what they have experienced and what they need. In one setting, students were asked to complete the following sentence: ‘If my teacher really knew me, they would know ... .’ Answers included: ‘They would know how much potential I have if someone just gave me the chance. I could help guide others in the right direction and also myself, but it’s hard when you lack hope.’ Another wrote, ‘They’d know that I’ve been verbally abused all my life and treating me unfairly makes me feel like I can’t do anything.’”

Not only are the school leaders looking for different data, but they are also asking very different sorts of questions, the sort that can lead to a dramatic shift in how the school is organized and how it functions. The light might be murkier, and the time commitment might be greater, but it seems more worthwhile to answer important questions with good information than to use superficial information to address hard questions.

Peter Smagorinsky is a retired professor in the University of Georgia’s College of Education and a 2024 inductee in the Reading Hall of Fame, which recognizes outstanding research and scholarship in literacy spanning at least 25 years.