What Do Increased Test Scores Mean? Perhaps Nothing

by E. Wayne Ross

Signs of improvement in Kentucky schools are being widely reported. The number of Kentuckians with a high school diploma is up 10 percent over the past decade; more of the commonwealth’s students are taking the ACT college entrance test, with scores up over last year; and students’ scores on the Comprehensive Test of Basic Skills (CTBS), a widely used standardized test, have improved slightly.

The consensus among the education establishment and the media is that recent reports show slow, steady progress in educational attainment in the state, but that much work remains to be done. For example, even though Kentucky has had the fastest growth in high school graduates over the past 10 years, the state still ranks 49th in the US; the state’s 20.2 ACT composite score lags behind the national average; and CTBS scores place Kentucky third-, sixth-, and eighth-graders at the 63rd, 54th, and 52nd percentiles nationally.

Our judgments of educational improvement (or lack thereof) are more often than not the result of how we interpret numbers like those reported above. These days test scores, in particular, are the coin of the realm in education. And while increasing test scores are not a bad thing, they do not necessarily mean that education is improving.

Take the latest reports on the CTBS, for example. Every media report on Kentucky’s scores that I saw stated, “the national average [on the CTBS] is 50.” This is a subtle but crucial inaccuracy. The CTBS is a “norm-referenced” test, or NRT, which means it compares a student’s score against the scores of a group of students who have already taken the same exam, in an effort to rank-order test-takers. These tests do not compare all the students who take a test in a given year. Test-makers select a sample from the target population (say, third-graders) and the test is “normed” on this sample, which is supposed to represent all third-graders in the nation. Students’ scores are then reported in rank order in relation to the scores of the norming group.

To make comparisons easier, NRTs are created so that most students will score near the middle and only a few will score low or high (the graphed scores form a bell-shaped curve), with the “average” student at the 50th percentile, which means that this student scored higher than 50% of the test-takers in the norming group. In making NRTs, it is often more important to choose questions that sort people along the curve than it is to make sure that the content covered by the test is aligned with what is taught in schools. As a result, these tests sometimes emphasize small, meaningless differences among test-takers. In some cases, getting one more question right (or wrong) can cause a student’s score to jump (or drop) more than ten points.
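The arithmetic behind a percentile rank can be sketched in a few lines of Python. Everything in this sketch is an assumption made for illustration: the 20-question test length, the bell-shaped norming group built from binomial counts, and the function name percentile_rank. None of it comes from the CTBS or its publisher.

```python
from math import comb

NUM_QUESTIONS = 20  # hypothetical test length, not the CTBS

# Hypothetical norming group: how many students earned each raw score.
# Binomial counts are used only because they pile up in the middle,
# like the bell-shaped curve described above.
norm_counts = {raw: comb(NUM_QUESTIONS, raw) for raw in range(NUM_QUESTIONS + 1)}
norm_total = sum(norm_counts.values())


def percentile_rank(raw_score):
    """Percent of the norming group that scored strictly below raw_score."""
    below = sum(count for raw, count in norm_counts.items() if raw < raw_score)
    return 100.0 * below / norm_total


# Show how the percentile rank moves as the raw score changes by one question.
for raw in (9, 10, 11, 12):
    print(raw, "questions right ->", round(percentile_rank(raw), 1))
```

In this made-up norming group, a student who answers 11 questions correctly instead of 10 moves up by more than ten percentile points, because so many of the norming students are bunched in the middle of the curve.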

What do increased scores on the CTBS mean, then? They may mean that students know more, or they may not. NRTs ask only a small sample of the thousands of questions that could be asked, so test scores are an estimate at best. No test is perfectly reliable, so a score such as the 63rd percentile (the total CTBS score for Kentucky third-graders this year) is only an estimate; the “true score” for these students is somewhere between the 56th and the 70th percentile (or even further off). Sub-scores on NRTs (e.g., scores for sections on math, language, and reading) are even less precise.

Many mistakes can be made by relying on standardized test scores to make educational and policy decisions, and every major test-maker warns schools not to use NRTs for making decisions about retention, graduation, or placement. Any one test can measure only a limited part of a subject area. Most NRTs are heavily focused on memorization and routine procedures, which often leads teachers to overemphasize memorization and de-emphasize thinking and the application of knowledge. As a result, the curriculum is narrowed and students are deprived of a high-quality, challenging education. NRTs also support the idea that learning (or intelligence) fits a bell curve. If educators adopt this belief, they are likely to have low expectations for students who score “below average.”

The bottom line is that scores from the CTBS (or any norm-referenced test) should not be used to make judgments about school improvement. There are often calls for all students’ scores to be above the national average. This is not possible: the CTBS is constructed so that half the population is below the mid-point or “average” score. Expecting all students to be above the 50th percentile is like expecting all teams in a football league to win more than half their games. The same tests are used year after year, and because schools teach to the test, there are times when far more than half the students score “above average” relative to the original norming group, creating an illusion of increased achievement.
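How teaching to a reused test can push most students “above average” is easy to see with a small Python sketch. The numbers here are all assumptions chosen for illustration, not CTBS data: a 20-question test, a bell-shaped norming group, and a later cohort in which every student gets two more questions right simply from familiarity with the test.

```python
from math import comb

NUM_QUESTIONS = 20  # hypothetical test length, not the CTBS
NORM_MIDPOINT = 10  # middle raw score of the symmetric norming group below

# Hypothetical norming group: number of students at each raw score.
norm_counts = {raw: comb(NUM_QUESTIONS, raw) for raw in range(NUM_QUESTIONS + 1)}


def share_above(counts, cutoff):
    """Fraction of a group whose raw score is strictly above the cutoff."""
    total = sum(counts.values())
    above = sum(n for raw, n in counts.items() if raw > cutoff)
    return above / total


def shift_scores(counts, gain):
    """Give every student `gain` extra correct answers, capped at a perfect score."""
    shifted = {}
    for raw, n in counts.items():
        new_raw = min(raw + gain, NUM_QUESTIONS)
        shifted[new_raw] = shifted.get(new_raw, 0) + n
    return shifted


# Assumed effect of test familiarity: two more questions right per student.
later_cohort = shift_scores(norm_counts, gain=2)

print("norming group above midpoint:", round(share_above(norm_counts, NORM_MIDPOINT), 3))
print("later cohort above midpoint: ", round(share_above(later_cohort, NORM_MIDPOINT), 3))
```

Under these assumptions, fewer than half of the norming group sits above the mid-point, yet roughly three-quarters of the later cohort does, even though the only thing that changed was familiarity with the same test.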

There is no doubt that there is much work to be done to make Kentucky’s schools the best they can be. An important part of that effort should include an increased understanding of what test scores mean, and the dangers of using norm-referenced tests in particular. The media, educators, policymakers, and politicians, in particular, bear important responsibilities for understanding and accurately communicating the meaning of test scores and other measures of educational performance. To do any less is to fail in our efforts to serve the public interest and support real educational improvement.
