
Too much testing? Finding the signal amongst the noise

Charles Darr and Hilary Ferral
Abstract: 

Measuring the underlying trend in an individual's progress is always difficult, and the presence of measurement error makes it harder still. Good tests indicate the measurement error associated with their scores. This article describes how that error can be understood and taken into account.


Investors who continuously monitor the value of their stock portfolios are needlessly increasing their stress levels, according to Nassim Taleb (2004). He argues that because trends take time to emerge, random fluctuations can lead to short-term losses (and therefore feelings of disappointment), even when the investment is fundamentally sound and bound to pay off over time. Taleb notes that human beings are far more affected by disappointment than by success, and suggests that such investors could avoid this kind of emotional roller coaster if they limited themselves to monthly, or perhaps even yearly, statements.

Taleb calls these random fluctuations “noise”. Irrelevant to the real value of the portfolio, the noise clouds the trend, or what he calls the “signal” that is slowly emerging.

When we attempt to measure an individual’s progress in an educational setting we have a very similar problem. Gains take time to establish themselves and the random error involved in assessing progress can mean an apparent gain (or reversal), particularly over a short time interval, is nothing more than random noise.

Test scores are sometimes understood as being composed of two parts: a true component (the signal) and an error component (the noise). The error component represents the part of the score that cannot be relied on. It arises from the random things that happen in any test situation to inflate or deflate a student's score. We can never tell precisely how much error is involved in a test score, but we can acknowledge its presence by representing a score as a range rather than a precise point: for instance, 34 plus or minus 4. This is usually read to mean that we are reasonably certain the student's true score is somewhere in the range 30 to 38. In educational testing, the plus or minus 4 is sometimes referred to as the standard error of measurement (SEM).

Figure 1 below shows the effect of measurement error on a student’s score. This student has scored 25 and the SEM for the test is about 3. The figure shows that we are about 70 percent certain that the student’s true score is somewhere between 22 and 28 and there is a 95 percent chance that it is between 19 and 31.

[Figure 1. The effect of measurement error on a student's score of 25, with an SEM of about 3: roughly 70 percent of observed scores fall between 22 and 28, and 95 percent between 19 and 31]
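For readers who like to tinker, the arithmetic behind these ranges is simple enough to sketch in a few lines of Python. The snippet below is our own illustration, not part of any test's documentation; it assumes the measurement error is roughly normally distributed, so that plus or minus one SEM covers about 68 percent of observed scores and plus or minus two SEMs about 95 percent.

```python
# A minimal sketch of how a score and its SEM translate into the
# ranges shown in Figure 1, assuming normally distributed error.

def score_ranges(score: float, sem: float) -> dict:
    """Return approximate 68% and 95% ranges for a test score."""
    return {
        "68%": (score - sem, score + sem),
        "95%": (score - 2 * sem, score + 2 * sem),
    }

# The student in Figure 1: a score of 25 with an SEM of 3.
print(score_ranges(25, 3))
# {'68%': (22, 28), '95%': (19, 31)}
```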

The presence of measurement error can make it very difficult to work out how much progress an individual student has made since the last time they were tested.

Let us say that we have a fictitious test made up of 50 questions that can be repeated as often as we want without the students becoming aware of the correct answers or becoming disengaged from the testing process. (Of course, no such test exists.) The manual for our test tells us that the SEM is plus or minus three marks, a not unreasonable figure for a 50-question test. Let us also say that the average annual progress on this test is an improvement of six marks, which is fairly typical for this kind of test. If we imagine a student with an initial score of 25, we expect that after 12 months she will be scoring 31. To keep it simple, we will say that the student's initial score of 25 had no error associated with it: it was her true score. We will also assume that she is making the expected progress and is doing so in a uniform way. That is, every day our student increases her true achievement level by exactly the same amount (about 0.016 of a mark each day).

Now let us use the same test to measure the student's progress at different time intervals in one year. Because we know her rate of progress and the amount of error involved in a test score, it is possible to model the range of scores she could achieve at each testing session. The table below shows, for a range of testing intervals, the chance that she will achieve a lower mark than her initial score purely because of measurement error.

[Table: the chance that the student scores below her initial mark of 25 at different testing intervals]

As can be seen, even after half a year there is a 16 percent chance that our student will achieve a lower mark than her initial score of 25 just because of the randomness involved in testing. In fact, at this rate of progress it will take about 300 days before we can be 95 percent sure random error will not produce a lower score.
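The model behind these figures can be recreated in a few lines of Python. The sketch below is ours, not the code used to produce the original table; it simply applies the assumptions stated above: the true score climbs steadily from 25 by six marks a year, and each observed score is normally distributed around the true score with a standard deviation equal to the SEM of three marks.

```python
# A rough recreation of the model behind the table: true score grows
# from 25 by 6 marks a year; observed scores are normally distributed
# around the true score with an SEM of 3 marks.
from statistics import NormalDist

INITIAL, SEM, ANNUAL_GAIN = 25, 3, 6

def p_lower(days: int) -> float:
    """Chance an observed score falls below the initial 25 after `days`."""
    true_score = INITIAL + ANNUAL_GAIN * days / 365
    return NormalDist(mu=true_score, sigma=SEM).cdf(INITIAL)

for days in (30, 91, 182, 300, 365):
    print(f"{days:3d} days: {p_lower(days):.0%}")
# 182 days gives roughly 16%, and only around day 300 does the chance
# of a lower score drop to about 5% (i.e., we are 95 percent sure).
```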

Figure 2 shows the situation graphically. This time, the year has been divided into monthly testing sessions. Each curve shows the range of scores the student could achieve on a monthly test as she makes her regular progress. The shaded part of each curve represents the proportion of the time her score is expected to be lower than the score she started with. The dotted line shows the student’s steady overall progress.

Figure 2 also demonstrates a closely associated problem. The size of the possible error in each result makes it very difficult to measure precisely how much gain has been made between two testing sessions.

When we compare two test scores, the measurement error involved in each one compounds. Let us take another student who initially scored 30 on the test and a year later scored 36. On the face of it, this is a gain of six marks. However, the real gain could be more or, unfortunately, less. When this student scored 30, we expected his true score to be somewhere between 27 and 33. The second score of 36 means we expect his new, improved true score to be somewhere between 33 and 39. This implies the gain could actually be anywhere between zero marks (no gain at all) and 12 marks (a substantial gain). Like a single test score, a gain has to be understood as lying in a possible range rather than as a specific amount. Although we can be reasonably sure that the student has made an improvement, we cannot pin it down very precisely.

[Figure 2. Monthly testing sessions: the distribution of possible scores each month, with the shaded areas showing the chance of a score below the initial 25 and the dotted line showing the student's steady progress]
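As an aside for the statistically inclined: the zero-to-12 range above comes from laying the two plus-or-minus-3 bands end to end. A common alternative, assuming the errors on the two testing occasions are independent, is to combine them as the square root of the sum of their squares. The sketch below is our illustration of that convention; either way, the gain is known far less precisely than either score on its own.

```python
# SEM of a difference score, assuming the two errors are independent.
import math

SEM = 3                                  # SEM of each test score
sem_gain = math.sqrt(SEM**2 + SEM**2)    # SEM of the gain, about 4.2
gain = 36 - 30                           # observed gain: 6 marks

low, high = gain - sem_gain, gain + sem_gain
print(f"observed gain: {gain} marks")
print(f"SEM of the gain: {sem_gain:.1f} marks")
print(f"roughly 68% range for the true gain: {low:.1f} to {high:.1f}")
# prints 1.8 to 10.2: far wider than the uncertainty in either score
```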

Attempting to measure an individual's progress over short time intervals will often leave us no wiser about how much the student has improved. We will see far more noise than signal in the gain scores, even though real improvements could be taking place. And unlike our fictitious test, real tests repeated frequently can introduce sources of bias: nonrandom, systematic influences that corrupt our measurements. These include practice effects, where the student becomes familiar with the style or type of questions, and motivational effects, where students become disaffected with the testing process. Assessment should be regular, but it does not need to be focused on measuring progress. Regular assessment should be used to ascertain formative needs and should involve a variety of techniques.

Test scores should be corroborated by other evidence. The more information we have, the more likely it is that we will make valid decisions about a student's learning needs. This information does not have to come from standardised testing.

One score on its own can leave us exposed to the effects of measurement error. A score needs to be understood as a range rather than a precise point. We need to be patient, stay aware of the noise, and look for the signal.

More information about measurement error

• A good test will indicate the measurement error associated with test scores. Check the teacher's manual.

• There is more measurement error associated with high and low test scores than with mid-range scores.

• Compared to an individual's score, group averages are not as affected by measurement error (although they can be affected by systematic bias).

• Tests or subtests based on only a few items will be more affected by measurement error than longer tests.

Reference

Taleb, N. N. (2004). Fooled by randomness: The hidden role of chance in life and in the markets. London: Penguin Books.

Charles Darr is a senior researcher and manager of the assessment design and reporting team at the New Zealand Council for Educational Research.

Email: charles.darr@nzcer.org.nz

Hilary Ferral is a statistician with the New Zealand Council for Educational Research.

Email: hilary.ferral@nzcer.org.nz