Evaluating differences between CAT4 scores

Evaluating a difference between two scores, whether scores on two different tests or scores on the same test on two occasions, has to be a three-stage process.

Statistical significance of differences

First, it must be decided whether the difference is large enough to be considered ‘real’ rather than merely the result of imprecise measurement of the two scores. This depends on the reliability of each of the two scores and, hence, on the ‘noise’ around each one.

The measurement error when calculating a difference between two scores is evaluated using a coefficient called the standard error of measurement difference (SEMdiff).

The SEMdiff for CAT4 scores is approximately seven Standard Age Score (SAS) points. Consequently, if two scores are more than seven SAS points apart, it is about 68% likely that the difference is real, and if they are 11 points apart, the likelihood that the difference is real is about 90%.
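As a rough illustration of where such percentages come from, the sketch below treats the error in a difference as normally distributed with standard deviation SEMdiff and computes the two-sided confidence that a gap exceeds measurement noise. The function names, the score SD of 15 and the reliability of 0.9 are illustrative assumptions, not published CAT4 values; combining the two errors as the square root of the sum of their squares is the standard rule for independent errors.

```python
import math


def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement of a single score."""
    return sd * math.sqrt(1.0 - reliability)


def sem_diff(sem1: float, sem2: float) -> float:
    """Standard error of a difference between two scores with independent errors."""
    return math.sqrt(sem1 ** 2 + sem2 ** 2)


def confidence_real(difference: float, semdiff: float) -> float:
    """Two-sided normal confidence that a difference exceeds measurement noise."""
    z = abs(difference) / semdiff
    return math.erf(z / math.sqrt(2.0))


# Hypothetical inputs: SD of 15 and reliability of 0.9 on both tests (assumed).
example_semdiff = sem_diff(sem(15.0, 0.9), sem(15.0, 0.9))
print(round(example_semdiff, 1))

# Using the SEMdiff of seven points quoted in the text.
for gap in (7, 11):
    print(gap, round(confidence_real(gap, 7.0), 2))
```

Under this normal model a gap of 7 points gives about 0.68 and a gap of 11 points about 0.88, close to the rounded 68% and 90% quoted above.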

Rarity of differences

Second, if the difference is ‘real’ or statistically significant, then the unusualness or rarity of the difference has to be evaluated. A significant difference can sometimes be very common. For example, if you use a millimetre ruler to measure a boy’s height when he is seven and then again when he is eight, the difference between these two heights can be measured very accurately to within two millimetres. Therefore ‘real’ or statistically significant differences will be very common in a sample of boys because the difference between the heights is likely to be substantially greater than two millimetres in almost all cases.

The spread of differences in scores can be determined either directly from the data or by a formula that takes into account the spread of scores on each test and the correlation between the two sets of scores. If the sample size is large enough, the two methods will produce very similar results; this was the case for the standardisation of CAT4. The formula used is:

$$SD_{diff} = \sqrt{SD_1^2 + SD_2^2 - 2\,r_{12}\,SD_1\,SD_2}$$

where SD1 and SD2 are the standard deviations of the scores on each test and r12 is the correlation between the two tests.
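The formula described here is, in its standard form, the square root of SD1² + SD2² − 2·r12·SD1·SD2. A minimal sketch of that calculation follows; the function name is hypothetical, the SD of 15 for SAS scores is an assumption, and the correlations of 0.8 and 0.7 are the values given in the notes to the tables.

```python
import math


def sd_of_differences(sd1: float, sd2: float, r12: float) -> float:
    """Standard deviation of the difference between two correlated scores."""
    return math.sqrt(sd1 ** 2 + sd2 ** 2 - 2.0 * r12 * sd1 * sd2)


SAS_SD = 15.0  # assumed standard deviation of SAS scores

# Same battery on two occasions (assumed correlation 0.8).
same_battery = sd_of_differences(SAS_SD, SAS_SD, 0.8)

# Two different batteries (assumed correlation 0.7).
between_batteries = sd_of_differences(SAS_SD, SAS_SD, 0.7)

print(round(same_battery, 1), round(between_batteries, 1))  # 9.5 11.6
```

The lower the correlation, the wider the spread of differences, which is why larger gaps are needed between different batteries before a difference counts as unusual.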

When looking at differences between a child’s scores on the same battery on two occasions (e.g. Verbal in Year 7 and Verbal in Year 8), the table below can be used (see note 1). For example, a score increase of 11 SAS points or more will occur in between 10% and 15% of children, but a decrease of 17 or more points will occur in only the most extreme 5%.

When looking at score differences between different batteries (e.g. Quantitative and Nonverbal), this table should be used instead (see note 2). The SAS differences are larger in this situation because the two batteries measure different underlying mental processes, so their scores tend to be less highly correlated than two scores on the same battery.

1. The figures in the table assume a mean correlation of 0.8 between the two occasions.

2. The figures in the table assume a mean correlation of 0.7 between pairs of batteries.
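To show how figures of this kind can be approximated, the sketch below estimates the share of children expected to show a change of a given size in one direction, assuming normally distributed differences, an SD of 15 on both scores and the correlations stated in the notes. The function names are hypothetical and the normal model is an assumption; the published tables may have been derived differently.

```python
import math


def sd_of_differences(sd: float, r: float) -> float:
    """SD of differences when both scores share the same SD (special case of the general formula)."""
    return sd * math.sqrt(2.0 * (1.0 - r))


def proportion_at_least(points: float, sd: float, r: float) -> float:
    """One-tailed share of children whose score differs by at least `points` in one direction."""
    z = points / sd_of_differences(sd, r)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


SAS_SD = 15.0  # assumed standard deviation of SAS scores

# Same battery on two occasions (correlation 0.8, note 1).
gain_11 = proportion_at_least(11, SAS_SD, 0.8)
drop_17 = proportion_at_least(17, SAS_SD, 0.8)

# Two different batteries (correlation 0.7, note 2).
gap_11_between = proportion_at_least(11, SAS_SD, 0.7)

print(round(gain_11, 3), round(drop_17, 3), round(gap_11_between, 3))
```

Under these assumptions an 11-point gain falls between 10% and 15% of children and a 17-point drop falls within the most extreme 5%, in line with the example in the text, while the same 11-point gap is more common between two different batteries.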

Practical significance of differences

Finally, a difference between two batteries that occurs commonly in the general population is not necessarily without practical significance. It can indicate a real, albeit common, difference between the development of the cognitive abilities underlying the two battery scores, with implications for how the student concerned is likely to progress academically. Such differences need to be interpreted in the light of all that is known of a student’s background and educational record.

For example, students from a background of limited socio-economic and educational opportunity who gain higher scores for Nonverbal Reasoning than for Verbal Reasoning may not have any real difference between their abilities to reason with words and with shapes. Instead, they may not have had the chance to acquire the basic reading and word knowledge needed to perform well on the verbal tasks. On the other hand, if students with the same score pattern have good socio-economic and educational backgrounds, the score difference may suggest a genuine difference in their abilities to think with words and with shapes.