Reliability of Exact

‘Reliability’ generally refers to the extent to which a test can be expected to give the same results when administered on a different occasion (test-retest reliability) or by a different administrator (inter-rater reliability), or to which the components of a test give consistent results (internal consistency). Note that this is not the same as the validity of the test (see Section 1.4).

Table 3, ‘Reliability coefficients (Cronbach’s alpha) for the tests in Exact’, shows the reliability coefficient for each of the Exact tests, calculated using Cronbach’s alpha, which is a measure of internal consistency. The coefficients are all high (around 0.9) except for the reading comprehension test, where they are nearer 0.8. The lower values arise because reading comprehension scores are based on a relatively small number of test items, and alpha tends to rise as the number of items increases. These results show that all the tests in Exact have satisfactory reliability.
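For readers who want to see how such a coefficient is obtained, the following is a minimal sketch in Python of the standard Cronbach’s alpha formula, together with the Spearman-Brown formula, which illustrates why a test with fewer items tends to have lower reliability. The function names and the example scores are hypothetical and are not taken from the Exact standardisation data; the sketch assumes one row of item scores per student.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row of item scores per student)."""
    k = len(scores[0])  # number of items in the test
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two students")
    # Sample variance of each item across students
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the students' total scores
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def spearman_brown(reliability, length_factor):
    """Predicted reliability if the test length is multiplied by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical data: four students, three items scored 0 or 1
example = [
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
]
alpha = cronbach_alpha(example)

# Illustration only: doubling a test whose reliability is 0.8
doubled = spearman_brown(0.8, 2)
```

Under the Spearman-Brown formula, doubling the length of a test with a reliability of 0.8 would be expected to raise it to about 0.89, which is in line with the difference between the reading comprehension coefficients and those of the longer tests.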

Although test-retest reliability is frequently quoted in test manuals, this measure is problematic because students are likely to remember items and answers from the previous assessment, so that memory effects confound the result. However, since Exact comprises two parallel forms, these can be compared in a test-retest situation (parallel-forms reliability), which is arguably a more satisfactory method of checking reliability since the test content is different in the two forms. To do this, Exact reading comprehension and spelling test data were collected from a total of 373 students aged 11 to 16 attending a large secondary academy in South London. The test-retest correlation coefficients over a period of six months were 0.757 for spelling, 0.614 for reading comprehension accuracy and 0.511 for reading comprehension speed, all of which were statistically significant at p<0.001. (Word recognition and writing/typing to dictation were not tested in this project.) Given the nature of the reading comprehension test, with five increasingly lengthy and complex texts of different genres and on different topics, together with progressively challenging questions, these results indicate satisfactory psychometric and educational integrity of the assessment methods.
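The correlation between scores on the two forms can be computed as sketched below. This is an illustrative Python example only: the function names and sample scores are hypothetical, and the manual does not state which software was used for the original analysis. The second function converts a correlation into a t statistic with n - 2 degrees of freedom, which is one common way of testing its significance; applying it with n = 373 assumes that every student contributed to each correlation, which the manual does not confirm.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between paired scores on two test forms."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three paired scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing a correlation r from n pairs (n - 2 degrees of freedom)."""
    return r * sqrt((n - 2) / (1 - r * r))


# Hypothetical spelling scores for five students on Form A and Form B
form_a = [12, 18, 25, 30, 34]
form_b = [14, 17, 27, 28, 36]
r = pearson_r(form_a, form_b)

# Assumption: all 373 students contributed to the reported correlation
t_speed = t_statistic(0.511, 373)
```

Even for the lowest of the three reported coefficients, the resulting t statistic is large when the sample contains several hundred students, which is consistent with the reported significance level.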

