A review of

Computer-based Testing and Validity: a look back into the future


Assessment in Education, Vol. 10, No. 3, November 2003 (Carfax Publishing)



Important points of the content (main issues addressed)


In recent years, the use of standardized tests to evaluate the achievements of students, schools and school districts has increased. Computer-based testing is often considered cheaper, more efficient and less time-consuming than paper-based testing. The subject of this paper is the benefits and costs associated with moving paper-based tests to computers, and in particular how validity might be affected.


In the early era, 1969–1985, a computerized test was often the first time an examinee had ever used a computer, so the experience of operating the computer itself could draw attention away from the actual test. Technical difficulties and complicated interfaces could also affect the examinee’s performance.


Lee and Hopkins (1985) presented a few factors that could significantly affect test performance:


  • Computer-based tests must allow examinees to change answers, skip individual test items and review past items.
  • Lack of scratchwork space lowers test scores, especially on complex tasks.


In 1986, the American Psychological Association (APA) published Guidelines for Computer-Based Tests and Interpretations. It included the points from Lee and Hopkins and added that the examinee should be made aware of any inequalities resulting from the mode of administration (paper-based or computer-based test).


In contrast to the validity studies conducted during the first 15 years of computer-based testing, later research focused on identifying key factors that affect examinee performance:


(a) ability to review and revise responses

Wise and Plake (1989) also emphasized the importance of being able to review and skip items, and added the criterion of allowing examinees to change their answers. Research by Mueller and Wasser (1977) concluded that, across a test as a whole, students gain more than they lose by changing answers.


(b) presentation of graphics and text on computer screens

Mazzeo and Harvey (1988) found that tests requiring multiscreen, graphical or otherwise complex displays produce mode effects: graphical displays affect examinee performance.


(c) prior experience working with computers.

Familiarity with computers plays a significant role in test performance. Newer studies suggest that, for some students accustomed to writing on computers, computer-based testing may be a better mode than traditional paper-and-pencil for assessing their writing ability (Russell, 1999; Russell & Haney, 1997; Russell & Plati, 2001, 2002). The converse also holds: if students lack computer skills, the computer can interfere with their ability to perform at their best.


Russell, Goldberg and O’Connor conclude: “We suggest that examinees should be able to customise the environment in which they perform the test so that the influence of factors irrelevant to the construct being measured is reduced.”



Reflections on the paper’s contribution to the field of e-assessment


Although this paper was written 12 years ago and is partly based on even older research, I think many of its points are still relevant today.


I will especially address the conclusion. The goal of standardized tests (or of any test) is to assess a student’s performance within a subject. The results of these exams and tests are used in selection processes for schools and jobs, and are therefore of great significance for each individual. Consequently, they have to be valid.


Giving identical tests to all students in a country should add to their validity. But Russell, Goldberg and O’Connor make an important point: perhaps the environment, the available equipment and the form of the test itself have already given some students an advantage (and others a disadvantage). If the goal is to measure and compare skills in mathematics or a language, shouldn’t all students be able to perform at their best? If the test itself creates differences between students, is it really valid?


Reflections on how to apply the ideas of this paper in a relevant educational context


I sometimes create computer-based summative tests in Fronter for my students. In this paper I found a few things that will definitely change the way I give these tests in the future. I teach media and communication in high school, and my students are on the whole experienced with technology. But that does not mean that they necessarily prefer using their computers for tests. Therefore, from now on, I have decided to


  • Provide a paper-based test (printed from Fronter) for those who prefer pen and paper.
  • Provide paper for drafts.
  • Walk through how the test environment works and show the students how to skip and change answers and how to navigate within it. (Earlier I assumed they would figure this out themselves, without considering that it may steal time from the actual test.)


I’m sure there is a lot more to add. I look forward to hearing from you!




Email: ellen (a)