To give an indication of the range of possibilities, this section provides descriptions of some innovative assessments. Of course, space limits the number of examples that can be given. Nonetheless, the variety in these examples demonstrates the ability of computerized assessment to evaluate a wide range of human attributes.
Interpersonal skills. Assessing interpersonal skills has proved to be difficult and at times fruitless. Thorndike and Stein (1937) criticized early efforts because they did not provide measures with sufficient reliability. More recent evaluations (e.g., Davies, Stankov, & Roberts, 1998) have found that some measures are adequately reliable, but are confounded with intelligence or personality. Underlying this problem is that most of these assessments have relied on text-based items. Interpersonal interactions involve more than is conveyed in a terse written description of the context and events. Because people often communicate nonverbally, a transcript of dialogue omits much that is important. Nonverbal communication has been shown to be important in a variety of circumstances (e.g., Motowidlo & Burnett, 1995; Rinella, Ferguson, & Sager, 1970). Moreover, by including descriptions of emotion or attitude, we are calling attention to those cues, rather than relying on the respondent's interpersonal skill to take notice. Furthermore, text-based items prime respondents to rely on intellectual processing, rather than their interpersonal skills: Chan and Schmitt (1997) found that a paper-and-pencil test of work habits and interpersonal skills correlated r = .45 with reading comprehension, but a parallel assessment based on video clips correlated only r = .05.
Olson-Buchanan et al. (1998) developed a computer-administered test of conflict resolution skills in the workplace. The assessment uses video clips to present typical workplace conflicts; the expressions and verbal tone of the actors are salient in each scene. At a critical point in each conflict, the respondent is presented with several potential responses to the problem and asked to pick the best option. With samples from multiple organizations in a variety of industries, Olson-Buchanan et al. assessed the criterion-related validity of the assessment in comparison to tests of verbal and quantitative abilities. They found that the conflict resolution skills assessment correlated significantly with independent ratings of the assessees' performance in resolving conflict in the workplace, whereas the cognitive ability measures were unrelated. This suggests that a video-based simulation of an important interpersonal skill has criterion-related validity that is separate and distinct from intelligence. Subsequent research has found little relation of a similar video-based assessment with personality (Bergman, Donovan, & Drasgow, 2001).
National Board of Medical Examiners case simulation. Candidates for the medical licensing examination for the National Board of Medical Examiners (NBME) take a computer-based case simulation (Clyman, Melnick, & Clauser, 1999). In this exam, the candidate physician diagnoses and treats a series of virtual patients. Candidates can request the patient's history, order a physical exam, order one or more tests, provide a treatment, or request a consultation, all of which are typically available to a physician. The condition and symptoms of the virtual patient change in response to the actions of the candidate. For example, if the candidate physician orders a test that requires 30 minutes to perform, he or she must move the clock forward 30 minutes to receive the results. Concomitantly, the virtual patient's condition progresses for 30 minutes with the symptoms changing accordingly.
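The clock-driven mechanics described above can be sketched in a few lines. This is a hypothetical illustration, not the NBME's implementation: `VirtualPatient`, the 1%-per-minute progression model, and the test names are all invented for the example.

```python
# Hypothetical sketch of a clock-driven case simulation: ordering a test
# schedules its result, and advancing the clock both delivers due results
# and progresses the virtual patient's condition.
import heapq


class VirtualPatient:
    def __init__(self):
        self.clock = 0          # minutes elapsed in the case
        self.severity = 1.0     # untreated condition worsens over time
        self.pending = []       # heap of (ready_time, test_name)
        self.results = []       # test results delivered so far

    def order_test(self, name: str, duration: int) -> None:
        """Schedule a test whose result is ready after `duration` minutes."""
        heapq.heappush(self.pending, (self.clock + duration, name))

    def advance_clock(self, minutes: int) -> None:
        """Move the case clock forward; deliver results and progress illness."""
        self.clock += minutes
        self.severity *= 1.01 ** minutes   # illustrative progression model
        while self.pending and self.pending[0][0] <= self.clock:
            _, name = heapq.heappop(self.pending)
            self.results.append(name)


patient = VirtualPatient()
patient.order_test("chest x-ray", 30)   # requires 30 minutes to perform
patient.advance_clock(30)               # result arrives; condition progressed
print(patient.results)                  # ['chest x-ray']
```

The key design point is that the candidate, not the system, advances the clock, so the cost of waiting for a slow test is borne inside the simulation.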
The computer-based simulation is designed to accurately assess a physician's patient care strategy. In real-world situations, diagnosing and treating patients often requires more than a single straightforward decision. A problem-solving process may be required in which possible causes are assessed in an interrelated series of tests. Multiple-choice questions may artificially isolate components of this process and therefore fail to provide an adequate evaluation of the candidate's patient care strategy. Clyman et al. (1999) found a disattenuated correlation of only about .5 between scores on the multiple-choice section of the NBME examination and the case simulation. At the very least, this suggests that the case simulation is assessing an aspect of patient care not covered by the multiple-choice questions. Moreover, from the perspective of the patient, diagnosis and treatment is arguably the most critical aspect of a candidate physician's skill.
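The disattenuated correlation cited above corrects an observed correlation for unreliability in both measures. A minimal sketch of the standard correction for attenuation follows; the reliability and correlation values are illustrative only, not figures from Clyman et al. (1999).

```python
# Correction for attenuation: r_true = r_observed / sqrt(rel_x * rel_y).
import math


def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Estimate the correlation between true scores, given the observed
    correlation r_xy and the reliabilities of the two measures."""
    return r_xy / math.sqrt(rel_x * rel_y)


# Illustrative values: an observed correlation of .40 between the
# multiple-choice section and the case simulation, with reliabilities
# of .85 and .75, disattenuates to roughly .50.
print(round(disattenuate(0.40, 0.85, 0.75), 2))
```

Even after this correction, the two sections share only about 25% of their variance, which is the basis for concluding that the simulation measures something the multiple-choice items do not.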
Uniform Certified Public Accountant (UCPA) examination. The UCPA examination moved to a computer-based format in 2004. In addition to a multiple-choice section, the exam incorporates simulations of typical accounting tasks for entry-level certified public accountants (CPAs). The simulations require examinees to enter values into spreadsheets, conduct research, evaluate risk, and justify conclusions. This last component is particularly interesting: Candidates must examine a searchable version of the authoritative accounting standards to find the regulation that justifies their decision and then copy and paste it into a text box. In other words, rather than simply testing rote memory, the assessment evaluates whether a candidate knows where to look for answers and how to apply them to accounting problems.
Architect Registration Exam (ARE). At first glance, computerized scoring of architectural design would appear to be impossible. After all, beauty is in the eye of the beholder, and it seems improbable that a computer algorithm would be able to capture the most human of abilities, creativity. Consequently, ARE answers were previously scored holistically by human graders. Nonetheless, the ARE has switched to a computer-administered format with computerized scoring.
Rather than trying to emulate human graders, the ARE scoring algorithms were built to produce consistent scores that reflect key features of designs. Under this approach, a given design receives the same score when rescored, and the scoring criteria remain consistent across answers (Bejar & Braun, 1999). To achieve such consistency, the designs are scored based on a microlevel analysis rather than holistically. To allow this type of scoring, design tasks must be constructed according to a detailed set of specifications, which imposes rigor during item development.
Developing a new scoring algorithm for each new design problem would be a huge burden. Instead, the ARE research team developed a number of standard design tasks, which they called vignettes. Each vignette has its own scoring algorithm, which was time consuming and labor intensive to create. New tasks are then generated by changing the surface features of a vignette while the basic design problem remains the same. These "isomorphs," or clones of a specific vignette, can be scored with the same scoring algorithm.
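The microlevel scoring idea can be sketched as a checklist of weighted, machine-checkable features; isomorphs reuse the same algorithm with different parameter values. Everything here is hypothetical: the feature names, weights, and design representation are invented for illustration and bear no relation to the actual ARE scoring rules.

```python
# Hypothetical sketch of feature-based ("microlevel") scoring: each vignette
# defines a fixed list of scoreable features, so any isomorph of the vignette
# can be scored by the same algorithm with different parameter values.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Feature:
    name: str
    check: Callable[[Dict], bool]   # True if the design satisfies the feature
    weight: float


def score_design(design: Dict, features: List[Feature]) -> float:
    """Sum the weights of the features the submitted design satisfies."""
    return sum(f.weight for f in features if f.check(design))


# One isomorph of an invented site-planning vignette: only the parameter
# values (orientation, setback) would change in another isomorph.
vignette_a = [
    Feature("entrance faces street", lambda d: d["entrance"] == "north", 2.0),
    Feature("meets setback", lambda d: d["setback_ft"] >= 10, 1.0),
]

design = {"entrance": "north", "setback_ft": 12}
print(score_design(design, vignette_a))  # 3.0
```

Because each feature check is deterministic, rescoring the same design always yields the same score, which is the consistency property the ARE developers were after.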
Musical Aptitude. Assessing musical aptitude has been challenging because accurate reproduction of acoustic tones traditionally depended on audiocassette players. The poor quality of the cassettes and the degradation of sound quality over time made accurate reproduction unreliable. When audio clips are played for a group, factors such as the examinee's seating position relative to the speakers, sneezing or coughing by another examinee, and the acoustic qualities of the examination room can all influence the examinee's performance.
Multimedia computers are particularly suitable for assessing musical aptitude. They can play audio clips, present text and graphical images that ask about the audio clips, and then record the examinee's responses. Additionally, because digital recordings do not degrade over time, the sound quality remains constant. Further benefits of computerized assessment include examinees proceeding at their own pace and using headphones that minimize the effects of other noises.
Walter Vispoel has pioneered the development of musical aptitude testing since the early days of personal computers (Vispoel, 1987, 1999). One recent version of Vispoel's (1999) musical aptitude test has the computer play a short musical melody and then repeat the melody. The examinee's task is to determine which note, if any, was changed in the second melody. The assessment is a CAT that uses IRT to determine the next item (i.e., melody) to administer and requires far fewer items than a conventional test to obtain the same measurement precision.
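The item-selection step of such a CAT can be sketched with a common IRT approach: at each stage, administer the unasked item with maximum Fisher information at the current ability estimate. The two-parameter logistic (2PL) model and the item parameters below are illustrative; Vispoel's actual test may use a different model and calibration.

```python
# Sketch of maximum-information item selection under the 2PL IRT model.
# Item parameters (a = discrimination, b = difficulty) are invented examples.
import math


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL model: probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def next_item(theta: float, pool: list) -> int:
    """Index of the pool item with maximum information at theta."""
    return max(range(len(pool)),
               key=lambda i: item_information(theta, *pool[i]))


# A tiny illustrative pool of melody items as (a, b) pairs.
pool = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.5), (1.0, 2.0)]
print(next_item(0.4, pool))  # 2: high discrimination, difficulty near theta
```

Because each administered item is chosen to be maximally informative at the examinee's provisional ability estimate, the adaptive test reaches a target measurement precision with far fewer items than a fixed-form test.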