So far I have explored the development of computer-based assessment strategies for clinical decision making, described how narrative programs are developed, and examined their equivalence and accuracy or validity. In this section I summarize the limitations of computer-based assessment and indicate some directions that further studies will likely take, or should take.
Computer-based testing services have not maximally incorporated the flexibility and graphic capabilities in presenting test-based stimuli. Psychologists have not used to a great degree the extensive powers of the computer in presenting stimuli to test takers. Much could be learned from the computer game industry about presenting items in an interesting manner. With the power, graphic capability, and flexibility of the computer, it is possible to develop more sophisticated, real-world stimulus environments than are currently available in computer-administered methods. For example, the test taker might be presented with a virtual environment and be asked to respond appropriately to the circumstances presented.
It is likely that assessment will improve in quality and effectiveness as technology, particularly graphic displays and voice-activated systems, improves. At the present time, the technology exists for computer-based assessment of some complex motor activities, but such systems are extremely expensive to develop and maintain. For example, airlines use complex flight simulators that mimic the flight environment extremely well. Similar procedures could be employed in the assessment of cognitive functioning; however, the psychotechnology is lacking for developing more sophisticated uses. The computer assessment field has not kept up with the electronic technology that allows developing test administration strategies along the lines of the virtual reality environment. A great deal more could be done in this area to provide more realistic and interesting stimulus situations to test takers. At present, stimulus presentation of personality test items simply follows the printed booklet form. A statement is printed on the screen and the client simply presses a yes or no key. Many response behaviors that are important to test interpretation are not incorporated in computer-based interpretation at present (e.g., stress-oriented speech patterns, facial expressions, or the behavior of the client during testing). Further advancements from the test development side need to come to fruition in order to take full advantage of present and future computer technology.
Computer-based reports are not stand-alone clinical evaluations. Even after almost 40 years of development, most computer-based interpretive reports are still considered to be broad, generic descriptions rather than integrated, standalone psychological reports. Computer-generated reports should be considered as potentially valuable adjuncts to clinical judgment rather than stand-alone assessments that are used in lieu of an evaluation of a skilled clinician (Fowler, 1969). The reports are essentially listings of the most likely test interpretations for a particular set of test scores—an electronic dictionary of interpretive hypotheses that have been stored in the computer to be called out when those variables are obtained for a client.
Many people would not, however, consider this feature a limitation of computer-based systems; they actually prefer this more limited role as the major goal, rather than the development of finished reports that emerge from the computer. There has been no clamoring in the field for computer-based finished-product psychological reports.
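The "electronic dictionary" idea described above can be made concrete with a minimal sketch. It assumes a hypothetical statement library keyed by a two-scale code type and an assumed T-score cutoff of 65; the function names, keys, and placeholder statements are illustrative only and are not taken from any published interpretive system.

```python
# Hypothetical statement library: keys are two-scale code types (scale
# abbreviations sorted alphabetically); values are placeholder hypotheses,
# not text from any real interpretive system.
STATEMENT_LIBRARY = {
    ("D", "Pt"): "Placeholder hypothesis for elevated D and Pt.",
    ("Hs", "Hy"): "Placeholder hypothesis for elevated Hs and Hy.",
}

ELEVATION_CUTOFF = 65  # assumed T-score cutoff for this sketch


def code_type(t_scores, cutoff=ELEVATION_CUTOFF):
    """Return the two highest elevated scales (sorted by name), or None."""
    elevated = [(score, scale) for scale, score in t_scores.items()
                if score >= cutoff]
    if len(elevated) < 2:
        return None
    top_two = sorted(elevated, reverse=True)[:2]
    return tuple(sorted(scale for _, scale in top_two))


def lookup_hypotheses(t_scores):
    """Look up stored hypotheses for a score pattern; empty list if none."""
    key = code_type(t_scores)
    if key is None:
        return []
    statement = STATEMENT_LIBRARY.get(key)
    return [statement] if statement else []


if __name__ == "__main__":
    # Illustrative scores only; not data from any client or study.
    print(lookup_hypotheses({"D": 78, "Pt": 72, "Hs": 55}))
```

Nothing in such a lookup integrates history, observation, or context, which is why the output is best treated as a list of hypotheses for the clinician to weigh rather than a finished report.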
Computer-based assessment systems often fail to take into consideration client uniqueness. Matarazzo (1986) criticized computerized test interpretation systems because of their seeming failure to recognize the uniqueness of the test takers—that is, computer-based reports are often amorphous descriptions of clients that do not tap the uniqueness of the individual's personality.
It is true that computer-based reports seem to read a lot alike when one sees a number of them for different patients in a particular setting. This sense of sameness results from two sources. First, computerized reports are the most general summaries for a particular test score pattern and do not contain much in the way of low-frequency and specifically tailored information. Second, it is natural for reports to contain similar language because patients in a particular setting are alike when it comes to describing their personality and symptoms. For example, patients in a chronic pain program tend to cluster into four or five MMPI-2 profile types involving a few scales: Hypochondriasis (Hs), Hysteria (Hy), Depression (D), and Psychasthenia (Pt; Keller & Butcher, 1991). Patients seen in an alcohol treatment setting tend to fall into about four clusters, usually showing elevations on Psychopathic Deviate (Pd), D, Pt, and Hypomania (Ma). Reports across different settings are more recognizably different. It should be noted that attempting to tailor test results to unique individual characteristics is a complex process and may not always increase their validity because it is then necessary to include low base rate or rare hypotheses in the statement library.
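One way to see why rare hypotheses can hurt rather than help is a simple base-rate calculation. The sketch below applies Bayes' rule to compute how often a statement is correct when it is printed; the base rates, sensitivity, and specificity are hypothetical values chosen only for illustration, not figures from any study.

```python
def positive_predictive_value(base_rate, sensitivity, specificity):
    """Probability that a printed statement is true, by Bayes' rule."""
    true_pos = sensitivity * base_rate
    false_pos = (1.0 - specificity) * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)


# Hypothetical numbers: the same decision rule applied to a characteristic
# that is common in the setting (40%) and to one that is rare (2%).
common = positive_predictive_value(0.40, 0.80, 0.90)
rare = positive_predictive_value(0.02, 0.80, 0.90)

print(f"common: {common:.2f}, rare: {rare:.2f}")
# prints "common: 0.84, rare: 0.14"
```

Under these assumed values, a statement about the rare characteristic would be wrong most of the times it appears, even though the rule itself is unchanged.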
The use of computer-based reports in clinical practice might dilute responsibility in the psychological assessment. Matarazzo (1986) pointed out that the practice of having unsigned computer-based reports creates a problem—a failure of responsibility for the diagnostic evaluation. According to Matarazzo, no one feels directly accountable for the contents of the reports when they come from a computer. In most situations today, this is not considered a problem because computer-based narrative reports are clearly labeled professional-to-professional consultations. The practitioner chooses to (or not to) incorporate the information from the report into his or her own signed evaluation report. Computer-based reports are presented as likely relevant hypotheses and labeled as consultations; they are not sold as stand-alone assessment evaluations. In this way, computerized interpretation systems are analogous to electronic textbooks or reference works: They provide a convenient lookup service. They are not finished products.
Computer-based reporting services do not maximally use the vast powers of the computer in integrating test results from different sources. It is conceptually feasible to develop an integrated diagnostic report, one that incorporates such elements or components as
• Behavioral observations.
• Personal history.
• Personality data from an omnibus personality measure such as the MMPI-2.
• Intellectual-cognitive abilities such as those reported by the Wechsler scales or performance on a neuropsychological battery such as the Reitan Neuropsychological Battery.
• Current stressors.
• Substance use history.
Moreover, it would be possible (and some research supports its utility) to administer this battery adaptively (i.e., tailored to the individual client), reducing the amount of testing time by eliminating redundancy. However, although a fully integrated diagnostic system that incorporates different measures from different domains is conceptually possible, it is not a practical or feasible undertaking, for at least two reasons. First, there are issues of copyright with which to contend. Tests are usually owned and controlled by different, often competing, commercial publishers. Obtaining cooperation between such groups to develop an integrated system is unlikely. Second, there is insufficient research information on integrated interpretation with present-day measures to guide their integration into a single report that is internally consistent.
The idea of having the computer substitute for the psychologist's integrative function has not been widely proclaimed as desirable and in fact has been lobbied against. Matarazzo (1986), for example, cautioned that computerized testing must be subjected to careful study in order to preserve the integrity of psychological assessment. Even though decision-making and interpretation procedures may be automated with computerized testing, personal factors must still be considered in some way. Research by Styles (1991) investigated the importance of a trained psychologist during computerized testing with children. Her study of Raven's Progressive Matrices demonstrated the need for the psychologist to establish and maintain rapport and interest prior to, during, and after testing. These factors were found to have important effects on the reliability and validity of the test data, insofar as they affected test-taking attitudes, comprehension of test instructions, on-task behavior, and demeanor. Carson (1990) has also argued for the importance of sound clinicianship, both in the development of psychological test systems and in their use.
Tests should not be used for tasks beyond their capability. If a test has not been developed for or validated in a particular setting, computer-based applications of it in that setting are not warranted. Even though computer-based psychological tests have been validated in some settings, this does not guarantee their validity and appropriateness for all settings. In their discussion of the misuse of psychological tests, Wakefield and Underwager (1993) cautioned against the use of computerized test interpretations of the MCMI and MCMI-II, which were designed for clinical populations, in other settings, such as for forensic evaluations. The danger of misusing data applies to all psychological test formats, but the risk seems particularly high when one considers the convenience of computerized outputs; that is, as noted by Garb (1998), some of the consumers of computer interpretation services are nonpsychologists who are unlikely to be familiar with the validation research on a particular instrument. It is important for scoring and interpretation services to provide computer-based test results only to qualified users.
Research evaluations of computer-based systems have often been slow to appear for some assessment methods. The problems with computer-based assessment research have been widely discussed (Butcher, 1987; Maddux & Johnson, 1998; Moreland, 1985). Moreland (1985), for example, concluded that the existing research on computer-based interpretation has been limited because of several methodological problems, including small sample sizes, inadequate external criterion measures to which one can compare the computer-based statements, lack of information regarding the reports' base-rate accuracy, failure to assess the ratings' reliability across time or across raters, failure to investigate the internal consistency of the reports' interpretations, and issues pertaining to the report raters (e.g., lack of familiarity with the interpretive system employed, lack of expertise in the area of interest, and possible bias secondary to the theoretical orientation of the rater). D. K. Snyder, Widiger, and Hoover (1990) expressed concerns over computer-based interpretation systems, concluding that the literature lacks rigorously controlled experimental studies that examine methodological issues. They recommended specifically that future studies include representative samples of both computer-based test consumers and test respondents and use characteristics of each as moderator variables in analyzing reports' generalizability.
In fairness to computer-based assessment, there has been more research into validity and accuracy for this approach than there has been for the validity of interpretation by human interpreters—that is, for clinical interpretation strategies. Extensive research on some computer-assisted assessments has shown that automated procedures can provide valid and accurate descriptions and predictions. Research on the accuracy of some computer-based systems (particularly those based on the MMPI and MMPI-2, which have been subjected to more scrutiny) has shown promising results with respect to accuracy. However, reliability and utility of computer-based interpretations vary as a function of the instruments and the settings included, as illustrated by Eyde et al. (1991) in their extensive study of the accuracy of computer-based reports.
Computer-based applications need to be evaluated carefully. Computer system developers have not always been sensitive to the requirement of validation of procedures. It is important for all computer-based systems to be evaluated to the extent that MMPI-based programs have been subjected to such evaluation (Butcher, 1987; Fowler, 1987; Moreland, 1985).
It should be kept in mind that just because a report comes from a computer, it is not necessarily valid. The caution required in assessing the utility of computer-based applications brings about a distinct need for specialized training in their evaluation. It is also apparent that instruction in the use (and avoidance of misuse) of computer-based systems is essential for all professionals who use them (Hofer & Green, 1985). There is also a need for further research focusing on the accuracy of the information contained in computer-based reports.