In addition to the basic principles described earlier in this chapter (i.e., the preparation, conduct, and follow-up of the actual assessment), some special issues arise in psychological testing. These issues include automated or computerized assessment services, high-stakes testing, and the teaching of psychological assessment techniques. Many of these topics fall under the general domain of the testing industry.
Psychological testing is big business. Test publishers and other companies offering automated scoring systems or national testing programs are significant business enterprises. Although precise data are not easy to come by, Walter Haney and his colleagues (Haney, Madaus, & Lyons, 1993) estimated gross revenues of several major testing companies for 1987-1988 as follows: Educational Testing Service, $226 million; National Computer Systems, $242 million; The Psychological Corporation (then a division of Harcourt General), $50-55 million; and the American College Testing Program, $53 million. The Federal Reserve Bank suggests that multiplying the figures by 1.56 will approximate the dollar value in 2001 terms, but the actual revenue involved is probably significantly higher, given the increased numbers of people taking such tests by comparison with 1987-1988.
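The adjustment described above is simple arithmetic, and a minimal Python sketch makes it concrete. It uses only the 1987-1988 estimates and the 1.56 multiplier quoted in the text. The variable and function names are illustrative assumptions, and the results are rough approximations of 2001 dollar values, not actual 2001 revenues.

```python
# Multiplier quoted in the text for restating 1987-1988 dollars in 2001 terms.
MULTIPLIER_TO_2001 = 1.56

# Gross revenue estimates from Haney, Madaus, & Lyons (1993), in millions of
# dollars. Each entry is a (low, high) pair because one estimate is a range.
revenues_1987_88 = {
    "Educational Testing Service": (226, 226),
    "National Computer Systems": (242, 242),
    "The Psychological Corporation": (50, 55),
    "American College Testing Program": (53, 53),
}


def to_2001_dollars(amount_millions, multiplier=MULTIPLIER_TO_2001):
    """Scale a 1987-1988 dollar amount (in millions) by the quoted multiplier."""
    return round(amount_millions * multiplier, 1)


# Apply the multiplier to the low and high ends of every estimate.
adjusted = {
    name: (to_2001_dollars(low), to_2001_dollars(high))
    for name, (low, high) in revenues_1987_88.items()
}

for name, (low, high) in adjusted.items():
    if low == high:
        print(f"{name}: about ${low} million in 2001 terms")
    else:
        print(f"{name}: about ${low}-{high} million in 2001 terms")
```

By this arithmetic, the Educational Testing Service estimate of $226 million corresponds to roughly $353 million in 2001 terms. As the text notes, actual revenues were probably higher because more people were taking such tests.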
The spread of consumerism in America has seen increasing criticism of the testing industry (Haney et al., 1993). Most of the ethical criticism leveled at the larger companies falls into the categories of marketing, sales to unauthorized users, and the problem of so-called impersonal services. Publishers claim that they do make good-faith efforts to police sales so that only qualified users obtain tests. They note that they cannot control the behavior of individuals in institutions where tests are sent. Because test publishers must advertise in the media provided by organized psychology (e.g., the APA Monitor) to influence their prime market, most major firms are especially responsive to letters of concern from psychologists and committees of APA. At the same time, such companies are quite readily prepared to cry foul on antitrust grounds when professional organizations become too critical of their business practices.
The Center for the Study of Testing, Evaluation, and Educational Policy (CSTEEP), directed by Walt Haney, is an educational research organization located at Boston College in the School of Education (http://wwwcsteep.bc.edu). CSTEEP has been a valuable ally to students who have been subjected to bullying and intimidation by testing behemoths such as Educational Testing Service and the SAT program when the students' test scores improve dramatically. In a number of circumstances, students have had their test results canceled, based on internal statistical formulas that few people other than Haney and his colleagues have ever analyzed. Haney has been a valuable expert in helping such students obtain legal remedies from major testing companies, although the terms of the settlements generally prohibit him from disclosing the details. Although many psychologists are employed by large testing companies, responses to critics have generally been issued by corporate attorneys rather than psychometric experts. It is difficult to assess the degree to which insider psychologists in these big businesses exert any influence to assure ethical integrity and fairness to individual test takers.
Automated testing services and software can be a major boon to psychologists' practices and can significantly enhance the accuracy and sophistication of diagnostic decision making, but there are important caveats to observe. The draft revision of the APA code states that psychologists who offer assessment or scoring services to other professionals should accurately describe the purpose, norms, validity, reliability, and applications of the procedures and any special qualifications applicable to their use (ECTF, 2001). Psychologists who use such scoring and interpretation services (including automated services) are urged to select them based on evidence of the validity of the program and analytic procedures (ECTF, 2001). In every case, ethical psychologists retain responsibility for the appropriate application, interpretation, and use of assessment instruments, whether they score and interpret such tests themselves or use automated or other services (ECTF, 2001).
One key difficulty in the use of automated testing is the aura of validity conveyed by the adjective computerized and its synonyms. Aside from the long-standing debate within psychology about the merits of actuarial versus clinical prediction, there is often a kind of magical faith that numbers and graphs generated by a computer program somehow equate with increased validity of some sort. Too often, skilled clinicians do not fully educate themselves about the underpinnings of various analytic models. Even when a clinician is so inclined, the copyright holders of the analytic program are often reluctant to share too much information, lest they compromise their property rights.
In the end, the most reasonable approach is to use automated scoring and interpretive services as only one component of an evaluation and to carefully probe any apparently discrepant findings. This suggestion will not be a surprise to most competent psychologists, but unfortunately they are not the only users of these tools. Many users of such tests are nonpsychologists with little understanding of the interpretive subtleties. Some take the computer-generated reports at face value as valid and fail to consider important factors that make their client unique. A few users are simply looking for a quick and dirty source of data to help them make a decision in the absence of clinical acumen. Other users inflate the actual cost of the tests and scoring services to enhance their own billings. When making use of such tools, psychologists should have a well-reasoned strategy for incorporating them in the assessment and should interpret them with well-informed caution.
The term high-stakes tests refers to cognitively loaded instruments designed to assess knowledge, skill, and ability with the intent of making employment, academic admission, graduation, or licensing decisions. For a number of public policy and political reasons, these testing programs face considerable scrutiny and criticism (Haney et al., 1993; Sackett, Schmitt, Ellingson, & Kabin, 2001). Such testing includes the SAT, Graduate Record Examination (GRE), state examinations that establish graduation requirements, and professional or job entry examinations. Such tests can provide very useful information but are also subject to misuse and a degree of tyranny in the sense that individuals' rights and welfare are easily lost in the face of corporate advantage and political struggles about accountability in education.
In May 2001, the APA issued a statement on such testing titled "Appropriate Use of High Stakes Testing in Our Nation's Schools" (APA, 2001). The statement noted that the measurement of learning and achievement is important and that tests, when used properly, are among the most sound and objective ways to measure student performance. However, when test results are used inappropriately, they can have highly damaging unintended consequences. High-stakes decisions such as high school graduation or college admissions should not be made on the basis of a single set of test scores that provides only a snapshot of student achievement. Such scores may not accurately reflect a student's progress and achievement, and they do not provide much insight into other critical components of future success, such as motivation and character.
The APA statement recommends that no decision about a student's continued education, retention in grade, tracking, or graduation be based on the results of a single test. The statement also noted the following:
• When test results substantially contribute to decisions made about student promotion or graduation, there should be evidence that the test addresses only the specific or generalized content and skills that students have had an opportunity to learn.
• When a school district, state, or some other authority mandates a test, the intended use of the test results should be clearly described. It is also the responsibility of those who mandate the test to monitor its impact—particularly on racial- and ethnic-minority students or students of lower socioeconomic status—and to identify and minimize potential negative consequences of such testing.
• In some cases, special accommodations for students with limited proficiency in English may be necessary to obtain valid test scores. If students with limited English skills are to be tested in English, their test scores should be interpreted in light of their limited English skills. For example, when students lack proficiency in the language in which the test is given (as may be true of students for whom English is a second language), the test could become a measure of their ability to communicate in English rather than a measure of other skills.
• Likewise, special accommodations may be needed to ensure that test scores are valid for students with disabilities. Not enough is currently known about how particular test modifications may affect the test scores of students with disabilities; more research is needed. As a first step, test developers should include students with disabilities in field testing of pilot tests and document the impact of particular modifications (if any) for test users.
• For evaluation purposes, test results should also be reported by sex, race-ethnicity, income level, disability status, and degree of English proficiency.
One adverse consequence of high-stakes testing is that some schools will almost certainly focus primarily on teaching-to-the-test skills acquisition. Students prepared in this way may do well on the test but find it difficult to generalize their learning beyond that context and may find themselves unprepared for critical and analytic thinking in their subsequent learning environments. Some testing companies, such as the Educational Testing Service (developer of the SAT), at one time claimed that coaching or teaching to the test would have little meaningful impact, and they still publicly attempt to minimize its potential effect.
The best rebuttal to such assertions is the career of Stanley H. Kaplan. A recent article in The New Yorker (Gladwell, 2001) documents not only Kaplan's long career as an entrepreneurial educator but also the fragility of so-called test security and how teaching strategies significantly improve test scores in exactly the way the industry claimed was impossible. When Kaplan began coaching students on the SAT in the 1950s and holding posttest pizza parties to debrief the students and learn about what was being asked, he was considered a kind of subverter of the system. Because the designers of the SAT viewed their work as developing a measure of enduring abilities (such as IQ), they assumed that coaching would do little to alter scores. Apparently little thought was given to the notion that people are affected by what they know and that what they know is affected by what they are taught (Gladwell, 2001). What students are taught is dictated by parents and teachers, who responded to the high-stakes test by strongly supporting teaching that would yield better scores.
Psychologists teaching assessment have a unique opportunity to shape their students' professional practice and approach to ethics by modeling how ethical issues are actively integrated into the practice of assessment (Yalof & Brabender, 2001). Ethical standards in the areas of education and training are relevant. "Psychologists who are responsible for education and training programs take reasonable steps to ensure that the programs are designed to provide appropriate knowledge and proper experiences to meet the requirements for licensure, certification and other goals for which claims are made by the program" (ECTF, 2001). A primary responsibility is to ensure competence in assessment practice by providing the requisite education and training.
A recent review of studies evaluating the competence of graduate students and practicing psychologists in administration and scoring of cognitive tests demonstrates that errors occur frequently and at all levels of training (Alfonso & Pratt, 1997). The review also notes that relying only on practice assessments as a teaching methodology does not ensure competent practice. The authors conclude that teaching programs that include behavioral objectives and that focus on evaluating specific competencies are generally more effective. This approach is also more concordant with the APA guidelines for training in professional psychology (APA, 2000).
The use of children and students' classmates as practice subjects in psychological testing courses raises ethical concern (Rupert, Kozlowski, Hoffman, Daniels, & Piette, 1999). In other teaching contexts, the potential for violations of privacy is significant in situations in which graduate students are required to take personality tests for practice. Yalof and Brabender (2001) address ethical dilemmas in personality assessment courses with respect to using the classroom for in vivo training. They argue that the student's introduction to ethical decision making in personality assessment occurs in assessment courses with practice components. In this type of course, students experience firsthand how ethical problems are identified, addressed, and resolved. They note that the instructor's demonstration of how the ethical principles are highlighted and explored can enable students to internalize a model for addressing such dilemmas in the future. Four particular concerns are described: (a) the students' role in procuring personal experience with personality testing, (b) identification of participants with whom to practice, (c) the development of informed consent procedures for assessment participants, and (d) classroom presentations. This discussion does not provide universally applicable concrete solutions to ethical problems; however, it offers a consideration of the relevant ethical principles that any adequate solution must incorporate.