The process of collecting assessment information begins with a formulation of the purposes that the assessment is intended to serve. A clear sense of why an assessment is being conducted helps examiners select tests and other sources of information that will provide an adequate basis for arriving at useful conclusions and recommendations. Additionally helpful in planning the data collection process is attention to several examiner, respondent, and data management issues that influence the nature and utility of whatever findings are obtained.
Psychological assessments are instigated by referrals that pose questions about aspects of a person's psychological functioning or likely future behavior. When clearly stated and psychologically relevant, referral questions guide psychologists in determining what kinds of assessment data to collect, what considerations to address in examining these data, and what implications of their findings to emphasize in their reports. If referral questions lack clarity or psychological relevance, some reformulation is necessary to give direction to the assessment process. For example, a referral in a clinical setting that asks vaguely for personality evaluation or differential diagnosis needs to be specified in consultation with the referring person to identify why a personality evaluation is being sought or what diagnostic possibilities are at issue. Assessment in the absence of a specific referral question can result in a sterile exercise in which neither the data collection process nor the psychologist's inferences can be focused in a meaningful way.
Even when adequately specified, referral questions are not always psychological in nature. Assessors doing forensic work are frequently asked to evaluate whether criminal defendants were insane at the time of their alleged offense. Sanity is a legal term, however, not a psychological term. There are no assessment methods designed to identify insanity, nor are there any research studies in which being insane has been used as an independent variable. In instances of this kind, in order to help assessors plan their procedures and frame their reports, the referral must be translated into psychological terms, as in defining insanity as the inability to distinguish reality from fantasy.
As a further challenge in formulating assessment goals, specific and psychologically phrased referral questions may still lack clarity as a consequence of addressing complex and multidetermined patterns of behavior. In employment evaluations, for example, a referring person may want to know which of three individuals is likely to perform best in a position of leadership or executive responsibility. To address this type of question effectively, assessors must first be able to identify psychological characteristics that are likely to make a difference in the particular circumstances, as by proceeding, in this example, in the belief that being energetic, decisive, assertive, self-confident, and reasonably unflappable contributes to showing effective and responsible leadership. Then the data collection process can be planned to measure these characteristics, and the eventual report can be focused on using them as a basis for recommending a hiring decision.
The multiple sources of assessment information previously noted include the results of formal psychological testing with standardized instruments; responses to questions asked in structured and unstructured interviews; observations of behavior in various types of contrived situations and natural settings; reports from relatives, friends, employers, and other collateral persons concerning an individual's previous life history and current characteristics and behavioral tendencies; and documents such as medical records, school records, and written reports of earlier assessments. Individual assessments vary considerably in the availability and utility of these diverse sources of information. Assessments may sometimes be based entirely on record reviews and collateral reports, because the person being assessed is unwilling to be seen directly by an examiner or is for some reason prevented from doing so. Some persons being assessed are quite forthcoming when interviewed but are reluctant to be tested; others find it difficult to talk about themselves but are quite responsive to testing procedures; and in still other cases, in which both interview and test data are ample, there may be a dearth of other information sources on which to draw.
There is little way to know before the fact which sources of information will prove most critical or valuable in an assessment process. What collateral informants say about a person in a particular instance may be more revealing and reliable than what the person says about him- or herself, and in some instances historical documents may prove more informative and dependable than either first-person or collateral reports. Behavioral observations and interview data may sometimes contribute more to an adequate assessment than standardized tests, or may even render testing superfluous; whereas in other instances formal psychological testing may reveal vital diagnostic information that would otherwise not have been uncovered.
The fact that psychological assessment can proceed effectively without psychological testing helps to distinguish between these two activities. The terms psychological assessment and psychological testing are sometimes used synonymously, as noted earlier, but psychological testing is only one among many sources of information that may be utilized in conducting a psychological assessment. Whereas testing refers to the administration of standardized measuring instruments, assessment involves multiple data collection procedures leading to the integration of information from diverse sources. Thus the data collection procedures employed in testing contribute only a portion of the information that is typically utilized in the complex decision-making process that constitutes assessment. This distinction between assessment and testing has previously been elaborated by Fernandez-Ballesteros (1997), Maloney and Ward (1976, chapter 3), and Matarazzo (1990), among others.
Nonetheless, psychological testing stands out among the data collection procedures employed in psychological assessment as the one most highly specialized, diverse, and in need of careful regulation. Psychological testing brings numerous issues to the assessment process, beginning with selection of an appropriate test battery from among an extensive array of available measuring instruments (see Conoley & Impara, 1995, and Fischer & Corcoran, 1994; see also chapters 18-24 of the present volume). The chief considerations that should determine the composition of a test battery are the psychometric adequacy of the measures being considered; the relevance of these measures to the referral questions being addressed; the likelihood that these measures will contribute incremental validity to the decision-making process; and the additive, confirmatory, and complementary functions that individual measures are likely to serve when used jointly.
As elaborated by Anastasi and Urbina (1997), in the Standards for Educational and Psychological Testing (AERA et al., 1999, chapters 1, 2, & 5), and in the chapter by Wasserman and Bracken in this volume, the psychometric adequacy of an assessment instrument consists of the extent to which it involves standardized test materials and administration procedures, can be coded with reasonably good interscorer agreement, demonstrates acceptable reliability, has generated relevant normative data, and shows valid corollaries that serve the purposes for which it is intended. Assessment psychologists may at times choose to use tests with uncertain psychometric properties, perhaps for exploratory purposes or for comparison with a previous examination using these tests. Generally speaking, however, formal testing as part of a psychological assessment should be limited to standardized, reliable, and valid instruments for which there are adequate normative data.
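As an illustration of what "reasonably good interscorer agreement" can mean in quantitative terms, the following minimal sketch computes Cohen's kappa, one commonly used chance-corrected index of agreement between two scorers. The choice of kappa is an assumption made here for illustration; the sources cited above are not being quoted as prescribing it. The function name `cohens_kappa` and the response codes in `SCORER_A` and `SCORER_B` are hypothetical and are not drawn from any actual test protocol.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(scorer_a: Sequence[Hashable], scorer_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two scorers coding the same responses."""
    if len(scorer_a) != len(scorer_b) or len(scorer_a) == 0:
        raise ValueError("Both scorers must code the same non-empty set of responses.")
    n = len(scorer_a)
    # Proportion of responses to which both scorers assigned the same code.
    observed = sum(a == b for a, b in zip(scorer_a, scorer_b)) / n
    # Agreement expected by chance, from each scorer's marginal code frequencies.
    counts_a = Counter(scorer_a)
    counts_b = Counter(scorer_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if expected == 1.0:
        # Both scorers used a single identical code throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes assigned by two scorers to the same ten responses.
SCORER_A = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
SCORER_B = ["A", "A", "A", "A", "B", "B", "B", "B", "B", "A"]

if __name__ == "__main__":
    print(round(cohens_kappa(SCORER_A, SCORER_B), 2))
```

In this invented example the two scorers agree on 8 of 10 responses, yet half of that agreement would be expected by chance, so kappa comes to about 0.60. What level counts as adequate depends on the instrument and the purpose, and is not something this sketch can settle.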
The tests selected for inclusion in an assessment battery should provide information relevant to answering the questions that have been raised about the person being examined. Questions that relate to personality functions (e.g., What kind of approach in psychotherapy is likely to be helpful to this person?) call for personality tests. Questions that relate to educational issues (e.g., Does this student have a learning disability?) call for measures of intellectual abilities and academic aptitude and achievement. Questions that relate to neuropsychological functions (e.g., Are there indications of memory loss?) call for measures of cognitive functioning, with special emphasis on measures of capacities for learning and recall.
These examples of relevance may seem too obvious to mention. However, they reflect an important and sometimes overlooked guiding principle that test selection should be justifiable for each measure included in an assessment battery. Insufficient attention to justifying the use of particular measures in specific instances can result in two ill-advised assessment practices: (a) conducting examinations with a fixed and unvarying battery of measures regardless of what questions are being asked in the individual case, and (b) using favorite instruments at every opportunity even when they are unlikely to serve any central or unique purpose in a particular assessment. The administration of minimally useful tests that have little relevance to the referral question is a wasteful procedure that can result in warranted criticism of assessment psychologists and the assessment process. Likewise, the propriety of charging fees for unnecessary procedures can rightfully be challenged by persons receiving or paying for services, and the competence of assessors who give tests that make little contribution to answering the questions at issue can be challenged in such public forums as the courtroom (see Weiner, 2002).
Incremental validity in psychological assessment refers to the extent to which new information increases the accuracy of a classification or prediction above and beyond the accuracy achieved by information already available. Assessors pay adequate attention to incremental validity by collecting the amount and kinds of information they need to answer a referral question, but no more than that. In theory, then, familiarity with the incremental validity of various measures when used for certain purposes, combined with test selection based on this information, minimizes redundancy in psychological assessment and satisfies both professional and scientific requirements for justifiable test selection.
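The definition above can be made concrete with a small worked sketch, offered here only as an illustration under stated assumptions. It treats incremental validity as the gain in classification accuracy when a decision rule that uses information already available (a hypothetical interview flag) is augmented with new information (a hypothetical test score). The `Case` records, the cutoff of 65, and both decision rules are invented for the example and do not represent any actual instrument or published finding.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Case:
    interview_flag: bool  # information already available
    test_score: int       # new information from a hypothetical test
    outcome: bool         # criterion the assessor is trying to classify


Rule = Callable[[Case], bool]


def accuracy(rule: Rule, cases: Sequence[Case]) -> float:
    """Proportion of cases the decision rule classifies correctly."""
    return sum(rule(c) == c.outcome for c in cases) / len(cases)


def incremental_validity(baseline: Rule, augmented: Rule, cases: Sequence[Case]) -> float:
    """Accuracy gained by the augmented rule beyond the baseline rule."""
    return accuracy(augmented, cases) - accuracy(baseline, cases)


def interview_only(c: Case) -> bool:
    return c.interview_flag


def interview_plus_test(c: Case) -> bool:
    # Hypothetical cutoff of 65, chosen only for this illustration.
    return c.interview_flag or c.test_score >= 65


# Eight invented cases.
CASES = [
    Case(True, 70, True),
    Case(True, 40, True),
    Case(False, 72, True),
    Case(False, 68, True),
    Case(False, 30, False),
    Case(False, 45, False),
    Case(True, 50, False),
    Case(False, 66, False),
]

if __name__ == "__main__":
    print(accuracy(interview_only, CASES))
    print(incremental_validity(interview_only, interview_plus_test, CASES))
```

With these invented cases the interview alone classifies 5 of 8 correctly and the combined rule 6 of 8, a gain of 0.125. Note that the added test corrects two misses but also introduces one new error, which is why the gain has to be evaluated on the overall accuracy rather than assumed.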
In practice, however, strict adherence to incremental validity guidelines often proves difficult and even disadvantageous to implement. As already noted, it is difficult to anticipate which sources of information will prove to be most useful. Similarly, with respect to which instruments to include in a test battery, there is little way to know whether the tests administered have yielded enough data, and which tests have contributed most to understanding the person being examined, until after the data have been collected and analyzed. In most practice settings, it is reasonable to conduct an interview and review previous records as a basis for deciding whether formal testing would be likely to help answer a referral question, that is, whether it will show enough incremental validity to warrant its cost in time and money. Likewise, reviewing a set of test data can provide a basis for determining what kind of additional testing might be worthwhile. However, it is rarely appropriate to administer only one test at a time, to choose each subsequent test on the basis of the preceding one, and to schedule a further testing session for each additional test administration. For this reason, responsible psychological assessment usually consists of one or two testing sessions comprising a battery of tests selected to serve specific additive, confirmatory, and complementary functions.
Additive, Confirmatory, and Complementary Functions of Tests
Some referral questions require selection of multiple tests to identify relatively distinct and independent aspects of a person's psychological functioning. For example, students receiving low grades may be referred for an evaluation to help determine whether their poor academic performance is due primarily to limited intelligence or to personality characteristics that are fostering negative attitudes toward achieving in school. A proper test battery in such a case would include some measure of intelligence and some measure of personality functioning. These two measures would then be used in an additive fashion to provide separate pieces of information, both of which would contribute to answering the referral question. As this example illustrates, the additive use of tests serves generally to broaden understanding of the person being examined.
Other assessment situations may create a need for confirmatory evidence in support of conclusions based on test findings, in which case two or more measures of the same psychological function may have a place in the test battery. Assessors conducting a neuropsychological examination to address possible onset of Alzheimer's disease, for example, ordinarily administer several memory tests. Should each of these tests identify memory impairment consistent with Alzheimer's, then from a technical standpoint, only one of them would have been necessary and the others have shown no incremental validity. Practically speaking, however, the multiple memory measures taken together provide confirmatory evidence of memory loss. Such confirmatory use of tests strengthens understanding and helps assessors present conclusions with confidence.
The confirmatory function of a multitest battery is especially useful when tests of the same psychological function measure it in different ways. The advantages of multimethod assessment of variables have long been recognized in psychology, beginning with the work of Campbell and Fiske (1959) and continuing with contemporary reports by the American Psychological Association's (APA's) Psychological Assessment Work Group, which stress the improved validity that results when phenomena are measured from a variety of perspectives (Kubiszyn et al., 2000; Meyer et al., 2001):
The optimal methodology to enhance the construct validity of nomothetic research consists of combining data from multiple methods and multiple operational definitions. . . . Just as effective nomothetic research recognizes how validity is maximized when variables are measured by multiple methods, particularly when the methods produce meaningful discrepancies . . . the quality of idiographic assessment can be enhanced by clinicians who integrate the data from multiple methods of assessment. (Meyer et al., p. 150)
Such confirmatory testing is exemplified in applications of the Minnesota Multiphasic Personality Inventory (MMPI, MMPI-2) and the Rorschach Inkblot Method (RIM), which are the two most widely researched and frequently used personality assessment instruments (Ackerman & Ackerman, 1997; Butcher & Rouse, 1996; Camara, Nathan, & Puente, 2000; Watkins, Campbell, Nieberding, & Hallmark, 1995). As discussed later in this chapter and in the chapters by Viglione and Rivera and by Ben-Porath in this volume, the MMPI-2 is a relatively structured self-report inventory, whereas the RIM is a relatively unstructured measure of perceptual-cognitive and associational processes (see also Exner, 2003; Graham, 2000; Greene, 2000; Weiner, 1998). Because of differences in their format, the MMPI-2 and the RIM measure normal and abnormal characteristics in different ways and at different levels of a person's ability and willingness to recognize and report them directly. Should a person display some type of disordered functioning on both the MMPI-2 and the RIM, this confirmatory finding becomes more powerful and convincing than having such information from one of these instruments but not the other, even though technically in this instance no incremental validity derives from the second instrument.
Confirmatory evidence of this kind often proves helpful in professional practice, especially in forensic work. As described by Blau (1998), Heilbrun (2001), Shapiro (1991), and others, multiple sources of information pointing in the same direction bolster courtroom testimony, whereas conclusions based on only one measure of some characteristic can result in assessors' being criticized for failing to conduct a thorough examination.
Should multiple measures of the same psychological characteristics yield different rather than confirmatory results, these results can usually serve valuable complementary functions in the interpretive process. At times, apparent lack of agreement between two purported measures of the same characteristic has been taken to indicate that one of the measures lacks convergent validity. This negative view of divergent test findings fails to take adequate cognizance of the complexity of the information provided by multimethod assessment and can result in misleading conclusions. To continue with the example of conjoint MMPI-2 and RIM testing, suppose that a person's responses show elevation on indices of depression on one of these measures but not the other. Inasmuch as indices on both measures have demonstrated some validity in detecting features of depression, the key question to ask is not which measure is wrong in this instance, but rather why the measures have diverged.
Perhaps, as one possible explanation, the respondent has some underlying depressive concerns that he or she does not recognize or prefers not to admit to others, in which case depressive features might be less likely to emerge in response to the self-report MMPI-2 methodology than on the more indirect Rorschach task. Or perhaps the respondent is not particularly depressed but wants very much to give the impression of being in distress and needing help, in which case the MMPI-2 might be more likely to show depression than the RIM. Or perhaps the person generally feels more relaxed and inclined to be forthcoming in relatively structured than relatively unstructured situations, and then the MMPI-2 is more likely than the RIM to reveal whether the person is depressed.
As these examples show, multiple measures of the same psychological characteristic can complement each other when they diverge, with one measure sometimes picking up the presence of a characteristic (a true positive) that is missed by the other (a false negative). Possible reasons for the false negative can contribute valuable information about the respondent's test-taking attitudes and likelihood of behaving differently in situations that differ in the amount of structure they provide. The translation of such divergence between MMPI-2 and RIM findings into clinically useful diagnostic inferences and individual treatment planning is elaborated by Finn (1996) and Ganellen (1996). Whatever measures may be involved in weighing the implications of divergent findings, this complementary use of test findings frequently serves to deepen understanding gleaned from the assessment process.
The amount and kind of data collected in psychological assessments depend in part on two issues concerning the examiners who conduct these assessments. The first issue involves the qualifications and competence of examiners to utilize the procedures they employ, and the second has to do with ways in which examiners' personal qualities can influence how different kinds of people respond to them.
There is general consensus that persons who conduct psychological assessments should be qualified by education and training to do so. The Ethical Principles and Code of Conduct promulgated by the APA (1992) offers the following general guideline in this regard: "Psychologists provide services, teach, and conduct research only within the boundaries of their competence, based on their education, training, supervised experience, or appropriate professional experience" (Ethical Code 1.04[a]). Particular kinds of knowledge and skill that are necessary for test users to conduct adequate assessments are specified further in the Test User Qualifications endorsed by the APA (2001). Finally of note with respect to using tests in psychological assessments, the Standards for Educational and Psychological Testing (AERA et al., 1999) identify who is responsible for the proper use of tests: "The ultimate responsibility for appropriate test use and interpretation lies predominantly with the test user. In assuming this responsibility, the user must become knowledgeable about a test's appropriate uses and the populations for which it is suitable" (p. 112).
Despite the clarity of these statements and the considerable detail provided in the Test User Qualifications, two persistent issues in contemporary assessment practice remain unresolved. First, adequate psychological testing qualifications are typically inferred for any examiners holding a graduate degree in psychology, being licensed in their state, and presenting themselves as competent to practice psychological assessment. Until such time as the criteria proposed in the Test User Qualifications become incorporated into formal accreditation procedures, qualification as an assessor will continue to be conferred automatically on psychologists obtaining licensure. Unfortunately, being qualified by license to use psychological tests does not ensure being competent in using them. Being competent in psychological testing requires familiarity with the latest revision of whatever instruments an assessor is using, with current research and the most recent normative data concerning these instruments, and with the manifold interpretive complexities they are likely to involve. Assessment competence also requires appreciation for a variety of psychometric, interpersonal, sociocultural, and contextual issues that affect not only the collection but also the interpretation and utilization of assessment information (see Sandoval, Frisby, Geisinger, & Scheuneman, 1990). The chapters that follow in this volume bear witness to the broad range of these issues and to the steady output of new or revised measures, research findings, and practice guidelines that make assessment psychology a dynamic and rapidly evolving field with a large and burgeoning literature. Only by keeping reasonably current with these developments can psychological assessors become and remain competent, and only by remaining competent can they fulfill their ethical responsibilities (Kitchener, 2000, chapter 9; Koocher & Keith-Spiegel, 1998; Weiner, 1989).
The second persistent issue concerns assessment by persons who are not psychologists and are therefore not bound by this profession's ethical principles or guidelines for practice. Nonpsychologist assessors who can obtain psychological tests are free to use them however they wish. When easily administered measures yield test scores that seem transparently interpretable, as in the case of an elevated Borderline scale on the Millon Clinical Multiaxial Inventory-III (MCMI-III; Choca, Shanley, & Van Denburg, 1997) or an elevated Acquiescence scale on the Holland Vocational Preference Inventory (VPI; Holland, 1985), unqualified examiners can draw superficial conclusions that take inadequate account of the complexity of these instruments, the interactions among their scales, and the limits of their applicability. It accordingly behooves assessment psychologists not only to maintain their own competence, but also to call attention in appropriate circumstances to assessment practices that fall short of reasonable standards of competence.
Assessors can influence the information they collect by virtue of their personal qualities and by the manner in which they conduct a psychological examination. In the case of self-administered measures such as interest surveys or personality questionnaires, examiner influence may be minimal. Interviews and interactive testing procedures, on the other hand, create ample opportunity for an examiner's age, gender, ethnicity, or other characteristics to make respondents feel more or less comfortable and more or less inclined to be forthcoming. Examiners accordingly need to be alert to instances in which such personal qualities may be influencing the nature and amount of the data they are collecting.
The most important personal influence that examiners cannot modify or conceal is their language facility. Psychological assessment procedures are extensively language-based, either in their content or in the instructions that introduce nonverbal tasks, and accurate communication is therefore essential for obtaining reliable assessment information. It is widely agreed that both examiners and whomever they are interviewing or testing should be communicating either in their native language or in a second language in which they are highly proficient (AERA et al., 1999, chapter 9). The use of interpreters to circumvent language barriers in the assessment process rarely provides a satisfactory solution to this problem. Unless an interpreter is fully conversant with idiomatic expressions and cultural referents in both languages, is familiar with standard procedures in psychological assessment, and is a stranger to the examinee (as opposed to a friend, relative, or member of the same closely knit subcultural community), the obtained results may be of questionable validity. Similarly, in the case of self-administered measures, instructions and test items must be written in a language that the respondent can be expected to understand fully. Translations of pencil-and-paper measures accordingly require close attention to the idiomatic vagaries of each new language and to culture-specific contents of individual test items, in order to ensure equivalence of measures in the cross-cultural applications of tests (Allen & Walsh, 2000; Dana, 2000a).
Unlike their fixed qualities, the manner in which examiners conduct the assessment process is within their control, and untoward examiner influence can be minimized by appropriate efforts to promote full and open response to the assessment procedures. To achieve this end, an assessment typically begins with a review of its purposes, a description of the procedures that will be followed, and efforts to establish a rapport that will help the person being evaluated feel comfortable and willing to cooperate with the assessment process. Variations in examiner behavior while introducing and conducting psychological evaluations can substantially influence how respondents perceive the assessment situation, for example, whether they see it as an authoritarian investigative process intended to ferret out defects and weaknesses, or as a mutually respectful and supportive interaction intended to provide understanding and help. Even while following closely the guidelines for a structured interview and adhering faithfully to standardized procedures for administering various tests, the examiner needs to recognize that his or her manner, tone of voice, and apparent attitude are likely to affect the perceptions and comfort level of the person being assessed and, consequently, the amount and kind of information that person provides (see Anastasi & Urbina, 1997; Masling, 1966, 1998).
Examiner influence in the assessment process inevitably interacts with the attitudes and inclinations of the person being examined. Some respondents may feel more comfortable being examined by an older person than a younger one, for example, or by a male than a female examiner, whereas other respondents may prefer a younger and female examiner. Among members of a minority group, some may prefer to be examined by a person with a cultural or ethnic background similar to theirs, whereas others are less concerned with the examiner's background than with his or her competence. Similarly, with respect to examiner style, a passive, timid, and dependent person might feel comforted by a warm, friendly, and supportive examiner approach that would make an aloof, distant, and mistrustful person feel uneasy; conversely, an interpersonally cautious and detached respondent might feel safe and secure when being examined in an impersonal and businesslike manner that would be unsettling and anxiety provoking to an interpersonally needy and dependent respondent. With such possibilities in mind, skilled examiners usually vary their behavioral style with an eye to conducting assessments in ways that will be likely to maximize each individual respondent's level of comfort and cooperation.
Two other respondent issues that influence the data collection process concern a person's right to give informed consent to being evaluated and his or her specific attitudes toward being examined. With respect to informed consent, the introductory phase of conducting an assessment must ordinarily include not only the explanation of purposes and procedures mentioned previously, which informs the respondent, but also an explicit agreement by the respondent or persons legally responsible for the respondent to undergo the evaluation. As elaborated in the Standards for Educational and Psychological Testing (AERA et al., 1999), informed consent can be waived only when an assessment has been mandated by law (as in a court-ordered evaluation) or when it is implicit, as when a person applies for a position or opportunity for which being assessed is a requirement (e.g., a job for which all applicants are being screened psychologically; see also Kitchener, 2000, and the chapters by Geisinger and by Koocher and Rey-Casserly in this volume). Having given their consent to be evaluated, moreover, respondents are entitled to revoke it at any time during the assessment process. Hence, the prospects for obtaining adequate assessment data depend not only on whether respondents can be helped to feel comfortable and be forthcoming, but even more basically on whether they consent in the first place to being evaluated and remain willing during the course of the evaluation.
Issues involving a respondent's specific attitudes toward being examined typically arise in relation to whether the assessment is being conducted for clinical or for administrative purposes. When assessments are being conducted for clinical purposes, the examiner is responsible to the person being examined, the person being examined is seeking some type of assistance, and the examination is intended to be helpful to this person and responsive to his or her needs. As common examples in clinical assessments, people concerned about their psychological well-being may seek an evaluation to learn whether they need professional mental health care, and people uncertain about their educational or vocational plans may look for help in determining what their abilities and interests suit them to do. In administrative assessments, by contrast, examiners are responsible not to the person being examined, but to some third party who has requested the evaluation to assist in arriving at some judgment about the person. Examiners in an administrative assessment are ethically responsible for treating the respondent fairly and with respect, but the evaluation is being conducted for the benefit of the party requesting it, and the results may or may not meet the respondent's needs or serve his or her best interests. Assessment for administrative purposes occurs commonly in forensic, educational, and organizational settings when evaluations are requested to help decide such matters as whether a prison inmate should be paroled, a student should be admitted to a special program, or a job applicant should be hired (see Monahan, 1980).
As for their attitudes, respondents being evaluated for clinical purposes are relatively likely to be motivated to reveal themselves honestly, whereas those being examined for administrative purposes are relatively likely to be intent on making a certain kind of impression. Respondents attempting to manage the impression they give are likely to show themselves not as they are, but as they think the person requesting the evaluation would view favorably. Typically such efforts at impression management take the form of denying one's limitations, minimizing one's shortcomings, attempting to put one's very best foot forward, and concealing whatever might be seen in a negative light. Exceptions to this general trend are not uncommon, however. Whereas most persons being evaluated for administrative purposes want to make the best possible impression, some may be motivated in just the opposite direction. For example, a plaintiff claiming brain damage in a personal injury lawsuit may see benefit in making the worst possible impression on a neuropsychological examination. Some persons being seen for clinical evaluations, despite having come of their own accord and recognizing that the assessment is being conducted for their benefit, may nevertheless be too anxious or embarrassed to reveal their difficulties fully. Whatever kind of impression respondents may want to make, the attitudes toward being examined that they bring with them to the assessment situation can be expected to influence the amount and kind of data they produce. These attitudes also have a bearing on the interpretation of assessment data, and the further implications of impression management for malingering and defensiveness are discussed later in the chapter.
A final set of considerations in collecting assessment information concerns appropriate ways of managing the data that are obtained. Examiners must be aware in particular of issues concerning the use of computers in data collection; the responsibility they have for safeguarding the security of their measures; and their obligation, within limits, to maintain the confidentiality of what respondents report or reveal to them.
Software programs are available to facilitate the data collection process for most widely used assessment methods. Programs designed for use with self-report questionnaires typically provide for online administration of test items, automated coding of item responses to produce scale scores, and quantitative manipulation of these scale scores to yield summary scores and indices. For instruments that require examiner administration and coding (e.g., a Wechsler intelligence test), software programs accept test scores entered by the examiner and translate them into the test's quantitative indices (e.g., the Wechsler IQ and Index scores). Many of these programs store the test results in files that can later be accessed or exported, and some even provide computational packages that can generate descriptive statistics for sets of test records held in storage.
These features of computerized data management bring several benefits to the process of collecting assessment information. Online administration and coding of responses help respondents avoid mechanical errors in filling out test forms manually, and they eliminate errors that examiners sometimes make in scoring these responses (see Allard & Faust, 2000). For measures that require examiner coding and data entry, the utility of the results depends on accurate coding and entry, but once the data are entered, software programs eliminate examiner error in calculating summary scores and indices from them. The data storage features of many software programs facilitate assessment research, particularly for investigators seeking to combine databases from different sources, and they can also help examiners meet requirements in most states and many agencies for keeping assessment information on file for some period of time. For such reasons, the vast majority of assessment psychologists report that they use software for test scoring and feel comfortable doing so (McMinn, Ellens, & Soref, 1999).
Computerized collection of assessment information has some potential disadvantages as well, however. When assessment measures are administered online, first of all, the reliability of the data collected can be compromised by a lack of equivalence between an automated testing procedure and the noncomputerized version on which it is based. As elaborated by Butcher, Perry, and Atlis (2000), Honaker and Fowler (1990), and Snyder (2000) and discussed in the chapter by Butcher in the present volume, the extent of such equivalence is currently an unresolved issue. Available data suggest fairly good reliability for computerized administrations based on pencil-and-paper questionnaires, especially those used in personality assessment. With respect to the MMPI, for example, a meta-analysis by Finger and Ones (1999) of all available research comparing computerized with booklet forms of the instrument has shown them to be psychometrically equivalent. On the other hand, good congruence with the original measures has yet to be demonstrated for computerized versions of structured clinical interviews and for many measures of visual-spatial functioning used in neuropsychological assessment. Among software programs available for test administration, moreover, very few have been systematically evaluated with respect to whether they obtain exactly the same information as would emerge in a standard administration of the measure on which they are based.
A second potential disadvantage of computerized data collection derives from the ease with which it can be employed. Although frequently helpful to knowledgeable assessment professionals and thus to the persons they examine, automated procedures also simplify psychological testing for untrained and unqualified persons who lack assessment skills and would not be able to collect test data without the aid of a computer. The availability of software programs thus creates some potential for assessment methods to be misused and respondents to be poorly served. Such outcomes are not an inescapable by-product of computerized assessment procedures, however. They constitute instead an abuse of technology by uninformed and irresponsible persons.
Test security refers to restricting the public availability of test materials and answers to test items. Such restrictions address two important considerations in psychological assessment. First, publicly circulated information about tests can undermine their validity, particularly in the case of measures comprising items with right and wrong or more or less preferable answers. Prior exposure to tests of this kind and information about correct or preferred answers can affect how persons respond to them and prevent an examiner from being able to collect a valid protocol. The validity of test findings is especially questionable when a respondent's prior exposure has included specific coaching in how to answer certain questions. As for relatively unstructured assessment procedures that have no right or wrong answers, even on these measures various kinds of responses carry particular kinds of interpretive significance. Hence, the possibility exists on relatively unstructured measures as well that persons intent on making a certain kind of impression can be helped to do so by pretest instruction concerning what various types of responses are taken to signify. However, the extent to which public dissemination of information about the inferred meaning of responses does in fact compromise the validity of relatively unstructured measures has not yet been examined empirically and is a subject for further research.
Second, along with helping to preserve the validity of obtained results, keeping assessment measures secure protects test publishers against infringement of their rights by pirated or plagiarized copies of their products. Ethical assessors respect copyright law by not making or distributing copies of published tests, and they take appropriate steps to prevent test forms, test manuals, and assessment software from falling into the hands of persons who are not qualified to use them properly or who feel under no obligation to keep them secure. Both the Ethical Principles and Code of Conduct (APA, 1992, Section 2.10) and the Standards for Educational and Psychological Testing (AERA et al., 1999, p. 117) address this professional responsibility in clear terms.
These considerations in safeguarding test security also have implications for the context in which psychological assessment data are collected. Assessment data have become increasingly likely in recent years to be applied in forensic settings, and litigious concerns sometimes result in requests to have a psychological examination videotaped or observed by a third party. These intrusions on traditional examination procedures pose a threat to the validity of the obtained data in two respects. First, there is no way to judge or measure the impact of the videotaping or the observer on what the respondent chooses to say and do. Second, the normative standards that guide test interpretation are derived from data obtained in two-person examinations, and there are no comparison data available for examinations conducted in the presence of a camera or an observer. Validity aside, exposure of test items to an observer or through a videotape poses the same threat to test security as distributing test forms or manuals to persons who are under no obligation to keep them confidential. Psychological assessors may at times decide for their own protection to audiotape or videotape assessments when they anticipate legal challenges to the adequacy of their procedures or the accuracy of their reports. They may also use recordings on occasion as an alternative to writing a long and complex test protocol verbatim. For purposes of test security, however, recordings made for other people to hear or see, like third-party observers, should be avoided.
A third and related aspect of appropriate data management pertains to maintaining the confidentiality of a respondent's assessment information. Like certain aspects of safeguarding test security, confidentiality is an ethical matter in assessment psychology, not a substantive one. The key considerations in maintaining the confidentiality of assessment information, as specified in the Ethical Principles and Code of Conduct (APA, 1992, Section 5) and elaborated by Kitchener (2000, chapter 6), involve (a) clarifying the nature and limits of confidentiality with clients and patients prior to undertaking an evaluation; (b) communicating information about persons being evaluated only for appropriate scientific or professional purposes and only to an extent relevant to the purposes for which the evaluation was conducted; (c) disclosing information only to persons designated by respondents or other duly authorized persons or entities, except when otherwise permitted or required by law; and (d) storing and preserving respondents' records in a secure fashion. Like the matter of informed consent discussed previously, confidentiality is elaborated as an ethical issue in the chapter by Koocher and Rey-Casserly in this volume.