The Criterion Problem of Accurate Informant Assessment

In terms of Brunswik's lens model, accurate perception is characterized by the convergence of cue validity and cue utilization. When researchers examine the accuracy of informant assessment, they inevitably face the criterion problem. Kruglanski (1989b) discussed three distinct notions of accuracy criteria used throughout the literature: the correspondence between a judgment and one or more independent indicators of the psychological construct, interpersonal consensus, and pragmatic utility. The first two meanings of accuracy appear most commonly, although they still leave researchers with the difficult challenge of convincingly justifying their choice of criterion. Kenny (1994) proposed a general taxonomy of such correspondence-based criterion measures, which are used either implicitly or explicitly in informant assessment: (a) self-reports, (b) consensus, (c) expert ratings, (d) behavioral observations, and (e) operational criteria.

Self-reports can focus on personality traits, preferences, internal states, cognitions, and so on. Researchers frequently validate self-ratings and informant ratings against each other and use the convergence of self-ratings and informant ratings (i.e., self-other agreement) as one indicator of accuracy (e.g., Borkenau & Liebler, 1992; Funder, 1995), whereas others warn against using self-ratings as accuracy criteria, because self-reports may be invalid for several reasons (Kenny, 1994). First, self-reports may be biased because of social desirability and self-enhancement tendencies. Second, in some instances informants may have more privileged access to information than the self, or vice versa. Third, when the informant and the target are acquainted, the target may influence the informant's impression of his or her standing on the trait. Nevertheless, self-ratings may be a valid criterion of informant accuracy if informant assessment is used to determine the subjective self-concept of one's personality.

Consensus refers to the agreement between two or more informants and is frequently observed to reach considerable levels (e.g., Ambady & Rosenthal, 1992; Borkenau, Mauer, Riemann, Spinath, & Angleitner, 2004; Borkenau, Riemann, Angleitner, & Spinath, 2001; Kenrick & Stringfield, 1980; Malloy & Albright, 1990; Paunonen & Jackson, 1987). In a general sense, accuracy implies consensus, and some researchers view consensus as a prerequisite of accuracy, rather than accuracy as a prerequisite of consensus. According to Funder (1995), consensus may be a necessary condition of accuracy, if accuracy is conceptualized in realistic terms. If parents, siblings, and teachers agree on an adolescent's level of introversion, for instance, the mean impression of these informants may converge with how this adolescent really behaves with peers or strangers. Informants may certainly agree on a target, but they may also fail to reach consensus even though each informant is partially accurate (e.g., the adolescent may be judged as cool by his peers, whereas his parents see him as irritable and anxious). Although these views are inconsistent, both are accurate in the contexts in which they were observed. Therefore, according to Kenny (1991), consensus is neither a necessary nor a sufficient condition for accuracy.

Expert ratings, obtained from professionals (e.g., teachers, clinicians, superiors, subordinates, or colleagues), are used when a professional person, by definition, is judged to know the true state or disposition of the target under study. However, experts are not necessarily more useful than knowledgeable informants, because expert judgments also need validation. Thus the issue of the accuracy of experts is as unresolved as the issue of the accuracy of knowledgeable informants, which is why, in a strict sense, expert ratings provide a criterion of consensus rather than of accuracy in terms of Brunswik's model. In addition, a single expert might not exist to serve as the perfect criterion, and some experts hold more "expertise" than others, especially when studying a highly domain-specific trait or behavior.

Behavioral observation is often considered the royal road to estimating a target's true trait, because it relies on concretely coded or categorized behaviors rather than on vague judgments. The disadvantage of behavioral observation as a criterion of accuracy, however, lies in its high costs in terms of time and methodology and its poor retest consistency (Kenny, 1994). Although behavioral observations can be improved by establishing high interrater reliability and employing objectively defined rating scales, in the end they depend strongly on situational factors and may therefore be regarded as arbitrary. Nevertheless, some important studies have shown that personality judgments by knowledgeable informants can yield substantial behavioral prediction (e.g., Funder & Colvin, 1991; Moskowitz & Schwarz, 1982). The epistemic relationship between behavioral observation and accuracy differs from the relationship between consensus and accuracy: Whereas accuracy generally (albeit not always) implies consensus but consensus does not imply accuracy, the relation is reversed for behavioral observation. An informant judgment can certainly be accurate regarding a particular observed behavior, but as noted by Funder (1999, p. 106), a judgment that does not predict a particular behavior may still be accurate in predicting other behaviors.

Operational criteria can be useful if the criterion is known directly by definition (e.g., job performance or diagnostic criteria of psychological disorders). Such operational criteria can also be defined through experimental manipulation, as is often done in lie detection and deception research (DePaulo, Lindsay, Malone, Muhlenbruck, Charlton, & Cooper, 2003). According to Kenny (1994), operational criteria are less useful in determining the validity of personality ratings because suitable operational criteria for personality traits are difficult to think of. Nevertheless, some progress has been made, for instance, by the act frequency approach to personality, which maintains that personality crystallizes in the frequency of behavioral acts in the past (Buss & Craik, 1983). Extraversion ratings, for instance, could be validated by the number of sociable acts, whereas agreeableness could be reflected in the frequency of conflict at the workplace.

In general, self-other agreement and consensus are the most frequently used strategies of measuring the accuracy of informant assessment. Informant accuracy certainly requires self-other agreement and consensus. In a very strict sense, however, consensus and self-other agreement refer to the consistency of ratings and thus pertain to the issue of reliability, which can be increased by the use of multiple informants and which in turn may increase validity in terms of behavior prediction (McCrae, 1994; Moskowitz & Schwarz, 1982). Although there are similarities between consensus and self-other agreement, there are also empirical and theoretical differences (e.g., John & Robins, 1993; Kenny, 1994; Kenrick & Stringfield, 1980). Informant ratings can also be aggregated across multiple informants, which obviously is impossible with self-ratings, and informant ratings are sometimes found to be more predictive of actual behavior than self-ratings (e.g., Kolar, Funder, & Colvin, 1996). It could therefore be argued that, contrary to a naive appreciation of self-ratings as being more valid than other ratings, informants' ratings are generally as valid as, or even more valid than, self-ratings in terms of behavior prediction. Only a few studies have addressed this question and reported that informant ratings are sometimes more predictive of actual behavior than self-ratings (e.g., John & Robins, 1993; Levesque & Kenny, 1993). Although the evidence is not very strong, informant ratings appear to be slightly more valid when highly evaluative traits are assessed (e.g., physical attractiveness or charm, which are traits that can only be known via the impression made on others). In contrast, self-ratings may be more predictive regarding inner emotional states, which become known to others only if the self shares them or accidentally gives a clue about his or her emotion.
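To make the distinction between consensus, self-other agreement, and aggregation concrete, the following is a minimal sketch of how these indices might be computed. It rests on assumptions that are not part of the studies cited above: consensus is operationalized as the mean pairwise Pearson correlation among informants, self-other agreement as the correlation between self-ratings and the mean informant rating, and the gain from aggregation is projected with the Spearman-Brown formula. All function names and rating values are hypothetical and serve only as illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def consensus(informant_ratings):
    """Mean pairwise correlation among informants (one list per informant)."""
    pairs = list(combinations(informant_ratings, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def self_other_agreement(self_ratings, informant_ratings):
    """Correlation of self-ratings with the mean informant rating per target."""
    composite = [sum(col) / len(col) for col in zip(*informant_ratings)]
    return pearson(self_ratings, composite)


def spearman_brown(mean_r, k):
    """Projected reliability of a composite built from k informants."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Hypothetical extraversion ratings (1-5 scale) for five targets.
self_ratings = [4, 2, 5, 3, 1]
informants = [
    [4, 3, 5, 2, 1],  # informant A
    [5, 2, 4, 3, 2],  # informant B
    [3, 2, 5, 3, 1],  # informant C
]

print("consensus:", round(consensus(informants), 2))
print("self-other agreement:",
      round(self_other_agreement(self_ratings, informants), 2))
# With a mean inter-informant correlation of .40, three informants
# yield a projected composite reliability of about .67.
print("aggregated reliability:", round(spearman_brown(0.40, 3), 2))
```

The sketch illustrates why both indices are, strictly speaking, consistency measures: neither function refers to an external criterion such as observed behavior, so high values indicate reliability rather than accuracy in Brunswik's sense.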

