Informant errors may pose a serious threat to the validity of assessment. Potential errors include response sets, reactivity, social desirability, the halo effect, implicit personality theories, and so forth. There are procedures that may help to reduce these errors, some of which are related to item construction and the treatment of informants, whereas others pertain to the effect of data aggregation. The selection of certain formats (e.g., Q-sorts or forced choice) decreases the effect of individual response sets and the social desirability of items. Another approach is simply to vary the rating format; Guilford (1954) advises the use of blanks instead of numbers. Another way to minimize response sets is to let informants rate one target at a time on all the items, as opposed to having informants rate all targets on each item before moving on to the next item (Kenny, 1994). Moreover, training informants may increase the reliability and validity of assessment (Thornton & Zorich, 1980). Finally, researchers should be aware of motivational and fatigue effects and use only a limited number of rating scales.
The calculation of aggregated ratings based on multiple informants may also reduce rating errors. When two self-report measures are correlated, for instance, content and method are confounded. But when a self-report measure is correlated with an informant rating, shared method effects are unlikely, which is an obvious benefit of aggregating across data sources. The error-minimizing effect of aggregation depends not only on the number of informants, but also on the level of consensus and the difference between anticipated and true correlations between the rated items. According to the Spearman-Brown formula, the reliability (and possibly the validity) of informant ratings can be increased by including additional informants whose rater biases are uncorrelated. Sometimes the effect of aggregation across informants is much stronger than that of aggregation across occasions or test items. Although there may be negligible consensus between single informants, there may be nearly perfect consensus between large samples of informants (Epstein, 1983). Cheek (1982) demonstrated that the correlation between self-ratings and informant ratings could be considerably increased by aggregating the ratings of three informants instead of relying on single informants. In another influential study, Moskowitz and Schwarz (1982) showed that the correlation between global informant ratings and behavior could be markedly increased if the behavior is observed for a sufficient length of time and the ratings are aggregated across multiple knowledgeable informants. The number of knowledgeable informants is limited, but whenever it is possible to use more than one informant, aggregation across ratings will decrease rating errors.
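The Spearman-Brown logic can be sketched numerically. In this minimal illustration, r stands for the average correlation between single informants (i.e., the reliability of one informant) and k for the number of informants whose ratings are averaged; the values are hypothetical and not taken from the studies cited above.

```python
# Spearman-Brown prophecy: projected reliability of a composite of k
# informants, assuming parallel ratings with uncorrelated rater biases.
# r = average inter-informant correlation (reliability of one informant).

def spearman_brown(r, k):
    """Reliability of the mean of k parallel informant ratings."""
    return k * r / (1 + (k - 1) * r)

# With a modest single-informant consensus of r = .30 (illustrative value):
for k in (1, 3, 10):
    print(k, round(spearman_brown(0.30, k), 2))
```

Even with only modest consensus among single informants, the projected reliability of the composite rises quickly as informants are added, which is the formal basis for the aggregation findings described above.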
The number of informants necessary to achieve valid composite ratings will depend on the ambiguity of the trait or behavioral construct, the base rates and the variability of the relevant behaviors, and the moderators of informant accuracy discussed above (Funder, 1999; Hayes & Dunning, 1997; Kenny, 1994; Moskowitz, 1986).
In contrast with self-assessments stemming from single self-reports, informant assessments may achieve higher reliability (and perhaps higher validity) because it is possible to obtain them from multiple informants (e.g., peers and family members). It could be argued that self-ratings of personality can be outnumbered and outperformed by the average other-rating (i.e., the averaged informant rating; Hofstee, 1994). The aggregation effect likely has two sources: the reduction of error variance, and the fact that multiple informants can provide more information than a single informant. Taking both into account, Kolar et al. (1996) concluded on the basis of their study that the superiority of aggregated informant ratings does not call the validity of single informant ratings into question, given that even single informants usually achieve slightly better predictive validity than single self-ratings. The most reliable source of information about a target's personality is thus neither to be found in his or her self-ratings, nor is it guaranteed by single informant ratings; rather, it is found in the consensus of judgments from the community of the target's knowledgeable informants.
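The error-variance component of the aggregation effect can be illustrated with a small simulation (all parameters hypothetical): each rating is modeled as the true trait plus independent rater error, and the average of five informants agrees more strongly with the self-rating than any single informant does.

```python
# Hypothetical simulation of the aggregation effect: averaging across
# informants cancels independent rater error and raises self-informant
# agreement. Parameters are illustrative, not taken from the cited studies.
import numpy as np

rng = np.random.default_rng(0)
n = 5000                          # number of targets
trait = rng.normal(size=n)        # true trait scores

def rating(error_sd):
    # one rater's judgment: true trait plus independent rater error
    return trait + rng.normal(scale=error_sd, size=n)

self_rating = rating(1.0)
informants = np.stack([rating(1.0) for _ in range(5)])

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

single = corr(self_rating, informants[0])          # one informant
agg = corr(self_rating, informants.mean(axis=0))   # five-informant composite
print(round(single, 2), round(agg, 2))
```

Under these assumptions the composite outperforms every single informant, mirroring the Cheek (1982) result; note that the simulation captures only the error-reduction source of the effect, not the additional information that different informants may contribute.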
Aggregation across multiple informants should nevertheless be conducted with caution. Informants may use very different standards to make their judgments, or they may know targets from very different contexts. Because informant judgments are not homogeneous in such cases, the effect of aggregation will be small or negligible, so that, at least under such circumstances, single informant ratings could be more valid than aggregated ratings. In other cases, however, aggregated informant ratings could be more valid simply because they are more reliable, and appropriate psychometric corrections must be made to take data aggregation into account (Kenny, 1994). Researchers should therefore distinguish between the average correlation or the intraclass correlation between informants (reflecting the reliability of one average informant) and the internal consistency across informants, such as coefficient alpha (reflecting the reliability of the average judgment). Some researchers prefer to report the average correlation because it does not depend on the number of informants (Kenny, 1993; Lucas & Baird, this volume, chap. 3).
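The distinction between the two reliability indices can be made concrete with simulated data (all values hypothetical): the mean pairwise correlation estimates the reliability of one informant, whereas standardized coefficient alpha, obtained by stepping that correlation up with the Spearman-Brown formula, estimates the reliability of the averaged judgment and therefore grows with the number of informants.

```python
# Hypothetical illustration: mean inter-informant correlation vs. the
# reliability of the averaged judgment (standardized coefficient alpha).
import numpy as np

rng = np.random.default_rng(1)
k, n = 4, 2000                    # informants, targets (illustrative)
trait = rng.normal(size=n)
ratings = np.stack([trait + rng.normal(scale=1.5, size=n) for _ in range(k)])

R = np.corrcoef(ratings)                     # k x k informant correlations
mean_r = R[np.triu_indices(k, 1)].mean()     # reliability of one informant
alpha = k * mean_r / (1 + (k - 1) * mean_r)  # reliability of the average

print(round(mean_r, 2), round(alpha, 2))
```

Reporting mean_r alone understates how trustworthy the composite is, while reporting alpha alone obscures how little single informants agree; the two indices answer different questions, which is why the choice between them matters.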