Statistical issues primarily concern how accuracy is calculated in terms of self-other agreement or consensus. The potential artifacts identified by Cronbach can be estimated easily when multiple informant ratings are replicated across multiple targets. The social relations model (SRM) provides unique methods for separating the very different components of informant ratings (i.e., genuine elevation, differential elevation, stereotype accuracy, and differential accuracy; see Figure 4.1). The SRM is applicable only with a round-robin or mixed block design. For designs in which each group member is an informant and a target at the same time (e.g., in families or peer groups), the SRM is certainly the method of choice. In most cases, however, researchers use a set of informants who rate a set of targets on one or more items or traits. In these cases, the components cannot be isolated, although they might be controlled. Confounds caused by general response set effects, for instance, can be avoided by using correlational measures of consensus or self-other agreement, whereas artifacts caused by differential stereotype ratings are more difficult to control. As a general rule, intraclass correlations should be used instead of Pearson correlations, especially when informant pairs are interchangeable (Shrout & Fleiss, 1979). All correlational measures of consensus, self-other agreement, and accuracy can be derived from generalizability theory (Cronbach, Gleser, Nanda, & Rajaratnam, 1972). Two general correlational methods can be distinguished: item-level and profile correlations.
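The preference for intraclass over Pearson correlations with interchangeable informants can be illustrated with a minimal sketch. It assumes the one-way random-effects form for single ratings, often labeled ICC(1,1); the data and the function names are hypothetical. With interchangeable pairs, the decision about which informant counts as "first" is arbitrary, and the sketch shows that this decision changes the Pearson correlation but not the intraclass correlation.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def icc_oneway(rows):
    """One-way random-effects intraclass correlation for single ratings.

    rows: one list per target holding the k ratings that target received.
    """
    n, k = len(rows), len(rows[0])
    grand = sum(sum(r) for r in rows) / (n * k)
    means = [sum(r) / k for r in rows]
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for r, m in zip(rows, means) for v in r
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

def columns(rows):
    """Split pairs into a 'first informant' and a 'second informant' list."""
    return [r[0] for r in rows], [r[1] for r in rows]

# Hypothetical data: five targets, each rated by two interchangeable informants.
pairs = [[1, 2], [3, 5], [4, 4], [6, 5], [7, 9]]
# Same ratings, but the arbitrary first/second labels are swapped for two targets.
relabeled = [[2, 1], [3, 5], [4, 4], [5, 6], [7, 9]]

r_original = pearson(*columns(pairs))
r_relabeled = pearson(*columns(relabeled))
icc_original = icc_oneway(pairs)
icc_relabeled = icc_oneway(relabeled)

print(f"Pearson: {r_original:.3f} vs. {r_relabeled:.3f}")
print(f"ICC:     {icc_original:.3f} vs. {icc_relabeled:.3f}")
```

Other intraclass forms (e.g., two-way models) may be more appropriate when the same informants rate every target; the one-way form is used here only because it treats the informants within a pair as exchangeable.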
Item-level correlations. Item-level correlations are computed separately for each rating item, across all informant-target (self-other agreement) or target-target pairs (consensus). The item-level correlation has the advantage that it removes genuine elevation and stereotype accuracy, although individual differences in response tendencies (e.g., differential elevation) may still lead to complications. Differential elevation can be controlled for by standardizing the data within ratings or by using Q-sort procedures (Bernieri et al., 1994). In addition, there are at least three more complications with item-level correlations. First, item correlations describe consensus or self-other agreement across a set of informants or targets rather than the accuracy for individual targets or informants. Therefore, it is difficult to study differences between pairs of informants or differences between self-informant pairs, although moderator effects can be analyzed by moderated multiple regression (see Bernieri et al., 1994), and correlations can be decomposed into individual consistencies (where the correlation can be interpreted as the mean of individual consistencies, which are based on squared differences between the raters' z-scores; see Asendorpf, 1991). Second, with nested designs where each target has a unique "nested" set of informants, the effect of differential stereotype accuracy arises if differential stereotypes of informants are systematically correlated with characteristics of particular targets (Funder, 1999). Also, elevation cannot be removed in nested designs, because the elevation components may vary across groups (Kenny, 1993). Third, assumed similarity, or alternatively, projection, may also lead to artifactual accuracy if informants and targets are similar for genetic or acquaintanceship reasons, which in turn leads informants to judge themselves instead of the targets.
Whereas there is no doubt that assumed similarity effects occur, whether their influence on accuracy measures is artifactual or valid is still a matter of controversy (e.g., Funder, 1999; Funder et al., 1995; Neyer et al., 1999; Stinson & Ickes, 1992; Watson et al., 2000).
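The decomposition of an item-level correlation into individual consistencies can be sketched as follows. The formula used here, c_i = 1 - (z_self,i - z_informant,i)^2 / 2, is an assumption about the usual form of this coefficient and is not quoted from the source; the data and function names are hypothetical. With z-scores based on the population standard deviation, the mean of the c_i equals the Pearson correlation for that item.

```python
import math

def z_scores(values):
    """Standardize with the population SD (divide by n) so the identity is exact."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def pearson(x, y):
    """Pearson correlation as the mean cross-product of z-scores."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

def individual_consistencies(x, y):
    """One consistency value per target (assumed form, see lead-in)."""
    zx, zy = z_scores(x), z_scores(y)
    return [1 - (a - b) ** 2 / 2 for a, b in zip(zx, zy)]

# Hypothetical ratings of six targets on a single item.
self_ratings = [2, 4, 3, 5, 1, 4]
informant_ratings = [3, 4, 2, 5, 2, 3]

consistencies = individual_consistencies(self_ratings, informant_ratings)
mean_consistency = sum(consistencies) / len(consistencies)
item_r = pearson(self_ratings, informant_ratings)

print([round(c, 3) for c in consistencies])
print(round(mean_consistency, 6), round(item_r, 6))  # the two values coincide
```

Because each target receives its own value, the consistencies can serve as a dependent variable when differences between individual self-informant pairs are of interest.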
Profile correlations. The profile correlation assesses the similarity between the complete set of judgments made by one informant and that made by another informant or by the self. This procedure is mostly used with Q-sort data, and typically yields as many correlations (or partial correlations) as there are informant pairs or informant-target pairs in the study. When using profile correlations, however, researchers should be aware of reflection and stereotypes (Kenny, 1993). Reflection can lead to inflated correlations and occurs when researchers fail to reverse negatively keyed items within a profile of positively keyed ones. If the rating profiles of neuroticism items are correlated, for instance, each item should be scored in the same direction (Kraemer, 1984). Whereas genuine or differential elevation effects are negligible with profile correlations, stereotypes may inflate the correlations because the means of the traits are likely to vary. Thus, the correlations between trait profiles become greater to the degree that a particular target has a typical personality profile and the informant is accurately using this prototypical profile.
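The reflection problem can be made concrete with a small sketch. The six-item profile, the 1-to-5 response scale, and the keying pattern below are hypothetical and chosen only for illustration. Two informants disagree about the target once all items are scored in the same direction, yet their unreversed profiles correlate strongly because both follow the keying of the items.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

SCALE_MIN, SCALE_MAX = 1, 5  # assumed response scale

def score_consistently(profile, keys):
    """Reverse every negatively keyed item so all items point the same way."""
    return [
        v if k > 0 else SCALE_MIN + SCALE_MAX - v
        for v, k in zip(profile, keys)
    ]

# +1 marks a positively keyed item, -1 a negatively keyed item (hypothetical).
keys = [+1, -1, +1, -1, +1, -1]
informant_a = [4, 2, 5, 1, 4, 2]
informant_b = [5, 1, 4, 2, 4, 2]

r_unreversed = pearson(informant_a, informant_b)
r_reversed = pearson(
    score_consistently(informant_a, keys),
    score_consistently(informant_b, keys),
)

print(f"unreversed items: r = {r_unreversed:.2f}")
print(f"consistently scored items: r = {r_reversed:.2f}")
```

In this constructed example the unreversed correlation is strongly positive and the consistently scored one is negative, which shows how much of the apparent agreement can be due to item keying alone.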
It is possible to partial out the stereotype profile from the criterion or from the informant rating, either by subtracting the mean across judges from each trait rating (which also corrects the bias that results from failure to reverse items) or by partialling out the mean profiles from each of the informant's rating profiles (Funder, 1999; Kenny & Acitelli, 1994). However, several points need to be considered with partial or semipartial correlations. First, because average self-ratings and average informant ratings are likely to be correlated, the issue of partial versus semipartial correlation is usually of little interest. Second, partial correlations stemming from residual scores are less reliable than nonadjusted correlations. Third, partial correlations may remove true information along with error, because stereotypes may at least in part contain valid information. In particular, targets whose true scores resemble what one may call the average person will show lower levels of accuracy after correction. Therefore, blind trust in partial correlations is not advisable.
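The simpler of the two adjustments, subtracting the mean profile before correlating, can be sketched as follows. The data are hypothetical (three targets, five traits), the mean profile is taken across the targets in the sample as a stand-in for the stereotype profile, and the function names are illustrative. Note that this is a difference-score adjustment, not a regression-based partial correlation.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_profile(profiles):
    """Average rating per trait across all profiles (the assumed stereotype)."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]

def subtract_profile(profile, norm):
    """Remove the mean profile from one rating profile, trait by trait."""
    return [v - m for v, m in zip(profile, norm)]

# Hypothetical self-ratings and informant ratings: one row per target.
self_profiles = [[4, 2, 5, 3, 1], [5, 2, 4, 3, 1], [4, 1, 5, 3, 2]]
informant_profiles = [[5, 2, 4, 3, 1], [4, 1, 5, 3, 2], [4, 2, 5, 2, 1]]

self_norm = mean_profile(self_profiles)
informant_norm = mean_profile(informant_profiles)

raw = [pearson(s, i) for s, i in zip(self_profiles, informant_profiles)]
adjusted = [
    pearson(subtract_profile(s, self_norm), subtract_profile(i, informant_norm))
    for s, i in zip(self_profiles, informant_profiles)
]

for target, (r_raw, r_adj) in enumerate(zip(raw, adjusted), start=1):
    print(f"target {target}: raw r = {r_raw:.2f}, adjusted r = {r_adj:.2f}")
```

All three hypothetical targets resemble the average profile, so their raw correlations are high and drop sharply after adjustment. This is the pattern the third caveat describes: for near-average targets, the adjustment may remove valid information along with the stereotype.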