The consistency of differences and thus the reliability and validity of assessment methods can be increased by aggregation (Epstein, 1986; Steyer & Schmitt, 1990). The principle of aggregation is an integral part of lay epistemics and is used intuitively in many life domains to neutralize sources of inconsistency that are deemed irrelevant. Aggregation is used in sports, education, professional evaluation, and democratic elections of political leaders. The logic of aggregation follows directly from multidetermination. If different behaviors are caused partly by a common factor and partly by unique factors, each behavior alone is a poor measure of the common factor. Averaging behaviors reduces the impact of the unique factors, whereas the impact of the common factor remains the same. The average behavior therefore reflects the common factor more than it reflects any of the unique factors. As a consequence, the average behavior measures the common factor better than it measures the unique factors. This principle is an integral part of Classical Test Theory and the reason why the reliability of tests depends on their length (Brown, 1910; Lord & Novick, 1968; Spearman, 1910).
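The dependence of reliability on test length can be made explicit with a short derivation. The following is a sketch under simplifying assumptions that the text above does not state: all behaviors load equally on the common factor, the unique factors have equal variance, and the unique factors are uncorrelated with each other and with the common factor. The symbols (X_i, T, E_i, k, and the variances) are our notation, introduced for illustration only.

```latex
% Assumed notation (not from the source): X_i is behavior i, T the common
% factor, E_i the unique factor of behavior i, k the number of behaviors
% that are averaged.
\begin{align}
X_i &= T + E_i, \qquad
  \operatorname{Var}(T) = \sigma_T^2, \quad
  \operatorname{Var}(E_i) = \sigma_E^2, \quad
  \operatorname{Cov}(T, E_i) = \operatorname{Cov}(E_i, E_j) = 0 \ (i \neq j) \\
\bar{X}_k &= \frac{1}{k} \sum_{i=1}^{k} X_i
  = T + \frac{1}{k} \sum_{i=1}^{k} E_i \\
\operatorname{Var}(\bar{X}_k) &= \sigma_T^2 + \frac{\sigma_E^2}{k} \\
\rho_k &= \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2 / k}
  = \frac{k \, \rho_1}{1 + (k - 1) \, \rho_1},
  \qquad \rho_1 = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
\end{align}
```

The variance due to the common factor is unaffected by averaging, whereas the variance due to the unique factors shrinks by the factor 1/k. The last line is the Spearman-Brown formula. Under these assumptions, a single behavior with a reliability of .20 yields an aggregate with a reliability of about .71 when ten such behaviors are averaged.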
Choosing appropriate facets of aggregation is a matter of substantive interest. In personality research, we hope to measure individual differences. We want to discriminate on the person facet, whereas differences on other facets are of less substantive interest. Consequently, aggregation across time, situations, types, modes, and methods is appropriate (Epstein, 1986). In general psychology, we want to identify generalized differences between situations. Differences on other facets are irrelevant. Accordingly, aggregation across individuals and other facets is appropriate. The same rationale applies to all other facets of the data box, including the methods facet.
Note, however, that the irrelevant facets across which aggregation occurs must not be correlated (confounded) with the facet on which we want to discriminate. Consider the person and the situation facet of our helpfulness example. If we observe neighbor A only in situations where help is easy and neighbor B only in situations where help is effortful, we would overestimate A's helpfulness and underestimate B's. Just as confounded factors in experimental and quasi-experimental designs damage their internal validity, confounding diagnostically relevant facets with irrelevant facets damages the construct validity of measures (Messick, 1989; Shadish et al., 2002).
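The effect of confounding on an aggregate can be illustrated with a small numerical sketch. It assumes a purely additive model in which the expected amount of helping is the sum of a person effect and a situation effect. The numbers, the situation labels, and the function names are hypothetical and chosen only for illustration. Both neighbors are given the same true helpfulness, so any difference between their aggregates is produced by the sampling of situations alone.

```python
from statistics import mean

# Hypothetical values, chosen only for illustration (not from the source).
# Both neighbors have the same true helpfulness (person effect).
HELPFULNESS = {"A": 0.6, "B": 0.6}

# Situation effects: positive when helping is easy, negative when effortful.
SITUATION_EASE = {"easy_1": 0.3, "easy_2": 0.2, "hard_1": -0.2, "hard_2": -0.3}


def expected_helping(person, situation):
    """Assumed additive model: person effect plus situation effect."""
    return HELPFULNESS[person] + SITUATION_EASE[situation]


def aggregate(person, situations):
    """Average the expected helping of one person across the given situations."""
    return mean(expected_helping(person, s) for s in situations)


# Crossed design: every person is observed in every situation, so the
# situation effects cancel in the comparison between persons.
crossed = {p: aggregate(p, SITUATION_EASE) for p in HELPFULNESS}

# Confounded design: A is observed only in easy situations and B only in
# effortful ones, so the situation effects no longer cancel.
confounded = {
    "A": aggregate("A", ["easy_1", "easy_2"]),
    "B": aggregate("B", ["hard_1", "hard_2"]),
}

print("crossed:   ", crossed)
print("confounded:", confounded)
```

In the crossed design the two aggregates coincide, as they should for equally helpful persons. In the confounded design the aggregate for A lies above and the aggregate for B lies below their common true value, although nothing about the persons has changed.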