A message in several chapters of this volume is that investigators often have become discouraged with using multiple methods because such measures often do not converge, with measures of the same constructs correlating at low levels with each other and varying in different patterns with external variables. Burns and Haynes (chap. 27, this volume) found that many measures contain more source than trait variance. Indeed, Roberts et al. (chap. 22, this volume) find the lack of convergence between measures to be so discouraging that they suggest that multiple measures have been oversold, and that perhaps we do not need to use them. If one examines the multimeasure studies that exist in the literature, it is clear that measures of the same constructs based on different methods often correlate at disappointing levels. Only when measures depend on the same method do they sometimes correlate at moderate to strong levels. Why, then, do we persist in our insistence that multiple measures are crucial for scientific advances?
One reason, discussed later, for the centrality of multiple measures in our thinking is that every measurement method, even the best ones, possesses substantial shortcomings and limitations. Thus, by using different methods with different limitations, researchers can eliminate specific artifacts from their conclusions because the artifacts are unlikely to influence all the diverse measures they use. Another reason to use multiple measures is that one can better estimate the underlying construct by using several measures, each of which is influenced by that construct but also by other factors as well. For example, if we measure altruism by asking people to donate their plasma to the blood drive, we have a measure that is influenced by altruism, but also influenced by curiosity and interest in medicine, by having a hemophiliac in one's family, and by one's past medical experiences. If, however, we obtain several additional and different measures of altruism, such as helping a person who has dropped her books, donating money to a child welfare fund, and volunteering to work on a Walkathon to collect money for AIDS research, we hope that this aggregate of measures represents the latent construct of altruism. Ahadi and Diener (1989) made this point over a decade ago—that no single behavior ever represents the influence of a single construct—and Schmitt (chap. 2, this volume) forcefully makes this point again.
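The aggregation argument can be illustrated with a small simulation. The sketch below is not from the chapter. It assumes that each behavioral measure equals the latent altruism score plus an independent nuisance component (standing in for curiosity, family history, past medical experiences, and so on), with arbitrary variances chosen only for illustration. Under those assumptions, the average of several measures tracks the latent construct more closely than any single measure does.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate_aggregation(n_people=2000, n_measures=4, nuisance_sd=1.5, seed=0):
    """Compare single measures with their aggregate as indicators of a latent trait.

    Hypothetical model: measure = latent trait + independent nuisance influence.
    """
    rng = random.Random(seed)
    trait = [rng.gauss(0, 1) for _ in range(n_people)]
    # Each measure is contaminated by its own, unrelated nuisance factor.
    measures = [
        [t + rng.gauss(0, nuisance_sd) for t in trait]
        for _ in range(n_measures)
    ]
    # Aggregate: each person's mean across all measures.
    aggregate = [sum(vals) / n_measures for vals in zip(*measures)]
    single_rs = [pearson(trait, m) for m in measures]
    return single_rs, pearson(trait, aggregate)


if __name__ == "__main__":
    single_rs, aggregate_r = simulate_aggregation()
    print("single-measure correlations with the trait:",
          [round(r, 2) for r in single_rs])
    print("aggregate correlation with the trait:", round(aggregate_r, 2))
```

The gain depends on the nuisance components being largely unrelated across measures. If the measures shared a common artifact, averaging would not remove it.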
But what of the fear that our measures will not converge and may even show different patterns with external variables? Our answer is that a lack of convergence can be disheartening at first, but it can be an excellent aid to scientific insight and a real opportunity for scientific progress. In this situation we will realize that our concept might be overly simple, or that our measures might be contaminated. Let us examine several examples. Imagine that we measure well-being through people's global reports of their happiness, as well as through the reports of friends, and with an experience-sampling measure in which people report their moods at random moments over 2 weeks. Imagine, too, that we want to determine whether our respondents are happier if they obtain more physical exercise. The outcomes of this study might discourage the investigator because the three types of measures correlate with each other at only .30, and only the experience-sampling measure shows a correlation with exercise. At this point the researcher is likely to wish that two of the measures would just disappear. But what we have here is an opportunity to understand something about the well-being measures, as well as the way in which exercise might influence well-being. For example, perhaps exercise influences mood in the short term, but not in the long term, and the informant and global reports are not sensitive enough to pick up this effect. However, the informant reports might correlate more highly with job performance than the other two measures because they represent how happy the person appears to others. Finally, the global reports of happiness might represent a person's self-concept to a greater degree than the other two measures, and therefore be able to best predict certain long-term choices the person makes.
It might take the investigator several more studies to understand this pattern, but think what has been gained—the realization and understanding that happiness is not a monolithic concept, and that different measures capture specific aspects of it.
Let us examine yet another example, this one on heritability. Imagine a researcher who locates a large number of young adult twins who were separated at birth, with both monozygotic and dizygotic twins in the sample. Also suppose that the researcher would like to estimate the heritability of extraversion and does so by administering to all twins a self-report extraversion questionnaire. However, if a heritability of .45 is found for the trait, based on the relative size of correlations for the two types of twins, what does the coefficient mean? It could be that extraversion is heritable at .45, but this coefficient could be contaminated by the heritability of response predispositions such as conformity, number-use tendencies such as avoiding extreme answers, or inherited dispositions related to memory recall. Without other types of extraversion measures, it is impossible to conclude much about the heritability of extraversion per se. Adding other measurement methods such as informant reports of extraversion (Eid, Riemann, Angleitner, & Borkenau, 2003) can help the researcher, either by converging with the self-report measures, and thus giving strength to the conclusions, or by diverging and thereby showing that the measures reflect influences in addition to extraversion per se. If researchers begin to take a longer-term perspective on their research beyond the findings of single studies, it is evident that the use of multiple measures is likely to enormously aid scientific understanding.
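To make the contamination concern concrete, the following sketch works through the arithmetic with hypothetical numbers. It rests on two assumptions that are not stated in the text. First, the coefficient is obtained with Falconer's formula, a textbook way of comparing monozygotic and dizygotic twin correlations that may not match the design described here. Second, the questionnaire score is the sum of two independent components, the trait and a response style, so that its heritability is the variance-weighted average of the component heritabilities. All variances and heritabilities below are invented for illustration.

```python
def falconer_h2(r_mz, r_dz):
    """Falconer's estimate: twice the difference between the MZ and DZ correlations."""
    return 2.0 * (r_mz - r_dz)


def blended_h2(components):
    """Heritability of a score made up of independent additive components.

    components: list of (variance_share, heritability) pairs.
    The result is the variance-weighted mean of the component heritabilities.
    """
    total_variance = sum(variance for variance, _ in components)
    heritable_variance = sum(variance * h2 for variance, h2 in components)
    return heritable_variance / total_variance


if __name__ == "__main__":
    # Hypothetical twin correlations that yield an estimate close to .45.
    print(round(falconer_h2(r_mz=0.50, r_dz=0.275), 3))

    # Hypothetical self-report score: 75% trait variance with heritability .40,
    # 25% response-style variance with heritability .60.
    score = [(0.75, 0.40), (0.25, 0.60)]
    print(round(blended_h2(score), 3))
```

In this invented case the questionnaire score shows a heritability of about .45 although the trait component itself is heritable at only .40. A single self-report measure cannot distinguish that case from one in which the trait is heritable at .45 and the response style plays no role.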
As Cronbach (1995, p. 145) stressed, method variance is not "the serpent in the psychologist's Eden" but a topic of constructive and theory-driven research. The explanation of method effects can enhance validity by suppressing method-specific variance and by detecting moderator variables that might at least guarantee the validity of the scale for a subgroup of individuals. Eid, Schneider, and Schwenkmezger (1999) have shown how the knowledge of the causes of method effects can be used to pinpoint suppressor variables in a multimethod study to enhance validity. They repeatedly assessed mood states and the perceived deviation of mood states from the habitual mood level ("Do you feel better or worse than you generally feel?") after the same lecture. Moreover, they asked individuals to judge their general mood level. They found a high but imperfect correlation between two methods measuring the general mood level (the mean of repeatedly assessed mood states versus judgments of one's general mood). One possible explanation of this imperfect association was the hypothesis that the situation after a lecture was not representative of one's life, and that therefore the aggregated states were composed of two parts: one representative of one's life in general and one indicating a systematic deviation of one's mood after the lecture from one's general mood level. Indeed, there was stability in the deviation scores, showing that individuals had a tendency to generally feel better or worse after the same lecture. This general deviation variable was uncorrelated with the global trait assessment but highly correlated with the aggregated states. This indicates that the general mood deviation score can be used as a suppressor variable to suppress the variance in the aggregated state scores that was atypical of one's life in general and typical only of the lecture situation.
Consequently, using this suppressor variable significantly increased the convergent validity coefficient for the two methods measuring a mood trait (aggregated states versus a global judgment). Hence, suppressing method-specific variance can help to establish higher convergent validity.
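The logic of suppression can be sketched with simulated data. This is an illustration only, not a reanalysis of Eid, Schneider, and Schwenkmezger (1999). It assumes that the aggregated states equal the general mood level plus a stable lecture-specific deviation plus a little noise, that the global judgment equals the general mood level plus noise, and that the deviation is observed directly. All variances are invented. Removing the deviation from the aggregated states by simple regression then raises the correlation between the two methods.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def residualize(y, x):
    """Return the residuals of y after a simple linear regression on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


def simulate_suppression(n=5000, seed=1):
    """Hypothetical model of the two mood-trait methods and the suppressor."""
    rng = random.Random(seed)
    trait = [rng.gauss(0, 1) for _ in range(n)]        # general mood level
    deviation = [rng.gauss(0, 0.8) for _ in range(n)]  # lecture-specific part
    aggregated = [t + d + rng.gauss(0, 0.2) for t, d in zip(trait, deviation)]
    global_judgment = [t + rng.gauss(0, 0.3) for t in trait]
    # Suppress the lecture-specific variance in the aggregated states.
    adjusted = residualize(aggregated, deviation)
    return {
        "r_before": pearson(aggregated, global_judgment),
        "r_after": pearson(adjusted, global_judgment),
        "r_dev_global": pearson(deviation, global_judgment),
        "r_dev_aggregated": pearson(deviation, aggregated),
    }


if __name__ == "__main__":
    for name, value in simulate_suppression().items():
        print(name, round(value, 2))
```

The deviation score improves convergence here precisely because it is unrelated to the global judgment yet shares variance with the aggregated states, which is the defining pattern of a suppressor variable.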
A deeper understanding of method effects can also result in the conclusion that there might be convergent validity for some subgroups but not for others (differential convergent validity). Miller and Cardy (2000), for example, found higher convergence of self- and other-reported performance appraisals for low self-monitors than for high self-monitors. Again, predictions derived from theories of self-monitoring could enhance our understanding of method effects and could be used to detect differential validity.