We paint a slightly caricatured picture of the field to make a point. Many organizational researchers can, and do, use alternative methods to collect some of their data. For example, many studies of quitting, retirement, and absence use objective organizational records of such behavior. Other studies, of leadership and citizenship behavior for example, use peer, subordinate, and spousal reports of such behaviors. These studies are stronger for using multisource data; however, it is still the rare study that uses more than one source of data to measure a single construct. Moreover, many key constructs would simply be poorly measured using objective or other reports. Consider job satisfaction, perceptions of organizational justice and fairness, and job withdrawal intentions. It is difficult to obtain assessments of these variables from anyone other than the focal individual. Thus, when alternative methods are used, they are used sparingly and usually for constructs that lend themselves well to alternative measurement. Nonetheless, there have been exceptions, to which we now turn.
Researchers at AT&T in the 1960s applied Campbell and Fiske's (1959) ideas to measure abilities of their managers in what came to be called the "assessment center" (Bray, 1982; Bray, Campbell, & Grant, 1974). Building on earlier work done to select spies during World War II, these researchers developed a complex and realistic set of exercises in their attempts to assess many components of abilities and motivations that contributed to effective job performance. Each ability or motivation was assessed using multiple methods. For example, a trait was assessed using paper-and-pencil tests, in-basket exercises, ratings by observers of interactions in group discussion, and ratings by interviewers obtained from one-on-one interactions in interviews. Multiple traits of each person going through the assessment process were assessed using multiple independent methods, and at least two raters rated each trait within each method. Using such comprehensive measures, it was possible for researchers to generate a true multitrait-multimethod matrix for the sample of managers.
Unfortunately, more than anything else, this study revealed just how difficult it is to assess subjective traits such as interpersonal ability using fallible human raters. The scores raters assigned were often better predicted by their ratings of other traits of the same individuals than by ratings of the same trait from other raters, methods, or exercises (Robertson, Gratton, & Sharpley, 1987; Sackett & Dreher, 1982; Sackett & Harris, 1988). In other words, the heterotrait-monomethod correlations were consistently stronger than the monotrait-heteromethod correlations. Scores were consistent within methods but less so across methods for the same assessees.
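The pattern just described can be made concrete with a small simulation. The sketch below is purely illustrative (it does not reanalyze the AT&T data, and all parameters are invented): it generates scores for two traits measured by two methods, builds in a method effect larger than the trait effect, and shows that the heterotrait-monomethod correlation then exceeds the monotrait-heteromethod correlation.

```python
import random
import statistics

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

random.seed(42)
n = 500  # simulated assessees
scores = {("t1", "m1"): [], ("t2", "m1"): [], ("t1", "m2"): [], ("t2", "m2"): []}
for _ in range(n):
    # True trait standings (variance 1) vs. per-assessee method effects
    # (variance 4): the method effect is deliberately made larger.
    t1, t2 = random.gauss(0, 1), random.gauss(0, 1)
    m1, m2 = random.gauss(0, 2), random.gauss(0, 2)
    scores[("t1", "m1")].append(t1 + m1 + random.gauss(0, 0.5))
    scores[("t2", "m1")].append(t2 + m1 + random.gauss(0, 0.5))
    scores[("t1", "m2")].append(t1 + m2 + random.gauss(0, 0.5))
    scores[("t2", "m2")].append(t2 + m2 + random.gauss(0, 0.5))

# Heterotrait-monomethod: different traits, same method
hm = pearson(scores[("t1", "m1")], scores[("t2", "m1")])
# Monotrait-heteromethod: same trait, different methods
mh = pearson(scores[("t1", "m1")], scores[("t1", "m2")])
print(f"heterotrait-monomethod r = {hm:.2f}")
print(f"monotrait-heteromethod r = {mh:.2f}")
```

Because the shared method component contributes more covariance than the shared trait component, the same-method correlation dominates, which is exactly the diagnostic signature of method variance in a multitrait-multimethod matrix.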
This problem has reemerged in recent years with multirater ("360 degree") assessment systems. Multirater assessment attempts to augment traditional supervisor ratings of performance with ratings from other, operationally independent, observers such as peers, subordinates, and customers. It is a direct attempt to overcome the limitations of single-source data in performance rating. Evidence from this approach indicates small ratee effects coupled with very large rater effects and within-rater correlations (Scullen, Mount, & Goff, 2000), akin to the large method effects for assessment centers noted earlier. Who does the rating matters far more than the performance of the person being rated.
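The ratee-versus-rater decomposition can also be sketched in simulation. The parameters below are illustrative assumptions, not the estimates reported by Scullen, Mount, and Goff (2000): ratings are generated with a small ratee (true performance) component and a large rater (leniency/severity) component, and the variance of column means (raters) is then compared with the variance of row means (ratees).

```python
import random
import statistics

random.seed(7)
n_ratees, n_raters = 60, 20

# Small true-performance differences among ratees, large bias
# differences among raters (illustrative variances only).
ratee_true = [random.gauss(0, 0.5) for _ in range(n_ratees)]
rater_bias = [random.gauss(0, 1.5) for _ in range(n_raters)]
ratings = [[ratee_true[i] + rater_bias[j] + random.gauss(0, 0.5)
            for j in range(n_raters)] for i in range(n_ratees)]

# Averaging a ratee's row washes out rater bias; averaging a rater's
# column washes out ratee differences. Comparing the two variances
# crudely separates the ratee and rater components.
ratee_means = [statistics.mean(row) for row in ratings]
rater_means = [statistics.mean(ratings[i][j] for i in range(n_ratees))
               for j in range(n_raters)]
var_ratee = statistics.pvariance(ratee_means)
var_rater = statistics.pvariance(rater_means)
print(f"variance attributable to ratees: {var_ratee:.2f}")
print(f"variance attributable to raters: {var_rater:.2f}")
```

Under these assumed parameters, the rater component dwarfs the ratee component, mirroring the finding that who does the rating accounts for more variance in scores than the performance being rated.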
The assessment center studies and the multirater approach to performance ratings are noteworthy for the problems they highlight. They are vivid examples of the difficulty of obtaining reliable and independent assessments in organizational research. They also suggest that high estimates of scale reliability and "convergent" validity may reflect shared method variance as much as true consistency in individuals' standings on constructs.
Research has continued, however, without an adequate solution to the problem of shared method variance. This implies that when our conclusions are based on single-method, single-source data, they may have substantial amounts of correlated error variance. Multimethod and multisource data are valued; they might be argued to be the gold standard in I/O field research. However, the constraints of field research and the biases of I/O researchers have limited the extent to which multiple operations of single constructs appear in the literature.