## Generalizability Theory (GT)

Generalizability Theory (Cronbach, Rajaratnam, & Gleser, 1963; Cronbach et al., 1972; Gleser, Cronbach, & Rajaratnam, 1965; Shavelson & Webb, 1991) combines Brunswik's call for representative multifacet designs with the true-score model of Classical Test Theory (CTT; Lord & Novick, 1968; Spearman, 1910). Like CTT, GT assumes that each person (or other object of measurement) has a true score on the measured attribute. In GT, this score is called the universe score. Whereas CTT treats the difference between the true score and the observed score as measurement error that lacks substantive significance, GT proposes to decompose the difference between the universe score and the observed score into psychologically meaningful sources of variance. These sources of variance must be specified on the basis of theoretical and practical considerations as facets of a factorial measurement design. For example, if leniency differences between teachers are assumed to cause grade differences, the design must include a teacher facet. If grades are used to make absolute decisions (only A students get a stipend), the main effect of the teacher facet reduces the absolute generalizability of grades across teachers. If grades are used to make relative decisions (the upper 10% of students get a stipend), an interaction between the student facet and the teacher facet limits the relative generalizability of grades across teachers.
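The grading example can be written out as a variance decomposition. The following is a sketch only, assuming a fully crossed student × teacher design with one grade per cell and the notation common in GT textbooks; the symbols ($\nu$, $\sigma^2_\delta$, $\sigma^2_\Delta$, $E\rho^2$, $\Phi$) are not taken from the text above.

$$
\begin{aligned}
X_{pt} &= \mu + \nu_p + \nu_t + \nu_{pt,e} \\
\sigma^2(X_{pt}) &= \sigma^2_p + \sigma^2_t + \sigma^2_{pt,e} \\
\sigma^2_\delta &= \sigma^2_{pt,e} \qquad \text{(error for relative decisions)} \\
\sigma^2_\Delta &= \sigma^2_t + \sigma^2_{pt,e} \qquad \text{(error for absolute decisions)} \\
E\rho^2 &= \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_\delta}, \qquad
\Phi = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_\Delta}
\end{aligned}
$$

Here $\sigma^2_p$ is the universe score variance of students, $\sigma^2_t$ is the teacher main effect (leniency), and $\sigma^2_{pt,e}$ is the student × teacher interaction confounded with residual error. The teacher main effect enters only $\sigma^2_\Delta$, which is why leniency differences lower absolute but not relative generalizability.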

Assuming the use of equivalent interval scales, the universe score of an object of measurement is defined as its expected value on the attribute scale (i.e., the mean of all admissible observations). Relative generalizability is defined as the squared correlation between the observed score variable and the universe score variable (i.e., the ratio of universe score variance to observed score variance). This definition of relative generalizability corresponds directly to the definition of reliability in CTT. Because the universe score is unknown, relative generalizability must be estimated from several observed score variables. The intraclass correlation among conditions provides this estimate. It is an overall index of relative consistency and reflects the degree of interaction between persons (or other measurement objects) and the facets. Coefficients of absolute generalizability are likewise defined as variance ratios, but their denominator additionally includes variance components attributable to facet main effects and facet interaction effects. Shavelson, Webb, and Rowley (1989) illustrated the difference between absolute and relative generalizability with simple substantive examples. Marcoulides (1996) showed how variance components can be estimated with structural equation modeling. Hoyt (2000) provided a comprehensive treatment of absolute and relative bias (lack of generalizability) in univariate and multivariate applications of GT.
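The two kinds of coefficients can be illustrated with a minimal sketch, assuming a fully crossed persons × conditions design with one observation per cell and ANOVA-based (expected mean squares) estimation. The function names `g_study_one_facet` and `generalizability` are hypothetical, and truncating negative variance estimates at zero is a common convention adopted here as an assumption, not something prescribed by the sources cited above.

```python
def g_study_one_facet(scores):
    """Estimate variance components for a crossed persons x conditions design.

    scores: list of rows, one row per person and one column per condition
    (e.g., teacher), with exactly one observation per cell.
    """
    n_p = len(scores)
    n_c = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_c)
    person_means = [sum(row) / n_c for row in scores]
    cond_means = [sum(row[j] for row in scores) / n_p for j in range(n_c)]

    # Mean squares of the two-way ANOVA without replication.
    ms_p = n_c * sum((m - grand) ** 2 for m in person_means) / (n_p - 1)
    ms_c = n_p * sum((m - grand) ** 2 for m in cond_means) / (n_c - 1)
    ss_res = sum(
        (scores[i][j] - person_means[i] - cond_means[j] + grand) ** 2
        for i in range(n_p)
        for j in range(n_c)
    )
    ms_res = ss_res / ((n_p - 1) * (n_c - 1))

    # Solve the expected-mean-square equations; negative estimates are
    # truncated at zero (an assumption of this sketch).
    return {
        "person": max((ms_p - ms_res) / n_c, 0.0),
        "condition": max((ms_c - ms_res) / n_p, 0.0),
        "residual": ms_res,  # person x condition interaction plus error
    }


def generalizability(components, n_conditions):
    """Return (relative, absolute) coefficients for a mean over n_conditions."""
    universe = components["person"]
    rel_error = components["residual"] / n_conditions
    abs_error = (components["condition"] + components["residual"]) / n_conditions
    relative = universe / (universe + rel_error)
    absolute = universe / (universe + abs_error)
    return relative, absolute


# Illustrative data: four students graded by three teachers who differ only
# in leniency (purely additive effects, no student x teacher interaction).
grades = [[student + leniency for leniency in (0, 1, 2)] for student in (1, 2, 3, 4)]
components = g_study_one_facet(grades)
relative, absolute = generalizability(components, n_conditions=3)
```

With these purely additive illustrative data the residual component is zero, so the relative coefficient equals 1, whereas the absolute coefficient falls below 1 because the teacher main effect enters its denominator.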

The first proposal of GT was limited to the one-facet case (Cronbach et al., 1963). Gleser et al. (1965) extended GT to the multifacet case and defined generalizability coefficients for several types of two-facet designs. Cronbach et al. (1972) offered the most comprehensive version of GT. They introduced additional designs and, more important, multivariate GT. Multivariate GT focuses on the generalizability of attribute profiles (i.e., the joint generalizability of measures for two or more attributes). Whereas univariate GT decomposes the variance of one observed variable into components due to facet main effects, facet interaction effects, and person × facet interactions, multivariate GT also decomposes the covariance of two or more observed variables (Hoyt, 2000; Wittmann, 1988).
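The step from variance to covariance decomposition can be sketched by replacing mean squares with mean cross-products. The example below is a minimal illustration, assuming two attributes observed under the same crossed persons × conditions design with one observation per cell; the function name `covariance_components` and the data are hypothetical, and this is not presented as the estimation procedure of any of the cited authors.

```python
def covariance_components(x, y):
    """Decompose the covariance of two attributes measured in the same
    crossed persons x conditions design (one observation per cell).

    x, y: lists of rows (persons) by columns (conditions) of equal shape.
    """
    n_p, n_c = len(x), len(x[0])

    def effects(scores):
        # Split each score into person, condition, and residual deviations.
        grand = sum(sum(row) for row in scores) / (n_p * n_c)
        p_dev = [sum(row) / n_c - grand for row in scores]
        c_dev = [sum(row[j] for row in scores) / n_p - grand for j in range(n_c)]
        resid = [
            [scores[i][j] - grand - p_dev[i] - c_dev[j] for j in range(n_c)]
            for i in range(n_p)
        ]
        return p_dev, c_dev, resid

    px, cx, rx = effects(x)
    py, cy, ry = effects(y)

    # Mean cross-products, the bivariate analogue of ANOVA mean squares.
    mcp_p = n_c * sum(a * b for a, b in zip(px, py)) / (n_p - 1)
    mcp_c = n_p * sum(a * b for a, b in zip(cx, cy)) / (n_c - 1)
    mcp_res = sum(
        rx[i][j] * ry[i][j] for i in range(n_p) for j in range(n_c)
    ) / ((n_p - 1) * (n_c - 1))

    # Covariance components may legitimately be negative, so unlike
    # variance components they are not truncated at zero.
    return {
        "person": (mcp_p - mcp_res) / n_c,
        "condition": (mcp_c - mcp_res) / n_p,
        "residual": mcp_res,
    }


# Illustrative data: attribute Y doubles the person effect of X, and the
# condition effect has the opposite sign for the two attributes.
x_scores = [[p + c for c in (0, 1, 2)] for p in (1, 2, 3, 4)]
y_scores = [[2 * p - c for c in (0, 1, 2)] for p in (1, 2, 3, 4)]
cov = covariance_components(x_scores, y_scores)
```

In this illustration the person covariance component is positive (the two attributes rank persons identically), while the condition covariance component is negative (a condition that raises scores on one attribute lowers them on the other).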

The models and methods of GT are useful for understanding the psycho-logic and methodo-logic of multimethod approaches. (a) Compared to CTT, GT provides a more comprehensive, differentiated, and flexible conceptualization of reliability. (b) GT contributes to understanding and defining the concepts of convergent and discriminant validity. Convergent validity corresponds to the generalizability of interindividual differences in the measured attribute across the method facet. Discriminant validity corresponds to a lack of generalizability of intraindividual differences between two or more theoretically unrelated attributes across the method facet. (c) By combining generalizability studies with decision studies, GT links basic research on the properties of measurement instruments with the usefulness of diagnostic information in applied psychology. (d) Last but not least, measurement designs including nested facets inspired hierarchical linear modeling of multilevel data, a methodological framework that has greatly enriched multimethod research during recent years (Hox & Maas, this volume, chap. 19; Raudenbush & Bryk, 2002).

Before we turn to the last milestone, note that the ideas that were advanced in covariance structure models of multitrait-multimethod data and in generalizability theory are also dealt with in multicomponent item response models (Rost & Walter, this volume, chap. 18).
