Multilevel models can be especially useful when measures are constructed according to a logic that confers specific characteristics on the measures. We discuss facet design as an example, but other systematic approaches to question construction result in similar data. If the measures can be assigned values on specific variables, multilevel models can be used to analyze the effects of both person and question characteristics on the responses. For those question characteristics whose effects vary across persons, residuals or posterior means can be assigned to people as scores on these characteristics. A second area where multilevel models are useful for measurement is the assessment of contextual characteristics. We discuss the example of pupils rating the school principal. Various multilevel models can be used to assess the reliability and validity of such ratings at specific levels of the hierarchy. Multilevel modeling is useful in generalizability theory only if the design results in mostly nested data sets; designs with a large number of crossed facets lead to large cross-classified data sets that current multilevel software does not handle well.

The measurement procedures outlined earlier are based on classical test theory, which means that they assume continuous multivariate normal outcomes. Most test items are categorical. If the items are dichotomous, we can use logistic multilevel modeling. If there are two levels, the item level and the person level, multilevel logistic regression is equivalent to a Rasch model (Andrich, 1988; Kamata, 2001; Rost & Walter, chap. 18, this volume).
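The equivalence can be sketched as follows. The notation is an assumption introduced here for illustration, not taken from the sources cited above: $Y_{ij}$ is the dichotomous response of person $j$ to item $i$, $\beta_i$ is a fixed item effect, and $u_j$ is the person-level random intercept.

```latex
% Two-level logistic model: items (level 1) nested within persons (level 2)
\operatorname{logit} \Pr(Y_{ij} = 1 \mid u_j) = \beta_i + u_j,
\qquad u_j \sim N(0, \sigma_u^2)

% Rasch model with person ability \theta_j and item difficulty \delta_i
\operatorname{logit} \Pr(Y_{ij} = 1 \mid \theta_j) = \theta_j - \delta_i

% The two coincide under the identification
\theta_j = u_j, \qquad \delta_i = -\beta_i
```

Under this reading, the person-level residual plays the role of the ability parameter and the negated item effects play the role of the item difficulties. Because the multilevel formulation treats abilities as normally distributed random effects, it corresponds to a Rasch model with a normal ability distribution.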

A nice feature of multilevel models for measurement scales is that they automatically accommodate incomplete data. If some of the item scores for some of the pupils are missing, the model compensates for this. The model results and the estimated residuals or posterior means are correct under the assumption that the data are missing at random (MAR). This is a weaker assumption than the missing completely at random (MCAR) assumption required by simpler methods, such as using only complete cases or replacing missing items by the mean of the observed items. The MAR assumption requires that the missing data are missing completely at random, conditional on the available observed data. Because items typically correlate strongly, the assumption that, conditional on the available item scores, any missing items are missing completely at random is reasonable. An interesting application is to assign different subsets of items to different subsets of persons by design. In this case, the missing data are MCAR by construction, and multilevel analysis provides a straightforward method to estimate the individuals' scores as the person-level residuals or posterior means for the intercept. The typical estimates in multilevel modeling are empirical Bayes estimates, shrunken toward the overall mean, which are equivalent to the estimated true score in classical test theory (cf. Lord & Novick, 1968; Nunnally & Bernstein, 1994).
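The shrinkage toward the overall mean can be illustrated with a minimal sketch. It assumes a deliberately simplified random-intercept model with no item effects, and it takes the grand mean and both variance components as known inputs, whereas multilevel software would estimate them from the data. The function name `shrunken_scores` and its parameters are hypothetical and chosen for this illustration only.

```python
def shrunken_scores(responses, grand_mean, var_person, var_error):
    """Posterior-mean person scores under a simple random-intercept model.

    responses:  dict mapping person id -> list of item scores,
                with None marking a missing item
    grand_mean: overall mean (assumed known here)
    var_person: between-person variance (assumed known here)
    var_error:  within-person (item-level) variance (assumed known here)
    """
    scores = {}
    for person, items in responses.items():
        observed = [x for x in items if x is not None]
        n = len(observed)
        if n == 0:
            # No data at all: the best estimate is the overall mean.
            scores[person] = grand_mean
            continue
        person_mean = sum(observed) / n
        # Reliability of the observed mean; it drops as items go missing.
        reliability = var_person / (var_person + var_error / n)
        # Shrink the observed mean toward the overall mean.
        scores[person] = grand_mean + reliability * (person_mean - grand_mean)
    return scores


example = {
    "complete": [4, 4, 4, 4],
    "one_item": [4, None, None, None],
    "no_items": [None, None, None, None],
}
result = shrunken_scores(example, grand_mean=3.0, var_person=1.0, var_error=1.0)
```

In this sketch, a person with fewer observed items receives a score that is pulled more strongly toward the overall mean, which is how incomplete response patterns are accommodated without discarding cases. The shrinkage weight has the same form as the reliability coefficient in the classical regression estimate of the true score.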

Chapter 20
