## Multimethod Item Response Theory

Jürgen Rost and Oliver Walter

Item response theory (IRT) is a framework for a growing number of statistical models that refer to the same kind of data structure. The data basis for applying IRT models is a matrix of responses of a (large) number of persons to a (small) number of questions, tasks, stimuli, or similar units, called the items. The item responses may be dichotomous (yes-no; correct-incorrect; true-false, etc.), ordinal (strongly disagree, disagree, agree, strongly agree), or nominal without a given order of the categories (for example, hair color: blond, brown, red, black, gray/white). In any case, they represent categorical data, which makes IRT models different from most other statistical models that refer to metrical variables, for example, structural equation models (Eid, Lischetzke, & Nussbeck, chap. 20, this volume).

In contrast to the latter, which often aim at modeling the covariance structure of the observed variables, IRT models try to model the observed response patterns and their frequencies. For that purpose, some IRT models describe the distributions of one or more latent variables and, if there is more than one, also their (latent) correlations. But in general, correlations of observed or latent variables are not the primary concern of IRT models. Rather, the focus of IRT models is on the response patterns of the persons who filled out a test or a questionnaire. Moreover, it is the single-item response $x_{vi}$ of a person v on an item i that is to be explained or predicted by an IRT model. Because almost all these models deal with the probabilities of these item responses, and not with their occurrence or absence in a deterministic sense, the typical IRT model is a probabilistic model dealing with $p(x_{vi})$, that is, the probability that person v gives response x on item i.
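The distinction between response patterns and covariances can be made concrete with a minimal sketch. The tiny data matrix below is hypothetical (six persons, three dichotomous items) and only illustrates what "observed response patterns and their frequencies" refers to; it is not data from this chapter.

```python
from collections import Counter

# Hypothetical dichotomous data matrix: rows are persons v, columns are items i.
# Each row is one person's complete response pattern (1 = correct, 0 = incorrect).
responses = [
    (1, 1, 0),
    (1, 0, 0),
    (1, 1, 0),
    (0, 0, 0),
    (1, 1, 1),
    (1, 0, 0),
]

# Absolute frequency of each observed response pattern.
pattern_counts = Counter(responses)

# Relative frequencies: the quantities an IRT model tries to reproduce
# through the model-implied probabilities of the patterns.
n_persons = len(responses)
pattern_freqs = {
    pattern: count / n_persons for pattern, count in pattern_counts.items()
}

for pattern, freq in sorted(pattern_freqs.items()):
    print(pattern, round(freq, 3))
```

With three dichotomous items there are at most 2^3 = 8 distinct patterns; the sketch simply tabulates those that occur.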

Multimethod IRT models refer to an extended data structure in which the item responses are obtained with different methods. For example, Item 1 may be assessed with Method A and Item 2 with Method B, or all items of a test may be administered once using Method A and a second time using Method B. A classic example is a personality questionnaire that has been administered to three different persons: the subject being tested, the subject's (romantic) partner, and a good friend. The three modes of responding to the test items represent the three methods of self-report, partner rating, and good friend rating, so that the data matrix (persons × items) extends to a data cube (persons × items × methods). In general, the data structure of multimethod test data and related IRT models is seen in this chapter as a data cube with methods as the third dimension, in addition to the persons and items of ordinary IRT. The aim of multimethod IRT models is to explain $p(x_{vij})$, that is, the probability that the score x on item i measured by method j is obtained for a particular person v.
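As a rough illustration of the data cube, the following sketch stores responses indexed by person v, item i, and method j, and extracts the ordinary persons × items matrix for a single method. The sizes, the method labels, the helper names `make_cube` and `method_slice`, and the use of `None` for cells that were not observed are all assumptions made for this example.

```python
# Hypothetical setup: 4 persons, 3 items, and the three rating methods
# from the questionnaire example (self-report, partner, good friend).
METHODS = ("self", "partner", "friend")
N_PERSONS, N_ITEMS = 4, 3


def make_cube(n_persons, n_items, n_methods, fill=None):
    """Build a persons x items x methods cube; `fill` marks unobserved cells."""
    return [
        [[fill for _ in range(n_methods)] for _ in range(n_items)]
        for _ in range(n_persons)
    ]


def method_slice(cube, j):
    """Return the persons x items matrix obtained with method j."""
    return [[item_cell[j] for item_cell in person] for person in cube]


cube = make_cube(N_PERSONS, N_ITEMS, len(METHODS))

# Record response x = 1 of person v = 0 on item i = 1, rated by the partner.
cube[0][1][METHODS.index("partner")] = 1

# Slicing on the third dimension recovers an ordinary IRT data matrix.
partner_matrix = method_slice(cube, METHODS.index("partner"))
print(partner_matrix)
```

Fixing the method index gives back the two-dimensional structure of ordinary IRT, which is why the methods can be treated as a third dimension added to persons and items.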

Such a three-dimensional data structure (data cube) is not specific to multimethod IRT. A third dimension is also present in the measurement of change, where "time" is the third dimension of the data structure (see Khoo, West, Wu, & Kwok, chap. 21, this volume). Time may be seen as a special kind of "method" that is defined by the time point of test administration. If the test is applied at another time, the situation will also be different, and the repeated test application may be seen as a different "method." There is little point in stressing the differences between measurement of change and multimethod methodology. In fact, both can learn from each other, and as far as IRT is concerned, much can be learned from change or learning models that may be relevant for multimethod assessment.

In the first section, the two models that underlie all models described in this chapter, the Rasch model and latent class analysis, are presented. In the following sections these models are extended to the three-dimensional data structure. First, the Rasch model for multimethod data is introduced and then generalized. Because there is no single way of generalizing it to the structure of a data cube, three different directions are pursued: the interaction between items and methods, the multidimensional extension, and the mixture distribution extension, which combines the Rasch model and latent class analysis. Section 3 deals with more technical aspects such as parameter estimation, missing data, measures of accuracy, and model fit. Section 4 presents the results of applying the described models to the field test data of the German PISA 2003 science test. Section 5 summarizes the models that were applied and points out directions for the further development of IRT.