In cross-sectional research, a major concern is measurement invariance across groups. Does a set of items measure cognitive ability equally well in African-American and Caucasian populations? Does a standard measure of extroversion or depression capture the same underlying construct in the United States and China? Similar issues can arise in longitudinal research when measures are collected over extended periods of time. Does a standard measure of childhood extroversion assess the same construct at age 12 and age 18? If change over time is to be studied, the same construct must be measured at each time point. Measurement invariance may be established within either (a) the Rasch/IRT approach or (b) the confirmatory factor analysis approach.
Measurement invariance implies that the score on the instrument is independent of any variables other than the person's value on the theoretical construct of interest. To illustrate how measurement invariance might fail, consider a test of mathematics ability for intermediate school students. Suppose that the following item were devised: "A baseball player has 333 at bats and 111 hits. What is his batting average?" Although this item clearly reflects mathematical ability, it also reflects knowledge about baseball—knowledge that is more likely to be found in male than female students with the same level of mathematics ability. Such items that exhibit a systematic relationship with group characteristics after controlling for the construct level are said to be functioning differentially across groups. Differential item functioning (DIF) thus contributes to measurement non-invariance across groups. Similarly, if measurement invariance holds across time, then the probability of a set of observed scores occurring is conditional only on the level of the latent construct and is independent of any variable related to time:
$P(Y \mid \theta, X_t) = P(Y \mid \theta)$, where $Y$ is the set of observed scores, $\theta$ is the level of the latent construct, and $X_t$ is the set of time-related variables such as age and testing occasion. For example, an item such as, "Did you make your bed this morning?" might be a good measure of the orderliness facet of conscientiousness for college students at the beginning of the semester, but not during exam weeks. Only when measurement invariance over time is established can we conclude that the measurement scale for the underlying construct remains the same. Of importance, measurement invariance allows us to conclude that changes in scores are the result of changes over time on the construct of interest rather than on other characteristics of the instrument or the participants.
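The conditional-independence statement above can be illustrated with a small simulation. The sketch below is not taken from the text: it assumes a single binary item that follows a Rasch-type logistic model, a binary time-related covariate x (for instance 0 = start of semester, 1 = exam week), and hypothetical function names (simulate_item, max_conditional_gap). It compares the probability of endorsing the item in the two x conditions among people with similar latent levels. Conditioning directly on the latent level is possible only because the data are simulated; with real data the latent level is unobserved and would have to be estimated.

```python
import math
import random


def simulate_item(n, dif_shift, seed=0):
    """Simulate binary responses to one item under a Rasch-type model.

    Each simulated person has a latent level theta and a binary covariate x.
    The item difficulty is shifted by dif_shift when x == 1, so
    dif_shift == 0 corresponds to measurement invariance (an assumption
    of this sketch, not a model specified in the text).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)        # latent construct level
        x = rng.randint(0, 1)              # time-related covariate
        difficulty = dif_shift * x         # baseline difficulty is 0
        p = 1.0 / (1.0 + math.exp(-(theta - difficulty)))
        y = 1 if rng.random() < p else 0   # observed item response
        rows.append((theta, x, y))
    return rows


def max_conditional_gap(rows, n_bins=5, lo=-2.0, hi=2.0):
    """Largest |P(Y=1 | theta bin, x=1) - P(Y=1 | theta bin, x=0)| over bins."""
    width = (hi - lo) / n_bins
    counts = {}  # (bin index, x) -> [number of people, number endorsing]
    for theta, x, y in rows:
        if not (lo <= theta < hi):
            continue  # ignore the sparse tails
        b = min(int((theta - lo) / width), n_bins - 1)
        cell = counts.setdefault((b, x), [0, 0])
        cell[0] += 1
        cell[1] += y
    gaps = []
    for b in range(n_bins):
        if (b, 0) in counts and (b, 1) in counts:
            p0 = counts[(b, 0)][1] / counts[(b, 0)][0]
            p1 = counts[(b, 1)][1] / counts[(b, 1)][0]
            gaps.append(abs(p1 - p0))
    return max(gaps)


# Invariant item: x has no effect once theta is held (roughly) constant.
gap_invariant = max_conditional_gap(simulate_item(40000, dif_shift=0.0, seed=1))
# Non-invariant item: the same theta yields a lower endorsement rate when x == 1.
gap_dif = max_conditional_gap(simulate_item(40000, dif_shift=1.0, seed=1))
print("max gap, invariant item:", round(gap_invariant, 3))
print("max gap, shifted item:  ", round(gap_dif, 3))
```

Under invariance the gap between the two conditions reflects only sampling noise, whereas a difficulty shift tied to x produces a clear gap even among people at the same latent level. That pattern is what the bed-making item would show if it stopped tracking orderliness during exam weeks.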