Researchers have increasingly recognized the value of longitudinal designs for studying stability and change, for understanding developmental processes, and for establishing the direction of hypothesized causal effects. They have also increasingly gone beyond the minimal two-wave design and now often include several measurement waves. These multiwave designs potentially permit researchers to move beyond traditional analyses such as correlation, regression, and analysis of variance, and to use promising newer approaches such as the autoregressive, latent state-trait, and growth curve models presented in this chapter.⁴ These analyses can potentially provide better answers to traditional questions in longitudinal research. They also permit researchers to raise interesting new questions that were rarely, if ever, considered within the traditional analytic frameworks. For example, latent state-trait models can provide definitive information about the role of states and traits, a classic problem in personality measurement. Growth curve models permit researchers to identify variables that explain individual differences in growth trajectories, a question that was not raised until these models were developed.

⁴ Ferrer and McArdle (2003) and McArdle and Nesselroade (2003) review these and several other recently developed longitudinal models that could not be included in this chapter because of space limitations.

Longitudinal researchers, like researchers in many other areas of psychology (see Aiken, West, Sechrest, & Reno, 1990), have often paid minimal attention to measurement issues. Historically, this lack of attention could be justified because traditional measurement practices were "good enough" to provide adequate tests of the hypotheses. Answering questions within a traditional null hypothesis testing framework about the simple existence of a difference between means or of a correlation does not require sophisticated measurement; ordinal-level measurement provides sufficient information. Statistical methods such as ANOVA and regression, which were designed for interval-level scales, have proven relatively robust even when applied to ordinal scales: so long as the assumptions of the procedure (residuals are independent, normally distributed, and have constant variance) are met, traditional measures produce reasonable answers (Cliff, 1993). Researchers could also compensate for the loss of statistical power associated with ordinal measurement through moderate increases in sample size. However, psychologists have begun to ask more complex questions about the size and the form of relationships. What is the magnitude of the treatment effect? How much do boys versus girls gain in mathematics proficiency from Grade 1 to Grade 3? Does vocabulary acquisition in children between 12 and 24 months of age follow a linear or an exponential trajectory? Proper answers to such questions require more sophisticated measurement.
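The claim that ordinal information suffices for questions about the mere existence of an association can be illustrated with a short Python sketch. Spearman's rank correlation depends only on the ordering of scores, so any strictly increasing rescaling (here exponentiation, chosen arbitrarily) leaves it unchanged, whereas Pearson's r generally shifts. The two waves of scores and the helper functions are hypothetical, and ties are deliberately not handled.

```python
import math

def ranks(values):
    """Return 1-based ranks of the values (assumes no ties, for simplicity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = float(rank)
    return result

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical scores for six participants at two waves (made-up data).
wave1 = [2.0, 3.5, 1.0, 4.0, 5.5, 3.0]
wave2 = [2.5, 3.0, 1.5, 4.5, 6.0, 2.0]

# An arbitrary strictly increasing rescaling of the same scores.
wave1_rescaled = [math.exp(v) for v in wave1]
wave2_rescaled = [math.exp(v) for v in wave2]

print("Spearman:", spearman(wave1, wave2), spearman(wave1_rescaled, wave2_rescaled))
print("Pearson: ", pearson(wave1, wave2), pearson(wave1_rescaled, wave2_rescaled))
```

Questions about the size or form of change, by contrast, depend on the spacing between scale points, which is exactly the information an ordinal scale does not fix.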

There is an intimate relationship among theory, methodological design, statistical analysis, and measurement. Many traditional questions about the stability of constructs and the relationship of one construct to another over time can be adequately answered without interval-level measurement. Interval-level measurement does bring added benefits: more powerful statistical tests and a more definitive interpretation of exactly what construct is or is not stable (and to what degree). But as psychologists ask increasingly sophisticated theoretical questions about change over time, and use more complex statistical analyses capable of answering them, interval-level measurement will be required. The exemplary initial demonstrations of the newer statistical models for change have deliberately used interval-level measures. To cite two examples, Cudeck (1996) reported nonlinear models of growth in physical measures (e.g., height) and in the number of correct responses in learning, and McArdle and Nesselroade (2003) emphasized growth models using a Rasch-scaled cognitive measure (the Woodcock-Johnson measure of intelligence). As these newer growth models are applied to current measures of psychological characteristics (e.g., attitudes, traits), the limitations of many of these measures will become more apparent. For example, how can researchers distinguish linear growth from growth to an asymptote if they cannot be confident that the measurements were made on an interval scale? The evidence of measurement quality traditionally cited in reports of instrument development (adequate coefficient alpha, test-retest correlation, and correlations with external criteria) will not be sufficient for longitudinal researchers who wish to model growth with newer statistical models that demand interval-level measurement.
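The asymptote question can be made concrete with a hypothetical simulation. Assume that latent ability grows by exactly the same amount (0.8 logits) at each wave, and that the observed score is the expected number correct on a 10-item test whose items follow a dichotomous Rasch model. Both the item difficulties and the growth rate are invented for illustration. Because the raw score is a nonlinear (though order-preserving) function of the latent trait, the raw-score gains shrink from wave to wave, so a growth model fit to raw scores would suggest deceleration toward a ceiling that is produced by the metric rather than by the construct.

```python
import math

def expected_raw_score(theta, difficulties):
    """Expected number-correct score on a test whose items follow a dichotomous Rasch model."""
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties)

# Hypothetical 10-item test: difficulties from -2.0 to +2.5 logits in steps of 0.5.
difficulties = [-2.0 + 0.5 * i for i in range(10)]

# Assumed latent growth: exactly linear, +0.8 logits per wave, over six waves.
latent = [0.8 * t for t in range(6)]
raw = [expected_raw_score(theta, difficulties) for theta in latent]

latent_gains = [after - before for before, after in zip(latent, latent[1:])]
raw_gains = [after - before for before, after in zip(raw, raw[1:])]

for wave, (lg, rg) in enumerate(zip(latent_gains, raw_gains), start=1):
    print(f"wave {wave - 1} -> {wave}: latent gain {lg:.2f}, raw-score gain {rg:.2f}")
```

With these assumed values every latent gain is 0.80, while the raw-score gains fall steadily from roughly 1.35 to roughly 0.48 points.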

In this chapter we have emphasized four features of longitudinal measurement of psychological characteristics. These features can be viewed as desiderata that help ensure that the measurement of constructs over time is adequate for the study of growth and change. They can be achieved using Rasch or item response theory (IRT) approaches for dichotomous or ordered categorical items and confirmatory factor analysis procedures for continuous items.

1. Scales developed to measure the construct of interest should ideally be unidimensional. In cross-sectional studies, scales with more than one underlying dimension have made the results of studies using them considerably more complex to interpret. Although multidimensional scales may be used in longitudinal studies, interpretation will be challenging because each underlying dimension may change at a different rate over time.

2. Scales should attempt to achieve an interval level of measurement. The same numerical difference at different points on the scale should indicate the same amount of change in the underlying construct.

3. Measurement invariance over time should be established to ensure that the construct has a stable meaning. Each of the items on the instrument should measure the same construct at each measurement wave. The goal is to produce measures that assess only change on the construct and not differential functioning of items as their meaning changes over time.

4. Measures should use items and response formats that are appropriate for the age or grade level of the participants, and the different forms of the measure must be linked and equated onto a single common scale. This practice is common in educational research, where procedures for vertical equating of measures containing both different and overlapping items are well developed. For psychological measures, the need to develop age-appropriate forms will often arise in longer studies that span different developmental periods.
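Desiderata 2 and 4 can be illustrated with a minimal Python sketch under the dichotomous Rasch model, one of the approaches named above. The first part checks the interval property: the log-odds difference between two persons is the same on every item. The second part places two separately calibrated, age-appropriate forms on one scale through their common items, using simple mean-mean linking. This is only one of several linking methods (concurrent calibration and characteristic-curve methods are common alternatives), and all item labels and difficulty values are hypothetical.

```python
import math

def rasch_p(theta, b):
    """P(correct or endorsed) under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

# Desideratum 2 (interval property): on every item, the log-odds difference
# between two persons equals theta_high - theta_low, whatever the item's difficulty.
theta_low, theta_high = -0.5, 1.0
item_difficulties = [-1.5, 0.0, 2.0]  # hypothetical items
log_odds_diffs = [logit(rasch_p(theta_high, b)) - logit(rasch_p(theta_low, b))
                  for b in item_difficulties]

# Desideratum 4 (linking): two age-appropriate forms calibrated separately.
# Each calibration fixes its own arbitrary origin, so the shared anchor items
# (c1, c2) are used to estimate the shift between the two scales (mean-mean linking).
form_a = {"a1": -1.5, "a2": -0.5, "c1": 0.25, "c2": 1.0}  # hypothetical younger form
form_b = {"c1": -0.75, "c2": 0.0, "b1": 0.5, "b2": 1.5}   # hypothetical older form
anchors = sorted(set(form_a) & set(form_b))

shift = sum(form_a[i] - form_b[i] for i in anchors) / len(anchors)
form_b_on_a = {item: b + shift for item, b in form_b.items()}

print("log-odds differences:", [round(d, 6) for d in log_odds_diffs])
print("linking constant:", shift)
print("form B on form A scale:", form_b_on_a)
```

Person estimates obtained on Form B would be moved onto the Form A scale by adding the same constant. In practice, a Rasch calibration program would supply the item difficulties, and the stability of the anchor items across forms would itself need to be checked, which is where desideratum 3 (measurement invariance) comes in.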

Achieving these desiderata will pose different degrees of challenge in different areas of longitudinal research in psychology. Some existing areas, such as the study of physical growth and the growth of cognitive abilities, have long used measures that meet these desiderata. Emerging areas will need to address these issues as they develop new measurement scales. In many other existing areas, researchers will need to rescale existing instruments so that they better meet these desiderata. In each case, however, there will be a clear payoff: researchers will have a substantially enhanced ability to ask, and properly answer, interesting new questions about change in important psychological constructs.

