
Reliability is an index of the dependability of the measurement. Two measures of reliability are currently widely reported in the literature, coefficient alpha and the test-retest correlation.

Coefficient alpha. When the data are collected on a single measurement occasion, Cronbach's (1951) coefficient alpha (α) is typically reported. Conceptually, α can be thought of as the correlation between two equivalent scales of the same length given at the same time.
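As a concrete sketch (not part of the original text; the function name is ours), α can be computed from an n-respondents × k-items score matrix with the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the total score):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) score matrix.

    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # per-item sample variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the total score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)
```

For k identical columns (perfectly parallel items), the formula returns exactly 1.0, its upper bound; weak or negative inter-item covariances pull it toward (or below) zero.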

Coefficient alpha has several little-known properties that may limit its usefulness in application (Cortina, 1993; Feldt & Brennan, 1989; Schmitt, 1996). First, α assumes that all items are equally good measures of the underlying construct, a condition known as essential tau equivalence (see the section on homogeneity for a fuller description). If some items should ideally be weighted more heavily in estimating the true score, then α will underestimate the reliability. Second, α depends on test length. For example, if a 10-item scale had α = .70 and another exactly parallel set of 10 items could be identified, then α for the 20-item scale would be .82. Third, α addresses sources of error that result from the sampling of equivalent items and from potential variability within the measurement period (e.g., within-test variability in level of concentration). It does not address error resulting from sources that may vary over measurement occasions (e.g., daily changes in mood). Fourth, a high level of α does not indicate that a single dimension has been measured. For example, Cortina showed that if two orthogonal dimensions underlie a set of items, even if the intercorrelations between items within each dimension are modest (e.g., average r = .30), α will exceed .70 if the scale has more than 14 items. Even higher values of α will be achieved if the dimensions are correlated. Finally, α may differ for measures collected during different periods of a longitudinal study. Both the variance in the true scores and the variance in the measured scores may change over time, so α can change dramatically. A measure of IQ collected on a group of children at age 4 will typically have a lower α than the same measure collected on the children at age 10. In later sections, we describe alternative approaches that address several of these issues as well as others that arise in longitudinal measurement contexts.
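The test-length example above follows the Spearman-Brown prophecy formula for a test lengthened by a factor m with parallel items; a quick check (the function name is ours):

```python
def spearman_brown(alpha, factor):
    """Projected reliability when a test is lengthened by `factor`
    with parallel items (Spearman-Brown prophecy formula):
    alpha' = m * alpha / (1 + (m - 1) * alpha)
    """
    return factor * alpha / (1 + (factor - 1) * alpha)

# Doubling a 10-item scale with alpha = .70:
print(round(spearman_brown(0.70, 2), 2))  # -> 0.82
```

The same formula run in reverse (factor < 1) projects the reliability of a shortened scale, which is why dropping items typically lowers α even when the retained items are just as good.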

Test-retest correlations. A second method of estimating reliability is to calculate the correlation between the scores on the same set of items taken at two points in time. Test-retest approaches assume that (a) the participants' true scores do not change on the measure during the (short) interval between Time 1 and Time 2 and that (b) responding to the item at Time 1 has no effect on the response at Time 2 (e.g., no memory for prior responses on an ability test). Green (2003) has recently developed a test-retest version of α. Test-retest α eliminates sources of error that change across measurement occasions (e.g., daily mood changes), but otherwise shares the assumptions and properties of traditional α described earlier.

In longer-term studies, the interpretation of the test-retest correlation changes. It can no longer be assumed that there has been no change in the participants' true scores or that all participants change at the same rate. Children and adults change over time in their abilities, personality traits, and physical characteristics such as height and weight. In this case, the test-retest correlation is an estimate of the stability of the measure—the extent to which the (rank) order of the participants at Time 1 is the same as the order of the participants at Time 2. Otherwise stated, the level of the measure (e.g., height) may change over time, but stability is shown to the degree that participants' amount of change is proportional to their initial level on the measure.