

Figure 5.8 Plots of predicted values of the DV (Y') against residuals, showing (A) assumptions met, (B) failure of normality, (C) nonlinearity, and (D) heteroscedasticity. Reprinted with permission of Tabachnick and Fidell (2001b), Using multivariate statistics (Boston: Allyn and Bacon).

roughly the same at all values of another continuous variable. Failures of linearity and homoscedasticity of residuals are illustrated in Figure 5.8 (panels C and D).

Heteroscedasticity, the failure of homoscedasticity, occurs because one of the variables is not normally distributed (i.e., one variable is linearly related to some transformation of the other), because there is greater error of measurement of one variable at some levels, or because one of the variables is spread apart at some levels by its relationship to a third variable (measured in the design or not), as seen in Figure 5.9. An example of true heteroscedasticity is the relationship between age (X1) and income (X2), as depicted in Figure 5.9, panel B. People start out making about the same salaries, but with increasing age, people spread farther apart on income. In this example, income is positively skewed, and transformation of income is likely to improve the homoscedasticity of its relationship with age. An example of heteroscedasticity caused by greater error of measurement at some levels of an IV might be self-reported weight across ages. People in the age range of 25 to 45 are probably more concerned about their weight than are people who are younger or older. Older and younger people, then, are likely to give less reliable estimates of their weight, increasing the variance of weight scores at those ages.

Panels: (A) Homoscedasticity with both variables normally distributed; (B) Heteroscedasticity with skewness on X2

Figure 5.9 Bivariate scatter plots under conditions of homoscedasticity and heteroscedasticity. Reprinted with permission of Tabachnick and Fidell (2001b), Using multivariate statistics (Boston: Allyn and Bacon).
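The age and income example can be mimicked numerically. The sketch below is a hypothetical simulation, not data from the source: it assumes that log income equals a + b·age plus an error whose standard deviation is the same at every age (the values of a, b, and the error SD are made up), so raw income is positively skewed and spreads out as age increases. The standard deviation of income at two ages serves as a crude proxy for the width of the scatter-plot envelope. Under these assumptions the population SD ratio on the raw scale is exp(0.03 × 35) ≈ 2.86, while on the log scale it is 1 by construction, so the sample ratios should land near those values.

```python
import math
import random
import statistics

# Hypothetical simulation (assumed parameters, not from the source):
# log(income) = A + B * age + error, with the same error SD at every age,
# so raw income spreads out multiplicatively as age increases.
A, B, ERROR_SD = 10.0, 0.03, 0.3
random.seed(1)

def simulate_incomes(age, n=2000):
    """Draw n positively skewed incomes for people of a given age."""
    return [math.exp(A + B * age + random.gauss(0.0, ERROR_SD)) for _ in range(n)]

young = simulate_incomes(25)
older = simulate_incomes(60)

# Spread of income at each age: a crude proxy for the width of the
# scatter-plot envelope at that point on the age axis.
raw_ratio = statistics.stdev(older) / statistics.stdev(young)
log_ratio = (statistics.stdev([math.log(v) for v in older])
             / statistics.stdev([math.log(v) for v in young]))

print(f"SD ratio, age 60 vs. 25, raw income: {raw_ratio:.2f}")
print(f"SD ratio, age 60 vs. 25, log income: {log_ratio:.2f}")
```

The log transformation is used here only because it matches the multiplicative error assumed in the simulation; with real data the appropriate normalizing transformation has to be chosen by inspecting the distribution.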

Nonlinearity and heteroscedasticity are not fatal to an analysis of ungrouped data because at least the linear component of the relationship between the two variables is captured by the analysis. However, the analysis misses the other components of the relationship (e.g., a quadratic component) unless the researcher enters them explicitly.

Nonlinearity and heteroscedasticity are diagnosed either from residual plots or from bivariate scatter plots. As seen in Figure 5.8 (for residuals) and Figure 5.9 (for bivariate scatter plots), when linearity and homoscedasticity are present, the envelope of points is roughly the same width over the range of values of both variables and the relationship is adequately represented by a straight line. Departures from linearity and homoscedasticity distort the envelope over certain ranges of one or both variables. Normalizing transformations improve linearity and homoscedasticity of the relationship and, usually, the results of the overall analysis.

Sometimes, however, skewness is not just a statistical problem; rather, there is a true nonlinear relationship between two variables, as seen in Figure 5.10, panel A. Consider, for example, the number of symptoms and the dosage of drug. There are numerous symptoms when the dosage is low, only a few symptoms when the dosage is moderate, and lots of symptoms again when the dosage is high, reflecting a quadratic relationship. One alternative to capture this relationship is to use the square of the number of symptoms instead of the number of symptoms in the analysis. Another alternative is to recode dosage into two dummy variables (using linear and then quadratic trend coefficients) and then use the dummy variables in place of dosage in analysis. Alternatively, a nonlinear analytic strategy could be used, such as that available through SYSTAT NONLIN.
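The trend-coding alternative can be sketched for a hypothetical design with three equally spaced dosage levels (low, moderate, high) and the same number of patients at each level; the symptom counts below are invented for illustration. The codes (-1, 0, 1) and (1, -2, 1) are the standard orthogonal polynomial coefficients for linear and quadratic trend with three equally spaced levels. With equal group sizes the two trend variables are centered and uncorrelated, so each regression slope reduces to sum(code × y) / sum(code²) and no matrix algebra is needed. A near-zero linear slope together with a large positive quadratic slope is what a U-shaped pattern like panel A of Figure 5.10 produces.

```python
# Orthogonal polynomial (trend) codes for three equally spaced dosage levels.
LINEAR = {"low": -1, "moderate": 0, "high": 1}
QUADRATIC = {"low": 1, "moderate": -2, "high": 1}

def trend_slopes(dosages, symptoms):
    """Regression slopes for the linear and quadratic trend variables.

    Assumes equal group sizes, so that across cases both trend variables sum
    to zero and are uncorrelated; each multiple-regression slope then equals
    sum(code * y) / sum(code ** 2).
    """
    slopes = []
    for codes in (LINEAR, QUADRATIC):
        c = [codes[d] for d in dosages]  # recode dosage into a trend variable
        slopes.append(sum(ci * yi for ci, yi in zip(c, symptoms))
                      / sum(ci * ci for ci in c))
    return tuple(slopes)

# Hypothetical data: 4 patients per dosage level, U-shaped symptom counts.
dosages = ["low"] * 4 + ["moderate"] * 4 + ["high"] * 4
symptoms = [9, 10, 11, 10, 3, 2, 4, 3, 10, 11, 9, 10]

b_linear, b_quadratic = trend_slopes(dosages, symptoms)
print(f"linear slope = {b_linear:.2f}, quadratic slope = {b_quadratic:.2f}")
```

In a full analysis, the two recoded variables would be entered as predictors in place of dosage, as the text describes; the shortcut formula used here holds only for the balanced, equally spaced case assumed above.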

In panel B of Figure 5.10, two variables have both linear and quadratic relationships. One variable generally gets smaller (or larger) as the other gets larger (or smaller), but there is also a quadratic relationship. For instance, symptoms might drop off with increasing dosage, but only to a point; increasing dosage beyond that point does not result in further change in symptoms. In this case, the analysis improves if both the linear and the quadratic relationships are included.
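For the linear-plus-quadratic case in panel B, the benefit of including both components can be shown by comparing R² for a straight-line fit with R² for a fit that adds dosage squared. The sketch below uses ordinary least squares solved through the normal equations; `fit_polynomial` and `r_squared` are hypothetical helpers written for this example, and the symptom values are invented so that symptoms fall steeply at low dosages and level off at higher ones (they follow 20 - 6x + 0.5x² exactly).

```python
def fit_polynomial(x, y, degree):
    """Ordinary least-squares polynomial fit via the normal equations (X'X)b = X'y."""
    p = degree + 1
    rows = [[xi ** k for k in range(p)] for xi in x]  # columns: 1, x, x**2, ...
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for i in range(p):
            if i != col:
                factor = a[i][col] / a[col][col]
                a[i] = [v - factor * w for v, w in zip(a[i], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]  # [b0, b1, b2, ...]

def r_squared(x, y, coefs):
    """Proportion of variance in y accounted for by the fitted polynomial."""
    y_bar = sum(y) / len(y)
    predicted = [sum(b * xi ** k for k, b in enumerate(coefs)) for xi in x]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical data: symptoms drop with dosage, then level off.
dosage = [1, 2, 3, 4, 5, 6]
symptoms = [14.5, 10.0, 6.5, 4.0, 2.5, 2.0]

linear = fit_polynomial(dosage, symptoms, 1)
linear_plus_quadratic = fit_polynomial(dosage, symptoms, 2)
r2_linear = r_squared(dosage, symptoms, linear)
r2_quadratic = r_squared(dosage, symptoms, linear_plus_quadratic)
print(f"R^2, linear only:        {r2_linear:.3f}")
print(f"R^2, linear + quadratic: {r2_quadratic:.3f}")
```

A quadratic term with a positive coefficient eventually turns back upward, so it approximates "leveling off" only within the range of dosages actually observed.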

Assessing linearity and homoscedasticity through bivariate scatter plots is difficult and tedious, especially with small samples and numerous variables, and more especially when

Panels: (A) Curvilinear; (B) Curvilinear + linear (horizontal axis: DOSAGE)

Figure 5.10 Curvilinear and curvilinear plus linear relationships. Reprinted with permission of Tabachnick and Fidell (2001b), Using multivariate statistics (Boston: Allyn and Bacon).
