## Regression

Correlation provides information only about the strength and direction of association between variables, but often we want to know more than just whether two variables are associated; we want to be able to predict the value of one variable, given a value of the other.

A positive correlation exists between body weight of parents and body weight of their offspring; this correlation exists in part because genes influence body weight, and parents and children share genes. Because of this association between phenotypes of parent and offspring, we can predict the weight of an individual on the basis of the weights of its parents. This type of statistical prediction is called regression. This technique plays an important role in quantitative genetics because it allows us to predict characteristics of offspring from a given mating, even without knowledge of the genotypes that encode the characteristic.

Regression can be understood by plotting a series of x and y values. Figure 22.13 illustrates the relation between the weight of fathers (x) and the weight of their offspring (y). Each father-offspring pair is represented by a point on the graph. The overall relation between these two variables is depicted by the regression line, which is the line that best fits all the points on the graph (deviations of the points from the line are minimized). The regression line defines the relation between the x and y variables and can be represented by

$$y = a + bx \tag{22.8}$$

Figure 22.13. A regression line defines the relation between two variables. Illustrated here is a regression of the weights of fathers against the weights of sons. Each father-offspring pair is represented by a point on the graph: the x value of a point is the father's weight and the y value of the point is the offspring's weight.

In Equation 22.8, x and y represent the x and y variables (in this case, the father's weight and the offspring's weight, respectively). The variable a is the y intercept of the line, which is the expected value of y when x is 0. Variable b is the slope of the regression line, also called the regression coefficient; it indicates how much y changes, on average, per unit increase in x.

Trying to position a regression line by eye is not only very difficult but also inaccurate when there are many points scattered over a wide area. Fortunately, the regression coefficient and y intercept can be obtained mathematically. The regression coefficient (b) can be computed from the covariance of x and y ($\text{cov}_{xy}$) and the variance of x ($s_x^2$) by

$$b = \frac{\text{cov}_{xy}}{s_x^2}$$
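The calculation of the regression coefficient can be sketched in a few lines of Python. This is only an illustration: the function name `regression_coefficient` and the father and son weights are hypothetical values made up for the example, and the covariance and variance are assumed to be sample estimates (dividing by n - 1).

```python
def regression_coefficient(x, y):
    """Return the slope b = cov_xy / s_x^2 using sample (n - 1) estimates."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance of x and y
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    # Sample variance of x
    var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    return cov_xy / var_x


# Hypothetical weights in kilograms (not data from the text)
father_weights = [60.0, 65.0, 70.0, 75.0, 80.0]
son_weights = [62.0, 64.0, 69.0, 71.0, 74.0]

b = regression_coefficient(father_weights, son_weights)
print(f"regression coefficient b = {b:.2f}")
```

Because the same divisor (n - 1) appears in both the covariance and the variance, it cancels in the ratio, so the slope would be the same if both were divided by n instead.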

Several regression lines with different regression coefficients are illustrated in Figure 22.14.

Figure 22.14. The regression coefficient (b) represents the change in y per unit change in x. Shown here are regression lines with different regression coefficients.

After the regression coefficient has been calculated, the y intercept can be calculated by substituting the regression coefficient and the mean values of x and y into the following equation:

$$a = \bar{y} - b\bar{x}$$

The regression equation (y = a + bx) can then be used to predict the value of any y given the value of x.
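The full procedure (slope from the covariance and variance, intercept from the means, then prediction) might look like the following minimal Python sketch. The function names `fit_regression_line` and `predict_y` and all of the weights are hypothetical, chosen only to illustrate the arithmetic; sample (n - 1) estimates are assumed.

```python
def fit_regression_line(x, y):
    """Return (a, b) for the line y = a + bx fitted to paired x and y values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    b = cov_xy / var_x           # regression coefficient (slope)
    a = mean_y - b * mean_x      # y intercept from the means of x and y
    return a, b


def predict_y(a, b, x_value):
    """Predict y for a given x using the regression equation y = a + bx."""
    return a + b * x_value


# Hypothetical weights in kilograms (not data from the text)
father_weights = [60.0, 65.0, 70.0, 75.0, 80.0]
son_weights = [62.0, 64.0, 69.0, 71.0, 74.0]

a, b = fit_regression_line(father_weights, son_weights)
predicted = predict_y(a, b, 72.0)  # predicted weight of a son whose father weighs 72 kg
print(f"a = {a:.2f}, b = {b:.2f}, predicted weight = {predicted:.2f} kg")
```

With this choice of intercept, the fitted line always passes through the point defined by the mean of x and the mean of y, so a father of average weight is predicted to have a son of average weight.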