Summary Statistics

Every experiment produces a set of data. We refer to the original data, before any statistical processing, as raw data. Summary statistical analysis is always the first step of statistical treatment: it gives a systematic overview of the features of the data and can guide their further presentation and analysis. Moreover, producing both graphical and numerical summaries of a set of data often provides the investigator with important insights that can inform subsequent hypothesis testing (Altman, 1999). An initial summary of the data also determines whether parametric or nonparametric statistical methods are most appropriate when one seeks to address issues of inference (see Section "Statistical Inference Methods" below).

In general, investigators will want to summarize the data as simply as possible so that they can acquire a general sense of what the data look like. For example, categorical variables such as sex and level of education can be summarized by tabulating the frequency of responses in a given category (e.g., the number of males or the number of respondents with a high school education) or by converting these frequencies into percentages (e.g., the number of males divided by the total number of respondents). These summary data can be presented in a table or bar diagram.
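As a minimal sketch, the following Python example tabulates frequencies and percentages for a made-up categorical variable (the data and variable name are purely illustrative):

```python
from collections import Counter

# Hypothetical nominal data: sex of respondents (illustration only)
sex = ["M", "F", "F", "M", "F", "F", "M", "F"]

counts = Counter(sex)                                    # frequency of each category
total = sum(counts.values())
percentages = {k: 100 * v / total for k, v in counts.items()}  # frequencies as percentages

print(counts)        # e.g. Counter({'F': 5, 'M': 3})
print(percentages)   # e.g. {'M': 37.5, 'F': 62.5}
```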

For ordinal or continuous data, it is important to derive point estimates of the center of the distribution, as well as the spread or dispersion of the distribution. The former numerical summary is often called a measure of central tendency, while the latter is called a measure of variability. The most common measures of central tendency are the arithmetic mean (i.e., the sum of all of the observations divided by the number of observations), the median (i.e., the value that lies halfway through the data when they are ranked in order), and the mode (i.e., the most commonly observed value).
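As a brief illustration, a Python sketch using the standard-library statistics module can compute all three measures for a small, made-up set of values:

```python
import statistics

# Hypothetical ordinal/continuous data (illustration only)
scores = [2, 3, 3, 4, 5, 5, 5, 7, 9]

mean = statistics.mean(scores)      # sum of observations / number of observations
median = statistics.median(scores)  # middle value of the ranked data
mode = statistics.mode(scores)      # most frequently observed value

print(mean, median, mode)  # 4.78 (rounded), 5, 5
```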

The mean is by far the most common measure of central tendency because it ties in well with most approaches to inferential statistical analysis (Fisher and van Belle, 1993). However, because the mean is sensitive to extreme values in the distribution (sometimes called outliers), the median often provides the "best" estimate of the center of a distribution.
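This sensitivity is easy to demonstrate with made-up numbers: adding a single extreme value shifts the mean substantially while leaving the median almost unchanged.

```python
import statistics

values = [4, 5, 5, 6, 7]              # hypothetical sample
with_outlier = values + [60]          # same sample plus one extreme value

print(statistics.mean(values), statistics.median(values))              # 5.4, 5
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # 14.5, 5.5
```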

With regard to summarizing the variability of data, the most common measures are the range (i.e., the smallest value subtracted from the largest), quartiles (i.e., the values that divide the ranked data into four equal parts), the interquartile range (i.e., the difference between the third and first quartiles), and percentiles (i.e., the value below which a given percentage of values fall). In general, these measures of dispersion are useful for getting a general sense of the data, but their mathematical properties do not lend themselves to inferential statistical techniques.
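As a sketch, assuming NumPy is available, these measures of dispersion could be computed for a hypothetical sample as follows:

```python
import numpy as np

# Hypothetical continuous data (illustration only)
x = np.array([2, 3, 3, 4, 5, 5, 5, 7, 9])

data_range = x.max() - x.min()               # largest value minus smallest value
q1, q2, q3 = np.percentile(x, [25, 50, 75])  # quartiles (q2 is the median)
iqr = q3 - q1                                # interquartile range
p90 = np.percentile(x, 90)                   # 90th percentile

print(data_range, (q1, q2, q3), iqr, p90)
```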

Another way to quantify variability is to calculate a standardized measure of the average distance of each value from the mean of the distribution. This summary measure, the standard deviation, is a point estimate of the variability of the values in a given distribution.
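A minimal sketch of this calculation, using made-up data and the usual sample formula with n − 1 in the denominator:

```python
import math

# Hypothetical sample (illustration only)
x = [2, 3, 3, 4, 5, 5, 5, 7, 9]
n = len(x)
mean = sum(x) / n

# Sample standard deviation: square root of the average squared deviation
# from the mean, dividing by n - 1 (Bessel's correction).
sd = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))
print(sd)
```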

Both the mean and standard deviation of a given set of ordinal or continuous data provide the cornerstone for parametric methods of statistical inference. Once an investigator derives a general sense of the data and calculates the appropriate summary measures, he or she is well placed to begin to use these data to test a given hypothesis.
