## Implications

1. Generally, there is no clear break in the distribution of p values that can reasonably be used as a basis for setting a threshold. In an ideal dataset for which all assumptions are met, p values are accurate statements of probability. However, the observed distributions of residuals suggest that microarray experiments do not always give accurate p values, and the degree of inaccuracy varies with data quality and with the magnitude of the change. We can therefore protect against overly optimistic p values by applying very stringent multiple-comparison methods. The simplest multiple-test correction is the Bonferroni correction, which divides the significance level by the number of tests and is best suited to a small number of tests. For microarray experiments, in which thousands of genes are tested simultaneously, the correction is extraordinarily strict. The Bonferroni threshold can be considered the upper bound of stringency, in that none of the genes that pass it are expected to be false positives. Despite its stringency, the Bonferroni correction has been used successfully in microarray experiments (Cheng et al. 2002; Wayne and McIntyre 2002; Wittwer et al. 2002; Bennett et al. 2003; Dow 2003). Several more complex corrections have been developed, but they are not widely employed because of their computational complexity (Efron and Tibshirani 2002; Westfall et al. 2002; Reiner et al. 2003).
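The Bonferroni correction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software. The significance level `alpha = 0.05` and the array size of 10,000 probesets are assumed values chosen for the example. The function names are hypothetical.

```python
def bonferroni_threshold(m, alpha=0.05):
    """Per-test cutoff that keeps the chance of any false positive <= alpha."""
    return alpha / m

def bonferroni_pass(p_values, alpha=0.05):
    """Return (threshold, indices of tests with p <= alpha / m)."""
    threshold = bonferroni_threshold(len(p_values), alpha)
    passing = [i for i, p in enumerate(p_values) if p <= threshold]
    return threshold, passing

def bonferroni_adjust(p_values):
    """Equivalent view: Bonferroni-adjusted p values (p * m, capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# For a hypothetical array of 10,000 probesets, the per-test cutoff is
# mathematically 0.05 / 10,000, i.e. 5 x 10^-6.
print(bonferroni_threshold(10_000))
```

With thousands of probesets, the per-test cutoff falls by several orders of magnitude. This is why the text calls the correction extraordinarily strict.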

2. If only one or two probesets pass the Bonferroni threshold when many changes are expected, the experiment was statistically underpowered. If further replication or variance reduction is impractical, use the p values as a ranking method instead: once the probesets are ranked, the most significantly changed probesets can be selected from the top of the list. Although this approach neither addresses the multiple-comparison problem nor provides an analytical threshold, it does order the probesets by the strength of the evidence for change. In such situations, however, no statement of statistical confidence is possible.
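Ranking by p value, as suggested above, amounts to a sort. The sketch below assumes the p values are held in a dictionary keyed by probeset ID. The IDs, values, and the `top_n` parameter are hypothetical. As the text notes, the ranked list carries no statement of statistical confidence.

```python
def rank_by_p(p_by_probeset, top_n=None):
    """Sort (probeset_id, p value) pairs from most to least significant."""
    ranked = sorted(p_by_probeset.items(), key=lambda item: item[1])
    return ranked if top_n is None else ranked[:top_n]

# Hypothetical probeset IDs and p values, for illustration only.
p_values = {"probeset_A": 0.2, "probeset_B": 0.0004, "probeset_C": 0.03}
for probeset, p in rank_by_p(p_values, top_n=2):
    print(probeset, p)
```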

3. The population of p values that pass a threshold is not normally distributed. Most of the values lie close to the threshold, which magnifies the effect that moving the threshold has on the population that passes (Figure 3.14). Thus, most of the probesets that pass the threshold have p values only slightly below it, and whether they pass is very sensitive to the test used and the number of replicates. The range of p values, and the distance of a given gene from the threshold, are much more informative than a simple pass/fail call.
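The crowding near the threshold can be checked numerically for the situation in Figure 3.14 panel A. The sketch assumes the scores follow a standard normal distribution with the cutoff in the upper tail. The cutoffs and band width are hypothetical values. The function computes what fraction of the passing values lie within a narrow band just above the cutoff.

```python
from statistics import NormalDist

def fraction_near_threshold(threshold, width, mu=0.0, sigma=1.0):
    """Among N(mu, sigma) values above `threshold`, the fraction within `width` of it."""
    dist = NormalDist(mu, sigma)
    tail = 1.0 - dist.cdf(threshold)                             # share that passes
    near = dist.cdf(threshold + width) - dist.cdf(threshold)     # share just past the cutoff
    return near / tail

# Hypothetical cutoffs two and three SDs above the mean, band 0.5 SD wide.
for cutoff in (2.0, 3.0):
    print(cutoff, round(fraction_near_threshold(cutoff, 0.5), 3))
```

For a cutoff at 2 SD, roughly 73% of the passing values fall within 0.5 SD of it. The fraction grows as the cutoff becomes stricter, which matches the point made in the text.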

4. We recommend transforming p values to scores (-log10(p value)) so that they can be interpreted more intuitively. Higher scores reflect greater significance, and differences such as that between 0.1 and 0.001 (scores of 1 and 3) are readily seen.
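The recommended score is a one-line transformation. The helper name `p_to_score` below is hypothetical, and rejecting p values outside (0, 1] is an assumed safeguard.

```python
import math

def p_to_score(p):
    """Convert a p value to a -log10 score; higher means more significant."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p value must be in (0, 1]")
    return -math.log10(p)

# 0.1, 0.01 and 0.001 map to scores of approximately 1, 2 and 3.
print([p_to_score(p) for p in (0.1, 0.01, 0.001)])
```

Each tenfold drop in p adds one unit to the score. This is what makes the difference between 0.1 and 0.001 easy to see.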

[Figure 3.14: panels A and B, each marked with the p-value cutoff and an "increased confidence" axis label]

Fig. 3.14 Values that pass a threshold are not normally distributed. (A) Histogram of a normally distributed population to which we apply a threshold. (B) The population that passes the threshold. The newly created list has a majority of probesets with relatively low confidence; that is, most of the probesets are close to the threshold.

## 3.2.2.3 Clustering and Classification

Clustering methods have gained wide acceptance in toxicogenomics. Although they are not statistically rigorous, they are valuable for ordering and exploring data. Clustering is reviewed extensively elsewhere (Wang et al. 1999; Retief 2000; Everitt et al. 2001); this section is intended as a guide to using clustering as an exploratory method in conjunction with statistical tests. The clustering techniques described here are unguided (unsupervised) and can be divided into two types: hierarchical and nonhierarchical. Nonhierarchical techniques include self-organizing maps, k-means, and correlation methods. Clustering also applies a second level of normalization: whereas the MAS5.0 software normalizes the arrays, clustering techniques normalize the probesets.
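The probeset-level normalization and one of the nonhierarchical methods mentioned above can be sketched together. This is an illustration under stated assumptions, not the MAS5.0 procedure or any particular clustering package:

- Each probeset is scaled across arrays to mean 0 and SD 1, which is an assumed choice of normalization.
- Clustering uses plain k-means (Lloyd's algorithm).
- The starting centroids come from a deterministic farthest-first rule, which is our choice.

After this scaling, the squared Euclidean distance between two profiles equals 2(n - 1)(1 - r), where r is their Pearson correlation. Grouping by distance is therefore grouping by correlation. The probeset names and expression values are hypothetical.

```python
import math

def normalize_probeset(values):
    """Rescale one probeset's values across arrays to mean 0 and SD 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if sd == 0:
        return [0.0] * n  # a flat profile has no shape to cluster on
    return [(v - mean) / sd for v in values]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first_centroids(profiles, k):
    """Deterministic start: first profile, then the profile farthest from all chosen ones."""
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        farthest = max(profiles, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids

def kmeans(profiles, k, max_iterations=100):
    """Plain k-means (Lloyd's algorithm); returns a cluster index per profile."""
    centroids = farthest_first_centroids(profiles, k)
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        if new_assignment == assignment:
            break  # converged: no profile changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(profiles, assignment) if a == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

# Hypothetical raw expression values for four probesets on four arrays.
raw = {
    "probeset_A": [1, 2, 3, 4],
    "probeset_B": [10, 20, 30, 40],  # same shape as A, tenfold higher
    "probeset_C": [4, 3, 2, 1],
    "probeset_D": [40, 30, 20, 10],  # same shape as C, tenfold higher
}
names = list(raw)
profiles = [normalize_probeset(raw[name]) for name in names]
print(dict(zip(names, kmeans(profiles, k=2))))
```

Without the per-probeset normalization, B and D would sit far from A and C purely because of their higher intensity. After normalization, A and B fall in one cluster and C and D fall in the other, because they share expression patterns.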
