## Implications

1. Once RNA samples are mixed, it is impossible to identify outliers or misclassification events.

2. Unlike standard toxicological endpoints, which provide a single measure per animal, microarrays provide thousands of measures. As a result, subtle environmental effects may generate more outlier transcripts. In a pooled sample these outliers cannot be observed, so they may confound the results undetected.

3. Pooling reduces variance as long as multiple pools are created for control and treatment conditions. That is, the variability of data from three pools of three mice should be lower than the variability obtained from nine mice. With inbred animals this advantage is small.

4. Pooling introduces bias, because a mixed sample is equivalent to an arithmetic average. An arithmetic average is appropriate when the error is a random constant added to the signal. In reality, the predominant error in microarray experiments is a random constant by which the signal is multiplied. Taking the log of the signal makes this factor additive, so that the mean of the log signals is a representative measure of the group. After pooling, this kind of transformation is no longer possible, and any outlier disproportionately affects the pooled signal. As a result, pooled signals are often higher than the mean of the log signal of individual measurements (Figure 3.17).

5. Pooling limits statistical power. The most problematic use of pooling is when no replicate pools are produced and only a single control array is compared to a single treatment. This approach has all the disadvantages of pooling without the key benefit of variance reduction. When all animals are combined into a single pool, the user is forced to select genes on the basis of magnitude alone (Figure 3.18). This approach is sensitive to outliers and precludes statistical testing.

6. Pooling limits the ability to select significantly changed transcripts. It is clear from Figure 3.18 that the genes selected by the different methods are very different. The labels in the figure indicate how those genes would be interpreted if the pooling method were applied. Probable false positives are the many genes that have extreme signal log ratios between pooled treatment and control samples but are not consistently measured. These false positives are termed 'probable' because they were not bench-validated. The cost savings realized by pooling would likely be lost by validating many of these transcripts that appear highly changed in transcript abundance. By contrast, the probable false negatives are very consistent, reproducible differences between treatment and control that do not meet the two- or four-fold requirement. Changes in transcription of these genes are the more subtle changes that are reliable and potentially meaningful but which are missed by pooling.

Fig. 3.18 Information lost by pooling. The y axis is the signal log2 ratio between one pool of controls and one pool of treated samples. The horizontal lines represent two-fold and four-fold cutoff values (the log ratio is base 2). If only two pools are available, these horizontal lines would be the only method available for finding differentially expressed genes. The x axis is the statistical significance. Here, 'statistical significance' is the p value calculated by a t test of the replicates. By taking the -log10 (p value), the scale becomes equivalent to orders of magnitude. This means that 3 represents a 99.9% confidence level. If replicates exist, this test can be performed and genes are selected by the use of the vertical line representing 99% confidence. This dataset contains many significant but small changes that would be missed by pooling.
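The variance argument in item 3 can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not from the text: each animal contributes one value with purely additive Gaussian noise, a pool is the simple mean of its three animals, and there is no additional array-level noise. The function name `simulate` and all parameter values are hypothetical.

```python
import random
import statistics

def simulate(trials=2000, sigma=0.3, seed=0):
    """Compare variance among 9 individual mice with variance among 3 pools of 3.

    Assumption: additive Gaussian noise around a common mean, and a pool
    behaves like the plain average of its members.
    """
    rng = random.Random(seed)
    var_individuals = []
    var_pools = []
    for _ in range(trials):
        # Nine hypothetical per-animal measurements for one transcript
        mice = [rng.gauss(8.0, sigma) for _ in range(9)]
        # Three pools, each the mean of three animals
        pools = [statistics.mean(mice[i:i + 3]) for i in range(0, 9, 3)]
        var_individuals.append(statistics.variance(mice))
        var_pools.append(statistics.variance(pools))
    # Average the sample variances over all simulated experiments
    return statistics.mean(var_individuals), statistics.mean(var_pools)

var_individuals, var_pools = simulate()
print(f"variance among 9 mice:  {var_individuals:.4f}")
print(f"variance among 3 pools: {var_pools:.4f}")
```

Under these assumptions the variance among pools is roughly one third of the variance among individual animals. When the animal-to-animal variance is already small, as with inbred animals, the absolute gain is correspondingly small.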
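The bias described in item 4 can be seen in a small worked example. The sketch below assumes that a pool behaves like the arithmetic mean of the raw signals of its members; the three raw signal values are invented for illustration and are not taken from the dataset.

```python
import math
import statistics

# Hypothetical raw signals for one transcript in three animals;
# the third animal is an outlier (eight-fold higher).
raw = [100.0, 100.0, 800.0]

# Pooling first: the mixed sample is modeled as the arithmetic mean of
# the raw signals, and the log can only be taken afterwards.
pooled_log = math.log2(statistics.mean(raw))

# Individual arrays: take the log of each signal first, then average.
mean_of_logs = statistics.mean([math.log2(x) for x in raw])

print(f"log2 of pooled signal:  {pooled_log:.2f}")    # about 8.38
print(f"mean of log2 signals:   {mean_of_logs:.2f}")  # about 7.64
```

The log of an average is never smaller than the average of the logs, so under this model the pooled value can only match or exceed the per-animal summary, and the gap widens as the outlier becomes more extreme.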
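The two selection methods contrasted in items 5 and 6 and in Figure 3.18 can be sketched for two invented genes. The code assumes three animals per group, models a pool as the arithmetic mean of raw signals, and uses an equal-variance two-sample t test. The helper `p_value_df4` relies on the closed form of the t distribution for exactly 4 degrees of freedom, so it is valid only for this three-versus-three layout. All names and values are hypothetical.

```python
import math
import statistics

def pooled_log_ratio(control, treated):
    """Signal log2 ratio of one treated pool vs one control pool.

    Inputs are per-animal log2 signals; pooling is modeled as the
    arithmetic mean of the raw (unlogged) signals.
    """
    pool_c = statistics.mean([2 ** x for x in control])
    pool_t = statistics.mean([2 ** x for x in treated])
    return math.log2(pool_t) - math.log2(pool_c)

def t_statistic(control, treated):
    """Equal-variance two-sample t statistic (equal group sizes assumed)."""
    n = len(control)
    pooled_var = (statistics.variance(control) + statistics.variance(treated)) / 2
    return (statistics.mean(treated) - statistics.mean(control)) / math.sqrt(pooled_var * 2 / n)

def p_value_df4(t):
    """Two-sided p value for a t statistic with exactly 4 degrees of freedom."""
    u = t * t / (1 + t * t / 4)
    cdf = 0.5 + 0.375 * (abs(t) / math.sqrt(1 + t * t / 4)) * (1 - u / 12)
    return 2 * (1 - cdf)

control = [8.0, 8.1, 7.9]
subtle = [8.5, 8.6, 8.4]    # small but consistent increase
outlier = [8.0, 8.1, 11.7]  # one extreme animal, others unchanged

subtle_ratio = pooled_log_ratio(control, subtle)
subtle_p = p_value_df4(t_statistic(control, subtle))
outlier_ratio = pooled_log_ratio(control, outlier)
outlier_p = p_value_df4(t_statistic(control, outlier))

for name, ratio, p in [("subtle", subtle_ratio, subtle_p),
                       ("outlier", outlier_ratio, outlier_p)]:
    print(f"{name}: pooled log2 ratio = {ratio:.2f}, -log10(p) = {-math.log10(p):.2f}")
```

In this sketch the consistent gene fails the two-fold cutoff (log2 ratio below 1) yet passes the 99% confidence threshold, which corresponds to a probable false negative under pooling. The outlier-driven gene passes the fold cutoff but is not significant, which corresponds to a probable false positive.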
