Phenotyping And Genotyping Strategies For Association Testing

As in all other quantitative genetic studies, the success of an association study depends heavily on accurate evaluation of the phenotype of interest. The within-population variation in genotypes and phenotypes observed in an association study is much greater than that found in most bi-parental mapping populations. While greater variation is preferable when aiming for higher resolution and allele mining, it can make the variation difficult to evaluate accurately and meaningfully in a single environment.

The inherent variation observed in phenotypic trait measurement, when combined with the substantial genetic variation included in some association studies, requires careful experimental design to acquire quality data. In addition, evaluations in multiple environments with controls and unbalanced designs may be required. In our experience with maize, evaluating the germplasm in short-day environments has facilitated some trait evaluation by reducing photoperiod effects between lines. Additionally, evaluating the germplasm in testcrosses (F1 hybrids) has reduced the phenotypic range to a manageable level. Since each of these approaches interacts with the genetic architecture of the traits, future studies will be needed to fully understand the tradeoffs of various study design approaches.

In an association study, genotyping serves two aims: inferring genotype/phenotype associations, and inferring population structure and demography. The first aim, querying candidate regions for polymorphisms, is best achieved by genotyping SNPs within these candidate regions. The second aim, gathering information on population-specific phenomena such as structure, linkage, demography, and kinship, can be achieved by genotyping neutral background markers, such as SNPs in non-coding regions or SSRs (simple sequence repeats), distributed evenly throughout the genome.

All genetic markers can be used for investigating association; however, SNPs potentially have the greatest utility of all marker types. Various assays have been developed for detecting known and unknown SNPs. Some are relatively easy to implement and low in cost; others are designed for high-volume screening at substantial cost. As the cost of genotyping falls, genome-wide scans of all available polymorphisms in a species' genome are rapidly becoming feasible and preferable to targeted SNP genotyping approaches. SSR markers have historically been useful in association studies and do have high information content, but they may be difficult to find in candidate gene regions and are several-fold more expensive to score than SNPs.

For inferences on population history, genotype information from a large number of neutral marker loci is required. We use the term neutral marker loosely here to indicate non-candidate loci, i.e., loci that were not designated as candidates that can putatively influence a trait of interest. The density of the markers required should be scaled to provide genome-wide coverage. Simulation studies suggest that 100 SSR or 200 SNP markers would suffice to get a reasonable estimate of population structure and relatedness for most crop plants (Yu and Buckler, unpublished results).

When targeting candidate loci for association studies, the greatest statistical power is achieved when the marker and QTL have equal allele frequencies in the study population (Abecasis et al. 2001). Equal frequencies create the opportunity for maximal linkage and LD, since robust detection of associations requires that the marker and trait loci be in phase. If the QTL frequency distribution is not known a priori, the best alternative is to choose markers with a wide range of allele frequencies that are likely to mimic the QTL mutation rate. Some SSRs probably mutate faster than QTL and have a different frequency distribution, which may make them less useful for association mapping. SNPs with a wide range of allele frequencies are most likely to be informative. To maximize the information content of SNPs, a large number of them can be chosen to scan a particular genomic region, and numerous algorithms are available for choosing them (Ackerman et al. 2003; Daly et al. 2001; Forton et al. 2005; Gabriel et al. 2002; Halldorsson et al. 2004; Johnson et al. 2001; Ke and Cardon 2003; Patil et al. 2001; Sebastiani et al. 2003; Zhang and Jin 2003).
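The effect of allele frequency matching can be illustrated with the standard LD measure r^2 = D^2 / (pA(1 - pA) pB(1 - pB)). The sketch below is only an illustration and is not taken from the cited studies: it computes the largest r^2 that two biallelic loci can reach given their allele frequencies, using the usual bounds on the LD coefficient D. The function name max_r2 and the example frequencies are hypothetical.

```python
def max_r2(p_marker, p_qtl):
    """Upper bound on r^2 between two biallelic loci.

    p_marker and p_qtl are the frequencies of one allele at each locus
    (hypothetical inputs, each strictly between 0 and 1).
    """
    if not (0.0 < p_marker < 1.0 and 0.0 < p_qtl < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")

    # Largest |D| when the two chosen alleles are in coupling (D > 0).
    d_coupling = min(p_marker * (1 - p_qtl), (1 - p_marker) * p_qtl)
    # Largest |D| when they are in repulsion (D < 0).
    d_repulsion = min(p_marker * p_qtl, (1 - p_marker) * (1 - p_qtl))
    d_max = max(d_coupling, d_repulsion)

    denominator = p_marker * (1 - p_marker) * p_qtl * (1 - p_qtl)
    return d_max ** 2 / denominator


if __name__ == "__main__":
    # Compare a matched pair of frequencies with increasingly mismatched ones.
    for p_m, p_q in [(0.3, 0.3), (0.3, 0.5), (0.1, 0.5)]:
        print(f"marker={p_m:.2f} qtl={p_q:.2f} max r^2={max_r2(p_m, p_q):.3f}")
```

Under these assumptions, the bound reaches 1 only when the marker and QTL frequencies match (or are complementary, which is the same situation with the allele labels swapped), and it drops as the frequencies diverge.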

Whether the trait of interest is binary or quantitative also matters for the association study design. When a binary trait is being investigated, case-control populations are required for association analysis: equivalently sized sub-populations of individuals that display the phenotype of interest (cases) and that do not display it (controls) are queried for statistically significant allelic association of genetic loci with the case and control phenotypes. The statistical test performed is simply a hypothesis test that asks whether or not the allele frequency distribution at a given locus differs between the two sub-populations. Bulked sample genotype screening of all available marker loci, in the manner of Bulk Segregant Analysis (BSA) (Michelmore et al. 1991), may facilitate candidate gene and association discovery for binary traits (Shaw et al. 1998). The challenge of case-control studies is to make sure that the case and control groups are comparable in terms of their genetic makeup. Most of the statistical methods aim to detect and correct for the effects of population stratification and ancestry differences between the case and control groups (Price et al. 2006; Pritchard et al. 2000b).
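As a minimal sketch of the hypothesis test described above, the following compares allele counts at a single biallelic locus between cases and controls with a Pearson chi-square test on a 2x2 table (1 degree of freedom, no continuity correction). The function name and the allele counts are hypothetical, and the sketch makes no correction for population stratification, which the methods cited above address separately.

```python
import math


def allele_chi_square(case_counts, control_counts):
    """Pearson chi-square test on a 2x2 table of allele counts.

    Each argument is a pair (count of allele A, count of allele a).
    Returns (chi-square statistic, p-value) for 1 degree of freedom.
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must contain at least one allele")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under the null of equal allele frequencies.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: 50 cases and 50 controls, two alleles each.
    chi2, p = allele_chi_square(case_counts=(60, 40), control_counts=(40, 60))
    print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

A small p-value here only indicates that allele frequencies differ between the two groups; if the groups also differ in ancestry, that difference may reflect population structure and not a link between the locus and the trait.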
