The GEO database archives large volumes of gene expression data generated by the scientific community. Several different approaches to mining GEO data are outlined in this chapter; each of these methods helps biologists drill down through inherently noisy expression data to genes that are relevant, or behave in a way that is relevant, to their particular area of study.

Making such a large collection of data accessible and analyzable through common interfaces adds a valuable investigative dimension that is not attained when considering isolated experiments. By analyzing multiple, independently generated DataSets that examine similar phenomena, it is possible to substantiate interesting gene expression trends that may have been overlooked, or are borderline, in one experiment alone. Researchers can examine what the preponderance of evidence indicates about the behavior of a gene, or group of genes (7,8). Users can mine GEO for evidence that corroborates laboratory findings, or they may look to GEO for candidate genes worthy of further study in the laboratory. Having sequence information together with expression information can help in the functional annotation and characterization of unknown genes, or in finding novel roles for characterized genes. These data are also valuable to genome-wide studies, allowing biologists to review global gene expression in various cell types and states, to compare with orthologs in other species, and to search for repeated patterns of coregulated groups of transcripts that assist in formulating hypotheses on functional networks and pathways (9-11).

Additionally, integration of GEO data into NCBI's Entrez search engine greatly expands the utility of the data. Entrez is a powerful tool that enables disparate data in multiple databases to be richly interconnected. This can lead to inference of previously unidentified relationships between diverse data types, facilitating novel hypothesis generation, or assisting in the interpretation of available information. Such opportunities for discovery will only increase as the database continues to grow.
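Because GEO is indexed in Entrez, the same records can also be reached programmatically through NCBI's E-utilities. The following is a minimal sketch, not a description of any workflow from this chapter: it assumes the public ESearch endpoint, the Entrez database name `gds` for GEO DataSets, and JSON output. The function names and the example search term are hypothetical and chosen only for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public E-utilities ESearch endpoint (assumed to remain at this address).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_gds_search_url(term, retmax=20):
    """Build an ESearch URL against the Entrez GEO DataSets database (db=gds)."""
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_ESEARCH + "?" + urlencode(params)


def parse_esearch_ids(payload):
    """Return (total hit count, list of Entrez UIDs) from an ESearch JSON response."""
    result = json.loads(payload)["esearchresult"]
    return int(result["count"]), result["idlist"]


def search_gds(term, retmax=20):
    """Run the query over the network; requires internet access."""
    with urlopen(build_gds_search_url(term, retmax)) as response:
        return parse_esearch_ids(response.read().decode("utf-8"))
```

Calling `search_gds("heat shock")` would return the number of matching records and up to `retmax` Entrez UIDs, which could then be passed to other E-utilities calls to follow links into related Entrez databases. Keeping URL construction and response parsing in separate functions lets both be checked without network access.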

The GEO database is under continuous development, so the examples and data presentation strategies described in this chapter may become outdated over time. To keep informed of the latest GEO developments, subscribe to the GEO mailing list at [email protected].

