Microarray technology is one of the most important experimental developments in molecular biology in recent years. Microarrays have enabled researchers to conduct large-scale quantitative assessments of gene expression, defining the transcriptome of a multitude of cellular types and states.

The National Center for Biotechnology Information (NCBI) launched the Gene Expression Omnibus (GEO) database in 2000 to support the public use and dissemination of gene expression data generated by high-throughput methodologies (1,2). The database is populated by material supplied by the scientific community. Most researchers submit to GEO in accordance with grant or journal requirements stipulating that microarray data be made available through a public repository, in compliance with long-established standards of scientific reporting that allow others to judge or reproduce the results. Consequently, most of the data presented in GEO have been analyzed and published. GEO is not intended or suitable for initial analysis of newly acquired data, which is typically the role taken by laboratory information management systems (LIMS).

*This chapter is an official contribution of the National Institutes of Health; not subject to copyright in the United States.

From: Methods in Molecular Biology, vol. 338: Gene Mapping, Discovery, and Expression: Methods and Protocols Edited by: M. Bina © Humana Press Inc., Totowa, NJ

The GEO database stores molecular abundance data generated by a wide variety of high-throughput measuring techniques. These include microarray-based experiments that measure gene expression or detect genomic gains and losses (comparative genomic hybridization), as well as genomic tiling arrays that are used to detect transcribed regions or single-nucleotide polymorphisms, or to identify protein-binding genomic regions in conjunction with chromatin immunoprecipitation (ChIP-chip technology). Some non-array-based high-throughput data types are also accepted by GEO, including serial analysis of gene expression (SAGE), massively parallel signature sequencing (MPSS), serial analysis of ribosomal sequence tags (SARST), and some peptide profiling techniques such as tandem mass spectrometry (MS/MS). The data analysis features discussed here are generally applicable to all these technology types, but for the purposes of this chapter the focus is on microarray-generated gene expression data, which currently constitute about 95% of the data in GEO.

At the time of writing, GEO holds over 50,000 submissions, representing approximately half a billion individual molecular abundance measurements, for over 100 organisms, submitted by over 1000 laboratories. These data explore a huge breadth of biological phenomena, for example, mouse models of diabetes, flower development in plants, anthrax sporulation, aging in fruit flies, effect of cigarette smoke on bronchial cells, kidney transplant rejection, toxicological effects of antimalarial drugs, and many others. When one is working with a vast compendium of data, it is important to be able to effectively query the data, focusing on those that are relevant to a specific area of interest. This chapter describes intuitive interfaces and tools that help researchers effectively explore, visualize, and interpret the submitted data. These tools do not require specialized knowledge of microarray analysis methods, nor do they require time-consuming download of large data sets.
