1. Advanced mining tips using Entrez searches.

a. Use "History" in the Entrez tool bar to see your previous queries. Each search is assigned a number and is stored for up to 8 h. Previous queries can be combined to form a new search query.

b. Use the "Display" pull-down menu to find related data in other Entrez resources. For example, suppose that your GEO Profiles search has been narrowed down to a list of 100 candidate genes. You next want to check the Gene Expression Nervous System Atlas (GENSAT) database for complementary expression evidence for those genes. (GENSAT is another NCBI resource that contains expression mapping information for genes in the mouse brain at various stages of development.) Instead of checking each of the 100 genes individually in GENSAT, you would simply select "GENSAT Links" from the Display menu in GEO Profiles, and you are immediately directed to the GENSAT data that correspond to your 100 candidate genes.

c. GEO mining tools, together with the Entrez features described in steps a and b above, can be combined to form very powerful searches. For example, consider DataSets GDS214 and GDS563—these are independent experiments performed on different arrays that compare normal muscle tissue with muscle tissue from patients affected with Duchenne muscular dystrophy. A user could perform the following set of maneuvers to identify genes that are upregulated in Duchenne patients, in both DataSets.

i. Use cluster analysis in GDS214 to visually select clusters of genes that are highly expressed in Duchenne Samples compared with control.

ii. Use the "Get Profiles" button to export these genes to Entrez GEO Profiles.

iii. Select "Gene Links" from the "Display" pull-down menu—this retrieves a list of corresponding curated genes from NCBI's "Gene" database. From the History tab, you can see that this search is assigned #1.

iv. Repeat the above three steps for GDS563—from the History tab, you can see that this search is assigned #2.

v. Combine these two searches by querying Entrez Gene with "#1 AND #2." This retrieves a list of common genes that are found to be upregulated in Duchenne patients in two separate DataSets. The fact that these genes appear to be similarly regulated in both DataSets lends confidence to the results. This also demonstrates a way to effectively perform cross-platform analyses.
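The logic of the "#1 AND #2" query in step v is a set intersection. The following is a minimal sketch of that logic only, not of how Entrez implements it; the gene identifiers and the function name `combine_and` are hypothetical placeholders, not real results from GDS214 or GDS563.

```python
# Hypothetical identifiers standing in for the two Entrez Gene result lists.
search_1 = {"gene_a", "gene_b", "gene_c", "gene_d"}  # placeholder for History #1 (GDS214)
search_2 = {"gene_c", "gene_d", "gene_e"}            # placeholder for History #2 (GDS563)


def combine_and(first, second):
    """Return the identifiers present in both result sets, like '#1 AND #2'."""
    return sorted(first & second)


common = combine_and(search_1, search_2)
print(common)
```

Only identifiers found in both lists survive, which is why agreement between two independent DataSets narrows the candidates.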

d. The "MyNCBI" feature allows users to save searches and retrieve them in a later session, or monitor how a prior saved search is modified in the context of the current, updated database content. To use the many features of MyNCBI, the user must first establish a login name in that system.

2. Profile neighbor links are subject to a cutoff limit. If this limit is reached, bear in mind that there are probably more genes in the DataSet that demonstrate similar behavior. In this case, you might consider using cluster analyses or the "Query mean group A vs B" tool, which are not subject to such limitations.

3. It is important to realize that different cluster methods will generate different results. An underlying assumption of clustering is that genes with similar expression patterns are more likely to have similar biological function. Clustering does not provide proof of this relationship, but it does provide suggestions for data interpretation.

4. The gene expression value bars are plotted on the left y-axis. Note that this scale slides to fit the values of a particular profile. This sliding scale allows subtle differences in values to be more clearly visualized. The ranks are plotted on the right y-axis and are always scaled from 0 to 100%.

5. Binned rank information is provided as a complementary indication of the relative abundance of a gene compared with all other genes on that array. A rank profile that follows the trend of the corresponding value profile provides additional assurance that the data are properly normalized. Keep in mind that cross-gene rank assessments are made with the assumption that all probes are detecting their target with the same efficiency, which may not always be true.
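The idea of a binned rank can be sketched as follows. This is an illustration under stated assumptions, not GEO's actual procedure: the number of bins, the handling of ties (broken by position), and the names `binned_ranks` and `rank_profile` are all hypothetical, as are the values.

```python
import math


def binned_ranks(values, bins=100):
    """Map each value on one array to a rank bin from 1 (lowest) to `bins` (highest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order):
        # position is 0-based; scale it into the range 1..bins
        ranks[index] = math.floor(position * bins / len(values)) + 1
    return ranks


def rank_profile(samples, gene_index, bins=100):
    """Rank bin of one gene in each Sample, ranked against all genes in that Sample."""
    return [binned_ranks(sample, bins)[gene_index] for sample in samples]


# Hypothetical values: three Samples, four genes each; gene 0 rises in the last Sample.
samples = [
    [2.0, 8.0, 5.0, 1.0],
    [2.5, 3.0, 6.0, 1.5],
    [9.0, 3.0, 6.0, 1.0],
]
profile = rank_profile(samples, gene_index=0, bins=4)
print(profile)
```

In this toy case the rank of gene 0 rises in the same Sample where its value rises, which is the kind of agreement between rank and value profiles the note describes.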

6. The Samples within any comparable DataSet are assumed to have been processed similarly. You can verify that Sample values are well distributed and normalized with respect to each other (and thus comparable) by viewing the "value distribution" chart that is provided on each DataSet record under the "analysis" button. This presents a box and whisker plot for each Sample within the DataSet, allowing easy visualization of the value median, spread, and overall range.
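The quantities shown in a box and whisker plot can be computed directly if you have downloaded the Sample values. The sketch below uses Python's standard `statistics` module; the Sample names, the values, the `tolerance` threshold, and the function names are hypothetical, and comparing medians is only one rough way to judge whether Samples are comparable.

```python
import statistics


def five_number_summary(values):
    """Minimum, quartiles, and maximum: the quantities drawn in a box and whisker plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


def medians_comparable(samples, tolerance):
    """True if the Sample medians differ by no more than `tolerance` (an assumed threshold)."""
    medians = [statistics.median(values) for values in samples.values()]
    return max(medians) - min(medians) <= tolerance


# Hypothetical values for two Samples in one DataSet.
samples = {
    "sample_1": [1, 2, 3, 4, 5],
    "sample_2": [2, 3, 4, 5, 6],
}
for name, values in samples.items():
    print(name, five_number_summary(values))
print(medians_comparable(samples, tolerance=1))
```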

7. The "Sort" button on GEO Profile charts lets users re-sort the Samples in the DataSet according to a particular experimental variable. This can make an expression trend easier to see in experiments with multiple variables.
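The effect of re-sorting by an experimental variable can be illustrated with a small sketch. The Sample records, the variable names ("agent", "time_h"), and the values are hypothetical and are not taken from any GEO DataSet.

```python
# Hypothetical Samples annotated with two experimental variables.
samples = [
    {"id": "sample_1", "agent": "treated", "time_h": 24, "value": 7.1},
    {"id": "sample_2", "agent": "control", "time_h": 24, "value": 3.0},
    {"id": "sample_3", "agent": "treated", "time_h": 4, "value": 5.2},
    {"id": "sample_4", "agent": "control", "time_h": 4, "value": 2.9},
]


def sort_samples(samples, variable):
    """Return the Samples ordered by one experimental variable (stable sort)."""
    return sorted(samples, key=lambda sample: sample[variable])


by_agent = [sample["id"] for sample in sort_samples(samples, "agent")]
by_time = [sample["id"] for sample in sort_samples(samples, "time_h")]
print(by_agent)
print(by_time)
```

Sorting by "agent" groups the control Samples ahead of the treated ones, while sorting by "time_h" groups the Samples by time point, so the same values can reveal different trends depending on the ordering.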


