The Repository

Navigation (registered account and login required): "My Data" menu: "My Repository" option.

Registered users may save retrieved data and/or cluster analyses in their "repository." Preclustering files, or the output of the cluster analysis, may be saved at any stage of data retrieval or analysis (see Subheadings 3.2.4. and 3.2.5.), and a summary of all the options selected during retrieval is saved with them for later viewing. Alternatively, preclustering and cluster files may be uploaded from the user's desktop computer for use with SMD's tools. Users may control access to the items in their repositories, which makes the repository useful for collaborative research as well as for storing intermediate or final analysis results. Figure 3 shows an example repository, with the various options for examining or analyzing the data.
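Preclustering files that are saved to or uploaded into the repository are tab-delimited text. As an illustration only, the Python sketch below reads such a file, assuming the PCL layout used by the Cluster and TreeView programs: a header row of unique ID, NAME, GWEIGHT and experiment names, an optional EWEIGHT row, and one row per reporter with blank cells for missing values. The function `read_pcl` is hypothetical and is not part of SMD; SMD's own help pages describe the exact formats it accepts.

```python
import math


def read_pcl(text):
    """Parse PCL-style text into (experiment_names, rows).

    Assumed layout (hypothetical sketch, not SMD's parser):
      row 1: UID, NAME, GWEIGHT, then one column per experiment
      row 2 (optional): EWEIGHT, blanks, then per-experiment weights
      other rows: reporter ID, name, gene weight, then values;
      an empty cell means the value is missing.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split("\t")
    exp_names = header[3:]
    rows = []
    for line in lines[1:]:
        fields = line.split("\t")
        if fields[0] == "EWEIGHT":
            continue  # experiment weights are ignored in this sketch
        cells = fields[3:3 + len(exp_names)]
        # Empty cells become NaN so later steps can detect missing values.
        values = [float(c) if c.strip() else math.nan for c in cells]
        # Trailing empty cells may have been trimmed when the file was saved.
        values += [math.nan] * (len(exp_names) - len(values))
        rows.append({"uid": fields[0], "name": fields[1], "values": values})
    return exp_names, rows
```

Representing missing cells as `NaN` keeps the matrix rectangular, which is what imputation and decomposition steps further along in the repository workflow expect.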

From the repository, users have access to a variety of tools.

3.5.1. Reentry into Data Retrieval and Analysis Process

Navigation (registered account and login required):

"My Data" menu: "My Repository" option: "Filter" or "to-cluster" icon.

Preclustering files stored in the repository may be filtered or clustered as described in Subheadings 3.2.4. and 3.2.5. This makes it convenient to try clustering with several different metrics, or to use and compare multiple filters for reporter (gene) response to experimental conditions, and so forth.
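Re-clustering the same preclustering file with different metrics can change which reporters are grouped together. The sketch below is an illustration, not SMD's code: it compares Euclidean distance with a Pearson correlation distance (1 - r, centered or uncentered), a common family of metrics for hierarchical clustering of expression data. The function names are hypothetical and do not correspond to SMD's option labels.

```python
import math


def euclidean(x, y):
    """Straight-line distance; sensitive to the magnitude of the profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x, y, centered=True):
    """1 - r, where r is the Pearson correlation (uncentered if centered=False).

    Depends only on the shape of the profiles, not their scale.
    """
    n = len(x)
    mx = sum(x) / n if centered else 0.0
    my = sum(y) / n if centered else 0.0
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - num / den
```

For example, with `a = [1, 2, 3, 4]`, `b = [2, 4, 6, 8]` and `c = [1.5, 1.5, 2.5, 2.5]`, the correlation distance treats `a` and `b` as identical (same shape, twice the magnitude), whereas Euclidean distance places `a` closer to the flatter profile `c`.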

3.5.2. View Clustering Results

Navigation (registered account and login required):

"My Data" menu: "My Repository" option: Clustered heat map or "TreeView" icon.

Clustered data may be displayed using either the GeneXplorer (10) or Java TreeView (11) software.

3.5.3. KNN Impute

Navigation (registered account and login required):

"My Data" menu: "My Repository" option: Two-headed arrow "impute" icon.

Many statistical analysis methods, such as singular value decomposition (see Subheading 3.5.4.), require a complete data set with no missing values. However, values are commonly missing from microarray data, owing to poor measurement quality or to the combination of data from different array platforms with disjoint sets of reporters. The K-nearest neighbors imputation algorithm (KNN Impute) replaces each missing value with an estimate computed from the values present for the reporters (genes) most similar to the one with the missing value (15). The original file is preserved, and the new file (with imputed values) may be analyzed, downloaded, or saved in the repository.
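To make the idea concrete, the following is a minimal NumPy sketch of KNN imputation in the spirit of ref. 15; it is not SMD's implementation. It assumes a genes-by-experiments array with `NaN` marking missing values, measures similarity as the root-mean-square difference over the experiments two genes share, and weights each neighbor's value by inverse distance. The function name, the distance normalization and the default `k` are illustrative choices.

```python
import numpy as np


def knn_impute(data, k=10):
    """Return a copy of `data` (genes x experiments, NaN = missing) in which
    each missing entry is estimated from the k most similar genes that have
    a value in that experiment."""
    data = np.asarray(data, dtype=float)
    filled = data.copy()
    present = ~np.isnan(data)
    for i, j in zip(*np.where(~present)):
        dists = []
        for g in range(data.shape[0]):
            # A neighbour must be a different gene with a value in experiment j.
            if g == i or not present[g, j]:
                continue
            shared = present[i] & present[g]
            if not shared.any():
                continue
            # RMS difference over shared experiments (computed on the original
            # data, so imputed values never feed into later distances).
            diff = data[i, shared] - data[g, shared]
            dists.append((np.sqrt(np.mean(diff ** 2)), g))
        if not dists:
            continue  # nothing to estimate from; leave the value missing
        dists.sort()
        nearest = dists[:k]
        # Inverse-distance weights; the small constant avoids division by zero.
        weights = np.array([1.0 / (d + 1e-12) for d, _ in nearest])
        values = np.array([data[g, j] for _, g in nearest])
        filled[i, j] = np.sum(weights * values) / np.sum(weights)
    return filled
```

As in SMD, the input is left untouched and a new array is returned, so the original and imputed versions can be compared.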

3.5.4. Singular Value Decomposition

Navigation (registered account and login required):

"My Data" menu: "My Repository" option: "SVD" icon.

[Figure 3 screenshot: an example repository listing stored data sets with date, file type, numbers of genes and experiments, and file size, plus icons to view the data description, download data, cluster data, filter and select data, perform singular value decomposition, collapse data by genes or synthetic genes, view clusters with GeneXplorer or Java TreeView, view clustered heat maps and/or clustered spot images, and estimate missing data with KNN Impute.]

Fig. 3. The Repository. Researchers may store analyses at various stages. The various icons lead to analysis tools, as shown.

Singular value decomposition (SVD) is an unsupervised method for finding underlying patterns in data. It may be used for discovery, or for removing systematic biases from the data (e.g., see refs. 16 and 17). Complete data are required; if KNN Impute or some other imputation method has not been applied, SMD's SVD tool offers to fill in missing values by simple row averaging. Extensive help documentation is available within SMD.
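The following minimal NumPy sketch (an illustration under stated assumptions, not SMD's SVD tool) shows the two steps mentioned above: filling missing values with row averages, then decomposing the matrix and subtracting selected components, in the spirit of the filtering described in ref. 16. Which components reflect systematic bias must be judged from the data; removing the first component by default is only an example, and the function names are hypothetical.

```python
import numpy as np


def row_average_impute(data):
    """Replace each NaN with the mean of the present values in its row.
    (A row with no values at all stays NaN.)"""
    data = np.array(data, dtype=float)
    row_means = np.nanmean(data, axis=1)
    rows, cols = np.where(np.isnan(data))
    data[rows, cols] = row_means[rows]
    return data


def eigen_fractions(data):
    """Fraction of the overall signal captured by each singular component."""
    s = np.linalg.svd(np.asarray(data, dtype=float), compute_uv=False)
    return s ** 2 / np.sum(s ** 2)


def remove_components(data, components=(0,)):
    """Return a copy of `data` with the listed singular components subtracted."""
    data = np.asarray(data, dtype=float)
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    cleaned = data.copy()
    for c in components:
        # Each component is a rank-1 matrix: s_c * u_c * v_c^T.
        cleaned -= s[c] * np.outer(u[:, c], vt[c])
    return cleaned
```

Inspecting `eigen_fractions` before calling `remove_components` is one way to see whether a single component dominates the data, as a systematic bias often does.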

4. To Learn More

Extensive online documentation, as well as PowerPoint and video copies of SMD beginner and advanced tutorials, is available at SMD (http://smd.stanford.edu/). In addition, those interested in installing a local version of SMD may find useful information on the SMD developers' forum, http://smdforum.stanford.edu/smdforum/.


1. Ball, C. A., Awad, I. A., Demeter, J., et al. (2005) The Stanford Microarray Database accommodates additional microarray platforms and data formats. Nucleic Acids Res. 33, (Database issue) D580-582.

2. Schena, M., Shalon, D., Davis, R. W., and Brown, P. O. (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467-470.

3. Brazma, A., Hingamp, P., Quackenbush, J., et al. (2001) Minimum information about a microarray experiment (MIAME)—toward standards for microarray data. Nat. Genet. 29, 365-371.

4. Diehn, M., Sherlock, G., Binkley, G., et al. (2003) SOURCE: a unified genomic resource of functional annotations, ontologies, and gene expression data. Nucleic Acids Res. 31, 219-223.

5. Boyle, E. I., Weng, S., Gollub, J., et al. (2004) GO:TermFinder—open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics 20, 3710-3715.

6. Eisen, M. B., Spellman, P. T., Brown, P. O., and Botstein, D. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95, 14863-14868.

7. Everitt, B. (1974) Cluster Analysis 122, Heinemann, London.

8. Kohonen, T. (1995) Self-Organizing Maps. Springer, Berlin.

9. Tamayo, P., Slonim, D., Mesirov, J., et al. (1999) Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc. Natl. Acad. Sci. USA 96, 2907-2912.

10. Rees, C. A., Demeter, J., Matese, J. C., Botstein, D., and Sherlock, G. (2004) GeneXplorer: an interactive web application for microarray data visualization and analysis. BMC Bioinformatics 5, 141.

11. Saldanha, A. J. (2004) Java Treeview—extensible visualization of microarray data. Bioinformatics 20, 3246-3248.

12. Ashburner, M., Ball, C. A., Blake, J. A., et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25-29.

13. Wang, J., Nygaard, V., Smith-Sorensen, B., Hovig, E., and Myklebost, O. (2002) MArray: analysing single, replicated or reversed microarray experiments. Bioinformatics 18, 1139-1140.

14. Gentleman, R. C., Carey, V. J., Bates, D. M., et al. (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5, R80.

15. Troyanskaya, O., Cantor, M., Sherlock, G., et al. (2001) Missing value estimation methods for DNA microarrays. Bioinformatics 17, 520-525.

16. Alter, O., Brown, P. O., and Botstein, D. (2000) Singular value decomposition for genome-wide expression data processing and modeling. Proc. Natl. Acad. Sci. USA 97, 10101-10106.

17. Nielsen, T. O., West, R. B., Linn, S. C., et al. (2002) Molecular characterisation of soft tissue tumours: a gene expression study. Lancet 359, 1301-1307.
