Further Analysis

BEARR was not intended to be an all-in-one sequence analysis portal, and a number of more specialized tools are available for more advanced analysis. Following a BEARR analysis, users should statistically assess the significance of the findings to see whether they agree with the prior hypotheses. An appropriate statistical test needs to be chosen, or an ad hoc one properly devised. Beyond that, at the sequence level, users can download the extracted sequences and analyze them using other tools. Discovery of overrepresented motifs, using algorithms such as MEME (8), GLAM (9), or MITRA (10), could potentially uncover important novel binding sites. In addition to further in silico analyses, relevant biological assays should be carefully designed both to validate the findings and to discover new biology.
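As one illustration of such a statistical assessment, the sketch below computes a one-sided hypergeometric p-value for the overrepresentation of a binding-site pattern in a gene set relative to a background set. This is not part of BEARR and is only one of several tests that might be appropriate; the function name and all counts are hypothetical and chosen purely for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """Return P(X >= k) under the hypergeometric distribution.

    N: number of background sequences (hypothetical)
    K: background sequences containing the pattern
    n: sequences in the gene set of interest
    k: gene-set sequences containing the pattern
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probabilities of observing k, k+1, ..., upper hits.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Hypothetical counts: 100 of 1000 background promoters carry the pattern,
# and 12 of the 50 promoters in the gene set carry it.
p_value = hypergeom_upper_tail(k=12, n=50, K=100, N=1000)
print(f"one-sided p-value: {p_value:.3g}")
```

A small p-value would suggest that the pattern occurs in the gene set more often than expected by chance. If many patterns are tested, the p-values should additionally be corrected for multiple testing.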

5. Summary

BEARR offers ease and convenience for experimental scientists who wish to perform high-throughput, large-scale basic cis-regulatory region analysis. Developed as a web-based tool, it is readily accessible and consumes virtually no processing power at the user end. Its two main modules, sequence extraction and sequence analysis, support rapid automatic extraction of regulatory regions based on gene identifiers and provide the first step of binding site analysis. The results of BEARR are easy to interpret, and the raw output can readily be subjected to further in silico investigations.


Development of this tool and the computational resource described here are supported by funding from the Biomedical Research Council (BMRC) of the Agency for Science, Technology, and Research (A*STAR) in Singapore.


1. Vega, V. B., Bangarusamy, D. K., Miller, L. D., Liu, E. T., and Lin, C.-Y. (2004) BEARR: batch extraction and analysis of cis-regulatory regions. Nucleic Acids Res. 32, W257-260.

2. Stormo, G. D. (1990) Consensus patterns in DNA. Methods Enzymol. 183, 211-221.

3. Wingender, E., Dietze, P., Karas, H., and Knuppel, R. (1996) TRANSFAC: a database on transcription factors and their DNA binding sites. Nucleic Acids Res. 24, 238-241.

4. Pruitt, K. D. and Maglott, D. R. (2001) RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res. 29, 137-140.

5. Kleene, S. (1956) Representation of events in nerve nets and finite automata, in Automata Studies (Shannon, C. and McCarthy, J., eds.), Princeton University Press, Princeton, NJ, pp. 3-42.

6. Suzuki, Y., Yamashita, R., Nakai, K., and Sugano, S. (2002) DBTSS: DataBase of human transcriptional start sites and full-length cDNAs. Nucleic Acids Res. 30, 328-331.

7. Metropolis, N. and Ulam, S. (1949) The Monte Carlo method. J. Am. Stat. Assoc. 44, 335-341.

8. Bailey, T. L. and Elkan, C. (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers, in Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, Stanford University, AAAI Press, Menlo Park, CA, pp. 28-36.

9. Frith, M. C., Hansen, U., Spouge, J. L., and Weng, Z. (2004) Finding functional sequence elements by multiple local alignment. Nucleic Acids Res. 32, 189-200.

10. Eskin, E. and Pevzner, P. A. (2002) Finding composite regulatory patterns in DNA sequences, in Special Issue Proceedings of the Tenth International Conference on Intelligent Systems for Molecular Biology (ISMB-2002). Bioinformatics Suppl. 1, S354-363.
