3.1. Setups for the General Procedures

1. To create a database, download the Toolkit from bina/Data.htm. The programs in the Toolkit were written to populate and access the database tables via the DBI module, which is database independent (see Note 3). Other database engines (e.g., Oracle) may be used by installing the corresponding database-dependent Perl DBD module (see Note 3) and defining the resulting database name in the environment variable DBI_DSN. Subheading 3.5. describes the programs included in the Toolkit. In addition to the programs, the Toolkit includes two Perl modules that contain special-function packages used by the scripts.

2. Place the two Perl modules (Util9mer and mySequence::Experiment) in Perl's site library directory (see Note 5). Put Util9mer directly in the site/lib directory and put Experiment in a subdirectory named mySequence.
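The placement in step 2 might be scripted roughly as follows. This is only a sketch: the file names Util9mer.pm and Experiment.pm are assumptions based on Perl's convention that a package named mySequence::Experiment is loaded from mySequence/Experiment.pm, and the function name and the Toolkit directory in the usage comment are hypothetical.

```shell
# Copy the Toolkit's two Perl modules into a Perl library directory.
# Assumed file names: Util9mer.pm and Experiment.pm (not stated in the text).
install_toolkit_modules() {
    src_dir=$1    # directory holding the unpacked Toolkit modules
    lib_dir=$2    # Perl site library directory
    mkdir -p "$lib_dir/mySequence" || return 1
    cp "$src_dir/Util9mer.pm" "$lib_dir/" || return 1
    cp "$src_dir/Experiment.pm" "$lib_dir/mySequence/" || return 1
}

# Example use (may need root access); perl reports its own site library path:
#   install_toolkit_modules ./Toolkit "$(perl -MConfig -e 'print $Config{sitelib}')"
```

Passing the library directory as an argument keeps the function usable for a private library directory as well as for the system-wide one.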

3. Download and install the Getopt::Long module, which is used to parse the command-line parameters of the Toolkit's programs (see Note 3). The installation of Getopt::Long demonstrates the general method for installing Perl modules. Assuming that the tape archive (tar) file is in your current working directory, use the following commands to unpack, build, test, and install the module:

shell> tar -xvf Getopt-Long-2.28.tar
shell> cd Getopt-Long-2.28
shell> perl Makefile.PL
shell> make
shell> make test
shell> make install

The last of these commands requires administrative (root) access to place the files properly in the Perl directory tree.
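After an installation, it can be useful to confirm that Perl actually finds the module. The following is a minimal sketch; the function name check_perl_module is hypothetical, and the check only tests whether the module loads, not which version is present.

```shell
# Report whether Perl can load a given module (for example, after "make install").
check_perl_module() {
    if perl "-M$1" -e 1 >/dev/null 2>&1; then
        echo "$1: installed"
    else
        echo "$1: NOT found"
        return 1
    fi
}

check_perl_module Getopt::Long || echo "Getopt::Long still needs to be installed."
```

The same function can be called with the name of any other module that the Toolkit or BioPerl requires.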

4. Download and install the BioPerl modules in the site/lib subdirectories (see Note 4). BioPerl comes with an installer, quite similar to the one described in Subheading 3.1., step 3, that places the packages correctly into the Perl directory tree (see Notes 2 and 5). The Toolkit uses several of the modules in this package. BioPerl requires several other modules that you will need to install (see Note 3). The installation of each is similar to the Getopt::Long and BioPerl installations.

5. Set the environment variables required for the Toolkit programs: SEQUENCE_ARCHIVE (root directory of the promoter sequences to be registered in the database); SEQUENCE_SOURCE (default directory of genomic DNA sequences that the user may want to examine, e.g., for research or publication); STORED_PROCEDURES (default directory for prewritten queries used by the scripts); DBI_USER (database user name); and DBI_DSN (database interface and database name, e.g., mysql:RFgenomeDB) (see Note 6).
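In a Bourne-type shell, the variables of step 5 could be set along the following lines. All directory paths and the user name are placeholders chosen for illustration. The DBI_DSN value is copied from the text; DBI itself documents data source names with a leading dbi: prefix (dbi:mysql:RFgenomeDB), so it is worth checking which form the Toolkit scripts expect.

```shell
# Environment for the Toolkit programs. All paths and the user name below
# are placeholders (assumptions); substitute your own locations.
export SEQUENCE_ARCHIVE="$HOME/toolkit/promoters"   # root directory of promoter sequences
export SEQUENCE_SOURCE="$HOME/toolkit/genome"       # default directory of genomic DNA sequences
export STORED_PROCEDURES="$HOME/toolkit/queries"    # default directory of prewritten queries
export DBI_USER="joe"                               # database user name
export DBI_DSN="mysql:RFgenomeDB"                   # database interface and database name

# Warn about any variable that is still empty.
for var in SEQUENCE_ARCHIVE SEQUENCE_SOURCE STORED_PROCEDURES DBI_USER DBI_DSN; do
    eval "value=\${$var:-}"
    [ -n "$value" ] || echo "Warning: $var is not set" >&2
done
```

Placing these lines in a shell startup file makes the settings available in every new session.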

6. Download a copy of the genome of interest (e.g., human genome) in FASTA format. The sequences of the chromosomes can be retrieved from the FTP site of the Genome Browser at the University of California at Santa Cruz (UCSC; see Note 7). The page includes links to the listed genomes (see Note 8). Follow the instructions at the UCSC website to download the data that you want. In a previous publication, we analyzed an older version of the sequences of the human chromosomes (4). For human sequences you can find all FTP downloads at ftp://hgdownload.cse. The May 2004 sequence set is at ftp://hgdownload.cse.ucsc.edu/goldenPath/hg17/bigZips; the dataset split up into one file per chromosome is at

7. Download the sequences of promoter regions of interest in a single FASTA-formatted file (see Note 9). These can be downloaded from the Genome Browser at UCSC (9-11).
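Before registering downloaded sequences, a quick sanity check on the FASTA files may save time. The sketch below counts records by counting header lines, which begin with ">" in FASTA format; the function name and the file name in the usage comment are hypothetical.

```shell
# Count the records in a FASTA file; each record starts with a ">" header line.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Example use on a downloaded promoter file (hypothetical name):
#   count_fasta_records promoters.fa
```

A count of zero, or one that differs from the number of promoter regions requested, suggests a truncated or misformatted download.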

3.2. Initializing the Database

In MySQL, as in any other SQL database engine, you use the "CREATE DATABASE" statement to make an empty database (e.g., RFgenomeDB) that can hold the database tables to be populated with records (see Note 10). Subsequently, you grant access privileges to yourself (e.g., joe) and to other users. The "GRANT" command provides access to the database for all or specified users (see Note 11).

shell> mysql -u root
mysql> CREATE DATABASE RFgenomeDB;
mysql> GRANT ALL PRIVILEGES ON RFgenomeDB.* TO "[email protected]";
mysql> quit
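The same initialization can also be prepared as a script, so that the statements can be reviewed before they are run. This is a sketch under stated assumptions: the account 'joe'@'localhost' and the file name init_RFgenomeDB.sql are placeholders, and IF NOT EXISTS is added so that re-running the file does not fail on an existing database.

```shell
# Write the initialization statements to a file so they can be reviewed first.
# The account 'joe'@'localhost' is a placeholder (assumption).
cat > init_RFgenomeDB.sql <<'EOF'
CREATE DATABASE IF NOT EXISTS RFgenomeDB;
GRANT ALL PRIVILEGES ON RFgenomeDB.* TO 'joe'@'localhost';
EOF

# Run the file as the MySQL root user (uncomment once the statements look right):
# mysql -u root < init_RFgenomeDB.sql
```

In MySQL 8.0 and later, GRANT no longer creates accounts implicitly, so the account would have to be created with CREATE USER before the file is run.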
