3.1. Retrieving Strongly Conserved Noncoding Genomic Intervals With Galaxy/UCSC Table Browser

1. Enter the Galaxy portal by pointing your Internet browser to the URL http://www. (Table 1) and clicking on "Galaxy."

2. At the Galaxy portal, you are presented with a few options. The first is to go to the UCSC Table Browser to retrieve any of the rich variety of data recorded there and automatically upload it to Galaxy. However, for phastCons and RP scores, it is more efficient to choose "Galaxy featured datasets" (see Note 1). On the new page, select the genome of the species of interest (e.g., Human) and the desired sequence assembly (e.g., hg17: May 2004) (see Note 2). The available options are specific to the genome assembly; for example, hg17 currently offers:

a. Known regulatory regions [93 regions].

b. phastCons (stringent, top approx 5%) [1,313,584 regions].

c. phastCons (sensitive, ≥0.2) [26,277,600 regions].

d. Regulatory potential (3way, human-mouse-dog, >0) [5,800,931 regions].

3. Regarding phastCons scores, select option b for regions under intense constraint or option c for increased sensitivity (see Note 3). Then click on the button labeled "Go." The results are added to your history page, which is displayed on your computer.

4. The next step is to retrieve the locations of all exons so that they can be removed from the high phastCons intervals. Return to the Galaxy Portal by clicking on "Portal" in the top row of your browser window. At the Portal, click on the link to the UCSC Table Browser.

5. To retrieve exons, use the Table Browser pull-down menus to select "Genes and Gene Prediction Tracks" under the category of "group" and "Known Genes," found under "track" (see Note 4). If desired, the query can be limited to a particular genomic interval using the window labeled "position" (see Note 5). Because you entered the Table Browser via Galaxy, the default for "output format" is "send data to Galaxy." Now click on "get output."

6. A window appears that gives you the option to select whole genes, exons, coding exons, and so on. Select "Exons," and click on "Send query to Galaxy" (see Note 6).

7. This returns the user automatically to the Galaxy History Page (see Note 7), where each query appears as a short description (see Note 8) followed by the number of results retrieved.

8. In preparation for performing an operation, you need to select the desired datasets. Select the boxes for the queries of high phastCons scores and exons. Now select "Perform operations like intersection, etc." and click on "Go."

9. On the Query Operations page, the two queries now appear, and you should click on the box next to the operation "Subtraction." The screen automatically refreshes. Use the pull-down menus to determine the order and type of subtraction. In this case, it should be the query for phastCons intervals minus the query for the Known-Genes exons, removing "only overlapping segments." Click on "Go" (see Note 9).

10. You are returned automatically to the History Page, which will show the number of results when the operation has completed. If the operation is listed as "running," click "Refresh" periodically until the operation is finished. The resulting genomic intervals are noncoding, highly conserved DNA segments, which are one class of candidates for CRMs.

11. Galaxy provides several forms of output, which are accessed by clicking "Get output" followed by "Go." At the Display Options page, select "Genome Browser" to view each of the returned intervals in the UCSC Genome Browser (see Note 10), or "Raw result file" to obtain a file with the desired genomic intervals. Other options include viewing the results in the Ensembl browser (see Note 11).
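
The subtraction in step 9 can be illustrated outside Galaxy with a minimal Python sketch. It is not Galaxy's implementation. It assumes that intervals are (chromosome, start, end) tuples with half-open coordinates and that removing "only overlapping segments" means trimming away the overlapped bases while keeping the rest of each conserved interval. The function name subtract_intervals and the example coordinates are hypothetical.

```python
from collections import defaultdict


def subtract_intervals(a, b):
    """Return the parts of the intervals in `a` not covered by any interval in `b`.

    Assumption: intervals are (chrom, start, end) tuples with half-open
    coordinates, i.e. start is included and end is excluded.
    """
    # Group the intervals to subtract by chromosome, sorted by start.
    b_by_chrom = defaultdict(list)
    for chrom, start, end in b:
        b_by_chrom[chrom].append((start, end))
    for chrom in b_by_chrom:
        b_by_chrom[chrom].sort()

    result = []
    for chrom, start, end in a:
        cursor = start  # leftmost position not yet removed or emitted
        for b_start, b_end in b_by_chrom.get(chrom, []):
            if b_end <= cursor:
                continue  # lies entirely to the left of what remains
            if b_start >= end:
                break  # sorted by start, so nothing later can overlap
            if b_start > cursor:
                result.append((chrom, cursor, b_start))  # keep the gap
            cursor = max(cursor, b_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((chrom, cursor, end))  # keep the right-hand remainder
    return result


# Hypothetical coordinates, for illustration only.
conserved = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 80)]
exons = [("chr1", 150, 160), ("chr1", 290, 410)]
noncoding = subtract_intervals(conserved, exons)
print(noncoding)
```

In this example the first conserved interval is split around the exon, the second is removed entirely because an exon covers it, and the chr2 interval is kept unchanged because no exon overlaps it.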

3.2. Retrieving High-RP, Noncoding Genomic Intervals With Galaxy/UCSC Table Browser

The procedure for finding high-RP intervals via Galaxy is the same as outlined in Subheading 3.1., except that when using the "Galaxy featured datasets" (accessed through the Galaxy portal), the user should choose option d, "Regulatory potential (3way, human-mouse-dog, >0) [5,800,931 regions]" (see Note 12).
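
Whichever score is used, the "Raw result file" obtained at the end of the procedure can be summarized offline. The sketch below assumes, without confirmation from the protocol, that the file is tab-delimited with chromosome, start, and end in its first three columns and half-open coordinates. The function name summarize_intervals and the sample lines are hypothetical.

```python
def summarize_intervals(lines):
    """Count intervals and total bases covered in tab-delimited interval lines.

    Assumption: columns 1-3 are chromosome, start, end (half-open);
    blank lines and lines starting with '#', 'track' or 'browser' are skipped.
    """
    count = 0
    total_bases = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        start, end = int(fields[1]), int(fields[2])
        count += 1
        total_bases += end - start
    return count, total_bases


# Hypothetical file content, for illustration only.
sample = [
    "# noncoding conserved intervals",
    "chr1\t100\t150",
    "chr1\t160\t200",
    "chr2\t50\t80",
]
count, total_bases = summarize_intervals(sample)
print(count, total_bases)
```

To run it on a downloaded file, pass an open file handle in place of the list, for example summarize_intervals(open(path)) with path set to the location of the saved file.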
