Galaxy for analyzing genome data Hardison October 05, 2010

1 Galaxy for analyzing genome data Hardison October 05, 2010 http://main.g2.bx.psu.edu/

2 Leaders of Galaxy … James Taylor, Anton Nekrutenko Funding from NSF, Huck Institutes of Life Sciences at PSU

3 Browsers vs Data Retrieval
Browsers are designed to show selected information on one locus or region at a time.
  – UCSC Genome Browser
  – Ensembl
They run on top of databases that record vast amounts of information.
Sometimes you need to retrieve one type of information for many genomic intervals, or genome-wide.
Access this by querying the tables in the databases or “data marts”:
  – UCSC Table Browser
  – EnsMart or BioMart

4 Retrieve all the protein-coding exons in humans

5 Challenges in genomic data analysis
We have great browsers and data warehouses
  – But most lack facilities for performing sophisticated analysis
Many useful computational tools have been developed in bioinformatics
  – But they are not well integrated, they have different user interfaces, different data formats, etc.

6 Some common solutions
Glue it all together with Excel
  – Until you realize Excel cannot handle that much data …
Glue it all together with Perl
  – But that leads to duplication of effort, duplication of bugs, …

7 A better solution
Build a framework that:
  – Defines a common format to describe interfaces
      Computational tools
      Databases
  – Provides the infrastructure to adapt those interfaces into standard form
  – Defines common data types and standards for integrating the results

8

9 Two faces of Galaxy
A web site where you can easily perform complex analyses integrating various data sources and computational tools
A framework to easily build similar sites that integrate your choice of tools and data sources

10 Galaxy: Data retrieval and analysis
Data can be retrieved from multiple external sources, or uploaded from the user’s computer
Hundreds of computational tools, e.g.
  – Data editing
  – File conversion
  – Operations: union, intersection, complement …
  – Compute functions on data
  – Statistics: basic to multivariate
  – EMBOSS tools for sequence analysis
  – Molecular evolutionary analysis
  – “Next generation sequence” (NGS) mapping and analysis
  – Genetic association studies (Rgenetics)
Workflows
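The interval operations named on this slide (union, complement) can be illustrated with a short standalone sketch. This is not Galaxy's own code; it assumes intervals on a single chromosome given as 0-based, half-open (start, end) pairs, and the function names are hypothetical.

```python
from typing import List, Tuple

# Assumption: (start, end) pairs, 0-based and half-open, all on one chromosome.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Union: collapse overlapping or touching intervals into single intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def complement(intervals: List[Interval], chrom_length: int) -> List[Interval]:
    """Complement: the parts of [0, chrom_length) not covered by any interval."""
    gaps: List[Interval] = []
    cursor = 0
    for start, end in merge_intervals(intervals):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < chrom_length:
        gaps.append((cursor, chrom_length))
    return gaps
```

For example, merging (10, 20), (15, 30) and (40, 50) gives two intervals, and the complement on a chromosome of length 60 is the three gaps around them.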

11 Welcome to Galaxy
Welcome screen, changes periodically
When tools are invoked, displays information on the tool and allows the user to choose parameters

12 Tool choice toggles Info on tool, interface to run it

13 History
Titles are toggles; more information is displayed when you click on them.
Click on the “eye” to see all the data.
Click on the “pencil” to edit the attributes.
Click on the “x” to delete.
Use “options” next to “History” to save, rename, move to or share histories. You must be logged in to do this.

14 Background jobs in Galaxy

15 Galaxy via Table Browser: coding exons

16 2nd page: choose coding exons

17 Coding exons result

18 Compute exon lengths
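Outside Galaxy, the same computation can be sketched in a few lines. The sketch assumes the coding exons were saved as tab-separated BED lines (chromosome, start, end, then optional columns), where coordinates are 0-based and half-open so that length is end minus start. The function name and the sample coordinates in the test are invented for illustration.

```python
from typing import Iterable, List, Tuple


def exon_lengths(bed_lines: Iterable[str]) -> List[Tuple[str, int, int, int]]:
    """Return (chrom, start, end, length) for each BED record.

    Assumes BED coordinates (0-based, half-open), so length = end - start.
    """
    records = []
    for line in bed_lines:
        line = line.rstrip("\n")
        # Skip blank lines and the header lines BED files sometimes carry.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        records.append((chrom, start, end, end - start))
    return records
```

Appending the length as an extra column mirrors what a compute step does on a tabular dataset: every input row is kept and gains one derived value.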

19 Statistics on exon lengths
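A minimal sketch of the summary statistics step, using only Python's standard `statistics` module. It assumes the exon lengths are already available as a list of integers; the `summarize` name and the choice of statistics are illustrative, not what the Galaxy tool necessarily reports.

```python
import statistics
from typing import Dict, List


def summarize(lengths: List[int]) -> Dict[str, float]:
    """Basic descriptive statistics for a non-empty list of exon lengths."""
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
    }
```

Reporting the median next to the mean is useful here because a few very long exons can pull the mean upward.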

20 Make histogram of exon lengths

21 Histogram of exon lengths
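The histogram step amounts to counting lengths per fixed-width bin. The sketch below does that with the standard library and renders a crude text plot; the bin width, function names, and sample values are assumptions chosen for illustration.

```python
from collections import Counter
from typing import List, Tuple


def histogram(lengths: List[int], bin_width: int) -> List[Tuple[int, int]]:
    """Count lengths per bin; each bin is labelled by its lower edge."""
    counts = Counter((length // bin_width) * bin_width for length in lengths)
    return sorted(counts.items())


def render(bins: List[Tuple[int, int]], bin_width: int) -> str:
    """Draw one row of '#' characters per non-empty bin."""
    rows = []
    for lower, count in bins:
        rows.append(f"{lower:>6}-{lower + bin_width - 1:<6} {'#' * count}")
    return "\n".join(rows)
```

Empty bins are simply omitted, so a long right tail shows up as a few isolated rows far from the main cluster.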

22 Sort on exon length
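As a sketch of the sort step, the records can be ordered by their length column in descending order so the longest exons come first. It assumes records shaped as (chrom, start, end, length) tuples; the function name is hypothetical.

```python
from typing import List, Tuple

Record = Tuple[str, int, int, int]  # (chrom, start, end, length)


def longest_exons(records: List[Record], n: int) -> List[Record]:
    """Return the n longest records, longest first."""
    return sorted(records, key=lambda rec: rec[3], reverse=True)[:n]
```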

23 Small number of very long exons

24 Longest exon: 21,693 bp, in MUC16; average exon length is 164 bp.

25 Get microRNAs and snoRNAs

26 Intersection of coding exons and microRNAs
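The intersection step can be sketched as reporting every feature in one set that overlaps at least one feature in another. This is a simple illustration rather than Galaxy's implementation: it assumes (chrom, start, end) tuples with 0-based, half-open coordinates, ignores strand, and uses a hypothetical function name and invented sample coordinates in the test.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Feature = Tuple[str, int, int]  # (chrom, start, end), 0-based, half-open


def overlapping(features: List[Feature], others: List[Feature]) -> List[Feature]:
    """Return the features that overlap at least one interval in `others`."""
    # Group the second set by chromosome so only same-chromosome pairs are compared.
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in others:
        by_chrom[chrom].append((start, end))

    hits = []
    for chrom, start, end in features:
        # Half-open intervals overlap when each starts before the other ends.
        if any(start < o_end and o_start < end for o_start, o_end in by_chrom[chrom]):
            hits.append((chrom, start, end))
    return hits
```

The pairwise comparison is quadratic per chromosome, which is fine for a small set such as microRNAs; for large inputs a sorted sweep or an interval tree would be the usual choice.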

