Published by Gloria Garrison. Modified over 8 years ago.
1
Galaxy for analyzing genome data
Hardison, October 05, 2010
http://main.g2.bx.psu.edu/
2
Leaders of Galaxy … James Taylor, Anton Nekrutenko
Funding from NSF, Huck Institutes of Life Sciences at PSU
3
Browsers vs Data Retrieval
Browsers are designed to show selected information on one locus or region at a time.
–UCSC Genome Browser
–Ensembl
They run on top of databases that record vast amounts of information.
Sometimes you need to retrieve one type of information for many genomic intervals or genome-wide.
Access this by querying the tables in the databases or “data marts”:
–UCSC Table Browser
–EnsMart or BioMart
4
Retrieve all the protein-coding exons in humans
5
Challenges in genomic data analysis
We have great browsers and data warehouses
–But most lack facilities for performing sophisticated analysis
Many useful computational tools have been developed in bioinformatics
–But they are not well integrated, they have different user interfaces, different data formats, etc.
6
Some common solutions
Glue it all together with Excel
–Until you realize Excel cannot handle that much data …
Glue it all together with Perl
–But that leads to duplication of effort, duplication of bugs, …
7
A better solution
Build a framework that:
–Defines a common format to describe interfaces (computational tools, databases)
–Provides the infrastructure to adapt those interfaces into standard form
–Defines common data types and standards for integrating the results
9
Two faces of Galaxy
A web site where you can easily perform complex analyses integrating various data sources and computational tools
A framework to easily build similar sites that integrate your choice of tools and data sources
10
Galaxy: Data retrieval and analysis
Data can be retrieved from multiple external sources, or uploaded from the user’s computer
Hundreds of computational tools, e.g.
–Data editing
–File conversion
–Operations: union, intersection, complement …
–Compute functions on data
–Statistics: basic to multivariate
–EMBOSS tools for sequence analysis
–Molecular evolutionary analysis
–“Next generation sequence” (NGS) mapping and analysis
–Genetic association studies (Rgenetics)
Workflows
11
Welcome to Galaxy
Welcome screen, changes periodically
When tools are invoked, displays information on the tool and allows the user to choose parameters
12
Tool choice toggles
Info on tool, interface to run it
13
History
Titles are toggles; more information is displayed when you click on them
Click on the “eye” to see all the data
Click on the “pencil” to edit the attributes
Click on the “x” to delete
Use “options” next to “History” to save, rename, move to or share histories. Must be logged in to do this.
14
Background jobs in Galaxy
15
Galaxy via Table Browser: coding exons
16
2nd page: choose coding exons
17
Coding exons result
18
Compute exon lengths
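
Outside Galaxy, the same length computation can be sketched in a few lines of Python. This is only an illustration, not how Galaxy's Compute tool works internally: it assumes the exons were exported as tab-separated BED records (0-based, half-open coordinates, so length is end minus start), and the records and the `exon_lengths` helper are made up for the example.

```python
# Hypothetical BED-style records (chrom, start, end, name); values are invented.
BED_TEXT = """\
chr1\t100\t250\texonA
chr1\t400\t480\texonB
chr2\t1000\t1500\texonC
"""


def exon_lengths(bed_text):
    """Return (name, length) pairs for each BED record.

    BED coordinates are 0-based and half-open, so length = end - start.
    """
    rows = []
    for line in bed_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        chrom, start, end, name = line.split("\t")[:4]
        rows.append((name, int(end) - int(start)))
    return rows


lengths = exon_lengths(BED_TEXT)
print(lengths)
```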
19
Statistics on exon lengths
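
As a rough offline counterpart to this step, summary statistics on a list of lengths can be computed with Python's standard `statistics` module. The lengths below are placeholder values chosen for the example, not the results shown in the slides.

```python
import statistics

# Placeholder exon lengths in bp (invented for illustration).
lengths = [150, 80, 500, 120, 95]

summary = {
    "count": len(lengths),
    "mean": statistics.mean(lengths),
    "median": statistics.median(lengths),
    "min": min(lengths),
    "max": max(lengths),
    "stdev": statistics.stdev(lengths),  # sample standard deviation
}

for key, value in summary.items():
    print(f"{key}: {value}")
```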
20
Make histogram of exon lengths
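
The idea behind the histogram can be sketched as fixed-width binning. The snippet below is an assumption-laden toy (invented lengths, a hypothetical `histogram` helper, text bars instead of a plot) and is not the Galaxy histogram tool itself.

```python
from collections import Counter


def histogram(values, bin_width):
    """Count values per fixed-width bin, keyed by the bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Placeholder exon lengths in bp (invented for illustration).
lengths = [150, 80, 500, 120, 95, 21693]
bin_width = 100
bins = histogram(lengths, bin_width)

# Print one text bar per non-empty bin.
for low, n in bins.items():
    print(f"{low:>6}-{low + bin_width - 1:<6} {'#' * n}")
```

A single very long exon lands in a bin far from the rest, which is why a plot of real exon lengths has a long, sparse right tail.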
21
Histogram of exon lengths
22
Sort on exon length
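
For comparison, sorting (name, length) records by length in plain Python is a one-liner with `sorted`. The records are invented for the example.

```python
# Placeholder (name, length) records; values are invented.
exons = [("exonA", 150), ("exonB", 80), ("exonC", 500)]

# Sort by the length column, longest first.
by_length = sorted(exons, key=lambda record: record[1], reverse=True)

print(by_length[0])  # the longest exon in this toy data set
```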
23
Small number of very long exons
24
The longest exon is a 21,693 bp exon in MUC16; the average length is 164 bp.
25
Get microRNAs and snoRNAs
26
Intersection of coding exons and microRNAs
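
The interval intersection can be illustrated with a minimal sketch. It assumes half-open (chrom, start, end) intervals and returns the exons that overlap at least one microRNA; the coordinates and the `overlaps` and `intersect` helpers are hypothetical, and the quadratic scan stands in for whatever Galaxy's Intersect tool actually does.

```python
def overlaps(a, b):
    """True if two half-open (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intersect(first, second):
    """Return the intervals of `first` that overlap any interval in `second`."""
    return [a for a in first if any(overlaps(a, b) for b in second)]


# Invented coordinates for illustration only.
exons = [("chr1", 100, 250), ("chr1", 400, 480), ("chr2", 1000, 1500)]
mirnas = [("chr1", 240, 262), ("chr2", 2000, 2022)]

hits = intersect(exons, mirnas)
print(hits)
```

For genome-scale inputs, a sorted sweep or an interval tree would be a better fit than this all-pairs comparison.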