BRUDNO LAB: A WHIRLWIND TOUR Marc Fiume Department of Computer Science University of Toronto.



Outline
1. What we do, our tools
2. Savant Genome Browser
Savant Genome Browser -

WHAT WE DO

What we do
Main focus: genomic analysis using output from high-throughput sequencing (HTS) machines
  High throughput: sequence billions of nucleotides per week
  Poor data quality: “reads” are shorter; error profiles are poorly understood

HTS Pipeline

What to do with all these reads?

1. Assembly
ASSEMBLY: reconstruct the donor’s genome
“HapSembler”: specialized for highly polymorphic species
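To make "reconstruct the donor's genome" concrete, here is a toy greedy overlap assembler. It is only an illustrative sketch, not HapSembler's method: the function names, the `min_len` parameter, and the assumption of error-free reads are all introduced here for the example.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more; keep separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["ACGTAC", "TACGGA", "GGATTC"]
contigs = greedy_assemble(reads)
print(contigs)
```

Real assemblers also have to cope with sequencing errors, repeats, and (for polymorphic species) differences between haplotypes, none of which this sketch attempts.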

2. Alignment
ALIGNMENT: find the region in a “reference” genome that closely matches each read; a close match suggests the read came from the corresponding region of the “donor”
“SHRiMP”: Short Read Mapping Package
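The idea of read alignment can be sketched as a brute-force search: slide the read along the reference and keep the placement with the fewest mismatches. This is a minimal sketch and not how SHRiMP works; the function names and the `max_mismatches` threshold are assumptions for the example, and gaps (indels) are ignored.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read: str, reference: str,
             max_mismatches: int = 2) -> Optional[tuple[int, int]]:
    """Return (0-based position, mismatches) of the best ungapped placement.

    Returns None if no placement has at most `max_mismatches` mismatches.
    Ties are resolved in favour of the leftmost position.
    """
    best = None
    for pos in range(len(reference) - len(read) + 1):
        d = hamming(read, reference[pos:pos + len(read)])
        if d <= max_mismatches and (best is None or d < best[1]):
            best = (pos, d)
    return best


reference = "TTGACCGTAGGCTAACGT"
print(map_read("CGTAGGCT", reference))  # exact match
print(map_read("CGTAGACT", reference))  # one mismatch against the reference
print(map_read("AAAAAAAA", reference))  # no acceptable placement
```

The scan costs time proportional to reference length times read length per read, which is why practical mappers index the reference or the reads instead.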

3. Genetic Variation Discovery
GENETIC VARIATION DISCOVERY: find differences between two genomes
  between donor and reference
  between two samples (e.g. tumour vs. normal)
“VARiD”, “MODiL”, and “CNVer”

Genetic Variation
Single Nucleotide Polymorphism (SNP): genomes have different nucleotides at corresponding positions
  VARiD – VARiation IDentification
Insertions and Deletions (Indels): genomes have additional sequence put in or sequence taken out at corresponding locations
  MODiL – Mixtures of Distributions Indel Locator
Copy Number Variation (CNV): genomes have a different number of copies of the same sequence
  CNVer
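As an illustration of SNP discovery from aligned reads, the sketch below piles up read bases over each reference position and reports positions where most reads disagree with the reference. It is a simple majority-vote caller, not VARiD's model; `call_snps`, `min_depth`, and `min_fraction` are hypothetical names and thresholds chosen for the example, and reads are assumed to be aligned without gaps.

```python
from collections import Counter


def call_snps(reference: str,
              alignments: list[tuple[int, str]],
              min_depth: int = 3,
              min_fraction: float = 0.8) -> list[tuple[int, str, str]]:
    """Call SNPs from ungapped alignments given as (0-based start, read).

    Returns (position, reference base, donor base) for each position where
    at least `min_depth` reads overlap and at least `min_fraction` of them
    agree on a base that differs from the reference.
    """
    pileup = [Counter() for _ in reference]
    for start, read in alignments:
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call
        base, n = counts.most_common(1)[0]
        if base != reference[pos] and n / depth >= min_fraction:
            snps.append((pos, reference[pos], base))
    return snps


reference = "ACGTACGTAC"
alignments = [(0, "ACGAAC"), (1, "CGAACG"), (2, "GAACGT"), (3, "AACGTA")]
snps = call_snps(reference, alignments)
print(snps)
```

Indels and copy number variants need different evidence (for example mate-pair distances or read depth), so they are not covered by this sketch.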

Our Bioinformatics Tools
READ MAPPING (SHRiMP)
SNP DETECTION (VARiD)
INDEL DETECTION (MODiL)
CNV DETECTION (CNVer)
ASSEMBLY (HapSembler)
VISUALIZATION (SAVANT)
COMPRESSION

SAVANT GENOME BROWSER

Genome Browsing, the old way

Challenge presented by HTS datasets
Genomic data is generated in high volumes
  HTS machines generate billions of bases per run
Interpretation and analysis challenge
  typical pipeline employs many separate tools for computation and visualization

Tools for HTS data analysis
Tool | Cost | Computation | Visualization
Read Alignment (e.g. Bowtie, BWA) | Free | Y | N
File Format Conversion (e.g. Galaxy, SAMTools) | Free | Y | N
Other Command-line Tools (e.g. Genetic Variation Discovery, Comparative Genomics, etc.) | Free | Y | N
UCSC Genome Browser | Free | N | Y
Integrative Genomics Viewer | Free | N | Y
GBrowse | Free | N | Y
CLC Genomics Workbench | $$$ | Y | Y
substantial disconnect between the processes of computational analysis and visualization

Tools for Genomic Data Analysis
Tool | Cost | Computation | Visualization
Read Alignment (e.g. Bowtie, BWA) | Free | Y | N
File Format Conversion (e.g. Galaxy, SAMTools) | Free | Y | N
Other Command-line Tools (e.g. Genetic Variation Discovery, Comparative Genomics, etc.) | Free | Y | N
UCSC Genome Browser | Free | N | Y
Integrative Genomics Viewer | Free | N | Y
GBrowse | Free | N | Y
CLC Genomics Workbench | $$$ | Y | Y
Savant Genome Browser | Free | Y | Y
substantial disconnect between the processes of computational analysis and visualization

ASIDE: Cytoscape?
platform for visual analysis of networks
extensive plugin framework
Bader Lab

Savant Genome Browser
platform for integrated visual analysis of genomic data
feature-rich genome browser
computationally extensible via plugin framework

(Very) Short List of Features
Data Format Support
  FASTA, BED, WIG, GFF, tab-delimited, BAM
  local and remote
Alternative Visualization Modes
  pack, squish for BED and GFF tracks
  mismatch, SNP, matepair modes for BAM tracks
Speed and Interactivity
  very fast data access (<1s)
  small memory footprint (<250 MB)
Extras
  sessions, bookmarking of interesting regions, track locking, data selection
System Requirements
  ~ none, works on all major operating systems
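As an example of what supporting one of these formats involves, the sketch below parses the first four BED columns (chromosome, 0-based start, exclusive end, name) and returns the features that overlap a viewing window. It is not Savant's code; `Interval`, `parse_bed`, and `features_in_view` are hypothetical names, and a linear scan stands in for the indexed access a real browser would need to respond in under a second.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive
    name: str


def parse_bed(text: str) -> list[Interval]:
    """Parse BED text, skipping blank lines, comments and header lines."""
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        name = fields[3] if len(fields) > 3 else ""
        records.append(Interval(fields[0], int(fields[1]), int(fields[2]), name))
    return records


def features_in_view(records: list[Interval], chrom: str,
                     view_start: int, view_end: int) -> list[Interval]:
    """Return features overlapping the half-open window [view_start, view_end)."""
    return [r for r in records
            if r.chrom == chrom and r.start < view_end and r.end > view_start]


bed_text = (
    "track name=demo\n"
    "chr1\t100\t200\tgeneA\n"
    "chr1\t500\t900\tgeneB\n"
    "chr2\t150\t250\tgeneC\n"
)
records = parse_bed(bed_text)
visible = [r.name for r in features_in_view(records, "chr1", 150, 600)]
print(visible)
```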

FEATURE DEMONSTRATION
INTERFACE
HTS READ ALIGNMENTS
EXAMPLE PLUGIN: SNP FINDER

Power of visual analytics
task: find the correct parameter for a command-line tool

Plugin Framework
unlocks the potential for performing visual analytics
beneficial for both users and tool developers
  tool developers: simple platform for development and dissemination of work
plugin development is easy
  API contains over a hundred prebuilt functions (e.g. get track data, add bookmarks, draw custom graphics, etc.)
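The general shape of such a plugin framework can be sketched as a host object that exposes a few services and a plugin that calls them. Everything below is hypothetical and written for illustration only: `BrowserAPI`, `get_track_data`, `add_bookmark`, and `HighValuePlugin` are invented names that echo the examples on the slide and are not Savant's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    chrom: str
    start: int
    end: int
    note: str


@dataclass
class BrowserAPI:
    """Hypothetical host-side services that a browser might offer plugins."""
    tracks: dict[str, dict[int, float]]  # track name -> {position: value}
    bookmarks: list[Bookmark] = field(default_factory=list)

    def get_track_data(self, track: str, start: int, end: int) -> dict[int, float]:
        """Return the track's values at positions in [start, end)."""
        return {p: v for p, v in self.tracks[track].items() if start <= p < end}

    def add_bookmark(self, chrom: str, start: int, end: int, note: str) -> None:
        self.bookmarks.append(Bookmark(chrom, start, end, note))


class HighValuePlugin:
    """Example plugin: bookmark positions whose value exceeds a threshold."""

    def __init__(self, threshold: float):
        self.threshold = threshold

    def run(self, api: BrowserAPI, chrom: str, track: str,
            start: int, end: int) -> int:
        hits = 0
        for pos, value in sorted(api.get_track_data(track, start, end).items()):
            if value > self.threshold:
                api.add_bookmark(chrom, pos, pos + 1, f"{track} value {value}")
                hits += 1
        return hits


api = BrowserAPI(tracks={"coverage": {10: 5.0, 20: 40.0, 30: 55.0, 40: 8.0}})
hits = HighValuePlugin(threshold=30.0).run(api, "chr1", "coverage", 0, 35)
print(hits, [b.start for b in api.bookmarks])
```

In this arrangement the plugin only touches the host through the API object, so a user can adjust `threshold`, rerun the plugin, and see the resulting bookmarks in the same view.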

CONCLUSIONS

Conclusions
Savant is a platform for integrated visualization and analysis of genomic data
  stand-alone genome browser
  novel features: e.g. table view, visualization modes, data selection, etc.
  computationally extensible through plugin framework
makes interpretation and analysis of genomic data easier and more efficient

Acknowledgements
Recep, Andrew, Vlad, Mike Brudno, Yue, Marc, Vanessa, Orion, Joe, Nilgun, Paul, Vera, Misko, Yoni

Thanks!