DRAW+SneakPeek: Analysis Workflow and Quality Metric Management for DNA-Seq Experiments

Presentation transcript:

DRAW+SneakPeek: Analysis Workflow and Quality Metric Management for DNA-Seq Experiments

O. Valladares 1,2, C.-F. Lin 1,2, D. M. Childress 1,2, E. Klevak 3, E. T. Geller 1, Y.-C. Hwang 2,4, E. A. Tsai 4,5, A. B. Partch 1,2, G. D. Schellenberg 1, L.-S. Wang 1,2

1) Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA
2) Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA
3) Department of Physics, University of Washington, Seattle, WA
4) Genomics and Computational Biology Graduate Group, University of Pennsylvania, Philadelphia, PA
5) Department of Pathology and Laboratory Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA

Introduction

What Motivates DRAW+SneakPeek

Next-generation sequencing (NGS) has redefined what big data means in biomedical research. Advances in quality and capacity have lowered the cost of implementation, allowing NGS to be used in a wide range of experiments at a variety of scales, from a few samples in small laboratories to thousands of samples in multi-institute collaborations. Processing terabytes of data requires information technology and bioinformatics expertise, which can be daunting to small laboratories with limited resources. The programs we developed enable these groups to process DNA-seq data and identify single-nucleotide variants and small insertions and deletions (indels).

DRAW: DNA Resequencing Analysis Workflow

Features of DRAW

Integrates open-source programs to analyze DNA-seq data in a Linux environment: GATK, SAMtools, BWA, Picard, SnpEff
Operates on a distributed resource management system (Oracle Grid Engine)
Job dependency and error checking
Available on Amazon Elastic Compute Cloud (EC2)

Running DRAW

One command will run all three phases of DRAW:

Input: Demultiplexed FastQ files
Phase 1: Mapping using BWA. Align reads, paired ends.
Phase 2: QC using GATK/Picard/Samtools. Mark duplicates, local realignment, base quality recalibration.
Phase 3: Variant and coverage using GATK/snpEff. Variant detection, filtration, annotation.
Phase 5: Import into MySQL tables using in-house scripts. Quality metrics on SneakPeek.
Output: Read, Base/Depth Coverage, QC metrics; annotated VCF file

Running DRAW on Amazon EC2: A benchmark study

One flow cell: Illumina Hi-Seq 2000, 100-bp paired-end, ~350 Gb, 34 multiplexed samples using Nimblegen Human Exome v2 Library.
1.1 TB of data in two days; total cost $528.
A guide to running DRAW on Amazon is available on NIAGADS.org.

SneakPeek: Quality Metrics Management System

Provides an overview of all samples processed through a dynamic web interface
Allows the user to assess the quality of sequencing data
Identifies samples with unusual QC metric(s)
Identifies batch problems

Workflow Comparison

A comparison of DRAW+SneakPeek with other workflows.

DRAW+SneakPeek Availability

Released under the MIT license
Free for academic and non-profit use
Available at the National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (NIAGADS):
Source code
Amazon Machine Images (AMIs)
Install guide, documentation, tutorial

Acknowledgements

We thank members of the Schellenberg and Wang labs, collaborators from the ARRA autism sequencing consortium, Nancy B. Spinner, Samir Wadwahan, Maja Bucan, Chris Stoeckert, and members of the Penn HTS group for their constructive input.

Funding: The authors gratefully acknowledge funding from NIMH (R01 MH089004, R01 MH094382, and R01 MH094382), NIA (U24 AG041689, U01 AG032984, P30 AG010124), NINDS (P50 NS053488), and CurePSP Foundation.

Reference

Lin CF, Valladares O, Childress DM, Klevak E, Geller ET, Hwang YC, Tsai EA, Schellenberg GD, and Wang LS. DRAW+SneakPeek: Analysis Workflow and Quality Metric Management for DNA-Seq Experiments. Bioinformatics, Oct 1;29(19). Epub 2013 Aug 13.
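
Illustrative sketch: job dependency between phases

The poster says DRAW runs on Oracle Grid Engine with job dependency, so that mapping, QC, and variant calling run in order for each sample. The Python sketch below shows one way such a chain could be expressed. It is not DRAW's actual code. The script names (phase1_mapping.sh and so on), the job-name scheme, and the function build_qsub_commands are hypothetical, chosen only for illustration. The only Grid Engine features assumed are the qsub options -N (job name) and -hold_jid (wait for a named job). The sketch builds the command lines without submitting anything.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Phase:
    name: str    # short label used to build the job name
    script: str  # HYPOTHETICAL script name, not an actual DRAW file


# Phase order follows the poster; the script names are placeholders.
PHASES = (
    Phase("map", "phase1_mapping.sh"),
    Phase("qc", "phase2_qc.sh"),
    Phase("var", "phase3_variants.sh"),
)


def build_qsub_commands(sample: str,
                        phases: Sequence[Phase] = PHASES) -> List[List[str]]:
    """Return one qsub argument list per phase for a single sample.

    Each phase after the first is held until the previous phase's job
    has finished, which is what chains the phases together.
    """
    commands: List[List[str]] = []
    previous_job: Optional[str] = None
    for phase in phases:
        job_name = f"{phase.name}_{sample}"
        cmd = ["qsub", "-N", job_name]
        if previous_job is not None:
            # Wait for the preceding phase of the same sample.
            cmd += ["-hold_jid", previous_job]
        cmd += [phase.script, sample]
        commands.append(cmd)
        previous_job = job_name
    return commands


if __name__ == "__main__":
    # Print the commands instead of submitting them.
    for command in build_qsub_commands("sampleA"):
        print(" ".join(command))
```

Because each sample gets its own chain, samples from the same flow cell could in principle proceed through the phases independently of one another. Error checking between phases, which the poster also mentions, is left out of this sketch.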