Genome STRiP ASHG Workshop demo materials

Presentation transcript:

Genome STRiP ASHG Workshop demo materials
Bob Handsaker
October 19, 2014

Running Genome STRiP directly on AWS

Cloud demo: Genome STRiP
[Diagram labels: command line, StarCluster, Amazon Web Services, Genome STRiP, Cloud Storage, Sequencing data]

Cloud computing scenarios
Why are people interested in Genome STRiP on the cloud?
  Increase compute and storage capacity for large-scale processing
    Large genome studies
    Economical and with short lead time
  Utilize data sets that are stored in the cloud
    Public data sets (e.g. 1000 Genomes)
    Data sharing with collaborators
    No need to download bulky data to each site

Cookbook recipe: Genotyping in 1000 Genomes Phase 1
Inputs
  A site VCF file describing the variants (e.g. large deletions) to genotype
Outputs
  Genotype VCF file
  Plots for quality control
1000 Genomes Data
  You choose the BAM file location:
    Cached copy on Amazon S3 storage
    HTTP from NCBI or EBI
StarCluster
  Uses the StarCluster software from MIT for Amazon EC2 provisioning
  http://star.mit.edu/cluster

Demo
  Show input VCF file in local directory
  starcluster put gs-cluster example.vcf example.vcf
  starcluster sshmaster gs-cluster
  ./genotype-sites.sh example.vcf run1
  (show output)
  (log out)
  starcluster get gs-cluster run1 run1
  Show VCF in TextEdit
  Show genotyping plot PDF

Cloud computing support in Genome STRiP
Remote BAM file access
  Support for multiple file access protocols in addition to local files
    HTTP / HTTPS
    FTP
    Amazon S3 protocol
Pre-computed metadata for 1000 Genomes Phase 1 and Phase 3
  Eliminates the need to run Genome STRiP preprocessing
  Avoids the need to download the 1000 Genomes BAM files
  Metadata is relatively compact: 5Gb (Phase 1) and 13Gb (Phase 3)
  ftp://ftp.broadinstitute.org/pub/svtoolkit/public_metadata/
Cookbook recipes for common scenarios
  Genotyping variants in 1000 Genomes samples
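The remote-access options on this slide amount to picking a protocol for each BAM location. A minimal sketch of that idea, assuming a location is either a plain local path or a URL whose scheme names the protocol. The function name `bam_access_protocol` and the example locations are hypothetical, and this is not how Genome STRiP itself resolves paths.

```python
from urllib.parse import urlsplit

# Protocols named on the slide, keyed by URL scheme (the mapping is illustrative).
REMOTE_PROTOCOLS = {
    "http": "HTTP",
    "https": "HTTPS",
    "ftp": "FTP",
    "s3": "Amazon S3",
}


def bam_access_protocol(location: str) -> str:
    """Classify a BAM location as a local file or one of the listed protocols."""
    scheme = urlsplit(location).scheme.lower()
    if not scheme:
        # No scheme: treat the string as a path on the local file system.
        return "local file"
    if scheme in REMOTE_PROTOCOLS:
        return REMOTE_PROTOCOLS[scheme]
    raise ValueError(f"unsupported access protocol: {scheme!r}")


if __name__ == "__main__":
    # Hypothetical locations, for illustration only.
    for loc in ("/data/bams/HG00096.bam",
                "s3://example-bucket/HG00096.bam",
                "ftp://example.org/pub/HG00096.bam"):
        print(loc, "->", bam_access_protocol(loc))
```

Rejecting unknown schemes early keeps a mistyped location from being silently treated as a local path.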

Genome STRiP cookbook

Sample genotyping output
Standard VCF file with sample genotypes
  ##fileformat=VCFv4.1
  #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG00096 HG00100
  20 3821195 DEL_2_99615 A <DEL> . . END=3825137 GT:FT:GQ 0/0:PASS:71 0/1:PASS:14
Genotyping plot for visual verification
  Histogram of normalized read depth
  Colors indicate confident calls (gray samples are below 95% confidence)
  Small numbers on plot indicate evidence from read pairs or split reads
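The sample record above can be unpacked field by field. The sketch below parses the header and record exactly as shown on the slide and converts GQ to a confidence value. It assumes GQ is Phred-scaled, as in the VCF specification; whether the plot's 95% threshold is derived from GQ in this way is an assumption, and the helper names are hypothetical.

```python
# Header and record copied from the slide. Real VCF files are tab-separated;
# str.split() with no argument handles both tabs and spaces.
HEADER = "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG00096 HG00100"
RECORD = ("20 3821195 DEL_2_99615 A <DEL> . . END=3825137 "
          "GT:FT:GQ 0/0:PASS:71 0/1:PASS:14")


def parse_record(header: str, record: str):
    """Split one VCF record into fixed fields, INFO keys and per-sample fields."""
    columns = header.lstrip("#").split()
    fields = record.split()
    fixed = dict(zip(columns[:9], fields[:9]))
    info = dict(item.split("=", 1)
                for item in fixed["INFO"].split(";") if "=" in item)
    keys = fixed["FORMAT"].split(":")
    samples = {name: dict(zip(keys, value.split(":")))
               for name, value in zip(columns[9:], fields[9:])}
    return fixed, info, samples


def gq_to_confidence(gq: float) -> float:
    """Convert a Phred-scaled genotype quality to P(genotype call is correct)."""
    return 1.0 - 10 ** (-gq / 10)


if __name__ == "__main__":
    fixed, info, samples = parse_record(HEADER, RECORD)
    span = int(info["END"]) - int(fixed["POS"])
    print(f"{fixed['ID']}: chr{fixed['CHROM']}, span {span} bp")
    for name, call in samples.items():
        conf = gq_to_confidence(int(call["GQ"]))
        print(f"{name}: GT={call['GT']} FT={call['FT']} confidence={conf:.4f}")
```

Under the Phred assumption, both samples in this record exceed 95% confidence: HG00096 is called homozygous reference (0/0) and HG00100 heterozygous for the deletion (0/1).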

Command summary
  starcluster start gs-cluster -s 1                    Launch Amazon compute cluster
  starcluster put gs-cluster example.vcf example.vcf   Copy input file from local to cloud
  starcluster sshmaster gs-cluster                     Log in to remote cluster
  ./genotype_sites.sh example.vcf run1                 Run genotyping command script
  starcluster get gs-cluster run1 run1                 Copy output files from cloud to local
  starcluster terminate gs-cluster                     Shut down compute cluster
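The command summary can also be expressed as a parameterized plan, which makes it easier to reuse with a different cluster name, input file, or output directory. This is a dry-run sketch that only builds and prints the commands from the slide and executes nothing. The function names are hypothetical, and the script name defaults to the spelling used in this summary (the demo slide spells it `genotype-sites.sh`).

```python
import shlex


def workflow_steps(cluster="gs-cluster", vcf="example.vcf", run_dir="run1",
                   script="./genotype_sites.sh"):
    """Return (where, command, description) tuples mirroring the command summary."""
    return [
        ("local", shlex.join(["starcluster", "start", cluster, "-s", "1"]),
         "Launch Amazon compute cluster"),
        ("local", shlex.join(["starcluster", "put", cluster, vcf, vcf]),
         "Copy input file from local to cloud"),
        ("local", shlex.join(["starcluster", "sshmaster", cluster]),
         "Log in to remote cluster"),
        # This step is typed on the cluster master after logging in.
        ("remote", shlex.join([script, vcf, run_dir]),
         "Run genotyping command script"),
        ("local", shlex.join(["starcluster", "get", cluster, run_dir, run_dir]),
         "Copy output files from cloud to local"),
        ("local", shlex.join(["starcluster", "terminate", cluster]),
         "Shut down compute cluster"),
    ]


def format_plan(steps) -> str:
    """Render the steps as an aligned, human-readable plan."""
    width = max(len(command) for _, command, _ in steps)
    return "\n".join(f"[{where:6}] {command:<{width}}  # {description}"
                     for where, command, description in steps)


if __name__ == "__main__":
    print(format_plan(workflow_steps()))
```

Marking each step as local or remote makes explicit that the genotyping script runs on the cluster master, while every `starcluster` command runs on the local machine.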

For more information …
Bonus evening session
  Tonight (Monday) 6:30 – 8:00 PM
  Room 24, Upper Level
Web site
  http://www.broadinstitute.org/software/genomestrip
Support forum (Genome STRiP topic in GATK forum)
  http://gatkforums.broadinstitute.org/categories/genomestrip
AWS Support In Genome STRiP
  Seva Kashin
Poster 603 T (Tuesday afternoon)
  Multi-allelic copy number variation in humans
  Early look at upcoming Genome STRiP functionality for duplications and multi-allelic CNVs

Intro Slides for Gabor

Genome STRiP
Genome STRucture in Populations
Integrates multiple features of sequence data with population-based patterns across many individuals
Handsaker, R.E., Korn, J.M., Nemesh, J. & McCarroll, S.A. Discovery and genotyping of genome structural polymorphism by sequencing on a population scale. Nat Genet 43, 269-76 (2011)

Genome STRiP
Structural variation analysis from sequence data
Integrative
  Combines multiple features of the sequence data (read pairs, read depth, split reads)
  Integrative approaches have consistently shown higher accuracy
Population-aware
  Increases power and accuracy
  Particularly important for low-coverage genomes
Modular architecture
  Discovery of new variants
  Genotyping of newly discovered variants and/or known variants
  Includes tools for QC / analysis
Initial prototype developed for analyses in 1000 Genomes Project
  Low false discovery rate and high sensitivity

Demo Slides