What is genomesontheCloud ?

Slides:



Advertisements
Similar presentations
Tableau Software Australia
Advertisements

B. Ramamurthy 4/17/ Overview of EC2 Components (fig. 2.1) 10..* /17/20152.
Amazon Web Services and Eucalyptus
SALSA HPC Group School of Informatics and Computing Indiana University.
Cloud Computing Imranul Hoque. Today’s Cloud Computing.
DNAseq analysis Bioinformatics Analysis Team
Variant Calling Workshop Chris Fields Variant Calling Workshop v2 | Chris Fields1 Powerpoint by Casey Hanson.
High Throughput Sequencing
Variant discovery Different approaches: With or without a reference? With a reference – Limiting factors are CPU time and memory required – Crossbow –
The Extraction of Single Nucleotide Polymorphisms and the Use of Current Sequencing Tools Stephen Tetreault Department of Mathematics and Computer Science.
Installing and running COMSOL on a Windows HPCS2008(R2) cluster
Amazon EC2 Quick Start adapted from EC2_GetStarted.html.
Application Development On AWS MOULIKRISHNA KOPPOLU CHANDAN SINGH RANA.
NGS Analysis Using Galaxy
Whole Exome Sequencing for Variant Discovery and Prioritisation
DRAW+SneakPeek: Analysis Workflow and Quality Metric Management for DNA-Seq Experiments O. Valladares 1,2, C.-F. Lin 1,2, D. M. Childress 1,2, E. Klevak.
Copyright © 2011 Partek Incorporated. All rights reserved. Statistics Visualizations Annotations Start-to-Finish Analysis of Integrated Genomics.
Robust Software Tools for Variant Identification and Functional Assessment (Boston College & University of Michigan) Gabor Marth, Goncalo Abecasis, PIs.
Customized cloud platform for computing on your terms !
BickhartADSA Meeting(1) 2013 Tools to Exploit Sequence data to find new markers and Disease Loci in Cattle D. M. Bickhart, H. A. Lewin and G. E. Liu.
Variant Calling Workshop Chris Fields Variant Calling Workshop | Chris Fields | PowerPoint by Casey Hanson.
MES Genome Informatics I - Lecture VIII. Interpreting variants Sangwoo Kim, Ph.D. Assistant Professor, Severance Biomedical Research Institute,
March 3rd, 2006 Chen Peng, Lilly System Biology1 Cluster and SGE.
Launch SpecE8 and React from GSS. You can use the chemical analyses in a GSS data sheet to set up and run SpecE8 and React calculations. Analysis → Launch…
NGS data analysis CCM Seminar series Michael Liang:
GenomeVIP: A Genomics Analysis Pipeline for Cloud Computing with Germline and Somatic Calling on Amazon’s Cloud R. Jay Mashl October 20, 2014.
SALSA HPC Group School of Informatics and Computing Indiana University.
The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop Discovery Environment Overview.
ParSNP Hash Pipeline to parse SNP data and output summary statistics across sliding windows.
Alexis DereeperCIBA courses – Brasil 2011 Detection and analysis of SNP polymorphisms.
Cloud Implementation of GT-FAR (Genome and Transcriptome-Free Analysis of RNA-Seq) University of Southern California.
National Center for Supercomputing Applications University of Illinois at Urbana-Champaign Variant Calling Workshop.
CS525: Big Data Analytics MapReduce Computing Paradigm & Apache Hadoop Open Source Fall 2013 Elke A. Rundensteiner 1.
Cloudbiolinux = Genomics resources for use on cloud platforms. Includes: – An Amazon Machine Image with lots of handy bioinformatics software pre-installed.
Tutorial 6 High Throughput Sequencing. HTS tools and analysis Review of resequencing pipeline Visualization - IGV Analysis platform – Galaxy Tuning up.
Genome STRiP ASHG Workshop demo materials
P.M. VanRaden and D.M. Bickhart Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, Beltsville, MD, USA
The iPlant Collaborative
Current Data And Future Analysis Thomas Wieland, Thomas Schwarzmayr and Tim M Strom Helmholtz Zentrum München Institute of Human Genetics Geneva, 16/04/12.
Personalized genomics
Launch Amazon Instance. Amazon EC2 Amazon Elastic Compute Cloud (Amazon EC2) provides resizable computing capacity in the Amazon Web Services (AWS) cloud.
Canadian Bioinformatics Workshops
A brief guide to sequencing Dr Gavin Band Wellcome Trust Advanced Courses; Genomic Epidemiology in Africa, 21 st – 26 th June 2015 Africa Centre for Health.
Scaling bio-analyses from computational clusters to grids George Byelas University Medical Centre Groningen, the Netherlands IWSG-2013, Zürich, Switzerland,
Canadian Bioinformatics Workshops
© 2015 MetricStream, Inc. All Rights Reserved. AWS server provisioning © 2015 MetricStream, Inc. All Rights Reserved. By, Srikanth K & Rohit.
SCI-BUS is supported by the FP7 Capacities Programme under contract nr RI CloudBroker usage Zoltán Farkas MTA SZTAKI LPDS
Inheritance Model testing Andrew Stubbs Dept. Bioinformatics.
From Reads to Results Exome-seq analysis at CCBR
Million Veteran Program: Industry Day Genomic Data Processing and Storage Saiju Pyarajan, PhD and Philip Tsao, PhD Million Veteran Program: Industry Day.
Computing challenges in working with genomics-scale data
NGS File formats Raw data from various vendors => various formats
Day 5 Mapping and Visualization
Cancer Genomics Core Lab
Hadoop Aakash Kag What Why How 1.
SparkBWA: Speeding Up the Alignment of High-Throughput DNA Sequencing Data - Aditi Thuse.
Scaling Up Scientific Workflows with Makeflow
AWS Batch Overview A highly-efficient, dynamically-scaled, batch computing service May 2017.
EMC Galaxy Course November 24-25, 2014
Yonglan Zheng Galaxy Hands-on Demo Step-by-step Yonglan Zheng
Haiyan Meng and Douglas Thain
Webinar # April 2017 Isolates in the Cloud
Introduction to Makeflow and Work Queue
ChIP-Seq Data Processing and QC
Data formats Gabor T. Marth Boston College
Different types of Linux installation
Distributing META-pipe on ELIXIR compute resources
Genotype Imputation with Millions of Reference Samples
Computational Pipeline Strategies
Automating NGS Gene Panel Analysis Workflows
Presentation transcript:

What is genomesontheCloud ? gotCloud is a sequence analysis pipeline Integrative Alignment, QC, Variant Calling, Phasing Seamless Requires only simple configuration files Robust ..against unexpected failures & stops Scalable ..to many thousands of genomes gotCloud also provides… A set of many useful software tools Software library (C++) for sequence analysis Briefly Talk about software tools and library included in GotCloud

How can I use gotCloud? Briefly Talk about software tools and library included in GotCloud

GotCloud Alignment Pipeline & QC Single Sample Many Samples GotCloud Alignment Pipeline qplot & verifyBamID bwa / mosaik dedup & recab Processed BAM (one per sample) FASTQ Raw BAM (one per sample) QC End-to-end analysis Fully automated and parallelized (with quality controls) Requires only a simple fastq list file SAMPLE_ID FASTQ1 FASTQ2 Sample1 Sm1_Run1_1.fastq.gz Sm1_Run2_1.fastq.gz Sample2 Sm2.fastq.gz . Sample3 Sm3_1.fastq.gz Sm3_2.fastq.gz

Running GotCloud Alignment Pipeline & QC Single Sample Many Samples Example command ./gotcloud align --list fastq.txt --out outputDir --numjobs 3 --threads 2 ./gotcloud run GotCloud align alignment pipeline --list fastq.txt list of FASTQs --out outputDir where to write output --numjobs 3 run 3 samples concurrently --threads 2 2 CPU threads per sample CRAM support is in beta testing

What to Expect from GotCloud Alignment Pipeline & QC Single Sample Many Samples Aligned and post-processed BAM files Summary statistics and graphs from qplot Contamination checking from verifyBamID With and/or without external genotype data Base Quality GC-bias Insert Size

GotCloud Variant Calling Pipelines Filtering Haplotyping Deep Genomes Shallow Genomes Targeted Exomes Overview of SNP Calling Pipeline Processed BAM (one per sample) Genotype Likelihood (one per sample) Single Sample Many Samples samtools End-to-end analysis Very efficient Small memory (<1G) Scalable to >1,000s High parallelization Fault-tolerant Requires only list of BAMs glfMultiples infoCollector Filtered VCF Unfiltered VCF SVM beagle / thunder Phased VCF Association Results EPACTS (External)

GotCloud Variant Calling Pipelines SNPs Indels Structural Variants Deep Genomes Shallow Genomes Targeted Exomes Example commands ./gotcloud snpcall --list bams.txt --out outputDir --numjobs 10 ./gotcloud indel --list bams.txt --out outputDir --numjobs 10 ./gotcloud genomestrip --list bams.txt --out outputDir --numjobs 10 ./gotcloud ldefine --list bams.txt --out outputDir --numjobs 10 ./gotcloud mei --list bams.txt --out outputDir --numjobs 10* snpcall/indel/genomestrip/mei variant caller to run ldrefine run beagle/thunderVCF genotype refinement --list bams.txt list of bams per sample --out outputDir where to write output --numjobs 10 run 10 jobs concurrently * Coming soon..

GotCloud for Large-scale Sequencing Variant Calling Variant Filtering Many Samples Study Genome Exome N Populations # SNPs 1000 Genomes ~6x ~40x 2,535 Many 69.1M Type 2 Diabetes ~5x ~80x 2,850 Europeans 26.7M Exome Sequencing Project . 6,916 EUR+AFR 1.92M Sardinian Sequencing ~4x 3,520 Sardinians 23.1M Bipolar Sequencing ~12x 2,825 43.7M Nephrotic Syndrome 464 25.6M Age-related Macular Degeneration 3,000 36.2M HUNT 1,200 Norwegians 23.0M

clinically important variants using gotCloud? Can I detect clinically important variants using gotCloud? Briefly Talk about software tools and library included in GotCloud

Variant Call Examples using GotCloud SNPs Indels Structural Variants Deep Genomes Shallow Genomes Targeted Exomes Examples from APOL1 gene Nephrotic syndrome associated genes Available at http://genome.sph.umich.edu/wiki/GotCloud:_Amazon_Demo SNPs : Nephrotic syndrome risk allele – APOL G1 allele 22 36661906 . A G 18 PASS AN=124;AC=2;AF=0.013827… Indels : Nephrotic syndome risk allele – APOL G2 allele 22 36662041 . AATAATT A 756 PASS AN=114;AC=2;AF=0.017544… Structural Variants nearby APOL1 loci 22 36133488 . C <DEL> . PASS AN=124;AC=2;END=36144254...

Variant Call Examples using GotCloud SNPs Indels Structural Variants Deep Genomes Shallow Genomes Targeted Exomes Examples from APOL1 gene Nephrotic syndrome associated genes Available at http://genome.sph.umich.edu/wiki/GotCloud:_Amazon_Demo SNPs : Nephrotic syndrome risk allele – APOL G1 allele 22 36661906 . A G 18 PASS AN=124;AC=2;AF=0.013827… Indels : Nephrotic syndome risk allele – APOL G2 allele 22 36662041 . AATAATT A 756 PASS AN=114;AC=2;AF=0.017544… Structural Variants nearby APOL1 loci 22 36133488 . C <DEL> . PASS AN=124;AC=2;END=36144254...

How can I use gotCloud on the Cloud? Briefly Talk about software tools and library included in GotCloud

GotCloud on the Cloud Alignment & QC Variant Calling Variant Filtering Haplotyping Deep Genomes Shallow Genomes Targeted Exomes GotCloud supports for many high-computing cluster systems MOSIX SLURM Sun Grid Engine (SGE) Portable Batch System (PBS) GotCloud also supports the Amazon Cloud In 1000G low-pass genomes examples… Average cost is $20 per genome GotCloud Amazon Machine Images (AMI) available

GotCloud on Amazon Cloud Alignment & QC Variant Calling Variant Filtering Haplotyping Deep Genomes Shallow Genomes Targeted Exomes Getting started… GotCloud AMI is publicly available The AMI contains example input files You can run today’s demo on your own Adding computing power for your own analysis… GotCloud AMI supports StarCluster (SGE-compatible) You can add as many nodes as you need. You will need to upload your own files to Amazon Have your data mounted or use wget/curl/lftp

or https://github.com/statgen/gotcloud Thanks! http://gotcloud.org/ or https://github.com/statgen/gotcloud https://drive.google.com/file/d/0B9LqMcsR6cysalhwazB5WkVPOTg/view?usp=sharing Gonçalo Abecasis Adrian Tan Mary Kate Wing Terry Gliedt Hyun Min Kang Tom Blackwell Goo Jun Alan Kwong

GotCloud on Amazon Cloud : Launch Amazon Instance Assuming that you set up your account, security key, and also that you selected GotCloud AMI and machine size (instance type).. Selected Instance Type Click Launch When Ready

GotCloud on Amazon Cloud : Connecting to Your Instance Click “Connect” to Launch the terminal when Ready After Launching, Check if the State is “Running”

GotCloud on Amazon Cloud : Connecting to Your Instance Username (e.g. ubuntu) Path to your key (previously selected)

GotCloud on Amazon Cloud : Running GotCloud List of BAMs per Sample Check if the demo input files are available

GotCloud on Amazon Cloud : Running GotCloud Type a GotCloud command line

GotCloud on Amazon Cloud : Checking Output Files Examine the output directory

GotCloud on Amazon Cloud : Checking Output Files Examine the variant of interest

GotCloud on Amazon Cloud : Don’t Forget to Terminate