Personalized genomics

Goal
  Input:
    - Genomic sequence (WGS) from family
    - Pedigree & affectedness
    - Disease (standard ontology needed)
  Output:
    - Genes/mutations relevant to the disease

[Figure: pipeline overview. Inputs: Genomic Sequence (FASTQ), GRCh37, Pedigree & Affectedness, Disease. Stages: Read Mapping, BAM prep, SNV Calling, SV Calling & Validation, Merge VCFs, Variant annotation, Variant filtering. Resources: known sites, dbSNP, SeattleSeq, HGMD. Intermediate files: BAM, VCF. Output: Disease Gene/Mutation.]

Steps
  1. Sequencing
  2. Mapping
  3. BAM Prep
  4. Variant calling (SNV)
  5. Variant calling (SV)
  6. VCF manipulation/merging
  7. Variant annotation
  8. Variant filtering
  9. Disease gene association

1. Sequencing
  Platform:
    - HiSeq/MiSeq
    - PacBio
    - Ion Proton (Life)
    - CGI
    - …
  Mode:
    - Whole Genome
    - Exome
    - RNA-Seq

2. Mapping
  - Map short reads (FASTQ format) to a reference
  - Output a BAM file
  - Mapping tools: BWA, Bowtie, Custom
  - Compute- and disk-intensive part of the pipeline
  - WGS file size: ~200 GB per sample

3. BAM Prep
  Input: BAM file
  Output: BAM file
  - Picard Tools:
      - Sorting BAM
      - Marking (PCR) duplicates
  - GATK:
      - INDEL re-alignment
      - Base-Q covariates & recalibration
  - Compute-intensive part of the pipeline

4. Variant Calling
  Input: multiple BAMs
  Output: VCF (loci that differ from the reference)
  - SNVs: Broad's GATK caller
  - SVs: custom pipelines needed
  - Browsing variant calls: Genome Savant
  - Confirming variants via resequencing
  - Compute-intensive part of the pipeline
  - Integrating SVs and SNVs

5. SV calling & validation
  - Extract FASTQ
  - Bowtie
  - BreakDancer
  - CNVer
  - Reprever
  - Zygosity calling
  - GQL + Genome Savant
  - VCF merging and validation

Push-button pipeline or VM
[Figure: pipeline flowchart. Labels: BAM, Extract FASTQ, Bowtie, unpaired BAM, GATK, GQL, BreakDancer, CNVer, SNPs (VCF), SVs, CNVs, genome browser, VCF to CSV script, Known SNPs, de novo SNPs, ISCA, Reprever, Recombination Blocks, Insertions, True positive SNPs, Zygosity calls, Validation, Filters, Our Code, Deleteriousness (Nitin), Interesting SVs, Interesting SNPs, VCF. Legend: Red arrows indicate functional]

6. Merging VCFs
  - Given multiple VCF files, merge them into one file in which each column corresponds to an individual sample.
  - This can mostly be done with VCFtools.
  - Our goal would be to visualize problematic regions for manual validation, and to design primers for confirmation automatically.
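
As a rough illustration of the merge described above, the following Python sketch combines per-sample variant calls into one table with a column per sample. It is not how VCFtools works internally. The record layout, the sample names, and the `merge_calls` helper are assumptions made for this example, and a site that is absent from a sample is filled with the VCF missing-genotype value `./.` (a real merge would have to decide whether such a site is homozygous reference or simply not called).

```python
from typing import Dict, List, Tuple

# A site is identified by (chrom, pos, ref, alt); the value is a genotype string.
Site = Tuple[str, int, str, str]


def merge_calls(
    per_sample: Dict[str, Dict[Site, str]]
) -> Tuple[List[str], List[Tuple[Site, List[str]]]]:
    """Merge single-sample calls into rows with one genotype column per sample."""
    samples = sorted(per_sample)
    # Union of all sites seen in any sample. Chromosome names sort as strings
    # here, which is a simplification ("10" sorts before "2").
    all_sites = sorted({site for calls in per_sample.values() for site in calls})
    rows = []
    for site in all_sites:
        genotypes = [per_sample[name].get(site, "./.") for name in samples]
        rows.append((site, genotypes))
    return samples, rows


# Hypothetical trio with two variant sites.
calls = {
    "father": {("1", 1000, "A", "G"): "0/1"},
    "mother": {("1", 1000, "A", "G"): "0/0", ("2", 500, "C", "T"): "0/1"},
    "child": {("1", 1000, "A", "G"): "0/1", ("2", 500, "C", "T"): "1/1"},
}

samples, rows = merge_calls(calls)
print("CHROM", "POS", "REF", "ALT", *samples, sep="\t")
for (chrom, pos, ref, alt), genotypes in rows:
    print(chrom, pos, ref, alt, *genotypes, sep="\t")
```

In this toy input the father has no record at the second site, so his column shows `./.` there. Rows like that are the kind of region one might flag for manual validation.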

7. Variant annotation
  Input: variant calls (raw VCF)
  Output: annotation of variants (annotated VCF)
  - Coding
  - Synonymous
  - Splice-variant
  - Regulatory
  - ncRNA
  Annotating coding variation for deleteriousness:
    - SIFT
    - PolyPhen
    - GERP
    - SeattleSeq

GERP score

8. Variant Filtering
  Input: VCF (annotated)
  Output: set of relevant variants/genes
  - Filters based on variant annotation (deleterious: missense/nonsense/splice)
  - Filters based on inheritance patterns: disease model (recessive/dominant/compound het)
  Filtering tools:
    - Gemini (http://gemini.readthedocs.org/en/latest/)
    - FamAnn (https://sites.google.com/site/famannotation/home)
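
The inheritance-pattern filters above can be sketched in a few lines of Python. This is a simplified illustration for a single family and does not use Gemini or FamAnn. The function names, the genotype encoding, and the example trio are assumptions made here, and missing alleles are treated as reference, which a real filter would handle more carefully.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Genotypes = Dict[str, str]  # sample name -> genotype string, e.g. "0/1"


def alt_count(gt: str) -> int:
    """Count non-reference alleles; missing alleles ('.') count as reference."""
    return sum(allele not in ("0", ".") for allele in gt.replace("|", "/").split("/"))


def fits_recessive(gts: Genotypes, affected: Dict[str, bool]) -> bool:
    """Affected samples carry two alt alleles; unaffected carry at most one."""
    return all(
        alt_count(gt) == 2 if affected[s] else alt_count(gt) < 2
        for s, gt in gts.items()
    )


def fits_dominant(gts: Genotypes, affected: Dict[str, bool]) -> bool:
    """Affected samples carry at least one alt allele; unaffected carry none."""
    return all(
        alt_count(gt) >= 1 if affected[s] else alt_count(gt) == 0
        for s, gt in gts.items()
    )


def compound_het_pairs(
    gene_variants: List[Genotypes], child: str, father: str, mother: str
) -> List[Tuple[int, int]]:
    """Index pairs of variants in one gene where the child is het at both,
    one inherited from the father only and the other from the mother only."""

    def inherited_from(v: Genotypes, carrier: str, other: str) -> bool:
        return (
            alt_count(v[child]) == 1
            and alt_count(v[carrier]) == 1
            and alt_count(v[other]) == 0
        )

    pairs = []
    for i, j in combinations(range(len(gene_variants)), 2):
        a, b = gene_variants[i], gene_variants[j]
        if (inherited_from(a, father, mother) and inherited_from(b, mother, father)) or (
            inherited_from(a, mother, father) and inherited_from(b, father, mother)
        ):
            pairs.append((i, j))
    return pairs


# Hypothetical trio: affected child, unaffected parents.
affected = {"child": True, "father": False, "mother": False}
v1 = {"child": "1/1", "father": "0/1", "mother": "0/1"}
v2 = {"child": "0/1", "father": "0/1", "mother": "0/0"}
v3 = {"child": "0/1", "father": "0/0", "mother": "0/1"}

print("v1 recessive:", fits_recessive(v1, affected))
print("v1 dominant:", fits_dominant(v1, affected))
print("compound het pairs in [v2, v3]:", compound_het_pairs([v2, v3], "child", "father", "mother"))
```

In practice these checks would run only on variants that already passed the annotation-based filter (missense/nonsense/splice), and the compound-het check would be applied per gene.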

9. Annotating genes
  Input: collection of genes with mutations
  Output: relevant diseases, functional information
  - Basic information: GeneCards
  - Adding pathway: Ingenuity
  - Databases of disease-gene links: HGMD, OMIM, ClinVar
  Note: We are currently using an outdated version of HGMD, but can possibly do better, or just replace it with the approach in Step 9 (Identifying Disease genes).

9. Identifying Disease genes
  - Automated machine learning approach to correlating genes with diseases
  - Standard ontologies for diseases: MeSH, Disease Ontology
  - Standard vocabulary for gene names
  - ML approach (parse abstracts to make these connections)
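
The slide does not describe the ML approach in detail. As a much simpler stand-in, the sketch below counts how often a gene symbol and a disease term appear in the same abstract. The abstracts, the two short vocabularies, and the `cooccurrence_counts` helper are all toy assumptions for illustration. A real system would draw gene names from a standard vocabulary and disease terms from MeSH or the Disease Ontology, and would need a proper statistical or learned model on top of raw counts.

```python
import re
from collections import Counter
from typing import Iterable, List


def cooccurrence_counts(
    abstracts: Iterable[str], genes: List[str], diseases: List[str]
) -> Counter:
    """Count abstracts in which a gene symbol and a disease term both occur."""
    counts: Counter = Counter()
    for text in abstracts:
        tokens = set(re.findall(r"[A-Za-z0-9]+", text))
        lowered = text.lower()
        # Gene symbols: exact, case-sensitive token match.
        found_genes = [g for g in genes if g in tokens]
        # Disease terms: case-insensitive substring match.
        found_diseases = [d for d in diseases if d.lower() in lowered]
        for g in found_genes:
            for d in found_diseases:
                counts[(g, d)] += 1
    return counts


# Made-up abstract snippets, for illustration only.
abstracts = [
    "Mutations in CFTR cause cystic fibrosis.",
    "CFTR variants and cystic fibrosis severity.",
    "BRCA1 and breast cancer risk.",
]
genes = ["CFTR", "BRCA1"]
diseases = ["cystic fibrosis", "breast cancer"]

counts = cooccurrence_counts(abstracts, genes, diseases)
for (gene, disease), n in counts.most_common():
    print(gene, disease, n, sep="\t")
```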

Computational Resource Consumption
  Step                   Disk/sample   CPU/sample
  Read Mapping           800 GB        320 h*
  BAM prep               150 GB        140 h
  SNV & INDEL calling    20 GB         540 h*
  SV & CNV calling       200 GB        30 h + 30 h
  Merging VCF            1.5 GB        1 h
  Variant Annotation                   1 h
  Variant Filtering                    -
  Disease/Gene Assoc.                  ?
  * amenable to multithread parallelization (up to the point where memory becomes the bottleneck)
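
To get a feel for the per-sample totals, the figures above can be added up with a short Python sketch. Several things here are assumptions rather than statements from the slide: the single "1 h" entry for variant annotation is read as CPU time, the disk figures are summed as if all intermediate files were kept at once, and the wall-clock estimate assumes perfect scaling of the two starred steps with no memory bottleneck.

```python
# (disk per sample, CPU hours per sample) as listed on the slide; None = not given.
resources = {
    "Read Mapping": (800.0, 320.0),
    "BAM prep": (150.0, 140.0),
    "SNV & INDEL calling": (20.0, 540.0),
    "SV & CNV calling": (200.0, 30.0 + 30.0),
    "Merging VCF": (1.5, 1.0),
    "Variant Annotation": (None, 1.0),
}

# Steps marked with * on the slide (amenable to multithreading).
PARALLEL_STEPS = {"Read Mapping", "SNV & INDEL calling"}

total_disk_gb = sum(disk for disk, _ in resources.values() if disk is not None)
total_cpu_h = sum(cpu for _, cpu in resources.values() if cpu is not None)


def ideal_wall_clock(threads: int) -> float:
    """Wall-clock hours if the starred steps scaled perfectly across threads."""
    parallel = sum(cpu for step, (_, cpu) in resources.items() if step in PARALLEL_STEPS)
    serial = total_cpu_h - parallel
    return parallel / threads + serial


print("disk per sample:", total_disk_gb)
print("CPU hours per sample:", total_cpu_h)
print("idealized wall-clock hours with 8 threads:", ideal_wall_clock(8))
```

The idealized figure is a lower bound. The footnote on the slide already warns that scaling stops once memory becomes the bottleneck.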

Gene Prioritization
  - Variant annotation
  - Variant filtering
  - Gene-disease connection

[Figure caption] The HPO aims to act as a central resource to connect several genomics datasets with the diseasome. Thus, the HPO can act as a scaffold for enabling the interoperability between molecular biology and human disease. For example, phenotypic abnormalities in genetically modified model organisms can be mapped to human disease phenotypes (2).
Sebastian Köhler et al. Nucl. Acids Res. 2014;42:D966-D974. © The Author(s) 2013. Published by Oxford University Press.

Human Phenotype Ontology
  - 10,000 terms describing human phenotypic abnormalities (7300 human hereditary syndromes)
  - 2741 genes used to create DAG (Disease Associated Genes)
  - 3 independent sub-ontologies:
      - mode of inheritance
      - onset and clinical course
      - phenotypic abnormalities
  - The phenotypic terms are cross-linked

Applications of HPO

Differential diagnosis using Phenomizer

Sequencing
  - Whole genome sequencing
  - Exome sequencing
  - Disease associated genome sequencing

Depth of coverage (exome or disease oriented sequencing)
  - At 20X coverage, what fraction of het variants will be called?
  - 15% will be missed
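
The slide does not say how the 15% figure was derived. One simple way to reason about the question is a binomial model: at a heterozygous site each read shows the alternate allele with probability 0.5, and the site is missed if fewer than some minimum number of reads support it. The sketch below implements that model only. The fixed depth of exactly 20, the thresholds tried, and the `missed_het_fraction` helper are assumptions, and the sketch does not attempt to reproduce the slide's number.

```python
from math import comb


def missed_het_fraction(depth: int, min_alt_reads: int) -> float:
    """Probability that fewer than `min_alt_reads` of `depth` reads carry the
    alternate allele, when each read samples it with probability 0.5."""
    return sum(comb(depth, k) for k in range(min_alt_reads)) / 2 ** depth


# Hypothetical calling thresholds at a fixed depth of 20 reads.
for threshold in (4, 6, 8):
    print(threshold, round(missed_het_fraction(20, threshold), 4))
```

Under this model the missed fraction is very sensitive to the threshold: requiring 8 supporting reads out of 20 misses roughly 13% of het sites, while requiring 4 misses almost none. Real coverage also varies from site to site around its mean, which pushes the missed fraction up.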

Phenotypic interpretation of eXomes: PhenIX
  - Remove off-target and synonymous variants
  - Test population frequency of other variants
      - frequency score: max(0, 1 - 0.13 exp(100*f))
      - These are known SNPs
  - Scores from SIFT/PolyPhen: the most pathogenic score was taken
  - Final variant score: pathogenic score × frequency score
  - Clinical relevance score: semantic similarity between phenotypic abnormalities and 2741 genes
  - Average (clinical, variant)
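
The scoring steps above can be written out as a short Python sketch. It follows the formulas as they appear on the slide and is not the PhenIX implementation. Two points are assumptions: `f` is taken to be the population frequency as a fraction (0.01 for 1%), and the pathogenicity scores passed in are assumed to be already scaled so that a higher value means more pathogenic (raw SIFT scores run the other way). The function names are illustrative.

```python
from math import exp
from typing import Iterable


def frequency_score(f: float) -> float:
    """Slide formula: max(0, 1 - 0.13 * exp(100 * f)), with f a fraction."""
    return max(0.0, 1.0 - 0.13 * exp(100.0 * f))


def variant_score(pathogenicity_scores: Iterable[float], f: float) -> float:
    """Most pathogenic predictor score multiplied by the frequency score."""
    return max(pathogenicity_scores) * frequency_score(f)


def combined_score(clinical_relevance: float, variant: float) -> float:
    """Average of the clinical relevance score and the variant score."""
    return (clinical_relevance + variant) / 2.0


# Hypothetical variant: two predictor scores, not seen in the population.
v = variant_score([0.2, 0.9], f=0.0)
print("variant score:", v)
print("combined score:", combined_score(0.5, v))
print("frequency score at 5% frequency:", frequency_score(0.05))
```

With these constants the frequency score starts at 0.87 for a variant never seen in the population and falls to zero at a frequency of roughly 2%, so common variants contribute nothing regardless of their predicted pathogenicity.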

Phenotypic interpretation of eXomes: PhenIX
  - Simulated mutation data from HGMD