Integrated sequence analysis pipeline provides one-stop solution for identifying disease-causing mutations
Cougar Hao Hu, MPIMG


Medical Re-sequencing Analysis Pipeline (MERAP)

Performance Evaluation

Total read length (Gbp)  Q20 read length (%)  Aligned read length (%)  Uniquely aligned read length (%)
21.576774531             97.1927166           96.650051                95.847452

RefSeq_Gene  RefSeq_ID     Chromosome  Exon_Start  Exon End  Exon Coverage  Exon Mean Depth
HES4         NM_021170     1           934342      934812                   87.13
                                       934906      934993                   57.56
                                       935072      935167                   111.62
                                       935246      935552                   26.04
             NM_001142467              934344                               87.38
                                                                            50.36
ISG15        NM_005101                 948847      948956                   117.93
                                       949364      949919                   109.42
AGRN         NM_198576                 955503      955753                   12.09
                                       957581      957842                   106.98
                                       970657      970704                   45.97
                                       976045      976260                   69.2

RefSeq Gene  RefSeq_ID     Coding Length  Coding_Coverage  Coding Mean Depth  Transcript Length  Transcript Coverage  Transcript Mean Depth
BARD1        NM_000465     2334                            150.508            2594                                    141.83
BBS7         NM_018190     2019                            120.016            2617               0.977072             107.74
BRAT1        NM_152743     2466                            80.403             3013               0.925323             70.3
BSCL2        NM_001122955  1389                            109.526            2006               0.967098             89.21

Coding Size (Mbp)  Coding Coverage  Coding Mean Depth  Transcript Size (Mbp)  Transcript Coverage  Transcript Mean Depth
33.343749          0.9639306        106.157            64.032436              0.6436373            62.265

Coding Coverage (Depth_Cutoff)  Transcript Coverage (Depth_Cutoff)
0.9639306 (3X)                  0.6436373 (3X)
0.9527975 (10X)                 0.5943711 (10X)
0.9312436 (20X)                 0.5659747 (20X)
0.9002506 (30X)                 0.5380895 (30X)
0.8604523 (40X)                 0.5079873 (40X)
0.8129922 (50X)                 0.4753254 (50X)
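The coverage-at-depth-cutoff figures above can be read as the fraction of target bases sequenced to at least a given depth. The following is a minimal Python sketch of that calculation, not MERAP's own code. It assumes a cutoff of "3X" means depth >= 3; the function names and the toy depth values are hypothetical.

```python
from typing import Dict, Iterable, List


def coverage_at_cutoffs(depths: List[int], cutoffs: Iterable[int]) -> Dict[int, float]:
    """Return, for each cutoff, the fraction of positions with depth >= cutoff.

    Assumption: a position counts as covered at "NX" when its depth is at least N.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    total = len(depths)
    return {c: sum(1 for d in depths if d >= c) / total for c in cutoffs}


def mean_depth(depths: List[int]) -> float:
    """Average per-base depth over the region."""
    if not depths:
        raise ValueError("depths must not be empty")
    return sum(depths) / len(depths)


# Toy per-base depths for a hypothetical 10 bp region (not MERAP data).
toy_depths = [0, 2, 5, 12, 25, 31, 44, 52, 60, 8]
toy_coverage = coverage_at_cutoffs(toy_depths, [3, 10, 20, 30, 40, 50])
toy_mean = mean_depth(toy_depths)

for cutoff, fraction in toy_coverage.items():
    print(f"{fraction:.2f} ({cutoff}X)")
```

In practice the per-base depths would come from the aligned reads restricted to the coding or transcript regions, which is why the two columns in the table differ.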

SNVFinder
Error sources: library construction, sequencing, image reading, alignment, neighboring indels
Correlation between mappability and error rate
Biased positions of indel-induced SNV artifacts
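The slide does not say how SNVFinder deals with indel-induced artifacts. As a generic illustration of the idea, the Python sketch below flags SNV calls that fall close to a called indel so they can be reviewed or down-weighted. The function name, the window size of 5 bp, and the positions are all assumptions made for the example.

```python
from typing import List, Tuple


def flag_snvs_near_indels(
    snv_positions: List[int],
    indel_intervals: List[Tuple[int, int]],
    window: int = 5,
) -> List[Tuple[int, bool]]:
    """Pair each SNV position with True if it lies within `window` bp of an indel.

    `indel_intervals` holds inclusive (start, end) coordinates on the same
    chromosome as the SNVs. The window size is a hypothetical choice.
    """
    flagged = []
    for pos in snv_positions:
        near = any(
            start - window <= pos <= end + window
            for start, end in indel_intervals
        )
        flagged.append((pos, near))
    return flagged


# Hypothetical coordinates, for illustration only.
example_flags = flag_snvs_near_indels([100, 108, 250], [(110, 112)], window=5)
print(example_flags)
```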

IndelFinder

CNVFinder

Variant spectrum identified

Logit score for missense pathogenicity
Involves seven predictors
Control samples: missense variants with allele frequency > 10% in 1000 Genomes and ESP6500
Case samples: disease-causing missense variants from HGMD
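A logit score of this kind is typically the linear predictor of a logistic regression: an intercept plus a weighted sum of the predictor values, fitted to separate cases from controls. The Python sketch below shows the arithmetic only. The slide names neither the seven predictors nor the fitted coefficients, so the predictor names, weights, intercept, and feature values here are hypothetical placeholders; the cutoff of 3.57 is the one quoted on the results slide, and treating "pass" as score >= cutoff is an assumption.

```python
import math
from typing import Dict

# Hypothetical coefficients: the source does not give the fitted model.
HYPOTHETICAL_INTERCEPT = -2.0
HYPOTHETICAL_WEIGHTS: Dict[str, float] = {
    "grantham": 0.01,
    "phylop": 0.3,
    "gerp": 0.2,
    "sift": -1.5,  # lower SIFT scores indicate damaging changes, hence a negative weight
    "polyphen2": 2.0,
    "mutationtaster": 1.5,
    "cdd": 0.5,
}
LOGIT_CUTOFF = 3.57  # cutoff quoted on the results slide (FDR < 0.01)


def logit_score(features: Dict[str, float], weights: Dict[str, float], intercept: float) -> float:
    """Linear predictor: intercept + sum of weight * feature value."""
    return intercept + sum(weights[name] * features[name] for name in weights)


def to_probability(score: float) -> float:
    """Map a logit score to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-score))


def passes_cutoff(score: float, cutoff: float = LOGIT_CUTOFF) -> bool:
    """Assumption: a variant passes when its score is at least the cutoff."""
    return score >= cutoff


# Hypothetical feature values for one missense variant.
toy_features = {
    "grantham": 10.0, "phylop": 2.5, "gerp": 3.0, "sift": 0.0,
    "polyphen2": 0.98, "mutationtaster": 1.0, "cdd": 0.5,
}
toy_score = logit_score(toy_features, HYPOTHETICAL_WEIGHTS, HYPOTHETICAL_INTERCEPT)
print(round(toy_score, 3), passes_cutoff(toy_score))
```

With real data the coefficients would be estimated from the control and case sets described above, and the cutoff chosen from the resulting score distributions.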

Example of MERAP results

Field name: Sample
Detail: Family and patient ID
Example: M999_1274
Note: Family ID M999, patient ID 1274

Field name: Variant
Detail: Gene name, RefSeq ID, protein length, HGNC ID, gene description, and variant coordinates in terms of genome, cDNA and protein
Example: HPD(NM_002150,393aa,HGNC:5147,4-hydroxyphenylpyruvate dioxygenase): g.12:122277904G>C,c.1005C>G,p.I335M
Note: Multiple isoforms of RefSeq genes are shown together, separated by '|'.

Field name: Supporting Reads
Detail: Number of non-redundant reads supporting the variant
Example: 76

Field name: Allele Percentage
Detail: Allele percentage of the variant
Example: 0.98

Field name: Quality
Detail: Phred-like quality score (0-40)
Example: 40

Field name: Affected/Cohort
Detail: Subjects in the cohort, incidence of the variant, homozygote frequency, heterozygote frequency
Example: 371|3|1|2
Note: In a cohort of 371 subjects the variant is observed three times, once as a homozygote and twice as a heterozygote.

Field name: Linkage Interval
Detail: Definition of linkage interval, LOD score, length of interval, the variant location in the interval
Example: Homozygous|3.1|-----*-----
Note: In a homozygous interval with LOD score 3.1 and length 5.5 Mb, the variant is located at the center of the interval. '-' stands for a length unit of 0.5 Mb; '*' stands for the location of the variant and also counts as a length unit of 0.5 Mb.

Field name: HGMD
Detail: HGMD match of the variant
Example: DM;HPD;Tyrosinaemia 3;Hum Genet:v.106,p.654,y.2000
Note: There is a match of the variant in HGMD with the classification DM (disease causing). The host gene is HPD, the associated phenotype is Tyrosinaemia 3, and it is reported in Hum Genet (volume 106, page 654, year 2000).

Field name: OMIM
Detail: OMIM match of the variant
Example: TYROSINEMIA,TYPE III

Field name: dbSNP
Detail: dbSNP match of the variant
Example: rs137852868

Field name: 1000Genome
Detail: 1000Genome match of the variant
Example: HOM_REF:HOM_VAR:HET=1090:0:2;AF=0.0009;AMR=0.01;ASN=0;AFR=0;EUR=0
Note: The incidences of the variant as homozygous wild type, homozygous variant and heterozygous variant are 1090, 0 and 2, respectively. The allele frequency of the variant in the population is 0.0009, with ethnicity-specific frequencies of 0.01, 0, 0 and 0 in the American, Asian, African and European populations, respectively.

Field name: ESP
Detail: ESP match of the variant
Example: N/A

Field name: Grantham
Detail: Grantham score for AA change
Example: 10

Field name: phyloP
Detail: phyloP score for base conservation
Example: 2.547

Field name: GERP
Detail: GERP score for base conservation
Example: 3.25

Field name: SIFT
Detail: SIFT prediction and score
Example: damaging(0.000000)

Field name: PolyPhen2
Detail: PolyPhen2 prediction and score
Example: probably damaging(0.978)

Field name: MutationTaster
Detail: MutationTaster prediction and score
Example: disease causing(0.999999)

Field name: CDD
Detail: CDD match and score
Example: cl14632:Glo_EDI_BRP_like superfamily(0.498727735368957)|
Note: Multiple matches are shown together, separated by '|'.

Field name: Logit
Detail: Integrated pathogenicity score by Logit modeling
Example: 4.172
Note: Passes the cutoff of 3.57, where FDR<0.01.

Field name: LOF
Detail: Whether the gene is loss-of-function tolerant
Example: N
Note: N means negative, P means positive.

Field name: Interaction
Detail: Interaction partner of the gene
Example: HPD<->IKBKG
Note: The gene interacts physically with IKBKG.

Field name: Disease
Detail: Known diseases caused by the gene
Example: Hawkinsinuria;Tyrosinaemia 3

Field name: Inheritance Model
Detail: Proposed inheritance model for the disease
Example: Recessive
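Several of the result fields pack multiple values into one delimited string. The Python sketch below shows one way to unpack three of them, using the example values from the slide. The function names and dictionary keys are hypothetical, and the parsing rules are inferred from the single example given for each field, so real MERAP output may need more cases.

```python
from typing import Dict, Union


def parse_cohort(field: str) -> Dict[str, int]:
    """Parse 'Affected/Cohort', e.g. '371|3|1|2'."""
    subjects, incidence, hom, het = (int(x) for x in field.split("|"))
    return {"subjects": subjects, "incidence": incidence,
            "homozygotes": hom, "heterozygotes": het}


def parse_linkage(field: str, unit_mb: float = 0.5) -> Dict[str, Union[str, float]]:
    """Parse 'Linkage Interval', e.g. 'Homozygous|3.1|-----*-----'.

    Each character of the track ('-' or '*') counts as one 0.5 Mb unit.
    """
    kind, lod, track = field.split("|")
    return {"type": kind, "lod": float(lod),
            "length_mb": len(track) * unit_mb,
            "variant_offset_mb": track.index("*") * unit_mb}


def parse_1000genome(field: str) -> Dict[str, Union[int, float]]:
    """Parse '1000Genome', e.g. 'HOM_REF:HOM_VAR:HET=1090:0:2;AF=0.0009;...'."""
    parsed: Dict[str, Union[int, float]] = {}
    for item in field.split(";"):
        key, value = item.split("=")
        if ":" in key:
            # Genotype counts: names and counts are colon-separated in parallel.
            for name, count in zip(key.split(":"), value.split(":")):
                parsed[name] = int(count)
        else:
            parsed[key] = float(value)
    return parsed


cohort = parse_cohort("371|3|1|2")
linkage = parse_linkage("Homozygous|3.1|-----*-----")
genomes = parse_1000genome(
    "HOM_REF:HOM_VAR:HET=1090:0:2;AF=0.0009;AMR=0.01;ASN=0;AFR=0;EUR=0"
)
print(cohort, linkage, genomes, sep="\n")
```

The parsed linkage length of 5.5 Mb matches the note on the slide, since the track has eleven 0.5 Mb units.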

Variant detection compared with GATK

Thanks!