Interpreting exomes and genomes: a beginner’s guide

Interpreting exomes and genomes: a beginner’s guide
Daniel MacArthur
Analytic and Translational Genetics Unit, Massachusetts General Hospital
Broad Institute of Harvard and MIT
www.macarthurlab.org
Twitter: @dgmacarthur

Overview
Fundamentals of next-generation sequencing
Genomes, exomes and targeted panels
Genomic diagnosis: how do we filter causal variants from a patient’s entire genome?
Major challenges for NGS diagnosis

Next-generation sequencing
Many different technologies, e.g. Illumina, Pacific Biosciences and Oxford Nanopore.
DNA is chopped into fragments and many fragments are read at the same time: massively parallel sequencing.

Sequencing yields billions of reads per run
What does the data “look like”? The machines generate short fragments of DNA sequence (reads); depending on the application these are typically 75 to 150 bp long. Our reads are paired, so each library fragment is read in from both ends.
Example reads: TTTGAACTTTCATAG CGTTACGGCAGACG GGGACATATTCGAAAT ACGGGATGTACG TAGACATAGACGACT GGGATGTACGAA GTACTGACCAG GACCAGTAGAC GACATAGACGACT CCAGTAGACATA ACGAGCCGTAGCTA TTTGACGGGATG GGGATGTACGA CGAGCCGTAGCTA AGACGACTTTGAC ATAGACGACTTTGA GGGATGTATGAG GGGATGTACGAG TACGAGCCGTA TGTACGAGCCGTA
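To make “paired” reads concrete, here is a toy sketch, assuming an idealized fragment with no sequencing errors: read 1 comes from the start of the forward strand, and read 2 comes from the opposite end on the other strand, so it is the reverse complement of the fragment’s last bases. The function names are hypothetical and not taken from any sequencing software.

```python
# Complement table for the four DNA bases (ambiguity codes are not handled).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_reads(fragment: str, read_length: int) -> tuple[str, str]:
    """Simulate one read pair from a library fragment (toy model).

    Read 1 is the first `read_length` bases of the forward strand.
    Read 2 starts at the other end on the opposite strand, so it is the
    reverse complement of the fragment's last `read_length` bases.
    """
    if read_length > len(fragment):
        raise ValueError("read length exceeds fragment length")
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment[-read_length:])
    return read1, read2


if __name__ == "__main__":
    # Hypothetical 30-bp fragment taken from the slide's reference sequence.
    r1, r2 = paired_reads("GTACTGACCAGTAGACATAGACGACTTTGA", 10)
    print(r1, r2)
```

Running it prints `GTACTGACCA TCAAAGTCGT`: the second read does not look like the end of the fragment as written, which is why aligners must consider both strands.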

Compare the reads to a reference genome
Reference: GTACTGACCAGTAGACATAGACGACTTTGACGGGATGTACGAGCCGTAGCTA
Reads: ACGGGATGTACG TAGACATAGACGACT GGGATGTACGAA GTACTGACCAG GACCAGTAGAC GACATAGACGACT CCAGTAGACATA ACGAGCCGTAGCTA TTTGACGGGATG GGGATGTACGA CGAGCCGTAGCTA AGACGACTTTGAC ATAGACGACTTTGA GGGATGTATGAG GGGATGTACGAG TACGAGCCGTA TGTACGAGCCGTA
As part of our data processing we then compare (align) these reads to a reference genome: the human reference, or any other reference that is applicable.
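The comparison step can be illustrated with a deliberately naive aligner. This is a minimal sketch, assuming ungapped alignment and a tiny reference: it tries every offset and keeps the one with the fewest mismatches. Real aligners use indexes and also handle gaps and both strands; nothing here reflects any particular tool’s algorithm.

```python
def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def align_read(read: str, reference: str, max_mismatches: int = 1):
    """Best ungapped placement of `read` on `reference` (toy, brute force).

    Returns (offset, mismatch_count) for the offset with the fewest
    mismatches, keeping the leftmost one on ties, or None if no offset
    has at most `max_mismatches` mismatches.
    """
    best = None
    for offset in range(len(reference) - len(read) + 1):
        mm = mismatches(read, reference[offset:offset + len(read)])
        if mm <= max_mismatches and (best is None or mm < best[1]):
            best = (offset, mm)
    return best


if __name__ == "__main__":
    REFERENCE = "GTACTGACCAGTAGACATAGACGACTTTGACGGGATGTACGAGCCGTAGCTA"
    for read in ["GTACTGACCAG", "GGGATGTATGAG", "AAAAAAAAAAAA"]:
        print(read, align_read(read, REFERENCE))
```

With the slide’s reference, `GTACTGACCAG` lands at offset 0 with no mismatches, while `GGGATGTATGAG` lands at offset 31 with one mismatch: the C -> T difference discussed on the next slide.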

C -> T (5 C / 5 T)
Reference: GTACTGACCAGTAGACATAGACGACTTTGACGGGATGTACGAGCCGTAGCTA
Reads: TAGACATAGACGACT ACGGGATGTATG GTACTGACCAG GGGATGTATGA TTTGACGGGATG ATGAGCCGTAGCTA GACCAGTAGAC GTACGAGCCGTA CCAGTAGACATA TGAGCCGTAGCTA GACATAGACGACT GGGATGTATGAG GGGATGTACGAG ATAGACGACTTTGA AGACGACTTTGAC TACGAGCCGTA TGTACGAGCCGTA
NGS allows us to sample each sequence position many times over. Here we have many looks at this mutation, and this kind of information gives us confidence in the call.
Challenges: mapping short reads; variable coverage; base-calling quality. Calls tend to be worse for insertions and deletions than for SNPs.
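“Many looks” at a position amounts to a pileup: count the bases each overlapping read shows there, then reason about the allele fraction. The sketch below assumes ungapped reads with known offsets and uses a crude fraction-threshold genotype; the thresholds and minimum depth are illustrative assumptions, whereas real variant callers use probabilistic models that also weigh base and mapping quality.

```python
from collections import Counter


def pileup_counts(aligned_reads, position):
    """Count the bases that aligned reads show at one reference position.

    aligned_reads: iterable of (offset, sequence) pairs, where offset is the
    0-based reference coordinate of the read's first base (ungapped reads).
    """
    counts = Counter()
    for offset, seq in aligned_reads:
        if offset <= position < offset + len(seq):
            counts[seq[position - offset]] += 1
    return counts


def naive_genotype(counts, ref_base, min_depth=8, het_low=0.2, het_high=0.8):
    """Rough genotype from the alternate-allele fraction.

    The depth and fraction thresholds are illustrative, not a standard.
    Returns (genotype, depth, alt_fraction).
    """
    depth = sum(counts.values())
    if depth < min_depth:
        return "no-call", depth, 0.0
    alt_fraction = (depth - counts[ref_base]) / depth
    if alt_fraction < het_low:
        genotype = "hom-ref"
    elif alt_fraction <= het_high:
        genotype = "het"
    else:
        genotype = "hom-alt"
    return genotype, depth, alt_fraction


if __name__ == "__main__":
    # Hypothetical reads all starting at reference offset 0; position 2
    # shows C in five reads and T in five, like the slide's 5 C / 5 T.
    reads = [(0, "AACAA")] * 5 + [(0, "AATAA")] * 5
    counts = pileup_counts(reads, 2)
    print(dict(counts), naive_genotype(counts, "C"))
```

An even 5/5 split gives an allele fraction of 0.5, the pattern expected for a heterozygous variant; with only a few reads the same split would be far less convincing, which is why the sketch refuses to call below a minimum depth.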

Which technology to choose?
Technology                   Percent of genome sequenced
Whole genome sequencing      >95%
Whole exome sequencing       ~1.5% (protein-coding regions)
Targeted sequencing          0.005% to 0.1% (100s to 1,000s of genes)
Cost and depth of coverage were also compared across the three technologies.
High-level overview of the types of sequencing:
WGS: the complete DNA sequence of the person or organism.
Exome: variants in all exons, plus other variation such as small insertions/deletions.
Targeted: variants in all targeted exons, plus other variation such as small insertions/deletions; used on a large collection of genes.
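The trade-off between how much sequence is targeted and how deeply it is covered follows from simple arithmetic: expected mean depth is C = N × L / G, for N reads of length L over G target bases (the Lander-Waterman relation). The sketch below assumes a ~3.1 Gb human genome, an exome target of ~1.5% of it, 150-bp reads, and illustrative depths of 30x and 100x; it ignores duplicate reads, off-target reads and uneven coverage, all of which raise the real requirement.

```python
def reads_needed(target_bases: int, mean_depth: int, read_length: int) -> int:
    """Reads required so that N * L / G reaches the desired mean depth.

    Uses integer ceiling division; ignores duplicates, off-target reads
    and uneven coverage.
    """
    return (mean_depth * target_bases + read_length - 1) // read_length


def mean_depth(n_reads: int, read_length: int, target_bases: int) -> float:
    """Expected mean depth C = N * L / G."""
    return n_reads * read_length / target_bases


if __name__ == "__main__":
    GENOME = 3_100_000_000  # approximate human genome size (assumption)
    EXOME = 46_500_000      # ~1.5% of GENOME (assumption)
    print("WGS at 30x:", reads_needed(GENOME, 30, 150))
    print("WES at 100x:", reads_needed(EXOME, 100, 150))
```

Under these assumptions the genome needs about 620 million reads for 30x, while the exome reaches 100x with about 31 million, which is why smaller targets can be sequenced more deeply for the same cost.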

Targeted sequencing

The problem with exome data
Clinically and genetically heterogeneous conditions
[Figure: a spreadsheet of exome variants, 30,000 rows]

Sifting signal from noise in exomes
Every genome contains many rare, potentially functional variants:
~500 rare missense variants
~100 loss-of-function (LoF) variants: ~20 homozygous, ~20 rare
~100 rare variants in known disease genes
5-10 recessive disease-causing mutations
1-2 de novo coding mutations
plus sequencing errors
In Mendelian disease patients we need to find 1-2 true causal mutations amidst this “noise”.

How do we find pathogenic variants?
Is the variant a known pathogenic variant? How much evidence supports the claim of pathogenicity?
Is the variant rare?
Is it predicted to have a functional impact (e.g. change a protein sequence)?
Does it segregate with disease in the family?
Is the gene associated with the disease?
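The questions above translate naturally into a filter-then-rank step. This is a minimal sketch, not any lab’s actual pipeline: the `Variant` record, its fields, the gene names and the 0.1% frequency cutoff are hypothetical, and the consequence labels are a small illustrative subset of Sequence Ontology terms. Rarity, protein impact and segregation act as hard filters; known pathogenic status and a known disease gene are used only for ranking, so novel genes are not discarded.

```python
from dataclasses import dataclass

# Consequences treated as protein-altering (illustrative subset of SO terms).
PROTEIN_ALTERING = {
    "missense_variant", "stop_gained", "frameshift_variant",
    "splice_acceptor_variant", "splice_donor_variant",
}


@dataclass
class Variant:  # hypothetical record; real annotations carry far more fields
    gene: str
    consequence: str
    allele_frequency: float    # frequency in a reference database
    segregates: bool           # consistent with disease status in the family
    reported_pathogenic: bool  # listed as pathogenic in a disease database


def candidate_variants(variants, disease_genes, max_af=0.001):
    """Keep rare, protein-altering variants that segregate with disease,
    then rank reported-pathogenic variants and known disease genes first."""
    kept = [
        v for v in variants
        if v.allele_frequency <= max_af
        and v.consequence in PROTEIN_ALTERING
        and v.segregates
    ]
    # False sorts before True, so "reported" and "known gene" come first.
    return sorted(
        kept,
        key=lambda v: (not v.reported_pathogenic, v.gene not in disease_genes),
    )


if __name__ == "__main__":
    variants = [
        Variant("GENE1", "missense_variant", 0.0001, True, False),
        Variant("GENE2", "stop_gained", 0.00005, True, True),
        Variant("GENE3", "synonymous_variant", 0.0, True, False),
        Variant("GENE4", "missense_variant", 0.05, True, False),
        Variant("GENE5", "frameshift_variant", 0.0, False, False),
        Variant("GENE6", "missense_variant", 0.0002, True, False),
    ]
    print([v.gene for v in candidate_variants(variants, {"GENE1"})])
```

The synonymous, common and non-segregating variants are dropped, leaving three candidates ordered by the strength of supporting evidence. In practice the frequency cutoff depends on the inheritance model and on how common the disease is.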

Making sense of one genome requires tens of thousands of genomes
More than 500K exomes and 50K genomes have been sequenced worldwide, but these data are siloed by project and inconsistently processed.

Exome Aggregation Consortium (ExAC): exac.broadinstitute.org
[Figure: population breakdown (Latino, African, European, South Asian, East Asian, Other) for 1000 Genomes, ESP and ExAC]

Value of reference databases
They provide variant frequencies in a large population (either healthy, or a “reference”, i.e. population, sample).
They provide frequencies across multiple human populations.
They allow us to assess how many variants we see in a particular gene.
They provide an unbiased estimate of variant penetrance.
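Frequencies across multiple populations matter because a variant can look rare when pooled yet be common in one group. The sketch below computes allele frequency as allele count / allele number (AC/AN) per population and reports the highest frequency among populations with enough called alleles (often called “popmax”). The counts and the `min_an=2000` cutoff are hypothetical values chosen for illustration.

```python
def allele_frequency(ac: int, an: int) -> float:
    """Allele frequency = allele count / allele number (0.0 if AN is 0)."""
    return ac / an if an else 0.0


def global_af(counts):
    """Pooled AF across all populations.

    counts: {population: (allele_count, allele_number)}
    """
    total_ac = sum(ac for ac, _ in counts.values())
    total_an = sum(an for _, an in counts.values())
    return allele_frequency(total_ac, total_an)


def popmax_af(counts, min_an=2000):
    """Highest AF among populations with at least `min_an` called alleles."""
    return max(
        (allele_frequency(ac, an) for ac, an in counts.values() if an >= min_an),
        default=0.0,
    )


if __name__ == "__main__":
    # Hypothetical counts for one variant: (allele count, allele number).
    counts = {
        "African": (40, 10_000),
        "European": (2, 60_000),
        "East Asian": (0, 8_000),
        "Other": (1, 900),  # too few alleles to trust; skipped by popmax
    }
    print(f"global AF {global_af(counts):.5f}, popmax AF {popmax_af(counts):.5f}")
```

Here the pooled frequency is about 0.05%, which might pass a rarity filter, but the variant is at 0.4% in one population. A database dominated by a single population would hide that signal.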

Lessons from ExAC
Many “healthy” people carry apparently disease-causing variants: over 20,000 reported disease variants are seen in our “healthy” samples, an average of ~2 per person after filtering.
What’s causing this?
Carriers of recessive variants
Some undiagnosed disease cases
Lots of false-positive variants (20-25%)

Databases of disease mutations
Drawn from literature collected over decades, with variable standards of evidence.
Five years ago there were no large frequency databases, so any rare protein-altering variant was often assumed to be causal.
Newer databases are more careful about evidence.

xBrowse: Rapid exploration of multiple inheritance patterns https://atgu.mgh.harvard.edu/xbrowse/
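Exploring inheritance patterns in a parent-child trio comes down to simple genotype rules. The talk does not show how xBrowse implements this, so the following is a hypothetical sketch, assuming unphased genotypes encoded as alternate-allele counts (0, 1 or 2) and no missing calls; it ignores X-linked inheritance, mosaicism and genotyping errors.

```python
def is_de_novo(child: int, mother: int, father: int) -> bool:
    """Child heterozygous, both parents homozygous reference."""
    return child == 1 and mother == 0 and father == 0


def is_homozygous_recessive(child: int, mother: int, father: int) -> bool:
    """Child homozygous alternate, both parents heterozygous carriers."""
    return child == 2 and mother == 1 and father == 1


def compound_het_genes(calls):
    """Genes where the child carries at least one maternally inherited and
    one paternally inherited heterozygous variant.

    calls: iterable of (gene, child, mother, father) alt-allele counts.
    """
    maternal, paternal = set(), set()
    for gene, child, mother, father in calls:
        if child == 1 and mother == 1 and father == 0:
            maternal.add(gene)
        elif child == 1 and mother == 0 and father == 1:
            paternal.add(gene)
    return sorted(maternal & paternal)


if __name__ == "__main__":
    # Hypothetical trio genotypes: (gene, child, mother, father).
    calls = [
        ("GENE_A", 1, 1, 0),
        ("GENE_A", 1, 0, 1),
        ("GENE_B", 1, 1, 0),
        ("GENE_B", 1, 1, 0),
        ("GENE_C", 1, 0, 0),
    ]
    print("compound het:", compound_het_genes(calls))
    print("de novo:", [g for g, c, m, f in calls if is_de_novo(c, m, f)])
```

GENE_A qualifies as compound heterozygous because one hit comes from each parent; GENE_B does not, since both hits came from the mother and could sit on the same chromosome copy.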

xBrowse: Filtering by function and frequency https://atgu.mgh.harvard.edu/xbrowse/

xBrowse: Digestible information for all candidate variants and genes https://atgu.mgh.harvard.edu/xbrowse/

xBrowse: Digestible information for all candidate variants and genes exac.broadinstitute.org

xBrowse: Following up candidate genes with external resources https://atgu.mgh.harvard.edu/xbrowse/

The big (largely) unsolved challenges
NGS data still misses a non-trivial number of genetic variants, and also contains errors.
Our reference databases are still missing many populations.
There is uncertainty even about “known” pathogenic variants in databases.
For many variants, penetrance is not robustly established.
There is a huge difference between interpretation in “healthy” and “disease” samples.