Identifying disease causal variants: Mendelian disorders
A. Mesut Erzurumluoglu



Contents
• Whole process
• Data formats
• Identifying candidate genes
• Analysis
  ◦ Finding candidate regions
    ▪ Consanguineous
  ◦ Finding causal variant
• Practical

Whole process
[Figure: workflow diagram; sections covered by Denis (Day 3), Hash (Day 3) and me]

Published review
• Erzurumluoglu et al. Mar
  ◦ BioMed Research International

VCF file
• FASTA file
  ◦ We are 99.9% similar
• Only variants relative to a reference genome (e.g. hg19, hg38) are included
• Link:
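To make the VCF idea concrete, here is a minimal sketch of reading one VCF data line and deriving the sample's zygosity. It assumes a tab-separated line with a single sample column and a `GT` key in the FORMAT field. The names `VcfRecord`, `parse_vcf_line` and `zygosity`, and the example record itself, are made up for illustration and are not part of the course material.

```python
from typing import List, NamedTuple


class VcfRecord(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alts: List[str]
    genotype: str


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse one VCF data line (not a '#' header line) with one sample column."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _variant_id, ref, alt = fields[:5]
    # Column 9 (index 8) is FORMAT; column 10 (index 9) is the first sample.
    format_keys = fields[8].split(":")
    sample_values = fields[9].split(":")
    sample = dict(zip(format_keys, sample_values))
    return VcfRecord(chrom, int(pos), ref, alt.split(","), sample.get("GT", "."))


def zygosity(genotype: str) -> str:
    """Classify a diploid GT string such as '0/1' or '1|1'."""
    alleles = genotype.replace("|", "/").split("/")
    if "." in alleles:
        return "MISSING"
    if all(allele == "0" for allele in alleles):
        return "HOM_REF"
    return "HOM" if len(set(alleles)) == 1 else "HET"


# Made-up record for illustration only (not from the course data).
EXAMPLE_LINE = "chr1\t12345\t.\tA\tT\t50\tPASS\t.\tGT:DP\t1/1:30"
```

Calling `zygosity(parse_vcf_line(EXAMPLE_LINE).genotype)` classifies the made-up record as homozygous for the alternate allele. A real analysis would normally use an established VCF parser rather than hand-rolled splitting.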

VEP annotated data
• Consequences of variants
• See link for meaning of each SO term:

Several consequences for one mutation?
• See link for annotation options:

Alternative splicing
[Figure: Transcript 1 and Transcript 2, each marked with an X]
Source URL:

Different Transcripts
• Same mutation, different effect
• ‘Canonical’ transcript
  ◦ Longest transcript
  ◦ Will be fine to use for most genes
• Reporting variants:
  ◦ See HGVS nomenclature guidelines
  ◦ Transcript ID:Nucleotide change:Protein change
  ◦ e.g. NM_ :c.2525C>T:p.(S842F)
  ◦ Check using Mutalyzer
    ▪ Position converter (example: chr1:g.12345A>T)
    ▪ Name Checker
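The "Transcript ID:Nucleotide change:Protein change" reporting pattern can be sketched as a small formatter. This is only an illustration of the slide's convention, not a replacement for Mutalyzer: the regular expressions cover simple substitutions with one-letter amino acid codes, and `format_variant` and the transcript ID `NM_000000.0` are hypothetical placeholders.

```python
import re

# Simple coding substitution, e.g. c.2525C>T (indels etc. are not covered).
C_SUBSTITUTION = re.compile(r"^c\.\d+[ACGT]>[ACGT]$")
# Predicted protein change with one-letter codes, e.g. p.(S842F) or p.(E309*).
P_CHANGE = re.compile(r"^p\.\([A-Z]\d+(?:[A-Z]|\*)\)$")

# Hypothetical placeholder; substitute the real RefSeq transcript accession.
PLACEHOLDER_TRANSCRIPT = "NM_000000.0"


def format_variant(transcript_id: str, c_change: str, p_change: str) -> str:
    """Join the three parts as 'Transcript ID:Nucleotide change:Protein change'."""
    if not C_SUBSTITUTION.match(c_change):
        raise ValueError(f"unsupported nucleotide change: {c_change!r}")
    if not P_CHANGE.match(p_change):
        raise ValueError(f"unsupported protein change: {p_change!r}")
    return f"{transcript_id}:{c_change}:{p_change}"
```

For example, `format_variant(PLACEHOLDER_TRANSCRIPT, "c.2525C>T", "p.(S842F)")` yields a string in the same shape as the slide's example. The result should still be checked with Mutalyzer's Name Checker.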

Canonical transcript – for most genes…
Source URL:

Understand your disease!
• Mode of inheritance
  ◦ Autosomal recessive
  ◦ Autosomal dominant
  ◦ X-linked
• Prevalence
• Known genes/variants
• Any complications?
  ◦ Genetic heterogeneity
  ◦ Incomplete penetrance
  ◦ Pleiotropy

*Candidate genes
• Literature
  ◦ e.g. Latest review on disorder
• Disease specific databases
  ◦ e.g. Ciliome database
  ◦ LOVD
[Slide labels: List 1, List 2]

Filtering - Autozygosity
• Consanguineous individuals
  ◦ Mostly first cousins
  ◦ Elevated risk of AR diseases
• Autozygous regions
  ◦ Long runs of homozygosity
• This slide is relevant to data obtained from consanguineous individuals only!
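The idea of "long runs of homozygosity" can be illustrated with a naive scan along one chromosome. This sketch is not the AutoZplotter method; it simply reports stretches of consecutive homozygous calls that span at least `min_length` base pairs. The function name and the 1 Mb default threshold are assumptions chosen for illustration.

```python
from typing import Iterable, List, Tuple


def runs_of_homozygosity(
    calls: Iterable[Tuple[int, bool]], min_length: int = 1_000_000
) -> List[Tuple[int, int]]:
    """Return (start, end) positions of uninterrupted homozygous stretches.

    `calls` holds (position, is_homozygous) pairs for ONE chromosome,
    sorted by position. A single heterozygous call ends the current run.
    """
    runs = []
    start = end = None
    for position, is_homozygous in calls:
        if is_homozygous:
            if start is None:
                start = position
            end = position
        else:
            if start is not None and end - start >= min_length:
                runs.append((start, end))
            start = end = None
    # Close a run that reaches the last call.
    if start is not None and end - start >= min_length:
        runs.append((start, end))
    return runs
```

Real data contain genotyping errors, so practical tools usually tolerate an occasional heterozygous call inside a run. This sketch does not, which keeps the logic easy to follow.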

AutoZplotter
[Figure: AutoZplotter output distinguishing homozygous and heterozygous calls. Erzurumluoglu et al., BioMed Research International]

Filtering – Variant status
• Autosomal recessive
  ◦ Consanguineous: check autozygous regions (IBD)
  ◦ Unrelated (could be IBD or IBS)
• Autosomal dominant
  ◦ Inherited – affected parent has to possess variant
  ◦ De novo
• X-linked
  ◦ Recessive
  ◦ Dominant
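The variant-status rules above can be written as a first-pass filter. This is a deliberately simplified sketch: for autosomal recessive it keeps only homozygous autosomal variants (so compound heterozygotes in unrelated families are ignored), for autosomal dominant it keeps heterozygous autosomal variants without checking parental genotypes, and for X-linked it only checks the chromosome. The mode codes `"AR"`, `"AD"`, `"XL"` and the function name are assumptions.

```python
SEX_CHROMOSOMES = {"X", "Y", "chrX", "chrY"}
X_CHROMOSOME = {"X", "chrX"}


def matches_inheritance(mode: str, chrom: str, zygosity: str) -> bool:
    """First-pass check of one variant against the assumed inheritance mode.

    mode: "AR" (autosomal recessive), "AD" (autosomal dominant), "XL" (X-linked).
    zygosity: "HOM" or "HET" for the affected individual.
    """
    autosomal = chrom not in SEX_CHROMOSOMES
    if mode == "AR":
        return autosomal and zygosity == "HOM"
    if mode == "AD":
        return autosomal and zygosity == "HET"
    if mode == "XL":
        return chrom in X_CHROMOSOME
    raise ValueError(f"unknown inheritance mode: {mode!r}")
```

Segregation in relatives (affected parent carrying the variant, or absence in both parents for a de novo change) still has to be checked separately.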

Filtering - MAF
• Calculating your threshold
  ◦ HWE: $p^2 + 2pq + q^2 = 1$ (where $p + q = 1$)
    ▪ q: frequency of disease causal mutation
    ▪ e.g. if an AR disease affects 1 in a million, then $q^2 = 10^{-6}$, so q is 0.001
  ◦ Disease causal mutation cannot be common!
• 1000 Genomes Project
  ◦ 1092 samples (Phase I)
  ◦ Incorporated by VEP
• Exome variant server (EVS)
  ◦ 6503 samples
  ◦ Incorporated by VEP
• ExAC
  ◦ 60,706 samples
  ◦ Download via FTP
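The threshold calculation can be worked through in a few lines. Under Hardy-Weinberg equilibrium the prevalence of an autosomal recessive disease equals $q^2$, so the causal allele frequency is at most the square root of the prevalence. The sketch below assumes a single causal allele and full penetrance; the function names are illustrative.

```python
import math


def causal_allele_frequency(prevalence: float) -> float:
    """Upper bound on q for an AR disease under HWE, where prevalence = q**2."""
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    return math.sqrt(prevalence)


def carrier_frequency(q: float) -> float:
    """Expected heterozygote (carrier) frequency 2pq, with p = 1 - q."""
    return 2 * (1 - q) * q


def is_rare_enough(maf: float, prevalence: float) -> bool:
    """Keep a variant only if its population MAF does not exceed the bound on q."""
    return maf <= causal_allele_frequency(prevalence)
```

For a prevalence of 1 in a million, `causal_allele_frequency(1e-6)` is 0.001, so a variant with a MAF of 1% in a reference panel could be filtered out. With genetic heterogeneity each individual causal allele is rarer still, so the bound is conservative.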

Filtering – Consequence to protein
• Not predicted to be high impact mutations:
  ◦ Coding
    ▪ Synonymous
  ◦ Noncoding
    ▪ Upstream and Downstream of genes
    ▪ Intron
    ▪ 5’ and 3’ UTRs

*Building Evidence – Known variants
• OMIM – Mendelian diseases
• HGMD
  ◦ Public – All reported mutations but 3 years behind
    ▪ Incorporated by VEP
    ▪ Variant position
  ◦ Paid – All mutations
• ClinVar
  ◦ All clinically relevant mutations
  ◦ Download from FTP link

*Building Evidence – Mutation effect prediction
• Most probably ‘loss of function’ mutations:
  ◦ start losses
  ◦ splice acceptor/donor
  ◦ stop gains (especially NMD)
  ◦ frameshifting indels
  ◦ missense mutations
• Predicting effect of missense mutations:
  ◦ FATHMM-MKL & CADD (all variants, including non-coding)
  ◦ SIFT & PolyPhen-2
[Arrow label: (General) probability of being functionally disruptive]
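The consequence categories from the filtering and mutation-effect slides can be expressed as two sets of Sequence Ontology (SO) terms, as reported in VEP's Consequence column. The grouping below mirrors the slides. The function name, the return labels and the treatment of anything else as "unclassified" are assumptions for illustration.

```python
# SO terms matching the slide's 'most probably loss of function' list.
POTENTIALLY_HIGH_IMPACT = {
    "start_lost",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "stop_gained",
    "frameshift_variant",
    "missense_variant",
}

# SO terms matching the slide's 'not predicted to be high impact' list.
NOT_PREDICTED_HIGH_IMPACT = {
    "synonymous_variant",
    "upstream_gene_variant",
    "downstream_gene_variant",
    "intron_variant",
    "5_prime_UTR_variant",
    "3_prime_UTR_variant",
}


def classify_consequence(consequence_field: str) -> str:
    """Classify a comma-separated Consequence value from one VEP row."""
    terms = set(consequence_field.split(","))
    if terms & POTENTIALLY_HIGH_IMPACT:
        return "potentially_high_impact"
    if terms <= NOT_PREDICTED_HIGH_IMPACT:
        return "not_predicted_high_impact"
    return "unclassified"
```

A row is treated as potentially high impact if any one of its terms is in the first set, because a single variant can carry several consequences on the same transcript.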

*Building Evidence - Conservation
• GERP++
  ◦ Download ‘Tracks Data’ - Elements (hg19)
• Local sequence alignment
  ◦ UniProt
    ▪ BLAST
    ▪ Align

Building Evidence – Animal models
• Check literature
• Mouse knockouts
  ◦ Other model organisms
• Functional studies
  ◦ In vitro
  ◦ In vivo

Building Evidence – Gene expression
• Which tissues is the protein expressed in?
• ENCODE data
  ◦ Tonnes of expression data for tens of cell lines
  ◦ Load track via UCSC Genome browser
  ◦ Ensembl Genome browser
• GeneCards
  ◦ Integrative webpage

*GeneCards

Building Evidence – Replication
• Gold standard but not always possible
• Traditional: LOD score of 3 (p ≤ 0.001)
• Very rare disorders
  ◦ Parents and unaffected siblings
  ◦ Other affected siblings/cousins
  ◦ Check in other affected families
  ◦ Genotype variant in local population

Simple analysis pipeline
• Create files:
  ◦ PHI_SO_terms.txt
    ▪ List of ‘most probably’ causal consequences
  ◦ Candidate_genes.txt
    ▪ List of candidate genes
• Example:
    grep -f PHI_SO_terms.txt file.vep | grep -f Candidate_genes.txt | grep CANONICAL | grep HOM | grep '_[A-Z]/' | less -S
• What each stage of the pipe keeps, in order:
  ◦ Severe consequences
  ◦ Candidate genes
  ◦ Canonical transcripts
  ◦ Homozygous variants
  ◦ Rare variants (absent in 1000GP)
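For readers who prefer a script to a shell pipe, the same five filters can be sketched in Python. This mirrors the grep stages literally: each test is a substring or regular-expression match on the whole line, not a parse of specific VEP columns. The function name and the example lines (including the gene IDs and the `ZYG=HOM` text) are made up for illustration.

```python
import re
from typing import Iterable, List

# Variant named chr_pos_ref/alt (e.g. 19_111_C/A) rather than by an rsID.
UNNAMED_VARIANT = re.compile(r"_[A-Z]/")


def filter_vep_lines(
    lines: Iterable[str], so_terms: Iterable[str], candidate_genes: Iterable[str]
) -> List[str]:
    """Keep lines passing all five grep-style filters from the slide."""
    so_terms = list(so_terms)
    candidate_genes = list(candidate_genes)
    kept = []
    for line in lines:
        if (
            any(term in line for term in so_terms)             # severe consequences
            and any(gene in line for gene in candidate_genes)  # candidate genes
            and "CANONICAL" in line                            # canonical transcripts
            and "HOM" in line                                  # homozygous variants
            and UNNAMED_VARIANT.search(line)                   # no rsID
        ):
            kept.append(line)
    return kept


# Made-up rows for illustration only (not real VEP output).
EXAMPLE_LINES = [
    "19_111_C/A\tENSG00000000001\tstop_gained\tCANONICAL=YES;ZYG=HOM",
    "rs0000001\tENSG00000000001\tstop_gained\tCANONICAL=YES;ZYG=HOM",
    "19_333_G/T\tENSG00000000001\tsynonymous_variant\tCANONICAL=YES;ZYG=HOM",
    "19_444_G/T\tENSG00000000002\tstop_gained\tCANONICAL=YES;ZYG=HOM",
    "19_555_G/T\tENSG00000000001\tstop_gained\tCANONICAL=YES;ZYG=HET",
]
```

With `so_terms=["stop_gained"]` and `candidate_genes=["ENSG00000000001"]`, only the first example row survives. Because matching is by substring, as with grep, a gene symbol containing "HOM" would also pass the zygosity test, so column-aware parsing is safer for real analyses.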


Learning objectives
• Making sense of VEP annotated data
  ◦ Different transcripts and mutation effects
• How to create and use candidate list(s)
• How to look for causal variants
  ◦ Filtering
  ◦ Setting threshold for MAF
• Building evidence for variants
• Reporting variants (e.g. for papers, databases)

Thank You
• Any questions?
• Please look back at the slides again once you complete the short-course(s)

Practical
• Proband is affected by Primary ciliary dyskinesia (PCD)
  ◦ Hint 1: Autosomal recessive
  ◦ Hint 2: Prevalence is ~ 1 in
  ◦ Hint 3: Genetically heterogeneous
• PCD is characterised by abnormal cilia function and/or structure, which leads to chronic sino-pulmonary infections

Exercise
1- Create list of candidate genes (max: 15 mins)
  ◦ Ensembl IDs in txt file
2- Find causal variant (in Practical_file_Mesut.txt)
3- Back up variant with evidence
  ◦ Conservation
  ◦ ‘Model’ organisms
  ◦ Literature
4- Report causal variant in HGVS format

Additional exercise
• A sibling of the PCD proband is diagnosed with Papillon-Lefèvre syndrome (PLS)
  ◦ Hint 1: PLS is autosomal recessive
  ◦ Hint 2: PCD affected sibling is not affected by PLS
1- Find causal variant
2- Build up evidence for causal variant
3- Report causal variant in HGVS format

To-do list
• PCD
  ◦ Create PCD candidate gene list
  ◦ Find PCD causal variant in file
  ◦ Back up variant with evidence
  ◦ Report variant in HGVS format
• PLS
  ◦ Find PLS causal variant in file
  ◦ Back up variant with evidence
  ◦ Report variant in HGVS format

Answers – Known PCD causal genes

PCD candidate genes

Answers – PCD causal variant
• Autosomal recessive
  ◦ Filter out sex chromosome variants
• Autosomal recessive
  ◦ Filter out heterozygous variants
• PCD is rare (~1/20000)
  ◦ Filter out common variants (GMAF ≥ 1%)
• Screen known PCD causal genes
• Answer: 19_ _C/A

Building evidence for PCD causal variant

Building evidence for PCD causal variant
• Already identified gene and variant
  ◦ Alsaadi and Erzurumluoglu et al, Hum Mut.
  ◦ Highly conserved (e.g. GERP score, see paper)
  ◦ Concrete evidence!
• Animal models link CCDC151 to PCD
  ◦ Jerber et al, Hum Mol Genet.
• HGVS Answer: NM_ :c.925G>T:p.(E309*)

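Answers – PLS causal variant
• A priori, there is a 50% probability that the PCD affected sibling is a carrier of the PLS causal variant (2/3 once we condition on that sibling being unaffected by PLS, assuming both parents are carriers)
• PLS is caused by mutations in the CTSC gene
• PLS is rare
• Answer: 11_ _C/T
• Answer: NM_ :c.899G>A:p.(G300D)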
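The carrier probability for the sibling can be checked by enumerating the four equally likely parental allele combinations. The sketch assumes both parents are heterozygous carriers (`A` = reference allele, `a` = causal allele) and ignores de novo mutation. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def offspring_genotype_probs() -> dict:
    """Genotype probabilities for a child of two heterozygous (Aa) parents."""
    probs = {}
    for maternal, paternal in product("Aa", repeat=2):
        genotype = "".join(sorted((maternal, paternal)))  # 'AA', 'Aa' or 'aa'
        probs[genotype] = probs.get(genotype, 0) + Fraction(1, 4)
    return probs


def carrier_probability(given_unaffected: bool = False) -> Fraction:
    """P(child is Aa), optionally conditioned on the child not being aa."""
    probs = offspring_genotype_probs()
    if given_unaffected:
        return probs["Aa"] / (probs["AA"] + probs["Aa"])
    return probs["Aa"]
```

Without conditioning, the carrier probability is 1/2. Once the sibling is known to be unaffected by the recessive disease, the `aa` outcome is excluded and the probability becomes 2/3.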

Building evidence for PLS causal variant