DOMINO: Using Machine Learning to Predict Genes Associated with Dominant Disorders

Mathieu Quinodoz, Beryl Royer-Bertrand, Katarina Cisarova, Silvio Alessandro Di Gioia, Andrea Superti-Furga, Carlo Rivolta
The American Journal of Human Genetics, Volume 101, Issue 4, Pages 623-629 (October 2017)
DOI: 10.1016/j.ajhg.2017.09.001
Copyright © 2017 American Society of Human Genetics

Figure 1 Rationale and General Design of DOMINO
(A) A typical exome analysis identifies 20,000 variants when compared to the human reference genome. After filtering by rarity in the general population (minor allele frequency, or MAF, < 1%) and by functional impact of each variant, approximately 400 DNA changes remain. These impact 300–400 genes heterozygously (red dots) and 5–10 genes when they are present as homozygous or compound heterozygous variants (blue dots).
(B) Workflow of the DOMINO methodology, showing the different steps of gene selection, annotation, and scoring.
(C) Details of the LDA algorithm. Relevant features are first preselected and then removed, replaced, or added iteratively to the model, with specific acceptance criteria. 10 × 10-fold cross-validation is performed at each iteration.
(D) Performance of the model as a function of the iterations performed. AUCs of the training, testing, and validation sets, as well as the number of features at each iteration, are shown. The cut-off value retained corresponded to the 14th iteration and a set of 8 features. The model converges starting from the 36th iteration.
(E) ROC curves for the complete training, testing, and validation sets, displaying AUC values of 0.912, 0.908, and 0.920, respectively.
(F) Features composing the selected model. Average values for AD and AR genes of the training set are shown, along with their relative weight. Units are as follows: for STRING entries, number of interactions (ref. 17); for ExAC-pRec, probability of being intolerant to homozygous but not heterozygous loss-of-function variants (ref. 18); for ExAC-missense Z score, value with respect to a distribution of the expected number of missense variants (ref. 18); for PhyloP, average PhyloP score with respect to a 1,000-bp window centered on the TSS (ref. 19); for ExAC-don./syn., number of variants at the donor splicing site, normalized to the number of synonymous variants in the coding sequence (ref. 20); for mRNA half-life, 0 if ≤ 10 hr or 1 if > 10 hr (ref. 21).
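The filtering step described in panel A can be sketched roughly as follows. This is only an illustration, not the authors' pipeline: the `Variant` record, its field names, the set of impact labels, and the toy data are all assumptions. Only the MAF < 1% threshold and the split between heterozygous and homozygous or compound heterozygous genes come from the caption.

```python
from collections import defaultdict
from dataclasses import dataclass

MAF_THRESHOLD = 0.01  # "MAF < 1%" from the caption
# Hypothetical impact labels; the caption only says "functional impact".
DAMAGING = {"missense", "nonsense", "frameshift", "splice_site"}


@dataclass(frozen=True)
class Variant:
    gene: str
    maf: float      # minor allele frequency in the general population
    impact: str     # hypothetical annotation label
    zygosity: str   # "het" or "hom"


def filter_variants(variants):
    """Keep rare variants that carry a potentially damaging impact label."""
    return [v for v in variants
            if v.maf < MAF_THRESHOLD and v.impact in DAMAGING]


def candidate_genes(variants):
    """Group filtered variants by gene and by the model they could fit.

    dominant:  genes with at least one heterozygous filtered variant
    recessive: genes with a homozygous variant, or with two or more
               heterozygous variants (possible compound heterozygosity;
               phase is not checked in this sketch)
    """
    by_gene = defaultdict(list)
    for v in filter_variants(variants):
        by_gene[v.gene].append(v.zygosity)
    dominant = {g for g, z in by_gene.items() if "het" in z}
    recessive = {g for g, z in by_gene.items()
                 if "hom" in z or z.count("het") >= 2}
    return dominant, recessive


# Toy input with made-up gene names and frequencies.
toy = [
    Variant("GENE_A", 0.0002, "missense", "het"),
    Variant("GENE_B", 0.0001, "nonsense", "hom"),
    Variant("GENE_C", 0.0005, "missense", "het"),
    Variant("GENE_C", 0.0030, "splice_site", "het"),
    Variant("GENE_D", 0.2000, "missense", "het"),    # too common
    Variant("GENE_E", 0.0001, "synonymous", "het"),  # no damaging label
]
dominant, recessive = candidate_genes(toy)
```

In this toy case a gene with two rare heterozygous variants appears in both sets, which mirrors why the heterozygous candidate list is so much longer than the recessive one in a real exome.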

Figure 2 Distributions of LDA Scores and Probabilities of Being Dominant, P(AD), for Genes in the Training and Validation Sets
(A) Density plots of LDA score for AD (red) and AR (blue) genes of the training set. Continuous lines refer to raw values, whereas dashed lines refer to their normal approximations.
(B–F) Histograms of P(AD) for: (B) AD genes of the training set, (C) AR genes of the training set, (D) AD genes of the validation set, (E) AR genes of the validation set, and (F) genes known to behave as false positives in NGS experiments, containing rare, non-pathogenic variants.
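One plausible way to go from an LDA score to a probability such as P(AD) is Bayes' rule applied to the two normal approximations shown in panel A. The caption does not state how P(AD) is actually derived, so the sketch below is an assumption throughout: the function names, the equal prior, and the toy scores are hypothetical.

```python
from statistics import NormalDist


def fit_normal(scores):
    """Normal approximation of a set of LDA scores (needs >= 2 values)."""
    return NormalDist.from_samples(scores)


def p_ad(score, ad_dist, ar_dist, prior_ad=0.5):
    """Posterior probability of the AD class for one LDA score.

    Assumes the score follows ad_dist for AD genes and ar_dist for AR
    genes, and that prior_ad is the prior probability of AD (assumed 0.5).
    Scores far from both distributions can underflow to 0/0; a production
    version would work with log densities instead.
    """
    weight_ad = ad_dist.pdf(score) * prior_ad
    weight_ar = ar_dist.pdf(score) * (1.0 - prior_ad)
    return weight_ad / (weight_ad + weight_ar)


# Toy LDA scores, invented for illustration only.
ad_dist = fit_normal([1.0, 2.0, 3.0])     # mean 2, standard deviation 1
ar_dist = fit_normal([-3.0, -2.0, -1.0])  # mean -2, standard deviation 1

midpoint = p_ad(0.0, ad_dist, ar_dist)    # equidistant from both means
high = p_ad(2.0, ad_dist, ar_dist)        # at the AD mean
low = p_ad(-2.0, ad_dist, ar_dist)        # at the AR mean
```

With symmetric toy distributions and an equal prior, a score halfway between the two means gives a probability of one half, and scores near either mean push the probability towards 1 or 0.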

Figure 3 Distributions of P(AD) for Genes with at Least Two De Novo Mutations in Different Individuals with Intellectual Disability or Epilepsy
Histograms of P(AD) for (A) 82 genes carrying de novo mutations in 1,010 individuals with intellectual disability or (B) 19 genes carrying de novo mutations in 532 individuals with epilepsy, as extracted from denovo-db.
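The selection criterion in the figure title (at least two de novo mutations in different individuals) can be sketched as below. The input layout of `(individual_id, gene)` pairs and the function name are assumptions made for illustration; this does not reproduce the denovo-db file format.

```python
from collections import defaultdict


def recurrent_genes(de_novo_calls, min_individuals=2):
    """Return genes with de novo mutations in >= min_individuals people.

    de_novo_calls: iterable of (individual_id, gene) pairs (assumed layout).
    Several mutations in the same individual count only once per gene.
    """
    carriers = defaultdict(set)
    for individual_id, gene in de_novo_calls:
        carriers[gene].add(individual_id)
    return {gene for gene, ids in carriers.items()
            if len(ids) >= min_individuals}


# Toy calls with made-up identifiers.
calls = [
    ("P1", "GENE_A"),
    ("P2", "GENE_A"),  # second individual: GENE_A qualifies
    ("P3", "GENE_B"),
    ("P3", "GENE_B"),  # same individual twice: does not qualify
    ("P4", "GENE_C"),
]
selected = recurrent_genes(calls)
```

Counting distinct individuals rather than raw mutation calls is what separates recurrence across patients from multiple hits in a single person.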