DOMINO: Using Machine Learning to Predict Genes Associated with Dominant Disorders
Mathieu Quinodoz, Beryl Royer-Bertrand, Katarina Cisarova, Silvio Alessandro Di Gioia, Andrea Superti-Furga, Carlo Rivolta
The American Journal of Human Genetics, Volume 101, Issue 4, Pages 623-629 (October 2017)
DOI: 10.1016/j.ajhg.2017.09.001
Copyright © 2017 American Society of Human Genetics

Figure 1 Rationale and General Design of DOMINO
(A) A typical exome analysis identifies 20,000 variants relative to the human reference genome. After filtering by rarity in the general population (minor allele frequency, or MAF, < 1%) and by functional impact of each variant, approximately 400 DNA changes remain. These affect 300–400 genes when present heterozygously (red dots) and 5–10 genes when present as homozygous or compound heterozygous variants (blue dots).
(B) Workflow of the DOMINO methodology, showing the different steps of gene selection, annotation, and scoring.
(C) Details of the LDA algorithm. Relevant features are first preselected and then iteratively removed from, replaced in, or added to the model, according to specific acceptance criteria. 10 × 10-fold cross-validation is performed at each iteration.
(D) Performance of the model as a function of the iterations performed. AUCs of the training, testing, and validation sets, as well as the number of features at each iteration, are shown. The cut-off value retained corresponds to the 14th iteration and a set of 8 features. The model converges starting from the 36th iteration.
(E) ROC curves for the complete training, testing, and validation sets, displaying AUC values of 0.912, 0.908, and 0.920, respectively.
(F) Features composing the selected model. Average values for AD and AR genes of the training set are shown, along with their relative weights. Units are as follows: for STRING entries, number of interactions [17]; for ExAC-pRec, probability of being intolerant to homozygous but not heterozygous loss-of-function variants [18]; for ExAC-missense Z score, value with respect to a distribution of the expected number of missense variants [18]; for PhyloP, average PhyloP score over a 1,000-bp window centered on the transcription start site (TSS) [19]; for ExAC-don./syn., number of variants at the donor splicing site, normalized to the number of synonymous variants in the coding sequence [20]; for mRNA half-life, 0 if ≤ 10 hr or 1 if > 10 hr [21].
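
The filtering logic of panel (A) can be sketched in a few lines of Python. This is only an illustration of the rationale, not the authors' pipeline: the dictionary keys (gene, maf, impact, zygosity), the impact labels, and the example genes are hypothetical, and compound heterozygosity is approximated as "two or more rare heterozygous variants in the same gene" without checking phase.

```python
from collections import defaultdict

MAF_THRESHOLD = 0.01  # "rare" means minor allele frequency below 1%

# Hypothetical impact labels treated as functionally relevant.
IMPACTFUL = {"missense", "nonsense", "frameshift", "splice_site"}


def candidate_genes(variants):
    """Split genes into dominant and recessive candidates.

    Each variant is a dict with hypothetical keys:
    gene, maf, impact, zygosity ("het" or "hom").
    """
    het_counts = defaultdict(int)
    hom_genes = set()
    for v in variants:
        if v["maf"] >= MAF_THRESHOLD or v["impact"] not in IMPACTFUL:
            continue  # common or functionally silent: filtered out
        if v["zygosity"] == "hom":
            hom_genes.add(v["gene"])
        else:
            het_counts[v["gene"]] += 1

    # Recessive candidates: a homozygous hit, or two or more heterozygous
    # hits (possible compound heterozygosity; phase is not checked here).
    recessive = hom_genes | {g for g, n in het_counts.items() if n >= 2}
    # Dominant candidates: any gene with at least one rare heterozygous hit.
    dominant = set(het_counts)
    return dominant, recessive


# Made-up example records, for illustration only.
variants = [
    {"gene": "GENE_A", "maf": 0.0001, "impact": "missense", "zygosity": "het"},
    {"gene": "GENE_B", "maf": 0.002, "impact": "nonsense", "zygosity": "hom"},
    {"gene": "GENE_C", "maf": 0.0005, "impact": "missense", "zygosity": "het"},
    {"gene": "GENE_C", "maf": 0.003, "impact": "frameshift", "zygosity": "het"},
    {"gene": "GENE_D", "maf": 0.2, "impact": "missense", "zygosity": "het"},
    {"gene": "GENE_E", "maf": 0.0001, "impact": "synonymous", "zygosity": "het"},
]
dominant, recessive = candidate_genes(variants)
print(sorted(dominant))
print(sorted(recessive))
```

Because every rare heterozygous variant qualifies a gene as a dominant candidate, that set is far larger than the recessive one, which is the imbalance panel (A) describes.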

Figure 2 Distributions of LDA Scores and Probabilities of Being Dominant, P(AD), for Genes in the Training and Validation Sets
(A) Density plots of LDA score for AD (red) and AR (blue) genes of the training set. Continuous lines refer to raw values, whereas dashed lines refer to their normal approximations.
(B–F) Histograms of P(AD) for: (B) AD genes of the training set, (C) AR genes of the training set, (D) AD genes of the validation set, (E) AR genes of the validation set, (F) genes known to behave as false positives in NGS experiments, containing rare, non-pathogenic variants.
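
One plausible way to go from the normal approximations in panel (A) to a probability such as P(AD) is Bayes' rule over the two class densities. The sketch below assumes exactly that; the caption does not state the authors' procedure, and the means, standard deviations, and prior used as defaults are placeholders rather than values from the paper.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def p_ad(score, ad=(1.0, 1.0), ar=(-1.0, 1.0), prior_ad=0.5):
    """Posterior probability of the AD class for one LDA score.

    ad and ar are (mean, sd) pairs of the normal approximations.
    All default values are placeholders, not estimates from the paper.
    """
    like_ad = prior_ad * normal_pdf(score, *ad)
    like_ar = (1.0 - prior_ad) * normal_pdf(score, *ar)
    return like_ad / (like_ad + like_ar)


for s in (-2.0, 0.0, 2.0):
    print(f"score={s:+.1f}  P(AD)={p_ad(s):.3f}")
```

With equal priors and equal standard deviations, a score midway between the two class means gives a probability of 0.5, and the probability rises as the score moves toward the AD mean.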

Figure 3 Distributions of P(AD) for Genes with at Least Two De Novo Mutations in Different Individuals with Intellectual Disability or Epilepsy
Histograms of P(AD) for (A) 82 genes carrying de novo mutations in 1,010 individuals with intellectual disability and (B) 19 genes carrying de novo mutations in 532 individuals with epilepsy, as extracted from denovo-db.