Deep Phenotyping for Deep Learning (DPDL): Progress Report

Tzung-Chien Hsieh, Institut für Genomische Statistik und Bioinformatik, Universität Bonn, July 2018

DeepGestalt: a facial analysis framework proposed by FDNA that uses computer vision and deep learning to quantify similarities to genetic syndromes, trained on over 26,000 patient photos.
[Figure: example DeepGestalt result, Schilbach-Rott syndrome at syndrome rank 1]

First of all, I would like to introduce a facial image analysis technique, DeepGestalt. It is a next-generation phenotyping technique proposed by FDNA, and it enables facial image analysis of patients with rare Mendelian disorders. It uses deep learning, trained on photos of more than 26,000 patients, to quantify the similarity of a face to each genetic disorder. By uploading a patient's photo, we can therefore obtain similarity scores for genetic disorders.

PEDIA approach: Prioritization of Exome Data with Image Analysis
[Figure: Symptoms → Symptom Analysis (Feature Match: F-Score; Phenomizer: P-Score); Photo → Pattern Recognition (Gestalt Match: G-Score); Exome → Variant Filter → Variant Scoring (CADD); all scores → Support vector machine → PEDIA Score → Diagnosis]

How can we use this technique? In our previous PEDIA study, we integrated phenotype and genotype information for exome prioritization. The PEDIA score is the output of a machine learning approach that integrates multiple layers of information, such as symptoms, the photo and exome sequencing data. Currently we work with Phenomizer and feature scores, which are derived from the clinical description of a patient based on HPO terminology. Similarity scores from image analysis come from Face2Gene. On the molecular level, we currently use the CADD score, obtained by annotating the VCF file. We then integrate the different scores with a support vector machine, whose output is the PEDIA score. Finally, we simply sort by PEDIA score and base our diagnosis on the resulting rank.
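The score integration described above can be sketched as follows. This is a minimal illustration and not the PEDIA implementation. It assumes the following:
- each candidate gene carries one value per score (F, G, P and CADD);
- the per-gene CADD value is the highest CADD score among that gene's variants;
- a linear SVM has already been trained.

The weights, bias and gene names are hypothetical placeholders.

```python
# Sketch: combine per-gene scores with a linear SVM decision function, then rank.
# ASSUMPTIONS (not from the slides): WEIGHTS and BIAS are hypothetical stand-ins
# for a trained linear SVM; the per-gene CADD value is taken as the highest CADD
# score among that gene's variants; gene names and values are placeholders.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GeneScores:
    gene: str
    feature: float     # F-Score (feature match)
    gestalt: float     # G-Score (facial gestalt)
    phenomizer: float  # P-Score
    cadd: float        # highest CADD score among the gene's variants (assumed)


# Hypothetical linear SVM parameters: decision value f(x) = w . x + b
WEIGHTS = (0.8, 1.2, 0.6, 0.05)
BIAS = -1.0


def decision_value(s: GeneScores) -> float:
    """Linear SVM decision function applied to one gene's score vector."""
    x = (s.feature, s.gestalt, s.phenomizer, s.cadd)
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def rank_genes(candidates: List[GeneScores]) -> List[Tuple[str, float]]:
    """(gene, score) pairs sorted from highest to lowest decision value."""
    scored = [(c.gene, decision_value(c)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    genes = [
        GeneScores("GENE_A", 0.2, 0.1, 0.3, 12.0),
        GeneScores("GENE_B", 0.9, 0.8, 0.7, 28.5),
        GeneScores("GENE_C", 0.0, 0.0, 0.1, 35.0),
    ]
    for rank, (gene, score) in enumerate(rank_genes(genes), start=1):
        print(rank, gene, round(score, 3))
```

With these placeholder values GENE_B ranks first, ahead of GENE_C and GENE_A. A linear decision function is used here only because its output is easy to read.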

DPDL: a framework enabling PEDIA and much more, with case-based search capabilities for disorders, genes, features and mutations, and phenotype space exploration

To run the PEDIA approach, we need to integrate different layers of information. We therefore implemented a database that stores all the similarity scores and the genomic data. The design is flexible, so that we can extend the data with new features or add different types of data in the future. The compiled data are organized to maximize their searchability. Here we propose to store the phenotype and genotype information in a case-based database, that is, at the level of individual cases. We do this because aggregating the data to the disorder level would lose the frequency of clinical features.
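Below is a small sketch of the argument for case-level storage. It assumes each case is stored as a disorder label plus a set of HPO term IDs. The disorder name is a placeholder; HP:0001249 and HP:0001250 are the HPO terms for intellectual disability and seizure. From case-level records, the frequency of each feature per disorder can be recomputed at any time. A disorder-level aggregate keeps only the union of terms.

```python
# Sketch: case-level records preserve feature frequencies; a disorder-level
# aggregate does not.
# ASSUMPTION: each case is a (disorder, set of HPO term IDs) pair; the
# disorder name is a placeholder.
from collections import Counter, defaultdict


def feature_frequencies(cases):
    """Map each disorder to {HPO term: fraction of its cases showing the term}."""
    term_counts = defaultdict(Counter)
    case_counts = Counter()
    for disorder, hpo_terms in cases:
        case_counts[disorder] += 1
        term_counts[disorder].update(set(hpo_terms))  # count each term once per case
    return {
        disorder: {term: n / case_counts[disorder] for term, n in counts.items()}
        for disorder, counts in term_counts.items()
    }


def disorder_level_union(cases):
    """What a disorder-level aggregate would keep: only the union of terms."""
    union = defaultdict(set)
    for disorder, hpo_terms in cases:
        union[disorder] |= set(hpo_terms)
    return dict(union)
```

Take three cases of one disorder in which all show intellectual disability but only one has seizures. `feature_frequencies` reports 1.0 and 1/3 for the two terms. `disorder_level_union` returns both terms with no frequency information.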

Database structure
Phenotypic Scores: the similarity scores of disorders, for example Gestalt scores and Phenomizer scores.
Mutation Scores: we annotate the VCF file with CADD scores and store the CADD score of each mutation.
Annotations: we store annotations from external databases, such as ClinVar for variant classification, and dbSNP.
Disorder to Gene: we link disorders to genes by importing the mapping from OMIM.
Features: Human Phenotype Ontology (HPO) terms, for example Intellectual disability, Seizures.
[Figure: database schema. Cases (Case ID, Name); Features (HPO term); Phenotypic Scores (Disorder, Score); Disorder_to_Gene (Disorder, Gene); Mutation Scores (Mutation, Gene, Score); Annotations (Mutation, ClinVar, dbSNP)]

Once we obtain the dataset, we need to store it in the database. Now I will introduce our database design. First, we store the HPO annotations of each patient in the Features table.
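The tables on the slide could be expressed as a relational schema along the lines of the sketch below. The following are our assumptions for illustration: the choice of SQLite, the column types, the `case_id` column in `mutation_scores` (which links mutations to a case), the example query and all placeholder values. The slide names only the tables and their main columns.

```python
# Sketch of a case-based schema mirroring the tables on the slide.
# ASSUMPTIONS: SQLite, column types, case_id in mutation_scores, the query and
# the placeholder data (Disorder X, GENE_A, var1, ...) are illustrative only.
import sqlite3

SCHEMA = """
CREATE TABLE cases (case_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- HPO annotations per case
CREATE TABLE features (case_id INTEGER REFERENCES cases(case_id),
                       hpo_term TEXT NOT NULL);
-- similarity scores per disorder, e.g. gestalt or Phenomizer scores
CREATE TABLE phenotypic_scores (case_id INTEGER REFERENCES cases(case_id),
                                disorder TEXT NOT NULL, score REAL NOT NULL);
-- disorder-gene mapping imported from OMIM
CREATE TABLE disorder_to_gene (disorder TEXT NOT NULL, gene TEXT NOT NULL);
-- CADD score per mutation
CREATE TABLE mutation_scores (case_id INTEGER REFERENCES cases(case_id),
                              mutation TEXT NOT NULL, gene TEXT NOT NULL,
                              score REAL NOT NULL);
-- external annotations per mutation
CREATE TABLE annotations (mutation TEXT PRIMARY KEY, clinvar TEXT, dbsnp TEXT);
"""

# Per gene: best phenotypic score of its disorders and best CADD score of its mutations.
QUERY = """
SELECT dg.gene, MAX(ps.score) AS best_phenotype_score, MAX(ms.score) AS best_cadd
FROM phenotypic_scores AS ps
JOIN disorder_to_gene AS dg ON dg.disorder = ps.disorder
JOIN mutation_scores AS ms ON ms.gene = dg.gene AND ms.case_id = ps.case_id
WHERE ps.case_id = ?
GROUP BY dg.gene
ORDER BY best_phenotype_score DESC;
"""


def demo():
    """Build an in-memory database with placeholder data and run QUERY for case 1."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO cases VALUES (1, 'case_1')")
    con.executemany("INSERT INTO features VALUES (?, ?)",
                    [(1, "HP:0001249"), (1, "HP:0001250")])
    con.executemany("INSERT INTO phenotypic_scores VALUES (?, ?, ?)",
                    [(1, "Disorder X", 0.9), (1, "Disorder Y", 0.4)])
    con.executemany("INSERT INTO disorder_to_gene VALUES (?, ?)",
                    [("Disorder X", "GENE_A"), ("Disorder Y", "GENE_B")])
    con.executemany("INSERT INTO mutation_scores VALUES (?, ?, ?, ?)",
                    [(1, "var1", "GENE_A", 25.0), (1, "var2", "GENE_B", 31.0),
                     (1, "var3", "GENE_A", 12.0)])
    rows = con.execute(QUERY, (1,)).fetchall()
    con.close()
    return rows
```

The query works in two steps:
1. It joins each disorder's phenotypic score to its genes via the OMIM-derived mapping.
2. It joins those genes to the case's CADD-scored mutations, giving one row per gene.

SQLite does not enforce the REFERENCES clauses unless `PRAGMA foreign_keys` is enabled.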

Application (1) – www.dpdl.org: web-based exome prioritization platform
[Figure: data flow between the case submitter, F2G LABS, DPDL, the NGS facility (Life&Brain) and the analysis team; exchanged items: HPO terms, phenotype information, sample, VCF, PEDIA score, report]

How can we use it? First, you send the patient's photo and annotate the HPO terms in F2G, and we retrieve these data and store them in our database. At the same time, you send the blood sample to the NGS facility. Once sequencing is finished, we also store the VCF in our database. Then we can run the PEDIA approach. The analysis team further analyzes the prioritization results, generates a report and uploads it to our database, from where you can download it on our website.
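The workflow above can be summarized as a simple status model for a case. This is a hypothetical sketch: the field names and status strings are invented for illustration. It encodes only the ordering described on the slide:
- PEDIA runs once both the F2G phenotype data and the VCF are stored;
- the analysis team's report follows the PEDIA run.

```python
# Hypothetical status model for a DPDL case; names are illustrative only.
# ASSUMPTION: PEDIA runs only when both phenotype data (from F2G) and the VCF
# (from the NGS facility) are stored, and the report follows the analysis.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    case_id: str
    phenotype_from_f2g: bool = False  # photo analysis and HPO terms retrieved from F2G
    vcf_path: Optional[str] = None    # VCF delivered by the NGS facility
    pedia_done: bool = False          # PEDIA prioritization has been run
    report_uploaded: bool = False     # analysis team uploaded the report

    def status(self) -> str:
        """Describe the next step for this case."""
        if self.report_uploaded:
            return "report available for download"
        if self.pedia_done:
            return "waiting for analysis team report"
        if self.phenotype_from_f2g and self.vcf_path:
            return "ready for PEDIA"
        missing = []
        if not self.phenotype_from_f2g:
            missing.append("phenotype data")
        if not self.vcf_path:
            missing.append("VCF")
        return "waiting for " + " and ".join(missing)
```

Modeling the two inputs as independent fields reflects that the F2G submission and the sequencing run happen in parallel.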

Application – www.dpdl.org

Here we give a quick demo of our website, which is one of the projects funded by TRANSLATE-NAMSE. First, you see your patient list and can review each patient's data. You also find the PEDIA results: the top-10 ranked genes and, as an additional visualization, a Manhattan plot. Moreover, you can open the VCF file to check the mutations in these genes. If you are interested in a specific gene, for example one carrying a missense variant, you can open the mutation to review the information from external databases. You can also classify variants following the ACMG guidelines and create the report in our database. Variants you classify according to the ACMG guidelines are stored in DPDL and extend its knowledge base.
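The demo mentions two PEDIA views: the top-10 gene list and the Manhattan plot. Both can be prepared from per-gene scores as sketched below. We assume the Manhattan plot places each gene at its genomic position along concatenated chromosomes, with its PEDIA score on the y-axis. The tuple layout, chromosome lengths and positions are hypothetical placeholders. Plotting itself is left to any charting library.

```python
# Sketch: data for the top-10 list and a Manhattan-style plot of per-gene scores.
# ASSUMPTIONS: genes are (name, chromosome, position, score) tuples; chromosome
# lengths and positions are placeholders, not real genome coordinates.
from typing import Dict, Iterable, List, Tuple

Gene = Tuple[str, str, int, float]  # (gene, chromosome, position, PEDIA score)


def top_n(genes: Iterable[Gene], n: int = 10) -> List[Gene]:
    """The n genes with the highest scores, best first."""
    return sorted(genes, key=lambda g: g[3], reverse=True)[:n]


def manhattan_points(genes: Iterable[Gene],
                     chrom_lengths: Dict[str, int]) -> List[Tuple[str, int, float]]:
    """(gene, x, y) points where x is the position on concatenated chromosomes.

    chrom_lengths must list the chromosomes in plotting order (dicts keep
    insertion order in Python 3.7+).
    """
    offsets: Dict[str, int] = {}
    running = 0
    for chrom, length in chrom_lengths.items():
        offsets[chrom] = running  # start of this chromosome on the x-axis
        running += length
    points = [(name, offsets[chrom] + pos, score) for name, chrom, pos, score in genes]
    return sorted(points, key=lambda p: p[1])
```

Offsetting each chromosome by the summed lengths of the preceding ones keeps genes from different chromosomes apart on a single x-axis.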