DONNA MAGLOTT, PH.D. PRO AND MEDICAL GENETICS RESOURCES AT NCBI.


OPPORTUNITIES
The medical genetics group is a relatively recent addition to the suite of resources at NCBI. It manages the NIH Genetic Testing Registry (GTR), ClinVar, and MedGen. These databases share the need to standardize the representation of genes, proteins, small molecules, variation, conditions, and phenotypes, with respect to both the explicit terms and the relationships among those terms. This presentation focuses on opportunities for using PRO in NCBI's Medical Genetics group.

CASE STUDIES MEDICAL GENETICS: CLINVAR, GENE, GTR, MEDGEN

A QUICK TOUR From the home page…

USING THE RESOURCE SECTIONS

TRY ALL SECTIONS

MAJOR DOMAINS OF INFORMATION
Concept | NCBI database/Resource | Used in
Diseases and their defining features | MedGen (Diseases, Findings…) | ClinVar, dbVar, Gene, GTR, PheGenI, dbGaP
Drugs | MedGen (Pharmacologic Substance) | ClinVar, GTR
Genes and gene products | Gene, Nucleotide, Protein, HomoloGene, RefSeq | ClinVar, dbSNP, dbVar, GTR …
Biological processes, cellular components, molecular functions | --- | Gene
Interactions and pathways | Biosystems, Gene |
Variation | ClinVar, dbSNP, dbVar | ClinVar, dbSNP, dbVar…
Records connected by reciprocal, generic links via database identifiers
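One way to picture the reciprocal, generic links between records is a symmetric index keyed by (database, identifier) pairs. The sketch below is illustrative only: the `LinkIndex` class and the identifiers `G1`, `V1`, and `C1` are hypothetical placeholders, not NCBI's implementation or real record IDs.

```python
from collections import defaultdict


class LinkIndex:
    """Toy index of reciprocal links between records in different databases."""

    def __init__(self):
        # Maps (database, identifier) -> set of linked (database, identifier) pairs.
        self._links = defaultdict(set)

    def add_link(self, a, b):
        # Store the link in both directions so it can be followed from either record.
        self._links[a].add(b)
        self._links[b].add(a)

    def neighbors(self, record, database=None):
        # Return linked records, optionally restricted to one target database.
        linked = self._links.get(record, set())
        if database is None:
            return sorted(linked)
        return sorted(r for r in linked if r[0] == database)


index = LinkIndex()
# Placeholder identifiers, invented for illustration only.
index.add_link(("Gene", "G1"), ("ClinVar", "V1"))
index.add_link(("ClinVar", "V1"), ("MedGen", "C1"))

print(index.neighbors(("ClinVar", "V1")))
print(index.neighbors(("Gene", "G1"), database="ClinVar"))
```

Because every link is stored in both directions, a record reached from Gene can lead back to Gene, which is what "reciprocal" means here; "generic" is reflected in the link carrying no relationship type.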

SOME TALKING POINTS
- Except for RefSeq, curation minimal
- RefSeq-based with pointers to UniProtKB
- Use ontologies to acquire and represent standard terms
- Point to ontologies, but not used to support node-based query interfaces
- Capturing primary data that can be used to drive development of ontologies
- Some user communities think in terms of nucleotide only
- Data being submitted with uncertain significance
- Look for opportunities for adding value to NCBI's databases and tools

GENE AND DATA STANDARDS
- Name of the gene (nomenclature committees)
- Names of protein products
  - Primary product (Swiss-Prot)
  - Isoforms (RefSeq)
- Names of associated conditions (multiple)
- Descriptions of pathways (submitters)
- Biological processes, cellular components and molecular functions (GO)
- HIV interactions (NIAID)

HUMAN MISMATCH REPAIR

RESTRICT TO THOSE REPORTED TO BE DISEASE-CAUSING

Summary Bibliography Interactions Pathways Gene Ontology General protein information Reference sequences Locus-specific databases Phrase found in:

Titles of pathways Descriptions of interactions

GENE PROTEIN

HOMOLOGENE

DISEASES AND PHENOTYPES MEDGEN: UMLS, HPO, OMIM, ORDO, GTR

WHY MEDGEN?
- A stable node of identifiers within NCBI for disease names, their clinical features, and pharmacological substances
- Built on the foundation of a subset of UMLS, with supplements from HPO, OMIM (between UMLS releases), and submissions to GTR and ClinVar
- Primarily automated, with some oversight by M.D.s and genetic counselors on staff, and feedback from the community

TERMS FROM UMLS/OMIM/GTR/CLINVAR

HIERARCHIES: CURATED BY GTR STAFF Guided by OMIM’s clinical series and user feedback

HIERARCHIES: COMPUTED FROM NODES IN UMLS

Hierarchy from DNA Repair Deficiency Disorders

USING HPO FOR CLINICAL FEATURES
- Partial display
- Organized by top nodes of the ontology
- Each specific term supports a link to disorders manifesting that feature

CLINVAR: REPORTED VARIATION-PHENOTYPE RELATIONSHIP

- Submitter archive (not curated)
- Variant
- Disease and/or phenotypes
- Interpretation
- Confidence

SUBSET OF A DETAILED RECORD
- Gene name and symbol
- Sequence ontology for molecular and functional consequences
- Diseases
- Identifiers and links
- Observed phenotypes (as distinct from those reported to be characteristic of the diagnostic term)
- Protein change from the variant

DATA SOURCES AND GROWTH

SUBMISSIONS FROM UNIPROT Summarize submissions by genes, diseases, and phenotypes

CURRENT STATUS: CLINGEN-RELATED
- Diseases
- Genes
- Variants
- Predictions
- Conserved sequence
- Conserved domains
- Pathways

‘PHENOTYPE’ AND CLINGEN/CLINVAR
- Working group on phenotype
- Make distinctions among
  - Disease category (body system, metabolic perturbation, cancer)
  - Diagnosis
  - Characteristic features
    - General or gene-specific
  - Diseases targeted by drugs for which the response is genetically determined
  - Observed phenotypes
    - HPO
    - PhenoDB
  - Indications for testing
- Standardization
  - One ontology or many?
  - Relationship to OMIM

VARIATION AND CLINGEN/CLINVAR
- Sequence Ontology for variant location and effect
- Coordinate with PharmGKB for pharmacogenomics
- Description of haplotypes
- No discussion yet about authorities for pathways, conserved domains, post-translational modifications

CURRENT STATUS: NCBI
- Working with UMLS to improve representation of terms and relationships
  - Mapping concepts
  - Reporting relationships
- Supplement current UMLS with HPO, Orphanet (ORDO, in progress), and recent data from OMIM
- Working with Clinical Pharmacogenetics Implementation Consortium (CPIC) and PharmGKB
  - Representation of haplotypes/star alleles
  - Drug responses/Disease target
- Consumer of ontologies to standardize terminology, with definitions
  - Link to resource site
  - Provide attribution
  - Support term-specific queries

CURRENT STATUS: NCBI
- Queries currently term by term, not by node
- Some relationships based on links in Entrez
  - Gene-disease
  - Disease-clinical feature
  - Variation-gene
- Some relationships explicit
  - Genome->transcript->protein
  - Nucleotide change->protein change
- Some relationships reported as hierarchies
  - GTR
  - MedGen (MeSH)
  - ORDO (in progress)
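The difference between a term-by-term query and a node-based query can be sketched with a toy hierarchy. Everything here is an assumption made for illustration: the term names, the record identifiers, and the functions `query_by_term` and `query_by_node` are hypothetical and do not describe an NCBI interface.

```python
# Toy hierarchy: parent term -> child terms. Names are placeholders.
HIERARCHY = {
    "disorder": ["repair disorder", "other disorder"],
    "repair disorder": ["repair disorder type 1", "repair disorder type 2"],
}

# Toy annotations: term -> identifiers of records tagged with exactly that term.
RECORDS = {
    "repair disorder": ["rec-1"],
    "repair disorder type 1": ["rec-2"],
    "repair disorder type 2": ["rec-3"],
    "other disorder": ["rec-4"],
}


def query_by_term(term):
    """Term-by-term query: only records tagged with the exact term."""
    return sorted(RECORDS.get(term, []))


def descendants(node):
    """Collect a node and every term below it in the hierarchy."""
    seen = []
    stack = [node]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        stack.extend(HIERARCHY.get(current, []))
    return seen


def query_by_node(node):
    """Node-based query: records tagged with the node or any descendant term."""
    hits = set()
    for term in descendants(node):
        hits.update(RECORDS.get(term, []))
    return sorted(hits)


print(query_by_term("repair disorder"))
print(query_by_node("repair disorder"))
```

In this toy setup the term query returns only the record tagged with the exact term, while the node query also returns records tagged with the more specific child terms.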

CURRENT STATUS: NCBI
- Maintenance primarily automatic
- Some curatorial review by staff of ClinVar and NIH Genetic Testing Registry (GTR)
- Expect expanded review from the ClinGen group
- Data freely available by ftp or E-utilities
  - ftp://ftp.ncbi.nih.gov/pub/clinvar/
  - ftp://ftp.ncbi.nih.gov/gene/
  - ftp://ftp.ncbi.nih.gov/pub/GTR/
  - ftp://ftp.ncbi.nih.gov/pub/medgen/
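As a minimal sketch of the E-utilities route, the snippet below builds an ESearch request URL for one Entrez database. The helper name `esearch_url` and the search term are illustrative assumptions; the snippet only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """Build an ESearch URL for one Entrez database (hypothetical helper)."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    # urlencode handles escaping, e.g. spaces in the search term.
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example term taken from the mismatch repair case study; any query text works.
for database in ("clinvar", "gene", "medgen"):
    print(esearch_url(database, "mismatch repair"))
```

The resulting URL can be opened with any HTTP client. For bulk use, the ftp directories listed on the slide are usually the better fit than many individual queries.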

ACKNOWLEDGEMENTS
- Slava Gorelenkov: MedGen
- Melissa Landrum: ClinVar
- Jennifer Lee: GTR, ClinVar
- Terence Murphy: Gene
- Lon Phan: dbSNP/dbVar
- Kim Pruitt: RefSeq
- Wendy Rubinstein: GTR, MedGen
- Ming Ward: dbSNP
and all their staff