ClinVar A system for maintaining medically relevant variation data

Slides:



Advertisements
Similar presentations
Support.ebsco.com Nursing Reference Center Tutorial.
Advertisements

The Diagnostic Laboratory ……the ideal system……. Molecular Genetics Diagnostic Laboratory Exciting area of medical pathology Need to continually up-date.
What is RefSeqGene?.
Creating NCBI The late Senator Claude Pepper recognized the importance of computerized information processing methods for the conduct of biomedical research.
The National Center for Biotechnology Information (NCBI) a primary resource for molecular biology information Database Resources.
1 Welcome to the Protein Database Tutorial This tutorial will describe how to navigate the section of Gramene that provides collective information on proteins.
PhenCode: Connecting genome to phenotype Belinda Giardine Cathy Riemer Ross Hardison Webb Miller Jim Kent PSU and UCSC.
SMART/FHIR Genomic Resources An overview... For latest see
Biological Databases Notes adapted from lecture notes of Dr. Larry Hunter at the University of Colorado.
EBI is an Outstation of the European Molecular Biology Laboratory. UniProt Jennifer McDowall, Ph.D. Senior InterPro Curator Protein Sequence Database:
NCBI resources III: GEO and expression data analysis Yanbin Yin Fall
LEVERAGING THE ENTERPRISE INFORMATION ENVIRONMENT Louise Edmonds Senior Manager Information Management ACT Health.
PubMed/How to Search, Display, Download & (module 4.1)
Introductory Overview
DbSNP: the NCBI database of genetic variation S. T. Sherry, M.H. Ward, M. Kholodov, J. Baker, L. Phan, E. M. Smigielski and K. Sirotkin, Nucleic Acids.
LexEVS 6.0 Overview Scott Bauer Mayo Clinic Rochester, Minnesota February 2011.
Copyright OpenHelix. No use or reproduction without express written consent1.
Integrating and managing your Engaging Networks data Top ten data features.
NCBI’s Bioinformatics Resources Michele R. Tennant, Ph.D., M.L.I.S. Health Science Center Libraries U.F. Genetics Institute January 2015.
Searching PubMed® NCBI, NLM Resources, Micromedex -GSBS TTUHSC Preston Smith Library presents Rev. 08/17/14.
Copyright OpenHelix. No use or reproduction without express written consent1.
GENOME-CENTRIC DATABASES Daniel Svozil. NCBI Gene Search for DUT gene in human.
Fission Yeast Computing Workshop -1- Searching, querying, browsing downloading and analysing data using PomBase Basic PomBase Features Gene Page Overview.
Survey of Medical Informatics CS 493 – Fall 2004 September 27, 2004.
DONNA MAGLOTT, PH.D. PRO AND MEDICAL GENETICS RESOURCES AT NCBI.
Online Mendelian Inheritance in Man (OMIM): What it is & What it can do for you Knowledge Management & Eskind Biomedical Library January 27, 2012 helen.
Copyright OpenHelix. No use or reproduction without express written consent1.
Variation data in VectorBase NIH/NIAID VectorBase site visit March 2015.
This tutorial will describe how to navigate the section of Gramene that provides descriptions of alleles associated with morphological, developmental,
GVS: Genome Variation Server Materials prepared by: Warren C. Lathe, PhD Updated: Q Version 2.
Primary vs. Secondary Databases Primary databases are repositories of “raw” data. These are also referred to as archival databases. -This is one of the.
Copyright OpenHelix. No use or reproduction without express written consent1.
PhenCode Connecting Genotype and Phenotype. HbVar: Hemoglobin variants and thalassemia mutations Began as Prof. Titus Huisman’s Syllabus of Hemoglobin.
Copyright © 2010 Pearson Education, inc. or its affiliates. All rights reserved. Texas Assessment Management System.
Copyright OpenHelix. No use or reproduction without express written consent1.
Michael Feolo Outline  What is dbGaP  How to get your study registered  How to submit data  Not Covered  SRA Submission.
Entrez, dbSNP, GEO, OMIM & LinkOut JanPlan Entrez Distributed by NCBI in 1991 on CD-ROM Included linked nodes: GenBank & PDB Translated GenBank,
GIAB: Genome reference material development resources for clinical sequencing Chunlin Xiao 1, Justin Zook 2, Shane Trask 1, Melissa Landrum 1, Marc Salit.
SciENcv: a Federal biosketch tool NIH Regional Meeting October 2016 Neil Thakur, PhD Office of Extramural Research Bart Trawick, PhD National Center for.
database of Genotype and Phenotype
T3/Tutorials: Data Submission
Melissa Landrum, Ph.D. BD2K All Hands Meeting 2016 Nov. 30, 2016
Getting GO annotation for your dataset
Genetic Testing in Human: The Clinician’s Perspective ncbi
Role of Genetic Databases in Risk Assessment and Resources for Scientists, Clinicians, Genetic Counselors, and Patients Donna Maglott, Ph.D. Senior Staff.
Using ArrayExpress.
Tutorial support.ebsco.com.
TRAINING OF FOCAL POINTS on the CountrySTAT SYSTEM based on FENIX
Critical Reading of Clinical Study Results
PubMed Database Interface (Basic Course Module 4 Part A)
Department of Genetics • Stanford University School of Medicine
Mental Health Data Alliance, LLC (MHData) June 7th , 2018
Functional Annotation of the Horse Genome
Gene Expression Omnibus (GEO)
Major Databases/Portals
Deep Phenotyping for Deep Learning (DPDL): Progress Report
ID Mapping tools: Converting Accessions between Databases
Welcome to the Gene and Allele Database Tutorial
Principles and Recommendations for Standardizing the Use of the Next-Generation Sequencing Variant File in Clinical Settings  Ira M. Lubin, Nazneen Aziz,
Welcome to the Quantitative Trait Loci (QTL) Tutorial
Updates and Future Direction
Gramene’s Ontologies Tutorial
Metadata The metadata contains
PubMed.
TOPMed Analysis Workshop Genetic Analysis Center Biostatistics Department University of Washington TOPMed Data Coordinating Center August 7-9, 2017 Introduction.
PubMed Database Interface (Basic Course: Module 4)
Databases and Information Management
Welcome - webinar instructions
Community-Engaged Partnership Database: VCU’s Commitment to Community Engagement
Data + Research Elements What Publishers Can Do (and Are Doing) to Facilitate Data Integration and Attribution David Parsons – Lawrence, KS, 13th February.
Presentation transcript:

ClinVar A system for maintaining medically relevant variation data Donna Maglott, Ph.D Senior Staff Scientist, NCBI

Why develop ClinVar? Capacity for identification of human variation has increased Not all variation is published in the literature Evaluation and review is facilitated by centralization and standardization Programmed access to data by sequence coordinates, in addition to text, is critical ClinVar is designed to facilitate access to published and unpublished sequence variation. Data are freely accessible, and records can be retrieved by text-based well as sequence-based searching.

Current interpretation Why develop ClinVar? Variant calls There are many questions to be asked and variables to be considered between calling variations, and identifying enough about what has been reported about a variation to generate a reliable current interpretation. Current interpretation

Current interpretation Why develop ClinVar? Variant calls ClinVar (screen shot of a preliminary report page) is designed to ease that path. Current interpretation

Why develop ClinVar at NCBI? Integrate with dbSNP and dbVar and dbGaP Integrate with Genomic Reference Consortium (GRC) Integrate with NIH Genetic Testing Registry (GTR) Integrate with variation analysis tools And PubMed, Gene, MedGen, RefSeqGene, RefSeq… NCBI has multiple databases and established tools to support the information being accumulated around submissions to ClinVar. These include the databases that manage information about the locations of sequence variation (dbSNP for single nucleotide variations and length variations less than about 50 nucleotides) and dbVar for longer variations), dbGaP ( controlled access to variation associated with phenotype), tools to understand the current reference (GRC) and alternate assemblies of the human genome, the NIH Genetic Testing Registry (GTR), which maintains a voluntary registry of tests offered by laboratories doing genetic testing, and all the other databases NCBI has long offered such as PubMed, multiple sequence databases especially RefSeq, Gene, and NCBI’s new resource to manage phenotypic terms and information connected to those terms, namely MedGen.

What is ClinVar? A central registry of variation observed in phenotyped individuals A repository of the aggregate data necessary to (re)interpret the significance of that variation relative to a disorder A versioned archive of interpretation of significance of single variants and combinations of variants Freely accessible, curated and interpreted information layered on top of dbGaP, dbSNP, and dbVar A bridge among terminology, publications, and the genome A data distributor: browser/ftp/API ClinVar is a registry of data provided by submitters, connecting information about human variation observed in phenotyped individuals. ClinVar does not interpret the data, other than to calculate the molecular consequence of a sequence change at a particular location (e.g. missense, nonsense) , but processing will be done to standardize terminology and report of location of the variation in multiple sequence coordinate systems. Submitters own their data, and are encouraged to update their records as new data accumulate or interpretations change. In that case, a new version is assigned. ClinVar will also calculate connections among records in the database, other databases at NCBI, and external databases. Data in ClinVar is available without restriction by multiple mechanisms, including a website, ftp files, and automated programming interfaces (API).

Data Elements Variation Alleles as submitted, but converted to HGVS to report as differences relative to reference nucleotide sequences Allele frequencies Identifiers in dbSNP and dbVar Identifiers from submitting databases (e.g. OMIM, LSDB, PharmGKB) Official and alternate names Genomic location (mapped to new genome assemblies as required) Phenotype Specificity: measured traits->specific diagnostic names-> indications for testing Qualifiers: age of onset, age of observation Terminology: Terms integrated from multiple authorities, grouped in concepts based on Unified Medical Language System® (UMLS®) with supplements such as the Human Phenotype Ontology (HPO). Consolidated as identifier in MedGen The next few slides provide an overview of the types of information ClinVar is structured to manage. For full details of ClinVar’s data dictionary, please review the documentation on ClinVar’s ftp site and help documents on the website.

Data Elements Evidence Study elements: sample size, family structure, ethnicity, methods for data capture, methods for data analysis, animal models, heritability, tissues sampled Observations: number of occurrences , with co-occurrences Primary data or a meta-analysis Interpretation For alleles being assessed in a genetic model: Pathogenicity in the context of mode of inheritance and maternal/paternal origin Risk / predisposition Molecular consequence (missense/nonsense/splice site) Functional consequence (loss of function/gain of function/dominant negative) Date last evaluated Validation level: from single study , multiple studies, to practice guideline

NCBI Data flows ClinVar Testing laboratories, LSDB, Authors… Expert review Novel variations submitted to ClinVar will be submitted to dbSNP/dbVar ClinVar provides a layer of submitted interpretation not provided by dbSNP/dbVar ClinVar ClinVar is structured to exchange data with other variation and phenotype-specific resources within NCBI as well as external databases. dbSNP <~50 bp dbGaP dbVar (>~50 bp NCBI

Review of ClinVar records Review by domain experts is facilitated by accessing information from ClinVar ClinVar Expert review ClinVar does not provide the expert review, but serves as a data feed to groups reviewing evidence, and a repository for the results of that review. Result of the review accessioned for reporting from ClinVar

Accessions and versions maintain historical interpretations per submitter Accession.version Example SCV000000001.1 Submitter A reports NM_000000.1:c.5A>C is pathogenic for condition A SCV000000002.1 Submitter B reports NM_000000.1:c.5A>C is a variation of unknown significance for condition A SCV000000002.2 Submitter B provides an update, now reporting that NM_000000.1:c.5A>C is benign with respect to condition A Aggregation by NCBI or submitters RCV000000001.1 Combines data from SCV000000001.1 and SCV000000002.2, indicating that reports of clinical significance are not consistent RCV000000002.1 Authoritative group evaluates all evidence and provides the consensus report of the clinical significance of NM_000000.1:c5A>C for condition A This slide summarizes how ClinVar assigns accessions and versions. There are two types of accessions assigned by ClinVar. The type beginning with SCV represents what the submitter provided. The unit record is the (set of) variations observed in a set of individuals with the same phenotype, with an optional interpretation of the clinical significance. The set of variations is not the complete genotype of the individual, but rather the alleles, because of their rarity and gene association, are being interpreted for relationship to a phenotype. The RCV accession is an aggregate based on the set of submitters’ accessions (SCV) with the same set of phenotypes and alleles. The RCV accession can either be generated by NCBI, when NCBI aggregates data from multiple groups, or by a submitter that represented an authoritative body submitting practice guidelines.

Accessing the data

Accessing ClinVar via Variation Reporter Select the assembly Upload a file or paste into a box Data in ClinVar is used in, and can be accessed by, multiple tools at NCBI. One is Variation Reporter, which allows you to upload a file of variant calls (VCF, GVF, BED, HGVS) and generate a downloadable report of what might be known about those locations on the genome, and, if a sequence change is reported, the molecular consequence of that allele. http://www.ncbi.nlm.nih.gov/variation/tools/reporter/

http://www.ncbi.nlm.nih.gov/variation/tools/reporter/ Processing summary Links to Help documents, FAQ, and an API are provided at the upper right. If there is a report that there are submissions in this dataset from ClinVar, you can see the data in the download, or click on Yes to see a popup like this summarizing the phenotype (in this case a diagnosis) and links to databases within NCBI and external to NCBI.

Accessing ClinVar data by ftp There is a link from ClinVar to vcf files for the subset of data in ClinVar that is linked to data in dbSNP. There is a backlog of data in ClinVar that has not yet been mapped to nucleotide sequence coordinates; these will not be reported in the vcf file.

vcf files www.ncbi.nlm.nih.gov/variation/docs/human_variation_vcf Common alleles with no known clinical significance Common alleles with clinical significance Variations submitted from groups with clinical interest (clinical channel) ClinVar subset updated weekly The path at the top of the slide is the source of documentation describing the vcf files. These files are designed to help filter out common variation that is not known to be associated with a phenotype (common_no_known_medical_impact) as well as provide a report of the variation in ClinVar, pathogenic or not, that has been assigned rs numbers (clinvar).

Accessing ClinVar data by ftp XML Monthly extraction of the database Comprehensive weekly updates

Accessing ClinVar by browsing ClinVar is currently searchable by multiple fields at the preview site. http://preview.ncbi.nlm.nih.gov/clinvar

Data from SCRP Query by submitter 902 results A sample query for data from one submitter, in this case the ‘sharing clinical report project’ SCRP. Some of these data overlap with data in ClinVar based on allelic variations in OMIM. In this case, the interpretation in the RCV accession is reported as conflicting, because according to the rules provided by OMIM, their records are treated as ‘risk factors’ not ‘pathogenic’ It is this type of conflict in reporting that needs review.

Data only from SCRP An expanded result showing the summary data that are included in the result set. Plans are underway to report this as a table, and to add the latest date the significance was reviewed.

Current detailed display The current record-specific overview includes a genome view, indicating the location of the variation, and summary data about each submission. An ‘Evidence’ tab will be added shortly, summarizing the observations from all submitters. A submission-specific full display will also be provided, based on clicking on the SCV accession (e.g. in this case SCV00000035180) Overview information provided in a tabbed format

We thank our submitters! Testing laboratories Number of variations Curated /Published Sources Number of Genes Ambry 10 ClinSeq 36 ARUP 1328 GeneReviews 238 Correlagen/LabCorp 955 LSDB 83 Dana Farber 382 NCBI (partial) 42 ISCA (nstd37) 1167 OMIM 2061 Partners Health Care/Harvard 590 UniProtKB 27 SCRP (BRCA1/BRCA2) 902

Acknowledgements ClinVar/GTR staff dbSNP/dbVar Alex Astashyn Shanmuga Chitipiralla Vichet Hem Douglas Hoffman Wonhee Jang Brandi Kattmann Ken Katz Melissa Landrum Jennifer Lee Adriana Malheiro Michael Ovetsky George Riley Wendy Rubinstein Ray Tully Ricardo Villamarin George Zhou Deanna Church John Garner Tim Hefferon John Lopez Rama Maiti Lon Phan David Shao Ming Ward Submitters Ambry ARUP BIC ClinSeq Correlagen/LabCorp Partners Health Care GeneReviews/LSDB/UniProtKB

Try it and give us your feedback preview.ncbi.nlm.nih.gov/clinvar/ www.ncbi.nlm.nih.gov/clinvar/ www.ncbi.nlm.nih.gov/variation/tools/reporter/ www.ncbi.nlm.nih.gov/medgen/