What is RefSeqGene?.

Slides:



Advertisements
Similar presentations
PubMed/How to Search, Display, Download & (module 4.1)
Advertisements

© Wiley Publishing All Rights Reserved. Using Nucleotide Sequence Databases.
On line (DNA and amino acid) Sequence Information Lecture 7.
Integrating dbSNP with P. falciparum genome resources.
Integration of Protein Family, Function, Structure Rich Links to >90 Databases Value-Added Reports for UniProtKB Proteins iProClass Protein Knowledgebase.
1 Welcome to the Protein Database Tutorial This tutorial will describe how to navigate the section of Gramene that provides collective information on proteins.
Peter Tsai, Bioinformatics Institute.  University of California, Santa Cruz (UCSC)  A rapid and reliable display of any requested portion of genomes.
Tutorial 7 Genome browser. Free, open source, on-line broswer for genomes Contains ~100 genomes, from nematodes to human. Many tools that can be used.
Sequence Analysis MUPGRET June workshops. Today What can you do with the sequence? What can you do with the ESTs? The case of SNP and Indel.
Copyright OpenHelix. No use or reproduction without express written consent1 Organization of genomic data… Genome backbone: base position number sequence.
Lecture 2.21 Retrieving Information: Using Entrez.
Biological Databases Notes adapted from lecture notes of Dr. Larry Hunter at the University of Colorado.
Biological Databases Chi-Cheng Lin, Ph.D. Associate Professor Department of Computer Science Winona State University – Rochester Center
How to access genomic information using Ensembl August 2005.
Genome Evolution: Duplication (Paralogs) & Degradation (Pseudogenes)
A Study of Cystic Fibrosis Using Web-Based Tools Anuradha Datta Murphy Graduate Student, Dept. of Molecular and Integrative Physiology, University of Illinois.
Arabidopsis Gene Project GK-12 April Workshop Karolyn Giang and Dr. Mulligan.
PubMed/How to Search, Display, Download & (module 4.1)
NGS Analysis Using Galaxy
On line (DNA and amino acid) Sequence Information
The Genome Genome Browser Training Materials developed by: Warren C. Lathe, Ph.D. and Mary Mangan, Ph.D. Part 1.
What is comparative genomics? Analyzing & comparing genetic material from different species to study evolution, gene function, and inherited disease Understand.
1 The Genome Browser allows you to –Browse the Rice-Japonica, Maize and Arabidopsis genomes. –View the location of a particular feature on the rice genome.
How I learned to quit worrying Deanna M. Church Staff Scientist, Short Course in Medical Genetics 2013 And love multiple coordinate.
PhenCode Linking Human Mutations to Phenotype. PhenCode Brings the deep information on genotypes and phenotypes in locus specific databases (LSDBs) into.
Searching PubMed® NCBI, NLM Resources, Micromedex -GSBS TTUHSC Preston Smith Library presents Rev. 08/17/14.
GENOME-CENTRIC DATABASES Daniel Svozil. NCBI Gene Search for DUT gene in human.
SAGExplore web server tutorial for Module II: Genome Mapping.
Locus Reference Genomic (LRG) Sequences Raymond Dalgleish Department of Genetics University of Leicester.
is accessible at: The following pages are a schematic representation of how to navigate through ALE-HSA21.
Use cases for Tools at the Bovine Genome Database Apollo and Bovine QTL viewer.
Copyright OpenHelix. No use or reproduction without express written consent1.
UCSC Genome Browser 1. The Progress 2 Database and Tool Explosion : 230 databases and tools 1996 : first annual compilation of databases and tools.
NCBI’s Genome Annotation: Overview Incremental processing Re-annotation ( batch ) Post-annotation review Case studies NOTE: limiting discussion to annotation.
DONNA MAGLOTT, PH.D. PRO AND MEDICAL GENETICS RESOURCES AT NCBI.
COURSE OF BIOINFORMATICS Exam_31/01/2014 A.
Copyright OpenHelix. No use or reproduction without express written consent1.
Online Mendelian Inheritance in Man (OMIM): What it is & What it can do for you Knowledge Management & Eskind Biomedical Library January 27, 2012 helen.
Browsing the Genome Using Genome Browsers to Visualize and Mine Data.
SAGExplore web server tutorial for Module I: Genome Explore.
Web Databases for Drosophila Introduction to FlyBase and Ensembl Database Wilson Leung6/06.
Sackler Medical School
Biological databases Exercises. Discovery of distinct sequence databases using ensembl.
Basic Local Alignment Search Tool BLAST Why Use BLAST?
This tutorial will describe how to navigate the section of Gramene that provides descriptions of alleles associated with morphological, developmental,
GVS: Genome Variation Server Materials prepared by: Warren C. Lathe, PhD Updated: Q Version 2.
Annotation of Drosophila virilis Chris Shaffer GEP workshop, 2006.
Copyright OpenHelix. No use or reproduction without express written consent1.
Copyright OpenHelix. No use or reproduction without express written consent1.
Copyright OpenHelix. No use or reproduction without express written consent1.
SAGExplore web server tutorial. The SAGExplore server has three different modules …
UCSC Genome Browser Zeevik Melamed & Dror Hollander Gil Ast Lab Sackler Medical School.
Copyright OpenHelix. No use or reproduction without express written consent1.
Tools in Bioinformatics Genome Browsers. Retrieving genomic information Previous lesson(s): annotation-based perspective of search/data Today: genomic-based.
Genomes at NCBI. Database and Tool Explosion : 230 databases and tools 1996 : first annual compilation of databases and tools lists 57 databases.
Welcome to the combined BLAST and Genome Browser Tutorial.
Summer Bioinformatics Workshop 2008 BLAST Chi-Cheng Lin, Ph.D., Professor Department of Computer Science Winona State University – Rochester Center
COURSE OF BIOINFORMATICS Exam_30/01/2014 A.
Introduction to Genes and Genomes with Ensembl
ClinVar A system for maintaining medically relevant variation data
Role of Genetic Databases in Risk Assessment and Resources for Scientists, Clinicians, Genetic Counselors, and Patients Donna Maglott, Ph.D. Senior Staff.
The NCBI Annotation Pipeline
Figure 1. Number of CCDS IDs and genes represented in the human (A) and mouse (B) CCDS releases. The X-axis indicates the year in which a CCDS dataset.
gene-CENTRIC database
Searching the NCBI Databases
BIOL 3020 GENOMIC BIOLOGY LABORATORY 05A
TAMU Bovine QTL db and viewer
Basic Local Alignment Search Tool
Gene Safari (Biological Databases)
Welcome - webinar instructions
Presentation transcript:

What is RefSeqGene?

Introductions RefSeqGene http://www.ncbi.nlm.nih.gov/refseq/rsg Donna Maglott, Ph.D. Worked on NCBI RefSeq since 1998 Worked on NCBI Gene (via LocusLink) and RefSeq since 1998 Worked on RefSeqGene since 2007 (thank you Dr. Gulley) Increasing emphasis on resources for medical genetics (including PheGenI, ClinVar, and the Genetic Testing Registry, GTR) RefSeqGene http://www.ncbi.nlm.nih.gov/refseq/rsg http://www.ncbi.nlm.nih.gov/nuccore/?term=refseqgene

Outline What is RefSeqGene? How can I find a RefSeqGene sequence of interest? How does a RefSeqGene record help me? What tools are available? Using RefSeqGene in a sample workflow  

What is RefSeqGene? Gene-specific genomic sequence Starting about 5 kb upstream and ending about 2 kb downstream of a gene Coordinates do not change when the genome is re-assembled Snippet of a display of http://www.ncbi.nlm.nih.gov/nuccore/NG_008720.1?report=graph

What is RefSeqGene?: A versioned sequence Each sequence is assigned an accession (NG_008720) and a version (1) . If the sequence changes in any way, the version is incremented (2, 3, …), If text changes (e.g. a citation is added), the version does NOT change. Thus, to report a sequence explicitly, the accession AND version must be provided. NG_008720.1

What is RefSeqGene? Both sequence and annotation Locations of the gene the exons the coding region exons represented in other RefSeqs Variation: All Subset from clinical sources Subset associated with publication

Zooming in to view the sequence and annotation… At a location, set a marker or mouse over a box to see more details such as HGVS expressions, identifiers in dbSNP, links to more tools

What is RefSeqGene? Collection of curated gene-specific genomic sequences requested from, or with feedback, from Locus-Specific Databases (LSDB) May differ from current reference genome to represent the allele accepted as the reference May include sequence not currently represented in the genome May retain multiple differences from the genome if an accepted standard Partner in Locus Reference Genome (LRG) collaboration Exact nucleotide-nucleotide match between a version of a RefSeqGene and an LRG accession Exact nucleotide-nucleotide match between RefSeq cDNAs (NM_) and transcripts (t) annotated on the LRG Exact amino acid match between RefSeq proteins (NP_) and proteins (p) annotated on the LRG http://lrg-sequence.org

A RefSeqGene that is also an LRG NG_0011906.1 LRG_8 NM_006920.4 t1 NP_008851.3 p1

Outline What is RefSeqGene? How can I find a RefSeqGene sequence of interest? How does a RefSeqGene record help me? What tools are available? Using RefSeqGene in a sample workflow  

Query NCBI refseqgene (no spaces)

Query Gene or Nucleotide by refseqgene AND genesymbol (e. g Query Gene or Nucleotide by refseqgene AND genesymbol (e.g. HFE AND refseqgene) In Gene (only a subsection displayed here) In Genomic regions, transcripts, and products, select RefSeqGene In the Links section, follow the link RefSeqGene to Nucleotide

From the RefSeqGene site: http://www.ncbi.nlm.nih.gov/refseq/rsg

RefSeqGene: Browse, with an option to apply filters Displays gene symbol, full name, GeneID, LRG accession, RefSeqGene accession and version, MIM numbers, and associated disorders, with links

RefSeqGenes as standards in LSDB

Outline What is RefSeqGene? How can I find a RefSeqGene sequence of interest? How does a RefSeqGene record help me? What tools are available? Using RefSeqGene in a sample workflow

How does a RefSeqGene record help me? Provides standard genomic sequence for reporting variation Because genomic, reporting of location is not affected by splice variants or identification of the start codon Reporting of location is independent of re-assembling of the human genome Used within NCBI to anchor information about common and rare variation Provides links to records in LSDB Provides links to the literature via GeneReviews, OMIM, PubMed Provides links to variation currently being tested Reported in tools, such as BLAST, Clinical Remap, and Variation Reporter

How does a RefSeqGene record help me? May be referenced in practice guidelines Excerpt from recent correspondence requesting an LRG accession based on a RefSeqGene record: …We are now revising the best practice guidelines, and would ideally like to be able to reference an LRG for the … locus…

HGVS expressions using RefSeqGene (cut from variation pages in dbSNP) Within a coding region CFTR Intronic SLC46A1 Flanking and intronic SEPT9

Outline What is RefSeqGene? How can I find a RefSeqGene sequence of interest? How does a RefSeqGene record help me? What tools are available? Using RefSeqGene in a sample workflow  

RefSeqGene BLAST Use BLAST to find the RefSeqGene matching your query and determine if any differences correspond to known variation Submit one or more sequences Display results on the graphical display Compare any differences in your sequences to annotated variation

Tools: Clinical Remap www.ncbi.nlm.nih.gov/genome/tools/remap Interconverts chromosome and RefSeqGene coordinates

http://www.ncbi.nlm.nih.gov/variation/tools/reporter/

Variation Viewer http://www.ncbi.nlm.nih.gov/sites/varvu?gene=CFTR Accessed from Gene dbSNP Annotation

Outline What is RefSeqGene? How can I find a RefSeqGene sequence of interest? How does a RefSeqGene record help me? What tools are available? Using RefSeqGene in a sample workflow  

Using RefSeqGene as your standard obtain a genomic sequence result, find the RefSeqGene standard, and compare http://blast.ncbi.nlm.nih.gov/blast/ Scroll down (Also found from RefSeqGene home page)

Submit obtained sequence REMINDERS: more than one sequence can be analyzed at a time Each should be submitted as FASTA, starting with the > symbol

Review the RefSeqGene sequences that were matched (in this case only one)

Review the alignments, and focus on areas where there are differences The alignments are shown in grey, with mismatches marked in different colors

Zoom in, review, identify HGVS expression and follow links to PubMed and variation records for more details

Summary RefSeqGene is a mature resource ~4500 records Part of international collaboration with Locus Reference Genomic/LRG Provides stable reference standard for reporting sequence variation Tightly coupled with RefSeq mRNA and protein sequences Used by public variation databases (dbSNP/dbVar) to report both common variation and rare variation Used to anchor reports of publications and interpretations of clinical significance Part of key tool set for analyzing variants a few at a time (BLAST, sequence viewer) or in large sets (Clinical Remap, Variation Reporter)

Acknowledgements RefSeqGene/LRG staff Ray Tully, Ph.D. Alex Astashyn Andrei Shkeda RefSeq and CCDs curators Staff of dbSNP/dbVar Domain experts

More information http://www.ncbi.nlm.nih.gov/refseq/rsg/ Provides links to RefSeqGene-related tools and documentation rsgene@ncbi.nlm.nih.gov Contact us http://www.youtube.com/ncbinlm Videos on use of our sequence tools and more… http://www.ncbi.nlm.nih.gov/clinvar Aggregating information about medically important human variation. Will be a distinct web resource later this year. http://www.ncbi.nlm.nih.gov/gap/PheGenI Association studies, including expression QTLs

Calculating HGVS expressions open graphical display of a genomic sequence, enter a location , feature, or sequence to which you want to zoom, and press enter (Search and go) http://www.ncbi.nlm.nih.gov/nuccore/238776813?report=graph

Click on the line of the report that indicates your search result, and your display will focus there

Right click on in the region of the display with the ruler, at a point of interest to you and click on Set New Marker at Position

When the marker is exactly where you want it (point or range), right click and select Marker Details A display like this will appear and allow you to read the location in mRNA coordinates (r.), from the AUG (c.) and the protein (p.)