Sequence Tracking: Understanding your sequence context. Deanna M. Church, Staff Scientist. Short Course in Medical Genetics 2013.

Presentation transcript:

What’s in a name? Bob

What’s in a name?

What’s in a name? Bob, Miranda, Lydia, Samantha. Need more than a unique identifier; need to track updates/improvements.

chr1 Chr1 1 Chrom1
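The four labels above are different spellings of the same chromosome. A minimal sketch of how such labels might be reduced to one form before comparing data sets follows. The function name `normalize_chrom` and its prefix rules are assumptions made for illustration, not part of the presentation, and special cases such as mitochondrial naming are not handled.

```python
import re


def normalize_chrom(name: str) -> str:
    """Reduce common spellings of a chromosome label to a bare name such as '1' or 'X'."""
    # Strip a leading 'chrom' or 'chr' prefix, ignoring case.
    bare = re.sub(r"^(chrom|chr)", "", name.strip(), flags=re.IGNORECASE)
    # Upper-case so that 'x' and 'X' compare equal; digits are unaffected.
    return bare.upper()


labels = ["chr1", "Chr1", "1", "Chrom1"]
normalized = {label: normalize_chrom(label) for label in labels}
print(normalized)
```

A normalized label only removes spelling differences. It still does not say which assembly the chromosome belongs to, so it should be stored together with an assembly identifier.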

Mouse chrX: 34,800,000-34,890,000 NC_ CM

Mouse chrX: 35,000,000-36, X MGSCv3MGSCv36

GenBank Data Archives: Data in a common format. Data in a single location (and mirrored). Most quality checked prior to deposition. Robust data tracking mechanism (accession.version). Data owned by submitter.
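The accession.version mechanism mentioned above can be illustrated with a short sketch that splits an identifier into its stable accession and its integer version. The helper names and the identifiers `XX000000.1` and `XX000000.2` are made-up placeholders, not real records.

```python
from typing import NamedTuple


class AccessionVersion(NamedTuple):
    accession: str  # stable part, unchanged across updates
    version: int    # incremented when the sequence itself changes


def parse_accession_version(identifier: str) -> AccessionVersion:
    """Split 'ACCESSION.VERSION' at the last dot and validate both parts."""
    accession, sep, version = identifier.strip().rpartition(".")
    if not sep or not accession or not version.isdigit():
        raise ValueError(f"not an accession.version identifier: {identifier!r}")
    return AccessionVersion(accession, int(version))


def is_update_of(newer: str, older: str) -> bool:
    """True if both identifiers share an accession and 'newer' has a higher version."""
    new, old = parse_accession_version(newer), parse_accession_version(older)
    return new.accession == old.accession and new.version > old.version


# Placeholder identifiers, for illustration only.
print(parse_accession_version("XX000000.2"))
print(is_update_of("XX000000.2", "XX000000.1"))
```

Comparing the full identifier rather than the accession alone is what distinguishes one version of a sequence from the next.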

Data tracking: ABC J1. Columns: Gaps, Phase, Length, Date. Rows: FP Oct-2009; FP Oct-2010; FP Nov-2010.

Data Archives: Initial versions of human and mouse reference assemblies not in INSDC!!* First human version in INSDC: GRCh37. First mouse version in INSDC: NCBI36. (* But were tracked by RefSeq.)

Data Archives: INSDC archives track INDIVIDUAL sequences. An assembly is a COLLECTION of sequences.
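The distinction between tracking individual sequences and tracking a collection can be sketched as a small data structure. This is an illustrative model only: the `Assembly` class, the `changed_sequences` helper, and the identifiers such as `SEQ_A.1` are hypothetical and do not reflect how any archive stores assemblies.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Assembly:
    """An assembly modeled as a named collection of versioned sequence identifiers."""
    name: str
    sequences: Tuple[str, ...]  # accession.version of each member sequence


def changed_sequences(old: Assembly, new: Assembly) -> Set[str]:
    """Identifiers present in the new assembly but not in the old one."""
    return set(new.sequences) - set(old.sequences)


# Hypothetical assemblies: one member sequence was updated from version 1 to 2.
v1 = Assembly("ExampleAsm_v1", ("SEQ_A.1", "SEQ_B.1", "SEQ_C.1"))
v2 = Assembly("ExampleAsm_v2", ("SEQ_A.1", "SEQ_B.2", "SEQ_C.1"))
print(changed_sequences(v1, v2))
```

Tracking each sequence separately shows that `SEQ_B` changed, but only a record of the collection shows which set of versions belongs together as one assembly.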

More naming issues: hg19, GRCh37, mm8, MGSCv37, NCBIM37, danRer5, Zv7.

chr21:8,913,216-9,246,964 Zv7

Zv7 chr21:8,913,216-9,246,964 vs MGSCv36 chrX
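A coordinate string such as the one above only identifies a location when it is paired with an assembly. The sketch below parses a region string and keeps the assembly name as part of the result. The `Region` type and `parse_region` function are assumptions made for illustration, and the sketch does not take a position on whether coordinates are 0-based or 1-based.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    assembly: str
    chrom: str
    start: int
    end: int


# chrom:start-end, where start and end may contain thousands separators.
_REGION = re.compile(r"^(?P<chrom>[^:]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


def parse_region(assembly: str, text: str) -> Region:
    """Parse 'chrom:start-end' and attach the assembly it refers to."""
    match = _REGION.match(text.replace(" ", ""))
    if match is None:
        raise ValueError(f"unrecognized region: {text!r}")
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    if start > end:
        raise ValueError(f"start is after end in region: {text!r}")
    return Region(assembly, match["chrom"], start, end)


zv7 = parse_region("Zv7", "chr21:8,913,216-9,246,964")
print(zv7)
```

Because the assembly is part of the value, the same coordinate string parsed against two different assemblies yields two regions that do not compare equal.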

GRCh37 hg19

Genome Browser Agreement: Submitter deposits assembly to GenBank/EMBL/DDBJ. Assembly QA. Submitter updates assembly based on QA results. Browsers pick up assembly from GenBank/EMBL/DDBJ. Assemblies must be in GenBank/EMBL/DDBJ.

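GenBank vs RefSeq
Ownership: GenBank is submitter owned; RefSeq is RefSeq owned.
Redundancy: GenBank is redundant; RefSeq is non-redundant.
Updates: GenBank is updated rarely; RefSeq is curated.
Archive: GenBank is INSDC; RefSeq is not INSDC.
BRCA1 in GenBank: 83 genomic records, 31 mRNA records, 27 protein records.
BRCA1 in RefSeq: 3 genomic records, 5 mRNA records, 1 RNA record, 5 protein records.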

RefSeq for Assemblies, typical assembly edits: Addition of non-nuclear (e.g. MT) assembly units. Removal of contamination. Drop unlocalized/unplaced scaffolds. Mask contamination that is placed on a chromosome (while preserving coordinate space).
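Masking while preserving coordinate space can be illustrated with a toy example: the contaminated bases are replaced with `N` instead of being deleted, so every position outside the masked interval keeps its coordinate. The function `mask_interval` and the 0-based half-open interval convention are assumptions for this sketch, not a description of the RefSeq process.

```python
def mask_interval(sequence: str, start: int, end: int) -> str:
    """Replace bases in the 0-based half-open interval [start, end) with 'N'."""
    if not 0 <= start <= end <= len(sequence):
        raise ValueError("interval falls outside the sequence")
    return sequence[:start] + "N" * (end - start) + sequence[end:]


original = "ACGTACGTACGT"
masked = mask_interval(original, 4, 8)

print(masked)
# The length is unchanged, so downstream coordinates still point at the same bases.
print(len(masked) == len(original), masked[8:] == original[8:])
```

Deleting the interval instead would shift every downstream coordinate, which is why masking is the option that keeps existing annotations valid.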

Human assemblies in assembly database

Take home messages: Assemblies can (and do) update! Know what assembly you are working on. Track by accession.version, not just name. Data in INSDC databases are mirrored. RefSeq is NCBI specific.