Identifier mapping: where do I go? Q5S007 ENSG00000188906 ?

EMBL-EBI
Using identifiers/accessions
Identifiers allow for “unambiguous” identification of molecules and their representation in databases.
o In reality, an identifier reflects a conceptual entity that might represent one or more molecules.
   Example: a GeneID covers every variant and splicing alternative of a given gene, that is, multiple sequences.
o That leaves room for ambiguity.
o A large number of identifiers aim to represent the “same” entities.
   Example: alternative protein IDs (Ensembl protein vs UniProt).

Using identifiers: most commonly used accessions
o Entrez GeneIDs: gene-centered identifier; DNA consensus sequence, no isoforms or variants.
o UniProt: represents proteins, taking isoforms into account. Additional identifiers exist for variants and post-processed chains.
o RefSeq: represents sequences of DNA, RNA and proteins.
o Ensembl: identifiers that represent genes and their different products: gene, gene tree, protein, regulatory feature, transcript, exon and protein family.
o International Protein Index: proteomics reference database (protein sequences). Now obsolete, but still used in proteomics.
o HUGO gene symbols: unique symbols and names for human loci (protein-coding genes, RNA genes and pseudogenes).
o Organism-centered databases: TAIR, WormBase, SGD…

Mapping identifiers: common problems
gene ≠ transcript ≠ protein ≠ isoform ≠ clone
[Diagram relating gene, transcript, protein and isoform]
o It’s a model! Models change: identifiers (and sequences!) disappear and get updated.
o It’s “misused”! Example: gene identifiers are used to represent proteins.

Mapping identifiers: common problems
gene ≠ transcript ≠ protein ≠ isoform
Solution: know your databases!
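Because a gene is not a transcript, a protein or an isoform, a gene-level identifier can legitimately map to several protein-level ones. The following is a minimal sketch, not taken from the slides, of how such one-to-many mappings could be flagged in the output of any mapping tool. The function names and the identifiers GENE_A, PROT_A1 and so on are made-up placeholders, not real accessions.

```python
from collections import defaultdict


def group_mappings(pairs):
    """Group (source_id, target_id) pairs into source_id -> sorted targets."""
    grouped = defaultdict(set)
    for source_id, target_id in pairs:
        grouped[source_id].add(target_id)
    return {source: sorted(targets) for source, targets in grouped.items()}


def one_to_many(grouped):
    """Keep only the source identifiers that map to more than one target."""
    return {source: targets for source, targets in grouped.items()
            if len(targets) > 1}


# Hypothetical mapping output: placeholder identifiers, not real accessions.
pairs = [
    ("GENE_A", "PROT_A1"),
    ("GENE_A", "PROT_A2"),  # second product of the same gene
    ("GENE_B", "PROT_B1"),
]

grouped = group_mappings(pairs)
ambiguous = one_to_many(grouped)
print(ambiguous)  # {'GENE_A': ['PROT_A1', 'PROT_A2']}
```

Listing the ambiguous entries first makes it easier to decide, per database, whether several targets are expected (isoforms) or a sign of a misused identifier.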

Mapping identifiers: services
o UniProt ID mapping
o PICR
o MatchMiner
o Ensembl BioMart
o DAVID GeneID Conversion Tool
o CRONOS
o Clone/GeneID Converter
Non-exhaustive list!

Examples of use: UniProt ID mapping service

Examples of use: PICR

Hands-on: Translate into UniProt accessions
o Translate the identifiers from the files human_emsemblIDs.txt and human_entrezgeneIDs to UniProt accessions using different mapping tools.
o What differences can you observe between the services?
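One way to answer the last question is to load each service's output into a table and compare them identifier by identifier. The sketch below assumes each result has already been parsed into a dictionary from input identifier to a set of UniProt accessions. The function name, the service names and the entries ID_X, ID_Y, ACC_1 and ACC_2 are hypothetical; only the pair ENSG00000188906 / Q5S007 comes from the title slide.

```python
def compare_services(ids, results_by_service):
    """Label each input identifier as 'agree', 'differ' or 'unmapped'.

    results_by_service maps a service name to a dict of
    input identifier -> set of UniProt accessions returned by that service.
    """
    report = {}
    for identifier in ids:
        answers = {
            frozenset(mapping.get(identifier, ()))
            for mapping in results_by_service.values()
        }
        if answers == {frozenset()}:
            report[identifier] = "unmapped"   # no service returned anything
        elif len(answers) == 1:
            report[identifier] = "agree"      # every service gave the same set
        else:
            report[identifier] = "differ"     # at least two services disagree
    return report


# Toy input: ID_X, ID_Y, ACC_1 and ACC_2 are placeholders, not real IDs.
ids = ["ENSG00000188906", "ID_X", "ID_Y"]
results_by_service = {
    "service_a": {"ENSG00000188906": {"Q5S007"}, "ID_X": {"ACC_1"}},
    "service_b": {"ENSG00000188906": {"Q5S007"}, "ID_X": {"ACC_1", "ACC_2"}},
}

report = compare_services(ids, results_by_service)
for identifier, status in report.items():
    print(identifier, status)
```

Identifiers marked "differ" are the ones worth inspecting by hand, since the disagreement often reflects isoforms or different database release versions rather than an error.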

Hands-on: Translate into UniProt accessions
o Have a look at the file unknownidentifiers.txt. Can you recognize the different identifiers listed there?
o Try translating the identifiers using different mapping tools. Can you get the whole list translated?
o What differences can you observe between the services?
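Many identifier types can be recognized from their format alone. The sketch below is one possible first pass and is not part of the original exercise. The patterns are simplified assumptions: the UniProt expression follows the accession format documented by UniProt, the Ensembl pattern covers only gene, transcript, protein and exon IDs, and a purely numeric string is merely a candidate Entrez GeneID. A format match does not prove that the identifier exists in the database.

```python
import re

# (label, pattern) pairs, tried in order. Simplified, assumption-laden patterns.
PATTERNS = [
    ("UniProt accession", re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}")),
    # ENS + optional species prefix + feature letter + 11 digits (+ version)
    ("Ensembl ID", re.compile(r"ENS[A-Z]*[GTPE]\d{11}(?:\.\d+)?")),
    ("RefSeq accession", re.compile(
        r"(?:NM|NR|NP|XM|XR|XP|NC|NG|NT|NW)_\d+(?:\.\d+)?")),
    ("IPI identifier", re.compile(r"IPI\d{8}(?:\.\d+)?")),
    ("Entrez GeneID (candidate)", re.compile(r"\d+")),
]


def classify(identifier):
    """Return the label of the first pattern matching the whole identifier."""
    text = identifier.strip()
    for label, pattern in PATTERNS:
        if pattern.fullmatch(text):
            return label
    return "unknown"


# The first two IDs come from the title slide; the rest are format examples only.
for example in ["Q5S007", "ENSG00000188906", "NM_000000.1", "IPI00000000",
                "12345", "not-an-id"]:
    print(example, "->", classify(example))
```

Sorting the file by recognized type first makes it clear which mapping service, and which "from" database option, to try for each group of identifiers.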
