Introduction to Genes and Genomes with Ensembl

Large amounts of raw DNA sequence data
CGGCCTTTGGGCTCCGCCTTCAGCTCAAGACTTAACTTCCCTCCCAGCTGTCCCAGATGACGCCATCTGAAATTTCTTGGAAAC ...
[Slide: a full screen of unannotated genomic sequence in the same style.]

Making Sense out of Sequence …
• http://www.ensembl.org
• http://www.ncbi.nlm.nih.gov/mapview
• http://genome.ucsc.edu

The Ensembl genome browser: making it interesting
• Genes
• Variation
• Regulatory elements
The ENCODE (ENCyclopedia Of DNA Elements) project, Science 306: 636-640 (2004)

Vertebrate species on Ensembl
Mostly vertebrates

Non-vertebrates on Ensembl Genomes
• Fungi
• Bacteria
• Protists
• Metazoa
• Plants
www.ensemblgenomes.org

Ensembl and EnsemblGenomes

Ensembl gene models
• Automatic annotation
• Manual annotation

Automatic gene annotation
• Genome-wide determination using the Ensembl automated pipeline
• Predictions based on the genomic sequence (ab initio)
• Predictions based on experimental (biological) data:
    • ESTs
    • RNAseq data
    • cDNA and protein alignments (from sequence DBs)

Biological Evidence
• International Nucleotide Sequence databases
• Protein sequence databases
    • Swiss-Prot: manually curated
    • TrEMBL: unreviewed translations
• NCBI RefSeq: manually annotated proteins and mRNAs (NP, NM)

Manual gene annotation
• Gene determination on a case-by-case basis by a curator

Ensembl automatic annotation

Automatic annotation
• Many species (>60)
• Genome-wide at once
Manual annotation
• Few species (Hs, Mm, Dr)
• Gene-by-gene

Golden transcripts
• Identical annotation
• Higher confidence and quality
[Figure: transcript diagram labelling the 5’ UTR, 3’ UTR, exons and introns.]
Exons are drawn as boxes: filled boxes are coding and unfilled boxes are untranslated. Introns are drawn as lines.

CCDS transcripts
• Consensus coding DNA sequence set
• Agreement between EBI, WTSI, UCSC and NCBI
• http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi

Higher quality transcripts
• CCDS transcripts (protein-coding only)
• Ensembl/Havana merged transcripts
Both are available for only a limited number of species.

Ensembl stable IDs
• ENSG########### Ensembl Gene ID
• ENST########### Ensembl Transcript ID
• ENSP########### Ensembl Peptide ID
• ENSE########### Ensembl Exon ID
For non-human species a suffix is added:
• MUS (Mus musculus) for mouse: ENSMUSG###
• DAR (Danio rerio) for zebrafish: ENSDARG###
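The naming scheme on this slide can be checked mechanically. The sketch below is only an illustration: the function name `parse_stable_id`, the `StableId` tuple and the assumption that the numeric part is exactly 11 digits (as suggested by the eleven `#` characters on the slide) are not part of the slides. Only the two species codes shown on the slide are recognised, and version suffixes are not handled.

```python
import re
from typing import NamedTuple, Optional

# Feature letters listed on the slide.
FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}

# Species codes listed on the slide; an empty code means human.
SPECIES_CODES = {"": "human", "MUS": "mouse", "DAR": "zebrafish"}

# ENS + optional species code + feature letter + 11 digits (assumed width).
STABLE_ID = re.compile(r"^ENS([A-Z]*)([GTPE])(\d{11})$")


class StableId(NamedTuple):
    species: str
    feature: str
    number: int


def parse_stable_id(text: str) -> Optional[StableId]:
    """Split an Ensembl stable ID into species, feature type and number.

    Returns None when the text does not follow the pattern on the slide.
    """
    match = STABLE_ID.match(text.strip())
    if match is None:
        return None
    code, letter, digits = match.groups()
    # Species codes not shown on the slide are reported as unknown.
    species = SPECIES_CODES.get(code, f"unknown species code {code!r}")
    return StableId(species, FEATURE_TYPES[letter], int(digits))


if __name__ == "__main__":
    for example in ("ENSG00000139618", "ENSMUSG00000000001", "not-an-id"):
        print(example, "->", parse_stable_id(example))
```

The regular expression lets the species code be empty, so human IDs such as ENSG... and prefixed IDs such as ENSMUSG... go through the same code path.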

NCBI
http://www.youtube.com/ncbinlm
Go to www.youtube.com and search “NCBI tutorial general”

The National Center for Biotechnology Information
Bethesda, MD
Created in 1988 as a part of the National Library of Medicine at NIH
• Establish public databases
• Research in computational biology
• Develop software tools for sequence analysis
• Disseminate biomedical information

Three international nucleotide sequence databases

Selected NCBI Databases
Biomedical literature
• PubMed: free Medline
• PubMed Central: full text online access
• NCBI Bookshelf: online biomedical textbooks
Biomolecular databases
• Nucleotide
    • GenBank: submitted sequence records
    • RefSeq: curated NCBI reference sequences
• Protein: GenBank and RefSeq translations, outside protein
• dbSNP: small scale genetic variations
• Structure: biomolecular 3-D structures
    • MMDB: NCBI’s 3D structure database
• GEO: microarray expression data
• SRA: next-generation sequence data

GenBank & RefSeq

RefSeq: NCBI’s Derivative Sequence Database
• Experimentally verified / curated transcripts and proteins: NM_, NP_ accession numbers
• Model transcripts and proteins: XM_, XP_ accession numbers
• Assembled genomic regions (contigs): NT_, NW_ accession numbers
• Chromosome records: NC_, AC_ accession numbers
• RefSeqGene records: NG_ accession numbers (NG_ is also used for pseudogenes and other fixed genomic sequences)
• Draft whole genome shotgun assemblies (microbial): NZ_ accession numbers
• Microbial proteins: NP_, YP_, ZP_ accessions
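The prefix table on this slide lends itself to a simple lookup. The following sketch is illustrative only: the function name `classify_refseq` and the wording of the category labels are assumptions, and the mapping covers only the prefixes listed on the slide. Note that NP_ appears twice on the slide (curated proteins and microbial proteins), so the label mentions both.

```python
from typing import Optional

# Prefix -> record type, as listed on the slide.
REFSEQ_PREFIXES = {
    "NM": "curated transcript",
    "NP": "curated protein (also used for microbial proteins)",
    "XM": "model transcript",
    "XP": "model protein",
    "NT": "assembled genomic region (contig)",
    "NW": "assembled genomic region (contig)",
    "NC": "chromosome record",
    "AC": "chromosome record",
    "NG": "RefSeqGene record, pseudogene or other fixed genomic sequence",
    "NZ": "draft whole genome shotgun assembly (microbial)",
    "YP": "microbial protein",
    "ZP": "microbial protein",
}


def classify_refseq(accession: str) -> Optional[str]:
    """Return the record type for a RefSeq accession, or None if unrecognised.

    RefSeq accessions are distinguished by a two-letter prefix followed by
    an underscore; anything without that shape is treated as not RefSeq.
    """
    prefix, underscore, rest = accession.strip().partition("_")
    if not underscore or not rest:
        return None
    return REFSEQ_PREFIXES.get(prefix.upper())


if __name__ == "__main__":
    for example in ("NM_000001", "XP_000001.1", "U00001"):
        print(example, "->", classify_refseq(example))
```

The accession strings in the usage lines are made-up placeholders chosen only to exercise the prefixes; they are not meant to identify particular records.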

UCSC Genome Browser https://genome.ucsc.edu/

GeneCards http://www.genecards.org/