How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November 2004


How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November 2004

Schedule
  Today
    Introduction to the Ensembl system
    Hands-on examples to introduce the system
    Evaluating genes and transcripts
    Variation in Ensembl (SNPs, haplotypes)
  Tomorrow
    Data mining with EnsMart
    Comparative genomics and proteomics in Ensembl
    BioMart
    Advanced topics (Upload your own data, DAS)

Our goal

Assembly
  From 325,109 initial contigs
  Other ordering data
  To 26,720 overlapping clones
  Non-redundant, “virtual contig” view

Mapping and Sequencing the human genome
  BACs (bacterial artificial chromosomes): avg size 150 kb
  pUCs: avg size 2-4 kb
  Diagram labels: map, fragment BACs, fragment pUCs, WGS, draft sequence, finished BAC, assembly
  References on slide: Shizuya et al 1992; Dib et al 1996; Deloukas et al 1998; Bentley et al 2001; Bruls et al 2001; McPherson et al 2001; Montgomery et al 2001; Osoegawa et al 2001; Tilford et al 2001

Status of the human sequence
  Finished (red/orange): ~96% (99.999% accurate)
  Heterochromatin (grey): ~4%
  30-40% repetitive elements (e.g. alpha satellite, Alu repeats)
  All known genes correctly identified (99.74%)
  Assembled draft sequence totals 2.85 Gb
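To put the percentages on this slide into absolute terms, here is a rough back-of-envelope sketch. It assumes (this is not stated on the slide) that the 99.999% accuracy figure and the 30-40% repeat content can be applied uniformly to the full 2.85 Gb total; the function names are made up for illustration.

```python
# Back-of-envelope arithmetic using only the figures quoted on the slide.
# Assumption: the quoted accuracy and repeat content apply uniformly to
# the whole 2.85 Gb assembly, which the slide does not state explicitly.

ASSEMBLED_BASES = 2_850_000_000   # 2.85 Gb
BASES_PER_ERROR = 100_000         # 99.999% accurate = 1 error per 100,000 bases


def expected_errors(total_bases, bases_per_error=BASES_PER_ERROR):
    """Expected number of erroneous bases at the given error rate."""
    return total_bases // bases_per_error


def percent_range_in_bases(total_bases, low_percent, high_percent):
    """Convert a percentage range into a (low, high) range of bases."""
    return (total_bases * low_percent // 100,
            total_bases * high_percent // 100)


if __name__ == "__main__":
    errors = expected_errors(ASSEMBLED_BASES)
    low, high = percent_range_in_bases(ASSEMBLED_BASES, 30, 40)
    print(f"Expected erroneous bases: about {errors:,}")
    print(f"Repetitive sequence: about {low / 1e9:.2f} to {high / 1e9:.2f} Gb")
```

Under these assumptions, even a 99.999% accurate sequence would still carry tens of thousands of base errors genome-wide, and roughly a third of the assembly would be repeats.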

Human genome: Current status
  22,287 ‘gene loci’ defined, consisting of 19,599 protein-coding genes in the human genome and 2,188 additional DNA segments ‘predicted’ to be protein-coding genes
  – 1183 genes ‘were born’ in the last My
  – ~30 genes ‘died’ in a similar time period
  Finishing the euchromatic sequence of the human genome, Nature 431: (2004)

Ensembl - project aims
  funded to provide metazoan genomes to the world
  aims to provide the world’s best automated genome annotation
  a leading group for human and mouse analysis
  all software, data and results freely available

Ensembl - project background
  group split between EBI and Sanger
  mainly Wellcome Trust funded
  largest dedicated compute in biology in Europe
  developer community > 100 people, including companies

Ensembl – Open source
  Freely available
  Community development
  – >51 Ensembl installs worldwide
  – Both public and commercial, e.g. Gramene (CSHL), Fugu-sg (ICMB), Ciona-sg (Temasek)

Ensembl
  Diagram labels: Analysis DB, CPU, Final DB, Supporting Databases, SNP, Manual Annotation

Genome browsing: why present the whole genome?
  Explore what is in a chromosome region
  See features in and around a specific gene
  Search & retrieve across the whole genome
  Investigate genome organization
  Compare to other genomes

Genome browsers
  Ensembl – public site + installable system
  NCBI Map Viewer
  UCSC Human Genome Browser

Introduction to the Ensembl web site
  Ensembl …
  … takes genomic sequence assemblies: human build 34, mouse, rat, Fugu, mosquito
  … adds annotation and links (automated process)
  … presents all the data on a web site

Annotation: genes
  Known genes: where? genomic structure? transcript(s)? protein(s)? orthologues? attach useful links
  Novel genes: how to predict? require evidence; transcript(s)? protein(s)? orthologues? attach useful links

Annotation: other features
  markers and SNPs
  cytogenetic bands
  repeated sequences
  ESTs & other sequence records: where do they show sequence similarity?
  regions homologous to other species

How to get started …
  Species homepage
  Site map
  Map View
  Text search
  BLAST
  SSAHA
  Disease View

Homepage

Site map

MapView AnchorView

BLAST and SSAHA

Regions, maps and markers
  MarkerView
  SNPView
  ContigView
  CytoView
  SyntenyView
  MultiContigView

Ensembl ContigView

ContigView close-up
  Evidence
  Transcripts: red & black (Ensembl predictions), blue (Vega)
  Customising & short cuts
  Pop-up menu

ContigView - Chromosome 20 close-up
  Manual annotation via Vega
  Ensembl predictions
  Ensembl EST-based predictions
  Forward strand
  Reverse strand
  Other chromosomes with manual annotation from: 6, 7, 9, 10, 13, 14, 20, 22, X

CytoView

GeneSNP View

MarkerView SNPView

Synteny View

MultiContig View

Genes & gene products
  GeneView
  TransView
  ExonView
  ProteinView
  FamilyView
  DomainView
  GOView
  DiseaseView

Ensembl GeneView

TransView ExonView

Protein View

Family View

GOView

DiseaseView

Data retrieval
  EnsMart
  Data sets on ftp site
  MySQL queries of databases
  Perl API access to databases
  Export View
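As an illustration of the "MySQL queries of databases" route, the sketch below shows the kind of region query one might run: list the genes overlapping a chromosome interval. It is a toy stand-in, not the real database. It uses Python's built-in sqlite3 instead of MySQL, the table and column names (gene, seq_region, stable_id, seq_region_start and so on) are assumptions loosely modelled on the Ensembl core schema and may not match any particular release, and the gene identifiers and coordinates are invented.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with a tiny, invented gene table.

    Table and column names are assumptions modelled on the Ensembl core
    schema; the rows are made-up data for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE seq_region (
            seq_region_id INTEGER PRIMARY KEY,
            name          TEXT
        );
        CREATE TABLE gene (
            gene_id           INTEGER PRIMARY KEY,
            stable_id         TEXT,
            seq_region_id     INTEGER,
            seq_region_start  INTEGER,
            seq_region_end    INTEGER,
            seq_region_strand INTEGER
        );
        INSERT INTO seq_region VALUES (1, '20'), (2, 'X');
        INSERT INTO gene VALUES
            (1, 'TOYG0001', 1,  1000,  5000,  1),
            (2, 'TOYG0002', 1,  4500,  9000, -1),
            (3, 'TOYG0003', 1, 20000, 25000,  1),
            (4, 'TOYG0004', 2,  1000,  5000,  1);
    """)
    return conn


def genes_in_region(conn, chrom, start, end):
    """Return genes on `chrom` that overlap the closed interval [start, end].

    Two intervals overlap when each one starts before the other ends.
    """
    sql = """
        SELECT g.stable_id, g.seq_region_start, g.seq_region_end,
               g.seq_region_strand
        FROM gene AS g
        JOIN seq_region AS sr ON sr.seq_region_id = g.seq_region_id
        WHERE sr.name = ?
          AND g.seq_region_start <= ?
          AND g.seq_region_end   >= ?
        ORDER BY g.seq_region_start
    """
    return conn.execute(sql, (chrom, end, start)).fetchall()


if __name__ == "__main__":
    conn = build_toy_db()
    for row in genes_in_region(conn, "20", 4000, 10000):
        print(row)
```

Against a real MySQL server the same SELECT would be sent through a MySQL client library, which typically uses `%s` rather than `?` as the placeholder. For anything beyond simple lookups, the Perl API or EnsMart listed on the slide is probably the more robust choice, since raw SQL ties the query to one schema version.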

EnsMart

Mouse differences
  Genomic sequence assembly based on whole genome shotgun, with finished ‘stitched’ BACs
  BACs are shown in CytoView (FPC map), but for most no sequence is available

Mouse CytoView

Help!
  context sensitive help pages - click
  access other documentation via generic home page
  the helpdesk
  HelpDesk / Suggestions

Thanks Ensembl Team

Ensembl Team November 2004
  Database Schema and Core API: Arne Stabenau, Yuan Chen, Ian Longden, Craig Melsopp, Glenn Proctor, Daniel Ríos, Guy Slater
  Distributed Annotation System: Andreas Kähäri
  Project Leader: Ewan Birney (EBI), Tim Hubbard (Sanger)
  Ensembl Web Team: James Stalker, Fiona Cunningham, James Smith
  Vega Web Team: Patrick Meidl, Steve Trevianon
  Analysis and Annotation Pipeline: Val Curwen, Steve Searle, Dan Andrews, Mario Caccamo, Laura Clarke, Martin Hammond, Jan Hinnerck-Vogel, Kevin Howe, Vivek Iyer, Kerstin Jekosch, Felix Kokocinski, Simon White
  User Support: Xosé Mª Fernández, Michael Schuster
  Comparative Genomics: Abel Ureta-Vidal, Javier Herrero Sánchez, Jessica Severin, Cara Woodwark
  EnsMart & BioMart: Arek Kasprzyk, Damian Keefe, Darin London, Damian Smedley