Data Mining in Ensembl with EnsMart

2 of 24 Possible queries…
- All genes from a candidate region
- Genes with a particular protein domain
- Members of a protein family
- Genes associated with SNPs

3 of 24 Specific queries
- Disease-related genes between markers D10S255 and D10S259
- Transmembrane proteins with an Ig-MHC domain (IPR003006) on chromosome 2
- Genes with associated coding SNPs on chromosomal band 5q35.3
- Mouse homologues for human disease genes
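A region query such as "genes between two markers" comes down to an interval-overlap test. The following is a minimal sketch of that idea only, not EnsMart's implementation: the `Feature` type, the `genes_between` helper, the gene names, and every coordinate are hypothetical and chosen for illustration.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    chromosome: str
    start: int
    end: int


# Hypothetical coordinates, chosen only for illustration.
MARKERS = {
    "D10S255": Feature("D10S255", "10", 1_000, 1_100),
    "D10S259": Feature("D10S259", "10", 9_000, 9_100),
}
GENES = [
    Feature("geneA", "10", 500, 900),      # ends before the region
    Feature("geneB", "10", 1_050, 2_000),  # overlaps the left edge
    Feature("geneC", "10", 4_000, 5_000),  # fully inside the region
    Feature("geneD", "2", 4_000, 5_000),   # on another chromosome
]


def genes_between(marker_a, marker_b, genes):
    """Return names of genes overlapping the interval spanned by two markers."""
    if marker_a.chromosome != marker_b.chromosome:
        raise ValueError("markers must lie on the same chromosome")
    region_start = min(marker_a.start, marker_b.start)
    region_end = max(marker_a.end, marker_b.end)
    return [
        g.name
        for g in genes
        if g.chromosome == marker_a.chromosome
        and g.start <= region_end
        and g.end >= region_start
    ]


print(genes_between(MARKERS["D10S255"], MARKERS["D10S259"], GENES))
```

A disease-association or SNP condition would be one more predicate applied to the genes that pass the region test.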

4 of 24 More specific queries
- Human genes with upstream regions conserved with respect to mouse
- Upstream sequence for all Ensembl genes mapped to the U95A chip (similarly, complete genomic annotation of MG_U74)
- Genomic location and description of all mouse, rat and fugu homologues of all human genes that have transmembrane domains, are expressed in the cardiovascular system and have non-synonymous SNPs

5 of 24 EnsMart – vertical and horizontal data integration
- Ensembl Genes, EST Genes, Vega Genes, SNPs
- Zebrafish, Human, Mouse, Anopheles, Fugu, Rat

6 of 24 Ensembl data sets
- Genes
- EST
- Markers
- Diseases
- Protein Annotation
- SNPs
- Homology
- Expression

7 of 24 EnsMart
- Data retrieval tool
- Query builder interface
- Gene or SNP lists
- Associated features or sequences
- Various output formats

8 of 24 Information flow (start, filter, output)
SPECIES FOCUS
REGION SNP PROTEIN HOMOLOGY GENE EXPRESSION
REFSEQ INTERPRO GO SWISSPROT EMBL AFFY
REGION SNP PROTEIN HOMOLOGY GENE EXPRESSION
FASTA FILE EXCEL TEXT GTF HTML
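The start, filter, output flow can be pictured as three small steps over a set of records. This is a toy sketch under stated assumptions, not the EnsMart interface: the `GENES` records, their field names, and the `run_query` and `to_text` helpers are hypothetical, and only the InterPro accession IPR003006 comes from the slides.

```python
import csv
import io

# Start: pick a dataset. These toy records and field names are hypothetical.
GENES = [
    {"gene": "geneA", "chromosome": "2", "start": 1000, "interpro": "IPR003006"},
    {"gene": "geneB", "chromosome": "2", "start": 5000, "interpro": "IPR000001"},
    {"gene": "geneC", "chromosome": "5", "start": 2000, "interpro": "IPR003006"},
]


def run_query(records, filters, attributes):
    """Filter: keep records matching every filter, then project the attributes."""
    selected = [r for r in records if all(r[k] == v for k, v in filters.items())]
    return [{a: r[a] for a in attributes} for r in selected]


def to_text(rows, attributes):
    """Output: render rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=attributes, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


attributes = ["gene", "start"]
rows = run_query(GENES, {"chromosome": "2", "interpro": "IPR003006"}, attributes)
print(to_text(rows, attributes))
```

Keeping the output step separate from the filter step means the same result rows could be rendered in another format without re-running the query.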

9 of 24 Species and focus

10 of 24 Restrict your query

11 of 24 Restrict your query

12 of 24 Select output options

13 of 24 Select output options

14 of 24 Output formats HTML

15 of 24 Obtaining sequences

16 of 24 Ensembl core database
- Normalised
- Each data point stored only once
- Quick updates
- Minimal storage requirements
But:
- Many tables
- Many joins for complicated queries
- Slow for data mining questions

17 of 24 Mart database
- De-normalised
- Tables with ‘redundant’ information
- Query-optimised
- Fast and flexible
- Ideal for data mining
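The trade-off between the two layouts can be sketched with an in-memory SQLite database. The schema below is a hypothetical illustration, not the actual Ensembl core or mart schema: table names, column names, and rows are all assumptions. The same question needs two joins against the normalised tables and none against the wide table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalised layout: each fact is stored once, spread over several tables.
cur.executescript("""
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT, chromosome TEXT);
CREATE TABLE domain (domain_id INTEGER PRIMARY KEY, accession TEXT);
CREATE TABLE gene_domain (gene_id INTEGER, domain_id INTEGER);
INSERT INTO gene VALUES (1, 'geneA', '2'), (2, 'geneB', '5');
INSERT INTO domain VALUES (10, 'IPR003006');
INSERT INTO gene_domain VALUES (1, 10), (2, 10);
""")

# Answering the question requires joining three tables.
normalised = cur.execute("""
SELECT g.name
FROM gene g
JOIN gene_domain gd ON gd.gene_id = g.gene_id
JOIN domain d ON d.domain_id = gd.domain_id
WHERE g.chromosome = '2' AND d.accession = 'IPR003006'
""").fetchall()

# De-normalised layout: the joins are paid once, when the wide table is built.
cur.execute("""
CREATE TABLE gene_mart AS
SELECT g.name, g.chromosome, d.accession AS domain_accession
FROM gene g
JOIN gene_domain gd ON gd.gene_id = g.gene_id
JOIN domain d ON d.domain_id = gd.domain_id
""")

# The same question is now a single-table lookup.
denormalised = cur.execute(
    "SELECT name FROM gene_mart "
    "WHERE chromosome = '2' AND domain_accession = 'IPR003006'"
).fetchall()

print(normalised, denormalised)
conn.close()
```

The wide table repeats the domain accession on every matching row, which is the 'redundant' information the slide mentions, and it has to be rebuilt whenever the underlying tables change.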

18 of 24 Acknowledgements
Mart database: Arek Kasprzyk, Damian Keefe, Damian Smedley, Darin London, Craig Melsopp
User interface (MartView): Will Spooner
Data and general support: The entire Ensembl team