Data retrieval
- BioMart
- Data sets on ftp site
- MySQL queries of databases
- Perl API access to databases
- Export View

Data Mining in Ensembl with EnsMart August 2005

Possible queries…
- All genes from a candidate region
- Genes with a particular protein domain
- Members of a protein family
- Genes associated with SNPs

More specific queries
- Human genes with upstream regions conserved with respect to mouse
- Upstream sequences for all Ensembl genes mapped to the U95A chip (similarly, complete genomic annotation of MG_U74)
- Genomic location and description of all mouse, rat and fugu homologues of human genes that have transmembrane domains, are expressed in the cardiovascular system and have non-synonymous SNPs

Ensembl core database
Normalised:
- Each data point stored only once
- Quick updates
- Minimal storage requirements
But:
- Many tables
- Many joins for complicated queries
- Slow for data-mining questions
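To make the trade-off concrete, here is a toy normalised schema built with Python's standard sqlite3 module. The table names, columns and IDs are invented for illustration and are much simpler than the real Ensembl core schema; the point is only that each fact (a chromosome name, a domain accession) is stored once, so even a simple "genes in a region with a given domain" question needs several joins.

```python
import sqlite3

# Toy normalised schema (hypothetical tables, NOT the real Ensembl core schema):
# chromosome names and domain accessions are each stored exactly once.
SCHEMA = """
CREATE TABLE seq_region (seq_region_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, stable_id TEXT,
                   seq_region_id INTEGER REFERENCES seq_region(seq_region_id),
                   seq_start INTEGER, seq_end INTEGER);
CREATE TABLE domain (domain_id INTEGER PRIMARY KEY, accession TEXT);
CREATE TABLE gene_domain (gene_id INTEGER REFERENCES gene(gene_id),
                          domain_id INTEGER REFERENCES domain(domain_id));
"""


def build_normalised() -> sqlite3.Connection:
    """Create an in-memory database filled with made-up example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO seq_region VALUES (?, ?)", [(1, "1"), (2, "X")])
    conn.executemany("INSERT INTO gene VALUES (?, ?, ?, ?, ?)", [
        (1, "GENE_A", 1, 1000, 2000),
        (2, "GENE_B", 1, 3000, 4000),
        (3, "GENE_C", 2, 1000, 2000),
        (4, "GENE_D", 1, 9000, 9500),
    ])
    conn.executemany("INSERT INTO domain VALUES (?, ?)", [(1, "DOM1"), (2, "DOM2")])
    conn.executemany("INSERT INTO gene_domain VALUES (?, ?)",
                     [(1, 1), (2, 2), (3, 1), (4, 1)])
    return conn


def genes_with_domain_in_region(conn, chrom, start, end, accession):
    """Genes overlapping [start, end] on chrom that carry the given domain.

    Answering one question takes three joins in the normalised layout.
    """
    rows = conn.execute("""
        SELECT g.stable_id
        FROM gene g
        JOIN seq_region sr ON sr.seq_region_id = g.seq_region_id
        JOIN gene_domain gd ON gd.gene_id = g.gene_id
        JOIN domain d ON d.domain_id = gd.domain_id
        WHERE sr.name = ? AND g.seq_start <= ? AND g.seq_end >= ?
          AND d.accession = ?
        ORDER BY g.stable_id
    """, (chrom, end, start, accession))
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_normalised()
    print(genes_with_domain_in_region(db, "1", 500, 5000, "DOM1"))  # ['GENE_A']
```

The overlap test (`start <= query_end AND end >= query_start`) is the usual way to find features that touch a candidate region.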

BioMart and EnsMart
- Large-scale data retrieval tool
- Query builder interface
- Databases: Ensembl, SNP, Vega, (MSD, UniProt)
- Associated features or sequences
- Flexible output formats

Mart database
De-normalised:
- Tables with ‘redundant’ information
- Query-optimised
- Fast and flexible, designed for data mining
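For contrast with a normalised layout, the sketch below stores the same kind of toy data in a single wide table, again with Python's sqlite3. The table name `gene_main` and its columns are hypothetical (real mart tables are larger and differently named); the gene coordinates are deliberately repeated for every domain a gene carries, which is the "redundant" information that lets the region-plus-domain question run without any joins.

```python
import sqlite3

# Toy de-normalised "mart" table (hypothetical name and columns): one wide row
# per gene/domain pair, so a gene's location is repeated for each of its domains.
ROWS = [
    ("GENE_A", "1", 1000, 2000, "DOM1"),
    ("GENE_A", "1", 1000, 2000, "DOM3"),  # redundant copy of GENE_A's location
    ("GENE_B", "1", 3000, 4000, "DOM2"),
    ("GENE_C", "X", 1000, 2000, "DOM1"),
    ("GENE_D", "1", 9000, 9500, "DOM1"),
]


def build_mart() -> sqlite3.Connection:
    """Create an in-memory single-table mart filled with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gene_main (stable_id TEXT, chrom TEXT, "
        "seq_start INTEGER, seq_end INTEGER, domain TEXT)"
    )
    conn.executemany("INSERT INTO gene_main VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def genes_with_domain_in_region(conn, chrom, start, end, domain):
    """Same question as in a normalised schema, answered from one table."""
    rows = conn.execute(
        "SELECT DISTINCT stable_id FROM gene_main "
        "WHERE chrom = ? AND seq_start <= ? AND seq_end >= ? AND domain = ? "
        "ORDER BY stable_id",
        (chrom, end, start, domain),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    mart = build_mart()
    print(genes_with_domain_in_region(mart, "1", 500, 5000, "DOM1"))  # ['GENE_A']
```

The price of skipping joins is storage and update cost: changing GENE_A's coordinates means touching every row that repeats them.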

Primary data sets
- Ensembl genes
- SNP
  - Single nucleotide polymorphisms
  - Deletion-insertion polymorphisms
  - Short tandem repeats
- Vega genes
- (MSD protein structures)
- (UniProt proteomes)

Secondary data sets
- Markers
- Diseases
- Gene Ontology
- Gene expression information
- Homology predictions
- Protein annotation

Information flow (diagram): Start (species, focus) → Filter (region, SNP, protein, homology, gene expression, RefSeq, InterPro, GO, Swiss-Prot, EMBL, Affy) → Output (region, SNP, protein, homology, gene expression; FASTA, file, Excel, text, GTF, HTML)
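The start → filter → output flow maps onto the XML query documents used by later BioMart releases (MartService): a `Dataset` element fixes the species and focus, `Filter` elements restrict it, and `Attribute` elements plus a `formatter` choose the output. The sketch below builds such a document with Python's standard library. The element layout follows later BioMart versions rather than the 2005 EnsMart interface described here, and the dataset, filter and attribute names in the usage example are illustrations to check against the mart you actually query.

```python
import xml.etree.ElementTree as ET


def build_mart_query(dataset: str,
                     filters: dict[str, str],
                     attributes: list[str],
                     formatter: str = "TSV") -> str:
    """Return a BioMart-style XML query (layout assumed from later BioMart releases).

    start  -> the Dataset element (species / focus)
    filter -> one Filter element per name/value pair
    output -> Attribute elements plus the formatter (e.g. TSV, FASTA, HTML)
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": formatter,
        "header": "1",
        "uniqueRows": "1",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    # Example names only: verify them against the dataset's filter/attribute lists.
    print(build_mart_query(
        "hsapiens_gene_ensembl",
        {"chromosome_name": "1", "start": "1000000", "end": "2000000"},
        ["ensembl_gene_id", "external_gene_name"],
    ))
```

Keeping filters and attributes as plain data makes it easy to reuse one query template across species by changing only the dataset name.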

BioMart

BioMart - Features

BioMart - Sequences

Output formats: HTML

What about queries that are not possible in EnsMart?
- Direct database access at ensembldb.ensembl.org and martdb.ebi.ac.uk
- MySQL client: download MySQL for Windows (file: wmysr11.zip)
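Any MySQL client can be used for direct access, including a scripting language. The sketch below assumes the third-party PyMySQL package and anonymous, password-less login to ensembldb.ensembl.org; the port, the database name you pass in (for example a `homo_sapiens_core_*` schema) and the column names (written against a recent core schema, where stable IDs sit in the `gene` table; older releases stored them in separate tables) are all assumptions to verify against the current Ensembl documentation.

```python
from typing import Any

# Assumed settings for the public Ensembl MySQL server; check the port
# (and which releases it serves) in the current Ensembl documentation.
ENSEMBL_MYSQL = {
    "host": "ensembldb.ensembl.org",
    "user": "anonymous",  # anonymous, password-less access
    "port": 3306,         # assumption: MySQL default port
}

# Genes overlapping a chromosome interval. Column names assume a recent core
# schema; placeholders use PyMySQL's %s parameter style.
REGION_SQL = """
SELECT g.stable_id, g.seq_region_start, g.seq_region_end
FROM gene g
JOIN seq_region sr ON sr.seq_region_id = g.seq_region_id
JOIN coord_system cs ON cs.coord_system_id = sr.coord_system_id
WHERE cs.name = 'chromosome'
  AND sr.name = %s
  AND g.seq_region_start <= %s
  AND g.seq_region_end >= %s
"""


def region_params(chrom: str, start: int, end: int) -> tuple[str, int, int]:
    """Order parameters to match REGION_SQL's overlap test (start <= end, end >= start)."""
    if start > end:
        raise ValueError("start must not exceed end")
    return (chrom, end, start)


def genes_in_region(database: str, chrom: str, start: int, end: int) -> list[tuple[Any, ...]]:
    """Run REGION_SQL against `database`; needs PyMySQL and network access."""
    import pymysql  # imported lazily so the helpers above work without it

    conn = pymysql.connect(database=database, **ENSEMBL_MYSQL)
    try:
        with conn.cursor() as cur:
            cur.execute(REGION_SQL, region_params(chrom, start, end))
            return list(cur.fetchall())
    finally:
        conn.close()
```

Passing values as query parameters rather than formatting them into the SQL string keeps the query safe from quoting mistakes.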

Access via Perl object API
- Ensembl modules, based on BioPerl
- For an introduction, see the tutorial at:
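An object API hands back objects rather than raw rows: the Ensembl Perl API is organised around adaptor objects (for example, a gene adaptor's `fetch_by_stable_id` or a slice adaptor's `fetch_by_region`) that hide the underlying SQL. The Python sketch below mimics that adaptor idea only; the class and method names are hypothetical and the adaptor reads from an in-memory list instead of a database.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene object: one stored row turned into an object."""
    stable_id: str
    chrom: str
    start: int
    end: int

    def length(self) -> int:
        # Inclusive coordinates, so a gene from 1000 to 2000 is 1001 bp long.
        return self.end - self.start + 1


class GeneAdaptor:
    """Hypothetical adaptor: hides storage and returns Gene objects."""

    def __init__(self, rows: Iterable[tuple[str, str, int, int]]):
        # A real adaptor would query the database; here rows are kept in memory.
        self._genes = [Gene(*row) for row in rows]

    def fetch_by_stable_id(self, stable_id: str) -> Optional[Gene]:
        """Return the gene with this stable ID, or None if it is unknown."""
        return next((g for g in self._genes if g.stable_id == stable_id), None)

    def fetch_all_by_region(self, chrom: str, start: int, end: int) -> list[Gene]:
        """Genes overlapping [start, end] on chrom, ordered by start position."""
        hits = [g for g in self._genes
                if g.chrom == chrom and g.start <= end and g.end >= start]
        return sorted(hits, key=lambda g: g.start)


if __name__ == "__main__":
    adaptor = GeneAdaptor([("GENE_B", "1", 3000, 4000), ("GENE_A", "1", 1000, 2000)])
    for gene in adaptor.fetch_all_by_region("1", 500, 5000):
        print(gene.stable_id, gene.length())
```

Callers work with `Gene` objects and never see table layouts, which is what lets the storage schema change between releases without breaking client code.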

There are other ways…
- MartShell: a command-line interface to Mart, written in Java, that works with a Mart Query Language

MartExplorer