Data retrieval: BioMart; data sets on the FTP site; MySQL queries of databases; Perl API access to databases; Export View.
Data Mining in Ensembl with EnsMart August 2005
Possible queries: all genes from a candidate region; genes with a particular protein domain; members of a protein family; genes associated with SNPs.
More specific queries: human genes with upstream regions conserved with respect to mouse; upstream sequence for all Ensembl genes mapped to the U95A chip (similarly, complete genomic annotation of MG_U74); genomic location and description of all mouse, rat and fugu homologues of all human genes that have transmembrane domains, are expressed in the cardiovascular system and have non-synonymous SNPs.
Ensembl core database: normalised; each data point stored only once; quick updates; minimal storage requirements. But: many tables; many joins for complicated queries; slow for data mining questions.
BioMart and EnsMart: large-scale data retrieval tool; query builder interface; databases: Ensembl, SNP, Vega, (MSD, UniProt); associated features or sequences; flexible output formats.
Mart database: de-normalised; tables with 'redundant' information; query-optimised; fast and flexible, designed for data mining.
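The trade-off between the normalised core schema and a de-normalised mart table can be sketched with a toy example. The sketch below uses Python's built-in sqlite3 module rather than MySQL, and every table name, column name and row in it is hypothetical; it is not the real Ensembl or EnsMart schema. It only illustrates that the same question needs two joins against the normalised tables but none against a pre-joined, 'redundant' table.

```python
import sqlite3

def build_demo(conn):
    """Create toy normalised tables plus one de-normalised table (all names hypothetical)."""
    conn.executescript("""
        -- Normalised layout: each fact is stored once and linked by ids.
        CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT, chromosome TEXT);
        CREATE TABLE transcript (transcript_id INTEGER PRIMARY KEY, gene_id INTEGER);
        CREATE TABLE protein_domain (transcript_id INTEGER, domain TEXT);
        INSERT INTO gene VALUES (1, 'GENE_A', '1'), (2, 'GENE_B', '2');
        INSERT INTO transcript VALUES (10, 1), (20, 2);
        INSERT INTO protein_domain VALUES (10, 'kinase'), (20, 'zinc_finger');

        -- De-normalised layout: the joins are done once, in advance,
        -- and gene information is repeated on every row.
        CREATE TABLE gene_main AS
            SELECT g.gene_id, g.name, g.chromosome, d.domain
            FROM gene g
            JOIN transcript t ON t.gene_id = g.gene_id
            JOIN protein_domain d ON d.transcript_id = t.transcript_id;
    """)
    conn.commit()

# Query against the normalised tables: two joins.
NORMALISED_QUERY = """
    SELECT g.name
    FROM gene g
    JOIN transcript t ON t.gene_id = g.gene_id
    JOIN protein_domain d ON d.transcript_id = t.transcript_id
    WHERE d.domain = ?
    ORDER BY g.name
"""

# Same question against the de-normalised table: no joins.
MART_QUERY = "SELECT name FROM gene_main WHERE domain = ? ORDER BY name"

def genes_with_domain(conn, query, domain):
    """Return gene names that carry the given protein domain."""
    return [row[0] for row in conn.execute(query, (domain,))]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_demo(conn)
    print(genes_with_domain(conn, NORMALISED_QUERY, "kinase"))
    print(genes_with_domain(conn, MART_QUERY, "kinase"))
```

Both queries return the same genes; the de-normalised table pays for its simpler query with repeated data and a rebuild whenever the source tables change.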
Primary data sets: Ensembl genes; SNP (single nucleotide polymorphisms, deletion-insertion polymorphisms, short tandem repeats); Vega genes; (MSD protein structures); (UniProt proteomes).
Secondary data sets: markers; diseases; gene ontology; gene expression information; homology predictions; protein annotation.
Information flow: start → filter → output. [Diagram labels, in slide order: SPECIES, FOCUS, REGION, SNP, PROTEIN, HOMOLOGY, GENE, EXPRESSION, REFSEQ, INTERPRO, GO, SWISSPROT, EMBL, AFFY, REGION, SNP, PROTEIN, HOMOLOGY, GENE, EXPRESSION, FASTA, FILE, EXCEL, TEXT, GTF, HTML]
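The start, filter, output flow can be illustrated with a small, self-contained sketch. It does not call BioMart or EnsMart; the records, field names and the run_query function are hypothetical stand-ins, chosen only to show the three stages: pick a species (start), narrow the set with filters, then write the chosen attributes as tab-separated text (output).

```python
import csv
import io

# Hypothetical in-memory records standing in for a mart data set.
GENES = [
    {"gene": "GENE_A", "species": "human", "chromosome": "1", "start": 1000, "has_snp": True},
    {"gene": "GENE_B", "species": "human", "chromosome": "1", "start": 5000, "has_snp": False},
    {"gene": "GENE_C", "species": "human", "chromosome": "2", "start": 300, "has_snp": True},
    {"gene": "GENE_D", "species": "mouse", "chromosome": "1", "start": 700, "has_snp": True},
]

def run_query(records, species, filters, attributes):
    """Run the three stages and return tab-separated text with a header row."""
    # Start: choose the species to work on.
    selected = [r for r in records if r["species"] == species]
    # Filter: apply each predicate in turn; a record must pass all of them.
    for predicate in filters:
        selected = [r for r in selected if predicate(r)]
    # Output: write only the requested attributes, one record per line.
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(attributes)
    for record in selected:
        writer.writerow([record[name] for name in attributes])
    return buffer.getvalue()

if __name__ == "__main__":
    text = run_query(
        GENES,
        species="human",
        filters=[lambda r: r["chromosome"] == "1", lambda r: r["has_snp"]],
        attributes=["gene", "start"],
    )
    print(text, end="")
```

Keeping filters and attributes as separate arguments mirrors the separation in the diagram: what restricts the gene set is independent of what gets reported about it.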
BioMart
BioMart - Features
BioMart - Sequences
Output formats: HTML
What about queries that are not possible in EnsMart? Direct database access at ensembldb.ensembl.org and martdb.ebi.ac.uk with a MySQL client; download MySQL for Windows (file: wmysr11.zip).
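As a rough sketch of what connecting with a MySQL client involves, the helper below assembles a mysql command line for one of the hosts named above. It only builds the argument list and does not run it. The user name "anonymous" and port 3306 are assumptions that are not stated on the slide and should be checked against the current server documentation; mysql_command itself is a hypothetical helper, while -h, -P, -u and -e are standard mysql client options.

```python
import shlex

def mysql_command(host, user="anonymous", port=3306, sql=None):
    """Build the argument list for the mysql command-line client.

    The default user and port are assumptions, not values from the slides.
    """
    args = ["mysql", "-h", host, "-P", str(port), "-u", user]
    if sql is not None:
        # -e executes one statement and exits instead of opening a prompt.
        args += ["-e", sql]
    return args

if __name__ == "__main__":
    # Print a shell-safe command that lists the databases on each server.
    for host in ("ensembldb.ensembl.org", "martdb.ebi.ac.uk"):
        print(shlex.join(mysql_command(host, sql="SHOW DATABASES")))
```

The printed command can be pasted into a shell where the mysql client is installed; omitting the sql argument gives a command that opens an interactive session instead.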
Access via Perl object API: based on BioPerl; Ensembl modules. For an introduction, see the tutorial at:
There are other ways… MartShell: a command-line interface to Mart, written in Java. It works with a Mart Query Language.
MartExplorer