Data retrieval BioMart Data sets on ftp site MySQL queries of databases Perl API access to databases Export View.

1 Data retrieval
– BioMart
– Data sets on ftp site
– MySQL queries of databases
– Perl API access to databases
– Export View

3 Data Mining in Ensembl with EnsMart August 2005

4 Possible queries…
– All genes from a candidate region
– Genes with a particular protein domain
– Members of a protein family
– Genes associated with SNPs

5 More specific queries
– Human genes with upstream regions conserved with respect to mouse
– Upstream sequence for all Ensembl genes mapped to the U95A chip (similarly, complete genomic annotation of MG_U74)
– Genomic location and description of all mouse, rat and fugu homologues of all human genes that have transmembrane domains, are expressed in the cardiovascular system and have non-synonymous SNPs

6 Ensembl core database
– Normalised
– Each data point stored only once
– Quick updates
– Minimal storage requirements
But:
– Many tables
– Many joins for complicated queries
– Slow for data mining questions

7 BioMart and EnsMart
– Large-scale data retrieval tool
– Query builder interface
– Databases: Ensembl, SNP, Vega, (MSD, UniProt)
– Associated features or sequences
– Flexible output formats
http://www.ebi.ac.uk/biomart/
http://www.ensembl.org/EnsMart/

8 Mart database
– De-normalised
– Tables with ‘redundant’ information
– Query-optimised
– Fast and flexible
– Designed for data mining
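The contrast between the normalised core database and the de-normalised mart can be sketched with a toy example. The tables, columns and values below are invented for illustration and are not the real Ensembl or Mart schema; SQLite stands in for MySQL so the sketch runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalised layout (core-style): each fact is stored once, in its own table.
# All table names, column names and values are hypothetical.
cur.executescript("""
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT, chrom TEXT);
CREATE TABLE transcript (transcript_id INTEGER PRIMARY KEY, gene_id INTEGER);
CREATE TABLE protein_domain (transcript_id INTEGER, domain TEXT);
INSERT INTO gene VALUES (1, 'geneA', '1'), (2, 'geneB', '2');
INSERT INTO transcript VALUES (10, 1), (20, 2);
INSERT INTO protein_domain VALUES (10, 'kinase'), (20, 'zinc_finger');
""")

# A data mining question on the normalised layout needs one join per table.
normalised_sql = """
SELECT g.name FROM gene g
JOIN transcript t ON t.gene_id = g.gene_id
JOIN protein_domain d ON d.transcript_id = t.transcript_id
WHERE d.domain = 'kinase'
"""
normalised = [row[0] for row in cur.execute(normalised_sql)]

# De-normalised layout (mart-style): the joins are done once, up front,
# and the result is kept as a single wide table with repeated values.
cur.execute("""
CREATE TABLE gene_mart AS
SELECT g.gene_id, g.name, g.chrom, t.transcript_id, d.domain
FROM gene g
JOIN transcript t ON t.gene_id = g.gene_id
JOIN protein_domain d ON d.transcript_id = t.transcript_id
""")

# The same question is now a single-table filter.
denormalised = [
    row[0]
    for row in cur.execute("SELECT name FROM gene_mart WHERE domain = 'kinase'")
]

print(normalised, denormalised)
```

Both queries return the same genes; the mart-style table trades extra storage and slower updates for simpler, join-free queries.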

9 Primary Data Sets
– Ensembl genes
– SNP
  – Single nucleotide polymorphisms
  – Deletion-insertion polymorphisms
  – Short tandem repeats
– Vega genes
– (MSD protein structures)
– (UniProt proteomes)

10 Secondary Data Sets
– Markers
– Diseases
– Gene ontology
– Gene expression information
– Homology predictions
– Protein annotation

11 Information flow (diagram): start → filter → output
Labels in the diagram, in the order they appear:
– SPECIES, FOCUS
– REGION, SNP, PROTEIN, HOMOLOGY, GENE, EXPRESSION
– REFSEQ, INTERPRO, GO, SWISSPROT, EMBL, AFFY
– REGION, SNP, PROTEIN, HOMOLOGY, GENE, EXPRESSION
– FASTA, FILE, EXCEL, TEXT, GTF, HTML
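The start, filter, output flow can be illustrated with a small sketch over in-memory records. This is not the BioMart interface or API: the dataset, field names and the helpers `run_query` and `to_text` are hypothetical and exist only to show the three stages.

```python
import csv
import io

# Toy records standing in for a mart dataset; every value here is invented.
DATASET = {
    "species_x": [
        {"gene": "geneA", "chrom": "1", "start": 100, "end": 900, "domain": "kinase"},
        {"gene": "geneB", "chrom": "2", "start": 50, "end": 400, "domain": "zinc_finger"},
        {"gene": "geneC", "chrom": "1", "start": 1500, "end": 2300, "domain": "kinase"},
    ]
}


def run_query(species, filters, attributes):
    """Start: choose the dataset. Filter: keep matching rows. Output: keep chosen columns."""
    rows = DATASET[species]
    kept = [r for r in rows if all(r[key] == value for key, value in filters.items())]
    return [{a: r[a] for a in attributes} for r in kept]


def to_text(rows, attributes):
    """Serialise the result as tab-separated text, one of several possible output formats."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=attributes, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


attributes = ["gene", "start", "end"]
rows = run_query("species_x", {"chrom": "1"}, attributes)
text = to_text(rows, attributes)
print(text)
```

Keeping the filter step separate from the output step means the same filtered result could be written in another format without re-running the query.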

12 BioMart http://www.biomart.org/

13 BioMart - Features

14 BioMart - Sequences

15 Output formats HTML

16 What about queries not possible to do in EnsMart?
– Direct database access at ensembldb.ensembl.org and martdb.ebi.ac.uk
– MySQL client
– Download MySQL for Windows: http://www.winmysql.com/page4.html
– File: wmysr11.zip
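As a rough sketch of what direct access with the MySQL client looks like, the snippet below only assembles the command line and prints it; it does not connect to anything. The host comes from the slide, but the `anonymous` user name, the port and the `SHOW DATABASES` query are assumptions that should be checked against the current Ensembl documentation. `mysql_command` is a hypothetical helper.

```python
import shlex


def mysql_command(host, user="anonymous", port=3306, query="SHOW DATABASES"):
    """Build an argument list for the standard mysql command-line client.

    The user name and port defaults are assumptions, not taken from the slides.
    """
    return [
        "mysql",
        "--host", host,
        "--user", user,
        "--port", str(port),
        "--execute", query,
    ]


cmd = mysql_command("ensembldb.ensembl.org")
# Print a shell-safe version of the command for copy and paste.
print(shlex.join(cmd))
```

The same helper could be pointed at martdb.ebi.ac.uk by changing the host argument.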

17 Access via Perl object API
– Based on bioperl Ensembl modules
– For an introduction, see the tutorial at: http://www.ensembl.org/info/software/core/

18 There are other ways…
– MartShell: a command-line interface to Mart, written in Java. It works with a Mart Query Language.

19 MartExplorer

