Published by Katherine Cooper. Modified over 11 years ago.
1
BioMart Query Network
Arek Kasprzyk
European Bioinformatics Institute
8 January 2005
2
Biological databases
Distributed
Different format
Different focus
Different release schedule
Scalability factor
4
BioMart
5
[Architecture diagram]
Databases: myDatabase, SNP, Vega, Ensembl, UniProt, MSD, myMart
Public data (local or remote)
Schema transformation: MartBuilder
Configuration: MartEditor (XML)
Retrieval: BioMart API (Java, Perl); MartExplorer, MartShell, MartView
6
MartView
7
BioMart@Ensembl
8
MartShell
9
MartExplorer
10
Database
11
Schema (diagram: tables linked by primary keys (PK) and foreign keys (FK))
12
Schema (diagram: tables linked by primary keys (PK) and foreign keys (FK))
13
Schema (diagram: tables linked by primary keys (PK) and foreign keys (FK))
14
Schema - reversed star (diagram: main tables with keys PK1 and PK2; dimension tables (dm) reference them through FK1 and FK2)
15
Fixed schema transformation (diagram: tables A, B and C with transformations TA and TB)
16
Schema transformation
Central table
– Longest n:1, 1:1 path
Dimension table
– Central transformation around 1:n table
– Link tables are decomposed into a set of 1:n first
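One possible reading of the "longest n:1, 1:1 path" rule is that the central table is built by merging the tables found along the longest chain of n:1 or 1:1 relations that starts at the central object. The sketch below illustrates that reading only. The relation map, the table `analysis_description` and the function name are hypothetical and are not taken from MartBuilder.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical relation map: table -> [(referenced table, cardinality code)].
# Codes follow the MartBuilder prompt: 11, n1, 0n, 1n.
RELATIONS: Dict[str, List[Tuple[str, str]]] = {
    "gene": [("gene_description", "11"), ("analysis", "n1"), ("transcript", "1n")],
    "analysis": [("analysis_description", "11")],  # invented table for illustration
    "gene_description": [],
    "analysis_description": [],
    "transcript": [],
}

# Only n:1 and 1:1 relations can be merged without multiplying rows.
MERGEABLE = {"11", "n1"}


def longest_mergeable_path(
    table: str,
    relations: Dict[str, List[Tuple[str, str]]],
    seen: FrozenSet[str] = frozenset(),
) -> List[str]:
    """Return the longest chain of n:1 / 1:1 relations starting at `table`."""
    seen = seen | {table}
    best = [table]
    for other, cardinality in relations.get(table, []):
        if cardinality in MERGEABLE and other not in seen:
            candidate = [table] + longest_mergeable_path(other, relations, seen)
            if len(candidate) > len(best):
                best = candidate
    return best


path = longest_mergeable_path("gene", RELATIONS)
print(path)
```

The 1:n relation to `transcript` is deliberately skipped here; under this reading it would become the centre of a dimension table instead.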
17
MartBuilder
Input
– central object
– database meta data
– cardinalities
Output
– Set of SQL statements: create table as select …
Transformations
– represented as asymmetric tree
18
MartBuilder
DATASET: hsapiens_gene_ensembl
TYPE MAIN [M] DIMENSION [D] EXIT [E]: M
TABLE NAME: gene
gene: alt_allele cardinality [11] [n1] [0n] [1n] [SKIP S]: S
gene: gene cardinality [11] [n1] [0n] [1n] [SKIP S]: S
gene: gene_description cardinality [11] [n1] [0n] [1n] [SKIP S]: 11
gene: gene_stable_id cardinality [11] [n1] [0n] [1n] [SKIP S]: 11
gene: kk__gene__main cardinality [11] [n1] [0n] [1n] [SKIP S]: S
gene: transcript cardinality [11] [n1] [0n] [1n] [SKIP S]: S
gene: analysis cardinality [11] [n1] [0n] [1n] [SKIP S]: n1
gene: dna cardinality [11] [n1] [0n] [1n] [SKIP S]: S
gene: dnac cardinality [11] [n1] [0n] [1n] [SKIP S]: S
gene: seq_region cardinality [11] [n1] [0n] [1n] [SKIP S]: S
TYPE MAIN [M] DIMENSION [D] EXIT [E]: E
ADD EXTENSION: hsapiens_gene_ensembl__gene__MAIN [Y|N]: N
CHANGE FINAL TABLE NAME: hsapiens_gene_ensembl__gene__MAIN TO:

CREATE TABLE TEMP0 as
  SELECT gene.gene_id, gene.type, gene.analysis_id, gene.seq_region_id,
         gene.seq_region_start, gene.seq_region_end, gene.seq_region_strand,
         gene.display_xref_id,
         gene_description.gene_id AS gene_id_TEMP0,
         gene_description.description
  FROM gene, gene_description
  WHERE gene_description.gene_id = gene.gene_id;

CREATE TABLE hsapiens_gene_ensembl__gene__MAIN as
  SELECT TEMP0.gene_id, TEMP0.type, TEMP0.analysis_id, TEMP0.seq_region_id,
         TEMP0.seq_region_start, TEMP0.seq_region_end, TEMP0.seq_region_strand,
         TEMP0.display_xref_id, TEMP0.gene_id_TEMP0, TEMP0.description,
         gene_stable_id.gene_id AS gene_id_TEMP1,
         gene_stable_id.stable_id, gene_stable_id.version
  FROM TEMP0, gene_stable_id
  WHERE gene_stable_id.gene_id = TEMP0.gene_id;

drop table TEMP0;
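The generated statements follow a simple pattern: each 1:1 table is joined onto the running result with `CREATE TABLE ... AS SELECT`, and the temporary table is dropped at the end. The sketch below replays that pattern on an in-memory SQLite database so it can be run without a MySQL server. The column lists are trimmed and the sample rows are invented; only the table names and the join pattern come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Trimmed source tables with invented sample rows.
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE gene_description (gene_id INTEGER, description TEXT);
CREATE TABLE gene_stable_id (gene_id INTEGER, stable_id TEXT, version INTEGER);
INSERT INTO gene VALUES (1, 'protein_coding'), (2, 'pseudogene');
INSERT INTO gene_description VALUES (1, 'example gene one'), (2, 'example gene two');
INSERT INTO gene_stable_id VALUES (1, 'G0001', 1), (2, 'G0002', 3);

-- Step 1: merge the first 1:1 table into a temporary table.
CREATE TABLE TEMP0 AS
  SELECT gene.gene_id, gene.type,
         gene_description.gene_id AS gene_id_TEMP0,
         gene_description.description
  FROM gene, gene_description
  WHERE gene_description.gene_id = gene.gene_id;

-- Step 2: merge the second 1:1 table to produce the final main table.
CREATE TABLE hsapiens_gene_ensembl__gene__MAIN AS
  SELECT TEMP0.gene_id, TEMP0.type, TEMP0.gene_id_TEMP0, TEMP0.description,
         gene_stable_id.gene_id AS gene_id_TEMP1,
         gene_stable_id.stable_id, gene_stable_id.version
  FROM TEMP0, gene_stable_id
  WHERE gene_stable_id.gene_id = TEMP0.gene_id;

-- Step 3: discard the intermediate table.
DROP TABLE TEMP0;
""")

rows = conn.execute(
    "SELECT gene_id, stable_id, description "
    "FROM hsapiens_gene_ensembl__gene__MAIN ORDER BY gene_id"
).fetchall()
print(rows)

remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' AND name = 'TEMP0'"
).fetchall()
```

Because both merged relations are 1:1, the main table keeps one row per gene.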
19
Transformation configuration
satellog_repeats  M  repeats  disease        n1
satellog_repeats  M  repeats  gc             11
satellog_repeats  M  repeats  linkage_depth  S
satellog_repeats  M  repeats  repeats        S
satellog_repeats  M  repeats  transcripts    S
satellog_repeats  M  repeats  ugcount        S
satellog_repeats  M  repeats  ugstats        S
satellog_repeats  M  repeats  rep_class      n1
satellog_repeats  D  ugcount  ugcount        S
satellog_repeats  D  ugcount  ugstats        S
satellog_repeats  D  ugcount  gc             S
satellog_repeats  D  ugcount  repeats        n1r
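Each configuration row appears to mirror one answer from the interactive MartBuilder session. A minimal parsing sketch follows. The field names (dataset, table type, table, referenced table, cardinality) are inferred from the MartBuilder prompt and are not defined on the slide; the code `n1r` is kept as an opaque value because the slides do not explain it.

```python
from typing import List, NamedTuple


class TransformationRow(NamedTuple):
    # Field names are assumptions inferred from the MartBuilder prompt.
    dataset: str
    table_type: str   # M = main, D = dimension
    table: str
    referenced: str
    cardinality: str  # 11, n1, 0n, 1n, n1r, or S for skip


CONFIG = """\
satellog_repeats M repeats disease n1
satellog_repeats M repeats gc 11
satellog_repeats M repeats linkage_depth S
satellog_repeats D ugcount repeats n1r
"""


def parse_config(text: str) -> List[TransformationRow]:
    """Split whitespace-separated rows into records, rejecting malformed lines."""
    rows = []
    for number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 5:
            raise ValueError(f"line {number}: expected 5 fields, got {len(fields)}")
        rows.append(TransformationRow(*fields))
    return rows


rows = parse_config(CONFIG)
# Rows marked S are skipped by the transformation.
kept = [row for row in rows if row.cardinality != "S"]
for row in kept:
    print(row.table_type, row.table, "->", row.referenced, row.cardinality)
```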
20
Data access
21
Dataset – Key Abstraction
Dataset
– Organised into a single schema
– BioMart database contains one or more dataset(s)
– Attribute
– Filter
– Exportable/Importable (Links)
Dataset - the equivalent of a relational table
– Exportable/Importable = PK/FK
22
Key Abstractions (diagram: table GENE CENTRAL with columns gene_id (PK), gene_stable_id, gene_start, gene_chrom_end, chromosome, gene_display_id, description; labelled Mart, Dataset, Attribute, Filter)
23
Exportables, Importables and Links
Exportable = ordered list of attributes
Importable = ordered list of filters
– WHERE filt1=value1
– WHERE filt1=value1 or filt1=value2
– WHERE filt1>value1 and filt2<value2
Links = matching importable and exportable
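A link can be pictured as feeding the values exported by one dataset into the importable filters of another, matched by position. The sketch below builds the equality form of the WHERE clause shown above. The function name, the filter name and the sample values are hypothetical, and the real BioMart API may assemble its SQL differently.

```python
from typing import List, Sequence, Tuple


def link_where(
    importable: Sequence[str],
    exported_rows: Sequence[Sequence[object]],
) -> Tuple[str, List[object]]:
    """Build a parameterised WHERE clause from exported attribute values.

    `importable` is the ordered list of filter (column) names; each exported
    row supplies one value per filter, in the same order. Filter names are
    interpolated into the SQL, so they must come from trusted configuration.
    """
    if any(len(row) != len(importable) for row in exported_rows):
        raise ValueError("exportable and importable must have the same length")
    if not exported_rows:
        return "WHERE 0 = 1", []  # nothing exported, so nothing can match
    row_clause = "(" + " AND ".join(f"{name} = ?" for name in importable) + ")"
    sql = "WHERE " + " OR ".join(row_clause for _ in exported_rows)
    params = [value for row in exported_rows for value in row]
    return sql, params


sql, params = link_where(["gene_stable_id"], [("G0001",), ("G0002",)])
print(sql)
print(params)
```

Keeping the values as bound parameters rather than pasting them into the string avoids quoting problems when the exported values come from another database.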
24
MartView
25
Dataset Configuration
Dataset configuration
– Attributes
– Filters
– Trees, Groups, Collections
– Links
Semantics
– Relational mapping
– User interface
– Linking datasets
XML-based
26
Dataset Configuration XML
27
Table naming convention
Naïve configuration
Tables
– Meta tables: meta_content
– Data tables: dataset__content__type
Data tables
– Main: __main
– Dimension: __dm
Columns
– Key: _key
– Boolean filter: _bool
– List filter: _list
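Because the convention encodes each table's role in its name, a naïve configuration can in principle be derived by inspecting names alone. The sketch below classifies table and column names by the suffixes listed above. The helper names are hypothetical, and treating any table that starts with `meta_` as a meta table is an assumption based on the `meta_content` pattern.

```python
from typing import NamedTuple, Optional


class TableName(NamedTuple):
    kind: str                 # "meta", "main", "dimension" or "unknown"
    dataset: Optional[str]
    content: Optional[str]


def classify_table(name: str) -> TableName:
    """Classify a table by the dataset__content__type naming convention."""
    if name.startswith("meta_"):  # assumed prefix for meta tables
        return TableName("meta", None, name[len("meta_"):])
    parts = name.split("__")
    if len(parts) == 3:
        dataset, content, suffix = parts
        # The MartBuilder output uses upper-case MAIN, so compare case-insensitively.
        kind = {"main": "main", "dm": "dimension"}.get(suffix.lower())
        if kind:
            return TableName(kind, dataset, content)
    return TableName("unknown", None, None)


def classify_column(name: str) -> str:
    """Classify a column by its suffix; anything else is reported as 'other'."""
    for suffix, role in (
        ("_key", "key"),
        ("_bool", "boolean filter"),
        ("_list", "list filter"),
    ):
        if name.endswith(suffix):
            return role
    return "other"


print(classify_table("hsapiens_gene_ensembl__gene__MAIN"))
print(classify_column("gene_id_key"))
```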
28
MartEditor
29
Naïve configuration
Updates
Links
Automatic discovery of new tables
30
Class diagram - configuration
31
Class diagram - querying
32
Information flow
Read connections
Register individual datasets and create linked datasets
Get input from the user, split queries to individual datasets
Find the shortest path between datasets (Dijkstra)
Compile SQL
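The path-finding step can be sketched with a textbook Dijkstra search over a graph whose nodes are datasets and whose edges are links. The dataset names below come from the slides, but the links between them and the unit edge weights are invented for illustration; this is not the BioMart implementation.

```python
import heapq
from typing import Dict, List, Optional

# Hypothetical link graph: dataset -> {linked dataset: cost}.
LINKS: Dict[str, Dict[str, int]] = {
    "snp": {"ensembl": 1},
    "ensembl": {"snp": 1, "vega": 1, "uniprot": 1},
    "vega": {"ensembl": 1},
    "uniprot": {"ensembl": 1, "msd": 1},
    "msd": {"uniprot": 1},
}


def shortest_path(
    graph: Dict[str, Dict[str, int]], start: str, goal: str
) -> Optional[List[str]]:
    """Dijkstra's algorithm; returns the datasets on the cheapest path, or None."""
    distance = {start: 0}
    previous: Dict[str, str] = {}
    heap = [(0, start)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == goal:
            break
        if dist > distance.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < distance.get(neighbour, float("inf")):
                distance[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(heap, (candidate, neighbour))
    if goal not in distance:
        return None
    # Walk the predecessor chain back from the goal.
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return path[::-1]


route = shortest_path(LINKS, "snp", "msd")
print(route)
```

Each consecutive pair on the returned route corresponds to one exportable/importable link that the query would have to cross.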
33
Summary
34
BioMart
Domain independent
Platform independent
– MySQL 4
– Oracle 9i
Plugin architecture
35
BioMart model
Already applied
– Ensembl
– Vega
– dbSNP
– UniProt
– MSD
– Variety of small projects
In development
– ArrayExpress
– Wormbase
– RGD
36
Future work
BioMart v 0.2 to be released later in January
Java library to be upgraded to the new architecture over the coming months
BioMart has been integrated with Taverna
MartBuilder: to be properly implemented
37
BioMart
www.ebi.ac.uk/biomart
Open source (LGPL)
Public MySQL server
ftp
mart-dev@ebi.ac.uk
mart-announce@ebi.ac.uk
38
Acknowledgments
BioMart
– Damian Smedley
– Darin London
Contributors
– Arne Stabenau (Ensembl)
– Andreas Kahari (Ensembl)
– Craig Melsopp (Ensembl)
– Katerina Tzouvara (UniProt)
– Paul Donlon (Unilever)
– Will Spooner (CSHL)