1 MAGE-OM and ArrayExpress database model
Ugis Sarkans, EBI
2 Outline
what is MAGE-OM
what is ArrayExpress
what language is used for modeling
MAGE-OM structure
ArrayExpress status and future
MAGE future developments
3 MAGE-OM
MicroArray Gene Expression Object Model
  –also: MAGE-ML (.. Markup Language), MAGE-STK (.. Software ToolKit)
Merging of MAML (MicroArray Markup Language) and GEML (Gene Expression Markup Language)
4 MAGE: brief history
December initial submissions of proposals to OMG (Object Management Group):
  –EBI (on behalf of MGED) - MAML
  –Rosetta (on behalf of GEML community) - GEML + some IDLs
  –NetGenics - IDLs
Decision to proceed with a joint submission
Decision to comply with Model Driven Architecture (MDA) principles
October joint submission to OMG (Rosetta and MGED)
5 Model Driven Architecture
Platform Independent Model (UML)
  –most of the time spent on this
Platform Specific Models
  –XML
      UML (refined from PIM)
      DTD (generated plus hand modifications)
  –CORBA (not for MAGE)
      UML (refined from PIM)
      IDL (hopefully generated)
  –….
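The step from a platform independent model to an XML-specific artifact can be sketched in a few lines. This is only an illustration of the idea, not the generator used for MAGE-ML: the `MODEL` dictionary, its class names, and the `to_dtd` function are all hypothetical, and a real DTD would still need the hand modifications the slide mentions.

```python
# Hypothetical, heavily simplified "platform independent model":
# each class lists its attributes and the classes it contains.
MODEL = {
    "Experiment": {"attributes": ["identifier", "name"], "children": ["BioAssay"]},
    "BioAssay": {"attributes": ["identifier"], "children": []},
}


def to_dtd(model):
    """Emit one ELEMENT declaration per class and one ATTLIST per attribute."""
    lines = []
    for cls, spec in model.items():
        children = spec["children"]
        if children:
            # Contained classes become child elements with 0..* cardinality.
            content = "(" + ", ".join(child + "*" for child in children) + ")"
        else:
            content = "EMPTY"
        lines.append(f"<!ELEMENT {cls} {content}>")
        for attr in spec["attributes"]:
            lines.append(f"<!ATTLIST {cls} {attr} CDATA #IMPLIED>")
    return "\n".join(lines)


dtd_text = to_dtd(MODEL)
print(dtd_text)
```

The same model could feed a second generator (for example, for IDL), which is the point of keeping the PIM separate from any one platform.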
6 ArrayExpress
first version (object model), in collaboration with German Cancer Research Centre (DKFZ)
second version (object model) - end of 2000, prototype development funded by Incyte
7 ArrayExpress (2)
implementation - first half of Oracle schema, data loader (from MAML), prototype Web interface, a few datasets loaded
decision to use MAGE-OM as basis for further development
EU funding, 8 new positions
8 ArrayExpress - features
MIAME-compliant
able to import MAML (MAGE-ML) formatted data
can deal with both raw and processed data
independence of:
  –experimental platforms
  –image analysis methods
  –data normalization methods
object model-based query mechanism
supports upcoming OMG standard for expression data
9 Unified Modeling Language
graphical language for describing software systems (and more..)
notation - yes
methodology - no
10 UML diagram types
class
state
collaboration
sequence
……..
11 State diagram
12 Sequence diagram
13 Collaboration diagram
14 Class diagram
15 Class diagrams - notation
classes
attributes
  –types
operations
relationships
  –subclass relationship
  –aggregate relationship
  –association
      role names
      cardinalities
      navigation
16 Class diagram notation (figure labels): class; class from another package; attribute; aggregation; navigation; role name; cardinality; association name; inheritance
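The notation elements listed above map fairly directly onto code. The sketch below is a rough illustration in Python; the classes `Contact`, `Spot`, `ControlSpot`, and `Slide` are invented for the example and are not taken from MAGE-OM.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Contact:
    # A class with one typed attribute.
    name: str


@dataclass
class Spot:
    row: int
    column: int


@dataclass
class ControlSpot(Spot):
    # Subclass relationship: ControlSpot inherits row and column from Spot.
    control_type: str = "blank"


@dataclass
class Slide:
    barcode: str
    # Aggregation with role name "spots" and cardinality 0..*.
    spots: List[Spot] = field(default_factory=list)
    # Association with role name "owner" and cardinality 0..1.
    # Navigation is one-way: a Slide knows its Contact, not the reverse.
    owner: Optional[Contact] = None

    def spot_count(self) -> int:
        # An operation on the class.
        return len(self.spots)


slide = Slide(barcode="B1")
slide.spots.append(Spot(row=0, column=0))
slide.spots.append(ControlSpot(row=0, column=1, control_type="negative"))
```

Because `ControlSpot` is a subclass of `Spot`, it can sit in the same `spots` collection, which is what the inheritance arrow in a class diagram promises.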
17 Class diagram
18 Implementation issues
Java, C++ - “easy”
relational databases
  –classes - tables
  –1:1, 1:N - foreign key
  –N:M - table
  –subclass relations
      all subclasses in the same table
      separate table for superclass and subclasses
XML
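The relational mapping rules above can be tried out with an in-memory SQLite database. This is a minimal sketch, not the ArrayExpress schema: every table name, column name, and multiplicity here is made up for illustration, and the subclass mapping shown is the "all subclasses in the same table" option with a discriminator column.

```python
import sqlite3


def build_demo_db():
    """Create a tiny schema that applies each mapping rule once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        -- class -> table
        CREATE TABLE experiment (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        -- 1:N association -> foreign key on the "many" side
        CREATE TABLE assay (
            id            INTEGER PRIMARY KEY,
            name          TEXT NOT NULL,
            experiment_id INTEGER NOT NULL REFERENCES experiment(id)
        );
        -- subclass relation: all subclasses in one table, with a discriminator
        CREATE TABLE element (
            id   INTEGER PRIMARY KEY,
            kind TEXT NOT NULL CHECK (kind IN ('Feature', 'Reporter')),
            name TEXT NOT NULL
        );
        -- N:M association -> separate link table
        CREATE TABLE assay_element (
            assay_id   INTEGER NOT NULL REFERENCES assay(id),
            element_id INTEGER NOT NULL REFERENCES element(id),
            PRIMARY KEY (assay_id, element_id)
        );
    """)
    conn.execute("INSERT INTO experiment (id, name) VALUES (1, 'E1')")
    conn.executemany(
        "INSERT INTO assay (id, name, experiment_id) VALUES (?, ?, ?)",
        [(1, "A1", 1), (2, "A2", 1)],
    )
    conn.executemany(
        "INSERT INTO element (id, kind, name) VALUES (?, ?, ?)",
        [(1, "Feature", "F1"), (2, "Reporter", "R1")],
    )
    conn.executemany(
        "INSERT INTO assay_element (assay_id, element_id) VALUES (?, ?)",
        [(1, 1), (1, 2), (2, 1)],
    )
    conn.commit()
    return conn


def assays_for_element(conn, element_name):
    """Follow the N:M link table from an element back to its assays."""
    rows = conn.execute(
        """
        SELECT a.name
        FROM assay a
        JOIN assay_element ae ON ae.assay_id = a.id
        JOIN element e ON e.id = ae.element_id
        WHERE e.name = ?
        ORDER BY a.name
        """,
        (element_name,),
    ).fetchall()
    return [row[0] for row in rows]


demo_conn = build_demo_db()
print(assays_for_element(demo_conn, "F1"))
```

The alternative subclass mapping (a separate table for the superclass and one per subclass) avoids unused columns, at the cost of a join whenever a subclass instance is read.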
19 Tools
Rational Rose
  –bad graphical capabilities
  –forward/reverse engineering
  –API (VB-based)
open source
  –ArgoUML
20 UML Packages
BSANEBQS
Description
Protocol
Measurement
Audit
Treatment
Transformation
BioEvent
Experiment
ArrayDesign
BioMaterial
BioAssayData
BioAssay
DesignElement
HigherLevelAnalysis
BioSequence
ArrayManufacture
QuantitationType
21 Top level structure
22 BioAssay
23 Biomaterial
24 ArrayDesign
25 DesignElement
26 DesignElement
27 DesignElement mapping
28 Data
29 BioSequence
30 ArrayManufacture
31 Quantitations
32 HigherLevelAnalysis
33 BioEvent
34 Protocol
35 Description
36 AuditAndSecurity
37 Measurement
38 ArrayExpress: current status
Object model (MAGE-OM) - stable
Database schema - generated (standard SQL, we run under Oracle)
Data loader from MAGE-ML - generated
Web interface (queries, browsing) - under development
39 Near future developments
Dedicated hardware for ArrayExpress
Good quality data coming from collaborators (annotation tools needed)
Data uploading and Web interface made public
40 Future developments
Integration with existing tools (Expression Profiler)
New analytical tools
Links with other databases
Data curation, liaison with data providers
41 ArrayExpress architecture
central database (experiment-centred)
data warehouse
application server (Java servlets)
Web server
image server
ArrayExpress curation MAGE-ML API curation tool database
42 MAGE schedule
OMG meeting, Dublin, November
specification hopefully adopted
Mechanism for incorporating changes and user feedback
MAGE programming jamboree, EBI, December 6-11: API development, parser generation, annotation tools (MAGE-STK)
43 Resources
Web site
  –links to documents presentations UML models
  –also HTML version and PNG image files of diagrams
  –
Mailing list
  –to subscribe, send the following to subscribe lsr-ge
44 Acknowledgements
Doug Bassett (Rosetta)
Alvis Brazma (EBI)
Steve Chervitz (Affymetrix)
Francisco Dela Vega (Applied Biosystems)
Michael Dickson (NetGenics)
David Frankel (IONA)
Scott Markel (NetGenics)
Michael Miller (Rosetta)
Dave Nellesen (Incyte)
Alan Robinson (EBI)
Martin Senger (EBI)
Paul Spellman (Lawrence Berkeley Lab)
Jason Stewart (NCGR)
Charles Troup (Agilent)