ArrayExpress
Ugis Sarkans
EMBL-EBI
Outline
– why the domain model is not simple
– ArrayExpress object model
– ArrayExpress implementation status
– future developments
Underlying principles
– must be able to accommodate the needs of a technology that is under constant development
– must be able to manage data in the absence of standard measurement units and of standards for reliability information
– gene expression data have meaning only in the context of the experimental conditions
    – controlled vocabularies and ontologies are needed for unambiguous sample annotation
– MIAME-compliant
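To make the controlled-vocabulary point concrete, here is a minimal sketch of checking a sample annotation against a fixed list of allowed terms. The categories and terms in `VOCABULARY` and the function `validate_annotation` are hypothetical and illustrative only; they are not taken from the MGED ontology or from ArrayExpress.

```python
# Hypothetical controlled vocabulary: category -> allowed terms.
# Illustrative only; not an actual MGED/ArrayExpress vocabulary.
VOCABULARY = {
    "organism": {"Homo sapiens", "Mus musculus", "Saccharomyces cerevisiae"},
    "organism_part": {"liver", "brain", "heart"},
    "treatment_type": {"heat shock", "drug treatment", "none"},
}


def validate_annotation(annotation: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the annotation passes."""
    problems = []
    for category, term in annotation.items():
        allowed = VOCABULARY.get(category)
        if allowed is None:
            # Free-text categories are rejected, not silently accepted.
            problems.append(f"unknown category: {category!r}")
        elif term not in allowed:
            problems.append(f"term {term!r} not allowed for {category!r}")
    return problems
```

With free text, "human", "Homo sapiens" and "H. sapiens" would all be stored as different values; restricting annotation to the vocabulary is what makes samples from different submitters comparable.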
ArrayExpress - conceptual overview
Simple version of AE object model - ArrayExpressBasic
Motivation for two object models
– many spots, one gene
– raw data, cleaned-up data, ratios, normalizations, higher-level analysis
– how detailed does the sample description need to be?
– for data mining we need ways to unify several datasets:
    – array features across different array platforms
    – samples from different experiments
    – various raw and derived measurements
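One way to unify array features across platforms is to map each platform's feature identifiers to a shared identifier (here a gene) and join datasets on it. The sketch below assumes hypothetical feature-to-gene maps (`PLATFORM_A`, `PLATFORM_B`) and collapses several features per gene by their mean; both the maps and the averaging rule are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical feature-to-gene maps for two array platforms.
PLATFORM_A = {"A_001": "GENE1", "A_002": "GENE2", "A_003": "GENE1"}
PLATFORM_B = {"B_17": "GENE1", "B_42": "GENE3"}


def by_gene(values: dict[str, float], feature_map: dict[str, str]) -> dict[str, float]:
    """Collapse feature-level values to one value per gene (mean of its features)."""
    grouped = defaultdict(list)
    for feature, value in values.items():
        gene = feature_map.get(feature)
        if gene is not None:  # features without a gene annotation are dropped
            grouped[gene].append(value)
    return {gene: mean(vals) for gene, vals in grouped.items()}


def unify(datasets: list[tuple[dict[str, float], dict[str, str]]]) -> dict[str, list[float]]:
    """Join several (values, feature_map) datasets on the genes they all share."""
    per_dataset = [by_gene(values, fmap) for values, fmap in datasets]
    shared = set.intersection(*(set(d) for d in per_dataset))
    return {gene: [d[gene] for d in per_dataset] for gene in sorted(shared)}
```

Only genes present on every platform survive the join, which illustrates why "many spots, one gene" has to be modelled explicitly before datasets can be combined.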
ArrayExpressComplete
Scope of ArrayExpress object models
– usable for a public repository as well as for a laboratory database (e.g., as part of a LIMS)
– implementation of "intermediate" models is possible
– mapping to RDBMS tables is not necessarily straightforward
– models and documentation available at
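As an illustration of why the mapping to RDBMS tables is not straightforward, the sketch below stores a three-dimensional set of expression values (feature × hybridization × quantitation type) as one row per value in SQLite, then pivots it back into a matrix. The table layout and the names `expression_value`, `load` and `slice_values` are assumptions made for this example, not the ArrayExpress physical schema.

```python
import sqlite3

# One possible relational layout (an assumption, not the ArrayExpress schema):
# each expression value becomes one row keyed by its three dimensions.
SCHEMA = """
CREATE TABLE expression_value (
    feature TEXT NOT NULL,
    hybridization TEXT NOT NULL,
    quantitation_type TEXT NOT NULL,
    value REAL,
    PRIMARY KEY (feature, hybridization, quantitation_type)
)
"""


def load(conn: sqlite3.Connection, cube: dict[tuple[str, str, str], float]) -> None:
    """Store a mapping (feature, hybridization, quantitation_type) -> value."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO expression_value VALUES (?, ?, ?, ?)",
        [(f, h, q, v) for (f, h, q), v in cube.items()],
    )


def slice_values(conn: sqlite3.Connection, quantitation_type: str) -> dict[str, dict[str, float]]:
    """Rebuild a feature x hybridization matrix for one quantitation type."""
    rows = conn.execute(
        "SELECT feature, hybridization, value FROM expression_value "
        "WHERE quantitation_type = ? ORDER BY feature, hybridization",
        (quantitation_type,),
    ).fetchall()
    matrix: dict[str, dict[str, float]] = {}
    for feature, hyb, value in rows:
        matrix.setdefault(feature, {})[hyb] = value
    return matrix
```

The long format stores any number of dimensions generically, but every matrix-shaped query has to pivot the rows back, and large datasets produce very large tables; a wide layout would avoid the pivot at the cost of a fixed set of columns.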
ArrayExpress: features
– able to import the MAML format
– can deal with both raw and processed data
– independence of:
    – experimental platforms
    – image analysis methods
    – data normalization methods
– object-model-based query mechanism
– will support the upcoming OMG standard for expression data
Key constructs in the AE object model
– structured sample descriptions
– the notion of an ExpressionValueSet
– several dimensions for ExpressionValues
– Transformations working on ExpressionValueSets and their dimensions
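The idea of an ExpressionValueSet with several dimensions, and of Transformations that act on those dimensions, can be sketched as below. This is a simplified, hypothetical rendering: the three dimensions (features, hybridizations, quantitation types) and the two example transformations `log_ratio` and `center_per_hybridization` are assumptions chosen for illustration, not the classes of the actual AE object model.

```python
from dataclasses import dataclass
from math import log2
from statistics import median
from typing import Callable


@dataclass(frozen=True)
class ExpressionValueSet:
    """Hypothetical sketch: values indexed by three named dimensions."""
    features: tuple[str, ...]
    hybridizations: tuple[str, ...]
    quantitation_types: tuple[str, ...]
    values: dict[tuple[str, str, str], float]

    def get(self, feature: str, hyb: str, qtype: str) -> float:
        return self.values[(feature, hyb, qtype)]


# A Transformation maps one ExpressionValueSet to a new one.
Transformation = Callable[[ExpressionValueSet], ExpressionValueSet]


def log_ratio(numerator: str, denominator: str) -> Transformation:
    """Collapse the quantitation-type dimension: two channels -> one log2 ratio."""
    def apply(evs: ExpressionValueSet) -> ExpressionValueSet:
        new_values = {
            (f, h, "log2_ratio"): log2(evs.get(f, h, numerator) / evs.get(f, h, denominator))
            for f in evs.features
            for h in evs.hybridizations
        }
        return ExpressionValueSet(evs.features, evs.hybridizations, ("log2_ratio",), new_values)
    return apply


def center_per_hybridization(qtype: str) -> Transformation:
    """Work along the hybridization dimension: subtract each hybridization's median."""
    def apply(evs: ExpressionValueSet) -> ExpressionValueSet:
        new_values = dict(evs.values)
        for h in evs.hybridizations:
            m = median(evs.get(f, h, qtype) for f in evs.features)
            for f in evs.features:
                new_values[(f, h, qtype)] = evs.get(f, h, qtype) - m
        return ExpressionValueSet(evs.features, evs.hybridizations, evs.quantitation_types, new_values)
    return apply
```

Because every transformation returns a new set rather than modifying its input, raw and derived values can coexist, and a chain such as `center_per_hybridization("log2_ratio")(log_ratio("red", "green")(raw))` records each processing step as a separate object.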
Structured representation of sample and treatment relations
[Figure: diagram linking a sample source, primary samples 1 and 2, derived samples 1 and 2, a new state of the sample source, extracts 1 and 2, and labeled extracts 1 and 2 to a hybridization, through treatment, extraction and labeling steps]
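The relations in the figure form a directed graph: each material records which material it was made from and by which step (treatment, extraction, labeling). A minimal sketch of that structure and of tracing a labeled extract back to its sample source follows; the class `BioMaterial`, its fields and the `lineage` function are hypothetical names, not the AE object model classes.

```python
from dataclasses import dataclass, field


@dataclass
class BioMaterial:
    """Hypothetical node: a sample source, sample, extract or labeled extract."""
    name: str
    kind: str
    # (protocol step, input material) pairs this material was made from
    made_from: list[tuple[str, "BioMaterial"]] = field(default_factory=list)


def lineage(material: BioMaterial) -> list[str]:
    """Walk back from a material towards its source(s), listing each step once."""
    steps = []
    stack = [material]
    seen: set[str] = set()
    while stack:
        current = stack.pop()
        if current.name in seen:  # a material may feed several descendants
            continue
        seen.add(current.name)
        for step, parent in current.made_from:
            steps.append(f"{parent.name} --{step}--> {current.name}")
            stack.append(parent)
    return steps
```

For example, building `Sample source`, then `Primary sample 1` via treatment, `Extract 1` via extraction and `Labeled extract 1` via labeling, `lineage` on the labeled extract lists the labeling, extraction and treatment steps in that order. The `seen` set matters because the graph is not a simple chain: one material can be the input to several others.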
Microarray expression value representation
Expression value types:
– primary images
– composite images (e.g., green/red ratios)
– primary spots
– composite spots
– primary measurements
– derived values
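The step from primary spots to composite spots ("many spots, one gene") can be sketched as an aggregation that keeps a link back to the primary values it was derived from. The names `DerivedValue` and `composite_values`, and the use of the median to combine replicate spots, are assumptions for illustration; other summaries are equally possible.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass
class DerivedValue:
    """Hypothetical derived value that remembers its primary spots."""
    value: float
    derived_from: list[str]  # ids of the primary spots used


def composite_values(
    spot_values: dict[str, float],
    spot_to_composite: dict[str, str],
) -> dict[str, DerivedValue]:
    """Derive one value per composite spot from its primary spots.

    spot_values: primary spot id -> measured value (e.g. a green/red ratio)
    spot_to_composite: primary spot id -> composite spot id (e.g. a gene)
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for spot in spot_values:
        grouped[spot_to_composite[spot]].append(spot)
    # The median is one robust choice for replicate spots; the mean would also work.
    return {
        comp: DerivedValue(median(spot_values[s] for s in spots), spots)
        for comp, spots in grouped.items()
    }
```

Keeping `derived_from` on each composite value preserves the distinction between primary measurements and derived values, so a derived number can always be traced to the spots behind it.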
Current status
– object model: stable, supports current MIAME
– physical database schema
– MAML data loader
– populated with one dataset from EMBL
– currently accessible through SQL
In development
– data loader: changes following MAML evolution
– annotation & MAML export tool
– Web interface to ArrayExpress
    – programmatic interface will follow
Proposed architecture
[Figure components: MAML data, ArrayExpress curation pipeline, data submission & curation database, data warehouse, application server, Web server, image server (marked "?" as tentative)]
Future developments
– will support the upcoming OMG standard for gene expression data (XML, queries)
– diagrammatic interface to the sample description submodel
– integration with other databases
– analytical tools running on top of ArrayExpress data
– curation pipeline development
Acknowledgements
– MGED: MIAME, MAML
– Incyte: Genomic Knowledge Platform
– OMG gene expression data proposal submitters: Rosetta & NetGenics