MIAMExpress and the development of annotation ontologies for gene expression experiments. Ele Holloway, Microarray Informatics, European Bioinformatics Institute. Microarrays and Data Mining, 10th-11th December 2002
Outline: Capturing information; Ontologies; MIAMExpress
Capturing information: Lab book – only useful for the individual; Annotate in a controlled way; Submit information to a database / LIMS; Need information understandable by all; Allows easy retrieval; Available to other researchers
What is an ontology? A kind of controlled vocabulary (CV) expressed in a structured way.
Components of an ontology. Class = container for information. Has a definition and a relationship to other classes (is-a, part-of, kind-of), e.g. an exon is part of a gene. Instance: terms that are contained within a class.
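The class/instance distinction on this slide can be sketched as a small data structure. This is a minimal illustration only: the `OntologyClass` type, its field names and the placeholder definitions are assumptions made for the example, not part of any ontology tool mentioned in the talk.

```python
from dataclasses import dataclass, field

@dataclass
class OntologyClass:
    """A class: a container with a definition and typed links to other classes."""
    name: str
    definition: str
    relations: list = field(default_factory=list)  # (relation, target class name) pairs
    instances: set = field(default_factory=set)    # terms contained within the class

def related(cls, relation):
    """Return the names of classes that `cls` points to via `relation`."""
    return [target for rel, target in cls.relations if rel == relation]

# Placeholder definitions; only the relationship comes from the slide.
gene = OntologyClass("gene", "placeholder definition")
exon = OntologyClass("exon", "placeholder definition",
                     relations=[("part-of", "gene")])

# A class holding instances (the terms a user can pick from).
sex = OntologyClass("Sex", "placeholder definition",
                    instances={"female", "male", "hermaphrodite"})

print(related(exon, "part-of"))
print(sorted(sex.instances))
```

Keeping relations as typed pairs is what makes the vocabulary structured: a program can follow part-of or is-a links instead of matching strings.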
An ontology – what can it do? Captures knowledge; Shared understanding; Structure enriches CV; Computer ‘readable’
Why do we need an ontology for the database? To help users annotate their data usefully and easily; to perform structured queries; to accurately compare data; to avoid problems with free text searching; to avoid excessive curation workload in future
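A toy comparison shows why free text searching causes problems and how a controlled vocabulary avoids them. The free-text values below are invented for illustration, and `annotate` is a hypothetical helper; the allowed terms are the sex instances listed later in the talk.

```python
# Hypothetical annotations as users might type them into a free-text field.
free_text_annotations = ["female", "Female", "F", "femal", "male"]

# An exact free-text search misses case, abbreviation and spelling variants.
free_text_hits = [a for a in free_text_annotations if a == "female"]

# Controlled vocabulary: only these terms can be stored.
ALLOWED_SEX_TERMS = {"female", "hermaphrodite", "male", "mixed_sex", "unknown_sex"}

def annotate(term):
    """Accept a term only if it belongs to the controlled vocabulary."""
    if term not in ALLOWED_SEX_TERMS:
        raise ValueError(f"{term!r} is not in the controlled vocabulary")
    return term

print(len(free_text_hits), "of", len(free_text_annotations), "free-text values matched")
print(annotate("female"))
```

Rejecting variants at submission time is also what keeps the later curation workload down: nothing has to be cleaned up after the fact.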
Annotation Data mining Controlled vocabulary Free text Database Natural language processing
Standards and Ontologies for Functional Genomics. Aim: to bring together scientists (biologists and bioinformaticians) developing standards and ontologies. 17th-20th November 2002, Hinxton
Examples of ontologies and CVs: MGED Ontology – for describing samples used in microarray experiments; GO – Gene Ontology; EMAP – Edinburgh Mouse Atlas Project; FlyBase – Drosophila genome database; NCBI Taxonomy – all organisms represented in the genetic databases
Infrastructure (diagram). Components: EBI; ArrayExpress (Oracle); MIAMExpress and local MIAMExpress installations; Expression Profiler; external bioinformatics databases; other microarray databases; array manufacturers; LIMS; data pipelines; microarray software and data analysis software with MAGE-ML import/export. Interfaces: submissions, queries and data analysis over the www; data exchange in MAGE-ML
MIAME requirements Experimental design Array design Samples Measurements Normalization controls Hybridizations Nature Genetics 29(4):
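The six MIAME parts listed on this slide can be read as a completeness checklist. The sketch below assumes a submission is represented as a plain dictionary keyed by section name. That representation and the `missing_sections` helper are hypothetical and do not describe how MIAMExpress stores data.

```python
# Section names taken from the slide.
MIAME_SECTIONS = [
    "Experimental design",
    "Array design",
    "Samples",
    "Hybridizations",
    "Measurements",
    "Normalization controls",
]

def missing_sections(submission):
    """Return the MIAME sections that are absent or empty in a submission dict."""
    return [s for s in MIAME_SECTIONS if not submission.get(s)]

# Hypothetical, partially completed submission.
draft = {
    "Experimental design": "time course",
    "Samples": ["Sample 1", "Sample 2"],
    "Hybridizations": [],
}

print(missing_sections(draft))
```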
External links. 6 parts of a microarray experiment: Experiment, Array, Hybridization, Sample, Normalization, Data. Linked resources: MEDLINE (publication details); MGED (experiment details); NCBI taxonomy (species); CAS/Merck (chemical compd.); EMAP (mouse stage); EMBL (gene acc. no.); Genew (gene name); GO
MGED Ontology: Community effort; Supports efforts of MAGE - MGED Society; Describes the parts of a microarray experiment; References out to external ontologies
MGED Ontology Structured in DAML+OIL using OilEd 3.4
MIAMExpress: Submission and annotation tool; Based on MIAME concepts; Array, Experiment and Protocol submissions; Perl-CGI, MySQL database
Submission process
Tour of MIAMExpress: Login + password; Multi-user environment; Control over data access
Login; New/Pending Experiment; Samples 1-4; Extracts 1…n per sample (E1, E2, … En); Labelled extracts (LE) 1…n; Hybridizations (Array 1, Array 2, Array 3, … Array n); Data 1, Data 2, Data 3, … Data n
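The submission hierarchy on this slide (experiment, samples, extracts, labelled extracts, hybridizations to arrays, data) can be sketched as nested records. All class and field names are illustrative assumptions. The actual MIAMExpress implementation is Perl-CGI over MySQL and its schema is not shown in the talk.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LabelledExtract:
    name: str

@dataclass
class Extract:
    name: str
    labelled_extracts: List[LabelledExtract] = field(default_factory=list)

@dataclass
class Sample:
    name: str
    extracts: List[Extract] = field(default_factory=list)

@dataclass
class Hybridization:
    array: str                               # the array that was hybridized
    labelled_extracts: List[LabelledExtract]  # what was applied to it
    data: str                                 # the resulting data set

@dataclass
class Experiment:
    name: str
    samples: List[Sample] = field(default_factory=list)
    hybridizations: List[Hybridization] = field(default_factory=list)

def build_example(n_samples=4):
    """Build a toy experiment: one extract, labelled extract and array per sample."""
    exp = Experiment(name="example experiment")
    for i in range(1, n_samples + 1):
        le = LabelledExtract(f"LE {i}")
        extract = Extract(f"Extract {i}", [le])
        exp.samples.append(Sample(f"Sample {i}", [extract]))
        exp.hybridizations.append(Hybridization(f"Array {i}", [le], f"Data {i}"))
    return exp

exp = build_example()
print(len(exp.samples), "samples,", len(exp.hybridizations), "hybridizations")
```

The one-to-one layout is only for brevity. The slide's "1…n" notation implies each sample can yield several extracts and each extract several labelled extracts, which the list fields allow.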
Submission successful; Curation; Export of MAGE-ML; Loading to ArrayExpress
ArrayExpress, MIAMExpress, RAD: MAGE-ML data exchange. Ontology instances propagated to submission/annotation web forms. User defined terms collected via forms. Curation of user defined terms before inclusion in the ontology. MGED Ontology: BiomaterialDescription; Sex; Gender. Documentation: Subclass of sex applicable to heterogametic species (i.e., those in which the sexes produce gametes of markedly different size). Males produce small numerous gametes. Females produce small numbers of large gametes. Hermaphrodites are individuals with both male and female characteristics. Mixed refers to a population of individuals with more than one type of gender. Used in individuals: female, hermaphrodite, male, mixed_sex, unknown_sex
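The loop described on this slide (ontology instances fill the web forms, user defined terms are collected, and a curator decides whether they enter the ontology) can be sketched as follows. The `TermCurator` class and its method names are hypothetical. The slide gives the workflow but not an implementation.

```python
class TermCurator:
    """Toy model of the form/curation loop for one ontology class."""

    def __init__(self, instances):
        self.instances = set(instances)  # terms already in the ontology
        self.pending = []                # user defined terms awaiting curation

    def form_options(self):
        """Terms propagated to the submission/annotation web form."""
        return sorted(self.instances)

    def submit(self, term):
        """Accept a known term; queue an unknown one for curation."""
        if term in self.instances:
            return "accepted"
        if term not in self.pending:
            self.pending.append(term)
        return "pending"

    def curate(self, term, approve):
        """Curator decision: approved terms are added to the ontology."""
        self.pending.remove(term)
        if approve:
            self.instances.add(term)

# Instances of the sex class as listed on the slide.
curator = TermCurator(["female", "hermaphrodite", "male", "mixed_sex", "unknown_sex"])
print(curator.submit("female"))
print(curator.submit("example_new_term"))  # invented term, for illustration only
curator.curate("example_new_term", approve=True)
print(curator.form_options())
```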
Resources Microarray Informatics Group MIAMExpress MGED Ontology Working Group Sourceforge
Acknowledgements. ArrayExpress: Ugis Sarkans, Gonzalo Garcia, Ahmet Oezcimen, Anjan Sharma. Curation: Helen Parkinson, Gaurab Mukherjee, Philippe Rocca-Serra, Susanna Sansone. MIAMExpress: Mohammad Shojatalab, Niran Abeygunawardena, Sergio Contrino. Alvis Brazma. MGED Ontology: Chris Stoeckert (U. Penn)