Download presentation
Presentation is loading. Please wait.
Published byDora Johnson Modified over 9 years ago
1
ArrayExpress – a public database for microarray gene expression data Helen Parkinson Microarray Informatics Team European Bioinformatics Institute MGED V, Tokyo 2002
2
Introducing ArrayExpress Submission methods Ontologies and Annotation The Future Talk Structure
3
Infrastructure at the EBI ArrayExpress (Oracle) Other public Microarray Databases (GEO, CIBEX) ww w EBI Expression Profiler External databases Data analysis ww w Queries ww w MIAMExpress (MySQL) MAGE-ML Submissions MAGE-ML Array Manufacturers LIMS Microarray software Data Analysis software MAGE-ML Export Local MIAMExpress Installations MAGE-ML files Submissions MAGE-ML pipelines
4
ArrayExpress Public database for gene expression data, one of three including CIBEX and GEO Aims to store well annotated (MIAME compliant) and well structured data MIAME Recorded info should be sufficient to interpret and replicate the experiment Information should be structured so that querying and automated data analysis and mining are feasible* *Brazma et al,.Nature Genetics, 2001
5
ArrayExpress Conceptual Model Publication External links HybridisationArraySample Source (e.g., Taxonomy ) Experiment Normalisation Gene (e.g., EMBL ) Data
6
ArrayExpress Details Database schema derived from MAGE-OM Standard SQL, we use Oracle Validating data loader for MAGE-ML Web interface based on java servlets – Queries - experiment, array, sample – Browsing – views on expt Object model-based query mechanism, automatic mapping to SQL
7
Data Submission to ArrayExpress Via MAGE-ML pipeline->ftp->curation->database Parsed from LIMS or local database Current direct MAGE-ML submitters: Sanger, TIGR, Paul Spellman, Affymetrix Via external tools e.g BASE, J-Express Via MIAMExpress ArrayExpress Accepts Array, Experiment and Protocol submissions Provides accession numbers which can be used by journals
8
ArrayExpress Curation Effort Provide user support help and documentation Curate MAGE-ML pipeline data and data from MIAMExpress Support on use of ontologies and CV’s Task, to minimize free text, removal of synonyms MIAME encouragement Help on MAGE-ML Goal: to provide high-quality, well-annotated data to allow automated data analysis
9
Access to Data in ArrayExpress Via query interface, accession, author, species, experimental factors...etc Browsable lists Data export as tab delimited file Export to Expression profiler As MAGE-ML from query interface Array descriptions exportable as tab delimited file
11
Submission and Annotation Tool - MIAMExpress
12
What Makes Up a Submission ? Final data ArraySampleHybridisationArraySampleHybridisationArraySampleHybridisationArraySampleHybridisationExperiment Protocols Normalisation
13
MIAMExpress Submission and Annotation Tool Based on MIAME concepts and questionnaire Perl-CGI, MySQL database Experiment, Array, Protocol submissions Generic annotation tool Exports MAGE-ML
15
Expected Users of MIAMExpress Biologists wishing to submit to AE Groups who install it and support locally (Perl-CGI and MySQL) Biologists who want to use it as part of a LIMS system
16
Annotation- Arrays and Samples (BioMaterial)
17
The Annotation Challenge Use of controlled terms (avoiding free text) Need for integration of controlled terms into annotation tools Avoidance of synonyms Provision of definitions and sources for user defined terms Provision of accurate up to date annotation for samples and arrays
18
Gene (BioSequence) Annotation Can be given by links to existing databases and GO can be used to describe sequences and for analysis MIAME is flexible, allows many kinds of sequence identifiers or sequence itself Use of existing resources is desirable to achieve interoperability
19
Sanger Human and Mouse Array Annotation Pipeline Takes sequences which represent Reporters on the array Exonerate (alignment) against the NCBI assembly (from Ensembl) Inherits annotation from Ensembl, gene names, database references, GO terms provides a common annotation in tab delimited format, can be parsed to MAGE- ML, or used in ADF Pipeline available for external users to beta test
20
Array Definition Format Tab delimited file format that can be parsed to MAGE-ML Defines relationships between features and sequences Provides sequence annotation Example in the Agenda, open letter to journals
21
Example BioSequence Annotation
22
Sample Annotation Gene expression data only have meaning in the context of detailed sample descriptions If the data is going to be interpreted by independent parties, sample information has to be searchable and in the database Controlled vocabularies and ontologies (species, cell types, compound nomenclature, treatments, etc) are needed for unambiguous sample description None of this is trivial
23
What Does an Ontology Do? Captures knowledge Creates a shared understanding – between humans and for computers Makes knowledge machine processable Makes meaning explicit – by definition and context It is more than a controlled vocabulary, it has structure
24
MGED Biomaterial (sample) Ontology Under construction – by MGED OWG – Using OILed Motivated by MIAME and coordinated with the database model (mapping available) We are extending classes, providing constraints, defining terms, and adding terms Extension to include other MAGE-OM required ontology entries and cv’s
25
BioMaterial Ontology Internal Terms
27
Internal and External Terms Combined
28
Examples of External Ontologies and CV’s NCBI taxonomy database Jackson Lab mouse strains and genes Edinburgh mouse atlas anatomy HUGO nomenclature for Human genes Chemical and compound Ontologies, e.g. CAS TAIR Flybase anatomy GO (www.geneontology.org)www.geneontology.org GOBO ontologies
29
Example BioMaterial Annotation Sample source and treatment description, and its correct annotation using the MGED BioMaterial Ontology classes and corresponding external references: “Seven week old C57BL/6N mice were treated with fenofibrate. Liver was dissected out, RNA prepared”
30
©-BioMaterialDescription ©-Biosource Property ©-Organism ©-Age ©-DevelopmentStage ©-Sex ©-StrainOrLine ©-BiosourceProvider ©-OrganismPart ©-BioMaterialManipulation ©-EnvironmentalHistory ©-CultureCondition ©-Temperature ©-Humidity ©-Light ©-PathogenTests ©-Water ©-Nutrients ©-Treatment ©-CompoundBasedTreatment (Compound) (Treatment_application) (Measurement) MGED BioMaterial Ontology Instances 7 weeks after birth Female Charles River, Japan 22 2 C 55 5% 12 hours light/dark cycle Specified pathogen free conditions ad libitum MF, Oriental Yeast, Tokyo, Japan in vivo, oral gavage 100mg/kg body weight External References NCBI Taxonomy Mouse Anatomical Dictionary International Committee on Standardized Genetic Nomenclature for Mice International Committee on Standardized Genetic Nomenclature for Mice Mouse Anatomical Dictionary ChemIDplus Mus musculus musculus id: 39442 Stage 28 C57BL/6 Liver Fenofibrate, CAS 49562-28-9
32
Future Data acquisition for ArrayExpress Update ArrayExpress to newest MAGE-OM V2.0 MIAMExpress, domain specific, portable Further ontology development and integration into tools, use of OWL Curation tools (other than grep, Perl scripts) Improved query interface for AE ArrayExpress update tool
33
Resources Schemas for both ArrayExpress and MIAMExpress, access to code MAGE-ML examples, Arrays, Expts, Protocols MIAME glossary, MAGE-MIAME-ontology mappings List of ontology resources from MGED pages Help in establishing pipelines MGED software MAGEstk Curation, help and advice www.mged.orgwww.mged.org www.ebi.ac.uk/arrayexpress
34
Acknowledgments Microarray Informatics Team, EBI, esp. Alvis Brazma, Ugis Sarkans, Mohammad Shojatalab, Ele Holloway, Gaurab Mukherjee, Philippe Rocca-Serra, Susanna Sansone Chris Stoeckert, U. Penn. Members of MGED Sanger Institute - Rob Andrews, Jurg Bahler, Adam Butler, Kate Rice, EMBL Heidelberg - Wilhelm Ansorge, Martina Muckenthaler,Thomas Preiss
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.