Published by Vernon George. Modified over 9 years ago.
1
ILSI-HESI agreement with EBI: ArrayExpress, public repository for toxicogenomics data
Susanna Assunta Sansone, sansone@ebi.ac.uk
Microarray Informatics Team, European Bioinformatics Institute (EBI)
Hoffmann-La Roche
The European Bioinformatics Institute
2
Acknowledgments
Microarray Informatics Team, EBI, esp.: Alvis Brazma, Helen Parkinson, Mohammad Shojatalab, Ugis Sarkans
Industry Support team, EBI
MGED steering committee
MIAME working group
Chris Stoeckert, U. Penn., and members of MGED
3
Talk structure
Part I = ArrayExpress at EBI: a public repository for gene expression data
Demo = MIAMExpress: submission/annotation tool
Part II = ILSI-HESI IMD: toxicogenomics data transfer to ArrayExpress
4
Part I - Talk structure
Data standardization:
– MGED group
– MIAME concepts
– MGED Ontology
Uses of MIAME concepts:
– ArrayExpress database
– MAGE-OM, the object model
Data flow in and out of ArrayExpress
5
Part I - Talk structure
Data standardization:
– MGED group
– MIAME concepts
– MGED Ontology
6
Data standardization - MGED
MGED = Microarray Gene Expression Database group: EBI plus the world's largest labs (TIGR, Sanger, Stanford, Agilent, Affymetrix, etc.), www.mged.org
Aims:
Facilitate adoption of standards:
– Annotation
– Data representation
Introduce:
– Experimental controls
– Data normalization methods
7
Data standardization - Why?
Size of dataset
Different platforms: nylon, glass
Different technologies: oligos, spotted
References to external databases are not stable!
Gene expression data only have a meaning in the context of a detailed experiment description
8
MIAME - Minimum Information About a Microarray Experiment
The MGED group has published the MIAME v1.0 document (Brazma et al., Nature Genetics, 2001)
Minimum information that must be reported about a microarray experiment in order to ensure:
– its interpretability
– potential verification of the results
9
MIAME - Minimum Information About a Microarray Experiment
Describes the 6 parts of a microarray experiment: Experiment, Array, Sample, Hybridisation, Data, Normalisation
[Diagram: the 6 parts, with external links to Publication, Gene (e.g. EMBL) and Source (e.g. Taxonomy)]
10
MIAME - Experimental design
Experiment: the set of the hybridisation experiments as a whole
[Diagram: 6 parts of a microarray experiment]
11
MIAME - Experimental design
One or more hybridisation experiments that are in some way related and address related questions:
Author, contact information, citations
Type of experiment, e.g.:
– time course
– normal vs diseased comparison
Experimental factors, i.e. the parameters tested in the experiment, e.g.:
– time
– dose
– response to a compound
List of organisms used in the experiment
List of platforms used
12
MIAME - Experimental design
List of samples, arrays and hybridisations and their relationships, e.g.:
Samples: S1, S2, S3
Arrays: A1, A2, A3
Hybridisations:
– H1 is S1 and S2 on A1
– H2 is S2 and S3 on A2
– H3 is S1 and S2 on A3
Which hybridisations are replicates, e.g.: H1 and H3 are replicates
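The sample/array/hybridisation example above can be sketched as a small data structure. This is only an illustration: the rule used here (two hybridisations count as replicates when they use the same set of samples) is an assumption chosen to reproduce the slide's example, and the function name `find_replicates` is hypothetical. In a real submission the replicate relationships are declared by the submitter.

```python
from collections import defaultdict

# Hybridisation -> (set of samples, array), copied from the slide's example.
hybridisations = {
    "H1": ({"S1", "S2"}, "A1"),
    "H2": ({"S2", "S3"}, "A2"),
    "H3": ({"S1", "S2"}, "A3"),
}


def find_replicates(hybs):
    """Group hybridisations that use the same set of samples (assumed rule)."""
    groups = defaultdict(list)
    for name, (samples, _array) in sorted(hybs.items()):
        groups[frozenset(samples)].append(name)
    # Only groups with more than one hybridisation are replicate sets.
    return [names for names in groups.values() if len(names) > 1]


replicate_groups = find_replicates(hybridisations)
print(replicate_groups)
```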
13
MIAME - Experimental design
Quality related indicators, e.g.: type of replicates
Free-text description of the experiment or link to an e-publication
14
MIAME - Minimum Information About a Microarray Experiment
[Diagram: 6 parts of a microarray experiment; next part: Array]
15
MIAME - Array design
Array: each array used and each element (spot) on the array
[Diagram: 6 parts of a microarray experiment]
16
MIAME - Array design
For the database, the array description should normally be submitted only once
For each physical array used in the experiment, a unique ID and the array type are given
Array design related information, e.g.:
– platform type = in situ synthesized or spotted, array provider, etc.
– surface type = glass, membrane, etc.
17
MIAME - Array design
Properties of each type of element on the array that is generated by similar protocols, e.g.: synthesized oligos, PCR products, plasmids, colonies, etc.
Each element (spot) on the array:
– Elements may be simple or composite (Affymetrix)
– Each element must be identified by either the sequence, clone ID, PCR primer pair, or in any other unambiguous way
– Composite elements may be identified by a reference sequence
– Elements may be linked to genes (preferably)
This information is normally provided in a separate file, e.g. a spreadsheet
18
MIAME - Minimum Information About a Microarray Experiment
[Diagram: 6 parts of a microarray experiment; next part: Sample]
19
MIAME - Sample
Sample: samples used, the extract preparation and labelling
[Diagram: 6 parts of a microarray experiment]
20
MIAME - Sample
Sample source, e.g.:
– Organism
– Cell source and type
– Developmental stage
– Organism part (tissue)
– Animal/plant strain or line
– Genetic variation
– Disease state or normal
Typically only some of these qualifiers are relevant
There is a need to implement the annotation for sample source! (To be continued...)
21
MIAME - Sample
Sample treatment, e.g.:
– in vivo / in vitro
– Compounds
There is a need to implement the annotation for sample treatment! (To be continued...)
Hybridisation extract preparation: laboratory protocol, including extraction method, whether RNA, mRNA, or genomic DNA is extracted, and amplification method
Labelling: laboratory protocol, including amount of nucleic acids labelled and label used (e.g. Cy3, Cy5, 33P, etc.)
22
MIAME - Minimum Information About a Microarray Experiment
[Diagram: 6 parts of a microarray experiment; next part: Hybridisation]
23
MIAME - Hybridisations
Hybridisation: procedures and parameters
[Diagram: 6 parts of a microarray experiment]
24
MIAME - Hybridisations
Laboratory protocol including:
– The solution, e.g. concentration of solutes
– Blocking agent
– Wash procedure
– Quantity of labelled target used
– Time, concentration, volume, temperature
– Description of the hybridisation instruments
25
MIAME - Minimum Information About a Microarray Experiment
[Diagram: 6 parts of a microarray experiment; next part: Data]
26
MIAME - Data
Data: images, quantitation, specifications
[Diagram: 6 parts of a microarray experiment]
27
MIAME - Data
Three data processing levels:
– Raw data: array scans
– Intermediate data: spot quantitations (spots by quantitations)
– Final data: gene expression levels (genes by conditions)
28
MIAME - Data
Why three data processing levels?
– Each experiment uses different units!
– Non-reliable information
– Lack of gene expression measurement units!
What do we do in the absence of standards?
– Record raw, intermediate and final analysis data
– Together with detailed annotation on the analysis
This passes on the responsibility of interpreting the final data to the user
29
MIAME - Data
Raw data: array scans
The scanner image file, e.g.: TIFF, DAT
Scanning information:
Scan parameters:
– laser power
– spatial resolution
– pixel space
– PMT voltage
Laboratory protocol for scanning
Scanning hardware and software
No MGED consensus on raw data!!
30
MIAME - Data
Intermediate data: spot quantitations
Image analysis and quantitation:
– Complete image analysis output for each element, normally given as a separate file, e.g. a spreadsheet
Image analysis information:
– Image analysis software specifications
– All parameters
31
MIAME - Data
Final data: gene expression levels (genes by conditions)
Summarised information from possible replicates:
– Derived measurement values summarising related elements, as used by the author
– Reliability information for these values, given as a separate file, e.g. a spreadsheet
– Specifications of these two, e.g.: median value of the replicates, standard deviation
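As a minimal sketch of the "final data" level, the snippet below summarises replicate measurements with the two statistics named on the slide (median as the derived value, standard deviation as the reliability information). The gene identifiers and values are invented for illustration, and `summarise_replicates` is a hypothetical helper, not part of any MGED tool.

```python
import statistics


def summarise_replicates(values):
    """Return the derived value (median) and its reliability (sample stdev)."""
    return {
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # needs at least two replicates
    }


# Hypothetical replicate measurements for two elements.
replicates = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [4.0, 4.0, 4.0],
}

final_data = {gene: summarise_replicates(vals) for gene, vals in replicates.items()}
for gene, summary in final_data.items():
    print(gene, summary)
```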
32
MIAME - Minimum Information About a Microarray Experiment
[Diagram: 6 parts of a microarray experiment; next part: Normalisation]
33
MIAME - Normalisation
A typical experiment involves a number of hybridisations in which the data from multiple samples are analysed and compared
For this comparison, the reported hybridisation intensities (from the image processing) must first be normalised
[Diagram: 6 parts of a microarray experiment]
34
MIAME - Normalisation
Normalisation adjusts for a number of technical variations between and within hybridisations
Normalisation strategy, e.g.:
– Spiking
– Housekeeping gene
– Total array
Normalisation algorithm
Control array elements
Hybridisation extract preparation
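Two of the strategies listed above can be illustrated with simple scaling. This is a sketch under stated assumptions: "total array" is taken to mean scaling so that all intensities sum to a fixed target, and "housekeeping gene" is taken to mean dividing by the mean intensity of designated reference elements. The function names and element IDs are hypothetical, and real normalisation algorithms are usually more elaborate.

```python
def normalise_total_array(intensities, target_total=1.0):
    """Scale intensities so that the whole array sums to target_total."""
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("total intensity must be positive")
    scale = target_total / total
    return {element: value * scale for element, value in intensities.items()}


def normalise_housekeeping(intensities, housekeeping_ids):
    """Divide every intensity by the mean of the housekeeping elements."""
    reference = sum(intensities[i] for i in housekeeping_ids) / len(housekeeping_ids)
    if reference <= 0:
        raise ValueError("housekeeping reference must be positive")
    return {element: value / reference for element, value in intensities.items()}


# Hypothetical spot intensities for one hybridisation.
spots = {"gene1": 2.0, "hk1": 4.0, "gene2": 2.0}
print(normalise_total_array(spots))
print(normalise_housekeeping(spots, ["hk1"]))
```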
35
MIAME - Annotation
Annotation implementations required
Gene expression data only have a meaning in the context of a detailed sample (source-treatment) and array (gene) description
[Diagram: 6 parts of a microarray experiment]
36
MIAME - Gene annotation
Unambiguous identification:
– Interpret data
– !! Synonyms !!
Alternative to gene names:
– Community approved names
Usable external sources, e.g.:
– EMBL-GenBank (sequence acc#)
– Jackson Lab (approved mouse gene names)
– HUGO (approved human gene names)
[Diagram: 6 parts of a microarray experiment]
37
MIAME - Sample annotation
Unambiguous identification:
– Interpret data
Usable external sources, e.g.:
– NCBI Taxonomy (organisms)
– Jackson Lab (mouse strains)
– Mouse Atlas (mouse anatomy)
– Merck Index, CAS # (compounds)
CVs and ontologies are needed:
– Reduce free-text description
– Facilitate data queries and analysis
[Diagram: 6 parts of a microarray experiment]
38
What are CV and Ontology?
CV = Controlled Vocabulary:
– Set of restrictive terms used to describe something; in the simplest case it could be a list
Ontology:
– Describes the relationship between the terms in a structured way
– Provides semantics and constraints
– Allows for computational inferences and reliable comparisons
39
Ontology example
Build an ontology for e.g.: Affymetrix GeneChip Rat Toxicology U34 Array
– (Top Level Class) Array
– (Sub-Class) element type: oligos
– (slot constraint) manufactured by Affymetrix
– (instance) GeneChip Rat Toxicology U34 Array
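The difference between a controlled vocabulary and an ontology can be sketched in a few lines. The flat set stands in for a CV; the linked nodes stand in for the class, sub-class, constraint and instance levels of the example above. The class name `OntologyNode`, the intermediate node names and the constraint keys are hypothetical, and treating the instance as a leaf node is a simplification. The MGED ontology itself is not modelled here.

```python
from dataclasses import dataclass, field
from typing import Optional

# A controlled vocabulary in its simplest form: a flat list of allowed terms.
ELEMENT_TYPE_CV = {"oligos", "PCR products", "plasmids", "colonies"}


@dataclass
class OntologyNode:
    """A term with a parent link and constraints, so relations can be inferred."""
    name: str
    parent: Optional["OntologyNode"] = None
    constraints: dict = field(default_factory=dict)

    def ancestors(self):
        """Names of all ancestors, nearest first."""
        names = []
        node = self.parent
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names

    def all_constraints(self):
        """Constraints inherited from ancestors, overridden by local ones."""
        merged = self.parent.all_constraints() if self.parent else {}
        merged.update(self.constraints)
        return merged


array = OntologyNode("Array")
oligo_array = OntologyNode("Oligo array", array, {"element_type": "oligos"})
affy_array = OntologyNode("Affymetrix oligo array", oligo_array,
                          {"manufactured_by": "Affymetrix"})
u34 = OntologyNode("GeneChip Rat Toxicology U34 Array", affy_array)

print(u34.ancestors())
print(u34.all_constraints())
```

Unlike the flat vocabulary, the linked structure lets a program infer that the U34 array is an Array whose elements are oligos without that fact being stated on the instance itself.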
40
MIAME - MGED Ontology
MGED Sample (BioMaterial) ontology:
– Under construction by Chris Stoeckert, www.cbil.upenn.edu/Ontology/MGED_ontology.html
– Motivated by MIAME
– Defines terms, provides constraints, develops CVs for microarray experiment submissions
– Links also to external CVs and ontologies
41
MIAME – Q,V,S triplets
MIAME definitions include the Q,V,S triplets:
User defined 'qualifier, value, source' triplet
Used to describe a new term:
– qualifier = what the term describes (cell type)
– value = its value (epithelial)
– source = its source (Gray's Anatomy, 38th ed.)
User defined terms are added to the MGED ontology
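A Q,V,S triplet maps naturally onto a small immutable record. The sketch below uses the slide's own example values; the class name `QVS` and the `register_term` helper are hypothetical and only illustrate how user-defined terms might be collected per qualifier before being proposed for the ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QVS:
    """A user-defined 'qualifier, value, source' triplet."""
    qualifier: str  # what the term describes
    value: str      # its value
    source: str     # where the value is defined


def register_term(registry, triplet):
    """Collect triplets per qualifier, skipping exact duplicates."""
    terms = registry.setdefault(triplet.qualifier, [])
    if triplet not in terms:
        terms.append(triplet)
    return registry


user_terms = {}
cell_type = QVS("cell type", "epithelial", "Gray's Anatomy, 38th ed.")
register_term(user_terms, cell_type)
register_term(user_terms, cell_type)  # duplicate, ignored
print(user_terms)
```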
42
Part I - Talk structure
Data standardization:
– MGED group
– MIAME concepts
– MGED Ontology
Uses of MIAME concepts:
– ArrayExpress database
– MAGE-OM, the object model
43
Uses of MIAME concepts
Specifies the content of the information:
Sufficient information must be recorded to:
– Correctly interpret
– Replicate the experiments
Structured information must be recorded to:
– Correctly retrieve
– Analyse the data
Uses:
Creation of MIAME-compliant databases, e.g.:
– ArrayExpress at EBI
Development of submission/annotation tools for generating MIAME-compliant information, e.g.:
– MIAMExpress
44
ArrayExpress
A public repository for gene expression data
MIAME-compliant
Top level structure (conceptual model): Experiment, Array, Sample, Hybridisation, Data, Normalisation, with links to Gene (e.g. EMBL) and Source (e.g. Taxonomy)
45
ArrayExpress - Object Model
MAGE-OM = Microarray Gene Expression Object Model:
– MIAME compliant
– Standard
– Joint submission to OMG, 2001, by MGED and Rosetta
OMG (Object Management Group) is an international non-profit software consortium that is setting standards in the area of distributed object computing
46
ArrayExpress - Object Model
MAGE-ML = Mark-up Language:
– Derived from MAGE-OM
– Describes and communicates MIAME information
– DTD = 'predominantly' computer readable
UML = Unified Modelling Language:
– UML specifications are used to develop and describe MAGE-OM
– UML = human readable
47
MAGE-OM - UML specifications
Related classes are grouped together in packages
MAGE-OM has 16 packages
Top level class = package
Packages are linked to each other by reference
A class describes objects
[Diagram labels: Class name, Attributes, Relationships]
48
MAGE-OM mapping to MIAME
Experiment: ExperimentDesign
Hybridisation: BioAssay
Array: ArrayDesign, ArrayManufacture, BioSequence, DesignElement
Sample: BioMaterial
Data, Normalisation: BioAssayData, QuantitationType
Plus 7 other "auxiliary" packages: AuditAndSecurity, Protocol, Measurement, BioEvent, BQS, Description, HighLevelAnalysis
49
Part I - Talk structure
Data standardization:
– MGED group
– MIAME concepts
– MGED Ontology
Uses of MIAME concepts:
– ArrayExpress database
– MAGE-OM, the object model
Data flow in and out of ArrayExpress
50
Data flow in-out ArrayExpress
[Diagram labels: Users; EBI Web server; Browse-Query; ArrayExpress central database; data warehouse; curation tool; database; image server; Update; MAGE-ML; Output; Loader; Submission via MIAMExpress; Submission from LIMS]
51
Data flow in-out ArrayExpress
ArrayExpress:
– MIAME compliant
– Data model implemented in ORACLE
Deals with:
– Raw data
– Processed data
– Data transformation
Independent of:
– Experimental platform
– Image analysis method
– Normalization method
[Diagram: data flow in and out of ArrayExpress, as on the previous slide]
52
Data flow in-out ArrayExpress
MIAMExpress:
– Submission/annotation tool
– Generates MIAME-compliant information
– Beta-testers
– Demo version (general)
Target specific interfaces, e.g.:
– Species specific
– Toxicology specific
[Diagram: data flow in and out of ArrayExpress, as on the previous slides]
53
Talk structure
Part I = ArrayExpress at EBI: a public repository for gene expression data
Demo = MIAMExpress: submission/annotation tool
54
Talk structure
Part I = ArrayExpress at EBI: a public repository for gene expression data
Demo = MIAMExpress: submission/annotation tool
Part II = ILSI-HESI IMD: toxicogenomics data transfer to ArrayExpress
55
Part II - Talk structure
Data transfer from IMD to ArrayExpress:
– Can data be parsed?
– MIAME-compliant?
Toxicology specific MIAMExpress interface:
– ILSI toxicogenomics data submission
Areas of collaboration - Summary
56
Part II - Talk structure
Data transfer from IMD to ArrayExpress:
– Can data be parsed?
– MIAME-compliant?
57
Data parsing?
From IMD to ArrayExpress:
Lexical parsing
– Mapping information to MAGE-OM
!! Semantic parsing !!
– Glossary issues
58
Data mapping - Semantics!
MIAME part: Experiment (MAGE-OM: ExperimentDesign)
IMD = Experimental condition description ?? Experimental design (study) ??
[Diagram: 6 parts of a microarray experiment]
59
Data mapping - Semantics!
MIAME part: Array
IMD = chip, microarray chip
!! Synonyms !!
[Diagram: 6 parts of a microarray experiment]
60
Data mapping - Semantics!
MIAME part: Array (MAGE-OM: ArrayManufacture, BioSequence)
IMD = chip description, microarray chip description
!! Synonyms !!
[Diagram: 6 parts of a microarray experiment]
61
Data mapping - Semantics!
MIAME part: Array (MAGE-OM: BioSequence)
IMD = chip design, microarray chip design
!! Synonyms !!
[Diagram: 6 parts of a microarray experiment]
62
Data mapping - Semantics!
MIAME part: Array (MAGE-OM: PlatformType)
IMD = platform, microarray platform, microarray platform type
!! Synonyms !!
[Diagram: 6 parts of a microarray experiment]
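The synonym problem shown on these mapping slides is essentially a lookup from several IMD labels to one target concept. The sketch below covers only two of the slides (chip to Array, platform to PlatformType); the pairing of labels with targets is a reading of the flattened slides, and the dictionary and function names are hypothetical rather than part of any IMD or ArrayExpress parser.

```python
# IMD label (lower-cased) -> target concept as named on the slides.
IMD_SYNONYMS = {
    "chip": "Array",
    "microarray chip": "Array",
    "platform": "PlatformType",
    "microarray platform": "PlatformType",
    "microarray platform type": "PlatformType",
}


def map_imd_term(term):
    """Return the target concept for an IMD label, or None if unmapped."""
    key = " ".join(term.lower().split())  # normalise case and whitespace
    return IMD_SYNONYMS.get(key)


for label in ["Chip", "Microarray  Platform", "study"]:
    print(label, "->", map_imd_term(label))
```

Returning `None` for unmapped labels keeps the glossary issues visible: every unresolved term is a candidate for discussion rather than a silent guess.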
63
MIAME - compliant?
IMD MIAME-compliant?
– "Minimal system" for data exchange
– Comparisons
Current status for toxicogenomic data:
– Non-MIAME compliant
Additional information required:
– To be flagged as MIAME compliant
– To build queries to the database: ArrayExpress has an object model query mechanism
Why additional information?
64
ILSI-HESI Objective
ILSI-HESI objective:
– To have publicly available information to assist in developing consensus on potential applications and interpretation of microarray data with respect to mechanism-based risk assessment
– To critically assess the potential utility of these new methods for the process of hazard identification
Toxicologists (other than ILSI-HESI members):
– Can correctly interpret and replicate the toxicogenomics experiments
– Can correctly retrieve and analyse the toxicogenomics data
Sufficient and structured information must be recorded in order to achieve the ILSI-HESI objective
65
IMD - Data
Three types of data:
Required:
– fold_change of spot intensity
Optional:
– relative_intensity
– coefficient_variation of relative_intensity
Additional:
– present/absent/marginal_call (for Affymetrix)
– P_value (for replicates)
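The three tiers of IMD data can be expressed as a simple record check. The field names `fold_change`, `relative_intensity` and `coefficient_variation` follow the slide; the keys `call` and `p_value`, the allowed call values, and the function `validate_imd_record` are assumptions made for this illustration and do not describe the actual IMD schema.

```python
REQUIRED = {"fold_change"}
OPTIONAL = {"relative_intensity", "coefficient_variation"}
ADDITIONAL = {"call", "p_value"}  # hypothetical key names
VALID_CALLS = {"present", "absent", "marginal"}


def validate_imd_record(record):
    """Return a sorted list of problems; an empty list means the record passes."""
    problems = []
    for name in REQUIRED - set(record):
        problems.append(f"missing required field: {name}")
    for name in set(record) - (REQUIRED | OPTIONAL | ADDITIONAL):
        problems.append(f"unknown field: {name}")
    if "call" in record and record["call"] not in VALID_CALLS:
        problems.append(f"invalid call: {record['call']}")
    return sorted(problems)


print(validate_imd_record({"fold_change": 2.5, "call": "present"}))
print(validate_imd_record({"relative_intensity": 1.2, "call": "maybe"}))
```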
66
MIAME compliant - Data
Requirements:
– Raw data: array scans
– Intermediate data: spot quantitations (spots by quantitations)
– Final data: gene expression levels (genes by conditions)
67
MIAME compliant - Data
Why three data processing levels?
– Lack of gene expression measurement units!
What do we do in the absence of standards?
– Record raw, intermediate and final analysis data
– Together with detailed annotation on the analysis
This allows toxicologists (other than ILSI-HESI members) to interpret the final data
Increase the value of toxicology data by achieving the ILSI-HESI objective
To give a critical mass to the ILSI-HESI studies
68
IMD – Experiment description
Hepatotoxicity, e.g.: Oral (gavage) Study in Male SD Rats on Methapyrilene
69
[Figure: IMD experiment description annotated with the corresponding categories: AuditandSecurity, Array, Experiment, ExtractionProtocol, ImageAnalysisProtocol, LabellingProtocol, Normalization, and Sample (TreatmentAppl., Treatment, Org., BioSource, BioSourceProvider, ?)]
70
IMD – Experiment description
Good level of information
Still incomplete to be MIAME compliant, e.g.:
– Detailed protocols required, e.g.: hybridization chamber type, scanner type, label quantity, etc.
Need for: CVs and ontologies
71
ChemID: 3 systematic names and 39 synonyms !!
72
Excerpt from Sample Description (courtesy of M. Hoffman, S. Schmidtke, Lion BioSciences)
Organism: Mus musculus [ NCBI taxonomy browser ]
Cell source: in-house bred mice (contact: person@somewhere.ac.uk)
Sex: female [ MGED ]
Age: 3 - 4 weeks after birth [ MGED ]
Growth conditions: normal controlled environment, 20 - 22 °C average temperature, housed in cages according to EU legislation, specified pathogen free conditions (SPF), 14 hours light cycle, 10 hours dark cycle
Developmental stage: stage 28 (juvenile (young) mice) [ GXD "Mouse Anatomical Dictionary" ]
Organism part: thymus [ GXD "Mouse Anatomical Dictionary" ]
Strain or line: C57BL/6 [ International Committee on Standardized Genetic Nomenclature for Mice ]
Genetic Variation: Inbr (J) 150. Origin: substrains 6 and 10 were separated prior to 1937. This substrain is now probably the most widely used of all inbred strains. Substrain 6 and 10 differ at the H9, Igh2 and Lv loci. Maint. by J, N, Ola. [ International Committee on Standardized Genetic Nomenclature for Mice ]
Treatment: in vivo [ MGED ] [ intraperitoneal ] injection of [ Dexamethasone ] into mice, 10 microgram per 25 g bodyweight of the mouse
Compound: drug [ MGED ] synthetic [ glucocorticoid ] [ Dexamethasone ], dissolved in PBS
73
Part II - Talk structure
Data transfer from IMD to ArrayExpress:
– Can data be parsed?
– MIAME-compliant?
Toxicology specific MIAMExpress interface:
– ILSI toxicogenomics data submission
Areas of collaboration - Summary
74
Toxicology specific MIAMExpress
Toxicology specific interface options:
– in vivo or in vitro
– Study specific (Hepatotoxicity, Nephrotoxicity, Genotoxicity)
CVs and ontologies to be developed:
– CVs in pull down menus
– 'Q,V,S' user driven ontologies
– Extend the MGED ontology to include toxicology specific terms
Dynamic, fast and easy to use
Browse:
– Protocols
– Arrays
75
Areas of collaboration
Data transfer:
Parser from IMD to ArrayExpress (MAGE-ML)
Additional information required:
– MIAME compliant flag (e.g. data, protocols, sample pooling, etc.)
– Build complex queries
Data submission:
Submission via toxicology specific MIAMExpress:
– CVs and ontologies
– Interface options
– Protocols
Other data:
Volume (79 from Hepatotoxicity)
Clinical chemistry, Histopathology:
– Format (images also?) and volume
Mailing list