ILSI-HESI agreement with EBI: ArrayExpress, public repository for toxicogenomics data Susanna Assunta Sansone Microarray Informatics.

Presentation transcript:

ILSI-HESI agreement with EBI: ArrayExpress, public repository for toxicogenomics data Susanna Assunta Sansone Microarray Informatics Team European Bioinformatics Institute (EBI) Hoffmann-La Roche The European Bioinformatics Institute

Acknowledgments
• Microarray Informatics Team, EBI, esp.: Alvis Brazma, Helen Parkinson, Mohammad Shojatalab, Ugis Sarkans
• Industry Support team, EBI
• MGED steering committee
• MIAME working group
• Chris Stoeckert, U. Penn., and members of MGED

Talk structure
• Part I = ArrayExpress at EBI: a public repository for gene expression data
• Demo = MIAMExpress: submission/annotation tool
• Part II = ILSI-HESI IMD: toxicogenomics data transfer to ArrayExpress

Part I - Talk structure
• Data standardization: MGED group; MIAME concepts; MGED Ontology
• Uses of MIAME concepts: ArrayExpress database; MAGE-OM, the object model
• Data flow in and out of ArrayExpress

Part I - Talk structure
• Data standardization: MGED group; MIAME concepts; MGED Ontology

Data standardization - MGED
• MGED = Microarray Gene Expression Database group
  EBI + world's largest labs (TIGR, Sanger, Stanford, Agilent, Affymetrix, etc.)
• Aims
  Facilitate adoption of standards:
  – Annotation
  – Data representation
  Introduce:
  – Experimental controls
  – Data normalization methods

Data standardization - Why?
• Size of dataset
• Different platforms: nylon, glass
• Different technologies: oligos, spotted
• References to external databases are not stable!
• Gene expression data only have a meaning in the context of a detailed experiment description

MIAME - Minimum Information About a Microarray Experiment
• MGED group has published: MIAME v1.0 doc (Brazma et al., Nature Genetics, 2001)
• Minimum information that must be reported about a microarray experiment in order to ensure:
  its interpretability
  potential verification of the results

MIAME - Minimum Information About a Microarray Experiment
Describes the 6 parts of a microarray experiment
[Diagram: Experiment, Array, Sample, Hybridisation, Data, Normalisation, with external links to Publication, Gene (e.g. EMBL) and Source (e.g. Taxonomy)]

MIAME - Experimental design
[Diagram: the 6 parts of a microarray experiment]
Experiment: the set of the hybridisation experiments as a whole

MIAME - Experimental design
• One or more hybridisation experiments that are in some way related and address related questions:
  Author, contact information, citations
  Type of experiment, e.g.:
  – time course
  – normal vs diseased comparison
  Experimental factors, i.e. tested parameters in the experiment, e.g.:
  – time
  – dose
  – response to a compound
  List of organisms used in the experiment
  List of platforms used

MIAME - Experimental design
• List of samples, arrays and hybridisations and their relationships, e.g.:
  Samples: S1, S2, S3
  Arrays: A1, A2, A3
  Hybridisations:
  – H1 is S1 and S2 on A1
  – H2 is S2 and S3 on A2
  – H3 is S1 and S2 on A3
• Which hybridisations are replicates, e.g.: H1 and H3 are replicates
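
The sample/array/hybridisation example on this slide can be written down as a small data structure. The sketch below is illustrative only: the `Hybridisation` class and the rule "same set of samples means replicate" are assumptions made for this example, not part of MIAME, which simply asks the submitter to state which hybridisations are replicates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hybridisation:
    """Hypothetical record: which samples were hybridised on which array."""
    name: str
    samples: frozenset
    array: str


# The example from the slide: H1 and H3 use the same samples on different arrays.
hybridisations = [
    Hybridisation("H1", frozenset({"S1", "S2"}), "A1"),
    Hybridisation("H2", frozenset({"S2", "S3"}), "A2"),
    Hybridisation("H3", frozenset({"S1", "S2"}), "A3"),
]


def replicate_groups(hybs):
    """Group hybridisations sharing the same sample set (assumed replicate rule)."""
    by_samples = defaultdict(list)
    for h in hybs:
        by_samples[h.samples].append(h.name)
    # Keep only groups with more than one hybridisation.
    return [sorted(names) for names in by_samples.values() if len(names) > 1]


print(replicate_groups(hybridisations))  # [['H1', 'H3']]
```

In a real submission the replicate relationship would be declared explicitly by the experimenter; deriving it from the sample sets is only a convenient consistency check.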

MIAME - Experimental design
• Quality related indicators, e.g.: type of replicates
• Free-text description of the experiment or link to an e-publication

MIAME - Minimum Information About a Microarray Experiment
[Diagram repeated: the 6 parts of a microarray experiment; next part: Array]

MIAME - Array design
[Diagram: the 6 parts of a microarray experiment]
Array: each array used and each element (spot) on the array

MIAME - Array design
• For the database, the array description should normally be submitted only once
• For each physical array used in the experiment, a unique ID and the array type are given
• Array design related information, e.g.:
  platform type = in situ synthesized or spotted, array provider, etc.
  surface type = glass, membrane, etc.

MIAME - Array design
• Properties of each type of element on the array that are generated by similar protocols, e.g.: synthesized oligos, PCR products, plasmids, colonies, etc.
• Each element (spot) on the array:
  Elements may be simple or composite (Affymetrix)
  Each element must be identified by either the sequence, clone ID, PCR primer pair, or in any other unambiguous way
  Composite elements may be identified by a reference sequence
  Elements may be linked to genes (preferably)
  This information is normally provided in a separate file, e.g.:
  – spreadsheet
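
The identification rule for array elements lends itself to a simple check. The following is a minimal sketch under stated assumptions: the `ArrayElement` class, its field names and the placeholder identifiers are hypothetical, and the "any other unambiguous way" option from the slide is not modelled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ArrayElement:
    """One element (spot) on an array; field names are illustrative."""
    element_id: str
    composite: bool = False
    sequence: Optional[str] = None
    clone_id: Optional[str] = None
    primer_pair: Optional[Tuple[str, str]] = None
    reference_sequence: Optional[str] = None  # used for composite elements
    gene: Optional[str] = None                # optional link to a gene


def is_identified(element: ArrayElement) -> bool:
    """True if the element carries at least one of the listed identifiers."""
    if any([element.sequence, element.clone_id, element.primer_pair]):
        return True
    # Composite elements may instead be identified by a reference sequence.
    return element.composite and element.reference_sequence is not None


spots = [
    ArrayElement("e1", clone_id="CLONE-0001", gene="geneA"),
    ArrayElement("e2", composite=True, reference_sequence="REFSEQ-PLACEHOLDER"),
    ArrayElement("e3"),
]
unidentified = [s.element_id for s in spots if not is_identified(s)]
print(unidentified)  # ['e3']
```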

MIAME - Minimum Information About a Microarray Experiment
[Diagram repeated: the 6 parts of a microarray experiment; next part: Sample]

MIAME - Sample
[Diagram: the 6 parts of a microarray experiment]
Sample: samples used, the extract preparation and labelling

MIAME - Sample
• Sample source, e.g.:
  Organism
  Cell source and type
  Developmental stage
  Organism part (tissue)
  Animal/plant strain or line
  Genetic variation
  Disease state or normal
  Typically only some of these qualifiers are relevant, and there is a need to implement the annotation for sample source! (To be continued……)

MIAME - Sample
• Sample treatment, e.g.:
  in vivo / in vitro
  Compounds
  There is a need to implement the annotation for sample treatment! (To be continued……)
• Hybridisation extract preparation
  Laboratory protocol, including extraction method, whether RNA, mRNA, or genomic DNA is extracted, amplification method
• Labelling
  Laboratory protocol, including amount of nucleic acids labelled, label used (e.g. Cy3, Cy5, 33P, etc.)

MIAME - Minimum Information About a Microarray Experiment
[Diagram repeated: the 6 parts of a microarray experiment; next part: Hybridisation]

MIAME - Hybridisations
[Diagram: the 6 parts of a microarray experiment]
Hybridisation: procedures and parameters

MIAME - Hybridisations
• Laboratory protocol including:
  The solution, e.g.:
  – concentration of solutes
  Blocking agent
  Wash procedure
  Quantity of labelled target used
  Time, concentration, volume, temperature
  Description of the hybridisation instruments

MIAME - Minimum Information About a Microarray Experiment
[Diagram repeated: the 6 parts of a microarray experiment; next part: Data]

MIAME - Data
[Diagram: the 6 parts of a microarray experiment]
Data: images, quantitation, specifications

MIAME - Data
• Three data processing levels:
  Raw data: array scans
  Intermediate data: spot quantitations (spots × quantitations)
  Final data: gene expression levels (genes × conditions)

MIAME - Data
• Why three data processing levels?
  Each experiment uses different units!
  Non reliable information
• Lack of gene expression measurement units!
• What do we do in absence of standards?
  Record raw, intermediate and final analysis data
  Together with detailed annotation on the analysis
• This passes on the responsibility of interpreting the final data to the user

MIAME - Data: raw data (array scans)
• The scanner image file, e.g.: TIFF, DAT
• Scanning information:
  Scan parameters:
  – laser power
  – spatial resolution
  – pixel space
  – PMT voltage
  Laboratory protocol for scanning
  Scanning hardware and software
• No MGED consensus on raw data!!

MIAME - Data: intermediate data (spot quantitations)
• Image analysis and quantitation:
  Complete image analysis output for each element, normally given as a separate file, e.g.:
  – spreadsheet
• Image analysis information:
  Image analysis software specifications
  All parameters

MIAME - Data: final data (gene expression levels)
• Summarised information from possible replicates:
  Derived measurement values summarising related elements as used by the author
  Reliability information for these values, given as a separate file, e.g.:
  – spreadsheet
  Specifications of these two, e.g.:
  – median value of the replicates, standard deviation
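
The "median value of the replicates, standard deviation" example can be illustrated in a few lines. This is a sketch with made-up numbers: the gene names and values are invented for illustration, and the sample standard deviation is one possible choice of reliability measure, not one the slides prescribe.

```python
import statistics


def summarise_replicates(values):
    """Return a derived value (median) and a reliability value (sample stdev).

    Needs at least two replicate measurements, since the sample standard
    deviation is undefined for a single value.
    """
    return {
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


# Invented replicate measurements for two hypothetical genes.
replicates = {
    "geneA": [2.0, 2.5, 3.0],
    "geneB": [0.5, 0.5, 0.8],
}

final_data = {gene: summarise_replicates(v) for gene, v in replicates.items()}
print(final_data["geneA"]["median"])  # 2.5
```

Recording which summary and which reliability measure were used is exactly the "specifications of these two" that the slide asks for.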

MIAME - Minimum Information About a Microarray Experiment
[Diagram repeated: the 6 parts of a microarray experiment]

MIAME - Normalisation
[Diagram: the 6 parts of a microarray experiment]
A typical experiment involves a number of hybridisations in which the data from multiple samples are analysed and compared. For this comparison, the reported hybridisation intensities (from the image processing) must first be normalised.

MIAME - Normalisation
• Normalisation adjusts for a number of technical variations between and within hybridisations
• Normalisation strategy, e.g.:
  Spiking
  Housekeeping gene
  Total array
• Normalisation algorithm
• Control array elements
• Hybridisation extract preparation
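
As an illustration of the "total array" strategy, the sketch below rescales each hybridisation so that all hybridisations have the same total intensity. This is one simple reading of the term, chosen for the example; the slides do not specify an algorithm, and the function name and the choice of the mean total as the common target are assumptions.

```python
def total_array_normalise(arrays):
    """Scale each hybridisation so that all share the same total intensity.

    `arrays` maps a hybridisation name to its list of spot intensities.
    The common target is the mean of the per-hybridisation totals (assumed).
    """
    totals = {name: sum(values) for name, values in arrays.items()}
    target = sum(totals.values()) / len(totals)
    return {
        name: [x * target / totals[name] for x in values]
        for name, values in arrays.items()
    }


# Invented intensities: H2 is uniformly twice as bright as H1.
raw = {
    "H1": [100.0, 200.0, 300.0],
    "H2": [200.0, 400.0, 600.0],
}
normalised = total_array_normalise(raw)
print(normalised["H1"])
print(normalised["H2"])
```

After scaling, both hybridisations have the same total, so the uniform brightness difference between them is removed. Whatever strategy is actually used, MIAME asks that the strategy, algorithm and control elements be reported alongside the data.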

MIAME - Annotation
[Diagram: the 6 parts of a microarray experiment]
• Annotation implementations required
  Gene expression data only have a meaning in the context of a detailed sample (source-treatment) and array (gene) description

MIAME - Gene annotation
[Diagram: the 6 parts of a microarray experiment]
• Unambiguous identification:
  Interpret data
• !!Synonyms!!
  Alternative to gene names
  Community approved names
• Usable external sources, e.g.:
  EMBL-GenBank (sequence acc#)
  Jackson Lab (approved mouse gene names)
  HUGO (approved human gene names)

MIAME - Sample annotation
[Diagram: the 6 parts of a microarray experiment]
• Unambiguous identification:
  Interpret data
• Usable external sources, e.g.:
  NCBI Taxonomy (organisms)
  Jackson Lab (mouse strains)
  Mouse Atlas (mouse anatomy)
  Merck Index, CAS # (compounds)
• CVs and ontologies are needed:
  Reduce free-text description
  Facilitate data queries-analysis

What are CV and Ontology?
• CV = Controlled Vocabulary:
  Set of restrictive terms used to describe something; in the simplest case it could be a list
• Ontology:
  Describes the relationship between the terms in a structured way
  Provides semantics and constraints
  Allows for computational inferences and reliable comparisons

Ontology example
• Build an ontology for e.g.: Affymetrix GeneChip Rat Toxicology U34 Array
  (Top Level Class) Array
  element type (Sub-Class) oligos
  (slot constraint) manufactured by Affymetrix
  (instance) GeneChip Rat Toxicology U34 Array
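
The difference between a controlled vocabulary and an ontology can be sketched with ordinary classes. This is only an analogy under stated assumptions: the class names, the `allowed_manufacturers` constraint and the way the flattened slide is read (class, sub-class, slot constraint, instance) are illustrative choices, not the MGED Ontology itself.

```python
# A controlled vocabulary in its simplest form: a flat list of allowed terms.
ELEMENT_TYPE_CV = ["oligos", "PCR products", "plasmids", "colonies"]


class Array:
    """Top-level class."""

    def __init__(self, name):
        self.name = name


class OligoArray(Array):
    """Sub-class: arrays whose element type is 'oligos'."""

    element_type = "oligos"
    # Slot constraint (assumed form): only these manufacturers are allowed.
    allowed_manufacturers = frozenset({"Affymetrix"})

    def __init__(self, name, manufactured_by):
        if manufactured_by not in self.allowed_manufacturers:
            raise ValueError(f"constraint violated: {manufactured_by!r}")
        super().__init__(name)
        self.manufactured_by = manufactured_by


# Instance.
u34 = OligoArray("GeneChip Rat Toxicology U34 Array", "Affymetrix")

# The structure supports a simple inference: every OligoArray is an Array.
print(isinstance(u34, Array))                  # True
print(u34.element_type in ELEMENT_TYPE_CV)     # True
```

The flat list can only say whether a term is allowed; the class structure additionally records how terms relate and rejects values that break a constraint.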

MIAME - MGED Ontology
• MGED Sample (BioMaterial) ontology:
  Under construction by Chris Stoeckert
  Motivated by MIAME
  Defines terms, provides constraints, develops CVs for microarray experiment submissions
  Links also to external CVs and ontologies

MIAME – Q,V,S triplets
• MIAME definitions include the Q,V,S triplets:
  User defined 'qualifier, value, source' triplet
  Used to describe a new term
  – qualifier = what the term describes (cell type)
  – value = its value (epithelial)
  – source = its source (Gray's Anatomy, 38th ed.)
  User defined terms are added to the MGED ontology
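
A 'qualifier, value, source' triplet maps naturally onto a small record type. The sketch below is hypothetical: the `QVS` type and the `register_term` helper are invented for illustration and do not reflect how MIAMExpress or the MGED Ontology store user-defined terms.

```python
from typing import NamedTuple


class QVS(NamedTuple):
    """A user-defined 'qualifier, value, source' triplet."""
    qualifier: str  # what the term describes
    value: str      # its value
    source: str     # where the definition comes from


def register_term(user_terms, triplet):
    """Add a triplet to a dict of user-defined terms, keyed by qualifier."""
    user_terms.setdefault(triplet.qualifier, []).append(triplet)
    return user_terms


# The example from the slide.
term = QVS("cell type", "epithelial", "Gray's Anatomy, 38th ed.")
user_terms = register_term({}, term)
print(user_terms["cell type"][0].value)  # epithelial
```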

Part I - Talk structure
• Data standardization: MGED group; MIAME concepts; MGED Ontology
• Uses of MIAME concepts: ArrayExpress database; MAGE-OM, the object model

Uses of MIAME concepts
• Specifies the content of the information:
  Sufficient information must be recorded to:
  – Correctly interpret
  – Replicate the experiments
  Structured information must be recorded to:
  – Correctly retrieve
  – Analyse the data
• Uses:
  Creation of MIAME-compliant databases, e.g.:
  – ArrayExpress at EBI
  Development of submission/annotation tools for generating MIAME-compliant information, e.g.:
  – MIAMExpress

ArrayExpress
• A public repository for gene expression data
• MIAME-compliant
Top level structure (conceptual model):
[Diagram: Experiment, Array, Sample, Hybridisation, Data, Normalisation, with links to Gene (e.g. EMBL) and Source (e.g. Taxonomy)]

ArrayExpress - Object Model
• MAGE-OM, Microarray Gene Expression Object Model:
  MIAME compliant
  Standard
  Joint submission to OMG, 2001, by MGED and Rosetta
  – OMG (Object Management Group) is an international non-profit software consortium that is setting standards in the area of distributed object computing

ArrayExpress - Object Model
• MAGE-ML, Mark-up Language:
  Derived from MAGE-OM
  Describes and communicates MIAME information
  DTD = 'predominantly' computer readable……
• UML, Unified Modelling Language:
  UML specifications are used to develop and describe MAGE-OM
  UML = ……human readable

MAGE-OM - UML specifications
Related classes are grouped together in packages
MAGE-OM has 16 packages
[UML diagram labels: Top level class = package; Class name; Attributes; Class describes objects; Relationships; Packages linked to each other by reference]

MAGE-OM mapping to MIAME
[Diagram: the MIAME parts (Experiment, Array, Sample, Hybridisation, Data, Normalisation) labelled with MAGE-OM packages]
MAGE-OM packages shown: ExperimentDesign; BioAssay; ArrayDesign, ArrayManufacture, BioSequence, DesignElement; BioMaterial; BioAssayData, QuantitationType
+ other 7 "auxiliary" packages: AuditandSecurity, Protocol, Measurement, BioEvent, BQS, Description, HighLevelAnalysis
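
One way to read this slide is as a lookup table from MIAME parts to MAGE-OM packages. The pairing below is an assumption reconstructed from the flattened diagram by matching names (for example Sample with BioMaterial); the slide text itself only lists the packages. Package names are spelled as they appear on the slide.

```python
# Assumed pairing of MIAME parts with the MAGE-OM packages listed on the slide.
MIAME_TO_MAGE_OM = {
    "Experiment": ["ExperimentDesign"],
    "Hybridisation": ["BioAssay"],
    "Array": ["ArrayDesign", "ArrayManufacture", "BioSequence", "DesignElement"],
    "Sample": ["BioMaterial"],
    "Data and Normalisation": ["BioAssayData", "QuantitationType"],
}

# The 7 "auxiliary" packages named on the slide.
AUXILIARY_PACKAGES = [
    "AuditandSecurity", "Protocol", "Measurement", "BioEvent",
    "BQS", "Description", "HighLevelAnalysis",
]


def all_packages():
    """Flatten the mapping and append the auxiliary packages."""
    mapped = [pkg for pkgs in MIAME_TO_MAGE_OM.values() for pkg in pkgs]
    return mapped + AUXILIARY_PACKAGES


# 9 mapped packages + 7 auxiliary ones gives the 16 packages the talk mentions.
print(len(all_packages()))  # 16
```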

Part I - Talk structure
• Data standardization: MGED group; MIAME concepts; MGED Ontology
• Uses of MIAME concepts: ArrayExpress database; MAGE-OM, the object model
• Data flow in and out of ArrayExpress

Data flow in-out ArrayExpress
[Diagram labels: Users; EBI Web server (Browse-Query); ArrayExpress central database; data warehouse; curation tool; database; image server; Update; Output; Loader; MAGE-ML; MIAMExpress submission; LIMS submission]

Data flow in-out ArrayExpress
[Diagram as on the previous slide]
• MIAME compliant
• Data model implemented in ORACLE
• Deals with:
  Raw data
  Processed data
  Data transformation
• Independent of:
  Experimental platform
  Image analysis method
  Normalization method

Data flow in-out ArrayExpress
[Diagram as on the previous slides]
MIAMExpress
• Submission/annotation tool
• Generates MIAME-compliant information
• Beta-testers
• Demo version (general)
• Target specific interfaces, e.g.:
  Species specific
  Toxicology specific

Talk structure
• Part I = ArrayExpress at EBI: a public repository for gene expression data
• Demo = MIAMExpress: submission/annotation tool

Talk structure
• Part I = ArrayExpress at EBI: a public repository for gene expression data
• Demo = MIAMExpress: submission/annotation tool
• Part II = ILSI-HESI IMD: toxicogenomics data transfer to ArrayExpress

Part II - Talk structure
• Data transfer from IMD to ArrayExpress: Can data be parsed? MIAME-compliant?
• Toxicology specific MIAMExpress interface: ILSI toxicogenomics data submission
• Areas of collaboration - Summary

Part II - Talk structure
• Data transfer from IMD to ArrayExpress: Can data be parsed? MIAME-compliant?

Data parsing?
• From IMD to ArrayExpress:
  Lexical parsing
  – Mapping information to MAGE-OM
  !! Semantic parsing !!
  – Glossary issues

Data mapping - Semantics!
[Diagram: the MIAME parts; Experiment labelled with MAGE-OM package ExperimentDesign]
IMD = Experimental condition description ?? Experimental design (study) ??

Data mapping - Semantics!
[Diagram: the MIAME parts; Array]
IMD = chip, microarray chip
!! Synonyms !!

Data mapping - Semantics!
[Diagram: the MIAME parts; Array labelled with MAGE-OM packages ArrayManufacture, Biosequence]
IMD = chip description, microarray chip description
!! Synonyms !!

Data mapping - Semantics!
[Diagram: the MIAME parts; Array labelled with MAGE-OM package Biosequence]
IMD = chip design, microarray chip design
!! Synonyms !!

Data mapping - Semantics!
[Diagram: the MIAME parts; Array labelled with PlatformType]
IMD = platform, microarray platform, microarray platform type
!! Synonyms !!
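
The synonym problem shown on the last few slides is essentially a glossary lookup. The sketch below collects the IMD terms from those slides into one table; the function name, the normalisation of case and whitespace, and the decision to return `None` for unknown terms are assumptions for illustration, not part of any IMD or ArrayExpress parser.

```python
# IMD terms and the ArrayExpress-side concept each slide pairs them with.
IMD_GLOSSARY = {
    "chip": "Array",
    "microarray chip": "Array",
    "chip description": "ArrayManufacture, Biosequence",
    "microarray chip description": "ArrayManufacture, Biosequence",
    "chip design": "Biosequence",
    "microarray chip design": "Biosequence",
    "platform": "PlatformType",
    "microarray platform": "PlatformType",
    "microarray platform type": "PlatformType",
}


def map_imd_term(term):
    """Return the target concept for an IMD term, or None if it is unknown."""
    key = " ".join(term.lower().split())  # normalise case and whitespace
    return IMD_GLOSSARY.get(key)


print(map_imd_term("Microarray  Chip"))        # Array
print(map_imd_term("experimental condition"))  # None
```

Terms that come back as `None` are the ones that need a human decision, such as the open question of whether an IMD "experimental condition description" corresponds to an experimental design.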

MIAME - compliant?
• IMD MIAME-compliant?
  "Minimal system" for data exchange
  Comparisons
• Current status for toxicogenomic data: non-MIAME compliant
• Additional information required:
  To be flagged as MIAME compliant
  To build queries to the database:
  – ArrayExpress has an object model query mechanism
• Why additional information?

ILSI-HESI Objective
• ILSI-HESI objective:
  To have publicly available information to assist in developing consensus on potential applications and interpretation of microarray data with respect to mechanism-based risk assessment
  To critically assess the potential utility of these new methods for the process of hazard identification
• Toxicologists (other than ILSI-HESI members):
  Can correctly interpret and replicate the toxicogenomics experiments
  Can correctly retrieve and analyse the toxicogenomics data
• Sufficient and structured information must be recorded in order to achieve the ILSI-HESI objective

IMD - Data
• Three types of data:
  Required:
  – fold_change of spot intensity
  Optional:
  – relative_intensity
  – coefficient_variation of relative_intensity
  Additional:
  – present/absent/marginal_call (for Affymetrix)
  – P_value (for replicates)
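
The required/optional/additional split can be expressed as a small record check. This is a hypothetical sketch: the dictionary layout, the short key names `call` and `p_value`, and the `check_imd_record` helper are invented for illustration and do not describe the actual IMD format.

```python
REQUIRED = {"fold_change"}
OPTIONAL = {"relative_intensity", "coefficient_variation"}
ADDITIONAL = {"call", "p_value"}  # assumed key names for the additional data
ALLOWED_CALLS = {"present", "absent", "marginal"}


def check_imd_record(record):
    """Return a list of problems found in one per-spot record (a dict)."""
    fields = set(record)
    problems = []
    for name in sorted(REQUIRED - fields):
        problems.append(f"missing required field: {name}")
    for name in sorted(fields - REQUIRED - OPTIONAL - ADDITIONAL):
        problems.append(f"unknown field: {name}")
    call = record.get("call")
    if call is not None and call not in ALLOWED_CALLS:
        problems.append(f"invalid call: {call}")
    return problems


ok = {"fold_change": 2.1, "relative_intensity": 0.8, "call": "present"}
bad = {"relative_intensity": 0.8, "call": "maybe"}
print(check_imd_record(ok))   # []
print(check_imd_record(bad))  # ['missing required field: fold_change', 'invalid call: maybe']
```

A record that passes this check still holds only final data; the raw and intermediate levels that MIAME asks for are outside its scope.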

MIAME compliant - Data
• Requirements:
  Raw data: array scans
  Intermediate data: spot quantitations (spots × quantitations)
  Final data: gene expression levels (genes × conditions)

MIAME compliant - Data
• Why three data processing levels?
  Lack of gene expression measurement units!
• What do we do in absence of standards?
  Record raw, intermediate and final analysis data
  Together with detailed annotation on the analysis
• This allows toxicologists (other than ILSI-HESI members) to interpret the final data
• Increase the value of toxicology data by achieving the ILSI-HESI objective
  To give a critical mass to the ILSI-HESI studies

IMD – Experiment description
• Hepatotoxicity, e.g.: Oral (gavage) Study in Male SD Rats on Methapyrilene

[Annotated IMD experiment description; annotation labels: AuditandSecurity; Array; Experiment; ExtractionProtocol; ImageAnalysisProtocol; LabellingProtocol; Sample = TreatmentAppl.; Sample = Treatment; Sample = Org.; Sample = BioSource; Sample = BioSourceProvider; Normalization; Sample = ?]

IMD – Experiment description
• Good level of information
• Still incomplete to be MIAME compliant, e.g.:
  Detailed protocols required, e.g.:
  – Hybridization chamber type, scanner type, label quantity, etc.
• Need for: CVs and ontologies

ChemID: 3 systematic names and 39 synonyms !!

Excerpt from Sample Description, courtesy of M. Hoffman, S. Schmidtke, Lion BioSciences
Organism: Mus musculus [ NCBI taxonomy browser ]
Cell source: in-house bred mice (contact:
Sex: female [ MGED ]
Age: weeks after birth [ MGED ]
Growth conditions: normal controlled environment; o C average temperature; housed in cages according to EU legislation; specified pathogen free conditions (SPF); 14 hours light cycle; 10 hours dark cycle
Developmental stage: stage 28 (juvenile (young) mice) [ GXD "Mouse Anatomical Dictionary" ]
Organism part: thymus [ GXD "Mouse Anatomical Dictionary" ]
Strain or line: C57BL/6 [ International Committee on Standardized Genetic Nomenclature for Mice ]
Genetic Variation: Inbr (J) 150. Origin: substrains 6 and 10 were separated prior to This substrain is now probably the most widely used of all inbred strains. Substrain 6 and 10 differ at the H9, Igh2 and Lv loci. Maint. by J,N, Ola. [ International Committee on Standardized Genetic Nomenclature for Mice ]
Treatment: in vivo [ MGED ] [ intraperitoneal ] injection of [ Dexamethasone ] into mice, 10 microgram per 25 g bodyweight of the mouse
Compound: drug [ MGED ] synthetic [ glucocorticoid ] [ Dexamethasone ], dissolved in PBS

Part II - Talk structure
• Data transfer from IMD to ArrayExpress: Can data be parsed? MIAME-compliant?
• Toxicology specific MIAMExpress interface: ILSI toxicogenomics data submission
• Areas of collaboration - Summary

Toxicology specific MIAMExpress
• Toxicology specific interface options:
  in vivo or in vitro
  Study specific (Hepatotoxicity, Nephrotoxicity, Genotoxicity)
• CVs and ontologies to be developed:
  CVs in pull down menus
  'Q,V,S' user-driven ontologies
  Extend MGED ontology to include toxicology-specific terms
• Dynamic, fast and easy to use
• Browse:
  Protocols
  Arrays

Areas of collaboration
• Data transfer:
  Parser from IMD to ArrayExpress (MAGE-ML)
  Additional information required:
  – MIAME compliant flag (e.g. data, protocols, sample pooling, etc.)
  – Build complex queries
• Data submission:
  Submission via toxicology specific MIAMExpress
  – CVs and ontologies
  – Interface options
  – Protocols
• Other data:
  Volume (79 from Hepatotoxicity)
  Clinical chemistry, Histopathology
  – Format (images also?) and volume
• Mailing list