Published by Dortha Dixon. Modified over 9 years ago.
1
Content, Format, and Standards in Genomics-Scale Data: The ILSI – EBI Collaboration. Wm. B. Mattes, PhD, DABT
2
Outline: Why do we need a database for toxicogenomics? How is it envisioned that this will be developed? What are the issues for such a database? Who is involved in such development? The ILSI – EBI Collaboration
3
Traditional Biology One tree at a time
4
“Omic” Biology Forests and Mountains
5
Challenge of Genomics “It’s the informatics, period!” And it’s awfully tempting to take shortcuts! Experiment Biological Explanation INFORMATICS ?
6
Why do we need a database? Volume of data. Traditional endpoints per animal: <20 histopathology observations; <10 gross measurements (e.g. weights, food); <25 serum measurements; <10 urinalysis measurements. Genomic endpoints per animal: 5,000-10,000 transcripts!
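To make the scale difference concrete, the per-animal counts on this slide can be tallied. This is only a back-of-the-envelope sketch: the traditional figures are the slide's upper bounds ("fewer than"), so the ratios below are lower limits, and the variable names are illustrative.

```python
# Upper bounds per animal, as listed on the slide ("<20", "<10", ...).
traditional_max = {
    "histopathology observations": 20,
    "gross measurements": 10,
    "serum measurements": 25,
    "urinalysis measurements": 10,
}
# Range of transcripts per animal given on the slide.
transcripts_low, transcripts_high = 5_000, 10_000

# Sum of the upper bounds: the true traditional total is below this.
traditional_total = sum(traditional_max.values())

# Because the denominator is an upper bound, these ratios are lower limits.
ratio_low = transcripts_low / traditional_total
ratio_high = transcripts_high / traditional_total

print(f"traditional endpoints per animal: fewer than {traditional_total}")
print(f"genomic vs traditional: at least {ratio_low:.0f}x to {ratio_high:.0f}x")
```

Even with every traditional endpoint counted at its maximum, the genomic data are about two orders of magnitude larger per animal.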
7
Why do we need a database? (cont) Influence of technology details. Influence of probe sequence: many genes are “alternatively spliced”, and such events may not be detected unambiguously by a microarray.
8
Influence of Probe Sequence Most arrays target this region of the mRNA!
9
Why do we need a database? (cont) Influence of technology details. Influence of probe sequence: many genes are “alternatively spliced”, and such events may not be detected unambiguously by a microarray. For cDNA arrays, probes may hybridize to more than one sequence. A database that captures probe sequence is required to resolve discrepancies through automated bioinformatics.
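A toy illustration of why stored probe sequences matter: if the sequence is available, a script can check which transcripts a probe could report on. The sketch below is a deliberate simplification. It uses exact substring matching (real hybridization tolerates mismatches), and every sequence and name in it is made up for illustration.

```python
def transcripts_matching(probe: str, transcripts: dict[str, str]) -> list[str]:
    """Return the names of transcripts that contain the probe sequence exactly."""
    probe = probe.upper()
    return sorted(name for name, seq in transcripts.items() if probe in seq.upper())


# Hypothetical toy sequences: two splice variants of one gene plus an unrelated gene.
transcripts = {
    "geneA_variant1": "ATGGCCTTAGGCTAACGTTAGC",
    "geneA_variant2": "ATGGCCTTAGGCTAACCCCGGG",
    "geneB": "TTTTGGCTAACGTTAGCAAAA",
}

# A probe from the 3' end of variant 1 misses variant 2 and also matches geneB.
three_prime_probe = "GGCTAACGTTAGC"
# A probe from the shared 5' region cannot tell the two splice variants apart.
five_prime_probe = "ATGGCCTTAGGC"

print(transcripts_matching(three_prime_probe, transcripts))
print(transcripts_matching(five_prime_probe, transcripts))
```

Two platforms using these two probes for "gene A" could report different results for the same sample, and only the probe sequences would explain why.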
10
How are databases being developed? Microarray Gene Expression Data Society (MGED Society). MIAME (Minimum Information About a Microarray Experiment): “the minimum information that should be reported about a microarray experiment to enable its unambiguous interpretation and reproduction”. Essentially, what should go into the database.
11
How are databases being developed? MIAME basic areas: experiment design; samples used, extract preparation and labeling; hybridization procedures and parameters; measurement data and specifications; array design.
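The five areas above can be read as a checklist that a submission tool might enforce. The following is a minimal sketch of that idea only; the key names and the record layout are hypothetical and are not taken from the MIAME document or from any MGED software.

```python
# One hypothetical key per MIAME basic area named on the slide.
MIAME_AREAS = (
    "experiment_design",
    "samples_extract_labeling",
    "hybridization",
    "measurements",
    "array_design",
)


def missing_miame_areas(record: dict) -> list[str]:
    """List the basic areas that are absent or empty in a submission record."""
    return [area for area in MIAME_AREAS if not record.get(area)]


# Illustrative record: two areas have not been filled in yet.
submission = {
    "experiment_design": "dose-response, 3 dose levels",
    "samples_extract_labeling": "rat liver, total RNA",
    "hybridization": "",
    "measurements": "raw and normalized intensities",
}

print(missing_miame_areas(submission))
```

A check like this covers only presence, not quality; whether the supplied text is sufficient for "unambiguous interpretation" still needs curation.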
12
How are databases being developed? (cont) MGED Society: MAGE. Programming conventions and data structures to communicate Microarray Gene Expression data. MAGE-OM: Object Model. MAGE-ML: Markup Language. Essentially, how the data is exchanged and how the database is constructed.
13
How are databases being developed? (cont) MGED Society Ontology working group Ontologies provide a vocabulary for representing and communicating knowledge about a topic, allowing interpretation and use by computers MGED Ontology will provide standard terms for the annotation of microarray experiments, allowing: structured queries unambiguous descriptions of experiments
14
How are databases being developed? (cont) MGED Society Data Transformation and Normalization Working Group Standards for recording how microarray data are transformed and normalized.
15
What are the issues for a toxicogenomics database? Scope of the ILSI effort. Genotoxicity Group: 10 array platforms; 11 compounds; >2 time points, up to 10 doses / compound. Nephrotoxicity Group: 6 array platforms; 3 compounds, 260 animals.
16
What are the issues for a toxicogenomics database? Scope of the ILSI effort. Hepatotoxicity Group: 8 array platforms; 2 compounds, 144 animals; 2 in-life studies / compound. ALL Groups: analysis of each sample at multiple sites.
17
What are the issues for toxicogenomics databases? (cont) Traditional toxicology endpoints are not currently covered by MAGE, MIAME, or the MGED Ontologies! Examples: organ weights, clinical pathology, histopathology, etc.
18
What are the issues for toxicogenomics databases? Traditional toxicology endpoints are not standardized in nomenclature. Clinical pathology/chemistry: AACC; IUPAC. Histopathology: STP; WHO/IARC/RITA; NACAD; SNOMED; NTP, TDMS Database Pathology Code Table.
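One common way to cope with competing nomenclatures is to map each laboratory's wording onto a single controlled term before storage. The sketch below shows that pattern only. The synonym table is invented for illustration and is not drawn from any of the nomenclature sources listed on the slide.

```python
# Hypothetical synonym table: free-text finding -> one controlled term.
SYNONYMS = {
    "hepatocellular hypertrophy": "hypertrophy, hepatocellular",
    "hypertrophy, hepatocyte": "hypertrophy, hepatocellular",
    "liver cell hypertrophy": "hypertrophy, hepatocellular",
}


def normalize_term(raw: str):
    """Return the controlled term for a free-text finding, or None if unmapped."""
    key = " ".join(raw.lower().split())  # ignore case and extra whitespace
    return SYNONYMS.get(key)


reported = ["Hepatocellular  hypertrophy", "liver cell hypertrophy", "necrosis, focal"]
for finding in reported:
    print(finding, "->", normalize_term(finding))
```

Returning None for unmapped findings, rather than guessing, leaves the decision to a curator, which matters when the same word can mean different lesions in different schemes.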
19
Who is involved in database development? Private companies: Gene Logic, Iconix, CuraGen. MSU: dbZach. NIEHS: CEBS. NCTR: ArrayTrack. ILSI - EBI.
20
ILSI-HESI and EBI collaboration Establishment of database for toxicogenomics data Capture, store and analyse gene expression data produced from many different toxicogenomic experiments, conducted in several different laboratories worldwide by the ILSI-HESI members Interrogate the gene array data integrating information from genomic, experimental and toxicological domains Gain knowledge of possible links between gene expression changes and toxicological endpoints
21
ILSI-HESI and EBI collaboration Aims of the database and tools Provide a way to integrate the different domains Control the annotation to achieve data harmonization Centralize the information to ease data access and data sharing Improve array annotations as the genome assemblies are released ALLOW data comparison
22
ILSI-HESI and EBI collaboration. Main challenge: get internally consistent data to allow comparability among the experiments and to run complex queries across and within domains. Note: experiments were conducted at ~40 different sites, using different array platforms and terminologies, measuring parameters in different units, and storing information in different formats!
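As a small example of what "internally consistent" means in practice, consider the same organ weight reported by three sites in three units. The sketch below converts everything to grams before comparison; the site names and values are hypothetical, and only the standard mass conversions are assumed.

```python
# Conversion factors to grams (standard metric relationships).
TO_GRAMS = {"mg": 0.001, "g": 1.0, "kg": 1000.0}


def to_grams(value: float, unit: str) -> float:
    """Convert a mass to grams, rejecting units that are not in the table."""
    try:
        return value * TO_GRAMS[unit.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown mass unit: {unit!r}") from None


# Hypothetical reports of one liver weight from three sites.
site_reports = [
    ("site_1", 12.5, "g"),
    ("site_2", 12500, "mg"),
    ("site_3", 0.0125, "kg"),
]

harmonized = {site: to_grams(value, unit) for site, value, unit in site_reports}
print(harmonized)
```

Raising an error on an unknown unit is a deliberate choice here: silently passing the value through would reintroduce exactly the inconsistency the database is meant to remove.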
23
ILSI-HESI and EBI collaboration. ‘Simple’ question: “Does gene X expression go up after treatment with compound Y, with biological endpoint Z, in experiments from ILSI-HESI members A and B?” ‘Not simple’ question: “Which are the most reproducible gene expression changes (and what is the quantitative measure of this reproducibility) for all experiments on the rat arrays with biological endpoint X, which functional categories do these genes belong to, and which are their human homologues?”
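Once the data are harmonized, the ‘simple’ question reduces to an ordinary database query. The sketch below uses an in-memory SQLite table to show the shape of such a query. The table layout, column names, and values are all hypothetical and do not describe the actual ILSI-HESI/EBI database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flat table: one row per gene measurement per experiment.
conn.execute(
    """CREATE TABLE expression (
           member TEXT, gene TEXT, compound TEXT, endpoint TEXT, log2_ratio REAL)"""
)
rows = [
    ("A", "geneX", "compoundY", "endpointZ", 1.2),
    ("A", "geneX", "compoundY", "endpointZ", 0.8),
    ("B", "geneX", "compoundY", "endpointZ", 0.5),
    ("B", "geneW", "compoundY", "endpointZ", -0.9),
]
conn.executemany("INSERT INTO expression VALUES (?, ?, ?, ?, ?)", rows)

# Mean log2 ratio of the gene, per member, for the chosen compound and endpoint.
query = """
    SELECT member, AVG(log2_ratio)
    FROM expression
    WHERE gene = ? AND compound = ? AND endpoint = ? AND member IN ('A', 'B')
    GROUP BY member
    ORDER BY member
"""
result = conn.execute(query, ("geneX", "compoundY", "endpointZ")).fetchall()

# "Goes up" is taken here to mean a positive mean log2 ratio in both members.
goes_up_in_both = len(result) == 2 and all(mean > 0 for _, mean in result)
print(result, goes_up_in_both)
```

The query is only this short because gene, compound, and endpoint names are assumed to match exactly across members; the ‘not simple’ question additionally needs reproducibility statistics, functional annotation, and homology mapping, which a single flat table does not provide.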
24
An international effort aiming to: share expertise; encourage harmonization; promote standardization initiatives. A call for community participation! NIEHS-NCT, EMBL-EBI, ILSI-HESI: Toxicogenomics, MIAME/Tox
25
MIAME/Tox objectives. Standard contextual information: establish worldwide scientific consensus on the minimal information descriptors for array-based toxicogenomics experiments. Data harmonization: encourage use of controlled vocabularies for the toxicological assessments. Data integration and data sharing: link data within a study; link several studies from one institution; exchange datasets among institutions. Data storage: facilitate development of MIAME/Tox-compliant data management software and databases (ArrayExpress @ EBI and CEBS @ NIEHS-NCT).
26
MIAME/Tox document. Promote standard contextual information: defining the core common to most experiments (minimum/sufficient information; structured information). Promote data harmonization, data capture and communication. MIAME/Tox is based on MIAME. Focus on toxicological domain: sample treatment and conventional toxicology information (clinical pathology, pathology, histopathology, ...).
27
MIAME/Tox document. Available at the MGED Society and ILSI-HESI web sites. Circulate for consensus: toxicogenomics, pharmacogenomics and ecotoxicogenomics communities; regulatory bodies; MGED meetings (AAAS, Denver, Feb 2003; MGED6, France, Sept 2003); toxicology societies (SOT Meeting, Salt Lake City, March 2003). Review and publish. Work closely with the MGED working groups. Ontology working group: identify controlled vocabularies for toxicological metadata.
28
Data Input As a Key Step. 1. Capture data in a standard manner: Tox-MIAMExpress. 2. Store information domains in database: ArrayExpress. 3. Compare/query across and within domains.
29
Tox-MIAMExpress Protocols Conventional toxicology tests Microarray experiments
30
Tox-MIAMExpress: Array designs. A set of procedures for formatting the array design information into a standard referencing format (ADF). A set of procedures to re-annotate or update the array designs via a link to another database at EBI (EnsMart).
31
Tox-MIAMExpress: Experiment. Experiment design, quality controls, publications. Sample source and treatment. Conventional toxicology test data. Microarray hybridization data.
32
Tox-MIAMExpress
35
ILSI-HESI and EBI collaboration Status: Interface and database infrastructure developed Data input ongoing
36
Acknowledgments Microarray Informatics Team at EBI, in particular Alvis Brazma (Team Leader and MGED Society President) Susanna-Assunta Sansone Philippe Rocca-Serra (Data Management) NIEHS-NCT and NTP ILSI-HESI EBI Steering Committee ILSI-HESI Genomics Committee