Content, Format, and Standards in Genomics Scale Data The ILSI – EBI Collaboration Wm. B. Mattes, PhD, DABT.

Outline
- Why do we need a database for toxicogenomics?
- How is it envisioned that this will be developed?
- What are the issues for such a database?
- Who is involved in such development? The ILSI – EBI Collaboration

Traditional Biology One tree at a time

“Omic” Biology Forests and Mountains

Challenge of Genomics
- “It’s the informatics, period!”
- And it’s awfully tempting to take shortcuts!
[Diagram labels: Experiment, INFORMATICS, Biological Explanation, ?]

Why do we need a database?
Volume of data
Traditional endpoints per animal:
- <20 histopathology observations
- <10 gross measurements (e.g. weights, food)
- <25 serum measurements
- <10 urinalysis measurements
Genomic endpoints per animal:
- 5,000-10,000 transcripts!
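A quick back-of-the-envelope check of the volume argument, as a minimal sketch that is not part of the original slides. The counts are the upper bounds quoted on the slide; the dictionary keys and the function name `endpoint_ratio` are illustrative only.

```python
# Upper bounds per animal, copied from the slide ("<20", "<10", "<25", "<10").
TRADITIONAL_MAX = {
    "histopathology observations": 20,
    "gross measurements": 10,
    "serum measurements": 25,
    "urinalysis measurements": 10,
}
GENOMIC_RANGE = (5_000, 10_000)  # transcripts per animal, from the slide


def endpoint_ratio(traditional=TRADITIONAL_MAX, genomic=GENOMIC_RANGE):
    """Return (traditional_total, low_ratio, high_ratio)."""
    total = sum(traditional.values())
    return total, genomic[0] / total, genomic[1] / total


total, low, high = endpoint_ratio()
print(f"traditional endpoints per animal: fewer than {total}")
print(f"genomic endpoints: at least {low:.0f}x to {high:.0f}x as many")
```

Because the traditional counts are upper bounds, the ratios are lower bounds: a single animal yields roughly two orders of magnitude more genomic data points than traditional ones.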

Why do we need a database? (cont)  Influence of technology details Influence of probe sequence Many genes are “alternatively spliced” – such events may not be detected unambiguously by a microarray

Influence of Probe Sequence Most arrays target this region of the mRNA!

Why do we need a database? (cont)  Influence of technology details Influence of probe sequence Many genes are “alternatively spliced” – such events may not be detected unambiguously by a microarray For cDNA arrays, probes may hybridize to more than one sequence A database that captures probe sequence is required to resolve discrepancies through automated bioinformatics
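To illustrate why stored probe sequences matter, here is a minimal sketch, not taken from the slides, of the kind of automated check such a database would enable. It assumes exact substring matching as a stand-in for hybridization (real cross-hybridization tolerates mismatches), and all sequences, IDs, and function names are invented toy examples.

```python
def find_targets(probe, transcripts):
    """Return the IDs of every transcript that contains the probe sequence."""
    return sorted(tid for tid, seq in transcripts.items() if probe in seq)


def classify_probes(probes, transcripts):
    """Label each probe 'unique', 'ambiguous', or 'unmatched'."""
    report = {}
    for probe_id, probe_seq in probes.items():
        hits = find_targets(probe_seq, transcripts)
        if not hits:
            status = "unmatched"
        elif len(hits) == 1:
            status = "unique"
        else:
            status = "ambiguous"
        report[probe_id] = (status, hits)
    return report


# Toy data: two splice variants of one gene plus an unrelated gene.
transcripts = {
    "geneA_v1": "ATGGCCTTAGGCTAACGT",
    "geneA_v2": "ATGGCCGGCTAACGT",  # variant lacking the internal "TTA"
    "geneB": "CCCGGGTTTAAACCC",
}
probes = {
    "p1": "GGCTAACGT",  # 3' end shared by both geneA variants
    "p2": "GCCTTAGG",   # spans the region present only in geneA_v1
    "p3": "TTTTTTTT",   # matches nothing
}

report = classify_probes(probes, transcripts)
for probe_id, (status, hits) in report.items():
    print(probe_id, status, hits)
```

A probe aimed at the shared 3' end (`p1`) cannot distinguish the two splice variants, whereas `p2` can; without the probe sequence on record, two platforms reporting different results for "gene A" could not be reconciled automatically.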

How are databases being developed?  Microarray Gene Expression Data Society - MGED Society MIAME - Minimum Information About a Microarray Experiment “the minimum information that should be reported about a microarray experiment to enable its unambiguous interpretation and reproduction” Essentially, what should go into the database

How are databases being developed?  MIAME – Basic Areas Experiment Design Samples used, extract preparation and labeling Hybridization procedures and parameters Measurement data and specifications Array Design
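The five MIAME areas can be read as a completeness checklist for a submission. The sketch below is only an illustration under that reading; the field names and the `missing_areas` helper are hypothetical and are not part of MIAME or any MGED software.

```python
# One hypothetical key per MIAME area listed on the slide.
MIAME_AREAS = (
    "experiment_design",
    "samples_extract_labeling",
    "hybridization",
    "measurements",
    "array_design",
)


def missing_areas(submission):
    """Return the MIAME areas that are absent or empty in a submission dict."""
    return [area for area in MIAME_AREAS if not submission.get(area)]


submission = {
    "experiment_design": "dose-response, 3 dose levels",
    "samples_extract_labeling": "rat liver RNA, Cy3/Cy5",
    "hybridization": "",  # left blank by the submitter
    "measurements": "raw and normalized intensities",
    # "array_design" is missing entirely
}
gaps = missing_areas(submission)
print(gaps)
```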

How are databases being developed? (cont)  MGED Society MAGE Programming conventions and data structures to communicate Microarray Gene Expression data  MAGE-OM: Object Model  MAGE-ML: Markup Language Essentially, how the data is exchanged / how the database is constructed

How are databases being developed? (cont)  MGED Society Ontology working group Ontologies provide a vocabulary for representing and communicating knowledge about a topic, allowing interpretation and use by computers MGED Ontology will provide standard terms for the annotation of microarray experiments, allowing:  structured queries  unambiguous descriptions of experiments
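A minimal sketch of how a controlled vocabulary turns free-text annotations into something that can be queried. The terms and synonyms below are invented for illustration and are not drawn from the MGED Ontology; `annotate` and `query` are hypothetical helper names.

```python
# Invented synonym sets for illustration only; not taken from the MGED Ontology.
VOCABULARY = {
    "necrosis": {"necrosis", "necrotic change", "cell death, necrotic"},
    "hypertrophy": {"hypertrophy", "hypertrophic change", "cell enlargement"},
}

# Reverse index: every accepted spelling points to its single controlled term.
SYNONYM_TO_TERM = {
    synonym: term
    for term, synonyms in VOCABULARY.items()
    for synonym in synonyms
}


def annotate(free_text):
    """Map a free-text finding to its controlled term, or None if unknown."""
    return SYNONYM_TO_TERM.get(free_text.strip().lower())


def query(records, term):
    """Structured query: IDs of records whose finding maps to `term`."""
    return [rid for rid, text in records if annotate(text) == term]


records = [
    (1, "Necrotic change"),
    (2, "cell enlargement"),
    (3, "NECROSIS"),
    (4, "unclear lesion"),
]
print(query(records, "necrosis"))
```

Records 1 and 3 use different wording but resolve to the same term, so a single query finds both; record 4 maps to nothing and would be flagged for curation rather than silently stored.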

How are databases being developed? (cont)  MGED Society Data Transformation and Normalization Working Group Standards for recording how microarray data are transformed and normalized.

What are the issues for a toxicogenomics database?
Scope of the ILSI effort:
Genotoxicity Group
- 10 array platforms
- 11 compounds
- >2 time points, up to 10 doses / compound
Nephrotoxicity Group
- 6 array platforms
- 3 compounds, 260 animals

What are the issues for a toxicogenomics database?
Scope of the ILSI effort:
Hepatotoxicity Group
- 8 array platforms
- 2 compounds, 144 animals
- 2 in-life studies / compound
ALL Groups
- Analysis of each sample at multiple sites

What are the issues for toxicogenomics databases? (cont)  Traditional toxicology endpoints are not currently covered by MAGE, MIAME, or the MGED Ontologies! Organ weights Clinical pathology Histopathology Etc

What are the issues for toxicogenomics databases?
Traditional toxicology endpoints are not standardized in nomenclature
Clinical pathology/chemistry
- AACC
- IUPAC
Histopathology
- STP
- WHO/IARC/RITA
- NACAD
- SNOMED
- NTP, TDMS Database Pathology Code Table

Who is involved in database development
- Private Companies: Genelogic, Iconix, Curagen
- MSU - dbZach
- NIEHS - CEBS
- NCTR - ArrayTrack
- ILSI - EBI

ILSI-HESI and EBI collaboration  Establishment of database for toxicogenomics data Capture, store and analyse gene expression data produced from many different toxicogenomic experiments, conducted in several different laboratories worldwide by the ILSI-HESI members Interrogate the gene array data integrating information from genomic, experimental and toxicological domains Gain knowledge of possible links between gene expression changes and toxicological endpoints

ILSI-HESI and EBI collaboration  Aims of the database and tools Provide a way to integrate the different domains Control the annotation to achieve data harmonization Centralize the information to ease data access and data sharing Improve array annotations as the genome assemblies are released ALLOW data comparison

ILSI-HESI and EBI collaboration  Main challenge Get internally consistent data to allow comparability among the experiments and run complex queries across and within domains Note: Experiments were conducted at ~40 different sites, using different array platforms and terminologies, measuring parameters in different units, and storing information in different formats!
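One concrete piece of the consistency problem is units. As a minimal sketch (not from the slides), assume each site reports organ weights with its own unit label; the conversion table and the `harmonize_weight` name are illustrative choices, with grams picked arbitrarily as the canonical unit.

```python
# Conversion factors to a single canonical unit (grams, chosen for illustration).
TO_GRAMS = {"mg": 0.001, "g": 1.0, "kg": 1000.0}


def harmonize_weight(value, unit):
    """Convert a reported weight to grams; reject units we do not recognize."""
    try:
        factor = TO_GRAMS[unit.strip().lower()]
    except KeyError:
        raise ValueError(f"unrecognized weight unit: {unit!r}") from None
    return value * factor


# The same liver weight as it might arrive from three different sites.
reports = [(0.0125, "kg"), (12.5, "g"), (12500, "mg")]
harmonized = [harmonize_weight(value, unit) for value, unit in reports]
print(harmonized)
```

Rejecting unknown units, rather than guessing, keeps inconsistent submissions from entering the database unnoticed.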

ILSI-HESI and EBI collaboration
- ‘Simple’ question: “Does expression of gene X go up after treatment with compound Y, with biological endpoint Z, in experiments from ILSI-HESI members A and B?”
- ‘Not simple’ question: “Which are the most reproducible gene expression changes (and what is the quantitative measure of this reproducibility) across all experiments on the rat arrays with biological endpoint X, which functional categories do these genes belong to, and what are their human homologues?”
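Once the data are harmonized, the ‘simple’ question reduces to a join between the toxicology and expression domains. The sketch below uses Python's built-in `sqlite3` with a made-up two-table schema and toy values; it is an assumption for illustration and does not reflect the actual ArrayExpress design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (
    sample_id INTEGER PRIMARY KEY,
    site TEXT, compound TEXT, endpoint TEXT
);
CREATE TABLE expression (
    sample_id INTEGER REFERENCES sample(sample_id),
    gene TEXT, log2_ratio REAL
);
""")
# Toy rows: (sample_id, site, compound, endpoint)
conn.executemany("INSERT INTO sample VALUES (?, ?, ?, ?)", [
    (1, "A", "Y", "Z"),
    (2, "A", "Y", "Z"),
    (3, "B", "Y", "Z"),
    (4, "B", "Y", "none"),
])
# Toy rows: (sample_id, gene, log2 ratio treated vs. control)
conn.executemany("INSERT INTO expression VALUES (?, ?, ?)", [
    (1, "X", 1.2), (2, "X", 0.8), (3, "X", 1.5), (4, "X", -0.1),
    (1, "W", 0.0),
])


def mean_change_by_site(conn, gene, compound, endpoint, sites):
    """Mean log2 ratio of `gene` per site, restricted to compound and endpoint."""
    placeholders = ",".join("?" for _ in sites)
    query = f"""
        SELECT s.site, AVG(e.log2_ratio)
        FROM expression AS e JOIN sample AS s USING (sample_id)
        WHERE e.gene = ? AND s.compound = ? AND s.endpoint = ?
          AND s.site IN ({placeholders})
        GROUP BY s.site ORDER BY s.site
    """
    rows = conn.execute(query, (gene, compound, endpoint, *sites)).fetchall()
    return dict(rows)


result = mean_change_by_site(conn, "X", "Y", "Z", ["A", "B"])
# "Goes up" here means a positive mean log2 ratio at every requested site.
goes_up = set(result) == {"A", "B"} and all(v > 0 for v in result.values())
print(result, goes_up)
```

The query only works because site, compound, and endpoint are stored with consistent values; the ‘not simple’ question would additionally need reproducibility statistics, functional annotation, and homology tables.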

An international effort aiming to
- Share expertise
- Encourage harmonization
- Promote standardization initiative
A call for community participation!
[Diagram labels: NIEHS-NCT, EMBL-EBI, ILSI-HESI, Toxicogenomics, MIAME/Tox]

MIAME/Tox objectives
- Standard contextual information: establish worldwide scientific consensus on the minimal information descriptors for array-based toxicogenomics experiments
- Data harmonization: encourage use of controlled vocabularies for the toxicological assessments
- Data integration and data sharing: link data within a study; link several studies from one institution; exchange datasets among institutions
- Data storage: facilitate development of MIAME/Tox-compliant data management software and databases (ArrayExpress @ EBI and CEBS @ NIEHS-NCT)

MIAME/Tox document  Promote standard contextual information Defining the core common to most experiments - Minimum/sufficient information - Structured information  Promote data harmonization, data capture and communication MIAME/Tox is based on MIAME  Focus on toxicological domain Sample treatment and conventional toxicology information - Clinical pathology, pathology, histopathology, …

MIAME/Tox document
Available at the MGED Society and ILSI-HESI web sites
Circulate for consensus
- Toxicogenomics, pharmacogenomics and ecotoxicogenomics communities
- Regulatory bodies
- MGED Meeting (AAAS, Denver, Feb 2003; MGED6, France, Sept 2003)
- Toxicology societies (SOT Meeting, Salt Lake City, March 2003)
Review and publish
Work closely with the MGED working groups
Ontology working group
- Identify controlled vocabularies for toxicological metadata

Data Input As a Key Step
1. Capture data in a standard manner: Tox-MIAMExpress
2. Store information domains in database: ArrayExpress
3. Compare/query across and within domains

Tox-MIAMExpress  Protocols Conventional toxicology tests Microarray experiments

Tox-MIAMExpress  Array designs A set of procedures for formatting the array design information into a standard referencing format (ADF) A set of procedures to re-annotate or update the array designs via a link to another database at EBI (EnsMart)

Tox-MIAMExpress  Experiment Experiment design, quality controls, publications Sample source and treatment Conventional toxicology tests data Microarray hybridizations data

Tox-MIAMExpress

ILSI-HESI and EBI collaboration  Status: Interface and database infrastructure developed Data input ongoing

Acknowledgments  Microarray Informatics Team at EBI, in particular Alvis Brazma (Team Leader and MGED Society President) Susanna-Assunta Sansone Philippe Rocca-Serra (Data Management)  NIEHS-NCT and NTP  ILSI-HESI EBI Steering Committee  ILSI-HESI Genomics Committee
