Strategies to build toxicity databases for data mining Chihae Yang June 28, 2007 Leadscope, inc.


1 Strategies to build toxicity databases for data mining Chihae Yang June 28, 2007 Leadscope, inc.

2 Overview: definitions of databases; landscape of current public toxicity databases; relational databases; simple examples; strategies

3 Types of databases: literature (bibliographic); factual (primary sources; secondary and tertiary sources); curated; metadatabases

4 Types of information formats: monographs; simple relational tables (compound name, toxicity endpoints); structure-integrated databases

5 Primary sources: US National Toxicology Program (NTP); US Environmental Protection Agency, Mid-Continent Ecology Division; Tokyo Eiken (Tokyo Metropolitan Institute of Public Health)

6 Curated secondary and tertiary databases: Chemical Carcinogenesis Research Information System (CCRIS); US EPA ECOTOX. Metadatabase: ToxNet. QSAR database: Danish EPA

7 Examples of monographs and risk assessments: International Programme on Chemical Safety, INCHEM; US CDC Agency for Toxic Substances & Disease Registry (ATSDR); International Agency for Research on Cancer (IARC); International Toxicity Estimates for Risk (ITER); Integrated Risk Information System (IRIS)

8 Toxicity database sources
Carcinogenicity: CCRIS, IRIS (EPA), ITER (TERA), NTP, PAN Pesticide, IARC, ATSDR, …
Genetic toxicity: CCRIS, GeneTox (EPA), NTP, The Mutants-Japan, Tokyo Eiken, …
Target organs: NTP, ATSDR, IRIS (EPA), …
Reproductive/developmental: DART, NTP, PAN Pesticide
Immunology: NTP
Skin sensitization: no public database
Environmental endpoints: ECOTOX (EPA)

9 Issues with toxicity databases: narrow use scenarios (mainly designed for information look-up, with limited usefulness for data mining and SAR analysis). Reasons: fragmented, inconsistent, and conflicting information; disparate formats; non-standard or questionable data quality; lack of accessibility; …

10 In silico toxicology workflow: standardize, assess, search, visualize, data mine, predict, publish. Sources: reports, databases, spreadsheets, microfiche, … Stages: integration; management and storage; data mining

11 Trends in toxicity databases: relational databases; standardization; data mining; links to chemical structures; links to genomics, proteomics, and metabonomics (metabolomics) data

12 Relational database - Benefits: search across fields and domains (precise searching, asking complex questions, hypothesis-driven queries); a basis for data integration; read-across

13 Relational database - Requirements: a data model (standardized fields; relationships between the fields); a database platform (Oracle, MySQL, BerkeleyDB, PointBase, …)
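A data model with standardized fields and explicit relationships can be sketched in a few tables. The Python below uses the standard-library sqlite3 module purely for illustration; the table names, columns, and allowed outcome values are hypothetical assumptions, not the schema of any database named in this presentation, and the compounds are placeholders rather than real study data.

```python
import sqlite3

# Hypothetical minimal data model: table names, columns and allowed outcome
# values are illustrative assumptions, not a published toxicity schema.
SCHEMA = """
CREATE TABLE compound (
    compound_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE endpoint (
    endpoint_id INTEGER PRIMARY KEY,
    term        TEXT NOT NULL UNIQUE  -- controlled-vocabulary term
);
CREATE TABLE result (
    compound_id INTEGER NOT NULL REFERENCES compound(compound_id),
    endpoint_id INTEGER NOT NULL REFERENCES endpoint(endpoint_id),
    species     TEXT,
    outcome     TEXT NOT NULL
                CHECK (outcome IN ('positive', 'negative', 'equivocal'))
);
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database using the schema above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_result(conn, compound, endpoint, species, outcome):
    """Insert one result, creating compound and endpoint rows if missing."""
    conn.execute("INSERT OR IGNORE INTO compound(name) VALUES (?)", (compound,))
    conn.execute("INSERT OR IGNORE INTO endpoint(term) VALUES (?)", (endpoint,))
    conn.execute(
        """INSERT INTO result(compound_id, endpoint_id, species, outcome)
           SELECT c.compound_id, e.endpoint_id, ?, ?
           FROM compound c, endpoint e
           WHERE c.name = ? AND e.term = ?""",
        (species, outcome, compound, endpoint),
    )


# Usage with placeholder compounds (not real study data).
conn = build_db()
add_result(conn, "compound-1", "bacterial mutagenicity", "Salmonella", "positive")
add_result(conn, "compound-1", "acute toxicity", "rat", "negative")
add_result(conn, "compound-2", "bacterial mutagenicity", "Salmonella", "negative")

# A precise search across tables: which compounds are positive for which endpoint?
positives = conn.execute(
    """SELECT c.name, e.term
       FROM result r
       JOIN compound c ON c.compound_id = r.compound_id
       JOIN endpoint e ON e.endpoint_id = r.endpoint_id
       WHERE r.outcome = 'positive'"""
).fetchall()
print(positives)  # [('compound-1', 'bacterial mutagenicity')]
```

Keeping endpoint terms in their own table is one way to hold fields to a controlled vocabulary, and the CHECK constraint rejects outcome values outside the agreed list at insert time.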

14 Structure-integrated databases - PubChem

15 Structure-integrated databases - DSSTox Link-out to PubChem CID

16 ToxML methodology: an open-standard XML format for representing toxicity data (standardized fields, controlled vocabulary); extensible through a schema; independent of any particular database schema and of any application
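To illustrate what standardized fields plus a controlled vocabulary mean in practice, the sketch below serializes one study record to XML with Python's xml.etree.ElementTree and refuses values outside a fixed term list. The element names and vocabularies are hypothetical and are not the actual ToxML schema, which defines its own structure and terms.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabularies and element names; they only illustrate
# the idea and are NOT the actual ToxML schema or vocabulary.
SPECIES_TERMS = {"rat", "mouse", "rabbit", "dog"}
OUTCOME_TERMS = {"positive", "negative", "equivocal"}


def study_to_xml(compound: str, study_type: str, species: str, outcome: str) -> str:
    """Serialize one study record, rejecting values outside the vocabularies."""
    if species not in SPECIES_TERMS:
        raise ValueError(f"species {species!r} is not a controlled term")
    if outcome not in OUTCOME_TERMS:
        raise ValueError(f"outcome {outcome!r} is not a controlled term")
    root = ET.Element("Compound", name=compound)
    study = ET.SubElement(root, "Study", type=study_type)
    ET.SubElement(study, "Species").text = species
    ET.SubElement(study, "Outcome").text = outcome
    return ET.tostring(root, encoding="unicode")


xml_text = study_to_xml("compound-1", "subacute toxicity", "rat", "negative")
print(xml_text)
# <Compound name="compound-1"><Study type="subacute toxicity"><Species>rat</Species><Outcome>negative</Outcome></Study></Compound>
```

Validating against the vocabulary before writing keeps free-text variants out of the exchange file, which is what makes such files mergeable across sources.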

17 Flexible data model for different uses. Levels: compound, study, test, treatment, individual. Uses: QSAR, signal detection. Yang, CODDD, 9(1), 124, 2006.
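The levels on this slide form a nested hierarchy. A minimal sketch with Python dataclasses follows, assuming illustrative field names; the roll-up rule in compound_call (positive if any test is positive) is an assumption chosen for simplicity, not a rule taken from the cited paper. It shows how one model can feed both a compound-level summary, such as a QSAR label, and individual-level rows, such as input to signal detection.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical classes for the compound > study > test > treatment > individual
# hierarchy; the field names and the roll-up rule are assumptions.


@dataclass
class Individual:
    animal_id: str
    finding: str


@dataclass
class Treatment:
    dose_mg_per_kg: float
    individuals: List[Individual] = field(default_factory=list)


@dataclass
class ToxTest:
    name: str
    outcome: str  # "positive" or "negative"
    treatments: List[Treatment] = field(default_factory=list)


@dataclass
class Study:
    study_type: str
    species: str
    tests: List[ToxTest] = field(default_factory=list)


@dataclass
class Compound:
    name: str
    studies: List[Study] = field(default_factory=list)

    def compound_call(self, study_type: str) -> str:
        """Compound-level summary, e.g. a QSAR training label.

        Assumed rule: positive if any test in a study of that type is positive.
        """
        outcomes = [t.outcome for s in self.studies if s.study_type == study_type
                    for t in s.tests]
        if not outcomes:
            return "no data"
        return "positive" if "positive" in outcomes else "negative"

    def individual_findings(self) -> List[Tuple[str, float, str]]:
        """Flatten to (species, dose, finding) rows, e.g. for signal detection."""
        return [(s.species, tr.dose_mg_per_kg, ind.finding)
                for s in self.studies for t in s.tests
                for tr in t.treatments for ind in tr.individuals]


# Placeholder compound with toy data.
cpd = Compound("compound-1", [
    Study("bacterial mutagenicity", "Salmonella",
          [ToxTest("strain 1", "negative"), ToxTest("strain 2", "positive")]),
    Study("subacute toxicity", "rat",
          [ToxTest("histopathology", "positive",
                   [Treatment(100.0, [Individual("r1", "no lesion"),
                                      Individual("r2", "hepatocellular hypertrophy")])])]),
])
print(cpd.compound_call("bacterial mutagenicity"))  # positive
print(cpd.compound_call("acute toxicity"))          # no data
```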

18 Example: look-up of lansoprazole (Prevacid), searching by ID or name

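19 Example: look-up of lansoprazole (Prevacid): subacute toxicity; bacterial mutagenicity; mammalian mutagenicity; chromosome aberration (in vivo); repeated dose (chronic, subacute); acute toxicity; reproductive toxicity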

20 Example: find analogs of lansoprazole by chemical structure search (substructure or similarity), setting the degree of similarity for the search

21 Example: analogs of lansoprazole, listed in order of similarity
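Ranking analogs in order of similarity usually means scoring structural fingerprints against the query. The sketch assumes fingerprints were already computed by some cheminformatics toolkit and stored as sets of "on" bit positions; it scores them with the Tanimoto coefficient, a common choice, although the slides do not say which measure or threshold the tool uses. The 0.5 threshold and the toy fingerprints are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) coefficient of two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_analogs(query: Set[int], library: Dict[str, Set[int]],
                 threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return library entries at or above the threshold, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy fingerprints (hypothetical bit positions, not real structures).
query = {1, 2, 3, 4}
library = {
    "analog-A": {1, 2, 3, 4, 5},  # 4 shared / 5 in union = 0.80
    "analog-B": {1, 2, 9},        # 2 / 5 = 0.40, below threshold
    "analog-C": {1, 2, 3},        # 3 / 4 = 0.75
}
print(rank_analogs(query, library))  # [('analog-A', 0.8), ('analog-C', 0.75)]
```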

22 Toxicity profile of lansoprazole analogs: pathology results; clinical results; genetic toxicity results; toxicity study type; route of administration

23 Traditional paradigm of chemical analog searching: structural descriptions; chemical stressor; analogs; profile; biological/environmental fate. Current Computer-Aided Drug Design, 2006, 2, 1-19.

24 Finding biological analogs: structural descriptions; analogs; profile; biological/environmental fate
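Biological analogs can be found by comparing toxicity profiles instead of structures. The sketch below is one simple way to do that, assuming each profile maps endpoint terms to "positive" or "negative" and that untested endpoints are simply missing; the agreement score and the min_shared cutoff are hypothetical choices, not the method behind this slide.

```python
from typing import Dict, List, Optional, Tuple

# Endpoint term -> "positive" / "negative"; untested endpoints are simply absent.
Profile = Dict[str, str]


def profile_agreement(a: Profile, b: Profile, min_shared: int = 2) -> Optional[float]:
    """Fraction of shared tested endpoints with the same call, or None if too few are shared."""
    shared = a.keys() & b.keys()
    if len(shared) < min_shared:
        return None
    return sum(a[e] == b[e] for e in shared) / len(shared)


def biological_analogs(query: Profile, library: Dict[str, Profile],
                       min_shared: int = 2) -> List[Tuple[str, float]]:
    """Rank compounds by agreement of their toxicity profiles with the query."""
    scored = []
    for name, profile in library.items():
        score = profile_agreement(query, profile, min_shared)
        if score is not None:
            scored.append((name, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy profiles for placeholder compounds.
query = {"bacterial mutagenicity": "positive",
         "chromosome aberration": "positive",
         "reproductive toxicity": "negative"}
library = {
    "cpd-X": {"bacterial mutagenicity": "positive", "chromosome aberration": "positive",
              "reproductive toxicity": "positive"},          # agrees on 2 of 3
    "cpd-Y": {"bacterial mutagenicity": "positive",
              "chromosome aberration": "positive"},          # agrees on 2 of 2
    "cpd-Z": {"acute toxicity": "positive",
              "bacterial mutagenicity": "negative"},         # only 1 shared: skipped
}
print(biological_analogs(query, library))
```

Requiring a minimum number of shared endpoints keeps sparsely tested compounds from scoring a perfect match on a single endpoint.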

25 Example: hypothesis-driven query. Hypothesis: compounds that are genotoxic may produce reproductive-developmental effects. Query: positives in clastogenicity and positives in reproductive-developmental effects. For example, in the US FDA PAFA database, most of the direct food additives that are clastogens produced some level of reproductive-developmental effects.

26 Setting up the query: clastogenicity (chromosome aberration) and reproductive-developmental toxicity

27 Example of query results: reproductive-developmental effects and clastogenicity
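The hypothesis on slides 25 to 27 reduces to an intersection: compounds positive for clastogenicity that are also positive for reproductive-developmental effects. The sketch expresses that in plain SQL with sqlite3; the flat result table and its rows are hypothetical placeholders, and conditional_rate is an illustrative helper for asking how often the second effect accompanies the first, not a statistical test.

```python
import sqlite3

# Flat toy table of placeholder results (not real data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result (compound TEXT, endpoint TEXT, outcome TEXT)")
conn.executemany("INSERT INTO result VALUES (?, ?, ?)", [
    ("cpd-1", "chromosome aberration", "positive"),
    ("cpd-1", "reproductive-developmental", "positive"),
    ("cpd-2", "chromosome aberration", "positive"),
    ("cpd-2", "reproductive-developmental", "negative"),
    ("cpd-3", "reproductive-developmental", "positive"),
])


def positive_in_both(conn, endpoint_a, endpoint_b):
    """Compounds with a positive result for both endpoints."""
    rows = conn.execute(
        """SELECT compound FROM result WHERE endpoint = ? AND outcome = 'positive'
           INTERSECT
           SELECT compound FROM result WHERE endpoint = ? AND outcome = 'positive'
           ORDER BY compound""",
        (endpoint_a, endpoint_b),
    )
    return [row[0] for row in rows]


def conditional_rate(conn, endpoint_a, endpoint_b):
    """Among compounds positive for A and tested for B, the fraction positive for B."""
    tested = conn.execute(
        """SELECT COUNT(DISTINCT a.compound)
           FROM result a JOIN result b ON a.compound = b.compound
           WHERE a.endpoint = ? AND a.outcome = 'positive' AND b.endpoint = ?""",
        (endpoint_a, endpoint_b),
    ).fetchone()[0]
    if tested == 0:
        return None
    return len(positive_in_both(conn, endpoint_a, endpoint_b)) / tested


print(positive_in_both(conn, "chromosome aberration", "reproductive-developmental"))  # ['cpd-1']
print(conditional_rate(conn, "chromosome aberration", "reproductive-developmental"))  # 0.5
```

Restricting the denominator to compounds actually tested for the second endpoint avoids counting untested compounds as negatives.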

28 Components of data mining: relational database; searching; visualization; data analysis; categorization/ranking; SAR and QSAR; chemical structures; toxicity studies; look-up; asking questions; read-across. Data, structures, relationships.

29 Example of data mining the database: search and results (hypothesis-driven queries); analysis (toxicity database analysis linking profiles of different endpoints); visualization (profiles and correlations)

30 Example: Are there correlations between pathological lesions? Columns: target site; species; chemicals that induce tumors at each site. Data: CPDB (Carcinogenic Potency Database), using NTP pathology terms. Current Computer-Aided Drug Design, 2006, 2, 1-19.

31 Transformed information – need for a database model
Name | Species | Adrenal | Bone | Liver | Lung | Thyroid
1,4-Dioxane | rat, mouse | absent, present, absent
1,5-Naphthalenediamine | rat, mouse | absent, present
Estradiol mustard | rat | absent, present, absent
Mirex | rat, mouse | absent, present, absent
C.I. Direct Blue 15 | rat, mouse | absent, present, absent
11-Aminoundecanoic acid | rat | absent, present, absent
Malonaldehyde, sodium salt | rat | absent, present
Trimethylthiourea | rat | absent, present
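Turning tumor-site records into a compound-by-site layout like the one above is a pivot. The sketch assumes the input arrives as (chemical, site) pairs, one per reported tumor site; the chemical names are placeholders and the site list simply follows the table's column headers.

```python
from typing import Dict, Iterable, List, Tuple

# Column order follows the table's target sites; the input records are toy data.
SITES = ["adrenal", "bone", "liver", "lung", "thyroid"]


def to_site_matrix(records: Iterable[Tuple[str, str]],
                   sites: List[str] = SITES) -> Dict[str, List[int]]:
    """Pivot (chemical, tumor site) records into chemical -> 0/1 vector (1 = present)."""
    matrix: Dict[str, List[int]] = {}
    for chemical, site in records:
        if site not in sites:
            raise ValueError(f"unknown target site {site!r}")
        row = matrix.setdefault(chemical, [0] * len(sites))
        row[sites.index(site)] = 1
    return matrix


records = [("chem-A", "liver"), ("chem-A", "thyroid"), ("chem-B", "lung")]
print(to_site_matrix(records))
# {'chem-A': [0, 0, 1, 0, 1], 'chem-B': [0, 0, 0, 1, 0]}
```

In this sketch a 0 only means "not reported"; a fuller data model would store tested-negative sites explicitly so that "absent" and "not tested" stay distinguishable.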

32 biological profile (lesions in target organs) compound classes

33 Correlations of lesions between target organ sites, qualifying read-across. Pearson correlation coefficients: liver–thyroid 80%; bone–thyroid 82%; lung–thyroid 40%
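The coefficients above are Pearson correlations between target-site columns across chemicals. For 0/1 (absent/present) columns, Pearson's r equals the phi coefficient. The sketch computes it directly from the definition; the toy columns are hypothetical and do not reproduce the slide's values.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; for 0/1 columns this equals the phi coefficient."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)


# Toy 0/1 columns over five hypothetical chemicals (1 = tumor present at the site).
liver = [1, 1, 0, 1, 0]
thyroid = [1, 1, 0, 0, 0]
print(round(pearson(liver, thyroid), 3))  # 0.667
```

Multiplying r by 100 gives the percentage form used on the slide.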

34 Role of toxicity databases: chemical categorization; environmental and human health risk assessment, e.g. PBT (persistence, bioaccumulation, toxicity) assessment; a data resource for the QSAR (quantitative structure-activity relationship) analysis required by international regulatory initiatives

35 International regulatory initiatives: Canadian DSL (Domestic Substances List), implemented; EU REACH (Registration, Evaluation, Authorization and Restriction of Chemicals), originally targeted for June 1, 2007; EU 7th Amendment to the Cosmetics Directive, limiting animal testing for cosmetics

36 Required components of a toxicity database for data mining: a well-modeled relational database (searching and retrieval; search forms allowing hypothesis-driven queries; database entry/build tools); a knowledge base (chemistry; biology, i.e. toxicity study results); a foundation for analysis

37 Strategies - Technology: relational data; a database "data model"; public standards (standardized field structures, harmonized controlled vocabulary); web-based, public, open source; links with existing public technology
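Harmonizing a controlled vocabulary largely means mapping each source's wording onto one preferred term. A minimal sketch, assuming a hand-curated synonym table; the entries below are hypothetical examples, not a published vocabulary:

```python
from typing import Dict

# Hypothetical synonym table mapping source-specific wording to preferred terms;
# it is illustrative only and not taken from any published vocabulary.
SYNONYMS: Dict[str, str] = {
    "ames test": "bacterial mutagenicity",
    "salmonella mutagenicity": "bacterial mutagenicity",
    "clastogenicity": "chromosome aberration",
    "chromosomal aberrations": "chromosome aberration",
    "chromosome aberration test": "chromosome aberration",
}
PREFERRED = set(SYNONYMS.values())


def harmonize(term: str) -> str:
    """Map a raw endpoint label to its preferred term, or raise for curator review."""
    key = " ".join(term.lower().split())  # normalize case and whitespace
    if key in PREFERRED:
        return key
    if key in SYNONYMS:
        return SYNONYMS[key]
    raise KeyError(f"unmapped term {term!r}; needs curation")


print(harmonize("  Ames   Test "))  # bacterial mutagenicity
print(harmonize("Clastogenicity"))  # chromosome aberration
```

Raising an error for unmapped terms, rather than passing them through, routes new wording to a curator instead of silently creating a near-duplicate term.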

38 Strategies - Content: provide or link to all available toxicity endpoints; link structures with toxicity data; link with international database efforts: PubChem; OECD toolbox; EU ECB JRC QSAR; Ambit; US EPA ToxCast; US EPA DSSTox; US FDA; Leadscope, ToxML

39 From data to database: vitamin A reproductive-developmental effects; data entry form; database tree view; ToxML Editor – free download at

40 If you build it, they will come… W.P. Kinsella, Shoeless Joe
