Published by Annabella Price. Modified over 6 years ago.
1
QSAR Application Toolbox: Step 12: Building a QSAR model
2
Objectives
This presentation demonstrates building a QSAR model for predicting the acute toxicity of aldehydes to Tetrahymena pyriformis. It specifically addresses:
- predicting acute toxicity for a target chemical;
- building a QSAR model based on the prediction;
- applying the model to other aldehydes;
- exporting the predictions to a file.
3
The Exercise This exercise includes the following steps:
- select a target chemical – Furfural, CAS 98011;
- extract available experimental results;
- search for analogues;
- estimate the 48h-IGC50 for Tetrahymena pyriformis by using trend analysis;
- improve the data set by either subcategorising by “Protein binding” mechanisms or assessing the difference between outliers and the target chemical;
- evaluate and save the model;
- use the model to display its training set, visualize its applicability domain and perform predictions.
4
Chemical Input After launching the Toolbox, select the “Flexible Track”. This takes you to the first module, which is “Chemical input”. Enter the target chemical by its CAS number ( )
5
Select target chemical – Furfural, CAS 98011
6
Substance Information
7
Profiling the Target Chemical
Select the “Profiling methods” you wish to use by clicking on the box before the name of the profiler. For this example check all mechanistic methods. Click on “Apply”.
8
Profiling
9
Target interaction with proteins
Double-clicking shows the profiling scheme. The chemical could interact with proteins by Schiff-base formation.
10
Target interaction with proteins
11
Endpoints “Endpoints” refers to the electronic process of retrieving the environmental fate, ecotoxicity and toxicity data that are stored in the Toolbox database. Data gathering can be executed globally (i.e., collecting all data for all endpoints) or on a more narrowly defined basis (e.g., collecting data for a single endpoint or a limited number of endpoints).
12
Extracting endpoint values
13
Redundancy table Reports for same endpoint values across databases
14
Reproducing endpoint value
In this exercise we will build a QSAR model to estimate the following endpoint: Ecotoxicological Information > Aquatic Toxicity > Protozoa > Tetrahymena pyriformis > IGC50 > 48 h
15
Defining a Category The initial search for analogues is based on structural similarity; in this example:
- US EPA categorization
16
Category Definition
17
Set Category Name
18
Analogues The data is automatically collated.
Based on the defined category (Aldehydes US EPA categorisation) 274 analogues have been identified. These 274 compounds along with the target chemical form a category (Aldehydes), which can be used for data gap filling (see next slide).
19
Analogues
20
Extracting experimental results for analogues
Highlight the [274] Aldehydes (US EPA categorisation). A window entitled “Read Data?” appears (see next slide). Click OK.
21
Extracting experimental results for analogues
22
Extracting experimental results for analogues
23
Applying Trend-analysis
Move to the module “Filling data gap”. Open the data tree to: Ecotoxicological information > Protozoa > Tetrahymena pyriformis > IGC50 > 48 h. Highlight the data endpoint box under the target chemical. It already contains an experimental result, which we are going to reproduce by trend analysis. Next, with the “trend analysis” box highlighted, click “Apply” (see next slide).
24
Apply Trend-analysis
25
Results of Trend-analysis
26
Interpreting the Trend-analysis
The resulting plot outlines the available experimental results of all analogues (Y axis) according to a default descriptor Log Kow (X axis). The RED dot represents the target chemical. The BLUE dots represent the experimental results available for the analogues. The GREEN dots represent the analogues belonging to a different subcategory (see following slides).
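The idea behind the plot can be sketched in a few lines of Python. This is only an illustration, assuming the trend is a simple least-squares line of the toxicity value against Log Kow; the function name `fit_trend` and all numbers are hypothetical and are not taken from the Toolbox or its database.

```python
from statistics import mean


def fit_trend(x, y):
    """Ordinary least-squares line y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


# Made-up illustrative values, NOT real Toolbox data.
analogue_log_kow = [0.5, 1.0, 1.5, 2.0, 2.5]      # X axis (descriptor)
analogue_toxicity = [-0.6, -0.3, 0.1, 0.4, 0.8]   # Y axis (experimental results)

intercept, slope = fit_trend(analogue_log_kow, analogue_toxicity)

# Hypothetical Log Kow for a target chemical; the estimate is read off the line.
target_log_kow = 0.8
target_estimate = intercept + slope * target_log_kow
print(f"slope={slope:.3f} intercept={intercept:.3f} estimate={target_estimate:.3f}")
```

The analogues play the role of the BLUE dots and the estimate for the target corresponds to the RED dot placed on the fitted line.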
27
An Accurate Trend Analysis of the Data set (1)
In this example, the mechanistic properties of the analogues are not consistent. Subcategorization can be performed based on protein binding mechanisms. This is the second stage of the analogue search, which requires the same interaction mechanism. Acute effects are indeed associated with the interaction of chemicals with the lipid cell membrane, i.e. with protein binding. Chemicals with a protein binding mechanism different from that of the target chemical will be removed.
28
Subcategorization To improve the data by subcategorizing, follow these steps:
- Click on Subcategor.
- Select Protein binding from the Grouping methods list. All chemicals which have a potential protein binding mechanism different from the target chemical are highlighted (GREEN dots).
- Click on Remove.
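The filtering logic of this step can be sketched as follows. This is a simplified illustration, assuming each chemical carries a single protein binding label; the function `subcategorize`, the record layout and the chemical names are hypothetical, and only the mechanism names come from the slides.

```python
def subcategorize(analogues, target_mechanism):
    """Split analogues into those sharing the target's mechanism and the rest."""
    kept, removed = [], []
    for chem in analogues:
        if chem["protein_binding"] == target_mechanism:
            kept.append(chem)
        else:
            removed.append(chem)  # these would be the GREEN dots
    return kept, removed


# Hypothetical records for illustration only.
analogues = [
    {"name": "analogue A", "protein_binding": "Schiff-base formation"},
    {"name": "analogue B", "protein_binding": "Michael type nucleophilic addition"},
    {"name": "analogue C", "protein_binding": "No binding"},
    {"name": "analogue D", "protein_binding": "Schiff-base formation"},
]

kept, removed = subcategorize(analogues, "Schiff-base formation")
print([c["name"] for c in kept])
print([c["name"] for c in removed])
```

Only the kept analogues would then be used to refit the trend.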
29
Subcategorization
30
Result after Subcategorization
31
An Accurate Trend Analysis of the Data set (2)
The chemicals which differ from the target are:
- Michael type nucleophilic addition (23);
- No binding (48);
- Nucleophilic addition to azomethynes (1);
- Nucleophilic substitution of haloaromatics (1).
Another way of refining the data set is to ask what makes the obvious outliers different from the target.
32
Subcategorization
- Right-click on any of the outlying results from the analogues (BLUE dots).
- Select Differences to target from the menu.
- Select Protein binding from the Grouping methods list.
- Click on Remove (see next slide).
33
Subcategorization
34
Result after Subcategorization
35
QSAR Model evaluation To assess the model accuracy use: - Adequacy (predictions after leave-one-out) - Statistics - Cumulative frequency
36
QSAR Model evaluation
37
QSAR Model evaluation
38
QSAR Model evaluation The residuals, abs(observed - predicted), for 95% of the analogues are comparable with the variation of the experimental data.
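A rough sketch of what such a check involves is given below. It assumes a one-descriptor linear model, a leave-one-out refit for every chemical, and a nearest-rank 95th percentile of the absolute residuals; the function names and the data are hypothetical and do not reproduce the Toolbox's own statistics.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares line y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def leave_one_out_residuals(x, y):
    """abs(observed - predicted), each point predicted by a model fitted without it."""
    residuals = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        intercept, slope = fit_line(x_train, y_train)
        residuals.append(abs(y[i] - (intercept + slope * x[i])))
    return residuals


def percentile_95(values):
    """Nearest-rank 95th percentile, using integer arithmetic for the rank."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


# Made-up illustrative values, NOT real Toolbox data.
log_kow = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
toxicity = [-0.55, -0.30, 0.05, 0.45, 0.70, 1.10]

residuals = leave_one_out_residuals(log_kow, toxicity)
print(f"95% of abs residuals are <= {percentile_95(residuals):.3f}")
```

The resulting figure would then be compared with the known variation of the experimental measurements.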
39
Saving the Derived QSAR Model
To save the new regression model follow these steps:
- Click on the Save model button
- Enter the model name “Acute tox”
- Click on OK
- Accept the value
40
QSAR Model evaluation
41
Apply QSAR model The derived model can be used to:
- List the training set chemicals: right-click on the QSAR model Acute tox and select Training set from the context menu.
- Visualize whether a chemical is in the applicability domain of the model: in the data matrix, highlight the empty cell of one of the analogues (e.g. chemical no. 2 in the matrix) for the endpoint 48-h IGC50 Tetrahymena pyriformis and select Display domain.
- Perform predictions for the chemicals in the matrix: select Predict endpoint and All Chemicals in domain.
42
Apply QSAR model Training set
43
Apply QSAR model Visualize whether a chemical is in the applicability domain of the model
The chemical is an aldehyde, as required by the model. It can react with protein by Schiff-base formation and does not react with protein by any of the eliminated mechanisms:
- Michael-type nucleophilic addition
- No binding
- Nucleophilic addition to azomethynes
- Nucleophilic substitution of haloaromatics
Another requirement is that Log Kow lie within the range covered by the model. This last requirement is slightly violated (Log Kow = 4.87), and therefore the chemical is outside the applicability domain of the model.
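The domain check described on this slide amounts to a set of conditions that must all hold. The sketch below is illustrative only: the function `check_domain`, the record layout and, in particular, the Log Kow bounds are hypothetical placeholders, since the model's actual limits are not given here. Only the Log Kow value of 4.87 and the mechanism names come from the slide.

```python
def check_domain(chem, log_kow_min, log_kow_max,
                 required_category="Aldehydes",
                 required_mechanism="Schiff-base formation"):
    """Return (in_domain, reasons) for a single chemical record."""
    reasons = []
    if chem["category"] != required_category:
        reasons.append("not in the required category")
    if chem["protein_binding"] != required_mechanism:
        reasons.append("different protein binding mechanism")
    if not (log_kow_min <= chem["log_kow"] <= log_kow_max):
        reasons.append("Log Kow outside the model range")
    return (not reasons), reasons


# Hypothetical bounds, chosen only so that 4.87 falls slightly outside.
ASSUMED_LOG_KOW_MIN = -1.0
ASSUMED_LOG_KOW_MAX = 4.8

chemical = {
    "category": "Aldehydes",
    "protein_binding": "Schiff-base formation",
    "log_kow": 4.87,  # value quoted on the slide
}

in_domain, reasons = check_domain(chemical, ASSUMED_LOG_KOW_MIN, ASSUMED_LOG_KOW_MAX)
print(in_domain, reasons)
```

Returning the list of failed conditions, rather than a bare yes or no, mirrors how the slide explains which requirement was violated.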
44
Apply QSAR model Visualize whether a chemical is in the applicability domain of the model
45
Apply QSAR model Perform predictions
46
Apply QSAR model Perform predictions
47
Export QSAR results The predictions for the chemicals in the matrix can be exported into a text file. In the data tree right-click on 48 h (for the endpoint IGC50 for Tetrahymena pyriformis) and select Export endpoint data from the menu.
48
Export QSAR results click right button
49
Export QSAR results
50
Export QSAR results
51
Export QSAR results The resulting text file can be loaded into a spreadsheet and further analysed.
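As an alternative to a spreadsheet, the exported file could also be read with a short script. The sketch below assumes a tab-delimited layout with a header row; the actual export format is not shown in this presentation, so the column names and values are hypothetical placeholders.

```python
import csv
import io

# Placeholder content standing in for an exported file; NOT real Toolbox output.
SAMPLE_EXPORT = (
    "Chemical\tLog Kow\tPredicted value\n"
    "chemical 1\t0.50\t-0.58\n"
    "chemical 2\t1.75\t0.22\n"
)


def load_predictions(stream):
    """Read tab-delimited rows into dictionaries and convert numeric columns."""
    rows = []
    for row in csv.DictReader(stream, delimiter="\t"):
        rows.append({
            "chemical": row["Chemical"],
            "log_kow": float(row["Log Kow"]),
            "predicted": float(row["Predicted value"]),
        })
    return rows


predictions = load_predictions(io.StringIO(SAMPLE_EXPORT))
for p in predictions:
    print(p["chemical"], p["log_kow"], p["predicted"])
```

For a real file, `io.StringIO(SAMPLE_EXPORT)` would be replaced by `open(path, newline="")`, with the delimiter and column names adjusted to match the export.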