QSAR Application Toolbox: Step 12: Building a QSAR model


1 QSAR Application Toolbox: Step 12: Building a QSAR model

2 Objectives This presentation demonstrates how to build a QSAR model for predicting the acute toxicity of aldehydes to Tetrahymena pyriformis. Specifically, it addresses: predicting the acute toxicity of a target chemical; building a QSAR model based on that prediction; applying the model to other aldehydes; exporting the predictions to a file.

3 The Exercise This exercise includes the following steps:
- select a target chemical: Furfural, CAS 98011;
- extract the available experimental results;
- search for analogues;
- estimate the 48h IGC50 for Tetrahymena pyriformis by trend analysis;
- improve the data set either by subcategorising by “Protein binding” mechanisms, or by assessing the difference between outliers and the target chemical;
- evaluate and save the model;
- use the model to display its training set, visualize its applicability domain and perform predictions.

4 Chemical Input After launching the Toolbox, select the “Flexible Track”. This takes you to the first module, “Chemical input”. Enter the target chemical by its CAS number.

5 Select target chemical – Furfural, CAS 98011

6 Substance Information

7 Profiling the Target Chemical
Select the “Profiling methods” you wish to use by ticking the box in front of each profiler's name. For this example, check all mechanistic methods. Click “Apply”.

8 Profiling

9 Target interaction with proteins
Double-clicking shows the profiling scheme. The chemical could interact with proteins by Schiff-base formation.

10 Target interaction with proteins

11 Endpoints “Endpoints” refers to the electronic process of retrieving the environmental fate, ecotoxicity and toxicity data that are stored in the Toolbox database. Data gathering can be executed globally (i.e., collecting all data for all endpoints) or on a more narrowly defined basis (e.g., collecting data for a single endpoint or a limited number of endpoints).

12 Extracting endpoint values

13 Redundancy table: reports of the same endpoint values across databases

14 Reproducing endpoint value
In this exercise we will build a QSAR model to estimate the following endpoint: Ecotoxicological Information > Aquatic Toxicity > Protozoa > Tetrahymena pyriformis > IGC50 > 48h

15 Defining a Category The initial search for analogues is based on structural similarity; in this example: - US EPA categorization

16 Category Definition

17 Set Category Name

18 Analogues The data are collated automatically.
Based on the defined category (Aldehydes, US EPA categorisation), 274 analogues have been identified. These 274 compounds, together with the target chemical, form a category (Aldehydes) that can be used for data gap filling (see next slide).

19 Analogues

20 Extracting experimental results for analogues
Highlight the category [274] Aldehydes (US EPA categorisation). A window entitled “Read Data?” appears (see next slide). Click OK.

21 Extracting experimental results for analogues

22 Extracting experimental results for analogues

23 Applying Trend-analysis
Move to the module “Filling data gap”. Open the data tree to: Ecotoxicological information > Protozoa > Tetrahymena pyriformis > IGC50 > 48 h. Highlight the data endpoint box under the target chemical. It already contains an experimental result, which we are going to reproduce by trend analysis. Next, with the “trend analysis” box highlighted, click “Apply” (see next slide).

24 Apply Trend-analysis

25 Results of Trend-analysis

26 Interpreting the Trend-analysis
The resulting plot shows the available experimental results for all analogues (Y axis) against the default descriptor, Log Kow (X axis). The RED dot represents the target chemical. The BLUE dots represent the experimental results available for the analogues. The GREEN dots represent analogues belonging to a different subcategory (see following slides).
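Conceptually, trend analysis regresses the analogues' measured values (BLUE dots) on a descriptor and reads the estimate off at the target's descriptor value (RED dot). The sketch below assumes a plain straight-line least-squares fit against Log Kow; the Toolbox's own fitting details may differ, and the function names and numbers are illustrative only, not Toolbox data.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired (x, y) points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all descriptor values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def trend_predict(analogues: List[Tuple[float, float]], target_log_kow: float) -> float:
    """Fit endpoint vs. Log Kow over the analogues (BLUE dots) and read off
    the value at the target's Log Kow (RED dot)."""
    xs = [log_kow for log_kow, _ in analogues]
    ys = [value for _, value in analogues]
    intercept, slope = fit_line(xs, ys)
    return intercept + slope * target_log_kow


# Illustrative (Log Kow, endpoint) pairs only; not Toolbox data.
example = [(1.0, 1.1), (2.0, 1.7), (3.0, 2.3)]
print(round(trend_predict(example, 1.5), 3))  # 1.4
```

A straight line is used here because it is the simplest trend that can be read off at a single descriptor value; the steps that follow in the exercise change which analogues enter the fit, not the fitting itself.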

27 An Accurate Trend Analysis of the Data set (1)
In this example, the mechanistic properties of the analogues are not consistent, so the category can be subcategorized by protein binding mechanism. This is the second stage of the analogue search: analogues are required to share the target's interaction mechanism. Acute effects are associated with the interaction of chemicals with the lipid cell membrane, i.e. with protein binding. Chemicals whose protein binding mechanism differs from that of the target chemical will be removed.

28 Subcategorization To improve the data by subcategorizing, follow these steps: - Click on Subcategor. - Select Protein binding from the Grouping methods list. All chemicals with a potential protein binding mechanism different from that of the target chemical are highlighted (GREEN dots). - Click on Remove.
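The subcategorization step can be thought of as a filter on profiler output. The sketch below assumes each chemical's protein binding profile is a set of mechanism labels and that an analogue is kept only if its set equals the target's; that equality rule is an assumption for illustration, as the Toolbox's exact comparison is not described in this presentation. The chemical names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Profile = FrozenSet[str]


def subcategorize(profiles: Dict[str, Profile], target: str) -> Tuple[List[str], List[str]]:
    """Split analogues into those whose protein binding profile equals the
    target's (kept) and those that differ (the GREEN dots, removed)."""
    target_profile = profiles[target]
    kept: List[str] = []
    removed: List[str] = []
    for name, profile in profiles.items():
        if name == target:
            continue  # the target itself is never removed
        if profile == target_profile:
            kept.append(name)
        else:
            removed.append(name)
    return kept, removed


profiles = {
    "target": frozenset({"Schiff-base formation"}),
    "analogue A": frozenset({"Schiff-base formation"}),
    "analogue B": frozenset({"Schiff-base formation", "Michael type nucleophilic addition"}),
    "analogue C": frozenset({"No binding"}),
}
kept, removed = subcategorize(profiles, "target")
print(kept)     # ['analogue A']
print(removed)  # ['analogue B', 'analogue C']
```

Under this rule an analogue that carries the target's mechanism plus an extra one (analogue B) is still removed, which is consistent with the listing of Michael-type nucleophilic addition among the differing mechanisms.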

29 Subcategorization

30 Result after Subcategorization

31 An Accurate Trend Analysis of the Data set (2)
The chemicals which differ from the target have the following mechanisms: - Michael-type nucleophilic addition (23) - No binding (48) - Nucleophilic addition to azomethynes (1) - Nucleophilic substitution of haloaromatics (1). Another way to refine the data set is to ask what makes the obvious outliers different from the target.

32 Subcategorization - Right-click on any of the outlying analogue results (BLUE dots) - Select Differences to target from the menu - Select Protein binding from the Grouping methods list - Click on Remove (see next slide)
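The “Differences to target” route starts from one outlier instead of from the target. A minimal sketch, assuming the difference is one-directional (categories the outlier has and the target lacks) and that Remove then drops every chemical carrying any of those categories; both assumptions and all names here are hypothetical, not the Toolbox's documented behaviour.

```python
from typing import Dict, FrozenSet

Profile = FrozenSet[str]


def differences_to_target(profiles: Dict[str, Profile], target: str, outlier: str) -> Profile:
    """Profiler categories the outlier carries that the target does not."""
    return profiles[outlier] - profiles[target]


def remove_sharing(profiles: Dict[str, Profile], categories: Profile) -> Dict[str, Profile]:
    """Drop every chemical that carries at least one of the given categories.
    The target always survives, because the categories come from a set
    difference against its own profile."""
    return {name: p for name, p in profiles.items() if not (p & categories)}


profiles = {
    "target": frozenset({"Schiff-base formation"}),
    "outlier X": frozenset({"Schiff-base formation", "Michael type nucleophilic addition"}),
    "analogue Y": frozenset({"Schiff-base formation"}),
    "analogue Z": frozenset({"Michael type nucleophilic addition"}),
}
diff = differences_to_target(profiles, "target", "outlier X")
print(sorted(diff))                                  # ['Michael type nucleophilic addition']
print(sorted(remove_sharing(profiles, diff)))        # ['analogue Y', 'target']
```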

33 Subcategorization

34 Result after Subcategorization

35 QSAR Model evaluation To assess the model's accuracy, use: - Adequacy (predictions after leave-one-out) - Statistics - Cumulative frequency
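Leave-one-out adequacy means refitting the model once per chemical with that chemical held out, then predicting it. The sketch below pairs that with a few common regression statistics (n, R², residual standard error s, and the leave-one-out Q²); which statistics the Toolbox actually reports is not stated here, so this selection is an assumption, and the straight-line model mirrors the trend-analysis sketch.

```python
import math
from typing import Dict, List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all descriptor values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def leave_one_out(xs: List[float], ys: List[float]) -> List[float]:
    """Predict each chemical from a line fitted to all the others."""
    preds = []
    for i in range(len(xs)):
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(intercept + slope * xs[i])
    return preds


def model_statistics(xs: List[float], ys: List[float]) -> Dict[str, float]:
    """n, R^2, residual standard error s, and leave-one-out Q^2."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three chemicals")
    intercept, slope = fit_line(xs, ys)
    mean_y = sum(ys) / n
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("all endpoint values are identical")
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    press = sum((y - p) ** 2 for y, p in zip(ys, leave_one_out(xs, ys)))
    return {
        "n": n,
        "r2": 1 - ss_res / ss_tot,
        "s": math.sqrt(ss_res / (n - 2)),
        "q2": 1 - press / ss_tot,
    }
```

Q² is typically lower than R², because each held-out chemical cannot pull the line towards itself; a large gap between the two is one sign that a few analogues dominate the fit.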

36 QSAR Model evaluation

37 QSAR Model evaluation

38 QSAR Model evaluation The absolute residuals, abs(observed - predicted), for 95% of the analogues are comparable with the variation of the experimental data.
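A cumulative frequency curve of absolute residuals makes the “95% of analogues” statement concrete: sort abs(observed - predicted) and, for each value, record the fraction of chemicals at or below it. The sketch below is a generic implementation under that definition; the function names are hypothetical and the Toolbox's own plotting may bin or interpolate differently.

```python
from typing import List, Sequence, Tuple


def cumulative_frequency(observed: Sequence[float],
                         predicted: Sequence[float]) -> List[Tuple[float, float]]:
    """Sorted abs(observed - predicted), each paired with the fraction of
    chemicals whose absolute residual is at or below it."""
    abs_res = sorted(abs(o - p) for o, p in zip(observed, predicted))
    n = len(abs_res)
    return [(r, (i + 1) / n) for i, r in enumerate(abs_res)]


def residual_at_fraction(curve: List[Tuple[float, float]], fraction: float = 0.95) -> float:
    """Smallest absolute residual that covers at least `fraction` of the chemicals."""
    for r, f in curve:
        if f >= fraction:
            return r
    raise ValueError("empty curve")


curve = cumulative_frequency([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.0, 4.25])
print(curve)                        # [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (1.0, 1.0)]
print(residual_at_fraction(curve))  # 1.0
```

The value returned for 0.95 is what would then be compared with the spread of repeated experimental measurements.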

39 Saving the Derived QSAR Model
To save the new regression model, follow these steps: - Click the Save model button - Enter the model name “Acute tox” - Click OK - Accept the value
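The Toolbox stores saved models in its own format, which this presentation does not describe. Purely as a hypothetical illustration of what a saved straight-line model has to carry so that it can be reused later (name, endpoint, descriptor, coefficients, training set), here is a JSON round-trip; the `SavedModel` record and its fields are assumptions, not the Toolbox's schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SavedModel:
    """Hypothetical record of what a saved trend-analysis model needs to carry."""
    name: str
    endpoint: str
    descriptor: str
    intercept: float
    slope: float
    training_set: List[str] = field(default_factory=list)


def to_json(model: SavedModel) -> str:
    """Serialize the model record to a JSON string."""
    return json.dumps(asdict(model), indent=2)


def from_json(text: str) -> SavedModel:
    """Rebuild the model record from its JSON string."""
    return SavedModel(**json.loads(text))


def save_model(model: SavedModel, path: str) -> None:
    """Write the JSON form of the model to a file."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(to_json(model))
```

Keeping the training set alongside the coefficients is what later allows the training set to be listed and the applicability domain to be derived from it.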

40 QSAR Model evaluation

41 Apply QSAR model The derived model can be used to:
- List the training set chemicals: right-click on the QSAR model Acute tox and select Training set from the context menu.
- Visualize whether a chemical is in the applicability domain of the model: in the data matrix, highlight the empty cell of one of the analogues (e.g. chemical no 2 in the matrix) for the endpoint 48h IGC50 Tetrahymena pyriformis and select Display domain.
- Perform predictions for the chemicals in the matrix: select Predict endpoint and All Chemicals in domain.
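The three uses above can be mirrored by a small model object: it keeps its training set, answers a domain question, and predicts only for chemicals that pass it. This sketch deliberately simplifies the domain to the Log Kow range spanned by the training set; the Toolbox domain also involves structural and mechanistic criteria, and the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LinearQSAR:
    """Hypothetical straight-line QSAR model with a descriptor-range domain."""
    name: str
    intercept: float
    slope: float
    training_log_kow: Dict[str, float]  # training chemical -> Log Kow

    def training_set(self) -> List[str]:
        """List the training set chemicals."""
        return sorted(self.training_log_kow)

    def in_domain(self, log_kow: float) -> bool:
        """True if Log Kow lies within the training set's Log Kow range."""
        if not self.training_log_kow:
            raise ValueError("model has no training set")
        values = self.training_log_kow.values()
        return min(values) <= log_kow <= max(values)

    def predict_all_in_domain(self, chemicals: Dict[str, float]) -> Dict[str, Optional[float]]:
        """Predict for in-domain chemicals; out-of-domain chemicals map to None."""
        return {
            name: self.intercept + self.slope * x if self.in_domain(x) else None
            for name, x in chemicals.items()
        }


model = LinearQSAR("Acute tox", 0.5, 0.6, {"analogue A": 1.0, "analogue B": 3.0})
print(model.training_set())  # ['analogue A', 'analogue B']
```

Returning `None` rather than a number for out-of-domain chemicals keeps the distinction between a prediction and a refusal to predict visible downstream.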

42 Apply QSAR model Training set

43 Apply QSAR model Visualize whether a chemical is in the applicability domain of the model
The chemical is an aldehyde, as required by the model. It can react with proteins by Schiff-base formation and does not react with proteins by any of the eliminated mechanisms: - Michael-type nucleophilic addition - No binding - Nucleophilic addition to azomethynes - Nucleophilic substitution of haloaromatics. Another requirement is Log Kow to be >= and <= . The last requirement is slightly violated (Log Kow = 4.87), and therefore the chemical is outside the applicability domain of the model.
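The domain check described above combines three kinds of requirement: chemical class, protein binding mechanism (required and excluded), and a Log Kow range. A minimal sketch, assuming each requirement can be checked independently and reported by name; the rule and candidate structures are hypothetical, and the Log Kow limits are left as parameters because the actual bounds are not given here.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class DomainRules:
    """Hypothetical applicability-domain rules for the saved model."""
    required_class: str
    required_mechanisms: FrozenSet[str]
    excluded_mechanisms: FrozenSet[str]
    log_kow_min: float
    log_kow_max: float


@dataclass(frozen=True)
class Candidate:
    """Profiling results and descriptor value for the chemical being checked."""
    chemical_classes: FrozenSet[str]
    mechanisms: FrozenSet[str]
    log_kow: float


def domain_violations(c: Candidate, rules: DomainRules) -> List[str]:
    """List every requirement the candidate fails; empty means in domain."""
    problems = []
    if rules.required_class not in c.chemical_classes:
        problems.append("chemical class")
    if not rules.required_mechanisms <= c.mechanisms:
        problems.append("missing required mechanism")
    if c.mechanisms & rules.excluded_mechanisms:
        problems.append("excluded mechanism present")
    if not rules.log_kow_min <= c.log_kow <= rules.log_kow_max:
        problems.append("Log Kow out of range")
    return problems


def in_domain(c: Candidate, rules: DomainRules) -> bool:
    """True only if no requirement is violated."""
    return not domain_violations(c, rules)
```

Reporting every violated requirement, rather than a bare yes/no, matches how the slide explains the result: the chemical passes the class and mechanism checks and fails only on Log Kow.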

44 Apply QSAR model Visualize whether a chemical is in the applicability domain of the model

45 Apply QSAR model Perform predictions

46 Apply QSAR model Perform predictions

47 Export QSAR results The predictions for the chemicals in the matrix can be exported to a text file. In the data tree, right-click on 48 h (under the endpoint IGC50 for Tetrahymena pyriformis) and select Export endpoint data from the menu.

48 Export QSAR results: right-click

49 Export QSAR results

50 Export QSAR results

51 Export QSAR results The resulting text file can be loaded into a spreadsheet and analysed further.
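Besides a spreadsheet, the exported file can be read with a few lines of code. The sketch below assumes a tab-delimited file with a header row; the delimiter and the column names (`Name`, `Predicted`) are hypothetical, so check them against an actual export before relying on this.

```python
import csv
import io
from typing import Dict, List


def read_export(text: str, delimiter: str = "\t") -> List[Dict[str, str]]:
    """Parse a delimited export with a header row into one dict per chemical."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def numeric_column(rows: List[Dict[str, str]], key: str, column: str) -> Dict[str, float]:
    """Map each row's `key` value to its `column` value as a float,
    skipping blank or non-numeric cells (e.g. chemicals without a prediction)."""
    values: Dict[str, float] = {}
    for row in rows:
        cell = (row.get(column) or "").strip()
        try:
            values[row[key]] = float(cell)
        except ValueError:
            continue
    return values


sample = "Name\tPredicted\nanalogue A\t1.5\nanalogue B\t\n"
rows = read_export(sample)
print(numeric_column(rows, "Name", "Predicted"))  # {'analogue A': 1.5}
```

For a file on disk, pass the file's contents (read with `encoding="utf-8"` or whatever encoding the export uses) to `read_export`.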

