Workflow. The software “CORALSEA” is a tool for building quantitative structure–property/activity relationships (QSPRs/QSARs).


1 Workflow

2 The software “CORALSEA” is a tool for building quantitative structure–property/activity relationships (QSPRs/QSARs). CORALSEA represents molecular structure with SMILES (simplified molecular input line entry system). For details, please see http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html

3 For this demo of CORALSEA we use our model from the article “THE DEFINITION OF THE MOLECULAR STRUCTURE FOR POTENTIAL ANTI-MALARIA AGENTS BY THE MONTE CARLO METHOD”, Struct. Chem. 2013; 24:1369–1381. You can develop a better model, but for now please follow our suggestions.

4 The first step is to prepare the SMILES file, which is the input for CORALSEA (MyFile.txt):

+1 COc1ccc2c(c1)NC(C)=C(CCCCCCC)C2=O 7.332
+2 COc1ccc2c(c1)NC(C)=CC2=O 4.903
+3 O=C1c2ccccc2NC(C)=C1CCCCCCC 6.979
+4 O=C1c2ccccc2NC(C)=C1CCCCCCCCC 7.400
#5 O=C1c3ccccc3NC(C)=C1C2CCCCC2 5.652
-6 O=C1c3ccccc3NC(C)=C1c2ccccc2 6.270
+7 O=C2c3ccccc3NC(C)=C2Cc1ccccc1 5.207
+8 O=C1c2ccccc2NC(C)=C1Br 7.110
-9 O=C1c2ccccc2NC(C)=C1\C=C\CCCCCCC 7.824
+10 C=C(CCCCCCC)C=1C(=O)c2ccccc2NC=1C 7.472
+12 O=C2c3ccccc3NC(C)=C2/C=C/c1ccccc1 5.827
+13 COc1ccc2NC(C)=C(Br)C(=O)c2c1 5.934
-14 Cc1ccc2NC(C)=C(Br)C(=O)c2c1 6.583
#15 Brc1ccc2NC(C)=C(Br)C(=O)c2c1 6.470
+17 Fc1ccc2NC(C)=C(Br)C(=O)c2c1 6.903
+18 Clc1ccc2NC(C)=C(C#CCCCC)C(=O)c2c1 4.336
#19 COc2cccc3NC(C)=C(Cc1ccccc1)C(=O)c23 5.675
-21 COc1ccc3c(c1)NC(C)=C(Cc2ccccc2)C3=O 5.859
-22 COc1cccc2NC(C)=C(C(=O)c12)c3ccccc3 5.295
-23 COc1ccc2c(c1)NC(C)=C(C2=O)c3ccccc3 6.570
+24 COc3cccc1c3NC(C)=C(C1=O)c2ccccc2 5.779
-25 Clc2cccc3NC(C)=C(Cc1ccccc1)C(=O)c23 5.279
#26 Clc2ccc3NC(C)=C(Cc1ccccc1)C(=O)c3c2 5.485
#28 Clc1cccc2NC(C)=C(C(=O)c12)c3ccccc3 5.324
-29 Clc1ccc2NC(C)=C(C(=O)c2c1)c3ccccc3 6.110
-30 Clc1ccc2c(c1)NC(C)=C(C2=O)c3ccccc3 5.731
-31 Clc1ccc2NC(C)=C(C(=O)c2c1Cl)c3ccccc3 5.493
#33 Clc1cc2NC(C)=C(C(=O)c2c(Cl)c1)c3ccccc3 5.464
#34 COc1ccc3c(c1)C(=O)C(Cc2ccccc2)=C(C)N3C 5.094
+35 COc1ccc3c(c1)N(C)C(C)=C(Cc2ccccc2)C3=O 5.106
+36 Fc1cc2c(cc1OC)NC(C)=C(C2=O)c3ccccc3 7.081
+37 Clc1cc2c(cc1OC)NC(C)=C(C2=O)c3ccccc3 7.815
+38 Brc1cc2c(cc1OC)NC(C)=C(C2=O)c3ccccc3 7.602
#39 Fc1cc2c(cc1OC)NC(C)=C(CC)C2=O 6.793
+41 Brc1cc2c(cc1OC)NC(C)=C(CC)C2=O 7.440
-44 Clc1cc2c(cc1OC)NC(C)=C(C2=O)C3CCCCC3 6.401
+45 Clc1cc3c(cc1OC)NC(C)=C(Cc2ccccc2)C3=O 7.164
-46 Clc1cc2c(cc1OC)NC(C)=C(C)C2=O 7.564
#47 CC(C)C=1C(=O)c2cc(Cl)c(cc2NC=1C)OC 6.712
+48 CC(CC)C=1C(=O)c2cc(Cl)c(cc2NC=1C)OC 7.199
+49 Clc1cc2c(cc1OC)NC(C)=CC2=O 5.731
-50 Clc1cc2c(cc1OC)NC(C)=C(C#CCCCC)C2=O 5.376
#53 CC(C)(C)OC(=O)/C=C/C=1C(=O)c2cc(Cl)c(cc2NC=1C)OC 7.271

Each compound is represented by (1) the type marker [+, -, #]; (2) the ID, which can be a CAS (Chemical Abstracts Service) number or any other number; (3) the SMILES; and (4) the endpoint value. “+” marks the sub-training set, “-” marks the calibration set, and “#” marks the test set. The sub-training set acts as the developer of the model, the calibration set as its critic, and the test set as its estimator.
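Before running CORALSEA it can help to check that every record in the SMILES file is well formed. The following is a minimal Python sketch, not part of CORALSEA; it assumes the fields are separated by whitespace and that the set marker is attached directly to the ID, as in the listing above. The names `Record`, `parse_records` and `count_by_set` are illustrative.

```python
from collections import Counter
from typing import NamedTuple

# Set markers as described on the slide; the dictionary itself is illustrative.
SET_NAMES = {"+": "sub-training", "-": "calibration", "#": "test"}


class Record(NamedTuple):
    set_marker: str   # "+", "-", or "#"
    compound_id: str  # CAS number or any other identifier
    smiles: str
    endpoint: float


def parse_record(line: str) -> Record:
    """Parse one 'markerID SMILES endpoint' line (assumed whitespace-separated)."""
    head, smiles, value = line.split()
    marker, compound_id = head[0], head[1:]
    if marker not in SET_NAMES:
        raise ValueError(f"unknown set marker {marker!r} in line: {line!r}")
    if not compound_id:
        raise ValueError(f"missing compound ID in line: {line!r}")
    return Record(marker, compound_id, smiles, float(value))


def parse_records(text: str) -> list[Record]:
    """Parse every non-empty line of a SMILES input file."""
    return [parse_record(ln) for ln in text.splitlines() if ln.strip()]


def count_by_set(records: list[Record]) -> dict[str, int]:
    """Count how many compounds fall into each set."""
    return dict(Counter(SET_NAMES[r.set_marker] for r in records))


# Three records copied from the listing above.
sample = """\
+1 COc1ccc2c(c1)NC(C)=C(CCCCCCC)C2=O 7.332
#5 O=C1c3ccccc3NC(C)=C1C2CCCCC2 5.652
-6 O=C1c3ccccc3NC(C)=C1c2ccccc2 6.270
"""
records = parse_records(sample)
print(count_by_set(records))
```

Running the same check on the full file shows at a glance how the compounds are distributed over the sub-training, calibration and test sets.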

5 It is a good idea to reserve some substances as an "invisible" validation set for the final estimation of the model (MyInput.txt):

10
*11 O=C1c2ccccc2NC(C)=C1C\C=C\CCCCCC 6.728
*16 Clc1ccc2NC(C)=C(Br)C(=O)c2c1 6.900
*20 COc2ccc3NC(C)=C(Cc1ccccc1)C(=O)c3c2 4.624
*27 Clc1ccc3c(c1)NC(C)=C(Cc2ccccc2)C3=O 4.805
*32 Clc1cc2c(cc1Cl)NC(C)=C(C2=O)c3ccccc3 6.456
*40 Clc1cc2c(cc1OC)NC(C)=C(CC)C2=O 7.559
*42 Clc1cc2c(cc1OC)NC(C)=C(CCCCCCC)C2=O 8.530
*43 Clc1cc2c(cc1OC)NC(C)=C(CCCCCCCCC)C2=O 8.779
*51 C=C(CCCCC)C=1C(=O)c2cc(Cl)c(cc2NC=1C)OC 7.830
*52 Clc1cc2c(cc1OC)NC(C)=C(\C=C\CCCCC)C2=O 7.975

The format of the validation file is as follows: (1) the number of compounds; (2) the list of compounds in the above-mentioned format: type, ID, SMILES, endpoint value.
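Because the validation file starts with the number of compounds, the count and the list can drift apart when the file is edited by hand. The sketch below, in Python and independent of CORALSEA, builds the file text from record lines and checks the count when reading it back. It assumes validation records carry the “*” marker shown in the listing; the function names are illustrative.

```python
def build_validation_text(lines: list[str]) -> str:
    """Prefix the record lines with their count, as the file format requires."""
    cleaned = [ln.strip() for ln in lines if ln.strip()]
    return "\n".join([str(len(cleaned)), *cleaned]) + "\n"


def read_validation_text(text: str) -> list[tuple[str, str, float]]:
    """Return (ID, SMILES, endpoint) tuples and verify the declared count."""
    rows = [ln for ln in text.splitlines() if ln.strip()]
    declared = int(rows[0])
    parsed = []
    for row in rows[1:]:
        head, smiles, value = row.split()
        if not head.startswith("*"):
            # Assumption: validation records use the "*" marker.
            raise ValueError(f"expected '*' marker in line: {row!r}")
        parsed.append((head[1:], smiles, float(value)))
    if len(parsed) != declared:
        raise ValueError(f"file declares {declared} compounds, found {len(parsed)}")
    return parsed


# Two records copied from the validation listing above.
validation_text = build_validation_text([
    "*16 Clc1ccc2NC(C)=C(Br)C(=O)c2c1 6.900",
    "*40 Clc1cc2c(cc1OC)NC(C)=C(CC)C2=O 7.559",
])
print(validation_text)
print(read_validation_text(validation_text))
```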

6 To start your work, download CORALSEA.zip from www.insilico.eu/coral. When the download is complete, place the folder "CORALSEA" on your computer:

7 …and place your data (i.e. “MyTRNCLBTST.txt”) in the folder “MyCORALSEA”:

8 The contents of MyCORALSEA are as follows:

9 To carry out a QSPR/QSAR analysis of data for a CLASSIFICATION MODEL, do the following: (i) insert “#TRNCLBTST-1.txt” in the folder; (ii) insert “#Input-1.txt” in the folder; (iii) click CORALSEA.exe. “#TRNCLBTST.txt” is the file that contains the training (TRN), calibration (CLB), and test (TST) sets. “#Input.txt” contains the data that are not visible while the model is being built.

10 The following appears on your screen: click the “Load method” button…

11 The following appears on your screen: insert the name “#TRNCLBTST-1.txt” in the text box.

12 The following appears on your screen: click “SAVE SYSTEM”.

13 The following appears on your screen: restart the program and click “Load system”.

14 The following appears on your screen: click “OK”.

15 The following appears on your screen: this plot relates to the external “invisible” validation set.

16 The following appears on your screen: the file “#Output-1.txt” contains statistical characteristics for the validation set (“#Output-1.txt” is placed in the folder “Model”).

17 To carry out a QSPR/QSAR analysis of data for a REGRESSION MODEL, do the following: (i) insert “#TRNCLBTST.txt” in the folder; (ii) insert “#Input-1.txt” in the folder; (iii) click CORALSEA.exe. “#TRNCLBTST.txt” is the file that contains the training (TRN), calibration (CLB), and test (TST) sets. “#Input.txt” contains the data that are not visible while the model is being built.

18 The following appears on your screen: insert the name “#TRNCLBTST-1.txt” in the text box. After this, please select “Classic Scheme” or “Balance of Correlation” for your QSPR/QSAR investigation.

19 The following appears on your screen. Two actions are needed: (1) define the method and (2) save the method.

20 The following appears on your screen: you can involve graph invariants in addition to SMILES attributes.

21 The following appears on your screen: you can use the “classic scheme”, the balance of correlations, and ideal slopes C1, C1’.

22 The following appears on your screen: you can choose your mode, e.g. (1) define Dstart=0.25; (2) set Nepoch=20; after this you must (3) click “Save method”, otherwise the method remains the same.

23 The following appears on your screen: click “Search for preferable model (T*,N*)”.

24 The following appears on your screen: the program will carry out the Monte Carlo optimization with various thresholds and numbers of epochs. When the calculation is completed, the preferable values of the threshold and the number of epochs can be found in the file “Search/BestMDL.txt”.

25 The contents of the file “search/BestMDL.txt” will be approximately as follows. One can see that the preferable threshold (T*) is 2 and the preferable number of epochs (N*) is 15. One can use this information to build up a robust model.
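The idea of choosing a preferable pair (T*, N*) from a set of trial runs can be sketched in Python as follows. This is an illustration only: the slides do not state the criterion CORALSEA uses or the layout of “BestMDL.txt”, so the sketch assumes one quality score per (threshold, epochs) pair where higher is better, and the scores shown are hypothetical placeholders, not results from the program.

```python
def pick_preferable(scores: dict[tuple[int, int], float]) -> tuple[int, int]:
    """Return the (threshold, epochs) pair with the highest score.

    Ties are resolved toward the smaller threshold, then fewer epochs
    (an arbitrary choice made for this sketch).
    """
    if not scores:
        raise ValueError("no (threshold, epochs) pairs were evaluated")
    return min(scores, key=lambda pair: (-scores[pair], pair))


# Hypothetical scores, for illustration only (not CORALSEA output).
trial_scores = {
    (1, 10): 0.71,
    (2, 15): 0.83,
    (3, 20): 0.78,
}
t_star, n_star = pick_preferable(trial_scores)
print(f"T*={t_star}, N*={n_star}")
```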

26 An attempt to build up a robust model… Create the folder “MyCORALSEA-T2-N15” (a copy of “MyCORALSEA”). Run CORALSEA.exe in this folder “MyCORALSEA-T2-N15”. Click “Load method”.
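Copying the working folder can also be scripted. The following Python sketch assumes only the naming pattern shown on this slide (“MyCORALSEA-T2-N15”); the function name is illustrative, and the demonstration runs in a temporary directory with a stand-in data file rather than on a real CORALSEA installation.

```python
import shutil
import tempfile
from pathlib import Path


def clone_working_folder(source: Path, threshold: int, n_epochs: int) -> Path:
    """Copy `source` next to itself as '<name>-T<threshold>-N<n_epochs>'."""
    target = source.with_name(f"{source.name}-T{threshold}-N{n_epochs}")
    # copytree refuses to overwrite an existing folder, which protects
    # the results of an earlier run.
    shutil.copytree(source, target)
    return target


# Demonstration in a throwaway directory; the file content is a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp) / "MyCORALSEA"
    work.mkdir()
    (work / "#TRNCLBTST.txt").write_text("+1 CCO 1.0\n")
    clone = clone_working_folder(work, threshold=2, n_epochs=15)
    clone_name = clone.name
    cloned_files = sorted(p.name for p in clone.iterdir())
    print(clone_name, cloned_files)
```

Keeping one folder per (T*, N*) pair leaves the original “MyCORALSEA” folder untouched, so the search results remain available for comparison.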

27 The following appears on your screen: (1) insert Nepoch=15; (2) click “Building up preferable model (T*,N*)”; (3) insert Threshold=2; and (4) click “Continue”. Here T*=2 and N*=15.

28 The following appears on your screen: click “Yes”.

29 The program will gradually calculate the model:

30 When the model is ready, the screen will look as follows. Click “Save system”.

31 The folder “Model” contains the parameters of the QSPR/QSAR model. The file “#Output-1.txt” contains statistics for the invisible validation set.
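The slides do not say which statistics “#Output-1.txt” reports. As an illustration of what such validation statistics typically look like for a regression model, the Python sketch below computes the number of compounds, the squared correlation coefficient and the root-mean-square error from observed and predicted endpoint values. The observed values are taken from the validation listing on slide 5; the predicted values are hypothetical placeholders, not output of CORALSEA.

```python
import math


def regression_stats(observed: list[float], predicted: list[float]) -> dict[str, float]:
    """Return n, squared Pearson correlation (r2) and RMSE for paired values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long lists with at least two values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant values")
    r2 = cov * cov / (var_o * var_p)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return {"n": n, "r2": r2, "rmse": rmse}


observed = [6.728, 6.900, 4.624, 4.805]   # compounds *11, *16, *20, *27
predicted = [6.5, 7.1, 5.0, 4.6]          # hypothetical, for illustration only
print(regression_stats(observed, predicted))
```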

32 When the model is ready, the screen will look as follows. Click “Load system”.

33 The following will appear on the screen: (1) insert the name “MyInput.txt” instead of “#Input-1.txt”; (2) click “Start of DCW and Endpoint calculation for SMILES input file”.

34 The following will appear on the screen. After these actions, the file “model/Output.txt” will contain the results of the calculation for the compounds from “MyInput.txt”. Click “OK”.

35 The following will appear on the screen: you will see a graphical representation of the sub-training, calibration, test, and validation sets.

36 The contents of “model/Output.txt” will be as follows. Last, but not least…

37 One can calculate the model for an individual SMILES: (1) insert the SMILES in the indicated box; (2) click “Start of DCW and Endpoint Calculation for Inserted SMILES”.

38 The following appears on your screen: see the file “Model/DemoDesc.txt”.

39 The contents of “Model/DemoDesc.txt” are as follows: DCW is DCW(2,15) for NC(CCCNC(N)=N)C(O)=O; Endpoint=2.9412. This example is only a demo; NC(CCCNC(N)=N)C(O)=O is apparently outside the domain of applicability.

40 These slides have shown the "technology"; to understand the "philosophy", please read the file "ReadMe.pdf".

41 Some definitions

