Detailed q2/Q2 results for 100 bootstraps for final runs with (38 + dummy features)


1 Detailed q2/Q2 results for 100 bootstraps for final runs with (38 + dummy features)

2

3

4

5 Ordered by correlation coefficient

6

7 Example: last-pass feature selection from 80 sensitivity bags
Figure annotations:
  Random dummy variable: bagged relative sensitivity from 80 bootstraps for the random dummy variable. Descriptors with lower sensitivities will be eliminated in the next iteration.
  Descriptors that will be eliminated in the next iteration.

8 STRIPMINER OPERATION MODE
Mode #6: feature selection with sensitivity analysis (~1000 neural nets) (Q2 = 0.46, all molecules)
  Bootstraps with sensitivity analysis with a dummy variable for descriptor selection (480 → 39 descriptors)
Mode #0: train neural nets, 300 bootstrap ANNs (300 neural nets trained)
  Ensemble bagging for selected descriptors
  Note: all ANNs are 39x13x11x1, trained to an error of 0.12, with 11 patterns in the validation set
Mode #4: predict for test set using bagging weights (100x30/300 bags) (3000 ANNs in user mode)
  Bag prediction on test set
  Note: ensemble results are weighted by the Q2 calculated in Mode #0
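The Mode #4 note says the ensemble results are weighted by Q2, but the slide does not state how a Q2 value is turned into a weight. The sketch below therefore takes the weights as given and only illustrates the weighted bagging step. The function name `bag_predictions` and its arguments are hypothetical and not part of Stripminer.

```python
import numpy as np

def bag_predictions(member_preds, weights):
    """Weighted average of ensemble member predictions.

    member_preds: array of shape (n_members, n_samples), one row per trained net.
    weights: one non-negative weight per member. How these are derived from Q2
             is NOT specified in the slides, so they are passed in as-is.
    """
    preds = np.asarray(member_preds, dtype=float)
    w = np.asarray(weights, dtype=float)
    if w.shape != (preds.shape[0],):
        raise ValueError("need exactly one weight per ensemble member")
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    # Normalize so the weights sum to 1, then average across members.
    return (w / w.sum()) @ preds
```

With equal weights this reduces to plain bagging (a simple mean over the members).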

9 Stripminer Neural Network Sensitivity Analysis With Dummy Feature
Bagging and feature selection (flowchart):
REPEAT
  REPEAT 100x
    Do neural network bootstrap and calculate Q2 for validation set
      (there is one random dummy feature; there is a validation set for bagging)
    Prepare file for sensitivity analysis (can be up to 30 MB)
    Run neural net in user mode for sensitivity analysis (MetaNeural)
    Calculate sensitivity results for 13 levels and tally results in sen#.txt (SENSIT)
  CONTINUE
  Bag sensitivities
  Reduce features by dropping features with lower sensitivity than the dummy
TEST (repeat until the dummy variable is the least sensitive feature)
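The loop on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Stripminer implementation: the neural-network training and SENSIT steps are replaced by a stand-in (`lstsq_sensitivity`, the absolute least-squares coefficient per column), the dummy is drawn once from a uniform distribution, and all function names are hypothetical.

```python
import numpy as np

def lstsq_sensitivity(X, y):
    # Stand-in (assumption): |least-squares coefficient| per column.
    # The slides use neural-network sensitivities instead.
    A = np.column_stack([X, np.ones(len(X))])  # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.abs(coef[:-1])  # drop the intercept

def select_with_dummy(X, y, sensitivity_fn=lstsq_sensitivity,
                      n_bootstraps=100, seed=0):
    """Drop features whose bagged sensitivity is below a random dummy's."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(X)
    dummy = rng.random(n)              # one random dummy feature
    kept = list(range(X.shape[1]))     # indices of surviving features
    while kept:
        Xd = np.column_stack([X[:, kept], dummy])
        bagged = np.zeros(len(kept) + 1)
        for _ in range(n_bootstraps):
            rows = rng.integers(0, n, size=n)   # bootstrap resample
            bagged += sensitivity_fn(Xd[rows], y[rows])
        bagged /= n_bootstraps                  # bag the sensitivities
        survivors = [f for f, s in zip(kept, bagged[:-1]) if s >= bagged[-1]]
        if len(survivors) == len(kept):
            break      # dummy is now the least sensitive feature
        kept = survivors
    return kept
```

Each pass removes at least one feature or stops, so the loop terminates after at most as many passes as there are features.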

10 Neural Network Sensitivity Analysis (RENSSELAER)
Diagram labels: molecular descriptors (molecular weight, H-bonding, boiling point, hydrophobicity, electrostatic interactions); neural network with weights (w11, w23, w34) and hidden nodes (h); observable (biological response); projection.
Keep all inputs frozen at median values
Turn one input at a time from 0 to 1
Monitor variation in outputs
Inputs that produce the largest output variation are the most sensitive and therefore more important
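The procedure on this slide can be sketched as follows. The sketch assumes the inputs are scaled to [0, 1] and that `predict` is any callable mapping an array of input rows to outputs (standing in for the trained network). The default of 13 levels follows the "13 levels" mentioned on the previous slide, and using the output range (max minus min) as the sensitivity measure is an assumption. The function name `input_sensitivities` is hypothetical.

```python
import numpy as np

def input_sensitivities(predict, X, levels=13):
    """Sweep one input at a time from 0 to 1 with the others held at their medians."""
    X = np.asarray(X, dtype=float)
    baseline = np.median(X, axis=0)          # all inputs frozen at median values
    grid = np.linspace(0.0, 1.0, levels)     # sweep values for the probed input
    sens = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        probes = np.tile(baseline, (levels, 1))
        probes[:, j] = grid                  # vary only input j
        out = np.asarray(predict(probes), dtype=float)
        sens[j] = out.max() - out.min()      # output variation caused by input j
    return sens
```

For a linear model `predict = lambda A: A @ w`, this returns `abs(w)`, which is a convenient sanity check before applying it to a neural network.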

11

12

13

14

15

16 Correlation biased
Sum of 30 best correlated variables seems to have spurious correlation

17

18

19

20

