Outline: Introduction; Support Vector Regression; QSAR Problems and Data; SVMs for QSAR; Linear Program Feature Selection; Model Selection and Bagging; Computational Results; Discussion
Support Vector Regression: ε-insensitive loss function
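The formula on this slide did not survive extraction. In standard textbook notation (an assumption, not recovered from the slide), with f(x) the model prediction and ε ≥ 0 the half-width of the tube inside which errors are ignored, the loss is usually written as:

```latex
\[
L_\varepsilon\bigl(y, f(x)\bigr) \;=\; \max\bigl(0,\ |y - f(x)| - \varepsilon\bigr)
\]
```

Residuals smaller than ε cost nothing; larger residuals are penalized linearly.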
Quadratic SVMs with L2-norm
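The optimization problem on this slide is not recoverable. A common textbook form of the L2-norm regression SVM, given here as a sketch under that assumption, uses weights w, bias b, slack variables ξ_i and ξ_i^*, a trade-off constant C, and tube half-width ε:

```latex
\[
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*}\quad & \tfrac{1}{2}\,\|w\|_2^2 + C \sum_{i=1}^{n} \bigl(\xi_i + \xi_i^*\bigr) \\
\text{s.t.}\quad & y_i - (w^\top x_i + b) \le \varepsilon + \xi_i, \\
& (w^\top x_i + b) - y_i \le \varepsilon + \xi_i^*, \\
& \xi_i \ge 0,\ \ \xi_i^* \ge 0, \qquad i = 1, \dots, n.
\end{aligned}
\]
```

The squared L2-norm in the objective makes this a quadratic program.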
Linear SVMs with L1-norm (-SVR)
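The slide's formulation is missing. One standard way to obtain a linear program, shown here as an assumption, replaces the quadratic regularizer with the L1-norm of the weights and splits w into nonnegative parts u and v. The tube half-width ε is treated as fixed in this sketch; the variant named in the slide title may instead treat it as a variable.

```latex
\[
\begin{aligned}
\min_{u,\,v,\,b,\,\xi,\,\xi^*}\quad & \sum_{j=1}^{m} (u_j + v_j) + C \sum_{i=1}^{n} \bigl(\xi_i + \xi_i^*\bigr) \\
\text{s.t.}\quad & y_i - \bigl((u - v)^\top x_i + b\bigr) \le \varepsilon + \xi_i, \\
& \bigl((u - v)^\top x_i + b\bigr) - y_i \le \varepsilon + \xi_i^*, \\
& u \ge 0,\ \ v \ge 0,\ \ \xi_i \ge 0,\ \ \xi_i^* \ge 0, \qquad i = 1, \dots, n.
\end{aligned}
\]
```

With w = u - v, the objective and all constraints are linear, and the L1 penalty tends to drive many weights to exactly zero.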
QSAR Problems and Data
  Workflow:
  1. Preparation of input data (bioactivity values, structures)
  2. 3D geometry optimization
  3. Calculation of descriptors
  4. Statistical analysis and QSAR model building (SVMs for QSAR)
Data Sets
  HIV dataset: five classes of anti-HIV molecules; 64 molecules, 620 descriptors
  Lombardo benchmark dataset: blood-brain barrier partitioning; 62 molecules, 649 descriptors

Data Matrix
              descriptor 1   descriptor 2   descriptor 3   ...   descriptor m   Activity
  Molecule 1  x11            x12            x13            ...   x1m            ln BB
  Molecule 2  x21            x22            x23            ...   x2m            ln BB
  ...
  Molecule n  xn1            xn2            xn3            ...   xnm            ln BB
SVMs for QSAR
  Workflow diagram steps: Construct Datasets; Feature Selection; Model Selection (C and other parameters); Bagging Models; Optimize Model; Final Model
Linear Program Feature Selection
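The body of this slide is not available, so the following is only a minimal sketch of one plausible reading: fit a linear L1-norm regression SVM as a linear program and keep the descriptors whose weights are nonzero. The function names `l1_svr_lp` and `select_features`, the default values of `C` and `eps`, and the threshold `tol` are all hypothetical, and the fixed-ε formulation is an assumption rather than the authors' exact method.

```python
import numpy as np
from scipy.optimize import linprog


def l1_svr_lp(X, y, C=10.0, eps=0.1):
    """Solve a linear L1-norm SVR as an LP (assumed formulation).

    Variables are stacked as [u, v, b, xi, xi_star] with w = u - v.
    """
    n, m = X.shape
    ones = np.ones((n, 1))
    I, Z = np.eye(n), np.zeros((n, n))
    # Objective: sum(u + v) + C * sum(xi + xi_star); b has zero cost.
    c = np.concatenate([np.ones(2 * m), [0.0], C * np.ones(2 * n)])
    # y - Xw - b <= eps + xi   and   Xw + b - y <= eps + xi_star
    A_ub = np.block([[-X, X, -ones, -I, Z],
                     [X, -X, ones, Z, -I]])
    b_ub = np.concatenate([eps - y, eps + y])
    # u, v, xi, xi_star are nonnegative; the bias b is free.
    bounds = [(0, None)] * (2 * m) + [(None, None)] + [(0, None)] * (2 * n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    w = res.x[:m] - res.x[m:2 * m]
    b = res.x[2 * m]
    return w, b


def select_features(w, tol=1e-6):
    """Return indices of descriptors whose weight magnitude exceeds tol."""
    return np.flatnonzero(np.abs(w) > tol)
```

Calling `select_features(l1_svr_lp(X, y)[0])` on a data matrix `X` (molecules by descriptors) and an activity vector `y` returns the retained column indices. The L1 penalty is what makes the weight vector sparse enough for this to act as a feature filter.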
Bagging
  Different validation sets give different models
  Many local minima in the SVM parameter search
  Average the models
Model Selection
  Choose the SVM model parameters, such as C
  Select the evaluation function (Q2)
  Evaluate on testing data
  Adjust using cross-validation
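The two ideas above can be combined in a short sketch: each bag draws a different train/validation split, picks the parameter that does best on its validation set, and the resulting models are averaged at prediction time. To stay self-contained, this example swaps in a linear ridge regressor for the SVM and uses validation mean squared error rather than Q2 as the selection criterion; both substitutions, and all names (`fit_ridge`, `bagged_models`, `bagged_predict`, `lams`), are assumptions made for illustration.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Ridge fit with a bias column; a simple stand-in for the SVR models."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    # For brevity the bias term is regularized along with the weights.
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)


def predict(coef, X):
    """Apply a fitted coefficient vector (weights followed by bias)."""
    return X @ coef[:-1] + coef[-1]


def bagged_models(X, y, lams, n_bags=10, val_frac=0.25, seed=0):
    """Fit one model per random train/validation split.

    Within each split, the regularization value with the lowest
    validation mean squared error is kept.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_val = max(1, int(val_frac * n))
    models = []
    for _ in range(n_bags):
        perm = rng.permutation(n)
        val, train = perm[:n_val], perm[n_val:]
        best = min(
            (fit_ridge(X[train], y[train], lam) for lam in lams),
            key=lambda c: np.mean((predict(c, X[val]) - y[val]) ** 2),
        )
        models.append(best)
    return models


def bagged_predict(models, X):
    """Average the predictions of all bagged models."""
    return np.mean([predict(c, X) for c in models], axis=0)
```

Averaging over splits reduces the dependence of the final model on any single validation set, which is the motivation stated on the slide.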
Computational Results

  Methods        Full Data (649)     LP FS (21)       NN SA (9)
  (10-fold CV)   Q2       q2         Q2      q2       Q2      q2
  L1-SVM
  L2-SVM
  NN
Discussion
  Robust optimization methods
  LPFS outperforms NNSA
  L1-SVM can run faster than L2-SVM
  May improve LPFS method
  May improve performance of L1-SVM

This work is supported by NSF (IIS and )