1/15 Strengthening I-ReGEC classifier
G. Attratto, D. Feminiano, and M.R. Guarracino
High Performance Computing and Networking Institute, Italian National Research Council
2/15 Supervised learning
Supervised learning refers to the capability of a system to learn from a set of input/output pairs: the training set.
3/15 Classification
Classification consists of determining a model that groups elements according to given features.
The groups are the classes.
4/15 Evaluation of classification methods
- Accuracy: the predictive ability of the model
- Speed: some methods take less time than others
- Robustness: the learned rules and the accuracy do not change considerably across different sets
- Scalability: the ability to classify very large datasets
5/15 Goals
- Make the choice of examples during training more efficient
- Delete examples that are redundant or contribute too little information
- Strengthen the training set by deleting obsolete knowledge
- Build an efficient, scalable, and generalizable model
6/15 Classification techniques
- Decision tree: based on trees
- Bayesian networks: compute posterior probabilities with Bayes' theorem
- Neural networks: simulate the behavior of biological systems
- Support Vector Machine (SVM): calculate hyperplanes
(Optimal tree) (Slow in training)
7/15 SVM: The state of the art
Find a set of examples (support vectors) that is representative of the classes.
[Figure: linear case and nonlinear case, showing the support vectors, the optimal hyperplane, and the separation margin]
8/15 ReGEC
- Two hyperplanes, each representative of one class (GEPSVM family)
- Based on generalized eigenvalues
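For readers unfamiliar with the GEPSVM family, the following is a sketch of how a "hyperplane per class" leads to a generalized eigenvalue problem. The symbols are assumptions introduced here, not taken from the slides: A and B hold the points of the two classes as rows, e is a vector of ones, and the hyperplane is x^T w - gamma = 0. The regularization term that ReGEC adds is deliberately omitted, so this shows only the unregularized idea.

```latex
\begin{align*}
\min_{(w,\gamma)\neq 0}\;
  \frac{\lVert A w - e\gamma \rVert^{2}}{\lVert B w - e\gamma \rVert^{2}}
  &= \min_{z \neq 0}\; \frac{z^{\top} G z}{z^{\top} H z},
  \qquad z = \begin{bmatrix} w \\ \gamma \end{bmatrix}, \\
G &= [\,A \;\; -e\,]^{\top} [\,A \;\; -e\,], \qquad
H = [\,B \;\; -e\,]^{\top} [\,B \;\; -e\,], \\
G z &= \lambda H z .
\end{align*}
```

Under these assumptions, the eigenvector of the smallest eigenvalue gives the hyperplane closest to class A and farthest from class B, while the eigenvector of the largest eigenvalue plays the same role for class B.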
9/15 I-ReGEC
- Select k points for each class with a clustering technique (k-means), so that |S| = 2 x k
- Classify the test set with the S points
- Add misclassified points to the S set incrementally
- Proceed until no misclassified points remain
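The incremental loop can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: a 1-nearest-neighbor rule stands in for the ReGEC classifier, the seed indices are passed in by the caller instead of being computed with k-means, and the first misclassified point found is the one added. The names `nearest_label` and `incremental_select` are hypothetical.

```python
import math


def nearest_label(point, selected):
    """Stand-in classifier (1-NN, NOT ReGEC): label of the closest selected example."""
    return min(selected, key=lambda ex: math.dist(point, ex[0]))[1]


def incremental_select(examples, seeds):
    """Grow the subset S until no remaining example is misclassified.

    examples: list of (point, label) pairs to be classified at each step
    seeds:    indices of the initial points (the slides obtain these with k-means)
    Returns the list of indices that make up S.
    """
    chosen = list(seeds)
    while True:
        selected = [examples[i] for i in chosen]
        # Classify every example outside S with the model built on S.
        wrong = [i for i, (x, y) in enumerate(examples)
                 if i not in chosen and nearest_label(x, selected) != y]
        if not wrong:
            return chosen
        # Assumption: add one misclassified point, then re-classify.
        chosen.append(wrong[0])


examples = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"),
            ((5, 6), "b"), ((2.4, 2.4), "b")]
subset = incremental_select(examples, seeds=[0, 2])
```

In this toy run, the point (2.4, 2.4) lies closer to the class "a" seed than to the class "b" seed, so it is the only point added to the two seeds.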
10/15 Strengthening
- Apply I-ReGEC to obtain the training set
- At each iteration, delete one point from the training set
- Apply I-ReGEC at each iteration with the new input set S
- Strengthen the set (save the new S) if accuracy improves
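One possible reading of the steps above is a leave-one-out pruning loop, sketched here. Everything in it is an assumption made for illustration: the function `strengthen`, the callables `select` (standing in for a run of I-ReGEC) and `accuracy` (standing in for whatever accuracy estimate is used), and the choice to make a deletion permanent only when accuracy strictly improves.

```python
def strengthen(training, select, accuracy):
    """Try deleting each training point; keep the deletion if accuracy improves.

    training:    list of examples
    select(t):   hypothetical stand-in for I-ReGEC, returns the subset S for input t
    accuracy(S): hypothetical accuracy estimate of a model built on S
    Returns (strengthened training set, its subset S, its accuracy).
    """
    best_train = list(training)
    best_subset = select(best_train)
    best_acc = accuracy(best_subset)
    i = 0
    # Never delete the last remaining point.
    while i < len(best_train) and len(best_train) > 1:
        candidate = best_train[:i] + best_train[i + 1:]
        subset = select(candidate)
        acc = accuracy(subset)
        if acc > best_acc:
            # Deletion helped: save the new sets and retry the same position.
            best_train, best_subset, best_acc = candidate, subset, acc
        else:
            # Deletion did not help: keep the point and move on.
            i += 1
    return best_train, best_subset, best_acc


# Toy usage: "accuracy" is the share of values below 10, so 100 acts as a harmful point.
def toy_accuracy(subset):
    return sum(1 for x in subset if x < 10) / len(subset)


result = strengthen([1, 2, 100, 3], select=lambda t: list(t), accuracy=toy_accuracy)
```

In the toy run only the value 100 is removed, since deleting any other point lowers or leaves unchanged the toy accuracy.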
11/15 Microarray and matrix
[Figure: gene expression microarray data as a matrix of examples by features, with class labels]
12/15 Results
Dataset           Disease         Acc. I-ReGEC   No. of points   Acc. Strengthening   No. of points
Alon (62x2000)    Colon cancer    73.00%         7.78            74.60%               7.78
Golub (72x7129)   Leukaemia       87.12%         9.44            89.88%               9.44
Nutt (50x12625)   Glioma          65.20%         7.47            65.20%               7.47
BRCA1 (22x3226)   Breast cancer   67.50%         4.24            67.50%               4.24
BRCA2 (22x3226)   Breast cancer   78.50%         5.53            79.50%               5.96
13/15 Results and Diagrams
[Figures: Golub dataset in 2D and 3D, I-ReGEC compared with Strengthening]
14/15 Conclusions
- The choice of examples became more efficient
- Redundant or obsolete examples have been deleted
- The training set is "strengthened"
15/15 Future work
To optimize execution time, the Strengthening technique should be integrated into I-ReGEC.