Mathematical Programming in Data Mining Author: O. L. Mangasarian Advisor: Dr. Hsu Graduate: Yan-Cheng Lin.

1 Mathematical Programming in Data Mining Author: O. L. Mangasarian Advisor: Dr. Hsu Graduate: Yan-Cheng Lin

2 Abstract • Describes mathematical programming approaches to feature selection, clustering, and robust representation

3 Outline • Motivation • Objective • Problems • Feature Selection • Clustering • Robust Representation • Conclusion

4 Motivation • Mathematical programming has been applied to a great variety of theoretical and applied problems • Problems can be formulated and effectively solved as mathematical programs

5 Objective • Describe three mathematical-programming-based developments relevant to data mining

6 Problems • Feature Selection • Clustering • Robust Representation

7 Problem - Feature Selection • Discriminate between two finite point sets in n-dimensional feature space while using as few of the features as possible • Formulated as a mathematical program with a parametric objective function and linear constraints

8 Problem - Clustering • Assign m points in the n-dimensional real space R^n to k clusters • Formulated as determining k centers in R^n such that the sum of the distances of each point to its nearest center is minimized

9 Problem - Robust Representation • Model a system of relations in a manner that preserves the validity of the representation when the data on which the model is based changes • A sufficiently small error τ is purposely tolerated

10 Feature Selection • Use the simplest model that describes the essence of a phenomenon • Binary classification problem: discriminate between two given point sets A and B in the n-dimensional real space R^n using as few of the n dimensions of the space as possible
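The trade-off described on this slide can be illustrated with a small sketch. It scores a candidate separating plane x·w = gamma by combining the average separation error on A and B with the number of features the plane actually uses. The function name, the weight `lam`, the unit margin, and the sample points are assumptions made for illustration; the transcript does not give the exact objective used in the presentation.

```python
def selection_objective(A, B, w, gamma, lam):
    """Score a plane x.w = gamma: separation error versus features used.

    Assumed convention (not stated in the transcript): points of A should
    satisfy x.w >= gamma + 1 and points of B should satisfy x.w <= gamma - 1.
    `lam` in [0, 1) is a hypothetical weight on the feature count.
    """
    def dot(x, v):
        return sum(xi * vi for xi, vi in zip(x, v))

    # Average amount by which each set violates its side of the plane.
    err_A = sum(max(0.0, -dot(x, w) + gamma + 1.0) for x in A) / len(A)
    err_B = sum(max(0.0, dot(x, w) - gamma + 1.0) for x in B) / len(B)

    # Number of features used = number of nonzero components of w.
    used = sum(1 for wi in w if wi != 0.0)

    return (1.0 - lam) * (err_A + err_B) + lam * used, used


# Made-up sample data: both planes separate A from B, but the first
# one needs only one of the two features.
A = [(3.0, 5.0), (4.0, 1.0)]
B = [(0.0, 2.0), (-1.0, 7.0)]
score_one, used_one = selection_objective(A, B, (1.0, 0.0), 2.0, 0.05)
score_two, used_two = selection_objective(A, B, (1.0, 0.1), 2.0, 0.05)
print(used_one, score_one)
print(used_two, score_two)
```

With equal separation error, the plane that uses fewer features gets the lower score, which is the preference the slide describes.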

11 Binary classification [Figure: binary classification diagram; labels w and P]

12 Feature Selection • The following are defined: • A • B

13 Successive Linearization Algorithm • The resulting vector w is the output

14 Experimentation • 32-feature Wisconsin Prognostic Breast Cancer (WPBC) dataset • n = 32, m = 28, k = 118, r = 0.05; reduced to 4 features, increasing tenfold cross-validation correctness by 35.4%

15 Clustering • Determine k cluster centers such that the sum of the 1-norm distances of each point in a given database to its nearest cluster center is minimized • Formulated as minimizing a product of two linear functions on a set defined by linear inequalities

16 K-Median Algorithm • Need to solve
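The formula from this slide is missing from the transcript. As a rough illustration only, the sketch below follows the usual form of a k-median iteration for the 1-norm objective on the Clustering slide: assign each point to its nearest center, then move each center to the coordinate-wise median of its assigned points. The function names, the stopping rule, and the sample points are assumptions, not the presenter's code.

```python
from statistics import median


def one_norm(p, c):
    """1-norm (Manhattan) distance between two points."""
    return sum(abs(pi - ci) for pi, ci in zip(p, c))


def k_median(points, centers, max_iter=100):
    """Alternate assignment and median updates until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center in the 1-norm.
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: one_norm(p, centers[i]))
            clusters[j].append(p)

        # Update step: the coordinate-wise median minimizes the sum of 1-norm
        # distances within a cluster. An empty cluster keeps its old center.
        new_centers = [
            tuple(median(coords) for coords in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers

    # Objective: sum of 1-norm distances of each point to its nearest center.
    cost = sum(min(one_norm(p, c) for c in centers) for p in points)
    return centers, cost


# Made-up sample data with two obvious groups.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centers, cost = k_median(points, [(1, 1), (9, 9)])
print(centers, cost)
```

Like other alternating schemes of this kind, the iteration can stop at a local solution that depends on the starting centers.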

17 Experimentation • The K-Median Algorithm is used as a KDD tool to mine the WPBC dataset for medical knowledge • Key observation: the curves are well separated

18 Experimentation

19 Robust Representation • The model remains valid under a class of data perturbations • Uses a τ-tolerance zone within which errors are disregarded • Gives better generalization results than the conventional zero-tolerance approach

20 Robust Representation • A is an m×n matrix, a is an m×1 vector • x is the vector to be "learned" • Find x that minimizes the error Ax - a
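To make the tolerance idea concrete, the sketch below compares a conventional error measure for Ax - a with one in which any residual smaller than a tolerance is disregarded. The choice of the 1-norm, the exact form of the tolerant error, the name `tau`, and the sample numbers are all assumptions for illustration; the transcript does not preserve the formulas from the slides.

```python
def residuals(A, a, x):
    """Row-wise residuals of the system: (Ax - a)_i for each row i."""
    return [
        sum(aij * xj for aij, xj in zip(row, x)) - ai
        for row, ai in zip(A, a)
    ]


def zero_tolerance_error(A, a, x):
    """Conventional error: 1-norm of Ax - a (assumed norm)."""
    return sum(abs(r) for r in residuals(A, a, x))


def tolerant_error(A, a, x, tau):
    """Tolerant error: residuals with magnitude up to tau count as zero."""
    return sum(max(abs(r) - tau, 0.0) for r in residuals(A, a, x))


# Made-up sample data: 3 relations, 2 unknowns.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
a = [1.0, 2.25, 3.5]
x = [1.0, 2.0]
print(zero_tolerance_error(A, a, x))
print(tolerant_error(A, a, x, tau=0.25))
```

A minimizer of the tolerant error is not pushed to chase small discrepancies in the data, which is one way to read the slide's claim about validity under data perturbation.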

21 Robust Representation [Figure: the system Ax = a shown twice, once without and once with a τ-tolerance zone]

22 Conclusion • Mathematical programming codes are reliable and robust • The problems solved demonstrate that mathematical programming is a versatile and effective tool for solving important problems in data mining and knowledge discovery in databases

23 Opinion • A mathematical description can explain complex problems and convince others, but … you must understand it first

