Published by Magdalen King. Modified over 9 years ago.
1
Mathematical Programming in Data Mining Author: O. L. Mangasarian Advisor: Dr. Hsu Graduate: Yan-Cheng Lin
2
Abstract Describes mathematical programming approaches to feature selection, clustering, and robust representation
3
Outline Motivation, Objective, Problems, Feature Selection, Clustering, Robust Representation, Conclusion
4
Motivation Mathematical programming has been applied to a great variety of theoretical and applied problems. Many such problems can be formulated and effectively solved as mathematical programs.
5
Objective Describe three mathematical-programming-based developments relevant to data mining
6
Problems Feature Selection, Clustering, Robust Representation
7
Problem - Feature Selection Discriminate between two finite point sets in n-dimensional feature space while using as few of the features as possible. Formulated as a mathematical program with a parametric objective function and linear constraints.
8
Problem - Clustering Assign m points in the n-dimensional real space R^n to k clusters. Formulated as determining k centers in R^n such that the sum of the distances of each point to its nearest center is minimized.
9
Problem - Robust Representation Model a system of relations in a manner that preserves the validity of the representation when the data on which the model is based changes. A sufficiently small error τ is purposely tolerated.
10
Feature Selection Use the simplest model that describes the essence of a phenomenon. Binary classification problem: discriminate between two given point sets A and B in the n-dimensional real space R^n using as few of the n dimensions of the space as possible.
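The trade-off described on this slide can be sketched as a parametric objective that weighs separation error against the number of features used. This is only one plausible way to write such an objective; the separating plane x·w = gamma with a unit margin, the weight `lam`, the function names, and the toy data are all assumptions for illustration and are not taken from the slides.

```python
def separation_error(A, B, w, gamma):
    """Average violation of the plane x.w = gamma with a unit margin (assumed form)."""
    def dot(p):
        return sum(pi * wi for pi, wi in zip(p, w))

    # Points of A should satisfy x.w >= gamma + 1, points of B x.w <= gamma - 1.
    err_a = sum(max(gamma + 1 - dot(p), 0.0) for p in A) / len(A)
    err_b = sum(max(dot(p) - gamma + 1, 0.0) for p in B) / len(B)
    return err_a + err_b


def features_used(w, tol=1e-9):
    """Number of components of w that are effectively nonzero."""
    return sum(1 for wi in w if abs(wi) > tol)


def objective(A, B, w, gamma, lam):
    """Parametric objective: lam in [0, 1) trades error against feature count."""
    return (1 - lam) * separation_error(A, B, w, gamma) + lam * features_used(w)


# Hypothetical toy data in R^2: the first coordinate alone separates A from B.
A = [(3.0, 0.2), (4.0, -0.1)]
B = [(-3.0, 0.1), (-4.0, -0.3)]

one_feature = objective(A, B, w=(1.0, 0.0), gamma=0.0, lam=0.1)
two_features = objective(A, B, w=(1.0, 1.0), gamma=0.0, lam=0.1)
```

Both weight vectors separate the toy sets without error, so the objective prefers the one that uses a single feature.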
11
Binary classification [Figure: separating plane P with normal vector w]
12
Feature Selection Definitions: A, B [equations not recovered]
13
Successive Linearization Algorithm The result is the vector w.
14
Experimentation 32-feature Wisconsin Prognostic Breast Cancer (WPBC) dataset: n = 32, m = 28, k = 118, r = 0.05. Only 4 features were selected, increasing tenfold cross-validation correctness by 35.4%.
15
Clustering Determine k cluster centers such that the sum of the 1-norm distances of each point in a given database to its nearest cluster center is minimized. This amounts to minimizing a product of two linear functions on a set defined by linear inequalities.
16
K-Median Algorithm Need to solve
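A minimal sketch of a k-median style iteration for the clustering problem, assuming the usual alternation: assign each point to its nearest center in the 1-norm, then move each center to the coordinate-wise median of its assigned points. The function names, the starting centers, and the toy data are illustrative assumptions and do not come from the slides.

```python
from statistics import median


def one_norm(p, q):
    """1-norm (Manhattan) distance between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def k_median(points, centers, max_iter=100):
    """Alternate assignment and median update until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center (1-norm).
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: one_norm(p, centers[i]))
            clusters[j].append(p)
        # Update step: the coordinate-wise median minimizes the summed
        # 1-norm distance within a cluster; empty clusters keep their center.
        new_centers = [
            tuple(median(col) for col in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # Objective value: sum of 1-norm distances to the nearest center.
    total = sum(min(one_norm(p, c) for c in centers) for p in points)
    return centers, total


# Hypothetical toy data: two obvious groups in R^2.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, total = k_median(points, centers=[(1, 1), (9, 9)])
```

Like other alternating schemes of this kind, the sketch only finds a local solution, so the result can depend on the starting centers.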
17
Experimentation The k-median algorithm was used as a KDD tool to mine the WPBC data and discover medical knowledge. The key observation is that the curves are well separated.
18
Experimentation
19
Robust Representation The model remains valid under a class of data perturbations. Use a τ-tolerance zone within which errors are disregarded. This gives better generalization results than the conventional zero-tolerance approach.
20
Robust Representation A is an m × n matrix, a is an m × 1 vector, and x is the vector to be “learned”: find the x that minimizes the error Ax - a.
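To make the tolerance idea concrete, here is a small sketch that compares the ordinary error of Ax - a with a tolerant error in which component errors of magnitude at most τ (written `tau` in the code) count as zero. The use of the 1-norm, the function names, and the numbers are assumptions for illustration; the slides do not say which norm is minimized.

```python
def residual(A, x, a):
    """Component-wise residual Ax - a for a list-of-rows matrix A."""
    return [sum(aij * xj for aij, xj in zip(row, x)) - ai
            for row, ai in zip(A, a)]


def one_norm_error(A, x, a):
    """Conventional (zero-tolerance) error: sum of |Ax - a|."""
    return sum(abs(r) for r in residual(A, x, a))


def tolerant_error(A, x, a, tau):
    """tau-tolerant error: only the part of each |residual| above tau counts."""
    return sum(max(abs(r) - tau, 0.0) for r in residual(A, x, a))


# Hypothetical data: 3 relations, 2 unknowns.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
a = [1.0, 2.0, 3.5]
x = [1.0, 2.0]

plain = one_norm_error(A, x, a)          # residuals are 0, 0, -0.5
tolerant = tolerant_error(A, x, a, 0.5)  # the 0.5 error falls inside the zone
```

With a tolerance of 0.5 the candidate x incurs no penalty, whereas the zero-tolerance error still charges for the small deviation in the third relation.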
21
Robust Representation [Figure: Ax = a without tolerance, and Ax = a with a τ-tolerance zone]
22
Conclusion Mathematical programming codes are reliable and robust. The problems solved demonstrate that mathematical programming is a versatile and effective tool for solving important problems in data mining and knowledge discovery in databases.
23
Opinion A mathematical description can explain complex problems and convince others, but … you must understand it first