Mathematical Programming in Data Mining Author: O. L. Mangasarian Advisor: Dr. Hsu Graduate: Yan-Cheng Lin.



Abstract  Describes the application of mathematical programming to feature selection, clustering, and robust representation

Outline  Motivation  Objective  Problems  Feature Selection  Clustering  Robust Representation  Conclusion

Motivation  Mathematical programming has been applied to a great variety of theoretical and applied problems  Data mining problems can be formulated and effectively solved as mathematical programs

Objective  Describe three mathematical-programming-based developments relevant to data mining

Problems  Feature Selection  Clustering  Robust Representation

Problem - Feature Selection  Discriminate between two finite point sets in n-dimensional feature space while using as few of the features as possible  Formulated as a mathematical program with a parametric objective function and linear constraints

Problem - Clustering  Assign m points in the n-dimensional real space R^n to k clusters  Formulated as determining k centers in R^n such that the sum of the distances of each point to its nearest center is minimized
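Written out, the clustering problem stated above takes roughly the following form. The symbols A_i (the i-th data point) and C_l (the l-th center) are notation assumed here for illustration, and the norm is left unspecified because this slide only says "distances".

```latex
\min_{C_1,\dots,C_k \in R^n} \; \sum_{i=1}^{m} \; \min_{l=1,\dots,k} \| A_i - C_l \|
```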

Problem - Robust Representation  Model a system of relations in a manner that preserves the validity of the representation when the data on which the model is based changes  A sufficiently small error τ is purposely tolerated

Feature Selection  Use the simplest model that describes the essence of a phenomenon  Binary classification problem: discriminate between two given point sets A and B in the n-dimensional real space R^n using as few of the n dimensions of the space as possible
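The trade-off between separating A from B and using few features can be sketched as a single parametric objective. The slides do not give the formula, so everything below is an assumption for illustration: the plane x.w = gamma, the unit margin, the averaged errors, and the weight `lam` are hypothetical choices in the spirit of the slide, not the author's exact formulation.

```python
def plane_error(A, B, w, gamma):
    """Average violation of the plane x.w = gamma with an assumed unit margin.

    Points of A should satisfy x.w >= gamma + 1, points of B x.w <= gamma - 1.
    """
    def dot(x):
        return sum(xi * wi for xi, wi in zip(x, w))

    err_a = sum(max(0.0, -dot(x) + gamma + 1) for x in A) / len(A)
    err_b = sum(max(0.0, dot(x) - gamma + 1) for x in B) / len(B)
    return err_a + err_b


def features_used(w, tol=1e-8):
    """Number of dimensions the plane actually depends on."""
    return sum(1 for wi in w if abs(wi) > tol)


def parametric_objective(A, B, w, gamma, lam):
    """Weighted sum: (1 - lam) * separation error + lam * feature count."""
    return (1 - lam) * plane_error(A, B, w, gamma) + lam * features_used(w)


# Toy data: the first coordinate alone separates A from B.
A = [(3.0, 5.0), (4.0, 1.0)]
B = [(0.0, 4.0), (1.0, 2.0)]

one_feature = parametric_objective(A, B, (1.0, 0.0), 2.0, lam=0.05)
two_features = parametric_objective(A, B, (1.0, 0.1), 2.3, lam=0.05)
```

Both planes separate the toy data without error, so the objective differs only through the feature count and the one-feature plane scores lower.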

Binary classification  [Figure with labels w and P]

Feature Selection  The following are defined:  A  B

Successive Linearization Algorithm  The result is the vector w
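The idea behind a successive linearization algorithm can be sketched on a toy problem: repeatedly replace a concave objective by its linearization at the current point, minimize that linear function over the feasible polyhedron, and stop when the linearization offers no further decrease. The slides do not show the actual program, so the pieces below are assumptions: the concave term sum(1 - exp(-alpha * v_i)) is used as a smooth stand-in for counting nonzero components, and the feasible set is given as an explicit vertex list so that the linear program can be solved by enumeration instead of by an LP solver.

```python
import math


def gradient(v, alpha):
    """Gradient of the assumed concave surrogate sum(1 - exp(-alpha * v_i))."""
    return [alpha * math.exp(-alpha * vi) for vi in v]


def successive_linearization(vertices, start, alpha=5.0, max_iter=50):
    """Toy SLA over a polytope described by its vertices (hypothetical setup)."""
    current = tuple(start)
    for _ in range(max_iter):
        g = gradient(current, alpha)
        # Linear step: a linear function attains its minimum over a
        # polytope at a vertex, so enumerating vertices solves the LP here.
        best = min(vertices, key=lambda p: sum(gi * pi for gi, pi in zip(g, p)))
        decrease = sum(gi * (bi - ci) for gi, bi, ci in zip(g, best, current))
        if decrease >= -1e-12:
            break  # linearization cannot be decreased: stationary point
        current = best
    return current


# Triangle with vertices (1, 1), (2, 0), (0, 3); start inside it.
vertices = [(1.0, 1.0), (2.0, 0.0), (0.0, 3.0)]
result = successive_linearization(vertices, start=(1.2, 0.8))
```

On this toy problem the iteration moves to the vertex with a single nonzero component and stops there after one more linearization.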

Experimentation  Data set: the 32-feature Wisconsin Prognostic Breast Cancer (WPBC) database  n = 32, m = 28, k = 118, r = 0.05: 4 features selected, increasing tenfold cross-validation correctness by 35.4%

Clustering  Determine k cluster centers such that the sum of the 1-norm distances of each point in a given database to its nearest cluster center is minimized  Formulated as minimizing the product of two linear functions on a set defined by linear inequalities

K-Median Algorithm  Need to solve
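A k-median iteration of the kind named on this slide can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the author's code: it alternates between assigning each point to its nearest center in the 1-norm and moving each center to the coordinate-wise median of its cluster (the median minimizes a sum of 1-norm distances). The function names and the stopping rule are hypothetical.

```python
from statistics import median


def one_norm(p, q):
    """1-norm distance between two points."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def assign(points, centers):
    """Put each point in the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda j: one_norm(p, centers[j]))
        clusters[nearest].append(p)
    return clusters


def k_median(points, initial_centers, max_iter=100):
    """Alternate assignment and median update until the centers stop moving."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        clusters = assign(points, centers)
        new_centers = [
            tuple(median(coord) for coord in zip(*cluster)) if cluster else centers[j]
            for j, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def total_distance(points, centers):
    """Sum of 1-norm distances of each point to its nearest center."""
    return sum(min(one_norm(p, c) for c in centers) for p in points)


points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centers = k_median(points, [(1, 1), (9, 9)])
```

Like k-means, this kind of iteration only finds a local solution, so the result depends on the initial centers.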

Experimentation  The k-median algorithm was used as a KDD tool to mine WPBC and discover medical knowledge  Key observation: the curves are well separated

Experimentation

Robust Representation  The model remains valid under a class of data perturbations  Use a τ-tolerance zone within which errors are disregarded  Gives better generalization results than conventional zero tolerance

Robust Representation  A is an m × n matrix, a is an m × 1 vector  x is the vector to be “learned”  Find x that minimizes the error Ax - a
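The difference between a zero-tolerance error and a tolerant one can be sketched as follows. The slides do not say which norm measures the error Ax - a, so the 1-norm used here is an assumption, as are the function names and the parameter `tau` for the width of the tolerance zone.

```python
def residuals(A, x, a):
    """Componentwise error Ax - a for a list-of-rows matrix A."""
    return [sum(aij * xj for aij, xj in zip(row, x)) - ai for row, ai in zip(A, a)]


def zero_tolerance_error(A, x, a):
    """Conventional error: every deviation counts (assumed 1-norm)."""
    return sum(abs(r) for r in residuals(A, x, a))


def tolerant_error(A, x, a, tau):
    """Tolerant error: deviations of size at most tau are disregarded."""
    return sum(max(0.0, abs(r) - tau) for r in residuals(A, x, a))


A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
a = [1.0, 2.0, 3.5]
x = [1.0, 2.0]

strict = zero_tolerance_error(A, x, a)
loose = tolerant_error(A, x, a, tau=0.5)
partial = tolerant_error(A, x, a, tau=0.25)
```

Here the only deviation is 0.5 in the third relation, so x is penalized under zero tolerance, fully accepted with a tolerance of 0.5, and partly penalized with a tolerance of 0.25.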

Robust Representation  [Figure: Ax = a with zero tolerance, and Ax = a with τ-tolerance]

Conclusion  Mathematical programming codes are reliable and robust  The problems solved demonstrate that mathematical programming is a versatile and effective tool for solving important problems in data mining and knowledge discovery in databases

Opinion  A mathematical description can explain complex problems and convince others, but … you must understand it first