Intelligent Database Systems Lab N.Y.U.S.T. I. M. A Cluster Validity Measure With Outlier Detection for Support Vector Clustering Presenter : Lin, Shu-Han.

Presentation transcript:

Intelligent Database Systems Lab N.Y.U.S.T. I. M.
A Cluster Validity Measure With Outlier Detection for Support Vector Clustering
Presenter: Lin, Shu-Han
Authors: Jeen-Shing Wang, Jen-Chieh Chiang
IEEE Transactions on Systems, Man, and Cybernetics (2008)

Outline
- Introduction to SVC
- Motivation
- Objectives
- Methodology
- Experiments
- Conclusions
- Comments

SVC
SVC is derived from SVMs.
SVMs are a supervised learning technique:
- Fast convergence
- Good generalization performance
- Robustness to noise
SVC is an unsupervised approach:
1. Map the data points to a high-dimensional feature space using a Gaussian kernel.
2. Look for the smallest sphere that encloses the data.
3. Map the sphere back to data space to form a set of contours.
4. Treat the contours as the cluster boundaries.

SVC - Sphere Analysis
Goal: find the minimal enclosing sphere (center a) with a soft margin. The problem is solved by introducing the Lagrangian function.
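The slide's equations are not reproduced in this transcript. As a reference sketch, the standard SVC formulation (following the original support vector clustering derivation) can be written as below. The symbols R (sphere radius), Φ (feature map), ξ_j (slack variables), and the multipliers β_j and μ_j are assumptions about notation; only the center a and the constant C appear in the slides.

```latex
% Primal problem: smallest sphere with soft margin
\min_{R,\,a,\,\xi}\; R^2 + C \sum_j \xi_j
\quad \text{s.t.} \quad
\|\Phi(x_j) - a\|^2 \le R^2 + \xi_j, \qquad \xi_j \ge 0 .

% Lagrangian with multipliers \beta_j \ge 0 and \mu_j \ge 0
L = R^2 - \sum_j \left( R^2 + \xi_j - \|\Phi(x_j) - a\|^2 \right) \beta_j
      - \sum_j \xi_j \mu_j + C \sum_j \xi_j .

% Setting the derivatives with respect to R, a, and \xi_j to zero
\sum_j \beta_j = 1, \qquad
a = \sum_j \beta_j \Phi(x_j), \qquad
\beta_j = C - \mu_j .
```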

SVC - Sphere Analysis

SVC - Sphere Analysis
The Karush-Kuhn-Tucker complementarity conditions identify the bounded support vectors (BSVs), which are treated as outliers.
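For reference, the complementarity conditions in the standard SVC derivation are sketched below. The notation is an assumption: β_j is the multiplier of the sphere constraint ‖Φ(x_j) − a‖² ≤ R² + ξ_j, μ_j is the multiplier of ξ_j ≥ 0, and β_j = C − μ_j.

```latex
% KKT complementarity conditions
\xi_j \mu_j = 0, \qquad
\left( R^2 + \xi_j - \|\Phi(x_j) - a\|^2 \right) \beta_j = 0 .

% Resulting classification of a data point x_j
\beta_j = 0      \;\Rightarrow\; x_j \text{ lies inside the sphere},
\qquad
0 < \beta_j < C  \;\Rightarrow\; x_j \text{ lies on the sphere surface (SV)},
\qquad
\beta_j = C      \;\Rightarrow\; x_j \text{ lies outside the sphere (bounded SV, outlier)} .
```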

SVC - Sphere Analysis
The minimal enclosing sphere with soft margin is found by solving the Wolfe dual optimization problem.
C: the soft-margin constant, which controls whether outliers are allowed.
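A sketch of the Wolfe dual in its standard SVC form, assuming multipliers β_j and a kernel K(x_i, x_j) = Φ(x_i) · Φ(x_j) (notation not preserved in the transcript):

```latex
\max_{\beta}\; W = \sum_j \beta_j K(x_j, x_j) - \sum_{i,j} \beta_i \beta_j K(x_i, x_j)
\quad \text{s.t.} \quad
0 \le \beta_j \le C, \qquad \sum_j \beta_j = 1 .
```

Because the β_j sum to 1 and a bounded SV has β_j = C, at most 1/C points can be bounded SVs. A bounded SV also requires a strictly positive slack, and with C = 1 any positive slack costs more than enlarging the sphere saves, so no point is left outside and no outliers are allowed.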

SVC - Sphere Analysis
The distance between the image of x and the sphere center a is expressed through a Mercer kernel; the kernel used is the Gaussian function.
q: the Gaussian kernel parameter, which controls the number of clusters and the smoothness/tightness of the cluster boundaries.
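A minimal sketch of the kernel-space distance, assuming the standard SVC expressions K(x, y) = exp(−q‖x − y‖²) and R²(x) = K(x, x) − 2 Σ_j β_j K(x_j, x) + Σ_{i,j} β_i β_j K(x_i, x_j). The function names and the multipliers `beta` (taken as already solved) are hypothetical and not from the slides.

```python
import math

def gaussian_kernel(x, y, q):
    """Gaussian kernel exp(-q * ||x - y||^2) for points given as tuples."""
    return math.exp(-q * sum((a - b) ** 2 for a, b in zip(x, y)))

def squared_distance_to_center(x, points, beta, q):
    """Squared feature-space distance between the image of x and the center a.

    Assumes beta holds the dual multipliers of `points` (they sum to 1).
    """
    k_xx = gaussian_kernel(x, x, q)  # always 1 for a Gaussian kernel
    cross = sum(b * gaussian_kernel(p, x, q) for p, b in zip(points, beta))
    const = sum(
        bi * bj * gaussian_kernel(pi, pj, q)
        for pi, bi in zip(points, beta)
        for pj, bj in zip(points, beta)
    )
    return k_xx - 2.0 * cross + const

# Two points with equal multipliers: compare an endpoint with the midpoint.
points, beta = [(0.0, 0.0), (2.0, 0.0)], [0.5, 0.5]
for q in (0.01, 1.0):
    d_end = squared_distance_to_center((0.0, 0.0), points, beta, q)
    d_mid = squared_distance_to_center((1.0, 0.0), points, beta, q)
    print(q, d_mid < d_end)
```

In this toy setup the midpoint is closer to the center than the endpoints for the small q and farther for the large q, which illustrates how increasing q can split one contour into several.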

Motivation
Traditional cluster validity measures, such as the partition coefficient (PC) and separation measures, are based on fuzzy membership grades and centroids of clusters.
The SVC algorithm generates cluster boundaries of arbitrary shape and provides no fuzzy membership grades.
Which clustering is better?

Objectives
Optimal cluster number
- Cluster validity measure
- Outlier-detection algorithm
- Cluster-merging mechanism

Methodology - Overview
- Cluster validity measure for the SVC algorithm
- Outlier detection
- Cluster-merging mechanism
With C = 1, no outliers are allowed.

Methodology – Cluster Validity Measure for the SVC Algorithm
- Compactness (intra-cluster)
- Separation (inter-cluster)
- Cluster validity measure for SVC: the ratio of the two, to be minimized
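The paper's own compactness and separation formulas are not preserved in this transcript. To illustrate the general idea of a compactness/separation ratio that is minimized, here is a hypothetical stand-in in the spirit of an inverted Dunn index. It is not the measure proposed by the authors, and `validity_ratio` is an illustrative name.

```python
import math
from itertools import combinations

def validity_ratio(clusters):
    """Illustrative ratio: largest cluster diameter / smallest inter-cluster gap.

    `clusters` is a list of at least two clusters, each a list of point tuples.
    Smaller values indicate tighter, better separated clusters. This is a
    stand-in, NOT the validity measure defined in the paper.
    """
    # Compactness: the largest pairwise distance inside any single cluster.
    compactness = max(
        (math.dist(a, b) for cluster in clusters for a, b in combinations(cluster, 2)),
        default=0.0,
    )
    # Separation: the smallest distance between points of different clusters.
    separation = min(
        math.dist(a, b)
        for c1, c2 in combinations(clusters, 2)
        for a in c1
        for b in c2
    )
    return compactness / separation

good = validity_ratio([[(0, 0), (1, 0)], [(5, 0), (6, 0)]])
bad = validity_ratio([[(0, 0), (5, 0)], [(1, 0), (6, 0)]])
print(good < bad)
```

The sketch uses pairwise distances rather than centroids because SVC clusters may have arbitrary shapes.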

Methodology – Outlier Detection
In SVC, outliers (BSVs) are the data points in boundary regions.
[Figure: SVC results for q = 1, q = 4, q = 2, and q = 1.8; labels: C = 0.02, singleton]

Methodology – Outlier Detection
C
- If C = 1, the resulting clusters are smooth but not desirable.
BSV (outlier)
- All outliers are SVs.
- Some outliers are far away from the other data in the clusters.
SVs
- Too many SVs make the boundaries fit the data too tightly.
q
- Increasing q makes the clusters more compact.
Singleton
- An important criterion.

Methodology – Outlier Detection
Outlier Existence Criterion
Desirable Cluster Criterion
- The number of singleton clusters cannot exceed a threshold.
- The percentage of data points that are SVs cannot exceed a threshold (suggested: 50%).
- Recursively adjust C until both criteria are satisfied.
Suggested γ = 2

Methodology – Cluster-Merging Mechanism
Similarity between clusters is measured by their overlapping degree, computed with a Gaussian function.
[Figure labels: P_C = 0, P_A > 0]

Methodology – Cluster-Merging Mechanism
1) Agglomerative outliers/noises: identification
   For all c_i […] 0, merge cluster i and cluster j. Otherwise, discard cluster i. Set K ← K − 1.
2) Compatible clusters: combination (similarity)
   Sort the remaining K clusters by size in ascending order such that c_K = max(c_i), ∀ i ∈ K.
   For each i, i = 1, ..., K, perform {
     Set x ← m_i.
     For each j, j = i + 1, ..., K, compute p_j(x).
     Find l = arg max_{i+1≤j≤K} p_j(x), where arg max_a denotes the value of a at which the expression that follows is maximized.
     If p_l > 0, merge cluster i with cluster l. Set K ← K − 1 and repeat 2) until no further combination.
   }
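Step 2) can be sketched as follows. This is a minimal sketch under stated assumptions: m_i is taken to be the mean of cluster i, and the Gaussian-based similarity p_j(x) is supplied by the caller because its exact formula is not preserved here. `merge_compatible`, `centroid`, and the `similarity` callback are hypothetical names.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of point tuples (assumed m_i)."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def merge_compatible(clusters, similarity):
    """Repeatedly merge a smaller cluster into its most similar larger cluster.

    `similarity(x, cluster)` plays the role of p_j(x); a value > 0 means the
    point x overlaps that cluster. Returns the merged list of clusters.
    """
    clusters = [list(c) for c in clusters]
    merged = True
    while merged and len(clusters) > 1:
        merged = False
        clusters.sort(key=len)  # ascending size, so the largest cluster is last
        for i in range(len(clusters) - 1):
            x = centroid(clusters[i])
            # Score x against every cluster that comes after i in the ordering.
            scores = [similarity(x, clusters[j]) for j in range(i + 1, len(clusters))]
            best = max(range(len(scores)), key=scores.__getitem__)
            if scores[best] > 0:
                clusters[i + 1 + best].extend(clusters[i])
                del clusters[i]  # K <- K - 1
                merged = True
                break  # restart step 2) with the reduced set of clusters
    return clusters
```

Restarting the scan after every merge mirrors the "repeat 2) until no further combination" instruction; re-sorting each time keeps the ascending-size order valid after clusters grow.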

Methodology – Summary
1) Initialize q to a small value, and set C = 1 and γ = 2.
2) Perform the SVC algorithm and get |clusters|.
3) If |clusters| < 2, increase q and go to 2).
4) If the outlier-detection criterion holds, decrease C, keep q fixed, and go to 2). Otherwise, go to 5).
5) If |SVs| < 50% of the data points, go to 6). Otherwise, decrease C and go to 2).
6) Compute the validity measure index V(m).
7) If |clusters| > √N, stop the SVC. Otherwise, increase q and go to 2).
8) Use the cluster-merging mechanism to identify the ideal |clusters|. Output |clusters|.
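The control flow of steps 1) to 7) can be sketched as a loop. The sketch assumes that the search over q ends once the number of clusters exceeds √N, since increasing q only produces more clusters. The SVC solver, the outlier-existence test, and the validity index are passed in as callbacks because their formulas are not preserved here. `SvcResult`, `search_parameters`, the step sizes, and the shrink factor for C are all hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class SvcResult:
    n_clusters: int  # |clusters| found by one SVC run
    n_svs: int       # number of support vectors in that run

def search_parameters(run_svc, outliers_present, validity, n_points,
                      q0=0.1, q_step=0.1, c_shrink=0.5, max_runs=1000):
    """Return a list of (validity, q, C, n_clusters) for every accepted SVC run."""
    q, c = q0, 1.0                                   # step 1: small q, C = 1
    history = []
    for _ in range(max_runs):                        # guard against endless loops
        result = run_svc(q, c)                       # step 2
        if result.n_clusters < 2:                    # step 3
            q += q_step
            continue
        if outliers_present(result):                 # step 4: decrease C, keep q
            c *= c_shrink
            continue
        if result.n_svs >= 0.5 * n_points:           # step 5: too many SVs
            c *= c_shrink
            continue
        history.append((validity(result), q, c, result.n_clusters))  # step 6
        if result.n_clusters > math.sqrt(n_points):  # step 7: assumed stop rule
            break
        q += q_step
    return history
```

A caller would then take the entry with the smallest validity value, for example `min(history)`, and pass the corresponding clustering to the cluster-merging mechanism of step 8).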

Experiments - Benchmark and Artificial Examples
Bensaid Data Set

Experiments - Benchmark and Artificial Examples
Five-Cluster Data Set & Five-Cluster Data Set With Noise

Experiments - Benchmark and Artificial Examples
Five-Cluster Data Set With Noise, after cluster merging

Experiments - Benchmark and Artificial Examples
Crescent Data Set

Experiments - IRIS Data Set
Misclassification

Conclusions
This paper integrates the following into SVC:
- Cluster validity measure
- Outlier detection
- Merging mechanism
It automatically determines suitable values for:
- Kernel parameter
- Soft-margin constant
The resulting clustering has:
- Compact and smooth arbitrary-shaped cluster contours
- Increased robustness to outliers and noise

Comments
Advantage
- Provides a cluster validity index for a clustering method
Drawback
- …
Application
- SVC