A New Algorithm of Fuzzy Clustering for Data with Uncertainties: Fuzzy c-Means for Data with Tolerance Defined as Hyper-rectangles
ENDO Yasunori, MIYAMOTO Sadaaki

Outline: Background and goal of our study; the concept of tolerance; new clustering algorithms for data with tolerance; numerical examples; conclusion and future work.

Introduction Clustering is one of the unsupervised automatic classification methods. Classification methods classify a set of data into several groups. Many clustering algorithms have been proposed, and fuzzy c-means (FCM) is the most typical method of fuzzy clustering. In this presentation, I would like to talk about one way to handle the uncertainty of data and present some new clustering algorithms based on FCM.

Uncertainty In clustering, each datum in a real space is regarded as one point in a pattern space and classified. However, data with uncertainty should often be represented not by a point but by a set.

Three examples of data with uncertainty Example 1: Data has errors. When a spring scale whose measurement accuracy is ±5 g shows 450 g, the actual value lies in the interval from 445 g to 455 g.

Three examples of data with uncertainty Example 2: Data has ranges. An apple has not one color but many colors, so the colors of the apple cannot be represented as one point in a color space.

Three examples of data with uncertainty Example 3: Missing values exist in data. In the case of a social survey, if there are unanswered items in the questionnaire, those items are handled as missing values.

Background In the past, these uncertainties of data have been represented as interval data, and some algorithms for interval data have been proposed (e.g., Takata and Miyamoto [1]). In those algorithms, dissimilarity is defined between interval data by using particular measures, e.g., the nearest-neighbor, furthest-neighbor, or Hausdorff distance.

Background The interval methodology has the following disadvantages: We have to introduce a particular measure, but how do we select an adequate one? Moreover, these measures actually handle only the boundary of the interval data.

Goal of our study From the viewpoint of a strict optimization problem, we handle uncertainty as tolerance and consider a new type of optimization problem for data with tolerance. Moreover, we construct new clustering algorithms in this optimization framework. In these algorithms, dissimilarity between target data is defined by the L1 or squared L2 norm.

Features of proposed algorithms The tolerance methodology has the following advantages: Particular distances between intervals do not have to be defined. Not only the boundary but the whole region within the tolerance is handled. The discussion becomes mathematically simpler than with interval distances.

The concept of tolerance

We define x_k = (x_k1, …, x_kp)^T as the k-th datum on a p-dimensional vector space, and ε_k = (ε_k1, …, ε_kp)^T as the tolerance vector of x_k. The constraint condition is |ε_kj| ≤ κ_kj (κ_kj ≥ 0), i.e., each tolerance vector lies in a hyper-rectangle.

An example of a tolerance vector on R: ε_k is the tolerance vector, which is calculated in the algorithm; κ_k is the tolerance, which is decided before the calculation.

Comparison of Tolerance and Other Measures Nearest-neighbor method Furthest-neighbor method Proposed method

Proposed algorithms

Conventional fuzzy c-means sFCM: standard fuzzy c-means. Notation: c = number of clusters; n = number of data; p = number of dimensions of the pattern space; u_ki = membership grade; x_k = datum; v_i = cluster center.

Conventional fuzzy c-means Algorithm and objective function:
sFCM-L1: J(U, V) = Σ_{k=1}^n Σ_{i=1}^c (u_ki)^m Σ_{j=1}^p |x_kj − v_ij|
sFCM-L2: J(U, V) = Σ_{k=1}^n Σ_{i=1}^c (u_ki)^m ||x_k − v_i||²

Optimization problem: sFCM-L2 Objective function: J(U, V) = Σ_{k=1}^n Σ_{i=1}^c (u_ki)^m ||x_k − v_i||². Optimal solutions: membership grade U: u_ki = 1 / Σ_{l=1}^c (||x_k − v_i||² / ||x_k − v_l||²)^{1/(m−1)}; cluster center V: v_i = Σ_{k=1}^n (u_ki)^m x_k / Σ_{k=1}^n (u_ki)^m.

Algorithm: sFCM-L2 Step 1: Set the initial value of V. Step 2: Update U by the optimal solution above. Step 3: Update V by the optimal solution above. Step 4: If (U, V) is convergent, stop; otherwise, go back to Step 2.
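The four steps above can be sketched in Python. This is a minimal illustration of standard sFCM-L2, not the authors' code; the function name, parameters, and the optional V0 initializer are my own additions:

```python
import numpy as np

def sfcm_l2(X, c, m=2.0, max_iter=100, tol=1e-6, V0=None, seed=0):
    """Minimal sketch of standard fuzzy c-means with the squared L2 norm."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Step 1: initial cluster centers V (random data points unless given).
    V = X[rng.choice(n, size=c, replace=False)].astype(float) if V0 is None else np.asarray(V0, float)
    for _ in range(max_iter):
        # Step 2: update the membership grades U from squared distances.
        d2 = np.maximum(((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2), 1e-12)
        U = 1.0 / d2 ** (1.0 / (m - 1.0))
        U /= U.sum(axis=1, keepdims=True)
        # Step 3: update the cluster centers V as membership-weighted means.
        W = U ** m
        V_new = (W.T @ X) / W.sum(axis=0)[:, None]
        # Step 4: stop when V is convergent.
        if np.abs(V_new - V).max() < tol:
            return U, V_new
        V = V_new
    return U, V
```

The d2 floor (1e-12) only guards the division when a datum coincides with a center; it is an implementation convenience, not part of the method.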

Proposed algorithms Algorithm and objective function:
sFCMT-L1: J(U, V, E) = Σ_{k=1}^n Σ_{i=1}^c (u_ki)^m Σ_{j=1}^p |x_kj + ε_kj − v_ij|
sFCMT-L2: J(U, V, E) = Σ_{k=1}^n Σ_{i=1}^c (u_ki)^m ||x_k + ε_k − v_i||²
The constraint condition: |ε_kj| ≤ κ_kj (κ_kj ≥ 0).

Optimization problem: sFCMT-L2 Objective function: J(U, V, E) = Σ_{k=1}^n Σ_{i=1}^c (u_ki)^m ||x_k + ε_k − v_i||². Membership grade U: u_ki = 1 / Σ_{l=1}^c (||x_k + ε_k − v_i||² / ||x_k + ε_k − v_l||²)^{1/(m−1)}. Cluster center V: v_i = Σ_{k=1}^n (u_ki)^m (x_k + ε_k) / Σ_{k=1}^n (u_ki)^m.

Optimization problem: sFCMT-L2 Tolerance vector E: the unconstrained minimizer ε_kj = −Σ_{i=1}^c (u_ki)^m (x_kj − v_ij) / Σ_{i=1}^c (u_ki)^m, clipped coordinate-wise to the hyper-rectangle [−κ_kj, κ_kj].

Algorithm: sFCMT-L2 Step 1: Set the initial values of V and E. Step 2: Update U by the optimal solution above. Step 3: Update V by the optimal solution above. Step 4: Update E by the optimal solution above. Step 5: If (U, V, E) is convergent, stop; otherwise, go back to Step 2.

Outline of proposed algorithms: Step 1: Set the initial values of V and E. Step 2: Update U by Eq. A. Step 3: Update V by Eq. B. Step 4: Update E by Eq. C. Step 5: If (U, V, E) is convergent, stop; otherwise, go back to Step 2.
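The five steps above, for the squared-L2 case, can be sketched as follows. This is not the paper's code: the closed forms for U and V follow from the objective function, while the Eq. C step (clip the unconstrained minimizer of ε to the hyper-rectangle) is my reconstruction from the constraint |ε_kj| ≤ κ_kj, and all names are my own:

```python
import numpy as np

def sfcmt_l2(X, kappa, c, m=2.0, max_iter=100, tol=1e-6, V0=None, seed=0):
    """Sketch of sFCMT-L2: fuzzy c-means for data with tolerance
    constrained to hyper-rectangles |eps_kj| <= kappa_kj."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    V = X[rng.choice(n, size=c, replace=False)].astype(float) if V0 is None else np.asarray(V0, float)
    E = np.zeros((n, p))          # tolerance vectors, calculated in the algorithm
    for _ in range(max_iter):
        Y = X + E                 # each datum shifted by its tolerance vector
        # Eq. A: memberships from squared distances of the shifted data.
        d2 = np.maximum(((Y[:, None, :] - V[None, :, :]) ** 2).sum(axis=2), 1e-12)
        U = 1.0 / d2 ** (1.0 / (m - 1.0))
        U /= U.sum(axis=1, keepdims=True)
        # Eq. B: centers as membership-weighted means of the shifted data.
        W = U ** m
        V_new = (W.T @ Y) / W.sum(axis=0)[:, None]
        # Eq. C: unconstrained minimizer of eps, clipped to the hyper-rectangle.
        E_new = np.clip((W @ V_new) / W.sum(axis=1)[:, None] - X, -kappa, kappa)
        if max(np.abs(V_new - V).max(), np.abs(E_new - E).max()) < tol:
            return U, V_new, E_new
        V, E = V_new, E_new
    return U, V, E
```

With kappa set to zero the tolerance vectors stay at zero and the sketch reduces to standard sFCM-L2, which matches the relationship between the conventional and proposed objective functions.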

Proposed algorithms:

Algorithm   | Eq. A             | Eq. B | Eq. C
sFCMT-L1-1  | (7)               | (11)  |
sFCMT-L1-2  | same as the above | (10)  | (14)
eFCMT-L2    | (19)              | (20)  | (21)

The numbers in the table correspond to the equation numbers in our paper in the proceedings.

Numerical examples

Test data: sFCMT-L2

Diagnosis of heart disease data The heart disease database has five attributes, and the result of diagnosis (presence or absence) is known. The number of data is 866, and 560 of them contain missing values in some attributes.

Attribute                                          | Number of missing values
Resting blood pressure                             | 5
Maximum heart rate achieved                        | 1
ST depression induced by exercise relative to rest | 8
The slope of the peak exercise ST segment          | 255
Number of major vessels colored by fluoroscopy     | 557

Diagnosis of heart disease data In all algorithms, the convergence condition compares the current solution with the previous optimal solution and stops when their difference falls below a fixed threshold. In addition, the fuzzifier m is fixed in sFCM. To handle missing values as tolerance, we define a tolerance over each missing attribute.
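The slide's exact tolerance definition for missing values was lost in transcription. One plausible encoding, offered here only as an assumption, is to place a missing entry at the midpoint of the attribute's observed range and set its κ to half that range, so the hyper-rectangle covers every possible value; the function name and scheme are hypothetical:

```python
import numpy as np

def missing_to_tolerance(X):
    """Hypothetical scheme: a missing attribute (NaN) becomes the midpoint of
    its observed range, with tolerance kappa spanning half the range
    (kappa = 0 for observed entries)."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    kappa = np.zeros_like(X)
    for j in range(X.shape[1]):
        col = X[:, j]
        miss = np.isnan(col)
        lo, hi = np.nanmin(col), np.nanmax(col)  # range of observed values
        filled[miss, j] = (lo + hi) / 2.0
        kappa[miss, j] = (hi - lo) / 2.0
    return filled, kappa
```

The filled data and kappa could then be passed to a tolerance-based clustering routine, letting the algorithm itself choose where each missing value lies inside its hyper-rectangle.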

Diagnosis of heart disease data We classify all 866 data, including those with missing values, by the proposed algorithms, and only the 306 data without missing values by the conventional algorithms. In each algorithm, we give initial cluster centers at random and classify the data set into two clusters. We run this trial 1000 times and report the average ratio of correctly classified results.
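Computing a "ratio of correctly classified results" requires matching cluster indices to the two diagnosis classes; a common choice (an assumption here, since the slide does not say how the matching was done) is to score the best permutation of cluster labels:

```python
import numpy as np
from itertools import permutations

def correct_ratio(cluster_labels, true_labels, c=2):
    """Ratio of correctly classified data under the best assignment of
    cluster indices to class labels (brute force, fine for small c)."""
    cluster_labels = np.asarray(cluster_labels)
    true_labels = np.asarray(true_labels)
    best = 0.0
    for perm in permutations(range(c)):
        mapped = np.array([perm[l] for l in cluster_labels])
        best = max(best, float(np.mean(mapped == true_labels)))
    return best
```

Averaging this ratio over the 1000 random-initialization trials reproduces the evaluation protocol described above.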

Diagnosis of heart disease data This table shows the results of classifying only the 306 data without missing values:

Algorithm | Average ratio
sFCM-L1   | …
sFCM-L2   | …

This table shows the results of classifying all 866 data:

Algorithm | Average ratio
sFCMT-L1  | …
sFCMT-L2  | …

Diagnosis of heart disease data This table shows the results of classifying all 866 data by the proposed algorithms in our research:

Algorithm | Average ratio
sFCMT-L1  | …
sFCMT-L2  | …

This table shows the results of classifying all 866 data by an algorithm that handles missing values as interval data and uses the nearest-neighbor distance to calculate dissimilarity:

Algorithm | Average ratio
sFCMT-L1  | …
sFCMT-L2  | …

Conclusion and future work Conclusion ▫We considered the optimization problems for data with tolerance and derived the optimal solutions. Using these results, we have constructed six new algorithms. ▫We have shown the effectiveness of the proposed algorithms through some numerical examples.

Conclusion and future work Future work ▫We will apply the algorithms to other data sets with tolerance. ▫We will apply the concept of tolerance to regression analysis, support vector machines, and so on.

Thank you for your attention.

References 1. Osamu Takata, Sadaaki Miyamoto: "Fuzzy Clustering of Data with Interval Uncertainties", Journal of Japan Society for Fuzzy Theory and Systems, Vol. 12, No. 5, pp. … (2000) (in Japanese)