Published by Isabella Lester. Modified over 9 years ago.
1
Clustering Algorithms Meta Applier (CAMA) Toolbox
Dmitry S. Shalymov, Kirill S. Skrygan, Dmitry A. Lyubimov
2
Clustering
Goals
– To detect the underlying structure in data
– To reduce the size of the data set
– To extract unique objects
Usage
– Data mining
– Machine learning
– Financial mathematics
– Optimization
– Statistics
– Pattern recognition
– Control strategies development
SYRCoSE’09
3
Clustering Problem
Clustering and Classification
4
Variety of Clustering Algorithms
Hierarchical
– Agglomerative
– Partitioning
Iterative
– Hard (K-means, SVM, SPSA)
– Fuzzy (FCM)
Important parameters
– Distance norm
– Number of clusters
– Initial values of cluster centers
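To make the three parameters listed above concrete, here is a minimal sketch of a hard iterative algorithm in the K-means style. It is an illustration only, not the code behind CAMA: the function names `minkowski` and `kmeans` and their arguments are hypothetical. The distance norm is the Minkowski order `p`, the number of clusters is the length of `init_centers`, and the initial cluster centers are passed in explicitly.

```python
def minkowski(a, b, p=2.0):
    """Minkowski distance of order p between two points (p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def kmeans(points, init_centers, p=2.0, max_iter=100):
    """Hard clustering in the K-means style (illustrative sketch).

    points       -- list of equal-length numeric tuples
    init_centers -- initial cluster centers; their count is the number of clusters
    p            -- order of the distance norm used in the assignment step
    """
    centers = [list(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda k: minkowski(x, centers[k], p))
            for x in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        # The mean is the exact minimizer only for squared Euclidean
        # distance; for other p it is used here as a simple heuristic.
        for k in range(len(centers)):
            members = [x for x, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, labels = kmeans(points, init_centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers, labels)
```

Changing `init_centers` or `p` in the call above is a quick way to see how sensitive the result is to the initial values and to the chosen norm.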
5
Cluster Stability Algorithms
– Indexes
– Stability (similarity, merit) functions
– Probabilistic measures assessing the likelihood of a decision
– Density estimation approaches
6
Stochastic Approximation
– Recursive stochastic approximation
– FDSA (finite-difference stochastic approximation)
– SPSA (simultaneous perturbation stochastic approximation)
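As an illustration of the SPSA idea named above, the following is a minimal sketch of the textbook recursion, not the authors' implementation. The function name `spsa_minimize`, the gain constants, and the quadratic test loss are all assumptions chosen for the example. The point it shows: SPSA perturbs all coordinates at once with a random ±1 vector and needs only two loss evaluations per iteration, whereas FDSA perturbs one coordinate at a time and needs two evaluations per coordinate.

```python
import random


def spsa_minimize(loss, theta0, n_iter=500, a=0.1, c=0.1,
                  alpha=0.602, gamma=0.101, seed=0):
    """Minimize `loss` with a basic SPSA recursion (illustrative sketch).

    a, c, alpha, gamma define the decaying gain sequences
    a_n = a / (n + 1)**alpha and c_n = c / (n + 1)**gamma.
    These default values are assumptions, not tuned settings.
    """
    rng = random.Random(seed)
    theta = list(theta0)
    for n in range(n_iter):
        a_n = a / (n + 1) ** alpha
        c_n = c / (n + 1) ** gamma
        # Simultaneous perturbation: one random +/-1 entry per coordinate.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_n * d for t, d in zip(theta, delta)]
        minus = [t - c_n * d for t, d in zip(theta, delta)]
        # Only two loss evaluations, regardless of the dimension.
        diff = loss(plus) - loss(minus)
        # Gradient estimate for coordinate i is diff / (2 * c_n * delta_i).
        theta = [t - a_n * diff / (2.0 * c_n * d) for t, d in zip(theta, delta)]
    return theta


def quadratic(theta, target=(1.0, -2.0)):
    """Toy loss with its minimum at `target`."""
    return sum((t - g) ** 2 for t, g in zip(theta, target))


estimate = spsa_minimize(quadratic, theta0=[0.0, 0.0])
print(estimate, quadratic(estimate))
```

In a clustering setting the loss would be a distortion measure over the cluster centers; the toy quadratic is used here only to keep the sketch self-contained.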
8
Effectiveness of SPSA
9
Finding the number of clusters in a data set
– Run the SPSA algorithm for different numbers of clusters, K, and calculate the corresponding distortions
– Select a transformation power, Y
– Calculate the “jumps” in transformed distortion
– Estimate the number of clusters in the data set by
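The steps above can be sketched as follows. The final estimation formula is not preserved in this transcript, so the sketch assumes the usual jump-method rule: transform each distortion d_K to d_K^(-Y), take the jump as the difference between consecutive transformed values (with the value for K = 0 set to 0), and pick the K with the largest jump. The function name `estimate_num_clusters` and the sample distortions are hypothetical; in practice the distortions would come from running the clustering algorithm for each K.

```python
def estimate_num_clusters(distortions, power):
    """Estimate the number of clusters from per-K distortions.

    distortions -- list where distortions[i] is the (positive) distortion
                   obtained with K = i + 1 clusters
    power       -- the transformation power Y
    Returns (best_k, jumps), where jumps[i] is the jump at K = i + 1.
    Assumes the largest-jump rule described in the lead-in.
    """
    # Transformed distortions, with the K = 0 value defined as 0.
    transformed = [0.0] + [d ** (-power) for d in distortions]
    # Jump at K is the increase in transformed distortion from K - 1 to K.
    jumps = [transformed[k] - transformed[k - 1]
             for k in range(1, len(transformed))]
    # The estimate is the K whose jump is largest.
    best_k = max(range(1, len(jumps) + 1), key=lambda k: jumps[k - 1])
    return best_k, jumps


# Made-up distortions for K = 1..5 with a sharp drop at K = 3.
sample = [10.0, 6.0, 1.0, 0.9, 0.85]
best_k, jumps = estimate_num_clusters(sample, power=1.0)
print(best_k)
```

With the made-up values above, the sharp drop in distortion at K = 3 produces the largest jump, so the sketch returns 3.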
10
Detection of data set structure
11
Examples
– Iris (3 clusters, 4 features, 150 instances)
– Wine (3 clusters, 13 features, 178 instances)
– Breast Cancer (2 clusters, 32 features, 569 instances)
– Image Segmentation (7 clusters, 19 features, 2310 instances)
12
Software Tools for Clustering Analysis
Research
– COMPACT
– DCPR (Data Clustering & Pattern Recognition)
– FCDA (Fuzzy Clustering and Data Analysis Toolbox)
– ClusterPack Matlab Toolbox
– The Curve Clustering Toolbox
– SOM (Self-Organizing Map)
– Spectral Clustering Toolbox
– Yashil's FCM Clustering
Licensed software
– SPSS
– STATISTICA
Characteristics
– Visualization
– Effectiveness analysis with patterns
– Tools to check performance
Shortcomings
– Limited number of data sets and algorithms
– No way to load one's own algorithm
– No on-line services
– MATLAB
13
Clustering Algorithms Meta Applier
14
Clustering Algorithms Meta Applier
15
CAMA. Kernel
16
CAMA. Kernel
17
CAMA Toolbox
http://ancient.punklan.net:8084/CAMA2/index.jsp
18
CAMA Toolbox
19
CAMA Toolbox
20
Thank you!