Clustering Sequential Data: Research Paper Review Presented by Glynis Hawley April 28, 2003 On the Optimal Clustering of Sequential Data by Cheng-Ru Lin.


Clustering Sequential Data: Research Paper Review
Presented by Glynis Hawley, April 28, 2003
On the Optimal Clustering of Sequential Data, by Cheng-Ru Lin and Ming-Syan Chen, Electrical Engineering Department, National Taiwan University, Taipei, Taiwan
Second SIAM International Conference on Data Mining, April 11-13,

Agenda
• Introduction: What is sequential clustering?
• Problem definition for algorithm design
• Optimal algorithm: SC OPT
• Greedy algorithm: SC GD
• Conclusion

Sequential Clustering Problem
• Both the attributes of the objects and their order in the sequence are important.
• Objects within a cluster form a continuous region of the sequence.
• An object within one cluster may be closer to the centroid of a different cluster than it is to its own centroid.
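The last point can be checked with a small numeric example. The values and cluster boundaries below are made up for illustration and do not come from the slides; the objects are assumed to be one-dimensional.

```python
def mean(xs):
    """Centroid of a one-dimensional cluster."""
    return sum(xs) / len(xs)

# Hypothetical sequence split into two contiguous clusters.
cluster_a = [0.0, 0.0, 5.0, 0.0, 0.0]   # centroid 1.0
cluster_b = [5.0, 5.0, 5.0]             # centroid 5.0

obj = cluster_a[2]                           # the value 5.0 sitting inside cluster A
dist_to_own = abs(obj - mean(cluster_a))     # distance to A's centroid
dist_to_other = abs(obj - mean(cluster_b))   # distance to B's centroid

print(dist_to_own, dist_to_other)
```

The object stays in cluster A because clusters must be contiguous in the sequence, even though it coincides with the centroid of cluster B.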

Conventional Clustering vs. Sequential Clustering

Application Areas
• Analysis of motion patterns of objects.
  – Cellular phones.
• Analysis of status logs of running machines.

Problem Definition
• Partitioning problem
  – Partition n sequential objects into k clusters.
• Dissimilarity measurement
  – Squared Euclidean distance.
• Cluster quality
  – Cost measurement: penalizes a cluster for the dissimilarity of its objects.
• The best solution minimizes the sum of the costs of all clusters.

Cost Definition
• Cost of a cluster: the sum, over all m objects in the cluster, of the squared Euclidean distance of each object from the cluster centroid.
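A minimal sketch of this cost, assuming each object is a list of numeric attributes of equal length. The function names `centroid` and `cluster_cost` are chosen here for illustration and are not from the paper.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    m = len(points)
    dim = len(points[0])
    return [sum(p[d] for p in points) / m for d in range(dim)]

def cluster_cost(points):
    """Sum of squared Euclidean distances from each point to the centroid."""
    c = centroid(points)
    return sum(
        sum((x - cx) ** 2 for x, cx in zip(p, c))
        for p in points
    )

# Three 2-D objects whose centroid is (1.0, 1.0).
print(cluster_cost([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]))
```

A cluster containing a single object has cost zero, so the cost only penalizes spread within a cluster.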

Sequential Clustering Algorithms
• Optimal Sequential Clustering Algorithm
  – SC OPT
• Greedy Sequential Clustering Algorithm
  – SC GD

Algorithm SC OPT
• Determines the optimal k-partition of a set of sequential objects.
• Uses the property of optimal substructure.
  – Systematically solves all possible subproblems.
  – Stores results to be used in later steps.

Complexity of Algorithm SC OPT
• Time: O(kn^2)
• Space: O(kn)
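The slides do not give pseudocode for SC OPT, so the following is only a sketch of a standard dynamic program with the same optimal-substructure idea and the same O(kn^2) time and O(kn) space bounds. It assumes one-dimensional objects so that prefix sums give each cluster cost in constant time; the name `sc_opt` and the table layout are illustrative assumptions, not the authors' formulation.

```python
def sc_opt(values, k):
    """Return (minimum total cost, list of (start, end) half-open cluster bounds)."""
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(values)")

    # Prefix sums of x and x^2 so any contiguous cluster cost is O(1).
    s = [0.0] * (n + 1)
    q = [0.0] * (n + 1)
    for i, x in enumerate(values):
        s[i + 1] = s[i] + x
        q[i + 1] = q[i] + x * x

    def cost(i, j):
        """Sum of squared distances to the mean for values[i:j]."""
        t = s[j] - s[i]
        return (q[j] - q[i]) - t * t / (j - i)

    inf = float("inf")
    # best[c][j]: minimum cost of splitting the first j objects into c clusters.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):      # last cluster is values[i:j]
                cand = best[c - 1][i] + cost(i, j)
                if cand < best[c][j]:
                    best[c][j] = cand
                    cut[c][j] = i

    # Walk the stored cut points backwards to recover the partition.
    bounds = []
    j = n
    for c in range(k, 0, -1):
        i = cut[c][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    return best[k][n], bounds

total, bounds = sc_opt([1, 1, 1, 10, 10, 10, 20, 20], 3)
print(total, bounds)
```

The three nested loops account for the O(kn^2) time, and the two (k+1) by (n+1) tables account for the O(kn) space.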

Algorithm SC GD
• Initially, insert separators at arbitrary positions to divide the n objects into k clusters.

Algorithm SC GD (Cont.)
• Reposition the separators by “moves” and “jumps” to reduce the cost of the clusters.
• The best possible move or jump is determined by calculating the cost reductions of all possible moves and jumps.
[Figure: separators repositioned by moves and jumps]

Algorithm SC GD (Cont.)
• Continue repositioning separators until no further cost reductions are possible.
• Complexity
  – Time: O(nl/k + n), linear
  – Space: O(k)
• Quality of clusters increases with n and with average cluster size.
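The slides do not define moves and jumps precisely, so the sketch below makes its own assumptions: a "move" shifts a separator to an adjacent position, a "jump" relocates it to any other free position, and both are evaluated uniformly as relocations. It assumes one-dimensional objects and evenly spaced initial separators. It is a naive illustration of the greedy idea and does not reproduce the time or space bounds stated above; the name `sc_gd` is hypothetical.

```python
from itertools import accumulate

def sc_gd(values, k):
    """Greedy separator repositioning; returns (total cost, cluster bounds)."""
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(values)")

    # Prefix sums of x and x^2 for O(1) cluster costs.
    s = [0.0] + list(accumulate(values))
    q = [0.0] + list(accumulate(x * x for x in values))

    def cost(i, j):
        t = s[j] - s[i]
        return (q[j] - q[i]) - t * t / (j - i)

    def total(cuts):
        edges = [0] + cuts + [n]
        return sum(cost(a, b) for a, b in zip(edges, edges[1:]))

    # Assumed starting point: k - 1 evenly spaced separators.
    cuts = [(c * n) // k for c in range(1, k)]
    current = total(cuts)

    while True:
        best_cuts, best_cost = None, current
        for idx in range(len(cuts)):
            rest = cuts[:idx] + cuts[idx + 1:]
            for p in range(1, n):          # every free position: moves and jumps
                if p in cuts:
                    continue
                cand = sorted(rest + [p])
                c = total(cand)
                if c < best_cost - 1e-12:  # keep only strict improvements
                    best_cuts, best_cost = cand, c
        if best_cuts is None:              # no relocation reduces the cost
            break
        cuts, current = best_cuts, best_cost

    edges = [0] + cuts + [n]
    return current, list(zip(edges, edges[1:]))

gd_total, gd_bounds = sc_gd([1, 1, 1, 10, 10, 10, 20, 20], 3)
print(gd_total, gd_bounds)
```

Because every accepted step strictly lowers the total cost and there are finitely many separator placements, the loop terminates, though in general only at a local optimum.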

Conclusion
• Sequential clustering requires that the sequence of data points be considered as well as the similarity of attributes.
• Algorithms:
  – SC OPT and SC GD
  – SC GD approaches SC OPT in terms of quality of clusters when average cluster sizes are large.