Prepared by: Mahmoud Rafeek Al-Farra, 2013. College of Science & Technology, Dept. of Computer Science & IT, BSc of Information Technology. Data Mining, Chapter 6: Clustering Methods. www.cst.ps/staff/mfarra
Course Outline
- Introduction
- Data Preparation and Preprocessing
- Data Representation
- Classification Methods
- Evaluation
- Clustering Methods
- Mid Exam
- Association Rules
- Knowledge Representation
- Special Case Study: Document Clustering
- Discussion of Case Studies by Students
Outline
- Definition of Clustering
- Why clustering?
- Where to use clustering?
- Next: Types of Data in Cluster Analysis
- Next: A Categorization of Major Clustering Methods
Definition of Clustering
Clustering is often considered the most important unsupervised learning technique. Like every other problem of this kind, it deals with finding structure in a collection of unlabeled data. Clustering is “the process of organizing objects into groups whose members are similar in some way”. A cluster is therefore a collection of objects that are “similar” to one another and “dissimilar” to the objects belonging to other clusters.
Definition of Clustering
- Cluster: a collection of data objects
  - Similar to one another within the same cluster
  - Dissimilar to the objects in other clusters
- Cluster analysis: grouping a set of data objects into clusters
- Clustering is unsupervised classification: no predefined classes
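The definition above can be illustrated with a small sketch. The slides do not name a specific algorithm, so k-means is used here only as one well-known example; the sample points, the starting centers, and the helper names (`kmeans`, `max_within`, `min_between`) are assumptions made for illustration. No class labels are supplied, and the resulting groups are checked against the definition: objects within a cluster are closer to one another than to objects in the other cluster.

```python
from math import dist  # Euclidean distance, Python 3.8+

def kmeans(points, centers, iterations=10):
    """Tiny k-means: assign each point to its nearest center, then recompute centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster; keep the old
        # center if a cluster happens to be empty.
        centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return clusters, centers

def max_within(cluster):
    """Largest distance between two objects of the same cluster."""
    return max((dist(a, b) for a in cluster for b in cluster), default=0.0)

def min_between(c1, c2):
    """Smallest distance between objects of two different clusters."""
    return min(dist(a, b) for a in c1 for b in c2)

# Unlabeled toy data (hypothetical): two visually separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
clusters, centers = kmeans(points, centers=[points[0], points[3]])

for cluster in clusters:
    print(cluster)
print("similar within, dissimilar between:",
      max(max_within(c) for c in clusters) < min_between(clusters[0], clusters[1]))
```

On data this cleanly separated, any reasonable choice of starting centers should recover the same two groups; real data is rarely this tidy.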
Learning
Why clustering?
- Simplifications
- Pattern detection
- Useful in data concept construction
- Unsupervised learning process
Where to use clustering?
- Data mining
- Information retrieval
- Text mining
- Web analysis
- Marketing
- Medical diagnostics
Which method should I use?
- Type of attributes in the data
- Scalability to larger datasets
- Ability to work with irregular data
- Time cost (complexity)
- Data order dependency
- Result presentation
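Data order dependency, one of the criteria listed above, can be made concrete with a short sketch. The single-pass "leader" scheme below is not taken from the slides; it is chosen only because it is a simple method whose result depends on the order in which objects arrive. The function name `leader_clustering`, the sample values, and the threshold are assumptions for illustration.

```python
def leader_clustering(values, threshold):
    """Single pass: join the first cluster whose leader is within threshold,
    otherwise start a new cluster with this value as its leader."""
    clusters = []  # each cluster is a list; its first element is the leader
    for v in values:
        for cluster in clusters:
            if abs(v - cluster[0]) <= threshold:
                cluster.append(v)
                break
        else:
            # No existing leader was close enough.
            clusters.append([v])
    return clusters

# The same three objects, presented in two different orders.
first = leader_clustering([0.0, 1.5, 3.0], threshold=2.0)
second = leader_clustering([1.5, 0.0, 3.0], threshold=2.0)

print(first)   # [[0.0, 1.5], [3.0]]
print(second)  # [[1.5, 0.0, 3.0]]
```

The same data yields two clusters in one order and a single cluster in the other, which is the kind of behavior to check for when a method's results must be reproducible.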
Thanks