
MS Clustering Chapters15_to_17_Part5

What is it
- Clustering is the classification of objects into different groups or, more precisely, the partitioning of a data set into subsets (clusters) so that the data in each subset (ideally) share some common trait, often proximity according to some defined distance measure.

We have been doing it
- We have always grouped people, cars, etc.
- We are just not very good at it when there are too many items to keep track of.
- Experts can track five or six dimensions; a data set may have many times that number.
- We most likely see only the obvious groups.
- It is difficult for us to see the hidden ones, or the ones formed by combinations of attributes.

An Example
- You can group the customers of a bike store into several groups based on:
  - Gender
  - Income
  - Age
  - etc.
- There may be other attributes as well, such as whether they play games.

Principles of Clustering
- Guessing and lying (MS)
- Setting clusters
- Training with data
- Calibrating your clusters
- Training again
- Repeating until converged or going nowhere
- The clustering methodology is very sensitive to the starting points and can converge to local solutions that may not be the optimal global solution.
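The set/train/recalibrate/repeat loop above can be sketched as plain K-means. This is a minimal sketch assuming numeric points and Euclidean distance, not Microsoft's implementation; `assign`, `kmeans` and `sse` are hypothetical helper names. The usage lines run the same six points from two different starting points to show how a poor start converges to a worse local solution.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def assign(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Hard-assign each point to the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points]


def kmeans(points: Sequence[Point], seeds: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Set clusters, train, recalibrate, repeat until no case switches."""
    centroids = [tuple(s) for s in seeds]       # step 1: set (guess) the clusters
    labels: List[int] = []
    for _ in range(max_iter):                   # give up if it is "going nowhere"
        new_labels = assign(points, centroids)  # step 2: train with the data
        if new_labels == labels:                # converged: no case switched
            break
        labels = new_labels
        for k in range(len(centroids)):         # step 3: recalibrate each cluster
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:                         # an empty cluster keeps its old centre
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


def sse(points: Sequence[Point], centroids: Sequence[Point],
        labels: Sequence[int]) -> float:
    """Sum of squared distances to the assigned centroid (lower is better)."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Usage: same data, two different starting points.
data = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]
good, good_labels = kmeans(data, [(0.0,), (10.0,), (20.0,)])
bad, bad_labels = kmeans(data, [(0.0,), (1.0,), (15.0,)])
print(sse(data, good, good_labels))  # 1.5
print(sse(data, bad, bad_labels))    # 101.0: stuck in a worse local solution
```

Both runs stop because no case switches clusters, yet only the first reaches the obvious three groups; in practice this is why the starting points (the seed) matter.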

Soft and hard clustering
- One case, one cluster: hard
- One case, several clusters: soft
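The difference can be shown for a single case, given its distances to each cluster centre. This is a minimal sketch: the soft version assumes equally weighted spherical Gaussian clusters sharing one `sigma` (an EM-style membership), and `hard_membership` and `soft_membership` are hypothetical names, not part of any Microsoft API.

```python
import math
from typing import List, Sequence


def hard_membership(distances: Sequence[float]) -> List[float]:
    """Hard clustering: the case belongs entirely to its nearest cluster."""
    best = min(range(len(distances)), key=lambda k: distances[k])
    return [1.0 if k == best else 0.0 for k in range(len(distances))]


def soft_membership(distances: Sequence[float], sigma: float = 1.0) -> List[float]:
    """Soft clustering: a membership probability for every cluster.

    Assumes equally weighted spherical Gaussian clusters sharing one sigma, so
    membership is proportional to exp(-d**2 / (2 * sigma**2)).
    """
    d2_min = min(d * d for d in distances)  # subtract the smallest term to avoid underflow
    weights = [math.exp(-(d * d - d2_min) / (2 * sigma * sigma)) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


# Usage: a case at distance 1.0 from cluster A and 1.2 from cluster B.
print(hard_membership([1.0, 1.2]))                          # [1.0, 0.0]
print([round(m, 2) for m in soft_membership([1.0, 1.2])])  # [0.55, 0.45]
```

A case sitting almost halfway between two clusters is forced entirely into one of them by hard clustering, while soft clustering records that it belongs to both.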

Scalable clustering
- Ideally, data points that will not change their cluster do not need to be considered again.
- In MS' implementation, the algorithm reads the first 50,000 cases. If the model has not converged, it processes the next 50,000, rather than reading in and processing all 100,000 at once.
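The chunked idea can be illustrated with a deliberately simplified sketch; it is not Microsoft's actual scalable algorithm. Assumptions: cases are numeric points, each case read is folded into per-cluster counts and coordinate sums so it is never revisited, and reading stops once the next chunk leaves the centroids unchanged. `scalable_kmeans` is a hypothetical name; `sample_size` mirrors the 50,000-case chunk described above.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def scalable_kmeans(data: Sequence[Point], seeds: Sequence[Point],
                    sample_size: int = 50_000, tolerance: float = 1e-6
                    ) -> Tuple[List[Point], int]:
    """Read the data one chunk at a time; stop early once the model is stable.

    Returns the centroids and the number of cases actually read.
    """
    k, dim = len(seeds), len(seeds[0])
    centroids = [tuple(s) for s in seeds]
    counts = [0] * k                        # cases absorbed by each cluster
    sums = [[0.0] * dim for _ in range(k)]  # per-cluster coordinate sums
    read = 0
    for start in range(0, len(data), sample_size):
        chunk = data[start:start + sample_size]  # read the next chunk of cases
        read += len(chunk)
        for p in chunk:
            # Fold the case into its nearest cluster's summary; it is not kept.
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            counts[j] += 1
            for d in range(dim):
                sums[j][d] += p[d]
        new = [tuple(s / counts[j] for s in sums[j]) if counts[j] else centroids[j]
               for j in range(k)]
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift <= tolerance:  # converged: this chunk did not move any cluster
            break
    return centroids, read


# Usage: 16 cases made of a repeating pattern, read 4 at a time.
data = [(0.0,), (1.0,), (10.0,), (11.0,)] * 4
print(scalable_kmeans(data, [(0.0,), (10.0,)], sample_size=4))
# ([(0.5,), (10.5,)], 8)
```

Only half of the cases are read: the second chunk confirms the model built from the first, so the remaining chunks are skipped.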

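A few interesting parameters
- CLUSTERING_METHOD: which method to use, 1 to 4 (1 = scalable EM, 2 = non-scalable EM, 3 = scalable K-means, 4 = non-scalable K-means)
- CLUSTER_COUNT: the number of clusters to find; 0 makes the algorithm guess a good number
- MINIMUM_SUPPORT: the minimum number of cases a cluster needs; a cluster with fewer cases is considered empty
- STOPPING_TOLERANCE: the convergence threshold, stated in terms of how many cases switch clusters between passes
- SAMPLE_SIZE: the number of cases read per pass in scalable clustering
- CLUSTER_SEED: the random seed that determines where the initial clusters are placed
- MAXIMUM_INPUT_ATTRIBUTES: the number of input attributes allowed before automatic feature selection kicks in; automatic feature selection then keeps the most popular attributes
- MAXIMUM_STATES: the maximum number of possible values (states) an attribute may have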
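These parameters are passed in the `USING Microsoft_Clustering (NAME = value, ...)` clause at the end of a DMX `CREATE MINING MODEL` statement. The helper below is a hypothetical sketch that renders that clause and checks the names against the parameters listed on this slide (the algorithm accepts a few others that are not covered here); `using_clause` and the two constants are illustrative names, not part of any Microsoft library.

```python
from typing import Dict

# Values of CLUSTERING_METHOD and the method each one selects.
CLUSTERING_METHODS = {
    1: "Scalable EM",
    2: "Non-scalable EM",
    3: "Scalable K-means",
    4: "Non-scalable K-means",
}

# Only the parameters discussed on this slide; the algorithm accepts others too.
SLIDE_PARAMETERS = {
    "CLUSTERING_METHOD", "CLUSTER_COUNT", "MINIMUM_SUPPORT", "STOPPING_TOLERANCE",
    "SAMPLE_SIZE", "CLUSTER_SEED", "MAXIMUM_INPUT_ATTRIBUTES", "MAXIMUM_STATES",
}


def using_clause(params: Dict[str, int]) -> str:
    """Render the USING clause of a DMX CREATE MINING MODEL statement."""
    for name in params:
        if name not in SLIDE_PARAMETERS:
            raise ValueError(f"not one of the slide's parameters: {name}")
    method = params.get("CLUSTERING_METHOD")
    if method is not None and method not in CLUSTERING_METHODS:
        raise ValueError("CLUSTERING_METHOD must be 1, 2, 3 or 4")
    if not params:
        return "USING Microsoft_Clustering"
    args = ", ".join(f"{name} = {value}" for name, value in params.items())
    return f"USING Microsoft_Clustering ({args})"


# Usage: scalable K-means, and let the algorithm guess the number of clusters.
print(using_clause({"CLUSTERING_METHOD": 3, "CLUSTER_COUNT": 0}))
# USING Microsoft_Clustering (CLUSTERING_METHOD = 3, CLUSTER_COUNT = 0)
```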

Understanding the Results
- Comprehending the results can be difficult because you have to look at them from several directions:
  - A high-level overview
  - Looking into a cluster
  - Determining how a cluster differs from a nearby one

High-level overview
- Cluster Profiles view: a lot of information at once
- Gives some sense of who/what is in each cluster

High-level overview
- Cluster Diagram view
- Gives some sense of the relationships among clusters

Look into a cluster
- The Cluster Characteristics view
- Shows the attributes that go together
- Note that an attribute may rank high simply because it ranks high in all clusters. In that case, it is not very interesting.
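The caveat can be made concrete by putting each value's probability inside the cluster next to its probability over all cases. This is a hypothetical sketch of the idea, not how the viewer computes its list; `characteristics` and the two-attribute customer records are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Case = Dict[str, str]


def characteristics(cases: Sequence[Case], labels: Sequence[int],
                    cluster: int) -> List[Tuple[str, str, float, float, float]]:
    """List (attribute, value, P(value | cluster), P(value), lift) for one cluster.

    Rows are sorted by the in-cluster probability, highest first; lift is the
    ratio of the two probabilities, so 1.0 means "no different from everyone".
    """
    inside = [c for c, lab in zip(cases, labels) if lab == cluster]
    rows = []
    for attr in cases[0]:
        in_counts = Counter(c[attr] for c in inside)
        all_counts = Counter(c[attr] for c in cases)
        for value, n in in_counts.items():
            p_in = n / len(inside)
            p_all = all_counts[value] / len(cases)
            rows.append((attr, value, p_in, p_all, p_in / p_all))
    rows.sort(key=lambda r: r[2], reverse=True)
    return rows


# Usage: hypothetical bike-store customers already assigned to clusters 0 and 1.
customers = [
    {"Income": "High", "Country": "US"},
    {"Income": "High", "Country": "US"},
    {"Income": "Low", "Country": "US"},
    {"Income": "Low", "Country": "US"},
]
for row in characteristics(customers, [0, 0, 1, 1], cluster=0):
    print(row)
# ('Income', 'High', 1.0, 0.5, 2.0)
# ('Country', 'US', 1.0, 1.0, 1.0)
```

Both values top the list for cluster 0, but Country = US has a lift of 1.0 because every customer shares it, so it says nothing about this cluster in particular.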

Cluster Characteristics view

Look outside a cluster
- Cluster Discrimination view: compares a cluster with its complement (all cases outside the cluster)
- Shows which attributes are important for telling them apart
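Comparing a cluster with its complement can be sketched as a per-value probability difference. This is a hypothetical illustration of the idea, not the viewer's actual scoring; `discrimination` and the sample customer records are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Case = Dict[str, str]


def discrimination(cases: Sequence[Case], labels: Sequence[int],
                   cluster: int) -> List[Tuple[str, str, float]]:
    """Score (attribute, value) by P(value | cluster) - P(value | complement).

    Positive scores favour the cluster, negative scores favour its complement
    (every case outside the cluster); rows are sorted by absolute score.
    """
    inside = [c for c, lab in zip(cases, labels) if lab == cluster]
    outside = [c for c, lab in zip(cases, labels) if lab != cluster]
    if not inside or not outside:
        raise ValueError("both the cluster and its complement must be non-empty")
    rows = []
    for attr in cases[0]:
        in_counts = Counter(c[attr] for c in inside)
        out_counts = Counter(c[attr] for c in outside)
        for value in sorted(set(in_counts) | set(out_counts)):
            score = in_counts[value] / len(inside) - out_counts[value] / len(outside)
            rows.append((attr, value, score))
    rows.sort(key=lambda r: abs(r[2]), reverse=True)
    return rows


# Usage: hypothetical bike-store customers already assigned to clusters 0 and 1.
customers = [
    {"Income": "High", "Country": "US"},
    {"Income": "High", "Country": "US"},
    {"Income": "Low", "Country": "US"},
    {"Income": "Low", "Country": "US"},
]
for row in discrimination(customers, [0, 0, 1, 1], cluster=0):
    print(row)
# ('Income', 'High', 1.0)
# ('Income', 'Low', -1.0)
# ('Country', 'US', 0.0)
```

An attribute value shared by every case, like Country = US here, scores zero and sinks to the bottom, while Income separates the cluster from its complement.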