
Recap

Be cautious: data may not lie in one blob; we may need to separate the data into groups. This is clustering.

CS 498 Probability & Statistics Clustering methods Zicheng Liao

What is clustering? "Grouping": a fundamental operation in signal processing. Also called "unsupervised classification": assign the same label to data points that are close to each other. Why?

We live in a universe full of clusters

Two (types of) clustering methods Agglomerative/Divisive clustering K-means

Agglomerative/Divisive clustering Agglomerative clustering (Bottom up) Hierarchical cluster tree Divisive clustering (Top down)

Algorithm

Agglomerative clustering: an example “merge clusters bottom up to form a hierarchical cluster tree” Animation from Georg Berber

Dendrogram
>> X = rand(6, 2);   % create 6 points on a plane
>> Z = linkage(X);   % Z encodes a tree of hierarchical clusters
>> dendrogram(Z);    % visualize Z as a dendrogram
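For readers working in Python rather than MATLAB, a roughly equivalent sketch using NumPy and SciPy (`linkage` with the single-link method, matching MATLAB's default; the data is random, as in the slide):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
X = rng.random((6, 2))            # 6 points on a plane
Z = linkage(X, method='single')   # Z encodes the hierarchical cluster tree
# Each row of Z is (cluster_i, cluster_j, merge_distance, new_cluster_size);
# merging 6 points pairwise takes 5 merges, so Z has 5 rows.
# dendrogram(Z) would draw the tree (requires matplotlib).
print(Z.shape)  # (5, 4)
```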

Distance measure

Inter-cluster distance Treat each data point as a single cluster Only need to define inter-cluster distance –Distance between one set of points and another set of points 3 popular inter-cluster distances –Single-link –Complete-link –Averaged-link

Single-link Minimum of all pairwise distances between points from the two clusters Tends to produce long, loose clusters

Complete-link Maximum of all pairwise distances between points from the two clusters Tends to produce tight clusters

Averaged-link Average of all pairwise distances between points from two clusters
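The three inter-cluster distances above can be sketched directly in NumPy (a Python stand-in for the slides' MATLAB; the example points are made up for illustration):

```python
import numpy as np

def pairwise_dists(A, B):
    # Matrix of Euclidean distances between every point of A and every point of B.
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

def single_link(A, B):   return pairwise_dists(A, B).min()   # closest pair
def complete_link(A, B): return pairwise_dists(A, B).max()   # farthest pair
def average_link(A, B):  return pairwise_dists(A, B).mean()  # average over all pairs

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[3.0, 0.0], [4.0, 0.0]])
print(single_link(A, B), complete_link(A, B), average_link(A, B))  # 2.0 4.0 3.0
```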

How many clusters are there? Intrinsically hard to know The dendrogram gives insight into it Choose a distance threshold to split the dendrogram into clusters
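A SciPy sketch of cutting the tree at a distance threshold (`fcluster` with the `distance` criterion; the two well-separated blobs are synthetic, chosen so any threshold between the within-blob and between-blob merge heights recovers them):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],   # blob 1
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])  # blob 2
Z = linkage(X, method='single')
labels = fcluster(Z, t=1.0, criterion='distance')   # cut tree at distance 1.0
print(labels)  # e.g. [1 1 1 2 2 2]: the two blobs are recovered
```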

An example do_agglomerative.m

Divisive clustering “recursively split a cluster into smaller clusters” It’s hard to choose where to split: a combinatorial problem Can be easier when the data has special structure (e.g., a pixel grid)

K-means Partition data into clusters such that: –Clusters are tight (distance to cluster center is small) –Every data point is closer to its own cluster center than to all other cluster centers (Voronoi diagram) [figures excerpted from Wikipedia]

Formulation Minimize the within-cluster sum of squares: Σ_j Σ_{x_i ∈ C_j} ||x_i − μ_j||², where μ_j is the center of cluster C_j

K-means algorithm

Illustration 1. Randomly initialize 3 cluster centers (circles) 2. Assign each point to the closest cluster center 3. Update each cluster center to the mean of its points 4. Repeat steps 2 and 3 until convergence [figures excerpted from Wikipedia]
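The steps above (Lloyd's algorithm) can be sketched in NumPy. This is a minimal illustration on synthetic blobs, not the course's do_Kmeans.m:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's algorithm: assign to nearest center, then recompute means."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # step 1: random init
    costs = []
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)                  # step 2: closest center
        costs.append((d.min(axis=1) ** 2).sum())   # sum of squared distances
        for j in range(k):                         # step 3: centers = cluster means
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels, costs

# Three synthetic Gaussian blobs.
X = np.vstack([np.random.default_rng(1).normal(m, 0.2, size=(20, 2))
               for m in ([0, 0], [3, 0], [0, 3])])
centers, labels, costs = kmeans(X, k=3)
assert all(a >= b - 1e-9 for a, b in zip(costs, costs[1:]))  # cost never increases
```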

Example do_Kmeans.m (show step-by-step updates and effects of cluster number)

Discussion K = 2? K = 3? K = 5?

Discussion May converge to a local minimum => counterintuitive clustering [figures excerpted from Wikipedia]

Discussion Favors spherical clusters; poor results for long/loose/stretched clusters [Figure: input data (color indicates true labels) vs. K-means results]

Discussion Cost is guaranteed to decrease in every step –Assigning each point to the closest cluster center minimizes the cost for the current cluster centers –Choosing the mean of each cluster as the new center minimizes the squared distance for the current assignment Terminates in a finite number of iterations (there are finitely many assignments, and the cost decreases until convergence)

Summary Clustering as grouping “similar” data together A world full of clusters/patterns Two algorithms –Agglomerative/divisive clustering: hierarchical clustering tree –K-means: vector quantization

CS 498 Probability & Statistics Regression Zicheng Liao

Example-I Predict the stock price at time t+1 from prices observed up to time t [Figure: stock price vs. time]

Example-II Fill in missing pixels in an image: inpainting

Example-III Discover relationships in data [Figure: amount of hormone vs. time in service, for devices from 3 production lots]

Example-III Discover relationships in data

Linear regression

Parameter estimation MLE of a linear model with Gaussian noise, y = θᵀx + ε with ε ~ N(0, σ²), is least squares [Carl F. Gauss, 1809] Maximizing the likelihood function is equivalent to minimizing Σᵢ (yᵢ − θᵀxᵢ)²

Parameter estimation Closed-form solution: minimizing the cost function ||y − Xθ||² gives the normal equation XᵀXθ = Xᵀy, i.e. θ = (XᵀX)⁻¹Xᵀy (expensive to compute the matrix inverse in high dimension)
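A NumPy sketch of the normal equation (using `np.linalg.solve` rather than forming the inverse explicitly; the data is synthetic and noise-free so the true parameters are recovered exactly):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.random(50)])  # intercept column + one input
true_theta = np.array([1.0, 2.0])
y = X @ true_theta                                  # noise-free for illustration
# Normal equation: (X^T X) theta = X^T y
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # recovers [1. 2.] up to floating point
```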

Gradient descent deo=02.5-LinearRegressionI-GradientDescentForLinearRegression&speed=100 (Andrew Ng) (The least-squares cost is convex, so gradient descent with a suitable step size converges to the global minimum.)
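A minimal batch gradient-descent sketch for the same least-squares cost (the learning rate and iteration count are arbitrary choices for this toy data):

```python
import numpy as np

# Cost J(theta) = ||X theta - y||^2 / (2n); gradient = X^T (X theta - y) / n.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.random(100)])
y = X @ np.array([1.0, 2.0])
theta = np.zeros(2)
lr = 0.5
for _ in range(5000):
    theta -= lr * X.T @ (X @ theta - y) / len(y)  # step downhill
print(theta)  # approaches [1. 2.]
```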

Example do_regression.m

Interpreting a regression

The residual has zero mean and zero correlation with the input (when the model includes an intercept)
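Both properties follow from the normal equation, which forces Xᵀ(y − Xθ) = 0; they can be checked numerically with a NumPy sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.random(200)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(0, 0.1, 200)  # noisy line
theta = np.linalg.solve(X.T @ X, X.T @ y)          # least-squares fit
residual = y - X @ theta
print(abs(residual.mean()))     # ~0: zero mean, since X has an intercept column
print(abs(residual @ X[:, 1]))  # ~0: uncorrelated with the input
```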

Interpreting the residual

How good is a fit?

R-squared measure –The fraction of variance explained by the regression –Used in hypothesis tests for model selection
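A small sketch of the R-squared computation (the y values are made up for illustration):

```python
import numpy as np

def r_squared(y, y_hat):
    # Fraction of variance explained: 1 - SS_res / SS_tot.
    ss_res = ((y - y_hat) ** 2).sum()       # residual sum of squares
    ss_tot = ((y - y.mean()) ** 2).sum()    # total sum of squares
    return 1.0 - ss_res / ss_tot

y     = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])
print(r_squared(y, y_hat))  # 0.98: almost all variance explained
```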

Regularized linear regression Cost: ||y − Xθ||² + λ||θ||² Closed-form solution: θ = (XᵀX + λI)⁻¹Xᵀy Gradient descent: θ ← θ − α (Xᵀ(Xθ − y) + λθ)

Why regularization? Handle small eigenvalues of XᵀX –Adding the regularizer λI avoids dividing by near-zero values when solving
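A NumPy sketch of the ridge closed form on nearly collinear inputs, illustrating how the λI term keeps the solve well-conditioned (the data is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(50)
X = np.column_stack([x, x + 1e-6 * rng.normal(size=50)])  # nearly collinear columns
y = X @ np.array([1.0, 1.0])
lam = 0.1
# Ridge closed form: theta = (X^T X + lam I)^{-1} X^T y.
# lam * I lifts the near-zero eigenvalue of X^T X.
theta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
print(np.linalg.norm(theta_ridge))  # modest coefficients despite ill-conditioning
```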

Why regularization? Avoid over-fitting: –Over fitting –Small parameters  simpler model  less prone to over-fitting Over fit: hard to generalize to new data Smaller parameters make a simpler (better) model

L1 regularization (Lasso)

How does it work?
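One way Lasso produces sparsity, not spelled out on the slide, is the soft-thresholding (proximal) operator used by coordinate-descent and proximal-gradient solvers: it shrinks every coefficient toward zero and sets small ones exactly to zero. A minimal sketch:

```python
import numpy as np

def soft_threshold(w, lam):
    # Proximal operator of lam * ||w||_1: shrink each entry by lam,
    # and set entries with |w_i| <= lam exactly to zero -> sparsity.
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

w = np.array([3.0, -0.2, 0.05, -1.5])
# Large entries shrink by lam; small entries become exactly zero.
print(soft_threshold(w, 0.5))
```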

Summary