Hierarchical and K-means Clustering

Repeatedly merge the “closest” clusters to build a tree

End up with a large binary tree (dendrogram)

Hierarchical Clustering
How to measure “closeness”? Each cluster is a set of points, so it is not clear how to define the distance between two clusters

Measuring Closeness Complete-Linkage: use the greatest distance from any member of one cluster to any member of the other cluster. Compute the distance for all pairs of members and take the maximum

Complete Linkage Similarity of two clusters based on their least similar members

Complete Linkage - Crowding

How it should be clustered

Figure: points p1, p2, p3, p4, p5, some already clustered. Result?

Max distances: p1 p2 3 p1 p5 7 p2 p5 4. Merge at the min of these max distances

Measuring Closeness Average-Linkage: use the average distance from any member of one cluster to any member of the other cluster. Compute the distance for all pairs of members and take the average:
d(G, H) = (1 / (N_G * N_H)) * sum over i in G, j in H of d(i, j)
where N_G, N_H = number of points in G, H

Average Linkage (Not all lines shown) Similarity of two clusters based on average similarity of members
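
The two linkage measures can be sketched in a few lines of Python. This is only an illustration: it assumes points are coordinate tuples, that Euclidean distance is the underlying point-to-point distance (the slides do not say which distance is used), and the clusters G and H are made-up values.

```python
from itertools import product
from math import dist  # Euclidean distance, available in Python 3.8+


def complete_linkage(G, H):
    """Greatest distance between any member of G and any member of H."""
    return max(dist(g, h) for g, h in product(G, H))


def average_linkage(G, H):
    """Average of all pairwise distances between members of G and H."""
    total = sum(dist(g, h) for g, h in product(G, H))
    return total / (len(G) * len(H))  # divide by N_G * N_H


# Hypothetical 2-D clusters, for illustration only.
G = [(0.0, 0.0), (0.0, 1.0)]
H = [(3.0, 0.0), (4.0, 0.0)]

print(complete_linkage(G, H))
print(average_linkage(G, H))
```

Complete linkage is driven by the single farthest pair, while average linkage uses every pair, so one distant point moves it less.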

Merge the two closest clusters
New distance?

dist(X to MI/TO) = min{dist(X to MI), dist(X to TO)}
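
The min rule above is the single-linkage update. A minimal sketch of applying it to a distance table after one merge follows; the table layout (a dict of dicts), the function name, the extra labels X and Y, and all distance values are assumptions for illustration, not the numbers from the slides.

```python
def merge_single_link(D, a, b):
    """Return a new distance table with clusters a and b merged.

    D is a symmetric dict of dicts: D[x][y] is the distance from x to y.
    The distance from any other cluster x to the merged cluster is
    min(D[x][a], D[x][b]).
    """
    merged = f"{a}/{b}"
    others = [x for x in D if x not in (a, b)]
    to_merged = {x: min(D[x][a], D[x][b]) for x in others}

    # Copy the rows of the untouched clusters, dropping columns a and b.
    result = {x: {y: D[x][y] for y in others if y != x} for x in others}
    for x in others:
        result[x][merged] = to_merged[x]
    result[merged] = dict(to_merged)
    return result


# Hypothetical distances, for illustration only.
D = {
    "MI": {"TO": 1, "X": 5, "Y": 9},
    "TO": {"MI": 1, "X": 4, "Y": 10},
    "X": {"MI": 5, "TO": 4, "Y": 3},
    "Y": {"MI": 9, "TO": 10, "X": 3},
}
D2 = merge_single_link(D, "MI", "TO")
print(D2["X"]["MI/TO"])  # min(5, 4)
```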

Recompute distances. Next merge?

One more merge

Final Dendrogram (Tree/Hierarchy)
How to split into clusters? Ex.: want 3 clusters
For k clusters, cut the k-1 longest links
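
One way to sketch the "cut the k-1 longest links" rule: record each merge as a link with its distance, drop the k-1 longest, and group whatever stays connected. The representation of a merge as (distance, i, j), where i and j are any two points taken from the two merged clusters, is an assumption of this sketch, as are the example values.

```python
def cut_dendrogram(n, merges, k):
    """Split n points (numbered 0..n-1) into k clusters.

    merges: list of (distance, i, j), one entry per merge in the tree.
    The k-1 longest links are dropped; the rest are replayed with a
    small union-find structure.
    """
    keep = sorted(merges)[: len(merges) - (k - 1)]
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for _, i, j in keep:
        parent[find(i)] = find(j)

    groups = {}
    for p in range(n):
        groups.setdefault(find(p), []).append(p)
    return sorted(groups.values())


# Hypothetical tree over 5 points, for illustration only.
merges = [(1.0, 0, 1), (1.5, 3, 4), (4.0, 1, 2), (9.0, 2, 3)]
print(cut_dendrogram(5, merges, 3))
```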

Intuitively what does this mean?

These are the 3 groups of cities that are closest to each other.

Hierarchical Clustering Algorithm
Compute distance matrix
Let each example be its own cluster
while (# clusters > 1):
    Merge the two closest clusters
    Update distance matrix
The full tree yields a clustering for any k (for example k=2, k=3, k=4)

Run Time? (Simple implementation) Assume n data points, d dimensions
Using min-heap: deletions take O(log n^2)
n - 1 iterations
First iteration: compute distance between all pairs: O(n^2 d)
All other iterations: compute distance between most recently created cluster and all other clusters: O(nd)
Total: O(n^2 d) + (n - 1) * O(nd) = O(n^2 d)
Slow (does not scale well)
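
A minimal sketch of the merge loop, assuming Euclidean distance and coordinate tuples. To stay short it recomputes cluster distances on every pass instead of maintaining the distance matrix and heap described above, so it is slower than the run time quoted on the slide. The function name, the `k` stopping parameter, the `linkage` parameter, and the sample points are all assumptions for illustration.

```python
from itertools import combinations
from math import dist  # Python 3.8+


def agglomerative(points, k=1, linkage=min):
    """Naive agglomerative clustering.

    linkage=min gives single linkage, linkage=max gives complete linkage.
    Returns (clusters, merges); clusters hold indices into points.
    """
    clusters = [[i] for i in range(len(points))]  # each example on its own
    merges = []

    def cluster_dist(a, b):
        return linkage(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        # Find the two closest clusters (a < b by construction).
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: cluster_dist(clusters[ab[0]], clusters[ab[1]]),
        )
        merges.append((clusters[a], clusters[b]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges


# Hypothetical points, for illustration only.
points = [(0, 0), (0, 1), (5, 0), (5, 1), (11, 0)]
clusters, merges = agglomerative(points, k=3)
print(clusters)
```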

Hierarchical Clustering Pros/Cons
Pros:
    Simple
    Do not need to know k ahead of time
Cons:
    Sensitive to noise
    Slow: does not scale well
    Does not “learn”: can’t undo any steps

K-Means Clustering Input: Data points k (number of desired clusters)

K-Means Clustering Choose k random points (“means”)

K-Means Clustering Assign each data point to closest mean.
How can we adjust the means to be closer to their data points?

K-Means Clustering Adjust each mean to be the center of its data points. Reassign data points to closest means.

K-Means Clustering Repeat until means no longer move.
(Convergence is guaranteed, but possibly to a local optimum that depends on the initial means)

K-Means Algorithm Assume n data points, k clusters desired
Choose k random points as means
Repeat until means no longer move:
    Assign each data point to closest mean
    Move each mean to center of cluster of points that are assigned to it

Run Time? (Simple implementation) Assume n data points, d dimensions, i iterations
Assignment step: O(ndk) per iteration
Update step: at most O(knd) per iteration
Total: O(i * 2ndk) = O(indk)
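
The loop can be sketched in plain Python as follows. Several choices here are assumptions rather than anything the slides specify: the initial means are sampled from the data points themselves, distance is Euclidean, an empty cluster keeps its old mean, and `max_iter` caps the number of passes. The optional `means` argument exists only so the example is reproducible.

```python
import random
from math import dist  # Python 3.8+


def kmeans(points, k, means=None, seed=0, max_iter=100):
    """Simple k-means; returns (means, clusters)."""
    rng = random.Random(seed)
    means = list(means) if means is not None else rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest mean, O(ndk).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, means[c]))
            clusters[nearest].append(p)
        # Update step: move each mean to the center of its points.
        new_means = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else means[c]
            for c, cl in enumerate(clusters)
        ]
        if new_means == means:  # means no longer move
            break
        means = new_means
    return means, clusters


# Hypothetical data, for illustration only.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
means, clusters = kmeans(points, k=2, means=[(0, 0), (10, 10)])
print(means)
```

Different starting means can give different final clusters, so in practice the algorithm is often run several times with different seeds.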

K-Means in action (k=3)

K-Means in action (k=3) All means shifted down

K-Means in action (k=3) Red points are converging

K-Means Pros/Cons
Pros: Efficient
Cons: Some knowledge of k is required

Knowledge in AI

Agent diagram: the agent senses the environment through sensors and acts on it through actuators. What goes inside the agent?

Algorithms: Search, CSP, Probabilistic Inference, Learning + Knowledge Base

Recall 8-Puzzle
Board tiles in row-major order: 1 2 3 4 5 6 8 7
Turns out, some puzzles are not solvable
n x n board
Inversions: number of pairs of tiles i, j such that i < j but i appears after j in row-major order
If n is odd and the number of inversions is odd, then the puzzle is unsolvable
Agent should have this information
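
The inversion test is easy to check in code. In this sketch the blank is written as 0 and ignored when counting; its position in the list is an assumption, since the transcript does not show where the blank sits, but it does not affect the inversion count. The function names are illustrative.

```python
def count_inversions(tiles):
    """Count pairs of tiles that appear out of order in row-major order.

    tiles: the board flattened row by row, with 0 for the blank.
    """
    seq = [t for t in tiles if t != 0]
    return sum(
        1
        for a in range(len(seq))
        for b in range(a + 1, len(seq))
        if seq[a] > seq[b]
    )


def unsolvable_odd_board(tiles, n):
    """Apply the rule for odd n: an odd inversion count means unsolvable."""
    if n % 2 == 0:
        raise ValueError("this rule only covers boards with odd n")
    return count_inversions(tiles) % 2 == 1


board = [1, 2, 3, 4, 5, 6, 8, 7, 0]  # blank position assumed
print(count_inversions(board), unsolvable_odd_board(board, 3))
```

For the board on the slide the only out-of-order pair is 8 before 7, so the count is 1 and the puzzle is unsolvable.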

Knowledge Base Set of sentences/facts (in logical language)
    “n is odd”
    “1 is odd”
    “Number of inversions is odd”
Agent should be able to expand the knowledge base through inference:
    “Puzzle is unsolvable”

Hunt the Wumpus
Invented in the early 70s; originally command-line (think black screen with greenish text)

Wumpus World

Wumpus World
Environment:
    4x4 grid of rooms
    Agent starts in [1,1]
    Gold in a random room
    Wumpus in a different random room
    Bottomless pits in some rooms
    Wumpus can eat agent if in same room
    Agent can shoot Wumpus with arrow
Actuators:
    Move left
    Move right
    Move up
    Move down
    Grab gold
    Shoot

Wumpus World Continued
Sensors:
    Stench (adjacent square contains Wumpus)
    Breeze (adjacent square contains pit)
    Glitter (this square contains gold)
    Scream (Wumpus killed)
Performance measures:
    Gold: +1000
    Death: (falling into pit or eaten by Wumpus)
    -1 per step
    -10 for using the arrow

Wumpus world environment
Fully Observable? No: unaware of environment until we explore
Deterministic (state of environment determined by current state and action)? Yes
Static?
Adversarial?

Exploring a wumpus world
Legend: A = Agent, B = Breeze, G = Glitter/Gold, P = Pit, S = Stench, W = Wumpus, OK = Safe Square
Language to represent knowledge

Breeze = no, Glitter = no, Pit = no, Stench = no, Wumpus = no
What can we infer?

Breeze = yes, Glitter = no, Pit = no, Stench = no, Wumpus = no
Grid marks: B
What can we infer? Grid marks: P? B P?

Breeze = no, Glitter = no, Pit = no, Stench = yes, Wumpus = no
Grid marks: P? B P? S
What can we infer? Grid marks: P? B P? OK S W?

Breeze = no, Glitter = no, Pit = no, Stench = no, Wumpus = no
Grid marks: P? B OK S W?

Wumpus with propositional logic
Using logic statements, we can determine all of the “safe” squares How to implement?
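
Before turning to full propositional logic, the core of the "safe square" reasoning can be sketched directly: no breeze means no adjacent pit, and no stench means no adjacent Wumpus. Everything below is an illustrative assumption: the function names, the (x, y) coordinate convention with the start at (1, 1), and the specific squares visited, since the transcript does not record which squares the agent stepped on.

```python
def neighbors(x, y, size=4):
    """Squares adjacent to (x, y) on a size x size grid, 1-indexed."""
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(a, b) for a, b in candidates if 1 <= a <= size and 1 <= b <= size]


def safe_squares(percepts, size=4):
    """Squares known to hold neither a pit nor the Wumpus.

    percepts: {(x, y): {"breeze": bool, "stench": bool}} for visited squares.
    Visited squares count as safe because the agent survived them.
    """
    no_pit = set(percepts)
    no_wumpus = set(percepts)
    for square, seen in percepts.items():
        if not seen["breeze"]:
            no_pit.update(neighbors(*square, size=size))
        if not seen["stench"]:
            no_wumpus.update(neighbors(*square, size=size))
    return no_pit & no_wumpus


# Assumed visit order: nothing at the start, a breeze, then a stench.
percepts = {
    (1, 1): {"breeze": False, "stench": False},
    (2, 1): {"breeze": True, "stench": False},
    (1, 2): {"breeze": False, "stench": True},
}
print(sorted(safe_squares(percepts)))
```

Under these assumed percepts, (2, 2) comes out safe even though it was never visited: the square with no breeze rules out a pit there, and the square with no stench rules out the Wumpus.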

Propositional logic
Syntax: Defines what makes a valid statement:
    Statements are constructed from propositions
    A proposition can be either true or false
    Proposition made up of symbols and connectives
Semantics: Rules for determining the truth of a statement <Later>

Propositional Logic - Syntax
Symbols: each represents a proposition that can be true or false
    ex. Breeze in [2, 1] → B2,1; Pit in [2, 2] → P2,2
    ex. n is Odd → n_odd
Connectives: proposition operators
    Negation: not, ¬, ~
    Conjunction: and, ∧
    Disjunction: or, ∨
    Implication: implies, =>
    Biconditional: iff, <=>

Propositional logic
Semantics: Rules for determining the truth of a statement
    Truth table
    Rules of logic

Propositional Logic Semantics
Some Rules of Logic:
    Modus Ponens: from P => Q and P, can derive Q
    De Morgan’s: from ¬(A ∧ B), can derive ¬A ∨ ¬B
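
Both rules can be checked against a truth table by enumerating every assignment. This sketch assumes the De Morgan form not (A and B) is equivalent to (not A) or (not B); the helper `implies` is an illustrative name.

```python
from itertools import product


def implies(p, q):
    """Truth-table definition of P => Q."""
    return (not p) or q


assignments = list(product([False, True], repeat=2))

# Modus Ponens: in every row where P => Q and P both hold, Q holds.
modus_ponens_ok = all(q for p, q in assignments if implies(p, q) and p)

# De Morgan: both sides agree in every row of the truth table.
de_morgan_ok = all(
    (not (a and b)) == ((not a) or (not b)) for a, b in assignments
)

print(modus_ponens_ok, de_morgan_ok)
```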

Inference with Propositional Logic
Suppose we want to infer something: Wumpus in (2, 2)
Goal: Given initial knowledge base, use semantics to make inferences to expand knowledge and ultimately prove a proposition
Look familiar?

Inference with propositional logic
View it as a search problem:
    starting state: Initial Knowledge Base (KB)
    actions: all ways of deriving new propositions from the current KB
    result: add the new proposition to the KB
    goal: end up with the proposition we want to prove
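
The search formulation can be sketched as a breadth-first search whose states are knowledge bases. To keep it small, this sketch assumes the only action is Modus Ponens over rules of the form "premises => conclusion"; the function name, the rule encoding, and the symbol names (borrowed from the 8-puzzle example) are assumptions for illustration.

```python
from collections import deque


def prove(facts, rules, goal):
    """Search for a KB containing goal, starting from the given facts.

    rules: list of (premises, conclusion), premises being a frozenset of
    symbols. Each action applies one rule whose premises are all in the KB.
    """
    start = frozenset(facts)
    frontier = deque([start])
    seen = {start}
    while frontier:
        kb = frontier.popleft()
        if goal in kb:  # goal test
            return True
        for premises, conclusion in rules:
            if premises <= kb and conclusion not in kb:
                successor = kb | {conclusion}  # result: add the new proposition
                if successor not in seen:
                    seen.add(successor)
                    frontier.append(successor)
    return False


rules = [(frozenset({"n_odd", "inversions_odd"}), "unsolvable")]
print(prove({"n_odd", "inversions_odd"}, rules, "unsolvable"))
print(prove({"n_odd"}, rules, "unsolvable"))
```

Because a KB only ever grows, the search always terminates on a finite rule set; a richer action set (for example De Morgan rewrites) would need a richer sentence representation than bare symbols.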
