Hierarchical and K-means Clustering
Idea: repeatedly merge the two “closest” clusters to build a tree
End up with a large binary tree (dendrogram)
Hierarchical Clustering How to measure “closeness”? Each cluster is a set of points, so it is not obvious how to define the distance between two clusters
Measuring Closeness Complete-Linkage – the distance between two clusters is the greatest distance from any member of one cluster to any member of the other: compute all pairwise point distances and take the maximum. Then merge the pair of clusters whose complete-linkage distance is smallest
Complete Linkage Similarity of two clusters based on their least similar members
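The complete-linkage distance can be written down directly. A minimal Python sketch (function and variable names are mine, not from the slides; `dist` is any point-to-point distance function):

```python
from itertools import product

def complete_linkage(cluster_a, cluster_b, dist):
    """Distance between two clusters under complete linkage:
    the greatest pairwise distance between their members."""
    return max(dist(p, q) for p, q in product(cluster_a, cluster_b))

# 1-D example with absolute difference as the point distance
d = lambda p, q: abs(p - q)
print(complete_linkage([1, 2], [8, 9], d))  # 8 (from 1 to 9)
```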
Complete Linkage - Crowding
Complete Linkage - Crowding How it should be clustered
Complete Linkage - Crowding Example: points on a number line at positions 1, 4, 5, 6, 7, 8; the points at 5, 6, 7, 8 are already clustered together
Complete Linkage - Crowding Complete-linkage (maximum) distances between the remaining clusters: d({1}, {4}) = 3, d({1}, {5,6,7,8}) = 7, d({4}, {5,6,7,8}) = 4
Complete Linkage - Crowding The minimum of these is 3, so {1} and {4} are merged, even though 4 is only distance 1 from the large cluster. Complete linkage “crowds” points away from large clusters
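The numbers in the example can be checked directly. A small Python sketch, assuming the configuration implied by the listed max distances (points at 1, 4, 5, 6, 7, 8 with 5 through 8 already merged into one cluster):

```python
def complete_linkage(a, b):
    """Max pairwise distance between members of clusters a and b."""
    return max(abs(p - q) for p in a for q in b)

# points at 1, 4, 5, 6, 7, 8; {5, 6, 7, 8} already merged into one cluster
clusters = [[1], [4], [5, 6, 7, 8]]
pairs = {(i, j): complete_linkage(clusters[i], clusters[j])
         for i in range(3) for j in range(i + 1, 3)}
print(pairs)  # {(0, 1): 3, (0, 2): 7, (1, 2): 4} -- matches the slide
best = min(pairs, key=pairs.get)
print(best)   # (0, 1): {1} and {4} merge, although 4 is only distance 1
              # from the big cluster: the crowding effect
```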
Measuring Closeness Average-Linkage – the distance between clusters G and H is the average distance over all cross-cluster pairs: d(G, H) = (1 / (|G|·|H|)) Σ_{g∈G} Σ_{h∈H} d(g, h), where |G| and |H| are the numbers of points in G and H
Average Linkage (Not all lines shown) Similarity of two clusters based on average similarity of members
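Average linkage differs from complete linkage only in how the pairwise distances are aggregated. A minimal Python sketch (names are mine):

```python
def average_linkage(g, h, dist):
    """Average of all pairwise distances between members of G and H."""
    return sum(dist(p, q) for p in g for q in h) / (len(g) * len(h))

# 1-D example: pairs (1,4)=3, (1,6)=5, (2,4)=2, (2,6)=4
d = lambda p, q: abs(p - q)
print(average_linkage([1, 2], [4, 6], d))  # (3 + 5 + 2 + 4) / 4 = 3.5
```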
Merge the two closest clusters New distance?
dist(X, MI/TO) = min { dist(X, MI), dist(X, TO) } (single linkage: when MI and TO merge, the new cluster's distance to any other city X is the smaller of the two old distances)
Next merge?
Recompute distances Next merge?
Next merge?
One more merge
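The min rule above is a distance-matrix update: when two clusters merge, the merged cluster's distance to every other cluster is the smaller of the two old distances. A Python sketch with hypothetical city distances (the slide's actual distance table is not shown here, so the numbers below are placeholders):

```python
def merge_update(dist, a, b):
    """Single-linkage update: after merging clusters a and b,
    d(X, a/b) = min(d(X, a), d(X, b)) for every other cluster X.
    dist maps frozenset pairs of cluster names to distances."""
    merged = a + '/' + b
    others = {c for pair in dist for c in pair} - {a, b}
    # keep entries not involving a or b, then add the merged cluster's row
    new = {p: v for p, v in dist.items() if a not in p and b not in p}
    for x in others:
        new[frozenset((x, merged))] = min(dist[frozenset((x, a))],
                                          dist[frozenset((x, b))])
    return new

# hypothetical distances (placeholders, not the slide's table)
d = {frozenset(p): v for p, v in
     {('MI', 'TO'): 138, ('MI', 'RM'): 564, ('TO', 'RM'): 669}.items()}
d2 = merge_update(d, 'MI', 'TO')
print(d2[frozenset(('RM', 'MI/TO'))])  # 564 = min(564, 669)
```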
Final Dendrogram (Tree/Hierarchy) How to split into clusters? ex. Want 3 clusters
Final Dendrogram (Tree/Hierarchy) How to split into clusters? ex. Want 3 clusters. For k clusters, cut the k−1 longest links
Final Dendrogram (Tree/Hierarchy) Intuitively, what does this mean?
Final Dendrogram (Tree/Hierarchy) These are the 3 groups of cities that are closest to each other
Hierarchical Clustering Algorithm (for any k) Compute distance matrix. Let each example be its own cluster. while (# clusters > 1): Merge the two closest clusters. Update the distance matrix. Run Time? (Simple implementation) Assume n data points, d dimensions. n − 1 iterations. First iteration: compute distances between all pairs: O(n²d). All other iterations: compute distances from the most recently created cluster to all other clusters: O(nd). Using a min-heap, deletions take O(log n²) = O(log n). Total: O(n²d). Slow (does not scale well)
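The loop above can be sketched in Python. Note this naive version recomputes all pairwise cluster distances every iteration (O(n³) distance evaluations overall), whereas the slide's O(n²d) bound assumes the distance matrix is cached and only the newly created cluster's distances are recomputed:

```python
def hierarchical(points, linkage):
    """Agglomerative clustering: start with singleton clusters,
    repeatedly merge the closest pair until one cluster remains.
    Returns the sequence of merges."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        # find the closest pair of clusters under the given linkage
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges

single = lambda a, b: min(abs(p - q) for p in a for q in b)
print(hierarchical([1, 2, 10], single))
# first merge: [1] with [2]; then [1, 2] with [10]
```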
Hierarchical Clustering Pros/Cons Pros: Simple. Do not need to know k ahead of time. Cons: Sensitive to noise. Slow (does not scale well). Does not “learn”: merges cannot be undone
K-Means Clustering Input: Data points k (number of desired clusters)
K-Means Clustering Choose k random points (“means”)
K-Means Clustering Assign each data point to closest mean.
K-Means Clustering Assign each data point to closest mean. How can we adjust the means to be closer to their data points?
K-Means Clustering Adjust each mean to be the center of its data points. Reassign data points to closest means.
K-Means Clustering Repeat until means no longer move. (Convergence guaranteed)
K-Means Algorithm Assume n data points, k clusters desired. Choose k random points as means. Repeat until means no longer move: Assign each data point to the closest mean: O(nkd) per iteration. Move each mean to the center of the points assigned to it: O(nd) per iteration. Run Time? (Simple implementation) Assume n data points, d dimensions, i iterations. Total: O(i·n·d·k)
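The algorithm above can be sketched in Python (1-D data for brevity; the `seed` parameter and function name are mine):

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: pick k random data points as initial means,
    then alternate assignment and mean-update until the means stop moving."""
    rng = random.Random(seed)
    means = rng.sample(points, k)
    for _ in range(iters):
        # assignment step: O(n*k) distance computations
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda j: abs(p - means[j]))].append(p)
        # update step: move each mean to the center of its assigned points
        new_means = [sum(c) / len(c) if c else means[j]
                     for j, c in enumerate(clusters)]
        if new_means == means:   # converged: means no longer move
            break
        means = new_means
    return sorted(means)

print(kmeans([1.0, 2.0, 9.0, 10.0], 2))  # [1.5, 9.5]
```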
K-Means in action (k=3)
K-Means in action (k=3) All means shifted down
K-Means in action (k=3) Red points are converging
K-Means in action (k=3)
K-Means in action (k=3)
K-Means in action (k=3)
K-Means Pros/Cons Pros: Efficient. Cons: Some knowledge of k is required
Knowledge in AI
Agent An agent interacts with its environment through sensors and actuators. What goes inside the agent?
Agent Inside the agent so far: Algorithms (Search, CSP, Probabilistic Inference, Learning)
Agent Inside the agent now: Algorithms (Search, CSP, Probabilistic Inference, Learning) + Knowledge Base
Recall 8-Puzzle 1 2 3 / 4 5 6 / 8 7 _ Turns out, some puzzles are not solvable. For an n x n board, an inversion is a pair of tiles i, j such that i < j but i appears after j in row-major order. If n is odd and the number of inversions is odd, then the puzzle is unsolvable. The agent should have this information
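The inversion test can be written directly (a sketch; function names are mine):

```python
def inversions(tiles):
    """Count pairs (i, j) with i < j but i appearing after j in
    row-major order. The blank (encoded as 0) is excluded."""
    t = [x for x in tiles if x != 0]
    return sum(1 for a in range(len(t)) for b in range(a + 1, len(t))
               if t[a] > t[b])

def solvable_odd_width(tiles):
    """For odd board width n (e.g. the 3x3 8-puzzle):
    solvable iff the number of inversions is even."""
    return inversions(tiles) % 2 == 0

# the slide's board: 1 2 3 / 4 5 6 / 8 7 _  -> one inversion (8 before 7)
board = [1, 2, 3, 4, 5, 6, 8, 7, 0]
print(inversions(board))          # 1
print(solvable_odd_width(board))  # False: this puzzle is unsolvable
```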
Knowledge Base Set of sentences/facts (in logical language) “n is odd” “1 is odd” “Number of inversions is odd” Agent should be able to expand the knowledge base through inference “Puzzle is unsolvable”
Hunt the Wumpus Invented in the early 70s, originally a command-line game (think black screen with greenish text)
Wumpus World
Wumpus World Environment: 4x4 grid of rooms. Agent starts in [1,1]. Gold in a random room. Wumpus in a different random room. Bottomless pits in some rooms. The Wumpus eats the agent if they are in the same room. The agent can shoot the Wumpus with an arrow. Actuators: Move left, Move right, Move up, Move down, Grab gold, Shoot
Wumpus World Continued Sensors: Stench (adjacent square contains Wumpus) Breeze (adjacent square contains pit) Glitter (this square contains gold) Scream (Wumpus killed) Performance measures: Gold: +1000 Death: -1000 (falling into pit or eaten by Wumpus) -1 per step -10 for using the arrow
Wumpus world environment Fully Observable? No…unaware of environment until we explore Deterministic (state of environment determined by current state and action)? Yes Static? Adversarial?
Exploring a wumpus world A = Agent B = Breeze G = Glitter/Gold P = Pit S = Stench W = Wumpus OK = Safe Square Language to represent knowledge
Exploring a wumpus world Breeze = no, Glitter = no, Pit = no, Stench = no, Wumpus = no A = Agent B = Breeze G = Glitter/Gold OK = Safe Square P = Pit S = Stench W = Wumpus
Exploring a wumpus world Breeze = no, Glitter = no, Pit = no, Stench = no, Wumpus = no A = Agent B = Breeze G = Glitter/Gold P = Pit S = Stench W = Wumpus OK = Safe Square What can we infer?
Exploring a wumpus world Breeze = yes, Glitter = no, Pit = no, Stench = no, Wumpus = no A = Agent B = Breeze G = Glitter/Gold P = Pit S = Stench W = Wumpus OK = Safe Square B What can we infer?
Exploring a wumpus world Breeze = yes, Glitter = no, Pit = no, Stench = no, Wumpus = no A = Agent B = Breeze G = Glitter/Gold P = Pit S = Stench W = Wumpus OK = Safe Square P? B P?
Exploring a wumpus world Breeze = no, Glitter = no, Pit = no, Stench = yes, Wumpus = no A = Agent B = Breeze G = Glitter/Gold P = Pit S = Stench W = Wumpus OK = Safe Square P? B P? S What can we infer?
Exploring a wumpus world Breeze = no, Glitter = no, Pit = no, Stench = yes, Wumpus = no A = Agent B = Breeze G = Glitter/Gold P = Pit S = Stench W = Wumpus OK = Safe Square P? B P? OK S W?
Exploring a wumpus world Breeze = no, Glitter = no, Pit = no, Stench = no, Wumpus = no A = Agent B = Breeze G = Glitter/Gold P = Pit S = Stench W = Wumpus OK = Safe Square P? B OK S W?
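A fragment of this inference can be sketched in Python: any neighbor of a visited square that reported no breeze and no stench can contain neither a pit nor the Wumpus, so it is OK (a simplified sketch of one inference rule, not the slides' full logic):

```python
def neighbors(x, y, n=4):
    """Adjacent squares within the n x n grid (coordinates 1..n)."""
    return [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 1 <= x + dx <= n and 1 <= y + dy <= n]

def safe_squares(percepts):
    """percepts: {square: (breeze, stench)} for visited squares.
    Visited squares are safe; a neighbor of a quiet square (no breeze,
    no stench) has no pit and no Wumpus, hence is also OK."""
    ok = set(percepts)
    for sq, (breeze, stench) in percepts.items():
        if not breeze and not stench:
            ok.update(neighbors(*sq))
    return ok

# agent visited [1,1] (quiet) and [2,1] (breeze), as on the slides
print(sorted(safe_squares({(1, 1): (False, False),
                           (2, 1): (True, False)})))
# [(1, 1), (1, 2), (2, 1)]: nothing new inferred safe from the breezy square
```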
Wumpus with propositional logic Using logic statements, we can determine all of the “safe” squares How to implement?
Propositional logic Syntax: Defines what makes a valid statement: Statements are constructed from propositions A proposition can be either true or false Proposition made up of symbols and connectives Semantics: Rules for determining the truth of a statement <Later>
Propositional Logic - Syntax Symbols: a symbol represents a proposition that can be true or false. ex. Breeze in [2,1]: B2,1. Pit in [2,2]: P2,2. n is odd: n_odd. Connectives: proposition operators. Negation: not, ¬, ~. Conjunction: and, ∧. Disjunction: or, ∨. Implication: implies, ⇒, =>. Biconditional: iff, ⇔, <=>
Propositional Logic - Syntax Sentence: statement composed of symbols and operators ex. P2,2 ∨ P1,3 Formally: Sentence → True | False | Symbol | ¬ Sentence | Sentence ∧ Sentence | Sentence ∨ Sentence | Sentence ⇒ Sentence | Sentence ⇔ Sentence
Propositional logic Syntax: Defines what makes a valid statement: Statements are constructed from propositions A proposition can be either true or false Proposition made up of symbols and connectives Semantics: Rules for determining the truth of a statement Truth table Rules of logic
Propositional Logic Semantics Some Rules of Logic: Modus Ponens: from P ⇒ Q and P, derive Q. De Morgan's: from ¬(A ∧ B), derive ¬A ∨ ¬B
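These semantics can be checked mechanically by enumerating truth assignments. A Python sketch (the nested-tuple sentence representation and function names are mine) that verifies De Morgan's rule:

```python
from itertools import product

# sentences as nested tuples: ('and', A, B), ('not', A), or a symbol string
def evaluate(s, model):
    """Truth value of sentence s under a {symbol: bool} assignment."""
    if isinstance(s, str):
        return model[s]
    op = s[0]
    if op == 'not':     return not evaluate(s[1], model)
    if op == 'and':     return evaluate(s[1], model) and evaluate(s[2], model)
    if op == 'or':      return evaluate(s[1], model) or evaluate(s[2], model)
    if op == 'implies': return (not evaluate(s[1], model)) or evaluate(s[2], model)
    if op == 'iff':     return evaluate(s[1], model) == evaluate(s[2], model)

def equivalent(s1, s2, symbols):
    """Check logical equivalence via a full truth table."""
    return all(evaluate(s1, dict(zip(symbols, vals))) ==
               evaluate(s2, dict(zip(symbols, vals)))
               for vals in product([True, False], repeat=len(symbols)))

# De Morgan: not(A and B)  is equivalent to  (not A) or (not B)
lhs = ('not', ('and', 'A', 'B'))
rhs = ('or', ('not', 'A'), ('not', 'B'))
print(equivalent(lhs, rhs, ['A', 'B']))  # True
```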
Inference with Propositional Logic Suppose we want to infer something: Wumpus in [2,2] Goal: Given initial knowledge base, use semantics to make inferences to expand knowledge and ultimately prove a proposition Look familiar?
Inference with propositional logic View it as a search problem: starting state: Initial Knowledge Base (KB) actions: all ways of deriving new propositions from the current KB result: add the new proposition to the KB goal: end up with the proposition we want to prove
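This search view can be sketched as simple forward chaining with modus ponens, using the 8-puzzle facts from earlier (the rule set and names are mine, for illustration):

```python
def forward_chain(kb, rules, goal):
    """Search over the KB: repeatedly apply modus ponens (if all
    premises of a rule are known, add its conclusion) until the goal
    is derived or no new proposition can be added."""
    known = set(kb)
    changed = True
    while changed and goal not in known:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)   # action: derive a new proposition
                changed = True
    return goal in known

kb = ['n_odd', 'inversions_odd']
rules = [(('n_odd', 'inversions_odd'), 'unsolvable')]
print(forward_chain(kb, rules, 'unsolvable'))  # True: goal reached
```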