Lecture 14, CS567
Clustering Algorithms
– Density based clustering
– Self-organizing feature maps
– Grid based clustering
– Markov clustering
Lecture 14, CS567
Density based clustering
Rationale
– "Definition of city limits"
– General version of single-linkage clustering (How?)
– A cluster is a region where many points are concentrated
– Density within each cluster is higher than a cut-off
– Sparse areas are excluded from clustering
Algorithm (a minimal sketch follows after this slide)
– Given: {i} and minDensity = minimum number of points within radius r of i
– Pick all i that surpass minDensity to form initial clusters
– Merge all clusters that can be connected through any j that surpasses minDensity
Advantages
– No need to decide on the number of clusters beforehand
– Clusters of arbitrary shape can be picked up
Limitations
– Need to decide on radius r and minDensity
– Does not differentiate between clusters that barely meet minDensity and those that are much denser
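The following is a minimal Python sketch of the density-based scheme above, in the style of DBSCAN. Euclidean distance, the names density_cluster and min_density, and the flood-fill merge are illustrative assumptions, not taken from the slide.

import numpy as np

def density_cluster(points, r, min_density):
    """Density-based clustering sketch: points with at least min_density
    neighbors within radius r seed clusters; clusters are merged through
    any point that itself surpasses min_density."""
    n = len(points)
    # Pairwise Euclidean distances; a point counts as its own neighbor
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dists[i] <= r) for i in range(n)]
    is_core = [len(nb) >= min_density for nb in neighbors]

    labels = np.full(n, -1)  # -1 = sparse area, excluded from clustering
    cluster_id = 0
    for i in range(n):
        if not is_core[i] or labels[i] != -1:
            continue
        labels[i] = cluster_id
        frontier = [i]
        while frontier:  # flood fill: merge everything reachable via dense points
            j = frontier.pop()
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = cluster_id
                    if is_core[k]:  # only dense points extend the cluster
                        frontier.append(k)
        cluster_id += 1
    return labels

For example, density_cluster(np.random.rand(200, 2), r=0.1, min_density=5) labels two-dimensional points, leaving sparse points at -1; note how both stated limitations show up as the two free parameters r and min_density.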
Lecture 14, CS567
Grid based clustering
Rationale
– "'Till death do us part', said the point to its neighbor"
Algorithm (a minimal sketch follows after this slide)
– Given: {i} and a grid of resolution s
– Replace all i in each cell by summarized information, e.g., mean/mode
– Cluster the grid cells rather than each object
Advantages
– Scalability, scalability, scalability
– Performance is independent of the number of objects, as long as they exist in the same space ("Takes an equal amount of time to cluster New Yorkers and Icelanders")
Limitations
– Resolution is confined to that of the grid
– Cluster boundaries always follow the orientation of the grid cell surfaces
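A minimal Python sketch of the scheme above, assuming hypercube cells of edge length s and a simple merge rule (occupied, face-adjacent cells join one cluster); the merge rule and the min_count threshold are illustrative choices, not from the slide.

import numpy as np
from collections import defaultdict

def grid_cluster(points, s, min_count=1):
    """Grid-based clustering sketch: bin points into cells of edge length s,
    then cluster the cells (not the points) by merging adjacent occupied cells."""
    # 1. Summarize: assign each point to its grid cell
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        cells[tuple(np.floor(p / s).astype(int))].append(idx)
    occupied = {c: ids for c, ids in cells.items() if len(ids) >= min_count}

    # 2. Cluster cells: connected components over face-adjacent occupied cells
    labels, cid = {}, 0
    for cell in occupied:
        if cell in labels:
            continue
        labels[cell] = cid
        stack = [cell]
        while stack:
            c = stack.pop()
            for axis in range(len(c)):
                for step in (-1, 1):
                    nb = list(c)
                    nb[axis] += step
                    nb = tuple(nb)
                    if nb in occupied and nb not in labels:
                        labels[nb] = cid
                        stack.append(nb)
        cid += 1

    # 3. Map cell labels back to the original points
    point_labels = np.full(len(points), -1)
    for cell, ids in occupied.items():
        point_labels[ids] = labels[cell]
    return point_labels

Step 2 touches only occupied cells, which is where the scalability claim comes from: the cost depends on the grid, not on how many objects fall into it.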
Lecture 14, CS567
Self-organizing feature maps (SOM)
Rationale
– "If you are truly my neighbor, say yes when I do, and say no when I do, and together we shall re-organize the world"
– "Untangling puppet strings"
Algorithm (a minimal sketch follows after this slide)
– Given: input vectors {i} and a neural network with multiple output neurons {o}
– Every time an output neuron o is activated by winner-take-all, the neuron and its immediate neighbors are biased towards responding more strongly to the same example in the future:
  W_new = W_old - α·W_old + α·P = (1 - α)·W_old + α·P   (Kohonen)
  – The α·P term makes the weights approach the corresponding input vector
  – The negative α·W_old term ensures that the weights do not grow without bound
  – Backpropagation is not applicable here (Why?)
– Finally, neighborhoods of output neurons map to each input vector
Advantages
– Unsupervised neural network (no targets necessary)
– Used for expression array analysis (distinct patterns of expression differences between multiple experiments are mapped onto different sets of output neurons)
Limitations
– Like the human brain, i.e., slow
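A minimal Python sketch of Kohonen's rule on a 2-D output grid. The update inside the loop is exactly the formula above, rewritten as W_new = W_old + α(P - W_old); the grid size, neighborhood radius, seeded random initialization, and linear learning-rate decay are illustrative assumptions, not from the slide.

import numpy as np

def train_som(data, grid_h=10, grid_w=10, epochs=20, alpha=0.5, radius=2):
    """Self-organizing map sketch. data: (n, d) array of input vectors P.
    Each output neuron on the grid owns one weight vector W."""
    rng = np.random.default_rng(0)
    n, d = data.shape
    weights = rng.random((grid_h, grid_w, d))          # one weight vector per output neuron
    coords = np.dstack(np.mgrid[0:grid_h, 0:grid_w])   # (h, w, 2) neuron grid positions

    for epoch in range(epochs):
        a = alpha * (1 - epoch / epochs)               # decaying learning rate (assumed schedule)
        for p in data[rng.permutation(n)]:
            # Winner-take-all: neuron whose weights are closest to the input
            dist = np.linalg.norm(weights - p, axis=-1)
            winner = np.unravel_index(np.argmin(dist), dist.shape)
            # Bias the winner and its immediate grid neighbors towards p
            grid_dist = np.linalg.norm(coords - np.array(winner), axis=-1)
            mask = (grid_dist <= radius)[..., None]
            # W_new = (1 - a)·W_old + a·P, applied only within the neighborhood
            weights += mask * a * (p - weights)
    return weights

Because neighboring neurons are pulled towards the same inputs, nearby output neurons end up responding to similar input vectors, which is how the neighborhoods-map-to-inputs property arises.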
Lecture 14, CS567
Markov clustering algorithm (van Dongen)
Rationale
– "Let several drunks loose in a graph (Markov chain) with some chalk. Come back the next day and declare each well-travelled area a cluster"
Principles
– The number of multi-step paths between two nodes is higher within clusters than between them
– A random walk will linger within a cluster before moving to a different one
– Bridge edges figure with high frequency in the shortest paths between all pairs of nodes in the data set
Algorithm (a minimal sketch follows after this slide)
– Given: a Markov chain of states (nodes) with symmetric transition probabilities
– Alternate expansion (matrix squaring; "spread riches, but only within each kingdom") and inflation (entry-wise squaring with rescaling for stochasticity; "more power to the mighty, and rob the weak of what little they have left") until the transition matrix converges
– Each set of connected nodes constitutes a cluster
Advantages
– The number of clusters need not be specified explicitly
– Scales well
Limitations
– It must be possible to transform distances into a stochastic weight matrix
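A minimal Python sketch of MCL. Expansion and inflation follow the slide (inflation exponent 2 matches the slide's entry-wise squaring); the added self-loops, the convergence tolerance, and the connected-component read-out of clusters are standard choices assumed here rather than taken from the slide.

import numpy as np

def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-6):
    """Markov clustering sketch: alternate expansion (matrix squaring) and
    inflation (entry-wise powering, rescaled to stay stochastic) until the
    transition matrix converges. adjacency: symmetric (n, n) weight matrix."""
    A = np.asarray(adjacency, dtype=float) + np.eye(len(adjacency))  # self-loops stabilize the walk
    M = A / A.sum(axis=0)                  # column-stochastic transition matrix

    for _ in range(max_iter):
        M_new = M @ M                      # expansion: spread flow along multi-step paths
        M_new = M_new ** inflation         # inflation: strengthen strong flows...
        M_new /= M_new.sum(axis=0)         # ...and rescale for stochasticity
        if np.abs(M_new - M).max() < tol:  # converged
            M = M_new
            break
        M = M_new

    # Each set of nodes still connected in the converged matrix is a cluster
    support = (M > 1e-8) | (M.T > 1e-8)
    n = len(M)
    labels = np.full(n, -1)
    cid = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        labels[i] = cid
        stack = [i]
        while stack:
            j = stack.pop()
            for k in np.flatnonzero(support[j]):
                if labels[k] == -1:
                    labels[k] = cid
                    stack.append(k)
        cid += 1
    return labels

The number of clusters falls out of how many connected components survive convergence, which is why it never has to be specified explicitly.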