Density Traversal Clustering and Generative Kernels: a generative framework for spectral clustering
Amos Storkey and Tom G. Griffiths, School of Informatics, University of Edinburgh
Attribute Generalisation
Prior work
– Tishby and Slonim
– Meila and Shi
– Coifman et al.
– Nadler et al.
Example: Transition Matrix
Example: 20 Iterations
Example: 400 Iterations
Argument
– Existing methods depend a priori on the data: there is no generative model, and they are inconsistent with the underlying density.
– Clusters are spatial characteristics that are properties of distributions.
– Clusters are properties of data sets only in so far as they inherit that property from the underlying distribution from which the data was generated.
But we do know
The diffusion asymptotics are known, but the probabilistic formalism is inconsistent with the data density:
– In the finite time-step, infinite data limit, the equilibrium distribution does not match the data distribution.
Density Traversal Clustering
– Define a discrete-time, continuous-space, diffusing Markov chain.
– The definition depends on some latent distribution: call this the traversal distribution.
The Markov chain
The transition probability D(y, x) is a Gaussian centred at x, weighted by the traversal distribution P*. The normaliser S is given by the solution of the corresponding integral consistency equation.
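The transition density and its normaliser appeared as equations on the original slide; a plausible reconstruction, assuming a Gaussian kernel of width σ (the width symbol is an assumption, not from the slide), is:

```latex
D(y, x) = \frac{1}{S(x)}\,
  \exp\!\left(-\frac{\|y - x\|^{2}}{2\sigma^{2}}\right) P^{*}(y),
\qquad
S(x) = \int \exp\!\left(-\frac{\|y - x\|^{2}}{2\sigma^{2}}\right)
  P^{*}(y)\,\mathrm{d}y .
```

This form is consistent with the later slide's A = W S⁻¹, with W the Gaussian affinity and S⁻¹ the consistency term.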
Generative procedure
Problems
– A random walk in continuous space: each step involves many intractable integrals.
– Real Bayesians would want a prior over traversal distributions, but good prior distributions over distributions are a hard problem.
CHEAT
Doing all the integrals exactly is not possible, but:
– All the integrals are with respect to the traversal distribution.
– Use the empirical data as a proxy.
– All the integrals then become sample estimates: sums over the data points.
– Everything is computable in the space of data points.
– It works: we never need to evaluate the probability at a point, only integrals over regions.
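A minimal sketch of the empirical-proxy idea: the normalising integral S(x) over the unknown traversal distribution is replaced by a mean over the observed data. The data, kernel width, and function name here are illustrative assumptions, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, (50, 2))  # data standing in for the traversal distribution
sigma = 0.8                        # Gaussian kernel width (an assumed value)

def normaliser_estimate(X, x, sigma):
    """Sample estimate of the normalising integral S(x): the integral over the
    (unknown) traversal distribution becomes a mean over the data points."""
    d2 = ((X - x) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2)).mean()

s0 = normaliser_estimate(X, X[0], sigma)
```

Every integral in the construction has this shape, which is why the whole procedure stays computable in the space of data points.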
We get…
The scaled likelihood is P(x_i | centre x_j) / P(x_i) = n (A^D)_{ij}, where
– A = W S^{-1}
– W is the usual affinity matrix
– S^{-1} is an extra consistency term.
More generally there is an out-of-sample scaled likelihood:
– P(x | centre y) / P(x) = n a(x)^T A^{D-2} b(y), where a(x) and b(y) are the traversal probabilities to and from x and y respectively.
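A hedged sketch of the scaled-likelihood computation on toy data. The two-cluster data, kernel width σ, step count D, and the reading of S as a per-point sample-mean normaliser are all assumptions made for illustration; only A = W S⁻¹ and the n (A^D)_{ij} form come from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
# toy data: two well-separated clusters (an assumed illustration)
X = np.vstack([rng.normal(0, 0.3, (20, 2)),
               rng.normal(5, 0.3, (20, 2))])
n = len(X)
sigma, D = 1.0, 20  # kernel width and number of diffusion steps (assumed)

# W: the usual Gaussian affinity matrix
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / (2 * sigma ** 2))

# S: sample estimate of the consistency term, one value per point
S = W.mean(axis=0)

# A = W S^{-1}; scaled likelihoods are n * (A^D)_{ij}
A = W / (n * S[None, :])
L = n * np.linalg.matrix_power(A, D)
```

With well-separated clusters, within-cluster scaled likelihoods (e.g. L[0, 5]) dominate cross-cluster ones (e.g. L[0, 25]) after diffusion.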
Example: Scaled likelihoods
Example: 20 Iterations
Example: 400 Iterations
Initial distribution
– Other initial distributions can be considered; in particular, delta functions at mixture centres.
– This connects to variational Bayesian mixture models.
Demo
Number of clusters
Scaled likelihoods for a three-cluster problem.
Number of clusters
Scaled likelihoods for a five-cluster problem.
Cluster allocations
Cluster allocations
Conclusion
– An a priori formulation of spectral clustering.
– Can be used like any other spectral procedure, but also provides scaled likelihoods, so it can be combined with Bayesian procedures.
– Fits a variational Bayesian formalism.
– Small-sample approximation issues remain: better to have a flexible density estimator.
Generative Kernels
Related to Seeger, Covariance Kernels from Bayesian Generative Models.
The generative story: a density and its corresponding traversal process; a Gaussian process over X space; data obtained by diffusing in X space using the traversal process; then local averaging and additive noise.
Generative Kernels
– The covariance K_ij follows from the standard basis-function derivation of a GP.
– Again use sample estimates.
– Presume the measured target is a local average.
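A hedged sketch of what such a covariance could look like, consistent with the stated generative story (a base GP covariance K⁰ over X space, D diffusion/averaging steps via the traversal matrix A, additive noise σ²). All symbols beyond A are assumptions; this is not the slide's exact formula:

```latex
K = A^{D} K^{0} \left(A^{D}\right)^{\!\top} + \sigma^{2} I ,
\qquad
K_{ij} = \sum_{k,l} (A^{D})_{ik}\, K^{0}_{kl}\, (A^{D})_{jl}
       + \sigma^{2}\,\delta_{ij} .
```

This is what falls out of applying the linear averaging map A^D to a GP draw and then adding independent measurement noise.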
Motivation
– The generative model generates clustered data positions.
– Targets diffuse using the traversal process.
– Target values suffer a locality-averaging influence: diffused objects locally influence one another's target values, so everyone becomes like their neighbours (e.g. accents).
– Local measurement noise can be added.
Kernel Clustering
– Use sample estimates again to get the kernel.
– Can also incorporate a prior over the number of iterations and integrate it out: for example, use a matrix exponential of A instead of the power A^D.
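One concrete way to integrate out the iteration count, sketched under the assumption of a Poisson(t) prior on the number of steps (the prior choice and rate t are assumptions; the slide only names the matrix-exponential form). Averaging A^k under Poisson(t) weights gives exactly expm(t (A − I)):

```python
import numpy as np

def diffusion_kernel(A, t, kmax=60):
    """Average the powers A^k under a Poisson(t) prior on the iteration count k.
    The truncated series equals the matrix exponential expm(t * (A - I))."""
    K = np.zeros_like(A)
    term = np.exp(-t) * np.eye(len(A))   # Poisson weight e^{-t} t^0/0! times A^0
    for k in range(kmax):
        K += term
        term = term @ A * (t / (k + 1))  # next Poisson weight and next power of A
    return K

rng = np.random.default_rng(2)
x = rng.normal(size=(10, 1))
W = np.exp(-((x - x.T) ** 2) / 2.0)      # Gaussian affinity on 10 points
A = W / W.sum(axis=1, keepdims=True)     # row-stochastic transition matrix
K = diffusion_kernel(A, t=1.0)
```

Because (A − I) annihilates the all-ones vector, the resulting kernel stays row-stochastic, so it remains interpretable as a diffusion.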
Generating targets for the rings data
We can generate from the model: across-cluster covariance is low, while there is within-cluster continuity.
The point?
Density dependence matters in missing-data problems.
– Gaussian process: data with missing targets has no influence.
– Density traversal kernel: data with missing targets affects the kernel, and hence has influence.