
1 Generating multidimensional embeddings based on fuzzy memberships
Stefano Rovetta, Francesco Masulli, Maurizio Filippone
Department of Computer and Information Sciences, University of Genova

2 Introduction
- Current data collection methods are high-throughput (a general trend across disciplines)
- Data analysis research is oriented toward similarity-based methods that exploit mutual relationships between data items
  - Example: KERNEL METHODS
  - Example: SPECTRAL METHODS
- Potential for more powerful / more compact representations
- The advantages depend on:
  - the relationship between data cardinality and dimensionality
  - the availability of efficient methods to exploit this data representation

3 The reference problem
- We address gene expression analysis with DNA microarray experiments
- As is well known, the typical features of this problem are:
  - high dimensionality
  - low cardinality
  - high variability (noise)
- We are interested in the explorative phase, typically based on cluster analysis (or simply clustering)
- These features have prompted the development of efficient clustering techniques
- We aim to improve these methods, for:
  - better quality results
  - more powerful methods

4 Outline of the talk
- Some problems in high dimensionality
- Similarity-based representations: a short review
- Fuzzy modeling of data collections
- Embedding in the space of memberships
- Membership Embedding as a spectral problem
- The probe selection problem and strategies
- Experiments and results
- Closing remarks

5 Some problems in high dimensionality
- The two most widely used traditional clustering methods are:
  - k-Means
  - Hierarchical Agglomerative Clustering (HAC) variants
- k-Means looks for data concentrations (approximations of mixture distributions)
- It may not work well when data are very sparse

6 Some problems in high dimensionality
- HAC variants use several linkage criteria, based on the similarity structure (the data similarity matrix)
- A common problem: no attempt is made to indicate clusters directly
  - Data or clusters are progressively joined in pairs, regardless of their density
  - The actual clustering may be performed only afterwards, by additional criteria (dendrogram depth distribution, cophenetic matrix analysis, agglomerative coefficients)
- This and other drawbacks have prompted the development of more sophisticated methods based on similarity structures

7 Similarity-based representations
- A similarity matrix has entries w_ij = similarity between data items i and j
- Similarity has a suitable definition depending on the nature of the data
- It may be derived from a metric or from other functions, or even given directly as input
- Metric similarity matrices are symmetric and positive semidefinite (checked numerically in the sketch below)
- Data given directly as a similarity matrix may contain any type of inconsistency (see the works by Buhmann et al.)
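The slides contain no code, but these properties are easy to check numerically. A minimal numpy sketch (our illustration, not the authors' code), assuming a Gaussian similarity derived from the Euclidean metric:

```python
import numpy as np

def gaussian_similarity_matrix(X, beta=1.0):
    """Similarity matrix w_ij = exp(-beta * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)  # clip tiny negatives from round-off
    return np.exp(-beta * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
W = gaussian_similarity_matrix(X)
print(np.allclose(W, W.T))                  # symmetric: True
print(np.linalg.eigvalsh(W).min() > -1e-9)  # positive semidefinite: True
```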

8 Similarity-based representations
- Some solutions are:
  - Hierarchical methods capable of actual clustering, e.g. the Farthest Neighbour Approach by Rovetta and Masulli
  - Generalized similarity-based methods, e.g. the approach by Pekalska and Duin
  - Kernel methods, where w_ij = k(x_i, x_j), with k(·) a positive semidefinite (Mercer) kernel function
  - Spectral methods, where W_ij is the weight of the link connecting x_i and x_j on a complete graph built on all data points

9 Kernel methods
- Kernel methods were originally adopted in pattern recognition to apply linear classification methods to nonlinearly separable problems (Support Vector Machines)
- k(x_i, x_j) = f(x_i) · f(x_j) for a suitable nonlinear mapping f(·) (possibly unavailable in explicit form)
- k(·) measures similarity in the mapped space f(x) (it is an inner product)
- Subsequently, several similar problems have been tackled with the so-called kernel approach:
  - Principal Component Analysis
  - novelty detection (one-class classification)
  - clustering
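A hedged illustration of the kernel identity k(x_i, x_j) = f(x_i) · f(x_j): the homogeneous quadratic kernel below is a textbook example (not taken from the talk) whose feature map f happens to be available in explicit form:

```python
import numpy as np

def quad_kernel(a, b):
    """Homogeneous quadratic kernel k(a, b) = (a . b)^2."""
    return float(np.dot(a, b)) ** 2

def quad_features(x):
    """Explicit map f(x): all products x_p * x_q, so that
    f(a) . f(b) = sum_{p,q} a_p a_q b_p b_q = (a . b)^2."""
    return np.outer(x, x).ravel()

rng = np.random.default_rng(1)
a, b = rng.normal(size=3), rng.normal(size=3)
# The kernel value equals the inner product in the mapped space.
print(np.isclose(quad_kernel(a, b), quad_features(a) @ quad_features(b)))  # True
```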

10 Spectral methods
- Spectral graph theory studies properties of the Laplacian spectrum of a graph, that is, the ordered set of eigenvalues of the graph's Laplacian matrix
- The Laplacian matrix is defined as L = D − W, where:
  - W is the weight matrix (the adjacency matrix of the graph, with weights applied to the edges)
  - D is the degree matrix, a diagonal matrix such that D_ii is the sum of the edge weights incident on vertex i
- A data set with a similarity matrix corresponds to a complete graph where:
  - vertex i corresponds to data point x_i
  - edge weight W_ij is the similarity between points x_i and x_j
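A minimal sketch (assumed implementation, matching the definition above) of the Laplacian of a small graph; it also previews the component-counting property discussed on the next slide:

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized Laplacian L = D - W, with degree D_ii = sum_j W_ij."""
    return np.diag(W.sum(axis=1)) - W

# Two disconnected pairs of vertices: the zero eigenvalue has multiplicity 2,
# matching the two connected components of the graph.
W = np.array([[0., 1., 0., 0.],
              [1., 0., 0., 0.],
              [0., 0., 0., 1.],
              [0., 0., 1., 0.]])
print(np.linalg.eigvalsh(graph_laplacian(W)).round(6))  # [0. 0. 2. 2.]
```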

11 Spectral methods
[Figure: a weighted graph built on the data points x1, ..., x5]

12 Spectral methods
- L is not full-rank, and its first eigenvalue is zero
- The multiplicity of the zero eigenvalue equals the number of connected components of the graph
- For a similarity derived from a metric, the graph is undirected and L is symmetric (L_ij = L_ji); the graph is also complete (no zeroes in W or L), so there is only one connected component and eigenvalue 0 has multiplicity 1
- Therefore a spectral clustering problem corresponds to analyzing the first few eigenvalues, excluding the first one, to find the most strongly connected components, which are the clusters in the data
- This procedure has computational disadvantages (the solution of an eigenproblem) but can discover a wide range of cluster shapes

13 Notes about spectral methods
- The actual clustering is often performed as k-Means on the embedding of the data in the space of the first few eigenvectors
- This has been proved to be theoretically sound: clusters are made more evident by this mapping
- Spectral methods are a recent field and are currently actively studied
- Several connections between spectral and kernel methods have been pointed out recently, in some cases even equivalences
- We can also study various normalized Laplacians, such as L = D⁻¹(D − W) = I − D⁻¹W
- This generalized definition has some properties not found in the standard Laplacian
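A compact sketch of the procedure just described (standard spectral clustering, not the authors' exact code), assuming scikit-learn's KMeans for the final step. For numerical convenience we diagonalize the symmetric form of the normalized Laplacian, whose eigenvectors give those of L = I − D⁻¹W after a diagonal rescaling:

```python
import numpy as np
from sklearn.cluster import KMeans

def normalized_spectral_clusters(W, k):
    """k-means on the first k eigenvectors of L = I - D^{-1} W."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric form: I - D^{-1/2} W D^{-1/2} (same eigenvalues as L).
    L_sym = np.eye(len(W)) - (W * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)   # ascending eigenvalues
    U = d_inv_sqrt[:, None] * eigvecs[:, :k]   # eigenvectors of I - D^{-1}W
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)

# Two well-separated blobs: the labels split cleanly into two groups.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (15, 2)), rng.normal(3, 0.3, (15, 2))])
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
print(normalized_spectral_clusters(np.exp(-d2), 2))
```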

14 Fuzzy modeling of data collections
- Let us come back to the problem of efficiently representing high-dimensional data for the purpose of clustering
- Suppose we have a data set X = {x_i}, X ⊂ 𝒳
- 𝒳 is not required to be a vector space, but we need a similarity measure s on 𝒳
- Let Y = {y_1, ..., y_c} be a set of c points in 𝒳
- We can characterize a point x ∈ 𝒳 in terms of how well Y represents (approximates) x
- We term the points y_1, ..., y_c probes

15 Fuzzy modeling of data collections
- We compute the similarity s(x, y_j) for each point in Y, using s(x, y_j) = exp(−β ||x − y_j||²)
- These values can be organized into a vector u such that u_j = s(x, y_j)
- INTERPRETATION: u_j is the fuzzy membership of x to the fuzzy set y_j
- Membership to each probe is a mutually exclusive concept
- Therefore we normalize the memberships to sum to 1: v_j = u_j / Σ_m u_m, so that Σ_j v_j = 1
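A minimal sketch of this membership computation, under the assumption that β is the width parameter of the Gaussian similarity above:

```python
import numpy as np

def memberships(x, probes, beta=1.0):
    """u_j = exp(-beta * ||x - y_j||^2), then normalize: v = u / sum(u)."""
    u = np.exp(-beta * ((probes - x) ** 2).sum(axis=1))
    return u / u.sum()

probes = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])  # toy probe set Y
x = np.array([0.5, 0.2])
v = memberships(x, probes)
print(v.round(4), v.sum())  # concentrated on the nearest probe; sums to 1
```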

16 Embedding in the space of memberships
- We have now defined the matrix V such that V_ij = normalized membership of data point x_i to probe y_j
- This is a new representation of the data X, embedded in the space of memberships to the probes Y
- It is a similarity-based representation
- If 𝒳 is a vector space, V has the advantage of being potentially lower-dimensional (c instead of the original data dimension d, with c ≤ d)
- Even if 𝒳 is not a vector space, V is always a vector representation

17 Clustering in the Membership Embedding space
- Select a set Y of c probes from the data points in X
- Map X (n × d) into V (n × c)
- Apply your preferred clustering algorithm to the mapped data set V (a sketch of this pipeline follows the list)
- Why should this method give better clusters?
  - One reason is simply the lower dimensionality, which helps when applying methods like k-Means
  - There is a sounder reason, presented next
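A sketch of the pipeline under the stated assumptions; random probe selection stands in for the optimized selection discussed later, and scikit-learn's KMeans stands in for Fuzzy c-Means:

```python
import numpy as np
from sklearn.cluster import KMeans

def membership_embedding(X, probe_idx, beta=1.0):
    """Map X (n x d) to V (n x c): V_ij = normalized membership of x_i to probe j."""
    Y = X[probe_idx]                                   # c probes drawn from the data
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    U = np.exp(-beta * d2)
    return U / U.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 10)), rng.normal(2, 0.3, (20, 10))])
probe_idx = rng.choice(len(X), size=4, replace=False)  # naive random selection
V = membership_embedding(X, probe_idx)                 # 40 x 4 instead of 40 x 10
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(V)
print(labels)
```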

18 Membership Embedding as a spectral approach
- It can be shown that our membership embedding V can be used to define the Laplacian of a data graph
- This is not a complete graph, but one where only the probes have edges to every point in the set; points which are not probes are linked only to probes
[Figure: the reduced graph on x1, ..., x5, where only the probes y1 and y2 are connected to all points]

19 Membership Embedding as a spectral approach
- By adding zero entries at the appropriate locations, the entries of V can be used to build a normalized Laplacian such that:
  - L_ik = −V_ij whenever probe y_j coincides with data point x_k
  - L_ii = 1 for all i
  - L_ik = 0 in all other cases
- This is the Laplacian of the non-complete graph seen earlier
- Therefore, with an appropriate selection of the probes, this reduced-graph spectral approach is exactly equivalent to the membership embedding approach
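A sketch of this zero-padding construction, under our reading of the slide; the off-diagonal minus sign is the one implied by the definition L = I − D⁻¹W from slide 13:

```python
import numpy as np

def laplacian_from_memberships(V, probe_idx):
    """Pad V (n x c) with zeros to an n x n normalized Laplacian:
    L_ii = 1; L_ik = -V_ij when x_k is probe y_j; 0 elsewhere."""
    n = V.shape[0]
    L = np.eye(n)
    for j, k in enumerate(probe_idx):
        L[:, k] = -V[:, j]   # column k is the column of probe y_j
        L[k, k] = 1.0        # keep the unit diagonal
    return L

# Tiny demo: rows of non-probe points sum to zero, as in I - D^{-1}W.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
probe_idx = [0, 4, 7]
d2 = ((X[:, None, :] - X[probe_idx][None, :, :]) ** 2).sum(-1)
V = np.exp(-d2)
V /= V.sum(axis=1, keepdims=True)
L = laplacian_from_memberships(V, probe_idx)
print(np.allclose(np.delete(L.sum(axis=1), probe_idx), 0.0))  # True
```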

20 The probe selection problem and strategies
- Analytic selection of the probes is an open problem
- It is a combinatorial problem with a search space of exponential size (exactly 2^n) when the optimal number c of probes is not given, as is usually the case
- The cost function is also computationally demanding, since we ask for a matrix that separates the clusters well
- This condition can be stated in terms of eigenvalues, which requires solving an eigenproblem

21 Simulated annealing approach
- The vector g is the state of a formal physical system
- The indicator variable g_i selects point x_i as a probe
- Energy: E = ε + λc
  - ε: a clustering quality measure (obtained by experimental validation)
  - λc: a complexity penalty, with parameter λ
- State transition: some components of g are switched at random, 1 → 0 or 0 → 1; the number of switchings is bounded by parameters
- Probability of transition: the usual annealing acceptance rule; the temperature T is gradually reduced until convergence
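A sketch of the annealing loop under these definitions. The quality measure ε is a stand-in here (k-means inertia in the membership embedding); the slides obtain it by experimental validation, so treat the energy function and the Metropolis acceptance rule below as illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def membership_embedding(X, mask, beta=1.0):
    Y = X[mask.astype(bool)]
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    U = np.exp(-beta * d2)
    return U / U.sum(axis=1, keepdims=True)

def energy(X, mask, k, lam=0.05):
    """E = eps + lam * c: inertia stands in for the validated quality
    measure eps; lam * c penalizes large probe sets."""
    c = int(mask.sum())
    if c < k:
        return np.inf
    V = membership_embedding(X, mask)
    eps = KMeans(n_clusters=k, n_init=5, random_state=0).fit(V).inertia_
    return eps + lam * c

def anneal(X, k, steps=200, T0=1.0, cooling=0.97, seed=0):
    rng = np.random.default_rng(seed)
    mask = (rng.random(len(X)) < 0.2).astype(int)  # random initial state g
    while mask.sum() < k:                          # ensure a feasible start
        mask[rng.integers(len(X))] = 1
    E, T = energy(X, mask, k), T0
    for _ in range(steps):
        cand = mask.copy()
        cand[rng.integers(len(X))] ^= 1            # flip one indicator g_i
        E_new = energy(X, cand, k)
        # Metropolis rule: always accept downhill, uphill with prob e^(-dE/T)
        if E_new < E or rng.random() < np.exp(-(E_new - E) / T):
            mask, E = cand, E_new
        T *= cooling                               # gradual cooling schedule
    return mask, E

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (15, 4)), rng.normal(2, 0.3, (15, 4))])
mask, E = anneal(X, k=2)
print(int(mask.sum()), "probes selected, E =", round(E, 3))
```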

22 Simulated annealing approach

23 Experiments and results
- Data set: Leukemia by Golub et al.
- Clustering algorithm: Fuzzy c-Means
- Experiment setup: compare clustering results obtained with different embeddings:
  - the original data space
  - distance embedding (Euclidean distance from the probes)
  - membership embedding (as described earlier)
- Evaluation: ε = Representation Error
  - We label each cluster with the majority class found among its points
  - We count the mismatches (points in clusters having a different class)
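A minimal sketch of the majority-class mismatch count used as the evaluation measure:

```python
import numpy as np

def mismatch_count(labels, classes):
    """Label each cluster with the majority class among its points and
    count the points whose class differs from their cluster's label."""
    mismatches = 0
    for cl in np.unique(labels):
        members = classes[labels == cl]
        majority = np.bincount(members).argmax()
        mismatches += int((members != majority).sum())
    return mismatches

labels  = np.array([0, 0, 0, 1, 1, 1])   # cluster assignments
classes = np.array([0, 0, 1, 1, 1, 0])   # ground-truth classes
print(mismatch_count(labels, classes))   # 2 mismatched points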

24 Experiments and results
Comparative performance during training

25 Experiments and results
Final results: comparative performance and values of β

26 Experiments and results
Result of the probe selection process

27 Experiments and results
Dependence of RE on the Fuzzy c-Means fuzziness parameter m

28 Closing remarks
- The Membership Embedding method effectively reduces the dimensionality of the data to be clustered
- It provides good experimental results in clustering tasks
- Equivalence with a spectral approach has been proved; this may account for the measured improvement in performance
- Probe selection is done by simulated annealing
- Further work will address other techniques for probe selection and their algorithmic characterization
- More experiments are planned (interesting problems to solve, anyone?)

