Clustered representations: Clusters, covers, and partitions
Outline The graph model. Clusters, covers, and partitions. Locality measures and neighborhoods. Sparsity measures. Example: A basic construction. Some additional variants.
The graph model Arbitrary weighted graph G=(V,E,w). The weights are assumed to satisfy the triangle inequality. If the graph is unweighted, then we assume a weight of 1 for each edge.
Clusters A cluster is a collection of vertices in the graph together with the edges connecting them. Formally, given a set of vertices $S \subseteq V$, let $G(S)$ denote the subgraph induced by $S$ in $G$, namely $G(S) = (S, E')$, where $E'$ consists of all the edges of $G$ whose endpoints both belong to $S$. [Figure: a graph G and a cluster S.]
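As an illustration of the induced-subgraph definition, here is a minimal Python sketch. The edge-dictionary representation, the function name `induced_subgraph`, and the sample graph are assumptions made for this example and are not taken from the slides.

```python
def induced_subgraph(edges, cluster):
    """Return (S, E'), where E' keeps only the edges with both endpoints in S."""
    s = set(cluster)
    kept = {(u, v): w for (u, v), w in edges.items() if u in s and v in s}
    return s, kept


# Hypothetical weighted graph: keys are undirected edges, values are weights.
edges = {(1, 2): 1, (2, 3): 1, (3, 4): 2, (4, 5): 1, (5, 6): 1, (1, 5): 3}

S, E_prime = induced_subgraph(edges, {1, 2, 5})
print(S, E_prime)
```

Edges such as (2, 3) or (4, 5) are dropped because one of their endpoints lies outside the cluster.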
Covers and partitions A cover of the graph $G=(V,E,w)$ is a collection of clusters $\mathcal{S} = \{S_1, \ldots, S_m\}$ that contain all the vertices of the graph, i.e., such that $\bigcup_{i} S_i = V$. A partial partition of $G$ is a collection of disjoint clusters $\mathcal{S} = \{S_1, \ldots, S_m\}$, i.e., with the property that $S \cap S' = \emptyset$ for every two distinct clusters $S, S' \in \mathcal{S}$. A partition of $G$ is a collection of clusters $\mathcal{S}$ that is both a cover and a partial partition.
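The three definitions translate directly into set checks. The following is a small sketch; the function names and the sample collections are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def is_cover(vertices, clusters):
    """True if the union of the clusters equals the vertex set."""
    covered = set()
    for cluster in clusters:
        covered |= set(cluster)
    return covered == set(vertices)


def is_partial_partition(clusters):
    """True if every two distinct clusters are disjoint."""
    return all(set(a).isdisjoint(b) for a, b in combinations(clusters, 2))


def is_partition(vertices, clusters):
    """A partition is both a cover and a partial partition."""
    return is_cover(vertices, clusters) and is_partial_partition(clusters)


# Hypothetical 8-vertex example.
V = set(range(1, 9))
cover = [{1, 2, 3}, {3, 4, 5, 6}, {6, 7, 8}]    # overlapping clusters
partition = [{1, 2, 3}, {4, 5}, {6, 7, 8}]      # disjoint clusters covering V
partial = [{1, 2}, {7, 8}]                      # disjoint, but not a cover
```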
Example [Figure: a graph G, a cover C of G, and a partition P of G.]
Evaluation Criteria We will use two types of evaluation criteria: Locality level of the clusters, which is usually measured by a cluster's radius or size. Sparsity (overlap) level of a collection of clusters, which is measured by the degree of vertices or clusters in the cover, the graph, or the induced graph.
Locality measures and neighborhoods Cluster radius and diameter: The locality level of a cluster is usually measured by distance parameters, such as its radius and diameter. [Definition of radius and diameter]: For a vertex $v \in S$, we define the radius of $S$ w.r.t. $v$ as the radius of $v$ in the induced graph $G(S)$, namely, $\mathrm{Rad}(v,S) = \mathrm{Rad}(v,G(S)) = \max_{w \in S} \mathrm{dist}_{G(S)}(v,w)$.
Radius and Diameter for a collection of clusters Given a collection of clusters $\mathcal{S}$, $\mathrm{Diam}(\mathcal{S}) = \max_i \mathrm{Diam}(S_i)$ and $\mathrm{Rad}(\mathcal{S}) = \max_i \mathrm{Rad}(S_i)$.
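The distance-based measures can be computed with a shortest-path search restricted to the induced graph. The sketch below makes two assumptions not spelled out in the slides: the radius of a cluster is taken as the minimum of Rad(v,S) over its vertices, and its diameter as the maximum, which are the usual conventions. The adjacency-dictionary format and all function names are hypothetical.

```python
import heapq

INF = float("inf")


def distances_within(adj, cluster, source):
    """Dijkstra from source using only vertices of the cluster, i.e. in G(S)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF):
            continue  # stale heap entry
        for v, w in adj[u].items():
            if v in cluster and d + w < dist.get(v, INF):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def rad_wrt(adj, cluster, v):
    """Rad(v, S): largest distance in G(S) from v to a vertex of S."""
    dist = distances_within(adj, cluster, v)
    return max(dist.get(w, INF) for w in cluster)


def rad(adj, cluster):
    """Assumed convention: Rad(S) is the minimum of Rad(v, S) over v in S."""
    return min(rad_wrt(adj, cluster, v) for v in cluster)


def diam(adj, cluster):
    """Assumed convention: Diam(S) is the maximum of Rad(v, S) over v in S."""
    return max(rad_wrt(adj, cluster, v) for v in cluster)


def rad_of_collection(adj, clusters):
    return max(rad(adj, c) for c in clusters)


def diam_of_collection(adj, clusters):
    return max(diam(adj, c) for c in clusters)


# Hypothetical unit-weight path 1 - 2 - 3 - 4.
adj = {1: {2: 1}, 2: {1: 1, 3: 1}, 3: {2: 1, 4: 1}, 4: {3: 1}}
clusters = [{1, 2, 3}, {4}]
```

A cluster that is disconnected in its induced graph gets an infinite radius here, which signals that it is not a meaningful cluster under these measures.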
Neighborhoods Definition of [p-neighborhood cover]: Given a subset of vertices $W \subseteq V$, the p-neighborhood cover of $W$ is the collection of p-neighborhoods of the vertices of $W$, denoted $\hat{\Gamma}_p(W) = \{\Gamma_p(v) \mid v \in W\}$. Example: [Figure: the neighborhoods $\Gamma_0(v)$, $\Gamma_1(v)$, $\Gamma_2(v)$, and $\Gamma_3(v)$ in a weighted graph.]
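A p-neighborhood can be computed with a distance-bounded shortest-path search. The sketch below assumes the standard reading that the p-neighborhood of v is the set of vertices at distance at most p from v in G; the graph, the vertex labels, and the function names are hypothetical.

```python
import heapq

INF = float("inf")


def neighborhood(adj, v, p):
    """Vertices at weighted distance at most p from v (assumed definition)."""
    dist = {v: 0}
    heap = [(0, v)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for x, w in adj[u].items():
            nd = d + w
            if nd <= p and nd < dist.get(x, INF):
                dist[x] = nd
                heapq.heappush(heap, (nd, x))
    return set(dist)


def neighborhood_cover(adj, W, p):
    """Collection of the p-neighborhoods of the vertices of W."""
    return {frozenset(neighborhood(adj, v, p)) for v in W}


# Hypothetical weighted graph around a vertex "v".
adj = {
    "v": {"a": 1, "b": 2},
    "a": {"v": 1, "c": 1},
    "b": {"v": 2},
    "c": {"a": 1, "d": 2},
    "d": {"c": 2},
}
```

Because the result is a set of frozensets, two vertices with identical neighborhoods contribute a single cluster to the cover.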
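Sparsity Measures: Cover Sparsity The sparsity (overlap) of a cover $\mathcal{S}$ can be measured using the degrees of its vertices. [Definition of Maximum degree]: For every $v \in V$, let $\deg_{\mathcal{S}}(v)$ denote the number of occurrences of $v$ in clusters $S \in \mathcal{S}$, i.e., the degree of $v$ in the hypergraph $(V, \mathcal{S})$; the maximum degree of the cover is $\Delta(\mathcal{S}) = \max_{v \in V} \deg_{\mathcal{S}}(v)$. [Definition of Average Degree]: The average degree of a cover $\mathcal{S}$ is $\bar{\Delta}(\mathcal{S}) = \frac{1}{|V|} \sum_{v \in V} \deg_{\mathcal{S}}(v)$.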
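The degree-based measures are simple counts. In the sketch below the maximum degree is taken as the largest vertex degree and the average degree as the mean vertex degree over all of V; this is the usual reading and is an assumption here, as are the function names and the sample cover.

```python
from collections import Counter


def degrees(vertices, cover):
    """Number of clusters of the cover that contain each vertex."""
    counts = Counter(v for cluster in cover for v in cluster)
    return {v: counts[v] for v in vertices}


def max_degree(vertices, cover):
    return max(degrees(vertices, cover).values())


def avg_degree(vertices, cover):
    deg = degrees(vertices, cover)
    return sum(deg.values()) / len(deg)


# Hypothetical cover of a 6-vertex graph; vertex 3 lies in all three clusters.
V = [1, 2, 3, 4, 5, 6]
cover = [{1, 2, 3}, {3, 4}, {3, 4, 5, 6}]
```

A partition always has maximum and average degree exactly 1, so these measures only become interesting for overlapping covers.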
Sparsity Measures: Partition Sparsity Cluster graph: Represent each cluster as a single vertex and merge all the edges between two clusters into one edge between the corresponding cluster vertices. [Definition of Vertex and Cluster-Neighborhood]: Given a partition $\mathcal{S}$, a cluster $S \in \mathcal{S}$ and an integer $p \ge 0$, the p-vertex neighborhood of $S$ is defined as the union of the p-neighborhoods of the vertices in $S$, $\Gamma^v_p(S) = \bigcup_{x \in S} \Gamma_p(x)$.
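The cluster graph construction can be sketched as follows: map every vertex to the index of its cluster, then keep one edge per pair of distinct clusters that share at least one original edge. The function name, the edge-list format, and the sample data are hypothetical.

```python
def cluster_graph_edges(edges, partition):
    """Intercluster edges of the cluster graph, as pairs of cluster indices."""
    owner = {v: i for i, cluster in enumerate(partition) for v in cluster}
    inter = set()
    for u, v in edges:
        a, b = owner[u], owner[v]
        if a != b:
            # Parallel edges between the same two clusters collapse into one.
            inter.add((min(a, b), max(a, b)))
    return inter


# Hypothetical graph and partition.
edges = [(1, 2), (2, 3), (3, 4), (2, 4), (4, 5), (5, 6)]
partition = [{1, 2}, {3, 4}, {5, 6}]

print(cluster_graph_edges(edges, partition))
```

Here the two original edges (2, 3) and (2, 4) both join clusters 0 and 1, so they contribute a single intercluster edge.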
Example: A Basic Construction For a given unweighted graph $G=(V,E)$ and parameter $k \ge 1$, we produce a partition $\mathcal{S}$ with clusters of radius at most $k$ and with a small number of intercluster edges. Theorem: Given an n-vertex unweighted graph $G=(V,E)$ and an integer $k \ge 1$, Algorithm Basic_Part constructs a partition $\mathcal{S}$ that satisfies the following properties: $\mathrm{Rad}(\mathcal{S}) \le k-1$, and the cluster graph $G'(\mathcal{S})$ has at most $n^{1+1/k}$ intercluster edges.
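The following is a brief sketch of why both bounds are plausible. It assumes that Algorithm Basic_Part grows a cluster $S$ by one layer of neighbors as long as $|\Gamma^v(S)| > n^{1/k}|S|$, where $\Gamma^v(S)$ is the vertex neighborhood of $S$ among the remaining vertices. It is an outline of the standard counting argument, not a full proof.

```latex
% Radius: every growth step multiplies |S| by more than n^{1/k}.
% Starting from |S| = 1, after j growth steps:
|S| > n^{j/k}
\quad\text{and}\quad |S| \le n
\;\Longrightarrow\; n^{j/k} < n
\;\Longrightarrow\; j \le k - 1 .
% Each step adds one layer of neighbors in an unweighted graph, so
\mathrm{Rad}(v, S) \le j \le k - 1 .

% Intercluster edges: when S stops growing,
|\Gamma^v(S)| \le n^{1/k}\,|S| .
% Every cluster formed later that is adjacent to S contains a vertex of
% \Gamma^v(S) - S, so S is charged at most n^{1/k}|S| cluster-graph edges.
% Summing over the clusters of the partition:
\sum_{S \in \mathcal{S}} n^{1/k}\,|S| \;=\; n^{1/k} \cdot n \;=\; n^{1 + 1/k} .
```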
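Algorithm Basic_Part(G,k)
  Set $\mathcal{S} \leftarrow \emptyset$.
  While $V \ne \emptyset$ do:
    Select an arbitrary vertex $v \in V$.
    Set $S \leftarrow \{v\}$.
    While $|\Gamma^v(S)| > n^{1/k}|S|$ do:
      Set $S \leftarrow \Gamma^v(S)$.
    Endwhile
    Set $\mathcal{S} \leftarrow \mathcal{S} \cup \{S\}$ and $V \leftarrow V - S$.
  Endwhile
  Output($\mathcal{S}$).
Here $\Gamma^v(S)$ denotes the vertex neighborhood of $S$ (with $p = 1$), taken among the vertices still in $V$.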
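A minimal Python sketch of the procedure, under stated assumptions: the graph is an adjacency dictionary of sets, the neighborhood of a cluster is the cluster plus its neighbors among the vertices not yet assigned, and the "arbitrary" vertex is chosen as the smallest remaining label so that runs are reproducible. The function name `basic_part` and the sample graph are hypothetical.

```python
def basic_part(adj, k):
    """Partition an unweighted graph by growing clusters layer by layer."""
    n = len(adj)
    threshold = n ** (1.0 / k)
    remaining = set(adj)
    clusters = []
    while remaining:
        v = min(remaining)          # deterministic stand-in for "arbitrary"
        s = {v}
        while True:
            # Neighborhood of s restricted to vertices still unassigned.
            grown = set(s)
            for u in s:
                grown |= adj[u] & remaining
            if len(grown) > threshold * len(s):
                s = grown           # neighborhood is large: absorb one layer
            else:
                break               # growth too slow: close the cluster
        clusters.append(s)
        remaining -= s
    return clusters


# Hypothetical 9-vertex graph: a star centered at 0 with a path 5-6-7-8 attached.
adj = {
    0: {1, 2, 3, 4, 5},
    1: {0}, 2: {0}, 3: {0}, 4: {0},
    5: {0, 6}, 6: {5, 7}, 7: {6, 8}, 8: {7},
}
result = basic_part(adj, 2)
```

With k = 2 the threshold is 3, so the star is absorbed in a single growth step while the path vertices, whose neighborhoods grow too slowly, end up as singleton clusters.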