Published by Willis Patrick. Modified over 6 years ago.
1
Vertical Graph Analytics (summarizes SatNotes 06_15-pres)

Most complex data is modelled as a graph or hypergraph (a table is a graph without edges, so ALL data is modelled as a graph!). We strive for max speed and accuracy in our graph analytics by using vertical structure. We consider the following topics:
Vertical structuring of graph data (Edge pTree (E), Path pTree (PP), ShortestPathTrees…).
Connectivity Component Partitioning.
Community Mining (k-plexes, which include cliques as 0-plexes; k-cores; Density-communities; Degree-communities). Community existence theorems (determine if a given Induced SubGraph is a community) and community mining algorithms (find all communities) include: Vertex Count based existence thms; Inheritance (downward or upward closure based existence thms); Density Difference; Degree Difference.
Graph and HyperGraph Clustering (Community based, Vertex betweenness, Edge betweenness Clustering).
MultiPART graphs, HyperGraphs, MultiPART Hypergraphs and the Clique Tree construct (cTree) for MultiPART graphs and hypergraphs.

PP(G), the Path pTree of graph G, is a vertical representation of all paths in G and is used to find diameter, shortest paths, communities, motifs... By modifying data structures (from horizontal to vertical) the analytics fit hardware strengths and allow us to do NP-hard/complete problems.

A Path is a sequence of edges connecting a sequence of vertices, distinct except for end-vertices. A Simple Path (assumed) excludes loops, (v,v). We’ll always program using the pop-count (produces 1-counts during ANDs/ORs for free, timewise). C is a clique iff all C level 1-counts are |VC|−1.

COMMUNITIES (=~ a subgraph with more edges than expected):
A k-plex is a [max] subgraph in which each vertex is adjacent to all subgraph vertices except at most k of them. A 0-plex is called a clique.
A k-core is a [max] subgraph in which each vertex is adjacent to at least k subgraph vertices.
An n-clique is a [max] subgraph s.t. the geodesic distance between any vertex pair is ≤ n.
An n-clan is a [max] n-clique with diameter ≤ n.
An n-club is a [max] subgraph of diam ≤ n.

For v∈C: kv_int = #edges from v to C; kv_ext = #edges from v to C’.
IntDeg(C) = kC_int = Σ(v∈C) kv_int; ExtDeg(C) = kC_ext = Σ(v∈C) kv_ext.
Internal Density of C: δint(C) = |edges(C,C)| / (nc(nc−1)/2).
External Density of C: δext(C) = |edges(C,C’)| / (nc(n−nc)).
So δext(C)·nc(n−nc) = ExtDeg(C), and IntDen(C) << IntDeg(C).

k-plex existence: C is a k-plex iff ∀v∈C, |Cv| ≥ |VC|−1−k, where Cv is the set of vertices of C adjacent to v.
k-plex inheritance: An induced subgraph of a k-plex is a k-plex.
k-core existence: C is a k-core iff ∀v∈C, |Cv| ≥ k.
k-core inheritance: If G is covered by induced k-cores, G is a k-core.

Clique Existence: When is an induced SG a clique?
Edge Count existence thm (EC): C is a clique iff |EC| ≡ |PUC| = COMB(|VC|,2) ≡ |VC|!/((|VC|−2)!2!).
SubGraph existence theorem (SG): (VC,EC) is a k-clique iff every induced (k−1)-vertex subgraph (VD,ED) is a (k−1)-clique.

A Clique Mining alg finds all cliques in a graph. It uses an ARM-Apriori-like downward closure property: let CLQk ≡ kCliqueSet and CCLQk+1 ≡ Cand(k+1)CliqueSet. By SG, CCLQk+1 = all unions of CLQk-pairs having k−1 common vertices. Let C∈CCLQk+1 be a union of two k-cliques with k−1 common vertices. Let v,w be the kth (non-common) vertices of the two k-cliques; then C∈CLQk+1 iff (PE)(v,w)=1. (Just need to check a single bit in PE.)

A good tradeoff between large δint(C) and small δext(C) is the goal of density community mining algs. A simple approach is to maximize differences.
Density Difference alg for Communities: δint(C) − δext(C) > Thresh?
Degree Difference alg: kC_int − kC_ext > Thresh? Easy to compute even for Big Graphs.

Giant Yahoo Data Dump Aims to Help Computers Know What You Want (see “Here’s What Developers Are Doing With Google’s AI Brain”).
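The density and degree differences above can be sketched in a few lines. This is a minimal illustration, not the deck's pTree implementation: the function name `community_scores`, the six-vertex graph and the candidate community are all hypothetical, and the code assumes 2 ≤ |C| < n so that neither denominator is zero.

```python
def community_scores(edges, n, community):
    """Return (d_int, d_ext, k_int, k_ext) for a candidate community C.

    edges: iterable of undirected pairs (u, v); n: total vertex count.
    Hypothetical helper; assumes 2 <= |C| < n.
    """
    C = set(community)
    internal = sum(1 for u, v in edges if u in C and v in C)
    external = sum(1 for u, v in edges if (u in C) != (v in C))
    nc = len(C)
    d_int = internal / (nc * (nc - 1) / 2)   # internal density
    d_ext = external / (nc * (n - nc))       # external density
    k_int = 2 * internal   # sum over v in C of k_v^int (each edge counted twice)
    k_ext = external       # sum over v in C of k_v^ext
    return d_int, d_ext, k_int, k_ext


# Hypothetical graph: triangle {1,2,3} attached to a path 4-5-6.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (5, 6)]
d_int, d_ext, k_int, k_ext = community_scores(edges, 6, {1, 2, 3})
density_difference = d_int - d_ext
degree_difference = k_int - k_ext
```

Either difference can then be compared against a threshold chosen by the analyst.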
2
Next we build a ShortestPathtree, SPG1 for G1
A graph is a set of vertices, V, and a set of edges, E, each connecting a pair of those vertices. An edge from vertex h to vertex k is realized as the unordered set, {h,k} (or just hk), and can be viewed as an undirected line from h to k.

We can either list the edges in a two column table (the Edge Table) or we can use a 3 column table in which the first 2 columns list all possible vertex pairs (in raster order) and the third column is a bit map indicating with a 1-bit the pairs that are edges and with a 0-bit the pairs that are not edges. This second option is called the edge map or edge mask and is shown below for a small graph, G1. The edge map obviously has |V|^2 rows. If the raster ordering is always assumed, the edge map is just a single column of bits.

The edge map can be compressed into a pTree (predicate Tree) by dividing the bits up into “strides” of |V| bits each (4 for G1). This forms the lowest level of the pTree (level_0). An upper level (level_1) indicates the truth of the predicate, “Not Purely Zeros”, for the respective level_0 pTrees, and can be used to avoid retrieving level_0 pTrees that are purely zeros. We use the notation Ek for the kth level_0 pTree (which bitmaps the endpoints of the edges adjacent to vertex k) and call it the Edge pTree of k.

A Path is a sequence of edges connecting a sequence of vertices, distinct, except for endpts. A Simple Path (assumed throughout) disallows simple loops, (v,v).

Next we build a ShortestPathTree, SPG1, for G1. It starts with Level_0 of the EdgeTree. For each vertex, k, this gives us a mask, Sk, of the end pts of edges adjacent to vertex k (shortest paths of Length 1 starting at k). The complement of Ek (with k turned off) gives us the endpoints still to be reached; the vertices in Ek never need to be considered again (since all shortest paths from k to these vertices have been found). We call these complement pTrees the “Not Reached Yet masks” or “N masks”.
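The edge map and its two-level pTree can be sketched as follows. This is only an illustration with plain Python lists rather than compressed bit vectors; the function names are hypothetical, and the edge list for G1 (13, 14, 24, 34) is an assumption reconstructed from the S and N formulas on this slide.

```python
def edge_map(n, edges):
    """Single bit column in raster order (1,1),(1,2),...,(n,n)."""
    present = {frozenset(e) for e in edges}
    return [int(frozenset((h, k)) in present)
            for h in range(1, n + 1) for k in range(1, n + 1)]


def edge_ptree(n, bits):
    """Split the edge map into strides of n bits (level_0) and
    summarise each stride with a 'Not Purely Zeros' bit (level_1)."""
    level0 = [bits[i:i + n] for i in range(0, n * n, n)]
    level1 = [int(any(stride)) for stride in level0]
    return level0, level1


# Assumed edge list for G1, reconstructed from the slide's formulas.
G1_EDGES = [(1, 3), (1, 4), (2, 4), (3, 4)]
bits = edge_map(4, G1_EDGES)
level0, level1 = edge_ptree(4, bits)
# level0[k-1] plays the role of Ek; sum(level0[k-1]) is its 1-count.
```

With an extra isolated vertex 5, the fifth level_1 bit would be 0 and the corresponding level_0 stride would never need to be retrieved.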
[Figure: G1 with its vertex masks M1..M4, the Edges table (V1, V2, E) over all pairs 1,1 … 4,4, the Edge Map, and the 2-level (stride=4) Edge pTree E1..E4 with Level1. The SPTree is shown by the green links.]

We use the notation Shk for the map of the endpts of Shortest Paths thru h then k (obviously of length=2) and NLv for the map of vertices not reached by length-L shortest paths from vertex v (so N11 is the length-1 mask of vertex 1, N21 its length-2 mask).

Length-2 level:
S13 = N11&E3, S14 = N11&E4
S24 = N12&E4
S31 = N13&E1, S34 = N13&E4
S41 = N14&E1, S42 = N14&E2, S43 = N14&E3. We can avoid these last calculations by noting Ct(N14)=0.

N21 = N11 & (S13|S14)’
N22 = N12 & (S24)’
N23 = N13 & (S31|S34)’
N24 = N14 & (S41|S42|S43)’

Length-3 level:
S142 = N21&E2, S241 = N22&E1, S243 = N22&E3, S312 = N23&E1, S342 = N23&E2
This entire level is unnecessary to construct since |N2k|=0 ∀k.

The connectivity components can be deduced from the zero set of the final NLk's.

Girvan and Newman started a flurry of research by suggesting the graph could be edge labelled by an edge_between-ness measurement (which counts the shortest path participations of the edge) and that a graph could be usefully partitioned (into strongly connected components) by the divisive hierarchical clustering of removing edges in desc order of between-ness.

Btwn14= Btwn24= Btwn34=1
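A minimal sketch of the S/N-mask construction, using Python integers as bit masks (bit v stands for vertex v). It is a simplification of the slide's scheme: instead of keeping one mask per path prefix (S13, S14, …) it ORs them into one frontier per level, which is enough to recover shortest-path lengths and connectivity components. The helper names and the G1 edge list are assumptions.

```python
def build_E(n, edges):
    """E[k] is the edge mask of vertex k (bit j set iff {k, j} is an edge)."""
    E = [0] * (n + 1)
    for h, k in edges:
        E[h] |= 1 << k
        E[k] |= 1 << h
    return E


def members(mask):
    """Vertices whose bits are set in mask."""
    return {v for v in range(mask.bit_length()) if (mask >> v) & 1}


def sp_levels(n, E, h):
    """Return ([vertices at distance 1, at distance 2, ...], final N mask)."""
    all_v = sum(1 << v for v in range(1, n + 1))
    frontier = E[h]                              # S_h: length-1 endpoints
    not_reached = all_v & ~E[h] & ~(1 << h)      # N1_h
    levels = []
    while frontier:
        levels.append(members(frontier))
        nxt = 0
        for k in members(frontier):
            nxt |= E[k] & not_reached            # union of the S masks through k
        not_reached &= ~nxt                      # next N mask
        frontier = nxt
    return levels, not_reached


# Assumed edge list for G1, reconstructed from the slide's formulas.
E = build_E(4, [(1, 3), (1, 4), (2, 4), (3, 4)])
levels_from_1, n_final = sp_levels(4, E, 1)
S14 = (15 << 1) & ~E[1] & ~(1 << 1) & E[4]       # N11 & E4, as on the slide
```

A zero final N mask for any start vertex means the graph is one connected component; otherwise the complement of the final mask is that vertex's component.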
3
Counting SP participations: If a12.. = the count of 12.. participations (SPs that start 1 then 2) and a21.. = the count of 21.. participations, then the full participation is BtwnGN = a12.. + a21.. + a12..*a21.., since for every 21.. SP, edge 1-2 also participates in the middle of another SP extending a 12.. participation. So the problem is computing all ahk.. correctly. (We don’t need the +1.)
Btwn1.5 = a12k + a21h
Btwn2.5 = a12k + a21h + a12h * a21h

∀h∈ListEk: Shk = Ek & N1h
N2k = N1k & (OR over h∈ListEk of Skh)’
So N21 = N11 & (S12|S13|S14|S16)’

[Figure: G2 on vertices 1-7 with its S and N1, N2, N3 masks.] We can now deduce the graph is connected: |N21|=0, so CC1=all.

[Figure: tree on vertices 1-9, a-i with edge between-ness labels.]
Btwn1-2 = 8
Btwn1-b = Btwn1-c = Btwn2-3 = Btwn2-4 = 60
Btwnb-d = Btwnc-e = Btwn3-5 = Btwn4-6 = 45
Btwnd-f = Btwnd-g = Btwne-h = Btwne-i = Btwn5-7 = Btwn5-8 = Btwn6-9 = Btwn6-a = 17

Radius(x) ≡ MAXy Length(SP(x,y)) is an interesting vertex label. MIN(ahk, akh) is an interesting edge label. The min # of SP hops from an edge in either direction is an edge radius (like vertex radius?).

Alternatively, DATON (delete all ties or none (none if a vertex is isolated))? Both have the same ordering as BtwnGN.

[Figure: G2 drawn under BtwnGN, Btwn1.5 and Btwn2.5, and under the variants BtwnGN_DNIv, Btwn1_DNIv and Btwn1_DNIv_DATON.]
Btwn1 = |Ph-1|+|Pk-1|+1

One way to get good dendrogram partitions is to stop at large BtwnGN gaps, e.g., gaps > 2*avg:
BtwnGN Avg = 10/9 = 1.1 so ThreshGN = 2.2
Btwn1.5 Avg = 10/5 = 2 so Thresh1.5 = 4 (there are no large Btwn1.5 gaps)
Btwn2.5 Avg = 10/11 = 0.9 so Thresh2.5 = 1.8
4
Counting SP participations in G5
∀h∈ListEk: Shk = Ek & N1h
Nk ← Nk & (OR over h∈ListEk of Skh)’

[Figure: G5 on vertices 1-8 with its S and N masks at each level, drawn under BTWNGN, BTWNGN_DNIv, BTWN1.5, BTWN1.5_DNIv, BTWNGN_DNIv_DATON and BTWN1.5_DNIv_DATON.]

Connectivity components from the final N masks:
CC1 = N’1 = N’2 = N’4 = N’5 = N’7
CC3 = N’3 = N’6 = N’8
5
SP partic for G6

[Figure: G6 on vertices 1-9, a, b, c with its edge pTree E and the S and N masks at each level.]

Diameter(G6)=4, G6 is one connected component. The radius of each point is 4.
6
[Figure: G6 drawn four times, under BTWNGN with DNIv, BTWN1 with DNIv V, BTWN1 with DNIv ^, and BTWN1 with DNIv and DATON; further panels show BTWN1, BTWN1.5, BTWN1.5 with DNIv_DATON and BTWN1 with DNIv V.]

A way to get good dendrogram partitions is to stop at large BtwnGN gaps, e.g., gaps > 2*avg. Avg = 47/19 = 2.5. At this partition gap = 42−35 = 7 > 5. Here, 30−21 = 9. Here, 18−8 = 10.

BTWNGN(hk) = |hk..| + |kh..| + |hk..||kh..| + 1

All Btwn1 and Btwn1.5 gaps = 1. Lexical orderings for breaking ties aren’t semantic; they depend on the number order of vertices, which is artificial. So DATON is better!

BTWN1.5: gaps > 2*avg, Avg = (6−1)/19 = .26. All gaps are 1 so they all qualify. Stop after 6’s, no partition! Stop after 5’s. Stop after 1’s (e.g., at the end).
7
G7: All Shortest Paths (SPs) 1-b
[Figure: G7, Friendships in Zachary’s Karate Club (vertices 1-9, a-y), with the lists of all shortest paths starting at vertices 1 through b.]
8
G7: All Shortest Paths (SPs) c-h
[Figure: G7, lists of all shortest paths starting at vertices c through h.]
9
G7: All Shortest Paths (SPs) k-r
[Figure: G7, lists of all shortest paths starting at vertices k through r.]
10
G7: All Shortest Paths (SPs) s-y
[Figure: G7, lists of all shortest paths starting at vertices s through y.]
11
G7: Friendships in Zachary’s Karate Club. All SPCs.
[Figure: the complete set of shortest-path lists for G7, for every start vertex 1-9 and a-y, collected on one slide.]
12
[Figure: G7, Friendships in Zachary’s Karate Club, drawn three times with edges labelled by Btwn1.5, Btwn2.5 and BtwnGN. Beside each drawing the edges are listed in descending order of between-ness (BtwnGN as a sorted, unique list). The largest listed values are 19 for Btwn1.5, 79 for Btwn2.5 and 509 for BtwnGN.]
13
Recomputing Between-nesses after every delete. GN does this
Recomputing Between-nesses after every delete? GN does this. Can it be done by just updating the existing btwn-nesses in some way?

[Figure: G1 with its Edges table, Edge Map and 2-level (stride=4) Edge pTree, and its shortest paths.]

If 24 is deleted, it appears to be easy, namely, turn off bit 4 in E2 and bit 2 in E4, and delete all three 2-hop SPs since they all three involve 24. So at this point it looks like it might just be a matter of deleting those SPs that involve the edge (either at the beginning, end or middle). However, if we delete 14 then 134 and 1342 are brand new SPs which weren’t there before. It looks like one has to start fresh computing Between-ness???

Btwn14=1 Btwn24=2 Btwn34=1

Ek is the map of edge endpoints from k (points adjacent to k).
Shi is the map of endpoints of 2-hop Shortest Paths through h then i. Of course, |Shi&Eh|=0 since there is a SP of length<2 from h to each vertex in Eh.
Shij is the map of endpoints of 3-hop Shortest Paths through h then i then j. Of course, |Shij&Eh|=|Shij&Shi|=0 since there is a SP of length<3 from h to each vertex in Eh and Shi. Etc.

The Main Theorem: The full SP participation count of hk is a + b + ab, where a is |Shk|+|Shki|+|Shkij|+… and b is |Skh|+|Skhi|+|Skhij|+… .

Proof: Let s=c…dhke…f be any SP in which hk participates.
If h=c then s starts with hk and is counted in a = |Shk|+|Shki|+|Shkij|+… .
If k=f then s ends with hk; read from k backward it starts with kh and is counted in b = |Skh|+|Skhi|+|Skhij|+… .
If h≠c and k≠f then hk occurs in the middle of s. The “right half” of s (hk and right) is counted in a, and the “left half” (hk and left, read backward from k) is counted in b, therefore s is counted in ab.
Thus Participation Count of hk ≤ a + b + ab. But every term of a + b + ab counts a SP, so we get =.
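The effect of deleting edge 14 can be checked with a small breadth-first search. This is a plain adjacency-set sketch, not the pTree procedure; the function name and the G1 edge list (13, 14, 24, 34, reconstructed from the earlier S and N formulas) are assumptions.

```python
from collections import deque


def distances(n, edges, src):
    """Shortest-path hop counts from src to every reachable vertex."""
    adj = {v: set() for v in range(1, n + 1)}
    for h, k in edges:
        adj[h].add(k)
        adj[k].add(h)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in sorted(adj[u]):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


# Assumed edge list for G1.
G1_EDGES = [(1, 3), (1, 4), (2, 4), (3, 4)]
before = distances(4, G1_EDGES, 1)
after = distances(4, [e for e in G1_EDGES if e != (1, 4)], 1)
```

Before the delete, vertex 4 is one hop from vertex 1 and vertex 2 is two hops away; afterwards the only routes are 1-3-4 and 1-3-4-2, which is why the shortest paths from vertex 1 have to be rebuilt rather than merely pruned.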
14
Clique Existence Thm (CLQe): Let G=(V,E) and W⊆V with |W|=k and EW ≡ { {x,y}∈E | x,y∈W }. Then the induced subgraph (W,EW)∈CLQk (is a k-clique) iff every induced (k−1)-vertex subgraph of (W,EW) is in CLQk−1.

Clique Mining Thm (CLQm) finds all cliques using a closure property: Let Cand(k+1)CliqueSet ≡ CCLQk+1. By the CLQe thm, CCLQk+1 = all unions of CLQk-pairs having k−1 common vertices. Let C∈CCLQk+1 be a union of two k-cliques with k−1 common vertices. Let v and w be their kth (non-common) vertices respectively; then C∈CLQk+1 iff Evw=1. (Just check a single bit in PE.)

Clique Existence Thm, edge count (CLQec): C={1,2,3,4}, CU=C&EU. ct(CU) = comb(4,2) = 4!/(2!2!) = 6, so C∈CLQ4. In general, for a candidate C∈CCLQk+1, |UC| = (k+1)!/((k−1)!2!) iff C∈CLQk+1. Is there an edge count Clique Mining Thm?

Edge counting requires counting 1’s in the mask pTree of each Subgraph (or candidate Clique, if we take the time to generate the CCSs, but then clearly the fastest way to finish up is simply to look up the single bit position in E, i.e., use EC).

The SG Clique Mining Alg only needs to find those pairs of subgraphs in CLQk that share k−1 vertices, then check E to see if the two non-shared vertices form an edge in G. (The search for such pairs is standard in the Apriori ARM alg and has therefore been optimized and engineered ad infinitum!)

[Figure: the 7-vertex graphs G2 and G3 with their pair keys 1,1 … 7,7, edge maps (E, EU, PE), masks (C, CU) and edge pTrees E1..E7.]
CLQ2: 2 vertices, 1 edge, so just E.
k=3: 123, 124, 134, 234 and 567 are found to be in CS3; the bits PE(2,6), PE(1,7) and PE(1,5) are 0.
k=4: PE(3,4)=1, so 1234∈CS4, since 123, 124, 134, 234 ∈ CS3.
k=5: CLQ5 = ∅.

[Figure: the 8-vertex graph G4 with its pair keys 1,1 … 8,8 and edge map E.]
k=3: 123, 124, 128, 134, 138, 148, 234, 238, 248, 348 and 567 are found to be in CS3; the bits PE(2,6), PE(1,5), PE(1,7) and PE(6,8) are 0.
k=4: 1234, 1238, 1248, 1348 and 2348 are in CS4.
k=5: PE(4,8)=1, so 12348∈CS5.
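The Apriori-style join with a single-bit check can be sketched as below. This is an illustrative version only: cliques are kept as sorted tuples, the join pairs two k-cliques that agree on their first k−1 vertices, and the edge pTrees are plain integer masks. The function names are hypothetical, and the 7-vertex edge list is a made-up graph chosen so that its cliques match the ones listed on the slide; the slide's actual edge list is not recoverable from the text.

```python
from itertools import combinations
from math import comb


def mine_cliques(n, edges):
    """Return {k: [k-cliques as sorted tuples]} for k = 2, 3, ..."""
    E = [0] * (n + 1)
    for h, k in edges:
        E[h] |= 1 << k
        E[k] |= 1 << h
    level = sorted({tuple(sorted(e)) for e in edges})   # CLQ2 is just E
    cliques, k = {}, 2
    while level:
        cliques[k] = level
        nxt = []
        for a, b in combinations(level, 2):
            # a and b share k-1 vertices; v = a[-1], w = b[-1] are the others.
            if a[:-1] == b[:-1] and (E[a[-1]] >> b[-1]) & 1:
                nxt.append(a + (b[-1],))
        level, k = nxt, k + 1
    return cliques


def is_clique_by_edge_count(n, edges, C):
    """Edge-count test: |edges inside C| == comb(|C|, 2)."""
    inside = sum(1 for u, v in set(map(frozenset, edges)) if u in C and v in C)
    return inside == comb(len(C), 2)


# Hypothetical 7-vertex graph: K4 on {1,2,3,4}, triangle {5,6,7}, bridge 3-5.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
         (3, 5), (5, 6), (5, 7), (6, 7)]
found = mine_cliques(7, EDGES)
```

Joining on a shared sorted prefix generates each candidate exactly once, which is the same trick Apriori uses for itemsets; the pair (3,4), (3,5) is rejected because bit (4,5) is 0.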
15
Clique Mining Thm (CLQm)
[Figure: G5 on vertices 1-8 with its 2-level (stride=8) edge pTree EG5.]
CLQ2: 2 vertices, 1 edge, so just E. Among the checks shown for G5: 157∈CS3 and 368∈CS3, while E36=0, E27=0, E25=0 and E14=0.

Calculating pairwise &s is unnecessary! The most efficient algorithm is to consider CCLQk+1 from lowest common vertex set to highest (i.e., start with the lowest k and work up, always keeping the max of the shared sets as low as possible). For every found candidate pair from CCLQk+1 sharing k−1 vertices in which >1 unshared vertex is higher than said shared max, check for an edge connecting those unshared vertices.

[Figure: G5.1 on vertices 1-9, a-g.] Its CLQ2 list includes ac, bc, df, dg, fg; the CCLQ3 candidates shown are abc and dfg.

[Figure: G6 on vertices 1-9, a, b, c with its edge pTree E.] Its CLQ2 list includes 9a, 9b, 9c, ab, ac, bc; CLQ3 includes 9ab, 9ac, 9bc, abc; the CCLQ4 candidates shown include 89ac and 9abc.
16
Clique Mining Thm (CLQm)
[Figure: G7 (vertices 1-9, a-y) with its edge pTree E. For each vertex v there is a row “v with …” listing the pairs of neighbours of v, each marked y when the pair is itself an edge, i.e., when it forms a 3-clique with v.]

An early exit for stealth programmers: (W,EW)∈CLQk iff every induced (k−1)-vertex subgraph of (W,EW) is in CLQk−1. This tells us that 12348∈CLQ5. We know it is max containing {8}, since if there were other vertices in a bigger clique they would have shown up here. Can we now delete 12348??? Are there other early exits? Other execution time issues?

We already know 12348 is a MCLQ5. What other CLQ3s? Those listed: 12e 12i 12k 12m 14d 14e 34e 24e 16b 67h 39x 3vy osy osu oux ouy pqw ruy tvw twy 9vx 9vy
17
Clique Mining Thm (CLQm) on G7

[Figure: G7 (vertices 1-9, a-y) with, for each vertex v, the row “v with …” of neighbour pairs, marked y where the pair is an edge.]

UCLQ3s (p1 p2 p3):
b15 b16 d14 e14 e12 e13 e23 e24 e34 h67 i12 k12 m12 osy oux ouy pqw ruy twy vx9 vy9 x39 128 123 124 139 134 138 148 157 167 234 238 248 348

CCLQ4 is formed by pairing UCLQ3s that share an edge, e.g., b1 with 56; ou with xy; oy with su; v9 with xy; 14 with de; 67 with 1h; 12 with ik, im, km; uy with ro; 39 with 1x; 38 with 12.

UCLQ4: 123e 124e 134e …
CCLQ5: 123 with 48, 4e, 8e

UMCLQ3:
b15 b16 d14 h67 i12 k12 m12 osy oux ouy pqw ruy twy vx9 vy9 x39 139 157 167
18
No two share an edge so there are no 4Cliques.
Clique Mining on G10. At each step, we branch (in parallel?) to each of the lowest degree vertices.

[Figure: G10, web graph of the pages of a website and their hyperlinks, with vertices labelled 1-99, a0-a9, b0-b9, …, h0-h9, i0. Communities by color (Girvan Newman Algorithm).]
|V|=180 (1-i0) and |E|=478. We have unPTrees (undirected graph), inPTrees (showing all incoming edges and where they come in from) and outPTrees.

UCLQ3 pTrees, taking vertices in descending order of 1-count:
Max Ct=26, vertex=91. All &s with 91 have Ct=0, so 91 is part of no 3cliques.
Ct=24, vertex=D2. All &s with D2 have Ct=0, so D2 is part of no 3cliques.
Ct=23, vertex=38. All &s with 38 have Ct=0, so 38 is part of no 3cliques.
Ct=14, vertex=52. &Ct=0, so 52 is part of no 3clique.
Ct=13, vertex=174 (H4) is part of the 3cliques {H0 H2 H4} and {H3 H4 I0}.
Ct=13, vertex=46. &Ct=0, so 46 is part of no 3clique.
Ct(B2)=9: B2 is part of a 3clique.
Ct(45)=9, &cts=0. Ct(78)=9, &cts=0. Ct(49)=8, all 0s. Ct(81)=8, all 0s. Ct(C4)=7, all 0s. Ct(A7)=5, all 0s. Ct(H9)=5, all 0s.

There are only three 3Cliques: {H0 H2 H4}, {H3 H4 I0}, {45 76 B2} (I quickly checked the rest). No two share an edge so there are no 4Cliques. The fact there are so few cliques may be a characteristic of web page link graphs.

Was it worthwhile doing the Clique analysis? Yes! The 8 vertices involved in the three 3Cliques (and the three cliques themselves) are outliers! We can examine each to try to determine what’s unique about them. What does it mean that the three vertices {H3 H4 I0} are a 3Clique in the undirected graph of page references? In this case, after close examination, we see that they form a cycle (in the directed graph sense). Should there ever be circular references like that in web pages? The 3Clique {45 76 B2} appears to be a mistake (no edge from 45 to 76). The clique {H0 H2 H4} does not appear to be a cycle.
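The per-vertex test used above (“all &s with v have Ct=0, so v is part of no 3clique”) can be sketched with integer masks. The helper names and the small 8-vertex graph are hypothetical; G10 itself is not reproduced here.

```python
def build_E(n, edges):
    """E[k] is the undirected edge mask of vertex k."""
    E = [0] * (n + 1)
    for h, k in edges:
        E[h] |= 1 << k
        E[k] |= 1 << h
    return E


def members(mask):
    return {v for v in range(mask.bit_length()) if (mask >> v) & 1}


def three_cliques_through(E, v):
    """All 3-cliques containing v, found by ANDing E[v] with each neighbour's mask."""
    found = set()
    for u in members(E[v]):
        common = E[v] & E[u]          # a 1-bit here closes a triangle v-u-w
        for w in members(common):
            found.add(tuple(sorted((v, u, w))))
    return found


# Hypothetical graph: vertex 1 is a high-degree hub, 6-7-8 is a triangle.
E = build_E(8, [(1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
                (6, 7), (6, 8), (7, 8)])
hub = three_cliques_through(E, 1)
tri = three_cliques_through(E, 6)
```

As on the slide, a vertex can have the largest 1-count in the graph and still belong to no 3-clique, because none of its neighbours are adjacent to each other.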
19
Topdown k-plex Mining Algorithm: If G isn't a k-plex, let H1 be an ISG (induced subgraph) of G which is simply G with a vertex of least degree removed. If H1 still isn't a k-plex, let H2 be an ISG of H1 with a vertex of least degree (in H1) removed, etc., until we find that Hj is a k-plex. Remove Hj and restart the algorithm until all vertices are removed. Note, we know Hj exists since an edge is a 0-plex.
Let H be an ISG with |V_H| = h, and let C(h,2) = h(h-1)/2 be the number of edges H would have if it were complete. H is a k-plex iff C(h,2) - |E_H| ≤ k.
Downward Closure: If H is a k-plex and F is an ISG of H, then F is a k-plex (if F is missing an edge, H is missing it too, so F can't be missing more edges than H). Edges are 0-plexes. |E_{123}| = #edges in the induced subgraph {1,2,3} = 3, so 123 is a 0-plex (a clique). |E_{124}| = 3, so 124 is a 0-plex (clique).
[Figure: graph G6 on vertices 1-9, a, b, c, with its edge pTree E and vertex degrees.]
G = {123456789abc}: C(12,2) = 12*11/2 = 66 and |E| = 19, so G is a k-plex for k ≥ 47.
H1 = ISG{12346789abc} (deg_G(5)=2). C = 11*10/2 = 55, |E| = 17. H1 is a k-plex for k ≥ 38.
(Must we redo the ANDs for every x ≠ 5 to get the deg_H1(x)s? No! We already retrieved E5 = {6,7}, so we just decrement the 1-counts of 6 and 7 by 1 each, to 2 and 2.)
H2 = ISG{1234789abc} (deg_H1(6)=2). C = 10*9/2 = 45, |E| = 15. H2 is a k-plex for k ≥ 30.
H3 = ISG{123489abc} (deg_H2(7)=1). C = 9*8/2 = 36, |E| = 14. H3 is a k-plex for k ≥ 22.
H4 = ISG{12389abc} (deg_H3(4)=2). C = 8*7/2 = 28, |E| = 12. H4 is a k-plex for k ≥ 16.
H5 = ISG{1239abc} (deg_H4(8)=2). C = 7*6/2 = 21, |E| = 10. H5 is a k-plex for k ≥ 11.
H6 = ISG{239abc} (deg_H5(1)=2). C = 6*5/2 = 15, |E| = 8. H6 is a k-plex for k ≥ 7.
H7 = ISG{39abc} (deg_H6(2)=1). C = 5*4/2 = 10, |E| = 7. H7 is a k-plex for k ≥ 3.
H8 = ISG{9abc} (deg_H7(3)=1). C = 4*3/2 = 6, |E| = 6. H8 is a k-plex for k ≥ 0.
So take {9abc} out of G (call the result G1) and start over.
G1 = {12345678}: C = 8*7/2 = 28, |E| = 10. G1 is a k-plex for k ≥ 18.
H1 = ISG{1234567} (deg_G1(8)=1). C = 7*6/2 = 21, |E| = 9. H1 is a k-plex for k ≥ 12.
H2 = ISG{234567} (deg_H1(1)=2). C = 6*5/2 = 15, |E| = 6. H2 is a k-plex for k ≥ 9. deg=112223
H3 = ISG{34567} (deg_H2(2)=1). C = 5*4/2 = 10, |E| = 4. H3 is a k-plex for k ≥ 6. deg=01222
H4 = ISG{4567} (deg_H3(3)=0). C = 4*3/2 = 6, |E| = 4. H4 is a k-plex for k ≥ 2. deg=1222
H5 = ISG{567} (deg_H4(4)=1). C = 3*2/2 = 3, |E| = 3. H5 is a k-plex for k ≥ 0. deg=222
So take {567} out of G1 (call the result G2) and start over.
G2 = {12348}: C = 5*4/2 = 10, |E| = 5. G2 is a k-plex for k ≥ 5. deg=33220
H1 = ISG{1234} (deg_G2(8)=0). C = 4*3/2 = 6, |E| = 5. H1 is a k-plex for k ≥ 1. deg=3322
H2 = ISG{124} (deg_H1(3)=2). C = 3*2/2 = 3, |E| = 3. H2 is a k-plex for k ≥ 0. deg=222
This is what we want! 1234 is a 1-plex (missing 1 edge). 124 was determined to be a clique (0-plex). It would have been great if 123 had been revealed as a clique and if 89abc had been detected as a 1-plex before 9abc was detected as a clique and removed. Can we modify the algorithm to do that? We'll try by returning to remove all degree ties before moving on (on the next slide). NOTE: We only used E and never used SP2, SP3, SP4, and that's significant because those structures are hard to generate!
Miscellany: S(V,E) is a k-plex iff C(|V|,2) - |E| = |V|(|V|-1)/2 - |E| ≤ k iff |V|^2/2 - |V|/2 - |E| - k ≤ 0. Adding |V| to both sides: iff |V|^2/2 + |V|/2 - |E| - k ≤ |V|. If S'(V',E') adds 1 vertex, x, to S and adds only od_x new edges out from x, then S' is a k-plex iff (|V|+1)|V|/2 - (|E| + od_x) ≤ k iff |V|^2/2 + |V|/2 - |E| - k ≤ od_x. Since od_x ≤ v (where v = |V|), we can say only that S a k-plex ⇒ S' a k-plex if od_x = v (obvious). If S is a k-plex missing h edges (so the slack is that k-h more edges can be missing), and S' is as above, then S' is a k-plex iff k-h ≥ v-od_x, i.e., od_x ≥ v-k+h. And od_x = Ct(E_S', x). So a bottom-up approach (larger and larger supergraphs) might use this fact??
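The top-down procedure on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the G6 edge list below is an assumed reconstruction (19 edges, matching the |E| used for G), ties among least-degree vertices are broken by label order, and the names `missing_edges` and `topdown_kplex` are hypothetical. It uses plain sets rather than pTrees.

```python
# Assumed edge list for G6 (19 edges); vertex labels are the characters 1-9, a, b, c.
G6_VERTICES = "123456789abc"
G6_EDGES = [(e[0], e[1]) for e in
            "12 13 14 23 24 3c 47 56 57 67 68 89 8a 9a 9b 9c ab ac bc".split()]

def missing_edges(vertices, edges):
    """C(h,2) - |E_H| for the subgraph induced on `vertices`."""
    h = len(vertices)
    present = sum(1 for u, w in edges if u in vertices and w in vertices)
    return h * (h - 1) // 2 - present

def topdown_kplex(vertices, edges, k=0):
    """Repeatedly strip a least-degree vertex until a k-plex remains,
    remove that k-plex from the graph, and restart on what is left."""
    remaining = list(vertices)          # list order doubles as the tie-break order
    found = []
    while remaining:
        current = list(remaining)
        while missing_edges(set(current), edges) > k:
            cur = set(current)
            deg = {v: 0 for v in current}
            for u, w in edges:
                if u in cur and w in cur:
                    deg[u] += 1
                    deg[w] += 1
            # min() returns the first least-degree vertex in label order
            current.remove(min(current, key=lambda v: deg[v]))
        found.append(set(current))
        remaining = [v for v in remaining if v not in current]
    return found

if __name__ == "__main__":
    for community in topdown_kplex(G6_VERTICES, G6_EDGES, k=0):
        print(sorted(community))
```

With this tie-break the intermediate subgraphs differ slightly from the slide's, but the first three 0-plexes removed are the same ones the slide reports.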
20
Topdown k-plex / k-core Mining Alg: If G isn't a k-plex, let H1 be an ISG of G with a vertex of least degree removed. In parallel, remove all degree ties before moving on. If H1 still isn't a k-plex, let H2 be an ISG of H1 with a vertex of least degree (in H1) removed, etc., until we find that Hj is a k-plex (usually our interest is in k=0). Remove Hj and restart the alg until all vertices are removed. Note, we know Hj exists since an edge is a 0-plex.
Let H be an ISG with |V_H| = h and C(h,2) = h(h-1)/2. H is a k-plex iff C(h,2) - |E_H| ≤ k. Downward Closure: If H is a k-plex and F is an ISG of H, then F is a k-plex (if F is missing an edge, H is missing it too, so F can't be missing more edges than H).
A k-core is a subgraph containing k edges. A COMBO(|V|,2)-core is a clique. Upward Closure of k-cores: If H is a k-core and H is an ISG of F, then F is a k-core.
Topdown mining of all kplexes and kcores: at each step, we [potentially] branch to each of the lowest degree vertices.
[Figure: G6 with its edge pTree E and the branching tree of induced subgraphs, each labeled with its kplex and kcore values. The labels along the branches run 47plex 19core, 37plex 17core, 30plex 15core, 22plex 14core, 16plex 12core, 11plex 10core, then 6plex 9core or 7plex 8core, then 2plex 8core or 3plex 7core, ending at 0plex 6core (degrees 3333). Branches that arrive at an already visited subgraph are marked "same as" and stop.]
So take {9abc} and start over.
[Figure: second branching tree, starting at 18plex 10core and 12plex 9core, passing through 8plex 7core, 4plex 6core and 1plex 5core (degrees 3322), and ending at 0plex 3core (degrees 222).]
21
Topdown kplex/kcore mining On G7. Delete lowest degree vertices.
[Figure: G7 with its edge pTree E and the tree of deletion steps, each subgraph labeled with its remaining degree counts and its kcore/kplex values.]
4 cliques are not revealed. (Don't combine steps, e.g., deleting counts 0,1,2,3 at one time?)
Restarts shown in the tree: Del 25,26,32, restart. Del 12348e, restart. Del 5,6,7,11,17, restart. Del 24,30,33,34, restart.
Here we stick with just one count deleted at a time, fully in parallel for each lowest count value. This is one deletion at a time. Oof! Let's try one count at a time on the next slide.
22
Topdown kplex/kcore Mining on G7. At each step, we branch (in parallel?) to each of the lowest degree vertices. Here we delete just one count value at a time, but we delete all occurrences of that value at one time. We mine kplexes/kcores at each step, deleting low counts, and then cluster based on low-plex, high-core affinity.
[Figure: G7 (vertices 1-34) and the deletion tree with core/plex labels at each step.]
This does not separate the white from the light blue. We try recursing the process on blue only. No help! It still does not separate white from light blue.
Take our 0plexes, { e 67h ox oy}, and move up 1 step at a time, stopping prior to overlap. We move up 1 step in thread 1 and 2 steps in threads 2 and 3, getting {1,2,3,4,8,14} {24,32,33,34} {5,6,7,11,17}. If we move up one more level in threads 1 and 2, we encounter overlap on 9. We want a low plex and a high core, so we score threads with an overlapping vertex by increase in core minus increase in plex: 9 has a thread 1 score of (16-14)-(5-1)=-2 and a thread 2 score of (6-4)-(4-2)=0, so we put 9 in thread 2. We can now move thread 2 up one more level and get: {1,2,3,4,8,14} {9,24,25,26,28,30,31,32,33,34} {5,6,7,11,17}.
Now if we finish off by putting each remaining vertex with the core to which it is maximally connected (break ties with core size?), we get: {1,2,3,4,8,12,13,14,18,22} {9,10,15,16,19,21,23,24,25,26,27,28,29,30,31,32,33,34} {5,6,7,11,17}.
23
Topdown Mining all kplexes and kcores on G10 At each step, we branch to each of the lowest degree vertices. Here delete several count values at a time. a0a1a2a3a4a5a6a7a8a9 b0b1b2b3b4b5b6b7b8b9c0c1c2c3c4c5c6c7c8c9d0d1d2d3d4d5d6d7d8d9e0e1e2e3e4e5e6e7e8e9f0f1f2f3f4f5f6f7f8f9g0g1g2g3g4g5g6g7g8g9h0h1h2h3h4h5h6h7h8h9i0 If we treat B10 as undirected, these are unique listings of edges. Del 1,2 counts: v Ct 29 1 38 7 40 3 43 3 45 7 46 12 47 3 48 3 49 7 50 3 52 3 55 3 56 2 72 3 73 3 74 3 76 4 77 1 78 6 79 3 80 2 81 5 88 3 89 1 91 9 99 3 a7 1 b2 6 c4 0 d1 3 d2 8 d9 2 e4 1 e5 1 h0 1 h1 2 h4 5 h9 5 i0 2 v Ct 38 5 40 3 43 3 45 7 46 12 47 3 48 3 49 7 50 2 52 2 55 3 72 3 73 3 74 3 76 3 78 6 79 3 81 4 88 3 91 7 99 2 b2 5 d1 3 d2 5 h4 2 h9 4 v Ct 38 5 40 3 43 3 45 7 46 9 47 3 48 3 49 7 55 1 72 3 73 3 74 3 76 3 78 6 79 3 81 3 88 2 91 6 b2 5 d1 3 d2 5 h9 4 Del Cts 1,2,3 v Ct 38 3 45 4 46 6 49 3 78 2 91 2 b2 1 d2 1 h9 2 12cor 24plx Del Cts 1,2 v Ct 38 3 45 3 46 3 49 3 6core 0plex Del and restart Del Cts 123 v Ct 50 1 52 1 56 1 77 0 78 2 79 2 81 4 91 3 b2 0 c4 0 d2 1 d9 2 h4 2 h9 4 i0 2 Del Cts 12 v Ct 78 2 79 2 81 4 91 3 d9 1 h4 2 h9 4 i0 2 10core 18plex DelCt 1 v Ct 81 1 91 2 h9 1 2core 1plex Not very effective but then, G10 is known to be without many cliques. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 35 22 23 24 25 26 27 28 29 30 31 32 33 34 36 37 38 39 40 41 42 43 44 G10: Web graph of pages of a website and hyperlinks. Communities by color (Girvan Newman Algorithm). |V|=180 (1-i0) and |E|=266. Vertices with OutDeg=0 (leaves) do not have pTrees shown because pTrees display only OutEdges and thus those OD=1 have a pure0 pTree. 
24
Topdown Mining all kplexes and kcores on G8 At each step, delete lowest degree vertices as indicated. 1 2 3 4 5 6 40 41 42 46 7 13 12 14 44 53 17 48 54 8 16 52 45 9 43 39 38 10 20 21 24 11 15 47 23 22 19 25 36 G8 37 35 27 26 28 29 31 32 33 30 51 50 34 49 Overlapping communities in a network of word association. The groups, labeled by the colors, were detected with the Clique Percolation Method by Palla et al. 18 D 0-6 v Ct 6 6 7 7 9 5 12 10 13 6 14 7 16 5 17 5 19 5 20 5 23 7 24 7 25 8 27 7 29 4 31 7 39 3 40 4 42 3 43 5 44 5 51 4 54 11 D 1,2 v Ct 1 2 2 3 3 1 8 2 9 0 16 1 35 1 41 2 51 0 52 2 D 0-1 At each step, delete 1 lowest degree vertex but we do several rounds of that before restarting alg. Del 3 then 4 Then 5 Then 6 v Ct 7 2 12 3 23 4 24 6 25 5 27 4 31 5 54 5 Del 1 then 2 Then 3 v Ct 8 3 14 2 16 4 17 5 19 4 21 2 35 3 52 3 Del 1 then 2 Then 2 again v Ct 13 1 14 3 16 2 19 3 20 2 21 3 Del 1 Then 1 again Then 1 sgain v Ct 4 2 8cor 5 2 7plx 13 4 46 3 47 2 48 3 Del 1 Then 1 again Then 1 sgain v Ct 29 1 8cor 33 2 7plx 51 1 D 1 2 3cor 2 2 0plx 41 2 8 1 1cor 52 1 0plx D all 5 Del 2 8 3 9cor 16 2 6plx 17 5 19 2 35 3 52 3 Del 2 13 1 14 3 16 2 19 3 20 2 21 3 Del 0 1 2 2cor 2 2 1plx 3 1 D 1 2 3 All cts=0 done. 
Del 3 cor plx 25 5 27 4 31 5 54 3 v Ct 5 1 9 2 11 1 14 1 16 2 22 2 30 1 38 1 43 1 46 1 48 1 51 2 D 1 Del 2 13 2 2cor 46 1 1plx 48 1 D D 345 6 4 7 5 12 6 13 5 14 4 23 5 24 6 25 5 27 4 31 5 54 5 Del 2 8 2 5cor 17 3 1plx 35 2 52 3 Del 3 cor plx 25 4 27 4 31 4 D Del 2 14 1 2cor 19 2 1plx 21 1 D Del 2 17 1 0plx 52 1 1cor D Del 1 then 2 Then 3 Then 4 Then 5 6 4 7 6 12 6 14 4 17 4 19 3 40 2 43 3 44 5 54 7 v Ct 22 1 1cor 51 1 0plx D 22 51 D 4 7 3 12 4 13 3 23 4 24 5 25 4 31 4 54 5 16core 12plex D 0,1 5 1 9 2 11 1 16 2 35 1 38 1 43 1 46 1 53 2 D 1 2 Science 1 Scientist 3 Astronomy 4 Earth 6 Moon 5 Space 7 Star 9 Intelligent 8 Ray 10 Golden 11 Glare 12 Sun 14 Moonlight 13 Sky 15 Eyes 17 Light 16 Sunshine 18 Lit 20 Brown 19 Dark 22 Orange 21 Tan 23 Blue 24 Yellow 25 Color 26 Gray 27 Black 29 White 28 Race 30 Green 31 Red 32 Crayon 34 Velvet 33 Pink 35 Flashlight 37 Dim 36 Glow 38 Gifted 40 Smart 39 Genius 42 Einstein 41 Inventor 43 Brilliant 45 Laser 44 Shine 46 Telescope 48 Sunset 47 Horizon 49 Ribbon 50 Violet 51 Purple 52 Beam 54 Bright 53 Night 1 4 2 3 5 6 7 8 9 D 3,4 24 1 54 1 1core 0plex D 24,54 rest. Del 3 cor plx 12 6 14 3 17 3 44 4 54 4 v Ct 9 2 3cor 38 1 0plx 43 1 5 1 1cor 46 1 0plx D all 5 D 1-9 6 2 7 3 12 3 17 3 19 1 23 2 25 2 31 2 Del 3 again 6 3 9cor 7 4 1plx 12 4 44 4 54 3 v Ct 11 1 1cor 15 1 0plx 30 1 1cor 32 1 0plx 35 1 1cor 37 1 0plx 47 1 1cor 48 1 0plx D these 8 leaves all 0 counts! Del 3s 1 at a time 6 3 6cor 7 3 0plx 12 3 44 3 D 12 v Ct 7 1 1cor C 1 0plx D 7,c 6 0 cor plx cor plx 2 D 17,19,23 25,31 Del 3s 1 at a time 7 3 6cor 12 3 0plx 44 3 54 3 Del v Ct 14 2 16 1 2cor 53 1 1plx D 14,16,53 Del 1 then 2 Then 3 Then 4 Then 3 again Then 2 again cor plx 40 5 41 3 42 5 43 4 D 1-6 6 0 39 2 cor 2 0plx D 39,40,42 D 1-4 cor plx 44 1 20 2 cor plx 51 0 D all 6 Del 3 cor plx 40 4 42 4 43 4 D
25
Breadth-First Inductive Clique Search Alg: Let CLQK be the set of all K-cliques. First find CLQ3 using CS0 (Common Siblings=0) or using CCLQ3. Induction theorem: A K-clique and a 3-clique that share an edge form a (K+1)-clique iff all K-2 edges from the non-shared K-clique vertices to the non-shared 3-clique vertex exist. Next find CLQ4, then CLQ5, …
[Figure: G7 (vertices 1-34) and its edge pTree E.]
On G7 (list version): 1 2 3 with 18, 20, 22 ∉ CLQ4 since 3:18,20,22 ∉ E. 1 2 3 4 ∈ CLQ4 since 34 ∈ E. Note the checkback. Is it required? (No: if 132 were in a 4CLQ it would have shown up already.)
UCLQ4 (unique 4-cliques): 1 2 3 4; 1 2 3 8; 1 2 3 14; 1 2 4 8; 1 2 4 14; 1 3 4 8; 1 3 4 14; 2 3 4 8; 2 3 4 14. UCLQ4 done. pTree version faster?
UCLQ5: 1 2 3 4 8; 1 2 3 4 14.
UCLQ3 (unique 3-cliques as lists) and MUCLQs: 1 2 18; 1 2 20; 1 2 22; 1 3 9; 1 4 13; 1 5 7; 1 5 11; 1 6 7; 1 6 11; 3 9 33; 1 2 3 4 8; 1 2 3 4 14; 6 7 17; 9 31 33; 9 31 34; 24 28 34; 24 30 33; 24 30 34; 25 26 32; 27 30 34; 29 32 34.
CLQ3 (as pTrees): remaining edges after CS0 (removal of PURE0 edge endpoint pair ANDs). Is there a pTree version of this algorithm? Is it faster?
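The induction theorem lends itself to a short list-based sketch. The Python below is a hedged illustration, not the pTree version asked about on the slide. The function names `triangles` and `grow` are hypothetical, and the six-vertex graph is an assumed fragment of G7 chosen so that the two 5-cliques listed under UCLQ5 appear.

```python
from itertools import combinations

def triangles(adj):
    """CLQ3: all vertex triples whose three edges are present."""
    return {frozenset(t) for t in combinations(sorted(adj), 3)
            if all(b in adj[a] for a, b in combinations(t, 2))}

def grow(cliques_k, clq3, adj):
    """Induction step: a K-clique c and a 3-clique t sharing an edge form a
    (K+1)-clique iff every non-shared vertex of c is adjacent to the
    non-shared vertex x of t."""
    bigger = set()
    for c in cliques_k:
        for t in clq3:
            if len(c & t) != 2:
                continue
            (x,) = t - c
            if all(x in adj[v] for v in c - t):
                bigger.add(c | {x})
    return bigger

# Assumed fragment of G7: 1,2,3,4 mutually adjacent; 8 and 14 adjacent to each of them.
edges = [(a, b) for a, b in combinations([1, 2, 3, 4], 2)]
edges += [(v, 8) for v in (1, 2, 3, 4)] + [(v, 14) for v in (1, 2, 3, 4)]
adj = {v: set() for v in (1, 2, 3, 4, 8, 14)}
for u, w in edges:
    adj[u].add(w)
    adj[w].add(u)

clq3 = triangles(adj)
clq4 = grow(clq3, clq3, adj)
clq5 = grow(clq4, clq3, adj)

if __name__ == "__main__":
    print(len(clq3), len(clq4), sorted(sorted(c) for c in clq5))
```

Storing cliques as frozensets makes duplicates collapse automatically, which plays the role of the "unique" lists (UCLQ4, UCLQ5) on the slide.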
26
n-clique (n-clan, n-club) Community Search on G6
n-clique = subgraph s.t. pairs a geodesic path of length n (all pairs are n-connected). Downward Closures: SubGraph, SGDC (Every subgraph of an n-clique is an n-clique.) Path Length, PLDC (If C is an n-clique then C is an (n-1)-clique.) Apriori, ADC (If C,D are n-cliques of size=k sharing k-1 vertices, then CD is (n+1)-clique) n-clan = n-clique with diameter n n-club = subgraph of diameter =n. 1 4 2 3 5 6 7 c 9 b a 8 G6 SP1 1 2 3 4 5 6 7 8 9 a b c 1-clique=clique We already have many clique search algorithms, but the pattern we are going to use for n=2… may give us another one?: 1 is 1conn to clique? (23 edge?). Yes! clique? (24 edge?) Yes! clique? (34 edge?) No! not a clique by SGDC. SP2 1 2 3 4 5 6 7 8 9 a b c 1 2 3 4 5 6 7 8 9 a b 1 2 1 3 1 4 1 5 1 6 1 7 1 8 1 2 is 1conn to clique? No! 3 is 1conn to just c. 4 is 1conn to just 7. 5 is 1conn to clique? (67 edge?) Yes! 6 is 1conn to clique? (78 edge?) No! 8 is 1conn to 9a. 89a 1clique? (9a edge?) Yes! 9 is 1conn to abc. 9ab 1clique? Yes! 9ac 1clique? Yes! 9bc 1clique? Yes! So 9abc is 1clique by ASGDC and abc is 1clique by SGDC. 1 is 2conn to 7c. 17c 2clique? 7c turned on in SP1 or SP2? No! 2 is 2conn to 7c. 27c 2clique? 7c turned on in SP1 or SP2? No! SP3 1 2 3 4 5 6 7 8 9 a b c 1 2 1 3 1 3 1 4 1 5 1 6 1 7 1 3 2con to 49abc clique? 49 on in SP1|2? N! 34a? 4a on SP1|2? N! b? 4b on SP1|2? N! c? 4c on SP1|2? N! 39a 2clique?Y! (9a on in SP1) b 2clique? Y! (9b on SP1) c 2clique? Y! (9c on SP1). 3ab 2clique? Y! (ab on in SP1). 3ac 2clique? Y! (ac on in SP1). 3bc 2clique? Y! (bc on in SP1). 39ab, 39ac, 3abc, are 2cliques by ASGDC. Research questions: Did I miss anything by using the lower triangular matricies? What about n-clans and n-clubs? Would it be better to start with SP4, then SP3, then SP2, then SP1 by virtue of PLDC or some other closure or property? Keeping in mind that it is a big task to creat SPk’s for large graphs, is there a better way? 4 2con to clique?Y! (56 on in SP1). 5 2con to 8 only. 
6 2con to 9a a 2clique?Y! (9a on in SP1). 7 2con to 8 only. 1 2 1 3 1 8 2con to bc bc 2clique?Y! (bc on in SP1). SP4 1 2 3 4 5 6 7 8 9 a b c 4 1 5 5 1 7 1 7 9 a b are not 2con to anybody new. Upward Closures: Apriori, ADC (If C,D are n-cliques of size=k sharing k-1 vertices, then CD is (n+1)-clique) All SubGraphs, ASGDC (C a subgraph size=k and all subgraphs of C of size (k-1) are n-cliques then C is a n-clique?? n-clan = n-clique with diameter n n-club = subgraph of diameter =n. SP 1 2 3 4 5 6 7 8 9 a b c 1 2 1 3 2 4 1 3 2 2 1 3 4 2 1 2 1 3 3 2 1 3 2 1 3 2 1 3 2 4 1 4 2 1 4 2 1 3 4 2 1 3 5 1 2 5 1 2 3 5 1 2 3 4 6 1 2 6 1 2 3 7 2 7 2 3 7 2 3 4 8 1 2 1 is 4conn to 8 so 18 is a 4-clique. 2 is 4conn to 8 so 28 is a 4-clique. 3 is 4conn to 56, 5 is 1conn to 6 so 356 is a 4-clique. 4 is 4conn to 9ab, 9a 9b ab are edges so 49ab is a 4-clique. 5 is 4conn to bc, bc is an edge so 5bc is a 4-clique. 7 is 4conn to bc bc, bc is an edge so 7bc is a 4-clique. 1 is 3conn to 569ab checking it out, 169ab is a 3-clique.
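As a cross-check on the SPk tables, geodesic distances can also be computed by breadth-first search. The sketch below makes two assumptions that should be read as such: the G6 edge list is a reconstruction from the SP1 walk-through, and an n-clique is taken to mean a vertex set whose pairs are all within distance n in the whole graph. The names `bfs_distances` and `is_n_clique` are hypothetical.

```python
from collections import deque
from itertools import combinations

# Assumed edge list for G6, read off the SP1 walk-through.
G6_EDGES = "12 13 14 23 24 3c 47 56 57 67 68 89 8a 9a 9b 9c ab ac bc".split()
adj = {v: set() for v in "123456789abc"}
for e in G6_EDGES:
    adj[e[0]].add(e[1])
    adj[e[1]].add(e[0])

def bfs_distances(adj, source):
    """Shortest-path (geodesic) length from `source` to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def is_n_clique(adj, vertices, n):
    """True iff every pair in `vertices` is within distance n in the whole graph."""
    for u, w in combinations(vertices, 2):
        if bfs_distances(adj, u).get(w, float("inf")) > n:
            return False
    return True

if __name__ == "__main__":
    print(is_n_clique(adj, "39a", 2))   # the slide accepts 39a as a 2-clique
    print(is_n_clique(adj, "17c", 2))   # the slide rejects 17c
    print(bfs_distances(adj, "1")["8"])
```

Running one BFS per vertex avoids materializing SP2, SP3, SP4 up front, at the cost of recomputing distances for each query.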
27
G7 G6 has 4 max cliques: abc. 8 is the only point not in an MCLQ and 8 is maximally connected to 9abc. So the G6 Clique Aura Partition of: abc 1 3 2 4 5 6 7 8 9 a c b E G6 1 c g 2 9 3 a 4 6 5 7 8 b e d h f j k l m n o p q r s t u v w x y i E I 1 c 7 2 6 3 8 4 5 9 e o s v w x y E a b d f g h j I k m l n p q r t u 1 c b 2 7 3 9 4 5 6 8 e k o p q s t u v w x y E a d g f h I j m l n r 1 c 6 2 3 7 4 5 8 9 e v x y E a b d f g I h j l k m o n p q s r t u w abcdefgh ijklmnopqrstuvwxy
28
K-Degree-Difference Community Search: H is a subgraph (SG) s.t. dd_H ≡ IntDeg_H - ExtDeg_H ≥ k.
Thm: If hH, ddH-h = ddH – (2idh - edh). So want to remove h s.t. (2idh – edh) is min. 1 4 2 3 5 6 7 c 9 b a 8 G6 1 3 2 4 5 6 7 8 9 a c b E H=G= { abc} ddH=38 id= ddH/|VH|=38/12=3.16 ed= Remove 5 Very Simple Weighted SP1 and SP2 K-plex Search Weighting: 0,1path nbrs of x times 3; 2path nbrs of x times 2; Until all degrees are weighted, then back to actual subgraph degrees UNWEIGHTED Degrees H={ abc deg H= { abc} ddH=34 id= ddH/|VH|=34/11=3.09 ed= 2id-ed= Remove 6,7 H= {123489abc} ddH=26 id= ddH/|VH|=26/9=2.88 ed= 2id-ed= Remove 4,8 H={ abc deg x=1 H={ c H=15 H=7 kplex k8 deg x=1 after cutting 234 H={12345H=6 H=5 kplex k1 deg99992x=1, after cut 23468 H= {1239abc} ddH=16 id= ddH/|VH|=16/7=2.28 ed= 2id-ed= Remove 1,2 H={ abc deg x=2 H={ c H=15 H=7 kplex k8 deg x=2 after cutting 234 H={12345H=6 H=5 kplex k1 deg99992x=2, after cut 23468 H={ abc H=3 H=3 0plex deg x=3 after cut 1 (actual SG degrees) H= {39abc} ddH=10 id= ddH/|VH|=10/5= 2 ed= 21100 2id-ed=05568 Remove 3 H={ abc deg c x=3 H={123 c H=6 H=4 2plex deg x=3, after cut 2368 H={ abc deg x=4 H={ abc H=3 H=3 0plex deg x=4 after cut 2346 H= {9abc} ddH=9 id= 3333 ddH/|VH|=9/4=2.25 ed= 1101 2id-ed=5565 CLQ. Start over w H={ abc deg x=5 H={ abc H=10 H=5 5plex deg x=5 after cut 34 H={ abc H=3 H=3 0plex deg x=5 after cut 1 from SG degs H= { } ddH=17 id= ddH/|VH|=17/8=2.13 ed= 2id-ed= Remove 8 H={ abc deg x=6 H={ abc deg x=6 after cut 34 H={ abc H=3 H=2 1plex deg x=6 after cut 12 SG degs 211 H= { } ddH=17 id= ddH/|VH|=16/7=2.28 ed= 2id-ed= Remove 3,6 H={ abc deg x=7 H={ abc deg x=7 after cut 34 H={ abc H=3 H=3 0plex deg x=7 after cut 1 SG degs H= {12457} ddH=6 id= ddH/|VH|=6/5=1.2 ed= 11011 2id-ed=33613 Remove 5 H={ abc deg cc68 x=8 H={ abc deg cc68 x=8 after cut 34 H={ abc plex deg x=8 after cut12 SG degs H= {1247} ddH=4 id= 2231 ddH/|VH|=4/4=1 ed= 1102 2id-ed=3360 Remove 7 H={ abc deg cc9c x=9 H={ abc H=10 H=8 H a kplex k 2 deg cc9c x=9 after Cutting 2,3,6 H= {124} ddH=3 id= 222 ddH/|VH|=3/3=1 ed= 111 2id-ed=333 CLQ. 
Start over w 35678 H={ abc deg cc9c x=a H={ abc H=10 H=8 H a kplex k 2 deg cc9c x=a after cut 2,3,6 H={35678} id=02321 ed=30012 ddH=2 2id-ed= ddH/|VH|=2/5=.4 Remove 3 H={ abc deg cc9c x=b H={ abc H=6 H=6 H a kplex k 0 deg cc9c x=b after cut 2,3,6 H={ abc deg ccpc x=c H={ abc H=6 H=6 H a kplex k 0 deg cc9c x=c after cut 2,3,6 H={5678} id=2321 ed=0012 ddH=5 2id-ed= 4630 ddH/|VH|=5/4=1.2 Remove 8 By weighting the initial round we have gotten nearly perfect information for this example (G6). Weightings, 3 and 2, were arbitrarily chosen but worked here. In general, one should devise a formula to determine them. Also we could weight SP3 and etc. as well? If we have paid the price of constructing SPk k>1, this is a much simpler way to do it, as compared to the Clique Percolation method of Palla. H= {567} id=222 ed=011 ddH=4 2id-ed= 433 ddH/|VH|=4/3=1.33 Clique. remove 567. Start over w 38 (but it has 0 id)
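The degree difference of a candidate subgraph can be computed directly from the definition, which is handy for checking the ddH values quoted on the slide. This is a minimal sketch under assumptions: the G6 edge list is reconstructed, IntDeg_H is read as the sum of within-H degrees, ExtDeg_H as the number of edges leaving H, and `degree_difference` is a hypothetical name.

```python
# Assumed edge list for G6 (19 edges).
G6_EDGES = [(e[0], e[1]) for e in
            "12 13 14 23 24 3c 47 56 57 67 68 89 8a 9a 9b 9c ab ac bc".split()]

def degree_difference(edges, subset):
    """dd_H = (sum of internal degrees of H) - (number of edges leaving H)."""
    subset = set(subset)
    internal = external = 0
    for u, w in edges:
        inside = (u in subset) + (w in subset)
        if inside == 2:
            internal += 2      # one internal edge raises two internal degrees
        elif inside == 1:
            external += 1      # one endpoint in H, one outside
    return internal - external

if __name__ == "__main__":
    for h in ("123456789abc", "123489abc", "1239abc", "39abc", "9abc"):
        print(h, degree_difference(G6_EDGES, h), round(degree_difference(G6_EDGES, h) / len(h), 2))
```

Dividing by |V_H|, as the loop does, gives the ddH/|VH| ratio the slide tracks from step to step.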
29
G7 Very Simple Weighted SP1 k-plex Search on G7 Weighting:
0,1path nbrs of x times 1; 2path nbrs of x times 0; 1 2 1 3 1 2 1 3 2 1 4 5 1 5 2 1 6 2 1 7 2 1 8 2 1 9 2 2 1 3 2 1 2 1 2 3 1 4 2 1 5 5 2 1 3 6 2 1 3 2 7 1 8 2 1 4 9 2 1 3 3 1 4 3 1 4 2 3 1 6 3 1 4 3 1 6 SP1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 1 6 2 1 9 3 1 4 1 6 5 1 3 6 1 4 7 1 4 8 1 4 9 1 5 H= H=561 H=77 kplx k484 D g9a bg kcore k77 Cut 123: H= H=120 H=38 kplx k82 D kcore k38 Cut 23: H= H=55 H=26 kplx k24 D kcore k26 Cut 24: H= H=15 H=12 kplx k3 D kcore k12 Cut 2: H= H=10 H=10 kplx k0 D kcore k10 {1,2,3,4, 14} is a clique. {1,2,3,4,9,14} is a 3plex. Cut0: H= H=21 H=4 kplx k17 D kcore k4 Cut 1 leaves 25 only. H= D af Cut012: H= H=55 H=19 kplx k36 D kcore k19 H= H=19 H=4 kplex k15 D kcore k4 Cut03: H= H=6 H=4 kplx k2 D kcore k6 {24,32,33,34} is a 2plex G7 Cut0: H= H=19 H=4 kplex k15 D kcore k4 Cut 0 leaves {9,31} as a 0plex H= D H= H=17 H=2 kplex k15 D kcore k2 Cut 0 leaves {27,30} as a 0plex Cut01: H= H=15 H=6 kplx k9 D kcore k6 Cut0: H= H=10 H=6 kplx k4 D kcore k6 {5,6,7,11,17} is a 4plex H= H=14 H=0 kplex k14 D kcore k0 no edges left H= D The expected communities are mostly not detected as kplexes or kcores. Cut0: H= H=21 H=4 kplx k17 D kcore k4 (Symbols for base 65 )
30
Simple Weighted SP1, SP2 K-plex Search on G8
3 4 5 6 40 41 42 46 7 13 12 14 44 53 17 48 54 8 16 52 45 9 43 39 38 10 20 21 24 11 15 47 23 22 19 25 36 18 37 35 27 26 28 29 31 32 33 30 51 50 34 49 G8 Weighting ,1path neighbors (12012) times 5 334 2 path nbrs (39893) times 3 next cut<18 x=1 instead cut<19 x=1 This gives C0={1,2,9,39,40,41,42,43} which is exactly the Intelligence Class except that v=38 (gifted) is missing. It is a kplex k8 (not that strong of a community!) x=1 Within the Intelligence Class this is the 1plex, C1={1, 2,40,41,42} ( only edge missing is (2,40) ) with C1-degrees: Thus if we cut next using C1-degrees (cut 2,40) leaves the clique (0plex) C2={1,41,42} Cutting C0 and starting over: G-C0 degs x=3 Weighting 0,1path neighbors (367) times 5 2 path nbrs ( ) times 3 next cut<10 x=3 next cut<12 x=3 This gives C2={3,4,5,6,7, ,13,14,15,17,23,25,31,44, , 53} Whereas, Astronomy is 3,4,5,6,7,8,10,11,12,13,14,16,17, ,45,46,47,48,52,53 so, not a good fit! With replacement but using as starting vertex, the remaining vertex of highest degree (first, v=12). Weighting 0,1 SP nbrs times 5 2 SP nbrs times 3 cut<20 x=12 cut<20 x=12 Astronomy is Weighting 0,1 SP nbrs times 6 2 SP nbrs times 3 cut<30 Astronomy is Weighting 0,1 SP nbrs times 6 2 SP nbrs times 1 5 astronomy vertices missing (3,5,45,46,53} and 2 non-astronomy included {21,24} x=25 Weighting 0,1 SP nbrs times 6 Colors is 4 colors missing but zero non-colors included. 44444ba5645g9746b2b864f9f49386d x=1
31
G9 K-plex search ANalyst TickerSymbol Relationship with labels
10 01 00 11 TS SA0 SA1 SS SB B C0 C1 C2 C3 Dow? AN 1 2 3 4 5 6 7 8 9 12 13 14 15 16 17 18 S F C A H a k-plex and F is a ISG, F is a kplex G=(V,E) a k-plex iff |V|(|V|-1)/2 – |E| k Women abcdefghi 18*17/2=153 degs=hfhfbffghhgghhhgcc |Edges| =139 kplex k14 k-plex=SG missing k edges. Women abcdefgh degs=gfgfbfffggffgggfc |14 Women abcdefg 1 degs=ffffbffeffeefffe |Edg Women abcdefg 15*14/2=105 degs=eeeeeeeeeeeeeee |Edges| =105 15kplex k0 15Clique So take out { abcdefg} and start over. Women5hi 3*2/2=3 degs=011 |Edges| =1 kplex k2 Womenhi 2*1/2=1 degs=11 |Edges| =1 kplex k0 Clique No info from kplex search alg to WSP2. Avoid the work? Notice the very high 1-density of the pTrees? (only 28 zeros)? Events abcde 14*13/2=91 degs=88888dddd88888 |Edge|=66 kplex k25 1 TS a e c AN pTree Ct AN 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ANalyst TickerSymbol Relationship with labels C Sal TS pTree Ct TS SA H B B SB B SS S S H H B B B SB Buy-Hold-Sell SA Dow? Events abcde Not calculating k degs= 7777cccc til it gets lower Events456789abcde degs= 55aaaa88888 Events abcde degs= 666bbbb88888 Events56789abcde 1 degs= Events6789abcde *8/2= A 9Clique! degs= |Edges|=36 kplex k0 So take out {6789abcde} and start over. Events *4/2=10 |Edges|=10 kplex k 0 A 5clique! degs: 44444 Community structure in multipartite networks. This bipartite graph refers to the Southern Women Event Participation data set. Women are represented as open symbols with black labels, events as filled symbols with white labels. The illustrated vertex partition has been obtained by maximizing a modified version of the modularity by NewmanGirvan, tailored on bipartite graphs (Barber07). Note that most bipartite graphs involve a subject part and an object part such as women-events, Investors-stocks, Subjects-Objects in a access restriction system. If we had used the full algorithm which pursues each minimum degree tie path, one of them would start by eliminating 14 instead of 1. 
That will result in the 9Clique and the 5Clique abcde. All the other 8 ties would result in one of these two situations. How can we know that ahead of time and avoid all those unproductive minimum degree tie paths? Every ISG of a Clique is a Clique, so 6789 and 789 are Cliques (which seems to be the authors' intent?). If the goal is to find all maximal Cliques, how do we know that CA= is maximal? If it weren't, then there would be at least one of abcde which, when added to CA= , would result in a 10Clique. Checking a: PCA&Pa would have to have count=9 (it doesn't; it has count=5) and PCA(a) would have to be 1 (it isn't; it's 0). The same is true for bcde. The same type of analysis shows 6789abcde is maximal. I think one can prove that any Clique obtained by our algorithm is maximal (without the above expensive check), since we start with the whole vertex set and throw out one vertex at a time until we get a clique, so it has to be maximal? The Women associated strongly with the blue EventClique, abcde, are { } and associated but loosely are { }; those associated strongly with the green EventClique are { } and associated but loosely are {6 7 9}.
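The AND-and-count maximality check described here can be mimicked with Python integers used as bit vectors. This is a hedged sketch: real pTrees are compressed and the five-vertex graph below is hypothetical, as are the names `popcount` and `is_maximal_clique`. A clique C fails to be maximal exactly when some outside vertex's adjacency mask, ANDed with the mask of C, has 1-count |C|.

```python
def popcount(x):
    """Number of 1-bits, the '1-count' of a bit vector."""
    return bin(x).count("1")

def is_maximal_clique(adj_mask, clique_mask):
    """adj_mask[v] has bit u set iff (u, v) is an edge; clique_mask marks the clique.
    The clique is maximal iff no outside vertex is adjacent to all of its members."""
    size = popcount(clique_mask)
    for v, mask in enumerate(adj_mask):
        if (clique_mask >> v) & 1:
            continue                      # v is already in the clique
        if popcount(mask & clique_mask) == size:
            return False                  # v could be added, so not maximal
    return True

# Hypothetical graph: 0,1,2,3 mutually adjacent; vertex 4 adjacent to 0 and 1 only.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4), (1, 4)]
adj_mask = [0] * 5
for u, w in edges:
    adj_mask[u] |= 1 << w
    adj_mask[w] |= 1 << u

def mask_of(vertices):
    return sum(1 << v for v in vertices)

if __name__ == "__main__":
    print(is_maximal_clique(adj_mask, mask_of([0, 1, 2])))
    print(is_maximal_clique(adj_mask, mask_of([0, 1, 2, 3])))
```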
32
Most bipartite graphs involve a subject part and an object part, e.g., Women-Events, Investors-Stocks, Customers-Items, Subject-Accessibility in system access. Most tripartite hypergraphs involve a subject, object and circumstance part, e.g., Investor-Stock-Day. Most quadripartite hypergraphs involve a subject, an object and 2 circumstance parts, e.g., Customer-Item-Store-Day. Most multi-partite hypergraphs can be realized as a subject-object bipartite graph with circumstances as edge labels.
A biclique is a complete subgraph of a bipartite graph (having every edge it is allowed to have, and at least 1 edge, which rules out the trivial case of no edges). Every induced subgraph of a biclique is a biclique, and thus we have the downward closure: if two K-bicliques overlap in K-1 vertices and the other 2 vertices form an edge, then the union of the two K-bicliques is a (K+1)-biclique. So we could start with two 3-bicliques sharing 2 points, x,y, and not sharing two points, u,v (unless u and v are from different parts, there is nothing to check), except that the downward closure gives us a good way to get the set of 3-bicliques, 3bCLQ.
1 A 2 Note, 2 2-bicliques (edges) that share a point form a 3biclique 3BCLQs 1AB 1AC 1AD 1AE 1AF 1AH 1AI 1BC 1BD 1BE 1BF 1BH 1BI 1CD 1CE 1CF 1CH 1CI 1DE 1DF 1DH 1DI 1EF 1EH 1EI 1FH 1FI 1HI 2AB 2AC 2AE 2AF 2AG 2AH BC 2BE 2BF 2BG 2BH CE 2CF 2CG 2CH EF 2EG 2EH 2FG 2FH 2GH 3BC 3BD 3BE 3FB 3BG 3BH 3BI 3CD 3CE 3CF 3CG 3CH 3CI 3DE 3DF 3DG 3DH 3DI 3EF 3EG 3EH 3EI 3FG 3FH 3FI 3GH 3GI 3HI 4AC 4AD 4AE 4AF 4AG 4AH CD 4CE 4CF 4CG 4CH DE 4DF 4DG 4DH EF 4EG 4EH 4FG 4FH 4GH 5CD 5CE 5CG 5DE 5DG 5EG 6CE 6CF 6CH 6EF 6EH 6FH 7DE 7DF 7DG 7EF 7EG 7FG 8FH 8FI 8HI 9EG 9EH 9EI 9GH 9GI 9HI aGH aGI aGL aHI aHL aIL bHI bHJ bHL bIJ bIL bJL cHI cHJ cHL cHM cHN cIJ cIL cIM cIN cJL cJM cJN cLM cLN cMN dGH dGI dGJ dGL dGM dGN dHI dHJ dHL dHM dHN dIJ dIL dIM dIN dJL dJM dJN dLM dLN dMN eFG eFH eFI eFJ eFL eFM eFN eGH eGI eGJ eGL eGM eGN eHI eHJ eHL eHM eHN eIJ eIL eIM eIN eJL eJM eJN eLM eLN eMN fGH fGJ fFK fFL fHJ fHK fHL fJK fJL fKL gHI hIK iIK The of each pair is a 2HUB-kSPOKE (k+2)biclique (nothing to check). The union of each of these 17 sets (e.g., 1AB..1HI) is a 1HUB-kSPOKE (k+1)biclique 12AB 12AC 12AE 12AF 12AH 12BC 12BE 12BF 12BH 12CE 12CF 12CH 12EF 12EH 12FH 13BC 13BD 13BE 13BF 13BH 13BI 13CD 13CE 13CF 13CH 13CI 13DE 13DF 13DH 13DI 13EF 13EH 13EI 13FH 13FI 13HI 38FH 38FI 38HI 14AC 14AD 14AE 14AF 14AH 14CD 14CE 14CF 14CH 14DE 14DF 14DH 14EF 14EH 14FH 15CD 15CE 15DE 16CE 16CF 16CH 16EF 16EH 16FH 17DE 17DF 17EF 18FH 18FI 18HI 19EH 19EI 19HI 1aHI 1bHI 1cHI 1dHI 1eFH 1eFI 1eHI 1gHI 29EG 29EH 29GH 2eFG 2eFH 2eGH 2aGH 2dGH 23BC 23BE 23BF 23BG 23BH 23CE 23CF 23CG 23CH 23EF 23EG 23EH 23FG 23FH 23GH 26CE 26CF 26CH 26EF 26EH 26FH 27EF 27EG 27FG 28FH 24AC 24AE 24AF 24AG 24AH 24CE 24CF 24CG 24CH 24EF 24EG 24EH 24FG 24FH 24GH 25CE 25CG 25EG 2fGH 35CD 35CE 35CG 35DE 35DG 35EG 34CD 34CE 34CF 34CG 34CH 34DE 34DF 34DG 34DH 34EF 34EG 34EH 34FG 34FH 34GH 36CE 36CF 36CH 36EF 36EH 36FH 37DE 37DF 37DG 37EF 37EG 37FG 39EG 39EH 39EI 39GH 39GI 39HI 3aGH 3aGI 3aHI 3bHI 3cHI 3dGH 3dGI 3dHI 3eFG 3eFH 3eFI 3eGH 3eGI 
3eHI 3fGH 3gHI 4eFG 4eFH 4eGH 45CD 45CE 45CG 45DE 45DG 45EG 46CE 46CF 46CH 46EF 46EH 46FH 47DE 47DF 47DG 47EF 47EG 47FG 48FH 39EG 39EH 39GH 4aGH 4dGH 4fGH 56CE 57DE 57DG 57EG 59EG 67EF 68FH 69EH 6eFH 79EG 7eFG 89HI 8aHI 8bHI 8cHI 8dHI 8eFH 8eFI 8eHI 8gHI 9aGI 9aHI 9bHI 9cHI 9dGH 9dGI 9dHI 9eGH 9eGI 9eHI 9fGH 9gHI abHI abIL acHI acHL acIL adGH adGI adGL adHI adIL aeGH aeGI aeGL aeHI aeHL aeIL afGH bcHI bcHJ bcHL bcIJ bcIL bcJL bdHI bdHJ bdHL bdIJ bdIL bdJL beHI beHJ beHL beIJ beIL beJL bfHJ bfJL bgHI dgHI cdHI cdHJ cdHL cdHM cdHN cdIJ cdIL cdIM cdIN cdJL cdJM cdJN cdLM cdLN cdMN cfHJ cfHL cfJK cfJL cgHI dfGH dfGJ dfHJ dfHL dfJL ceHI ceHJ ceHL ceHM ceHN ceIJ ceIL ceIM ceIN ceJL ceJM ceJN ceLM ceLN ceMN efGH efGJ efFL efHJ efHL efJL egHI deGH deGI deGJ deGL deGM deGN deHI deHJ deHL deHM deHN deIJ deIL deIM deIN deJL deJM deJN deLM deLN deMN G9 A B C D E F G J K L M N 1 2 3 4 5 6 7 8 9 17 h 10 a 11 b 13 d 14 e 15 f 16 g G9: Bipartite graph of the Southern Women Event Participation. Women are numbers (18), events are letters (14) (89 edges) Or Investors are numbers, stocks are letters in a recommends graph 12 c 18 i H I We need to start with a smaller bipartite graph to get a feel for efficiencies and shortcuts. We will come back to G9 after that. a b c d e f g h i A B C D E F G H I J K L M N A B C D E F G H I J K L M N
33
G9B1 A B C D E F 1 2 3 4 5 6 7 Clique Mining Thm (CLQm)
Use AN ARM-like downward closure alg: bCCLQk = pairs from bCLQk-1 that share k-2 vertices. Each such bCLQk iff the non-shared points E (THROW OUT IF NON-SHARED PAIR COULD FORM AN EDGE BUT DOESN’T) Each such bCLQk iff the non-shared points are in the same part (do not qualify to be an edge). 1 2 3 4 A B D C G5b G5.1b 1 2 B C A 3 4 5 D 6 E 7 F 8 G H bCLQ3 1AC 4AC 5BC 6DE 7DE 8FG 8FH 8GH bCCLQ4 14AC 67DE 8FGH = bCLQ4 bCCLQ5 = 1 2 3 4 A B C D 3 1 2 bCLQ3 1AC 2AC 3BD 4AD A12 A14 A24 C12 D34 bCCLQ4 12AC 12AC 14AC 12AC 12AC 24AC 34BD 24AD 34AD 124A 12AC bCLQ4 12AC 124A bCCLQ5 12A4C 1 3 2 4 5 A B C D E F G G6 bCLQ3 1AD 1AE 1DE 2AD 2AE 2DE 3EF 3EG 3FG 4CF 4CG 4FG 5AB 5AC 5BC bCCLQ4 1ADE 12AD 12AE 12DE 2ADE 3EFG 34FG 4CFG 5ABC = CLQ4 bCCLQ5 12ADE 34EFG 34CFG F E D C B 1AC 1AD 1AE 1AF 1BC 1BD 1BE 1BF 1CD 1CE 1CF 1DE 1DF 1EF A 2AC 2AE 2AF 2BC 2BD 2BE 2BF 2CE 2CF 2EF 3BC 3BD 3BE 3BF 3CD 3CE 3CF 3DE 3DF 3EF 1AB 2AB 4AC 4AD 4AE 4CD 5CD 5CE 5DE 6CE 6CF 6EF 7EF Bclq3 G9B1 A B C D E F 1 2 3 4 5 6 7 A B C D E F 1AB 1AC 1AD 1AE 1AF 1BC 1BD 1BE 1BF 1CD 1CE 1CF 1DE 1DF 1EF 2AB 2AC 2AE 2AF 2BC 2BE 2BF CE 2CF EF 3BC 3BD 3BE 3BF 3CD 3CE 3CF 3DE 3DF 3EF 4AC 4AD 4AE 4AF CD 4CE 4CF 4DE 4DF 4EF 5CD 5CE DE 6CE 6CF EF 7EF Bclq3 12A 14A A 12B 13B B 12C 13C 14C 15C 16C C 24C 25C 26C C 35C 36C C 46C C 13D 14D 15D D 35D D 12E 13E 14E 15E 16E 17E 23E 24E 25E 26E 27E 34E 35E 36E 37E 45E 46E 47E 56E 57E 67E 12F 13F 14F F 17F 23F 24F F 37F 34F F 37F F 47F F
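The ARM-like candidate generation can be sketched as a join over pairs of (k-1)-bicliques that share k-2 vertices. The Python below is an illustration under assumptions: the small bipartite graph is hypothetical, the function names are invented for the example, and bicliques are stored as frozensets of vertex labels. A candidate is kept when its two non-shared points lie in the same part (nothing to check) or form an edge.

```python
from itertools import combinations

def three_bicliques(left, right, edges):
    """bCLQ3: one hub together with two of its neighbours (both in the other part)."""
    nbrs = {v: set() for v in list(left) + list(right)}
    for u, w in edges:
        nbrs[u].add(w)
        nbrs[w].add(u)
    return {frozenset((hub, a, b))
            for hub in nbrs for a, b in combinations(sorted(nbrs[hub]), 2)}

def join(bclq_prev, left, edges):
    """bCLQk from bCLQ(k-1): union pairs sharing k-2 vertices, throwing out a pair
    whose non-shared points could form an edge but do not."""
    edge_set = {frozenset(e) for e in edges}
    out = set()
    for c, d in combinations(bclq_prev, 2):
        if len(c & d) != len(c) - 1:
            continue
        (u,) = c - d
        (w,) = d - c
        same_part = (u in left) == (w in left)
        if same_part or frozenset((u, w)) in edge_set:
            out.add(c | d)
    return out

# Hypothetical bipartite graph: subjects 1,2,3 and objects A,B.
left, right = {"1", "2", "3"}, {"A", "B"}
edges = [("1", "A"), ("1", "B"), ("2", "A"), ("2", "B"), ("3", "A")]

b3 = three_bicliques(left, right, edges)
b4 = join(b3, left, edges)
b5 = join(b4, left, edges)

if __name__ == "__main__":
    print(sorted("".join(sorted(b)) for b in b3))
    print(sorted("".join(sorted(b)) for b in b4))
```

Because every cross-part pair in the union already lies in one of the two joined bicliques, the non-shared pair is the only edge that ever needs testing.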
34
1AB 1AC 1AD 1AE 1AF 1BC 1BD 1BE 1BF 1CD 1CE 1CF 1DE 1DF 1EF
2AB 2AC 2AE 2AF 2BC 2BE 2BF CE 2CF EF 3BC 3BD 3BE 3BF 3CD 3CE 3CF 3DE 3DF 3EF 4AC 4AD 4AE 4AF CD 4CE 4CF 4DE 4DF 4EF 5CD 5CE DE 6CE 6CF EF 7EF Bclq3 F E D C B 1AC 1AD 1AE 1AF 1BC 1BD 1BE 1BF 1CD 1CE 1CF 1DE 1DF 1EF A 2AC 2AE 2AF 2BC 2BD 2BE 2BF 2CE 2CF 2EF 3BC 3BD 3BE 3BF 3CD 3CE 3CF 3DE 3DF 3EF 1AB 2AB 4AC 4AD 4AE 4CD 5CD 5CE 5DE 6CE 6CF 6EF 7EF G9B1 A B C D E F 1 2 3 4 5 6 7 A B C D E F A B C D E F 12A A A 12B 13B B 12C 13C 14C 15C 16C C 24C 25C 26C C 35C 36C C 46C C 13D 14D 15D D 35D D 12E 13E 14E 15E 16E 17E 23E 24E 25E 26E 27E 34E 35E 36E 37E 45E 46E 47E 56E 57E 67E 12F 13F 14F F 17F 23F 24F F 37F 34F F 37F F 47F F 17F 27F 37F 47F 67F 7 6 5 4 3 2 16F 26F 36F 46F 14F 24F 34F 13F 23F F E D C B A 12F 17E 27E 37E 47E 57E 67E 16E 26E 36E 46E 56E 15E 25E 35E 45E 14E 24E 34E 13E 23E 12E 15D 35D 45D 14D 34D 13D 16C 26C 36C 46C 56C 15C 25C 35C 45C 14C 24C 34C 13C 23C 12C 13B 23B 12B 7 6 5 4 3 2 Let’s take and even smaller bipartite graph to get a feel for how datacube technology might help us catalog bicliques, since there are clearly going to be a large number of them! 14A 24A 12A
35
Remember the Sales Data Cube? Each cell contains a sales measurement, e.g., the number of sales (a cell may contain many other measurements of product-date-country instances). We will attempt to apply this technology to the task of finding bicliques later, after reviewing the technology.
[Figure: sales data cube with axes Date (1Qtr, 2Qtr, 3Qtr, 4Qtr), Product (TV, PC, VCR) and Country (U.S.A, Canada, Mexico).]
36
Total of all product sales by country and quarter
Rollup (aggregate under +) along Product (e.g., using the aggregate, sum) gives the total of all product sales by country and quarter, i.e., total sales by country and date.
[Figure: sales data cube with axes Date (1Qtr-4Qtr), Product (TV, PC, VCR) and Country (U.S.A, Canada, Mexico).]
37
Rollup along date (e.g., using the aggregate, sum)
Total annual sales by country and product.
[Figure: sales data cube with axes Date (1Qtr-4Qtr), Product (TV, PC, VCR) and Country (U.S.A, Canada, Mexico).]
38
Rollup along country (e.g., using the aggregate, sum)
Total sales over all countries, by product and date.
[Figure: sales data cube with axes Date (1Qtr-4Qtr), Product (TV, PC, VCR) and Country (U.S.A, Canada, Mexico).]
39
All rollups (e.g., using the aggregate, sum)
Rollups shown: sales by product, country and quarter (the base cube); sales by product, country; sales by country, date; sales by country; sales by product; sales by date; total sales.
[Figure: sales data cube with axes Date (1Qtr-4Qtr), Product (TV, PC, VCR) and Country (U.S.A, Canada, Mexico), with each rollup labelled.]
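A rollup with the sum aggregate can be sketched as follows. This is a minimal illustration, assuming the cube is held as a Python dict keyed by (product, country, date) tuples; the function name rollup and the sales figures are made up for the example and are not from the slides.

```python
from collections import defaultdict

def rollup(cube, along):
    """Sum out the dimension at index `along` of a cube keyed by tuples."""
    out = defaultdict(int)
    for key, value in cube.items():
        reduced = key[:along] + key[along + 1:]   # drop the rolled-up coordinate
        out[reduced] += value
    return dict(out)

if __name__ == "__main__":
    # Keys are (product, country, date); the numbers are invented.
    cube = {
        ("TV", "U.S.A", "1Qtr"): 10,
        ("PC", "U.S.A", "1Qtr"): 5,
        ("TV", "Canada", "1Qtr"): 3,
        ("TV", "U.S.A", "2Qtr"): 7,
    }
    by_country_date = rollup(cube, 0)         # rollup along Product
    by_country = rollup(by_country_date, 1)   # then along Date
    total = rollup(by_country, 0)             # then along Country
    print(by_country_date, by_country, total)
```

Applying the same function repeatedly, in any order of dimensions, produces the whole lattice of rollups down to the single total.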
40
Partial Rollup: climbing up a concept hierarchy. Instead of eliminating Product altogether by summing over all products, roll up partially on Product, from (VCR, PC, TV) to computer (includes PC only) and non-computer (includes VCR + TV).
[Figure: sales data cube with axes Date (1Qtr-4Qtr), Country (U.S.A, Canada, Mexico) and Product regrouped from (TV, VCR, PC) into non-comp and comp.]
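The partial rollup can be sketched by replacing each Product value with its parent in the concept hierarchy and summing cells that collide. The dict layout, the name partial_rollup and the sales figures are assumptions for illustration; only the hierarchy (PC is comp; VCR and TV are non-comp) comes from the slide.

```python
from collections import defaultdict

# Concept hierarchy from the slide: PC -> comp; VCR, TV -> non-comp.
HIERARCHY = {"PC": "comp", "TV": "non-comp", "VCR": "non-comp"}

def partial_rollup(cube, hierarchy, dim=0):
    """Climb one level of a concept hierarchy on dimension `dim`, summing merged cells."""
    out = defaultdict(int)
    for key, value in cube.items():
        parent = hierarchy[key[dim]]
        out[key[:dim] + (parent,) + key[dim + 1:]] += value
    return dict(out)

if __name__ == "__main__":
    # Keys are (product, country, date); the numbers are invented.
    cube = {
        ("TV", "U.S.A", "1Qtr"): 10,
        ("VCR", "U.S.A", "1Qtr"): 4,
        ("PC", "U.S.A", "1Qtr"): 5,
    }
    print(partial_rollup(cube, HIERARCHY))
```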
41
SLICE (e.g., slice off PC).
[Figure: sales data cube with axes Date (1Qtr-4Qtr), Product (TV, VCR, PC) and Country (U.S.A, Canada, Mexico), with the PC slice marked.]
42
DICE (e.g. dice off PC, the last two quarters, the country Mexico)
[Figure: sales data cube with axes Date (1Qtr-4Qtr), Product (TV, VCR, PC) and Country (U.S.A, Canada, Mexico), with the diced sub-cube marked.]
43
Pivot/Rotate: the same cube viewed with its Date, Product and Country dimensions exchanged among the primary, secondary and tertiary axes.
[Figure: two orientations of the sales data cube, axes Date (1Qtr-4Qtr), Product (TV, VCR, PC) and Country (U.S.A, Canada, Mexico).]
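Slice, dice and pivot can be sketched on the same kind of dict-based cube. The helper names (slice_cube, dice_cube, pivot) and the sales figures are hypothetical, and "slice off PC" is read here as keeping the PC slice, which is an assumption about the slide's intent.

```python
def slice_cube(cube, dim, value):
    """Keep the cells where dimension `dim` equals `value`; that dimension is dropped."""
    return {k[:dim] + k[dim + 1:]: v for k, v in cube.items() if k[dim] == value}

def dice_cube(cube, allowed):
    """Keep a sub-cube; `allowed` holds one set of values per dimension (None keeps all)."""
    return {k: v for k, v in cube.items()
            if all(a is None or x in a for x, a in zip(k, allowed))}

def pivot(cube, order):
    """Reorder the dimensions of every key, e.g. order=(2, 1, 0)."""
    return {tuple(k[i] for i in order): v for k, v in cube.items()}

if __name__ == "__main__":
    # Keys are (product, country, date); the numbers are invented.
    cube = {
        ("TV", "U.S.A", "1Qtr"): 10,
        ("PC", "U.S.A", "3Qtr"): 5,
        ("PC", "Mexico", "4Qtr"): 2,
        ("VCR", "Mexico", "3Qtr"): 4,
    }
    print(slice_cube(cube, 0, "PC"))
    print(dice_cube(cube, [{"PC"}, {"Mexico"}, {"3Qtr", "4Qtr"}]))
    print(pivot(cube, (2, 1, 0)))
```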
44
Now let’s apply this technology to finding all bicliques.
G5b1 (vertices 1, 2, 3 and A, B, C). bCLQ3s centered on numbers: 1AC, 2AB, 2AC, 2BC.
[Figure: cube with one number axis (1, 2, 3) and two letter axes (A, B, C); each bCLQ3 sits in its own cell.]
45
G5b1 bCLQ3s centered on numbers: 1AC, 2AB, 2AC, 2BC.
RollUp along the front-to-back dimension using the hub intersection and spoke union gives the expanded hub-and-spoke biclique 2ABC: hub={2}, spokes={A,B,C}; or hub={2,A}, spokes={B,C}; or the hub-union (of hubs {B},{C}), spoke-intersection (of spokes {2,A}). Rather than view it as an intersection-union of hubs and spokes, I think it suffices to just take the union???
[Figure: number x letter x letter cube holding the bCLQ3s, with the rolled-up cell 2ABC.]
46
G5b1 bCLQ3s centered on numbers: 1AC, 2AB, 2AC, 2BC.
RollUp along the left-right dimension using the hub intersection and the spoke union gives the one expanded biclique 12AC (hub={A,C}, spokes={1,2}).
[Figure: number x letter x letter cube holding the bCLQ3s, with the rolled-up cell 12AC.]
47
G5b1 bCLQ3s centered on numbers: 1AC, 2AB, 2AC, 2BC.
RollUp along the top-bottom dim using hub intersection and spoke union gives the expanded hub-and-spoke biclique 2ABC (hub={2}, spokes={A,B,C}).
[Figure: number x letter x letter cube holding the bCLQ3s, with the rolled-up cell 2ABC.]
48
G5b1 bCLQ3s centered on numbers: 1AC, 2AB, 2AC, 2BC.
[Figure: number x letter x letter cube holding the bCLQ3s together with the rolled-up bicliques 12AC and 2ABC.]
49
G5b2 (vertices 1, 2, 3 and A, B, C). bCLQ3s centered on numbers: 1AB 1AC 1BC 2AB 2AC 2BC 3AB 3AC 3BC.
[Figure: number x letter x letter cube; each bCLQ3 sits in its own cell.]
51
G5b2 bCLQ3s centered on numbers: 1AB 1AC 1BC 2AB 2AC 2BC 3AB 3AC 3BC.
[Figure: number x letter x letter cube holding the bCLQ3s, with the rolled-up cells 1ABC, 2ABC, 3ABC.]
52
G5b2 bCLQ3s centered on numbers: 1AB 1AC 1BC 2AB 2AC 2BC 3AB 3AC 3BC.
UnionRollUp along the front-back dim gives the expanded bicliques hub={1,A} spokes={B,C}; hub={2,A} spokes={B,C}; hub={3,A} spokes={B,C}.
UnionRollUp along the left-right dim gives hub={A,B} spokes={1,2,3}; hub={A,C} spokes={1,2,3}; hub={B,C} spokes={1,2,3}.
Note: hub is always the combo of fixed values.
[Figure: number x letter x letter cube holding the bCLQ3s, with the rolled-up cells 1ABC, 2ABC, 3ABC and 123AB, 123AC, 123BC.]
53
G5b2 bCLQ3s centered on numbers: 1AB 1AC 1BC 2AB 2AC 2BC 3AB 3AC 3BC.
UnionRollUp along the front-back dim gives the expanded bicliques hub={1,A} spokes={B,C}; hub={2,A} spokes={B,C}; hub={3,A} spokes={B,C}.
UnionRollUp along the left-right dim gives hub={A,B} spokes={1,2,3}; hub={A,C} spokes={1,2,3}; hub={B,C} spokes={1,2,3}.
UnionRollUp along the top-bottom dim gives the expanded bicliques hub={1,C} spokes={A,B}; hub={2,C} spokes={A,B}; hub={3,C} spokes={A,B}.
Note: hub is always the combo of fixed values.
[Figure: number x letter x letter cube holding the bCLQ3s, with the rolled-up cells 1ABC, 2ABC, 3ABC, 123AB, 123AC, 123BC and 123ABC.]
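The union rollup of bCLQ3 cells can be sketched as a group-by: fix every coordinate except the one being rolled up (that is the hub) and take the union of the rolled-up coordinate (those are the spokes). The representation of a cell as a (number, letter, letter) tuple and the name union_rollup are assumptions for illustration.

```python
from collections import defaultdict

def union_rollup(cells, along):
    """Group cells by the coordinates left fixed (the hub); union the rest (the spokes)."""
    groups = defaultdict(set)
    for cell in cells:
        hub = cell[:along] + cell[along + 1:]
        groups[hub].add(cell[along])
    return dict(groups)

if __name__ == "__main__":
    # G5b2: bCLQ3s centered on numbers, one (number, letter, letter) cell each.
    cells = [tuple(s) for s in
             ["1AB", "1AC", "1BC", "2AB", "2AC", "2BC", "3AB", "3AC", "3BC"]]
    print(union_rollup(cells, 2)[("1", "A")])   # front-back: hub {1,A}
    print(union_rollup(cells, 0)[("A", "B")])   # left-right: hub {A,B}
    print(union_rollup(cells, 1)[("1", "C")])   # top-bottom: hub {1,C}
```

Hubs whose spoke set has a single element are just the original bCLQ3 cells; only groups with two or more spokes are expanded bicliques.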
54
DICE (e.g., dice off 3 and C).
G5b2 bCLQ3s centered on numbers: 1AB 1AC 1BC 2AB 2AC 2BC 3AB 3AC 3BC.
Dicing off 3 and C leaves G5b3 (vertices 1, 2 and A, B) with bCLQ3s 1AB, 2AB.
[Figure: the G5b2 cube and the diced G5b3 sub-cube.]
55
Clique Mining Thm (CLQm)
G5 (vertices 1-8; EG5 2-level str=8). CLQ2: 2 vertices, 1 edge, so CLQ2 is just E.
[Figure: graph G5 with its edge list and its CLQ3 and CLQ4 lists. Surviving annotations: 157 CS3, E36=0; 368 CS3, E27=0; E25=0; E14=0.]
Calculating pairwise &s is unnecessary! The most efficient algorithm is to consider CCLQk+1 from lowest common vertex set to highest (i.e., start with the lowest k and work up, always keeping the max of the shared sets as low as possible). For every found candidate pair from CCLQk+1 sharing k-1 vertices in which >1 unshared vertex is higher than said shared max, check for an edge connecting those unshared vertices.
[Figures: G5.1 (vertices 1-9, a-g) and G6 (vertices 1-9, a-c), each with CLQ2, CCLQ3, CLQ3, CCLQ4 and CLQ4 lists. Surviving fragments: for G5.1, CLQ2 includes ac, bc, df, dg, fg and CCLQ3 includes abc, dfg; for G6, CLQ2 includes 9a, 9b, 9c, ab, ac, bc, CLQ3 includes 9ab, 9ac, 9bc, abc, and CCLQ4 includes 9abc.]
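One way to realize "work up from the lowest shared vertex set and check a single edge bit" is an Apriori-style prefix join, sketched below. This is a swapped-in standard technique under assumptions, not necessarily the slide's exact ordering: vertices are numbered 0..n-1, cliques are sorted tuples, adjacency is one integer bitmask per vertex (standing in for the edge pTrees), and the name mine_cliques is hypothetical.

```python
def mine_cliques(n, edges):
    """Return {k: sorted list of k-cliques as sorted tuples} for k >= 2."""
    adj = [0] * n                          # adj[v] has bit w set iff vw is an edge
    for u, v in edges:
        adj[u] |= 1 << v
        adj[v] |= 1 << u
    levels = {2: sorted({tuple(sorted(e)) for e in edges})}   # CLQ2 is just E
    k = 2
    while levels[k]:
        cur, nxt = levels[k], []
        for i, a in enumerate(cur):
            for b in cur[i + 1:]:
                if a[:-1] != b[:-1]:       # sorted order: shared prefix has ended
                    break
                v, w = a[-1], b[-1]        # the two unshared (highest) vertices
                if (adj[v] >> w) & 1:      # single edge-bit check
                    nxt.append(a + (w,))
        levels[k + 1] = nxt
        k += 1
    levels.pop(k)                          # drop the final, empty level
    return levels

if __name__ == "__main__":
    # K4 on vertices 0..3 plus a pendant edge 3-4 (an invented test graph).
    edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
    print(mine_cliques(5, edges))
```

Because two k-cliques are joined only when they agree on their first k-1 vertices, every (k+1)-clique is produced exactly once and no pairwise ANDs are needed.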
56
More Complex Graph Structures? The vertex-labelled, edge-labelled graph
[Figure: ANalyst-TickerSymbol relationship with labels. ANalysts (numbered up to 18) carry the attributes SA, Ct, C, Sal; TickerSymbols carry the attributes Dow?, Ct, Buy-Hold-Sell (BHS), SA; each AN and each TS has a pTree and a count (Ct).]
We can interpret this structure many ways:
1. as a relationship with entity tables;
2. as an AN[alyst] Table with attributes, the AN attributes (SA, Ct, C, Sal) plus each TickerSymbol pTree as an additional attribute (the TS attributes (Dow?, Ct, BHS, SA) are not captured in this interpretation);
3. as a T[icker]S[ymbol] or Stock Table with attributes, the TS attributes (Dow?, Ct, BHS, SA) plus each Analyst pTree as an additional attribute (the AN attributes (SA, Ct, F, Sal) are not captured in this interpretation).
[Figure: the same structure in full pTree form, with bit-sliced label columns such as SA0, SA1 and C0, C1, C2, C3.]
We can include this relationship with other relationships sharing entities by using the RoloDex Model (next slide). The graph could be 3D, 4D (i.e., edges are triples, quadruples), etc. The graph could also be edge labelled. A convenient way to capture edge labels is by making the cell content of each matrix cell into the label structure rather than just a yes/no bit. As a simple but pertinent example, suppose we have a 0-3 rating of each Analyst-Stock pair which measures how much that Analyst knows about that Stock. We just change each bit to a decimal number in [0,3] (or bitslice those using two bits instead of one, so that the matrix columns are 2-bit pTreeSets rather than just one pTree).
If C measures the "Correctness Level" of the Analyst over recent days or weeks over all stocks (e.g., based on backward analysis of previous sentiment analysis and the actual performance of the stock) and the cell numbers measure the correctness of that Analyst on that Stock, then a signal might be to mask C>=2 and, for those Analysts, find the average Correctness for each Stock, then keep only those Stocks for which the number of such Analysts is between two thresholds (we want a high average, and more than one analyst but not too many).
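The signal described above can be sketched as a mask, a per-stock average and a count filter. Everything concrete here is an assumption for illustration: the name correctness_signal, the dict layout, the thresholds (lo, hi), the extra avg_min cutoff standing in for "want a high average", and the invented analysts, stocks and ratings.

```python
def correctness_signal(analyst_c, ratings, c_min=2, lo=2, hi=3, avg_min=2.0):
    """analyst_c: {analyst: C level}; ratings: {(analyst, stock): 0..3 cell value}."""
    trusted = {a for a, c in analyst_c.items() if c >= c_min}     # mask C >= c_min
    per_stock = {}
    for (analyst, stock), value in ratings.items():
        if analyst in trusted:
            per_stock.setdefault(stock, []).append(value)
    signal = {}
    for stock, values in per_stock.items():
        avg = sum(values) / len(values)
        # Keep stocks covered by between lo and hi trusted analysts with a high average.
        if lo <= len(values) <= hi and avg >= avg_min:
            signal[stock] = avg
    return signal

if __name__ == "__main__":
    analyst_c = {"an1": 3, "an2": 2, "an3": 1, "an4": 2}          # invented
    ratings = {("an1", "X"): 3, ("an2", "X"): 2, ("an3", "X"): 0,
               ("an1", "Y"): 3, ("an4", "Z"): 1, ("an2", "Z"): 1}  # invented
    print(correctness_signal(analyst_c, ratings))
```

In the vertical setting the mask would be a pTree AND and the counts would come from pop-counts; the dict version only shows the logic.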
57
The Multi-Relationship Model
Every Entity (Gene, Term, Experiment, Person, Document, Item, Stock, Course, Movie) has an EntityTable of many descriptive attributes (columns). They aren't shown here. For example, on the previous slide we show the descriptive columns of Stocks(Dow?, Count, BHS, SA) and Analysts(SA, Count, Female?, SalaryInBillions), not shown here.
Tweets are Documents, so the Tweet-Tweeter relationship is a Document-Author relationship (Tweetee, hashtag, etc. are Edge Labels).
In looking for signals that no one else uses: What if an Investor BUYS an island in the Mediterranean? What if an Investor's best friend buys lots of stock in an Online University?
Supp(A) = CusFreq(ItemSet). Conf(A⇒B) = Supp(AB)/Supp(A), where AB is the itemset A∪B.
[Figure: relationship matrices sharing entity axes: Stock-Investor; Friends (People-People); ItemSet-ItemSet (antecedent); BUYS (Customer-Item); customer rates movie; customer rates movie as 5; Course Enroll; TermDocument; AuthDoc; docdoc; genegene rel (ppi); termterm rel (ShareStem, CellLabel=stem); expPI; Expgene.]
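Support and confidence fit the vertical style directly: keep one customer bitmask per item, AND the masks of an itemset, and count the 1-bits. The sketch below assumes that layout; the names support and confidence, the item names and the customer data are invented for illustration.

```python
def support(item_masks, itemset, n_customers):
    """Number of customers buying every item in `itemset` (AND, then count 1-bits)."""
    mask = (1 << n_customers) - 1          # start with all customers
    for item in itemset:
        mask &= item_masks[item]
    return bin(mask).count("1")

def confidence(item_masks, antecedent, consequent, n_customers):
    """Conf(A => B) = Supp(A u B) / Supp(A)."""
    both = set(antecedent) | set(consequent)
    return support(item_masks, both, n_customers) / support(item_masks, antecedent, n_customers)

if __name__ == "__main__":
    # Bit c of a mask is 1 iff customer c bought the item (4 invented customers).
    item_masks = {"milk": 0b0111, "bread": 0b1110}
    print(support(item_masks, {"milk"}, 4))
    print(confidence(item_masks, {"milk"}, {"bread"}, 4))
```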
58
More Complex Graph Structures
More Complex Graph Structures? HyperGraphs, cliqueTrees (cTrees), Motifs
GRAPH (linear edges, 2 vertices).
kHyperGraph (edges = k-vertex sets).
kPARTITE Graph or just kPART Graph (V = ∪Vi, i=1..k, a disjoint union; (x,y)∈E ⇒ x,y are not in the same Vi).
kPART HyperGraph (V = ∪Vi, i=1..k, a disjoint union; (x1..xk)∈E ⇒ no two xi,xj are in the same Vi).
BiPartClique Mining finds MaxCliques (bicliques) at the cost of pairwise &s. Each LETpTree ∈ MCLQ unless some pairwise & has the same ct; A&B, ∀B with Ct(A&B)=Ct(A), is a MCLQ. There is potential for a k-plex [k-core] mining alg here. Instead of Ct(A&B)=Ct(A), consider, e.g., Ct(A&B)=Ct(A)-1. Each such pTree, C, would be missing just 1 vertex (1 edge). Taking any MCLQ as above, ANDing in such a C pTree would produce a 1-plex. ANDing in k such C's would produce a k-plex. In fact, suppose we have produced a k-plex in such a manner; then ANDing in any C with Ct(C)=Ct(A)-h would produce a (k+h)-plex. &i=1..n Ai is a [i=1..n Ct(Ai)]-Core.
TriPART Clique Mining Algorithm? In a Tripartite Graph edges must start and end in different vertex parts. E.g., PART1=tweeters; PART2=hashtags; PART3=tweets. Tweeters-to-hashtags is many-to-many? Tweeters-to-tweets is many-to-many (incl. retweets)? Hashtags-to-tweets is many-to-many?
MultiPART Graphs: BiPART, TriPART (have 2, 3 PARTs respectively, but an edge is still linear (between two vertices)) … No edge can start and end in the same PART.
Conjecture: KmultiCliques and KhyperCliques are in 1-1 correspondence (both are defined by a K PART vertex set)? So, only one mining process needed? We will represent these common objects with cliqueTrees (cTrees). A cTree bitmaps each PART of the clique. E.g., the cTree for Inv={2,3}; Stock={A,B}; Day={,}:
[Figure: cTree with one bitmap per PART (D, I, S) and counts (Cts).]
HyperClique Mining: A 3hyperGraph has 3 vertex PARTS and each edge is a planar triangle (defined by a vertex triple, one from each PART). Stock recommender is a 3hyperGraph (Investors, Stocks, Days). A triangular edge connects Investor k, Stock X, and Day n if k recommends X on day n. A 3hyperClique is a community s.t. all investors in the clique recommend all stocks in the clique on each day in the clique. Tweet ex: PART1=tweeters; PART2=hashtags; PART3=tweets.
Cliques, Kplexes and Kcores are subgraphs (communities) defined using an internal edge count. A Motif is a subgraph defined using an external "isomorphism into the graph" count. A motif must occur (isomorphically) in the graph more times than "expected". Criticism: Some authors argue[62] that a motif structure does not necessarily determine function. Recent research[64] shows that the connections of a motif to the rest of the network are too important to draw inferences about function from local structure alone.[65] Research shows certain topological features of biological networks naturally give rise to canonical motifs.[66] Most find induced Motifs.
A graph, G′, is a subgraph of G (G′⊆G) if V′⊆V and E′⊆E∩(V′×V′). If G′⊆G and G′ contains all ‹u,v›∈E with u,v∈V′, G′ is an induced subgraph. G′ and G are isomorphic (G′↔G) if there is a bijection f:V′→V with ‹u,v›∈E′⇔‹f(u),f(v)›∈E for all u,v∈V′. (If G″⊂G and there is an isomorphism between G″ and G′, G′ appears in G.) The number of appearances of G′ in G is the frequency of G′ in G, FG(G′). G′ is recurrent or frequent in G when FG(G′)>threshold (pattern = frequent subgraph). Motif discovery includes exact counting, sampling, pattern growth. Motif discovery has 2 steps: calculating the # of occurrences; evaluating the significance.
Are Stock-Inv or Stock-Inv-Day Motifs useful? Some questions/theorems/thoughts:
All K-Paths are isomorphic (thus, there's always a Kpath motif). A ShortestKPath is an Induced subgraph.
What does the sequence Frequency(1PathMotif)=|V|, Frequency(2PathMotif), … tell us? The sequence Frequency(Shortest1Path), Frequency(Shortest2Path), …? The sequence Frequency(MaxShortest1Path), Frequency(MaxShortest2Path), …, where a MaxS2P is not part of a S3P?
Extend to HyperEdges? What is a path in, e.g., a 3HyperGraph? Both? 2HGInterface3HyperGraphPath. 1HGI3HGP. (In general, hHGIkHGP, where 0<h<k.)
I'll bet most important motifs, M(V′,E′) in G, are "Shortest Path Motifs": ∀x,y∈V′, ∃ a G-ShortestPath in M running from x to y. I.e., M is made up of G-SPs. At the other extreme (all SPs are length=1): a Clique is a SPMotif (made up entirely of Shortest1Paths). Or?
A 4PARThyperGraph or just 4HyperGraph has 4 vertex PARTS and each edge is a solid tetrahedron (defined by a vertex quadruple, one from each PART). Stock Recommender 4hyperGraph (Investors, Stocks, Strength (StrongBuy, …), Days). A tetrahedral "edge" connects Investor k, Stock X, Strength B and Day n iff k recommends X as a Buy on day n. A 4hyperClique is a community s.t. all the investors recommend all the stocks as strength=B on each day in the clique. (There is some degeneracy since the Strength will always be a singleton? One might argue that this is just a series of 3HyperGraphs, one for each strength level.)
A Tweet 4HyperGraph: PART1=tweeters; PART2=hashtags; PART3=tweets; PART4=day. A 4hyperClique: all tweeters send all tweets on all hashtags each day of the clique.
A MBR 4HyperGraph: PART1=customers; PART2=items; PART3=days; PART4=store. A 4hyperClique: all customers buy all items at all stores on each day of the clique.
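The hyperclique definition above (every investor recommends every stock on every day of the clique) can be checked directly by enumerating the cross product of the PARTs. This is a brute-force sketch under assumptions, not a cTree-based method: hyperedges are stored as a set of tuples, the name is_hyperclique is hypothetical, and the investors, stocks and days are invented.

```python
from itertools import product

def is_hyperclique(parts, hyperedges):
    """True iff every tuple with one vertex from each PART is a hyperedge."""
    if any(len(part) == 0 for part in parts):
        return False                       # an empty PART gives no clique
    return all(t in hyperedges for t in product(*parts))

if __name__ == "__main__":
    investors, stocks, days = [2, 3], ["A", "B"], ["d1", "d2"]   # invented
    full = set(product(investors, stocks, days))
    print(is_hyperclique([investors, stocks, days], full))
    print(is_hyperclique([investors, stocks, days], full - {(3, "B", "d2")}))
```

The same function covers the 4hyperGraph case by passing four PARTs and 4-tuples as hyperedges.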
59
Introduction to 2PART Graph Community Search:
For a multipartite graph the concept of community is still related to a large density of edges between members of the same group. We take a clique in a 2PART (bipartite) graph to be a bipartite subset of vertices with all possible edges.
2PART Induction thm: In a bipartite graph, a Kclique and a 3clique that share an edge form a (K+1)clique iff all edges that can exist, from the non-shared Kclique vertices to the non-shared 3clique vertex, do exist.
2PART 3Clique thm: a pair of vertices from part1, a, b, and a vertex from part2, 1, form a 3Clique iff both possible edges a1, b1 exist. CLQ3 is constructed by listing each vertex pair in each pTree along with the naming vertex of the pTree.
Examples (part1 = a, b, c, ...; part2 = 1, 2, 3):
The two 3cliques ab1 and b12 sharing b1 form a 4clique iff the non-shared vertex pair a2 is an edge. The two 3cliques ab1 and bc1 sharing b1 form a 4clique.
The 4clique ab12 and 3clique bc2 sharing b2 form a 5clique iff the non-shared vertex pair c1 is an edge. The 4clique abc1 and 3clique cd1 sharing c1 form a 5clique.
The 5clique abc12 and 3clique c23 sharing c2 form a 6clique iff the non-shared vertex pairs a3 and b3 are edges. The 5clique abc12 and 3clique d12 sharing vertices 1 and 2 form a 6clique. The 5clique abcd1 and 3clique de1 sharing edge d1 form a 6clique.
The 6clique abc123 and 3clique cd3 sharing c3 form a 7clique iff the non-shared vertex pairs d1 and d2 are edges. The 6clique abc123 and 3clique d23 sharing vertices 2 and 3 form a 7clique iff vertex pair d1 is an edge. The 6clique abcd12 and 3clique de2 sharing edge d2 form a 7clique iff vertex pair e1 is an edge. The 6clique abcde1 and 3clique ef1 sharing edge e1 form a 7clique.
Although the pattern seems complex, the 2PART Clique Algorithm can be stated: A Kclique and a 3clique sharing 2 vertices form a (K+1)clique iff all edges from the non-shared 3clique vertex to each non-shared Kclique vertex (from the other PART) exist. That is, check edge existence between all non-shared vertices.
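The 2PART Clique Algorithm's extension test can be sketched as a single function. The representation is an assumption for illustration: vertices are single characters, part_of maps each vertex to a PART label, edges is a set of frozensets, and the name extends_to_bigger_clique is hypothetical. The test cases replay two of the slide's examples.

```python
def extends_to_bigger_clique(k_clique, tri_clique, part_of, edges):
    """True iff the Kclique and 3clique share 2 vertices and their union is a (K+1)clique."""
    k_clique, tri_clique = set(k_clique), set(tri_clique)
    if len(k_clique & tri_clique) != 2:
        return False
    (new_vertex,) = tri_clique - k_clique            # the non-shared 3clique vertex
    for v in k_clique - tri_clique:                  # the non-shared Kclique vertices
        other_part = part_of[v] != part_of[new_vertex]
        if other_part and frozenset((v, new_vertex)) not in edges:
            return False                             # an edge that could exist is missing
    return True

if __name__ == "__main__":
    part_of = {**{v: "letters" for v in "abc"}, **{v: "numbers" for v in "123"}}
    edges = {frozenset(e) for e in ["a1", "b1", "b2"]}
    print(extends_to_bigger_clique("ab1", "b12", part_of, edges))   # a2 missing
    edges.add(frozenset("a2"))
    print(extends_to_bigger_clique("ab1", "b12", part_of, edges))   # a2 present
```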
60
Most bipartite graphs involve a subject part and an object part, e.g., women-events, Investors-Stocks, Customers-Items, Subject-Accessibility in the system access case. Most tripartite hypergraphs involve a subject, object and circumstance part, e.g., Investor-Stock-Day, Customer-Item-Day, Customer-Item-Store, Subject-Accessibility-Day. Most quadripartite hypergraphs involve a subject, object and 2 circumstance parts, e.g., Customer-Item-Store-Day. Most multi-partite hypergraphs can be realized as a subject-object bipartite graph with the circumstances as edge labels.
A biclique is a complete subgraph of a bipartite graph (having every edge it is allowed to have, and at least 1, which rules out the trivial case of no edges). Every induced subgraph of a biclique is a biclique and therefore we get the same downward closure: If two bicliques with k vertices each overlap in k-1 vertices and if the other two vertices form an edge, then the union of the two k-bicliques is a (k+1)biclique. The only difference is that with bicliques it is easier, because we only need to check that the unshared pair is an edge when those two points are from different parts. Otherwise the union is a (k+1)biclique by default.
pTree BiClique Existence Thm (BCLQep): |W|=k. (W,EW)∈CLQk iff ∀x,y∈W, |Wx&Wy|=k (k*(k-1)/2 ANDs), OR |W|=k. (W,EW)∈CLQk iff ∃xy∈EW s.t. |Wx&Wy|=k-2 and ∀u,v∈Wx&Wy, u∈Wv (1 AND (but which one?) and k-2 lookups).
BiClique Existence Thm (BCLQe): Let G=(S,O,E) be a bipartite graph and U⊆S and V⊆O with |U|=k and |V|=h; then the induced subgraph (U,V,EUV)∈BCLQk (is a k-biclique) iff every induced (k-1)-vertex subgraph (W,EW)∈CLQk-1.
Clique Mining Thm (CLQm) finds all cliques using a closure property: Let the Candidate (k+1)-Clique Set be CCLQk+1. By the CLQe thm, CCLQk+1 = all unions of CLQk-pairs having k-1 common vertices. Let C∈CCLQk+1 be a union of two k-cliques with k-1 common vertices. Let v and w be their kth (non-common) vertices respectively; then C∈CLQk+1 iff Evw=1. (Just check a single bit in PE.)
Clique Existence Thm edge count (CLQec): C={1,2,3,4}, CU=C&EU. ct(CU)=comb(4,2)=4!/(2!2!)=6 iff C∈CLQ4. Is there an edge count Clique Mining Thm?
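The edge-count test can be sketched with bitmasks and 1-counts: AND each member's adjacency mask with the mask of C, add up the 1-counts (each internal edge is counted twice), and compare with comb(|C|, 2). The helper names (adjacency_masks, is_clique_by_edge_count) and the test graph are assumptions for illustration; integer bitmasks stand in for the edge pTrees.

```python
from math import comb

def adjacency_masks(edges):
    """Build {vertex: bitmask of neighbours}; bit w of adj[v] is 1 iff vw is an edge."""
    adj = {}
    for u, v in edges:
        adj[u] = adj.get(u, 0) | (1 << v)
        adj[v] = adj.get(v, 0) | (1 << u)
    return adj

def is_clique_by_edge_count(adj, members):
    """C is a clique iff the number of edges inside C equals comb(|C|, 2)."""
    c_mask = sum(1 << v for v in set(members))
    # Each internal edge shows up once from each endpoint, hence the division by 2.
    twice_edges = sum(bin(adj.get(v, 0) & c_mask).count("1") for v in set(members))
    return twice_edges // 2 == comb(len(set(members)), 2)

if __name__ == "__main__":
    k4 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
    print(is_clique_by_edge_count(adjacency_masks(k4), {1, 2, 3, 4}))       # 6 edges
    print(is_clique_by_edge_count(adjacency_masks(k4[:-1]), {1, 2, 3, 4}))  # 5 edges
```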