GAIO threshold = 15 become: V= D2 H4 GAIO-Ct=

Presentation on theme: "GAIO threshold = 15 become: V= D2 H4 GAIO-Ct="— Presentation transcript:

1 GAIO threshold = 15 become: V= 38 46 91 D2 H4 GAIO-Ct= 23 17 30 39 21
Vertex participation counts for G10, listed as vertex:count (vertices are labelled 1-99, then a0-i0).

GAI (V:Ct): 1:1 2:1 3:1 4:1 5:1 6:1 7:1 8:1 9:1 10:1 11:1 12:1 13:1 14:1 15:1 16:1 17:1 18:1 19:1 20:1 21:1 22:1 23:1 24:1 25:1 26:1 27:1 28:1 29:2 30:1 31:1 32:1 33:1 34:1 35:1 36:1 37:0 38:1 39:0 40:2 41:2 42:1 43:4 44:1 45:4 46:12 47:3 48:4 49:7 50:2 51:2 52:1 53:1 54:1 55:2 56:1 57:1 58:1 59:1 60:1 61:2 62:1 63:1 64:1 65:0 66:2 67:1 68:1 69:1 70:1 71:1 72:2 73:1 74:2 75:1 76:4 77:4 78:7 79:6 80:1 81:5 82:1 83:1 84:1 85:2 86:1 87:1 88:2 89:2 90:1 91:10 92:0 93:0 94:0 95:1 96:1 97:1 98:2 99:2 a0:1 a1:1 a2:1 a3:1 a4:2 a5:1 a6:1 a7:3 a8:1 a9:1 b0:1 b1:1 b2:7 b3:1 b4:2 b5:1 b6:1 b7:1 b8:1 b9:1 c0:1 c1:1 c2:1 c3:1 c4:5 c5:0 c6:2 c7:2 c8:1 c9:1 d0:0 d1:2 d2:19 d3:1 d4:0 d5:2 d6:0 d7:0 d8:1 d9:2 e0:1 e1:1 e2:1 e3:1 e4:2 e5:2 e6:1 e7:2 e8:1 e9:1 f0:1 f1:1 f2:1 f3:1 f4:1 f5:1 f6:1 f7:2 f8:1 f9:1 g0:1 g1:2 g2:0 g3:0 g4:0 g5:0 g6:1 g7:1 g8:1 g9:1 h0:2 h1:3 h2:1 h3:1 h4:9 h5:2 h6:2 h7:2 h8:2 h9:2 i0:2

GAO (V:Ct): 1:0 2:0 3:0 4:0 5:0 6:0 7:0 8:0 9:0 10:0 11:0 12:0 13:0 14:0 15:0 16:0 17:0 18:1 19:1 20:1 21:1 22:1 23:1 24:1 25:1 26:1 27:1 28:1 29:1 30:1 31:1 32:1 33:1 34:1 35:1 36:1 37:2 38:22 39:1 40:1 41:0 42:1 43:1 44:0 45:6 46:5 47:2 48:3 49:3 50:4 51:1 52:13 53:1 54:1 55:1 56:3 57:1 58:1 59:1 60:1 61:1 62:0 63:1 64:0 65:1 66:1 67:0 68:0 69:0 70:0 71:1 72:2 73:2 74:2 75:1 76:1 77:2 78:5 79:5 80:2 81:6 82:0 83:0 84:0 85:0 86:1 87:1 88:2 89:1 90:1 91:20 92:1 93:1 94:2 95:1 96:1 97:1 98:1 99:1 a0:1 a1:1 a2:1 a3:0 a4:1 a5:2 a6:0 a7:4 a8:0 a9:0 b0:0 b1:1 b2:2 b3:1 b4:1 b5:1 b6:1 b7:0 b8:0 b9:0 c0:2 c1:2 c2:3 c3:1 c4:7 c5:2 c6:2 c7:2 c8:2 c9:1 d0:2 d1:2 d2:20 d3:1 d4:1 d5:1 d6:3 d7:2 d8:3 d9:3 e0:3 e1:3 e2:1 e3:2 e4:2 e5:2 e6:2 e7:0 e8:0 e9:0 f0:0 f1:0 f2:0 f3:0 f4:0 f5:0 f6:1 f7:1 f8:0 f9:0 g0:0 g1:1 g2:1 g3:1 g4:1 g5:1 g6:2 g7:1 g8:0 g9:0 h0:2 h1:2 h2:2 h3:1 h4:12 h5:2 h6:1 h7:0 h8:2 h9:4 i0:3

GAU (V:Ct): 1:1 2:1 3:1 4:1 5:1 6:1 7:1 8:1 9:1 10:1 11:1 12:1 13:1 14:1 15:1 16:1 17:1 18:2 19:2 20:2 21:2 22:2 23:2 24:2 25:2 26:2 27:2 28:2 29:3 30:2 31:2 32:2 33:2 34:2 35:2 36:2 37:2 38:23 39:1 40:3 41:2 42:2 43:4 44:1 45:9 46:13 47:3 48:5 49:8 50:5 51:2 52:14 53:2 54:2 55:3 56:4 57:2 58:2 59:2 60:2 61:2 62:1 63:2 64:1 65:1 66:2 67:1 68:1 69:1 70:1 71:1 72:3 73:3 74:3 75:2 76:4 77:4 78:9 79:8 80:3 81:8 82:1 83:1 84:1 85:2 86:2 87:1 88:3 89:3 90:2 91:26 92:1 93:1 94:2 95:1 96:2 97:2 98:2 99:3 a0:2 a1:2 a2:2 a3:1 a4:2 a5:2 a6:1 a7:5 a8:1 a9:1 b0:1 b1:2 b2:9 b3:2 b4:2 b5:1 b6:2 b7:1 b8:1 b9:1 c0:2 c1:2 c2:2 c3:2 c4:7 c5:2 c6:2 c7:2 c8:2 c9:2 d0:2 d1:3 d2:24 d3:1 d4:1 d5:2 d6:2 d7:2 d8:2 d9:3 e0:2 e1:2 e2:1 e3:2 e4:3 e5:3 e6:2 e7:1 e8:1 e9:1 f0:1 f1:1 f2:1 f3:1 f4:1 f5:1 f6:1 f7:2 f8:1 f9:1 g0:1 g1:2 g2:1 g3:1 g4:1 g5:1 g6:2 g7:1 g8:1 g9:1 h0:3 h1:3 h2:2 h3:2 h4:13 h5:2 h6:2 h7:2 h8:2 h9:5 i0:4

GAIO (V:Ct): 1:1 2:1 3:1 4:1 5:1 6:1 7:1 8:1 9:1 10:1 11:1 12:1 13:1 14:1 15:1 16:1 17:1 18:2 19:2 20:2 21:2 22:2 23:2 24:2 25:2 26:2 27:2 28:2 29:3 30:2 31:2 32:2 33:2 34:2 35:2 36:2 37:2 38:23 39:1 40:3 41:2 42:2 43:5 44:1 45:10 46:17 47:5 48:7 49:10 50:6 51:3 52:14 53:2 54:2 55:3 56:4 57:2 58:2 59:2 60:2 61:3 62:1 63:2 64:1 65:1 66:3 67:1 68:1 69:1 70:1 71:2 72:4 73:3 74:4 75:2 76:5 77:6 78:12 79:11 80:3 81:11 82:1 83:1 84:1 85:2 86:2 87:2 88:4 89:3 90:2 91:30 92:1 93:1 94:2 95:2 96:2 97:2 98:3 99:3 a0:2 a1:2 a2:2 a3:1 a4:3 a5:3 a6:1 a7:7 a8:1 a9:1 b0:1 b1:2 b2:9 b3:2 b4:3 b5:2 b6:2 b7:1 b8:1 b9:1 c0:3 c1:3 c2:4 c3:2 c4:12 c5:2 c6:4 c7:4 c8:3 c9:2 d0:2 d1:4 d2:39 d3:2 d4:1 d5:3 d6:3 d7:2 d8:4 d9:5 e0:4 e1:4 e2:2 e3:3 e4:4 e5:4 e6:3 e7:2 e8:1 e9:1 f0:1 f1:1 f2:1 f3:1 f4:1 f5:1 f6:2 f7:3 f8:1 f9:1 g0:1 g1:3 g2:1 g3:1 g4:1 g5:1 g6:3 g7:2 g8:1 g9:1 h0:4 h1:5 h2:3 h3:2 h4:21 h5:4 h6:3 h7:2 h8:4 h9:6 i0:5

[Figure: G10 drawn with vertices 1 through i0.] G10: Web graph of website hyperlinks. Communities are shown by color (GN). |V|=180 (1-i0), |E|=478. We have GAUpTrees (undirected graph), GAIpTrees (incoming edges and where they come from), and GAOpTrees.

The first thing to notice about directed graphs is that treating them as undirected (GAU) can distort the vertex participation numbers, so to find the cluster centers it seems better to add GAI+GAO=GAIO. The cluster centers (1st round) are then as follows. With GAIO threshold = 15, V = 38 46 91 D2 H4 (thresholding GAU-Ct instead would give just 3 centers). With GAIO threshold = 12, V includes C4 D2 H4 (GAU-Ct does not pick up C4). With GAIO threshold = 6, V includes A7 B2 C4 D2 H4 H6.

By reducing the threshold all the way down to 6 we finally get at least one vertex in each of the 8 GN clusters, but of course we also get multiple centers in several GN clusters. If we use an agglomerative clustering method, TH=6 is probably best, but if we use a divisive method, probably 15? I haven't calculated a set of ShortestPath pTrees for G10, but if SPpTrees(G10) existed, it would be instructive to see how well clustering by shortest path to a center works in each threshold case.
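As a small illustration of the GAI+GAO=GAIO idea, the sketch below thresholds five G10 vertices. The (GAI, GAO, GAU) triples are copied from the count tables on this slide; the function name `centers` and the dictionary layout are assumptions made only for this example.

```python
# (GAI, GAO, GAU) counts for five G10 vertices, copied from the slide's tables.
counts = {
    "38": (1, 22, 23),
    "46": (12, 5, 13),
    "91": (10, 20, 26),
    "d2": (19, 20, 24),
    "h4": (9, 12, 13),
}

def centers(counts, threshold, use_gaio=True):
    """Return the vertices whose participation count reaches the threshold.

    With use_gaio=True the score is GAI + GAO (directed participation);
    otherwise the undirected GAU count is used.
    """
    picked = []
    for vertex, (gai, gao, gau) in counts.items():
        score = gai + gao if use_gaio else gau
        if score >= threshold:
            picked.append(vertex)
    return picked

gaio_centers = centers(counts, 15)                 # all five vertices qualify
gau_centers = centers(counts, 15, use_gaio=False)  # only 38, 91 and d2 qualify
print(gaio_centers, gau_centers)
```

On these five vertices the GAIO score keeps 46 and h4 as centers, while the undirected GAU count drops them, which is the distortion the slide describes.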

2 No two share an edge so there are no 4Cliques.
[Figure: G10 drawn with vertices 1 through i0.] G10: Web graph of pages of a website and hyperlinks. Communities by color (Girvan Newman Algorithm). |V|=180 (1-i0) and |E|=478. We have unPTrees (undirected graph), inPTrees (showing all incoming edges and where they come in from), and outPTrees.

UCLQ3pTrees on G10, taking vertices in decreasing order of Ct:
Max Ct=26, vertex=91: all &s with 91 have Ct=0, so 91 is part of no 3cliques.
Ct=24, vertex=D2: all &s with D2 have Ct=0, so D2 is part of no 3cliques.
Ct=23, vertex=38: all &s with 38 have Ct=0, so 38 is part of no 3cliques.
Ct=14, vertex=52: & Ct=0, so 52 is part of no 3clique.
Ct=13, vertex=174 (H4): part of the 3cliques {H0 H2 H4} and {H3 H4 I0}.
Ct=13, vertex=46: & Ct=0, so 46 is part of no 3clique.
Ct(B2)=9: part of a 3clique.
Ct(45)=9, Ct(78)=9, Ct(49)=8, Ct(81)=8, Ct(C4)=7, Ct(A7)=5, Ct(H9)=5: all & counts are 0.

There are only three 3Cliques: {H0 H2 H4}, {H3 H4 I0}, {45 76 B2} (I quickly checked the rest). No two share an edge, so there are no 4Cliques. The fact that there are so few cliques may be a characteristic of web page link graphs. Was it worthwhile doing the clique analysis? Yes! The 8 vertices involved in the three 3Cliques (and the three cliques themselves) are outliers! We can examine each to try to determine what is unique about them.

What does it mean that the three vertices {H3 H4 I0} are a 3Clique in the undirected graph of page references? In this case, after close examination, we see that they form a cycle (in the directed graph sense). Should there ever be circular references like that in web pages? The 3Clique {45 76 B2} appears to be a mistake (no edge from 45 to 76). The clique {H0 H2 H4} does not appear to be a cycle.
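The 3Clique search above ANDs neighbor bitmaps edge by edge. A minimal sketch of that idea follows, using Python integers as stand-ins for pTrees (an assumption; real pTrees are compressed bit vectors) and a small hypothetical graph rather than G10.

```python
def neighbor_masks(n, edges):
    """One integer bitmask per vertex; bit j of masks[i] is set iff {i, j} is an edge."""
    masks = [0] * n
    for u, v in edges:
        masks[u] |= 1 << v
        masks[v] |= 1 << u
    return masks

def three_cliques(n, edges):
    """Collect every 3-clique by ANDing the two endpoint masks of each edge."""
    masks = neighbor_masks(n, edges)
    found = set()
    for u, v in edges:
        common = masks[u] & masks[v]  # plays the role of CLQ3(u, v)
        w = 0
        while common:
            if common & 1:            # bit w set: w is adjacent to both u and v
                found.add(frozenset((u, v, w)))
            common >>= 1
            w += 1
    return found

def share_an_edge(c1, c2):
    """Two distinct 3-cliques share an edge iff they have exactly two common vertices."""
    return len(c1 & c2) == 2

# Hypothetical 6-vertex graph: two triangles that touch only at vertex 2.
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4), (4, 5)]
triangles = three_cliques(6, toy_edges)
print(sorted(sorted(t) for t in triangles))
```

As on the slide, when no two 3Cliques share an edge there can be no 4Clique, since any 4Clique contains pairs of 3Cliques with a common edge.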

3 Edge Count Clique Alg (EC): A graph C=(V_C,E_C) is a clique iff |E_C| = |PU_C| = COMB(|V_C|,2) = |V_C|!/((|V_C|-2)!2!)
SubGraph existence thm (SGE): (V_C,E_C) is a k-clique iff every induced (k-1)-vertex subgraph (V_D,E_D) is a (k-1)-clique.

Apriori Clique Mining Alg (AP): finds all cliques in a graph. For clique mining we can use an ARM-Apriori-like downward closure property. Let CS_k be the kCliqueSet and CCS_k+1 the Candidate (k+1)CliqueSet. By SGE, CCS_k+1 = all unions of pairs from CS_k having k-1 common vertices. Let C ∈ CCS_k+1 be a union of two k-cliques with k-1 common vertices, and let v and w be the kth (different) vertices of the two k-cliques; then C ∈ CS_k+1 iff (PE)(v,w)=1.

Breadth-1st Clique Alg: CLQK = all Kcliques. Find CLQ3 using CS0. Induction theorem: a Kclique and a 3clique that share an edge form a (K+1)clique iff all K-2 edges from the non-shared Kclique vertices to the non-shared 3clique vertex exist. Next find CLQ4, then CLQ5, …

Depth-1st Clique Alg: Find a Largest Maximal Clique containing v. Let (x,y) ∈ CLQ3pTree(v,w). If (x,y) ∈ E and Count(NewPtSet(v,w,x,y)), where NewPtSet(v,w,x,y) = CLQ3pTree(v,w) & CLQ3pTree(x,y), is:
0, the 4 vertices form a maximal 4Clique (i.e., v,w,x,y).
1, the 5 vertices form a maximal 5Clique (i.e., v,w,x,y and the NewPt).
2, the 6 vertices form a maximal 6Clique if the NewPair is an edge, else they form 2 maximal 5Cliques.
3, the 7 vertices form a maximal 7Clique if each of the 3 NewPairs is an edge; else if 1 or 2 of the NewPairs are edges, then each of the 6VertexSets (vwxy + 2 EdgeEndpts) forms a Max6Clique; else if 0 NewPairs is an edge, then each 5VertexSet (vwxy + 1 NewVertex) forms a maximal 5Clique….

Theorem: for every hClique in NewPtSet, those h vertices together with v,w,x,y form a maximal (h+4)Clique, where NPS(v,w,x,y) = CLQ3(v,w) & CLQ3(x,y). We can determine whether each maximal kClique found is a "Largest" from counts (or find them all), but determining "Largest" early can save time (we can move on to another v immediately), e.g., if there aren't enough siblings left or a large enough 1-count among CLQ3pTrees…
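The edge-count test and the Apriori-style join can be sketched in list form as below. This is a plain set-based illustration, not the pTree implementation; the function names are assumptions, and the toy graph (vertices 1, 2, 3, 4, 8, 14 with every pair joined except 8-14) is chosen to mirror the two 5cliques the later G7 slides report.

```python
from itertools import combinations

def is_clique_by_edge_count(vertex_set, edge_set):
    """Edge Count test: the induced edge count must equal COMB(|V|, 2)."""
    induced = sum(1 for e in edge_set if e <= vertex_set)
    n = len(vertex_set)
    return induced == n * (n - 1) // 2

def apriori_cliques(edges):
    """Return {k: set of k-cliques as frozensets} for k >= 2, built level by level."""
    edge_set = {frozenset(e) for e in edges}
    levels = {2: set(edge_set)}
    k = 2
    while levels[k]:
        nxt = set()
        for a, b in combinations(levels[k], 2):
            union = a | b
            if len(union) != k + 1:    # the pair must have k-1 common vertices
                continue
            v, w = tuple(a ^ b)        # the two non-shared vertices
            if frozenset((v, w)) in edge_set:
                nxt.add(union)         # candidate survives the single edge check
        k += 1
        levels[k] = nxt
    del levels[k]                      # the last level is always empty
    return levels

# Toy graph: all pairs among these six vertices except (8, 14).
toy_edges = [p for p in combinations([1, 2, 3, 4, 8, 14], 2) if p != (8, 14)]
edge_set = {frozenset(e) for e in toy_edges}
levels = apriori_cliques(toy_edges)
print({k: len(v) for k, v in levels.items()})
```

Only one edge lookup per candidate is needed because both parent k-cliques already guarantee every other pair.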

4 From Fortunato: For a multipartite graph the concept of community is still related to a large density of edges between members of the same group.
Let's define a clique in a multipartite graph to be a bipartite subset of vertices with all possible edges.

Bipartite Induction thm: In a bipartite graph, a Kclique and a 3clique that share an edge form a (K+1)clique iff all edges that can exist from the non-shared Kclique vertices to the non-shared 3clique vertex do exist.

Bipartite 3Clique thm: a pair of vertices a,b from part1 and a vertex 1 from part2 form a 3Clique iff both possible edges a1, b1 exist. CLQ3 is constructed by listing each vertex pair in each pTree along with the naming vertex of the pTree.

Examples (letters are part1, numbers are part2):
The two 3cliques ab1 and b12 sharing b1 form a 4clique iff the non-shared vertex pair a2 is an edge.
The two 3cliques ab1 and bc1 sharing b1 form a 4clique.
The 4clique ab12 and 3clique bc2 sharing b2 form a 5clique iff the non-shared vertex pair c1 is an edge.
The 4clique abc1 and 3clique cd1 sharing c1 form a 5clique.
The 5clique abc12 and 3clique c23 sharing c2 form a 6clique iff the non-shared vertex pairs a3 and b3 are edges.
The 5clique abc12 and 3clique d12 sharing vertices 1 and 2 form a 6clique.
The 5clique abcd1 and 3clique de1 sharing edge d1 form a 6clique.
The 6clique abc123 and 3clique cd3 sharing c3 form a 7clique iff the non-shared vertex pairs d1 and d2 are edges.
The 6clique abc123 and 3clique d23 sharing vertices 2 and 3 form a 7clique iff the vertex pair d1 is an edge.
The 6clique abcd12 and 3clique de2 sharing edge d2 form a 7clique iff the vertex pair e1 is an edge.
The 6clique abcde1 and 3clique ef1 sharing edge e1 form a 7clique.

Although the pattern seems complex, the Bipartite Clique Algorithm can be stated simply: a Kclique and a 3clique sharing 2 vertices form a (K+1)clique iff all edges from the non-shared 3clique vertex to each non-shared Kclique vertex (from the other part) exist. That is, check edge existence from all non-shared letters to all non-shared numbers. Thus, the theorem isn't different for bipartite graphs except that edges cannot run within a part.
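The bipartite rule can be tried on the slide's own example (4clique ab12 plus 3clique bc2). The sketch below is a plain set-based illustration; the function name `extend_bipartite` and the representation of a clique as a (letters, numbers) pair are assumptions made for the example.

```python
def extend_bipartite(letters, numbers, tri_letters, tri_numbers, edges):
    """Try to merge a K-clique with a 3-clique that shares exactly 2 of its vertices.

    A clique is a pair (letters, numbers); `edges` holds (letter, number) tuples.
    Returns the (K+1)-clique as (letters, numbers), or None if an edge is missing.
    """
    shared = len(letters & tri_letters) + len(numbers & tri_numbers)
    if shared != 2:
        return None
    new_letters = tri_letters - letters    # at most one of these two sets
    new_numbers = tri_numbers - numbers    # is non-empty
    # Check only cross-part pairs between the new vertex and non-shared clique vertices.
    letter_ok = all((l, n) in edges for l in new_letters for n in numbers - tri_numbers)
    number_ok = all((l, n) in edges for l in letters - tri_letters for n in new_numbers)
    if letter_ok and number_ok:
        return letters | new_letters, numbers | new_numbers
    return None

base_edges = {("a", "1"), ("a", "2"), ("b", "1"), ("b", "2"), ("c", "2")}

# Without edge c1 the merge fails; with it, ab12 and bc2 give the 5clique abc12.
without_c1 = extend_bipartite({"a", "b"}, {"1", "2"}, {"b", "c"}, {"2"}, base_edges)
with_c1 = extend_bipartite({"a", "b"}, {"1", "2"}, {"b", "c"}, {"2"},
                           base_edges | {("c", "1")})
print(without_c1, with_c1)
```

No letter-letter or number-number pair is ever tested, which is the only difference from the general induction theorem.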

5 G9 Bipartite Clique Thm on G9 (#pTrees)
Maximal Cliques containing 1.

[Figure: the #pTrees of G9 (one pTree per woman over the events A-N), followed by their 2-way and 3-way ANDs with woman 1.]

G9: Bipartite graph of the Southern Women Event Participation. Women are numbers, events are letters.

Each WpTree is itself a 1-many clique, maximal iff no other contains it. Only pTrees with a large enough Ct could contain the pTree of woman 1 (Ct=8), and none does ("& with 1" count < 8).

Each 2W pTree is a 2-many clique, maximal iff no 3-many contains it (check later). Next I x out those that are contained in the next level (or have Ct=0, which means there is no subgraph). There are 2-many cliques that are contained in 3-many cliques which weren't computed, such as 15⊆135, 18⊆138, 19⊆139, 1a⊆13a, 1b⊆13b, 1c⊆13c, 1e⊆13e, 1d⊆13d, 1g⊆13g, 1h⊆13h, 1i⊆13i.

Many of these 3-many cliques are contained in larger uncomputed cliques also: 13a, 13b, 13c, 13d, 13g ⊆ 13abcde and 13h, 13i ⊆ 13hi.

If numbers=investors and letters=stocks (recommends relationship), would we like a clique with many investors and many stocks? E.g., MaxClique 12346CEFH (5 investors, 4 stocks). Is E (8 investors, 1 stock) better? A K-1 clique ≡ an original pTree (e.g., E ≡ LETTERpTree(E); actually =). Thus, we can remove it if Ct = 0 or 1.

6 Bipartite Clique Thm on G9 (LETpTrees; &ing w highest Ct; elim if Ct=0|1)
[Figure: successive ANDs of the LETpTrees of G9, first ANDing with the highest-Ct pTree, then with the lowest-Ct pTree.]

LETpTree 1-counts (a=10, c=12, e=14): A 3, B 3, C 6, D 4, E 8, F 8, G a, H e, I c, J 5, K 4, L 6, M 3, N 3.

Using LETpTrees (& with lowest Ct; eliminate if Ct=0|1) gives the 6-2 CLQ: ABCEFH12. Of course we should do an exhaustive search!

G9: Bipartite graph of Southern Women Event Participation. Women=#s, events=letters. Or a recommender: Analyst=#s, stocks=letters.

7 Bipartite Clique Thm on G9 (LETpTrees; exhaustive search; elim if Ct=0|1)
[Figure: lattice of LETpTree ANDs with their 1-counts: all pairs of event letters, then triples, then larger letter sets, with Ct=0|1 entries eliminated.]

LETpTree 1-counts (a=10, c=12, e=14): A 3, B 3, C 6, D 4, E 8, F 8, G a, H e, I c, J 5, K 4, L 6, M 3, N 3.

Containments: A⊆AC; B⊆BC; C⊆CE; D⊆CD; M⊆IM; N⊆IN.

Cliques: ABCEFH12; ADEFH13; BDEFHI13; GIJLMN(13,14); HLMN(12,13); ILMN(12,13,14).

8 Depth 1st Bipartite Clique Search on G9
[Figure: G9 LETpTree pairwise AND counts and per-woman pTree counts.]

Find MaxCLQ(E). E={ } is a 1-many CLQ. Max? No! Ct(E&H)<8. Cts with E: A B C D F G H I.

& EC EF EG EH (letter set, then Ct): CEF 5, CEG 4, CEH 5, EFG 4, EFH 6, EGH 6; CEFG 3, CEFH 5, CEGH 3, EFGH 4; CEFGH 3.

9 Depth 1st Bipartite Clique Thm on G9
[Figure: G9 LETpTree pairwise AND counts and per-woman pTree counts.]

10 APPENDIX: APRIORI Clique Search Algorithm (may be faster since, e.g., a candidate 4clique which survives the "all sub3sets are 3cliques" test is automatically a 4clique).

[Figure: graph G7 with vertices 1-34; tables of G7 Cand4Cliques, Survivor 4Cliqs, Cand5Cliqs, Surviv5Cliqs and Cand6Clqs, with failed candidates marked "no".]

The Clique Search Algorithms are:
1 This list APRIORI method.
2 The pTree version of this APRIORI.
3 The Induction Clique Search Alg (list version).
4 The Induction Clique Search Alg (pTree version).
Which is fastest? Simplest? (Accuracy should be the same at 100%.) Do we need 100%, or can we get great time savings by relaxing that? How do these methods perform on a Big Graph? On Friends?

On G7 there are 2 max5cliques, 12348 and 1234e. All 4cliques are subsets of these 5cliques.

Remaining pairwise ANDs after removal of PURE0s (i.e., after CS0) are the 3cliques in pTree form. One can choose at random from v's 3Cliques (1st?):
MSMC(1,2,3,4,8)= {1,2,3,4,8}
MSMC(14)= {1,2,3,4,14}
MSMC(5,7)= {1,5,7}
MSMC(9,31,33)= {9,31,33}
MSMC(10)= {3,10}
MSMC(11)= {1,5,11}
MSMC(12)= {1,12}
MSMC(13)= {1,4,13}
MSMC(*)= {33,*} for *=15,16,19,21,23
MSMC(17)= {6,7,17}
MSMC(18)= {1,4,18}
MSMC(20)= {1,2,20}
MSMC(22)= {1,2,22}
MSMC(24,28,34)= {24,28,34}
MSMC(25,26,32)= {25,26,32}
MSMC(27,30)= {27,30,34}
MSMC(28)= {3,28}
MSMC(29)= {29,32,34}

11 The vertex-labelled, edge-labelled graph
[Figure: ANalyst-TickerSymbol relationship matrix with vertex labels (AN: SA, Ct, C, Sal; TS: Dow?, Ct, Buy-Hold-Sell, SA), shown as a graph and in full pTree form.]

We can interpret this structure in many ways:
1. as a relationship with entity tables;
2. as an AN[alyst] Table with attributes, the AN attributes (SA, Ct, C, Sal) plus each TickerSymbol pTree as an additional attribute (the TS attributes (Dow?, Ct, BHS, SA) are not captured in this interpretation);
3. as a T[icker] S[ymbol] or Stock Table with attributes, the TS attributes (Dow?, Ct, BHS, SA) plus each Analyst pTree as an additional attribute (the AN attributes (SA, Ct, F, Sal) are not captured in this interpretation).

We can include this relationship with other relationships sharing entities by using the RoloDex Model (next slide). The graph could be 3D, 4D (i.e., edges are triples, quadruples), etc. The graph could also be edge labelled. A convenient way to capture edge labels is to make the content of each matrix cell the label structure rather than just a yes/no bit. As a simple but pertinent example, suppose we have a 0-3 rating of each Analyst-Stock pair which measures how much that Analyst knows about that stock. We just change each bit to a decimal number in [0,3] (or bitslice those using two bits instead of one, so that the matrix columns are 2-bit pTreeSets rather than just one pTree).

If C measures the "Correctness Level" of the Analyst over recent days or weeks over all stocks (e.g., based on backward analysis of previous sentiment analysis and the actual performance of the stock) and the cell numbers measure the correctness of that Analyst on that Stock, then a signal might be to mask C>=2 and for those Analysts find the average Correctness for each stock, then mask for those Stocks for which the number of Analysts is between two thresholds (we want a high average, but also more than one analyst and not too many).
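One possible reading of that signal is sketched below with plain dictionaries instead of pTrees. The analyst and stock names, the data, the function name `correctness_signal`, and the choice to keep stocks whose analyst count lies in [lo, hi] are all assumptions made for illustration.

```python
def correctness_signal(c_level, ratings, min_c=2, lo=2, hi=3):
    """Average per-stock correctness over trusted analysts.

    c_level: {analyst: overall Correctness Level C}
    ratings: {(analyst, stock): 0..3 correctness of that analyst on that stock}
    Keeps a stock only if the number of trusted analysts rating it is in [lo, hi].
    """
    trusted = {a for a, c in c_level.items() if c >= min_c}   # mask C >= min_c
    per_stock = {}
    for (analyst, stock), rating in ratings.items():
        if analyst in trusted:
            per_stock.setdefault(stock, []).append(rating)
    return {stock: sum(rs) / len(rs)
            for stock, rs in per_stock.items()
            if lo <= len(rs) <= hi}

# Hypothetical data: four analysts, three stocks.
c_level = {"an1": 3, "an2": 2, "an3": 1, "an4": 2}
ratings = {
    ("an1", "X"): 3, ("an2", "X"): 2, ("an3", "X"): 0,
    ("an1", "Y"): 1,
    ("an4", "Z"): 3, ("an2", "Z"): 3, ("an1", "Z"): 2,
}
signal = correctness_signal(c_level, ratings)
print(signal)
```

Stock Y drops out because only one trusted analyst covers it, and analyst an3 is ignored because its C is below 2.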

12 The Multi-Relationship Model
Every Entity (Gene, Term, Experiment, Person, Document, Item, Stock, Course, Movie) has an EntityTable of many descriptive attributes (columns). They aren't shown here. For example, on the previous slide we show the descriptive columns of Stocks(Dow?, Count, BHS, SA) and Analysts(SA, Count, Female?, SalaryInBillions), not shown here.

[Figure: the Multi-Relationship Model, with relationship matrices sharing entity axes: Stock-Investor; Friends (People-People); BUYS (Customer-Item); customer rates movie (one matrix per rating, e.g., "customer rates movie as 5"); Course Enroll; Term-Document; Author-Doc; doc-doc; gene-gene (ppi); term-term (ShareStem, CellLabel=stem); Exp-PI; Exp-gene.]

Tweets are Documents, so the Tweet-Tweeter relationship is a Document-Author relationship (Tweetee, hashtag, etc. are Edge Labels). In looking for signals that no one else uses: What if an Investor BUYS an island in the Mediterranean? What if an Investor's best friend buys lots of stock in an Online University?

For an ItemSet A (antecedent) and an ItemSet B: Supp(A) = CusFreq(ItemSet A), Conf(A→B) = Supp(A∪B)/Supp(A).
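The support and confidence definitions can be checked on a tiny BUYS relationship. The baskets below are hypothetical, and the function names are assumptions; support is taken as a raw customer count, matching CusFreq.

```python
def support(itemset, baskets):
    """Number of customers whose basket contains every item in `itemset`."""
    return sum(1 for basket in baskets.values() if itemset <= basket)

def confidence(antecedent, consequent, baskets):
    """Conf(A -> B) = Supp(A u B) / Supp(A); returns 0.0 when Supp(A) is 0."""
    denom = support(antecedent, baskets)
    if denom == 0:
        return 0.0
    return support(antecedent | consequent, baskets) / denom

# Hypothetical Customer-Item BUYS relationship.
baskets = {
    1: {"i1", "i2"},
    2: {"i1", "i2", "i3"},
    3: {"i1"},
    4: {"i2", "i3"},
}
supp_a = support({"i1"}, baskets)
conf_ab = confidence({"i1"}, {"i2"}, baskets)
print(supp_a, conf_ab)
```

Here three customers buy i1 and two of them also buy i2, so the rule i1→i2 has confidence 2/3.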

13 G7 Breadth-First Inductive Clique Search Algorithm:
Let CLQK be the set of all Kcliques. First find CLQ3 using CS0. Induction Step: CLQK+1 is obtained by applying the ECKCT to CLQK and CLQ3.

[Figure: graph G7 with vertices 1-34.]

G7 Breadth-First Edge-Check K-Clique Thm (ECKCT): A Kclique and a 3clique that share an edge form a (K+1)clique iff all K-2 edges from the non-shared Kclique vertices to the non-shared 3clique vertex exist in the graph.

On G7 (List Version):
1 2 3 with 18, 20, 22: not in CLQ4, since 3:18, 3:20, 3:22 ∉ E.
1 2 3 4: in CLQ4, since 34 ∈ E.
CLQ4 also contains 1 2 3 8, 1 2 3 14, 1 2 4 8, 1 2 4 14, 1 3 4 8, 1 3 4 14, 2 3 4 8, 2 3 4 14. UCLQ4 done.
Note the checkback (e.g., trying 1 3 2 after 1 2 3). Is it required? No: if 132 were in a 4CLQ it would have shown up already ("Already in CLQ4").
UCLQ5 (the MUCLQs): 1 2 3 4 8 and 1 2 3 4 14.

UCLQ3 (unique 3cliques as lists), entries shown include: 1 2 18; 1 2 20; 1 2 22; 1 3 9; 1 4 13; 1 5 7; 1 5 11; 1 6 7; 1 6 11; 3 9 33; 6 7 17; 9 31 33; 9 31 34; 24 28 34; 24 30 33; 24 30 34; 25 26 32; 27 30 34; 29 32 34.

CLQ3 (as pTrees): the remaining edges after CS0 (removal of PURE0 edge endpoint pair ANDs). Is there a pTree version of this algorithm? Is it faster? Is the pTree version faster for UCLQ4 as well?
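A list-version sketch of the induction step follows. It enumerates CLQ3 by brute force (a stand-in for the CS0 pTree step, which is not reproduced here) and runs on a toy graph on vertices 1, 2, 3, 4, 8, 14 with every pair joined except 8-14, not on the full G7; all function names are assumptions.

```python
from itertools import combinations

def extend(kclique, triangle, edge_set):
    """Edge-check step: return the (K+1)-clique, or None if the check fails."""
    shared = kclique & triangle
    if len(shared) != 2:                  # the two cliques must share an edge
        return None
    (new_vertex,) = triangle - kclique    # the non-shared 3clique vertex
    # Check the K-2 edges from non-shared Kclique vertices to the new vertex.
    if all(frozenset((u, new_vertex)) in edge_set for u in kclique - shared):
        return kclique | {new_vertex}
    return None

def breadth_first(edge_set):
    """Return {K: CLQK} for K >= 3, growing each level from CLQK and CLQ3."""
    vertices = sorted(set().union(*edge_set))
    clq3 = {frozenset(t) for t in combinations(vertices, 3)
            if all(frozenset(p) in edge_set for p in combinations(t, 2))}
    levels = {3: clq3}
    k = 3
    while levels[k]:
        grown = {extend(c, t, edge_set) for c in levels[k] for t in clq3}
        grown.discard(None)
        k += 1
        levels[k] = grown
    del levels[k]                         # the last level is always empty
    return levels

toy = {frozenset(p) for p in combinations([1, 2, 3, 4, 8, 14], 2) if p != (8, 14)}
bf_levels = breadth_first(toy)
print({k: len(v) for k, v in bf_levels.items()})
```

On the toy graph the levels contain 16 3cliques, the nine 4cliques listed above, and the two 5cliques 1 2 3 4 8 and 1 2 3 4 14.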

14 G7 Depth-First kClique Thm (pTree version)
[Figure: graph G7 with vertices 1-34.]

CLQ4: 1 2 3 4; 1 2 3 8; 1 2 3 14; 1 2 4 8; 1 2 4 14; 1 3 4 8; 1 3 4 14; 2 3 4 8; 2 3 4 14. CLQ5: 1 2 3 4 8; 1 2 3 4 14.

G7 Depth-First kClique Thm (pTree version): Find a Largest Maximal Clique containing v. Let (x,y) ∈ CLQ3pTree(v,w), where w produces the largest count. If (x,y) ∈ E and CLQ3pTree(x,y) is the largest such, and Count(NewPtSet(v,w,x,y)), where NewPtSet(v,w,x,y) = CLQ3pTree(v,w) & CLQ3pTree(x,y), is:
0, the 4 vertices form a maximal 4Clique (i.e., v,w,x,y).
1, the 5 vertices form a maximal 5Clique (i.e., v,w,x,y and the NewPt).
2, the 6 vertices form a maximal 6Clique if the NewPair is an edge, else they form 2 maximal 5Cliques.
3, the 7 vertices form a maximal 7Clique if each of the 3 NewPairs is an edge; else if 1 or 2 of the NewPairs are edges, then each of the 6VertexSets (the 4 original vertices and 2 EdgeEndpoints) forms a maximal 6Clique; else if 0 of the NewPairs is an edge, then each 5VertexSet (original 4 plus 1 NewVertex) forms a maximal 5Clique….

Theorem: for every hClique in NewPointSet, those h vertices together with v,w,x,y form a maximal (h+4)Clique, where NPS(v,w,x,y) = CLQ3(v,w) & CLQ3(x,y). With each maximal kClique found, we can determine whether it is a "Largest" by examining counts (or we can find them all and then pick out a "Largest"), but determining "Largest" early can result in significant time savings (we can move on to another v immediately), e.g., if there aren't enough siblings left or a large enough 1-count among CLQ3pTrees…

CLQ3pTrees: for every edge (u,v), CLQ3(u,v) = pTree(u) & pTree(v). Removing those with Ct=0 gives all 3Cliques, each listed thrice. (Each CLQ3(u,v) 1-bit is the 3rd vertex of the 3Clique formed with u and v. Every edge is uniquely listed as the header of a pTree.)

CLQ4 pTrees: 1 2 & 3 4 gives {8, 14}, and 8 14 is not an edge in CLQ. For each of the following, the resulting pair is not an edge in CLQ3: 1 5 & 7 11; 1 6 & 7 11; 1 7 & 5 6; 1 11 & 5 6; 3 9 & 1 33; 6 7 & 1 17; 9 31 & 33 34; 9 33 & 3 31; 24 30 & 33 34; 24 34 & 28 30; 30 34 & 24 27.

15 G7 Depth-First kClique Thm (pTree): worked example [vw=12, xy=34]
[Figure: graph G7 with vertices 1-34.]

Depth-First kClique Thm (pTree): for every edge (v,w), find a largest max clique. Let (x,y) ∈ CLQ3pTree(v,w) with the largest count. If CLQ3xy=z, xyz is an LMC. If (x,y) ∉ E, then vwx and vwy are maximal; else let NewPtSet(v,w,x,y) = CLQ3pTree(v,w) & CLQ3pTree(x,y). If Ct(NPS(vwxy)) =
0, the 4 vertices form a maximal 4Clique (i.e., v,w,x,y).
1, the 5 vertices form a maximal 5Clique (i.e., v,w,x,y and the NewPt).
2, the 6 vertices form a maximal 6Clique if the NewPair is an edge, else they form 2 maximal 5Cliques.
3, the 7 vertices form a maximal 7Clique if each of the 3 NewPairs is an edge; else if 1 or 2 of the NewPairs are edges, then each of the 6VertexSets (the 4 original vertices and 2 EdgeEndpoints) forms a maximal 6Clique; else if 0 of the NewPairs is an edge, then each 5VertexSet (original 4 plus 1 NewVertex) forms a maximal 5Clique….

Example: 1 2 & 3 4. [NPS(1234) = CLQ3p(12) & CLQ3p(34) = {8,14}] [(8,14) ∉ E, so 12348 and 1234e are max5cliques]

Theorem: for every hClique in NewPointSet, those h vertices together with v,w,x,y form a maximal (h+4)Clique, where NPS(v,w,x,y) = CLQ3(v,w) & CLQ3(x,y).

Determine whether a maximal kClique is Largest from counts. [Both 12348 and 1234e are largest. They use 3, 4, 8, e. Any other must use the 7-4=3 remaining points, so it is at most a 5Clique.] [So 12348 and 1234e are LMCs for the edges they contain (e.g., 48, 4e).] Start over with: 15? 7b ∉ E, so 157 and 15b are LMCs; 16? 7b ∉ E, so 167 and 16b are LMCs.

UCLQ3pTrees: for every edge (u,v), CLQ3(u,v) = pTree(u) & pTree(v), diagonalized. Removing those with Ct=0 gives all 3Cliques uniquely.
(3,9,33): LMC(3,9,33)
(24,28,34): LMC(24,28,34)
(24,30)? (33,34) ∉ E, so (24,30,33) and (24,30,34) are LMCs
(25,26,32): LMC(25,26,32)
(27,30,34): LMC(27,30,34)
(29,32,34): LMC(29,32,34)

16 G7 Maximal Clique Theorem
[Figure: graph G7 with vertices 1-34, and the UCLQ3 pTrees with their counts.]

G7 Maximal Clique Theorem: for every edge vw, every xy ∈ Clique3vw, and every hClique in NPS(v,w,x,y) = UCLQ3{v,w} & UCLQ3{x,y}, those h vertices together with v,w,x,y form a maximal (h+4)Clique.

Recursive Depth-First Theorem (pTree) to find the clique structure around an edge (v,w): If UCLQ3vw has no edges, then for every x ∈ UCLQ3vw, vwx is a MCLQ; else pick a UCLQ3vw edge xy [with maximal UCLQ3-count?]. Find the clique structure of NPS(vwxy) (recursively). Then apply the Theorem to get the MaxCliques of {v,w,x,y} ∪ UCLQ3vw.

Example: Pick vw=12 from UCLQ3. Pick edge xy=34 from UCLQ3vw. The clique structure of NPS(1234)={8,14} has two 1Cliques, {8} and {14}. Thus, {1,2,3,4} ∪ UCLQ3_12 has 2 MaxCliques, {1,2,3,4,8} and {1,2,3,4,14}.
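The theorem can be exercised on the 12/34 example with ordinary Python sets. In this sketch the clique structure of NPS is found with a basic Bron-Kerbosch recursion (no pivoting), which is a swapped-in stand-in for the recursive pTree procedure; the toy graph contains only vertices 1, 2, 3, 4, 8, 14 (all pairs joined except 8-14), and the function names are assumptions.

```python
from itertools import combinations

def maximal_cliques(candidates, adj, chosen=frozenset(), excluded=frozenset()):
    """Basic Bron-Kerbosch recursion restricted to the vertex set `candidates`."""
    if not candidates and not excluded:
        yield chosen                      # nothing can extend `chosen`
        return
    for v in list(candidates):
        yield from maximal_cliques(candidates & adj[v], adj,
                                   chosen | {v}, excluded & adj[v])
        candidates = candidates - {v}     # v is done; never revisit it
        excluded = excluded | {v}

def max_cliques_through(v, w, x, y, adj):
    """Apply the theorem: each maximal clique H of NPS gives the maximal clique vwxy + H.

    Assumes vw and xy are edges and x, y are both adjacent to v and w.
    """
    nps = frozenset(adj[v] & adj[w] & adj[x] & adj[y])   # NPS(v, w, x, y)
    base = frozenset((v, w, x, y))
    return [base | h for h in maximal_cliques(nps, adj)]

# Toy adjacency sets: all pairs among six vertices except (8, 14).
adj = {u: set() for u in [1, 2, 3, 4, 8, 14]}
for a, b in combinations(adj, 2):
    if (a, b) != (8, 14):
        adj[a].add(b)
        adj[b].add(a)

found = max_cliques_through(1, 2, 3, 4, adj)
print(sorted(sorted(c) for c in found))
```

Because NPS(1234)={8,14} has no internal edge, its maximal cliques are the two single vertices, giving {1,2,3,4,8} and {1,2,3,4,14} as in the slide's example. An empty NPS yields just the 4Clique vwxy.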

