Published by Edith Blake. Modified over 6 years ago.
1
All Shortest Path pTrees for a unipartite undirected graph, G7 (SP1, SP2, SP3, SP4, SP5)
[Figure: the shortest-path pTrees of G7, panels SP1 (1-degree), SP2 (2-degree), SP3 (3-degree), SP4 (4-degree) and SP5 (5-degree). Annotations: rows 10, 25, 26, 28, 29, 33, 34 not shown (only 17 on); rows 15, 16, 19, 21, 23, 24, 27, 30: only 17 on.]
2
Trying Hamming Similarity to detect communities on G7 and G8
G7 is Zachary's karate club, a standard benchmark in community detection (the figure shows the best partition found by optimizing the modularity of Newman and Girvan). [Legend: colors mark 1-degree, 2-degree, 3-degree, 4-degree and 5-degree.]

Hamming similarity: S(S1,S2) = DegkDif(S1,S2).

To produce an actual shortest path (or all of them) between x and y:
Thm: To produce a shortest path of length 2, S2P [or all of them], take a middle vertex [or all middle vertices] x1 from SP1x & SP1y, and produce x x1 y. To produce an S3P [or all of them], take a vertex [or all vertices] x1 from SP1x that lies in SP2y, and a vertex [or all vertices] x2 from S2P(x1,y), and produce x x1 x2 y; and so on.
Is it productive to actually produce (one time) a tree of [all?] shortest paths? I think it is not!

[Figure: G7 with vertices 1 to 34, colored by degree from a chosen vertex.]
It can be seen that this works poorly at 1 (17 25 2 24 18 1 14 3 7). Not working! On the other hand, our standard community mining techniques (for k-plexes) worked well on G7. On the next slide, let's try Hamming on G8.

[Figure: vertex degrees of G7; graph G8 with vertices 1 to 54.]
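The shortest-path theorem above can be sketched with pTrees held as Python integers used as bit vectors. This is a minimal illustration, assuming the graph is given as an adjacency dict over non-negative integer vertex ids; the names `sp_ptrees`, `lowest_bit` and `one_shortest_path` are hypothetical. `sp[x][k]` plays the role of SPk for vertex x, and each step picks the next vertex from SP1(current) & SP(d-1)(y), which for d = 2 is exactly the SP1x & SP1y middle-vertex rule.

```python
from collections import deque


def sp_ptrees(adj):
    """sp[x][k] = bitmask of the vertices at shortest-path distance exactly k from x.

    adj maps each vertex (a non-negative int) to an iterable of its neighbours;
    bit v of sp[x][k] is 1 iff vertex v is at distance k from x (the SPk pTree of x).
    """
    sp = {}
    for x in adj:
        dist = {x: 0}
        queue = deque([x])
        while queue:  # breadth-first search from x
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        layers = {}
        for v, k in dist.items():
            if k > 0:
                layers[k] = layers.get(k, 0) | (1 << v)
        sp[x] = layers
    return sp


def lowest_bit(mask):
    """Index of the lowest set bit of a non-zero mask."""
    return (mask & -mask).bit_length() - 1


def one_shortest_path(sp, x, y):
    """One shortest x-y path. While the current vertex is at distance d from y,
    the next vertex is taken from SP1(current) & SP(d-1)(y)."""
    d = next((k for k, mask in sp[x].items() if (mask >> y) & 1), None)
    if d is None:
        return None  # y is x itself, or y is unreachable from x
    path, cur = [x], x
    while d > 1:
        cur = lowest_bit(sp[cur][1] & sp[y][d - 1])
        path.append(cur)
        d -= 1
    path.append(y)
    return path


adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
sp = sp_ptrees(adj)
print(one_shortest_path(sp, 0, 4))  # [0, 1, 3, 4]
```

Taking every set bit of the AND, instead of only the lowest one, would enumerate all shortest paths, which is the "[all?]" variant the theorem mentions.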
3
G9, Agglomerative clustering of ESP2 using Hamming Similarity
In ESP2, using Hamming similarity, we get three event clusters, clustering events iff their pTrees are identical (Hamming distance 0): EventCluster1 = {1,2,3,4,5}, EventCluster2 = {6,7,8,9}, EventCluster3 = {10,11,12,13,14}.

[Figure: G9 with women W 1 to 18 and events E 1 to 14, and its pTrees WSP, ESP, WSP2, ESP2.]

The degree % of affiliation of the women with the R, G, B events (R = EventCluster1, G = EventCluster2, B = EventCluster3) is:
Woman  R     G     B
1      100%  75%   0%
2      ?     75%   0%
3      ?     100%  0%
4      ?     75%   0%
5      60%   25%   0%
6      ?     50%   0%
7      ?     75%   0%
8      ?     75%   0%
9      ?     75%   0%
10     ?     75%   20%
11     0%    50%   40%
12     0%    50%   80%
13     0%    75%   80%
14     0%    75%   100%
15     0%    50%   60%
16     0%    50%   0%
17     0%    25%   20%
18     0%    25%   20%

ESP3 = ESP1' and ESP4 = ESP2', so again, in this case, all information is already available in ESP1 and ESP2 (all shortest paths are of length 1 or 2); we don't need ESPk for k > 2. Likewise, WSP3 = WSP1' and WSP4 = WSP2', so all information is already available in WSP1 and WSP2 (all shortest paths are of length 1 or 2); we don't need WSPk for k > 2.

Clustering women using degree% RGB affiliation: WomenClusterR = {1,2,4,5}, WomenClusterG = {3,6,7,8,9,10,11,16,17,18}, WomenClusterB = {12,13,14,15}.
This clustering seems fairly close to the authors'. Other methods are possible, and if another method puts event 6 with events 1 to 5, then everything changes and the result seems even closer to the authors' intent.
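Both steps on this slide reduce to bit operations on pTrees. The sketch below assumes each event's ESP2 pTree and each woman's row are stored as int bitmasks (event e at bit e); all function names are hypothetical. Grouping by identical pTrees is the Hamming-distance-0 clustering; assigning each woman to the cluster with her largest degree % is an assumed rule that happens to agree with the table values shown, not necessarily the author's exact procedure. The usage data are toy values, not G9.

```python
def hamming(p, q):
    """Hamming distance between two pTrees (number of differing bits)."""
    return bin(p ^ q).count("1")


def identical_ptree_clusters(ptrees):
    """Cluster ids whose pTrees are identical (Hamming distance 0).

    ptrees maps an id (e.g. an event) to its pTree stored as an int bitmask.
    """
    groups = {}
    for i, p in ptrees.items():
        groups.setdefault(p, []).append(i)
    return sorted(sorted(g) for g in groups.values())


def degree_percent(row, cluster):
    """Percent of the events in `cluster` whose bit is set in the woman's pTree `row`."""
    return 100.0 * sum((row >> e) & 1 for e in cluster) / len(cluster)


def assign_by_max_affiliation(rows, clusters):
    """Assumed rule: each woman goes to the cluster with her largest degree %
    (ties go to the earlier cluster). Returns woman -> cluster index."""
    assignment = {}
    for w, row in rows.items():
        pct = [degree_percent(row, c) for c in clusters]
        assignment[w] = pct.index(max(pct))
    return assignment


events = {1: 0b0011, 2: 0b0011, 3: 0b1100, 4: 0b0011}  # toy pTrees
print(identical_ptree_clusters(events))  # [[1, 2, 4], [3]]
print(assign_by_max_affiliation({"w1": 0b10110}, [[1, 2], [3, 4]]))  # {'w1': 0}
```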
4
Association Rule Mining (ARM)
Horizontal Transaction Table (HTT):
T    T(I)
t1   i1
t2   i1, i2, i4
t3   i1, i3
t4   i1, i2, i4
t5   i3, i4

Given any relationship between entities T (e.g., a set of customer transactions involved in those relationship instances) and I (e.g., a set of items involved in those relationship instances), the itemset T(I) associated with (related to) a particular transaction T is the subset of the items found in the shopping cart or market basket that the customer is bringing through checkout at that time.
An association rule, A→C, associates two disjoint itemsets (A = antecedent, C = consequent).
[Figure: the T-I relationship drawn as a bipartite graph on t1 to t5 and i1 to i4, with A and C marked.]
The support [ratio] of itemset A, supp(A), is the fraction of Ts such that A ⊆ T(I). E.g., if A = {i1,i2} and C = {i4}, then supp(A) = |{t2,t4}| / |{t1,t2,t3,t4,t5}| = 2/5. Note: | | means set size.
The support [ratio] of rule A→C, supp(A→C), is the support of A ∪ C: |{t2,t4}| / |{t1,t2,t3,t4,t5}| = 2/5.
The confidence of rule A→C, conf(A→C), is supp(A→C) / supp(A) = (2/5) / (2/5) = 1.
Data miners typically want to find all STRONG RULES, A→C, with supp(A→C) ≥ minsupp and conf(A→C) ≥ minconf (minsupp and minconf are threshold levels). Note that conf(A→C) is also just the conditional probability of t being related to C, given that t is related to A. Given a two-entity relationship, we can do ARM with either entity taking the role of the transaction set.
APRIORI Association Rule Mining: Given a transaction-item relationship, the APRIORI algorithm finds all strong I-rules either by processing a Horizontal Transaction Table (HTT) through vertical scans to find all frequent I-sets (e.g., I-sets "frequently" found in baskets), or by processing a Vertical Transaction Table (VTT) through horizontal operations to find all frequent I-sets. Then each frequent I-set found is analyzed to determine whether it is the support set of a strong rule. Finding all frequent I-sets is the hard part.
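The support and confidence arithmetic above can be checked directly on the HTT. A minimal sketch, assuming the table is a dict from transaction id to item set; `HTT`, `supp` and `conf` are illustrative names, not from the slides.

```python
# Horizontal transaction table from the slide: transaction -> set of items T(I).
HTT = {
    "t1": {"i1"},
    "t2": {"i1", "i2", "i4"},
    "t3": {"i1", "i3"},
    "t4": {"i1", "i2", "i4"},
    "t5": {"i3", "i4"},
}


def supp(itemset, table=HTT):
    """Fraction of transactions t whose T(I) contains every item of `itemset`."""
    return sum(1 for items in table.values() if itemset <= items) / len(table)


def conf(a, c, table=HTT):
    """conf(A -> C) = supp(A ∪ C) / supp(A), i.e. P(t related to C | t related to A)."""
    return supp(a | c, table) / supp(a, table)


A, C = {"i1", "i2"}, {"i4"}
print(supp(A), supp(A | C), conf(A, C))  # 0.4 0.4 1.0
```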
To do this efficiently, the APRIORI algorithm takes advantage of the "downward closure" property of frequent I-sets: if an I-set is frequent, then all its subsets are also frequent. E.g., in the market basket example, if A is an I-subset of B and all of B is in a given transaction's basket, then certainly all of A is in that basket too. Therefore supp(A) ≥ supp(B) whenever A ⊆ B.
First, APRIORI scans to determine all frequent 1-item I-sets (they contain one item and are therefore called 1-itemsets); next, APRIORI uses downward closure to efficiently find candidates for frequent 2-itemsets; next, APRIORI scans to determine which of those candidate 2-itemsets are actually frequent; next, it uses downward closure to efficiently find candidates for frequent 3-itemsets; next, it scans to determine which of those candidate 3-itemsets are actually frequent; and so on, until no candidates remain. (On the next slide we walk through an example using both an HTT and a VTT.)
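The level-wise loop just described (scan, join, prune by downward closure, scan again) can be sketched over a horizontal table as follows. This is a minimal textbook-style version, assuming `minsupp` is an absolute count and transactions are Python sets; the function name `apriori` is illustrative.

```python
from itertools import combinations


def apriori(transactions, minsupp):
    """Return {frozenset: count} for every itemset with count >= minsupp.

    transactions: list of sets of items (a horizontal transaction table).
    """
    # Scan 1: count the 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= minsupp}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (downward closure): every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # Scan step: count the surviving candidates in one pass over the table.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: n for s, n in counts.items() if n >= minsupp}
        result.update(frequent)
        k += 1
    return result


T = [{"i1"}, {"i1", "i2", "i4"}, {"i1", "i3"}, {"i1", "i2", "i4"}, {"i3", "i4"}]
freq = apriori(T, 2)
print(freq[frozenset({"i1", "i2", "i4"})])  # 2
```

On the HTT from the previous slide with a support count of 2, this yields four frequent 1-itemsets, three frequent 2-itemsets and the single 3-itemset {i1, i2, i4}.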
5
Example ARM using uncompressed ItemPtrees
Horizontal (HTT) approach: scan D → C1 → F1 = L1 → C2 → scan D → F2 = L2 → C3 → scan D → F3 = L3.
C3 itemsets: {2,3,5}, {1,2,3}, {1,3,5}. {1,2,3} is pruned since {1,2} is not frequent; {1,3,5} is pruned since {1,5} is not frequent.
It seems the pruning step above is unnecessary here, since the root count will show up below the threshold anyway, and that root count (using PopCount) is almost free???

Vertical approach with uncompressed Item pTrees (the 1-count at the root of each pTree):
Database D (1 = the transaction contains the item):
TID  1 2 3 4 5
100  1 0 1 1 0
200  0 1 1 0 1
300  1 1 1 0 1
400  0 1 0 0 1

Build the Item pTrees (one scan of D), then AND them; the root count is the support count:
pTree         bits  root count
P1            1010  2
P2            0111  3
P3            1110  3
P4            1000  1
P5            0111  3
P1^P2         0010  1
P1^P3         1010  2
P1^P5         0010  1
P2^P3         0110  2
P2^P5         0111  3
P3^P5         0110  2
P1^P2^P3      0010  1
P1^P3^P5      0010  1
P2^P3^P5      0110  2

F1 = {1} {2} {3} {5}, counts 2, 3, 3, 3
F2 = {1,3} {2,3} {2,5} {3,5}, counts 2, 2, 3, 2
F3 = {2,3,5}, count 2
All we need to do ARM are these frequent itemset tables with counts.
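The vertical computation can be sketched with each Item pTree as an int whose bits are written left to right in TID order 100, 200, 300, 400, as on the slide. Support counts come from ANDing pTrees and taking the 1-count (PopCount). As the slide's question suggests, this sketch skips the downward-closure prune and simply lets the root count reject {1,2,3} and {1,3,5}; that choice, the threshold of 2 (inferred from which sets the slide calls frequent), and the names `P`, `root_count` and `frequent_itemsets` are assumptions.

```python
from functools import reduce

# Item pTrees from the slide (uncompressed): bit order is TID 100, 200, 300, 400.
P = {1: 0b1010, 2: 0b0111, 3: 0b1110, 4: 0b1000, 5: 0b0111}


def root_count(items):
    """PopCount of the AND of the items' pTrees = support count of the itemset."""
    return bin(reduce(lambda a, b: a & b, (P[i] for i in items))).count("1")


def frequent_itemsets(minsupp):
    """Level-wise search; each candidate is checked by AND + PopCount, not by rescanning D."""
    level = [frozenset([i]) for i in P if root_count([i]) >= minsupp]
    found = {s: root_count(s) for s in level}
    k = 2
    while level:
        # Candidates: unions of two frequent (k-1)-itemsets with exactly k items
        # (for k = 3 these are {1,2,3}, {1,3,5}, {2,3,5}, the slide's C3).
        cands = {a | b for a in level for b in level if len(a | b) == k}
        level = [c for c in cands if root_count(c) >= minsupp]
        found.update({c: root_count(c) for c in level})
        k += 1
    return found


F = frequent_itemsets(2)
print(F[frozenset({2, 3, 5})])  # 2
```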
6
Data_Lecture_4.1_ARM (frequent itemset tables L1, L2, L3)
1-itemsets don't support association rules (they would have no antecedent or no consequent). 2-itemsets do support ARs.
Are there any strong rules supported by frequent (= large) 2-itemsets (at minconf = .75)?
{1,3}: conf({1}→{3}) = supp{1,3}/supp{1} = 2/2 = 1 ≥ .75 STRONG!
       conf({3}→{1}) = supp{1,3}/supp{3} = 2/3 = .67 < .75
{2,3}: conf({2}→{3}) = supp{2,3}/supp{2} = 2/3 = .67 < .75
       conf({3}→{2}) = supp{2,3}/supp{3} = 2/3 = .67 < .75
{2,5}: conf({2}→{5}) = supp{2,5}/supp{2} = 3/3 = 1 ≥ .75 STRONG!
       conf({5}→{2}) = supp{2,5}/supp{5} = 3/3 = 1 ≥ .75 STRONG!
{3,5}: conf({3}→{5}) = supp{3,5}/supp{3} = 2/3 = .67 < .75
       conf({5}→{3}) = supp{3,5}/supp{5} = 2/3 = .67 < .75
Are there any strong rules supported by frequent (= large) 3-itemsets?
{2,3,5}: conf({2,3}→{5}) = supp{2,3,5}/supp{2,3} = 2/2 = 1 ≥ .75 STRONG!
         conf({2,5}→{3}) = supp{2,3,5}/supp{2,5} = 2/3 = .67 < .75
         No subset antecedent can yield a strong rule either (i.e., there is no need to check conf({2}→{3,5}) or conf({5}→{2,3}), since both denominators will be at least as large and therefore both confidences will be at least as low).
         conf({3,5}→{2}) = supp{2,3,5}/supp{3,5} = 2/3 = .67 < .75
         No need to check conf({3}→{2,5}) or conf({5}→{2,3}). DONE!
7
Using ARM to find k-plexes on the bipartite graph G9. Does it work?
[Figure: G9 with women W 1 to 18 and events E 1 to 14, and its pTrees WSP, ESP, WSP2, ESP2.]
(Labels a, b, c, d, e, ... stand for 10, 11, 12, 13, 14, ....)
WomenSet ARM: MinSup = 6, MinConf = .75. EventSet ARM: MinSup = 9, MinConf = .75.
Frequent 1-WomenSets: frequency = number of events attended. Frequent 1-EventSets: frequency = number of women attending.
Candidate 2-WomenSets (pairs such as 34, 3c, 3d, 3e, 4c, 4d, 4e, cd, ce, de): frequency = number of events co-attended. Candidate 2-EventSets (including 89, 8c, 9c): frequency = number of women attending both.
Frequent 2-WomenSets: frequency = number of events co-attended. Frequent 2-EventSets (including 89): frequency = number attending.
Candidate 3-EventSets: all others are excluded because a 2-subset is not frequent. Candidate 3-WomenSets: cde is excluded since ce is infrequent.
Frequent 3-WomenSets: frequency = number of events co-attended. Frequent 3-EventSets: 567.
Strong E-rules: 35 53 56 65 57 58 68 78 98 567 657 567 576 675 (says 567 is a strong Event community?)
Strong W-rules: 21 12 13 31 14 41 23 32 24 42 34 43 134 314 413 134 143 341 (says 1234 is a strong Women community? But 134 is a very strong Women community?)
Note: When I did this ARM analysis, I had several degrees miscounted. Nonetheless, I think the same general negative result is expected. Next we try using the WSP2 and ESP2 relationships for ARM.
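On a bipartite graph, "frequency = number of events co-attended" is the root count of the AND of the women's pTrees, and swapping the roles of the two entities only requires transposing the relationship. A minimal sketch for 2-sets, assuming each woman's pTree is a bitmask over events (event j at bit j); `transpose`, `frequent_pairs` and the toy data are hypothetical, not G9.

```python
from itertools import combinations


def transpose(ptrees, n):
    """Swap entity roles: id -> bitmask over n columns becomes column -> bitmask over ids."""
    out = {j: 0 for j in range(n)}
    for i, mask in ptrees.items():
        for j in range(n):
            if (mask >> j) & 1:
                out[j] |= 1 << i
    return out


def frequent_pairs(ptrees, minsupp):
    """2-sets whose ANDed pTrees have root count >= minsupp, mapped to that count
    (for women pTrees over events: the number of events co-attended)."""
    result = {}
    for a, b in combinations(sorted(ptrees), 2):
        count = bin(ptrees[a] & ptrees[b]).count("1")
        if count >= minsupp:
            result[(a, b)] = count
    return result


women = {0: 0b0111, 1: 0b0110, 2: 0b1000}  # toy: 3 women over 4 events
print(frequent_pairs(women, 2))  # {(0, 1): 2}
events = transpose(women, 4)     # event -> bitmask over women
print(frequent_pairs(events, 2))  # {(1, 2): 2}
```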
8
G9, ESP2 EventSet ARM
[Figure: WSP2 (women W 1 to 18) and ESP2 (events E 1 to 14) pTrees of G9.]
WSP2 WomenSet ARM (MinSup = 18, MinConf = .75): the frequent 1-WomenSets are 1, 3, 4, 9, a, d, e, f, all with frequency 18, and the candidate 2-WomenSets are all pairs of these women, again all with frequency 18. This is not interesting! Go to ESP2 EventSet ARM.
ESP2 EventSet ARM (MinSup = 9, MinConf = .75):
Frequent 1-EventSets:
Event  1 2 3 4 5 6 7 8 9 a b c d e
Freq   9 9 9 9 9 e e e e 9 9 9 9 9
[Tables: frequent 2-, 3-, 4-, 5-, 6-, 7-, 8- and 9-EventSets (columns E1 ... Ek, Freq), with every frequency equal to 9 or e.]
All rule confidences are either 100% (9/9 or e/e) or 9/e = 64%.
ARM on either SP1 or SP2 (W or E) does not seem to help much in identifying communities.
9
K-plex search on G9 (a k-plex is a subgraph missing at most k edges)
If H is a k-plex and F is an induced subgraph (ISG) of H, then F is a k-plex. A graph (V,E) is a k-plex iff |V|(|V|-1)/2 − |E| ≤ k.
[Figure: ESP2 (events E 1 to 14) and WSP2 (women W 1 to 18) of G9; WSP2 degrees h f h f b f f g h h g g h h h g c c.]
Events 123456789abcde: 14*13/2 = 91, degs = 88888dddd88888, |Edges| = 66, k-plex with k = 25.
Events 23456789abcde: degs = 7777cccc... (not calculating k until it gets lower).
Events 3456789abcde: degs = 666bbbb88888.
Events 456789abcde: degs = 55aaaa88888.
Events 56789abcde.
Events 6789abcde: 9*8/2 = 36, |Edges| = 36, k-plex with k = 0. A 9-clique!
So take out {6789abcde} and start over.
Events 12345: 5*4/2 = 10, |Edges| = 10, k-plex with k = 0, degs = 44444. A 5-clique!
Women 123456789abcdefghi: 18*17/2 = 153, degs = hfhfbffghhgghhhgcc, |Edges| = 139, k-plex with k = 14.
Women 123456789abcdefgh: degs = gfgfbfffggffgggfc.
Women 123456789abcdefg: degs = ffffbffeffeefffe.
Women 12346789abcdefg: 15*14/2 = 105, degs = eeeeeeeeeeeeeee, k-plex with k = 0. A 15-clique!
So take out {12346789abcdefg} and start over.
Women 5hi: 3*2/2 = 3, degs = 011, |Edges| = 1, k-plex with k = 2.
Women hi: 2*1/2 = 1, degs = 11, |Edges| = 1, k-plex with k = 0. A clique.
If we had used the full algorithm, which pursues each minimum-degree tie path, one of the paths would start by eliminating 14 instead of 1. That would result in the 9-clique and the 5-clique abcde. All the other 8 ties would result in one of these two situations. How can we know that ahead of time and avoid all those unproductive minimum-degree tie paths?
We get no information from applying our k-plex search algorithm to WSP2. Again, how could we know this ahead of time to avoid all the work? Possibly by noticing the very high 1-density of the pTrees (only 28 zeros)?
Every ISG of a clique is a clique, so 6789 and 789 are cliques (which seems to be the authors' intent?).
If the goal is to find all maximal cliques, how do we know that the clique CA is maximal? If it weren't, then there would be at least one of a, b, c, d, e which, when added to CA, would result in a 10-clique.
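The elimination trace above can be sketched as a loop that recomputes k = |V|(|V|-1)/2 − |E| for the induced subgraph and drops a minimum-degree vertex until k is small enough. This is a minimal sketch, assuming adjacency is a dict of neighbour sets; `missing_edges` and `peel_to_kplex` are hypothetical names. It follows a single tie path (ties broken by smallest id), so, like the slide's partial run, it does not explore the other minimum-degree tie paths.

```python
def missing_edges(adj, vs):
    """k of the subgraph induced on the vertex set vs: |V|(|V|-1)/2 - |E|."""
    n = len(vs)
    edges = sum(len(adj[v] & vs) for v in vs) // 2  # each edge is counted from both ends
    return n * (n - 1) // 2 - edges


def peel_to_kplex(adj, vs, k=0):
    """Drop a minimum-degree vertex (ties: smallest id) until the induced subgraph
    misses at most k edges; k = 0 stops at a clique. Follows one tie path only."""
    vs = set(vs)
    while len(vs) > 1 and missing_edges(adj, vs) > k:
        v = min(vs, key=lambda u: (len(adj[u] & vs), u))
        vs.remove(v)
    return vs


# Toy graph: a 4-clique {0,1,2,3} plus vertex 4 attached only to 0.
adj = {0: {1, 2, 3, 4}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}, 4: {0}}
print(missing_edges(adj, set(adj)))  # 3
print(peel_to_kplex(adj, adj))       # {0, 1, 2, 3}
```

Removing the returned clique from the vertex set and calling `peel_to_kplex` again mirrors the slide's "take out ... and start over" step.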
Checking a: PCA & Pa would have to have count = 9 (it doesn't! It has count = 5), and PCA(a) would have to be 1 (it isn't; it's 0). The same is true for b, c, d, e. The same type of analysis shows that 6789abcde is maximal. I think one can prove that any clique obtained by our algorithm would be maximal (without the above expensive check), since we start with the whole vertex set and throw out one vertex at a time until we get a clique, so it has to be maximal?
The women associated strongly with the blue EventClique abcde are { }, and those associated, but loosely, are { }. The women associated strongly with the green EventClique are { }, and those associated, but loosely, are {6 7 9}.
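The maximality check can be phrased with adjacency pTrees. The sketch below reads PCA as the AND of the clique members' adjacency pTrees, so bit a of that AND is 1 iff a is adjacent to every member, and reads the count test as PopCount(Pa & clique mask) = |CA|. Both readings, and the names `extenders`, `is_maximal` and `can_extend`, are assumptions for illustration; the toy graph is not G9.

```python
def extenders(ptrees, clique):
    """Vertices outside `clique` adjacent to every member.

    ptrees maps a vertex to its adjacency bitmask (bit u set iff u is a neighbour).
    The AND of the members' pTrees has bit a set iff a is adjacent to all members.
    """
    common = -1  # all ones
    for u in clique:
        common &= ptrees[u]
    return [a for a in ptrees if a not in clique and (common >> a) & 1]


def is_maximal(ptrees, clique):
    """A clique is maximal iff no outside vertex extends it."""
    return not extenders(ptrees, clique)


def can_extend(ptrees, clique, a):
    """Count form of the test: a extends the clique iff PopCount(Pa & mask) == |clique|."""
    mask = sum(1 << u for u in clique)
    return bin(ptrees[a] & mask).count("1") == len(clique)


# Toy graph: 4-clique {0,1,2,3} plus vertex 4 attached only to 0.
ptrees = {0: 0b11110, 1: 0b01101, 2: 0b01011, 3: 0b00111, 4: 0b00001}
print(is_maximal(ptrees, {0, 1, 2, 3}))  # True
print(extenders(ptrees, {0, 1, 2}))      # [3]
```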