All Shortest Path pTrees for a unipartite undirected graph, G7 (SP1, SP2, SP3, SP4, SP5)


[Slide: the SP1 through SP5 pTrees for the 34 vertices of G7, one bit column per vertex, each annotated with the vertex's k-hop degree (the column's 1-count). For example, vertex 1 has 1-hop degree 16. In SP4, vertices 10, 25, 26, 28, 29, 33, 34 are not shown; in SP5 only vertex 17 still has a bit on (5-hop degree 1).]

Trying Hamming similarity to detect communities on G7 and G8.

[Slide: the 1deg through 5deg degree sequences for the 34 vertices of G7 (Zachary's karate club, a standard benchmark in community detection; the best partition is found by optimizing the modularity of Newman and Girvan), and the Deg1/Deg2 tables for the 54-vertex graph G8.]

Hamming similarity: S(S1,S2) = Σk DegkDif(S1,S2).

To produce an [all?] actual shortest path[s] between x and y:
Thm: To produce a [all?] shortest 2-path[s], take a [all?] middle vertex[es] x1 from SP1x & SP1y and produce x-x1-y. To produce a [all?] shortest 3-path[s], take a [all?] vertex[es] x1 from SP1x and a [all?] vertex[es] x2 from S2P(x1,y): x-x1-x2-y, etc.
Is it productive to actually produce (one time) a tree of [all?] shortest paths? I think it is not!

One can see that this works poorly at vertex 1. Not working! On the other hand, our standard community mining techniques (for k-plexes) worked well on G7. On the next slide, let's try Hamming on G8.

G9: Agglomerative clustering of ESP2 using Hamming similarity.

In ESP2, using Hamming similarity and clustering events iff their pTrees are identical, we get three event clusters:
EventCluster1 = {1,2,3,4,5}
EventCluster2 = {6,7,8,9}
EventCluster3 = {10,11,12,13,14}

[Slide: the ESP1, WSP1, ESP2, WSP2, and WSP3 tables for the bipartite graph G9 (18 Women x 14 Events), each annotated with its row degree counts.]

The degree % of affiliation of Women with the R, G, B event clusters is:

  W     R     G     B
  1   100%   75%   0%
  2    80%   75%   0%
  3    80%  100%   0%
  4    80%   75%   0%
  5    60%   25%   0%
  6    40%   50%   0%
  7    20%   75%   0%
  8     0%   75%   0%
  9    20%   75%   0%
  10    0%   75%  20%
  11    0%   50%  40%
  12    0%   50%  80%
  13    0%   75%  80%
  14    0%   75% 100%
  15    0%   50%  60%
  16    0%   50%   0%
  17    0%   25%  20%
  18    0%   25%  20%

ESP3 = ESP1′ and ESP4 = ESP2′, so in this case all info is already available in ESP1 and ESP2 (all shortest paths are of length 1 or 2); we don't need ESPk for k > 2. Likewise WSP3 = WSP1′ and WSP4 = WSP2′, so all information is already available in WSP1 and WSP2 (we don't need WSPk for k > 2).

Clustering Women using degree-% RGB affiliation:
WomenClusterR = {1,2,4,5}
WomenClusterG = {3,6,7,8,9,10,11,16,17,18}
WomenClusterB = {12,13,14,15}

This clustering seems fairly close to the authors'. Other methods are possible, and if another method puts event 6 with {1,2,3,4,5}, then everything changes and the result seems even closer to the authors' intent.
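The identical-pTree clustering step above can be sketched as follows; `cluster_identical` and the toy bit columns are illustrative stand-ins, not the actual ESP2 data for G9:

```python
from collections import defaultdict

def cluster_identical(columns):
    """Group entity ids whose bit columns are exactly identical
    (Hamming distance 0), the clustering criterion used on ESP2."""
    groups = defaultdict(list)
    for eid, col in columns.items():
        groups[tuple(col)].append(eid)
    return [sorted(members) for members in groups.values()]

# Toy stand-in bit columns (NOT the actual ESP2 data for G9):
cols = {1: [1, 0, 1], 2: [1, 0, 1], 3: [0, 1, 1], 4: [0, 1, 1], 5: [1, 1, 0]}
print(cluster_identical(cols))  # [[1, 2], [3, 4], [5]]
```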

Association Rule Mining (ARM)

Horizontal Transaction Table:
  T    T(I)
  t1   i1
  t2   i1, i2, i4
  t3   i1, i3
  t4   i1, i2, i4
  t5   i3, i4

[Its bipartite graph: edges between T = {t1,...,t5} and I = {i1,...,i4}.]

Given any relationship between entities T (e.g., a set of customer transactions involved in those relationship instances) and I (e.g., a set of items involved in those relationship instances), the itemset T(I) associated with (or related to) a particular transaction t is the subset of items found in the shopping cart or market basket that the customer brings through checkout at that time.

An Association Rule, A→C, associates two disjoint itemsets (A = antecedent, C = consequent).

The support [ratio] of itemset A, supp(A), is the fraction of transactions t such that A ⊆ t(I). E.g., if A = {i1,i2} and C = {i4}, then supp(A) = |{t2,t4}| / |{t1,t2,t3,t4,t5}| = 2/5. (Note: | | means set size.)

The support [ratio] of rule A→C, supp(A→C), is the support of A∪C = |{t2,t4}| / |{t1,t2,t3,t4,t5}| = 2/5.

The confidence of rule A→C, conf(A→C), is supp(A→C) / supp(A) = (2/5) / (2/5) = 1.

Data miners typically want to find all STRONG rules, A→C, with supp(A→C) ≥ minsupp and conf(A→C) ≥ minconf (minsupp and minconf are threshold levels). Note that conf(A→C) is also just the conditional probability of t being related to C, given that t is related to A. Given a two-entity relationship, we can do ARM with either entity taking the role of the transaction set.

APRIORI Association Rule Mining: Given a Transaction-Item relationship, the APRIORI algorithm for finding all strong I-rules can proceed by processing a Horizontal Transaction Table (HTT) through vertical scans to find all frequent I-sets (I-sets "frequently" found in baskets), or by processing a Vertical Transaction Table (VTT) through horizontal operations to find all frequent I-sets. Then each frequent I-set found is analyzed to determine whether it is the support set of a strong rule. Finding all frequent I-sets is the hard part.
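The support and confidence definitions above can be checked directly against the five-transaction example; this is a minimal sketch, with `supp` and `conf` as illustrative helper names:

```python
# The five baskets from the Horizontal Transaction Table above.
baskets = {
    "t1": {"i1"},
    "t2": {"i1", "i2", "i4"},
    "t3": {"i1", "i3"},
    "t4": {"i1", "i2", "i4"},
    "t5": {"i3", "i4"},
}

def supp(itemset):
    """Fraction of transactions whose basket contains the itemset."""
    return sum(itemset <= b for b in baskets.values()) / len(baskets)

def conf(A, C):
    """Confidence of rule A -> C: supp(A ∪ C) / supp(A)."""
    return supp(A | C) / supp(A)

A, C = {"i1", "i2"}, {"i4"}
print(supp(A))     # 0.4  (= 2/5, baskets t2 and t4)
print(conf(A, C))  # 1.0  (every basket containing A also contains C)
```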
To do this efficiently, the APRIORI algorithm takes advantage of the "downward closure" property of frequent I-sets: if an I-set is frequent, then all of its subsets are also frequent. E.g., in the market basket example, if A is an I-subset of B and all of B is in a given transaction's basket, then certainly all of A is in that basket too. Therefore supp(A) ≥ supp(B) whenever A ⊆ B.

First, APRIORI scans to determine all frequent 1-item I-sets (they contain 1 item and are therefore called 1-itemsets);
next, APRIORI uses downward closure to efficiently find candidates for frequent 2-itemsets;
next, APRIORI scans to determine which of those candidate 2-itemsets are actually frequent;
next, APRIORI uses downward closure to efficiently find candidates for frequent 3-itemsets;
next, APRIORI scans to determine which of those candidate 3-itemsets are actually frequent;
... until no candidates remain. (On the next slide we walk through an example using both an HTT and a VTT.)
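The level-wise loop just described can be sketched in plain Python. This is a minimal in-memory sketch of the join/prune/count cycle, not the pTree-based implementation; `apriori` and its helpers are illustrative names, and the usage line applies it to the four-basket dataset of the worked example:

```python
from itertools import combinations

def apriori(baskets, minsup):
    """Level-wise Apriori sketch: grow frequent k-itemsets, pruning any
    candidate that has an infrequent (k-1)-subset (downward closure)."""
    def support(s):
        return sum(s <= b for b in baskets)
    items = {i for b in baskets for i in b}
    frequent = {frozenset([i]) for i in items if support({i}) >= minsup}
    all_frequent, k = set(frequent), 2
    while frequent:
        # Join: union pairs of frequent (k-1)-sets into k-set candidates.
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        frequent = {c for c in candidates if support(c) >= minsup}
        all_frequent |= frequent
        k += 1
    return all_frequent

# The four baskets of the worked example (minsup = 2):
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
freq = apriori(D, minsup=2)
print(sorted(sorted(s) for s in freq))
# [[1], [1, 3], [2], [2, 3], [2, 3, 5], [2, 5], [3], [3, 5], [5]]
```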

Example ARM using uncompressed Item pTrees (the support count is the 1-count at the root of each pTree).

Build Item pTrees by scanning D (transactions 100, 200, 300, 400; bit t of Pi is 1 iff item i is in transaction t's basket):
  P1 = 1010 (count 2)
  P2 = 0111 (count 3)
  P3 = 1110 (count 3)
  P4 = 1000 (count 1)
  P5 = 0111 (count 3)

F1 = L1: {1} {2} {3} {5}, counts 2 3 3 3.

Candidate 2-itemsets via pTree ANDs:
  P1^P2 = 0010 (1)   P1^P3 = 1010 (2)   P1^P5 = 0010 (1)
  P2^P3 = 0110 (2)   P2^P5 = 0111 (3)   P3^P5 = 0110 (2)
F2 = L2: {1,3} {2,3} {2,5} {3,5}, counts 2 2 3 2.

Candidate 3-itemsets: {1,2,3}, {1,3,5}, {2,3,5}.
  {1,2,3} pruned since {1,2} is not frequent; {1,3,5} pruned since {1,5} is not frequent.
  P2^P3^P5 = 0110 (2)
F3 = L3: {2,3,5}, count 2.

It seems the pruning step above is unnecessary here, since the root count will show up below the threshold, and that root count (using PopCount) is almost free anyway???

All we need to do ARM are these Frequent-Item tables with counts.
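The "AND the bit columns, read the root count" idea can be mimicked with plain Python ints standing in for the uncompressed pTrees; real pTrees are compressed trees, so this only sketches the counting logic:

```python
# Item bit-vectors over four transactions (bit t set iff item i is in
# basket t), matching the uncompressed P1..P5 of the worked example;
# plain Python ints stand in for pTrees here.
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
P = {i: sum(1 << t for t, basket in enumerate(D) if i in basket)
     for i in range(1, 6)}

def root_count(bitvec):
    """PopCount of a bit-vector = the support count ('free' at the root)."""
    return bin(bitvec).count("1")

print(root_count(P[1]))                # 2: item 1 occurs in two baskets
print(root_count(P[2] & P[3] & P[5]))  # 2: support of itemset {2,3,5}
```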

Data_Lecture_4.1_ARM: L1, L2, L3.

1-itemsets don't support association rules (they would have no antecedent or no consequent). 2-itemsets do support ARs. Are there any strong rules supported by Frequent (= Large) 2-itemsets (at minconf = .75)?

{1,3}: conf({1}→{3}) = supp{1,3}/supp{1} = 2/2 = 1 ≥ .75  STRONG!
       conf({3}→{1}) = supp{1,3}/supp{3} = 2/3 = .67 < .75
{2,3}: conf({2}→{3}) = supp{2,3}/supp{2} = 2/3 = .67 < .75
       conf({3}→{2}) = supp{2,3}/supp{3} = 2/3 = .67 < .75
{2,5}: conf({2}→{5}) = supp{2,5}/supp{2} = 3/3 = 1 ≥ .75  STRONG!
       conf({5}→{2}) = supp{2,5}/supp{5} = 3/3 = 1 ≥ .75  STRONG!
{3,5}: conf({3}→{5}) = supp{3,5}/supp{3} = 2/3 = .67 < .75
       conf({5}→{3}) = supp{3,5}/supp{5} = 2/3 = .67 < .75

Are there any strong rules supported by Frequent (= Large) 3-itemsets?

{2,3,5}: conf({2,3}→{5}) = supp{2,3,5}/supp{2,3} = 2/2 = 1 ≥ .75  STRONG!
         conf({2,5}→{3}) = supp{2,3,5}/supp{2,5} = 2/3 = .67 < .75
         conf({3,5}→{2}) = supp{2,3,5}/supp{3,5} = 2/3 = .67 < .75

No subset antecedent can yield a strong rule either: there is no need to check conf({2}→{3,5}), conf({3}→{2,5}), or conf({5}→{2,3}), since those denominators will be at least as large, and therefore those confidences will be at least as low. DONE!

Using ARM to find k-plexes on the bipartite graph G9? Does it work?

[Slide: the WSP1 and ESP1 tables of G9 with row degree counts, followed by the full candidate/frequent itemset tables; hex digits a-e denote 10-14.]

WomenSet ARM (MinSup = 6, MinConf = .75):
  Frequent 1-WomenSets: 1 2 3 4 12 13 14, with frequencies (# events attended) 8 7 8 8 6 7 8.
  Frequent 3-WomenSets: 123 124 134 234, with # events co-attended 5 5 6 5 (cde is excluded since ce is infrequent).
  Strong W-rules: 21 12 13 31 14 41 23 32 24 42 34 43 134 314 413 143 341, with confidences .83 .75 .87 .87 .87 .87 .83 .75 .83 .75 .87 .87 .75 .75 .75 .83 .83 .83. Says {1,2,3,4} is a strong Women community? But {1,3,4} is a very strong Women community?

EventSet ARM (MinSup = 9, MinConf = .75):
  Frequent 1-EventSets: 3 5 6 7 8 9 12, with frequencies (# attendees) 6 8 8 10 14 12 6.
  Candidate 3-EventSets: 568 578 (all others excluded because some 2-subset is not frequent); frequent 3-EventSets: 567 (frequency 6).
  Strong E-rules: 35 53 56 65 57 58 68 78 98 567 657 576 675. Says {5,6,7} is a strong Event community?
Note: When I did this ARM analysis, I had several degrees miscounted. Nonetheless, I think the same general negative result is expected. Next we try using the WSP2 and ESP2 relationships for ARM??

G9: ESP2 EventSet ARM.

[Slide: the WSP2 and ESP2 tables for G9 with row degree counts, and the full Frequent k-EventSet tables for k = 1..9; hex digits a-e denote 10-14.]

WSP2 WomenSet ARM (MinSup = 18, MinConf = .75): the frequent 1-WomenSets (1 3 4 9 a d e f) all have frequency 18, and every candidate 2-WomenSet also has frequency 18. This is not interesting! Go to ESP2 EventSet ARM.

ESP2 EventSet ARM (MinSup = 9, MinConf = .75): all 14 events are frequent 1-EventSets, with frequency 9 (events 1-5 and 10-14) or 14 (events 6-9), and the frequent k-EventSet tables continue up through k = 9. All rule confidences are either 100% (9/9 or 14/14) or 9/14 = 64%.

ARM on either SP1 or SP2 (W or E) does not seem to help much in identifying communities.

K-plex search on G9. (A k-plex is a subgraph missing ≤ k edges. If H is a k-plex and F is an induced subgraph (ISG) of H, then F is a k-plex. A graph (V,E) is a k-plex iff |V|(|V|-1)/2 - |E| ≤ k.)

[Slide: the ESP2 and WSP2 tables for G9 with row degree counts (hex digits a-i denote 10-18), and the step-by-step walkthrough: repeatedly remove a minimum-degree vertex and recompute |V|(|V|-1)/2 - |E|.]

Events side: all 14 events give 14·13/2 = 91 possible edges with |Edges| = 66, a k-plex only for k ≤ 25. Removing minimum-degree events 1, 2, 3, 4, 5 in turn (not recalculating k until it gets lower) leaves Events {6,7,8,9,a,b,c,d,e}: 9·8/2 = 36 possible edges, |Edges| = 36, a 0-plex, i.e., a 9-Clique! So take out {6,7,8,9,a,b,c,d,e} and start over: Events {1,2,3,4,5} give 5·4/2 = 10 = |Edges|, a 5-Clique!

If we had used the full algorithm, which pursues each minimum-degree tie path, one of those paths would start by eliminating event 14 instead of event 1. That results in the 9-Clique {1,...,9} and the 5-Clique {a,b,c,d,e}. All the other 8 ties would result in one of these two situations. How can we know that ahead of time and avoid all those unproductive minimum-degree tie paths?

Women side: all 18 women give 18·17/2 = 153 possible edges with |Edges| = 139, a 14-plex. After successively removing minimum-degree women (i, h, then 5), Women {1,2,3,4,6,7,8,9,a,b,c,d,e,f,g} (15 vertices) give 15·14/2 = 105 = |Edges|, a 0-plex, i.e., a 15-Clique. Taking those out, the remaining Women {5,h,i} (1 edge, a 2-plex) reduce to the Clique {h,i}. We get no information from applying our k-plex search algorithm to WSP2. Again, how could we know this ahead of time and avoid all the work? Possibly by noticing the very high 1-density of the pTrees (only 28 zeros)?

Every ISG of a Clique is a Clique, so {6,7,8,9} and {7,8,9} are Cliques (which seems to be the authors' intent?). If the goal is to find all maximal Cliques, how do we know that CA = {1,...,9} is maximal? If it weren't, then at least one of {a,b,c,d,e}, added to CA, would result in a 10-Clique. Checking a: PCA & Pa would have to have count 9 (it doesn't! It has count 5) and PCA(a) would have to be 1 (it isn't; it's 0). The same is true for b, c, d, e. The same type of analysis shows {6,7,8,9,a,b,c,d,e} is maximal. I think one can prove that any Clique obtained by our algorithm is maximal (without the above expensive check): since we start with the whole vertex set and throw out one vertex at a time until we get a Clique, it has to be maximal?

The Women associated strongly with the blue EventClique {a,b,c,d,e} are {12,13,14,15,16}; associated but loosely are {10,11,17,18}. The Women associated strongly with the green EventClique {1,2,3,4,5} are {1,2,3,4,5}; associated but loosely are {6,7,9}.
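The k-plex test used above is just an edge-count formula, so the walkthrough's numbers are easy to recompute; a trivial sketch, with `kplex_k` as an illustrative name:

```python
def kplex_k(num_vertices, num_edges):
    """Missing-edge count of the slide's k-plex test:
    (V,E) is a k-plex iff |V|(|V|-1)/2 - |E| <= k."""
    return num_vertices * (num_vertices - 1) // 2 - num_edges

# The 9-vertex event subgraph found above: 36 edges, 0 missing, a 9-Clique.
print(kplex_k(9, 36))    # 0
# All 18 women with 139 edges: 14 edges missing, a 14-plex.
print(kplex_k(18, 139))  # 14
# All 14 events with 66 edges: a k-plex only for k <= 25.
print(kplex_k(14, 66))   # 25
```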