The Multi-hop closure theorem for the Rolodex Model using pTrees


1 The Multi-hop closure theorem for the Rolodex Model using pTrees
Arijit Chatterjee, Arjun G. Roy, Mohammad Hossain, William Perrizo
Computer Science Department, North Dakota State University

2 Data Mining Reference :

3 But it is pure (pure0) so this branch ends
Vertical Structuring into predicate Trees (pTrees)
Given a table of horizontal records, R(A1 A2 A3 A4) (traditionally processed vertically, so VPHD):
- project the attributes R[A1], R[A2], R[A3], R[A4] (4 files);
- convert base 10 to base 2, then vertically slice off each bit position: R11 R12 R13 R21 R22 R23 R31 R32 R33 R41 R42 R43 (12 files);
- then compress each bit slice into a pTree, P11 P12 P13 P21 P22 P23 P31 P32 P33 P41 P42 P43, e.g., compress R11 into P11.
[Table: the records of R in base 10 and base 2, with the bit slices R11 to R43 and the pTrees P11 to P43.]
Determine the number of occurrences of 7 0 1 4. For horizontally structured records, one scans vertically. We use Horizontal Processing of Vertical Data, HPVD, to find the number of occurrences instead.
Record the truth of the predicate pure1 (all 1-bits) in a tree, recursively on halves, until the half is pure. For P11:
1. Whole is pure1? false → 0
2. Left half pure1? false → 0. But it is pure (pure0), so this branch ends.
3. Right half pure1? false → 0
4. Left half of right half? false → 0
5. Right half of right half? true → 1
To count occurrences of 7,0,1,4 use (where $P'$ denotes the complement of $P$):
$P_{11} \wedge P_{12} \wedge P_{13} \wedge P'_{21} \wedge P'_{22} \wedge P'_{23} \wedge P'_{31} \wedge P'_{32} \wedge P_{33} \wedge P_{41} \wedge P'_{42} \wedge P'_{43}$
The resulting count is 2.
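The slicing, pure1 compression and AND-based counting described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 8-row table `TABLE` is hypothetical (the slide's own values did not survive in this transcript), the helper names `bit_slices`, `ptree` and `count_tuple` are made up for the sketch, and the count is taken on uncompressed bit slices rather than on the compressed trees.

```python
# Hypothetical 8-record table R(A1, A2, A3, A4); each attribute fits in 3 bits.
TABLE = [
    (2, 7, 6, 1), (3, 7, 6, 0), (2, 6, 5, 1), (2, 7, 5, 7),
    (5, 2, 1, 4), (2, 2, 1, 5), (7, 0, 1, 4), (7, 0, 1, 4),
]

def bit_slices(column, width):
    """Return `width` vertical bit slices of an integer column, high-order bit first."""
    return [[(v >> (width - 1 - k)) & 1 for v in column] for k in range(width)]

def ptree(bits):
    """Compress one bit slice into a nested pure1 tree.

    A leaf 1 means the segment is pure1, a leaf 0 means it is pure0
    (the branch ends). A mixed segment becomes (0, left_half, right_half).
    """
    if all(bits):
        return 1
    if not any(bits):
        return 0
    mid = len(bits) // 2
    return (0, ptree(bits[:mid]), ptree(bits[mid:]))

def count_tuple(table, target, width=3):
    """Count records equal to `target` by ANDing bit slices or their complements."""
    mask = [1] * len(table)
    for attr, value in enumerate(target):
        column = [row[attr] for row in table]
        for k, slice_ in enumerate(bit_slices(column, width)):
            want = (value >> (width - 1 - k)) & 1
            # Use the slice itself for a 1-bit, its complement for a 0-bit.
            mask = [m & (b if want else 1 - b) for m, b in zip(mask, slice_)]
    return sum(mask)

r11 = bit_slices([row[0] for row in TABLE], 3)[0]
p11 = ptree(r11)
occurrences = count_tuple(TABLE, (7, 0, 1, 4))
```

With this hypothetical table, `r11` is the high-order bit slice of A1 and `p11` is its tree; the left half is pure0, so that branch ends at a single 0 leaf.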

4 Multi-relationships and the RoloDex Model
[Figure: RoloDex of 2-entity, many-relationship cards arranged along shared entity axes (Customer, Item, People, Author, Movie, Course, Doc, term, Gene, Exp, PI). Cards shown: customer rates movie card, "customer rates movie as 5" card, cust item card, Course Enrollments, termdoc card, authordoc card, docdoc, genegene card (ppi), termterm card (share stem?), expPI card, expgene card.]
[Figure: DataCube Model for 3 entities: items, people and terms.]
The bottom-line point of this slide is that all entities are interrelated through multiple relationships (multiple "hops").
Relational Model:
Items: i1 i2 i3 i4 i5
|0 001|0 |0 11|
|1 001|0 |1 01|
|2 010|1 |0 10|
People: p1 p2 p3 p4
|0 100|A|M|
|1 001|T|M|
|2 010|S|F|
|3 011|B|F|
|4 100|C|M|
Terms: t1 t2 t3 t4 t5 t6
|1 010|1 101|2 11|
|2 001|0 000|3 11|
|3 011|1 001|3 11|
|4 011|3 001|0 00|
Relationship: p1 i1 t1
|0 0| 1
|0 1| 1
|1 0| 1
|2 0| 2
|3 0| 2
|4 1| 2
|5 1|_2
How can we data mine those multi-relationships?

5 Multi-hop rule mining using the RoloDex model:
A hop is a relationship, R, hopping from one entity, E, to another, F. Given a high-value "relationship" data set, we pay the one-time cost of creating pTrees both ways: R, as a matrix, has both E-pTrees (horizontal bit slices, $R_e$) and F-pTrees (vertical bit slices, $R_f$).
[Figure: bit matrix of the relationship R(E,F).]
Standard ARM finds strong (frequent, confident) single-entity rules $A \Rightarrow C$ (i.e., both $A$ and $C \subseteq E$; counts are on the other entity, F):
$\mathrm{ct}(\&_{e \in A} R_e) \ge \mathrm{mnsp}$
$\mathrm{ct}(\&_{e \in A} R_e \;\&\; \&_{e \in C} R_e) \,/\, \mathrm{ct}(\&_{e \in A} R_e) \ge \mathrm{mncf}$
What about multi-entity rules (i.e., $A \subseteq E$, $C \subseteq F$)?
A 1-hop ($A \subseteq E$ and $C \subseteq F$) F-focused rule $A \Rightarrow C$ is strong if ("focused on" refers to where the counts are taken):
$\mathrm{ct}(\&_{e \in A} R_e) \ge \mathrm{mnsp}$
$\mathrm{ct}(\&_{e \in A} R_e \;\&\; P_C) \,/\, \mathrm{ct}(\&_{e \in A} R_e) \ge \mathrm{mncf}$
1-hop, F-focused strong rules can be mined efficiently because of:
1. (antecedent downward closure) If A is frequent, then all of its subsets are frequent. Or, if A is infrequent, then all of its supersets are infrequent. Since frequency involves only A, we can mine for all qualifying antecedents efficiently using downward closure.
2. (consequent upward closure) If $A \Rightarrow C$ is non-confident, then so is $A \Rightarrow D$ for all subsets, D, of C. So for every frequent antecedent, A, use upward closure to mine for all of its confident consequents.
The theorem suggested here is: for (a+c)-hop strong rule mining with a focus entity which is a hops from the antecedent and c hops from the consequent, if a {c} is odd [even], use downward [upward] closure on the frequency {confidence} step in the mining. In this case A is 1 hop from F (1 is odd, use downward closure) and C is 0 hops from F (0 is even, use upward closure).
A 1-hop ($A \subseteq E$ and $C \subseteq F$) E-focused rule is strong if:
$\mathrm{ct}(P_A) \ge \mathrm{mnsp}$
$\mathrm{ct}(P_A \;\&\; \&_{f \in C} R_f) \,/\, \mathrm{ct}(P_A) \ge \mathrm{mncf}$
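The 1-hop, F-focused strength test above can be sketched with Python integers standing in for bitmaps. This is only an illustration under stated assumptions: the relationship `R`, the consequent bitmap `P_C`, the thresholds and the function names are all hypothetical, and bit f of each `R[e]` is 1 when e is related to f.

```python
def ct(bitmap):
    """Count the one-bits in a bitmap stored as a Python int."""
    return bin(bitmap).count("1")

def and_all(bitmaps, width):
    """AND a collection of bitmaps, starting from the all-ones bitmap."""
    result = (1 << width) - 1
    for b in bitmaps:
        result &= b
    return result

def strong_1hop_f_focused(R, A, P_C, n_f, mnsp, mncf):
    """Check support and confidence for A => C with counts taken on F."""
    antecedent = and_all((R[e] for e in A), n_f)
    support = ct(antecedent)
    if support == 0 or support < mnsp:
        return False
    confidence = ct(antecedent & P_C) / support
    return confidence >= mncf

# Hypothetical E-pTrees over an entity F with 4 members.
R = {1: 0b0111, 2: 0b0110, 3: 0b1100}
P_C = 0b0010  # hypothetical consequent C = {f1}

small = strong_1hop_f_focused(R, {1, 2}, P_C, 4, mnsp=2, mncf=0.5)
large = strong_1hop_f_focused(R, {1, 2, 3}, P_C, 4, mnsp=2, mncf=0.5)
```

In this toy data the antecedent {1, 2} has support 2 and passes, while its superset {1, 2, 3} has support 1 and fails, which is the antecedent downward closure in action: supersets of an infrequent antecedent need not be examined.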

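6 $\mathrm{ct}(\&_{e \in A} R_e \;\&\; \&_{g \in C} S_g) \,/\, \mathrm{ct}(\&_{e \in A} R_e) \ge \mathrm{mncf}$
[Figure: bit matrices of R(E,F) and S(F,G), with $A \subseteq E$ and $C \subseteq G$.]
2-hop F-focused (the focus is on the middle entity, F). $A \Rightarrow C$ is strong if:
$\mathrm{ct}(\&_{e \in A} R_e) \ge \mathrm{mnsp}$
$\mathrm{ct}(\&_{e \in A} R_e \;\&\; \&_{g \in C} S_g) \,/\, \mathrm{ct}(\&_{e \in A} R_e) \ge \mathrm{mncf}$
1. (antecedent downward closure) If A is infrequent, then so are all of its supersets.
2. (consequent downward closure) If $A \Rightarrow C$ is non-confident, so is $A \Rightarrow D$ for all supersets, D, of C.
(1,1: down, down)
2-hop G-focused. Here the inner AND selects the index set of the outer AND: f ranges over the one-bit positions of $\&_{e \in A} R_e$.
$\mathrm{ct}(\&_{f \in \&_{e \in A} R_e} S_f) \ge \mathrm{mnsp}$
$\mathrm{ct}(\&_{f \in \&_{e \in A} R_e} S_f \;\&\; P_C) \,/\, \mathrm{ct}(\&_{f \in \&_{e \in A} R_e} S_f) \ge \mathrm{mncf}$
1. (antecedent upward closure) If A is infrequent, then so are all of its subsets.
2. (consequent upward closure) If $A \Rightarrow C$ is non-confident, so is $A \Rightarrow D$ for all subsets, D, of C.
(2,0: up, up)
2-hop E-focused:
$\mathrm{ct}(P_A) \ge \mathrm{mnsp}$
$\mathrm{ct}(P_A \;\&\; \&_{f \in \&_{g \in C} S_g} R_f) \,/\, \mathrm{ct}(P_A) \ge \mathrm{mncf}$
1. (antecedent upward closure) If A is infrequent, then so are all of its subsets.
2. (consequent upward closure) If $A \Rightarrow C$ is non-confident, so is $A \Rightarrow D$ for all subsets, D, of C.
(0,2: up, up)
It was the 2-hop F-focus case that generated the interest in multi-hop rule mining, in particular:
R = a "friends" relationship (e.g., from Facebook)
S = a "buys" relationship between people and items.
Is it a strong rule that friends of those who bought a set of items also buy those items?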
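The contrast between the F-focused and G-focused 2-hop counts can be made concrete with a small sketch. Everything here is assumed for illustration: the toy relationships `R` and `S_by_g`, the helper names, and the use of Python integers as bitmaps. The sketch stores S both ways (by g over F and by f over G), in the spirit of paying the one-time cost of creating pTrees both ways.

```python
def ct(bitmap):
    """Count the one-bits in a bitmap stored as a Python int."""
    return bin(bitmap).count("1")

def ones(bitmap):
    """List the positions of the one-bits (the list corresponding to a bitmap)."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

def and_over(index_set, slices, width):
    """AND the slices selected by index_set, starting from all ones."""
    result = (1 << width) - 1
    for i in index_set:
        result &= slices[i]
    return result

def transpose(slices, width):
    """Turn slices keyed by column into slices keyed by row position."""
    return {
        pos: sum(1 << key for key, bm in slices.items() if (bm >> pos) & 1)
        for pos in range(width)
    }

N_F, N_G = 3, 2
R = {0: 0b011, 1: 0b110, 2: 0b111}   # hypothetical R_e, bitmaps over F
S_by_g = {0: 0b011, 1: 0b110}        # hypothetical S_g, bitmaps over F
S_by_f = transpose(S_by_g, N_F)      # S_f, bitmaps over G

def f_focus_support(A):
    """Antecedent count with focus on F: one AND, so 1 hop from A."""
    return ct(and_over(A, R, N_F))

def f_focus_confidence(A, C):
    """Confidence of A => C with focus on F."""
    antecedent = and_over(A, R, N_F)
    return ct(antecedent & and_over(C, S_by_g, N_F)) / ct(antecedent)

def g_focus_support(A):
    """Antecedent count with focus on G: two nested ANDs, so 2 hops from A."""
    f_list = ones(and_over(A, R, N_F))
    return ct(and_over(f_list, S_by_f, N_G))
```

On this toy data, growing A from {0} to {0, 1} shrinks the F-focused count from 2 to 1 (downward closure applies) but grows the G-focused count from 1 to 2 (upward closure applies), and growing C from {0} to {0, 1} lowers the F-focused confidence from 1.0 to 0.5.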

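7 $\mathrm{ct}(\&_{f \in \&_{e \in A} R_e} S_f \;\&\; \&_{h \in (\&_{i \in (\&_{j \in C} V_j)} U_i)} T_h) \,/\, \mathrm{ct}(\&_{f \in \&_{e \in A} R_e} S_f)$
[Figure: chain of relationship matrices R(E,F), S(F,G), T(G,H), U(H,I), V(I,J), with the antecedent $A \subseteq E$ and the consequent $C \subseteq J$ highlighted.]
5-hop, focus on G (if yellow then green). $A \Rightarrow C$ is strong if:
$\mathrm{ct}(\&_{f \in \&_{e \in A} R_e} S_f) \ge \mathrm{mnsp}$
$\mathrm{ct}(\&_{f \in \&_{e \in A} R_e} S_f \;\&\; \&_{h \in (\&_{i \in (\&_{j \in C} V_j)} U_i)} T_h) \,/\, \mathrm{ct}(\&_{f \in \&_{e \in A} R_e} S_f) \ge \mathrm{mncf}$
5-hop, focus on G:
1. (antecedent has upward closure) A is 2 hops from G, and 2 is even.
2. (consequent has downward closure) C is 3 hops from G, and 3 is odd.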
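The nested ANDs in the 5-hop expression follow one repeated pattern: AND the slices selected by the current index set, then use the one-bit positions of the result as the next index set. A minimal sketch of that pattern is given below. The function `nested_and`, the two-member entities and all relationship bitmaps are hypothetical, chosen only so that the two closure directions are visible.

```python
def ct(bitmap):
    """Count the one-bits in a bitmap stored as a Python int."""
    return bin(bitmap).count("1")

def ones(bitmap):
    """List the positions of the one-bits of a bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

def nested_and(index_set, hops):
    """Evaluate nested ANDs from a rule side toward the focus entity.

    `hops` is a non-empty list of (slices, width) pairs, ordered from the
    antecedent (or consequent) toward the focus entity. Each hop ANDs the
    slices picked by the current index set; the one-bits of the result
    become the index set for the next hop.
    """
    bitmap = None
    for slices, width in hops:
        bitmap = (1 << width) - 1
        for i in index_set:
            bitmap &= slices[i]
        index_set = ones(bitmap)
    return bitmap

# Hypothetical relationships; every entity has 2 members (width 2).
R = {0: 0b11, 1: 0b01}  # R_e over F
S = {0: 0b11, 1: 0b10}  # S_f over G
V = {0: 0b11, 1: 0b10}  # V_j over I
U = {0: 0b01, 1: 0b11}  # U_i over H
T = {0: 0b11, 1: 0b10}  # T_h over G

def antecedent(A):
    """A is 2 hops from the focus entity G: A -> F -> G."""
    return nested_and(A, [(R, 2), (S, 2)])

def consequent(C):
    """C is 3 hops from the focus entity G: C -> I -> H -> G."""
    return nested_and(C, [(V, 2), (U, 2), (T, 2)])

def confidence(A, C):
    """Confidence of A => C with counts taken on G."""
    ante = antecedent(A)
    return ct(ante & consequent(C)) / ct(ante)
```

With this data the antecedent count grows from 1 to 2 when A grows from {0} to {0, 1} (even number of hops, upward closure), while the consequent bitmap shrinks from `0b11` to `0b10` when C grows from {0} to {0, 1} (odd number of hops, downward closure).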

8 Multi-hop closure property theorem
"For transitive (a+c)-hop strong rule mining with a focus entity which is a hops from the antecedent and c hops from the consequent: if a [or c] is odd, one can use downward closure on the frequency [or confidence] step; if it is even, one can use upward closure on that step."

9 &elist(&clist(&aDXa)Yc)We
The Multi-hop Closure Theorem A condition is downward [upward] closed: If when it is true of A, it is true for all subsets [supersets], D, of A. Given an (a+c)-hop multi-relationship, where the focus entity is a hops from the antecedent and c hops from the consequent, if a [or c] is odd/even then downward/upward closure applies. A pTree, X, is said to be "covered by" a pTree, Y, if  one-bit in X, there is a one-bit at that same position in Y (the list corresponding to the bitmap, Y, is a superset of the list corresponding to the bitmap, X) Lemma-0: For any two pTrees, X, Y; X&Y is covered by X and thus ct(X&Y)  ct(X) and list(X&Y)list(X) Proof-0: ANDing with Y may zero some of X's ones but it will never change any zeros to ones. Lemma-1: Let AD, &aAXa covers &aDXa (ANDing over a superset always covers) Lemma-2: Let AD, &clist(&aDXa)Yc covers &clist(&aAXa)Yc Proof-1&2: Let Z=&aD-AXa then &aDXa =Z&(&aAXa). lemma-1 now follows from lemma-0, as does D'=list(&aAXa) A'=list(&aDXa)  so by lemma-1, we get lemma-2: Lemma-2: Let AD, &clist(&aDXa)Yc covers &clist(&aAXa)Yc Lemma-3: AD, &elist(&clist(&aAXa)Yc)We covers &elist(&clist(&aDXa)Yc)We Proof-3: lemma-3 follows in the same way from lemma-1 and lemma-2. Continuing this establishes: If there are an odd number of nested &'s then the expression with D is covered by the expression with A. Therefore the count with D  with A. Thus, if the frequent expression and the confidence expression are > threshold for A then the same is true for D. This establishes downward closure. Exactly analogously, if there are an even number of nested &'s we get the upward closures.
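Lemma-0 through lemma-3 can be spot-checked numerically on random bitmaps. The sketch below is a sanity check rather than a proof, and every name in it (`covers`, `check_lemmas`, the trial sizes and the seed) is an assumption made for the illustration. Bitmaps are Python integers, and `covers(y, x)` means that x is covered by y.

```python
import random

def ones(bitmap):
    """List the positions of the one-bits of a bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

def and_over(index_set, slices, width):
    """AND the slices selected by index_set, starting from all ones."""
    result = (1 << width) - 1
    for i in index_set:
        result &= slices[i]
    return result

def covers(y, x):
    """True when every one-bit of x is also a one-bit of y."""
    return x & ~y == 0

def check_lemmas(trials=200, n=5, width=6, seed=0):
    """Test lemma-0 to lemma-3 on random pTrees X_a, Y_c, W_e and A within D."""
    rng = random.Random(seed)
    for _ in range(trials):
        X = [rng.getrandbits(width) for _ in range(n)]
        Y = [rng.getrandbits(width) for _ in range(width)]
        W = [rng.getrandbits(width) for _ in range(width)]
        D = set(rng.sample(range(n), rng.randint(1, n)))
        A = set(rng.sample(sorted(D), rng.randint(1, len(D))))

        # Lemma-0: X&Y is covered by X.
        if not covers(X[0], X[0] & Y[0]):
            return False
        # Lemma-1: one AND, the expression with A covers the one with D.
        xa, xd = and_over(A, X, width), and_over(D, X, width)
        if not covers(xa, xd):
            return False
        # Lemma-2: two nested ANDs, the direction flips.
        ya = and_over(ones(xa), Y, width)
        yd = and_over(ones(xd), Y, width)
        if not covers(yd, ya):
            return False
        # Lemma-3: three nested ANDs, the direction flips back.
        wa = and_over(ones(ya), W, width)
        wd = and_over(ones(yd), W, width)
        if not covers(wa, wd):
            return False
    return True
```

Each extra level of nesting reverses which expression covers the other, which is the parity argument behind the odd/even rule in the theorem.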

