MYRRH A hop is a relationship, R, hopping from one entity, E, to another entity, F. Strong Rule Mining (SRM) finds all frequent and confident rules, AC.

Similar presentations
Association Analysis (2). Example TIDList of item ID’s T1I1, I2, I5 T2I2, I4 T3I2, I3 T4I1, I2, I4 T5I1, I3 T6I2, I3 T7I1, I3 T8I1, I2, I3, I5 T9I1, I2,
Frequent Itemset Mining Methods. The Apriori algorithm Finding frequent itemsets using candidate generation Seminal algorithm proposed by R. Agrawal and.
Association Mining Data Mining Spring Transactional Database Transaction – A row in the database i.e.: {Eggs, Cheese, Milk} Transactional Database.
Association Analysis. Association Rule Mining: Definition Given a set of records each of which contain some number of items from a given collection; –Produce.
Data Mining Techniques So Far: Cluster analysis K-means Classification Decision Trees J48 (C4.5) Rule-based classification JRIP (RIPPER) Logistic Regression.
Data Mining Association Analysis: Basic Concepts and Algorithms Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
732A02 Data Mining - Clustering and Association Analysis ………………… Jose M. Peña Association rules Apriori algorithm FP grow algorithm.
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Analysis: Basic Concepts and Algorithms.
RoloDex Model The Data Cube Model gives a great picture of relationships, but can become gigantic (instances are bitmapped rather than listed, so there.
Mining Association Rules
Entity Tables, Relationship Tables We Classify using any Table (as the Training Table) on any of its columns, the class label column. Medical Expert System:
Toward a Unified Theory of Data Mining DUALITIES: PARTITION FUNCTION EQUIVALENCE RELATION UNDIRECTED GRAPH Assume a Partition has uniquely labeled components.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining By Tan, Steinbach, Kumar Lecture.
Modul 7: Association Analysis. 2 Association Rule Mining  Given a set of transactions, find rules that will predict the occurrence of an item based on.
Association Rules. CS583, Bing Liu, UIC 2 Association rule mining Proposed by Agrawal et al in Initially used for Market Basket Analysis to find.
Data Mining 1 Data Mining is one aspect of Database Query Processing (on the "what if" or pattern and trend end of Query Processing, rather than the "please.
Query and Analysis on the document and customer/item bag card of the DataDex Kellie Erickson.
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Association Rule Mining III COMP Seminar GNET 713 BCB Module Spring 2007.
CSE4334/5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai.
Association Rule Mining
1234 G Exp G So as not to duplicate axes, this copy of G should be folded over to coincide with the other copy, producing a "conical" unipartite.
Graph Path Analytics (using pTrees)
A hop is a relationship, R, hopping from entity, E, to entity, F. Strong Rule Mining finds all frequent, confident rules R(E,F)
M. Sulaiman Khan Dept. of Computer Science University of Liverpool 2009 COMP527: Data Mining ARM: Improvements March 10, 2009 Slide.
Elsayed Hemayed Data Mining Course
document 2345 course Text person EnrollEnroll Buy MYRRH ManY-Relationship-Rule Harvester.
Data Mining Association Rules Mining Frequent Itemset Mining Support and Confidence Apriori Approach.
© D. Wong Functional Dependencies (FD)  Given: relation schema R(A1, …, An), and X and Y be subsets of (A1, … An). FD : X  Y means X functionally.
1 Data Mining Lecture 6: Association Analysis. 2 Association Rule Mining l Given a set of transactions, find rules that will predict the occurrence of.
APPENDIX: Data Mining DUALITIES : 1. PARTITION FUNCTION EQUIVALENCE RELATION UNDIRECTED GRAPH Given any set, S: A Partition is a decomposition of a set.
P Left half of rt half ? false  Left half pure1? false  Whole is pure1? false  0 5. Rt half of right half? true  1.
Path Analytics (using pTrees)
Chapter 2 Sets and Functions.
Data Mining Find information from data data ? information.
Security in Outsourcing of Association Rule Mining
pTrees predicate Tree technologies
Reducing Number of Candidates
Association rule mining
Knowledge discovery & data mining Association rules and market basket analysis--introduction UCLA CS240A Course Notes*
DUALITIES: PARTITION FUNCTION EQUIVALENCE RELATION UNDIRECTED GRAPH
Frequent Pattern Mining
The vertex-labelled, edge-labelled graph
Program layers of a DBMS
GAIO threshold = 15 become: V= D2 H4 GAIO-Ct=
Using a 3-dim DSR(Document Sender Receiver) matrix and
All Shortest Path pTrees for a unipartite undirected graph, G7 (SP1, SP2, SP3, SP4, SP5)
Mining Complex Data COMP Seminar Spring 2011.
prove that it is an addition (it is a nudge worth reading about).
Discriminative Pattern Mining
Lesson 11 - R Chapter 11 Review:
The Multi-hop closure theorem for the Rolodex Model using pTrees
Lecture 5 Theory of AUTOMATA
PARTIALLY ORDERED SET DIRECTED ACYCLIC GRAPH
Association Analysis: Basic Concepts
Presentation transcript:

MYRRH. A hop is a relationship, R, hopping from one entity, E, to another entity, F. Strong Rule Mining (SRM) finds all frequent and confident rules, A⇒C.

[Figure: bipartite relationship R(E,F) between entity E (instances 1..5) and entity F (instances 1..4).]

Frequency can lower-bound the antecedent, the consequent, or both (in ARM, A,C ⊆ E and the bound is ct(&_{e∈A∪C} R_e) ≥ mnsp). Its justification is the elimination of insignificant cases; its purpose is the tractability of SRM.

Confidence lower-bounds the frequency of both over the frequency of the antecedent: ct(&_{e∈A} R_e & &_{e∈C} R_e) / ct(&_{e∈A} R_e) ≥ mncf.

The crux of SRM is frequencies. To compare frequencies meaningfully, they must be on the same entity (the focus entity). SRMs are categorized by the number of hops, k; by whether they are transitive or non-transitive; and by the focus entity. ARM is 1-hop, non-transitive (A,C ⊆ E), F-focused SRM (1nF). (Note: non-transitivity is difficult to define in multi-hop SRM.)

1-hop, transitive (A ⊆ E, C ⊆ F), F-focused SRM (1tF). APRIORI: ct(&_{e∈A} R_e) ≥ mnsp and ct(&_{e∈A} R_e & P_C) / ct(&_{e∈A} R_e) ≥ mncf.
1. (antecedent downward closure) If A is frequent, all of its subsets are frequent; equivalently, if A is infrequent, then so are all of its supersets. Since frequency involves only A, we can mine for all qualifying antecedents efficiently using downward closure.
2. (consequent upward closure) If A⇒C is non-confident, then so is A⇒D for all subsets, D, of C. So for every frequent antecedent, A, use upward closure to mine for all of its confident consequents.

The theorem we demonstrate throughout this section is: for transitive (a+c)-hop Apriori strong rule mining with a focus entity which is a hops from the antecedent and c hops from the consequent, if the number of hops on a side is odd, one can use downward closure on that side; if even, upward closure, in the mining of strong (frequent and confident) rules. In this case A is 1 hop from F (odd: use downward closure) and C is 0 hops from F (even: use upward closure). We will be checking more examples to see if the odd⇒downward, even⇒upward theorem seems to hold.
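The 1tF quantities above can be sketched with ordinary Python integers standing in for pTree bit vectors, so that ct is just a popcount. The toy relation R and all names here are illustrative assumptions, not data from the slides.

```python
def ct(bits: int) -> int:
    """Root count: number of 1-bits in a bit vector."""
    return bin(bits).count("1")

# R(E,F) with |F| = 4: R[e] is the bit vector R_e over F.
ALL_F = 0b1111
R = {1: 0b1101, 2: 0b0101, 3: 0b1001, 4: 0b0110, 5: 0b1100}

def and_over(S):
    """&_{e in S} R_e (identity: the all-ones vector)."""
    acc = ALL_F
    for e in S:
        acc &= R[e]
    return acc

def frequent(A, mnsp):
    """ct(&_{e in A} R_e) >= mnsp."""
    return ct(and_over(A)) >= mnsp

def confidence(A, C):
    """ct(&_{e in A} R_e & &_{e in C} R_e) / ct(&_{e in A} R_e)."""
    a = and_over(A)
    return ct(a & and_over(C)) / ct(a)
```

Note the downward closure at work: and_over({1, 2}) has at most as many 1-bits as and_over({1}), so any superset of an infrequent antecedent stays infrequent.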
1-hop, transitive, E-focused rule A⇒C SRM (1tE): |A| = ct(P_A) ≥ mnsp and ct(P_A & &_{f∈C} R_f) / ct(P_A) ≥ mncf.
1. (antecedent upward closure) If A is infrequent, then so are all of its subsets.
2. (consequent downward closure) If A⇒C is non-confident, then so is A⇒D for all supersets, D, of C.
In this case A is 0 hops from E (even: use upward closure) and C is 1 hop from E (odd: use downward closure).
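For the E-focused case the support is just |A| and the confidence ANDs columns of R over E. A minimal sketch, again with assumed toy data:

```python
def ct(bits: int) -> int:
    return bin(bits).count("1")

# Columns of R(E,F) over E = {1..5}: bit e-1 is set iff R(e,f).
R_col = {1: 0b10111, 2: 0b01101, 3: 0b00010, 4: 0b11001}

def bitmap(A, n=5):
    """P_A: bit e-1 set iff e in A."""
    p = 0
    for e in A:
        p |= 1 << (e - 1)
    return p

def support_E(A):
    """|A| = ct(P_A)."""
    return ct(bitmap(A))

def conf_E(A, C):
    """ct(P_A & &_{f in C} R_f) / ct(P_A)."""
    acc = (1 << 5) - 1
    for f in C:
        acc &= R_col[f]
    p = bitmap(A)
    return ct(p & acc) / ct(p)
```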

ct(&eARe &gCSg) / ct(&eARe)  mncf 2-hop transitive F-focused (focus on middle entity, F) AC strong if: ct(&eARe)  mnsp and C G S(F,G) ct(&eARe &gCSg) / ct(&eARe)  mncf 1 4 1 3 1. (antecedent downward closure) If A is infrequent, so are all of its supersets. 1 2 1 1 2. (consequent downward closure) If AC is non-confident, so is AD for all supersets, D. 3. Apriori for 2-hops: Find all freq antecedents, A, using downward closure. For each: find C1G, the set of g's s.t. A{g} is confident. Find C2G, the set of C1G pairs that are confident consequents for antecedent, A. Find C3G, the set of triples (from C2G) s.t. all subpairs are in C2G (ala Apriori), etc. 1,1 odd so down, down correct. 2 3 4 5 F 2,0 even so up,up is correct. 0,2 even so up,up is correct. 4 1 3 1 2 1 Does standard ARM give the same info after collapsing R and S into T via, 1 2 3 4 G T(E,G) E A  C T(e,g) = 1 iff (&eARe)g = 1 NO! Standard ARM has A and C both subsets of E (or G) If G=E and S(F,E)=transposeR(E,F) then it is standard ARM? What if G=E=F and T is collapsed? Think about the facebook case (where G=E=people, R and S are directional friendships...? ct(&flist&eAReSf)mnsp  mncf ct(&flist&eAReSf & PC) / &flist&eAReSf 1 1 A  E R(E,F) 2-hop trans G-foc ct(&fl&eAReSf)mnsp  mncf ct(&f&eAReSf & PC) / &f&eAReSf 1. (antecedent upward closure) If A is infrequent, then so for are all subsets. 2. (consequent upward closure) If AC is non-confident, so is AD for all subsets, D. 2-hop trans E-foc ct(PA)mnsp  mncf ct(PA&f&gCSgRf ) / ct(PA) 1. (antecedent upward closure) If A is infrequent, so are all subsets. 2. (consequent upward closure) If AC is non-confident, so is AD for all subsets, D. We can for multi-hop relationships from the cards of 1-hop relationships Does this open up a new area of text mining? 
[Figure: Document-Term-Position (DTP) examples. Two bipartite D-T graphs with A⇒C rules (TDP=1, TDP=2), and the RoloDex cards of the 3-dimensional DTP relationship: a Term x Document card for each Position k=1..7 (TDRolodexCd), a Position x Document card for each Term k=1..9 (PDCd), and a Position x Term card for each Document k=1..3 (PTCd).]

RoloDex Model: 2 entities, many relationships. One can form multi-hops with any of these cards. Are there any that provide an interesting setting for ARM data mining?

[Figure: RoloDex vs. Relational Model. Example cards among entities (Customer, Item, People, Terms, Movies, Authors, Docs, Genes, Experiments): cust-item card; "customer rates movie as 5" card; customer-rates-movie card; term-doc card; author-doc card; doc-doc card; gene-gene card (ppi); exp-PI card; exp-gene card; term-term card (share stem?); Course Enrollments; and the DataCube Model for 3 entities (items, people, terms). The Relational Model side lists the same data as tables: Items (i1..i5), People (p1..p4), Terms (t1..t6), and a ternary Relationship over (people, items, terms).]

RoloDex Model (continued): Supp(A) = CusFreq(ItemSet) and Conf(A⇒B) = Supp(A∪B)/Supp(A). An itemset x itemset card records which itemsets serve as antecedents for which consequents.

[Figure: the same RoloDex/Relational examples as the previous slide, with an itemset x itemset card (antecedent axis vs. consequent axis) added for association rules.]

ct(&eARe ct(&eARe &g&hCThSg) / ct(&eARe ct(&f&eAReSf) &hCTh) 3-hop Collapse T: TC≡ {gG|T(g,h) hC} That's just 2-hop case w TCG replacing C. ( can be replaced by  or any other quantifier. The choice of quantifier should match that intended for C.). Collapse T and S: STC≡{fF |S(f,g) gTC} Then it's 1-hop w STC replacing C.  mnsup ct(&eARe S(F,G) R(E,F) 1 2 3 4 E F 5 G A C T(G,H) H Focus on F  mncnf ct(&eARe &g&hCThSg) / ct(&eARe antecedent downward closure: A infreq. implies supersets infreq. A 1-hop from F (down consequent upward closure: AC noncnf implies AD noncnf. DC. C 2-hops (up  mnsp ct(&f&eAReSf) Focus on G  mncnf &hCTh) ct(&f&eAReSf / ct(&f&eAReSf) antecedent upward closure: A infreq. implies all subsets infreq. A 2-hop from G (up) consequent downward closure: AC noncnf impl AD noncnf. DC. C 1-hops (down) ct( 1001 &g=1,3,4 Sg ) /ct(1001) ct( 1001 &1001&1000&1100) / 2 ct( 1000 ) / 2 = 1/2 Focus on F Are they different? Yes, because the confidences can be different numbers. Focus on G. ct(&eARe &glist&hCThSg ) /ct(&eARe &hCTh) ct(&flist&eAReSf / ct(&flist&eAReSf) ct(&f=2,5Sf &1101 ) / ct(&f=2,5Sf ct(1101 & 0011 & / ct(1101 & 0011 ) ct(0001 ) / ct(0001) = 1/1 =1 ct(PA & Rf) f&g&hCThSg / ct(PA)  mncnf  mnsup ct(PA) Focus on E antecedent upward closure: A infreq. implies subsets infreq. A 0-hops from E (up) consequent downward closure: AC noncnf implies AD noncnf. DC. C 3-hops (down) Focus on H antecedent downward closure: A infreq. implies all subsets infreq. A 3-hops from G (down) consequent upward closure: AC noncnf impl AD noncnf. DC. C 0-hops (up) ct(& Tg & PC) g&f&eAReSf mncnf /ct(& Tg) ct(& Tg) mnsp

* ct(&iCUi) ) ct(&f&eAReSf) ct( &f&eAReSf &h&iCUiTh ) 4-hop S(F,G) R(E,F) 1 2 3 4 E F 5 G A C T(G,H) H U(H,I) I Focus on G? Replace C by UC; A by RA as above (not different from 2 hop?) Focus on H (RA for A, use 3-hop) or focus on F (UC for C, use 3-hop). Another focus on G (the main way)  mnsup ct(&f&eAReSf)  mncnf ct( &f&eAReSf &h&iCUiTh ) / ct(&f&eAReSf) ... R(E,G) 1 2 3 4 E G 5 A Sn(G,G) S1(G,G) U(G,I) C I F=G=H=genes and S,T=gene-gene intereactions. More than 3, S1, ..., Sn? &iCUi))+ (ct(S1(&eARe mncnf / ( (ct(&eARe))n * ct(&iCUi) ) &iCUi))+... ct(S2(&eARe &iCUi)) ) ct(Sn(&eARe If the S cube can be implemented so counts can be can be made of the 3-rectangle in blue directly, calculation of confidence would be fast. 4-hop APRIORI focus on G:  mnsup ct(&f&eAReSf)  mncnf ct(&f&eAReSf &h&iCUiTh) / ct(&f&eAReSf) 1. (antecedent upward closure) If A is infrequent, then so are all of its subsets (the "list" will be larger, so the AND over the list will produce fewer ones) Frequency involves only A, so mine all qualifying antecedents using upward closure. 2. (consequent upward closure) If AC is non-confident, then so is AD for all subsets, D, of C (the "list" will be larger, so the AND over the list will produce fewer ones) So  frequent antecedent, A, use upward closure to mine out all confident consequents, C.

ct(&f&eAReSf) ct( &f&eAReSf &h(& )UiTh ) / ct(&f&eAReSf) S(F,G) R(E,F) 1 2 3 4 E F 5 G A C T(G,H) H U(H,I) I V(I,J) J 5-hop Focus on G:  mnsup ct(&f&eAReSf) ct( &f&eAReSf &h(& )UiTh ) / ct(&f&eAReSf)  mncnf i(&jCVj) 5-hop APRIORI focus on G: 1. (antecedent upward closure) If A is infrequent, then so are all of its subsets (the "list" will be larger, so the AND over the list will produce fewer ones) Frequency involves only A, so mine all qualifying antecedents using upward closure. 2. (consequent downward closure) If AC is non-confident, then so is AD for all supersets, D, of C. So  frequent antecedent, A, use downward closure to mine out all confident consequents, C.

ct( &f(& )ReSf) ct( &f(& )ReSf &h(& )UiTh) / ct( &f(& )ReSf ) 6-hop S(F,G) R(E,F) 1 2 3 4 E F 5 G A C T(G,H) H U(H,I) I V(I,J) J D Q(D,E) The conclusion we have demonstrated (but not proven) is: for (a+c)-hop transitive Apriori ARM with focus the entity which is a hops from the antecedent and c hops from the consequent, if a/c is odd/even use downward/upward closure on that step in the mining of strong (frequent and confident) rules. Focus on G: ct( &f(& )ReSf)  mnsup e(&dDQd) ct( &f(& )ReSf &h(& )UiTh) / e(&dDQd) i(&jCVj) ct( &f(& )ReSf )  mncnf e(&dDQd) 6-hop APRIORI: 1. (antecedent downward closure) If A is infrequent, then so are all of its supersetsbsets. Frequency involves only A, so mine all qualifying antecedents using downward closure. 2. (consequent downward closure) If AC is non-confident, then so is AD for all supersets, D, of C. So  frequent antecedent, A, use downward closure to mine out all confident consequents, C.

Given any 1-hop labeled relationship (e.g., cells have values from {1,2,…,n}), then there is:
1. a natural n-hop transitive relationship, A implies D, by alternating entities for each specific label-value relationship;
2. cards for each entity consisting of the bit-slices of the cell values.
E.g., in Netflix, Rating(Cust,Movie) has label set {0,1,2,3,4,5}, so by 1 it generates a bona fide 6-hop transitive relationship. For 2, an alternative is to bitmap each label value (rather than bit-slicing them). Below, R_{n-i} can be bit-slices or bitmaps.

[Figure: alternating hops R0(M,C), R1(C,M), R2(M,C), R3(C,M), R4(M,C), R5(C,M) from A to D, and the general chain R0(E,F), …, R_{n-2}(E,F), R_{n-1}(E,F).]

E.g., equity trading on a given day: QuantityBought(Cust,Stock) with labels {0,1,2,3,4,5} (where n means n thousand shares) generates a bona fide 6-hop transitive relationship.
E.g., equity trading, moved similarly: define "moved similarly on a day" → StockStock(#DaysMovedSimilarlyOfLast10).
E.g., equity trading, moved similarly2: define "moved similarly" to mean that stock2 moved similarly to what stock1 did the previous day; define the relationship StockStock(#DaysMovedSimilarlyOfLast10).
E.g., Gene-Experiment: label values could be "expression level"; intervalize and go!

Has Strong Transitive Rule Mining (STRM) been done? Are there downward and upward closure theorems already for it? Is it useful? That is, are there good examples of use: stocks, gene-experiment, MBR, Netflix predictor, ...?
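The two encodings the slide mentions for a labeled relationship like Rating(Cust,Movie), per-label bitmaps versus bit-slices, might look like this; the tiny Rating table is an assumption:

```python
# Rating[c] = list of labels (0..5) over movies m = 0..3.
Rating = {0: [5, 0, 3, 1], 1: [2, 5, 0, 4]}

def label_bitmaps(row, labels=range(6)):
    """One bitmap per label value v: bit m is set iff row[m] == v."""
    maps = {}
    for v in labels:
        bits = 0
        for m, lab in enumerate(row):
            if lab == v:
                bits |= 1 << m
        maps[v] = bits
    return maps

def bit_slices(row, width=3):
    """One bit-slice per binary digit: slice k has bit m set iff
    bit k of row[m] is 1 (labels 0..5 need 3 slices)."""
    return [sum(((lab >> k) & 1) << m for m, lab in enumerate(row))
            for k in range(width)]
```

Per-label bitmaps give the 6 alternating hops R0..R5 directly; bit-slices give the more compact pTree-style encoding (3 slices instead of 6 bitmaps).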

ct(&iABBi &tDBt) / ct(&iABBi)  mncf Let Types be an entity which clusters Items (moves Items up the semantic hierarchy), E.g., in a store, Types might include; dairy, hardware, household, canned, snacks, baking, meats, produce, bakery, automotive, electronics, toddler, boys, girls, women, men, pharmacy, garden, toys, farm). Let A be an ItemSet wholly of one Type, TA, and l et D by a TypesSet which does not include TA. Then: 1 Buys(C,T) BoughtBy(I,C,) Items Customers 2 3 4 5 Types (of Items) A  D 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 AD might mean If iA s.t. BB(i,c) then tT, B(c,t) AD might mean If iA s.t. BB(i,c) then tT, B(c,t) AD might mean If iA s.t. BB(i,c) then tT, B(c,t) AD might mean If iA s.t. BB(i,c) then tT, B(c,t) AD confident might mean ct(&iABBi &tDBt) / ct(&iABBi)  mncf ct(&iABBi | tDBt) / ct(&iABBi)  mncf ct( | iABBi | tDBt) / ct( | iABBi)  mncf ct( | iABBi &tDBt) / ct( | iABBi)  mncf AD frequent might mean ct(&iABBi)  mnsp ct( | iABBi)  mnsp ct(&tDBt)  mnsp ct( | tDBt)  mnsp ct(&iABBi &tDBt)  mnsp, etc.

ct(&iABBi &tDBt) / ct(&iABBi)  mncf Let Types be an entity which clusters Items (moves Items up the semantic hierarchy), E.g., in a store, Types might include; dairy, hardware, household, canned, snacks, baking, meats, produce, bakery, automotive, electronics, toddler, boys, girls, women, men, pharmacy, garden, toys, farm). Let A be an ItemSet wholly of one Type, TA, and l et D by a TypesSet which does not include TA. Then: 1 Buys(C,T) BoughtBy(I,C,) Items Customers 2 3 4 5 Types (of Items) A  D 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 AD might mean If iA s.t. BB(i,c) then tT, B(c,t) AD might mean If iA s.t. BB(i,c) then tT, B(c,t) AD might mean If iA s.t. BB(i,c) then tT, B(c,t) AD might mean If iA s.t. BB(i,c) then tT, B(c,t) AD confident might mean ct(&iABBi &tDBt) / ct(&iABBi)  mncf ct(&iABBi | tDBt) / ct(&iABBi)  mncf ct( | iABBi | tDBt) / ct( | iABBi)  mncf ct( | iABBi &tDBt) / ct( | iABBi)  mncf AD frequent might mean ct(&iABBi)  mnsp ct( | iABBi)  mnsp ct(&tDBt)  mnsp ct( | tDBt)  mnsp ct(&iABBi &tDBt)  mnsp, etc.