1
RoloDex Model
The Data Cube Model gives a great picture of relationships, but it can become gigantic: instances are bitmapped rather than listed, so there must be a position for every potential instance, not just every extant instance. This inefficiency is especially severe in the very common Bipartite, Unipartite-on-Part (BUP) relationships. Examples:
In Bioinformatics, bipartite relationships between genes (one entity) and experiments or treatments (another entity) are studied in conjunction with unipartite relationships on the gene part (e.g., gene-gene or protein-protein interactions).
In Market Research, bipartite relationships between items and customers are studied in conjunction with unipartite relationships on the customer part (or on the product part, or both).
For this situation, the Relational Model provides no picture, and the Data Cube Model is too inefficient (it requires that the unipartite relationship be redundantly replicated for every instance of the other bi-part). We suggest the RoloDex Model.
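To make the size argument concrete, here is a minimal sketch that counts bits for an Experiment x Gene x Gene BUP relationship. It assumes the Data Cube is a plain bitmap with one position per potential (exp, gene, gene) cell, so the Gene x Gene relationship is repeated for every experiment, while the RoloDex keeps one Exp x Gene card and one Gene x Gene card sharing the Gene axis. The function names and the sizes are hypothetical.

```python
# Hypothetical storage comparison for an Exp x Gene x Gene (BUP) relationship.
# Assumption: the Data Cube uses one bit per potential (exp, gene, gene) cell,
# i.e. the Gene x Gene relationship is replicated for every experiment.


def data_cube_bits(n_exp: int, n_gene: int) -> int:
    """Bits for the Exp x Gene x Gene bitmap cube."""
    return n_exp * n_gene * n_gene


def rolodex_bits(n_exp: int, n_gene: int) -> int:
    """Bits for one Exp x Gene card plus one Gene x Gene card sharing the Gene axis."""
    return n_exp * n_gene + n_gene * n_gene


if __name__ == "__main__":
    n_exp, n_gene = 100, 20_000  # illustrative sizes, not from the slides
    print(data_cube_bits(n_exp, n_gene))  # 40000000000
    print(rolodex_bits(n_exp, n_gene))    # 402000000
```

Under these assumptions the cube grows with the product of all three axis sizes, while the cards grow only with the sum of the two pairwise products.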
2
The Bipartite, Unipartite-on-Part Experiment Gene Relationship (EGG)
[Figure: an Exp x G card (Exp 1 to 4) and a unipartite G x G card, each drawn with its own copy of the G axis.]
So as not to duplicate axes, this copy of G should be folded over to coincide with the other copy, producing a "conical" unipartite card.
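One way to picture the folding is a structure in which each axis is stored exactly once and every card refers to its axes by name, so the Exp x G card and the G x G card share the same G axis. The sketch below assumes that representation; the `Card` and `RoloDex` classes, the `cards_on` helper and the toy pairs are hypothetical illustrations, not a published API.

```python
from dataclasses import dataclass, field


@dataclass
class Card:
    """A binary relationship between two axes, stored as a set of (x, y) pairs."""
    name: str
    axes: tuple[str, str]  # a unipartite card names the same axis twice
    pairs: set[tuple[int, int]] = field(default_factory=set)


@dataclass
class RoloDex:
    """Each axis (entity) is stored once; cards refer to axes by name,
    so the G axis of an Exp x G card and of a G x G card is the same axis."""
    axes: dict[str, list[int]] = field(default_factory=dict)
    cards: dict[str, Card] = field(default_factory=dict)

    def add_card(self, card: Card) -> None:
        # Reject cards that mention an axis the RoloDex does not hold.
        for axis in card.axes:
            if axis not in self.axes:
                raise KeyError(f"unknown axis {axis!r}")
        self.cards[card.name] = card

    def cards_on(self, axis: str) -> list[str]:
        """Names of all cards that use the given axis, in insertion order."""
        return [c.name for c in self.cards.values() if axis in c.axes]


if __name__ == "__main__":
    rd = RoloDex(axes={"Exp": [1, 2, 3, 4], "G": [1, 2, 3, 4]})
    rd.add_card(Card("expgene", ("Exp", "G"), {(1, 1), (2, 3)}))  # toy pairs
    rd.add_card(Card("genegene", ("G", "G"), {(1, 3)}))           # toy pairs
    print(rd.cards_on("G"))  # ['expgene', 'genegene']
```

Listing the cards on an axis is the basic step needed to find cards that share an axis, which is what the cousin rules later in the deck build on.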
3
RoloDex Model
[Figure: a RoloDex of cards sharing axes: customer-item card, author-doc card, term-doc card, doc-doc card, term-term card (share stem?), exp-gene card, gene-gene card (ppi), exp-PI card, and an itemset-itemset card with antecedent itemsets on one axis.]
For an Axis-Card pair (Entity-Relationship pair), $a \in c(a,b)$, a support count (or ratio or %) for AxisSets $A$ is, for a graph relationship, $\mathrm{supp}_G(A,\ a \in c(a,b)) = |\{b : \exists a \in A,\ (a,b) \in c\}|$, and, for a multigraph, $\mathrm{supp}_{MG}$ is the histogram over $b$ of the $(a,b)$ edge counts, $a \in A$. Other quantifiers can also be used (e.g., the universal quantifier, $\forall$, is used in MBR).
Most interestingness measures are based on one of these supports.
In IR, $\mathrm{df}(t) = \mathrm{supp}_G(\{t\},\ t \in c(t,d))$, and $\mathrm{tf}(t,d)$ is the single histogram bar for $d$ in $\mathrm{supp}_{MG}(\{t\},\ t \in c(t,d))$.
In MBR, $\mathrm{supp}(I) = \mathrm{supp}_G(I,\ i \in c(i,t))$.
In MDA, $\mathrm{supp}_{MG}(GSet,\ g \in c(g,e))$.
Of course, all supports are inherited redundantly by the card, $c(a,b)$.
For an ItemSet $A$, $\mathrm{Supp}(A) = \mathrm{CusFreq}(A)$ and $\mathrm{Conf}(A \to B) = \mathrm{Supp}(A \cup B)/\mathrm{Supp}(A)$.
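The supports above can be sketched directly, assuming a card is stored as a set of (a, b) pairs and a multigraph as a list of possibly repeated pairs. The function names and the toy term-doc and item-transaction data are hypothetical; `df` and `tf` are read off exactly as the slide describes.

```python
from collections import Counter
from collections.abc import Iterable

Pair = tuple[str, str]  # (a, b) instance of a card c(a, b)


def supp_exists(A: set[str], card: set[Pair]) -> int:
    """supp_G: |{b : exists a in A with (a, b) in card}|."""
    return len({b for (a, b) in card if a in A})


def supp_forall(A: set[str], card: set[Pair]) -> int:
    """Universal-quantifier support, as in MBR: |{b : (a, b) in card for all a in A}|.
    Candidate b values are taken from the card itself (sufficient for non-empty A)."""
    candidates = {b for (_, b) in card}
    return len({b for b in candidates if all((a, b) in card for a in A)})


def supp_multigraph(A: set[str], edges: Iterable[Pair]) -> Counter:
    """supp_MG: histogram over b of (a, b) edge counts, a in A (edges may repeat)."""
    return Counter(b for (a, b) in edges if a in A)


if __name__ == "__main__":
    # Toy term-doc multigraph: one edge per occurrence of a term in a document.
    occurrences = [("data", "d1"), ("data", "d1"), ("data", "d2"), ("cube", "d2")]
    print(supp_exists({"data"}, set(occurrences)))       # df("data") = 2
    print(supp_multigraph({"data"}, occurrences)["d1"])  # tf("data", "d1") = 2
    # Toy item-transaction card for MBR support.
    buys = {("milk", "t1"), ("bread", "t1"), ("milk", "t2")}
    print(supp_forall({"milk", "bread"}, buys))          # 1 (only t1)
```

The existential and universal versions differ only in the quantifier, which is why the same card supports both IR-style and MBR-style counts.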
4
Cousin Association Rule Mining Approach (CARMA)
For a card (relationship) $c(I,T)$, one has Association Rules among disjoint Isets, $A \to C$ with $A, C \subseteq I$ and $A \cap C = \emptyset$, and Association Rules among disjoint Tsets, $A \to C$ with $A, C \subseteq T$ and $A \cap C = \emptyset$. Two measures of the quality of $A \to C$ are
$\mathrm{SUPP}(A \cup C)$, where, e.g., for any Iset $A$, $\mathrm{SUPP}(A) \equiv |\{t \mid (i,t) \in E\ \forall i \in A\}|$, and
$\mathrm{CONF}(A \to C) = \mathrm{SUPP}(A \cup C)/\mathrm{SUPP}(A)$.
First Cousin Association Rules: given any card sharing an axis with the bipartite relationship $B(T,I)$, e.g., $C(T,U)$, Cousin Association Rules are those in which the antecedent Tset is generated by a subset $S$ of $U$ as follows: $\{t \in T \mid \exists u \in S \text{ such that } (t,u) \in C\}$. (Note that this should be called an "existential first cousin AR", since we are using the existential quantifier. One can also use the universal quantifier, as in MBR ARs.) E.g., for $S \subseteq U$, $A = C(S)$ and $A' \subseteq T$, $A \to A'$ is a CAR, and we can also label it $S \to A'$.
First Cousin Association Rules Once Removed (FCAR1Rs) are those in which both Tsets are generated by another bipartite relationship, and we can label the antecedent and/or the consequent using either the generating set or the Tset.
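A minimal sketch of an existential first cousin AR, assuming both cards are stored as sets of pairs. Hypothetically, $T$ is customers, $I$ is items and $U$ is regions, so $C(T,U)$ is a made-up "customer lives in region" card; all names and data are illustrative only.

```python
# Toy data and names are hypothetical: T = customers, I = items, U = regions.


def generate_tset(S: set[str], C: set[tuple[str, str]]) -> set[str]:
    """Existential generation of a Tset from S ⊆ U: {t : exists u in S, (t, u) in C}."""
    return {t for (t, u) in C if u in S}


def supp(tset: set[str], B: set[tuple[str, str]]) -> int:
    """SUPP of a Tset on card B(T, I): |{i : (t, i) in B for all t in tset}|."""
    items = {i for (_, i) in B}
    return len({i for i in items if all((t, i) in B for t in tset)})


def conf(A: set[str], A_prime: set[str], B: set[tuple[str, str]]) -> float:
    """CONF(A -> A') = SUPP(A ∪ A') / SUPP(A); A and A' must be disjoint."""
    if A & A_prime:
        raise ValueError("antecedent and consequent must be disjoint")
    denom = supp(A, B)
    return supp(A | A_prime, B) / denom if denom else 0.0


if __name__ == "__main__":
    B = {("c1", "milk"), ("c1", "bread"), ("c2", "milk"), ("c2", "bread"), ("c3", "milk")}
    C = {("c1", "north"), ("c2", "north"), ("c3", "south")}
    A = generate_tset({"north"}, C)  # {'c1', 'c2'}
    print(supp(A, B), conf(A, {"c3"}, B))  # 2 0.5, i.e. the CAR {north} -> {c3}
```

Swapping the comprehension in `generate_tset` for an "all" test over $S$ would give the universal variant mentioned in the note.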
5
The Cousin Association Rule Mining Approach (CARMA)
Second Cousin Association Rules (2CARs) are those in which the antecedent Tset is generated by a subset of an axis which shares a card with $T$, which shares the card $B$ with $I$. 2CARs can be denoted using either the generating (second cousin) set or the Tset antecedent.
Second Cousin Association Rules once removed (2CAR-1rs) are those in which the antecedent Tset is generated by a subset of an axis which shares a card with $T$, which shares the card $B$ with $I$, and the consequent is generated by $C(T,U)$ (a first cousin Tset). 2CAR-1rs can be denoted using any combination of the generating (second cousin) set or the Tset antecedent and the generating (first cousin) set or the Tset consequent.
Second Cousin Association Rules twice removed (2CAR-2rs) are those in which the antecedent Tset is generated by a subset of an axis which shares a card with $T$, which shares the card $B$ with $I$, and the consequent is generated by a subset of an axis which shares a card with $T$, which shares another first cousin card with $I$. 2CAR-2rs can be denoted using any combination of the generating (second cousin) set or the Tset antecedent and the generating (second cousin) set or the Tset consequent. Note that 2CAR-2rs are also 2CAR-1rs, so they can be denoted as above as well.
Third Cousin Association Rules are those....
We note that these definitions give us many opportunities to define quality measures.
6
Measuring CARMA Quality in the RoloDex Model
[Figure: the RoloDex of cards, as on slide 3, including the gene-gene card (ppi).]
For Distance CARMA relationships, quality (e.g., supp, conf, or some other measure?) can be measured using information on any or all cards along the relationship (multiple cards could contribute factors or terms, or contribute in some other way?).
7
Generalized CARMA:
First, we propose a definition of Generalized Association Rules (GARs), which contains the standard "1 Entity Itemset" AR definition as a special case.
Given relationships $R_1$, $R_2$ (RoloDex cards) with a shared entity $E_2$ (axis), $E_1 \stackrel{R_1}{-} E_2 \stackrel{R_2}{-} E_3$, and given $A \subseteq E_1$ and $C \subseteq E_3$, then $A \to C$ is a Generalized $E_2$ Association Rule, with
$$\mathrm{Support}_{R_1R_2}(A \to C) = |\{t \in E_2 \mid \forall a \in A,\ (a,t) \in R_1 \text{ and } \forall c \in C,\ (c,t) \in R_2\}|$$
$$\mathrm{Confidence}_{R_1R_2}(A \to C) = \mathrm{Support}_{R_1R_2}(A \to C) / \mathrm{Support}_{R_1}(A)$$
where, as always, $\mathrm{Support}_{R_1}(A) = |\{t \in E_2 \mid \forall a \in A,\ (a,t) \in R_1\}|$.
If $E_3 = E_1$, the GAR is a standard AR iff $A \cap C = \emptyset$.
Association Pathway Mining (APM) is a DM technique (with application to bioinformatics?): the identification and assessment (e.g., support, confidence, etc.) of chains of GARs in a RoloDex. Restricting to the mining of cousin GARs reduces the number of strong rules or pathway links.
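The GAR support and confidence formulas translate almost symbol for symbol into set comprehensions. This sketch assumes $R_1$ is stored as $(a, t)$ pairs and $R_2$ as $(c, t)$ pairs, matching the notation above; the function names and the toy cards are hypothetical.

```python
Pairs = set[tuple[str, str]]


def support_set(A: set[str], C: set[str], R1: Pairs, R2: Pairs, E2: set[str]) -> set[str]:
    """{t in E2 : (a, t) in R1 for all a in A and (c, t) in R2 for all c in C}."""
    return {t for t in E2
            if all((a, t) in R1 for a in A) and all((c, t) in R2 for c in C)}


def gar_support(A: set[str], C: set[str], R1: Pairs, R2: Pairs, E2: set[str]) -> int:
    return len(support_set(A, C, R1, R2, E2))


def gar_confidence(A: set[str], C: set[str], R1: Pairs, R2: Pairs, E2: set[str]) -> float:
    # Support_R1(A) is the support set with an empty consequent.
    denom = gar_support(A, set(), R1, R2, E2)
    return gar_support(A, C, R1, R2, E2) / denom if denom else 0.0


if __name__ == "__main__":
    E2 = {"g1", "g2", "g3"}
    R1 = {("e1", "g1"), ("e1", "g2"), ("e2", "g2"), ("e2", "g3")}  # E1 x E2, toy
    R2 = {("p1", "g2"), ("p1", "g3"), ("p2", "g1")}                # E3 x E2, toy
    print(gar_support({"e1"}, {"p1"}, R1, R2, E2))     # 1 (only g2)
    print(gar_confidence({"e1"}, {"p1"}, R1, R2, E2))  # 0.5
```

Using an empty consequent to obtain $\mathrm{Support}_{R_1}(A)$ works because a universal condition over an empty set is vacuously true.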
8
More generally, for $A \subseteq E_1 \stackrel{R_1}{-} E_2 \stackrel{R_2}{-} E_3 \supseteq C$:
$$\mathrm{Support\text{-}Set}_{R_1R_2}(A \to C) = SS_{R_1R_2}(A \to C) = \{t \in E_2 \mid \forall a \in A\ (a,t) \in R_1,\ \forall c \in C\ (c,t) \in R_2\}$$
If $E_2$ has real labels,
$$\mathrm{Label\text{-}Weighted\text{-}Support}_{R_1R_2}(A \to C) = LWS_{R_1R_2}(A \to C) = \sum_{t \in SS_{R_1R_2}(A \to C)} \mathrm{label}(t)$$
Downward closure property of Support Sets: $SS(A' \to C') \supseteq SS(A \to C)$ for all $A' \subseteq A$, $C' \subseteq C$.
Therefore, if all labels are non-negative, then $LWS(A \to C) \le LWS(A' \to C')$; that is, a necessary condition for $LWS(A \to C)$ to exceed a threshold is that all $LWS(A' \to C')$ exceed that threshold, for all $A' \subseteq A$, $C' \subseteq C$. So an Apriori-like frequent set pair mine would go as follows: start with pairs of 1-sets (in $E_1$ and $E_3$). The only candidate 2-antecedents with 1-consequents (equivalently, 2-consequents with 1-antecedents) would be those formed by joining...
The weighted support concept can be extended to the case where $R_1$ and/or $R_2$ have labels as well.
Vertical methods can be applied by converting $E_2$ to vertical format ($E_2$ instances are the rows, and pertinent features from other cards/axes are "rolled over" to $E_2$ as derived feature attributes).
[Figure: cards $R_1$ and $R_2$ sharing the axis $E_2$, whose instances carry labels ($l_{2,2}$, $l_{2,3}$, ...); $A \subseteq E_1$, $C \subseteq E_3$, and the support set $SS_{R_1R_2}$ on $E_2$.]
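A minimal sketch of a vertical, label-weighted mine under stated assumptions: each card is rolled over to $E_2$ as one bitmask per entity, $SS(A \to C)$ is the bitwise AND of the masks, and $LWS$ sums the labels of the surviving bits. Candidates are generated by extending a frequent pair by one element and pruned with the downward-closure check; this is an Apriori-style variant, not necessarily the join the slide elides. All names and the toy labels are hypothetical.

```python
Pair = tuple[frozenset, frozenset]  # (A, C) with A ⊆ E1, C ⊆ E3


def to_vertical(R: set[tuple[str, str]], E2: list[str]) -> dict[str, int]:
    """Roll a card over to E2: one bitmask per entity x, bit j set iff (x, E2[j]) in R."""
    index = {t: j for j, t in enumerate(E2)}
    masks: dict[str, int] = {}
    for x, t in R:
        masks[x] = masks.get(x, 0) | (1 << index[t])
    return masks


def lws(A, C, m1, m2, labels) -> float:
    """Label-weighted support: sum of label(t) over t in SS(A -> C), via bitwise AND."""
    mask = (1 << len(labels)) - 1
    for x in A:
        mask &= m1.get(x, 0)
    for x in C:
        mask &= m2.get(x, 0)
    return sum(w for j, w in enumerate(labels) if mask >> j & 1)


def sub_pairs(A, C):
    """Pairs obtained by dropping one element from A or from C (both stay non-empty)."""
    drop_a = [(A - {a}, C) for a in A] if len(A) > 1 else []
    drop_c = [(A, C - {c}) for c in C] if len(C) > 1 else []
    return drop_a + drop_c


def mine(E1, E3, R1, R2, E2, labels, min_lws) -> set[Pair]:
    """Level-wise search for (A, C) pairs with LWS >= min_lws.
    Pruning relies on downward closure, so labels must be non-negative."""
    if any(w < 0 for w in labels):
        raise ValueError("downward-closure pruning needs non-negative labels")
    m1, m2 = to_vertical(R1, E2), to_vertical(R2, E2)
    level = {(frozenset([a]), frozenset([c])) for a in E1 for c in E3}
    level = {p for p in level if lws(*p, m1, m2, labels) >= min_lws}
    frequent = set(level)
    while level:
        candidates = set()
        for A, C in level:
            candidates |= {(A | {a}, C) for a in E1 if a not in A}
            candidates |= {(A, C | {c}) for c in E3 if c not in C}
        # Keep a candidate only if every one-smaller sub-pair is already frequent.
        level = {p for p in candidates
                 if all(s in frequent for s in sub_pairs(*p))
                 and lws(*p, m1, m2, labels) >= min_lws}
        frequent |= level
    return frequent


if __name__ == "__main__":
    E2 = ["g1", "g2", "g3"]
    labels = [1.0, 2.0, 0.5]  # hypothetical real labels on E2, in E2 order
    R1 = {("e1", "g1"), ("e1", "g2"), ("e2", "g2"), ("e2", "g3")}
    R2 = {("p1", "g2"), ("p1", "g3"), ("p2", "g1")}
    print(len(mine(["e1", "e2"], ["p1", "p2"], R1, R2, E2, labels, 1.0)))  # 4
```

The bitmask representation is one simple stand-in for the vertical format described above; any structure that supports fast intersection over $E_2$ would serve the same role.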
9
RoloDex Model
[Figure: the RoloDex of cards, as on slide 3 (including the itemset-itemset card, with $\mathrm{Supp}(A) = \mathrm{CusFreq}(A)$ and $\mathrm{Conf}(A \to B) = \mathrm{Supp}(A \cup B)/\mathrm{Supp}(A)$), extended with a "customer rates movie" card holding ratings 1 to 5 and its decomposition into binary cards: a "customer rates movie as 1" card through a "customer rates movie as 5" card.]
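The decomposition of a labeled card into one binary card per rating value can be sketched as follows, assuming the labeled card is a dictionary from (customer, movie) pairs to ratings. The function name and the toy ratings are hypothetical.

```python
def slice_by_value(card: dict[tuple[str, str], int], values) -> dict[int, set[tuple[str, str]]]:
    """Split a labeled card {(customer, movie): rating} into one binary card per value."""
    return {v: {pair for pair, r in card.items() if r == v} for v in values}


if __name__ == "__main__":
    rates = {("c1", "m1"): 5, ("c1", "m2"): 1, ("c2", "m1"): 3}  # toy ratings
    cards = slice_by_value(rates, range(1, 6))
    print(cards[5])  # {('c1', 'm1')}
    # The binary cards partition the rated pairs, so the labeled card is recoverable.
    assert set().union(*cards.values()) == set(rates)
```

Each resulting binary card can then be treated like any other unlabeled card in the RoloDex, so the existential and universal supports apply to it unchanged.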