A hop is a relationship, R, hopping from entity E to entity F. Strong Rule Mining finds all frequent, confident rules A → C over R(E,F).


1 A hop is a relationship, R, hopping from entity E to entity F. Strong Rule Mining (SRM) finds all frequent, confident rules over R(E,F). [figure: example matrix R(E,F), E = {1,2,3,4}, F = {2,3,4,5}]

SRMs are categorized by the number of hops, k, by whether they are transitive or non-transitive, and by the focus entity.

ARM is 1-hop, non-transitive (A,C ⊆ E), F-focused SRM (1nF). A → C is strong if:
confidence: ct(&_{e∈A} R_e & P_C) / ct(&_{e∈A} R_e) ≥ mncf
support: ct(&_{e∈A} R_e) ≥ mnsp
consequent upward closure: if A → C is non-confident, then so is A → D for all subsets, D, of C. So, for each frequent antecedent, A, use upward closure to mine for all of its confident consequents.
antecedent downward closure: if A is frequent, all of its subsets are frequent; equivalently, if A is infrequent, then so are all of its supersets. Since frequency involves only A, we can mine for all qualifying antecedents efficiently using downward closure.

Conjecture: in transitive (a+c)-hop Apriori strong rule mining with a focus entity which is a hops from the antecedent and c hops from the consequent, if the hop distance on a step is odd, use downward closure on that step; if even, use upward closure. Here A is 1 hop from F (odd: downward closure) and C is 0 hops from F (even: upward closure). We will check more examples to see whether the "odd → downward, even → upward" theorem seems to hold.

1-hop, transitive (A ⊆ E, C ⊆ F), F-focused SRM (1tF).

1-hop, transitive, E-focused SRM (1tE), rule A → C:
confidence: ct(P_A & &_{f∈C} R_f) / ct(P_A) ≥ mncf
support: |A| = ct(P_A) ≥ mnsp
antecedent upward closure: if A is infrequent, then so are all of its subsets.
consequent downward closure: if A → C is non-confident, then so is A → D for all supersets, D, of C.
Here A is 0 hops from E (even: upward closure) and C is 1 hop from E (odd: downward closure).
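The 1nF counts above can be sketched with ordinary Python bit-vectors (a toy stand-in for pTrees). All data here is hypothetical, and `bit_and`/`ct` are illustrative helpers, not the pTree API:

```python
# Minimal sketch of 1-hop, F-focused strong-rule checking with bit-vectors.
# R[e] is a hypothetical 0/1 tuple over F saying which f's relate to entity e.

def bit_and(vectors):
    """Component-wise AND of equal-length 0/1 tuples."""
    out = vectors[0]
    for v in vectors[1:]:
        out = tuple(a & b for a, b in zip(out, v))
    return out

def ct(v):
    """Count of 1-bits (the ct(...) of the slides)."""
    return sum(v)

R = {1: (0, 0, 0, 1, 0),   # hypothetical toy relationship R(E,F)
     2: (0, 0, 1, 0, 0),
     3: (0, 0, 0, 1, 0),
     4: (0, 1, 0, 0, 0)}
P_C = (0, 0, 0, 1, 0)      # bitmap of the consequent predicate over F

A = [1, 3]
and_A = bit_and([R[e] for e in A])
support = ct(and_A)                                 # ct(&_{e in A} R_e)
confidence = ct(bit_and([and_A, P_C])) / ct(and_A)  # ct(&R_e & P_C)/ct(&R_e)
```

Compare `support` to mnsp and `confidence` to mncf to decide whether A → C is strong.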
2-hop transitive F-focused SRM (2tF), with S(F,G) following R(E,F). [figure: example matrices R(E,F) and S(F,G), G = {1,2,3,4}]
A → C is strong if:
confidence: ct(&_{e∈A} R_e & &_{g∈C} S_g) / ct(&_{e∈A} R_e) ≥ mncf
support: ct(&_{e∈A} R_e) ≥ mnsp
Apriori for 2 hops: find all frequent antecedents, A, using downward closure. For each, find C1G, the set of g's such that A → {g} is confident; then C2G, the set of C1G pairs that are confident consequents for antecedent A; then C3G, the set of triples (from C2G) such that all subpairs are in C2G (a la Apriori); etc. The hop distances are 1,1 (both odd), so down, down is correct.

2-hop transitive G-focused SRM (2tG):
confidence: ct(&_{f ∈ list(&_{e∈A} R_e)} S_f & P_C) / ct(&_{f ∈ list(&_{e∈A} R_e)} S_f) ≥ mncf
support: ct(&_{f ∈ list(&_{e∈A} R_e)} S_f) ≥ mnsp
1. antecedent upward closure: if A is infrequent, then so are all of its subsets.
2. consequent upward closure: if A → C is non-confident, so is A → D for all subsets, D.
The hop distances are 2,0 (both even), so up, up is correct.

2-hop transitive E-focused SRM (2tE):
confidence: ct(P_A & &_{f ∈ list(&_{g∈C} S_g)} R_f) / ct(P_A) ≥ mncf
support: ct(P_A) ≥ mnsp
antecedent upward closure: if A is infrequent, so are all of its subsets.
consequent upward closure: if A → C is non-confident, so is A → D for all subsets, D.
The hop distances are 0,2 (both even), so up, up is correct.

A → C is confident if a high fraction of the f∈F which are related to every a∈A are also related to every c∈C. F is the Focus Entity and the high fraction is the MinimumConfidence ratio.
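The 2tF test (ct of the ANDed R_e's further ANDed with the S_g's, over the focus entity F) can be sketched the same way. `strong`, `R`, and `S` below are hypothetical toy stand-ins, not the pTree implementation:

```python
# Hedged sketch of the 2-hop, F-focused strength test with hypothetical data.
# R[e] and S[g] are 0/1 tuples over the focus entity F.

def bit_and(vectors):
    """Component-wise AND of equal-length 0/1 tuples."""
    out = vectors[0]
    for v in vectors[1:]:
        out = tuple(a & b for a, b in zip(out, v))
    return out

ct = sum  # ct(...) = number of 1-bits

R = {1: (1, 0, 0, 1), 2: (1, 0, 1, 1)}  # antecedent side R_e, over F
S = {1: (1, 1, 0, 1), 2: (0, 1, 1, 1)}  # consequent side S_g, over F

def strong(A, C, mncf, mnsp):
    """A -> C strong iff ct(&R_e) >= mnsp and ct(&R_e & &S_g)/ct(&R_e) >= mncf."""
    and_R = bit_and([R[e] for e in A])
    if ct(and_R) < mnsp:
        return False
    and_both = bit_and([and_R] + [S[g] for g in C])
    return ct(and_both) / ct(and_R) >= mncf

result = strong([1, 2], [1], mncf=0.5, mnsp=2)
```

The support check short-circuits first, mirroring the Apriori order: antecedent frequency before consequent confidence.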

2 We can form multi-hop relationships from RoloDex cards. Does this open up a new area of text mining for the three DTPe Rolodexes? [figure: TD Rolodex cards (one per Pos k=1..7), PD cards (one per Term k=1..9), PT cards (one per Doc k=1..3)]
Recall: A → C is confident if a high fraction of the f∈F which are related to every a∈A are also related to every c∈C. F is the Focus Entity and the high fraction is the MinimumConfidence ratio.
A confident DThk rule (cards DT(P=h), DT(P=k)) means: a high fraction of the terms t∈T in Position=h of every doc in A are also in Position=k of every doc in C. Is there a high-payoff research area here?
A confident DPhk rule (cards DP(T=h), DP(T=k)) means: a high fraction of the positions p∈P which hold Term=h for every doc in A also hold Term=k in Pos=p for every doc in C. Is this a high-payoff research area?
A confident TPhk rule (cards TP(D=h), TP(D=k)) means: a high fraction of the positions p∈P in Doc=h which hold every term t∈A also hold every term t∈C in Doc=k. This only makes sense for A and C singleton terms, and it seems P would have to be a singleton as well. Is this a high-payoff research area?
A confident TDhk rule (cards TD(P=h), TD(P=k)) means: a high fraction of the documents d∈D having, in Position=h, every term t∈A also have, in Position=k, every term t∈C. Again, A and C must be singletons. High payoff?
This suggests, in 1-hop ARM, looking for strong TD rules: a high fraction of the documents d∈D having every term t∈A also have every term t∈C. Again, A and C must be singletons. Is there a high-payoff research area here?
A confident PDhk rule (cards PD(T=h), PD(T=k)) means: a high fraction of the documents d∈D having Term=h in every position p∈A also have Term=k in every position p∈C. High payoff?
A confident PThk rule (cards PT(D=h), PT(D=k)) means: a high fraction of the terms t∈T in Doc=h which occur at every position p∈A also occur at every position p∈C in Doc=k. Is this a high-payoff research area?

3 More on forming multi-hop relationships from RoloDex cards. A → C is confident if a high fraction of the f∈F which are related to every a∈A are also related to every b∈B. F is the Focus Entity and the high fraction is the MinimumConfidence ratio.
Consider the Market Basket RoloDex (a different Cust-Item card, Buys(Day=k), for each day). [figure: example Buys(Day=1) and Buys(Day=2) cards]
A confident Buy12 rule means: if some customers buy all of A on Day=1, then most of those customers will buy all of B on Day=2.
"Buys" pathways?
A confident Buy123 pathway means: if some customers buy all of A on Day=1, then most of those customers will buy all of B on Day=2, and most of those customers will buy all of D on Day=3.
A confident Buy1234 pathway means: if some customers buy all of A on Day=1, then most of those customers will buy all of B on Day=2, then most of those customers will buy all of D on Day=3, and most of those customers will buy all of E on Day=4.

4 More on forming multi-hop relationships from RoloDex cards. A → C is confident if a high fraction of the f∈F related to every a∈A are also related to every c∈C. Consider the Protein-Protein Interaction RoloDex (a different Gene-Gene card for each interaction involved in some pathway). [figure: example Interaction=k Gene-Gene card]

What is a biological pathway? A biological pathway is a series of actions among molecules in a cell that leads to a certain product or a change in the cell. Such a pathway can trigger the assembly of new molecules, such as a fat or protein. Pathways can also turn genes on and off, or spur a cell to move.

How do biological pathways work? For your body to develop properly and stay healthy, many things must work together at many different levels - from organs to cells to genes. From both inside and outside the body, cells are constantly receiving chemical cues prompted by such things as injury, infection, stress or even the presence or lack of food. To react and adjust to these cues, cells send and receive signals through biological pathways. The molecules that make up biological pathways interact with signals, as well as with each other, to carry out their designated tasks.

Biological pathways can act over short or long distances. For example, some cells send signals to nearby cells to repair localized damage, such as a scratch on a knee. Other cells produce substances, such as hormones, that travel through the blood to distant target cells.

These biological pathways control a person's response to the world. For example, some pathways subtly affect how the body processes drugs, while others play a major role in how a fertilized egg develops into a baby. Other pathways maintain balance while a person is walking, control how and when the pupil in the eye opens or closes in response to light, and affect the skin's reaction to changing temperature.

Biological pathways do not always work properly.
When something goes wrong in a pathway, the result can be a disease such as cancer or diabetes.

What are some types of biological pathways? There are many types of biological pathways. Among the most well-known are pathways involved in metabolism, in the regulation of genes and in the transmission of signals. Metabolic pathways make possible the chemical reactions that occur in our bodies. An example of a metabolic pathway is the process by which cells break down food into energy molecules that can be stored for later use. Other metabolic pathways actually help to build molecules.

Gene-regulation pathways turn genes on and off. Such action is vital because genes provide the recipe by which cells produce proteins, which are the key components needed to carry out nearly every task in our bodies. Proteins make up our muscles and organs, help our bodies move and defend us against germs.

Signal transduction pathways move a signal from a cell's exterior to its interior. Different cells are able to receive specific signals through structures on their surface called receptors. After interacting with these receptors, the signal travels into the cell, where its message is transmitted by specialized proteins that trigger a specific reaction in the cell. For example, a chemical signal from outside the cell might direct the cell to produce a particular protein inside the cell. In turn, that protein may be a signal that prompts the cell to move.

What is a biological network? Researchers are learning that biological pathways are far more complicated than once thought. Most pathways do not start at point A and end at point B. In fact, many pathways have no real boundaries, and pathways often work together to accomplish tasks. When multiple biological pathways interact with each other, they form a biological network.

How do researchers find biological pathways?
Researchers have discovered many important biological pathways through laboratory studies of cultured cells, bacteria, fruit flies, mice and other organisms. Many of the pathways identified in these model systems are the same as, or are similar to, counterparts in humans. Still, many biological pathways remain to be discovered. It will take years of research to identify and understand the complex connections among all the molecules in all biological pathways, as well as to understand how these pathways work together.

5 [figure: example cards of many RoloDexes - cust-item, author-doc, term-doc, doc-doc, exp-gene, gene-gene (ppi), exp-PI, customer-rates-movie, customer-rates-movie-as-5, course enrollments, and term-term (share stem?)]

DataCube model for 3 entities: items, people and terms.

Relational model: [table residue: Items, People, Terms and Relationship tables with binary-coded attributes]

RoloDex model: 2 entities, many relationships.

One can form multi-hops with any of these cards. Are there any that provide an interesting setting for ARM data mining?

6 3-hop SRM: R(E,F), S(F,G), T(G,H). [figure: example matrices R(E,F), S(F,G), T(G,H), with H = {2,3,4,5}]

Collapse T: TC ≡ {g∈G | T(g,h) ∀ h∈C}. That is just the 2-hop case with TC ⊆ G replacing C. (∀ can be replaced by ∃ or any other quantifier; the choice of quantifier should match that intended for C.)
Collapse T and S: STC ≡ {f∈F | S(f,g) ∀ g∈TC}. Then it is the 1-hop case with STC replacing C.

Focus on F:
confidence: ct(&_{e∈A} R_e & &_{g ∈ &_{h∈C} T_h} S_g) / ct(&_{e∈A} R_e) ≥ mncnf
support: ct(&_{e∈A} R_e) ≥ mnsup
Worked example: ct(1001 & &_{g=1,3,4} S_g) / ct(1001) = ct(1001 & 1001 & 1000 & 1100) / 2 = ct(1000) / 2 = 1/2.

Focus on G:
confidence: ct(&_{f ∈ list(&_{e∈A} R_e)} S_f & &_{h∈C} T_h) / ct(&_{f ∈ list(&_{e∈A} R_e)} S_f) ≥ mncnf
support: ct(&_{f ∈ list(&_{e∈A} R_e)} S_f) ≥ mnsup
Worked example: ct(&_{f=2,5} S_f & 1101) / ct(&_{f=2,5} S_f) = ct(1101 & 0011 & 1101) / ct(1101 & 0011) = ct(0001) / ct(0001) = 1/1 = 1.

Are they different? Yes, because the confidences can be different numbers.

Closure properties by focus:
Focus on F: antecedent downward closure (A infrequent implies supersets infrequent; A is 1 hop from F, odd: down); consequent upward closure (A → C non-confident implies A → D non-confident ∀ D ⊆ C; C is 2 hops, even: up).
Focus on G: antecedent upward closure (A infrequent implies all subsets infrequent; A is 2 hops from G, even: up); consequent downward closure (A → C non-confident implies A → D non-confident ∀ D ⊇ C; C is 1 hop, odd: down).
Focus on E:
confidence: ct(P_A & &_{f ∈ &_{g ∈ &_{h∈C} T_h} S_g} R_f) / ct(P_A) ≥ mncnf
support: ct(P_A) ≥ mnsup
antecedent upward closure (A is 0 hops from E, even: up); consequent downward closure (C is 3 hops, odd: down).
Focus on H:
confidence: ct(&_{g ∈ &_{f ∈ &_{e∈A} R_e} S_f} T_g & P_C) / ct(&_{g ∈ &_{f ∈ &_{e∈A} R_e} S_f} T_g) ≥ mncnf
support: ct(&_{g ∈ &_{f ∈ &_{e∈A} R_e} S_f} T_g) ≥ mnsp
antecedent downward closure (A is 3 hops from H, odd: down); consequent upward closure (C is 0 hops, even: up).
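The "collapse T" step above can be sketched directly. `collapse`, the toy `T`, and the 0-based H-indexing are illustrative assumptions; passing `any` instead of `all` swaps the universal quantifier for the existential one, as the slide permits:

```python
# Collapse T: TC = {g in G : T(g,h) holds for every h in C}, reducing the
# 3-hop case to the 2-hop case with TC in place of C.

def collapse(T, C, quantifier=all):
    """T[g] is a hypothetical 0/1 tuple over H; C holds 0-based H-indices."""
    return {g for g, row in T.items() if quantifier(row[h] for h in C)}

T = {1: (0, 0, 0, 1), 2: (1, 0, 1, 0), 3: (0, 0, 0, 1), 4: (0, 1, 0, 1)}
TC = collapse(T, [3])  # now run the 2-hop case with TC replacing C
```

Collapsing S the same way over TC yields STC and reduces the problem to 1-hop, as described above.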

7 4-hop SRM: R(E,F), S(F,G), T(G,H), U(H,I). [figure: example matrices, with I = {1,2,3,4}]

Focus on G? Replace C by UC and A by RA as above (not different from the 2-hop case?). Or focus on H (RA for A, use the 3-hop case) or on F (UC for C, use the 3-hop case). Another way to focus on G (the main way):
confidence: ct(&_{f ∈ &_{e∈A} R_e} S_f & &_{h ∈ &_{i∈C} U_i} T_h) / ct(&_{f ∈ &_{e∈A} R_e} S_f) ≥ mncnf
support: ct(&_{f ∈ &_{e∈A} R_e} S_f) ≥ mnsup

What if F = G = H = genes and S, T = gene-gene interactions, with more than one interaction card, S_1, ..., S_n? Then:
confidence: ( ct(S_1(&_{e∈A} R_e, &_{i∈C} U_i)) + ct(S_2(&_{e∈A} R_e, &_{i∈C} U_i)) + ... + ct(S_n(&_{e∈A} R_e, &_{i∈C} U_i)) ) / ( n · ct(&_{e∈A} R_e) · ct(&_{i∈C} U_i) ) ≥ mncnf
If the S cube can be implemented so that counts of the 3-rectangle (shown in blue on the slide) can be made directly, calculation of confidence would be fast.

4-hop Apriori, focus on G:
1. antecedent upward closure: if A is infrequent, then so are all of its subsets (the "list" will be larger, so the AND over the list will produce fewer ones). Frequency involves only A, so mine all qualifying antecedents using upward closure.
2. consequent upward closure: if A → C is non-confident, then so is A → D for all subsets, D, of C (the "list" will be larger, so the AND over the list will produce fewer ones). So for each frequent antecedent, A, use upward closure to mine out all confident consequents, C.
confidence: ct(&_{f ∈ &_{e∈A} R_e} S_f & &_{h ∈ &_{i∈C} U_i} T_h) / ct(&_{f ∈ &_{e∈A} R_e} S_f) ≥ mncnf
support: ct(&_{f ∈ &_{e∈A} R_e} S_f) ≥ mnsup

8 5-hop SRM: R(E,F), S(F,G), T(G,H), U(H,I), V(I,J). [figure: example matrices, with J = {2,3,4,5}]

Focus on G:
confidence: ct(&_{f ∈ &_{e∈A} R_e} S_f & &_{h ∈ &_{i ∈ &_{j∈C} V_j} U_i} T_h) / ct(&_{f ∈ &_{e∈A} R_e} S_f) ≥ mncnf
support: ct(&_{f ∈ &_{e∈A} R_e} S_f) ≥ mnsup

5-hop Apriori, focus on G:
1. antecedent upward closure: if A is infrequent, then so are all of its subsets (the "list" will be larger, so the AND over the list will produce fewer ones). Frequency involves only A, so mine all qualifying antecedents using upward closure.
2. consequent downward closure: if A → C is non-confident, then so is A → D for all supersets, D, of C. So for each frequent antecedent, A, use downward closure to mine out all confident consequents, C.

9 6-hop SRM: a sixth relationship, Q(D,E), precedes R(E,F), so the antecedent (a set D in the new first entity) is 3 hops from the focus G.

Focus on G:
confidence: ct(&_{f ∈ &_{e ∈ &_{d∈D} Q_d} R_e} S_f & &_{h ∈ &_{i ∈ &_{j∈C} V_j} U_i} T_h) / ct(&_{f ∈ &_{e ∈ &_{d∈D} Q_d} R_e} S_f) ≥ mncnf
support: ct(&_{f ∈ &_{e ∈ &_{d∈D} Q_d} R_e} S_f) ≥ mnsup

6-hop Apriori:
1. antecedent downward closure: if A is infrequent, then so are all of its supersets. Frequency involves only A, so mine all qualifying antecedents using downward closure.
2. consequent downward closure: if A → C is non-confident, then so is A → D for all supersets, D, of C. So for each frequent antecedent, A, use downward closure to mine out all confident consequents, C.

The conclusion we have demonstrated (but not proven) is: for (a+c)-hop transitive Apriori ARM with focus on the entity which is a hops from the antecedent and c hops from the consequent, if the hop distance on a step is odd, use downward closure on that step; if even, use upward closure, in the mining of strong (frequent and confident) rules.
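The demonstrated (but unproven) odd/even rule of thumb is small enough to state as code; `closure_direction` is a hypothetical helper name, not from the slides:

```python
# Conjecture from slides 1-9: a set that is `hops` hops from the focus
# entity gets downward closure when `hops` is odd, upward when even.

def closure_direction(hops):
    return "downward" if hops % 2 == 1 else "upward"

# ARM (1nF): antecedent is 1 hop from F, consequent 0 hops.
arm = (closure_direction(1), closure_direction(0))
```

This reproduces every case tabulated above: 1nF (down, up), 2tF (down, down), 2tG/2tE (up, up), the four 3-hop foci, and the 4- through 6-hop G-focused cases.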

10 Given any 1-hop labeled relationship (e.g., cells have values from {1,2,...,n}), there is:
1. a natural n-hop transitive relationship, A implies D, by alternating entities for each specific label-value relationship;
2. a card for each entity consisting of the bitslices of the cell values.
E.g., in Netflix, Rating(Cust,Movie) has label set {0,1,2,3,4,5}, so by 1. it generates a bona fide 6-hop transitive relationship. By 2., an alternative is to bitmap each label value (rather than bitslicing them). Below, the R_{n-i} can be bitslices or bitmaps. [figure: R_0(E,F), ..., R_{n-2}(E,F), R_{n-1}(E,F) example matrices]
E.g., equity trading on a given day: QuantityBought(Cust,Stock) with labels {0,1,2,3,4,5} (where n means n thousand shares) also generates a bona fide 6-hop transitive relationship.
E.g., equity trading, "moved similarly": define "moved similarly" on a day, then define the relationship StockStock(#DaysMovedSimilarlyOfLast10).
E.g., equity trading, "moved similarly 2": define "moved similarly" to mean that stock2 moved similarly to what stock1 did the previous day; again define the relationship StockStock(#DaysMovedSimilarlyOfLast10).
E.g., Gene-Experiment: label values could be "expression level". Intervalize and go!
Has Strong Transitive Rule Mining (STRM) been done? Are there downward and upward closure theorems for it already? Is it useful? That is, are there good examples of use: stocks, gene-experiment, MBR, Netflix predictor, ...?
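The two decompositions of a labeled relationship (a bitmap per label value versus a bitslice per binary digit) can be sketched on a 1-D toy example; `bitmap`, `bitslice`, and the ratings are hypothetical:

```python
# Decomposing a labeled cell set such as Rating(Cust,Movie) with labels {0..5}.

ratings = [5, 0, 3, 4]  # hypothetical label values for four customers

def bitmap(cells, value):
    """0/1 mask of the cells holding exactly one label value."""
    return [1 if c == value else 0 for c in cells]

def bitslice(cells, k):
    """Bit k of each (integer) cell value."""
    return [(c >> k) & 1 for c in cells]

b3 = bitmap(ratings, 3)    # [0, 0, 1, 0]
s2 = bitslice(ratings, 2)  # [1, 0, 0, 1]
```

With n label values, the value bitmaps give the n cards of the n-hop transitive relationship, while the bitslices need only ceil(log2(n)) cards per entity.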

11 Let Types be an entity which clusters Items (moves Items up the semantic hierarchy). E.g., in a store, Types might include: dairy, hardware, household, canned, snacks, baking, meats, produce, bakery, automotive, electronics, toddler, boys, girls, women, men, pharmacy, garden, toys, farm. Let A be an ItemSet wholly of one Type, TA, and let D be a TypesSet which does not include TA. Then, with the two relationships BoughtBy(I,C) and Buys(C,T) [figure: example BoughtBy(I,C) and Buys(C,T) matrices over customers 1..20]:

A → D might mean: if ∀ i∈A, BB(i,c), then ∀ t∈D, B(c,t) - or any of the other three quantifier variants obtained by replacing either ∀ with ∃.

A → D frequent might mean:
ct(&_{i∈A} BB_i) ≥ mnsp, or ct(|_{i∈A} BB_i) ≥ mnsp, or
ct(&_{t∈D} B_t) ≥ mnsp, or ct(|_{t∈D} B_t) ≥ mnsp, or
ct(&_{i∈A} BB_i & &_{t∈D} B_t) ≥ mnsp, etc.

A → D confident might mean:
ct(&_{i∈A} BB_i & &_{t∈D} B_t) / ct(&_{i∈A} BB_i) ≥ mncf, or
ct(&_{i∈A} BB_i & |_{t∈D} B_t) / ct(&_{i∈A} BB_i) ≥ mncf, or
ct(|_{i∈A} BB_i & |_{t∈D} B_t) / ct(|_{i∈A} BB_i) ≥ mncf, or
ct(|_{i∈A} BB_i & &_{t∈D} B_t) / ct(|_{i∈A} BB_i) ≥ mncf.
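The "& versus |" frequency variants above differ only in the operator folded over the item bitmaps. A toy sketch with hypothetical `BB` bitmaps over four customers:

```python
# "Frequent" under & (customers buying every item of A) versus
# | (customers buying at least one item of A). Data is hypothetical.
from functools import reduce

BB = {1: (1, 0, 1, 1), 2: (1, 1, 0, 1), 3: (0, 1, 1, 1)}
A = [1, 2]

def AND(vs):
    return reduce(lambda u, v: tuple(a & b for a, b in zip(u, v)), vs)

def OR(vs):
    return reduce(lambda u, v: tuple(a | b for a, b in zip(u, v)), vs)

and_sup = sum(AND(BB[i] for i in A))  # ct(&_{i in A} BB_i)
or_sup = sum(OR(BB[i] for i in A))    # ct(|_{i in A} BB_i)
```

The OR-support always dominates the AND-support, which is why the quantifier choice changes which closure property applies.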


13 Text Mining using pTrees.

DTPe Position Table (Pos × (Term,Doc)): rows Pos 1..7, columns T1D1, T1D2, T1D3, ..., T9D1, ..., T9D3, with 0/1 existence entries, and its PpTreeSet indexed by (T,D).
DTPe Document Table (Doc × (Term,Pos)): rows Doc 1..3, columns T1P1, ..., T1P7, ..., T9P1, ..., T9P7.
Classical Document Table: Doc × (Auth, Date, ..., Subj1, ..., Subjm), with its DpTreeSet of bitmaps.
DTPe Term Table (Term × (Pos,Doc)): rows Term 1..9, columns P1D1, P1D2, P1D3, ..., P7D1, ..., P7D3, and its TpTreeSet indexed by (D,P).
DTPe Term Usage Table: the same shape, with part-of-speech entries (noun, verb, adj, adv, ...) instead of bits, bitmapped per part of speech (e.g., P1D1-noun, P1D1-adj).
[figure: the DTPe existence data cube over Doc × Term × Pos (terms such as "buy", "AAPL", "all", "always", "an", "and", "apple", "April", "are") and the three Rolodex card sets: TD cards (k=1..7), PD cards (k=1..9), PT cards (k=1..3)]

tf is the +rollup of the DTPe datacube along the Position dimension. One can use any measurement or data structure of measurements, e.g., DTtfidf, in which each cell holds a decimal tfidf. A decimal tfidf can be bitsliced directly into whole-number bitslices plus fractional bitslices (one for each binary digit to the right of the binary point - no need to shift!) using MOD(INT(x/2^k), 2). E.g., tfidf = 3.5 gives, for k = 3, 2, 1, 0, -1, -2, the bits 0, 0, 1, 1, 1, 0.
[figure: DTtf DocTerm term-frequency data cube and the start of the DTtfidf Doc Table: Doc 1 has T1 = 1.75, T2 = 0, ...]
[figure: continuation of the DTtfidf Doc Table (Doc 2: T1 = 0, T2 = 1.25; Doc 3: 0, 0, 0) and its DpTreeSet bitslices T1k1, T1k0, T1k-1, T1k-2, ...]
Rating of T=stock at the doc-date close: 1=sell, 2=hold, 3=buy, 0=non-stock.
[figure: DTSR DocTerm StockRating cube (e.g., term $AAPL has rating 3=buy in one doc), its bitslice DpTreeSet (T2k2, T2k1) and its bitmap DpTreeSet (T2,R=buy; T2,R=hold; T2,R=sell)]
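The MOD(INT(x/2^k), 2) bitslicing of a decimal tfidf, including the fractional slices for negative k, can be checked directly (`tfidf_bit` is an illustrative name):

```python
# Fractional bitslicing of a decimal value via MOD(INT(x / 2^k), 2);
# negative k yields the digits right of the binary point, with no shifting.

def tfidf_bit(x, k):
    return int(x / 2 ** k) % 2

bits = [tfidf_bit(3.5, k) for k in (3, 2, 1, 0, -1, -2)]  # [0, 0, 1, 1, 1, 0]
```

This reproduces the slide's example: tfidf = 3.5 is binary 11.10, so the k = 3, 2, 1, 0, -1, -2 slices hold 0, 0, 1, 1, 1, 0.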


