A hop is a relationship, R, hopping from entity, E, to entity, F. Strong Rule Mining finds all frequent, confident rules.

Presentation transcript:

A hop is a relationship, R, hopping from entity, E, to entity, F. Strong Rule Mining (SRM) finds all frequent, confident rules.

[Figure: the relationship R(E,F) between entities E and F.]

SRMs are categorized by the number of hops, k, by whether they are transitive or non-transitive, and by the focus entity. Below, ct counts 1-bits, & is a bitwise AND, and mncf and mnsp are the minimum confidence and minimum support thresholds.

1-hop, F-focused. ARM is the 1-hop, non-transitive ($A, C \subseteq E$), F-focused SRM (1nF); the 1-hop, transitive ($A \subseteq E$, $C \subseteq F$), F-focused SRM is 1tF. $A \Rightarrow C$ is strong if:
$ct(\&_{e \in A} R_e \,\&\, P_C) \,/\, ct(\&_{e \in A} R_e) \ge mncf$
$ct(\&_{e \in A} R_e) \ge mnsp$
consequent upward closure: If $A \Rightarrow C$ is non-confident, then so is $A \Rightarrow D$ for all subsets, D, of C. So for every frequent antecedent, A, use upward closure to mine for all of its confident consequents.
antecedent downward closure: If A is frequent, all of its subsets are frequent. Or, if A is infrequent, then so are all of its supersets. Since frequency involves only A, we can mine for all qualifying antecedents efficiently using downward closure.
In this case A is 1 hop from F (odd, use downward closure). C is 0 hops from F (even, use upward closure).

Transitive (a+c)-hop Apriori strong rule mining with a focus entity which is a hops from the antecedent and c hops from the consequent: if a (respectively c) is odd, one can use downward closure on that step in the mining of strong (frequent and confident) rules; if it is even, one can use upward closure. We will be checking more examples to see if the "Odd $\Rightarrow$ downward, Even $\Rightarrow$ upward" theorem seems to hold.

1-hop, transitive, E-focused rule, $A \Rightarrow C$ SRM (1tE):
$ct(P_A \,\&\, \&_{f \in C} R_f) \,/\, ct(P_A) \ge mncf$
$|A| = ct(P_A) \ge mnsp$
antecedent upward closure: If A is infrequent, then so are all of its subsets.
consequent downward closure: If $A \Rightarrow C$ is non-confident, then so is $A \Rightarrow D$ for all supersets, D, of C.
In this case A is 0 hops from E (even, use upward closure). C is 1 hop from E (odd, use downward closure).

2-hop, transitive, F-focused, with R(E,F), S(F,G), $A \subseteq E$, $C \subseteq G$. $A \Rightarrow C$ is strong if:
$ct(\&_{e \in A} R_e \,\&\, \&_{g \in C} S_g) \,/\, ct(\&_{e \in A} R_e) \ge mncf$
$ct(\&_{e \in A} R_e) \ge mnsp$
Apriori for 2-hops: Find all frequent antecedents, A, using downward closure. For each such A, find C1G, the set of g's s.t. $A \Rightarrow \{g\}$ is confident. Find C2G, the set of C1G pairs that are confident consequents for antecedent A. Find C3G, the set of triples (from C2G) s.t. all subpairs are in C2G (à la Apriori), etc.
The hops are 1,1 (odd, odd), so down, down is correct.

2-hop, transitive, G-focused:
$ct(\&_{f \in list(\&_{e \in A} R_e)} S_f \,\&\, P_C) \,/\, ct(\&_{f \in list(\&_{e \in A} R_e)} S_f) \ge mncf$
$ct(\&_{f \in list(\&_{e \in A} R_e)} S_f) \ge mnsp$
1. (antecedent upward closure) If A is infrequent, then so are all of its subsets.
2. (consequent upward closure) If $A \Rightarrow C$ is non-confident, so is $A \Rightarrow D$ for all subsets, D, of C.
The hops are 2,0 (even, even), so up, up is correct.

2-hop, transitive, E-focused:
$ct(P_A \,\&\, \&_{f \in list(\&_{g \in C} S_g)} R_f) \,/\, ct(P_A) \ge mncf$
$ct(P_A) \ge mnsp$
antecedent upward closure: If A is infrequent, so are all of its subsets.
consequent upward closure: If $A \Rightarrow C$ is non-confident, so is $A \Rightarrow D$ for all subsets, D, of C.
The hops are 0,2 (even, even), so up, up is correct.

$A \Rightarrow C$ is confident if a high fraction of the $f \in F$ which are related to every $a \in A$ are also related to every $c \in C$. F is the Focus Entity and the high fraction is the MinimumConfidence ratio.
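The 1-hop, F-focused support and confidence tests above can be sketched in a few lines. This is only an illustration under stated assumptions: each R_e is stored as a plain Python integer used as a bit mask over F (not a real pTree), and the relationship `R`, the names `ct`, `and_all`, `frequent_antecedents` and `confidence`, and the threshold values are all hypothetical.

```python
from itertools import combinations

def ct(mask):
    """Count of 1-bits, standing in for the slides' ct()."""
    return bin(mask).count("1")

def and_all(R, keys, universe):
    """AND the masks R[k] for k in keys; the empty AND is the whole universe."""
    out = universe
    for k in keys:
        out &= R[k]
    return out

def frequent_antecedents(R, universe, mnsp):
    """Level-wise search: only frequent sets are extended (downward closure)."""
    frequent = []
    level = [frozenset([e]) for e in sorted(R) if ct(R[e]) >= mnsp]
    while level:
        frequent.extend(level)
        size = len(level[0]) + 1
        # Join pairs of frequent sets that differ in exactly one element.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == size}
        level = [c for c in candidates if ct(and_all(R, c, universe)) >= mnsp]
    return frequent

def confidence(R, universe, A, P_C):
    """ct(&_{e in A} R_e & P_C) / ct(&_{e in A} R_e), or 0.0 if the support is 0."""
    ante = and_all(R, A, universe)
    return ct(ante & P_C) / ct(ante) if ante else 0.0

# Hypothetical R(E,F): bit i of R[e] is set when e is related to the i-th f.
R = {"e1": 0b1011, "e2": 0b0011, "e3": 0b0110}
ALL_F = 0b1111

freq = frequent_antecedents(R, ALL_F, mnsp=2)
conf = confidence(R, ALL_F, {"e1", "e2"}, P_C=0b0001)
```

With these made-up masks, {e1, e2} survives the support threshold while {e1, e3} and {e2, e3} do not, so no 3-element antecedent is ever generated.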

[Figure: the three DTPe RoloDexes. TD Rolodex: Term (1..9) by Doc (1..3) cards, one per Position k=1..7. PD Rolodex: Pos (1..7) by Doc (1..3) cards, one per Term k=1..9. PT Rolodex: Pos (1..7) by Term (1..9) cards, one per Doc k=1..3.]

We can form multi-hop relationships from RoloDex cards. Does this open up a new area of text mining for the three DTP Rolodexes?

Recall: $A \Rightarrow C$ is confident if a high fraction of the $f \in F$ which are related to every $a \in A$ are also related to every $c \in C$. F is the Focus Entity and the high fraction is the MinimumConfidence ratio.

DT(P=h), DT(P=k). A confident DThk rule means: a high fraction of the terms, $t \in T$, in Position=h of every doc $\in A$ are also in Position=k of every doc $\in C$. Is there a high payoff research area here?

DP(T=h), DP(T=k). A confident DPhk rule means: a high fraction of the Positions, $p \in P$, which hold Term=h for every doc $\in A$ also hold Term=k in Pos=p for every doc $\in C$. Is this a high payoff research area?

TP(D=h), TP(D=k). A confident TPhk rule means: a high fraction of the Positions, $p \in P$, in Doc=h which hold every Term, $t \in A$, also hold every Term, $t \in C$, in Doc=k. This only makes sense for A, C singleton Terms. Also it seems like P would have to be singleton? Is this a high payoff research area?

TD(P=h), TD(P=k). A confident TDhk rule means: a high fraction of the Documents, $d \in D$, having in Position=h every Term, $t \in A$, also have in Position=k every Term, $t \in C$. Again, A, C must be singletons. High payoff? It suggests in 1-hop ARM: looking for strong TD rules, a high fraction of the Documents, $d \in D$, having every Term, $t \in A$, also have every Term, $t \in C$. Again, A, C must be singletons. Is there a high payoff research area here?

PD(T=h), PD(T=k). A confident PDhk rule means: a high fraction of the Documents, $d \in D$, having Term=h in every Pos, $p \in A$, also have Term=k in every Pos, $p \in C$. High payoff?

PT(D=h), PT(D=k). A confident PThk rule means: a high fraction of the Terms, $t \in T$, in Doc=h which occur at every Pos, $p \in A$, also occur at every Pos, $p \in C$, in Doc=k. Is this a high payoff research area?

More on forming multi-hop relationships from RoloDex cards. $A \Rightarrow B$ is confident if a high fraction of the $f \in F$ which are related to every $a \in A$ are also related to every $b \in B$. F is the Focus Entity and the high fraction is the MinimumConfidence ratio.

Consider the Market Basket RoloDex (a different Cust-Item card for each day): Buys(Day=k), Cust (1..9) by Item.

[Figure: cards Buys(Day=1) and Buys(Day=2), with itemset A on the Day=1 card and itemset B on the Day=2 card, joined through the customers C.]
A confident Buy12 rule means: if some customers buy all of A on Day=1, then most of those customers will buy all of B on Day=2.

"Buys" pathways?

[Figure: cards Buys(Day=1), Buys(Day=2), Buys(Day=3), with itemsets A, B and D.]
A confident Buy123 pathway means: if some customers buy all of A on Day=1, then most of those customers will buy all of B on Day=2, and most of those customers will buy all of D on Day=3.

[Figure: cards Buys(Day=1) through Buys(Day=4), with itemsets A, B, D and E.]
A confident Buy1234 pathway means: if some customers buy all of A on Day=1, then most of those customers will buy all of B on Day=2, then most of those customers will buy all of D on Day=3, and most of those customers will buy all of E on Day=4.
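One way to read a "Buys" pathway is as a chain of step confidences, each measured on the customers who survived the previous step. The sketch below assumes each day's card is a dictionary from item to the set of customers who bought it; the card contents, item names and the helper names `buyers_of_all` and `pathway_confidences` are hypothetical.

```python
def buyers_of_all(card, items):
    """Customers who bought every item in `items` on this day's card."""
    sets = [card.get(i, set()) for i in items]
    return set.intersection(*sets) if sets else set()

def pathway_confidences(cards, itemsets):
    """Step confidences for a pathway; cards[k] is paired with itemsets[k]."""
    current = buyers_of_all(cards[0], itemsets[0])
    confs = []
    for card, items in zip(cards[1:], itemsets[1:]):
        nxt = current & buyers_of_all(card, items)
        confs.append(len(nxt) / len(current) if current else 0.0)
        current = nxt
    return confs

# Hypothetical Buys(Day=k) cards: item -> set of customer ids.
day1 = {"milk": {1, 2, 3, 4}, "bread": {1, 2, 3}}
day2 = {"eggs": {1, 2, 5}}
day3 = {"jam": {1}}

# Pathway A -> B -> D with A = {milk, bread}, B = {eggs}, D = {jam}.
confs = pathway_confidences([day1, day2, day3],
                            [{"milk", "bread"}, {"eggs"}, {"jam"}])
```

A pathway would then be called confident when every entry of `confs` meets the minimum confidence; whether the thresholds should be per step or on the product is a design choice the slides leave open.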

More on forming multi-hop relationships from RoloDex cards. $A \Rightarrow C$ is confident if a high fraction of the $f \in F$ related to every $a \in A$ are also related to every $c \in C$. Consider the Protein-Protein Interaction RoloDex (a different Gene-Gene card for each interaction involved in some pathway): Interaction=k, Gene (1..9) by Gene.

What is a biological pathway?
A biological pathway is a series of actions among molecules in a cell that leads to a certain product or a change in the cell. Such a pathway can trigger the assembly of new molecules, such as a fat or protein. Pathways can also turn genes on and off, or spur a cell to move.

How do biological pathways work?
For your body to develop properly and stay healthy, many things must work together at many different levels - from organs to cells to genes. From both inside and outside the body, cells are constantly receiving chemical cues prompted by such things as injury, infection, stress or even the presence or lack of food. To react and adjust to these cues, cells send and receive signals through biological pathways. The molecules that make up biological pathways interact with signals, as well as with each other, to carry out their designated tasks. Biological pathways can act over short or long distances. For example, some cells send signals to nearby cells to repair localized damage, such as a scratch on a knee. Other cells produce substances, such as hormones, that travel through the blood to distant target cells. These biological pathways control a person's response to the world. For example, some pathways subtly affect how the body processes drugs, while others play a major role in how a fertilized egg develops into a baby. Other pathways maintain balance while a person is walking, control how and when the pupil in the eye opens or closes in response to light, and affect the skin's reaction to changing temperature. Biological pathways do not always work properly. When something goes wrong in a pathway, the result can be a disease such as cancer or diabetes.

What are some types of biological pathways?
There are many types of biological pathways. Among the most well-known are pathways involved in metabolism, in the regulation of genes and in the transmission of signals. Metabolic pathways make possible the chemical reactions that occur in our bodies. An example of a metabolic pathway is the process by which cells break down food into energy molecules that can be stored for later use. Other metabolic pathways actually help to build molecules. Gene-regulation pathways turn genes on and off. Such action is vital because genes provide the recipe by which cells produce proteins, which are the key components needed to carry out nearly every task in our bodies. Proteins make up our muscles and organs, help our bodies move and defend us against germs. Signal transduction pathways move a signal from a cell's exterior to its interior. Different cells are able to receive specific signals through structures on their surface called receptors. After interacting with these receptors, the signal travels into the cell, where its message is transmitted by specialized proteins that trigger a specific reaction in the cell. For example, a chemical signal from outside the cell might direct the cell to produce a particular protein inside the cell. In turn, that protein may be a signal that prompts the cell to move.

What is a biological network?
Researchers are learning that biological pathways are far more complicated than once thought. Most pathways do not start at point A and end at point B. In fact, many pathways have no real boundaries, and pathways often work together to accomplish tasks. When multiple biological pathways interact with each other, they form a biological network.

How do researchers find biological pathways?
Researchers have discovered many important biological pathways through laboratory studies of cultured cells, bacteria, fruit flies, mice and other organisms. Many of the pathways identified in these model systems are the same as, or are similar to, counterparts in humans. Still, many biological pathways remain to be discovered. It will take years of research to identify and understand the complex connections among all the molecules in all biological pathways, as well as to understand how these pathways work together.

[Figure: three models of the same data.
DataCube Model for 3 entities: items, people and terms.
Relational Model: Items, People and Terms tables plus a Relationship table keyed on (person, item, term).
RoloDex Model: 2 entities, many relationships. Cards shown: cust-item card, author-doc card, term-doc card, doc-doc card, term-term card (share stem?), exp-gene card, gene-gene card (ppi), exp-PI card, customer rates movie card, customer rates movie as 5 card, Course Enrollments.]

One can form multi-hops with any of these cards. Are there any that provide an interesting setting for ARM data mining?

3-hop, with R(E,F), S(F,G), T(G,H), $A \subseteq E$ and $C \subseteq H$.

Collapse T: $TC \equiv \{g \in G \mid T(g,h)\ \forall h \in C\}$. That's just the 2-hop case with $TC \subseteq G$ replacing C. ($\forall$ can be replaced by $\exists$ or any other quantifier. The choice of quantifier should match that intended for C.) Collapse T and S: $STC \equiv \{f \in F \mid S(f,g)\ \forall g \in TC\}$. Then it's 1-hop with STC replacing C.

Focus on F:
$ct(\&_{e \in A} R_e \,\&\, \&_{g \in list(\&_{h \in C} T_h)} S_g) \,/\, ct(\&_{e \in A} R_e) \ge mncnf$
$ct(\&_{e \in A} R_e) \ge mnsup$
Example: $ct(1001 \,\&\, \&_{g=1,3,4} S_g) / ct(1001) = ct(1001 \,\&\, 1001 \,\&\, 1000 \,\&\, 1100) / 2 = ct(1000) / 2 = 1/2$.
antecedent downward closure: A infrequent implies all supersets infrequent. A is 1 hop from F (down).
consequent upward closure: $A \Rightarrow C$ non-confident implies $A \Rightarrow D$ non-confident $\forall D \subseteq C$. C is 2 hops from F (up).

Focus on G:
$ct(\&_{f \in list(\&_{e \in A} R_e)} S_f \,\&\, \&_{h \in C} T_h) \,/\, ct(\&_{f \in list(\&_{e \in A} R_e)} S_f) \ge mncnf$
$ct(\&_{f \in list(\&_{e \in A} R_e)} S_f) \ge mnsp$
Example: $ct(\&_{f=2,5} S_f \,\&\, 1101) / ct(\&_{f=2,5} S_f) = ct(1101 \,\&\, 0011 \,\&\, 1101) / ct(1101 \,\&\, 0011) = ct(0001) / ct(0001) = 1/1 = 1$.
antecedent upward closure: A infrequent implies all subsets infrequent. A is 2 hops from G (up).
consequent downward closure: $A \Rightarrow C$ non-confident implies $A \Rightarrow D$ non-confident $\forall D \supseteq C$. C is 1 hop from G (down).

Are the F focus and the G focus different? Yes, because the confidences can be different numbers.

Focus on E:
$ct(P_A \,\&\, \&_{f \in list(\&_{g \in list(\&_{h \in C} T_h)} S_g)} R_f) \,/\, ct(P_A) \ge mncnf$
$ct(P_A) \ge mnsup$
antecedent upward closure: A infrequent implies all subsets infrequent. A is 0 hops from E (up).
consequent downward closure: $A \Rightarrow C$ non-confident implies $A \Rightarrow D$ non-confident $\forall D \supseteq C$. C is 3 hops from E (down).

Focus on H:
$ct(\&_{g \in list(\&_{f \in list(\&_{e \in A} R_e)} S_f)} T_g \,\&\, P_C) \,/\, ct(\&_{g \in list(\&_{f \in list(\&_{e \in A} R_e)} S_f)} T_g) \ge mncnf$
$ct(\&_{g \in list(\&_{f \in list(\&_{e \in A} R_e)} S_f)} T_g) \ge mnsp$
antecedent downward closure: A infrequent implies all supersets infrequent. A is 3 hops from H (down).
consequent upward closure: $A \Rightarrow C$ non-confident implies $A \Rightarrow D$ non-confident $\forall D \subseteq C$. C is 0 hops from H (up).
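That the F-focused and G-focused confidences of the same 3-hop rule can differ is easy to check on a toy instance. The sketch below uses Python sets in place of pTrees; the entities, the relationships `R`, `S_pairs`, `T` and the function names are hypothetical and are not the matrices from the slide, although this instance happens to give the same two values, 1/2 and 1.

```python
def and_over(index, keys, universe):
    """Intersect index[k] over all k in keys; the empty AND is the universe."""
    out = set(universe)
    for k in keys:
        out &= index.get(k, set())
    return out

# Hypothetical entities and relationships.
F = {"f1", "f2", "f3", "f4"}
G = {"g1", "g2", "g3"}
R = {"e1": {"f1", "f2", "f4"}, "e2": {"f1", "f2", "f3"}}  # R_e: f's related to e
T = {"h1": {"g1", "g2", "g3"}, "h2": {"g1", "g2"}}         # T_h: g's related to h
S_pairs = {("f1", "g1"), ("f1", "g2"), ("f2", "g1"), ("f3", "g1")}
S_row = {f: {g for (f2, g) in S_pairs if f2 == f} for f in F}  # S_f: g's related to f
S_col = {g: {f for (f, g2) in S_pairs if g2 == g} for g in G}  # S_g: f's related to g

def conf_focus_F(A, C):
    """Fraction of the f's related to all of A that are also related to all of TC."""
    ante = and_over(R, A, F)
    TC = and_over(T, C, G)            # collapse T: g's related to every h in C
    cons = and_over(S_col, TC, F)
    return len(ante & cons) / len(ante) if ante else 0.0

def conf_focus_G(A, C):
    """Fraction of the g's related to all of RA that are also related to all of C."""
    RA = and_over(R, A, F)            # f's related to every e in A
    ante = and_over(S_row, RA, G)
    TC = and_over(T, C, G)
    return len(ante & TC) / len(ante) if ante else 0.0

cF = conf_focus_F({"e1", "e2"}, {"h1", "h2"})
cG = conf_focus_G({"e1", "e2"}, {"h1", "h2"})
```

Here RA = {f1, f2} and TC = {g1, g2}. Only f1 is related to both g1 and g2, so the F focus gives 1/2, while the single g related to both f1 and f2 (g1) lies in TC, so the G focus gives 1.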

4-hop, with R(E,F), S(F,G), T(G,H), U(H,I), $A \subseteq E$ and $C \subseteq I$.

Focus on G? Replace C by UC and A by RA as above (not different from 2-hop?). Focus on H (RA for A, use 3-hop) or focus on F (UC for C, use 3-hop).

Another focus on G (the main way). 4-hop APRIORI, focus on G:
$ct(\&_{f \in list(\&_{e \in A} R_e)} S_f \,\&\, \&_{h \in list(\&_{i \in C} U_i)} T_h) \,/\, ct(\&_{f \in list(\&_{e \in A} R_e)} S_f) \ge mncnf$
$ct(\&_{f \in list(\&_{e \in A} R_e)} S_f) \ge mnsup$
1. (antecedent upward closure) If A is infrequent, then so are all of its subsets (the "list" will be larger, so the AND over the list will produce fewer ones). Frequency involves only A, so mine all qualifying antecedents using upward closure.
2. (consequent upward closure) If $A \Rightarrow C$ is non-confident, then so is $A \Rightarrow D$ for all subsets, D, of C (the "list" will be larger, so the AND over the list will produce fewer ones). So for every frequent antecedent, A, use upward closure to mine out all confident consequents, C.

F=G=H=genes and S,T=gene-gene interactions. More than 3, $S_1, \ldots, S_n$?
[Figure: R(E,G), the cards $S_1(G,G), \ldots, S_n(G,G)$ and U(G,I), with $A \subseteq E$ and $C \subseteq I$.]
$\big( ct(S_1(\&_{e \in A} R_e,\ \&_{i \in C} U_i)) + ct(S_2(\&_{e \in A} R_e,\ \&_{i \in C} U_i)) + \ldots + ct(S_n(\&_{e \in A} R_e,\ \&_{i \in C} U_i)) \big) \,/\, \big( ct(\&_{e \in A} R_e) \; n \cdot ct(\&_{i \in C} U_i) \big) \ge mncnf$
If the S cube can be implemented so that counts can be made directly of the 3-rectangle (shown in blue in the figure), calculation of confidence would be fast.

5-hop, with R(E,F), S(F,G), T(G,H), U(H,I), V(I,J), $A \subseteq E$ and $C \subseteq J$.

5-hop APRIORI, focus on G:
$ct(\&_{f \in list(\&_{e \in A} R_e)} S_f \,\&\, \&_{h \in list(\&_{i \in list(\&_{j \in C} V_j)} U_i)} T_h) \,/\, ct(\&_{f \in list(\&_{e \in A} R_e)} S_f) \ge mncnf$
$ct(\&_{f \in list(\&_{e \in A} R_e)} S_f) \ge mnsup$
1. (antecedent upward closure) If A is infrequent, then so are all of its subsets (the "list" will be larger, so the AND over the list will produce fewer ones). Frequency involves only A, so mine all qualifying antecedents using upward closure.
2. (consequent downward closure) If $A \Rightarrow C$ is non-confident, then so is $A \Rightarrow D$ for all supersets, D, of C. So for every frequent antecedent, A, use downward closure to mine out all confident consequents, C.

6-hop APRIORI, focus on G, with a further relationship Q in front of R:
$ct(\&_{f \in list(\&_{e \in list(\&_{d \in D} Q_d)} R_e)} S_f \,\&\, \&_{h \in list(\&_{i \in list(\&_{j \in C} V_j)} U_i)} T_h) \,/\, ct(\&_{f \in list(\&_{e \in list(\&_{d \in D} Q_d)} R_e)} S_f) \ge mncnf$
$ct(\&_{f \in list(\&_{e \in list(\&_{d \in D} Q_d)} R_e)} S_f) \ge mnsup$
1. (antecedent downward closure) If A is infrequent, then so are all of its supersets. Frequency involves only A, so mine all qualifying antecedents using downward closure.
2. (consequent downward closure) If $A \Rightarrow C$ is non-confident, then so is $A \Rightarrow D$ for all supersets, D, of C. So for every frequent antecedent, A, use downward closure to mine out all confident consequents, C.

The conclusion we have demonstrated (but not proven) is: for (a+c)-hop transitive Apriori ARM with focus on the entity which is a hops from the antecedent and c hops from the consequent, if a (respectively c) is odd, use downward closure on that step in the mining of strong (frequent and confident) rules; if it is even, use upward closure.
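The odd/even conjecture is simple enough to state as a lookup. The sketch below encodes it as stated in the slides (demonstrated, not proven) and tabulates every focus position of a k-hop chain; the function names `closure_direction`, `closure_plan` and `plans_for_chain` are hypothetical.

```python
def closure_direction(hops):
    """Conjectured rule: odd hop count -> downward closure, even -> upward."""
    return "downward" if hops % 2 == 1 else "upward"

def closure_plan(a, c):
    """Closure to use when the focus is a hops from the antecedent
    and c hops from the consequent."""
    return {"antecedent": closure_direction(a), "consequent": closure_direction(c)}

def plans_for_chain(k):
    """All focus positions of a k-hop chain, as (a, c, plan) with a + c = k."""
    return [(a, k - a, closure_plan(a, k - a)) for a in range(k + 1)]

# The four foci of the 3-hop case: E (a=0), F (a=1), G (a=2), H (a=3).
three_hop = plans_for_chain(3)
```

For the 3-hop chain this reproduces the four cases worked out above: focus E gives (up, down), F gives (down, up), G gives (up, down) and H gives (down, up).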

Given any 1-hop labeled relationship (e.g., cells have values from {1,2,…,n}), there is:
1. a natural n-hop transitive relationship, A implies D, by alternating entities for each specific label value relationship.
2. cards for each entity consisting of the bitslices of cell values.

E.g., in Netflix, Rating(Cust,Movie) has label set {0,1,2,3,4,5}, so in 1. it generates a bona fide 6-hop transitive relationship. In 2. an alternative is to bitmap each label value (rather than bitslicing them). Below, $R_{n-i}$ can be bitslices or bitmaps.

[Figure: cards $R_0(E,F), \ldots, R_{n-2}(E,F), R_{n-1}(E,F)$ alternating between E and F, with $A \Rightarrow D$.]

E.g., equity trading on a given day: QuantityBought(Cust,Stock) with labels {0,1,2,3,4,5} (where n means n thousand shares), so that generates a bona fide 6-hop transitive relationship.
E.g., equity trading, moved similarly: define "moved similarly" on a day, giving StockStock(#DaysMovedSimilarlyOfLast10).
E.g., equity trading, moved similarly2: define "moved similarly" to mean that stock2 moved similarly to what stock1 did the previous day. Define the relationship StockStock(#DaysMovedSimilarlyOfLast10).
E.g., Gene-Experiment: label values could be "expression level". Intervalize and go!

Has Strong Transitive Rule Mining (STRM) been done? Are there downward and upward closure theorems already for it? Is it useful? That is, are there good examples of use: stocks, gene-experiment, MBR, Netflix predictor, ...?
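The two ways of turning a labeled relationship into cards, one bitmap per label value or one bitslice per binary digit, can be compared on a tiny example. The sketch below assumes the relationship is a dictionary from (cust, movie) cells to integer labels; the ratings and the function names `bitmap_cards` and `bitslice_cards` are hypothetical.

```python
def bitmap_cards(rel, labels):
    """One card per label value: the cells whose label equals that value."""
    return {v: {cell for cell, val in rel.items() if val == v} for v in labels}

def bitslice_cards(rel, nbits):
    """One card per bit position: the cells whose label has that bit set."""
    return {b: {cell for cell, val in rel.items() if (val >> b) & 1}
            for b in range(nbits)}

# Hypothetical Rating(Cust, Movie) with labels in {0,...,5}.
rating = {("c1", "m1"): 5, ("c1", "m2"): 3, ("c2", "m1"): 0, ("c2", "m2"): 4}

maps = bitmap_cards(rating, labels=range(6))   # 6 cards, one per label value
slices = bitslice_cards(rating, nbits=3)       # 3 cards, since 5 < 2**3
```

Bitmaps give one card per label (six here) and each cell appears on exactly one of them, while bitslices need only three cards for labels up to 5 but a cell can appear on several.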

Let Types be an entity which clusters Items (moves Items up the semantic hierarchy). E.g., in a store, Types might include: dairy, hardware, household, canned, snacks, baking, meats, produce, bakery, automotive, electronics, toddler, boys, girls, women, men, pharmacy, garden, toys, farm. Let A be an ItemSet wholly of one Type, TA, and let D be a TypesSet which does not include TA.

[Figure: BoughtBy(I,C) between Items and Customers, and Buys(C,T) between Customers and Types (of Items), with $A \Rightarrow D$.]

Then $A \Rightarrow D$ might mean: if ($\forall$ or $\exists$) $i \in A$ s.t. BB(i,c), then ($\forall$ or $\exists$) $t \in D$, B(c,t), which gives four variants.

$A \Rightarrow D$ frequent might mean:
$ct(\&_{i \in A} BB_i) \ge mnsp$
$ct(|_{i \in A} BB_i) \ge mnsp$
$ct(\&_{t \in D} B_t) \ge mnsp$
$ct(|_{t \in D} B_t) \ge mnsp$
$ct(\&_{i \in A} BB_i \,\&\, \&_{t \in D} B_t) \ge mnsp$, etc.

$A \Rightarrow D$ confident might mean:
$ct(\&_{i \in A} BB_i \,\&\, \&_{t \in D} B_t) \,/\, ct(\&_{i \in A} BB_i) \ge mncf$
$ct(\&_{i \in A} BB_i \,\&\, |_{t \in D} B_t) \,/\, ct(\&_{i \in A} BB_i) \ge mncf$
$ct(|_{i \in A} BB_i \,\&\, |_{t \in D} B_t) \,/\, ct(|_{i \in A} BB_i) \ge mncf$
$ct(|_{i \in A} BB_i \,\&\, \&_{t \in D} B_t) \,/\, ct(|_{i \in A} BB_i) \ge mncf$
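The four confidence variants differ only in whether the antecedent and the consequent are combined with AND (every item, every type) or OR (some item, some type). A small sketch, assuming BB_i and B_t are stored as Python sets of customer ids; the data and the names `combine` and `confidence` are hypothetical.

```python
def combine(index, keys, mode):
    """AND ("and") or OR ("or") the customer sets index[k] over k in keys."""
    sets = [index[k] for k in keys]
    if not sets:
        return set()
    return set.intersection(*sets) if mode == "and" else set.union(*sets)

def confidence(BB, B, A, D, ante_mode, cons_mode):
    """ct(ante & cons) / ct(ante), with ante and cons built by AND or OR."""
    ante = combine(BB, A, ante_mode)
    cons = combine(B, D, cons_mode)
    return len(ante & cons) / len(ante) if ante else 0.0

# Hypothetical cards: BB[i] = customers who bought item i,
# B[t] = customers who buy something of type t.
BB = {"milk": {1, 2, 3}, "cheese": {2, 3, 4}}
B = {"hardware": {2, 5}, "toys": {3}}
A, D = ["milk", "cheese"], ["hardware", "toys"]

variants = {(am, cm): confidence(BB, B, A, D, am, cm)
            for am in ("and", "or") for cm in ("and", "or")}
```

On this made-up data the same rule scores 0, 1, 0 and 0.5 under the and/and, and/or, or/and and or/or readings, which shows how much the choice of quantifier matters.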


Text Mining using pTrees

[Figure: the DTPe (Document, Term, Position) data cube and the structures derived from it, for 3 documents, 9 terms (are, April, apple, and, an, always., all, AAPL, buy) and 7 positions:
DTPe Position Table (Pos by T1D1, T1D2, T1D3, ..., T9D1, …), with the PpTreeSet indexed by (T,D).
DTPe Document Table (Doc by T1P1…T1P7, ..., T9P1…T9P7), with its DpTreeSet.
DTPe Term Table (Term by P1D1, P1D2, P1D3, ..., P7D1…P7D3), with the TpTreeSet indexed by (D,P).
DTPe Term Usage Table (same layout as the Term Table, with cells such as noun, verb, adj, adv).
Classical Document Table (Doc, Auth, …, Date, ..., Subj1, …, Subjm), with the Classical DocTbl DpTreeSet.
RoloDex cards: TD card for P=k, k=1..7; PD card for T=k, k=1..9; PT card for D=k, k=1,2,3.]

DTtf, the DocTerm termfreq Data Cube: tf is the +rollup of the DTPe datacube along the position dimension.

One can use any measurement or data structure of measurements, e.g., DT tfidf, in which each cell has a decimal tfidf, which can be bitsliced directly into whole number bitslices plus fractional bitslices (one for each binary digit to the right of the binary point, no need to shift!) using MOD(INT(x/2^k), 2). E.g., a tfidf = 3.5 is 11.1 in binary, so for k = 1, 0, -1, -2 the bits are 1, 1, 1, 0. The DT tfidf DpTreeSet then holds one slice per term and bit position: T1k1, T1k0, T1k-1, T1k-2, ...

DT SR, the DocTerm StockRating Cube: the rating of T=stock at doc date close, with 1=sell, 2=hold, 3=buy and 0=non-stock Term. It can be stored as a DT SR bitslice DpTreeSet (slices such as T2k2, T2k1) or as a DT SR bitmap DpTreeSet (maps T2,R=buy; T2,R=hold; T2,R=sell).
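The bitslicing formula MOD(INT(x/2^k), 2) works for negative k as well, which is what allows fractional bitslices without shifting. A minimal sketch, assuming non-negative values whose fractional part is exactly representable in the chosen number of binary digits; the names `bit`, `bitslices` and `rebuild` and the sample tfidf column are hypothetical.

```python
import math

def bit(x, k):
    """Binary digit of x at position k (k < 0 is right of the binary point):
    MOD(INT(x / 2^k), 2)."""
    return int(math.floor(x / 2 ** k)) % 2

def bitslices(values, ks):
    """One vertical bitslice per position k, across all the values."""
    return {k: [bit(x, k) for x in values] for k in ks}

def rebuild(slices, row):
    """Recover the value in a given row from its bitslices."""
    return sum(bits[row] * 2 ** k for k, bits in slices.items())

# Hypothetical tfidf column for one term across three documents.
tfidf = [3.5, 0.25, 2.0]
slices = bitslices(tfidf, ks=(1, 0, -1, -2))
```

For tfidf = 3.5 the digits at k = 1, 0, -1, -2 come out as 1, 1, 1, 0, and summing bit times 2^k over the slices gives back the original value.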