Knowledge discovery & data mining: Association rules and market basket analysis (introduction). A tutorial @ EDBT2000. Fosca Giannotti and Dino Pedreschi.

1 Knowledge discovery & data mining
Association rules and market basket analysis: introduction
A tutorial @ EDBT2000
Fosca Giannotti and Dino Pedreschi
Pisa KDD Lab, CNUCE-CNR & Univ. Pisa
http://www-kdd.di.unipi.it/

2 Market Basket Analysis: the context
Analyze customer buying habits by finding associations and correlations between the different items that customers place in their "shopping basket".
- Customer 1: milk, eggs, sugar, bread
- Customer 2: milk, eggs, cereal, bread
- Customer 3: eggs, sugar

3 Market Basket Analysis: the context
Given: a database of customer transactions, where each transaction is a set of items.
Find: groups of items which are frequently purchased together.

4 Goal of MBA
- Extract information on purchasing behavior
- Actionable information: can suggest
  - new store layouts
  - new product assortments
  - which products to put on promotion
- MBA is applicable whenever a customer purchases multiple things in proximity
  - credit cards
  - services of telecommunication companies
  - banking services
  - medical treatments

5 MBA: applicable to many other contexts
- Telecommunication: each customer is a transaction containing the set of the customer's phone calls
- Atmospheric phenomena: each time interval (e.g. a day) is a transaction containing the set of observed events (rain, wind, etc.)
- Etc.

6 Association Rules
- Express how products/services relate to each other, and tend to group together
- "If a customer purchases three-way calling, then they will also purchase call-waiting"
- Simple to understand
- Actionable information: bundle three-way calling and call-waiting in a single package

7 Useful, trivial, inexplicable
- Useful: "On Thursdays, grocery store consumers often purchase diapers and beer together."
- Trivial: "Customers who purchase maintenance agreements are very likely to purchase large appliances."
- Inexplicable: "When a new hardware store opens, one of the most sold items is toilet rings."

8 Basic Concepts
- Transaction: stored either in relational format (one row per transaction-item pair) or in compact format (one row per transaction, listing its itemset)
- Item: single element; Itemset: set of items
- Support of an itemset I: number of transactions containing I
- Minimum support σ: threshold for support
- Frequent itemset: an itemset with support ≥ σ
- Frequent itemsets represent sets of items which are positively correlated
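The two transaction formats and the support count can be sketched in a few lines of Python. This is only an illustration: the rows reuse the three shopping baskets from slide 2, and the names `relational_rows`, `to_compact`, and `support_count` are hypothetical, not part of the tutorial.

```python
from collections import defaultdict

# Relational format: one (transaction id, item) pair per row.
# The rows below reuse the three baskets shown on slide 2.
relational_rows = [
    (1, "milk"), (1, "eggs"), (1, "sugar"), (1, "bread"),
    (2, "milk"), (2, "eggs"), (2, "cereal"), (2, "bread"),
    (3, "eggs"), (3, "sugar"),
]

def to_compact(rows):
    """Group (tid, item) rows into compact format: tid -> set of items."""
    compact = defaultdict(set)
    for tid, item in rows:
        compact[tid].add(item)
    return dict(compact)

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    wanted = set(itemset)
    return sum(1 for items in transactions.values() if wanted <= items)

transactions = to_compact(relational_rows)
print(support_count({"milk", "eggs"}, transactions))  # baskets 1 and 2 qualify
```

With a minimum support of 2 transactions, {milk, eggs} would count as a frequent itemset in this toy database.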

9 Frequent Itemsets
- Support({dairy}) = 3 (75%)
- Support({fruit}) = 3 (75%)
- Support({dairy, fruit}) = 2 (50%)
If σ = 60%, then {dairy} and {fruit} are frequent while {dairy, fruit} is not.
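The numbers on this slide can be reproduced with a small sketch. The slide's transaction table did not survive in the transcript, so the four transactions below are an assumption, chosen only so that the counts match (3, 3, and 2 out of 4). The search is a plain brute-force enumeration for illustration, not the Apriori algorithm, and the function names are hypothetical.

```python
from itertools import combinations

# Assumed transactions: picked to match the supports stated on the slide.
transactions = [
    {"dairy", "fruit"},
    {"dairy", "fruit"},
    {"dairy"},
    {"fruit"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Brute force: test every non-empty itemset against the threshold."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result

frequent = frequent_itemsets(transactions, min_support=0.60)
for itemset, s in frequent.items():
    print(sorted(itemset), s)
```

Under a 60% threshold only the two single-item sets pass; {dairy, fruit} reaches 50% and is dropped.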

10 Association Rules: Measures
- Let A and B be a partition of I: A ⇒ B [s, c]
  - A and B are itemsets
  - s = support of A ⇒ B = support(A ∪ B)
  - c = confidence of A ⇒ B = support(A ∪ B)/support(A)
- Measures for rules:
  - minimum support σ
  - minimum confidence γ
- The rule holds if: s ≥ σ and c ≥ γ

11 Association Rules: Meaning
A ⇒ B [s, c]
- Support: denotes the frequency of the rule within the transactions. A high value means that the rule involves a large part of the database.
  support(A ⇒ B [s, c]) = p(A ∪ B)
- Confidence: denotes the percentage of transactions containing A which also contain B. It is an estimate of the conditional probability.
  confidence(A ⇒ B [s, c]) = p(B|A) = p(A & B)/p(A)
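As a rough sketch of these two measures, the Python below computes support and confidence for a rule and checks them against minimum thresholds. The transactions reuse the three baskets from slide 2. The function names (`rule_measures`, `rule_holds`) and the threshold values are illustrative assumptions, not taken from the tutorial.

```python
# Transactions reuse the three shopping baskets from slide 2.
transactions = [
    {"milk", "eggs", "sugar", "bread"},
    {"milk", "eggs", "cereal", "bread"},
    {"eggs", "sugar"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= t) / len(transactions)

def rule_measures(antecedent, consequent, transactions):
    """Return (s, c) for the rule antecedent => consequent."""
    a, b = frozenset(antecedent), frozenset(consequent)
    if a & b:
        raise ValueError("antecedent and consequent must be disjoint")
    s = support(a | b, transactions)       # s = support(A u B)
    support_a = support(a, transactions)
    c = s / support_a if support_a > 0 else 0.0  # c = support(A u B) / support(A)
    return s, c

def rule_holds(antecedent, consequent, transactions, min_support, min_confidence):
    """The rule holds if both measures reach their minimum thresholds."""
    s, c = rule_measures(antecedent, consequent, transactions)
    return s >= min_support and c >= min_confidence

s, c = rule_measures({"milk"}, {"bread"}, transactions)
print(s, c)
print(rule_holds({"eggs"}, {"sugar"}, transactions, 0.5, 0.7))
```

Here {milk} ⇒ {bread} appears in two of three baskets and every basket with milk also has bread, while {eggs} ⇒ {sugar} has the same support but a lower confidence, because eggs also appear in a basket without sugar.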

12 Association Rules - Example
Min. support 50%, min. confidence 50%
For rule A ⇒ C:
- support = support({A, C}) = 50%
- confidence = support({A, C})/support({A}) = 66.6%
The Apriori principle: any subset of a frequent itemset must be frequent.
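The Apriori principle is what lets a level-wise search discard candidates early: a k-itemset is only worth counting if all of its (k-1)-subsets are already known to be frequent. The sketch below is a minimal, unoptimized illustration of that idea. The slide's transaction table is missing from the transcript, so the four transactions are an assumption, chosen so that the rule A ⇒ C gets the stated 50% support and 66.6% confidence. The function names are hypothetical.

```python
from itertools import combinations

# Assumed transactions: the slide's table is not in the transcript.
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "E", "F"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def apriori(transactions, min_support):
    """Level-wise search: size-k candidates come only from frequent (k-1)-itemsets."""
    items = sorted(set().union(*transactions))
    level = {frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_support}
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset, transactions)
        k += 1
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori principle): every (k-1)-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c for c in candidates if support(c, transactions) >= min_support}
    return frequent

frequent = apriori(transactions, min_support=0.5)
confidence_a_c = frequent[frozenset("AC")] / frequent[frozenset("A")]
print(sorted(sorted(s) for s in frequent))
print(round(confidence_a_c, 3))
```

In this toy run items D, E, and F fail the 50% threshold at level 1, so no itemset containing them is ever generated as a candidate at level 2.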

13 References - Association rules
- A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association rules in large databases. VLDB'95, 432-443, Zurich, Switzerland.
- C. Silverstein, S. Brin, R. Motwani, and J. Ullman. Scalable techniques for mining causal structures. VLDB'98, 594-605, New York, NY.
- R. Srikant and R. Agrawal. Mining generalized association rules. VLDB'95, 407-419, Zurich, Switzerland.
- R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. SIGMOD'96, 1-12, Montreal, Canada.
- R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. KDD'97, 67-73, Newport Beach, California.
- D. Tsur, J. D. Ullman, S. Abiteboul, C. Clifton, R. Motwani, and S. Nestorov. Query flocks: A generalization of association-rule mining. SIGMOD'98, 1-12, Seattle, Washington.
- B. Ozden, S. Ramaswamy, and A. Silberschatz. Cyclic association rules. ICDE'98, 412-421, Orlando, FL.
- R.J. Miller and Y. Yang. Association rules over interval data. SIGMOD'97, 452-461, Tucson, Arizona.
- J. Han, G. Dong, and Y. Yin. Efficient mining of partial periodic patterns in time series database. ICDE'99, Sydney, Australia.
- F. Giannotti, G. Manco, D. Pedreschi, and F. Turini. Experiences with a logic-based knowledge discovery support environment. In Proc. 1999 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (SIGMOD'99 DMKD), Philadelphia, May 1999.
- F. Giannotti, M. Nanni, G. Manco, D. Pedreschi, and F. Turini. Integration of deduction and induction for mining supermarket sales data. In Proc. PADD'99, Practical Application of Data Discovery, Int. Conference, London, April 1999.
- S. Sarawagi, S. Thomas, and R. Agrawal. Integrating mining with relational database systems: Alternatives and implications. SIGMOD'98, 343-354.
  This last paper illustrates the difficulty of implementing Apriori efficiently in a DBMS.

