1 Knowledge discovery & data mining Association rules and market basket analysis A EDBT2000 Fosca Giannotti and Dino Pedreschi Pisa KDD Lab.

1 Knowledge discovery & data mining Association rules and market basket analysis A EDBT2000 Fosca Giannotti and Dino Pedreschi Pisa KDD Lab CNUCE-CNR & Univ. Pisa

EDBT2000 tutorial - Assoc 2 Association rules - module outline
1. What are association rules (AR) and what are they used for:
   1. The paradigmatic application: Market Basket Analysis
   2. The single dimensional AR (intra-attribute)
2. How to compute AR
   1. Basic Apriori Algorithm and its optimizations
   2. Multi-Dimension AR (inter-attribute)
   3. Quantitative AR
   4. Constrained AR
3. How to reason on AR and how to evaluate their quality
   1. Multiple-level AR
   2. Interestingness
   3. Correlation vs. Association

EDBT2000 tutorial - Assoc 3 Market Basket Analysis: the context
Analyze customer buying habits by finding associations and correlations between the different items that customers place in their “shopping basket”.
- Customer 1: milk, eggs, sugar, bread
- Customer 2: milk, eggs, cereal, bread
- Customer 3: eggs, sugar

EDBT2000 tutorial - Assoc 4 Market Basket Analysis: the context
Given: a database of customer transactions, where each transaction is a set of items.
- Find groups of items which are frequently purchased together.

EDBT2000 tutorial - Assoc 5 Goal of MBA
- Extract information on purchasing behavior
- Actionable information: can suggest
  - new store layouts
  - new product assortments
  - which products to put on promotion
- MBA is applicable whenever a customer purchases multiple things in proximity
  - credit cards
  - services of telecommunication companies
  - banking services
  - medical treatments

EDBT2000 tutorial - Assoc 6 MBA: applicable to many other contexts
- Telecommunication: each customer is a transaction containing the set of the customer’s phone calls
- Atmospheric phenomena: each time interval (e.g. a day) is a transaction containing the set of observed events (rain, wind, etc.)
- Etc.

EDBT2000 tutorial - Assoc 7 Association Rules
- Express how products/services relate to each other and tend to group together
- “If a customer purchases three-way calling, then they will also purchase call-waiting”
- Simple to understand
- Actionable information: bundle three-way calling and call-waiting in a single package

EDBT2000 tutorial - Assoc 8 Useful, trivial, inexplicable
- Useful: “On Thursdays, grocery store consumers often purchase diapers and beer together”.
- Trivial: “Customers who purchase maintenance agreements are very likely to purchase large appliances”.
- Inexplicable: “When a new hardware store opens, one of the most sold items is toilet rings.”

EDBT2000 tutorial - Assoc 9 Association Rules Road Map
- Single dimension vs. multiple dimensional associations
  - E.g., association on items bought vs. linking on different attributes.
  - Intra-Attribute vs. Inter-Attribute
- Boolean vs. quantitative associations
  - Association on categorical vs. numerical data
- Simple vs. constraint-based
  - E.g., small sales (sum 1,000)?
- Single level vs. multiple-level analysis
  - E.g., what brands of beers are associated with what brands of diapers?
- Association vs. correlation analysis
  - Association does not necessarily imply correlation.

EDBT2000 tutorial - Assoc 10 Basic Concepts
- Transaction (shown in relational format and in compact format)
- Item: single element; Itemset: set of items
- Support of an itemset I: number of transactions containing I
- Minimum support σ: threshold for support
- Frequent itemset: an itemset with support ≥ σ
- Frequent itemsets represent sets of items which are positively correlated

EDBT2000 tutorial - Assoc 11 Frequent Itemsets
- Support({dairy}) = 3 (75%)
- Support({fruit}) = 3 (75%)
- Support({dairy, fruit}) = 2 (50%)
If σ = 60%, then {dairy} and {fruit} are frequent while {dairy, fruit} is not.
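The support figures above can be reproduced with a few lines of Python. This is only a sketch: the slide's transaction table did not survive, so the four transactions below are hypothetical, chosen to give the same counts, and the names `support` and `is_frequent` are illustrative.

```python
# Hypothetical transactions (not the slide's original table), chosen so that
# the counts match the supports quoted on the slide.
transactions = [
    {"dairy", "fruit"},
    {"dairy", "fruit", "bread"},
    {"dairy"},
    {"fruit", "cereal"},
]


def support(itemset, db):
    """Return (absolute count, fraction) of transactions containing itemset."""
    count = sum(1 for t in db if itemset <= t)
    return count, count / len(db)


def is_frequent(itemset, db, min_support):
    """An itemset is frequent when its relative support reaches the threshold."""
    return support(itemset, db)[1] >= min_support


sigma = 0.60  # minimum support from the slide
for s in ({"dairy"}, {"fruit"}, {"dairy", "fruit"}):
    print(sorted(s), support(s, transactions), is_frequent(s, transactions, sigma))
```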

EDBT2000 tutorial - Assoc 12 Frequent Itemsets vs. Logic Rules
Frequent itemset I = {a, b} does not distinguish between (1) and (2).
Logic does: x ⇒ y iff when x holds, y holds too.

EDBT2000 tutorial - Assoc 13 Association Rules: Measures
- Let A and B be a partition of I: A ⇒ B [s, c], where A and B are itemsets
  - s = support of A ⇒ B = support(A ∪ B)
  - c = confidence of A ⇒ B = support(A ∪ B) / support(A)
- Measures for rules:
  - minimum support σ
  - minimum confidence γ
- The rule holds if s ≥ σ and c ≥ γ

EDBT2000 tutorial - Assoc 14 Association Rules: Meaning
A ⇒ B [s, c]
- Support: denotes the frequency of the rule within transactions. A high value means that the rule involves a large part of the database.
  support(A ⇒ B [s, c]) = p(A ∪ B)
- Confidence: denotes the percentage of transactions containing A which also contain B. It is an estimate of the conditional probability.
  confidence(A ⇒ B [s, c]) = p(B|A) = p(A & B)/p(A)
Here p(A ∪ B) and p(A & B) both denote the probability that a transaction contains every item of A and of B.

EDBT2000 tutorial - Assoc 15 Association Rules - Example
Min. support 50%, min. confidence 50%
For rule A ⇒ C:
- support = support({A, C}) = 50%
- confidence = support({A, C})/support({A}) = 66.6%
The Apriori principle: any subset of a frequent itemset must be frequent.
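As a sketch of how the two measures are computed for the rule A ⇒ C, the snippet below uses a hypothetical four-transaction database (the slide's own table is not preserved) in which {A} occurs three times and {A, C} twice. The function names `rule_measures` and `holds` are illustrative.

```python
# Hypothetical database: {A} appears in 3 of 4 transactions, {A, C} in 2 of 4.
db = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "E", "F"},
]


def support(itemset, db):
    """Fraction of transactions that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)


def rule_measures(lhs, rhs, db):
    """Return (support, confidence) of the rule lhs => rhs."""
    s = support(lhs | rhs, db)
    lhs_support = support(lhs, db)
    c = s / lhs_support if lhs_support > 0 else 0.0
    return s, c


def holds(lhs, rhs, db, min_support, min_confidence):
    """The rule holds when both thresholds are reached."""
    s, c = rule_measures(lhs, rhs, db)
    return s >= min_support and c >= min_confidence


s, c = rule_measures({"A"}, {"C"}, db)
print(f"A => C: support={s:.1%}, confidence={c:.1%}")
print("holds:", holds({"A"}, {"C"}, db, 0.5, 0.5))
```

Note that the reverse rule C ⇒ A has the same support but a different confidence, because the denominator changes from support({A}) to support({C}).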

EDBT2000 tutorial - Assoc 16 Association Rules – the effect

EDBT2000 tutorial - Assoc 17 Association Rules – the parameters σ and γ
- Minimum support σ:
  - High ⇒ few frequent itemsets ⇒ few valid rules, which occur very often
  - Low ⇒ many valid rules, which occur rarely
- Minimum confidence γ:
  - High ⇒ few rules, but all “almost logically true”
  - Low ⇒ many rules, but many of them very “uncertain”
- Typical values: σ = 2-10%, γ = 70-90%

EDBT2000 tutorial - Assoc 18 Association Rules – visualization
(Patients under 15 years old for USL 19, a local health service unit, January-September 1997)
- AZITHROMYCINUM (R) => BECLOMETASONE: Supp = 5.7%, Conf = 34.5%
- SULBUTAMOLO => BECLOMETASONE: Supp = ~4%, Conf = 57%

EDBT2000 tutorial - Assoc 19 Association Rules – bank transactions
- Step 1: Create groups of customers (clusters) on the basis of demographic data.
- Step 2: Describe the customers of each cluster by mining association rules.
Example: rules on cluster 6 (23.7% of the dataset):

EDBT2000 tutorial - Assoc 20 Cluster 6 (23.7% of customers)

EDBT2000 tutorial - Assoc 21 Association rules - module outline
- What are association rules (AR) and what are they used for:
  - The paradigmatic application: Market Basket Analysis
  - The single dimensional AR (intra-attribute)
- How to compute AR
  - Basic Apriori Algorithm and its optimizations
  - Multi-Dimension AR (inter-attribute)
  - Quantitative AR
  - Constrained AR
- How to reason on AR and how to evaluate their quality
  - Multiple-level AR
  - Interestingness
  - Correlation vs. Association

EDBT2000 tutorial - Assoc 22 Basic Apriori Algorithm
Problem decomposition:
1. Find the frequent itemsets: the sets of items that satisfy the support constraint
   - A subset of a frequent itemset is also a frequent itemset, i.e., if {A,B} is a frequent itemset, both {A} and {B} must be frequent itemsets
   - Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets)
2. Use the frequent itemsets to generate association rules.

EDBT2000 tutorial - Assoc 23 Problem Decomposition
- For minimum support = 50% = 2 transactions and minimum confidence = 50%
For the rule 1 ⇒ 3:
- Support = Support({1, 3}) = 50%
- Confidence = Support({1,3})/Support({1}) = 66%
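The second step of the decomposition, turning frequent itemsets into rules, can be sketched as follows. The frequent-itemset counts are hypothetical (the slide's table is not preserved); they were chosen so that Support({1, 3}) = 2 of 4 transactions and Support({1}) = 3 of 4, matching the figures quoted for the rule 1 ⇒ 3. The function name `generate_rules` is illustrative.

```python
from itertools import combinations

# Hypothetical output of step 1: frequent itemsets with absolute counts,
# for a database of 4 transactions and a minimum support of 2 transactions.
N_TRANSACTIONS = 4
frequent = {
    frozenset({1}): 3,
    frozenset({2}): 2,
    frozenset({3}): 2,
    frozenset({1, 3}): 2,
}


def generate_rules(frequent, n_transactions, min_confidence):
    """Split every frequent itemset into lhs => rhs and keep confident rules.

    Every lhs is itself frequent (Apriori property), so its count is in the dict.
    """
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for lhs_items in combinations(sorted(itemset), size):
                lhs = frozenset(lhs_items)
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, count / n_transactions, confidence))
    return rules


rules = generate_rules(frequent, N_TRANSACTIONS, min_confidence=0.5)
for lhs, rhs, s, c in rules:
    print(sorted(lhs), "=>", sorted(rhs), f"[{s:.0%}, {c:.0%}]")
```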

EDBT2000 tutorial - Assoc 24 The Apriori algorithm
- F_k: set of frequent itemsets of size k
- C_k: set of candidate itemsets of size k
F_1 = {frequent items}
for (k = 1; F_k ≠ ∅; k++) do {
    C_{k+1} = new candidates generated from F_k
    foreach transaction t in the database do
        increment the count of all candidates in C_{k+1} that are contained in t
    F_{k+1} = candidates in C_{k+1} with minimum support
}
Answer = ∪_k F_k
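A minimal Python sketch of this loop is given below. It is not an optimized implementation: candidates are generated by joining every pair of frequent k-itemsets, and the example database and the name `apriori` are assumptions made for illustration.

```python
from itertools import combinations


def apriori(db, min_count):
    """Return {itemset: count} for all itemsets found in >= min_count transactions."""
    db = [frozenset(t) for t in db]

    # F_1: count single items and keep the frequent ones.
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}

    answer = dict(frequent)
    k = 1
    while frequent:
        # C_{k+1}: join pairs from F_k, keep unions whose k-subsets are all frequent.
        candidates = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in frequent for sub in combinations(union, k)
            ):
                candidates.add(union)

        # One scan of the database to count every candidate.
        counts = {c: 0 for c in candidates}
        for t in db:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        # F_{k+1}: candidates that reach the minimum support.
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        answer.update(frequent)
        k += 1
    return answer


# Hypothetical database of 4 transactions; minimum support = 2 transactions.
example_db = [{1, 2, 3}, {1, 3}, {1, 4}, {2, 5, 6}]
result = apriori(example_db, min_count=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The loop ends when a level produces no frequent itemsets, which is the F_k ≠ ∅ test of the pseudocode.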

EDBT2000 tutorial - Assoc 25 The Apriori property
- If B is frequent and A ⊆ B, then A is also frequent.
- Each transaction which contains B also contains A, which implies supp(A) ≥ supp(B).
- Consequence: if A is not frequent, then it is not necessary to generate the itemsets which include A.
Example, with minimum support = 30%: the itemset {c} is not frequent, so it is not necessary to check {c, a}, {c, b}, {c, d}, {c, a, b}, {c, a, d}, {c, b, d}.

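EDBT2000 tutorial - Assoc 26 Apriori - Example
Itemset lattice over {a, b, c, d}:
- 1-itemsets: {a} {b} {c} {d}
- 2-itemsets: {a,b} {a,c} {a,d} {b,c} {b,d} {c,d}
- 3-itemsets: {a,b,c} {a,b,d} {a,c,d} {b,c,d}
- 4-itemset: {a,b,c,d}
{a,d} is not frequent, so the 3-itemsets {a,b,d} and {a,c,d} and the 4-itemset {a,b,c,d} are not generated.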
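The pruning in this example can be checked with a small candidate generator. The sketch below assumes, as the slide does, that every 2-itemset except {a,d} is frequent; the function name `generate_candidates` is illustrative.

```python
from itertools import combinations


def generate_candidates(frequent_k):
    """Build (k+1)-candidates whose k-subsets are all in frequent_k."""
    frequent_k = {frozenset(s) for s in frequent_k}
    if not frequent_k:
        return set()
    k = len(next(iter(frequent_k)))
    candidates = set()
    for a, b in combinations(frequent_k, 2):
        union = a | b
        if len(union) == k + 1 and all(
            frozenset(sub) in frequent_k for sub in combinations(union, k)
        ):
            candidates.add(union)
    return candidates


# All 2-itemsets over {a, b, c, d} except the infrequent {a, d}.
frequent_2 = [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"b", "d"}, {"c", "d"}]
candidates_3 = generate_candidates(frequent_2)
print(sorted(sorted(c) for c in candidates_3))

# Even if both surviving 3-itemsets turn out frequent, {a,b,c,d} is pruned
# because its subset {a,b,d} was never generated.
candidates_4 = generate_candidates(candidates_3)
print(candidates_4)
```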

EDBT2000 tutorial - Assoc 27 Apriori - Example [Figure: Database D, Scan D, C1, F1, C2, F2]

EDBT2000 tutorial - Assoc 28 Optimizations
- DHP: Direct Hash and Pruning (Park, Chen and Yu, SIGMOD’95).
- Partitioning Algorithm (Savasere, Omiecinski and Navathe, VLDB’95).
- Sampling (Toivonen’96).
- Dynamic Itemset Counting (Brin et al., SIGMOD’97).

EDBT2000 tutorial - Assoc 29 Multidimensional AR
Associations between values of different attributes.
Rules:
- nationality = French ⇒ income = high [50%, 100%]
- income = high ⇒ nationality = French [50%, 75%]
- age = 50 ⇒ nationality = Italian [33%, 100%]

EDBT2000 tutorial - Assoc 30 Single-dimensional vs multi-dimensional AR
- Single-dimensional (intra-attribute)
  - The events are: items A, B and C belong to the same transaction
  - Occurrence of events: transactions
- Multi-dimensional (inter-attribute)
  - The events are: attribute A assumes value a, attribute B assumes value b and attribute C assumes value c
  - Occurrence of events: tuples
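One common way to mine inter-attribute rules with a single-dimensional algorithm is to encode each tuple as a transaction of `attribute=value` items. The sketch below illustrates that encoding; the six tuples are hypothetical (the original table is not preserved) and were chosen to reproduce the support and confidence values quoted for the three multidimensional rules.

```python
# Hypothetical relation; each row is one tuple.
rows = [
    {"nationality": "French", "income": "high", "age": 30},
    {"nationality": "French", "income": "high", "age": 40},
    {"nationality": "French", "income": "high", "age": 45},
    {"nationality": "Italian", "income": "high", "age": 50},
    {"nationality": "Italian", "income": "low", "age": 50},
    {"nationality": "German", "income": "low", "age": 35},
]


def to_transaction(row):
    """Encode a tuple as a set of 'attribute=value' items."""
    return frozenset(f"{attr}={value}" for attr, value in row.items())


transactions = [to_transaction(r) for r in rows]


def measures(lhs, rhs, db):
    """Return (support, confidence) of lhs => rhs over the encoded tuples."""
    both = sum(1 for t in db if (lhs | rhs) <= t)
    left = sum(1 for t in db if lhs <= t)
    return both / len(db), (both / left if left else 0.0)


for lhs, rhs in [
    ({"nationality=French"}, {"income=high"}),
    ({"income=high"}, {"nationality=French"}),
    ({"age=50"}, {"nationality=Italian"}),
]:
    s, c = measures(lhs, rhs, transactions)
    print(sorted(lhs), "=>", sorted(rhs), f"[{s:.0%}, {c:.0%}]")
```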

EDBT2000 tutorial - Assoc 31 Single-dimensional AR vs Multi-dimensional Multi-dimensional Single-dimensional Schema:

EDBT2000 tutorial - Assoc 32 Quantitative Attributes
- Quantitative attributes (e.g. age, income)
- Categorical attributes (e.g. color of car)
Problem: too many distinct values.
Solution: transform quantitative attributes into categorical ones via discretization.

EDBT2000 tutorial - Assoc 33 Quantitative Association Rules
[Age: ] and [Married: Yes] ⇒ [NumCars: 2]
min support = 40%, min confidence = 50%

EDBT2000 tutorial - Assoc 34 Discretization of quantitative attributes Solution: each value is replaced by the interval to which it belongs. height: 0-150cm, cm, cm, >180cm weight: 0-40kg, 41-60kg, 60-80kg, >80kg income: 0-10ML, 11-20ML, 20-25ML, 25-30ML, >30ML Problem: the discretization may be useless (see weight).

EDBT2000 tutorial - Assoc 35 Discretization of quantitative attributes (2)
1. Intervals with a fixed “reasonable” granularity. Ex.: intervals of 10 cm for height.
2. Interval size is defined by some domain-dependent criterion. Ex.: 0-20ML, 21-22ML, 23-24ML, 25-26ML, >26ML
3. Interval size determined by analyzing data, studying the distribution or using clustering. Ex.: kg kg > 68 kg
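The first two strategies can be sketched in a few lines. The example below uses the 10 cm height granularity and the income intervals quoted above; the function names are illustrative, and the income binning assumes whole-number values so that the listed intervals do not leave gaps.

```python
from bisect import bisect_left


def fixed_width_bin(value, width):
    """Strategy 1: map a value to the half-open interval [low, low + width)."""
    low = int(value // width) * width
    return f"{low}-{low + width}"


# Strategy 2: domain-defined intervals, taken from the income example.
INCOME_UPPER_BOUNDS = [20, 22, 24, 26]
INCOME_LABELS = ["0-20ML", "21-22ML", "23-24ML", "25-26ML", ">26ML"]


def income_bin(value):
    """Return the label of the first interval whose upper bound is >= value."""
    return INCOME_LABELS[bisect_left(INCOME_UPPER_BOUNDS, value)]


print(fixed_width_bin(172, 10))
print([income_bin(v) for v in (20, 21, 26, 30)])
```

After this replacement each record carries an interval label instead of a number, so the labels can be treated as ordinary categorical items.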

EDBT2000 tutorial - Assoc 36 Discretization of quantitative attributes (3)
1. Quantitative attributes are statically discretized by using predefined concept hierarchies:
   - elementary use of background knowledge
   Loose interaction between Apriori and discretizer
2. Quantitative attributes are dynamically discretized
   - into “bins” based on the distribution of the data
   - considering the distance between data points
   Tighter interaction between Apriori and discretizer

EDBT2000 tutorial - Assoc 37 Constraint-based AR
- Preprocessing: use constraints to focus on a subset of transactions
  - Example: find association rules where the prices of all items are at most 200 Euro
- Optimizations: use constraints to optimize the Apriori algorithm
  - Anti-monotonicity: when a set violates the constraint, so does any of its supersets.
  - The Apriori algorithm uses this property for pruning
- Push constraints as deep as possible inside the frequent set computation

EDBT2000 tutorial - Assoc 38 Constraint-based AR (2)
- What kinds of constraints can be used in mining?
  - Data constraints:
    - SQL-like queries: find product pairs sold together in Vancouver in Dec. ’98.
    - OLAP-like queries (dimension/level): in relevance to region, price, brand, customer category.
  - Rule constraints:
    - specify the form or property of rules to be mined.

EDBT2000 tutorial - Assoc 39 Rule Constraints
- Two kinds of constraints:
  - Rule form constraints: meta-rule guided mining.
    - P(x, y) ^ Q(x, w) ⇒ takes(x, “database systems”).
  - Rule content constraints: constraint-based query optimization (Ng, et al., SIGMOD’98).
    - sum(LHS) 20 ^ count(LHS) > 3 ^ sum(RHS) > 1000
- 1-variable vs. 2-variable constraints (Lakshmanan, et al. SIGMOD’99):
  - 1-var: a constraint confining only one side (L/R) of the rule, e.g., as shown above.
  - 2-var: a constraint confining both sides (L and R).
    - sum(LHS) < min(RHS) ^ max(RHS) < 5 * sum(LHS)

EDBT2000 tutorial - Assoc 40 Mining Association Rules with Constraints
- Postprocessing
  - A naïve solution: apply Apriori to find all frequent sets, and then test them for constraint satisfaction one by one.
- Optimization
  - Han's approach: analyze the properties of the constraints comprehensively and push them as deeply as possible inside the frequent set computation.

EDBT2000 tutorial - Assoc 41 Apriori property revisited
- Anti-monotonicity: if a set S violates the constraint, any superset of S violates the constraint.
- Examples (for non-negative prices):
  - sum(S.Price) ≤ v is anti-monotone
  - sum(S.Price) ≥ v is not anti-monotone
  - sum(S.Price) = v is partly anti-monotone
- Application:
  - Push “sum(S.price) ≤ 1000” deeply into the iterative frequent set computation.
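The effect of pushing an anti-monotone constraint into the level-wise search can be sketched as follows. Support counting is left out to keep the example short, the price table is hypothetical, and the function names are illustrative; the point is only that a set exceeding the budget is never extended.

```python
# Hypothetical, non-negative item prices.
PRICES = {"tv": 800, "radio": 150, "cable": 30, "phone": 400}


def within_budget(itemset, prices, max_total):
    """Anti-monotone constraint: sum of prices <= max_total."""
    return sum(prices[i] for i in itemset) <= max_total


def constrained_itemsets(prices, max_total):
    """Level-wise enumeration that extends only sets satisfying the constraint."""
    items = sorted(prices)
    level = [frozenset([i]) for i in items if within_budget([i], prices, max_total)]
    result = []
    while level:
        result.extend(level)
        next_level = set()
        for s in level:
            for i in items:
                if i not in s:
                    candidate = s | {i}
                    # A violating set is dropped here, so none of its
                    # supersets is ever built from it.
                    if within_budget(candidate, prices, max_total):
                        next_level.add(candidate)
        level = list(next_level)
    return result


allowed = constrained_itemsets(PRICES, max_total=1000)
print(len(allowed), "itemsets satisfy sum(price) <= 1000")
```

Because every subset of a satisfying set also satisfies the constraint, no satisfying set is missed by this pruning.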

EDBT2000 tutorial - Assoc 42 Association rules - module outline
- What are association rules (AR) and what are they used for:
  - The paradigmatic application: Market Basket Analysis
  - The single dimensional AR (intra-attribute)
- How to compute AR
  - Basic Apriori Algorithm and its optimizations
  - Multi-Dimension AR (inter-attribute)
  - Quantitative AR
  - Constrained AR
- How to reason on AR and how to evaluate their quality
  - Multiple-level AR
  - Interestingness
  - Correlation vs. Association

EDBT2000 tutorial - Assoc 43 Multilevel AR
- It is difficult to find interesting patterns at too primitive a level
  - high support = too few rules
  - low support = too many rules, most of them uninteresting
- Approach: reason at a suitable level of abstraction
- A common form of background knowledge is that an attribute may be generalized or specialized according to a hierarchy of concepts
- Dimensions and levels can be efficiently encoded in transactions
- Multilevel association rules: rules which combine associations with a hierarchy of concepts

EDBT2000 tutorial - Assoc 44 Hierarchy of concepts

EDBT2000 tutorial - Assoc 45 Multilevel AR
- Fresh ⇒ Bakery [20%, 60%]
- Dairy ⇒ Bread [6%, 50%]
- Fruit ⇒ Bread [1%, 50%] is not valid
Fresh [support = 20%]; Dairy [support = 6%]; Fruit [support = 4%]; Vegetable [support = 7%]

EDBT2000 tutorial - Assoc 46 Support and Confidence of Multilevel Association Rules
- Generalizing/specializing values of attributes affects support and confidence
- From specialized to general: the support of rules increases (new rules may become valid)
- From general to specialized: the support of rules decreases (rules may become invalid when their support falls under the threshold)
- Confidence is not affected in a systematic way: it may increase or decrease

EDBT2000 tutorial - Assoc 47 Reasoning with Multilevel AR
- Too low a level => too many rules, and too primitive. Example: Apple Melinda ⇒ Colgate Tooth-paste. It is a curiosity, not a behavior.
- Too high a level => uninteresting rules. Example: Foodstuff ⇒ Varia
- Redundancy => some rules may be redundant due to “ancestor” relationships between items.
  - A rule is redundant if its support is close to the “expected” value, based on the rule’s ancestor.
- Example (milk has 4 subclasses):
  - milk ⇒ wheat bread [support = 8%, confidence = 70%]
  - 2% milk ⇒ wheat bread [support = 2%, confidence = 72%]
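The redundancy test in the milk example can be sketched numerically. The snippet assumes that the four milk subclasses share the parent's sales equally, which the slide does not state, and the 25% relative tolerance is an arbitrary illustrative choice.

```python
def expected_support(ancestor_support, share):
    """Support expected for a specialized rule if it merely inherits from its ancestor."""
    return ancestor_support * share


def is_redundant(rule_support, ancestor_support, share, tolerance=0.25):
    """A rule is redundant when its support is close to the expected value."""
    expected = expected_support(ancestor_support, share)
    return abs(rule_support - expected) <= tolerance * expected


# milk => wheat bread has support 8%; "2% milk" is assumed to be 1/4 of milk.
print(expected_support(0.08, 1 / 4))    # expected support of the specialized rule
print(is_redundant(0.02, 0.08, 1 / 4))  # observed 2% matches the expectation
print(is_redundant(0.05, 0.08, 1 / 4))  # 5% would be well above expectation
```

Under that equal-share assumption the specialized rule carries no information beyond its ancestor, which is why it can be dropped.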

EDBT2000 tutorial - Assoc 48 Mining Multilevel AR
- Calculate frequent itemsets at each concept level, until no more frequent itemsets can be found
- For each level use Apriori
- A top-down, progressive deepening approach:
  - First find high-level strong rules: fresh ⇒ bakery [20%, 60%].
  - Then find their lower-level “weaker” rules: fruit ⇒ bread [6%, 50%].
- Variations on mining multiple-level association rules:
  - Level-crossed association rules: fruit ⇒ wheat bread
  - Association rules with multiple, alternative hierarchies: fruit ⇒ Wonder bread

EDBT2000 tutorial - Assoc 49 Multi-level Association: Uniform Support vs. Reduced Support
- Uniform support: the same minimum support for all levels
  - (+) One minimum support threshold. No need to examine itemsets containing any item whose ancestors do not have minimum support.
  - (–) If the support threshold is too high ⇒ low-level associations are missed; too low ⇒ too many high-level associations are generated.
- Reduced support: reduced minimum support at lower levels
  - different strategies possible

EDBT2000 tutorial - Assoc 50 Uniform Support
Multi-level mining with uniform support
- Level 1 (min_sup = 5%): Milk [support = 10%]
- Level 2 (min_sup = 5%): 2% Milk [support = 6%], Skim Milk [support = 4%]

EDBT2000 tutorial - Assoc 51 Reduced Support
Multi-level mining with reduced support
- Level 1 (min_sup = 5%): Milk [support = 10%]
- Level 2 (min_sup = 3%): 2% Milk [support = 6%], Skim Milk [support = 4%]
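The difference between the two strategies on the milk example can be sketched with per-level thresholds. The supports and thresholds are the ones quoted on the two slides; the data layout and the function name `frequent_items` are illustrative.

```python
# item -> (hierarchy level, support), as quoted on the slides.
SUPPORTS = {
    "Milk": (1, 0.10),
    "2% Milk": (2, 0.06),
    "Skim Milk": (2, 0.04),
}


def frequent_items(supports, min_sup_by_level):
    """Keep the items whose support reaches the threshold of their own level."""
    return [
        name
        for name, (level, sup) in supports.items()
        if sup >= min_sup_by_level[level]
    ]


uniform = frequent_items(SUPPORTS, {1: 0.05, 2: 0.05})
reduced = frequent_items(SUPPORTS, {1: 0.05, 2: 0.03})
print("uniform:", uniform)
print("reduced:", reduced)
```

With a uniform 5% threshold Skim Milk is lost; lowering the level-2 threshold to 3% keeps it.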

EDBT2000 tutorial - Assoc 52 Reasoning with AR
- Redundancy: if {a} ⇒ {b, c} holds, then {a, b} ⇒ {c} and {a, c} ⇒ {b} also hold, with the same support and greater or equal confidence (since support({a, b}) ≤ support({a})). So the first rule is stronger.
- Significance: Example: {b} ⇒ {a} has confidence 66%, but it is not significant because support({a}) = 75%.

EDBT2000 tutorial - Assoc 53 Beyond Support and Confidence
- Example 1: (Aggarwal & Yu, PODS98)
- {tea} => {coffee} has high support (20%) and confidence (80%)
- However, the a priori probability that a customer buys coffee is 90%
  - A customer who is known to buy tea is less likely to buy coffee (by 10%)
  - There is a negative correlation between buying tea and buying coffee
  - {~tea} => {coffee} has higher confidence (93%)

EDBT2000 tutorial - Assoc 54 Correlation and Interest
- Two events are independent if P(A ∧ B) = P(A) * P(B); otherwise they are correlated.
- Interest = P(A ∧ B) / (P(A) * P(B))
- Interest expresses a measure of correlation:
  - equal to 1 ⇒ A and B are independent events
  - less than 1 ⇒ A and B are negatively correlated
  - greater than 1 ⇒ A and B are positively correlated
  - In our example, I(drink tea ∧ drink coffee) = 0.89, i.e. they are negatively correlated.
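The tea and coffee figures can be re-derived from the three numbers given in the example (support 20%, confidence 80%, prior probability of coffee 90%). The short sketch below does the arithmetic; the variable and function names are illustrative.

```python
def interest(p_ab, p_a, p_b):
    """Interest = P(A and B) / (P(A) * P(B)); 1.0 means independence."""
    return p_ab / (p_a * p_b)


p_tea_and_coffee = 0.20   # support of {tea} => {coffee}
confidence = 0.80         # P(coffee | tea)
p_coffee = 0.90           # a priori probability of buying coffee

# P(tea) follows from confidence = P(tea and coffee) / P(tea).
p_tea = p_tea_and_coffee / confidence

tea_coffee_interest = interest(p_tea_and_coffee, p_tea, p_coffee)

# Confidence of {~tea} => {coffee}: coffee buyers without tea over non-tea buyers.
conf_not_tea = (p_coffee - p_tea_and_coffee) / (1 - p_tea)

print(f"P(tea) = {p_tea:.2f}")
print(f"interest = {tea_coffee_interest:.2f}")
print(f"confidence(~tea => coffee) = {conf_not_tea:.0%}")
```

An interest below 1 confirms the negative correlation even though the rule passes typical support and confidence thresholds.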

EDBT2000 tutorial - Assoc 55 Domain dependent measures
- Together with support, confidence, interest, …, also use domain-dependent measures (in post-processing)
- E.g., use rule constraints on rules
- Example: take only rules which are significant with respect to their economic value
  - sum(LHS) + sum(RHS) > 100

EDBT2000 tutorial - Assoc 56 Temporal AR
- Can use the temporal dimension in data
- E.g.,
  - {diaper} -> {beer} [5%, 87%]
  - support may jump to 25% every Thursday night
- How to mine ARs that follow interesting user-defined temporal patterns?
- The challenge is to design algorithms that avoid computing every rule at every time unit.

EDBT2000 tutorial - Assoc 57 A brief history of AR mining research
- Apriori (Agrawal et al. SIGMOD93)
- Optimizations of Apriori
  - Fast algorithm (Agrawal et al. VLDB94)
  - Hash-based (Park et al. SIGMOD95)
  - Partitioning (Savasere et al. VLDB95)
  - Dynamic Itemset Counting (Brin et al. SIGMOD97)
- Problem extensions
  - Generalized AR (Srikant et al.; Han et al. VLDB95)
  - Quantitative AR (Srikant et al. SIGMOD96)
  - N-dimensional AR (Lu et al. DMKD’98)
  - Temporal AR (Ozden et al. ICDE98)
- Parallel mining (Agrawal et al. TKDE96)
- Distributed mining (Cheung et al. PDIS96)
- Incremental mining (Cheung et al. ICDE96)

EDBT2000 tutorial - Assoc 58 Conclusions
- Association rule mining
  - probably the most significant contribution from the database community to KDD
  - A large number of papers have been published
- Many interesting issues have been explored
- An interesting research direction
  - Association analysis in other types of data: spatial data, multimedia data, time series data, etc.

EDBT2000 tutorial - Assoc 59 Conclusion (2)
- MBA is a key factor of success in the competition of supermarket retailers.
- Knowledge of customers and their purchasing behavior brings potentially huge added value.

EDBT2000 tutorial - Assoc 60 Which tools for market basket analysis?
- Association rules are needed but insufficient
- Market analysts ask for business rules:
  - Is the supermarket assortment adequate for the company’s target class of customers?
  - Is a promotional campaign effective in establishing a desired purchasing habit?

EDBT2000 tutorial - Assoc 61 Business rules: temporal reasoning on AR
- Which rules are established by a promotion?
- How do rules change over time?

EDBT2000 tutorial - Assoc 62 Our position
- A suitable integration of
  - deductive reasoning (logic database languages)
  - inductive reasoning (association rules)
- provides a viable solution to high-level problems in market basket analysis
- DATASIFT: LDL++ (UCLA deductive database) extended with association rules and decision trees.

EDBT2000 tutorial - Assoc 63 References - Association rules
- R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. SIGMOD'93, Washington, D.C.
- R. Agrawal and R. Srikant. Fast algorithms for mining association rules. VLDB'94, Santiago, Chile.
- R. Agrawal and R. Srikant. Mining sequential patterns. ICDE'95, 3-14, Taipei, Taiwan.
- R. J. Bayardo. Efficiently mining long patterns from databases. SIGMOD'98, 85-93, Seattle, Washington.
- S. Brin, R. Motwani, and C. Silverstein. Beyond market basket: Generalizing association rules to correlations. SIGMOD'97, Tucson, Arizona.
- D.W. Cheung, J. Han, V. Ng, and C.Y. Wong. Maintenance of discovered association rules in large databases: An incremental updating technique. ICDE'96, New Orleans, LA.
- T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama. Data mining using two-dimensional optimized association rules: Scheme, algorithms, and visualization. SIGMOD'96, 13-23, Montreal, Canada.
- E.-H. Han, G. Karypis, and V. Kumar. Scalable parallel data mining for association rules. SIGMOD'97, Tucson, Arizona.
- J. Han and Y. Fu. Discovery of multiple-level association rules from large databases. VLDB'95, Zurich, Switzerland.
- M. Kamber, J. Han, and J. Y. Chiang. Metarule-guided mining of multi-dimensional association rules using data cubes. KDD'97, Newport Beach, California.
- M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A.I. Verkamo. Finding interesting rules from large sets of discovered association rules. CIKM'94, Gaithersburg, Maryland.
- R. Ng, L. V. S. Lakshmanan, J. Han, and A. Pang. Exploratory mining and pruning optimizations of constrained associations rules. SIGMOD'98, 13-24, Seattle, Washington.
- B. Ozden, S. Ramaswamy, and A. Silberschatz. Cyclic association rules. ICDE'98, Orlando, FL.
- J.S. Park, M.S. Chen, and P.S. Yu. An effective hash-based algorithm for mining association rules. SIGMOD'95, San Jose, CA.
- S. Ramaswamy, S. Mahajan, and A. Silberschatz. On the discovery of interesting patterns in association rules. VLDB'98, New York, NY.
- S. Sarawagi, S. Thomas, and R. Agrawal. Integrating association rule mining with relational database systems: Alternatives and implications. SIGMOD'98, Seattle, WA.

EDBT2000 tutorial - Assoc 64 References - Association rules
- A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association rules in large databases. VLDB'95, Zurich, Switzerland.
- C. Silverstein, S. Brin, R. Motwani, and J. Ullman. Scalable techniques for mining causal structures. VLDB'98, New York, NY.
- R. Srikant and R. Agrawal. Mining generalized association rules. VLDB'95, Zurich, Switzerland.
- R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. SIGMOD'96, 1-12, Montreal, Canada.
- R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. KDD'97, 67-73, Newport Beach, California.
- D. Tsur, J. D. Ullman, S. Abiteboul, C. Clifton, R. Motwani, and S. Nestorov. Query flocks: A generalization of association-rule mining. SIGMOD'98, 1-12, Seattle, Washington.
- R.J. Miller and Y. Yang. Association rules over interval data. SIGMOD'97, Tucson, Arizona.
- J. Han, G. Dong, and Y. Yin. Efficient mining of partial periodic patterns in time series database. ICDE'99, Sydney, Australia.
- F. Giannotti, G. Manco, D. Pedreschi and F. Turini. Experiences with a logic-based knowledge discovery support environment. In Proc. ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (SIGMOD'99 DMKD). Philadelphia, May
- F. Giannotti, M. Nanni, G. Manco, D. Pedreschi and F. Turini. Integration of Deduction and Induction for Mining Supermarket Sales Data. In Proc. PADD'99, Practical Application of Data Discovery, Int. Conference, London, April 1999.