Published by Barnaby Strickland. Modified over 9 years ago.
1
CS 349: Market Basket Data Mining
All about beer and diapers.
2
Overview
- What is Data Mining?
- Market Baskets
- How fast does it run?
- What does it do?
3
What is Data Mining?
- Statistics
- Data Analysis
- Machine Learning
- Databases
4
Types of Data that can be Mined
- market basket
- classification
- time series
- text
5
Applications of Market Basket Mining
- supermarkets
- data with boolean attributes
  - census data: single vs. married
- word occurrence
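To make the non-supermarket applications concrete, here is a minimal sketch, assuming a record with boolean attributes becomes a basket of the attribute names that are true, and a document becomes a basket of the distinct words it contains. The function names `records_to_baskets` and `document_to_basket`, and the census-style records, are hypothetical and only for illustration.

```python
import re

def records_to_baskets(records):
    """Map each boolean-attribute record to the basket of attribute names that are True."""
    return [frozenset(attr for attr, value in record.items() if value) for record in records]

def document_to_basket(text):
    """Map a document to the basket of distinct lowercase words it contains."""
    return frozenset(re.findall(r"[a-z']+", text.lower()))

# Hypothetical census-style records: one boolean attribute per column.
records = [
    {"married": True, "owns_home": True, "has_children": False},
    {"married": False, "owns_home": False, "has_children": False},
]
census_baskets = records_to_baskets(records)
# census_baskets[0] is {"married", "owns_home"}; census_baskets[1] is empty.

doc_basket = document_to_basket("Beer and diapers, and more beer.")
# doc_basket is {"beer", "and", "diapers", "more"}: repeated words count once.
```

Once the data is in this form, the same itemset-counting machinery applies whether the "items" are products, attributes, or words.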
6
Some Measures of the Data
- number of baskets: N
- number of items: M
- average number of items per basket: W (width)
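The three measures can be computed directly from a list of baskets. This is a minimal sketch, assuming each basket is a Python set of item names; the baskets themselves are made-up example data.

```python
def basket_stats(baskets):
    """Return (N, M, W) for a list of baskets, each a set of items."""
    n = len(baskets)                                     # N: number of baskets
    m = len(set().union(*baskets))                       # M: number of distinct items
    w = sum(len(b) for b in baskets) / n if n else 0.0   # W: average basket width
    return n, m, w

# Made-up example baskets.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "milk"},
]
print(basket_stats(baskets))  # (4, 5, 2.25)
```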
7
Aspects of Market Basket Mining
- What is interesting?
- How do you make it run fast?
8
What is Interesting? (first try)
- itemset I = a set of items
- association rule: A -> B
- support(I) = fraction of baskets that contain I
- confidence(A -> B) = probability that a basket contains B given that it contains A = support(A ∪ B) / support(A)
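Both definitions translate almost word for word into code. This is a minimal sketch, assuming baskets are Python sets and itemsets are any iterable of items; the baskets are made-up example data.

```python
def support(itemset, baskets):
    """Fraction of baskets that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for b in baskets if itemset <= b) / len(baskets)

def confidence(a, b, baskets):
    """P(B | A) = support(A ∪ B) / support(A).
    Undefined (ZeroDivisionError) if no basket contains A."""
    return support(set(a) | set(b), baskets) / support(a, baskets)

# Made-up example baskets.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "milk"},
]
print(support({"beer", "diapers"}, baskets))        # 0.5
print(confidence(["beer"], ["diapers"], baskets))   # 2/3, about 0.667
print(confidence(["diapers"], ["beer"], baskets))   # 1.0
```

Note that confidence is not symmetric: every basket with diapers has beer, but only two of the three beer baskets have diapers.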
9
How do you find Itemsets with high support?
- Apriori algorithm, Agrawal and Srikant (1994), building on the association-rule problem posed by Agrawal et al. (1993)
- goal: find all itemsets with support > s
- 1-itemset = itemset with 1 item, …, k-itemset = itemset with k items
- large itemset = itemset with support > s
- candidate itemset = itemset that may have support > s
10
Apriori Algorithm
- start with all 1-itemsets
- go through the data, count their support, and find all “large” 1-itemsets
- combine them to form “candidate” 2-itemsets
- go through the data, count their support, and find all “large” 2-itemsets
- combine them to form “candidate” 3-itemsets
- …
- why this works: every subset of a large itemset is itself large, so candidates only need to be built from large itemsets
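The level-wise loop above can be sketched in a few lines of Python. This is a simplified, in-memory sketch, not an optimized implementation: it assumes the whole data set fits in a list of sets, uses the slide's "support > s" threshold, and generates candidates by joining pairs of large itemsets and pruning any candidate with a subset that is not large. The baskets are made-up example data.

```python
from itertools import combinations

def apriori(baskets, s):
    """Return {itemset: support} for every itemset with support > s."""
    baskets = [frozenset(b) for b in baskets]
    n = len(baskets)
    large = {}
    # Start with all 1-itemsets as candidates.
    candidates = {frozenset([item]) for b in baskets for item in b}
    k = 1
    while candidates:
        # One pass over the data: count the support of every candidate k-itemset.
        counts = dict.fromkeys(candidates, 0)
        for b in baskets:
            for c in candidates:
                if c <= b:
                    counts[c] += 1
        large_k = {c: cnt / n for c, cnt in counts.items() if cnt / n > s}
        large.update(large_k)
        # Combine large k-itemsets into candidate (k+1)-itemsets, keeping a
        # candidate only if all of its k-subsets are large.
        k += 1
        prev = list(large_k)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k and all(
                    frozenset(sub) in large_k for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
    return large

# Made-up example baskets.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "milk"},
]
result = apriori(baskets, s=0.4)
# Large itemsets: {beer}, {diapers}, {milk}, {beer, diapers}
```

Each iteration of the `while` loop is one pass over the data, which is why the number of passes equals the size of the largest candidate itemset.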
11
Run Time
- k passes over the data, where k is the size of the largest candidate itemset
- memory chunking algorithm: 2 passes over the data on disk, but multiple passes over each chunk in memory
- Toivonen (1996): statistical (sampling-based) technique, 1 + ε passes (but more memory)
- Brin et al. (1997): Dynamic Itemset Counting, 1 + ε passes (less memory)
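One way to get the "2 passes on disk" bound is a partition-style scheme, sketched below under the assumption that this is what the memory chunking bullet refers to. Pass 1 reads the data one memory-sized chunk at a time and mines each chunk fully in memory; any itemset with overall support > s must have support > s in at least one chunk, so the union of the local results is a complete candidate set. Pass 2 counts those candidates over all the data once. The helper names, the brute-force local miner, and the example baskets are hypothetical stand-ins for illustration.

```python
from collections import Counter
from itertools import combinations

def local_large(chunk, s):
    """Mine one in-memory chunk by counting every non-empty subset of every basket.
    Exponential in basket width; a stand-in for running Apriori on the chunk."""
    counts = Counter()
    for b in chunk:
        items = sorted(b)
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {c for c, cnt in counts.items() if cnt / len(chunk) > s}

def two_pass_partition(baskets, s, chunk_size):
    baskets = [frozenset(b) for b in baskets]
    # Pass 1: anything large overall must be large in at least one chunk.
    candidates = set()
    for start in range(0, len(baskets), chunk_size):
        candidates |= local_large(baskets[start:start + chunk_size], s)
    # Pass 2: count every candidate over the full data once.
    n = len(baskets)
    counts = Counter()
    for b in baskets:
        for c in candidates:
            if c <= b:
                counts[c] += 1
    return {c: counts[c] / n for c in candidates if counts[c] / n > s}

# Made-up example baskets, split into chunks of 2.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "milk"},
]
result = two_pass_partition(baskets, s=0.4, chunk_size=2)
# Same large itemsets as Apriori: {beer}, {diapers}, {milk}, {beer, diapers}
```

The trade-off is that pass 1 can produce many locally large itemsets that turn out to be small overall, so pass 2 may count more candidates than Apriori would.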
12
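But what is really interesting? (rule A -> B)
- Support = P(AB)
- Confidence = P(B|A)
- Interest = P(AB) / (P(A) P(B))
- Implication Strength = P(A) P(~B) / P(A~B)
- where P(AB) is the fraction of baskets containing both A and B, and ~B means "does not contain B"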
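All four measures can be computed from three basket counts, since P(A~B) = P(A) - P(AB) and P(~B) = 1 - P(B). This is a minimal sketch on made-up example baskets; the function name `rule_measures` is hypothetical. Interest above 1 means A and B co-occur more often than independence would predict. The implication strength ratio is also known in the literature as conviction; it is infinite when the rule is never violated.

```python
import math

def rule_measures(a, b, baskets):
    """Support, confidence, interest and implication strength of the rule A -> B."""
    a, b = frozenset(a), frozenset(b)
    baskets = [frozenset(t) for t in baskets]
    n = len(baskets)
    count_a = sum(a <= t for t in baskets)
    count_b = sum(b <= t for t in baskets)
    count_ab = sum((a | b) <= t for t in baskets)
    count_a_not_b = count_a - count_ab   # baskets with A but not all of B
    p_a, p_b, p_ab, p_a_not_b = (c / n for c in (count_a, count_b, count_ab, count_a_not_b))
    return {
        "support": p_ab,
        "confidence": p_ab / p_a,
        "interest": p_ab / (p_a * p_b),
        # Infinite when A never occurs without B.
        "implication_strength": p_a * (1 - p_b) / p_a_not_b if count_a_not_b else math.inf,
    }

# Made-up example baskets.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "milk"},
]
beer_to_diapers = rule_measures(["beer"], ["diapers"], baskets)
# support 0.5, confidence about 0.667, interest about 1.333, implication strength 1.5
diapers_to_beer = rule_measures(["diapers"], ["beer"], baskets)
# confidence 1.0, implication strength inf
```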
13
But what is really really interesting?
- Causality
- Surprise
14
Summary
- What is Data Mining?
- Market Baskets
- Finding Itemsets with high support
- Finding Interesting Rules