CS 349: Market Basket Data Mining
All about beer and diapers.

Overview
What is Data Mining?
Market Baskets
How fast does it run?
What does it do?

What is Data Mining?
Statistics
Data Analysis
Machine Learning
Databases

Types of Data that can be Mined
market basket
classification
time series
text

Applications of Market Basket
supermarkets
data with boolean attributes
  – census data: single vs. married
word occurrence

Some Measures of the Data
number of baskets: N
number of items: M
average number of items per basket: W (width)
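As a small illustration of these three measures, the sketch below computes N, M and W for a toy data set. The basket contents and the helper name `basket_measures` are made up for this example and are not from the slides.

```python
# Toy data: each basket is the set of items bought together (made-up values).
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "milk", "diapers", "bread"},
]

def basket_measures(baskets):
    """Return (N, M, W) for a list of baskets (hypothetical helper)."""
    n = len(baskets)                        # N: number of baskets
    m = len(set().union(*baskets))          # M: number of distinct items
    total = sum(len(b) for b in baskets)    # total item occurrences
    w = total / n if n else 0.0             # W: average basket width
    return n, m, w

N, M, W = basket_measures(baskets)
print(N, M, W)
```

Here W is simply the total number of item occurrences divided by N.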

Aspects of Market Basket Mining
What is interesting?
How do you make it run fast?

What is Interesting? (first try)
Itemset I = set of items
association rule: A -> B
support(I) = fraction of baskets that contain I
confidence(A->B) = probability that a basket contains B given that it contains A
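The two definitions above can be sketched directly in code. This is a minimal example on made-up baskets; the function names `count`, `support` and `confidence` are chosen for illustration, and `confidence` assumes the left-hand side A occurs in at least one basket.

```python
# Toy data (made up for illustration).
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "milk", "diapers", "bread"},
    {"diapers", "milk"},
]

def count(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for b in baskets if itemset <= b)

def support(itemset, baskets):
    """Fraction of baskets that contain `itemset`."""
    return count(itemset, baskets) / len(baskets)

def confidence(a, b, baskets):
    """Estimate of P(B | A): baskets with A and B divided by baskets with A.

    Assumes A appears in at least one basket.
    """
    return count(set(a) | set(b), baskets) / count(a, baskets)

print(support({"beer", "diapers"}, baskets))
print(confidence({"diapers"}, {"beer"}, baskets))
print(confidence({"beer"}, {"diapers"}, baskets))
```

Note that confidence is not symmetric: on this toy data the rule diapers -> beer and the rule beer -> diapers get different values.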

How do you find Itemsets with high support?
Apriori algorithm, Agrawal et al. (1993)
Find all itemsets with support > s
1-itemset = itemset with 1 item
…
k-itemset = itemset with k items
large itemset = itemset with support > s
candidate itemset = itemset that may have support > s

Apriori Algorithm
start with all 1-itemsets
go through data and count their support and find all “large” 1-itemsets
combine them to form “candidate” 2-itemsets
go through data and count their support and find all “large” 2-itemsets
combine them to form “candidate” 3-itemsets
…
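The level-by-level loop described above can be sketched as follows. This is a minimal in-memory version on made-up baskets, not a tuned implementation. The slide only says "combine them"; the extra check that every k-subset of a candidate is itself large is the usual Apriori pruning step and is added here as an assumption about what "combine" involves.

```python
from itertools import combinations

# Toy data (made up for illustration).
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "milk", "diapers", "bread"},
    {"diapers", "milk"},
]

def apriori(baskets, min_support):
    """Return {itemset: support} for every itemset with support > min_support."""
    n = len(baskets)
    large = {}
    # Level 1 candidates: every 1-itemset that appears in the data.
    candidates = {frozenset([item]) for b in baskets for item in b}
    k = 1
    while candidates:
        # One pass over the data: count the support of each candidate.
        counts = dict.fromkeys(candidates, 0)
        for b in baskets:
            for c in candidates:
                if c <= b:
                    counts[c] += 1
        large_k = {c: cnt / n for c, cnt in counts.items() if cnt / n > min_support}
        large.update(large_k)
        # Combine large k-itemsets into candidate (k+1)-itemsets and keep
        # only those whose k-item subsets are all large.
        k += 1
        joined = {a | b for a in large_k for b in large_k if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in large_k for s in combinations(c, k - 1))
        }
    return large

result = apriori(baskets, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Each iteration of the `while` loop corresponds to one pass over the data, so the number of passes grows with the size of the largest candidate itemset.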

Run Time
k passes over data, where k is the size of the largest candidate itemset
Memory chunking algorithm ==> 2 passes over data on disk, but multiple passes in memory
Toivonen 1996 gives a statistical technique: 1 + ε passes (but more memory)
Brin 1997, Dynamic Itemset Counting: 1 + ε passes (less memory)
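The slide does not spell out the memory chunking algorithm. One way to get two passes, sketched here as an assumption, is to mine each memory-sized chunk on its own and then verify the union of the local results against the full data. This works because an itemset with support > s overall must have support > s in at least one chunk. The names `local_large`, `two_pass_large_itemsets`, `chunk_size` and `max_size` are hypothetical, and the in-memory step uses brute-force subset enumeration rather than repeated Apriori passes to keep the example short.

```python
from itertools import combinations

# Toy data (made up for illustration).
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "milk", "diapers", "bread"},
    {"diapers", "milk"},
]

def local_large(chunk, min_support, max_size):
    """Itemsets of up to max_size items with support > min_support within one chunk."""
    counts = {}
    for b in chunk:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(b), k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {c for c, cnt in counts.items() if cnt / len(chunk) > min_support}

def two_pass_large_itemsets(baskets, min_support, chunk_size, max_size=3):
    # Pass 1: read the data chunk by chunk and mine each chunk in memory.
    candidates = set()
    for start in range(0, len(baskets), chunk_size):
        chunk = baskets[start:start + chunk_size]
        candidates |= local_large(chunk, min_support, max_size)
    # Pass 2: count every candidate against the whole data set.
    counts = dict.fromkeys(candidates, 0)
    for b in baskets:
        for c in candidates:
            if c <= b:
                counts[c] += 1
    n = len(baskets)
    return {c: cnt / n for c, cnt in counts.items() if cnt / n > min_support}

result = two_pass_large_itemsets(baskets, min_support=0.5, chunk_size=2)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The second pass is what removes itemsets that were large in some chunk but not in the data as a whole.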

But what is really interesting?
A -> B
Support = P(AB)
Confidence = P(B|A)
Interest = P(AB) / (P(A) P(B))
Implication Strength = P(A) P(~B) / P(A~B)
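To make the last two formulas concrete, here is a minimal sketch that estimates each probability as a fraction of baskets. The data and function names are made up for illustration. "~B" is read as "the basket does not contain all of B", and `interest` assumes both A and B occur at least once.

```python
# Toy data (made up for illustration).
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "milk", "diapers", "bread"},
    {"diapers", "milk"},
]

def prob(itemset, baskets):
    """Fraction of baskets containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for b in baskets if itemset <= b) / len(baskets)

def interest(a, b, baskets):
    """P(AB) / (P(A) P(B)); a value of 1 means A and B look independent."""
    return prob(set(a) | set(b), baskets) / (prob(a, baskets) * prob(b, baskets))

def implication_strength(a, b, baskets):
    """P(A) P(~B) / P(A~B); infinite when A never occurs without B."""
    a, b = set(a), set(b)
    p_a = prob(a, baskets)
    p_not_b = 1 - prob(b, baskets)
    # Fraction of baskets that contain A but not B.
    p_a_not_b = sum(1 for bk in baskets if a <= bk and not b <= bk) / len(baskets)
    if p_a_not_b == 0:
        return float("inf")
    return p_a * p_not_b / p_a_not_b

print(interest({"diapers"}, {"beer"}, baskets))
print(implication_strength({"diapers"}, {"beer"}, baskets))
print(implication_strength({"beer"}, {"diapers"}, baskets))
```

Interest is symmetric in A and B, while implication strength depends on the direction of the rule.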

But what is really really interesting?
Causality
Surprise

Summary
What is Data Mining?
Market Baskets
Finding Itemsets with high support
Finding Interesting Rules
