Data Mining, Frequent-Itemset Mining


Data Mining
Some mining problems:
Find frequent itemsets in "market-basket" data.
– "50% of the people who buy hot dogs also buy mustard."
Find "similar" items in a large collection. E.g.:
– Find documents on the Web that share a significant number of words.
– Find books that have been bought by many of the same Amazon customers.
Find clusters of data. E.g.:
– Find clusters of Web pages by the words they use.

Frequent-Itemset Mining
Market-Basket Model
– A large set of items, e.g., things sold in a supermarket.
– A large set of baskets, each of which is a small set of the items, e.g., the things one customer buys on one day.
Fundamental problem
– What sets of items are often bought together?
Application
– If a large number of baskets contain both hot dogs and mustard, we can use this information in several ways. How?

Beer and Diapers What’s the explanation here?

On-Line Purchases
Amazon.com offers several million different items for sale, and has several tens of millions of customers.
Baskets = customers; Items = books, DVDs, etc.
– Motivation: Find out what items are bought together.
Baskets = books, DVDs, etc.; Items = customers.
– Motivation: Find out which customers are similar.
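The two views above use the same purchase data with the roles of baskets and items swapped. A minimal sketch of that swap, assuming the purchases are held as a dictionary from basket to set of items; the function name `invert` and the customer and product names are made up for illustration.

```python
from collections import defaultdict

def invert(baskets):
    """Swap roles: every item becomes a basket holding the old basket ids."""
    inverted = defaultdict(set)
    for basket_id, items in baskets.items():
        for item in items:
            inverted[item].add(basket_id)
    return dict(inverted)

# Hypothetical data: baskets = customers, items = products.
by_customer = {
    "alice": {"book_a", "dvd_x"},
    "bob": {"book_a"},
    "carol": {"book_a", "dvd_x"},
}

# After inversion: baskets = products, items = customers.
by_product = invert(by_customer)
print(by_product["dvd_x"])
```

Mining frequent itemsets on `by_customer` would surface products bought together; mining them on `by_product` would surface customers who buy many of the same products.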

Words and Documents
Baskets = sentences; Items = words in those sentences.
– Motivation: Find words that appear together unusually frequently, i.e., linked concepts.
Baskets = sentences; Items = documents containing those sentences.
– Motivation: Items that appear together too often could represent plagiarism.

Genes
Baskets = people; Items = genes or blood-chemistry factors.
– Motivation: Detect combinations of genes that result in diabetes.

Support
Support for a set of items (itemset) I = the number of baskets containing all items in I.
Given a support threshold s, itemsets that appear in at least s baskets (support ≥ s) are called frequent itemsets.

Example: Frequent Itemsets
Items = {milk, coke, pepsi, beer, juice}.
Support threshold = 3 baskets.
B1 = {m, c, b}    B2 = {m, p, j}
B3 = {m, b}       B4 = {c, j}
B5 = {m, p, b}    B6 = {m, c, b, j}
B7 = {c, b, j}    B8 = {b, c}
Frequent itemsets: {m}, {c}, {b}, {j}, {m,b}, {b,c}, {c,j}.
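The example can be checked mechanically. The sketch below counts support by brute force over every subset of the items; the names `BASKETS`, `support`, and `frequent_itemsets` are illustrative and not from the slides, and the exhaustive enumeration is only sensible for a toy data set like this one.

```python
from itertools import combinations

# Baskets B1..B8 from the example, items abbreviated as on the slide.
BASKETS = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]

def support(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for basket in baskets if items <= basket)

def frequent_itemsets(baskets, s):
    """Brute force: test every non-empty subset of the item universe."""
    universe = sorted(set().union(*baskets))
    result = {}
    for k in range(1, len(universe) + 1):
        for candidate in combinations(universe, k):
            count = support(candidate, baskets)
            if count >= s:  # frequent means support of at least s
                result[frozenset(candidate)] = count
    return result

FREQUENT = frequent_itemsets(BASKETS, 3)
for itemset, count in sorted(FREQUENT.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Note that {c, j} appears in exactly 3 baskets, so it qualifies only because the threshold is inclusive.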

Scale of Problem WalMart sells 100,000 items and can store billions of baskets. The Web has over 100,000,000 words and billions of pages.

Association Rules
If-then rules about the contents of baskets.
{i_1, i_2, …, i_k} → j means: "if a basket contains all of i_1, …, i_k, then it is likely to contain j."
Confidence of this association rule is the probability of j given i_1, …, i_k, i.e., the fraction of the baskets containing all of i_1, …, i_k that also contain j.
Example
B1 = {m, c, b}    B2 = {m, p, j}
B3 = {m, b}       B4 = {c, j}
B5 = {m, p, b}    B6 = {m, c, b, j}
B7 = {c, b, j}    B8 = {b, c}
An association rule: {m, b} → c.
– Four baskets contain both m and b (B1, B3, B5, B6); two of them (B1, B6) also contain c.
– Confidence = 2/4 = 50%.
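As a rough illustration, confidence can be computed as a ratio of two support counts. The helper names `support` and `confidence` are assumptions made for this sketch, and exact fractions are used so the result matches the slide's 2/4.

```python
from fractions import Fraction

# Baskets B1..B8 from the example.
BASKETS = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]

def support(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for basket in baskets if items <= basket)

def confidence(lhs, rhs_item, baskets):
    """Fraction of the baskets containing lhs that also contain rhs_item."""
    lhs = set(lhs)
    lhs_count = support(lhs, baskets)
    if lhs_count == 0:
        raise ValueError("left-hand side never occurs; confidence is undefined")
    return Fraction(support(lhs | {rhs_item}, baskets), lhs_count)

print(confidence({"m", "b"}, "c", BASKETS))  # prints 1/2
```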

Interest
The interest of an association rule X → Y is the absolute value of the amount by which the confidence differs from the probability of Y being in a given basket.
Example
B1 = {m, c, b}    B2 = {m, p, j}
B3 = {m, b}       B4 = {c, j}
B5 = {m, p, b}    B6 = {m, c, b, j}
B7 = {c, b, j}    B8 = {b, c}
For association rule {m, b} → c, item c appears in 5/8 of the baskets.
Interest = |2/4 - 5/8| = 1/8, which is not very interesting.
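The interest calculation can be sketched the same way, following the definition above for a rule with a single item on the right-hand side. The function names are illustrative assumptions, not part of the slides.

```python
from fractions import Fraction

# Baskets B1..B8 from the example.
BASKETS = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]

def support(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for basket in baskets if items <= basket)

def interest(lhs, rhs_item, baskets):
    """|confidence(lhs -> rhs_item) - fraction of baskets containing rhs_item|.

    Assumes lhs occurs in at least one basket.
    """
    lhs = set(lhs)
    conf = Fraction(support(lhs | {rhs_item}, baskets), support(lhs, baskets))
    base_rate = Fraction(support({rhs_item}, baskets), len(baskets))
    return abs(conf - base_rate)

print(interest({"m", "b"}, "c", BASKETS))  # prints 1/8
```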

Finding Association Rules
Typical question:
– "Find all association rules with support ≥ s and confidence ≥ c."
Note: the "support" of an association rule is the support of the set of items it mentions.
Hard part: finding the high-support (frequent) itemsets.
– Checking the confidence of association rules involving those sets is relatively easy.
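To make the two-stage structure concrete, here is a minimal sketch on the eight example baskets: first find the frequent itemsets, then check the confidence of each rule that can be formed from one. The frequent-itemset stage is done by brute force purely for illustration; it is the hard part at scale and this sketch does not address that. The thresholds s = 3 and c = 3/4 and all function names are assumptions chosen for the demonstration.

```python
from fractions import Fraction
from itertools import combinations

# Baskets B1..B8 from the running example.
BASKETS = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]

def support(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for basket in baskets if items <= basket)

def frequent_itemsets(baskets, s):
    """Stage 1 (brute force): all itemsets with support of at least s."""
    universe = sorted(set().union(*baskets))
    return [
        frozenset(candidate)
        for k in range(1, len(universe) + 1)
        for candidate in combinations(universe, k)
        if support(candidate, baskets) >= s
    ]

def association_rules(baskets, s, min_conf):
    """Stage 2: rules (I minus {j}) -> j from each frequent itemset I."""
    rules = []
    for itemset in frequent_itemsets(baskets, s):
        if len(itemset) < 2:
            continue  # a rule needs a non-empty left side plus a right item
        for j in sorted(itemset):
            lhs = itemset - {j}
            # lhs is a subset of a frequent itemset, so its support is > 0.
            conf = Fraction(support(itemset, baskets), support(lhs, baskets))
            if conf >= min_conf:
                rules.append((lhs, j, conf))
    return rules

RULES = association_rules(BASKETS, s=3, min_conf=Fraction(3, 4))
for lhs, j, conf in RULES:
    print(sorted(lhs), "->", j, conf)
```

The rule's support is the support of the whole itemset it mentions, so every rule generated from a frequent itemset already meets the support requirement and only the confidence has to be checked.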