Data Mining, Frequent-Itemset Mining


COMP 451/651. Data Mining, Frequent-Itemset Mining. Chapter 7.

Data Mining. Discovery of useful, possibly unexpected, patterns in data. Example patterns: "50% of the people who buy hot dogs also buy mustard"; "these three individuals' patterns of credit-card expenditures indicate that they are running an illegal activity." Some mining problems: (1) Find frequent itemsets in "market-basket" data. (2) Find "similar" items in a large collection. Example applications: find documents on the Web that share a significant amount of common text, or find books that have been bought by many of the same Amazon customers. (3) Find clusters of data. Example application: find clusters of Web pages by the words they use.

Frequent-Itemset Mining. The Market-Basket Model: a large set of items, e.g., things sold in a supermarket, and a large set of baskets, each of which is a small set of the items, e.g., the things one customer buys on one day. Fundamental problem: what sets of items are often bought together? Application: if a large number of baskets contain both hot dogs and mustard, we can use this information in several ways. How?

Hot Dogs and Mustard. Apparently, many people walk from where the hot dogs are to where the mustard is. One option is to put them close together, and put between them other foods that might also be bought with hot dogs and mustard, e.g., ketchup or potato chips. Doing so can generate additional "impulse" sales. Another option is for the store to run a sale on hot dogs and at the same time raise the price of mustard. People will come to the store for the cheap hot dogs, and many will need mustard too. It is not worth the trouble to go to another store for cheaper mustard, so they buy that too. The store makes back on mustard what it loses on hot dogs, and also gets more customers into the store.

Beer and Diapers. What’s the explanation here?

On-Line Purchases. Amazon.com offers several million different items for sale, and has several tens of millions of customers. There are two ways to model this: (1) Basket = customer; item = book, DVD, etc. Motivation: find out what items are bought together. (2) Basket = book, DVD, etc.; item = customer. Motivation: find similar customers.
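The two views of the same purchase data can be sketched as follows. This is only an illustration: the customer names, product names, and the `group` helper are hypothetical and not part of the slides.

```python
from collections import defaultdict

# Hypothetical purchase records as (customer, product) pairs, invented for illustration.
purchases = [
    ("alice", "book_a"), ("alice", "dvd_x"),
    ("bob", "book_a"), ("bob", "dvd_x"), ("bob", "book_b"),
    ("carol", "book_b"),
]

def group(pairs, basket_index):
    """Build baskets keyed by the field at basket_index; the other field is the item."""
    baskets = defaultdict(set)
    for pair in pairs:
        baskets[pair[basket_index]].add(pair[1 - basket_index])
    return dict(baskets)

by_customer = group(purchases, 0)  # basket = customer, items = products
by_product = group(purchases, 1)   # basket = product, items = customers

print(by_customer["bob"])     # the products bob bought
print(by_product["book_a"])   # the customers who bought book_a
```

Mining frequent itemsets over `by_customer` finds products bought together; mining over `by_product` finds customers who buy many of the same products.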

Words and Documents. Baskets = sentences; items = words in those sentences. This lets us find words that appear together unusually frequently, i.e., linked concepts. Alternatively, baskets = sentences; items = documents containing those sentences. Items that appear together too often could represent plagiarism.

Genes. Baskets = people; items = genes or blood-chemistry factors. This model has been used to detect combinations of genes that result in diabetes.

Support. Question: find sets of items that appear “frequently” in the baskets. Support for itemset I = the number of baskets containing all items in I. Given a support threshold s, sets of items that appear in at least s baskets are called frequent itemsets.
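A minimal sketch of the support definition, assuming baskets are represented as Python sets. The function names and the toy baskets are illustrative choices, not taken from the slides.

```python
def support(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    wanted = set(itemset)
    return sum(1 for basket in baskets if wanted <= basket)

def is_frequent(itemset, baskets, s):
    """An itemset is frequent if it appears in at least s baskets."""
    return support(itemset, baskets) >= s

# Hypothetical toy baskets, invented for illustration.
baskets = [
    {"hot dogs", "mustard"},
    {"hot dogs", "mustard", "ketchup"},
    {"hot dogs"},
    {"chips"},
]

print(support({"hot dogs", "mustard"}, baskets))          # 2
print(is_frequent({"hot dogs", "mustard"}, baskets, s=2)) # True
```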

Example: Frequent Itemsets. Items = {milk, coke, pepsi, beer, juice}. Support threshold = 3 baskets. B1 = {m, c, b}, B2 = {m, p, j}, B3 = {m, b}, B4 = {c, j}, B5 = {m, p, b}, B6 = {m, c, b, j}, B7 = {c, b, j}, B8 = {b, c}. Frequent itemsets: {m}, {c}, {b}, {j}, {m,b}, {b,c}, {c,j}.
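The example can be checked by brute force. The sketch below counts every possible itemset over the eight baskets, which is feasible only because there are five items; it is an illustration, not an efficient mining algorithm, and the function name is an assumption.

```python
from itertools import combinations

# The eight baskets from the example (m = milk, c = coke, p = pepsi, b = beer, j = juice).
baskets = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]

def frequent_itemsets(baskets, s):
    """Map each itemset appearing in at least s baskets to its support (exhaustive search)."""
    items = sorted(set().union(*baskets))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for basket in baskets if set(candidate) <= basket)
            if count >= s:
                result[frozenset(candidate)] = count
    return result

frequent = frequent_itemsets(baskets, s=3)
for itemset, count in frequent.items():
    print(sorted(itemset), count)
```

With s = 3 this yields the seven itemsets listed in the example; {c, j} qualifies with a support of exactly 3.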

Scale of Problem. WalMart sells 100,000 items and can store billions of baskets. The Web has over 100,000,000 words and billions of pages.
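To get a feel for these numbers, one can count how many candidate itemsets exist for 100,000 items. The short calculation below is only a back-of-the-envelope illustration using the item count quoted above.

```python
import math

n_items = 100_000  # item count quoted for the supermarket example

# Number of distinct pairs and triples of items that could, in principle, be counted.
pairs = math.comb(n_items, 2)
triples = math.comb(n_items, 3)

print(f"{pairs:,} candidate pairs")      # about 5 billion
print(f"{triples:,} candidate triples")  # over 10**14
```

Even the pairs alone number roughly five billion, so counting every candidate set exhaustively quickly becomes impractical.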

Association Rules. If-then rules about the contents of baskets. {i1, i2,…,ik} → j means: “if a basket contains all of i1,…,ik then it is likely to contain j.” The confidence of this association rule is the probability of j given i1,…,ik, i.e., the fraction of the baskets containing all of i1,…,ik that also contain j. Example: B1 = {m, c, b}, B2 = {m, p, j}, B3 = {m, b}, B4 = {c, j}, B5 = {m, p, b}, B6 = {m, c, b, j}, B7 = {c, b, j}, B8 = {b, c}. An association rule: {m, b} → c. The baskets containing both m and b are B1, B3, B5, and B6; of these, B1 and B6 also contain c, so confidence = 2/4 = 50%.
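The confidence computation can be sketched as a ratio of two supports. The function names below are illustrative assumptions; the baskets are the ones from the example.

```python
baskets = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]

def support(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    wanted = set(itemset)
    return sum(1 for basket in baskets if wanted <= basket)

def confidence(lhs, j, baskets):
    """Confidence of the rule lhs -> j: support(lhs plus j) / support(lhs).

    Assumes lhs appears in at least one basket; otherwise the ratio is undefined.
    """
    lhs = set(lhs)
    return support(lhs | {j}, baskets) / support(lhs, baskets)

print(confidence({"m", "b"}, "c", baskets))  # 0.5
```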

Interest. The interest of an association rule X → Y is the absolute value of the amount by which the confidence differs from the probability of Y being in a given basket. Example: B1 = {m, c, b}, B2 = {m, p, j}, B3 = {m, b}, B4 = {c, j}, B5 = {m, p, b}, B6 = {m, c, b, j}, B7 = {c, b, j}, B8 = {b, c}. For association rule {m, b} → c, the confidence is 2/4 and item c appears in 5/8 of the baskets. Interest = |2/4 - 5/8| = 1/8, which is not very interesting.
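The interest calculation can be reproduced with exact fractions. This sketch follows the definition given above (absolute difference between confidence and the fraction of baskets containing the right-hand item); the function names are illustrative assumptions.

```python
from fractions import Fraction

baskets = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]

def support(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    wanted = set(itemset)
    return sum(1 for basket in baskets if wanted <= basket)

def interest(lhs, j, baskets):
    """|confidence(lhs -> j) - fraction of baskets containing j|, as an exact fraction."""
    lhs = set(lhs)
    conf = Fraction(support(lhs | {j}, baskets), support(lhs, baskets))
    prob_j = Fraction(support({j}, baskets), len(baskets))
    return abs(conf - prob_j)

print(interest({"m", "b"}, "c", baskets))  # 1/8
```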

Finding Association Rules. A typical question: “find all association rules with support ≥ s and confidence ≥ c.” Note: the “support” of an association rule is the support of the set of items it mentions. Hard part: finding the high-support (frequent) itemsets. Checking the confidence of association rules involving those sets is relatively easy.
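The two-stage structure of this question (first find frequent itemsets, then check confidence) can be sketched on the example baskets. The code below finds frequent itemsets by exhaustive search, which works only for tiny inputs, and the threshold values s = 3 and c = 0.75 are chosen here purely for illustration.

```python
from itertools import combinations

baskets = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]

def support(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    wanted = set(itemset)
    return sum(1 for basket in baskets if wanted <= basket)

def association_rules(baskets, s, c):
    """All rules lhs -> j with support >= s and confidence >= c (assumes s >= 1)."""
    items = sorted(set().union(*baskets))
    rules = []
    for size in range(2, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            supp = support(itemset, baskets)  # support of the rule = support of all its items
            if supp < s:
                continue                      # stage 1: keep only frequent itemsets
            for j in candidate:               # stage 2: check confidence of each rule
                lhs = itemset - {j}
                conf = supp / support(lhs, baskets)
                if conf >= c:
                    rules.append((lhs, j, supp, conf))
    return rules

rules = association_rules(baskets, s=3, c=0.75)
for lhs, j, supp, conf in rules:
    print(sorted(lhs), "->", j, "support", supp, "confidence", round(conf, 2))
```

On these baskets the qualifying rules are {m} → b and {c} → b (confidence 4/5 each) and {j} → c (confidence 3/4). Once the frequent itemsets are known, the confidence check is just a ratio of two supports.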