Mining Association Rules between Sets of Items in Large Databases presented by Zhuang Wang.


Outline
- Introduction
- Formal Model
- Apriori Algorithm
- Experiments
- Summary

Introduction
Association rule:
- Association rules are used to discover elements that frequently co-occur within a dataset made up of many independent selections of elements (such as purchasing transactions), and to express those co-occurrences as rules.
Applications:
- Association-finding algorithms answer questions such as "If a customer purchases product A, how likely is he to purchase product B?" and "What products will a customer buy if he buys products C and D?" (market basket analysis).

Formal Model
Let I = {I_1, I_2, ..., I_n} be a set of items. Let T be a database of transactions. Each transaction t in T is represented as a subset of I. Let X be a subset of I.
Support and Confidence:
By an association rule, we mean an implication of the form X ⇒ I_k, where X is a set of some items in I, and I_k is a single item in I that is not present in X.
- support: probability that a transaction contains both X and I_k, i.e. P(X, I_k).
- confidence: conditional probability that a transaction containing X also contains I_k, i.e. P(I_k | X) = P(X, I_k) / P(X).

Support and Confidence - Example
With a minimum support of 50% and a minimum confidence of 50%, we have:
- A ⇒ C (support 50%, confidence 66.6%)
- C ⇒ A (support 50%, confidence 100%)
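The transaction table behind this example did not survive in the transcript. The sketch below therefore uses an assumed four-transaction database, chosen only because it reproduces the two figures above; the names `TRANSACTIONS`, `support`, and `confidence` are illustrative and not taken from the slides.

```python
# Assumed toy database (NOT from the transcript): four transactions that
# happen to reproduce the support/confidence figures quoted on the slide.
TRANSACTIONS = [
    frozenset({"A", "B", "C"}),
    frozenset({"A", "C"}),
    frozenset({"A", "D"}),
    frozenset({"B", "E", "F"}),
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """P(consequent | antecedent) = support(both) / support(antecedent)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# A => C and C => A share the same support, but differ in confidence
# because A appears in more transactions than C does.
a_to_c = (support({"A", "C"}, TRANSACTIONS), confidence({"A"}, {"C"}, TRANSACTIONS))
c_to_a = (support({"A", "C"}, TRANSACTIONS), confidence({"C"}, {"A"}, TRANSACTIONS))
```

On this assumed data, both rules have support 2/4, while the confidences are 2/3 for A ⇒ C and 2/2 for C ⇒ A.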

Apriori Algorithm
- Goal: find the itemsets that are contained in at least a minimum-support fraction of the transactions.
- It uses a "bottom up" approach: frequent itemsets (the sets of items that meet minimum support) are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against the data.
- The algorithm terminates when no further successful extensions are found.
- Rules are then generated from each large (frequent) itemset, using only items from that itemset.
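The steps above can be sketched roughly as follows. This is a minimal, unoptimized illustration rather than the authors' implementation: candidates are kept in plain Python sets instead of a hash tree, and the function names `apriori` and `generate_rules` are assumptions made for this example. Rules are restricted to a single-item consequent, matching the formal model.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent_among(candidates):
        # One scan of the data: count each candidate, keep the frequent ones.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    items = {item for t in transactions for item in t}
    level = frequent_among({frozenset([i]) for i in items})
    frequent = dict(level)
    k = 2
    while level:  # stop when no extension of size k is frequent
        prev = set(level)
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = frequent_among(candidates)
        frequent.update(level)
        k += 1
    return frequent


def generate_rules(frequent, min_confidence):
    """Yield (antecedent, item, support, confidence) for rules X => item."""
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            antecedent = itemset - {item}
            # Every subset of a frequent itemset is frequent, so this lookup
            # always succeeds.
            conf = sup / frequent[antecedent]
            if conf >= min_confidence:
                yield antecedent, item, sup, conf
```

Calling `apriori(db, 0.5)` on a list of item sets and passing the result to `generate_rules(..., 0.5)` yields the rules that meet both thresholds. The pruning step relies on the fact that an itemset can only be frequent if all of its subsets are frequent.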

Find Frequent Itemsets - Example
[Figure: database D is scanned repeatedly to produce candidate sets C1, C2, C3 and the corresponding frequent itemsets L1, L2, L3.]

Experiments
We experimented with the rule mining algorithm using sales data obtained from a large retailing company. The data contains a total of 46,873 customer transactions. Each transaction lists the department numbers from which a customer bought an item during a visit. There are 63 departments in total. The algorithm determines whether there are associations between departments in customer purchasing behavior.

The following rules were found for a minimum support of 1% and a minimum confidence of 50%. Each rule is followed by (confidence %, support %):
- [Tires] ⇒ [Automotive Services] (98.80, 5.79)
- [Auto Accessories], [Tires] ⇒ [Automotive Services] (98.29, 1.47)
- [Auto Accessories] ⇒ [Automotive Services] (79.51, 11.81)
- [Automotive Services] ⇒ [Auto Accessories] (71.60, 11.81)
- [Home Laundry Appliances] ⇒ [Maintenance Agreement Sales] (66.55, 1.25)
- [Children's Hardlines] ⇒ [Infants and Children's wear] (66.15, 4.24)
- [Men's Furnishing] ⇒ [Men's Sportswear] (54.86, 5.21)

Summary
Apriori, while historically significant, suffers from a number of inefficiencies or trade-offs, which have spawned other algorithms and refinements:
- Hash tables: a hash tree is used to store candidate itemsets. The hash tree has itemsets at the leaves and hash tables at the internal nodes.
- Partitioning: any itemset that is potentially frequent in DB must be frequent in at least one of the partitions of DB.
- Sampling: mine a subset of the given data; this needs a lower support threshold plus a method to check the completeness of the result.
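The partitioning property can be illustrated with a two-phase sketch: itemsets that are frequent within any one partition become global candidates, and a single full scan then confirms which of them are frequent overall. This is an assumption-laden toy, not a published implementation: the function names are made up for this example, and the per-partition step enumerates all subsets by brute force, which is only practical for very short transactions.

```python
from itertools import combinations


def local_frequent(partition, min_support):
    """Brute force: itemsets meeting min_support within a single partition."""
    counts = {}
    for t in partition:
        for size in range(1, len(t) + 1):
            for combo in combinations(sorted(t), size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s for s, k in counts.items() if k / len(partition) >= min_support}


def partitioned_frequent(transactions, min_support, n_parts):
    """Return {itemset: global support} using a two-phase partitioned scan."""
    transactions = [frozenset(t) for t in transactions]
    size = -(-len(transactions) // n_parts)  # ceiling division
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]

    # Phase 1: the union of locally frequent itemsets is a superset of the
    # globally frequent ones, because a globally frequent itemset must be
    # frequent in at least one partition.
    candidates = set()
    for part in parts:
        candidates |= local_frequent(part, min_support)

    # Phase 2: one scan over the full database removes false positives.
    n = len(transactions)
    result = {}
    for c in candidates:
        sup = sum(1 for t in transactions if c <= t) / n
        if sup >= min_support:
            result[c] = sup
    return result
```

The appeal of this design is that each partition can be sized to fit in memory, so the database is read only twice regardless of the length of the longest frequent itemset.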

Reference
R. Agrawal, T. Imielinski, A. Swami: "Mining Association Rules between Sets of Items in Large Databases", Proc. of the ACM SIGMOD Int'l Conference on Management of Data, Washington D.C., May 1993.