Data Mining Association Analysis: Basic Concepts and Algorithms

Slides:



Advertisements
Similar presentations
Recap: Mining association rules from large datasets
Advertisements

Association Rule Mining
IT 433 Data Warehousing and Data Mining Association Rules Assist.Prof.Songül Albayrak Yıldız Technical University Computer Engineering Department
1 of 25 1 of 45 Association Rule Mining CIT366: Data Mining & Data Warehousing Instructor: Bajuna Salehe The Institute of Finance Management: Computing.
Data Mining Association Analysis: Basic Concepts and Algorithms
Rakesh Agrawal Ramakrishnan Srikant
Chapter 5: Mining Frequent Patterns, Association and Correlations
Data Mining Association Analysis: Basic Concepts and Algorithms Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining: Concepts and Techniques (2nd ed.) — Chapter 5 —
732A02 Data Mining - Clustering and Association Analysis ………………… Jose M. Peña Association rules Apriori algorithm FP grow algorithm.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
1 Association Rule Mining Instructor Qiang Yang Slides from Jiawei Han and Jian Pei And from Introduction to Data Mining By Tan, Steinbach, Kumar.
Business Systems Intelligence: 4. Mining Association Rules Dr. Brian Mac Namee (
Association Analysis: Basic Concepts and Algorithms.
Data Mining Association Analysis: Basic Concepts and Algorithms
1 Association Rule Mining Instructor Qiang Yang Thanks: Jiawei Han and Jian Pei.
Chapter 4: Mining Frequent Patterns, Associations and Correlations
Mining Association Rules in Large Databases
Mining Association Rules in Large Databases
Association Rule Mining - MaxMiner. Mining Association Rules in Large Databases  Association rule mining  Algorithms Apriori and FP-Growth  Max and.
© Vipin Kumar CSci 8980 Fall CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center Department of Computer.
Mining Association Rules in Large Databases. What Is Association Rule Mining?  Association rule mining: Finding frequent patterns, associations, correlations,
Pattern Recognition Lecture 20: Data Mining 3 Dr. Richard Spillman Pacific Lutheran University.
Association Discovery from Databases Association rules are a simple formalism for expressing positive connections between columns in a 0/1 matrix. A classical.
Chapter 2: Mining Frequent Patterns, Associations and Correlations
Ch5 Mining Frequent Patterns, Associations, and Correlations
Data Mining: Concepts and Techniques (3rd ed.) — Chapter 6 —
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
What Is Association Mining? l Association rule mining: – Finding frequent patterns, associations, correlations, or causal structures among sets of items.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining By Tan, Steinbach, Kumar Lecture.
Modul 7: Association Analysis. 2 Association Rule Mining  Given a set of transactions, find rules that will predict the occurrence of an item based on.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Warehousing 資料倉儲 Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of Information Management, Tamkang University Dept. of Information ManagementTamkang.
Lecture 10 Frequent Itemset Mining/Association Rule MW 4:00PM-5:15PM Dr. Jianjun Hu CSCE822 Data Mining and Warehousing.
Frequent Item Mining. What is data mining? =Pattern Mining? What patterns? Why are they useful?
CSE4334/5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai.
Data Mining Find information from data data ? information.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Association Analysis This lecture node is modified based on Lecture Notes for.
Mining Frequent Patterns, Associations, and Correlations Compiled By: Umair Yaqub Lecturer Govt. Murray College Sialkot.
Mining Frequent Patterns. What Is Frequent Pattern Analysis? Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs.
Chapter 6: Mining Frequent Patterns, Association and Correlations
Dept. of Information Management, Tamkang University
What is Frequent Pattern Analysis?
Data Mining  Association Rule  Classification  Clustering.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Chapter 8 Association Rules. Data Warehouse and Data Mining Chapter 10 2 Content Association rule mining Mining single-dimensional Boolean association.
Reducing Number of Candidates Apriori principle: – If an itemset is frequent, then all of its subsets must also be frequent Apriori principle holds due.
Data Mining Association Rules Mining Frequent Itemset Mining Support and Confidence Apriori Approach.
CS685 : Special Topics in Data Mining, UKY The UNIVERSITY of KENTUCKY Association Rule Mining CS 685: Special Topics in Data Mining Jinze Liu.
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Association Rule Mining COMP Seminar BCB 713 Module Spring 2011.
1 Data Mining Lecture 6: Association Analysis. 2 Association Rule Mining l Given a set of transactions, find rules that will predict the occurrence of.
Reducing Number of Candidates
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining: Concepts and Techniques
Information Management course
Association rule mining
Frequent Pattern Mining
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Rule Mining
Department of Computer Science National Tsing Hua University
Association Rule Mining
Data Mining: Concepts and Techniques (3rd ed.) — Chapter 6 —
Association Analysis: Basic Concepts
What Is Association Mining?
Presentation transcript:

Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining By Tan, Steinbach, Kumar Also, slides from Jiawei Han and Jian Pei © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Frequent-pattern mining methods

What Is Frequent Pattern Mining? Frequent pattern: pattern (set of items, sequence, etc.) that occurs frequently in a database [AIS93] Frequent pattern mining: finding regularities in data What products were often purchased together? What are the subsequent purchases after buying a PC? Frequent-pattern mining methods

Why Is Frequent Pattern Mining an Essential Task in Data Mining? Foundation for many essential data mining tasks Association, correlation, causality Sequential patterns, temporal or cyclic association, partial periodicity, spatial and multimedia association Associative classification, cluster analysis, iceberg cube, fascicles (semantic data compression) Broad applications Basket data analysis, cross-marketing, catalog design, sale campaign analysis Web log (click stream) analysis, DNA sequence analysis, etc. Frequent-pattern mining methods

Basic Concepts: Frequent Patterns and Association Rules Itemset X={x1, …, xk} Find all the rules XY with min confidence and support support, s, probability that a transaction contains XY confidence, c, conditional probability that a transaction having X also contains Y. Transaction-id Items bought 10 A, B, C 20 A, C 30 A, D 40 B, E, F Customer buys diaper buys both buys beer Let min_support = 50%, min_conf = 50%: A  C (50%, 66.7%) C  A (50%, 100%) Frequent-pattern mining methods

Concept: Frequent Itemsets Outlook Temperature Humidity Play sunny hot high no overcast yes rainy mild cool normal Minimum support=2 {sunny, hot, no} {sunny, hot, high, no} {rainy, normal} Min Support =3 ? How strong is {sunny, no}? Count = Percentage = Frequent-pattern mining methods

Concept: Itemset  Rules {sunny, hot, no} = {Outlook=Sunny, Temp=hot, Play=no} Generate a rule: Outlook=sunny and Temp=hot  Play=no How strong is this rule? Support of the rule = support of the itemset {sunny, hot, no} = 2 = Pr({sunny, hot, no}) Either expressed in count form or percentage form Confidence = Pr(Play=no | {Outlook=sunny, Temp=hot}) In general LHS RHS, Confidence = Pr(RHS|LHS) Confidence =Pr(RHS|LHS) =count(LHS and RHS) / count(LHS) What is the confidence of Outlook=sunnyPlay=no? Frequent-pattern mining methods

Frequent-pattern mining methods Frequent Patterns Patterns = Item Sets {i1, i2, … in}, where each item is a pair: (Attribute=value) Frequent Patterns Itemsets whose support >= minimum support Support count(itemset)/count(database) Frequent-pattern mining methods

Frequent Itemset Generation Given d items, there are 2d possible candidate itemsets Frequent-pattern mining methods

Frequent-pattern mining methods Max-patterns Max-pattern: frequent patterns without proper frequent super pattern BCDE, ACD are max-patterns BCD is not a max-pattern Tid Items 10 A,B,C,D,E 20 B,C,D,E, 30 A,C,D,F Min_sup=2 Frequent-pattern mining methods

Maximal Frequent Itemset An itemset is maximal frequent if none of its immediate supersets is frequent Maximal Itemsets Infrequent Itemsets Border Frequent-pattern mining methods

Frequent-pattern mining methods Frequent Max Patterns Succinct Expression of frequent patterns Let {a, b, c} be frequent Then, {a, b}, {b, c}, {a, c} must also be frequent Then {a}, {b}, {c}, must also be frequent By writing down {a, b, c} once, we save lots of computation Max Pattern If {a, b, c} is a frequent max pattern, then {a, b, c, x} is NOT a frequent pattern, for any other item x. Frequent-pattern mining methods

Find Frequent Max Patterns Outlook Temperature Humidity Play sunny hot high no overcast yes rainy mild cool normal Minimum support=2 {sunny, hot, no} ?? Frequent-pattern mining methods

Mining Association Rules—an Example Min. support 50% Min. confidence 50% Transaction-id Items bought 10 A, B, C 20 A, C 30 A, D 40 B, E, F Frequent pattern Support {A} 75% {B} 50% {C} {A, C} For rule A  C: support = support({A}{C}) = 50% confidence = support({A}{C})/support({A}) = 66.6% Frequent-pattern mining methods

Method 1: Apriori: A Candidate Generation-and-test Approach Any subset of a frequent itemset must be frequent if {beer, diaper, nuts} is frequent, so is {beer, diaper} Every transaction having {beer, diaper, nuts} also contains {beer, diaper} Apriori pruning principle: If there is any itemset which is infrequent, its superset should not be generated/tested! Method: generate length (k+1) candidate itemsets from length k frequent itemsets, and test the candidates against DB The performance studies show its efficiency and scalability Agrawal & Srikant 1994, Mannila, et al. 1994 Frequent-pattern mining methods

The Apriori Algorithm — An Example Itemset sup {A} 2 {B} 3 {C} {D} 1 {E} Database TDB Itemset sup {A} 2 {B} 3 {C} {E} L1 C1 Tid Items 10 A, C, D 20 B, C, E 30 A, B, C, E 40 B, E 1st scan C2 Itemset sup {A, B} 1 {A, C} 2 {A, E} {B, C} {B, E} 3 {C, E} C2 Itemset {A, B} {A, C} {A, E} {B, C} {B, E} {C, E} L2 2nd scan Itemset sup {A, C} 2 {B, C} {B, E} 3 {C, E} C3 L3 Itemset {B, C, E} 3rd scan Itemset sup {B, C, E} 2 Frequent-pattern mining methods

Frequent-pattern mining methods FP-growth Algorithm Use a compressed representation of the database using an FP-tree Once an FP-tree has been constructed, it uses a recursive divide-and-conquer approach to mine the frequent itemsets Frequent-pattern mining methods

Frequent-pattern mining methods FP-tree construction null After reading TID=1: A:1 B:1 After reading TID=2: null B:1 A:1 B:1 C:1 D:1 Frequent-pattern mining methods

Frequent-pattern mining methods FP-Tree Construction Transaction Database null B:3 A:7 B:5 C:3 C:1 D:1 Header table D:1 C:3 E:1 D:1 E:1 D:1 E:1 D:1 Pointers are used to assist frequent itemset generation Frequent-pattern mining methods

Frequent-pattern mining methods FP-growth Conditional Pattern base for D: P = {(A:1,B:1,C:1), (A:1,B:1), (A:1,C:1), (A:1), (B:1,C:1)} Recursively apply FP-growth on P Frequent Itemsets found (with min sup = 2): AD, BD, CD, ACD, BCD, ABD null A:7 B:1 B:5 C:1 C:1 D:1 D:1 C:3 D:1 D:1 D:1 Frequent-pattern mining methods

Frequent-pattern mining methods Conclusion Effective hash-based algorithm for the candidate itemset generation Two phase transaction database pruning Much more efficient ( time & space ) than Apriori algorithm Frequent-pattern mining methods