1 Top Down FP-Growth for Association Rule Mining, by Ke Wang, Liu Tang, Jiawei Han, and Junqiang Liu (Simon Fraser University)

2 Introduction
Classically, for a rule A → B:
- support: computed as count(AB); the rule is frequent if it passes the minimum support threshold
- confidence: computed as count(AB) / count(A); the rule is confident if it passes the minimum confidence threshold
How to mine association rules?
- find all frequent patterns
- generate rules from the frequent patterns
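
As a minimal sketch of the two measures defined on this slide, the snippet below counts support and confidence by scanning a list of transactions. The function names `count` and `confidence` are illustrative, and the five transactions are the ones used in the FP-tree example later in the deck.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def confidence(transactions, lhs, rhs):
    """count(lhs and rhs together) / count(lhs); 0.0 when lhs never occurs."""
    denominator = count(transactions, lhs)
    if denominator == 0:
        return 0.0
    return count(transactions, set(lhs) | set(rhs)) / denominator


# Transactions taken from the FP-tree example slides (minsup = 2 there).
transactions = [
    {"b", "e"},
    {"a", "b", "c", "e"},
    {"b", "c", "e"},
    {"a", "c", "d"},
    {"a"},
]

support_be = count(transactions, {"b", "e"})      # support count of rule b -> e
conf_b_e = confidence(transactions, {"b"}, {"e"})  # confidence of rule b -> e
conf_c_e = confidence(transactions, {"c"}, {"e"})  # confidence of rule c -> e
```

With minsup = 2, the rule b → e is frequent; whether it is confident depends on the chosen minimum confidence threshold.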

3 Introduction
Limitations of current research
- uses a uniform minimum support threshold
- uses support as the only pruning measure
Our contribution
- improve efficiency
- adopt multiple minimum supports
- introduce confidence pruning

4 Related work -- Frequent pattern mining
Apriori algorithm
- method: use the anti-monotone property of support for pruning, i.e. if a length-k pattern is infrequent, its length-(k+1) super-patterns can never be frequent
FP-growth algorithm (better than Apriori)
- method: build an FP-tree to store the database, then mine the FP-tree in bottom-up order
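
The anti-monotone pruning described for Apriori can be sketched as a level-wise search: a length-(k+1) candidate is counted only if all of its length-k subsets are frequent. This is a simplified illustration, not the original Apriori implementation; the function name `apriori` and the join strategy are assumptions made for brevity.

```python
from itertools import combinations


def apriori(transactions, minsup):
    """Level-wise frequent itemset mining with anti-monotone pruning."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]  # length-1 candidates
    frequent = {}
    k = 1
    while level:
        # Count each candidate with one pass over the database.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        survivors = {c: n for c, n in counts.items() if n >= minsup}
        frequent.update(survivors)
        # Join survivors into length-(k+1) candidates and keep only those
        # whose length-k subsets are all frequent (the pruning step).
        k += 1
        candidates = set()
        for x, y in combinations(list(survivors), 2):
            union = x | y
            if len(union) == k and all(
                frozenset(s) in survivors for s in combinations(union, k - 1)
            ):
                candidates.add(union)
        level = list(candidates)
    return frequent


transactions = [
    {"b", "e"},
    {"a", "b", "c", "e"},
    {"b", "c", "e"},
    {"a", "c", "d"},
    {"a"},
]
frequent = apriori(transactions, minsup=2)
```

On these five transactions the candidate {a, b, c} is never counted, because its subset {a, b} is already infrequent.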

5 Related work -- Association rule mining
- Fast algorithms that try to guarantee completeness of the frequent patterns
- Parallel algorithms and association-rule-based query languages
- Various association rule mining problems
  - multi-level, multi-dimensional rules
  - constraints on specific items

6 TD-FP-Growth for frequent pattern mining
Tree structure similar to FP-growth
- a compressed tree stores the database
- nodes on each path of the tree are globally ordered
Mining method different from FP-growth
- FP-growth: bottom-up tree mining
- TD-FP-Growth: top-down tree mining

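7 TD-FP-Growth for frequent pattern mining
Construct an FP-tree:
Transactions: {b, e}, {a, b, c, e}, {b, c, e}, {a, c, d}, {a}
minsup = 2
Header table (entry value, count, side-link): entries a, b, c, e
[Figure: FP-tree with node labels root, a: 3, b: 2, b: 1, c: 1 (three nodes), e: 1 (three nodes)]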
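
A minimal sketch of the tree construction for this example follows. It assumes the global item order a, b, c, e (alphabetical, matching the header table on the slide; FP-growth implementations commonly order by descending frequency instead). The names `Node` and `build_fp_tree` are illustrative.

```python
from collections import Counter


class Node:
    """One tree node: an item, a count, a parent link and child links."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, minsup):
    """Return (root, header); header maps each frequent item to its nodes."""
    freq = Counter(i for t in transactions for i in set(t))
    # Assumption: alphabetical global order, as in the slide's header table.
    order = sorted(i for i in freq if freq[i] >= minsup)
    root = Node(None, None)
    header = {i: [] for i in order}  # plays the role of the side-links
    for t in transactions:
        node = root
        for item in (x for x in order if x in t):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header


transactions = [
    {"b", "e"},
    {"a", "b", "c", "e"},
    {"b", "c", "e"},
    {"a", "c", "d"},
    {"a"},
]
root, header = build_fp_tree(transactions, minsup=2)
node_counts = {i: sorted(n.count for n in nodes) for i, nodes in header.items()}
```

Item d is dropped because it appears only once, and the resulting node counts match the labels in the figure.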

8 TD-FP-Growth for frequent pattern mining
FP-growth: bottom-up mining
Transactions: {b, e}, {a, b, c, e}, {b, c, e}, {a, c, d}, {a}
minsup = 2
Header table (item, head of node-link): items a, b, c, e
Mining order: e, c, b, a
e's conditional pattern base: (b: 1), (b: 1, c: 1), (a: 1, b: 1, c: 1)
[Figure: FP-tree with node labels root, a: 3, b: 2, b: 1, c: 1 (three nodes), e: 1 (three nodes)]

9 TD-FP-Growth for frequent pattern mining
FP-growth: bottom-up mining
e's conditional pattern base: (b: 1), (b: 1, c: 1), (a: 1, b: 1, c: 1)
e's conditional FP-tree: root, b: 3, c: 2, with header table (item, head of node-link) for items b, c
Drawback!
- both e's conditional pattern base and its conditional FP-tree are stored in memory
- e's conditional FP-tree is mined recursively
- conditional pattern bases and FP-trees are built for all other items and their super-patterns
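
To make the extra structures concrete, the sketch below derives e's conditional pattern base and the item counts that decide what stays in e's conditional FP-tree. As a simplification, it reads the prefixes directly from the transactions instead of following node-links in the tree, which yields the same prefixes for this example. The function name `conditional_pattern_base` is illustrative.

```python
from collections import Counter


def conditional_pattern_base(transactions, item, order):
    """Prefixes (in global order) of the transactions that contain `item`."""
    rank = {x: r for r, x in enumerate(order)}
    base = Counter()
    for t in transactions:
        if item in t:
            prefix = tuple(
                sorted(
                    (x for x in t if x in rank and rank[x] < rank[item]),
                    key=rank.get,
                )
            )
            if prefix:
                base[prefix] += 1
    return base


transactions = [
    {"b", "e"},
    {"a", "b", "c", "e"},
    {"b", "c", "e"},
    {"a", "c", "d"},
    {"a"},
]
order = ["a", "b", "c", "e"]  # frequent items in the slide's global order
minsup = 2

base_e = conditional_pattern_base(transactions, "e", order)

# Item counts inside the conditional pattern base; only items reaching
# minsup are kept in e's conditional FP-tree.
item_counts = Counter()
for prefix, n in base_e.items():
    for x in prefix:
        item_counts[x] += n
kept = {x: n for x, n in item_counts.items() if n >= minsup}
```

Item a occurs only once in the base, so only b and c remain, matching the conditional FP-tree on the slide.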

10 TD-FP-Growth for frequent pattern mining
TD-FP-Growth: adopt a top-down mining strategy
- motivation: avoid building the extra databases and sub-trees that FP-growth builds
- method: process nodes on the upper levels before those on the lower levels
- result: any modification made to upper-level nodes does not affect lower-level nodes
See the example on the following slides.

11 TD-FP-Growth for frequent pattern mining
CT-tree and header table H
Transactions: {b, e}, {a, b, c, e}, {b, c, e}, {a, c, d}, {a}
minsup = 2
Header table H (entry value, count, side-link): entries a, b, c, e
Mining order: a, b, c, e
[Figure: tree with node labels root, a: 3, b: 2, b: 1, c: 1 (three nodes), e: 1 (three nodes)]

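12 CT-tree for frequent pattern mining
CT-tree and header table H
Transactions: {b, e}, {a, b, c, e}, {b, c, e}, {a, c, d}, {a}
minsup = 2
Header table H (entry value, count, side-link): entries a, b, c, e
Sub-header-table H_c (entry value, count, side-link): a with count 2, b with count 2
[Figure: tree with node labels root, a: 3, b: 2, b: 1, c: 1 (three nodes), e: 1 (three nodes), annotated with a: 2 and b: 1]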

13 CT-tree for frequent pattern mining
Completeness
- for each entry i in H, we mine all the frequent patterns that end with item i, no more and no less
Complete set of frequent patterns:
- {a}
- {b}
- {c}, {b, c}, {a, c}
- {e}, {b, e}, {c, e}, {b, c, e}
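
The top-down procedure can be sketched as follows. This is a simplified illustration in the spirit of the slides, not the authors' implementation: instead of overwriting node counts in place while walking up, each (sub-)header table here is a dictionary that maps an item to its nodes and the counts valid for the current pattern. All names (`Node`, `build_header`, `mine`) are hypothetical.

```python
from collections import defaultdict


class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}


def build_header(transactions, order):
    """Build the tree and return header table H: item -> {node: count}."""
    root = Node(None, None)
    header = {i: {} for i in order}
    for t in transactions:
        node = root
        for item in (x for x in order if x in t):
            if item not in node.children:
                node.children[item] = Node(item, node)
            node = node.children[item]
            node.count += 1
            header[item][node] = node.count
    return header


def mine(header, order, minsup, suffix=(), out=None):
    """Emit every frequent pattern that ends with each entry of `header`."""
    out = {} if out is None else out
    for item in order:  # top-down: upper-level items first
        links = header.get(item, {})
        total = sum(links.values())
        if total < minsup:
            continue
        pattern = (item,) + suffix
        out[frozenset(pattern)] = total
        # Build the sub-header-table by walking up each path exactly once.
        sub = defaultdict(lambda: defaultdict(int))
        for node, cnt in links.items():
            up = node.parent
            while up is not None and up.item is not None:
                sub[up.item][up] += cnt
                up = up.parent
        mine(sub, order, minsup, pattern, out)
    return out


transactions = [
    {"b", "e"},
    {"a", "b", "c", "e"},
    {"b", "c", "e"},
    {"a", "c", "d"},
    {"a"},
]
order = ["a", "b", "c", "e"]  # frequent items, global order from the slides
patterns = mine(build_header(transactions, order), order, minsup=2)
```

No conditional pattern base or conditional tree is materialised; the only additional structures are the per-pattern sub-header-tables, which is the space argument the slides make.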

14 TD-FP-Growth for frequent pattern mining
Compared with FP-growth, TD-FP-Growth is:
- Space saving: only one tree and a few header tables; no extra databases or sub-trees
- Time saving: does not build extra databases or sub-trees; walks up each path only once to update the count information of the nodes on the tree and to build the sub-header-tables

15 TD-FP-Growth for association rule mining
Assumptions:
- there is a class-attribute in the database
- items in the class-attribute are called class-items; all others are non-class-items
- each transaction is associated with a class-item
- only a class-item appears on the right-hand side of a rule

Transaction ID | non-class-attribute | class-attribute
1              | a, b, …             | C1
2              | d, …                | C2
3              | e, d, f, …          | C3
…              | …                   | …

Example rule: a, b → Ci

16 TD-FP-Growth for association rule mining -- multiple minimum supports
Why?
- with a uniform minimum support, the count considers only the number of appearances
- a uniform minimum support is unfair to items that appear less often but are worth more, e.g. responder vs. non-responder
How?
- use a different support threshold for each class

17 TD-FP-Growth for association rule mining -- multiple minimum supports
Multiple vs. uniform
Transactions: (c, f, C1), (b, e, C2), (b, e, f, C1), (a, c, f, C1), (c, e, C2), (b, c, d, C1)
- class counts: C1: 4, C2: 2
- rules with relative minsup = 50%, proportional to each class (multiplier in performance)
- uniform minimum support: absolute minsup = 1
  - 11-node tree, 23 rules
- multiple minimum supports: absolute minsup1 = 2, absolute minsup2 = 1
  - 7-node tree, 9 rules
  - more effective and space-saving
  - time-saving (shown in the performance slides)
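
The rule counts on this slide can be checked with a brute-force enumeration. The sketch below is not TD-FP-Growth: it simply counts, for every itemset A and class C, how many transactions of class C contain A, and keeps the rule A → C when that class-count reaches the threshold for C. The function name `class_rules` is illustrative, and rounding the relative threshold up with `math.ceil` is an assumption.

```python
import math
from collections import Counter
from itertools import combinations


def class_rules(rows, minsup_by_class):
    """Return {(lhs, label): class_count} for rules meeting their class threshold."""
    counts = Counter()
    for items, label in rows:
        ordered = sorted(items)
        for k in range(1, len(ordered) + 1):
            for lhs in combinations(ordered, k):
                counts[(lhs, label)] += 1
    return {
        (lhs, label): n
        for (lhs, label), n in counts.items()
        if n >= minsup_by_class[label]
    }


# The six transactions from the slide: (non-class items, class-item).
rows = [
    ("cf", "C1"),
    ("be", "C2"),
    ("bef", "C1"),
    ("acf", "C1"),
    ("ce", "C2"),
    ("bcd", "C1"),
]

# Uniform threshold from the slide: absolute minsup = 1 for every class.
uniform_rules = class_rules(rows, {"C1": 1, "C2": 1})

# Multiple thresholds: 50% of each class size (C1: 4 -> 2, C2: 2 -> 1).
class_size = Counter(label for _, label in rows)
multiple = {label: math.ceil(0.5 * n) for label, n in class_size.items()}
multiple_rules = class_rules(rows, multiple)
```

Under the multiple thresholds, class C1 keeps only the rules with left-hand sides {b}, {c}, {f} and {c, f}, while all five C2 rules survive.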

18 TD-FP-Growth for association rule mining -- confidence pruning
Motivation
- use the other constraint of an association rule, confidence, to speed up mining
Method
- confidence is not anti-monotone
- introduce the acting constraint of confidence, which is anti-monotone
- push it inside the mining process

19 TD-FP-Growth for association rule mining -- confidence pruning
conf(A → B) = count(AB) / count(A) >= minconf
⇒ count(AB) >= count(A) * minconf
⇒ count(AB) >= minsup * minconf, because count(A) >= minsup (anti-monotone and weaker)
The last inequality is the acting constraint of confidence for the original confidence constraint of rule A → B.
- the support of the rule is computed by count(A)
- count(AB) is the class-count of itemset A related to class B

20 TD-FP-Growth for association rule mining -- confidence pruning
Transactions: (c, f, C1), (b, e, C2), (b, e, f, C1), (a, c, f, C1), (a, c, d, C2)
minsup = 2, minconf = 60%
Header table H, where count(i) = count(i, C1) + count(i, C2):

Entry value i | count(i) | count(i, C1) | count(i, C2) | side-link
a, b, c, e, f | …        | …            | …            | …

[Figure: tree fragment with node labels root, b: 2, e: 2, …]
count(e) >= minsup; however, both count(e, C1) and count(e, C2) < minsup * minconf ⇒ terminate mining for e!
Without confidence pruning, sub-header-table H_e would be built:

Entry value i | count(i) | count(i, C1) | count(i, C2) | side-link
b             | 2        | 1            | 1            |
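
The pruning decision for single items in this example can be sketched as below. The code only evaluates the acting constraint count(i, C) >= minsup * minconf for each header-table entry; it does not build the tree or the sub-header-tables. The function name `classify_entries` and the status labels are hypothetical.

```python
from collections import Counter


def classify_entries(rows, minsup, minconf):
    """Label each item as 'infrequent', 'pruned by confidence' or 'kept'."""
    total = Counter()      # count(i)
    by_class = Counter()   # count(i, C)
    classes = set()
    for items, label in rows:
        classes.add(label)
        for i in set(items):
            total[i] += 1
            by_class[(i, label)] += 1
    bound = minsup * minconf  # the acting constraint of confidence
    status = {}
    for i, n in total.items():
        if n < minsup:
            status[i] = "infrequent"
        elif all(by_class[(i, c)] < bound for c in classes):
            # No class-count can satisfy the constraint, so no rule ending
            # with this item can be confident: stop mining this entry.
            status[i] = "pruned by confidence"
        else:
            status[i] = "kept"
    return status


# The five transactions from the slide: (non-class items, class-item).
rows = [
    ("cf", "C1"),
    ("be", "C2"),
    ("bef", "C1"),
    ("acf", "C1"),
    ("acd", "C2"),
]
status = classify_entries(rows, minsup=2, minconf=0.6)
```

For entry e, count(e) = 2 passes minsup, but both class-counts are 1, below 2 * 0.6 = 1.2, so mining for e stops before H_e is built.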

21 Performance
Several data sets were chosen from the UC Irvine Machine Learning Database Repository: http://

Name of dataset | # of transactions | # of items in each transaction | Class distribution                               | # of distinct items
Dna-train       | …                 | …                              | …%, 24.25%, 52.55%                               | 240
Connect         | …                 | …                              | …%, 24.62%, 65.83%                               | 126
Forest          | …                 | …                              | …%, 1.63%, 2.99%, 3.53%, 6.15%, 36.36%, 48.76%   | 15916

22 Performance -- frequent pattern

23 Performance -- mine rules with multiple minimum supports
- relative minsup, proportional to each class
- FP-growth is only for frequent pattern mining

24 Performance -- mine rules with confidence pruning

25 Conclusions and future work
Conclusions on the TD-FP-Growth algorithm
- more efficient in finding both frequent patterns and association rules
- more effective in mining rules by using multiple minimum supports
- introduces a new pruning method, confidence pruning, and pushes it inside the mining process, which further speeds up mining

26 Conclusions and future work
Future work
- explore other constraint-based association rule mining methods
- mine association rules with an item concept hierarchy
- apply TD-FP-Growth to applications based on association rule mining
  - clustering
  - classification
