732A02 Data Mining - Clustering and Association Analysis (Jose M. Peña): FP-growth algorithm and correlation analysis

FP-growth algorithm

Apriori = candidate generate-and-test.
Problems:
- Too many candidates to generate, e.g. if there are 10^4 frequent 1-itemsets, then there are more than 10^7 candidate 2-itemsets.
- Each candidate implies expensive operations, e.g. pattern matching and subset checking.
Can candidate generation be avoided? Yes, with the frequent-pattern growth (FP-growth) algorithm.
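The arithmetic behind the "more than 10^7 candidates" figure can be checked directly: from n frequent 1-itemsets, candidate generation produces n choose 2 candidate 2-itemsets.

```python
# A quick check of the candidate blow-up described above: from n frequent
# 1-itemsets, Apriori-style candidate generation produces n*(n-1)/2
# candidate 2-itemsets.
n = 10**4                              # 10^4 frequent 1-itemsets
candidate_2_itemsets = n * (n - 1) // 2
print(candidate_2_itemsets)            # 49995000, i.e. roughly 5 * 10^7
```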

FP-growth algorithm

min_support = 3

TID  Items bought              Items bought (f-list ordered)
100  {f, a, c, d, g, i, m, p}  {f, c, a, m, p}
200  {a, b, c, f, l, m, o}     {f, c, a, b, m}
300  {b, f, h, j, o, w}        {f, b}
400  {b, c, k, s, p}           {c, b, p}
500  {a, f, c, e, l, p, m, n}  {f, c, a, m, p}

1. Scan the database once and find the frequent items. Record them as the frequent 1-itemsets.
2. Sort the frequent items in descending order of frequency: f-list = f-c-a-b-m-p.
3. Scan the database again and construct the FP-tree.

Header table: f:4, c:4, a:3, b:3, m:3, p:3 (each entry links to that item's nodes in the tree).

Resulting FP-tree (item:count):

{}
├── f:4
│   ├── c:3
│   │   └── a:3
│   │       ├── m:2
│   │       │   └── p:2
│   │       └── b:1
│   │           └── m:1
│   └── b:1
└── c:1
    └── b:1
        └── p:1
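The two database scans above can be sketched in code. This is a minimal illustration, not the full algorithm: the `Node` class and the `header` dict (a plain list of nodes standing in for the node-links) are names chosen here, and ties among equally frequent items are broken alphabetically, so the sketch yields the f-list c-f-a-b-m-p rather than the slide's f-c-a-b-m-p; any fixed tie-break is equally valid.

```python
from collections import Counter, defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_support):
    # Pass 1: count items and keep the frequent ones, sorted by descending
    # frequency (the f-list). Ties are broken alphabetically.
    counts = Counter(i for t in transactions for i in t)
    f_list = sorted((i for i in counts if counts[i] >= min_support),
                    key=lambda i: (-counts[i], i))
    order = {i: r for r, i in enumerate(f_list)}
    # Pass 2: insert each transaction, filtered to frequent items and
    # reordered by the f-list, sharing common prefixes along the way.
    root = Node(None, None)
    header = defaultdict(list)   # item -> its nodes (stands in for node-links)
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in order), key=order.get):
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, header, f_list

# The slide's database, min_support = 3:
db = [list("facdgimp"), list("abcflmo"), list("bfhjow"),
      list("bcksp"), list("afcelpmn")]
root, header, f_list = build_fp_tree(db, 3)
print(f_list)   # ['c', 'f', 'a', 'b', 'm', 'p']
```

With this tie-break the tree has the same shape as the slide's, with the roles of the f- and c-branches swapped; p still ends up on two paths, which is exactly what the node-links in the header table are for.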

FP-growth algorithm

For each frequent item in the header table:
- Traverse the tree by following the corresponding node-links.
- Record all prefix paths leading to the item. These form the item's conditional pattern base.

Frequent itemsets found so far: f:4, c:4, a:3, b:3, m:3, p:3.

Conditional pattern bases:

Item  Conditional pattern base
c     f:3
a     fc:3
b     fca:1, f:1, c:1
m     fca:2, fcab:1
p     fcam:2, cb:1

FP grow algorithm  For each conditional pattern base  Start the process again (recursion). m-conditional pattern base: fca:2, fcab:1 {} f:3 c:3 a:3 m-conditional FP-tree  am-conditional pattern base: fc:3 {} f:3 c:3 am-conditional FP-tree  cam-conditional pattern base: f:3 {} f:3 cam-conditional FP-tree Frequent itemset found: fcam: 3 Backtracking !!! Frequent itemsets found: fam: 3, cam:3 Frequent itemsets found: fm: 3, cm:3, am:3   


FP-growth algorithm

With a small support threshold there are many long candidates, which implies long runtimes due to expensive operations such as pattern matching and subset checking.

FP-growth algorithm

Exercise: run the FP-growth algorithm on the following database (min_sup = 2).

TID  Items bought
100  {a, b, e}
200  {b, d}
300  {b, c}
400  {a, b, d}
500  {a, c}
600  {b, c}
700  {a, c}
800  {a, b, c, e}
900  {a, b, c}

FP-growth algorithm

Prefix vs. suffix.

Frequent itemsets

- Frequent itemsets can be represented as a tree (the children of a node are a subset of its siblings).
- Different algorithms traverse the tree differently, e.g.:
  - Apriori algorithm = breadth first.
  - FP-growth algorithm = depth first.
- Breadth-first algorithms typically cannot store the projections and thus have to scan the database more times. The opposite is typically true for depth-first algorithms.
- Breadth first is typically less efficient but more scalable; depth first is typically more efficient but less scalable.

(The slide shows an example itemset tree with min_sup = 3.)

 Milk  cereal [40%, 66.7%] is misleading/uninteresting: The overall % of students buying cereal is 75% > 66.7% !!!  Milk  not cereal [20%, 33.3%] is more accurate (25% < 33.3%).  Measure of dependent/correlated events: lift for A  B MilkNot milkSum (row) Cereal Not cereal Sum(col.) Correlation analysis lift >1 positive correlation, lift <1 negative correlation, = 1 independence

Correlation analysis

Generalization to A,B ⇒ C:

lift(A, B, C) = P(A ∧ B ∧ C) / (P(A) · P(B) · P(C))

Exercise: find an example where A ⇒ C has lift(A, C) < 1, but A,B ⇒ C has lift(A, B, C) > 1.
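Candidate answers to the exercise can be checked with a small helper that computes the generalized lift of any itemset over a transaction database. The function name and the toy data are choices made here, and the helper assumes every listed item occurs at least once (to avoid division by zero).

```python
def lift(transactions, items):
    # Generalized lift: P(all items together) divided by the product of
    # the individual item probabilities, estimated from the transactions.
    n = len(transactions)
    p_joint = sum(1 for t in transactions if set(items) <= t) / n
    p_prod = 1.0
    for i in items:
        p_prod *= sum(1 for t in transactions if i in t) / n
    return p_joint / p_prod

# Usage on a toy database (hypothetical data, not from the slides):
db = [{"a", "c"}, {"a", "c"}, {"a"}, {"c"}]
print(round(lift(db, ["a", "c"]), 3))   # 0.889: a and c negatively correlated
```

To solve the exercise, construct a database where `lift(db, ["a", "c"]) < 1` while `lift(db, ["a", "b", "c"]) > 1`, and verify both calls.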