1 Online Mining (Recently) Maximal Frequent Itemsets over Data Streams
Hua-Fu Li, Suh-Yin Lee, Man Kwan Shan
RIDE-SDMA '05
Speaker: 董原賓
Advisor: 柯佳伶


2 Introduction
Difficulties of data stream mining: the data is huge, high-speed, and continuous.
Solution: a one-pass algorithm with a summary data structure that mines the maximal frequent itemsets.

3 Definition
Ψ = {i_1, i_2, …, i_n}: a set of items
W_i: basic window i
Data stream = [W_1, W_2, …, W_N): an infinite sequence of basic windows
N: the window identifier of the latest basic window
Current length of the data stream: CL = |W_1| + |W_2| + … + |W_N|
Example (three transactions per basic window, so CL = 3 × N):
W_1: abc, bcd, acd
W_2: cd, abd, bc
…
W_N: a, b, cd
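
The definition of CL can be checked with a short Python sketch. The list-of-strings representation of windows and the name `current_length` are assumptions made for illustration; they are not from the slides.

```python
# Hypothetical representation: a basic window is a list of transactions,
# and a transaction is a string of single-letter items.
windows = [
    ["abc", "bcd", "acd"],  # W_1 from the slide
    ["cd", "abd", "bc"],    # W_2 from the slide
]

def current_length(windows):
    """CL = |W_1| + |W_2| + ... + |W_N|: number of transactions seen so far."""
    return sum(len(w) for w in windows)

cl = current_length(windows)
```

With three transactions in each window, `cl` equals 3 × N for the N windows received so far.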

4 Definition
X.tsup: true support of itemset X
X.esup: estimated support of itemset X, with 1 ≤ X.esup ≤ X.tsup
X.CL = |W_j| + |W_{j+1}| + … + |W_N|, where W_j is the first window containing X in the summary data structure
s: minimum support threshold
ε: maximum support error threshold
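
A minimal sketch of X.CL, assuming window identifiers are 1-based and the window sizes are kept in a plain list. The function name `itemset_cl` is hypothetical.

```python
def itemset_cl(window_sizes, first_window_id):
    """X.CL = |W_j| + ... + |W_N|, where j = first_window_id (1-based)."""
    return sum(window_sizes[first_window_id - 1:])

# Three windows of three transactions each; X first appears in W_2.
x_cl = itemset_cl([3, 3, 3], 2)
```

An itemset that first enters the summary structure in a later window is therefore measured against a shorter stretch of the stream than one present since W_1.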

5 Data Stream Mining for Maximal Frequent Itemsets (DSM-MFI)
Step 1: read a window of transactions
Step 2: construct and maintain the summary data structure
Step 3: prune the infrequent information
Step 4: search for the maximal frequent itemsets

6 Summary Frequent Itemsets forest (SFI-forest)
The SFI-forest is composed of an FI-list and a set of SFI-trees.
Each SFI-tree node has four fields:
item-id: the item identifier
esup: the number of transactions reaching the node with this item-id
window-id: the identifier of the current basic window, assigned when the node is created
node-link: a link to the next node with the same item-id in the same SFI-tree

7 Summary Frequent Itemsets forest (SFI-forest)
Each FI-list entry has four fields:
item-id: the item identifier
esup: the number of transactions containing the item
window-id: the identifier of the current basic window, assigned when the entry is created
head link: a link to the root node of item-id.SFI-tree

8 Summary Frequent Itemsets forest (SFI-forest)
Each SFI-tree has its own opposite frequent item list (OFI-list).
Each OFI-list entry has the form (item-id, esup, window-id, head link).
The head link points to the first node carrying the item-id in the SFI-tree.
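
The three record types described on slides 6 to 8 can be sketched as Python dataclasses. The class names, the `children` dictionary, and the choice to hang each OFI-list off its FI-list entry are assumptions made for illustration; the slides only name the fields.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class SFINode:
    item_id: str
    esup: int        # transactions reaching this node
    window_id: int   # basic window in which the node was created
    children: Dict[str, "SFINode"] = field(default_factory=dict)  # assumed
    node_link: Optional["SFINode"] = None  # next node with the same item-id

@dataclass
class OFIEntry:
    item_id: str
    esup: int
    window_id: int
    head_link: Optional[SFINode] = None  # first node carrying item_id

@dataclass
class FIEntry:
    item_id: str
    esup: int        # transactions containing the item
    window_id: int
    tree_root: Optional[SFINode] = None  # head link to item-id.SFI-tree
    ofi_list: Dict[str, OFIEntry] = field(default_factory=dict)

# Tiny usage example: item a with one child node b in a.SFI-tree.
root = SFINode("a", 1, 1)
root.children["b"] = SFINode("b", 1, 1)
entry = FIEntry("a", 1, 1, tree_root=root)
entry.ofi_list["b"] = OFIEntry("b", 1, 1, head_link=root.children["b"])
```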

9 Example
W_1 = {abc, bcd, acd}; entries are labeled (item-id, esup, window-id, node link).
First transaction: T = abc
Transaction projection of T: abc → abc, bc, c, handled by SFI-tree-maintenance(abc), SFI-tree-maintenance(bc), and SFI-tree-maintenance(c)
FI-list: (1,1,1) (2,1,1) (3,1,1)
a.OFI-list: (2,1,1) (3,1,1); a.SFI-tree: 1:1:1, 2:1:1, 3:1:1
b.OFI-list: (3,1,1); b.SFI-tree: 2:1:1, 3:1:1
c.OFI-list: no entries; c.SFI-tree: 3:1:1

10 Example
W_1 = {abc, bcd, acd}; entries are labeled (item-id, esup, window-id, node link).
Second transaction: T = bcd
Transaction projection of T: bcd → bcd, cd, d, handled by SFI-tree-maintenance(bcd), SFI-tree-maintenance(cd), and SFI-tree-maintenance(d)
[Figure: the FI-list entries for items 2 and 3 change from (2,1,1), (3,1,1) to (2,1,2), (3,1,2), and a new entry (4,1,1) is added for item d together with d.OFI-list and d.SFI-tree (node 4:1:1); b.SFI-tree, c.SFI-tree, and their OFI-lists are updated accordingly.]

11 Example
W_1 = {abc, bcd, acd}; entries are labeled (item-id, esup, window-id, node link).
Third transaction: T = acd
Transaction projection of T: acd → acd, cd, d, starting with SFI-tree-maintenance(acd)
[Figure: the FI-list entries for items 1, 3, and 4 become (1,1,2), (3,1,3), and (4,1,2); a.SFI-tree, c.SFI-tree, d.SFI-tree, and their OFI-lists are updated accordingly.]
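
The transaction projection used in the three example slides can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: the forest is a nested dictionary that keeps only counts, and window-ids, node links, and OFI-lists are left out. The names `project` and `insert_suffix` are hypothetical, and items are assumed to be ordered lexicographically.

```python
def project(transaction):
    """Return every item-suffix of a transaction, longest first."""
    items = sorted(transaction)  # assumption: lexicographic item order
    return ["".join(items[i:]) for i in range(len(items))]

def insert_suffix(forest, suffix):
    """Insert one item-suffix into the tree rooted at its first item."""
    node = forest.setdefault(suffix[0], {"esup": 0, "children": {}})
    node["esup"] += 1
    for item in suffix[1:]:
        node = node["children"].setdefault(item, {"esup": 0, "children": {}})
        node["esup"] += 1

forest = {}
for t in ["abc", "bcd", "acd"]:      # window W_1 from the slides
    for suffix in project(t):
        insert_suffix(forest, suffix)
```

After the three transactions of W_1, the root counts are 2 for a, 2 for b, 3 for c, and 2 for d, which is the number of transactions of W_1 containing each item.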

12 Pruning infrequent items from SFI-forest
X: a 1-itemset in the FI-list
If X.esup < X.CL × ε, then X and its supersets are deleted from the SFI-forest.
Step 1: delete item-id.OFI-list, item-id.SFI-tree, and the entry with item-id from the FI-list
Step 2: remove the infrequent item from the other OFI-lists by traversing the FI-list

13 Pruning infrequent items from SFI-forest
Step 3: delete the infrequent item from the other SFI-trees
Step 4: reconstruct the SFI-trees by reinserting the modified item-suffix transactions or by joining the remaining subtrees into the SFI-tree
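
The pruning test X.esup < X.CL × ε can be sketched as below. Only the selection of infrequent items is shown; the deletion and reconstruction in steps 1 to 4 are not. The dictionary layout (item mapped to its esup and first window-id), the function name, and the sample numbers are assumptions made for illustration.

```python
def infrequent_items(fi_list, window_sizes, epsilon):
    """Return the items whose estimated support is below epsilon * X.CL."""
    pruned = []
    for item, (esup, first_window) in fi_list.items():
        x_cl = sum(window_sizes[first_window - 1:])  # X.CL for this item
        if esup < epsilon * x_cl:
            pruned.append(item)
    return pruned

# Hypothetical state: three windows of sizes 3, 3, and 4 (CL = 10).
fi_list = {"a": (2, 1), "b": (1, 1), "c": (1, 3)}
to_prune = infrequent_items(fi_list, [3, 3, 4], epsilon=0.2)
```

Here only b is selected: its threshold is 10 × 0.2 = 2, while c entered in the third window and is compared against 4 × 0.2 = 0.8.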

14 Example
s = 0.3, ε = 0.2
FI-list: (1,1,3) (2,1,2) (3,1,3) (4,1,3)
a.CL = b.CL = c.CL = d.CL; pruning threshold: CL × ε = CL × 0.2 = 2.4
[Figure: a.SFI-tree, b.SFI-tree, c.SFI-tree, and d.SFI-tree with a.OFI-list, b.OFI-list, c.OFI-list, and d.OFI-list before pruning.]

15 Determining maximal frequent itemsets
Suppose there are k frequent 1-itemsets e_1, e_2, …, e_k in the FI-list, and let o_1, o_2, …, o_j be the items in e_i.OFI-list.
Generate a candidate maximal frequent (j+1)-itemset E = (e_i, o_1, o_2, …, o_j).
Start from the frequent item with the smallest estimated support.
Traverse the path via the node links to count E's estimated support.

16 Determining maximal frequent itemsets
If E.esup ≥ s × e_i.CL, then E is a maximal frequent itemset (MFI).
Otherwise, enumerate E into itemsets of size |E| − 1, and repeat until the set of all maximal frequent itemsets with respect to entry e_i is found.
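
The top-down search on slides 15 and 16 can be sketched in Python. This is a simplified stand-in under stated assumptions: support is counted directly from a list of transactions rather than by following node links in an SFI-tree, the number of transactions plays the role of e_i.CL, and candidates are tried level by level from the largest size down. The function names are hypothetical.

```python
from itertools import combinations

def support(itemset, transactions):
    """Count the transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def maximal_frequent_with(e, others, transactions, s):
    """Top-down search for maximal frequent itemsets that contain item e."""
    min_count = s * len(transactions)   # stands in for s * e_i.CL
    found = []
    for size in range(len(others), -1, -1):      # largest candidates first
        for combo in combinations(sorted(others), size):
            cand = frozenset((e,) + combo)
            if any(cand <= m for m in found):
                continue                # subset of a known MFI, not maximal
            if support(cand, transactions) >= min_count:
                found.append(cand)
    return found

transactions = [set("abc"), set("bcd"), set("acd")]   # window W_1
mfi_b = maximal_frequent_with("b", {"c", "d"}, transactions, s=0.5)
```

With s = 0.5 the threshold is 1.5 transactions: bcd occurs once and is rejected, so the search drops to size-2 candidates and accepts bc, which occurs twice.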

17 Example
s = 0.3, ε = 0.2
FI-list: (1,1,3) (2,1,2) (3,1,3) (4,1,3)
a.CL = b.CL = c.CL = d.CL = 5; frequency threshold: CL × s = 5 × 0.3 = 1.5
Calculate support(bcd); calculate support(bc) = 1
[Figure: a.SFI-tree, b.SFI-tree, c.SFI-tree, and d.SFI-tree with a.OFI-list, b.OFI-list, c.OFI-list, and d.OFI-list.]

18 Sliding Window Mining over Data Streams
Modifications:
Use the DSM-MFI algorithm to construct an SFI-forest_i for each basic window W_i.
Find the local maximal frequent itemsets (local MFI_i); all local MFIs are stored in a queue.
The global MFI-list stores all local MFIs from W_1 to W_N.

19 Sliding Window Mining over Data Streams
When basic window N+1 arrives:
Remove local MFI_1 from the queue.
Subtract the support of local MFI_1 from the global MFI-list.
Use the DSM-MFI algorithm to mine all local maximal frequent itemsets of W_{N+1}.
Increase the support of the matching entries in the global MFI-list, or insert local MFI_{N+1} into it.
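
The queue maintenance described on slides 18 and 19 can be sketched as follows. The class name `SlidingMFI`, the dictionary format for a local MFI result, and the rule of dropping entries whose support falls to zero are assumptions made for illustration; mining each window is assumed to happen elsewhere.

```python
from collections import Counter, deque

class SlidingMFI:
    def __init__(self, max_windows):
        self.max_windows = max_windows
        self.queue = deque()         # local MFI results, oldest first
        self.global_mfi = Counter()  # itemset -> summed support

    def add_window(self, local_mfi):
        """local_mfi maps each frozenset itemset to its support in the new window."""
        if len(self.queue) == self.max_windows:
            oldest = self.queue.popleft()          # remove local MFI_1
            for itemset, sup in oldest.items():    # subtract its support
                self.global_mfi[itemset] -= sup
                if self.global_mfi[itemset] <= 0:
                    del self.global_mfi[itemset]
        self.queue.append(local_mfi)
        for itemset, sup in local_mfi.items():     # increase or insert
            self.global_mfi[itemset] += sup

bc, cd = frozenset("bc"), frozenset("cd")
sw = SlidingMFI(max_windows=2)
sw.add_window({bc: 2})
sw.add_window({bc: 1, cd: 2})
sw.add_window({cd: 3})   # the first window slides out here
```

After the third call the window holds only the last two results, so bc keeps a support of 1 and cd accumulates 5.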

20 Experiment
Platform: 1 GHz IBM x24, 384 MB memory, Visual C
Parameters: s = 0.1%, ε = 0.01%
IBM synthetic datasets: T10.I5.D1000K and T30.I20.D1000K
Each dataset is broken into 20 basic windows to simulate streaming data.
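
Breaking a dataset into basic windows can be sketched as below. The slides do not say how the 20 windows were sized; equal-sized consecutive chunks are an assumption, and `split_into_windows` is a hypothetical helper. Reading D1000K as 1,000,000 transactions, an equal split would give 50,000 transactions per window; the example uses 1,000 stand-in transactions to stay small.

```python
def split_into_windows(transactions, n_windows):
    """Split a list into n_windows consecutive chunks of near-equal size."""
    size, extra = divmod(len(transactions), n_windows)
    windows, start = [], 0
    for i in range(n_windows):
        end = start + size + (1 if i < extra else 0)  # spread the remainder
        windows.append(transactions[start:end])
        start = end
    return windows

stream = list(range(1000))                 # stand-in for real transactions
windows = split_into_windows(stream, 20)
```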

21 Experiment