Maintaining Frequent Itemsets over High-Speed Data Streams

Presentation transcript:

Maintaining Frequent Itemsets over High-Speed Data Streams
James Cheng, Yiping Ke, and Wilfred Ng
Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006)
Advisor: Jia-Ling Koh
Speaker: Chun-Wei Hsieh
Date: 05/26/06

Introduction
- Existing approximation techniques for mining frequent itemsets are mainly false-positive approaches that use an error parameter ε.
- ε = γ · min_sup, with γ between 0 and 1.
- A smaller ε means a larger number of itemsets must be maintained.
- A larger ε means lower accuracy.

Introduction
- MineSW uses a progressively increasing ε.
- It is a false-negative approach.
- It targets recent data (a sliding window).
- It processes batches, not individual transactions.

Introduction
- w_size = 2; min_sup σ = 3/5
- sup(bc, W1) = 4, sup(bc, W2) = 2
- The set of FIs over W1 is {b, c, bc}.
- The set of FIs over W2 is {b, c, d, bd}.

Preliminaries
- A computed support
- A time interval
- The computed support of an itemset X over a time interval T
- The number of transactions that arrive in a time interval T

MST Function
- Require the support of an itemset to increase progressively the longer it stays in the window.
- k: the number of time units an itemset has stayed in the window; the MST function is evaluated at k.

MST Function
Example: let σ = 0.01, r = 0.1, and w = 10, with 2000 transactions in each time unit.
- r1 = [(1-0.1)/10](1-1) + 0.1 = 0.1, m1 = 0.01 × 2000 × 1 = 20
- r2 = [(1-0.1)/10](2-1) + 0.1 = 0.19, m2 = 0.01 × 2000 × 2 = 40
- r3 = [(1-0.1)/10](3-1) + 0.1 = 0.28, m3 = 0.01 × 2000 × 3 = 60
- r4 = [(1-0.1)/10](4-1) + 0.1 = 0.37, m4 = 0.01 × 2000 × 4 = 80
- r5 = [(1-0.1)/10](5-1) + 0.1 = 0.46, m5 = 0.01 × 2000 × 5 = 100
- r6 = [(1-0.1)/10](6-1) + 0.1 = 0.55, m6 = 0.01 × 2000 × 6 = 120
- r7 = [(1-0.1)/10](7-1) + 0.1 = 0.64, m7 = 0.01 × 2000 × 7 = 140
- r8 = [(1-0.1)/10](8-1) + 0.1 = 0.73, m8 = 0.01 × 2000 × 8 = 160
- r9 = [(1-0.1)/10](9-1) + 0.1 = 0.82, m9 = 0.01 × 2000 × 9 = 180
- r10 = [(1-0.1)/10](10-1) + 0.1 = 0.91, m10 = 0.01 × 2000 × 10 = 200
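The arithmetic in this example can be reproduced with a short Python sketch. The function names are hypothetical. The last step, minsup(k) = ceil(m_k × r_k), is an assumption: the formula itself did not survive in the transcript, and this combination is inferred only because it matches the values minsup(1) = 2 and minsup(2) = 8 quoted on the following slide. Exact fractions are used so that rounding up is not affected by floating-point error.

```python
from fractions import Fraction
from math import ceil

SIGMA = Fraction(1, 100)   # min_sup, sigma = 0.01
R = Fraction(1, 10)        # r = 0.1
W = 10                     # w = 10 time units
PER_UNIT = 2000            # transactions in each time unit


def relaxation_rate(k: int) -> Fraction:
    """r_k = [(1 - r) / w] * (k - 1) + r, as written on the slide."""
    return (1 - R) / W * (k - 1) + R


def scaled_support(k: int) -> Fraction:
    """m_k = sigma * (number of transactions in k time units)."""
    return SIGMA * PER_UNIT * k


def minsup(k: int) -> int:
    """Assumed combination of the two terms: ceil(m_k * r_k)."""
    return ceil(scaled_support(k) * relaxation_rate(k))


if __name__ == "__main__":
    for k in range(1, W + 1):
        print(k, float(relaxation_rate(k)), int(scaled_support(k)), minsup(k))
```

Under this reading the threshold starts very low for a newly inserted itemset and approaches the full minimum support as k approaches w.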

MST Function
- With Lossy Counting (ε = 20), ab and cd are retained in the windows.
- With MineSW, the computed support of ab is:
  t1: 3, sup(ab) = 3 > minsup(1) = 2
  t2: 0, sup(ab) = 3 < minsup(2) = 8
  ...
  t7: 4, sup(ab) = 4 > minsup(1) = 2
  t8: 7, sup(ab) = 11 > minsup(2) = 8
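The trace for ab can be replayed with a small Python sketch. It is illustrative only: `minsup` uses the assumed form ceil(m_k × r_k) with σ = 0.01, r = 0.1, w = 10 and 2000 transactions per time unit, which reproduces minsup(1) = 2 and minsup(2) = 8; `trace` is a hypothetical helper, and treating equality as "kept" is also an assumption, since the slide only shows strict comparisons. The time units between t2 and t7 are elided in the slide, so the two runs are replayed separately.

```python
from fractions import Fraction
from math import ceil


def minsup(k, sigma=Fraction(1, 100), r=Fraction(1, 10), w=10, per_unit=2000):
    """Assumed threshold ceil(m_k * r_k) after k time units in the window."""
    r_k = (1 - r) / w * (k - 1) + r
    m_k = sigma * per_unit * k
    return ceil(m_k * r_k)


def trace(counts):
    """Follow one itemset from the time unit in which it is inserted.

    counts[i] is its support in the (i+1)-th time unit since insertion.
    Returns (k, computed_support, threshold, kept) tuples and stops at
    the first time unit where the computed support is below the threshold.
    """
    rows, sup = [], 0
    for k, count in enumerate(counts, start=1):
        sup += count
        kept = sup >= minsup(k)
        rows.append((k, sup, minsup(k), kept))
        if not kept:
            break
    return rows


if __name__ == "__main__":
    print(trace([3, 0]))   # t1, t2: ab is dropped at k = 2
    print(trace([4, 7]))   # t7, t8: ab is re-inserted and kept
```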

MineSW Algorithm
- Mine FIs from each batch with threshold γσ.
- Use a prefix tree to keep the FIs and semi-FIs of the window.
- Each node in the prefix tree holds: item, uid(X), sup(X).
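A prefix-tree node with these three fields might be sketched in Python as follows. This is not the authors' implementation: the class and helper names are hypothetical, and reading uid(X) as the ID of the time unit in which X was inserted is an assumption, since the slide does not define it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Node:
    item: Optional[str]     # last item of itemset X; None at the root
    uid: int = 0            # assumed: time unit in which X was inserted
    sup: int = 0            # computed support of X
    children: Dict[str, "Node"] = field(default_factory=dict)


def insert(root: Node, itemset: Tuple[str, ...], uid: int, sup: int) -> Node:
    """Add `sup` to the node for `itemset`, creating missing nodes on the path."""
    node = root
    for item in sorted(itemset):          # fixed item order gives one path per itemset
        if item not in node.children:
            node.children[item] = Node(item=item, uid=uid)
        node = node.children[item]
    node.sup += sup
    return node


def lookup(root: Node, itemset: Tuple[str, ...]) -> Optional[Node]:
    """Return the node for `itemset`, or None if it is not in the tree."""
    node = root
    for item in sorted(itemset):
        node = node.children.get(item)
        if node is None:
            return None
    return node


if __name__ == "__main__":
    root = Node(item=None)
    insert(root, ("a", "b"), uid=1, sup=3)
    insert(root, ("a", "b"), uid=2, sup=4)
    print(lookup(root, ("a", "b")))
```

In this sketch a node keeps the uid of its first insertion, and later batches only add to sup; prefixes created on the way start with sup 0 until their own counts are inserted.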

MineSW Algorithm
- When the first window is not full:

MineSW Algorithm
- Processing the expiring time unit

MineSW Algorithm
- Processing the new time unit

MineSW Algorithm
- Pruning and outputting

Approximation Quality
- The error bound of the computed support of a semi-frequent itemset X over T k:
- The set of false negatives is:

Experiments
- Compared with LCSW
- 900 MHz CPU, 4 GB RAM
- Data streams: t10i4, t15i6 (t: the average size of a transaction; i: the average size of a maximal frequent itemset)
- Stream: 3M transactions
- W_Size: 20 time units
- 1 time unit: 50K transactions
