Adaptive Load Shedding for Mining Frequent Patterns from Data Streams
Xuan Hong Dang, Wee-Keong Ng, and Kok-Leong Ong (DaWaK 2006)
Yi-Chun Chen, 2008/3/19



Outline
– Motivation
– Objective
– Definition
– Adaptive Load Shedding in Data Streams
– Performance Results
– Conclusion

Motivation
– Finding frequent itemsets plays an important role in analyzing data streams
– Existing work only assumes that the machinery itself is fast enough to handle all incoming transactions without incurring any unwanted latency

(Cont.)
– The arrival rate of data streams usually exceeds the system capacity
– Algorithms that mine data streams must cope with system overload situations

Objective
Given a processing capacity C of a mining system and a data stream DS with a high arrival rate:
– Load(DS): the workload of the system
– If Load(DS) > C, load shedding is invoked
– Guarantee: discover a set of patterns that closely approximates the set of actual frequent itemsets

(Cont.)
– How to determine overload situations?
– How much load to shed?
– How to approximate frequent patterns once load shedding is introduced?

Definition
– …: the occurrence count of X in DS up to the transaction …
– MFIs: maximal frequent itemsets

Adaptive Load Shedding in Data Streams
– Overload Detection
– Load Shedding by Sampling Transactions

Overload Detection
To quickly estimate the system workload, we propose an approximate method based on MFIs
– MFIs also cover all frequent itemsets (every frequent itemset is a subset of some MFI)
– The number of MFIs is smaller than the number of frequent itemsets
– The support of MFIs is always closest to …

(Cont.)
Load coefficient: …
– Let k be the number of MFIs in a transaction
– Let … be an MFI, where …
Suppose we measure the above statistics for n transactions over one time unit
– Let r be the current rate of the data stream
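The overload check described above can be sketched roughly as follows. The formula for the load coefficient is not preserved in this transcript, so `load_coefficient` below is a hypothetical stand-in (it counts the non-empty subsets of each MFI in a transaction and ignores overlaps); only the overall shape (average a per-transaction statistic over n transactions, scale by the rate r, compare with the capacity C) follows the slides. All function names are illustrative.

```python
from typing import Iterable, Sequence


def load_coefficient(mfis_in_transaction: Iterable[frozenset]) -> int:
    """Hypothetical per-transaction load (an assumption, not the slide's formula).

    Each MFI with m items contributes 2**m - 1 non-empty sub-itemsets.
    Overlaps between MFIs are ignored, so this over-estimates the work.
    """
    return sum(2 ** len(mfi) - 1 for mfi in mfis_in_transaction)


def estimated_load(coefficients: Sequence[int], rate: float) -> float:
    """Average load coefficient over the n measured transactions, times rate r."""
    if not coefficients:
        return 0.0
    return rate * sum(coefficients) / len(coefficients)


def is_overloaded(coefficients: Sequence[int], rate: float, capacity: float) -> bool:
    """Report overload when the estimated workload exceeds the capacity C."""
    return estimated_load(coefficients, rate) > capacity
```

With this stand-in, a transaction containing the MFIs {a, b} and {c} gets a coefficient of 3 + 1 = 4; the check itself does not depend on how the coefficient is defined.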

Load Shedding by Sampling Transactions
To estimate how much load to shed:
– Let P be a parameter expressing the fraction of transactions that should be discarded
– Suppose P < 1; then we use the Hoeffding bound to discard transactions and to approximate frequent patterns

(Cont.)
Hoeffding bound: …
– Let r be the number of times that X occurs in these transactions
– sup(X) = p: the true support of X
– …: the estimated support of X
– We want to satisfy the inequality …, so the required number of sampled transactions is at least …
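For reference, the standard two-sided Hoeffding bound for n independent samples states Pr(|r/n - p| >= eps) <= 2 exp(-2 n eps^2). Requiring the right-hand side to be at most delta gives n >= ln(2/delta) / (2 eps^2). The symbols eps (maximum support error) and delta (failure probability) are assumptions, since the slide's own notation is not preserved, and the deck may use a different variant of the bound. A minimal sketch of the resulting sample-size calculation:

```python
import math


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Smallest n with 2 * exp(-2 * n * epsilon**2) <= delta.

    epsilon: maximum tolerated error on the estimated support (assumed name).
    delta:   probability that the error exceeds epsilon (assumed name).
    """
    if not (0.0 < epsilon < 1.0):
        raise ValueError("epsilon must be in (0, 1)")
    if not (0.0 < delta < 1.0):
        raise ValueError("delta must be in (0, 1)")
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))
```

The required sample size grows with 1/eps^2 but only logarithmically with 1/delta, so tightening the support error is far more expensive than tightening the confidence.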

(Cont.)
– Sample batch: each incoming transaction is chosen with probability P until enough transactions have been sampled
– Local patterns: all frequent itemsets found in this sample batch hold only within part of the stream
– Global patterns: frequent itemsets in the entire stream
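The sample-batch step can be illustrated with a small sketch: every incoming transaction is kept independently with probability P, and the batch is closed once the required number of transactions has been collected. The function name, the `batch_size` parameter, and the optional seeded generator are illustrative assumptions, not part of the original deck.

```python
import random
from typing import Iterable, List, Optional


def sample_batch(stream: Iterable, p: float, batch_size: int,
                 rng: Optional[random.Random] = None) -> List:
    """Keep each transaction with probability p until batch_size are kept.

    Returns a shorter batch if the stream ends first. When `stream` is an
    iterator, unread transactions stay available for the next batch.
    """
    if not (0.0 < p <= 1.0):
        raise ValueError("p must be in (0, 1]")
    rng = rng or random.Random()
    batch = []
    for transaction in stream:
        if rng.random() < p:          # Bernoulli trial: keep this transaction
            batch.append(transaction)
            if len(batch) >= batch_size:
                break                 # enough samples for this batch
    return batch
```

Calling the function repeatedly on the same iterator yields consecutive sample batches, each covering a different part of the stream, which is why patterns mined from a single batch are only local.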

(Cont.)
Due to the non-uniform distribution of the stream:
– False global patterns
– Significant support …
– …: the max. support error of each pattern
– …: frequent
– …: sub-frequent
– …: infrequent
Significant patterns

(Cont.)
– The required number of sampled transactions is at least …
– If … and …, then … is too large
– We assume that each itemset appears in more than 0.01% of the transactions; then, if …, every itemset will be chosen

Performance Results
– Accuracy Measurements
– Adaptability
– Recall: true frequent patterns found / actual true frequent patterns
– Precision: true frequent patterns found / total frequent patterns found
– Synthetic: T5I3D1000K, T8I4D1000K with unique items
– Real-life: "BMS-POS" T6.5 D with 1657 distinct items
– Fix …, select …
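The two accuracy measures can be computed directly from the set of patterns reported under load shedding and the set of truly frequent patterns. The sketch below assumes itemsets are represented as frozensets; returning 1.0 when a denominator is empty is a convention chosen here, not something the slides specify.

```python
from typing import Set, Tuple


def recall_precision(found: Set[frozenset],
                     actual: Set[frozenset]) -> Tuple[float, float]:
    """Return (recall, precision) of the reported frequent patterns.

    found:  patterns reported as frequent by the miner.
    actual: patterns that are truly frequent in the full stream.
    """
    true_found = found & actual  # reported patterns that are truly frequent
    recall = len(true_found) / len(actual) if actual else 1.0
    precision = len(true_found) / len(found) if found else 1.0
    return recall, precision
```

For example, if the miner reports {a}, {a, b}, and {c} while the truly frequent patterns are {a}, {a, b}, {b}, and {b, c}, two reported patterns are correct, giving a recall of 2/4 and a precision of 2/3.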

Conclusion
– The work addresses the problem of finding frequent patterns from data streams when the mining system cannot keep up with the arrival rate of the stream