STAGGER: Periodicity Mining of Data Streams using Expanding Sliding Windows. Mohamed G. Elfeky, Walid G. Aref, Ahmed K. Elmagarmid. ICDM 2006. Presented 2007/10/02 by Chen Yi-Chun.

Presentation transcript:

STAGGER: Periodicity Mining of Data Streams using Expanding Sliding Windows
Mohamed G. Elfeky, Walid G. Aref, Ahmed K. Elmagarmid
ICDM 2006
Presenter: Chen Yi-Chun, 2007/10/02

Outline
Motivation
Previous Approach
–SPD algorithm
–Max-Subpattern Tree
Approximate Incremental Technique
Conclusion

Motivation
Example stream: abcabcabcabcabc… with period p=3.
Single sliding window of width w:
–A smaller w supports real-time output.
–A larger w makes it possible to find long periods.
To get both real-time output and long periods, multiple sliding windows are proposed.
[Figure: windows of increasing width over the stream, reporting p=3, p=3, and p=3,5, each with patterns abc, *b*, a**, …]
Period detection: the SPD algorithm is used.
Pattern mining: the max-subpattern tree is used.

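Periodicity Detection
$\pi_{p,l}(S) = e_l, e_{l+p}, e_{l+2p}, \ldots, e_{l+(m-1)p}$: the projection of a data stream $S = e_0, e_1, \ldots, e_{n-1}$ according to a period $p$ starting from position $l$, where $0 \le l < p$, $m = \lceil (n-l)/p \rceil$, and $n$ is the length of $S$.
Ex. If $S$ = abcabbabdb, then $\pi_{3,0}(S)$ = aaab, where the final b is an outlier.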
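
The projection described on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming 0-based positions and a stream held as a string; the function name `projection` is hypothetical and not taken from the paper.

```python
def projection(stream: str, p: int, l: int) -> str:
    """Return every p-th symbol of `stream`, starting at 0-based position l.

    Assumption: 0 <= l < p, matching the usual definition of a projection.
    """
    if p <= 0 or not 0 <= l < p:
        raise ValueError("require p > 0 and 0 <= l < p")
    # Extended slicing picks positions l, l+p, l+2p, ... up to the stream end.
    return stream[l::p]


if __name__ == "__main__":
    print(projection("abcabbabdb", 3, 0))  # aaab (the last b breaks the run of a's)
    print(projection("abcabbabdb", 3, 1))  # bbb
```

With p=3 and l=0 the slide's example stream yields three a's followed by one b, which is the outlier the slide points at.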

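Cont.
$F_2(s, S)$: the number of times the symbol $s$ occurs in two consecutive positions in the data stream $S$.
Ex. If $S$ = abbaaabaa, then $F_2(a, S) = 3$ and $F_2(b, S) = 1$.
$F_2(s, \pi_{p,l}(S))$ indicates how often the symbol $s$ occurs every $p$ timestamps in a data stream $S$, where $\pi_{p,l}(S)$ is the projection of $S$ according to period $p$ starting from position $l$.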
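
Counting how often a symbol occurs in two consecutive positions is a single pass over adjacent pairs. The sketch below is illustrative only; the name `consecutive_count` is an assumption, not the paper's notation.

```python
def consecutive_count(symbol: str, seq: str) -> int:
    """Count adjacent pairs (seq[i], seq[i+1]) where both entries equal `symbol`."""
    # zip(seq, seq[1:]) enumerates every adjacent pair exactly once.
    return sum(1 for x, y in zip(seq, seq[1:]) if x == symbol and y == symbol)


if __name__ == "__main__":
    print(consecutive_count("a", "abbaaabaa"))  # 3
    print(consecutive_count("b", "abbaaabaa"))  # 1
```

Overlapping pairs are counted separately, so the run "aaa" contributes two pairs.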

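Cont.
If a data stream $S$ of length $n$ contains a symbol $s$ and
$\frac{F_2(s, \pi_{p,l}(S))}{\lceil (n-l)/p \rceil - 1} \ge \tau$,
then $s$ is said to be periodic in $S$ with a period of length $p$ at position $l$ with respect to the periodicity threshold $\tau$. Here $\pi_{p,l}(S)$ is the projection of $S$ according to period $p$ starting from position $l$, and $F_2(s, \cdot)$ counts the occurrences of $s$ in two consecutive positions.
Ex. $S$ = abcabbabdb, with $F_2(a, \pi_{3,0}(S)) / (\lceil 10/3 \rceil - 1) = 2/3$:
–The symbol a is periodic with a period of length 3 at position 0 with respect to a periodicity threshold $\tau \le 2/3$.
–The pattern a * * is a frequent single periodic pattern of length 3.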
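
The periodicity test on this slide can be worked through numerically. The sketch below assumes the ratio is the number of consecutive occurrences of the symbol in the projection divided by the number of adjacent pairs in that projection, ceil((n - l) / p) - 1; since the formula is not legible in the transcript, that reading is an assumption. The name `periodicity` is hypothetical.

```python
from fractions import Fraction
from math import ceil


def periodicity(stream: str, symbol: str, p: int, l: int) -> Fraction:
    """Fraction of adjacent pairs in the projection that are (symbol, symbol).

    Assumed formula: F2(symbol, projection) / (ceil((n - l) / p) - 1).
    """
    proj = stream[l::p]  # positions l, l+p, l+2p, ...
    pairs = sum(1 for x, y in zip(proj, proj[1:]) if x == symbol and y == symbol)
    denom = ceil((len(stream) - l) / p) - 1
    # A projection with fewer than two symbols has no adjacent pairs at all.
    return Fraction(pairs, denom) if denom > 0 else Fraction(0)


if __name__ == "__main__":
    s = "abcabbabdb"
    print(periodicity(s, "a", 3, 0))  # 2/3
    print(periodicity(s, "b", 3, 1))  # 1
```

A symbol is then reported as periodic when this value reaches the chosen periodicity threshold; for the slide's example, a qualifies at position 0 for any threshold up to 2/3.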

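SPD-algorithm
Goal: detect the symbols that are periodic with period length $p$ within $S$.
Shift $S$ by $p$ positions, denoted as $S^{(p)}$, and compare it with $S$ position by position; a symbol that matches at a position recurs there after $p$ timestamps.
Ex. If $S$ = a b c a b b a b c b …, then $S^{(3)}$ = * * * a b c a b b a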
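
The shift-and-compare idea can be sketched as follows. This is a simplified illustration, not the SPD implementation itself; the helper names `shift` and `matches` and the `*` padding symbol are assumptions made for this example.

```python
def shift(stream: str, p: int, pad: str = "*") -> str:
    """Shift `stream` right by p positions, padding the front with `pad`."""
    if not 0 <= p <= len(stream):
        raise ValueError("require 0 <= p <= len(stream)")
    return pad * p + stream[: len(stream) - p]


def matches(stream: str, p: int) -> list[tuple[int, str]]:
    """Positions where the stream agrees with its p-shifted copy."""
    shifted = shift(stream, p)
    return [(i, a) for i, (a, b) in enumerate(zip(stream, shifted)) if a == b]


if __name__ == "__main__":
    s = "abcabbabcb"
    print(shift(s, 3))    # ***abcabba
    print(matches(s, 3))  # [(3, 'a'), (4, 'b'), (6, 'a'), (7, 'b')]
```

Each match at position i means the same symbol also occurred at position i - 3, which is the evidence the periodicity test aggregates per symbol and start position.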

SPD algorithm in Time-Series
Symbol encoding: a:001, b:010, c:100
Example series: (a c c c a b b)
[Figure: the encoded series compared against its shifted copies for p=1 through p=4]
Reference: “Periodicity Detection in Time Series Databases” [TKDE05]
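
One way to read the binary encoding on this slide is that each symbol gets its own bit, so two positions hold the same symbol exactly when the bitwise AND of their codes is nonzero. The sketch below uses the slide's codes and example series but a plain position-by-position comparison, which may differ from how the referenced paper evaluates all periods at once; `matches_for_period` is a hypothetical name.

```python
# Codes as given on the slide: one distinct bit per symbol.
CODES = {"a": 0b001, "b": 0b010, "c": 0b100}


def matches_for_period(seq: str, p: int) -> list[int]:
    """Positions i where seq[i] and seq[i - p] hold the same symbol.

    Assumption: equality is tested with a bitwise AND of the one-bit codes.
    """
    codes = [CODES[s] for s in seq]
    return [i for i in range(p, len(codes)) if codes[i] & codes[i - p]]


if __name__ == "__main__":
    series = "acccabb"
    print(matches_for_period(series, 1))  # [2, 3, 6]
    print(matches_for_period(series, 4))  # [4]
```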

Single Window with SPD
[Figure: a single window over the series (a c c c a b b), with shift 1 and slide 2]

Multi Windows with SPD
[Figure: SPD output over multiple windows]
–A smaller w supports real-time output.
–A larger w makes it possible to find long periods.
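
The trade-off between window widths can be seen in a toy experiment. The sketch below is not the STAGGER algorithm: it assumes each window covers the most recent w symbols, tests periods up to w/2, and reports a period when at least a `threshold` fraction of positions match the symbol p steps earlier. All names and the threshold value are assumptions made for illustration.

```python
def periods_in_window(window: str, threshold: float = 0.8) -> list[int]:
    """Periods p <= len(window) // 2 whose shifted comparison mostly matches."""
    found = []
    for p in range(1, len(window) // 2 + 1):
        pairs = len(window) - p  # number of positions that can be compared
        hits = sum(window[i] == window[i - p] for i in range(p, len(window)))
        if hits / pairs >= threshold:
            found.append(p)
    return found


def multi_window_periods(stream: str, widths: list[int],
                         threshold: float = 0.8) -> dict[int, list[int]]:
    """Run the detector on the most recent w symbols for each width w."""
    return {w: periods_in_window(stream[-w:], threshold)
            for w in widths if 0 < w <= len(stream)}


if __name__ == "__main__":
    print(multi_window_periods("abcabcabcabc", [4, 8]))  # {4: [], 8: [3]}
```

The width-4 window is ready after only four symbols but cannot confirm the period of 3, while the width-8 window can.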

Max-Subpattern Tree
Reference: “Incremental, Online, and Merge Mining of Partial Periodic Patterns in Time-Series Databases” [TKDE04]
Reference: “Efficient Mining of Partial Periodic Patterns in Time Series Database” [ICDE99]
Example series: abdeacdfabdjacdsabdxakdy
[Figure: max-subpattern tree built from the example series for a given period p]

Approximate Incremental Tech.
Streaming data => maintain the max-subpattern tree over the new data.
Ex. Q = a{b,c}d*, Q' = a{b,e}df
–The intersection of Q and Q' is abd* (equal to Q without c).
–The difference between Q' and abd* is e and f (Q' equals abd* with e and f added).
The approximation happens in the insertion step.
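
The intersection and difference in this example can be reproduced by treating a pattern as one set of symbols per position, with the empty set standing for `*`. This representation and the helper names are assumptions made for illustration; the tree maintenance itself is not shown.

```python
Pattern = list[set[str]]


def intersect(q1: Pattern, q2: Pattern) -> Pattern:
    """Position-wise intersection of two patterns of equal length."""
    return [a & b for a, b in zip(q1, q2)]


def difference(q1: Pattern, q2: Pattern) -> Pattern:
    """Symbols of q1 that are missing from q2, position by position."""
    return [a - b for a, b in zip(q1, q2)]


def show(q: Pattern) -> str:
    """Render a pattern: '*' for an empty position, braces for several symbols."""
    parts = []
    for s in q:
        if not s:
            parts.append("*")
        elif len(s) == 1:
            parts.append(next(iter(s)))
        else:
            parts.append("{" + ",".join(sorted(s)) + "}")
    return "".join(parts)


if __name__ == "__main__":
    q = [{"a"}, {"b", "c"}, {"d"}, set()]       # Q  = a{b,c}d*
    q_new = [{"a"}, {"b", "e"}, {"d"}, {"f"}]   # Q' = a{b,e}df
    common = intersect(q, q_new)
    print(show(common))                     # abd*
    print(show(difference(q_new, common)))  # *e*f
```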

Hysteresis Threshold
Without hysteresis, a pattern q loses all of its history information as soon as it becomes infrequent. When q becomes frequent again, it is treated as a newly appeared frequent pattern.
With two thresholds, a pattern is:
–Frequent, i.e. its frequency is above the higher threshold
–Infrequent, i.e. its frequency is below the lower threshold
–Patterns whose frequencies are above the lower threshold are kept in the tree.
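
The two-threshold rule can be sketched with plain dictionaries of pattern counts. This is an illustration only: the function names, the threshold values, and the choice to treat a frequency exactly on a threshold as passing it are assumptions.

```python
def keep_in_tree(counts: dict[str, int], total: int, low: float) -> dict[str, int]:
    """Drop only patterns whose frequency falls below the lower threshold."""
    return {q: c for q, c in counts.items() if c / total >= low}


def report_frequent(counts: dict[str, int], total: int, high: float) -> list[str]:
    """Report patterns whose frequency reaches the higher threshold."""
    return sorted(q for q, c in counts.items() if c / total >= high)


if __name__ == "__main__":
    counts = {"a**": 9, "*b*": 5, "**c": 1}  # occurrences out of 10 segments
    kept = keep_in_tree(counts, total=10, low=0.3)
    print(sorted(kept))                                # ['*b*', 'a**']
    print(report_frequent(kept, total=10, high=0.7))   # ['a**']
```

The pattern `*b*` sits between the two thresholds, so it is not reported as frequent but its count survives and it does not have to start from zero if it becomes frequent again.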

Conclusion
Discover potential periodicity rates in data streams.
Use an incremental tree structure to mine periodic patterns.
Use two thresholds to preserve the history of candidate frequent patterns.