Mining frequency counts from sensor set data
Loo Kin Kong, 25th June 2003


Outline
- Motivation
- Sensor set data
- Finding frequency counts of itemsets from sensor set data
- Future work

Stock quotes
Closing prices of some HK stocks (table of dates and closing prices omitted; values lost in transcription)

Stock quotes
Intra-day stock price of TVB (0511) on 23rd June 2003 (Source: quamnet.com)

Motivation
- Fluctuation of the price of a stock may be related to that of another stock or to other conditions
- Online analysis tools can help give more insight into such variations
- The case of the stock market can be generalized: we use "sensors" to monitor some conditions, for example:
  - We monitor the prices of stocks by getting quotations from a finance website
  - We monitor the weather by measuring temperature, humidity, air pressure, wind, etc.

Sensors
Properties of a sensor include:
- A sensor reports values, either spontaneously or on request, reflecting the state of the condition being monitored
- Once a sensor reports a value, the value remains valid until the sensor reports again
- The lifespan of a value is defined as the length of time during which the value is valid
- The value reported must be one of the possible states of the condition
- The set of all possible states of a sensor is its state set
(timeline figure showing reported values and their lifespans omitted)
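The lifespan of a value, as defined above, can be computed from one sensor's report stream. A minimal sketch; the representation of reports as (timestamp, value) pairs and the function name are illustrative assumptions:

```python
def lifespans(reports, end_time):
    """Total lifespan per value for one sensor's report stream.

    reports: list of (timestamp, value) pairs in time order.
    A value remains valid until the next report (or end_time),
    so its lifespan is the gap between consecutive timestamps.
    """
    totals = {}
    for (t, v), (t_next, _) in zip(reports, reports[1:] + [(end_time, None)]):
        totals[v] = totals.get(v, 0) + (t_next - t)
    return totals
```

For example, a sensor reporting "up" at time 0, "down" at time 3, and "up" again at time 5, observed until time 9, gives "up" a total lifespan of 7 and "down" a lifespan of 2.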

Sensor set data
- A set of sensors (say, n of them) is called a sensor set
- At any time, we can obtain an n-tuple composed of the values of the n sensors, attached with a timestamp: <t, v_1, v_2, ..., v_n>, where t is the time when the n-tuple is obtained and v_x is the value of the x-th sensor
- If the n sensors have the same state set, we call the sensor set homogeneous

Mining association rules from sensor set data
- An association rule is a rule of the form X → Y, where X and Y are two disjoint itemsets, satisfying certain support and confidence restrictions
- We redefine the support to reflect the time factor in sensor set data:
  supp(X) = Σ lifespan(X) / length of history
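The time-weighted support above can be sketched in Python. The interval-list representation of the history (each entry a duration plus the set of items valid during it) is an assumption for illustration, not part of the slides:

```python
def support(itemset, history):
    """Time-weighted support of an itemset.

    history: list of (duration, items) intervals, where `items` is the set
    of items valid for `duration` time units.
    supp(X) = total time during which X holds / length of history.
    """
    total = sum(d for d, _ in history)
    covered = sum(d for d, items in history if itemset <= items)
    return covered / total
```

With a 10-unit history in which {a, b} holds for 2 units, {a} for 3, and {b} for 5, supp({a}) = (2 + 3) / 10 = 0.5.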

Transformations of sensor-set data
- The n-tuples need transformation before frequent itemsets can be found
- Transformation 1: each (z_x, s_y) pair, where z_x is a sensor and s_y a state of z_x, is treated as an item in traditional association rule mining
- Hence, the i-th n-tuple is transformed into <t_i, (z_1, s_1), (z_2, s_2), ..., (z_n, s_n)>, where t_i is the timestamp of the i-th n-tuple
- Thus, association rules of the form {(z_1, s_1), (z_2, s_2), ..., (z_j, s_j)} → {(z_x, s_x)} can be obtained
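Transformation 1 can be sketched as follows; the function name and the representation of an n-tuple as a timestamp plus a list of values are assumptions for illustration:

```python
def transform1(timestamp, values):
    """Transformation 1: turn an n-tuple of sensor values into a
    transaction of (sensor_index, state) items, so each pair can be
    treated as an ordinary item in association rule mining."""
    return timestamp, {(x, v) for x, v in enumerate(values, start=1)}
```

For example, two sensors reading "high" and "low" at time 10 become the transaction {(1, "high"), (2, "low")} with timestamp 10.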

Transformations of sensor-set data
- Transformation 2: assuming a homogeneous sensor set, each state s in the state set is treated as an item in traditional association rule mining
- The i-th n-tuple is transformed into <t_i, e_1, e_2, ..., e_m>, where t_i is the timestamp of the i-th n-tuple and e_x is a boolean value showing whether the state s_x exists in the tuple
- Thus, association rules of the form {s_1, s_2, ..., s_j} → {s_k} can be obtained
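Transformation 2 can likewise be sketched in a few lines; ordering the state set alphabetically to fix the positions of the booleans is an assumption made here for concreteness:

```python
def transform2(timestamp, values, state_set):
    """Transformation 2 (homogeneous sensor set): encode the n-tuple as
    one boolean per state, True iff some sensor currently reports it."""
    present = set(values)
    return timestamp, [s in present for s in sorted(state_set)]
```

For a state set {"down", "flat", "up"} and three sensors reading "up", "up", "down" at time 5, the transaction is [True, False, True] (down present, flat absent, up present).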

The Lossy Counting (LC) Algorithm for items
- The user specifies the support threshold s and error tolerance ε
- Transactions of single items are conceptually kept in buckets of size ⌈1/ε⌉
- At the end of each bucket, counts smaller than the error tolerance are discarded
- Counts of items are kept in a data structure D in the form (e, f, Δ), where:
  - e is the item
  - f is the frequency of e since the entry was inserted into D
  - Δ is the maximum possible count of e before the entry was added to D

The Lossy Counting (LC) Algorithm for items
1. D ← ∅; N ← 0
2. w ← ⌈1/ε⌉; b ← 1
3. e ← next transaction; N ← N + 1
4. if (e, f, Δ) exists in D do
5.   f ← f + 1
6. else do
7.   insert (e, 1, b − 1) into D
8. endif
9. if N mod w = 0 do
10.  prune(D, b); b ← b + 1
11. endif
12. goto 3
where D is the set of all counts, N the current length of the stream, e a transaction (single item), w the bucket width, and b the current bucket id.

The Lossy Counting (LC) Algorithm for items
1. function prune(D, b)
2. for each entry (e, f, Δ) in D do
3.   if f + Δ ≤ b do
4.     remove the entry from D
5.   endif
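Putting the two pieces of pseudocode together, here is a minimal Python sketch of Lossy Counting for single items (the class and method names are my own; the logic follows the pseudocode above):

```python
from math import ceil

class LossyCounting:
    """Sketch of the Lossy Counting algorithm for single items."""

    def __init__(self, epsilon):
        self.eps = epsilon
        self.w = ceil(1 / epsilon)   # bucket width
        self.b = 1                   # current bucket id
        self.n = 0                   # current length of the stream
        self.d = {}                  # item e -> (f, delta)

    def add(self, e):
        self.n += 1
        if e in self.d:
            f, delta = self.d[e]
            self.d[e] = (f + 1, delta)
        else:
            self.d[e] = (1, self.b - 1)
        if self.n % self.w == 0:     # end of bucket: discard low counts
            self.d = {e: (f, delta) for e, (f, delta) in self.d.items()
                      if f + delta > self.b}
            self.b += 1

    def frequent(self, s):
        """Items whose estimated count meets the (s - eps) * n threshold."""
        return [e for e, (f, _) in self.d.items()
                if f >= (s - self.eps) * self.n]
```

On the stream a, a, b, a with ε = 0.5 (bucket width 2), the entry for b is pruned at the second bucket boundary while a survives with f = 3.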

The Lossy Counting (LC) Algorithm for itemsets
- Transactions are kept in buckets
- Multiple (say m) buckets are processed at a time; the value m depends on the amount of memory available
- For each transaction E, essentially, every subset of E is enumerated and treated as if it were an item in the LC algorithm for items

Extending the LC Algorithm for sensor-set data
We can extend the LC Algorithm to find approximate frequency counts of itemsets for SSD:
- Instead of using a fixed-size bucket, whose size is determined by ε, we use a bucket which can hold an arbitrary number of transactions
- During the i-th bucket, when a count is inserted into D, we set Δ = εT_{1,i−1}, where T_{i,j} denotes the total time elapsed from bucket i up to bucket j
- At the end of the i-th bucket, we prune D by removing the counts such that f + Δ ≤ εT_{1,i}

Extending the LC Algorithm for sensor-set data
1. D ← ∅; N ← 0
2. w ← (user-defined value); b ← 1
3. E ← next transaction; N ← N + 1
4. foreach subset e of E
5.   if (e, f, Δ) exists in D do
6.     f ← f + 1
7.   else do
8.     insert (e, 1, εT_{1,b−1}) into D
9.   endif
10. if N mod w = 0 do
11.   prune(D, εT_{1,b}); b ← b + 1
12. endif
13. goto 3
where D is the set of all counts, N the current length of the stream, E a transaction (itemset), w the bucket width, and b the current bucket id.
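A sketch of this extension, following the pseudocode above literally (f counts occurrences while Δ and the pruning threshold scale with the elapsed time). Passing each transaction's lifespan as a duration argument is my assumption about how the elapsed time T is obtained; class and attribute names are illustrative:

```python
from itertools import combinations

class TimeLossyCounting:
    """Sketch of the time-based LC extension for itemsets."""

    def __init__(self, epsilon, w):
        self.eps = epsilon
        self.w = w           # user-chosen bucket width (transactions per bucket)
        self.n = 0           # current length of the stream
        self.t_prev = 0.0    # elapsed time up to the end of the previous bucket
        self.t = 0.0         # total elapsed time so far
        self.d = {}          # frozenset itemset -> (f, delta)

    def add(self, items, duration):
        self.n += 1
        self.t += duration
        for size in range(1, len(items) + 1):   # every non-empty subset of E
            for sub in combinations(sorted(items), size):
                e = frozenset(sub)
                if e in self.d:
                    f, delta = self.d[e]
                    self.d[e] = (f + 1, delta)
                else:
                    self.d[e] = (1, self.eps * self.t_prev)
        if self.n % self.w == 0:                # end of bucket: prune
            self.d = {e: (f, delta) for e, (f, delta) in self.d.items()
                      if f + delta > self.eps * self.t}
            self.t_prev = self.t
```

For instance, with ε = 0.5 and w = 2, after the transactions {a, b} (lifespan 1.0) and {a} (lifespan 1.0), the subsets {b} and {a, b} are pruned at the bucket boundary while {a} survives.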

Observations
- The choice of w can affect the efficiency of the algorithm:
  - A small w may cause the pruning procedure to be invoked too frequently
  - A large w may cause too many transactions to be kept in memory
- It may be possible to derive a good w with respect to the mean lifespan of the transactions: if the lifespans of the transactions are short, we potentially need to prune D frequently
- The difference between adjacent transactions may be small

Future work
- Evaluate the efficiency of the LC Algorithm for sensor-set data
- Investigate how to exploit the observation that adjacent transactions may be very similar

Q & A