A Test Paradigm for Detecting Changes in Transactional Data Streams Willie Ng and Manoranjan Dash DASFAA 2008
Outline Introduction Algorithm for Change Detection Statistical Test Experimental Evaluation Related work Conclusion
Introduction A pattern is considered useful if it helps a person achieve his goal. Unfortunately, traditional association rule mining (ARM) algorithms consider only whether an item is absent or present in a transaction. Utility mining: utility measures how valuable an itemset is. Discoverer & verifier
Problem Statement
Preliminaries We denote by AHI the set of all high utility itemsets.
Two complementary hypotheses The null hypothesis, H0 – no detectable change The alternative hypothesis, H1 – there is a detectable change
Hoeffding Bound Can be used to compute a sample size n such that a given statistic on the sample is no more than ε away from the same statistic on the entire database, where ε is a tolerated error.
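The slide does not give the formula, so the following is a minimal sketch of how such a sample size could be computed. It assumes the standard two-sided Hoeffding inequality for the mean of bounded observations, P(|sample mean - true mean| >= ε) <= 2 exp(-2nε²/R²). The function name and the parameters `delta` (allowed failure probability) and `value_range` (R, the width of the interval the values fall in) are illustrative and do not come from the original.

```python
import math


def hoeffding_sample_size(epsilon, delta, value_range=1.0):
    """Smallest n such that the sample mean is within epsilon of the
    true mean with probability at least 1 - delta (two-sided Hoeffding).

    Assumption: observations are independent and bounded in an interval
    of width value_range.
    """
    if epsilon <= 0 or value_range <= 0 or not (0 < delta < 1):
        raise ValueError("need epsilon > 0, value_range > 0, 0 < delta < 1")
    # Solve 2 * exp(-2 * n * epsilon^2 / R^2) <= delta for n.
    n = (value_range ** 2) * math.log(2.0 / delta) / (2.0 * epsilon ** 2)
    return math.ceil(n)


# Example: tolerate an error of 0.05 with 95% confidence on values in [0, 1].
n_required = hoeffding_sample_size(epsilon=0.05, delta=0.05)
```

Halving ε roughly quadruples the required sample size, while tightening δ only adds a logarithmic cost.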
Statistical Test Sample 1: Item A - Utility 30, Item B - Utility 70 Sample 2: Item A - Utility 70, Item B - Utility 30 Paired t-test – mean difference Nonparametric tests – sign test and Wilcoxon signed-rank test Chi-square test – the p-value is computed to be approximately 0, with (r-1)*(c-1) = 1 degree of freedom
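One way to read the two-sample example on this slide, offered as an interpretation rather than the authors' stated argument: the per-item differences cancel out, so tests built on the mean or the sign of the paired differences have nothing to detect. The sketch below only computes those differences from the slide's numbers; the variable names are illustrative.

```python
# Utilities per item, taken from the slide.
sample_1 = {"A": 30, "B": 70}
sample_2 = {"A": 70, "B": 30}

# Paired differences, one per item (Sample 2 minus Sample 1).
differences = [sample_2[item] - sample_1[item] for item in sorted(sample_1)]

# Quantity examined by the paired t-test.
mean_difference = sum(differences) / len(differences)

# Quantities examined by the sign test.
positive = sum(1 for d in differences if d > 0)
negative = sum(1 for d in differences if d < 0)
```

The differences are +40 and -40, so the mean difference is zero and the signs are evenly split, even though the utility has clearly shifted from item B to item A.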
Chi-square Test $\chi^2 = \sum_i (O_i - E_i)^2 / E_i$, where $O_i$ is the observed count and $E_i$ is the expected count
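As a rough sketch, the statistic can be computed for the two-sample example (utilities 30/70 versus 70/30) treated as a 2x2 contingency table. It assumes the usual Pearson form, with expected counts derived from the row and column totals. The helper names are illustrative, and the p-value helper is valid only for one degree of freedom, where the chi-square tail probability equals erfc(sqrt(x/2)).

```python
import math


def chi_square_statistic(observed):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count under independence of rows and columns.
            e = row_totals[i] * col_totals[j] / total
            stat += (o - e) ** 2 / e
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, dof


def p_value_one_dof(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Rows are samples, columns are items A and B.
table = [[30, 70], [70, 30]]
stat, dof = chi_square_statistic(table)
p_value = p_value_one_dof(stat)
```

Every expected count here is 50, so each cell contributes 8 to the statistic. With one degree of freedom the resulting p-value is vanishingly small, which is consistent with the slide reporting it as 0.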
Experimental Evaluation --Test for False Alarm
Experimental Evaluation --Test for Changes
Experimental Evaluation --Test for Sensitivity
Related Work For data stream mining, there are three types of data stream mining models: – landmark, – sliding windows – and damped.
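To make the three models concrete, here is a minimal sketch using a running sum as the summary statistic. It assumes the usual meanings of the terms: a landmark window keeps everything since a fixed starting point, a sliding window keeps only the most recent items, and a damped model down-weights older items by a decay factor. The function names and parameters are illustrative, not from the original.

```python
from collections import deque


def landmark_sum(stream, landmark=0):
    """Sum of every item from the landmark position up to now."""
    return sum(stream[landmark:])


def sliding_window_sum(stream, width):
    """Sum of only the most recent `width` items."""
    window = deque(maxlen=width)  # older items fall out automatically
    for x in stream:
        window.append(x)
    return sum(window)


def damped_sum(stream, decay):
    """Sum in which each item's weight shrinks by `decay` per later arrival."""
    total = 0.0
    for x in stream:
        total = total * decay + x
    return total


stream = [1, 2, 3, 4]
landmark_result = landmark_sum(stream, landmark=1)   # items 2, 3, 4
sliding_result = sliding_window_sum(stream, width=2)  # items 3, 4
damped_result = damped_sum(stream, decay=0.5)
```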
Conclusions A change detector, ACD, incorporates a statistical tool and is used to detect significant changes in a data stream. It is not well suited to streams in binary form. Outlier