A Test Paradigm for Detecting Changes in Transactional Data Streams. Willie Ng and Manoranjan Dash. DASFAA 2008.

Outline Introduction; Algorithm for Change Detection; Statistical Test; Experimental Evaluation; Related Work; Conclusion

Introduction A pattern is considered useful if it helps a person achieve their goal. Unfortunately, traditional association rule mining (ARM) algorithms only consider whether an item is absent or present in a transaction. Utility mining: utility measures how valuable an itemset is. Discoverer & verifier

Problem Statement

Preliminaries We denote by AHI the set of all high utility itemsets.

Two complementary hypotheses. The null hypothesis, H0 – no detectable change. The alternative hypothesis, H1 – there is a detectable change.

Hoeffding Bound The Hoeffding bound can be used to compute a sample size n such that a given statistic on the sample is no more than ε away from the same statistic on the entire database, where ε is the tolerated error.
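As a rough illustration of how such a sample size can be derived, the sketch below uses the standard two-sided Hoeffding inequality for the mean of bounded values. The function name, the failure probability `delta`, and the `value_range` parameter are assumptions introduced here; the slides give only ε and do not state which form of the bound the authors use.

```python
import math


def hoeffding_sample_size(epsilon, delta, value_range=1.0):
    """Smallest n with P(|sample mean - true mean| >= epsilon) <= delta.

    Assumes i.i.d. observations bounded in an interval of width value_range
    (hypothetical parameter, not taken from the slides) and the two-sided
    bound 2 * exp(-2 * n * epsilon**2 / value_range**2) <= delta.
    """
    if epsilon <= 0 or value_range <= 0 or not 0 < delta < 1:
        raise ValueError("need epsilon > 0, value_range > 0 and 0 < delta < 1")
    n = (value_range ** 2) * math.log(2.0 / delta) / (2.0 * epsilon ** 2)
    return math.ceil(n)


if __name__ == "__main__":
    # Tolerated error 0.05 with 95% confidence on values scaled to [0, 1].
    print(hoeffding_sample_size(epsilon=0.05, delta=0.05))
```

Halving ε roughly quadruples the required sample size, since n grows with 1/ε².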

Statistical Test Sample 1: Item A - Utility 30, Item B - Utility 70. Sample 2: Item A - Utility 70, Item B - Utility 30. Paired t-test – mean difference. Nonparametric tests – sign test and Wilcoxon signed-rank test. Chi-square test – the p-value is computed to be approximately 0 with (r-1)*(c-1) = 1 degree of freedom.
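One way to read the two-sample example is that tests built on paired differences see nothing here, because the changes in items A and B cancel. The sketch below only computes the paired differences and their signs for the utilities listed on the slide; the helper name `paired_differences` is hypothetical, and this interpretation is an assumption rather than something the slides state.

```python
from statistics import mean

# Utilities from the slide's example.
sample_1 = {"A": 30, "B": 70}
sample_2 = {"A": 70, "B": 30}


def paired_differences(before, after):
    """Per-item difference (after - before), items taken in sorted order."""
    return [after[item] - before[item] for item in sorted(before)]


diffs = paired_differences(sample_1, sample_2)
mean_diff = mean(diffs)                # numerator of the paired t statistic
positive = sum(d > 0 for d in diffs)   # counts used by the sign test
negative = sum(d < 0 for d in diffs)

if __name__ == "__main__":
    print(diffs, mean_diff, positive, negative)
```

The mean difference is zero and the signs split evenly, so a test driven by the mean or by the signs of the differences has no evidence of change even though both utilities moved by 40.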

Chi-square Test $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$, where $O_i$ is the observed count and $E_i$ is the expected count.
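To make the statistic concrete, the sketch below applies Pearson's chi-square to the two-sample example, assuming the samples form the rows and the items the columns of a 2×2 contingency table (an assumption; the slides do not show the table layout). Each cell contributes (observed − expected)² / expected, with the expected count taken as row total × column total / grand total. The function names are hypothetical, and the p-value helper is valid only for one degree of freedom.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_one_dof(stat):
    """Upper-tail p-value of a chi-square variable with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    # Rows: Sample 1 and Sample 2; columns: Item A and Item B.
    stat, dof = chi_square_statistic([[30, 70], [70, 30]])
    print(stat, dof, p_value_one_dof(stat))
```

Every expected count is 50 in this table, so the statistic is 32 with one degree of freedom and the p-value is on the order of 1e-8, which is zero at any usual reporting precision.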

Experimental Evaluation --Test for False Alarm

Experimental Evaluation --Test for Changes

Experimental Evaluation --Test for Sensitivity

Related Work There are three types of data stream mining models: landmark, sliding window, and damped.
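The three models differ in how much weight a past transaction receives. The sketch below expresses each as a weight function over arrival time, using the usual textbook definitions; the function names and the `landmark`, `window_size`, and `decay` parameters are illustrative assumptions and are not taken from the slides.

```python
def landmark_weight(t, landmark):
    """Landmark model: everything from the landmark time onward counts fully."""
    return 1.0 if t >= landmark else 0.0


def sliding_window_weight(t, now, window_size):
    """Sliding-window model: only the most recent window_size time steps count."""
    return 1.0 if 0 <= now - t < window_size else 0.0


def damped_weight(t, now, decay):
    """Damped model: weight shrinks geometrically with age, 0 < decay <= 1."""
    return decay ** (now - t)


if __name__ == "__main__":
    now = 10
    for t in (2, 8, 10):
        print(t,
              landmark_weight(t, landmark=5),
              sliding_window_weight(t, now, window_size=3),
              damped_weight(t, now, decay=0.5))
```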

Conclusions A change detector, ACD, incorporates a statistical tool and is used to detect significant changes in a data stream. It is not good enough for streams in binary form. Outlier