Statistical Significance Test


Statistical Significance Test

Why Statistical Significance Test
Suppose we have developed an EC algorithm A and want to compare it with another EC algorithm B. Both algorithms are stochastic, so how can we be sure that A is better than B?
Assume we run A and B once and get the results x and y, respectively. If x < y (minimisation), is it because A is better than B, or just because of randomness?

Why Statistical Significance Test
Treat a stochastic algorithm as a random number generator: its output follows some distribution that depends on the algorithm, while each individual output depends on the random seed.
Collect samples: run the algorithms many times independently (using different random seeds).
Carry out statistical significance tests based on the collected samples.
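
The sampling procedure above can be sketched as follows. This is a minimal illustration, not a real EC run: run_algorithm_a and run_algorithm_b are hypothetical stand-ins that simply draw from assumed result distributions.

```python
import numpy as np

def run_algorithm_a(seed):
    # Hypothetical stand-in for one independent run of algorithm A;
    # a real EC run would return, e.g., the best fitness found.
    rng = np.random.default_rng(seed)
    return rng.normal(loc=10.0, scale=2.0)

def run_algorithm_b(seed):
    # Hypothetical stand-in for one independent run of algorithm B.
    rng = np.random.default_rng(seed)
    return rng.normal(loc=11.0, scale=2.0)

# Run each algorithm 30 times independently, each run with its own seed,
# so the collected results are independent samples from each distribution.
samples_a = [run_algorithm_a(seed) for seed in range(30)]
samples_b = [run_algorithm_b(seed) for seed in range(30, 60)]
```

The two lists of samples are then the input to the significance tests described on the following slides.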

Statistical Significance Test
Parametric/Non-parametric: assume/do not assume that the random variables follow a normal distribution.

                   Unpaired              Paired
  Parametric       t-test / z-test       Paired t-test
  Non-parametric   Wilcoxon rank sum     Wilcoxon signed rank

One-sample z-test
The z-test is used when n ≥ 30. It tests the population mean using:
The sample mean
The sample standard deviation (as an estimate of σ)
The number of samples n
Reject the null hypothesis when z < -2 or z > 2, where z = (sample mean − μ0) / (σ/√n).

One-sample z-test
(Null) hypothesis: H0: μ = μ0.
Reject the hypothesis if the samples do not support it statistically (z < -2 or z > 2 under a significance level of 0.05. Note: the exact critical value is 1.96 at the 0.05 significance level; we use 2 as a rough value.)
P-value: 2P(Z ≥ |z|) for two-tailed, P(Z ≤ z) for lower-tailed, P(Z ≥ z) for upper-tailed.
Reject the hypothesis if p-value < significance level.
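
The z-test steps above can be sketched in Python using only the standard library (the function name is ours; the formulas are the standard ones):

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample, mu0, sigma):
    """Two-tailed one-sample z-test of H0: mu = mu0, given sigma."""
    n = len(sample)
    sample_mean = sum(sample) / n
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Two-tailed p-value: 2 * P(Z >= |z|) under the standard normal
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Example: 36 observations with mean 10.5, testing H0: mu = 10 with sigma = 3,
# giving z = (10.5 - 10) / (3 / 6) = 1.0 and p = 2 * (1 - Phi(1)) ~ 0.317
z, p = one_sample_z_test([10.5] * 36, mu0=10.0, sigma=3.0)
```

Since |z| < 2 (equivalently p > 0.05), this example fails to reject the null hypothesis.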

One-sample t-test
It is used when n < 30, assuming the population follows a normal distribution.
Almost the same as the one-sample z-test, except that the standardised sample mean does not follow a normal distribution but a t-distribution, whose shape depends on n through the degrees of freedom.
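
A sketch with scipy (assumed available); the manual computation shows that the t statistic is the z formula with the sample standard deviation s in place of σ:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=1.0, size=20)   # small sample, n < 30

# One-sample t-test of H0: mu = 5.0
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)

# Same statistic by hand: the z formula with s estimating sigma,
# but compared against a t-distribution with n - 1 = 19 df
s = sample.std(ddof=1)
t_manual = (sample.mean() - 5.0) / (s / np.sqrt(len(sample)))
```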

Two-sample t-test
(Null) hypothesis: H0: μ1 = μ2. Reject the hypothesis if the samples do not support it statistically.
Unpaired: compare the two independent samples directly.
Paired: calculate the differences di = xi − yi, then use the one-sample t-test on the differences with null hypothesis H0: μd = 0.
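
A sketch of both variants with scipy (assumed available). The last two lines verify the point above: the paired t-test is exactly a one-sample t-test on the differences.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(10.0, 2.0, size=30)
y = rng.normal(11.0, 2.0, size=30)

# Unpaired: treats x and y as independent samples
t_unpaired, p_unpaired = stats.ttest_ind(x, y)

# Paired: tests H0: the mean of d = x - y is zero ...
t_paired, p_paired = stats.ttest_rel(x, y)

# ... which is exactly a one-sample t-test on the differences
t_diff, p_diff = stats.ttest_1samp(x - y, popmean=0.0)
```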

Unpaired vs Paired
y1 = x*x − x + N(0, 0.1)   (1)
Step 1: generate 30 random x values for y1 from the normal distribution N(0, 1)
Step 2: obtain 30 y1 values using the 30 x values and Eq. (1)
Step 3: generate 30 random x values for y2 from the normal distribution N(0, 1)
Step 4: obtain 30 y2 values using the 30 x values and Eq. (2)

Unpaired vs Paired
[Plot comparing the two samples.] Which one is smaller, red or green? P-value = 0.56

Unpaired vs Paired
y1 = x*x − x + N(0, 0.1)   (1)
Step 1: generate 30 random x values for both y1 and y2 from the normal distribution N(0, 1)
Step 2: obtain 30 y1 and 30 y2 values using the same 30 x values and Eqs. (1) and (2)

Unpaired vs Paired
[Plot comparing the two samples.] Which one is smaller, red or green? P-value = 0.00

Unpaired vs Paired
If we can eliminate the effect of all the other factors, then paired tests give us stronger conclusions.
Example: for the compared algorithms, use the same random seed to generate the same initial population. Then at least the results will not be affected by the initial population.
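
The paired-vs-unpaired simulation can be reproduced along these lines. Note that Eq. (2) for y2 does not appear in this transcript, so the y2 formula below is a hypothetical substitute (y1 shifted by a small constant) chosen only to illustrate the effect; the exact p-values will differ from the slides.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Paired design: y1 and y2 are computed from the SAME 30 x values
x = rng.normal(0.0, 1.0, size=30)
y1 = x * x - x + rng.normal(0.0, 0.1, size=30)        # Eq. (1)
y2 = x * x - x + 0.3 + rng.normal(0.0, 0.1, size=30)  # hypothetical stand-in for Eq. (2)

# Unpaired test: the large variance of x*x - x hides the constant 0.3 shift
_, p_unpaired = stats.ttest_ind(y1, y2)

# Paired test: differencing removes the shared x*x - x term entirely,
# so the shift stands out against only the small N(0, 0.1) noise
_, p_paired = stats.ttest_rel(y1, y2)
```

The paired p-value comes out far smaller than the unpaired one, mirroring the 0.56 vs 0.00 contrast on the slides.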

Wilcoxon Rank Sum Test
Does not require the distribution to be normal (non-parametric) or a large sample. Unpaired.
(Null) hypothesis: the two medians are equal.
U-statistic for each variable: the number of wins out of all pairwise contests (count 0.5 for each tie).
Check the table for the p-value; reject the hypothesis if p-value < significance level.
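
A sketch with scipy's mannwhitneyu (the Mann-Whitney U test is the usual implementation of the rank sum test); the manual loop checks the "pairwise wins" description of the U statistic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.exponential(scale=1.0, size=25)  # skewed, clearly non-normal samples
y = rng.exponential(scale=2.0, size=25)

# Rank sum (Mann-Whitney U) test
u_stat, p_value = stats.mannwhitneyu(x, y, alternative='two-sided')

# The U statistic for x is its number of pairwise "wins" over y
# (counting 0.5 for each tie), out of all len(x) * len(y) contests
u_manual = sum(0.5 if xi == yj else float(xi > yj)
               for xi in x for yj in y)
```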

Wilcoxon Signed Rank Test
Non-parametric, paired. Steps:
1. Calculate the sign and absolute value of each difference di = xi − yi
2. Exclude the pairs with di = 0
3. Sort the remaining pairs in increasing order of |di|
4. Get the rank Ri of each pair from the sorted order
5. Calculate the statistic W from the ranks of the positive and negative differences
6. Reject the hypothesis if W is more extreme than the critical value (check the table)
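
The steps above can be sketched manually and checked against scipy's implementation. Note that for the two-sided test scipy reports the smaller of the positive-rank and negative-rank sums as the statistic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(0.5, 1.0, size=20)
y = rng.normal(0.0, 1.0, size=20)

# Steps 1-2: signed differences, dropping exact zeros
d = x - y
d = d[d != 0]

# Steps 3-4: rank the differences by absolute value (rank 1 = smallest |d|)
order = np.argsort(np.abs(d))
ranks = np.empty(len(d))
ranks[order] = np.arange(1, len(d) + 1)

# Step 5: rank sums of the positive and negative differences
w_plus = ranks[d > 0].sum()
w_minus = ranks[d < 0].sum()

# scipy's two-sided signed rank test reports min(W+, W-) as the statistic
w_stat, p_value = stats.wilcoxon(x, y)
```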

Wilcoxon Signed Rank Test
[Worked example: the test fails to reject the null hypothesis.]

Using Statistical Significance Tests
R:
t.test(y1, y2, paired=TRUE/FALSE)
wilcox.test(y1, y2, paired=TRUE/FALSE)
Matlab:
[h,p] = ttest(x,y)
[p,h] = ranksum(x,y)
[p,h] = signrank(x,y)
Java: Apache Commons Math Library

Compare p population means
(Null) hypothesis: μ1 = μ2 = … = μp
Method: One-Way ANOVA (Analysis of Variance), used to compare the performance of p algorithms.
Model assumptions:
Data from the ith algorithm are assumed to come from a normal distribution
Common variance across algorithms
Data are independent

ANOVA
Test statistic: F = MSR / MSE, where F ~ F(p−1, n−p) under the null hypothesis.
Demonstrate in R.
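
Before the R demonstration, the same F statistic can be computed with scipy's f_oneway, using the Time/Method data from the R example below:

```python
from scipy import stats

# The Time values from the R example, grouped by Method (A, B, C)
a = [5, 4, 5, 4, 7, 7, 4, 6, 6]
b = [4, 2, 3, 6, 7, 7, 6, 6, 4]
c = [3, 3, 6, 0, 1, 0, 2, 1, 0]

# One-way ANOVA: F = MSR / MSE with (p-1, n-p) = (2, 24) degrees of freedom.
# This should reproduce the R anova() output: F ~ 11.974, p ~ 0.000247
f_stat, p_value = stats.f_oneway(a, b, c)
```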

R code
test.data <- data.frame(
  Time = c(5, 4, 3, 4, 2, 3, 5, 3, 6,
           4, 6, 0, 7, 7, 1, 7, 7, 0,
           4, 6, 2, 6, 6, 1, 6, 4, 0),
  Method = factor(rep(c("A", "B", "C"), 9)))
test.data
tapply(test.data$Time, test.data$Method, mean)
m1 <- lm(Time ~ Method, data=test.data)
summary(m1)
anova(m1)

R code - continued
> tapply(test.data$Time, test.data$Method, mean)
       A        B        C
5.333333 5.000000 1.777778
> anova(m1)
Analysis of Variance Table

Response: Time
          Df Sum Sq Mean Sq F value    Pr(>F)
Method     2 69.407  34.704  11.974 0.0002473 ***
Residuals 24 69.556   2.898
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Pr(>F) is the p-value.)

ANOVA Results
If p-value < α (e.g., 0.05), reject the null hypothesis; otherwise, do not reject it.
What does it mean when the null hypothesis is rejected? Not all algorithms have the same mean; that is, at least one mean is different.

Multiple Comparisons
Tukey Test: suppose there are p = 3 algorithms. Test
H0: μ1 = μ2 versus H1: μ1 ≠ μ2
H0: μ1 = μ3 versus H1: μ1 ≠ μ3
H0: μ2 = μ3 versus H1: μ2 ≠ μ3
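
The same pairwise comparisons can be run with scipy's tukey_hsd (available in SciPy 1.8+), reproducing the adjusted p-values shown in the R TukeyHSD output below:

```python
from scipy import stats

# Same Time data as the ANOVA example, grouped by Method
a = [5, 4, 5, 4, 7, 7, 4, 6, 6]
b = [4, 2, 3, 6, 7, 7, 6, 6, 4]
c = [3, 3, 6, 0, 1, 0, 2, 1, 0]

# Tukey's HSD test of all pairwise mean differences
res = stats.tukey_hsd(a, b, c)

# res.pvalue[i][j] is the adjusted p-value for group i vs group j
p_ab = res.pvalue[0][1]   # A vs B
p_ac = res.pvalue[0][2]   # A vs C
p_bc = res.pvalue[1][2]   # B vs C
```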

R code
m1.anova <- aov(Time ~ Method, data=test.data)
Mult.test <- TukeyHSD(m1.anova, conf.level=0.95)
Mult.test
library(gplots)
attach(test.data)
jpeg("graph.jpeg")
plotmeans(Time ~ Method, xlab="Method",
          ylab="Time", main="Mean Plot\nwith 95% CI")

R code - continued
> Mult.test
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Time ~ Method, data = test.data)

$Method
          diff       lwr       upr     p adj
B-A -0.3333333 -2.337448  1.670781 0.9096407
C-A -3.5555556 -5.559670 -1.551441 0.0005013
C-B -3.2222222 -5.226337 -1.218108 0.0014204

(The "p adj" column gives the adjusted p-value for each pairwise comparison: B vs A, C vs A, C vs B.)

R code - continued
> library(gplots)
> attach(test.data)
> jpeg("graph.jpeg")
> plotmeans(Time ~ Method, xlab="Method",
+           ylab="Time", main="Mean Plot\nwith 95% CI")