Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 April 18, 2012

Today’s Class Post-hoc Adjustments

The Problem If you run 20 statistical tests at p<0.05, you can expect about one of them to come out statistically significant purely by chance. If you report that effect in isolation, as if it were a real finding, you add junk to the open literature.

The Problem To illustrate this, let’s run a simulation a few times, and do a probability estimation: spurious-effect-v1.xlsx
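
The workbook spurious-effect-v1.xlsx is not reproduced here. As a rough stand-in, a minimal Python sketch of the same idea, on made-up noise data: run 20 tests where no real effect exists and see how often at least one comes out "significant."

```python
# Rough stand-in for the spurious-effect simulation: generate pure-noise data,
# run 20 independent two-sample t-tests on it, and count how often at least one
# comes out "significant" at p < 0.05. Names and values are hypothetical; this
# is not the original spreadsheet.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_simulations = 2000
n_tests = 20
at_least_one_hit = 0

for _ in range(n_simulations):
    p_values = []
    for _ in range(n_tests):
        # Two groups drawn from the same distribution: any "effect" is spurious
        group_a = rng.normal(0, 1, 30)
        group_b = rng.normal(0, 1, 30)
        _, p = stats.ttest_ind(group_a, group_b)
        p_values.append(p)
    if min(p_values) < 0.05:
        at_least_one_hit += 1

# For 20 independent tests this lands near 1 - 0.95**20, i.e. roughly 64%
print(at_least_one_hit / n_simulations)
```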

The Problem This comes from a paradigm built around conducting a single statistical significance test. How many papers have just one statistical significance test? How big is the risk if you run two tests, or eight tests? – Back to the simulation!

The Solution Adjust for the probability that your results are due to chance, using a post-hoc control

Two paradigms FWER – Familywise Error Rate – Control for the probability that any of your tests are falsely claimed to be significant (Type I Error) FDR – False Discovery Rate – Control for the overall rate of false discoveries

Bonferroni Correction

Ironically, derived by Miller rather than Bonferroni

Bonferroni Correction Ironically, derived by Miller rather than Bonferroni Also ironically, there appear to be no pictures of Miller on the internet

Bonferroni Correction A classic example of Stigler’s Law of Eponymy – “No scientific discovery is named after its original discoverer”

Bonferroni Correction A classic example of Stigler’s Law of Eponymy – “No scientific discovery is named after its original discoverer” – Stigler’s Law of Eponymy proposed by Robert Merton

Bonferroni Correction If you are conducting n different statistical tests on the same data set Adjust your significance criterion α to be – α / n E.g. For 4 statistical tests, use a statistical significance criterion of 0.0125 rather than 0.05

Bonferroni Correction Sometimes instead expressed by multiplying p * n, and keeping statistical significance criterion α = 0.05 Mathematically equivalent… As long as you don’t try to treat p like a probability afterwards… or meta-analyze it… etc., etc. For one thing, can produce p values over 1, which doesn’t really make sense
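
As a quick illustration, a sketch with made-up p values showing that the two forms give the same decisions; the multiplied values are capped at 1 precisely because they can otherwise exceed it.

```python
# Two equivalent ways of applying Bonferroni, sketched for n = 4 tests.
p_values = [0.04, 0.12, 0.18, 0.33]   # hypothetical results
alpha, n = 0.05, len(p_values)

# Form 1: shrink the significance criterion to alpha / n
significant_form1 = [p < alpha / n for p in p_values]

# Form 2: inflate each p by n (0.33 * 4 = 1.32, so cap at 1)
adjusted = [min(p * n, 1.0) for p in p_values]
significant_form2 = [p_adj < alpha for p_adj in adjusted]

assert significant_form1 == significant_form2  # same decisions either way
```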

Bonferroni Correction: Example Five tests – p=0.04, p=0.12, p=0.18, p=0.33, p=0.55 Five corrections – All p compared to α = 0.01 – None significant anymore – p=0.04 seen as being due to chance

Bonferroni Correction: Example Five tests – p=0.04, p=0.12, p=0.18, p=0.33, p=0.55 Five corrections – All p compared to α = 0.01 – None significant anymore – p=0.04 seen as being due to chance – Does this seem right?

Bonferroni Correction: Example Five tests – p=0.001, p=0.011, p=0.02, p=0.03, p=0.04 Five corrections – All p compared to α = 0.01 – Only p=0.001 still significant

Bonferroni Correction: Example Five tests – p=0.001, p=0.011, p=0.02, p=0.03, p=0.04 Five corrections – All p compared to α = 0.01 – Only p=0.001 still significant – Does this seem right?

Bonferroni Correction Advantages Disadvantages

Bonferroni Correction Advantages You can be “certain” that an effect is real if it makes it through this correction Does not assume tests are independent (in the same data set, they probably aren’t!) Disadvantages Massively over-conservative Essentially throws out every effect if you run a lot of tests

Often attacked these days Moran, M.D. Arguments for rejecting the sequential Bonferroni in ecological studies. Oikos. Narum, S.R. Beyond Bonferroni: less conservative analyses for conservation genetics. Conservation Genetics, 2006. Perneger, T.V. What's wrong with Bonferroni adjustments. BMJ. Morgan, J.F. p Value fetishism and use of the Bonferroni adjustment. Evidence Based Mental Health, 2007.

Holm Correction Also called Holm-Bonferroni Correction And the Simple Sequentially Rejective Multiple Test Procedure And Holm’s Step-Down And the Sequential Bonferroni Procedure

Holm Correction Order your n tests from most significant (lowest p) to least significant (highest p) Test your first test according to significance criterion α / n Test your second test according to significance criterion α / (n-1) Test your third test according to significance criterion α / (n-2) Quit as soon as a test is not significant
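
A minimal Python sketch of this step-down procedure; the p values are the worked example from the next slides.

```python
# Holm step-down procedure as described above: compare the sorted p values
# against alpha/n, alpha/(n-1), alpha/(n-2), ... and stop at the first failure.
def holm(p_values, alpha=0.05):
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # most to least significant
    significant = [False] * n
    for rank, i in enumerate(order):
        if p_values[i] < alpha / (n - rank):   # alpha/n, then alpha/(n-1), ...
            significant[i] = True
        else:
            break                              # quit at the first non-significant test
    return significant

print(holm([0.001, 0.011, 0.02, 0.03, 0.04]))
# -> [True, True, False, False, False]
```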

Holm Correction: Example Five tests – p=0.001, p=0.011, p=0.02, p=0.03, p=0.04

Holm Correction: Example Five tests – p=0.001, p=0.011, p=0.02, p=0.03, p=0.04 First correction – p = 0.001 compared to α = 0.01 – Still significant!

Holm Correction: Example Five tests – p=0.001, p=0.011, p=0.02, p=0.03, p=0.04 Second correction – p = 0.011 compared to α = 0.0125 – Still significant!

Holm Correction: Example Five tests – p=0.001, p=0.011, p=0.02, p=0.03, p=0.04 Third correction – p = 0.02 compared to α = 0.0167 – Not significant

Holm Correction: Example Five tests – p=0.001, p=0.011, p=0.02, p=0.03, p=0.04 Third correction – p = 0.02 compared to α = 0.0167 – Not significant – p=0.03 and p=0.04 not tested

Less Conservative p=0.011 now counts as statistically significant But p=0.02, p=0.03, p=0.04 still discarded Does this seem right?

Tukey’s Honestly Significant Difference (HSD)

Tukey’s HSD Method for conducting post-hoc correction on ANOVA Typically used to assess significance of pair-wise comparisons, after conducting omnibus test – E.g. We know there is an overall effect in our scaffolding * agent 2x2 comparison – Now we can ask is Scaffolding+Agent better than Scaffolding + ~Agent, etc. etc.

Tukey’s HSD The t distribution is adjusted such that the number of means tested on is taken into account Effectively, the critical value for t goes up with the square root of the number of means tested on E.g. for 2x2 = 4 means, critical t needed is double E.g. for 3x3 = 9 means, critical t needed is triple
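
A sketch of how this is typically run in practice, using statsmodels’ pairwise_tukeyhsd on made-up data standing in for the scaffolding × agent example (the real study data is not shown here).

```python
# Tukey's HSD on hypothetical data for a 2x2 (scaffolding x agent) design:
# every pairwise comparison among the 4 cell means, with the critical value
# adjusted for the number of means being compared.
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(1)
conditions = ["Scaffold+Agent", "Scaffold+NoAgent",
              "NoScaffold+Agent", "NoScaffold+NoAgent"]
means = [0.75, 0.70, 0.62, 0.60]              # made-up condition means
scores, groups = [], []
for cond, mu in zip(conditions, means):
    scores.extend(rng.normal(mu, 0.1, 25))    # 25 hypothetical students per cell
    groups.extend([cond] * 25)

result = pairwise_tukeyhsd(endog=np.array(scores),
                           groups=np.array(groups), alpha=0.05)
print(result.summary())
```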

Tukey’s HSD Not quite as over-conservative as Bonferroni, but errs in the same fashion

Other FWER Corrections Sidak Correction – Less conservative than Bonferroni – Assumes independence between tests – Often an undesirable assumption Hochberg’s Procedure/Simes Procedure – Corrects for number of expected true hypotheses rather than total number of tests – Led in the direction of FDR
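
For reference, the Šidák criterion is 1 - (1 - α)^(1/n), versus Bonferroni’s α/n; a tiny sketch with illustrative numbers follows.

```python
# Sidak threshold versus Bonferroni for n = 5 tests. Sidak is exact only if
# the tests are independent, which is why it is slightly less conservative.
alpha, n = 0.05, 5
bonferroni = alpha / n                    # 0.01
sidak = 1 - (1 - alpha) ** (1 / n)        # ~0.0102
print(bonferroni, sidak)
```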

Questions? Comments?

FDR Correction

Different paradigm, probably a better match to the original conception of statistical significance

Comparison of Paradigms

Statistical significance p<0.05 A test is treated as rejecting the null hypothesis if there is a probability of under 5% that the results could have occurred if there were only random events going on This paradigm accepts from the beginning that we will accept junk (i.e. Type I error) 5% of the time

FWER Correction p<0.05 Each test is treated as rejecting the null hypothesis if there is a probability of under 5% divided by N that the results could have occurred if there were only random events going on This paradigm accepts junk far less than 5% of the time

FDR Correction p<0.05 Across tests, we will attempt to accept junk exactly 5% of the time – Same degree of conservatism as the original conception of statistical significance

Example Twenty tests, all p=0.05 Bonferroni rejects all of them as non- significant FDR notes that we should have had 1 fake significant, and 20 significant results is a lot more than 1

FDR Procedure Order your n tests from most significant (lowest p) to least significant (highest p) Test your first test according to significance criterion (1/n) α Test your second test according to significance criterion (2/n) α Test your third test according to significance criterion (3/n) α Quit as soon as a test is not significant
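
A minimal Python sketch of this step-through, using the two worked examples below. Note that standard Benjamini-Hochberg implementations (e.g. statsmodels’ multipletests with method='fdr_bh') work upward from the largest p value rather than stopping at the first failure, but they reach the same decisions on these examples.

```python
# FDR thresholding as described above: the i-th smallest p value is compared
# against (i/n) * alpha, stopping at the first failure.
def fdr(p_values, alpha=0.05):
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    significant = [False] * n
    for rank, i in enumerate(order, start=1):
        if p_values[i] < (rank / n) * alpha:   # (1/n)a, (2/n)a, (3/n)a, ...
            significant[i] = True
        else:
            break
    return significant

print(fdr([0.001, 0.011, 0.02, 0.03, 0.04]))   # -> all True
print(fdr([0.04, 0.12, 0.18, 0.33, 0.55]))     # -> all False
```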

FDR Correction: Example Five tests – p=0.001, p=0.011, p=0.02, p=0.03, p=0.04 First correction – p = 0.001 compared to α = 0.01 – Still significant!

FDR Correction: Example Five tests – p=0.001, p=0.011, p=0.02, p=0.03, p=0.04 Second correction – p = 0.011 compared to α = 0.02 – Still significant!

FDR Correction: Example Five tests – p=0.001, p=0.011, p=0.02, p=0.03, p=0.04 Third correction – p = 0.02 compared to α = 0.03 – Still significant!

FDR Correction: Example Five tests – p=0.001, p=0.011, p=0.02, p=0.03, p=0.04 Fourth correction – p = 0.03 compared to α = 0.04 – Still significant!

FDR Correction: Example Five tests – p=0.001, p=0.011, p=0.02, p=0.03, p=0.04 Fifth correction – p = 0.04 compared to α = 0.05 – Still significant!

FDR Correction: Example Five tests – p=0.04, p=0.12, p=0.18, p=0.33, p=0.55 First correction – p = 0.04 compared to α = 0.01 – Not significant; stop

How do these results compare? To Bonferroni Correction To Holm Correction To just accepting p<0.05, no matter how many tests are run

q value extension in FDR (Storey, 2002) Remember how Bonferroni Adjustments could either be made by adjusting α or adjusting p? There is a similar approach in FDR, where p values are transformed to q values Unlike in Bonferroni, q values can be interpreted the same way as p values

q value extension in FDR (Storey, 2002) p = probability that the results could have occurred if there were only random events going on q = probability that the current test is a false discovery, given the post-hoc adjustment
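
Storey’s full procedure also estimates the proportion of true null hypotheses, which is not sketched here. As a simpler stand-in, Benjamini-Hochberg-style adjusted p values, often reported as q values, can be computed like this.

```python
# Benjamini-Hochberg adjusted p values, often reported as q values. This is the
# simpler BH adjustment, not Storey's (2002) procedure, which additionally
# estimates the proportion of true null hypotheses.
def bh_q_values(p_values):
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity of the q values
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        q[i] = running_min
    return q

print(bh_q_values([0.001, 0.011, 0.02, 0.03, 0.04]))
# -> [0.005, 0.0275, 0.0333..., 0.0375, 0.04]  (all under 0.05, matching the example)
```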

q value extension in FDR (Storey, 2002) q can actually be lower than p In the relatively unusual case where there are many statistically significant results

Questions? Comments?

Asgn. 10 Kim, can you please go over your solution?

Asgn. 11 No assignment 11!

Next Class I need to move class next Monday to either Thursday or Friday What do folks prefer?

Next Class Wednesday, April 25 3pm-5pm AK232 Social Network Analysis Readings Haythornthwaite, C. (2001) Exploring Multiplexity: Social Network Structures in a Computer-Supported Distance Learning Class. The Information Society: An International Journal, 17 (3), Dawson, S. (2008) A study of the relationship between student social networks and sense of community. Educational Technology & Society, 11(3), Assignments Due: None

The End