Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 April 11, 2012.

Presentation transcript:

Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 April 11, 2012

Today’s Class Power Analysis

Statistical Power

Power Analysis: A set of methods for determining the probability that you will obtain a statistically significant result, assuming a true effect size and sample size of a certain magnitude

Or the reverse: Given a certain true effect size and a desired probability of obtaining a statistically significant result, what sample size is needed?
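For a two-group comparison, both directions can be sketched with a normal approximation. This is an assumption made here for illustration, not the method of any particular calculator; exact t-based calculations give slightly larger sample sizes. Here d is the true standardized effect size, n is the sample size per group, α is the two-sided significance level, Φ is the standard normal CDF, and z_q is its q-th quantile. The power expression ignores the negligible chance of rejecting in the wrong direction.

```latex
\text{Power} \approx \Phi\!\left(d\sqrt{n/2} - z_{1-\alpha/2}\right)
\qquad\Longleftrightarrow\qquad
n \approx \frac{2\left(z_{1-\alpha/2} + z_{\text{power}}\right)^{2}}{d^{2}}
```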

Why? When? Why might a researcher want to do each type of power analysis? When might a researcher want to do each type of power analysis?

When used Effect size + Power --> Sample Size – Usually used before running study to pick sample size

When used Effect size + Sample Size --> Power – Usually used after running study to explain to thesis committee why more subjects are needed – Can also be used to choose between or justify choice of statistical test

Power analysis: Can be computed from – “Effect Size” / Cohen’s d = (M1 – M2) / (pooled SD) – r – Difference in two r values – And several other metrics
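As a minimal sketch of the first metric, Cohen's d can be computed from two samples by dividing the difference in means by the pooled standard deviation. The function name `cohens_d` and the scores below are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group1, group2):
    """Cohen's d: (M1 - M2) divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)


# Hypothetical post-test scores, not taken from the lecture.
treatment = [12, 15, 11, 14, 13]
control = [10, 12, 9, 11, 13]
print(round(cohens_d(treatment, control), 3))
```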

Power analysis Can be computed for – Single-group t-test – Two-group t-test – Paired t-test – F test – Sign test – Etc., etc., etc.

Mathematical Details: Differ for different statistical tests and metrics (see Lachin paper for several examples). Possible to do this in online power calculators

Big Idea: Given the sample size, true effect size, and true standard deviation, every time you run the study you will get a different exact result, governed by the true standard deviation and sample size. With power analysis, you can determine the probability that you’ll get – a sample effect size in the range you want – a statistically significant result

Sign Test Example (Courtesy of John McDonald)
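The example shown in class is not reproduced in this transcript. As a stand-in, here is a minimal sketch of how exact power for a two-sided sign test could be computed from the binomial distribution. The function names, the choice of n = 20 pairs, and the true probability of 0.7 are all assumptions made for illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def sign_test_power(n, p_true, alpha=0.05):
    """Exact power of a two-sided sign test on n non-tied pairs.

    Under the null hypothesis each pair is positive with probability 0.5.
    The rejection region is {X <= c} or {X >= n - c}, with c the largest
    cutoff whose two-sided null probability does not exceed alpha.
    """
    c = -1
    lower_tail = 0.0
    for k in range(n // 2):
        candidate = lower_tail + binom_pmf(k, n, 0.5)
        if 2 * candidate > alpha:
            break
        lower_tail, c = candidate, k
    if c < 0:
        return 0.0  # sample too small to ever reject at this alpha
    return sum(binom_pmf(k, n, p_true) for k in range(n + 1) if k <= c or k >= n - c)


# Hypothetical scenario: 20 pairs, true chance of a positive difference = 0.7.
print(round(sign_test_power(20, 0.7), 3))
```

Setting `p_true` to 0.5 returns the actual Type I error rate of the test, which for a discrete test is usually somewhat below the nominal alpha.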

Monte Carlo Example Simple test: Is the group mean greater or less than zero? Let’s compare to online power calculator at
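A Monte Carlo power estimate can be sketched as follows: simulate the study many times under an assumed true effect and count how often the test comes out significant. To stay within the standard library, this sketch assumes the population SD is known to be 1 and uses a two-sided z-test rather than a t-test; the function names, the effect size of 0.5, and n = 30 are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_power(effect_size, n, alpha=0.05, n_sims=5000, seed=0):
    """Estimate power by simulation: fraction of simulated studies that reject H0: mean = 0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    significant = 0
    for _ in range(n_sims):
        # One simulated study: n observations with true mean = effect_size, SD = 1.
        sample = [rng.gauss(effect_size, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * sqrt(n)  # z statistic, SD assumed known (= 1)
        if abs(z) > z_crit:
            significant += 1
    return significant / n_sims


def analytic_power(effect_size, n, alpha=0.05):
    """Closed-form power of the same two-sided z-test, for comparison."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


print(simulate_power(0.5, 30), round(analytic_power(0.5, 30), 3))
```

The simulated value should land close to the analytic one, and the gap shrinks as `n_sims` grows.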

What is a good value for power? Conventionally, power = 0.80 is treated as “good”. Kind of a magic number – How much risk are you willing to accept?

Comments? Questions?

I need 3 volunteers

Play with calculator Two-sample t-test

Volunteer #1: If the true effect size is 0.5, how big a sample do you need to achieve Power = 0.8?

Volunteer #2: If the true effect size is 0.2, how big a sample do you need to achieve Power = 0.8?

Volunteer #3: If your control condition gains 20 points pre-post, and your experimental condition gains 40 points pre-post, and the pooled standard deviation is 30 points, and you have 20 students in each condition, what’s your statistical power?
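The three volunteer questions can be checked roughly in code. This is a minimal sketch under a normal approximation to the two-sample t-test with a two-sided alpha of 0.05, both of which are assumptions; the function names are hypothetical, and an online calculator based on the t distribution will report slightly larger sample sizes and slightly lower power.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test for effect size d."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)  # expected value of the test statistic
    # Probability that the statistic falls in either rejection tail.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def n_per_group_for_power(d, power=0.80, alpha=0.05):
    """Approximate per-group sample size needed to reach the target power."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / d) ** 2)


n_volunteer1 = n_per_group_for_power(0.5)       # effect size 0.5, power 0.8
n_volunteer2 = n_per_group_for_power(0.2)       # effect size 0.2, power 0.8
d_volunteer3 = (40 - 20) / 30                   # difference in gains / pooled SD
power_volunteer3 = power_two_sample(d_volunteer3, 20)
print(n_volunteer1, n_volunteer2, round(power_volunteer3, 2))
```

Under this approximation the answers come out to roughly 63 per group, 393 per group, and a power of about 0.56.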

Comments? Questions?

How can statistical power be increased? Both in theory, and in real life

How can statistical power be increased? Increase sample size

How can statistical power be increased? Increase difference in means – Make your intervention better

How can statistical power be increased? Increase difference in means – Make your control condition worse. Some researchers make the mistake of picking a control condition that’s impossibly good – ScienceAssistments versus ScienceAssistments, with one less potential IV. This doesn’t mean you should fish for a control condition that is absurdly awful – Principle-based explanation in ASSISTments versus explaining fractions through interpretive dance – Fun fractions game versus learning math through reading textbooks written in Danish

How can statistical power be increased? Reduce standard deviation through – more homogeneous sample – stratification of sample (and include stratification in model) – improvement of measure reliability
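The three levers (larger sample, larger difference in means, smaller standard deviation) can be compared side by side. This is a minimal sketch using a normal approximation to the two-sample test; the baseline numbers and the function name `approx_power` are hypothetical, picked only to show the direction of each change.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(mean_diff, sd, n_per_group, alpha=0.05):
    """Approximate two-sided, two-sample power from raw quantities."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = (mean_diff / sd) * sqrt(n_per_group / 2)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical baseline: 20-point difference, SD of 30, 20 students per group.
scenarios = {
    "baseline": approx_power(20, 30, 20),
    "double the sample size": approx_power(20, 30, 40),
    "larger difference in means": approx_power(30, 30, 20),
    "smaller standard deviation": approx_power(20, 20, 20),
}
for label, value in scenarios.items():
    print(f"{label}: {value:.2f}")
```

Shrinking the standard deviation from 30 to 20 raises the standardized effect size exactly as much as growing the mean difference from 20 to 30, so those two scenarios give the same power.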

Comments? Questions?

Jacob Cohen on Statistical Power “It is not at all clear why researchers continue to ignore power analysis. The passive acceptance of this state of affairs by editors and reviewers is even more of a mystery. At least part of the reason may be the low level of consciousness about effect size… I have suggested that the neglect of power analysis simply exemplifies the slow movement of methodological advance… An associate editor of this journal suggests another reason: Researchers find too complicated, or do not have at hand, either my book or other reference material for power analysis”

Jacob Cohen on Statistical Power It’s true that one sees – Relatively few statistical power analyses in published papers – More than a few analyses with low statistical power Why might that be? – Aside from “low levels of consciousness”

Some Reasons Why Sample sizes determined by feasibility rather than a priori decision

Some Reasons Why: Larger sample sizes in real-world research often imply heterogeneity (and higher variance). Between running a study with insufficient statistical power, and running no study at all…

Some Reasons Why Hard to guess a priori what level of effect size is reasonable

Some Reasons Why: Statistical power analysis is harder to do for the complex statistical tests that are increasingly common

Asgn. 10 Questions? Comments?

Next Class: No class Monday! Patriots’ Day

Next Class: Wednesday, April 18, 3pm-5pm, AK232. Post-hoc Adjustments. Readings: Rosenthal, R., Rosnow, R.L. (1991) Essentials of Behavioral Research: Methods and Data Analysis, 2nd edition. p Verhoeven, K.J.F., Simonsen, K.L., McIntyre, L.M. (2005) Implementing false discovery rate control: increasing your power. Oikos, 108. Assignments Due: None

The End