The “Big 4” – Significance, Effect Size, Sample Size, and Power 18 Sep 2009 Dr. Sean Ho CPSY501 cpsy501.seanho.com.

Outline for today
- Stats review: correlation (Pearson, Spearman); t-tests (independent, paired)
- Discussion of research article (Missirlian et al.)
- The “Big 4”: statistical significance (p-value, α), effect size, power, finding the needed sample size
- SPSS tips

Measuring correlation: Pearson's r
The most common way to measure correlation is Pearson's product-moment correlation coefficient, r. It requires parametric data: independent observations, scale (interval/ratio) level of measurement, and a normal distribution. Example (ExamAnxiety.sav): anxiety measured before an exam, time spent revising before the exam, and exam performance (% score).
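As a rough sketch of what SPSS computes here, Pearson's r can be obtained with scipy. The scores below are hypothetical, not the ExamAnxiety.sav data:

```python
from scipy import stats

# Hypothetical pre-exam anxiety and exam performance scores
# (made-up values for illustration, not the ExamAnxiety.sav data)
anxiety = [85, 78, 92, 60, 70, 88, 75, 66, 90, 72]
exam_pct = [40, 55, 35, 80, 70, 42, 60, 75, 38, 65]

r, p = stats.pearsonr(anxiety, exam_pct)
print(f"r = {r:.3f}, p = {p:.4f}")
```

With these made-up values, higher anxiety goes with lower scores, so r comes out strongly negative.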

Pearson's correlation coefficient (SPSS output callouts: name of the correlation statistic; each variable is perfectly correlated with itself; significance value, p)

Spearman's rho (ρ or r_s)
- Another way of calculating correlation
- Non-parametric: can be used when data violate parametricity assumptions
- No free lunch: it loses information about the data
- Spearman's works by first ranking the data, then applying Pearson's to those ranks
- Example (grades.sav): grade on a national math exam (GCSE) and grade in a univ. stats course (STATS), coded by letter (A=1, B=2, C=3, ...)
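The rank-then-Pearson idea can be checked directly: Spearman's rho equals Pearson's r applied to the ranks. The letter-coded grades below are hypothetical, not the grades.sav data:

```python
from scipy import stats

# Hypothetical letter-coded grades (A=1, B=2, ...), not the grades.sav data
gcse = [1, 2, 2, 3, 4, 1, 5, 3, 2, 4]
stats_course = [2, 2, 3, 3, 5, 1, 4, 4, 2, 5]

rho, p = stats.spearmanr(gcse, stats_course)

# Equivalent: rank the data, then apply Pearson's r to the ranks
r_on_ranks, _ = stats.pearsonr(stats.rankdata(gcse), stats.rankdata(stats_course))
print(f"rho = {rho:.3f}, Pearson on ranks = {r_on_ranks:.3f}")
```

The two values agree to numerical precision, which is exactly the slide's point about how Spearman's works.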

Spearman's rho (ρ or r_s): example (SPSS output callouts: sample size; the correlation is positive; name of the correlation statistic)

Chi-square test (χ²)
Evaluates whether there is a relationship between 2 categorical variables. The Pearson chi-square statistic tests whether the 2 variables are independent. If the significance is small enough (p < α, usually α = .05), we reject the null hypothesis that the two variables are independent (unrelated); i.e., we think that they are in some way related.
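A minimal sketch of the test on a hypothetical 2×2 contingency table (made-up counts, e.g., group × outcome):

```python
from scipy import stats

# Hypothetical 2x2 contingency table: rows = group, columns = outcome
observed = [[30, 10],
            [15, 25]]

chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.4f}")
```

If p < α we reject independence and conclude the two categorical variables are related. (Note: scipy applies Yates' continuity correction by default for 2×2 tables.)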

t-tests: comparing two means
Moving beyond correlational research, we often want to look at the effect of one variable on another by systematically changing some aspect of that variable; that is, we manipulate one variable to observe its effect on another. t-tests are for comparing two means. There are two types of application: related/dependent measures, and independent groups.

Related/dependent t-tests
A repeated-measures experiment has 2 conditions (levels of the IV), and the same subjects participate in both conditions. Absent any manipulation, we expect a person's behaviour to be the same in both conditions: external factors (age, gender, IQ, motivation, ...) are the same in both. Experimental manipulation: we do something different in Condition 1 than in Condition 2, so the only difference between conditions is the manipulation the experimenter made (e.g., control vs. test).

Independent-samples t-tests
We still have 2 conditions (levels of the IV), but different subjects in each condition. So differences between the two group means can reflect: the manipulation (i.e., systematic variation), or differences between the characteristics of the people allotted to each group (i.e., unsystematic variation). Question: what is one way we can try to keep the 'noise' in an experiment to a minimum?

t-tests work by identifying sources of systematic and unsystematic variation and then comparing them. The comparison lets us see whether the experiment created considerably more variation than we would have got if we had just tested the participants without the experimental manipulation.

Example: dependent samples (“paired” samples t-test)
12 spider phobes were exposed to a picture of a spider (picture) and, on a separate occasion, a real live tarantula (real). Their anxiety was measured at each time (i.e., in each condition).

Paired samples t-test

Example: paired t-test (SPSS output callouts)
- Standard deviation of the pairwise differences
- Standard error of the differences between subjects' scores in each condition
- Degrees of freedom (in a repeated-measures design, df = N − 1)
- SPSS uses df to calculate the exact probability that the obtained value of t could occur by chance
- The probability that t occurred by chance (the p-value) is reported here
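The same analysis can be sketched outside SPSS. The anxiety scores below are illustrative, not the actual SpiderRM.sav values:

```python
from scipy import stats

# Hypothetical anxiety scores for 12 participants in each condition
# (illustrative values, not the SpiderRM.sav data)
picture = [30, 35, 45, 40, 50, 35, 55, 25, 30, 45, 40, 50]
real = [40, 35, 50, 55, 65, 55, 50, 35, 30, 50, 60, 39]

t, p = stats.ttest_rel(picture, real)
df = len(picture) - 1  # repeated measures: df = N - 1
print(f"t({df}) = {t:.3f}, p = {p:.4f}")
```

With these values the test is significant at α = .05: anxiety in the real-spider condition is higher than in the picture condition.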

Example: independent-samples t-test
Used in situations where there are 2 experimental conditions and different participants in each condition. Example (SpiderBG.sav): 12 spider phobes exposed to a picture of a spider (picture); 12 different spider phobes exposed to a real live tarantula. Anxiety was measured in each condition.

(SPSS output callouts)
- Summary statistics for the 2 experimental conditions
- Parametric tests (e.g., t-tests) assume the variances in the experimental conditions are ‘roughly’ equal; if Levene's test is significant, the assumption of homogeneity of variance has been violated
- df = (N1 + N2) − 2 = 22
- Significance (p-value) > α = .05, so there is no significant difference between the means of the 2 samples
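A sketch of the independent-samples workflow, including the Levene check, on illustrative scores (not the SpiderBG.sav data). Note the printed df assumes the equal-variance form of the test; if Levene's test were significant, SPSS (and scipy with `equal_var=False`) would use Welch's adjusted df instead:

```python
from scipy import stats

# Hypothetical anxiety scores, 12 different participants per group
# (illustrative values, not the SpiderBG.sav data)
picture = [30, 35, 45, 40, 50, 35, 55, 25, 30, 45, 40, 50]
real = [40, 35, 50, 55, 65, 55, 50, 35, 30, 50, 60, 39]

# Levene's test for homogeneity of variance
W, p_levene = stats.levene(picture, real)

# Use the pooled (equal-variance) t-test only if Levene's test is non-significant
t, p = stats.ttest_ind(picture, real, equal_var=(p_levene >= 0.05))
df = len(picture) + len(real) - 2  # pooled-variance df = (N1 + N2) - 2
print(f"Levene p = {p_levene:.3f}; t({df}) = {t:.3f}, p = {p:.4f}")
```

With these values the group difference does not reach significance at α = .05, mirroring the slide's conclusion.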


Practice reading an article
For practice, try reading this journal article, focusing on its statistical methods; see how much you can understand. Missirlian et al., “Emotional Arousal, Client Perceptual Processing, and the Working Alliance in Experiential Psychotherapy for Depression,” Journal of Consulting and Clinical Psychology, Vol. 73, No. 5, pp. 861–871. Download from the website, under today's lecture.

For discussion:
- What research questions do the authors state that they are addressing?
- What analytical strategy was used, and how appropriate is it for addressing their questions?
- What were their main conclusions, and are these conclusions warranted from the actual results/statistics/analyses that were reported?
- What changes or additions, if any, need to be made to the methods to give a more complete picture of the phenomenon of interest (e.g., sampling, description of the analysis process, effect sizes, dealing with multiple comparisons)?


Central themes of statistics
Is there a real effect/relationship amongst certain variables? How big is that effect? We evaluate these by looking at statistical significance (the p-value) and effect size (r², R², η, μ1 − μ2, etc.). Along with sample size (n) and statistical power (1 − β), these form the “Big 4” of any test.

The “Big 4” of every test
Any statistical test has these 4 facets; set 3 as desired → derive the required level of the 4th:
- Alpha: 0.05
- Effect size: as desired; depends on the test
- Power: usually 0.80
- Sample size: derived! (GPower)

Significance (α) vs. power (1 − β)
α is the chance of a Type I error: incorrectly rejecting the null hypothesis (H0). β is the chance of a Type II error: failing to reject H0 when we should have rejected it. Power is 1 − β (e.g., parachute inspections). (Diagram: α and β in an independent-groups t-test.)
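Both error rates can be made concrete by simulation: run many t-tests on simulated data and count rejections. When there is no true effect, the rejection rate estimates α (Type I error); when there is a true effect, it estimates power (1 − β). A minimal sketch with made-up parameters:

```python
import random
from scipy import stats

random.seed(0)

def rejection_rate(mean_diff, n=20, trials=2000, alpha=0.05):
    """Fraction of simulated independent-groups t-tests that reject H0."""
    rejections = 0
    for _ in range(trials):
        a = [random.gauss(0, 1) for _ in range(n)]          # group 1: mean 0
        b = [random.gauss(mean_diff, 1) for _ in range(n)]  # group 2: mean shifted
        _, p = stats.ttest_ind(a, b)
        rejections += p < alpha
    return rejections / trials

print("Type I error rate (no real effect):", rejection_rate(0.0))  # close to alpha
print("Power (true effect, d = 1.0):", rejection_rate(1.0))        # well above 0.80
```

With no real effect the test rejects about 5% of the time (that is what α = .05 means); with a large true effect and n = 20 per group, it rejects far more often.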

Statistical significance
An effect is “real” (statistically significant) if the probability that the observed result came about due to random variation is so small that we can reject random variation as an explanation. This probability is the p-value. We can never truly “rule out” random variation. We set the level of significance (α) as our threshold tolerance for Type I error: if p < α, we confidently say there is a real effect. We usually choose α = 0.05 (what does this mean?).

Myths about significance (why are these all myths?)
- Myth 1: “If a result is not significant, it proves there is no effect.”
- Myth 2: “The obtained significance level indicates the reliability of the research finding.”
- Myth 3: “The significance level tells you how big or important an effect is.”
- Myth 4: “If an effect is statistically significant, it must be clinically significant.”

Impact of changing α (figure comparing α = 0.05 and α = 0.01)

Effect size
Historically, researchers looked only at significance, but what about the effect size? A small study might yield non-significance but a strong effect size. That effect could be spurious, but it could also be real: this motivates meta-analysis (repeat the experiment and combine results). Current research standards require reporting both significance and effect size.

Measures of effect size
- For the t-test and any between-groups comparison — standardized difference of means: d = (μ1 − μ2)/σ
- For ANOVA — η² (eta-squared): overall effect of the IV on the DV
- For bivariate correlation (Pearson, Spearman) — r and r²: r² is the fraction of variability in one variable explained by the other
- For regression — R² and ΔR² (“R²-change”): fraction of variability in the DV explained by the overall model (R²) or by each predictor (ΔR²)
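The first measure, d, is simple enough to compute by hand. A sketch using a pooled standard deviation for σ, on hypothetical group scores (not from the course datasets):

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical scores for two groups (illustrative values only)
group1 = [30, 35, 45, 40, 50, 35, 55, 25, 30, 45, 40, 50]
group2 = [40, 35, 50, 55, 65, 55, 50, 35, 30, 50, 60, 39]

# Cohen's d = (mean1 - mean2) / pooled standard deviation
n1, n2 = len(group1), len(group2)
pooled_var = ((n1 - 1) * stdev(group1) ** 2 +
              (n2 - 1) * stdev(group2) ** 2) / (n1 + n2 - 2)
d = (mean(group1) - mean(group2)) / sqrt(pooled_var)
print(f"Cohen's d = {d:.2f}")
```

The sign of d just reflects which group is listed first; its magnitude is what is interpreted against the rules of thumb on the next slide.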

Interpreting effect size
What constitutes a “big” effect size? Consult the literature for the phenomenon under study. Cohen ('92, “A Power Primer”) offers rules of thumb, though they are somewhat arbitrary. For correlation-type r measures: 0.10 → small effect (1% of variability explained); 0.30 → medium effect (9%); 0.50 → large effect (25%).

Example: dependent t-test
Dataset: SpiderRM.sav. 12 individuals were first shown a picture of a spider, then a real spider, and anxiety was measured each time. Compare Means → Paired-Samples T Test. SPSS results: t(11) = −2.473, p < 0.05. Calculate the effect size (see the text, p. 332, §9.4.6): r ≈ ? (big? small?). Report the sample size (df), test statistic (t), p-value, and effect size (r).
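One standard way to get an effect size r from a t statistic (the kind of conversion the cited text section covers) is r = √(t² / (t² + df)). Applied to the slide's result:

```python
from math import sqrt

# Effect size r from a t statistic: r = sqrt(t^2 / (t^2 + df))
t, df = -2.473, 11  # values reported on the slide
r = sqrt(t**2 / (t**2 + df))
print(f"r = {r:.2f}")  # → r = 0.60
```

By Cohen's rules of thumb this is a large effect, even though the p-value alone would not tell you that.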

Finding the needed sample size
Experimental design: what is the minimum sample size needed to attain the desired levels of significance, power, and effect size (assuming there is a real relationship)?
- Choose the level of significance: α = 0.05
- Choose the power: usually 1 − β = 0.80
- Choose the desired effect size (from the literature or from Cohen's rules of thumb)
→ Use GPower or a similar tool to calculate the required sample size (do this for your project!)
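As a rough sketch of the calculation a tool like G*Power performs, the per-group n for a two-sided, two-sample t-test can be approximated with normal quantiles: n ≈ 2 · ((z₁₋α/₂ + z₁₋β) / d)². This slightly underestimates the exact t-based answer (for d = 0.50, the exact calculation gives about 64 per group):

```python
from math import ceil
from scipy.stats import norm

# Normal-approximation sample size per group for a two-sided two-sample t-test
alpha, power, d = 0.05, 0.80, 0.50  # medium effect size (Cohen)

z_alpha = norm.ppf(1 - alpha / 2)  # critical z for two-sided alpha
z_beta = norm.ppf(power)           # z corresponding to desired power
n_per_group = ceil(2 * ((z_alpha + z_beta) / d) ** 2)
print(f"~{n_per_group} participants per group")
```

Note how the required n scales with 1/d²: halving the expected effect size quadruples the sample you need.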

Power vs. sample size (figure: power as a function of sample size, with α = 0.05 and effect size d = 0.50 fixed)

SPSS tips!
- Plan out the characteristics of your variables before you start data entry
- You can reorder variables: cluster related ones
- Create a variable holding a unique ID# for each case
- Variable names may not contain spaces: try “_”
- Add descriptive labels to your variables
- Code missing data using values like '999', then tell SPSS about it in Variable View
- Clean up your output file (*.spv) before turn-in: add text boxes and headers; delete junk