Breaking Up is Hard to Do: The Heartbreak of Dichotomizing Continuous Variables
David L. Streiner
Nour Kteily
PSY 1950

The Danger of Dichotomizing
Reduced statistical power
Increased probability of type II error
Difficulty reinterpreting data once definitions have changed
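
The first two dangers can be illustrated with a small Monte Carlo sketch. Every setting here is an assumption chosen for illustration, not a value from the slides: two groups of 50, a true standardized mean difference of d = 0.5, α = 0.05, and a median split of the pooled scores. To stay within the Python standard library, both analyses use large-sample z approximations (a z version of the two-sample t test, and a pooled two-proportion z test) rather than exact tests; the helper names are hypothetical.

```python
import random
from statistics import NormalDist, fmean, median, stdev

STD_NORMAL = NormalDist()  # standard normal, used for two-sided p-values


def two_sided_p(z):
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def p_continuous(a, b):
    """Large-sample z approximation to the two-sample t test on raw scores."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    return two_sided_p((fmean(b) - fmean(a)) / se)


def p_dichotomized(a, b):
    """Pooled two-proportion z test after a median split of the pooled scores."""
    cut = median(a + b)
    k_a = sum(v > cut for v in a)  # number of 'high' scores in each group
    k_b = sum(v > cut for v in b)
    n_a, n_b = len(a), len(b)
    pooled = (k_a + k_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:  # degenerate split: treat as no evidence against H0
        return 1.0
    return two_sided_p((k_b / n_b - k_a / n_a) / se)


def simulate_power(d=0.5, n=50, reps=2000, alpha=0.05, seed=1):
    """Share of simulated studies that reject H0 under each analysis."""
    rng = random.Random(seed)
    hits_cont = hits_dich = 0
    for _ in range(reps):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]  # control group
        b = [rng.gauss(d, 1.0) for _ in range(n)]    # group shifted by d SDs
        hits_cont += p_continuous(a, b) < alpha
        hits_dich += p_dichotomized(a, b) < alpha
    return hits_cont / reps, hits_dich / reps


power_cont, power_dich = simulate_power()
print(f"power, continuous:   {power_cont:.2f}")
print(f"power, median split: {power_dich:.2f}")
```

With these settings the continuous analysis should reject H0 in roughly 70% of simulated studies (close to the usual power of a t test for d = 0.5 with 50 per group), while the median-split analysis rejects noticeably less often, i.e. it commits more type II errors on the same data.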

The Rationale for Dichotomizing Outcomes
1) “Clinicians have to make dichotomous decisions to treat or not to treat, so it makes sense to have a binary outcome”
2) “Physicians find it easier to understand results when they are expressed in proportions or odds ratios rather than beta weights and other indices”

Streiner’s Retort
1) The first rationale confuses measurement with decision making: a treatment decision may be binary even when the outcome behind it is measured on a continuum.
2) Many ‘binary’ disorders could actually be seen as lying on a continuum.
3) All research using the old dichotomy becomes much more difficult to interpret if the definition of the dichotomy changes.
4) Many treatments for ‘binary’ disorders actually fall along a continuum.

Example 1
A scale is dichotomized: scores below 15 are considered normal, and scores above 15 are classified as a ‘case’.
If the scores in Group 1 and Group 2 are treated as continuous: Mean G1 = …, Mean G2 = …, t(18) = 2.16, p = …
If they are treated dichotomously: G1: 9 normal, 1 ‘case’; G2: 4 normal, 6 ‘cases’; Fisher’s exact test: p = 0.057
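
The dichotomous result in Example 1 can be checked by hand. The sketch below is a plain-Python version of the two-sided Fisher exact test (the function name is hypothetical, not a library API): with all margins of the 2x2 table fixed, the top-left cell follows a hypergeometric distribution, and the p-value sums the probabilities of every table no more likely than the observed one. It also prints the sample odds ratio, the kind of index the second rationale says physicians prefer.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1 = a + b, a + c
    n = a + b + c + d
    total = comb(n, row1)

    def prob(k):  # hypergeometric probability that the top-left cell equals k
        return comb(col1, k) * comb(n - col1, row1 - k) / total

    observed = prob(a)
    k_min, k_max = max(0, row1 - (n - col1)), min(row1, col1)
    # small relative tolerance so equally likely tables are not lost to rounding
    return sum(p for p in map(prob, range(k_min, k_max + 1))
               if p <= observed * (1 + 1e-7))


# Rows: Group 1, Group 2; columns: normal, 'case' (counts from Example 1)
g1_normal, g1_case, g2_normal, g2_case = 9, 1, 4, 6
p = fisher_exact_two_sided(g1_normal, g1_case, g2_normal, g2_case)
odds_ratio = (g1_normal * g2_case) / (g1_case * g2_normal)
print(f"Fisher's exact test: p = {p:.3f}")       # p = 0.057
print(f"sample odds ratio:   {odds_ratio:.1f}")  # 13.5
```

The computed p reproduces the 0.057 reported above, just above the conventional 0.05 threshold.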

Example 2
40 subjects, measured on 4 variables, A-D.
Testing the correlations with the variables kept continuous yields 4 significant correlations at the p < 0.01 level (upper triangle).
Dichotomizing the data with median splits yields only 2 significant correlations (lower triangle).
Regression with A as the dependent variable and B-D as the predictors:
Variables kept as continua: R² = …
Variables dichotomized: R² = 0.211
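
The pattern in Example 2, where median splits leave fewer significant correlations, follows from how splitting attenuates a correlation. This sketch assumes bivariate-normal data with a hypothetical true correlation of ρ = 0.5 (not a value from the slides) and uses a large simulated sample instead of 40 subjects, so the estimates settle near the standard bivariate-normal values: ρ√(2/π) ≈ 0.40 when one variable is median-split, and (2/π)·arcsin ρ = 1/3 when both are. The helper names are illustrative.

```python
import math
import random
from statistics import fmean, median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def median_split(values):
    """Code each value 1.0 if above the sample median, else 0.0."""
    cut = median(values)
    return [1.0 if v > cut else 0.0 for v in values]


def attenuation_demo(rho=0.5, n=100_000, seed=2024):
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        a = rng.gauss(0, 1)
        b = rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)  # corr(a, b) = rho
        x.append(a)
        y.append(b)
    r_cont = pearson(x, y)                                # both continuous
    r_one = pearson(median_split(x), y)                   # X dichotomized
    r_both = pearson(median_split(x), median_split(y))    # both dichotomized
    return r_cont, r_one, r_both


r_cont, r_one, r_both = attenuation_demo()
print(f"both continuous:   r = {r_cont:.3f}")
print(f"X median-split:    r = {r_one:.3f}")
print(f"both median-split: r = {r_both:.3f}")
```

A true correlation of 0.5 shrinks to about a third once both variables are split, so with only 40 subjects some correlations that clear p < 0.01 as continua can fall short after the split.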

Issues with Dichotomizing
1) The magnitude of the effects was lower when outcomes were treated as dichotomous rather than continuous.
2) Findings that were significant with continuous variables were not significant with dichotomous variables.
Why?
Dichotomizing results in a ‘tremendous’ loss of information
Misclassification (especially of scores near the cut-point)
A lower signal-to-noise ratio
Taken together, these issues decrease statistical power and increase the probability of type II error.
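
The ‘tremendous’ loss of information can be quantified with a standard result for bivariate-normal data (an illustration added here, not a derivation from the slides). Assume X and Y are standard bivariate normal with correlation ρ, φ is the standard normal density, and X is split at its median into D = 1 if X > 0, else 0:

```latex
\begin{aligned}
\operatorname{Cov}(D, Y) &= \rho\,\mathbb{E}\left[X\,\mathbf{1}\{X > 0\}\right] = \rho\,\varphi(0) = \frac{\rho}{\sqrt{2\pi}},
  \qquad \sigma_D = \tfrac{1}{2},\quad \sigma_Y = 1 \\
r_{DY} &= \frac{\operatorname{Cov}(D, Y)}{\sigma_D\,\sigma_Y} = \rho\sqrt{\frac{2}{\pi}} \approx 0.80\,\rho \\
\frac{r_{DY}^{2}}{\rho^{2}} &= \frac{2}{\pi} \approx 0.64 \\
\frac{N_{\text{dichotomized}}}{N_{\text{continuous}}} &\approx \frac{\rho^{2}}{r_{DY}^{2}} = \frac{\pi}{2} \approx 1.57
\end{aligned}
```

A single median split therefore keeps only about 64% of the explained variance, and because the sample size needed to detect a small correlation scales roughly with 1/r², the split analysis needs on the order of 57% more subjects for the same power.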

It’s Not All Bad
There are a few cases, based on statistical rather than clinical considerations, in which we should divide a variable into a dichotomy or into ordinal categories:
1) J-shaped distributions
2) Non-linear relationships
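
A toy sketch of the non-linear case, using made-up data (y = x² on an evenly spaced grid), not data from the presentation. Pearson's r measures only linear association, so it is essentially zero here, yet grouping x into ordered thirds exposes the U shape. Tertiles are an illustrative choice; the slides do not say how the categories should be formed.

```python
import math
from statistics import fmean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


n = 300
x = [-3 + 6 * i / (n - 1) for i in range(n)]  # evenly spaced, symmetric about 0
y = [v ** 2 for v in x]                       # hypothetical U-shaped relationship

r = pearson(x, y)
third = n // 3
# x is already sorted, so consecutive thirds are the low, middle and high x groups
low, mid, high = (fmean(y[i:i + third]) for i in range(0, n, third))
print(f"Pearson r = {r:.3f}")
print(f"mean y by x-tertile: {low:.2f}, {mid:.2f}, {high:.2f}")
```

The linear correlation misses the relationship entirely, while the ordinal grouping shows high outcome values at both extremes of x and low values in the middle.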

Conclusion
Gather data as continua whenever possible.
Unless your variable deviates considerably from normality, don’t dichotomize! Keeping it continuous avoids reduced power and an increased probability of type II error.