The Chicago Guide to Writing about Multivariate Analysis, 2nd edition. Defining the Goldilocks problem. Jane E. Miller, PhD.

Overview
Defining the Goldilocks problem
Understanding why type of variable matters
Understanding why range of values matters
Outlining the steps to avert Goldilocks problems
  – Later podcasts fill in the details

What is “the Goldilocks problem” in multivariate regression?
As Goldilocks discovered, she and each of the Three Bears preferred different-sized chairs.
  – One chair was too big,
  – One chair was too small,
  – One chair was just right!
Likewise, different variables in a multivariate regression often require different-sized contrasts to illustrate the meaning of their coefficients.

Review: Interpretation of regression coefficients
Ordinary least squares (OLS) coefficients (βs) estimate the change in the dependent variable (Y) for a 1-unit increase in the independent variable (X_i), with the result in the units of the dependent variable.
Logit coefficients estimate the effect of a 1-unit increase in X_i on the log-odds of the outcome under study.
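
As a rough illustration of the two interpretations above, the sketch below turns a coefficient and a contrast size into a predicted change. The function names and the coefficient value 0.05 are hypothetical, chosen only for this example; exponentiating a logit coefficient to obtain an odds ratio is a standard step, not something specific to these slides.

```python
import math


def ols_effect(beta: float, contrast: float = 1.0) -> float:
    """Predicted change in Y (in Y's units) for a `contrast`-unit increase in X."""
    return beta * contrast


def odds_ratio(logit_coef: float, contrast: float = 1.0) -> float:
    """Odds ratio implied by a logit coefficient for a `contrast`-unit increase in X."""
    return math.exp(logit_coef * contrast)


# Hypothetical logit coefficient: 0.05 log-odds per year of age.
print(round(odds_ratio(0.05), 3))      # odds ratio for a 1-year contrast
print(round(odds_ratio(0.05, 10), 3))  # odds ratio for a 10-year contrast
```

The same coefficient gives quite different pictures depending on the contrast applied, which is the theme of the slides that follow.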

Common pitfalls in interpreting regression coefficients
Assessing which independent variables are the “most important” by directly comparing the sizes of the estimated coefficients (βs).
Direct comparison of βs implies that a 1-unit increase in each independent variable is the pertinent contrast for that variable.
  – Problematic because many multivariate models include different:
      – Types of variables (levels of measurement).
      – Ranges and scales of continuous variables.

Why does type of variable matter?
Continuous independent variables
Categorical independent variables
  – Nominal
  – Ordinal

Considerations for contrast size: Continuous variables
Different continuous variables have different levels and ranges of values:
  – Age in a sample of students might vary from 5 to 17 years
      – A 12-unit range among values in the single to double digits
  – Their annual family incomes could vary from $0 to millions of dollars
      – A million+ unit range with a median value likely in the five digits

Problem: Directly comparing βs for continuous variables with different scales
Although a 1-year increase in age might be a relevant contrast, a $1 increase in annual family income in the US today would be trivial.
Directly comparing the βs on age and income implicitly assumes that a 1-unit increase fits the scale of both variables.
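
The scale problem described above can be sketched with a few lines of arithmetic. Everything numeric here is an assumption made up for illustration: the outcome (a test score in points), both coefficients, and the $10,000 income contrast are not from the slides.

```python
def effect_of_contrast(beta: float, contrast: float) -> float:
    """Predicted change in Y for a `contrast`-unit increase in X."""
    return beta * contrast


# Hypothetical coefficients from a model of a test score (points).
beta_age = 2.0        # points per 1-year increase in age
beta_income = 0.0001  # points per $1 increase in annual family income

# Naive comparison: a 1-unit increase for both variables.
naive_age = effect_of_contrast(beta_age, 1)
naive_income = effect_of_contrast(beta_income, 1)
print(naive_age, naive_income)

# Contrasts sized to each variable's scale (assumed: 1 year and $10,000).
age_effect = effect_of_contrast(beta_age, 1)
income_effect = effect_of_contrast(beta_income, 10_000)
print(round(age_effect, 2), round(income_effect, 2))
```

With the 1-unit contrast, income looks thousands of times less important than age; with a contrast that suits its scale, the two effects are of the same order of magnitude.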

Considerations for contrast size: Categorical variables
The numeric codes used as shorthand for categorical variables have no mathematical meaning.
  – E.g., dummy variable “boy” coded 1 = boy, 0 = girl
  – No such thing as a 1-unit increase in “genderness.”
Such binary variables only span a 1-unit range, so multiunit changes are not applicable.

Problem: Interpreting directionality of nominal variable codes
The values of nominal variables such as gender or race have no natural order.
  – Any rank ordering of categories of those variables is arbitrary.
      – An artifact of how the analyst chose to code the categories.
  – Could equally well code gender as 1 = male, 2 = female
  – Thus the directionality implied by a 1-unit increase is misleading.
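
A small sketch of why the direction is an artifact of coding: the same made-up data are fit twice with a hand-written bivariate OLS slope, once with boy coded 1 and girl coded 0, and once with 1 = male, 2 = female. The birth weights and the helper `ols_slope` are hypothetical and exist only for this example.

```python
def ols_slope(x, y):
    """Slope of a bivariate OLS regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Made-up birth weights in grams for three boys and three girls.
boy = [1, 1, 1, 0, 0, 0]  # 1 = boy, 0 = girl
weight = [3500, 3400, 3600, 3300, 3200, 3400]

# Alternative coding of the same children: 1 = male, 2 = female.
gender_12 = [2 - b for b in boy]

slope_boy = ols_slope(boy, weight)
slope_12 = ols_slope(gender_12, weight)
print(slope_boy, slope_12)
```

The two slopes have the same magnitude and opposite signs, so the sign carries meaning only once the reader knows which category was coded higher.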

Codes for ordinal variables
Codes for categories of ordinal variables are rankable and might appear to have numeric meaning.
  – E.g., categories for self-rated health might be coded:
      1: excellent
      2: very good
      3: good
      4: fair
      5: poor

Problem: Interpreting ordinal values as if they were continuous
Unlike integer values of a continuous variable like age in years, the numeric distance between categories of an ordinal variable cannot be assumed to be uniform.
E.g., respondents might perceive a bigger difference between “good” and “fair” health than between “very good” and “excellent” health.
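
One common way to avoid assuming evenly spaced categories (a general practice, not a step spelled out on this slide) is to enter an ordinal variable as a set of 0/1 indicators, one per non-reference category. The sketch below uses the self-rated health codes from the earlier slide; the function name, the choice of “excellent” as the reference category, and the sample responses are assumptions for illustration.

```python
HEALTH_CODES = {1: "excellent", 2: "very good", 3: "good", 4: "fair", 5: "poor"}


def indicator_columns(codes, reference=1):
    """Return one 0/1 indicator column per non-reference category."""
    categories = [c for c in sorted(HEALTH_CODES) if c != reference]
    return {
        HEALTH_CODES[c]: [1 if code == c else 0 for code in codes]
        for c in categories
    }


sample = [1, 3, 5, 2, 3]  # hypothetical respondents' health codes
dummies = indicator_columns(sample)
for name, column in dummies.items():
    print(f"{name:>10}: {column}")
```

Each indicator then gets its own β, interpreted as a contrast with the reference category, so no spacing between categories is imposed.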

Problem: Interpreting ordinal values as if they were continuous
Unlike integer values of a continuous variable, the numeric distance between categories of an ordinal variable cannot be assumed to be uniform, even when categories have numeric units attached.
E.g., income groups often
  – Are of varying widths
      – E.g., <$20K, $20K–$39K, $40K–$79K, $80K–$160K
  – Include an open-ended top category (e.g., >$160K).

Problem: Comparing βs on categorical and continuous variables
Given the different interpretations of βs on continuous and categorical variables, if a model includes both types of independent variables, one cannot compare their βs without considering the pertinent-sized contrast for each variable.
  – For mother’s age (a continuous variable), the contrast can vary by more than 1 unit (year) across cases.
  – For gender, the contrast is one category versus the other, and no more than a “1-unit” increase is possible.
  – Even if β_boy > β_mother’s age (117.2 and 10.7, respectively), one cannot conclude that gender is a “more important” determinant of birth weight than mother’s age.
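
Using the two coefficients quoted above (117.2 for boy and 10.7 per year of mother's age), a short calculation shows how the comparison depends on the age contrast chosen. The units are presumably grams of birth weight, and the list of age contrasts is an assumption picked for illustration.

```python
beta_boy = 117.2         # boys versus girls (from the slide)
beta_mothers_age = 10.7  # per 1-year increase in mother's age (from the slide)


def breakeven_contrast(beta_categorical: float, beta_continuous: float) -> float:
    """Units of the continuous variable whose effect equals the categorical contrast."""
    return beta_categorical / beta_continuous


years_to_match = breakeven_contrast(beta_boy, beta_mothers_age)
print(f"Age gap matching the boy/girl difference: {years_to_match:.1f} years")

# Effect of mother's age for several hypothetical contrast sizes.
for years in (1, 5, 10, 15):
    print(years, round(beta_mothers_age * years, 1))
```

An age difference of roughly 11 years between mothers corresponds to about the same predicted difference as the boy/girl contrast, so neither variable is “more important” on the basis of β alone.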

Why does range of values matter?
When is a 1-unit change
  – Too big?
  – Too small?

When is a 1-unit increase too big?
For independent variables whose values in your data:
  – fall mostly between 0 and 1,
  – are clustered within a few units of one another,
  – or are by definition restricted to between 0.0 and 1.0, e.g.,
      – Variables measured in proportions
      – Gini coefficients
In such situations, apply a <1.0-unit contrast to assess the effect of a change in X_i on Y.

Proportions versus percentages
Researchers are often sloppy about variables measured in proportions, instead labeling them as percentages (or vice versa).
  – The percentage equivalent of a proportion is by definition 100 times as large.
Must convey the correct scale of the variable used in the model so β can be interpreted correctly.
  – For variables measured as a proportion, a 1-unit increase is too large,
  – For those measured in percentages, a 1-unit increase often is too small.
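
Because a percentage is 100 times its proportion, the coefficient on the percentage version of a variable is 1/100 of the coefficient on the proportion version, while the predicted effect of a given real-world change is unchanged. The sketch below checks this arithmetic; the coefficient value 50.0 and the 10-percentage-point contrast are hypothetical.

```python
def to_percentage_beta(beta_per_proportion_unit: float) -> float:
    """Convert a β estimated on a 0-1 proportion to a β per percentage point."""
    return beta_per_proportion_unit / 100.0


beta_prop = 50.0                          # hypothetical β per 1.0 unit of a proportion
beta_pct = to_percentage_beta(beta_prop)  # same relationship, per percentage point

# The same real-world change expressed on each scale:
# 0.10 on the proportion scale equals 10 percentage points.
effect_from_prop = beta_prop * 0.10
effect_from_pct = beta_pct * 10
print(round(effect_from_prop, 6), round(effect_from_pct, 6))
```

Mislabeling the scale therefore misstates the effect of a 1-unit change by a factor of 100, even though the underlying relationship is identical.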

When is a 1-unit increase too small?
For independent variables with
  – A high level or wide range of values
  – Imprecise measurement of values
E.g., for blood pressure, a difference of 1 millimeter of mercury (mm Hg) is too small to be
  – clinically meaningful
  – observed with precision
In such situations, apply a >1.0-unit contrast to assess the effect of a change in X_i on Y.

Goldilocks issues for the dependent variable: range of values
Evaluate what a 1-unit increase means given the range and scale of the dependent variable.
β = 1.0 on a dummy variable is
  – A trivially small effect in a model predicting birth weight (which ranges from about 400 to 5,900 grams).
  – A substantial effect in a model predicting grade point average on the usual 4-point scale.
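
One rough way to see the difference is to express β as a share of the dependent variable's range. The birth weight range comes from the slide; treating grade point average as running from 0 to 4 is an assumption, and `share_of_range` is a hypothetical helper, not an established statistic.

```python
def share_of_range(beta: float, y_min: float, y_max: float) -> float:
    """β expressed as a fraction of the dependent variable's observed range."""
    return beta / (y_max - y_min)


beta = 1.0

birth_weight_share = share_of_range(beta, 400, 5900)  # range in grams, from the slide
gpa_share = share_of_range(beta, 0, 4)                # assumed 0-4 GPA scale

print(f"Birth weight: {birth_weight_share:.4%} of the range")
print(f"GPA:          {gpa_share:.0%} of the range")
```

The same β of 1.0 spans a tiny fraction of the birth weight range but a quarter of the GPA scale.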

Goldilocks issues for the dependent variable: model specification
Ordinal dependent variables such as birth weight categories (e.g., very low, low, normal, and high birth weight) should not be modeled using OLS models.
  – OLS models imply that the numeric codes for those categories are values of a continuous dependent variable.
  – Instead, use techniques such as ordered logit or other methods for ordered categorical dependent variables (Powers and Xie 2000).

Prose interpretation and comparison of βs is critical
If βs are only reported (in a table or in prose) and not interpreted, you leave it to readers to:
  – Notice the different types and scales of the variables,
  – Figure out pertinent-sized contrasts for each variable in the model.
Readers will then be more likely to make Goldilocks errors when they assess the meaning of the βs on different variables in your model.
Your job as the author is to write about the results in ways that avert Goldilocks errors of interpretation.

Steps for resolving Goldilocks problems
1. Getting acquainted with the units and distribution of your independent and dependent variables.
2. Applying theoretical and empirical criteria to choose a suitably-sized contrast for each independent variable.
3. Using precise, complete labeling of units and categories in prose, tables, and charts.
4. Interpreting the results in prose to clearly communicate the substantive meaning of the βs based on suitably-sized contrasts for each variable.
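
For the first two steps, a quick look at each variable's distribution can suggest empirically grounded contrast sizes. The sketch below computes three candidates (range, standard deviation, interquartile range) with Python's standard `statistics` module. The choice of these three summaries and the age data are assumptions for illustration; the slides do not prescribe specific criteria here, and theoretical considerations may point to a different contrast.

```python
import statistics


def candidate_contrasts(values):
    """Summaries of a variable's spread that could inform a contrast size."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return {
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),
        "iqr": q3 - q1,
    }


ages = list(range(5, 18))  # hypothetical sample: one student at each age 5 to 17
contrasts = candidate_contrasts(ages)
for name, size in contrasts.items():
    print(f"{name}: {size:.2f}")
```

Running the same function on each independent variable gives a side-by-side view of their scales before any βs are compared.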

Summary
A “one-size-fits-all” approach to interpreting regression coefficients is often misleading because variables
  – Have different types (levels of measurement),
  – Have different units of measurement,
  – Have varying distributions of values,
  – Occur in different real-world circumstances.
These issues require careful thought about how to present βs to convey the substantive meaning of the β for each variable.

Suggested resources
Miller, J. E. The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.
  – Chapter 10, on the Goldilocks problem
  – Chapter 4, on types of variables, units, and distribution
Miller, J. E. and Y. V. Rodgers, “Economic Importance and Statistical Significance: Guidelines for Communicating Empirical Research.” Feminist Economics 14 (2): 117–49.

Suggested online resources
Podcasts on
  – Interpreting multivariate regression coefficients
  – Resolving the Goldilocks problem
      – Measurement and variables
      – Model specification
      – Presenting results

Suggested practice exercises
Study guide to The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.
  – Questions #1, 2, and 7 in the problem set for chapter 10.
  – Suggested course extensions for chapter 10:
      – “Reviewing” exercises #1 through 5.
      – “Applying statistics and writing” question #1.
      – “Revising” questions #1, 2, 3, 5, and 9.
  – “Getting to know your variables” assignment

Contact information
Jane E. Miller, PhD
Online materials available at