Running models and Communicating Statistics

Running models and Communicating Statistics EMPA PMGT 630 Statistical Analysis Eva Witesman, Ph.D.

Types of Questions for Analysis Bivariate analysis: Does X correlate with Y? Do the post-test values differ from pre-test values? Is the distribution of Y, given X, different from what we would expect based on randomness? Multivariate analysis: Does X correlate with Y even in the presence of Z? Which of several X factors have the greatest impact on Y? Predicting Y using a variety of X variables. Predicting post-test values using pre-test values and other factors.

Dependent variables Dependent variables are the “Y” or outcome or effect variables. The dependent variable is generally the phenomenon we are interested in describing, predicting, or explaining. We can only have one dependent variable per model. If your dependent variable construct has more than one measure, you may consider creating an index or running more than one model.

Independent variables Independent variables are the “X” or cause variables that are used to describe, predict, or explain the dependent variable. “Research variables” are independent variables we are explicitly interested in. “Control variables” are independent variables we include in the model just to rule out possible interactions or alternate explanations. The distinction between research variables and control variables is entirely theoretical.

Coefficient values Beta coefficients, coefficients, estimates, parameter estimates, etc. are all terms for the value that estimates the impact of a specific X on Y. This value should only be interpreted if the X variable is found to correlate with Y (p<0.05). The interpretation of “beta” is that for a one unit increase in the independent variable, the dependent variable increases by [the value for beta] units. The units are those in which each variable was measured in your data. The beta value also represents the slope of the line estimated by plotting values of x against values of y, holding all other variables constant.

Examples Mother’s build has no statistical relationship with student weight (α=0.05), all other variables being held constant. Eating fast food has a statistically significant positive relationship with student weight. For a one meal per week increase in eating fast food, student weight increases by 12.26 pounds, all other variables being held constant (t=2.11, p=.0174). Being female has a statistically significant negative relationship with weight. On average, being female is associated with being 14.13 pounds lighter, all other variables being held constant (t=2.03, p=.0212).
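The interpretation of beta as a slope can be checked numerically. The following is a minimal sketch in Python, assuming a single X variable and made-up data (the variable names and values are illustrative and are not the course data). It estimates the least-squares slope and confirms that a one-unit increase in X changes the predicted Y by exactly beta.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Slope = co-variation of x and y divided by the variation of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: fast-food meals per week (x) and weight in pounds (y)
meals = [0, 1, 2, 3, 4]
weight = [150, 161, 175, 186, 198]

intercept, beta = fit_line(meals, weight)


def predicted(x_value):
    """Predicted Y for a given X using the fitted line."""
    return intercept + beta * x_value


# Moving from 2 to 3 meals per week changes the prediction by beta
change = predicted(3) - predicted(2)
print(f"beta = {beta:.2f}, change for one extra meal = {change:.2f}")
```

With more than one X variable the same reading applies to each beta, with the other variables held constant.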

R-squared The R-squared value (currently available for linear regression models only) is an estimate of how much of the variation in Y is explained by the independent variables in your model. The adjusted R-squared value is an estimate of how effectively the variation in Y is explained by the independent variables in your model, given the size of the model. Large models with lots of useless variables will have lower adjusted R-squared values. There are pseudo-R-squared values available for some nonlinear models.
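As a rough illustration of the two measures, the sketch below computes R-squared from observed and predicted values and then applies the usual adjustment for model size. The data and function names are hypothetical; statistical software reports both values directly.

```python
def r_squared(observed, predicted):
    """Share of the variation in Y explained by the model's predictions."""
    mean_y = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_y) ** 2 for o in observed)
    return 1 - ss_residual / ss_total


def adjusted_r_squared(r2, n_observations, n_predictors):
    """R-squared penalized for the number of X variables in the model."""
    return 1 - (1 - r2) * (n_observations - 1) / (n_observations - n_predictors - 1)


# Hypothetical observed Y values and the model's predictions for them
y_observed = [1, 2, 3, 4, 5]
y_predicted = [1.1, 1.9, 3.2, 3.8, 5.0]

r2 = r_squared(y_observed, y_predicted)
adj_small_model = adjusted_r_squared(r2, n_observations=5, n_predictors=1)
adj_large_model = adjusted_r_squared(r2, n_observations=5, n_predictors=3)
print(r2, adj_small_model, adj_large_model)
```

For the same R-squared, the model with three X variables gets a lower adjusted value than the model with one, which is the penalty for size described above.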

The formula The formula for predicting Y based on your linear regression model is: Y = intercept + slope_1*(value of X_1) + slope_2*(value of X_2) + ... + slope_n*(value of X_n). Include all variables (and intercept) from the model even if they are not statistically significant. Note this applies only to linear regression, not to logistic regression (which does not estimate a line of best fit).
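The prediction formula can be written as a short function. In this sketch the two slopes are borrowed from the student weight examples earlier in the deck, while the intercept of 100 and the variable names are assumptions made up for illustration.

```python
def predict(intercept, slopes, values):
    """Y = intercept + slope_1*X_1 + ... + slope_n*X_n for one observation."""
    return intercept + sum(slopes[name] * values[name] for name in slopes)


# Slopes from the weight examples; the intercept is hypothetical
intercept = 100.0
slopes = {"fast_food_meals_per_week": 12.26, "female": -14.13}

# A female student who eats fast food twice a week
student = {"fast_food_meals_per_week": 2, "female": 1}

predicted_weight = predict(intercept, slopes, student)
print(f"Predicted weight: {predicted_weight:.2f} pounds")
```

Every variable in the model goes into `slopes`, including those that were not statistically significant.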

Factor Change If you are using a logistic regression model, the “slope coefficient” estimates cannot be read as slopes because of the log-odds (logit) transformation (the slope is not constant). To adjust for this, we use “factor change coefficients,” which must be calculated. The formula is: Factor change for a positive relationship = e^beta. Factor change for a negative relationship = 1/e^beta. The factor change indicates how many times more (positive relationship) or less (negative relationship) likely you are to observe a one level increase in the dependent variable. For binary logistic regression, a one level increase means observing a “1” instead of a “0.”

Example Controlling for all other variables in the model, being male has a statistically significant positive relationship with serving a mission. On average, being male increases the likelihood of serving a mission by a factor of 54 (z=3.45, p=0.001).
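The factor change calculation can be sketched as follows. The function name and the beta of 0.85 are hypothetical. Strictly speaking, e^beta is the multiplicative change in the odds of observing a “1,” which is how “more likely” should be read here.

```python
import math


def factor_change(beta):
    """Return (factor, direction) for a logistic regression coefficient."""
    if beta >= 0:
        return math.exp(beta), "more"
    # Negative relationship: report how many times LESS likely
    return 1 / math.exp(beta), "less"


# Hypothetical coefficients of equal size and opposite sign
positive = factor_change(0.85)
negative = factor_change(-0.85)

# A reported factor change of 54 corresponds to beta = ln(54)
implied_beta = math.log(54)
print(positive, negative, round(implied_beta, 2))
```

A factor of 54, as in the mission example, implies a coefficient of roughly 3.99 in the regression output.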

Hints on modeling

Does X correlate with Y given Z? Identify a cause (X) and effect (Y) relationship. Identify variables that might affect BOTH X AND Y. These are your Z variables. Include the X and Z variables in your model predicting Y. Look at the p-value for X. If p<0.05, then X correlates with Y even controlling for Z. Interpreting Z is ancillary but can be instructive. R-squared indicates how much of Y is explained by all variables in the model.

Examples Does marriage predict homeownership, even controlling for age and income? Does race predict education, even controlling for income, location, and socioeconomic status? Does our intervention predict outcomes, even controlling for the beginning state of participants?
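To see why Z belongs in the model, consider this minimal sketch with made-up data in which Y depends only on Z, while X merely tracks Z. The `ols` helper is an illustrative least-squares solver written for this example and is not a library function. It returns coefficients only, so the p-values discussed above would still come from your statistical software.

```python
def ols(rows, y):
    """Least-squares coefficients [intercept, b_1, b_2, ...] via normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    n = len(X)
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data: y is driven entirely by z; x is correlated with z
z = [1, 2, 3, 4, 5, 6]
x = [1, 3, 2, 5, 4, 6]
y = [2 * zi + 1 for zi in z]

without_control = ols([(xi,) for xi in x], y)   # Y on X alone
with_control = ols(list(zip(x, z)), y)          # Y on X and Z

print("slope on X alone:      ", round(without_control[1], 3))
print("slope on X, given Z:   ", round(with_control[1], 3))
```

X appears to matter when it is the only variable in the model, but its coefficient falls to essentially zero once Z is included.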

Which X factors impact Y? Identify several competing factors (X) that may impact Y. Include the X variables in your model predicting Y. Look at the p-value for each X. If p<0.05, then that X correlates with Y. Look at the slope (beta) coefficient for each statistically significant X. The largest coefficients indicate the largest change in Y for a unit change in X. Negative values indicate negative relationships. R-squared indicates how much variation in Y is explained by this set of X factors.

Examples Which of several factors had the greatest impact on citizen or client satisfaction? Which of several factors has the most impact on program success?
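The screening steps above can be sketched as a small helper that keeps only the statistically significant X variables and orders them by the size of their coefficients. The first two rows reuse the betas and p-values from the student weight examples; the row for mother's build uses made-up numbers, since the deck reports only that it was not significant.

```python
def rank_significant(results, alpha=0.05):
    """Keep X variables with p < alpha, largest absolute beta first."""
    significant = [r for r in results if r["p"] < alpha]
    return sorted(significant, key=lambda r: abs(r["beta"]), reverse=True)


results = [
    {"name": "fast_food_meals_per_week", "beta": 12.26, "p": 0.0174},
    {"name": "female", "beta": -14.13, "p": 0.0212},
    {"name": "mothers_build", "beta": 1.50, "p": 0.40},  # hypothetical values
]

ranked = rank_significant(results)
for r in ranked:
    direction = "positive" if r["beta"] > 0 else "negative"
    print(f'{r["name"]}: {r["beta"]:+.2f} ({direction}, p={r["p"]})')
```

The ranking compares change in Y per unit of each X, so it depends on the units in which each X was measured.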

Predicting Y Identify a variable you would like to predict (Y) when it is unknown. Identify variables that are known and could be used to predict Y. These are your X variables. The goal of this type of analysis is to get a high R-squared, which means you are doing a good job of predicting the Y values you observed in the past, based on data you could expect to have on hand in the future. Include as many X variables as possible, reasonable, and theoretically justifiable. Do not worry about multicollinearity. A “good” model is one with a high R-squared value (as close as possible to 1). Use the formula form of your regression results to predict Y in the future.

Examples What formula could

Predicting post-test values Use the final outcome or post-test value from a paired value set as the dependent (Y) variable. Include the pre-test value from a paired value set as an independent variable (Z). Include, if available, a measure of the intervention (X). Include control variables (Z) as desired. There will generally be high correlation between the pre-test and the post-test. Look at the intervention variable to see if it is also significant (p<0.05). If so, the intervention had an impact on post-test values.

Communicating Statistics

In General Know your audience. Use their language. Know your audience. Focus on results they can do something about. Know your audience. Predict what questions they will ask and answer them. Know your audience. Provide visual and written communication they will want to pass on to others.

In General Statistics are BACKGROUND information to help you determine what to report. Statistics are SUPPORTING information to help you justify your conclusions. Statistics are NOT THE MAIN POINT. Start and end every paragraph with the real-world findings and conclusions. Limit what numbers are included in the sentences (rather than in parentheses, tables, or charts).

In General Report every piece of information necessary for having a complete understanding of the data and findings. Include enough information that somebody else could literally REPLICATE your work if the only thing they had to go on was your report. Use appendices, footnotes, parentheses, and other tools to keep the non-substantive information out of the write-up but in the report.

In General Organize your paper clearly using subheadings. Make the introduction and conclusion self-contained. Professionally format and title all tables and charts. Format the report so that it is visually pleasing. Write your report using the smallest, most common words that are appropriate for the topic.

Introduction and key question(s) In the first paragraph of your report, use a phrase like “The purpose of this study is to…” In the first paragraph of your report, describe why your study is important or relevant, including what the reader of your report should be able to DO with the information. If you are interested in multiple variables or questions, use bulleted or numbered lists to itemize the questions. Put them in order of importance and follow the same order later in the paper.

Data and methods Where the data comes from. How it was collected. When it was collected. What the population is that you hope to extrapolate to. What the sampling frame is. How the sample was selected. Response rate (if appropriate). Data collection instrument(s) including forms, input procedures, survey instruments, etc. Detailed description of the creation of any indexes. Descriptive information about any variable that shows up in a sentence like “the purpose of this study is to…” The names of any statistical tests used in the report: “In this report, we use [THIS MANY] analytical techniques. These include…”

Data and methods Consider using a chronological approach to describing data collection and sampling methodology. Provide information on levels of measurement for key variables. Consider providing tables with descriptive information (measures of central tendency, measures of dispersion, confidence intervals). In general, report confidence intervals, not just descriptive statistics. Be careful to specify the population you are extrapolating to. It is appropriate to acknowledge limitations in the data and methods in the data/methods section, or to refer the reader to the limitations of the study section of the paper.
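For the confidence intervals mentioned above, a rough sketch for the mean of one variable follows. It assumes a normal approximation (for small samples a t-based interval from your statistical software would be more appropriate), and the scores are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(data, level=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    m = mean(data)
    standard_error = stdev(data) / sqrt(len(data))
    # Two-sided critical value, about 1.96 for a 95% interval
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * standard_error
    return m - margin, m + margin


# Hypothetical sample of five scores
scores = [10, 12, 14, 16, 18]
low, high = mean_confidence_interval(scores)
print(f"mean = {mean(scores)}, 95% CI = ({low:.2f}, {high:.2f})")
```

Reporting the interval alongside the mean tells the reader how far the population value could plausibly be from the sample estimate.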

Findings Restate the purpose of the study. State what you found regarding the major question(s) on which the study is based. Go into more detail, building a case for the findings you just stated. Start and end every paragraph with the real-world meaning of your findings, rather than the statistics behind these findings. It is ok to group “non-findings” together in a single paragraph.

Findings Provide information about “how much,” but only provide information on “how much” when there is a statistically significant finding. For statistical information that does not answer the question “how much,” provide the relevant statistics and numbers in parentheses. In general, report the “how much” finding and, in parentheses, the test statistic, p-value, and name of the statistical test.
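One way to keep this reporting pattern consistent is a small formatting helper, sketched below. The function name and the output layout are assumptions, and the test name “multiple linear regression” is assumed for illustration; the claim and numbers reuse the fast food example from earlier in the deck.

```python
def format_finding(claim, stat_name, stat_value, p_value, test_name):
    """Real-world finding first, supporting statistics in parentheses."""
    return f"{claim} ({test_name}: {stat_name}={stat_value:.2f}, p={p_value:.4f})."


sentence = format_finding(
    claim=("For a one meal per week increase in eating fast food, "
           "student weight increases by 12.26 pounds"),
    stat_name="t",
    stat_value=2.11,
    p_value=0.0174,
    test_name="multiple linear regression",  # assumed test name
)
print(sentence)
```

The “how much” number stays in the sentence, and everything else moves into the parentheses.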

Findings Provide tables for multiple regression analysis and other number-heavy analytic techniques. For multiple regression, always include information about what variables were included in the model. Where appropriate, provide statistics about the overall validity or usefulness of the model (e.g. R-squared or AIC values).

Limitations of study Go through the data, methods, and findings sections of the paper and make notes about: issues with the data; issues with the methodology; weaknesses in or questions about the conclusions; research designs or changes that would have been better. Organize and write these notes into a section about the limitations of the study.

Conclusions/implications Suggest a course of action. Be very conservative in your conclusions and recommended actions, particularly in terms of managerial or policy actions to be taken. Acknowledge the limitations of the study and how they should temper your recommendations. Make strong suggestions about future analysis and data collection.