1
Running models and Communicating Statistics
EMPA PMGT 630 Statistical Analysis
Eva Witesman, Ph.D.
2
Types of Questions for Analysis
Bivariate analysis:
Does X correlate with Y?
Do the post-test values differ from the pre-test values?
Is the distribution of Y, given X, different from what we would expect based on randomness?
Multivariate analysis:
Does X correlate with Y even in the presence of Z?
Which of several X factors has the greatest impact on Y?
Predicting Y using a variety of X variables
Predicting post-test values using pre-test values and other factors
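As a minimal illustration of the first bivariate question, the sketch below computes Pearson's correlation coefficient r in plain Python. The function name `pearson_r` and the data are hypothetical. The coefficient r measures only the strength and direction of a linear relationship; the p-value needed to call it statistically significant would come from your statistical software.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Cross-products and sums of squares around the means.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: fast-food meals per week (x) and weight in pounds (y).
meals = [0, 1, 2, 3, 4, 5]
weight = [140, 150, 148, 160, 165, 170]
print(round(pearson_r(meals, weight), 3))
```

Values of r near +1 or -1 indicate a strong linear relationship; values near 0 indicate little linear relationship.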
3
Dependent variables
Dependent variables are the “Y,” outcome, or effect variables. The dependent variable is generally the phenomenon we are interested in describing, predicting, or explaining. We can have only one dependent variable per model. If your dependent variable construct has more than one measure, you may consider creating an index or running more than one model.
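One common way to build an index, assumed here rather than prescribed by these slides, is to standardize each measure (convert it to z-scores) and average the standardized values for each case, so that measures on different scales count equally. The function names `z_scores` and `build_index` and the survey data are hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize a list: subtract the mean, divide by the sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def build_index(*measures):
    """Average several standardized measures into one index value per case.

    Each argument is a list of scores for the same cases, in the same order.
    """
    standardized = [z_scores(m) for m in measures]
    return [mean(case) for case in zip(*standardized)]

# Hypothetical: three survey items measuring client satisfaction, five clients.
item1 = [4, 5, 3, 2, 5]
item2 = [3, 5, 4, 2, 4]
item3 = [40, 50, 30, 20, 45]   # measured on a different (0-50) scale
satisfaction = build_index(item1, item2, item3)
print([round(v, 2) for v in satisfaction])
```

The resulting index can then serve as the single dependent variable in one model.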
4
Independent variables
Independent variables are the “X” or cause variables that are used to describe, predict, or explain the dependent variable. “Research variables” are independent variables we are explicitly interested in. “Control variables” are independent variables we include in the model to rule out confounding influences or alternative explanations. The distinction between research variables and control variables is entirely theoretical: the model estimates both kinds in exactly the same way.
5
Coefficient values
Beta coefficients, coefficients, estimates, parameter estimates, etc. are all terms for the value that estimates the effect of a specific X on Y. This value should be interpreted only if the relationship between X and Y is statistically significant (p < 0.05). The interpretation of “beta” is that for a one-unit increase in the independent variable, the dependent variable changes by [the value for beta] units, holding all other variables constant. The units are whatever units each variable was measured in within your data. The beta value is also the slope of the line relating values of X to values of Y, holding all other variables constant.
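The one-predictor sketch below, with hypothetical data and a hypothetical `simple_regression` helper, shows the interpretation of beta: moving X up by one unit moves the predicted Y by exactly beta. With several predictors, statistical software estimates each beta while holding the others constant; this sketch does not attempt that.

```python
def simple_regression(x, y):
    """Intercept and slope (beta) of the least-squares line for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    beta = (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))
    intercept = my - beta * mx
    return intercept, beta

# Hypothetical data: fast-food meals per week (x) and weight in pounds (y).
meals = [0, 1, 2, 3, 4, 5]
weight = [140, 150, 148, 160, 165, 170]
b0, b1 = simple_regression(meals, weight)

# A one-unit increase in x changes the predicted y by exactly beta units.
pred_at_2 = b0 + b1 * 2
pred_at_3 = b0 + b1 * 3
print(f"beta = {b1:.2f} pounds per meal; change from 2 to 3 meals = {pred_at_3 - pred_at_2:.2f}")
```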
6
Examples
Mother’s build has no statistically significant relationship with student weight (α=0.05), all other variables being held constant.
Eating fast food has a statistically significant positive relationship with student weight. For a one meal per week increase in eating fast food, student weight increases by pounds, all other variables being held constant (t=2.11, p=.0174).
Being female has a statistically significant negative relationship with weight. On average, being female is associated with being pounds lighter, all other variables being held constant (t=2.03, p=.0212).
7
R-squared
The R-squared value (reported for linear regression models only) is an estimate of how much of the variation in Y is explained by the independent variables in your model. The adjusted R-squared value is an estimate of how effectively the variation in Y is explained by the independent variables in your model, given the size of the model. R-squared never decreases when a variable is added, but the adjusted R-squared penalizes extra variables, so large models with lots of useless variables will have lower adjusted R-squared values. Pseudo-R-squared values are available for some nonlinear models, such as logistic regression.
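A minimal sketch of both quantities, using the standard formulas R-squared = 1 - SS_residual / SS_total and adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - k - 1), where n is the number of cases and k is the number of independent variables. The observed and fitted values are hypothetical and are not taken from an actual model fit.

```python
def r_squared(y, y_hat):
    """Share of the variation in y explained by the fitted values y_hat."""
    my = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))   # unexplained variation
    ss_tot = sum((a - my) ** 2 for a in y)                 # total variation
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, k):
    """Penalize R-squared for model size: n cases, k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical observed and fitted values from a model with k = 2 predictors.
observed = [3.0, 5.0, 7.0, 6.0, 9.0, 11.0]
fitted = [3.5, 4.5, 6.5, 6.5, 9.5, 10.5]
r2 = r_squared(observed, fitted)
print(round(r2, 3), round(adjusted_r_squared(r2, n=6, k=2), 3))
```

Holding R-squared fixed, raising k lowers the adjusted value, which is how the penalty for model size works.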
8
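The formula
The formula for predicting Y based on your linear regression model is:
$$\hat{Y} = \text{intercept} + \text{slope}_1 X_1 + \text{slope}_2 X_2 + \cdots + \text{slope}_n X_n$$
where $\hat{Y}$ is the predicted value of Y and $X_1, \ldots, X_n$ are the values of the independent variables. Include all variables (and the intercept) from the model, even if they are not statistically significant. Note that this applies only to linear regression, not to logistic regression, where the same combination of terms gives the predicted log-odds of the outcome rather than a predicted value of Y.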
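The formula translates directly into a small function. The name `predict` and the coefficient values are hypothetical; in practice the intercept and slopes come from your regression output, listed in the same order as the X values.

```python
def predict(intercept, slopes, x_values):
    """Predicted Y = intercept + slope_1*X_1 + ... + slope_n*X_n.

    slopes and x_values must list the variables in the same order and should
    include every variable in the model, significant or not.
    """
    if len(slopes) != len(x_values):
        raise ValueError("need one X value per slope")
    return intercept + sum(b * x for b, x in zip(slopes, x_values))

# Hypothetical model: intercept 120, slopes for meals/week and exercise hours/week.
print(predict(120.0, [4.0, -1.5], [3, 2]))   # → 129.0
```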
9
Factor Change
If you are using a logistic regression model, the slope coefficient estimates are hard to interpret directly: they are on a log-odds scale, so the effect of X on the probability of Y is not constant. To adjust for this, we use “factor change coefficients,” which must be calculated from the estimates. The formulas are:
Factor change for a positive relationship = $e^{\beta}$
Factor change for a negative relationship = $1/e^{\beta}$
The factor change indicates how many times greater (positive relationship) or smaller (negative relationship) the odds are of observing a one-level increase in the dependent variable, for each one-unit increase in X. For binary logistic regression, a one-level increase means observing a “1” instead of a “0.”
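The sketch below computes the factor change for a coefficient and, as a check, confirms that e^beta equals the ratio of the odds at X + 1 to the odds at X in a one-predictor logistic model. The function names `factor_change` and `odds` and the coefficients -1.0 and 0.7 are hypothetical.

```python
from math import exp, log

def factor_change(beta):
    """Factor change in the odds for a one-unit increase in X.

    Positive beta: the odds become e**beta times greater.
    Negative beta: the odds become 1 / e**beta times smaller.
    """
    return exp(beta) if beta >= 0 else 1 / exp(beta)

def odds(b0, b1, x):
    """Odds p / (1 - p) from a one-predictor logistic model."""
    p = 1 / (1 + exp(-(b0 + b1 * x)))
    return p / (1 - p)

# Hypothetical coefficients: intercept -1.0, slope 0.7.
b0, b1 = -1.0, 0.7
ratio = odds(b0, b1, 3) / odds(b0, b1, 2)
print(round(ratio, 4), round(factor_change(b1), 4))   # both equal e**0.7

# Working backward: a factor change of 54 corresponds to beta = ln(54).
print(round(log(54), 2))
```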
10
Example
Controlling for all other variables in the model, being male has a statistically significant positive relationship with serving a mission. On average, being male multiplies the odds of serving a mission by a factor of 54 (z=3.45, p=0.001).
11
Hints on modeling
12
Does X correlate with Y given Z?
Identify a cause (X) and effect (Y) relationship. Identify variables that might affect BOTH X AND Y. These are your Z variables. Include the X and Z variables in your model predicting Y. Look at the p-value for X. If p<0.05, then X correlates with Y even when controlling for Z. Interpreting Z is ancillary but can be instructive. R-squared indicates how much of the variation in Y is explained by all variables in the model.
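The hypothetical data below are built so that Z drives both X and Y while X has no effect of its own. A bare-bones least-squares helper (`ols`, written here for illustration, not a library function) shows the X coefficient dropping to zero once Z is in the model. The sketch estimates coefficients only; the p-values mentioned above require standard errors, which statistical software reports.

```python
def ols(rows, y):
    """Least-squares coefficients for y given design rows (first column = 1)."""
    k = len(rows[0])
    # Augmented normal-equation matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                f = a[row][col] / a[col][col]
                a[row] = [v - f * p for v, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Hypothetical data: Z drives both X and Y; X has no effect of its own.
z = [1, 2, 3, 4, 5, 6]
x = [zi + e for zi, e in zip(z, [0.5, -0.5, 0.5, -0.5, 0.5, -0.5])]
y = [1 + 2 * zi for zi in z]

without_z = ols([[1, xi] for xi in x], y)                # [intercept, X]
with_z = ols([[1, xi, zi] for xi, zi in zip(x, z)], y)   # [intercept, X, Z]
print("X slope ignoring Z:", round(without_z[1], 3))
print("X slope controlling for Z:", round(with_z[1], 3))
```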
13
Examples
Does marriage predict homeownership, even controlling for age and income?
Does race predict education, even controlling for income, location, and socioeconomic status?
Does our intervention predict outcomes, even controlling for the beginning state of participants?
14
Which X factors impact Y?
Identify several competing factors (X) that may impact Y. Include the X variables in your model predicting Y. Look at the p-value for each X. If p<0.05, then that X correlates with Y. Look at the slope (beta) coefficient for each statistically significant X. The largest coefficients indicate the largest change in Y for a one-unit change in X; compare their sizes directly only when the X variables are measured in comparable units. Negative values indicate negative relationships. R-squared indicates how much of the variation in Y is explained by this set of X factors.
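The selection steps above can be written as a short filter-and-sort over regression output. The function name `rank_factors` and the variable names, coefficients, and p-values are hypothetical. Because these X variables are measured in different units, the ranking by raw beta is only a rough guide.

```python
# Hypothetical regression output: variable -> (beta, p-value).
results = {
    "wait time (minutes)": (-0.40, 0.001),
    "staff courtesy (1-5 scale)": (2.10, 0.020),
    "office hours per week": (0.05, 0.470),
    "online services (1 = yes)": (1.30, 0.030),
}

def rank_factors(results, alpha=0.05):
    """Keep variables with p < alpha, ordered by the size of |beta|."""
    significant = {name: b for name, (b, p) in results.items() if p < alpha}
    return sorted(significant.items(), key=lambda item: abs(item[1]), reverse=True)

for name, beta in rank_factors(results):
    direction = "positive" if beta > 0 else "negative"
    print(f"{name}: beta = {beta:+.2f} ({direction})")
```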
15
Examples
Which of several factors has the greatest impact on citizen or client satisfaction?
Which of several factors has the most impact on program success?
16
Predicting Y
Identify a variable you would like to predict (Y) when it is unknown. Identify variables that will be known and could be used to predict Y. These are your X variables. The goal of this type of analysis is to get a high R-squared, which means you are doing a good job of predicting the Y values you observed in the past, based on data you could expect to have on hand in the future. Include as many X variables as possible, reasonable, and theoretically justifiable. Do not worry about multicollinearity. A “good” model is one with a high R-squared value (as close as possible to 1). Use the formula form of your regression results to predict Y in the future.
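A minimal sketch of this prediction workflow: fit the model on past data, check R-squared, then apply the fitted formula to a new case whose X values are known. The `ols` and `r_squared` helpers and all data values are hypothetical illustrations, not a library API.

```python
def ols(rows, y):
    """Least-squares coefficients for y given design rows (first column = 1)."""
    k = len(rows[0])
    # Augmented normal-equation matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                f = a[row][col] / a[col][col]
                a[row] = [v - f * p for v, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def r_squared(y, fitted):
    """Share of the variation in y explained by the fitted values."""
    my = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, fitted))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1 - ss_res / ss_tot

# Hypothetical past data: two predictors known in advance, and the outcome Y.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 11.9]

rows = [[1, a, b] for a, b in zip(x1, x2)]   # leading 1 = intercept
coef = ols(rows, y)
fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
print("R-squared:", round(r_squared(y, fitted), 3))

# Predict Y for a new case where only the X values are known.
new_case = [1, 7, 8]
prediction = sum(c * v for c, v in zip(coef, new_case))
print("Predicted Y:", round(prediction, 2))
```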
18
Examples What formula could
19
Predicting post-test values
Use the final outcome or post-test value from a paired value set as the dependent (Y) variable. Include the pre-test value from a paired value set as an independent variable (Z). Include, if available, a measure of the intervention (X). Include control variables (Z) as desired. There will generally be high correlation between the pre-test and the post-test. Look at the intervention variable to see if it is also significant (p<0.05). If so, the intervention had an impact on post-test values.
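The sketch below fits post-test scores on pre-test scores and an intervention indicator using a hypothetical `ols` helper (written for illustration, not a library function). The data are generated exactly from post = 5 + 0.9 × pre + 4 × intervention with no noise, so the fit recovers those values; with real data you would also check the intervention's p-value in your statistical software.

```python
def ols(rows, y):
    """Least-squares coefficients for y given design rows (first column = 1)."""
    k = len(rows[0])
    # Augmented normal-equation matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                f = a[row][col] / a[col][col]
                a[row] = [v - f * p for v, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Hypothetical paired data: pre-test score, intervention (1 = participated).
pre = [50, 55, 60, 65, 70, 75]
treated = [0, 1, 0, 1, 0, 1]
post = [5 + 0.9 * p + 4 * t for p, t in zip(pre, treated)]

intercept, b_pre, b_treated = ols([[1, p, t] for p, t in zip(pre, treated)], post)
print(f"pre-test slope = {b_pre:.2f}, intervention effect = {b_treated:.2f} points")
```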
20
Communicating Statistics
21
In General
Know your audience. Use their language.
Know your audience. Focus on results they can do something about. Know your audience. Predict what questions they will ask and answer them. Know your audience. Provide visual and written communication they will want to pass on to others.
22
In General
Statistics are BACKGROUND information to help you determine what to report.
Statistics are SUPPORTING information to help you justify your conclusions.
Statistics are NOT THE MAIN POINT.
Start and end every paragraph with the real-world findings and conclusions.
Limit the numbers that appear in the sentences themselves; put the rest in parentheses, tables, or charts.
23
In General
Report every piece of information necessary for having a complete understanding of the data and findings.
Include enough information that somebody else could literally REPLICATE your work if the only thing they had to go on was your report.
Use appendices, footnotes, parentheses, and other tools to keep the non-substantive information out of the write-up but in the report.
24
In General
Organize your paper clearly using subheadings.
Make the introduction and conclusion self-contained.
Professionally format and title all tables and charts.
Format the report so that it is visually pleasing.
Write your report using the smallest, most common words that are appropriate for the topic.
25
Introduction and key question(s)
In the first paragraph of your report, use a phrase like “The purpose of this study is to…” In the first paragraph of your report, describe why your study is important or relevant, including what the reader of your report should be able to DO with the information. If you are interested in multiple variables or questions, use bulleted or numbered lists to itemize the questions. Put them in order of importance and follow the same order later in the paper.
26
Data and methods
Where the data comes from
How it was collected
When it was collected
What population you hope to extrapolate to
What the sampling frame is
How the sample was selected
Response rate (if appropriate)
Data collection instrument(s), including forms, input procedures, survey instruments, etc.
Detailed description of the creation of any indexes
Descriptive information about any variable that shows up in a sentence like “the purpose of this study is to…”
The names of any statistical tests used in the report: “In this report, we use [THIS MANY] analytical techniques. These include…”
27
Data and methods
Consider using a chronological approach to describing data collection and sampling methodology.
Provide information on levels of measurement for key variables.
Consider providing tables with descriptive information (measures of central tendency, measures of dispersion, confidence intervals).
In general, report confidence intervals, not just point estimates such as means and proportions.
Be careful to specify the population you are extrapolating to.
It is appropriate to acknowledge limitations in the data and methods in the data/methods section, or to refer the reader to the limitations of the study section of the paper.
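As an illustration of reporting an interval rather than only a mean, the sketch computes a 95% confidence interval for a mean. It assumes a normal approximation because the Python standard library has no t distribution; for a sample this small (hypothetical data, n = 12), the t-based interval from statistical software would be somewhat wider. The function name `mean_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_ci(values, level=0.95):
    """Normal-approximation confidence interval for a mean."""
    m = mean(values)
    se = stdev(values) / sqrt(len(values))      # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    return m - z * se, m + z * se

# Hypothetical satisfaction scores (1-10) from 12 respondents.
scores = [7, 8, 6, 9, 7, 8, 5, 7, 8, 9, 6, 7]
low, high = mean_ci(scores)
print(f"mean = {mean(scores):.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```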
28
Findings
Restate the purpose of the study.
State what you found regarding the major question(s) on which the study is based. Go into more detail, building a case for the findings you just stated. Start and end every paragraph with the real-world meaning of your findings, rather than the statistics behind these findings. It is ok to group “non-findings” together in a single paragraph.
29
Findings
Provide information about “how much,” but only when there is a statistically significant finding. For statistical information that does not answer the question “how much,” provide the relevant statistics and numbers in parentheses. In general, report the “how much” finding and, in parentheses, the test statistic, p-value, and name of the statistical test.
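The reporting rule above can be expressed as a small helper that uses the “how much” sentence only when p < alpha and always appends the test name, test statistic, and p-value in parentheses. The function `report` and the example wording and numbers are hypothetical.

```python
def report(finding, null_finding, test_name, stat_label, stat, p, alpha=0.05):
    """Pick the sentence to report and put the statistics in parentheses.

    finding: real-world "how much" sentence, used only when p < alpha.
    null_finding: sentence used when the result is not significant.
    """
    sentence = finding if p < alpha else null_finding
    return f"{sentence} ({test_name}, {stat_label} = {stat:.2f}, p = {p:.3f})."

# Hypothetical results.
print(report(
    "Clients who waited less than 10 minutes rated services 1.2 points higher on a 10-point scale",
    "Wait time was not related to client ratings",
    "t-test", "t", 2.45, 0.018))
```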
30
Findings
Provide tables for multiple regression analysis and other number-heavy analytic techniques.
For multiple regression, always include information about what variables were included in the model.
Where appropriate, provide statistics about the overall validity or usefulness of the model (e.g., R-squared or AIC values).
31
Limitations of study
Go through the data, methods, and findings sections of the paper and make notes about:
Issues with the data
Issues with the methodology
Weaknesses in or questions about the conclusions
Research designs or changes that would have been better
Organize and write these notes into a section about the limitations of the study.
32
Conclusions/implications
Suggest a course of action.
Be very conservative in your conclusions and recommended actions, particularly in terms of managerial or policy actions to be taken.
Acknowledge the limitations of the study and how they should temper your recommendations.
Make strong suggestions about future analysis and data collection.