Lecture note on statistics, data analysis planning – week 14 Elspeth Slayter, M.S.W., Ph.D.
Today’s class
  Administrative matters & check-in
  Review, worksheet
  Take-home message: Match stats to question
  2 more types of tests
  Quantitative data analysis planning
  Group consultations
Assignment 3
  Introduction
    RQ for the literature
  Literature review
    Starts with annotated bibliography
  Recommendation
    What you pulled from literature
  Proposed evaluation plan
    RQ for the evaluation
    Sub-sections
4 basic types of statistical tests:
  Description: Mean, standard deviation; Median, Mode; Percentage, frequency
  Correlation: Pearson’s correlation
  Comparison: Student’s t tests; Chi-square tests; ANOVA; Odds ratios
  Prediction: OLS regression; Logit regression
What you need to know:
  I. Purpose of test
    Match question to test
  II. Logistics for test
    Structure of variable needed for test
      ○ Continuous
      ○ Nominal
    Number of variables needed for test
  III. Interpret test findings
    Basic knowledge of the “squigglies” in reporting statistical results
Review: Variable structure Continuous (a.k.a. numeric) Examples? Nominal (a.k.a. dummy variable, categorical variables) Examples?
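To make the continuous/nominal distinction concrete, here is a minimal Python sketch. The variables (age, housing status) and all values are invented for illustration; they are not from the lecture or the worksheet.

```python
# Hypothetical survey answers; every value here is made up.
ages = [23, 35, 41, 29]                      # continuous: numbers you can average
housing = ["rent", "own", "rent", "other"]   # nominal: categories with no order


def dummy_code(values, category):
    """Return 1 where the answer equals `category`, otherwise 0."""
    return [1 if value == category else 0 for value in values]


rents = dummy_code(housing, "rent")       # a dummy variable: 1 = rents, 0 = does not
mean_age = sum(ages) / len(ages)          # a mean makes sense for continuous data
share_renting = sum(rents) / len(rents)   # the mean of a 0/1 dummy is a proportion
```

Note that averaging the raw housing labels would be meaningless, while averaging the 0/1 dummy gives the share of respondents who rent.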
Breathe, people.
What, exactly, is regression?
  Researchy-language-lite: A statistical procedure used to find relationships among a set of variables
  Vernacular language:
    How well do all of these independent variables explain the variation in the dependent variable?
    How well does this “model” explain the outcome of interest?
Ordinary least squares (OLS) regression
  Vernacular language: There is a dependent variable, which is the one you are trying to explain, and one or more independent variables that are related to it
  Researchy-language: Tests a ‘model’ of how a group of independent variables explains variation in the dependent variable
  Requirements:
    Dependent variable is continuous
    Certain conditions are met among independent variables
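Two major factors to assess for/consider before running an OLS regression
  Multicollinearity: Occurs when two or more of your independent variables are strongly related to one another, which makes it hard to separate their individual effects
  Omitted variables: If independent variables that have significant relationships with the dependent variable are left out of the model, the model explains less than it would with them included, and the coefficients of the variables that remain can be biased (when an omitted variable is also related to an included one)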
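One simple way to screen for multicollinearity is to correlate the independent variables with each other before running the regression. The sketch below does this for one pair. The data are invented, and the 0.8 cutoff is a rule of thumb assumed here, not a value given in the lecture.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(spread_x * spread_y)


# Two candidate independent variables (hypothetical values).
years_of_education = [10, 12, 12, 14, 16, 18]
income_thousands = [22, 30, 28, 41, 52, 60]

r = pearson_r(years_of_education, income_thousands)

# Assumed rule of thumb: flag pairs whose correlation is stronger than 0.8.
possible_multicollinearity = abs(r) > 0.8
```

If a pair is flagged, one common response is to keep only one of the two variables or to combine them, depending on what the research question needs.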
Some relationships are perfectly linear. Your cell phone bill, for instance, may be: Total Charges = Base Fee + 30¢ × (overage minutes). If you know the base fee and the number of overage minutes, you can predict the total charges exactly, so the error term ε is zero.
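The phone bill can be written as a tiny function to show what "predict exactly" means. The $40.00 base fee and the 12 overage minutes are hypothetical numbers chosen for the example; amounts are kept in cents to avoid rounding problems.

```python
OVERAGE_RATE_CENTS = 30  # 30 cents per overage minute, from the slide


def total_charges_cents(base_fee_cents, overage_minutes):
    """Total bill in cents: base fee plus 30 cents for every overage minute."""
    return base_fee_cents + OVERAGE_RATE_CENTS * overage_minutes


# Hypothetical plan: $40.00 base fee, 12 overage minutes this month.
bill = total_charges_cents(4000, 12)   # 4000 + 30 * 12 = 4360 cents ($43.60)
```

Every customer with the same base fee and overage minutes gets the same bill, which is why nothing is left over for an error term to explain.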
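Other relationships are not so linear. Weight is partly a function of height, but there are variations in weight that height does not explain. If you take a sample of actual heights and weights and plot them, you would not expect the points to fall exactly on a line. [Graph of sample heights and weights not reproduced]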
Examples with one CONTINUOUS dependent variable, 2+ independent variables OF ANY TYPE
  Weight = Height + X1 + X2… + ε
  What factors are known to impact weight?
    Separate relationships with weight
  What is a “model” for explaining weight?
  This is multiple regression analysis
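For the curious, here is a minimal sketch of what the software does when it fits a multiple regression: it finds the coefficients that make the squared prediction errors as small as possible. The second predictor (weekly exercise hours) and all the data are assumptions made up for this example. The weights were built without any error term so the answer can be checked by hand; real data would include ε and the fit would only be approximate.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column (partial pivoting).
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients; a leading 1 in each row gives the intercept."""
    x = [[1.0] + list(row) for row in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)   # the "normal equations"


# Hypothetical data: (height in inches, weekly exercise hours) for six people.
predictors = [(60, 1), (62, 5), (65, 2), (68, 6), (70, 0), (72, 4)]
# Weights built as -100 + 4 * height - 2 * exercise, with no error term.
weights = [138, 138, 156, 160, 180, 180]

intercept, b_height, b_exercise = ols(predictors, weights)
```

Read the result as: holding exercise constant, each additional inch of height goes with about 4 more pounds; holding height constant, each additional hour of exercise goes with about 2 fewer pounds.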
Interpreting “coefficients” in OLS regression analysis
  The coefficient for each independent variable shows how much a one-unit increase in its value will change the dependent variable, holding all other independent variables constant.
  The p-value is a probability (often read as a percentage). It tells you how likely you would be to see a coefficient at least this far from zero if there were really no relationship between that independent variable and the dependent variable. A small p-value (conventionally below .05) suggests the coefficient is unlikely to be due to chance alone.
Interpret: OLS regression results
  Standardized coefficient is what to focus on
    Interpret like a Pearson’s correlation
    Association with dependent variable
  Look at R²
Logistic regression (a.k.a. logit)
  Vernacular language: There is a dependent variable, which is the one you are trying to explain, and one or more independent variables that are related to it
  Researchy-language: ‘Models’ how a set of independent variables function as collective predictors of the dependent variable
  Requirements:
    Dependent variable is nominal (2 values only)
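Here is a bare-bones sketch of logistic regression with a single yes/no independent variable. It uses plain gradient ascent, which is a simple teaching method assumed here; statistical packages typically use faster fitting routines. The data (20 invented clients, half of whom received a hypothetical service) are made up for illustration.

```python
from math import exp


def sigmoid(z):
    """Turn log-odds into a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))


def fit_logit(x, y, rate=0.5, steps=5000):
    """Fit intercept b0 and slope b1 by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(y)
    for _ in range(steps):
        errors = [yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)]
        b0 += rate * sum(errors) / n
        b1 += rate * sum(e * xi for e, xi in zip(errors, x)) / n
    return b0, b1


# Hypothetical data: x = 1 if the client received the service, else 0;
# y = 1 if the outcome occurred, else 0.
x = [0] * 10 + [1] * 10
y = [1] * 2 + [0] * 8 + [1] * 6 + [0] * 4

b0, b1 = fit_logit(x, y)
odds_ratio = exp(b1)   # exponentiating a logit coefficient gives an odds ratio
```

With one yes/no predictor the result can be checked by hand: the odds of the outcome are 6/4 in the service group and 2/8 in the other group, so the odds ratio is 1.5 / 0.25 = 6, which is what the fitted model converges to.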
How ‘Good’ is the ‘Model?’
  OLS Regression
    How well the model explains the data is measured via R²
    Tells you the percent of the differences (variance) in the dependent variable explained by the model
    R² = .68 means:
      68% of the variance in the dependent variable is explained by the model
      32% of those differences remain unexplained
  Logit/Logistic Regression
    Nagelkerke R²
    Cox & Snell R²
    Same idea, different formulae used to get it
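For OLS, R² can be worked out by hand as 1 minus (unexplained variation divided by total variation). A minimal sketch, using four invented outcome values and four invented model predictions:

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` that the predictions account for."""
    mean_y = sum(actual) / len(actual)
    total = sum((y - mean_y) ** 2 for y in actual)                   # total variation
    unexplained = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1 - unexplained / total


# Hypothetical observed outcomes and the values a model predicted for them.
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]

r2 = r_squared(actual, predicted)   # total = 500, unexplained = 26, so 1 - 26/500
```

Here the model accounts for about 95% of the variation in the outcome and leaves about 5% unexplained.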
Interpret: Logistic Regression results
  Interpret the power of the model
    Nagelkerke R² = 0.33
  Separate from reporting of odds ratios with lots of control variables
…put your quantitative hat on…
Moving in on the questions – but what to do w/the answers?
  Developing a data analysis plan ahead of time is crucial to conducting good research
  Relates to research design
    Cross-sectional, longitudinal?
  Relates to how you ask questions:
    Topic, question structure, answer design
Introduction to table shells:
  What is a table shell?
    Tables with no data in them – and a process
    Take each question and think about how you will use it
  Use table shells to:
    Clarify your thinking pre-data collection
    Prepare for your data’s arrival
    Get a jump on preparing for data interpretation, writing your final report
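A table shell can be drafted in a word processor, but as a sketch, here is one way to generate an empty shell in Python. The row and column labels are hypothetical examples, and the function name `table_shell` is made up for this illustration.

```python
def table_shell(title, row_labels, column_labels):
    """Return a plain-text table whose data cells are blank placeholders."""
    first = max(len(label) for label in row_labels) + 2
    header = " " * first + "".join(c.ljust(len(c) + 2) for c in column_labels)
    lines = [title, header.rstrip()]
    for label in row_labels:
        # "__" marks a cell to be filled in once the data arrive.
        cells = "".join("__".ljust(len(c) + 2) for c in column_labels)
        lines.append((label.ljust(first) + cells).rstrip())
    return "\n".join(lines)


shell = table_shell(
    "Table 1. Participant characteristics by group",
    ["Age (mean)", "Employed (%)"],
    ["Program group", "Comparison group", "Test result"],
)
print(shell)
```

Writing the shell first forces a decision about which groups and which variables will appear in the final report before any data are collected.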
Thinking about groupings, or not
  Table for overall data
    Frequencies, means
  Table for comparative data
    Frequencies, means and test results
    Two groups (Chi-square, t-test, odds ratio, etc.)
      ○ Choose test based on variable structure
    Three groups or more (ANOVA, etc.)
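The "choose test based on variable structure" rule can be sketched as a small lookup. The function name `suggest_test` is hypothetical. The pairing of a nominal outcome with three or more groups to chi-square is a standard choice added here as an assumption; the slide itself lists only ANOVA for that case.

```python
def suggest_test(outcome_structure, n_groups):
    """Suggest a comparison test from the outcome's structure and group count."""
    if outcome_structure not in ("continuous", "nominal"):
        raise ValueError("outcome_structure must be 'continuous' or 'nominal'")
    if n_groups < 2:
        raise ValueError("a comparison needs at least two groups")
    if outcome_structure == "continuous":
        # Comparing means: t-test for two groups, ANOVA for three or more.
        return "t-test" if n_groups == 2 else "ANOVA"
    # Nominal outcome: comparing frequencies or percentages across groups.
    return "chi-square or odds ratio" if n_groups == 2 else "chi-square"


first_choice = suggest_test("continuous", 2)    # comparing a mean across 2 groups
second_choice = suggest_test("continuous", 3)   # comparing a mean across 3 groups
```

This is only a first-pass guide; the final choice also depends on the study design and on whether each test's assumptions hold.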
Table shells – your examples What results do you envision? Are you comparing groups? What groups are you comparing? On what variables?
Time to slice and dice:
Data analysis basics:
  First – run frequencies on all
    Conduct data cleaning for data entry errors
    See what the emerging “story” is
  Second – run means on all data measured continuously
    See what the emerging story is
  Third – run correlations per plan
  Fourth – run comparisons per plan
  Fifth – look at additional unexpected analysis possibilities
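The first two steps (frequencies, then cleaning, then means) can be sketched with Python's standard library. The four records below are invented, and the 0 to 120 plausibility range for age is an assumption chosen for the example.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey records, including two typical data entry problems.
records = [
    {"gender": "F", "age": 34},
    {"gender": "M", "age": 29},
    {"gender": "F", "age": 340},   # probably a typo for 34 or 40
    {"gender": "f", "age": 41},    # inconsistent capitalization
]

# First: run frequencies. Odd categories such as "f" show up right away.
gender_counts = Counter(r["gender"] for r in records)

# Data cleaning: standardize the coding and set aside implausible ages.
cleaned = [
    {"gender": r["gender"].upper(), "age": r["age"]}
    for r in records
    if 0 <= r["age"] <= 120   # assumed plausible range
]

# Second: run means on the continuous variable, using the cleaned data.
mean_age = mean(r["age"] for r in cleaned)
```

In practice a suspicious value would be checked against the original questionnaire rather than simply dropped; it is excluded here only to keep the example short.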