Intervention Study: Kenya PRIMR Case Regression Analysis


1 Intervention Study: Kenya PRIMR Case Regression Analysis
March 2017. Susan Edwards, RTI International; Sarrynna Sou, RTI International

2 Overview
Linear Regression: Replicating T-Test Analysis; Replicating DiD Analysis; Controlling for Other Variables; Interpreting Estimates.
Logistic Regression: STATA Code (Similar to Linear Regression); Interpreting Odds Ratios.

3 Linear Regression Analysis
When would you want to use linear regression?
Used to model continuous data. Examples: Super Summary Variables: CLPM, ORF, etc.
Estimates averages.
General Form: $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \epsilon_i$
Model Assumptions: Linearity. Observed data are fixed constants. Errors are IID and $N(0, \sigma^2)$.

4 Linear Regression Analysis - Example
Recall: $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \epsilon_i$
orf = 75 + 2.7 I(female) - 3.7 (age)
Interpretations:
This model suggests that girls read on average 2.7 words per minute more than boys when controlling for student age.
Assuming age is a continuous variable, each additional year of age is associated with an average decrease of 3.7 words per minute when controlling for gender.
A 10-year-old male student will read on average 75 - 3.7*10 = 38 words per minute.
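The arithmetic in the last interpretation can be checked with a small Python sketch. The function name `predict_orf` is hypothetical, and the intercept of 75 and the female coefficient of 2.7 are read off the slide's own interpretations rather than from any model output.

```python
def predict_orf(female: int, age: float) -> float:
    """Predicted oral reading fluency (wpm) under the slide's example model.

    female: 1 for a girl, 0 for a boy (the reference level).
    age:    age in years, treated as continuous.
    """
    intercept = 75.0    # taken from the slide's worked example (75 - 3.7*10 = 38)
    coef_female = 2.7   # girls read 2.7 wpm more than boys, age held fixed
    coef_age = -3.7     # each extra year of age lowers the prediction by 3.7 wpm
    return intercept + coef_female * female + coef_age * age


# A 10-year-old boy, as on the slide, and a 10-year-old girl for comparison.
print(round(predict_orf(female=0, age=10), 1))  # 38.0
print(round(predict_orf(female=1, age=10), 1))  # 40.7
```

The two predictions differ by exactly the female coefficient, which is what "controlling for age" means here: the comparison is made at the same age.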

5 Linear Regression Analysis – Reference Cell Coding
Recall: orf = 75 + 2.7 I(female) - 3.7 (age)
Interpretation: This model suggests that girls read on average 2.7 words per minute more than boys when controlling for student age.
Reference Cell Coding: One level of a categorical variable is chosen as the reference. All other estimates are presented in comparison to the reference.
Example: Female has 2 levels: 0 = Male (reference), 1 = Female.
2.7 I(female) = the difference in wpm between females and males.

6 Linear Regression Analysis – Reference Cell Coding
Example with More than 2 Levels: Age Category has 3 levels:
0 = Younger than Grade Level = Below 7
1 = At Grade Level = 7 or 8
2 = Older than Grade Level = Above 8
Model for ORF: ORF = I(At Grade Level) I(Older than Grade Level)
Questions:
What is the average fluency for students in public schools?
What is the average fluency for students in private schools?
Do students in public schools perform better on average than students in religious schools?
Do students in private schools perform better on average than students in religious schools?
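As a rough illustration of reference cell coding with three levels, the sketch below turns an age into the age category from this slide and then into indicator variables, with level 0 as the reference. The function names are hypothetical; the cut points (below 7, 7 or 8, above 8) come from the slide.

```python
def age_category(age: float) -> int:
    """Map age in years to the slide's 3-level age category."""
    if age < 7:
        return 0  # younger than grade level
    if age <= 8:
        return 1  # at grade level (7 or 8)
    return 2      # older than grade level


def reference_cell_indicators(category: int, reference: int = 0,
                              levels=(0, 1, 2)) -> dict:
    """Return one 0/1 indicator per non-reference level.

    The reference level gets no indicator: it is the case where
    every indicator equals 0.
    """
    return {level: int(category == level)
            for level in levels if level != reference}


for age in (6, 7, 10):
    cat = age_category(age)
    print(age, cat, reference_cell_indicators(cat))
```

A variable with three levels therefore enters the model as two indicators, and each coefficient is read as a difference from the reference level.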

7 Categorical vs. Continuous Independent Variables
Why do we care? STATA cares.
Categorical
Definition: A variable that can be divided into distinct categories.
Examples: gender, age category
STATA code: Prefix the variable name with “i.”: i.<variable name>. STATA then applies reference cell coding.
Continuous
Definition: A numeric variable that can, in theory, take any value along a scale.
Examples: orf, age. Reading comprehension score? Generally ranges from 0 to 5.
STATA code: List the variable name in the equation line.

8 Linear Regression Analysis – STATA Example
Recall: $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \epsilon_i$
orf = 75 + 2.7 I(female) - 3.7 (age)
STATA Code:
svy: reg eq_orf i.female age

9 Linear Regression Analysis – STATA Activity
Recall: STATA code to fit a model for gender and age:
svy: reg eq_orf i.female age
Fit a linear model for English fluency (eq_orf) that accounts for the following school factors (nonformal; enrolment):
svy: reg eq_orf i.nonformal enrolment
Why does nonformal have an “i.” in front of the variable name?
What type of variable is enrolment in this model?
How would we change enrolment to be a categorical variable?
Would the model work if we typed the following?
svy: reg enrolment i.nonformal eq_orf

10 T-Test Results with Linear Regression in STATA
Recall: T-tests compare the means of two groups.
Example: ttest eq_orf, by(treat_phase)
Is there a difference between baseline and endline scores? How can we use linear regression to duplicate these results?

                     October 2012     October 2013
Mean (N)             48 wpm (913)     53 wpm (922)
Difference (S.E.)    4.4 wpm (1.7)
T-Stat (DOF)         2.59 (1833)

H0: mean(October 2012) = mean(October 2013); Ha: mean(October 2012) != mean(October 2013)
P-Value = ; Reject H0
Followed by hands-on activity with Oral Reading Fluency.

11 T-Test Results with Linear Regression in STATA
Recall: ttest eq_orf, by(treat_phase)
How can we use linear regression to duplicate these results?
How many variables are used in the ttest command? Two: eq_orf and treat_phase.
Use a linear regression model that contains only the two variables of interest.
What would the STATA code for the model look like?
reg eq_orf i.treat_phase

12 T-Test Results with Linear Regression in STATA
Recall: ttest eq_orf, by(treat_phase)
reg eq_orf i.treat_phase

                     October 2012     October 2013
Mean (N)             48 wpm (913)     53 wpm (922)
Difference (S.E.)    4.4 wpm (1.7)
T-Stat (DOF)         2.59 (1833)

H0: mean(October 2012) = mean(October 2013); Ha: mean(October 2012) != mean(October 2013)
P-Value = ; Reject H0
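Why the two commands agree can be seen in a small pure-Python sketch. It compares an equal-variance two-sample t-test with an ordinary least squares fit on a single 0/1 indicator. The scores are made-up illustration data, not PRIMR results, and the function names are hypothetical. Both calculations are unweighted, and the sign of the difference depends only on which group is subtracted from which.

```python
import math
from statistics import mean


def pooled_t_test(group0, group1):
    """Equal-variance two-sample t-test: returns (difference, SE, t, dof)."""
    n0, n1 = len(group0), len(group1)
    m0, m1 = mean(group0), mean(group1)
    ss0 = sum((y - m0) ** 2 for y in group0)
    ss1 = sum((y - m1) ** 2 for y in group1)
    dof = n0 + n1 - 2
    pooled_var = (ss0 + ss1) / dof
    se = math.sqrt(pooled_var * (1 / n0 + 1 / n1))
    diff = m1 - m0
    return diff, se, diff / se, dof


def ols_on_indicator(y, x):
    """Simple linear regression of y on one 0/1 indicator x.

    Returns (slope, SE of slope, t, residual dof).
    """
    n = len(y)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    dof = n - 2
    se = math.sqrt(rss / dof / sxx)
    return slope, se, slope / se, dof


# Hypothetical fluency scores (wpm) for two phases.
baseline = [40, 52, 47, 55, 46]
endline = [50, 58, 49, 60, 53]

y = baseline + endline
x = [0] * len(baseline) + [1] * len(endline)  # plays the role of i.treat_phase

print(pooled_t_test(baseline, endline))
print(ols_on_indicator(y, x))
```

The slope on the indicator is the difference in group means, and its standard error, t-statistic and degrees of freedom match the pooled t-test up to floating-point rounding.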

13 T-Test Results with Linear Regression in STATA
Recall: ttest eq_orf, by(treat_phase)
reg eq_orf i.treat_phase

                     October 2012     October 2013
Mean (N)             48 wpm (913)     53 wpm (922)
Difference (S.E.)    4.4 wpm (1.7)
T-Stat (DOF)         2.59 (1833)

H0: mean(October 2012) = mean(October 2013); Ha: mean(October 2012) != mean(October 2013)
P-Value = ; Reject H0
These results are unweighted.
Wait! How can we make our results reflect the population rather than the sample?
svy: reg eq_orf i.treat_phase

14 Linear Regression in STATA – Controlling for Other Variables
Want to Know: The effect of certain variables when other variables we know to be influential are controlled.
Recall: orf = 75 + 2.7 I(female) - 3.7 (age)
In this model, we may already know that older students are less fluent readers because they are repeating the grade or have taken a long break between school years. But we want to know whether gender influences fluency once age is controlled.
When do we use models with multiple variables? To determine demographic and SSME impact.
What variables must be in these models? Variables that we know strongly influence the outcome. Sample design variables: Treatment; Gender; Time.

15 Linear Regression in STATA – Controlling for Other Variables - Example
Fit a model for English fluency that accounts for treatment, time, gender, and formal/nonformal school type.
Question of Interest: Once design variables are controlled for, is there a difference between students in formal and nonformal schools?
STATA Code:
svy: reg eq_orf i.treatment i.treat_phase i.treatment#i.treat_phase i.female i.nonformal
Interpretation: Students in nonformal schools read on average 29 wpm more than students in formal schools when study design is controlled.
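The interaction term i.treatment#i.treat_phase in this model carries the difference-in-differences (DiD) comparison named in the overview. The sketch below shows how the coefficients of a treatment-by-phase model relate to the four group means when there are no other covariates. The means are hypothetical numbers chosen for easy arithmetic, not PRIMR estimates, and the function name is an assumption.

```python
def did_coefficients(control_base, control_end, treat_base, treat_end):
    """Coefficients of y = b0 + b1*treatment + b2*phase + b3*treatment*phase
    expressed through the four group means (no other covariates)."""
    return {
        "intercept": control_base,                 # control group at baseline
        "treatment": treat_base - control_base,    # baseline gap between groups
        "phase": control_end - control_base,       # change over time in control
        "interaction": (treat_end - treat_base)    # change in treatment ...
                       - (control_end - control_base),  # ... minus change in control
    }


# Hypothetical mean fluency (wpm) in each group and phase.
coefs = did_coefficients(control_base=40, control_end=44,
                         treat_base=41, treat_end=52)
print(coefs)
```

Adding the four coefficients gives back the treatment-group endline mean, and the interaction coefficient is the DiD estimate. With extra covariates such as i.female and i.nonformal the same reading applies, but the estimate is adjusted for them.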

16 Linear Regression in STATA – Controlling for Other Variables - Activity
Activity: Determine whether any of the other SSME variables make a difference in student English reading fluency (eq_orf).

17 Linear vs Logistic Regression
When would you want to use logistic regression?
Used to model binomial categorical data.
Examples:
Zero Scores: 0 = Score above zero on task; 1 = Score equal to zero on task
Reading Comprehension of 80% or Better: 0 = Reading comprehension score < 80%; 1 = Reading comprehension score >= 80%
Estimates probabilities and odds ratios.

18 Linear vs Logistic Regression
When would you want to use logistic regression?
Used to model binomial categorical data.
Estimates probabilities and odds ratios.
General Form: $\operatorname{logit}(p_i) = \log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_1$, where $p_i$ = % of success = $\frac{\exp(\text{model})}{1 + \exp(\text{model})}$
Model Assumptions: Data are from a stratified SRS. Independence of responses between respondents. Sample size is large: 80% of predicted counts at or above 5; all expected counts larger than 2. Model is specified correctly.
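The step from the logit to the probability is plain algebra. In the derivation below, $\eta$ is shorthand introduced here for the linear predictor ("model" on the slide); it is not notation from the deck.

```latex
\begin{align*}
\log\left(\frac{p}{1-p}\right) &= \eta, \qquad \eta = \beta_0 + \beta_1 x_1 \\
\frac{p}{1-p} &= \exp(\eta) \\
p &= \exp(\eta)\,(1 - p) \\
p\,\bigl(1 + \exp(\eta)\bigr) &= \exp(\eta) \\
p &= \frac{\exp(\eta)}{1 + \exp(\eta)}
\end{align*}
```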

19 Linear vs Logistic Regression – Covariates & Odds Ratios
Covariates Connected to Odds Ratios
Example: logit(Reading Comprehension 80%+) = -3 + 0.76 I(Has English Book)
Odds Ratio: English Book vs. No English Book = exp(0.76) = 2.14
Interpretation: On average, the odds of comprehending at least 80% of a connected text are about 2 times as high for students with English books as for students without English books.
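The conversion between a logistic regression coefficient and an odds ratio is a single exponential, as this short sketch shows. The coefficient 0.76 is the slide's value; the function names are hypothetical.

```python
import math


def odds_ratio_from_coef(coef: float) -> float:
    """Odds ratio for a one-unit change in a covariate: exp(coefficient)."""
    return math.exp(coef)


def coef_from_odds_ratio(odds_ratio: float) -> float:
    """Inverse conversion: the coefficient is the log of the odds ratio."""
    return math.log(odds_ratio)


coef_book = 0.76  # log-odds coefficient for I(Has English Book), from the slide
print(round(odds_ratio_from_coef(coef_book), 2))  # 2.14
```

An odds ratio above 1 corresponds to a positive coefficient and an odds ratio below 1 to a negative one, so either scale carries the same information.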

20 Linear vs Logistic Regression – Covariates & Probabilities
Covariates Connected to Probabilities
Example: logit(Reading Comprehension 80%+) = -3 + 0.76 I(Has English Book)
Probability: Pr(80%+ | English book) = exp(-3 + 0.76) / (1 + exp(-3 + 0.76)) = 0.098 (the rounded coefficients shown here give about 0.096)
Interpretation: On average, a student with an English book has a 9.8% probability of comprehending at least 80% of a passage.
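A minimal sketch of the probability calculation, using the rounded coefficients -3 and 0.76 as they appear in the slide's formula. The function names are hypothetical, and the no-book probability is an extra comparison not shown on the slide.

```python
import math


def inv_logit(log_odds: float) -> float:
    """Convert log-odds to a probability: exp(x) / (1 + exp(x))."""
    return math.exp(log_odds) / (1.0 + math.exp(log_odds))


def odds(p: float) -> float:
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


INTERCEPT = -3.0   # rounded value from the slide's formula
COEF_BOOK = 0.76   # coefficient for I(Has English Book)

p_book = inv_logit(INTERCEPT + COEF_BOOK)  # student with an English book
p_no_book = inv_logit(INTERCEPT)           # student without one

print(round(p_book, 3))     # 0.096
print(round(p_no_book, 3))  # 0.047
print(round(odds(p_book) / odds(p_no_book), 2))  # 2.14
```

The ratio of the two odds recovers exp(0.76), while the ratio of the two probabilities is a little smaller. That gap is the reason odds ratios are best read as statements about odds rather than about how many times "more likely" an outcome is.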

21 Logistic Regression Analysis – STATA Example
Recall: logit(Reading Comprehension 80%+) = -3 + 0.76 I(Has English Book)
NOTE: The code is very similar to linear regression.
STATA Code:
svy: logistic eq_read_comp_score_pcnt80 i.e_book
svy: logistic eq_read_comp_score_pcnt80 i.e_book, coef
Why does e_book have an “i.” in front of the variable name?
Why doesn’t eq_read_comp_score_pcnt80 have an “i.”?
What is the difference between the two lines of code?

22 More Information
Susan Edwards, Research Statistician; Sarrynna Sou, Statistician

