What Types Of Data Are Collected? What Kinds Of Question Can Be Asked Of Those Data?  Do people who say they study for more hours also think they’ll.

Presentation transcript:

© Willett, Harvard University Graduate School of Education, 6/14/2016, S010Y/C11 – Slide 1
S010Y: Answering Questions with Quantitative Data
Class 12/III.4: Conducting Residual Analysis

Research Is A Partnership Of Questions And Data

What Types Of Data Are Collected? What Kinds Of Question Can Be Asked Of Those Data?

Questions That Require Us To Describe Single Features of the Participants
  “Continuous” Data:
    How tall are class members, on average?
    How many hours a week do class members report that they study?
    …. ?
  “Categorical” Data:
    How many members of the class are women?
    What proportion of the class is full-time?
    …. ?

Questions That Require Us To Examine Relationships Between Features of the Participants
  “Continuous” Data:
    Do people who say they study for more hours also think they’ll finish their doctorate earlier?
    Are computer literates less anxious about statistics?
    …. ?
  “Categorical” Data:
    Are men more likely to study part-time?
    Are women more likely to enroll in CCE?
    …. ?

Slide 2

Having examined the “smooth” with regression analysis, let’s examine the “rough” with residual analysis …

Here are the PC-SAS data input statements that you’ve come to know and love:

    OPTIONS Nodate Pageno=1;
    TITLE1 'A010Y: Answering Questions with Quantitative Data';
    TITLE2 'Class 11/Handout 1: Dissecting Relationships Between Continuous Variables';
    TITLE3 'The Infamous Wallchart Data';
    TITLE4 'Data in WALLCHT.txt';

    * Input data, name and label variables in the dataset *;
    DATA WALLCHT;
      INFILE 'C:\DATA\A010Y\WALLCHT.txt';
      INPUT STATE $ TCHRSAL STRATIO PPEXPEND HSGRADRT;
      LABEL TCHRSAL  = '1988 Average Teacher Salary'
            STRATIO  = '1988 Student/Teacher Ratio'
            PPEXPEND = '1988 Expenditure/Student'
            HSGRADRT = '1988 Statewide H.S. Graduation Rate';

Here’s the OLS regression analysis, using PROC REG, that you’ve seen before (with one additional line, the OUTPUT statement, that we will discuss later):

    * Representing the nature of the relationship of HSGRADRT and STRATIO *;
    PROC REG DATA=WALLCHT;
      TITLE5 'OLS Regression of H.S. Graduation Rate on Student/Teacher Ratio';
      MODEL HSGRADRT = STRATIO;
      OUTPUT OUT=DIAGNOSE R=RAWRESID P=PREDVAL;

Standard scatterplot of the HSGRADRT vs. STRATIO relationship:

    PROC PLOT DATA=WALLCHT;
      TITLE5 'Plot of H.S. Graduation Rates against Student/Teacher Ratios';
      PLOT HSGRADRT*STRATIO / HAXIS = 10 TO 25 BY 5 VAXIS = 50 TO 100 BY 10;

Slide 3

Here’s the regression output that you’ve seen before, and which specifies the fitted regression line…

Dependent Variable: HSGRADRT (1988 Statewide H.S. Graduation Rate)

Parameter Estimates
  Columns: Variable, Label, DF, Parameter Estimate, Standard Error, t Value, Pr > |t|
  Intercept   Intercept                     DF = 1   Pr > |t| <.0001
  STRATIO     1988 Student/Teacher Ratio

These “Parameter Estimates” provide the fitted trend line as the following fitted model:

  Predicted HSGRADRT = (Intercept) + (Slope)(STRATIO) = (93.69) + (-1.12)(STRATIO)

Slide 4

Here’s the fitted regression model that you recognize …

  Predicted HSGRADRT = (93.69) + (-1.12)(STRATIO)

The fitted equation is telling us PROC REG’s best prediction for HSGRADRT at every value of STRATIO. For instance…

1. When STRATIO = 13.3 (the minimum value of STRATIO),
   Predicted value of HSGRADRT = (93.69) + (-1.12)(13.3) = 93.69 – 14.90 = 78.8
2. When STRATIO = 24.7 (the maximum value of STRATIO),
   Predicted value of HSGRADRT = (93.69) + (-1.12)(24.7) = 93.69 – 27.66 = 66.0

Plot these two values and join them to obtain the fitted trend line.
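The two endpoint predictions can be checked with a few lines of code. This is a minimal sketch in Python rather than PC-SAS, assuming the rounded coefficients quoted on the slide (93.69 and -1.12); the function name `predict_hsgradrt` is made up for illustration.

```python
# Rounded coefficients quoted on the slide; PROC REG itself carries more decimals.
INTERCEPT = 93.69
SLOPE = -1.12


def predict_hsgradrt(stratio: float) -> float:
    """Predicted H.S. graduation rate (%) at a given student/teacher ratio."""
    return INTERCEPT + SLOPE * stratio


if __name__ == "__main__":
    # The minimum and maximum STRATIO values named on the slide.
    for label, stratio in (("minimum", 13.3), ("maximum", 24.7)):
        predicted = predict_hsgradrt(stratio)
        print(f"STRATIO = {stratio} ({label}): predicted HSGRADRT = {predicted:.1f}")
```

Any two distinct STRATIO values would do for drawing the line; the minimum and maximum are convenient because they span the observed data.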

Slide 5

This provides us with the “smooth” – where’s the “rough”? …

[PROC PLOT scatterplot: 1988 Statewide H.S. Graduation Rate (vertical axis, 50 to 100) against Student/Teacher Ratio (horizontal axis)]

Now, to examine the rough … Let’s pick a few states, and compare our predictions of HS graduation rate to the actual observed values. We call this the “analysis of residuals”…

Slide 6

Here’s the “rough” for Minnesota …

[PROC PLOT scatterplot: 1988 Statewide H.S. Graduation Rate (vertical axis, 50 to 100) against Student/Teacher Ratio (horizontal axis)]

How about Minnesota?

Observed values of the outcome and the predictor: HSGRADRT = 90.9, & STRATIO = 17.1

Predicted value of HSGRADRT, obtained from the fitted regression line: (93.69) + (-1.12)(17.1) = 74.54

Minnesota graduated a higher percentage of its High-School Seniors than we would have predicted, given its student/teacher ratio.

Slide 7

Here’s the “rough” for Hawaii …

[PROC PLOT scatterplot: 1988 Statewide H.S. Graduation Rate (vertical axis, 50 to 100) against Student/Teacher Ratio (horizontal axis)]

How about Hawaii?

Observed values of the outcome and the predictor: HSGRADRT = 69.1, & STRATIO = 21.6

Predicted value of HSGRADRT: (93.69) + (-1.12)(21.6) = 69.50

Hawaii graduated about the percentage of its High-School Seniors that we would have predicted, given its student/teacher ratio.

Slide 8

Here’s the “rough” for New York State …

[PROC PLOT scatterplot: 1988 Statewide H.S. Graduation Rate (vertical axis, 50 to 100) against Student/Teacher Ratio (horizontal axis)]

How about New York State?

Observed values of the outcome and the predictor: HSGRADRT = 62.3, & STRATIO = 15.2

Predicted value of HSGRADRT: (93.69) + (-1.12)(15.2) = 76.67

New York State graduated a much smaller percentage of its High-School seniors than we would have predicted, based on its student/teacher ratio.

Slide 9

On a scatterplot with a fitted regression line, the “vertical distance” between the observed value of HSGRADRT and its predicted value is called the residual: residual = observed value – predicted value.

State            Residual Computation        Conclusion: State graduated HS seniors at a rate that is…
Minnesota        (90.90 – 74.54) = 16.36     … better than predicted, based on STRATIO
Hawaii           (69.10 – 69.50) = -0.40     … about as predicted, based on STRATIO
New York State   (62.30 – 76.67) = -14.37    … worse than predicted, based on STRATIO
etc.

Residuals can be informative and useful:
  Residuals represent individual deviations from the average trend.
  They tell us about HSGRADRT, while taking “into account” or “controlling for” STRATIO. They tell us whether states are doing “better” or “worse” than we would have predicted, given our knowledge of their student/teacher ratio.
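The hand computation of the three residuals can be reproduced as follows. This is a sketch in Python rather than PC-SAS; it assumes the rounded coefficients 93.69 and -1.12 from the fitted model and the observed values quoted for Minnesota, Hawaii and New York State. The names `OBSERVED`, `predicted_value` and `raw_residual` are illustrative only.

```python
# Rounded coefficients from the fitted model quoted in the slides.
INTERCEPT, SLOPE = 93.69, -1.12

# (state, observed STRATIO, observed HSGRADRT), as quoted in the slides.
OBSERVED = [
    ("Minnesota", 17.1, 90.9),
    ("Hawaii", 21.6, 69.1),
    ("New York State", 15.2, 62.3),
]


def predicted_value(stratio: float) -> float:
    """Predicted HSGRADRT, rounded to two decimals as on the slide."""
    return round(INTERCEPT + SLOPE * stratio, 2)


def raw_residual(stratio: float, hsgradrt: float) -> float:
    """Residual = observed HSGRADRT minus predicted HSGRADRT."""
    return round(hsgradrt - predicted_value(stratio), 2)


if __name__ == "__main__":
    for state, stratio, hsgradrt in OBSERVED:
        print(state, predicted_value(stratio), raw_residual(stratio, hsgradrt))
```

A positive residual means the state sits above the fitted line (better than predicted); a negative residual means it sits below.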

Slide 10

We don’t have to compute the residuals and predicted values by hand…. You can ask PC-SAS to compute the residuals for you, and to output them into a diagnostic dataset, for you to explore.

    * Representing the nature of the relationship of HSGRADRT and STRATIO *;
    PROC REG DATA=WALLCHT;
      TITLE5 'OLS Regression of H.S. Graduation Rate on Student/Teacher Ratio';
      MODEL HSGRADRT = STRATIO;
      OUTPUT OUT=DIAGNOSE R=RAWRESID P=PREDVAL;

The options on the OUTPUT statement:
  OUT = DIAGNOSE   The OUT option tells PC-SAS that you want to create an OUTput dataset called DIAGNOSE.
  R = RAWRESID     The R option tells PC-SAS that you want to put “raw residuals” into the new output dataset, and call them RAWRESID.
  P = PREDVAL      The P option tells PC-SAS that you also want to put the predicted values into the new output dataset, and call them PREDVAL.
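For readers without PC-SAS, the idea behind the OUTPUT statement can be sketched in plain Python: fit the least-squares line, then store each observation's predicted value and raw residual alongside the observed values. This is only an illustration. The helper names `fit_ols` and `diagnose` are hypothetical, and the four data points are made-up toy values, not the wallchart data.

```python
def fit_ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def diagnose(ids, x, y):
    """Build one row per observation holding PREDVAL and RAWRESID,
    loosely mirroring what OUTPUT OUT=DIAGNOSE R=RAWRESID P=PREDVAL stores."""
    intercept, slope = fit_ols(x, y)
    rows = []
    for id_, xi, yi in zip(ids, x, y):
        predval = intercept + slope * xi
        rows.append({"ID": id_, "X": xi, "Y": yi,
                     "PREDVAL": predval, "RAWRESID": yi - predval})
    return rows


# Made-up toy values for illustration only, NOT the wallchart data.
rows = diagnose(["a", "b", "c", "d"], [1, 2, 3, 4], [2, 4, 5, 8])

if __name__ == "__main__":
    for row in rows:
        print(row)
```

Because the line is fitted by least squares with an intercept, the raw residuals always sum to zero, which is why their sample mean is zero.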

Slide 11

Once the residuals and predicted values are output to the DIAGNOSE dataset, you can take a look….

You can use PROC UNIVARIATE to explore the sample distribution of the raw residuals across the states:

    * Examining the distribution of the raw residuals *;
    PROC UNIVARIATE PLOT DATA=DIAGNOSE;
      TITLE5 'Univariate descriptive statistics on the Raw Residuals';
      VAR RAWRESID;
      ID STATE;

You can use PROC PLOT to plot the raw residuals against the predictor:

    PROC PLOT DATA=DIAGNOSE;
      TITLE5 'Plot of the Raw Residuals against the Values of the Predictor, STRATIO';
      PLOT RAWRESID*STRATIO / HAXIS = 10 TO 25 BY 10 VREF = 0;

You can use PROC SORT to sort the states by the value of their raw residual, and then use PROC PRINT to list them all out for inspection, along with the name of the state, and the observed and predicted values of HSGRADRT:

    * Reranking the States based on the value of their raw residuals *;
    PROC SORT DATA=DIAGNOSE;
      BY DESCENDING RAWRESID;
    PROC PRINT LABEL DATA=DIAGNOSE;
      TITLE5 'Listing of State Observed, Predicted and Residual Graduation Rates';
      VAR STATE HSGRADRT PREDVAL RAWRESID;
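The sort-then-list step has a direct analogue in Python. The sketch below is not PC-SAS and covers only the three states worked through by hand in the slides (observed, predicted and residual values as computed there); the list name `DIAGNOSE` simply echoes the SAS dataset name.

```python
# Three rows only, taken from the hand computations in the slides.
DIAGNOSE = [
    {"STATE": "Hawaii", "HSGRADRT": 69.10, "PREDVAL": 69.50, "RAWRESID": -0.40},
    {"STATE": "New York State", "HSGRADRT": 62.30, "PREDVAL": 76.67, "RAWRESID": -14.37},
    {"STATE": "Minnesota", "HSGRADRT": 90.90, "PREDVAL": 74.54, "RAWRESID": 16.36},
]

# Analogue of PROC SORT ... BY DESCENDING RAWRESID.
ranked = sorted(DIAGNOSE, key=lambda row: row["RAWRESID"], reverse=True)

if __name__ == "__main__":
    # Analogue of PROC PRINT with VAR STATE HSGRADRT PREDVAL RAWRESID.
    print(f"{'STATE':<16}{'HSGRADRT':>10}{'PREDVAL':>10}{'RAWRESID':>10}")
    for row in ranked:
        print(f"{row['STATE']:<16}{row['HSGRADRT']:>10.2f}"
              f"{row['PREDVAL']:>10.2f}{row['RAWRESID']:>10.2f}")
```

Sorting in descending order puts the states that did better than predicted at the top of the listing and those that did worse at the bottom.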

Slide 12

Here are some of the univariate descriptive statistics on the residuals….

Variable: RAWRESID (Residual)

  N       50    Sum Weights        50
  Mean     0    Sum Observations    0

Extreme Observations (by STATE)
  Lowest:  FL, NY, AZ, GA, LA
  Highest: WY, MT, ND, UT, MN (Obs 23)

Sample mean of the raw residuals is exactly zero!

Sample standard deviation of the raw residuals is 7.4. This number can be quite useful!

Listing of “extreme observations” is useful for identifying states whose observed values of HSGRADRT are wildly different from their predicted values.

Slide 13

Here’s the stem-and-leaf plot and boxplot of the residuals…

[PROC UNIVARIATE stem-and-leaf plot and boxplot of RAWRESID; stems run from 14 down to -12]

Actually, for the p-values that were computed in the regression analysis to be correct, the residuals must be normally distributed:
  You can use stem-and-leaf plots and boxplots to check roughly if this assumption holds in your analysis … see S-030.

Slide 14

Here are the individual states, ordered by their residuals …

Columns: STATE, H.S. Graduation Rate, Predicted Value of HSGRADRT, Residual

States, from the largest positive residual to the largest negative residual (the listing is marked at +2 sd, -1 sd and -2 sd):
MN UT ND MT WY IA WI NE CT OH WA ID NV KS SD PE AL AR IN MI IL CO WV VT OR HI MD NJ NM MO NH CA TN ME OK MA VA DL KY MS NC RI AK TX SC LA GA AZ NY FL

Which are the truly extraordinary states?
  If the residuals are normally distributed, then the truly extraordinary states may be those that lie more than 2 standard deviations (2 × 7.4 = 14.8) above or below the mean?
  Recall that the mean of the residuals is zero.
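The ±2 standard deviation rule of thumb is easy to apply in code. The following is a small Python sketch (not PC-SAS), assuming the residual standard deviation of 7.4 reported in the slides, a residual mean of zero, and the three residuals computed by hand earlier; `is_extraordinary` is a made-up helper name.

```python
RESIDUAL_SD = 7.4  # sample standard deviation of the raw residuals, from the slides


def is_extraordinary(raw_residual: float, sd: float = RESIDUAL_SD) -> bool:
    """True if the residual lies more than 2 sd from the residual mean of zero."""
    return abs(raw_residual) > 2 * sd


# Residuals computed by hand in the slides for three states.
RESIDUALS = {"Minnesota": 16.36, "Hawaii": -0.40, "New York State": -14.37}

if __name__ == "__main__":
    for state, resid in RESIDUALS.items():
        flag = "beyond 2 sd" if is_extraordinary(resid) else "within 2 sd"
        print(f"{state}: residual {resid:+.2f} is {flag}")
```

Of these three states, only Minnesota's residual exceeds the cutoff of 14.8 in absolute value; New York State's residual of -14.37 falls just inside it.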

Slide 15

[PROC PLOT scatterplot: 1988 Statewide H.S. Graduation Rate (vertical axis, 50 to 100) against Student/Teacher Ratio (horizontal axis)]

An Enhanced Conclusion…

In our investigation of state-level aggregate statistics, the average percentage of seniors graduating from High School is related to the average student/teacher ratio in the state. With state-wide high-school graduation rate (HSGRADRT) as outcome and state-wide student/teacher ratio (STRATIO) as predictor, the trend-line estimated by OLS regression analysis has a slope of –1.12 (p = ). This suggests that two states whose student/teacher ratios differ by 1 student per teacher will tend to have graduation rates that differ by 1.12 percentage points, with states that enjoy lower student/teacher ratios having higher high-school graduation rates …

However, not all states follow the average trend. Some states graduate high-school seniors at rates considerably different from those predicted from knowledge of their student/teacher ratios. In particular, Minnesota has a very large positive residual, indicating that its high-school graduation rate is much higher than we would expect, based on its student/teacher ratio. Florida, on the other hand, has a very large negative residual, indicating that it is graduating high-school seniors at a rate that is much lower than we would anticipate …