© Willett, Harvard University Graduate School of Education, 1/28/2016
S010Y/C09 – Slide 1
S010Y: Answering Questions with Quantitative Data
Class 9/III.2: Displaying Relationships Between Continuous Variables

Research Is A Partnership Of Questions And Data

What types of data are collected?
  “Categorical” data
  “Continuous” data

What kinds of question can be asked of those data?

Questions that require us to describe single features of the participants:
  Categorical data: How many members of the class are women? What proportion of the class is full-time? …?
  Continuous data: How tall are class members, on average? How many hours a week do class members report that they study? …?

Questions that require us to examine relationships between features of the participants:
  Categorical data: Are men more likely to study part-time? Are women more likely to enroll in USP? …?
  Continuous data: Do people who say they study for more hours think they’ll finish their doctorate earlier? Are computer literates less anxious about statistics? …?

S010Y/C09 – Slide 2

Here’s the codebook for the data we’ll use in this part of the module …

Dataset:      WALLCHT.txt
Overview:     Summary information on selected aspects of state educational performance outcomes, resource inputs, and population characteristics, in
Source:       US Department of Education and the National Center for Education Statistics
Sample Size:  50 states
Updated:      December 5, 2003

Col  Var Name  Description                                        Metric
1    STATE     State postal abbreviation                          Alphabetic
2    TCHRSAL   Average teacher salary in the State                1988$
3    STRATIO   Average number of students per teacher statewide   ratio
4    PPEXPEND  Average expenditure per pupil in the State         1988$
5    HSGRADRT  Average high-school graduation rate statewide      %

S010Y/C09 – Slide 3

We can use these data to address a variety of interesting research questions, including this one …

Research Question: “Are high school graduation rates higher in states where there are fewer students per teacher?”

This is a question about a potential relationship between two continuous variables:
  Statewide high-school graduation rates (HSGRADRT),
  Student/teacher ratio (STRATIO).

So, in other words, I’m really asking: Are HSGRADRT and STRATIO related?

How do we answer this question?

S010Y/C09 – Slide 4

I begin the analysis in Class 9/Handout 1. Here’s the start of the PC-SAS program …

    OPTIONS Nodate Pageno=1;
    TITLE1 'S010Y: Answering Questions with Quantitative Data';
    TITLE2 'Class 9/Handout 1: Displaying Relationships Between Continuous Variables';
    TITLE3 'The Infamous Wallchart Data';
    TITLE4 'Data in WALLCHT.txt';

    * Input data, name and label variables in the dataset;
    DATA WALLCHT;
       INFILE 'C:\DATA\A010Y\WALLCHT.txt';
       INPUT STATE $ TCHRSAL STRATIO PPEXPEND HSGRADRT;
       LABEL TCHRSAL  = '1988 Average Teacher Salary'
             STRATIO  = '1988 Student/Teacher Ratio'
             PPEXPEND = '1988 Expenditure/Student'
             HSGRADRT = '1988 Statewide H.S. Graduation Rate';

    * Data listing, with the States ranked in descending order by values of HSGRADRT;
    PROC SORT DATA=WALLCHT;
       BY DESCENDING HSGRADRT;
    PROC PRINT LABEL DATA=WALLCHT;
       TITLE5 'Listing of Data, in Descending Order of H.S. Graduation Rates';
       VAR STATE HSGRADRT STRATIO TCHRSAL PPEXPEND;

Notes on the program:
  DATA WALLCHT: Regular data input paragraph.
  INPUT STATE $: STATE is a “string” variable. Its values are alphabetic characters (that is, the state postal abbreviations). We tell PC-SAS by putting a “$” symbol after the variable name in the INPUT statement.
  PROC SORT: This paragraph sorts the data in descending order of high-school graduation rate, HSGRADRT, to facilitate comparisons across states.
  PROC PRINT: Prints out the data for inspection.
  LABEL: Names the columns in the print listing with the variable labels, rather than the variable names.
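
To see what the “$” does in isolation, a small sketch with in-line data may help. It reads a single State, Ohio, using the two values quoted later in the handout (STRATIO = 18.0, HSGRADRT = 79.6). The dataset name DEMO is hypothetical, and the reduced three-variable layout is an assumption made for illustration, not the layout of WALLCHT.txt.

```sas
/* Hypothetical demo dataset: one State read from in-line data */
DATA DEMO;
   INPUT STATE $ STRATIO HSGRADRT;   /* "$" marks STATE as a character variable */
   DATALINES;
OH 18.0 79.6
;
RUN;

/* Lists each variable with its type (Char or Num) */
PROC CONTENTS DATA=DEMO;
RUN;
```

In the PROC CONTENTS listing, STATE should appear as a character variable and the other two as numeric. Without the “$”, PC-SAS would try to read “OH” as a number and set STATE to missing.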

S010Y/C09 – Slide 5

The data listing produced by PC-SAS demonstrates considerable heterogeneity on all four variables!

[Data listing, numeric values not reproduced. Columns: STATE; 1988 Statewide H.S. Graduation Rate; 1988 Student/Teacher Ratio; 1988 Average Teacher Salary; 1988 Expenditure/Student.]

States, in descending order of H.S. graduation rate:
  MN ND WY MT IA NE CT WI KS OH SD UT VT PE NJ WV AR WA IN NV IL ID AL CO ME MA MD NH MO MI OR NM
  DL OK VA RI TN HI KY MS NC CA AK TX SC NY LA AZ GA FL

S010Y/C09 – Slide 6

Then, I asked PC-SAS to provide univariate descriptive statistics on the HSGRADRT and STRATIO variables …

    * Descriptive statistics on graduation rates and student/teacher ratios;
    PROC UNIVARIATE PLOT DATA=WALLCHT;
       TITLE5 'Distribution of H.S. Graduation Rates and Student/Teacher Ratios';
       VAR HSGRADRT STRATIO;
       ID STATE;

Notes on the program:
  PROC UNIVARIATE PLOT: Here are the usual PROC UNIVARIATE commands to obtain univariate summary statistics, stem-and-leaf displays and boxplots on the WALLCHT data.
  VAR: Specifies the variables for which descriptive statistics are required. Notice that you can list both HSGRADRT and STRATIO.
  ID: Implementing the ID command ensures that the cases are identified by the (alphabetic) value of the STATE variable.

S010Y/C09 – Slide 7

Here are the univariate descriptive statistics for continuous variable HSGRADRT …

[PROC UNIVARIATE output, most numeric values not reproduced. Variable: HSGRADRT (1988 Statewide H.S. Graduation Rate). N = 50, Sum Weights = 50. Sections: moments (Mean, Std Deviation, Variance, Skewness, Kurtosis); Basic Statistical Measures (location and variability); Quantiles; Extreme Observations; stem-and-leaf display and boxplot.]

Extreme Observations:
  Lowest:  FL (58.0), GA, AZ, LA, NY
  Highest: IA, MT, ND, WY, MN

Can you interpret these univariate descriptive statistics?

S010Y/C09 – Slide 8

Here are the univariate descriptive statistics on continuous variable STRATIO …

[PROC UNIVARIATE output, most numeric values not reproduced. Variable: STRATIO (1988 Student/Teacher Ratio). N = 50, Sum Weights = 50. Sections: moments (Mean, Std Deviation, Variance, Skewness, Kurtosis); Basic Statistical Measures (location and variability); Quantiles; Extreme Observations; stem-and-leaf display and boxplot.]

Extreme Observations:
  Lowest:  CT (13.3), MA, VT, NJ, WY
  Highest: NV, ID, HI, CA, UT

Can you interpret these univariate descriptive statistics?

S010Y/C09 – Slide 9

But, are HSGRADRT and STRATIO related? To address this question, we must display HSGRADRT and STRATIO simultaneously in a bivariate scatterplot …

    * Displaying the relationship between HSGRADRT and STRATIO;
    PROC PLOT DATA=WALLCHT;
       TITLE5 'Plot of H.S. Graduation Rates against Student/Teacher Ratios';
       PLOT HSGRADRT*STRATIO / HAXIS = 10 TO 25 BY 5
                               VAXIS = 50 TO 100 BY 10;
    RUN;

Notes on the program:
  PROC PLOT: A PC-SAS routine that produces bivariate scatterplots of continuous variables.
  PLOT HSGRADRT*STRATIO: Plots HSGRADRT on the vertical axis versus STRATIO on the horizontal axis.
  HAXIS: Choose an appropriate scaling for the horizontal axis.
  VAXIS: Choose an appropriate scaling for the vertical axis.
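
PROC PLOT draws the scatterplot with printer characters. As a hedged alternative, a later SAS release with ODS Graphics could draw the same display with PROC SGPLOT. This sketch is not part of the handout. It assumes that the WALLCHT dataset has already been created by the DATA step shown earlier, and it reuses the same axis scalings.

```sas
/* Alternative sketch, not from the handout: needs ODS Graphics (SAS 9.2 or later) */
PROC SGPLOT DATA=WALLCHT;
   SCATTER X=STRATIO Y=HSGRADRT / DATALABEL=STATE;   /* label each point with its State */
   XAXIS VALUES=(10 TO 25 BY 5);                     /* same scaling as HAXIS */
   YAXIS VALUES=(50 TO 100 BY 10);                   /* same scaling as VAXIS */
RUN;
```

The DATALABEL= option plays the role that the ID statement played in PROC UNIVARIATE: it identifies each case by its STATE value.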

S010Y/C09 – Slide 10

Here’s a bivariate plot of HSGRADRT versus STRATIO …

[Scatterplot: 1988 Statewide H.S. Graduation Rate on the vertical axis (50 to 100) against Student/Teacher Ratio on the horizontal axis. The point for Ohio is highlighted.]

Points on the scatterplot, like the symbol “A”, represent each State, and display values of outcome HSGRADRT and predictor STRATIO simultaneously. In Ohio, HSGRADRT = 79.6 and STRATIO = 18.0.

The vertical axis (or ordinate) displays the value of the “outcome,” HSGRADRT.
The horizontal axis (or abscissa) displays the value of the “predictor,” STRATIO.

S010Y/C09 – Slide 11

[Scatterplot: 1988 Statewide H.S. Graduation Rate on the vertical axis (50 to 100) against Student/Teacher Ratio on the horizontal axis.]

And, how can we tell if HSGRADRT and STRATIO are related?

Two variables are related if…

Is this the case here?

S010Y/C09 – Slide 12

[Scatterplot: 1988 Statewide H.S. Graduation Rate on the vertical axis (50 to 100) against Student/Teacher Ratio on the horizontal axis.]

What kind of line, curve or other construction best summarizes the observed relationship between HSGRADRT and STRATIO?

You be the judge.

S010Y/C10 – Slide 13
Class 10&11/III.3: Summarizing Relationships Between Continuous Variables

[Scatterplot: 1988 Statewide H.S. Graduation Rate on the vertical axis (50 to 100) against Student/Teacher Ratio on the horizontal axis.]

What kind of line, curve or other construction best summarizes the observed relationship between HSGRADRT and STRATIO?

Here’s My Best Guess! It was obtained by a mystery process called “Ordinary Least-Squares (OLS) Regression Analysis.”

S010Y/C10 – Slide 14

Of course, you can also get PC-SAS to tell you where the OLS-fitted regression line is …

    OPTIONS Nodate Pageno=1;
    TITLE1 'S010Y: Answering Questions with Quantitative Data';
    TITLE2 'Class 10/Handout 1: Summarizing Relationships Between Continuous Variables';
    TITLE3 'The Infamous Wallchart Data';
    TITLE4 'Data in WALLCHT.txt';

    * Input data, name and label variables in the dataset;
    DATA WALLCHT;
       INFILE 'C:\DATA\A010Y\WALLCHT.txt';
       INPUT STATE $ TCHRSAL STRATIO PPEXPEND HSGRADRT;
       LABEL TCHRSAL  = '1988 Average Teacher Salary'
             STRATIO  = '1988 Student/Teacher Ratio'
             PPEXPEND = '1988 Expenditure/Student'
             HSGRADRT = '1988 Statewide H.S. Graduation Rate';

    * Using regression analysis to summarize the relationship of HSGRADRT and STRATIO;
    PROC REG DATA=WALLCHT;
       TITLE5 'OLS Regression of H.S. Graduation Rate on Student/Teacher Ratio';
       MODEL HSGRADRT = STRATIO;

    * Plotting the relationship between HSGRADRT and STRATIO;
    PROC PLOT DATA=WALLCHT;
       TITLE5 'Plot of H.S. Graduation Rates against Student/Teacher Ratios';
       PLOT HSGRADRT*STRATIO / HAXIS = 10 TO 25 BY 5
                               VAXIS = 50 TO 100 BY 10;
    RUN;

Notes on the program:
  DATA WALLCHT: Here are the usual data input statements.
  PROC REG: Here are the PC-SAS regression analysis commands. We dissect them in detail on the next slide.
  PROC PLOT: Creates another scatterplot of the data for use later.

S010Y/C10 – Slide 15

Here’s the part of the PC-SAS program that deals specifically with the OLS Regression Analysis of the HSGRADRT versus STRATIO relationship …

    * Using regression analysis to summarize the relationship of HSGRADRT and STRATIO;
    PROC REG DATA=WALLCHT;
       TITLE5 'OLS Regression of H.S. Graduation Rate on Student/Teacher Ratio';
       MODEL HSGRADRT = STRATIO;

Notes on the program:
  PROC REG is the command in PC-SAS that requests an OLS Regression Analysis.
  You request an OLS Regression Analysis by specifying a “Regression Model” that identifies the “Outcome” and the “Predictor(s)” to include in the analysis: MODEL HSGRADRT = STRATIO.
  You identify the outcome variable (HSGRADRT) by placing it to the left of the “equals” sign in the MODEL statement.
  You identify the predictor variable (STRATIO) by placing it to the right of the “equals” sign in the MODEL statement.
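
As a hedged extension of the MODEL statement above, PROC REG can also save its prediction for each State in a new dataset through an OUTPUT statement. The sketch below is not part of the handout. It assumes that the WALLCHT dataset already exists, and the names FITTED, PREDRATE and RESID are hypothetical.

```sas
/* Sketch, not from the handout: save predicted values and residuals */
PROC REG DATA=WALLCHT;
   MODEL HSGRADRT = STRATIO;               /* outcome on the left, predictor on the right */
   OUTPUT OUT=FITTED P=PREDRATE R=RESID;   /* P= predicted values, R= residuals */
RUN;
QUIT;                                      /* PROC REG stays active until QUIT */

/* List each State's observed and predicted graduation rate */
PROC PRINT DATA=FITTED;
   VAR STATE STRATIO HSGRADRT PREDRATE RESID;
RUN;
```

Each row of FITTED then holds a State’s observed HSGRADRT next to the value that the fitted line predicts at that State’s STRATIO.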

S010Y/C10 – Slide 16

Here’s output from the OLS Regression Analysis of outcome HSGRADRT on predictor STRATIO …

[PROC REG output, numeric values not reproduced. The REG Procedure, Model: MODEL1, Dependent Variable: HSGRADRT (1988 Statewide H.S. Graduation Rate).
  Analysis of Variance: Source (Model, Error, Corrected Total), DF, Sum of Squares, Mean Square, F Value, Pr > F.
  Fit statistics: Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var.
  Parameter Estimates: Variable (Intercept; STRATIO, 1988 Student/Teacher Ratio), Label, DF, Parameter Estimate, Standard Error, t Value, Pr > |t|.]

The Parameter Estimates table is the major part of the “regression analysis” output. I unpack it on the next several slides.

Ignore the other part of the output. When you go on to S030, you’ll learn what it all means.

S010Y/C10 – Slide 17

The core part of the OLS Regression Output describes the fitted regression line. How do you work with this “Fitted Model”?

[PROC REG output, numeric values not reproduced. Dependent Variable: HSGRADRT (1988 Statewide H.S. Graduation Rate). Parameter Estimates: Variable (Intercept; STRATIO, 1988 Student/Teacher Ratio), Label, DF, Parameter Estimate, Standard Error, t Value, Pr > |t|.]

These “Parameter Estimates” tell you where PROC REG thinks that the fitted trend line should be drawn. By listing them, it’s telling you that the fitted trend line has the following algebraic equation:

    Predicted value of HSGRADRT = (93.69) + (-1.12)(STRATIO)

S010Y/C10 – Slide 18

Let’s try a couple. Remember that the fitted equation is telling us PROC REG’s best prediction for HSGRADRT at each value of STRATIO.

You can substitute reasonable values for the predictor, STRATIO, into the fitted equation and then use it to compute the best predictions, or predicted values, for HSGRADRT, as follows:

  1. When STRATIO = 13.3 (the minimum value of STRATIO),
     Predicted value of HSGRADRT = (93.69) + (-1.12)(13.3) = 93.69 – 14.90 = 78.8

  2. When STRATIO = 24.7 (the maximum value of STRATIO),
     Predicted value of HSGRADRT = (93.69) + (-1.12)(24.7) = 93.69 – 27.66 = 66.0

Recognize these values?
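
The same substitutions can be sketched as a short DATA step that applies the fitted equation to whatever STRATIO values you type in. This is an illustration rather than part of the handout: the dataset name PREDICT and the variable name PREDRATE are hypothetical, and the coefficients 93.69 and -1.12 are copied from the worked examples above.

```sas
/* Sketch: compute predicted HSGRADRT from the fitted equation          */
/* PREDICT and PREDRATE are hypothetical names, not from the handout    */
DATA PREDICT;
   INPUT STRATIO;
   PREDRATE = 93.69 + (-1.12)*STRATIO;   /* intercept + slope * predictor */
   DATALINES;
13.3
24.7
;
RUN;

PROC PRINT DATA=PREDICT;
RUN;
```

The two rows correspond to the minimum and maximum values of STRATIO. Adding further lines after DATALINES gives predictions at other student/teacher ratios, although values far outside the observed range of 13.3 to 24.7 should be treated with caution.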