Introduction to Statistics for the Social Sciences SBS200 - Lecture Section 001, Spring 2017. Room 150 Harvill Building, 9:00 - 9:50 Mondays, Wednesdays & Fridays. Welcome http://www.youtube.com/watch?v=oSQJP40PcGI
A note on doodling
Schedule of readings. Before our fourth and final exam (May 1st): OpenStax Chapters 1 – 13 (Chapter 12 is emphasized); Plous Chapter 17: Social Influences and Chapter 18: Group Judgments and Decisions
Homework on class website: Please complete homework worksheet #24 Simple Regression Worksheet Due: Monday, April 17th
Lab sessions Everyone will want to be enrolled in one of the lab sessions Project 4
By the end of lecture today 4/12/17 Simple Regression Using correlation for predictions
Project 4 - Two Correlations - We will use these to create two regression analyses. This lab builds on the work we did in our very first lab, but now we are using the correlation for prediction. This is called regression analysis.
Observed r = +0.9199, df = 3, critical r = 0.878 (worksheet answers: Yes, Yes). The relationship between the hours worked and weekly pay is a strong positive correlation. This correlation is significant, r(3) = 0.92; p < 0.05
Observed r = -0.73, df = 3, critical r = 0.878 (worksheet answers: No, No). The relationship between wait time and number of operators working is negative and strong, but not reliable enough to reach significance. This correlation is not significant, r(3) = -0.73; n.s.
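The decision rule used in the two worksheet examples above can be sketched in a few lines of Python. This is only an illustration: the function name `report_correlation` is made up for this sketch, and it assumes a two-tailed test in which the absolute value of the observed r is compared with the critical r from the table.

```python
def report_correlation(r, df, critical_r):
    """Compare an observed r with a table critical value and build a report string."""
    # Two-tailed test (assumed): only the size of r matters, not its sign
    significant = abs(r) > critical_r
    tail = "p < 0.05" if significant else "n.s."
    return significant, f"r({df}) = {r:.2f}; {tail}"

# Values taken from the two worksheet examples (df = 3, critical r = 0.878)
pay_sig, pay_text = report_correlation(0.9199, 3, 0.878)
wait_sig, wait_text = report_correlation(-0.73, 3, 0.878)

print(pay_text)   # hours worked vs. weekly pay
print(wait_text)  # wait time vs. number of operators
```

Note that -0.73 is a strong correlation in size, but with df = 3 it still falls short of the 0.878 cutoff.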
We are measuring 9 students
[Figure: three scatterplots with GPA on the Y axis and High School GPA, SAT (Verbal) and SAT (Mathematical) on the X axes.] Critical r = 0.666. Values shown: r(7) = 0.50, +0.80, +0.80 and r(7) = +0.911444123, +0.616334867, +0.487295007. Decisions shown: reject the null (r is significant) for one correlation; do not reject the null (r is not significant) for the other two.
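Where the critical r of 0.666 comes from can be sketched as follows. The sketch assumes a two-tailed test at alpha = 0.05 and uses critical t values copied from a standard t table; the dictionary `T_CRITICAL_05` and the function `critical_r` are names invented for this illustration, not part of the lecture materials.

```python
import math

# Two-tailed critical t values at alpha = 0.05, copied from a standard t table
# (assumption: only the two degrees of freedom used in these slides are listed)
T_CRITICAL_05 = {3: 3.182, 7: 2.365}

def critical_r(df):
    """Convert a critical t into a critical r using r = t / sqrt(t^2 + df)."""
    t = T_CRITICAL_05[df]
    return t / math.sqrt(t * t + df)

df = 9 - 2  # 9 students, so df = n - 2 = 7
cutoff = critical_r(df)
print(f"critical r for df = {df}: {cutoff:.3f}")

# The three correlations reported on the slide
for r in (0.911444123, 0.616334867, 0.487295007):
    decision = "reject null" if abs(r) > cutoff else "do not reject null"
    print(f"r({df}) = {r:+.3f}: {decision}")
```

The same conversion with df = 3 gives the 0.878 cutoff used in the worksheet examples.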
How to complete scatterplots, correlations and simple regressions using Excel Real time demo
Correlation: Independent and dependent variables. When used for prediction, we refer to the predicted variable as the dependent variable and the predictor variable as the independent variable. What are we predicting? (Figure labels: Dependent Variable, Independent Variable.)
Correlation - What do we need to define a line? Y-intercept = “a” (also “b0”): where the line crosses the Y axis. Slope = “b” (also “b1”): how steep the line is. The predicted variable goes on the “Y” axis and is called the dependent variable. The predictor variable goes on the “X” axis and is called the independent variable. [Figure: prediction line with axes labeled Expenses per year and Yearly Income: if you spend this much, you probably make this much.]
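As a small illustration of how the Y-intercept and slope pin down a line, the sketch below recovers both from two points on the line. The function `line_through` and the two data points are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def line_through(p1, p2):
    """Return (a, b) for the line Y' = a + b*X passing through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    b = (y2 - y1) / (x2 - x1)  # slope: rise over run
    a = y1 - b * x1            # Y-intercept: value of Y' when X = 0
    return a, b

# Hypothetical points: (expenses per year, yearly income), in thousands of dollars
a, b = line_through((20, 30), (40, 50))
print(f"Y-intercept a = {a}, slope b = {b}")
```

Here the predictor (expenses) is X and the predicted variable (income) is Y, matching the axis convention on the slide.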
Dustin spends $12 for his Birthday. Angelina Jolie Buys Brad Pitt a $24 million Heart-Shaped Island for his 50th Birthday. [Figure: prediction line with axes labeled Expenses per year and Yearly Income: Dustin spent this much, so Dustin probably makes this much; Angelina spent this much, so Angelina probably makes this much.] Revisit this slide
Assumptions Underlying Linear Regression
For each value of X, there is a group of Y values.
These Y values are normally distributed.
The means of these normal distributions of Y values all lie on the straight line of regression.
The standard deviations of these normal distributions are equal.
Revisit this slide
Correlation - the prediction line - what is it good for?
The prediction line:
makes the relationship easier to see (even if specific observations - dots - are removed)
identifies the center of the cluster of (paired) observations
identifies the central tendency of the relationship (kind of like a mean)
can be used for prediction
should be drawn to provide a “best fit” for the data
should be drawn to provide maximum predictive power for the data
should be drawn to provide minimum predictive error
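One common way to make "best fit" and "minimum predictive error" concrete is the least-squares line, which minimizes the sum of squared prediction errors. The sketch below assumes that criterion (the slide itself does not name it) and uses a small hypothetical data set; the function names are invented for illustration.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line Y' = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return a, b

def sum_squared_error(a, b, xs, ys):
    """Total squared distance between observed Y values and predicted Y' values."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical paired observations
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares(xs, ys)
best = sum_squared_error(a, b, xs, ys)
steeper = sum_squared_error(a, b + 0.2, xs, ys)  # a slightly different line
print(f"Y' = {a:.2f} + {b:.2f} X, squared error = {best:.2f}")
print(f"Steeper line squared error = {steeper:.2f}")
```

Any other line drawn through the same dots, such as the steeper one tried here, gives a larger squared error, which is the sense in which the prediction line is the best fit.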
Predicting Restaurant Bill
Prediction line: Y’ = a + b1X1
Cost = 15.22 + 19.96 Persons (Y-intercept = 15.22, slope = 19.96)
If “Persons” = 4, what is the prediction for “Cost”?
Cost = 15.22 + 19.96 (4)
Cost = 15.22 + 79.84 = 95.06
The expected cost for dinner for two couples (4 people) would be $95.06.
If “Persons” = 1, what is the prediction for “Cost”?
Cost = 15.22 + 19.96 (1)
Cost = 15.22 + 19.96 = 35.18
Predicting Rent
Prediction line: Y’ = a + b1X1
Rent = 150 + 1.05 SqFt (Y-intercept = 150, slope = 1.05)
If “SqFt” = 800, what is the prediction for “Rent”?
Rent = 150 + 1.05 (800)
Rent = 150 + 840 = 990
The expected cost for rent on an 800 square foot apartment is $990.
If “SqFt” = 2500, what is the prediction for “Rent”?
Rent = 150 + 1.05 (2500)
Rent = 150 + 2625 = 2,775
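The restaurant and rent predictions both plug a value of X into Y’ = a + b1X1, so one small function can reproduce them. This is a minimal sketch; the function name `predict` is an assumption made for the example, while the intercepts and slopes are the ones given on the slides.

```python
def predict(a, b, x):
    """Prediction line Y' = a + b*X: a is the Y-intercept, b is the slope."""
    return a + b * x

# Restaurant bill: Cost = 15.22 + 19.96 Persons
print(f"4 persons: ${predict(15.22, 19.96, 4):.2f}")
print(f"1 person:  ${predict(15.22, 19.96, 1):.2f}")

# Rent: Rent = 150 + 1.05 SqFt
print(f"800 sq ft:  ${predict(150, 1.05, 800):.2f}")
print(f"2500 sq ft: ${predict(150, 1.05, 2500):.2f}")
```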
Thank you! See you next time!!