Presentation transcript:

1

2 [Classroom seating chart: Physics–Atmospheric Sciences (PAS), Room 201 – screen at front, lecturer’s desk, rows A–H and J–Q with numbered seats.]

3 MGMT 276: Statistical Inference in Management Fall 2015

4

5 Schedule of readings. Before our next exam (December 3rd): OpenStax Chapters 1–13 (Chapter 12 is emphasized); Plous Chapter 17: Social Influences; Chapter 18: Group Judgments and Decisions.

6 Over the next couple of lectures (11/17/15): Logic of hypothesis testing with correlations; Interpreting correlations and scatterplots; Simple and multiple regression; Using correlation for predictions; r versus r². Regression uses the predictor variable (independent) to make predictions about the predicted variable (dependent). “Coefficient of correlation” is the name for “r”. “Coefficient of determination” is the name for “r²” (remember it is always positive – no direction info). “Standard error of the estimate” is our measure of the variability of the dots around the regression line (average deviation of each data point from the regression line – like a standard deviation). “Coefficient of regression” is “b” for each variable (like slope).

7 Regression – what do we need to define a line? Y-intercept = “a” (also “b₀”): where the line crosses the Y axis. Slope = “b” (also “b₁”): how steep the line is. The predicted variable goes on the “Y” axis and is called the dependent variable; the predictor variable goes on the “X” axis and is called the independent variable. (Example plot: expenses per year and yearly income – if you spend this much, you probably make this much.) Revisit this slide

8 Assumptions underlying linear regression: For each value of X, there is a group of Y values. These Y values are normally distributed. The means of these normal distributions of Y values all lie on the straight line of regression. The standard deviations of these normal distributions are equal. Revisit this slide

9 Correlation – the prediction line: what is it good for? The prediction line makes the relationship easier to see (even if specific observations – dots – are removed); identifies the center of the cluster of (paired) observations; identifies the central tendency of the relationship (kind of like a mean); can be used for prediction; should be drawn to provide a “best fit” for the data; should be drawn to provide maximum predictive power for the data; should be drawn to provide minimum predictive error. Revisit this slide

10 Predicting restaurant bill. Prediction line: Y’ = a + b₁X₁ (Y-intercept and slope). Cost = 15.22 + 19.96 Persons. If “Persons” = 4, what is the prediction for “Cost”? Cost = 15.22 + 19.96(4) = 15.22 + 79.84 = 95.06. The expected cost for dinner for two couples (4 people) would be $95.06. If “Persons” = 1, what is the prediction for “Cost”? Cost = 15.22 + 19.96(1) = 15.22 + 19.96 = 35.18. [Scatterplot: People (X) vs. Cost (Y) – if People = 4, Cost will be about 95.06.] Revisit this slide

11 Predicting rent. Prediction line: Y’ = a + b₁X₁ (Y-intercept and slope). Rent = 150 + 1.05 SqFt. If “SqFt” = 800, what is the prediction for “Rent”? Rent = 150 + 1.05(800) = 150 + 840 = 990. The expected rent on an 800 square foot apartment is $990. If “SqFt” = 2500, what is the prediction for “Rent”? Rent = 150 + 1.05(2500) = 150 + 2,625 = 2,775. [Scatterplot: Square feet (X) vs. Cost (Y) – if SqFt = 800, Rent will be about 990.] Revisit this slide
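To make the arithmetic on the two prediction slides above concrete, here is a minimal Python sketch of the prediction line Y’ = a + b₁X₁. The intercepts and slopes come from the slides; the `predict` helper name is just for illustration.

```python
def predict(a, b, x):
    """Prediction line: Y' = a + b * X."""
    return a + b * x

# Restaurant bill: Cost = 15.22 + 19.96 * Persons
print(predict(15.22, 19.96, 4))    # about 95.06 (dinner for 4)
print(predict(15.22, 19.96, 1))    # about 35.18 (dinner for 1)

# Rent: Rent = 150 + 1.05 * SqFt
print(predict(150, 1.05, 800))     # 990.0  (800 square feet)
print(predict(150, 1.05, 2500))    # 2775.0 (2,500 square feet)
```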

12 Regression example. Rory is the owner of a small software company and employs 10 sales staff. Rory sends his staff all over the world consulting, selling, and setting up his system. He wants to evaluate his staff in terms of who are the most (and least) productive salespeople, and also whether more sales calls actually result in more systems being sold. So, he simply measures the number of sales calls made by each salesperson and how many systems they successfully sold.

13 Regression example: Do more sales calls result in more sales made? Step 1: Draw scatterplot – number of sales calls made (X axis, independent variable) vs. number of systems sold (Y axis, dependent variable), with dots labeled Ethan, Isabella, Ava, Emma, Emily, Jacob, Joshua, among others. Step 2: Estimate r.

14 Regression Example Do more sales calls result in more sales made? Step 3: Calculate r Step 4: Is it a significant correlation?

15 Do more sales calls result in more sales made? Step 3: Calculate r. Step 4: Is it a significant correlation? n = 10, df = 8, alpha = .05. Observed r is larger than critical r (0.71 > 0.632), therefore we reject the null hypothesis. Yes, it is a significant correlation: r(8) = 0.71; p < 0.05.
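As a sketch of the decision rule on this slide (not the class’s actual computation), the comparison of the observed r with the tabled critical value can be written out like this; the critical value 0.632 for df = 8, alpha = .05 is taken from the slide rather than computed.

```python
n = 10
df = n - 2           # 10 - 2 = 8
alpha = 0.05
r_observed = 0.71    # from the slide
r_critical = 0.632   # tabled critical r for df = 8, alpha = .05 (two-tailed)

if abs(r_observed) > r_critical:
    print(f"r({df}) = {r_observed}; p < {alpha} -- reject the null hypothesis")
else:
    print(f"r({df}) = {r_observed}; n.s. -- do not reject the null hypothesis")
```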

16 Regression: Predicting sales Step 1: Draw prediction line What are we predicting? r = 0.71 b = 11.579 (slope) a = 20.526 (intercept) Draw a regression line and regression equation

19 Regression: Predicting sales. Step 1: Predict sales for a certain number of sales calls. What should you expect from a salesperson who makes 1 call? Step 2: State the regression equation: Y’ = a + bX; Y’ = 20.526 + 11.579X. Step 3: Solve for some value of Y’: Y’ = 20.526 + 11.579(1) = 32.105. If they make one sales call, they should sell 32.105 systems. If they sell more → over-performing; if they sell fewer → under-performing. (Data points at one call: Madison, Joshua.)

20 Regression: Predicting sales. Step 1: Predict sales for a certain number of sales calls. What should you expect from a salesperson who makes 2 calls? Step 2: State the regression equation: Y’ = a + bX; Y’ = 20.526 + 11.579X. Step 3: Solve for some value of Y’: Y’ = 20.526 + 11.579(2) = 43.684. If they make two sales calls, they should sell 43.684 systems. If they sell more → over-performing; if they sell fewer → under-performing. (Data points at two calls: Isabella, Jacob.)

21 Regression: Predicting sales. Step 1: Predict sales for a certain number of sales calls. What should you expect from a salesperson who makes 3 calls? Step 2: State the regression equation: Y’ = a + bX; Y’ = 20.526 + 11.579X. Step 3: Solve for some value of Y’: Y’ = 20.526 + 11.579(3) = 55.263. If they make three sales calls, they should sell 55.263 systems. If they sell more → over-performing; if they sell fewer → under-performing. (Data points at three calls: Ava, Emma.)

22 Regression: Predicting sales. Step 1: Predict sales for a certain number of sales calls. What should you expect from a salesperson who makes 4 calls? Step 2: State the regression equation: Y’ = a + bX; Y’ = 20.526 + 11.579X. Step 3: Solve for some value of Y’: Y’ = 20.526 + 11.579(4) = 66.842. If they make four sales calls, they should sell 66.842 systems. If they sell more → over-performing; if they sell fewer → under-performing. (Data point at four calls: Emily.)
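The four prediction slides above all plug a number of calls into the same equation. A small sketch, using the slope and intercept from the slides, reproduces the whole set of expected sales (the function name is illustrative only).

```python
a, b = 20.526, 11.579        # intercept and slope from the regression output

def predicted_sales(calls):
    """Y' = a + b * X: expected number of systems sold for a given number of calls."""
    return a + b * calls

for calls in (1, 2, 3, 4):
    print(calls, round(predicted_sales(calls), 3))
# 1 -> 32.105, 2 -> 43.684, 3 -> 55.263, 4 -> 66.842
```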

23 Regression: Evaluating staff. Step 1: Compare expected sales levels to actual sales levels. What should you expect from each salesperson? They should sell x systems depending on sales calls. If they sell more → over-performing; if they sell fewer → under-performing. (Salespeople plotted: Madison, Isabella, Ava, Emma, Emily, Jacob, Joshua.)

24 Regression: Evaluating staff. Step 1: Compare expected sales levels to actual sales levels. How did Ava do? Ava sold 14.7 more than expected, taking into account how many sales calls she made → over-performing. The difference between expected Y’ and actual Y is called the “residual” (it’s a deviation score): 70 − 55.3 = 14.7.

25 Regression: Evaluating staff. Step 1: Compare expected sales levels to actual sales levels. How did Jacob do? Jacob sold 23.684 fewer than expected, taking into account how many sales calls he made → under-performing. The difference between expected Y’ and actual Y is called the “residual” (it’s a deviation score): 20 − 43.7 = −23.7.

27 Regression: Evaluating staff. Step 1: Compare expected sales levels to actual sales levels. The difference between expected Y’ and actual Y is called the “residual” (it’s a deviation score). Residuals shown: Ava +14.7, Jacob −23.7, Emily −6.8, Madison +7.9 (also plotted: Isabella, Emma, Joshua).
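A residual is just actual minus predicted. Here is a minimal sketch for Ava and Jacob, where the sales figures (70 and 20) are the ones shown on the slides and the number of calls (3 and 2) is inferred from the predicted scores of 55.3 and 43.7.

```python
a, b = 20.526, 11.579        # intercept and slope from the regression output

def residual(calls_made, systems_sold):
    """Residual = actual Y - predicted Y'."""
    predicted = a + b * calls_made
    return systems_sold - predicted

print(round(residual(3, 70), 1))   # +14.7 -> Ava, over-performing
print(round(residual(2, 20), 1))   # -23.7 -> Jacob, under-performing
```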

28 Does the prediction line perfectly predict the predicted variable when using the predictor variable? No, we are wrong sometimes... The difference between expected Y’ and actual Y is called the “residual” (it’s a deviation score); for example, +14.7 and −23.7. The green lines show how much “error” there is in our prediction line – how much we are wrong in our predictions. How can we estimate how much “error” we have? Exactly? How would we find our “average residual”?

29 How do we find the average amount of error in our prediction? The difference between expected Y’ and actual Y is called the “residual” (it’s a deviation score). The green lines show how much “error” there is in our prediction line – how much we are wrong in our predictions. How would we find our “average residual”? Step 1: Find the error for each value (just the residuals), Y − Y’. Residual scores: Ava is 14.7, Emily is −6.8, Madison is 7.9, Jacob is −23.7. The average residual would be the average amount by which actual scores deviate on either side of the predicted score – but there is a big problem: Σ(Y − Y’) = 0, so we square the deviations. Step 2: Add up the squared residuals, Σ(Y − Y’)², divide by df (n − 2), and take the square root: √[Σ(Y − Y’)² / (n − 2)]. (Compare the familiar Σx/N form of an average.)

30 How do we find the average amount of error in our prediction? The difference between expected Y’ and actual Y is called the “residual” (it’s a deviation score). The green lines show how much “error” there is in our prediction line – how much we are wrong in our predictions. How would we find our “average residual”? Step 1: Find the error for each value (just the residuals), Y − Y’. Step 2: Find the average: √[Σ(Y − Y’)² / (n − 2)]. Sound familiar?? It is built like the Σx/N average of deviation scores (e.g., Diallo is 0”, Mike is −4”, Hunter is −2”, Preston is 2”).

31 These would be helpful to know by heart – please memorize this formula. Standard error of the estimate (line) = √[Σ(Y − Y’)² / (n − 2)]

32 Finding the standard error of the estimate (line). How well does the prediction line predict the predicted variable when using the predictor variable? The slope doesn’t give “variability” info; the intercept doesn’t give “variability” info; the correlation “r” does give “variability” info; the residuals do give “variability” info. What if we want to know the “average deviation score”? Standard error of the estimate: a measure of the average amount of predictive error – the average amount that Y’ scores differ from Y scores; a mean of the lengths of the green lines.
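Putting the formula from slides 29–32 into code: a minimal sketch of the standard error of the estimate, √[Σ(Y − Y’)² / (n − 2)]. The example values are illustrative only (four of the residuals discussed above), not the full class data set.

```python
from math import sqrt

def standard_error_of_estimate(actual, predicted):
    """sqrt( sum((Y - Y')**2) / (n - 2) ): roughly the average length of the green lines."""
    n = len(actual)
    ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    return sqrt(ss_residual / (n - 2))

# Illustrative values only (residuals of +7.9, -23.7, +14.7, -6.8):
actual    = [40, 20, 70, 60]
predicted = [32.1, 43.7, 55.3, 66.8]
print(round(standard_error_of_estimate(actual, predicted), 2))   # about 21.05
```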

33 How well does the prediction line predict the Ys from the Xs? Residuals. Shorter green lines suggest better prediction – smaller error; longer green lines suggest worse prediction – larger error. Why are the green lines vertical? Remember, we are predicting the variable on the Y axis, so error is how we are wrong about Y (vertical). A note about curvilinear relationships and patterns of the residuals.

34 When would our predictions be perfect (with no error at all)? Perfect correlation = +1.00 or −1.00: one variable perfectly predicts the other; no variability in the scatterplot; the dots approximate a straight line. Any residuals?
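As a quick check on the “no error at all” idea, here is a tiny sketch with made-up data that fall exactly on a line; with a perfect relationship every residual is zero.

```python
# Made-up data lying exactly on the line y = 3 + 2x (a perfect correlation).
xs = [1, 2, 3, 4, 5]
ys = [3 + 2 * x for x in xs]

a, b = 3, 2                                          # true intercept and slope
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(residuals)                                     # [0, 0, 0, 0, 0] -- no residuals
```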

35 Assumptions underlying linear regression: For each value of X, there is a group of Y values. These Y values are normally distributed. The means of these normal distributions of Y values all lie on the straight line of regression. The standard deviations of these normal distributions are equal.

36 Does the prediction line perfectly predict the predicted variable when using the predictor variable? No, we are wrong sometimes... The difference between expected Y’ and actual Y is called the “residual” (it’s a deviation score); for example, +14.7 and −23.7. The green lines show how much “error” there is in our prediction line – how much we are wrong in our predictions. How can we estimate how much “error” we have? A perfect correlation (= +1.00 or −1.00) means each variable perfectly predicts the other: no variability in the scatterplot; the dots approximate a straight line.

37 Regression analysis – the least squares principle. When we calculate the regression line we try to minimize the distance between the predicted Ys and the actual (data) Y points (the length of the green lines). Remember, because negative and positive values would cancel each other out, we have to square those distances (deviations), so we are trying to minimize the “sum of squares of the vertical distances between the actual Y values and the predicted Y values”.
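The least squares principle can be written directly as formulas for b and a. Below is a minimal sketch (the calls/sold numbers are hypothetical, not the class data), using b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄.

```python
def least_squares(xs, ys):
    """Slope and intercept that minimize the sum of squared vertical deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))   # sum of (x - x̄)(y - ȳ)
    sxx = sum((x - mean_x) ** 2 for x in xs)                         # sum of (x - x̄)²
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical calls / systems-sold pairs:
calls = [1, 1, 2, 2, 3, 3, 4]
sold  = [40, 25, 20, 45, 70, 50, 60]
a, b = least_squares(calls, sold)
print(round(a, 3), round(b, 3))
```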

38 Is the regression line better than just guessing the mean of the Y variable? How much does the information about the relationship actually help? Which minimizes error better? How much better does the regression line predict the observed results? r². Wow!
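The comparison on this slide – guessing the mean of Y versus using the regression line – is exactly what r² measures. A minimal sketch, assuming lists of actual and predicted Y values:

```python
def r_squared(actual, predicted):
    """r^2 = 1 - SS_error / SS_total: how much better the line does than guessing the mean."""
    mean_y = sum(actual) / len(actual)
    ss_total = sum((y - mean_y) ** 2 for y in actual)                 # error from guessing the mean
    ss_error = sum((y - p) ** 2 for y, p in zip(actual, predicted))   # error around the regression line
    return 1 - ss_error / ss_total

# Same illustrative values as the standard-error sketch above:
print(round(r_squared([40, 20, 70, 60], [32.1, 43.7, 55.3, 66.8]), 2))   # about 0.4 for these four points
```

With the full class data this should come out near the r² = 0.504 reported in the Excel example on slide 47.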

39 What is r²? r² = the proportion of the total variance in one variable that is predictable by its relationship with the other variable. Example: If mother’s and daughter’s heights are correlated with an r = .8, then what amount (proportion or percentage) of variance of mother’s height is accounted for by daughter’s height? .64, because (.8)² = .64.

40 What is r²? r² = the proportion of the total variance in one variable that is predictable by its relationship with the other variable. Example: If mother’s and daughter’s heights are correlated with an r = .8, then what proportion of variance of mother’s height is not accounted for by daughter’s height? .36, because (1.0 − .64) = .36; or 36%, because 100% − 64% = 36%.

41 What is r²? r² = the proportion of the total variance in one variable that is predictable by its relationship with the other variable. Example: If ice cream sales and temperature are correlated with an r = .5, then what amount (proportion or percentage) of variance of ice cream sales is accounted for by temperature? .25, because (.5)² = .25.

42 What is r²? r² = the proportion of the total variance in one variable that is predictable by its relationship with the other variable. Example: If ice cream sales and temperature are correlated with an r = .5, then what amount (proportion or percentage) of variance of ice cream sales is not accounted for by temperature? .75, because (1.0 − .25) = .75; or 75%, because 100% − 25% = 75%.
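The four r² examples above follow one pattern: square r for the proportion explained, subtract from 1 for the proportion not explained. A short sketch:

```python
for label, r in [("mothers'/daughters' heights", 0.8), ("ice cream sales/temperature", 0.5)]:
    explained = r ** 2
    print(f"{label}: r = {r}, explained = {explained:.2f}, not explained = {1 - explained:.2f}")
# r = .8 -> .64 explained, .36 not explained
# r = .5 -> .25 explained, .75 not explained
```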

43 Some useful terms: Regression uses the predictor variable (independent) to make predictions about the predicted variable (dependent). “Coefficient of correlation” is the name for “r”. “Coefficient of determination” is the name for “r²” (remember it is always positive – no direction info). “Standard error of the estimate” is our measure of the variability of the dots around the regression line (average deviation of each data point from the regression line – like a standard deviation).

44 Regression: Evaluating staff. Step 1: Compare expected sales levels to actual sales levels. The difference between expected Y’ and actual Y is called the “residual” (it’s a deviation score); for example, +14.7 and −23.7. (Salespeople plotted: Madison, Isabella, Ava, Emma, Emily, Jacob, Joshua.)

45 Summary. Interpret r = 0.71: a positive relationship between the number of sales calls and the number of copiers sold; a strong relationship. Remember, we have not demonstrated cause and effect here, only that the two variables – sales calls and copiers sold – are related.

46 Correlation coefficient – Excel example. Interpret r = 0.71. Does this correlation reach significance? n = 10, df = 8, alpha = .05. Observed r is larger than critical r (0.71 > 0.632), therefore we reject the null hypothesis: r(8) = 0.71; p < 0.05.

47 Coefficient of determination – Excel example. Interpret r² = 0.504 (0.71² = 0.504): we can say that 50.4 percent of the variation in the number of copiers sold is explained, or accounted for, by the variation in the number of sales calls. Remember, we lose the directionality of the relationship with r².

48 Homework Review

49 The relationship between hours worked and weekly pay is a strong positive correlation (r = +0.92; positive; strong; as hours worked go up, weekly pay goes up). This correlation is significant, r(3) = 0.92; p < 0.05. Intercept a = 55.286; slope b = 6.0857; y’ = 6.0857x + 55.286. Predictions from the equation: $207.43 and $85.71. r² = .846231, or about 84%: 84% of the total variance of “weekly pay” is accounted for by “hours worked”. For each additional hour worked, weekly pay will increase by $6.09.
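For the homework problems, each answer set is the same handful of computations (regression equation, df, r², predictions). Here is a small, reusable sketch; the x-values of 25 and 5 hours are inferred from the predicted pay amounts on the slide, and n = 5 is implied by df = 3. The same helper can be reused for the later operator, BA-degree, and crime-rate problems.

```python
def check_answers(a, b, r, n, x_values):
    """Recompute the homework quantities: equation, df, r^2, and predictions y' = a + b*x."""
    return {
        "equation": f"y' = {b}x + {a}",
        "df": n - 2,
        "r_squared": round(r ** 2, 4),
        "predictions": {x: round(a + b * x, 2) for x in x_values},
    }

# Hours worked vs. weekly pay (values from the slide):
print(check_answers(a=55.286, b=6.0857, r=0.92, n=5, x_values=[25, 5]))
# predictions: 25 hours -> about 207.43, 5 hours -> about 85.71; r^2 about 0.8464 (~84%)
```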

50 [Scatterplot: Number of Operators (x axis, 4 to 8) vs. Wait Time (y axis, 280 to 400 seconds).]

51 r = −0.73. The relationship between wait time and number of operators working is negative and moderate: as the number of operators increases, wait time decreases. This correlation is not significant, r(3) = −0.73; n.s. (critical r = 0.878; no, we do not reject the null). Intercept a = 458; slope b = −18.5; y’ = −18.5x + 458. Predictions from the equation: 365 seconds and 328 seconds. r² = .53695, or about 54%: the proportion of total variance of wait time accounted for by number of operators is 54%. For each additional operator added, wait time will decrease by 18.5 seconds.

52 [Scatterplot: Median Income (x axis, 45 to 66) vs. Percent of BAs (y axis, 21 to 39).]

53 r = 0.8875. The relationship between median income and percent of residents with a BA degree is strong and positive: as median income goes up, so does the percent of residents who have a BA degree. This correlation is significant, r(8) = 0.89; p < 0.05 (n = 10, df = 8, critical r = 0.632; yes, we reject the null). Intercept a = 3.1819; slope b = 0.0005; y’ = 0.0005x + 3.1819. Predictions from the equation: 25% of residents and 35% of residents. r² = .78766, or about 78%: the proportion of total variance of percent of BAs accounted for by median income is 78%. For each additional $1 in income, the percent of residents with a BA degree increases by .0005.

54 [Scatterplot: Median Income (x axis, 45 to 66) vs. Crime Rate (y axis, 12 to 30).]

55 r = −0.6293. The relationship between crime rate and median income is negative and moderate: as median income goes up, crime rate tends to go down. This correlation is not significant, r(8) = −0.63; n.s. (0.6293 is not bigger than the critical value of 0.632; n = 10, df = 8; no, we do not reject the null). Intercept a = 4662.5; slope b = −0.0499; y’ = −0.0499x + 4662.5. Predictions from the equation: 2,417 thefts and 1,418.5 thefts. r² = .396, or about 40%: the proportion of total variance of thefts accounted for by median income is 40%. For each additional $1 in income, thefts go down by .0499.

56

