MAT 254 – Probability and Statistics Sections 1,2 & 3 2015 - Spring.


1 MAT 254 – Probability and Statistics Sections 1,2 & 3 2015 - Spring

2
1) Importance and basic concepts of probability and statistics; introduction to statistics and data analysis
2) Data collection and presentation
3) Measures of central tendency: mean, median, mode
4) Probability
5) Conditional probability
6) Discrete probability distributions
7) Continuous probability distributions
Midterm Exam (April 1, 17:30)
8) Hypothesis testing (2 weeks)
9) Student t-test (2 weeks)
10) Chi-square
11) Correlation and regression analysis
12) Review
Final Exam (May 25 - June 7)
web.adu.edu.tr/user/oboyaci

3 CORRELATION
The term correlation is used when:
1) Both variables are random variables,
2) The end goal is simply to find a number that expresses the relation between the variables.
REGRESSION
The term regression is used when:
1) One of the variables is a fixed variable,
2) The end goal is to use the measure of relation to predict values of the random variable based on values of the fixed variable.

4 Copyright © 2010 Pearson Addison-Wesley. All rights reserved. 11 - 4

8 Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chap 14-8
• A scatter plot (or scatter diagram) is used to show the relationship between two variables
• Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
◦ Only concerned with the strength of the relationship
◦ No causal effect is implied

9 [Figure: scatter plots of y against x showing linear relationships and curvilinear relationships]

10 [Figure: scatter plots of y against x showing strong relationships and weak relationships]

11 [Figure: scatter plots of y against x showing no relationship]

12
• Correlation measures the strength of the linear association between two variables
• The sample correlation coefficient r is a measure of the strength of the linear relationship between two variables, based on sample observations

13
• Unit free
• Ranges between -1 and 1
• The closer to -1, the stronger the negative linear relationship
• The closer to 1, the stronger the positive linear relationship
• The closer to 0, the weaker the linear relationship

14 [Figure: scatter plots of y against x illustrating r = -1, r = -.6, r = 0, r = +.3, and r = +1]

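15 Sample correlation coefficient:
$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}} $$
or the algebraic equivalent:
$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$
where:
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable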

16
Tree Height (y)   Trunk Diameter (x)   xy     y²      x²
35                8                    280    1225    64
49                9                    441    2401    81
27                7                    189    729     49
33                6                    198    1089    36
60                13                   780    3600    169
21                7                    147    441     49
45                11                   495    2025    121
51                12                   612    2601    144
Totals: Σy = 321, Σx = 73, Σxy = 3142, Σy² = 14111, Σx² = 713

17 [Figure: scatter plot of Tree Height, y against Trunk Diameter, x]
r = 0.886 → relatively strong positive linear association between x and y
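The value r = 0.886 can be checked directly from the tree data. The sketch below is only an illustration: it assumes the standard deviation-form and column-totals definitions of the sample correlation coefficient, and the function names are hypothetical, not taken from the lecture.

```python
from math import sqrt

# Tree data from the example: trunk diameter (x) and tree height (y)
x = [8, 9, 7, 6, 13, 7, 11, 12]
y = [35, 49, 27, 33, 60, 21, 45, 51]


def correlation_deviation_form(x, y):
    """Sample correlation r computed from deviations about the means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def correlation_sums_form(x, y):
    """Sample correlation r computed from the totals of x, y, xy, x^2, y^2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r_dev = correlation_deviation_form(x, y)
r_sums = correlation_sums_form(x, y)
print(round(r_dev, 3), round(r_sums, 3))  # prints: 0.886 0.886
```

Both forms give the same value; the column-totals form is the one that uses the sums 321, 73, 3142, 14111, and 713 from the table.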

18
• Regression analysis is used to:
◦ Predict the value of a dependent variable based on the value of at least one independent variable
◦ Explain the impact of changes in an independent variable on the dependent variable
Dependent variable: the variable we wish to explain
Independent variable: the variable used to explain the dependent variable

19
• Only one independent variable, x
• Relationship between x and y is described by a linear function
• Changes in y are assumed to be caused by changes in x

20
• Error values (ε) are statistically independent
• Error values are normally distributed for any given value of x
• The probability distribution of the errors is normal
• The distributions of possible ε values have equal variances for all values of x
• The underlying relationship between the x variable and the y variable is linear

21 [Figure: four scatter plots labeled Positive Linear Relationship, Negative Linear Relationship, Relationship NOT Linear, and No Relationship]

22 The population regression model:
$$ y = \beta_0 + \beta_1 x + \varepsilon $$
where:
y = Dependent variable
β0 = Population y intercept
β1 = Population slope coefficient
x = Independent variable
ε = Random error term, or residual
Linear component: β0 + β1 x
Random error component: ε

23 [Figure: population regression line of y against x with Intercept = β0 and Slope = β1; at x_i, the random error ε_i for this x value is the gap between the observed value of y for x_i and the predicted value of y for x_i]

24 The sample regression line provides an estimate of the population regression line:
$$ \hat{y}_i = b_0 + b_1 x $$
where:
ŷ_i = Estimated (or predicted) y value
b0 = Estimate of the regression intercept
b1 = Estimate of the regression slope
x = Independent variable
The individual random error terms e_i have a mean of zero

25
• b0 and b1 are obtained by finding the values that minimize the sum of the squared residuals:
$$ \sum e^2 = \sum (y - \hat{y})^2 = \sum \left(y - (b_0 + b_1 x)\right)^2 $$

26
• The formulas for b1 and b0 are:
$$ b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} $$
algebraic equivalent for b1:
$$ b_1 = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{\left(\sum x\right)^2}{n}} $$
and
$$ b_0 = \bar{y} - b_1 \bar{x} $$

27
• b0 is the estimated average value of y when the value of x is zero
• b1 is the estimated change in the average value of y as a result of a one-unit change in x

28
• A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)
• A random sample of 10 houses is selected
◦ Dependent variable (y) = house price in $1000s
◦ Independent variable (x) = square feet

29
House Price in $1000s (y)   Square Feet (x)
245                         1400
312                         1600
279                         1700
308                         1875
199                         1100
219                         1550
405                         2350
324                         2450
319                         1425
255                         1700

30
• House price model: scatter plot and regression line
[Figure: scatter plot of house price in $1000s against square feet with the fitted regression line]
Slope = 0.10977
Intercept = 98.248
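The slope and intercept can be reproduced from the ten sampled houses. This is a minimal sketch that assumes the usual least squares estimates, b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b0 = ȳ - b1 x̄; the function name `least_squares_fit` is hypothetical.

```python
# House data from the example: size in square feet (x), price in $1000s (y)
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def least_squares_fit(x, y):
    """Return (b0, b1) for the fitted line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    # Intercept: forces the line through the point of means
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares_fit(square_feet, price)
print(f"intercept = {b0:.3f}, slope = {b1:.5f}")
# prints: intercept = 98.248, slope = 0.10977
```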

31
• b0 is the estimated average value of y when the value of x is zero (if x = 0 is in the range of observed x values)
◦ Here, no houses had 0 square feet, so b0 = 98.24833 just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet

32
• b1 measures the estimated change in the average value of y as a result of a one-unit change in x
◦ Here, b1 = 0.10977 tells us that the average value of a house increases by 0.10977 × $1000 = $109.77 for each additional square foot of size

33
• The sum of the residuals from the least squares regression line is 0 (Σ(y - ŷ) = 0)
• The sum of the squared residuals is a minimum (Σ(y - ŷ)² is minimized)
• The simple regression line always passes through the mean of the y variable and the mean of the x variable
• The least squares coefficients are unbiased estimates of β0 and β1
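The first three properties can be checked numerically on the house price data. The sketch below is illustrative only; the variable and function names are hypothetical, and the sizes of the two "nudges" (1.0 and 0.001) are arbitrary choices. Unbiasedness concerns repeated sampling and cannot be checked from a single sample, so it is not covered here.

```python
# House data from the example: size in square feet (x), price in $1000s (y)
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n

# Least squares estimates (deviation form)
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price))
      / sum((x - x_bar) ** 2 for x in square_feet))
b0 = y_bar - b1 * x_bar


def sum_squared_residuals(intercept, slope):
    """Sum of squared residuals for the line y_hat = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2
               for x, y in zip(square_feet, price))


# Property 1: the residuals sum to zero (up to floating-point rounding)
residual_sum = sum(y - (b0 + b1 * x) for x, y in zip(square_feet, price))

# Property 2: nudging either coefficient away from the fit increases the sum
sse_fit = sum_squared_residuals(b0, b1)
sse_shifted_intercept = sum_squared_residuals(b0 + 1.0, b1)
sse_shifted_slope = sum_squared_residuals(b0, b1 + 0.001)

# Property 3: the fitted line passes through the point of means (x_bar, y_bar)
fitted_at_x_bar = b0 + b1 * x_bar

print(abs(residual_sum) < 1e-9)
print(sse_fit < sse_shifted_intercept and sse_fit < sse_shifted_slope)
print(abs(fitted_at_x_bar - y_bar) < 1e-9)
```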

34
• Total variation is made up of two parts:
$$ SST = SSR + SSE $$
(Total Sum of Squares = Sum of Squares Regression + Sum of Squares Error)
$$ SST = \sum (y - \bar{y})^2 \qquad SSR = \sum (\hat{y} - \bar{y})^2 \qquad SSE = \sum (y - \hat{y})^2 $$
where:
ȳ = Average value of the dependent variable
y = Observed values of the dependent variable
ŷ = Estimated value of y for the given x value

35
• SST = total sum of squares
◦ Measures the variation of the y_i values around their mean ȳ
• SSE = error sum of squares
◦ Variation attributable to factors other than the relationship between x and y
• SSR = regression sum of squares
◦ Explained variation attributable to the relationship between x and y

36 [Figure: plot of y against x marking the observation y_i at x_i, the fitted value ŷ_i, and the mean ȳ]
$$ SST = \sum (y_i - \bar{y})^2 \qquad SSE = \sum (y_i - \hat{y}_i)^2 \qquad SSR = \sum (\hat{y}_i - \bar{y})^2 $$

37
• The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
• The coefficient of determination is also called R-squared and is denoted as R²
$$ R^2 = \frac{SSR}{SST} \quad \text{where} \quad 0 \le R^2 \le 1 $$

38 R² = 1
Perfect linear relationship between x and y: 100% of the variation in y is explained by variation in x
[Figure: two scatter plots of y against x, each with R² = 1]

39 0 < R² < 1
Weaker linear relationship between x and y: some but not all of the variation in y is explained by variation in x
[Figure: two scatter plots of y against x]

40 R² = 0
No linear relationship between x and y: the value of y does not depend on x (none of the variation in y is explained by variation in x)
[Figure: scatter plot of y against x with R² = 0]

41 Coefficient of determination
Note: In the single independent variable case, the coefficient of determination is
$$ R^2 = r^2 $$
where:
R² = Coefficient of determination
r = Simple correlation coefficient
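As an illustration, the decomposition of the total variation and the relation between R² and r can be verified on the house price data. This sketch assumes the standard definitions SST = Σ(y - ȳ)², SSR = Σ(ŷ - ȳ)², and SSE = Σ(y - ŷ)²; all variable names are hypothetical.

```python
from math import sqrt

# House data from the example: size in square feet (x), price in $1000s (y)
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n
sxx = sum((x - x_bar) ** 2 for x in square_feet)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price))

# Least squares line and its fitted values
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in square_feet]

# The three sums of squares
sst = sum((y - y_bar) ** 2 for y in price)
ssr = sum((y_hat - y_bar) ** 2 for y_hat in fitted)
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(price, fitted))

# Coefficient of determination and simple correlation coefficient
r_squared = ssr / sst
r = sxy / sqrt(sxx * sst)

print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(f"R^2 = {r_squared:.4f}, r^2 = {r ** 2:.4f}")
```

On this sample, SSR and SSE add up to SST and the two printed ratios agree, so roughly 58% of the variation in house price is explained by variation in square feet.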

42 House price example: 10 houses, House Price in $1000s (y) against Square Feet (x)
Estimated Regression Equation:
$$ \text{house price} = 98.25 + 0.1098\,(\text{square feet}) $$
Predict the price for a house with 2000 square feet

43 Predict the price for a house with 2000 square feet:
$$ \text{house price} = 98.25 + 0.1098\,(2000) = 317.85 $$
The predicted price for a house with 2000 square feet is 317.85 ($1000s) = $317,850
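The prediction is a direct substitution into the fitted line. In the sketch below, the rounded coefficients 98.25 and 0.1098 are an assumption: they are inferred because they reproduce the stated 317.85, whereas the unrounded estimates quoted earlier (98.24833 and 0.10977) give a slightly lower figure. The function name `predict_price` is hypothetical.

```python
def predict_price(square_feet, b0, b1):
    """Predicted house price in $1000s from the line y_hat = b0 + b1 * x."""
    return b0 + b1 * square_feet


# Assumed rounded coefficients (they reproduce the stated 317.85)
rounded = predict_price(2000, b0=98.25, b1=0.1098)

# Coefficients as quoted in the interpretation of b0 and b1
unrounded = predict_price(2000, b0=98.24833, b1=0.10977)

print(f"rounded coefficients:   {rounded:.2f} ($1000s)")
print(f"unrounded coefficients: {unrounded:.2f} ($1000s)")
```

The two results differ by well under $100, which is only a rounding effect. Note that 2000 square feet lies inside the observed range of sizes (1100 to 2450), so the prediction does not extrapolate beyond the data.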

44 END OF THE LECTURE…

