Economics 173 Business Statistics Lecture 2 Fall, 2001 Professor J. Petry


1 Economics 173 Business Statistics Lecture 2 Fall, 2001 Professor J. Petry http://www.cba.uiuc.edu/jpetry/Econ_173_fa01/

2 2 Numerical Descriptive Measures
Measures of central location
–arithmetic mean, median, mode, (geometric mean)
Measures of variability
–range, variance, standard deviation, (coefficient of variation)
Measures of association
–covariance, coefficient of correlation

3 3 Measures of Central Location
§ Arithmetic mean
–This is the most popular and useful measure of central location.
Mean = (Sum of the measurements) / (Number of measurements)
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where n is the sample size.
Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, where N is the population size.

4 4 Example
The mean of the sample of six measurements 7, 3, 9, -2, 4, 6 is given by
$\bar{x} = \frac{\sum_{i=1}^{6} x_i}{6} = \frac{7 + 3 + 9 + (-2) + 4 + 6}{6} = 4.5$
Example
Calculate the mean of 212, -46, 52, -14, 66
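As a quick check of the arithmetic, here is a minimal Python sketch that applies the definition (sum of the measurements divided by the number of measurements) to both data sets on this slide. The helper name `arithmetic_mean` is an illustrative assumption and not part of the lecture.

```python
def arithmetic_mean(measurements):
    """Return the sum of the measurements divided by their count."""
    return sum(measurements) / len(measurements)

sample = [7, 3, 9, -2, 4, 6]
exercise = [212, -46, 52, -14, 66]

print(arithmetic_mean(sample))    # 4.5
print(arithmetic_mean(exercise))  # 54.0
```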

5 5 § The median
–The median of a set of measurements is the value that falls in the middle when the measurements are arranged in order of magnitude.
Example 4.4 Seven employee salaries were recorded (in 1000s): 28, 60, 26, 32, 30, 26, 29. Find the median salary.
Odd number of observations: First, sort the salaries. Then, locate the value in the middle.
26, 26, 28, 29, 30, 32, 60. The median is 29.
Suppose one employee’s salary of $31,000 was added to the group recorded before. Find the median salary.
Even number of observations: First, sort the salaries. Then, locate the values in the middle.
26, 26, 28, 29, 30, 31, 32, 60. There are two middle values, 29 and 30! The median is their average, 29.5.
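The sort-then-locate procedure can be sketched in Python as follows. The function name `median` is an illustrative choice. The even case averages the two middle values, as in the salary example.

```python
def median(measurements):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    ordered = sorted(measurements)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

salaries = [28, 60, 26, 32, 30, 26, 29]  # in $1000s, from Example 4.4

print(median(salaries))          # 29
print(median(salaries + [31]))   # 29.5
```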

6 6 § The mode
–The mode of a set of measurements is the value that occurs most frequently.
–A set of data may have one mode (or modal class), or two or more modes.
[Figure: the modal class]

7 7 – Example The manager of a men’s store observes the waist size (in inches) of trousers sold yesterday: 31, 34, 36, 33, 28, 34, 30, 34, 32, 40. What is the modal value?
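One way to find the modal value is to count how often each size occurs. The sketch below uses `collections.Counter` from the Python standard library. The helper name `modes` is an assumption, and it returns a list because a data set may have more than one mode.

```python
from collections import Counter

def modes(measurements):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(measurements)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

waist_sizes = [31, 34, 36, 33, 28, 34, 30, 34, 32, 40]

print(modes(waist_sizes))  # [34]
```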

8 8 Relationship among Mean, Median, and Mode
If a distribution is symmetrical, the mean, median and mode coincide.
If a distribution is nonsymmetrical, and skewed to the left or to the right, the three measures differ.
[Figure: a positively skewed distribution (“skewed to the right”), with the mean, median and mode marked]

9 9 If a distribution is symmetrical, the mean, median and mode coincide.
If a distribution is nonsymmetrical, and skewed to the left or to the right, the three measures differ.
[Figure: a positively skewed distribution (“skewed to the right”) and a negatively skewed distribution (“skewed to the left”), each with the mean, median and mode marked]
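The seven salaries from Example 4.4 include one unusually large value (60), so they form a small data set that is skewed to the right. The sketch below uses Python's standard `statistics` module to show how the three measures separate. In this data set the mode lies below the median, which lies below the mean. That ordering is common for right-skewed data but is not guaranteed in general.

```python
import statistics

salaries = [28, 60, 26, 32, 30, 26, 29]  # in $1000s, from Example 4.4

mode = statistics.mode(salaries)      # most frequent value
median = statistics.median(salaries)  # middle value of the sorted data
mean = statistics.mean(salaries)      # pulled upward by the salary of 60

print(mode, median, mean)  # 26 29 33
```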

10 10 Measures of variability (Looking beyond the average)
Measures of central location fail to tell the whole story about the distribution. A question of interest still remains unanswered: How typical is the average value of all the measurements in the data set? Or: How spread out are the measurements about the average value?

11 11 Observe two hypothetical data sets
Low variability data set: The average value provides a good representation of the values in the data set.
High variability data set: The same average value does not provide as good a representation of the values in the data set as before.

12 12 § The range
–The range of a set of measurements is the difference between the largest and smallest measurements.
–Its major advantage is the ease with which it can be computed.
–Its major shortcoming is its failure to provide information on the dispersion of the values between the two end points.
[Figure: the range, from the smallest measurement to the largest measurement]
But, how do all the measurements spread out? The range cannot assist in answering this question.
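To illustrate the shortcoming, the sketch below computes the range for two data sets that share the same end points but are distributed very differently in between. Both data sets are hypothetical and were invented for this illustration. The helper name `value_range` is also an assumption.

```python
def value_range(measurements):
    """Return the largest measurement minus the smallest."""
    return max(measurements) - min(measurements)

# Hypothetical data sets, invented for illustration only.
clustered = [10, 20, 20, 20, 20, 20, 30]   # most values sit at the centre
spread_out = [10, 10, 10, 20, 30, 30, 30]  # most values sit at the end points

print(value_range(clustered))   # 20
print(value_range(spread_out))  # 20
```

The two ranges are identical, so the range alone cannot distinguish the two patterns of dispersion.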

13 13 § The variance
–This measure of dispersion reflects the values of all the measurements.
–The variance of a population of N measurements $x_1, x_2, \dots, x_N$ having a mean $\mu$ is defined as
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
–The variance of a sample of n measurements $x_1, x_2, \dots, x_n$ having a mean $\bar{x}$ is defined as
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

14 14 Consider two small populations:
Population A: 8, 9, 10, 11, 12
Population B: 4, 7, 10, 13, 16
The mean of both populations is 10, but measurements in B are much more dispersed than those in A. Thus, a measure of dispersion is needed that agrees with this observation.
Let us start by calculating the sum of deviations.
A: 8-10 = -2, 9-10 = -1, 10-10 = 0, 11-10 = +1, 12-10 = +2; Sum = 0
B: 4-10 = -6, 7-10 = -3, 10-10 = 0, 13-10 = +3, 16-10 = +6; Sum = 0
The sum of deviations is zero in both cases, therefore, another measure is needed.

15 15 The sum of deviations is zero in both cases, therefore, another measure is needed.
The sum of squared deviations is used in calculating the variance.

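16 16 Let us calculate the variance of the two populations
$\sigma_A^2 = \frac{(8-10)^2 + (9-10)^2 + (10-10)^2 + (11-10)^2 + (12-10)^2}{5} = 2$
$\sigma_B^2 = \frac{(4-10)^2 + (7-10)^2 + (10-10)^2 + (13-10)^2 + (16-10)^2}{5} = 18$
Why is the variance defined as the average squared deviation? Why not use the sum of squared deviations as a measure of dispersion instead? After all, the sum of squared deviations increases in magnitude when the dispersion of a data set increases!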
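The calculation for populations A (8, 9, 10, 11, 12) and B (4, 7, 10, 13, 16) can be sketched in Python. The helper name `describe` is an illustrative assumption. It returns the mean, the sum of deviations, the sum of squared deviations, and the population variance (divisor N).

```python
def describe(population):
    """Return (mean, sum of deviations, sum of squared deviations, population variance)."""
    n = len(population)
    mu = sum(population) / n
    sum_of_deviations = sum(x - mu for x in population)       # zero for any data set
    sum_of_squares = sum((x - mu) ** 2 for x in population)
    return mu, sum_of_deviations, sum_of_squares, sum_of_squares / n

population_a = [8, 9, 10, 11, 12]
population_b = [4, 7, 10, 13, 16]

print(describe(population_a))  # (10.0, 0.0, 10.0, 2.0)
print(describe(population_b))  # (10.0, 0.0, 90.0, 18.0)
```

Both populations have the same mean and a zero sum of deviations, but the variance of B (18) is nine times that of A (2).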

17 17 Which data set has a larger dispersion?
Data set A: 1, 1, 1, 1, 1, 3, 3, 3, 3, 3 (mean 2)
Data set B: 1, 5 (mean 3)
Data set B is more dispersed around the mean.
Let us calculate the sum of squared deviations for both data sets:
$\text{Sum}_A = (1-2)^2 + \dots + (1-2)^2 + (3-2)^2 + \dots + (3-2)^2 = 10$ (each of the two terms appears 5 times)
$\text{Sum}_B = (1-3)^2 + (5-3)^2 = 8$
However, when calculated on a “per observation” basis (variance), the data set dispersions are properly ranked:
$\sigma_A^2 = \text{Sum}_A / N = 10/10 = 1$
$\sigma_B^2 = \text{Sum}_B / N = 8/2 = 4$
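The comparison can be checked with a short Python sketch. It assumes data set A consists of five 1s and five 3s, which is what the "5 times" annotation and the sum of 10 imply, and that data set B consists of 1 and 5. With ten observations in A, dividing the sum of squared deviations by N = 10 gives a variance of 1. The function names are illustrative.

```python
data_a = [1] * 5 + [3] * 5  # five 1s and five 3s, mean 2
data_b = [1, 5]             # mean 3

def sum_of_squared_deviations(data):
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data)

def population_variance(data):
    # "Per observation" measure: divide by the number of observations N.
    return sum_of_squared_deviations(data) / len(data)

print(sum_of_squared_deviations(data_a), sum_of_squared_deviations(data_b))  # 10.0 8.0
print(population_variance(data_a), population_variance(data_b))              # 1.0 4.0
```

The raw sums rank A above B only because A has more observations. The variances rank B above A, in line with the visual impression.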

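18 18 – Example Find the mean and the variance of the following sample of measurements (in years): 3.4, 2.5, 4.1, 1.2, 2.8, 3.7
– Solution
$\bar{x} = \frac{\sum_{i=1}^{6} x_i}{6} = \frac{3.4 + 2.5 + 4.1 + 1.2 + 2.8 + 3.7}{6} = \frac{17.7}{6} = 2.95$ years
A shortcut formula:
$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right] = \frac{1}{5}\left[3.4^2 + 2.5^2 + \dots + 3.7^2 - \frac{17.7^2}{6}\right] = 1.075$ (years$^2$)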
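The sketch below computes the sample variance of these six measurements in two ways: from the definition (squared deviations from the mean, divided by n - 1) and from the shortcut formula. The variable names are illustrative. Both routes should give a mean of about 2.95 and a variance of about 1.075, up to floating-point rounding.

```python
sample = [3.4, 2.5, 4.1, 1.2, 2.8, 3.7]  # in years
n = len(sample)

mean = sum(sample) / n

# Definition: sum of squared deviations from the mean, divided by n - 1.
s2_definition = sum((x - mean) ** 2 for x in sample) / (n - 1)

# Shortcut: (sum of squares - (sum of values)^2 / n), divided by n - 1.
s2_shortcut = (sum(x * x for x in sample) - sum(sample) ** 2 / n) / (n - 1)

print(round(mean, 2), round(s2_definition, 3), round(s2_shortcut, 3))
```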

19 19 –The standard deviation of a set of measurements is the square root of the variance of the measurements.
– Example Rates of return over the past 10 years for two mutual funds are shown below. Which one has a higher level of risk?
Fund A: 8.3, -6.2, 20.9, -2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.05
Fund B: 12.1, -2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, -1.3, 11.4

20 20 –Solution
–Let’s use the Excel printout that is produced from the “Descriptive statistics” sub-menu.
Fund A should be considered riskier because its standard deviation is larger.
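As an alternative to the Excel printout, the comparison can be sketched in Python with the standard `statistics` module. It uses the rates of return listed for Fund A (8.3, -6.2, ..., 30.05) and Fund B (12.1, -2.8, ..., 11.4). The sketch assumes the sample standard deviation (divisor n - 1), which is what `statistics.stdev` computes.

```python
import statistics

fund_a = [8.3, -6.2, 20.9, -2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.05]
fund_b = [12.1, -2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, -1.3, 11.4]

# statistics.stdev uses the sample formula (divisor n - 1).
sd_a = statistics.stdev(fund_a)
sd_b = statistics.stdev(fund_b)

print(f"Fund A: s = {sd_a:.2f}")
print(f"Fund B: s = {sd_b:.2f}")
print("Riskier fund:", "A" if sd_a > sd_b else "B")
```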

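21 21 Interpreting Standard Deviation
The standard deviation can be used to
–compare the variability of several distributions
–make a statement about the general shape of a distribution.
The empirical rule: If a sample of measurements has a mound-shaped distribution, the interval
$(\bar{x} - s, \bar{x} + s)$ contains approximately 68% of the measurements,
$(\bar{x} - 2s, \bar{x} + 2s)$ contains approximately 95% of the measurements,
$(\bar{x} - 3s, \bar{x} + 3s)$ contains virtually all of the measurements.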

22 22 – Example The durations of 30 long-distance telephone calls are shown next. Check the empirical rule for this set of measurements.
Solution First check if the histogram has an approximate mound shape.

23 23 Calculate the mean and the standard deviation: Mean = 10.26; Standard deviation = 4.29.
Calculate the intervals:
Interval         Empirical Rule   Actual percentage
5.97, 14.55      68%              70%
1.68, 18.84      95%              96.7%
-2.61, 23.13     100%             100%
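The three intervals follow directly from the reported mean (10.26) and standard deviation (4.29). A minimal Python sketch is given below. The helper names `empirical_intervals` and `share_within` are assumptions. Because the 30 call durations are not listed in the transcript, `share_within` is only defined here and not applied to data.

```python
def empirical_intervals(mean, sd):
    """Return the intervals mean +/- k*sd for k = 1, 2, 3."""
    return [(mean - k * sd, mean + k * sd) for k in (1, 2, 3)]

def share_within(data, lower, upper):
    """Fraction of the measurements that fall inside [lower, upper]."""
    return sum(lower <= x <= upper for x in data) / len(data)

for lower, upper in empirical_intervals(10.26, 4.29):
    print(f"({lower:.2f}, {upper:.2f})")
```

Given the actual durations, calling `share_within` for each interval would produce the "Actual percentage" column.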

24 24 Measures of Association
Two numerical measures are presented for describing the linear relationship between two variables depicted in a scatter diagram.
–Covariance - is there any pattern to the way two variables move together?
–Correlation coefficient - how strong is the linear relationship between two variables?

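25 25 § The covariance
Population covariance: $\mathrm{COV}(X,Y) = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$
Sample covariance: $\mathrm{cov}(X,Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
$\mu_x$ ($\mu_y$) is the population mean of the variable X (Y). N is the population size. n is the sample size.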

26 26 If the two variables move in opposite directions (one increases when the other one decreases), the covariance is a large negative number.
If the two variables are unrelated, the covariance will be close to zero.
If the two variables move in the same direction (both increase or both decrease), the covariance is a large positive number.

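27 27 § The coefficient of correlation
–This coefficient answers the question: How strong is the association between X and Y?
Population coefficient of correlation: $\rho = \frac{\mathrm{COV}(X,Y)}{\sigma_x \sigma_y}$
Sample coefficient of correlation: $r = \frac{\mathrm{cov}(X,Y)}{s_x s_y}$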

28 28 $\rho$ or r = +1: Strong positive linear relationship, COV(X,Y) > 0
$\rho$ or r = 0: No linear relationship, COV(X,Y) = 0
$\rho$ or r = -1: Strong negative linear relationship, COV(X,Y) < 0

29 29 If the two variables are very strongly positively related, the coefficient value is close to +1 (strong positive linear relationship). If the two variables are very strongly negatively related, the coefficient value is close to -1 (strong negative linear relationship). No straight line relationship is indicated by a coefficient close to zero.
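These three cases can be illustrated with a short Python sketch. The data sets are hypothetical and were invented for this illustration. The function `correlation` is an assumed helper that divides the sample covariance by the product of the sample standard deviations.

```python
import math

def correlation(xs, ys):
    """Sample coefficient of correlation: covariance / (s_x * s_y)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (s_x * s_y)

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]   # exact straight line, upward
falling = [10, 8, 6, 4, 2]  # exact straight line, downward
peaked = [1, 2, 3, 2, 1]    # related to x, but not along a straight line

for label, y in (("rising", rising), ("falling", falling), ("peaked", peaked)):
    print(label, correlation(x, y))
```

Up to floating-point rounding, the coefficients are +1, -1 and 0. The last case shows that a coefficient near zero rules out a straight-line relationship only. It does not rule out every relationship.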

30 30 – Example Compute the covariance and the coefficient of correlation to measure how advertising expenditure and sales level are related to one another.

31 31 Use the procedure below to obtain the required summations
[Table columns: x, y, xy, x², y²]
Similarly, $s_y = 8.839$
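The summation procedure can be sketched in Python. The transcript does not include the advertising and sales data, so the eight (x, y) pairs below are an assumed data set. It was chosen because it reproduces the figures quoted on the slides (s_y = 8.839, covariance 10.2679, coefficient of correlation .797). It should not be read as the lecture's confirmed data.

```python
import math

# Assumed data (not in the transcript), chosen to reproduce the quoted results.
advertising = [1, 3, 5, 4, 2, 5, 3, 2]    # x
sales = [30, 40, 40, 50, 35, 50, 35, 25]  # y
n = len(advertising)

# Required summations: x, y, xy, x^2, y^2.
sum_x = sum(advertising)
sum_y = sum(sales)
sum_xy = sum(x * y for x, y in zip(advertising, sales))
sum_x2 = sum(x * x for x in advertising)
sum_y2 = sum(y * y for y in sales)

# Shortcut formulas for the sample covariance and standard deviations.
cov_xy = (sum_xy - sum_x * sum_y / n) / (n - 1)
s_x = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))
s_y = math.sqrt((sum_y2 - sum_y ** 2 / n) / (n - 1))
r = cov_xy / (s_x * s_y)

print(round(cov_xy, 4), round(s_x, 3), round(s_y, 3), round(r, 3))
# 10.2679 1.458 8.839 0.797
```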

32 32 Excel printout
[Excel output: covariance matrix and correlation matrix]
Interpretation
–The covariance (10.2679) indicates that advertisement expenditure and sales level are positively related.
–The coefficient of correlation (.797) indicates that there is a strong positive linear relationship between advertisement expenditure and sales level.

