Published by Anabel Craig. Modified over 8 years ago.
1
1 Economics 173 Business Statistics Lectures 1 & 2 Summer, 2001 Professor J. Petry
2
2 Introduction
The purpose of statistics is to extract information from data.
– “Without data, ours is just another opinion.”
– “Without statistics, we are just another person on data overload.”
Because it is used across so many disciplines, statistics is probably the most useful course irrespective of major.
– More data, properly analyzed, allows for better decisions in personal as well as professional life.
– It is applicable in nearly all areas of business as well as the social sciences.
– It greatly enhances credibility.
3
3 Statistics as “Tool Chest”
Different types of data allow different types of analysis.
Quantitative data
– Values are real numbers; arithmetic calculations are valid.
Qualitative data
– Categorical data; values are arbitrary names of possible categories, and calculations involve counting how many observations fall in each category.
Ranked data
– Categorical data; values must represent the ranked order of responses, and calculations are based on an ordering process.
Time series data
– Data collected across different points in time.
Cross-sectional data
– Data collected at a certain point in time.
4
4 Statistics as “Tool Chest”
Different objectives call for different tools:
– Describe a single population
– Compare two populations
– Compare two or more populations
– Analyze the relationship between two variables
– Analyze the relationship among two or more variables
By the conclusion of Econ 172 & 173, you will have about 35 separate tools to select from, depending on your data type and objective.
5
5 Problem objective?
– Describe a single population
– Compare two populations
– Compare two or more populations
– Analyze relationships between two variables
– Analyze relationships among two or more variables
6
6 Describe a single population
Data type?
  Quantitative: Type of descriptive measurement?
    Central location: t-test & estimator of μ
    Variability: χ²-test & estimator of σ²
  Qualitative: Number of categories?
    Two: Z-test & estimator of p
    Two or more: χ² goodness-of-fit test
7
7 Compare two populations
Data type?
  Quantitative: Type of descriptive measurement?
    Central location: Experimental design? (continued on the next slide)
    Variability: F-test & estimator of σ₁²/σ₂²
  Ranked: Experimental design?
    Independent samples: Wilcoxon rank sum test
    Matched pairs: Sign test
  Qualitative: Number of categories?
    Two: Z-test & estimator of p₁ - p₂
    Two or more: χ²-test of a contingency table
8
8 Compare two populations, quantitative data, central location (continued)
Experimental design?
  Independent samples: Population distribution?
    Normal: Population variances?
      Equal: t-test & estimator of μ₁ - μ₂ (equal variances)
      Unequal: t-test & estimator of μ₁ - μ₂ (unequal variances)
    Nonnormal: Wilcoxon rank sum test
  Matched pairs: Distribution of differences?
    Normal: t-test & estimator of μ_D
    Nonnormal: Wilcoxon signed rank sum test
9
9 Compare two or more populations
Data type?
  Quantitative: Experimental design?
    Independent samples: Population distribution?
      Normal: ANOVA (independent samples)
      Nonnormal: Kruskal-Wallis test
    Blocks: Population distribution?
      Normal: ANOVA (randomized blocks)
      Nonnormal: Friedman test
  Ranked: Experimental design?
    Independent samples: Kruskal-Wallis test
    Blocks: Friedman test
  Qualitative: χ²-test of a contingency table
10
10 Analyze relationship between two variables
Data type?
  Quantitative: Population distribution?
    Error is normal, or x and y are bivariate normal: Simple linear regression and correlation
    x and y are not bivariate normal: Spearman rank correlation
  Ranked: Spearman rank correlation
  Qualitative: χ²-test of a contingency table
Analyze relationship among two or more variables
Data type?
  Quantitative: Multiple regression
  Ranked: Not covered
  Qualitative: Not covered
11
11 Numerical Descriptive Measures
Measures of central location
– arithmetic mean, median, mode, (geometric mean)
Measures of variability
– range, variance, standard deviation, coefficient of variation
Measures of association
– covariance, coefficient of correlation
12
12 Measures of Central Location
§ Arithmetic mean
– This is the most popular and useful measure of central location.
Mean = (sum of the measurements) / (number of measurements)
Sample mean (sample size $n$): $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Population mean (population size $N$): $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
13
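13 Example: The mean of the sample of six measurements 7, 3, 9, -2, 4, 6 is given by
$\bar{x} = \frac{7 + 3 + 9 - 2 + 4 + 6}{6} = \frac{27}{6} = 4.5$
Example: Calculate the mean of 212, -46, 52, -14, 66.
$\text{Mean} = \frac{212 - 46 + 52 - 14 + 66}{5} = \frac{270}{5} = 54$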
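The two examples on this slide can be checked in a few lines of Python. This is only a minimal sketch using the standard-library `statistics` module; the variable names are illustrative and not part of the lecture.

```python
from statistics import mean

# The two sets of measurements from the slide.
sample_a = [7, 3, 9, -2, 4, 6]
sample_b = [212, -46, 52, -14, 66]

# mean() divides the sum of the measurements by their count.
mean_a = mean(sample_a)
mean_b = mean(sample_b)

print(mean_a, mean_b)
```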
14
14 § The median
– The median of a set of measurements is the value that falls in the middle when the measurements are arranged in order of magnitude.
Example 4.4: Seven employee salaries were recorded (in 1000s): 28, 60, 26, 32, 30, 26, 29. Find the median salary.
Odd number of observations: first, sort the salaries; then locate the value in the middle.
26, 26, 28, 29, 30, 32, 60: the median is 29.
Suppose one employee’s salary of $31,000 is added to the group recorded before. Find the median salary.
Even number of observations: first, sort the salaries; then locate the values in the middle. There are two middle values!
26, 26, 28, 29, 30, 31, 32, 60: the median is (29 + 30)/2 = 29.5.
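The sort-then-pick-the-middle procedure for the salary example can be sketched in Python as follows. The helper name `median_by_sorting` is hypothetical; the standard-library `statistics.median` is used only as a cross-check.

```python
from statistics import median

def median_by_sorting(values):
    """Sort the values, then take the middle one (or average the two middle ones)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                      # odd count: a single middle value
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2  # even count: two middle values

salaries = [28, 60, 26, 32, 30, 26, 29]      # in 1000s, from Example 4.4
median_odd = median_by_sorting(salaries)
median_even = median_by_sorting(salaries + [31])

print(median_odd, median_even)
print(median(salaries), median(salaries + [31]))  # library cross-check
```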
15
15 § The mode
– The mode of a set of measurements is the value that occurs most frequently.
– A set of data may have one mode (or modal class), or two or more modes.
(Figure: the modal class.)
16
16 – Example: The manager of a men’s store observes the waist size (in inches) of trousers sold yesterday: 31, 34, 36, 33, 28, 34, 30, 34, 32, 40. What is the modal value?
The mode is 34. This information seems valuable (for example, for the design of a new display in the store), much more so than “the mean is 33.2 in.”
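A short Python sketch of the waist-size example, assuming Python 3.8 or later for `statistics.multimode` (which returns every most-frequent value, so it also covers data with two or more modes). It also computes the mean and median of the same data for comparison.

```python
from collections import Counter
from statistics import mean, median, multimode

waists = [31, 34, 36, 33, 28, 34, 30, 34, 32, 40]  # inches, from the slide

modes = multimode(waists)        # list of all most-frequent values
frequency = Counter(waists)      # how often each size was sold
waist_mean = mean(waists)
waist_median = median(waists)

print("mode(s):", modes, "sold", frequency[modes[0]], "times")
print("mean:", waist_mean, "median:", waist_median)
```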
17
17 Relationship among Mean, Median, and Mode
– If a distribution is symmetrical, the mean, median, and mode coincide.
– If a distribution is not symmetrical, and is skewed to the left or to the right, the three measures differ.
A positively skewed distribution (“skewed to the right”): the mode typically lies below the median, which in turn lies below the mean.
18
18 A negatively skewed distribution (“skewed to the left”): the mean typically lies below the median, which in turn lies below the mode.
19
19 Example: A professor of statistics wants to report the results of a midterm exam taken by 100 students. He calculates the mean, median, and mode using Excel. Describe the information Excel provides.
– The mean provides information about the overall performance level of the class. It can serve as a tool for making comparisons with other classes and/or other exams.
– The median indicates that half of the class received a grade below 81%, and half of the class received a grade above 81%.
– The mode must be used when data are qualitative. If marks are classified by letter grade, the frequency of each grade can be calculated. Then the mode becomes a logical measure to compute.
(Excel results not reproduced.)
20
20 Measures of variability (looking beyond the average)
Measures of central location fail to tell the whole story about the distribution. A question of interest still remains unanswered:
How typical is the average value of all the measurements in the data set?
or
How spread out are the measurements about the average value?
21
21 Observe two hypothetical data sets
– Low variability data set: the average value provides a good representation of the values in the data set.
– High variability data set: the same average value does not provide as good a representation of the values in the data set as before.
22
22 § The range
– The range of a set of measurements is the difference between the largest and smallest measurements.
– Its major advantage is the ease with which it can be computed.
– Its major shortcoming is its failure to provide information on the dispersion of the values between the two end points.
(Figure: the range spans the smallest to the largest measurement.) But how do all the measurements spread out between those end points? The range cannot assist in answering this question.
23
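23 § The variance
– This measure of dispersion reflects the values of all the measurements.
– The variance of a population of $N$ measurements $x_1, x_2, \ldots, x_N$ having a mean $\mu$ is defined as
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
– The variance of a sample of $n$ measurements $x_1, x_2, \ldots, x_n$ having a mean $\bar{x}$ is defined as
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$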
24
24 Consider two small populations:
Population A: 8, 9, 10, 11, 12
Population B: 4, 7, 10, 13, 16
The mean of both populations is 10, but the measurements in B are much more dispersed than those in A. Thus, a measure of dispersion is needed that agrees with this observation.
Let us start by calculating the sum of deviations:
A: (8 - 10) + (9 - 10) + (10 - 10) + (11 - 10) + (12 - 10) = -2 - 1 + 0 + 1 + 2 = 0
B: (4 - 10) + (7 - 10) + (10 - 10) + (13 - 10) + (16 - 10) = -6 - 3 + 0 + 3 + 6 = 0
The sum of deviations is zero in both cases; therefore, another measure is needed.
25
25 The sum of deviations is zero for both A and B; therefore, another measure is needed. The sum of squared deviations is used in calculating the variance.
26
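26 Let us calculate the variance of the two populations (A: 8, 9, 10, 11, 12; B: 4, 7, 10, 13, 16; both with mean 10):
$\sigma_A^2 = \frac{(8-10)^2 + (9-10)^2 + (10-10)^2 + (11-10)^2 + (12-10)^2}{5} = \frac{10}{5} = 2$
$\sigma_B^2 = \frac{(4-10)^2 + (7-10)^2 + (10-10)^2 + (13-10)^2 + (16-10)^2}{5} = \frac{90}{5} = 18$
Why is the variance defined as the average squared deviation? Why not use the sum of squared deviations as a measure of dispersion instead? After all, the sum of squared deviations increases in magnitude when the dispersion of a data set increases!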
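The calculation for populations A and B (8, 9, 10, 11, 12 and 4, 7, 10, 13, 16) can be sketched in Python straight from the definition of the population variance. The function name `population_variance` is illustrative; the standard-library `statistics.pvariance` computes the same quantity and serves as a cross-check.

```python
from statistics import pvariance

def population_variance(values):
    """Average squared deviation from the population mean (divide by N)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

pop_a = [8, 9, 10, 11, 12]
pop_b = [4, 7, 10, 13, 16]

# The plain deviations cancel out, which is why they are squared first.
dev_sum_a = sum(x - 10 for x in pop_a)
dev_sum_b = sum(x - 10 for x in pop_b)

var_a = population_variance(pop_a)
var_b = population_variance(pop_b)

print(dev_sum_a, dev_sum_b)   # both sums of deviations
print(var_a, var_b)           # B should come out larger than A
```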
27
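27 Which data set has a larger dispersion?
Data set A: the value 1 five times and the value 3 five times (mean 2)
Data set B: 1, 5 (mean 3)
Data set B is more dispersed around the mean.
Let us calculate the sum of squared deviations for both data sets:
$\text{Sum}_A = (1-2)^2 + \cdots + (1-2)^2 + (3-2)^2 + \cdots + (3-2)^2 = 10$ (each term appears 5 times)
$\text{Sum}_B = (1-3)^2 + (5-3)^2 = 8$
The raw sums would rank A as the more dispersed set. However, when calculated on a “per observation” basis (variance), the data set dispersions are properly ranked:
$\sigma_A^2 = \text{Sum}_A / N = 10/10 = 1$
$\sigma_B^2 = \text{Sum}_B / N = 8/2 = 4$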
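The comparison can be reproduced with a small Python sketch. It assumes, from the “5 times” annotation and the sum of 10, that data set A holds five 1s and five 3s, and that data set B is just the two values 1 and 5; the helper name is hypothetical.

```python
def sum_of_squared_deviations(values):
    """Sum of squared deviations from the mean of the values."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values)

data_a = [1] * 5 + [3] * 5   # assumption: five 1s and five 3s
data_b = [1, 5]

sum_a = sum_of_squared_deviations(data_a)
sum_b = sum_of_squared_deviations(data_b)

# Dividing by the number of observations puts both sets on the same footing.
var_a = sum_a / len(data_a)
var_b = sum_b / len(data_b)

print("sums:", sum_a, sum_b)
print("per observation:", var_a, var_b)
```

The raw sum grows with the number of observations as well as with their spread, which is why the per-observation figure is the one that ranks B as more dispersed.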
28
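28 – Example: Find the mean and the variance of the following sample of measurements (in years): 3.4, 2.5, 4.1, 1.2, 2.8, 3.7
– Solution:
$\bar{x} = \frac{\sum x_i}{n} = \frac{17.7}{6} = 2.95$ years
A shortcut formula:
$s^2 = \frac{1}{n-1}\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right] = \frac{1}{5}\left[3.4^2 + 2.5^2 + \cdots + 3.7^2 - \frac{(17.7)^2}{6}\right] = 1.075$ (years$^2$)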
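As a check on this example, the following Python sketch computes the sample variance twice: once with the shortcut formula (sum of squares minus the squared sum over n, all divided by n - 1) and once from the definition. Variable names are illustrative.

```python
data = [3.4, 2.5, 4.1, 1.2, 2.8, 3.7]   # years, from the slide

n = len(data)
total = sum(data)                        # sum of the measurements
total_sq = sum(x * x for x in data)      # sum of the squared measurements

mean_years = total / n

# Shortcut formula: no need to compute each deviation first.
var_shortcut = (total_sq - total ** 2 / n) / (n - 1)

# Definition: average squared deviation, with n - 1 in the denominator.
var_definition = sum((x - mean_years) ** 2 for x in data) / (n - 1)

print(round(mean_years, 2), round(var_shortcut, 3), round(var_definition, 3))
```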
29
29 – The standard deviation of a set of measurements is the square root of the variance of the measurements.
– Example: Rates of return over the past 10 years for two mutual funds are shown below. Which one has a higher level of risk?
Fund A: 8.3, -6.2, 20.9, -2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.05
Fund B: 12.1, -2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, -1.3, 11.4
30
30 – Solution
– Let’s use the Excel printout produced by the “Descriptive Statistics” sub-menu (printout not reproduced).
Fund A should be considered riskier because its standard deviation is larger.
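Since the Excel printout is not reproduced here, the comparison can be sketched in Python instead, using the ten annual returns listed for each fund in the example. The sketch assumes the sample standard deviation (n - 1 in the denominator) is the relevant figure, which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev

# Annual rates of return from the mutual fund example.
fund_a = [8.3, -6.2, 20.9, -2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.05]
fund_b = [12.1, -2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, -1.3, 11.4]

sd_a = stdev(fund_a)   # sample standard deviation (assumption: n - 1)
sd_b = stdev(fund_b)

# The fund with the larger standard deviation is treated as riskier.
riskier = "A" if sd_a > sd_b else "B"

print(f"Fund A: mean {mean(fund_a):.2f}, std dev {sd_a:.2f}")
print(f"Fund B: mean {mean(fund_b):.2f}, std dev {sd_b:.2f}")
print("Riskier fund:", riskier)
```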
31
31 § The coefficient of variation
– The coefficient of variation of a set of measurements is the standard deviation divided by the mean value.
Sample: $cv = \frac{s}{\bar{x}}$; population: $CV = \frac{\sigma}{\mu}$
– This coefficient provides a proportionate measure of variation. A standard deviation of 10 may be perceived as large when the mean value is 100, but only moderately large when the mean value is 500.
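The comparison on this slide amounts to a single division, sketched below in Python. The function name is hypothetical.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Standard deviation expressed as a proportion of the mean."""
    return std_dev / mean_value

# Same standard deviation, two different means (the slide's illustration).
cv_mean_100 = coefficient_of_variation(10, 100)
cv_mean_500 = coefficient_of_variation(10, 500)

print(f"{cv_mean_100:.0%} of the mean vs {cv_mean_500:.0%} of the mean")
```

The same spread of 10 is a much larger share of a mean of 100 than of a mean of 500, which is the sense in which the coefficient is a proportionate measure.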
32
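32 Interpreting Standard Deviation
The standard deviation can be used to
– compare the variability of several distributions
– make a statement about the general shape of a distribution.
The empirical rule: If a sample of measurements has a mound-shaped distribution, the interval
– $(\bar{x} - s, \bar{x} + s)$ contains approximately 68% of the measurements
– $(\bar{x} - 2s, \bar{x} + 2s)$ contains approximately 95% of the measurements
– $(\bar{x} - 3s, \bar{x} + 3s)$ contains virtually all of the measurements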
33
33 – Example: The durations of 30 long-distance telephone calls are shown next. Check the empirical rule for this set of measurements.
– Solution: First, check whether the histogram has an approximate mound shape.
34
34 Calculate the mean and the standard deviation: Mean = 10.26; Standard deviation = 4.29.
Calculate the intervals:
Interval          Empirical rule   Actual percentage
5.97, 14.55       68%              70%
1.68, 18.84       95%              96.7%
-2.61, 23.13      100%             100%
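The three intervals follow directly from the reported mean (10.26) and standard deviation (4.29). A minimal Python sketch is shown below; the 30 call durations themselves are not reproduced in this transcript, so the actual percentages cannot be recomputed here and the helper `share_within` is only an illustration of how that check could be done on a data list.

```python
mean_value = 10.26   # reported on the slide
std_dev = 4.29       # reported on the slide

# Mean plus or minus 1, 2, and 3 standard deviations.
intervals = {
    k: (round(mean_value - k * std_dev, 2), round(mean_value + k * std_dev, 2))
    for k in (1, 2, 3)
}

def share_within(data, low, high):
    """Fraction of the measurements that fall inside [low, high]."""
    return sum(low <= x <= high for x in data) / len(data)

for k, (low, high) in intervals.items():
    print(f"within {k} standard deviation(s): {low} to {high}")
```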
35
35 Measures of Association
Two numerical measures are presented for describing the linear relationship between two variables depicted in a scatter diagram.
– Covariance: is there any pattern to the way two variables move together?
– Correlation coefficient: how strong is the linear relationship between two variables?
36
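36 § The covariance
Population covariance: $\text{COV}(X,Y) = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$
Sample covariance: $\text{cov}(X,Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
$\mu_x$ ($\mu_y$) is the population mean of the variable X (Y); $\bar{x}$ ($\bar{y}$) is the corresponding sample mean.
$N$ is the population size. $n$ is the sample size.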
37
37 – If the two variables move in the same direction (both increase or both decrease), the covariance is a large positive number.
– If the two variables move in opposite directions (one increases when the other decreases), the covariance is a large negative number.
– If the two variables are unrelated, the covariance will be close to zero.
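The sign behaviour described on this slide can be illustrated with a small Python sketch of the sample covariance (n - 1 in the denominator). The function name and all three data sets are hypothetical, chosen only to show a positive, a negative, and a zero covariance.

```python
def sample_covariance(xs, ys):
    """Sum of products of paired deviations from the means, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data, not from the lecture.
x = [1, 2, 3, 4, 5]
y_same_direction = [2, 4, 6, 8, 10]   # rises with x
y_opposite = [10, 8, 6, 4, 2]         # falls as x rises
y_unrelated = [1, 3, 2, 3, 1]         # no linear pattern with x

cov_same = sample_covariance(x, y_same_direction)
cov_opposite = sample_covariance(x, y_opposite)
cov_unrelated = sample_covariance(x, y_unrelated)

print(cov_same, cov_opposite, cov_unrelated)
```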
38
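38 § The coefficient of correlation
Population: $\rho = \frac{\text{COV}(X,Y)}{\sigma_x \sigma_y}$
Sample: $r = \frac{\text{cov}(X,Y)}{s_x s_y}$
– This coefficient answers the question: How strong is the association between X and Y?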
39
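39 (Figure: scatter diagrams for three cases.)
– $\rho$ or $r = +1$, COV(X,Y) > 0: strong positive linear relationship
– $\rho$ or $r = 0$, COV(X,Y) = 0: no linear relationship
– $\rho$ or $r = -1$, COV(X,Y) < 0: strong negative linear relationship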
40
40 If the two variables are very strongly positively related, the coefficient value is close to +1 (strong positive linear relationship). If the two variables are very strongly negatively related, the coefficient value is close to -1 (strong negative linear relationship). No straight line relationship is indicated by a coefficient close to zero.
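To make the three cases concrete, the sketch below computes the sample coefficient of correlation as the sample covariance divided by the product of the two sample standard deviations. The function name and the data are hypothetical and are not the advertising and sales figures from the lecture.

```python
from math import sqrt

def sample_correlation(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return cov / (s_x * s_y)

# Hypothetical data, chosen to hit the three cases.
x = [1, 2, 3, 4, 5]
r_positive = sample_correlation(x, [2, 4, 6, 8, 10])   # points on a rising line
r_negative = sample_correlation(x, [10, 8, 6, 4, 2])   # points on a falling line
r_none = sample_correlation(x, [1, 3, 2, 3, 1])        # no straight-line pattern

print(round(r_positive, 3), round(r_negative, 3), round(r_none, 3))
```

Unlike the covariance, the result is unit-free and always lies between -1 and +1, which is what makes it usable as a measure of strength.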
41
41 – Example Compute the covariance and the coefficient of correlation to measure how advertising expenditure and sales level are related to one another.
42
42 Use the procedure below to obtain the required summations (table with columns x, y, xy, x², y²; data not reproduced).
Similarly, $s_y = 8.839$.
43
43 Excel printout (covariance matrix and correlation matrix not reproduced)
Interpretation
– The covariance (10.2679) indicates that advertisement expenditure and sales level are positively related.
– The coefficient of correlation (.797) indicates that there is a strong positive linear relationship between advertisement expenditure and sales level.