1 Economics 173 Business Statistics Lectures 1 & 2 Summer, 2001 Professor J. Petry.

Presentation transcript:

1 Economics 173 Business Statistics Lectures 1 & 2 Summer, 2001 Professor J. Petry

2 Introduction
The purpose of statistics is to pull information out of data.
–“Without data, ours is just another opinion.”
–“Without statistics, we are just another person on data overload.”
Because it is used so broadly across disciplines, statistics is probably the most useful course irrespective of major.
–More data, properly analyzed, allows for better decisions in personal as well as professional life.
–It is applicable in nearly all areas of business as well as the social sciences.
–It greatly enhances credibility.

3 Statistics as “Tool Chest”
Different types of data allow different types of analysis.
Quantitative data
–Values are real numbers; arithmetic calculations are valid.
Qualitative data
–Categorical data; values are arbitrary names of possible categories; calculations involve how many observations fall in each category.
Ranked data
–Categorical data; values must represent the ranked order of responses; calculations are based on an ordering process.
Time series data
–Data collected across different points in time.
Cross-sectional data
–Data collected at a certain point in time.

4 Statistics as “Tool Chest”
Different objectives call for different tools:
–Describe a single population
–Compare two populations
–Compare two or more populations
–Analyze the relationship between two variables
–Analyze the relationship among two or more variables
By the conclusion of Econ 172 & 173, you will have about 35 separate tools to select from, depending on your data type and objective.

5 Problem objective?
–Describe a single population
–Compare two populations
–Compare two or more populations
–Analyze relationships between two variables
–Analyze relationships among two or more variables

6 Describe a single population
Data type?
–Quantitative: Type of descriptive measurement?
  –Central location: t-test & estimator of μ
  –Variability: χ²-test & estimator of σ²
–Qualitative: Number of categories?
  –Two: Z-test & estimator of p
  –Two or more: χ² goodness-of-fit test

7 Compare two populations
Data type?
–Quantitative: Type of descriptive measurement?
  –Central location: Experimental design? (continued on the next slide)
  –Variability: F-test & estimator of σ₁²/σ₂²
–Ranked: Experimental design?
  –Independent samples: Wilcoxon rank sum test
  –Matched pairs: Sign test
–Qualitative: Number of categories?
  –Two: Z-test & estimator of p₁ - p₂
  –Two or more: χ²-test of a contingency table

8 Compare two populations (continued): Experimental design?
–Independent samples: Population distribution?
  –Normal: Population variances?
    –Equal: t-test & estimator of μ₁ - μ₂ (equal variances)
    –Unequal: t-test & estimator of μ₁ - μ₂ (unequal variances)
  –Nonnormal: Wilcoxon rank sum test
–Matched pairs: Distribution of differences?
  –Normal: t-test & estimator of μ_D
  –Nonnormal: Wilcoxon signed rank sum test

9 Compare two or more populations
Data type?
–Quantitative: Experimental design?
  –Independent samples: Population distribution?
    –Normal: ANOVA (independent samples)
    –Nonnormal: Kruskal-Wallis test
  –Blocks: Population distribution?
    –Normal: ANOVA (randomized blocks)
    –Nonnormal: Friedman test
–Ranked: Experimental design?
  –Independent samples: Kruskal-Wallis test
  –Blocks: Friedman test
–Qualitative: χ²-test of a contingency table

10 Analyze relationship between two variables
Data type?
–Quantitative: Population distribution?
  –Error is normal, or x and y are bivariate normal: Simple linear regression and correlation
  –x and y are not bivariate normal: Spearman rank correlation
–Ranked: Spearman rank correlation
–Qualitative: χ²-test of a contingency table
Analyze relationship among two or more variables
Data type?
–Quantitative: Multiple regression
–Ranked or qualitative: Not covered

11 Numerical Descriptive Measures
Measures of central location
–arithmetic mean, median, mode, (geometric mean)
Measures of variability
–range, variance, standard deviation, coefficient of variation
Measures of association
–covariance, coefficient of correlation

12 Measures of Central Location
§ Arithmetic mean
–This is the most popular and useful measure of central location.
$$\text{Mean} = \frac{\text{Sum of the measurements}}{\text{Number of measurements}}$$
Sample mean (sample size $n$): $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
Population mean (population size $N$): $\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}$

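13 Example: The mean of the sample of six measurements 7, 3, 9, -2, 4, 6 is given by
$$\bar{x} = \frac{7 + 3 + 9 - 2 + 4 + 6}{6} = \frac{27}{6} = 4.5$$
Example: Calculate the mean of 212, -46, 52, -14, 66.
$$\bar{x} = \frac{212 - 46 + 52 - 14 + 66}{5} = \frac{270}{5} = 54$$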
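
The two mean calculations on this slide can be checked with a few lines of Python. This is a minimal sketch; the function name `arithmetic_mean` is an illustrative choice, not something from the lecture.

```python
def arithmetic_mean(values):
    """Sum of the measurements divided by the number of measurements."""
    return sum(values) / len(values)

# The two samples given on the slide.
sample_1 = [7, 3, 9, -2, 4, 6]
sample_2 = [212, -46, 52, -14, 66]

mean_1 = arithmetic_mean(sample_1)  # 27 / 6
mean_2 = arithmetic_mean(sample_2)  # 270 / 5
print(mean_1, mean_2)
```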

14 § The median
–The median of a set of measurements is the value that falls in the middle when the measurements are arranged in order of magnitude.
Example 4.4 Seven employee salaries were recorded (in 1000s): 28, 60, 26, 32, 30, 26, 29. Find the median salary.
Odd number of observations: first sort the salaries, then locate the value in the middle.
26, 26, 28, 29, 30, 32, 60: the median is 29.
Suppose one more employee's salary of $31,000 is added to the group recorded before. Find the median salary.
Even number of observations: first sort the salaries, then locate the values in the middle.
26, 26, 28, 29, 30, 31, 32, 60: there are two middle values, 29 and 30, so the median is their average, 29.5.
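
The sort-then-pick-the-middle procedure for the salary example can be sketched in Python as follows. The function name `median` and the variable names are illustrative assumptions; Python's standard `statistics.median` would give the same results.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Salaries in thousands of dollars, as recorded on the slide.
salaries = [28, 60, 26, 32, 30, 26, 29]

median_odd = median(salaries)           # seven observations: one middle value
median_even = median(salaries + [31])   # eight observations: two middle values
print(median_odd, median_even)
```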

15 § The mode
–The mode of a set of measurements is the value that occurs most frequently.
–A set of data may have one mode (or modal class), or two or more modes.
[Figure: histogram with the modal class marked]

16 –Example: The manager of a men’s store observes the waist size (in inches) of trousers sold yesterday: 31, 34, 36, 33, 28, 34, 30, 34, 32, 40. What is the modal value?
The mode is 34 in. This information seems valuable (for example, for the design of a new display in the store), much more so than “the mean is 33.2 in.”
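
A short Python sketch of finding the modal value for the waist sizes above. The helper name `modes` is hypothetical; it returns a list because, as noted earlier, a data set may have more than one mode.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

# Waist sizes (in inches) from the slide.
waist_sizes = [31, 34, 36, 33, 28, 34, 30, 34, 32, 40]

modal_values = modes(waist_sizes)
print(modal_values)
```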

17 Relationship among Mean, Median, and Mode
–If a distribution is symmetrical, the mean, median, and mode coincide.
–If a distribution is nonsymmetrical, and skewed to the left or to the right, the three measures differ.
[Figure: a positively skewed distribution (“skewed to the right”) with the mean, median, and mode marked]

18 Relationship among Mean, Median, and Mode (continued)
[Figure: a positively skewed distribution (“skewed to the right”) and a negatively skewed distribution (“skewed to the left”), each with the mean, median, and mode marked]

19 Example: A professor of statistics wants to report the results of a midterm exam taken by 100 students. He calculates the mean, median, and mode using Excel. Describe the information Excel provides.
–The mean provides information about the overall performance level of the class. It can serve as a tool for making comparisons with other classes and/or other exams.
–The median indicates that half of the class received a grade below 81%, and half of the class received a grade above 81%.
–The mode must be used when data is qualitative. If marks are classified by letter grade, the frequency of each grade can be calculated. Then the mode becomes a logical measure to compute.
[Figure: Excel results]

20 Measures of variability (Looking beyond the average)
Measures of central location fail to tell the whole story about the distribution. A question of interest still remains unanswered: How typical is the average value of all the measurements in the data set? Or, put differently: How spread out are the measurements about the average value?

21 Observe two hypothetical data sets
–Low variability data set: the average value provides a good representation of the values in the data set.
–High variability data set: the same average value does not provide as good a representation of the values in the data set as before.

22 § The range
–The range of a set of measurements is the difference between the largest and smallest measurements.
–Its major advantage is the ease with which it can be computed.
–Its major shortcoming is its failure to provide information on the dispersion of the values between the two end points.
[Figure: the range spans from the smallest measurement to the largest measurement]
But how do all the measurements spread out? The range cannot assist in answering this question.

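23 § The variance
–This measure of dispersion reflects the values of all the measurements.
–The variance of a population of $N$ measurements $x_1, x_2, \ldots, x_N$ having a mean $\mu$ is defined as
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
–The variance of a sample of $n$ measurements $x_1, x_2, \ldots, x_n$ having a mean $\bar{x}$ is defined as
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$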

24 Consider two small populations:
Population A: 8, 9, 10, 11, 12
Population B: 4, 7, 10, 13, 16
The mean of both populations is 10, but the measurements in B are much more dispersed than those in A. Thus, a measure of dispersion is needed that agrees with this observation.
Let us start by calculating the sum of deviations:
A: (8 - 10) + (9 - 10) + (10 - 10) + (11 - 10) + (12 - 10) = -2 - 1 + 0 + 1 + 2 = 0
B: (4 - 10) + (7 - 10) + (10 - 10) + (13 - 10) + (16 - 10) = -6 - 3 + 0 + 3 + 6 = 0
The sum of deviations is zero in both cases; therefore, another measure is needed.

25 The sum of deviations is zero in both cases (A and B); therefore, another measure is needed. The sum of squared deviations is used in calculating the variance.

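26 Let us calculate the variance of the two populations:
$$\sigma_A^2 = \frac{(8-10)^2 + (9-10)^2 + (10-10)^2 + (11-10)^2 + (12-10)^2}{5} = 2$$
$$\sigma_B^2 = \frac{(4-10)^2 + (7-10)^2 + (10-10)^2 + (13-10)^2 + (16-10)^2}{5} = 18$$
Why is the variance defined as the average squared deviation? Why not use the sum of squared deviations as a measure of dispersion instead? After all, the sum of squared deviations increases in magnitude when the dispersion of a data set increases!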
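
The population-variance calculation for the two small populations (A: 8, 9, 10, 11, 12 and B: 4, 7, 10, 13, ...) can be sketched in Python. The function name `population_variance` is an illustrative choice. The last value of population B is cut off in this transcript; 16 is assumed here because the slide lists its deviation from the mean as +6.

```python
def population_variance(values):
    """Average squared deviation from the population mean (divides by N)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

population_a = [8, 9, 10, 11, 12]
population_b = [4, 7, 10, 13, 16]  # 16 is inferred from the +6 deviation (assumption)

# The plain sum of deviations is zero for both, so it cannot measure dispersion.
sum_dev_a = sum(x - 10 for x in population_a)
sum_dev_b = sum(x - 10 for x in population_b)

var_a = population_variance(population_a)
var_b = population_variance(population_b)
print(sum_dev_a, sum_dev_b, var_a, var_b)
```

The squared deviations do not cancel, so the larger spread of B shows up as a larger variance.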

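27 Which data set has a larger dispersion?
[Figure: data set A has five observations at 1 and five at 3 (mean 2); data set B has one observation at 1 and one at 5 (mean 3)]
Data set B is more dispersed around the mean.
Let us calculate the sum of squared deviations for both data sets:
$\text{Sum}_A = (1-2)^2 + \cdots + (1-2)^2 + (3-2)^2 + \cdots + (3-2)^2 = 10$ (each term appears 5 times)
$\text{Sum}_B = (1-3)^2 + (5-3)^2 = 8$
Judged by the sum alone, A would appear more dispersed. However, when calculated on a “per observation” basis (variance), the data set dispersions are properly ranked:
$\sigma_A^2 = \text{Sum}_A / N = 10/10 = 1$
$\sigma_B^2 = \text{Sum}_B / N = 8/2 = 4$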

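28 –Example: Find the mean and the variance of the following sample of measurements (in years): 3.4, 2.5, 4.1, 1.2, 2.8, 3.7
–Solution:
$$\bar{x} = \frac{\sum x_i}{n} = \frac{17.7}{6} = 2.95 \text{ years}$$
A shortcut formula:
$$s^2 = \frac{1}{n-1}\left[\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right] = \frac{1}{5}\left[3.4^2 + 2.5^2 + \cdots + 3.7^2 - \frac{(17.7)^2}{6}\right] = 1.075 \text{ (years}^2\text{)}$$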
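
As a check on this example, the sketch below computes the sample variance both from the definition and from the shortcut formula. The function names are hypothetical; both divide by n - 1, matching the 1/5 factor used on the slide for six measurements.

```python
def sample_variance(values):
    """Definition: sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    """Shortcut: [sum of x^2 - (sum of x)^2 / n] / (n - 1)."""
    n = len(values)
    return (sum(x * x for x in values) - sum(values) ** 2 / n) / (n - 1)

# Sample of measurements (in years) from the slide.
years = [3.4, 2.5, 4.1, 1.2, 2.8, 3.7]

mean_years = sum(years) / len(years)
var_definition = sample_variance(years)
var_shortcut = sample_variance_shortcut(years)
print(round(mean_years, 3), round(var_definition, 3), round(var_shortcut, 3))
```

The two routes agree up to floating-point rounding; the shortcut only saves the step of computing each deviation by hand.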

29 § The standard deviation
–The standard deviation of a set of measurements is the square root of the variance of the measurements.
–Example: Rates of return over the past 10 years for two mutual funds are shown below. Which one has a higher level of risk?
Fund A: 8.3, -6.2, 20.9, -2.7, 33.6, 42.9, 24.4, 5.2, 3.1,
Fund B: 12.1, -2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, -1.3, 11.4

30 –Solution: Let’s use the Excel printout produced from the “Descriptive Statistics” sub-menu. Fund A should be considered riskier because its standard deviation is larger.
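
The lecture uses Excel for this comparison; an equivalent calculation can be sketched with Python's standard `statistics` module. Only Fund B is computed here, because Fund A's series shows just nine of its ten values in this transcript. The variable names are illustrative.

```python
from statistics import mean, stdev

# Fund B rates of return (percent) over 10 years, as listed on the slide.
fund_b = [12.1, -2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, -1.3, 11.4]

mean_b = mean(fund_b)
sd_b = stdev(fund_b)  # sample standard deviation (divides by n - 1)

print(f"Fund B: mean = {mean_b:.2f}, standard deviation = {sd_b:.2f}")
```

Running the same two calls on Fund A's complete series and comparing the standard deviations reproduces the risk comparison described on the slide.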

31 § The coefficient of variation
–The coefficient of variation of a set of measurements is the standard deviation divided by the mean value: $CV = \sigma / \mu$ for a population, $cv = s / \bar{x}$ for a sample.
–This coefficient provides a proportionate measure of variation. A standard deviation of 10 may be perceived as large when the mean value is 100, but only moderately large when the mean value is 500.
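
The comparison on this slide (a standard deviation of 10 against means of 100 and 500) can be made concrete with a short sketch. The function name `coefficient_of_variation` is an illustrative choice.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Standard deviation expressed as a proportion of the mean."""
    return std_dev / mean_value

cv_mean_100 = coefficient_of_variation(10, 100)
cv_mean_500 = coefficient_of_variation(10, 500)

print(f"mean 100: CV = {cv_mean_100:.0%}")
print(f"mean 500: CV = {cv_mean_500:.0%}")
```

The same absolute spread is five times larger relative to the smaller mean.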

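32 Interpreting Standard Deviation
The standard deviation can be used to
–compare the variability of several distributions
–make a statement about the general shape of a distribution.
The empirical rule: If a sample of measurements has a mound-shaped distribution, the interval
–$(\bar{x} - s,\ \bar{x} + s)$ contains approximately 68% of the measurements
–$(\bar{x} - 2s,\ \bar{x} + 2s)$ contains approximately 95% of the measurements
–$(\bar{x} - 3s,\ \bar{x} + 3s)$ contains virtually all of the measurements.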

33 –Example: The durations of 30 long-distance telephone calls are shown next. Check the empirical rule for this set of measurements.
Solution: First check whether the histogram has an approximate mound shape.

34 Calculate the mean and the standard deviation: Mean = 10.26; Standard deviation = 4.29
Calculate the intervals:
Interval                              Empirical rule    Actual percentage
x̄ ± s  = (5.97, 14.55)                68%               70%
x̄ ± 2s = (1.68, 18.84)                95%               96.7%
x̄ ± 3s = (-2.61, 23.13)               Virtually all     100%
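
The check performed on this slide (count how many measurements fall within 1, 2, and 3 standard deviations of the mean) can be sketched as below. The 30 call durations are not reproduced in this transcript, so the `durations` list is hypothetical data used only to show the mechanics; the function name `share_within` is likewise an assumption.

```python
from statistics import mean, stdev

def share_within(values, k):
    """Fraction of values inside (mean - k*s, mean + k*s), with s the sample std dev."""
    x_bar = mean(values)
    s = stdev(values)
    lower, upper = x_bar - k * s, x_bar + k * s
    inside = sum(1 for v in values if lower <= v <= upper)
    return inside / len(values)

# Hypothetical durations in minutes; NOT the 30 calls from the lecture.
durations = [2.3, 5.1, 7.4, 8.0, 9.6, 10.2, 10.9, 11.5, 12.8, 14.1, 16.7, 19.5]

for k, rule in [(1, "about 68%"), (2, "about 95%"), (3, "virtually all")]:
    print(f"within {k} std dev: {share_within(durations, k):.1%} (empirical rule: {rule})")
```

With a small or non-mound-shaped sample the actual percentages can differ noticeably from the rule, which is why the slide checks the histogram first.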

35 Measures of Association
Two numerical measures are presented for describing the linear relationship between two variables depicted in a scatter diagram.
–Covariance: is there any pattern to the way two variables move together?
–Correlation coefficient: how strong is the linear relationship between two variables?

36  x (  y ) is the population mean of the variable X (Y) N is the population size. n is the sample size. § The covariance

37 If the two variables move in opposite directions (one increases when the other decreases), the covariance is a large negative number. If the two variables are unrelated, the covariance will be close to zero. If the two variables move in the same direction (both increase or both decrease), the covariance is a large positive number.
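
The three cases described on this slide can be illustrated with a small sample-covariance sketch. The data sets are hypothetical and chosen only to produce a positive, a negative, and a zero covariance; the function name `sample_covariance` is an assumption, and it divides by n - 1 as is usual for a sample.

```python
def sample_covariance(xs, ys):
    """Sum of products of paired deviations from the means, divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
same_direction = [2, 4, 6, 8, 10]      # y rises with x
opposite_direction = [10, 8, 6, 4, 2]  # y falls as x rises
no_linear_pattern = [3, 1, 4, 1, 3]    # no linear tendency either way

cov_pos = sample_covariance(x, same_direction)
cov_neg = sample_covariance(x, opposite_direction)
cov_zero = sample_covariance(x, no_linear_pattern)
print(cov_pos, cov_neg, round(cov_zero, 10))
```

The sign carries the direction of the relationship, but the magnitude depends on the units of x and y, which is the motivation for the coefficient of correlation.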

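38 § The coefficient of correlation
$$\rho = \frac{COV(X,Y)}{\sigma_x \sigma_y} \ \text{(population)}, \qquad r = \frac{cov(X,Y)}{s_x s_y} \ \text{(sample)}$$
–This coefficient answers the question: How strong is the association between X and Y?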

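39 [Figure: three scatter diagrams placed along a scale of ρ or r]
–ρ or r near +1, COV(X,Y) > 0: strong positive linear relationship
–ρ or r = 0, COV(X,Y) = 0: no linear relationship
–ρ or r near -1, COV(X,Y) < 0: strong negative linear relationship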

40 If the two variables are very strongly positively related, the coefficient value is close to +1 (strong positive linear relationship). If the two variables are very strongly negatively related, the coefficient value is close to -1 (strong negative linear relationship). No straight line relationship is indicated by a coefficient close to zero.
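
A minimal sketch of the sample coefficient of correlation follows, using hypothetical data rather than the advertising and sales figures from the lecture (which are not reproduced in this transcript). The function name `correlation` is an assumption. The n - 1 factors in the covariance and in the two standard deviations cancel, so the code works directly with sums of deviations.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
r_positive = correlation(x, [2, 4, 6, 8, 10])   # points lie on a rising line
r_negative = correlation(x, [10, 8, 6, 4, 2])   # points lie on a falling line
r_none = correlation(x, [3, 1, 4, 1, 3])        # no straight-line relationship

print(round(r_positive, 3), round(r_negative, 3), round(r_none, 3))
```

Unlike the covariance, the result is unit-free and always lies between -1 and +1, which makes the strength of different relationships comparable.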

41 – Example Compute the covariance and the coefficient of correlation to measure how advertising expenditure and sales level are related to one another.

42 Use the procedure below to obtain the required summations.
[Table with columns x, y, xy, x², y²]
Similarly, $s_y = 8.839$

43 Excel printout
[Figure: covariance matrix and correlation matrix]
Interpretation
–The covariance ( ) indicates that advertisement expenditure and sales level are positively related.
–The coefficient of correlation (.797) indicates that there is a strong positive linear relationship between advertisement expenditure and sales level.