Data Summary Using Descriptive Measures Sections 3.1 – 3.6, 3.8

Presentation transcript:

Data Summary Using Descriptive Measures Sections 3.1 – 3.6, 3.8 Based on Introduction to Business Statistics Kvanli / Pavur / Keeling

|»| Summary of Descriptive Measures
DESCRIPTIVE MEASURES A descriptive measure is a single number computed from the sample data that provides information about the data. An example is the mean, which is the average of all the observations in a sample or a population.
Measures of Central Tendency: determine the center of the data values or possibly the most typical value. Mean: the average of the data values. Median: the value in the center of the ordered data values. Mode: the value that occurs more than once and the most often. Midrange: the average of the highest and the lowest values.
Measures of Variation: determine the spread of the data. Range: Range = H - L. Variance: the average of the squared differences between the individual values and the mean. Standard Deviation: the positive square root of the variance. Coefficient of Variation: the standard deviation in terms of the mean.
Measures of Position: indicate how a particular data point fits in with all the other data points. Percentile: P% of the values fall below the P-th percentile and (100 - P)% fall above it. Quartiles: the 25th, 50th and 75th percentiles. Z-Score: expresses the number of standard deviations the value x is from the mean.
Measures of Shape: indicate how the data points are distributed. Skewness: the tendency of a distribution to stretch out in a particular direction. Kurtosis: a measure of the peakedness of a distribution.

|»| The Mean The mean represents the average of the data and is computed by dividing the sum of the data points by the number of the data points. It is the most popular measure of central tendency. We can easily compute and explain the mean. We have two types of mean depending on whether the data set includes all items of a population or a subset of items of a population – Sample Mean and Population Mean.

|»| Sample Mean The sample mean is the sum of the data values in a sample divided by the number of data values in that sample. We use $\bar{x}$ (x-bar) to denote the sample mean and n to denote the number of data values in a sample. Therefore, for ungrouped data, we obtain $\bar{x} = \frac{\sum x}{n}$. Example 3.1 (Accident Data): The following sample represents the number of accidents (monthly) over 11 months: 18, 10, 15, 13, 17, 15, 12, 15, 18, 16, 11. Compute the mean number of monthly accidents, i.e., compute the sample mean.
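As a minimal sketch of Example 3.1, the sample mean can be computed directly from its definition. The function name `sample_mean` is an illustrative choice, not something taken from the slides.

```python
# Accident data from Example 3.1: monthly accident counts over 11 months.
accidents = [18, 10, 15, 13, 17, 15, 12, 15, 18, 16, 11]


def sample_mean(values):
    """Return the sum of the values divided by how many there are."""
    return sum(values) / len(values)


x_bar = sample_mean(accidents)
print(round(x_bar, 2))  # 160 / 11, prints 14.55
```

The rounded value 14.55 is the mean that Example 3.7 later uses for the deviations.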

|»| Sample Mean (cont.) Example 3.2: The mean of a sample with 5 observations is 20. If the sum of four of the observations is 75, what is the value of the fifth observation?
|»| Population Mean The population mean is the sum of the data values in a population divided by the number of data values in that population. We use μ to denote the population mean and N to denote the number of data values in a population. Therefore, we obtain $\mu = \frac{\sum x}{N}$.
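Example 3.2 can be worked by reversing the definition of the mean: multiplying the mean by the number of observations recovers their sum. The short sketch below uses only the numbers given in the example; the variable names are illustrative.

```python
n = 5              # number of observations (given)
mean = 20          # sample mean (given)
sum_of_four = 75   # sum of four of the observations (given)

total = n * mean               # the mean times n recovers the sum of all values
fifth = total - sum_of_four    # what remains is the fifth observation
print(fifth)  # prints 25
```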

|»| The Median, Md The Median (Md) of a set of data is the value in the center of the data values when they are arranged from lowest to highest. It has an equal number of items to its right and to its left. The median is preferred to the mean as a measure of central tendency for data sets with outliers. Calculating the median from a sample involves the following steps: (1) Arrange the data values in ascending order. (2) Find the position of the median. The median position is the (n + 1)/2-th ordered value. (3) Find the median value.

|»| The Median (cont.) Example 3.3: Compute the median for the accident data given in Example 3.1. Ascending order: 10, 11, 12, 13, 15, 15 , 15, 16, 17, 18, 18. n = 11, Median Position = (11+1)/2 = 6th ordered value. Md = 15.
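The three steps can be sketched as follows. For an odd n the position (n + 1)/2 is a whole number, as in Example 3.3. For an even n it falls halfway between two positions, and this sketch assumes the usual convention of averaging the two middle values.

```python
def median(values):
    """Median via the (n + 1) / 2 position in the ordered data."""
    ordered = sorted(values)              # step 1: ascending order
    n = len(ordered)
    mid = n // 2                          # step 2: locate the middle
    if n % 2 == 1:
        return ordered[mid]               # odd n: the (n + 1) / 2-th value
    # Even n (assumed convention): average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


accidents = [18, 10, 15, 13, 17, 15, 12, 15, 18, 16, 11]
md = median(accidents)
print(md)  # 6th ordered value, prints 15
```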

|»| The Mode, Mo The Mode (Mo) of a data set is the value that occurs more than once and the most often. The mode is not always a measure of central tendency; this value need not occur in the center of the data. There may be more than one mode if several values occur the same (and the largest) number of times. The mode is used extensively in areas such as the manufacturing of clothing, shoes, etc. Example 3.4: Find the mode for the accident data given in Example 3.1. The value 15 appears most often (three times), so Mo = 15.
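A small sketch of this definition follows. It returns every value tied for the highest count, and returns an empty list when no value repeats, since the definition requires the mode to occur more than once. The function name `modes` is illustrative.

```python
from collections import Counter


def modes(values):
    """Return all values that occur most often, provided they repeat."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top < 2:
        return []                         # no value occurs more than once
    return sorted(v for v, c in counts.items() if c == top)


accidents = [18, 10, 15, 13, 17, 15, 12, 15, 18, 16, 11]
print(modes(accidents))  # prints [15]
```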

|»| The Midrange, Mr The midrange is the average of the highest and the lowest values of a data set. It provides an easy-to-grasp measure of central tendency. If we use H to denote the highest value and L to denote the lowest value of a data set, we obtain $Mr = \frac{L + H}{2}$. Example 3.5: Find the midrange for the accident data given in Example 3.1. L = 10, H = 18, so Mr = (L + H)/2 = (10 + 18)/2 = 14.

|»| The Range, R The range is the numerical difference between the largest value (H) and the smallest value (L). That is, Range = H – L. Example 3.6: The range for the accident data given in Example 3.1 is H – L = 18 – 10 = 8. The range is a crude measure of variation, but it is easy to calculate and contains valuable information in some situations. For instance, stock reports cite the high and low prices of the day. Similarly, weather forecasts use daily high and low temperatures. The range is strongly influenced by outliers.
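Both the midrange and the range depend only on H and L, so they can be sketched together for the accident data. Note that (10 + 18)/2 evaluates to 14.

```python
accidents = [18, 10, 15, 13, 17, 15, 12, 15, 18, 16, 11]

high = max(accidents)            # H = 18
low = min(accidents)             # L = 10

midrange = (low + high) / 2      # average of the two extremes
data_range = high - low          # distance between the two extremes

print(midrange)    # prints 14.0
print(data_range)  # prints 8
```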

|»| The Variance Variance describes the spread of the data values about the mean. It is the average of the squared differences between the individual values and the mean. The two types of variance are (1) sample variance and (2) population variance.
|»| Sample Variance, s² s² describes the variation of the sample values about the sample mean. It is the sum of the squared differences between the individual values and the sample mean, divided by n - 1. That is, $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$.

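|»| Sample Variance - Example Example 3.7: Calculate the sample variance for the accident data, using $\bar{x}$ = 14.55.
x     $x - \bar{x}$            $(x - \bar{x})^2$
18    18 - 14.55 = 3.45        11.9025
10    10 - 14.55 = -4.55       20.7025
15    15 - 14.55 = 0.45        0.2025
13    13 - 14.55 = -1.55       2.4025
17    17 - 14.55 = 2.45        6.0025
15    15 - 14.55 = 0.45        0.2025
12    12 - 14.55 = -2.55       6.5025
15    15 - 14.55 = 0.45        0.2025
18    18 - 14.55 = 3.45        11.9025
16    16 - 14.55 = 1.45        2.1025
11    11 - 14.55 = -3.55       12.6025
The squared deviations sum to 74.7275, so s² = 74.7275/(11 - 1) ≈ 7.47.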
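The calculation in Example 3.7 can be sketched from the definition of the sample variance, with n - 1 as the divisor. This version uses the unrounded mean rather than 14.55, so intermediate values differ slightly from a hand calculation, but the result agrees to two decimals.

```python
def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


accidents = [18, 10, 15, 13, 17, 15, 12, 15, 18, 16, 11]
s2 = sample_variance(accidents)
print(round(s2, 2))  # prints 7.47
```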

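|»| Sample Variance - Examples Example 3.8: For a sample of 50 observations, the statistics ∑x and ∑x² are calculated to be 20 and 33, respectively. Compute the sample variance using $s^2 = \frac{\sum x^2 - (\sum x)^2 / n}{n - 1}$. Here s² = (33 - 20²/50)/49 = 25/49 ≈ 0.51.
Example 3.9: The differences between the data values and the sample mean are -5, 1, -3, 2, 3, and 2. What is the variance of the data?
$x - \bar{x}$    $(x - \bar{x})^2$
-5               (-5)² = 25
1                (1)² = 1
-3               (-3)² = 9
2                (2)² = 4
3                (3)² = 9
2                (2)² = 4
The squared deviations sum to 52, so s² = 52/(6 - 1) = 10.4.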
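Examples 3.8 and 3.9 start from summary quantities rather than raw data. The sketch below assumes the standard computational form of the sample variance, $s^2 = (\sum x^2 - (\sum x)^2/n)/(n - 1)$, for Example 3.8 and the definitional form for Example 3.9. Both function names are illustrative.

```python
def variance_from_sums(n, sum_x, sum_x2):
    """Sample variance from n, the sum of x, and the sum of x squared."""
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


def variance_from_deviations(deviations):
    """Sample variance when the deviations from the mean are already known."""
    return sum(d ** 2 for d in deviations) / (len(deviations) - 1)


s2_example_38 = variance_from_sums(50, 20, 33)                     # 25 / 49
s2_example_39 = variance_from_deviations([-5, 1, -3, 2, 3, 2])     # 52 / 5

print(round(s2_example_38, 4))  # prints 0.5102
print(s2_example_39)            # prints 10.4
```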

|»| Population Variance, σ² σ² describes the variation of the population values about the population mean. It is the average of the squared differences between the individual values and the population mean. That is, $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$.
|»| The Standard Deviation The standard deviation is the positive square root of the variance. The positive square root of the sample variance is the sample standard deviation, denoted by s. The positive square root of the population variance is the population standard deviation, denoted by σ.

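|»| Standard Deviation Example 3.10: Find the sample standard deviations for Examples 3.7, 3.8, and 3.9. From Example 3.7: $s = \sqrt{7.47} \approx 2.73$. From Example 3.8: $s = \sqrt{25/49} = 5/7 \approx 0.71$. From Example 3.9: $s = \sqrt{10.4} \approx 3.22$.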
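As a sketch of Example 3.10, each standard deviation is the positive square root of the corresponding sample variance. For the accident data the standard library's `statistics.stdev` is used, which divides by n - 1. For Examples 3.8 and 3.9 the variances are rebuilt from the given sums and deviations.

```python
import math
import statistics

accidents = [18, 10, 15, 13, 17, 15, 12, 15, 18, 16, 11]

# Example 3.7: sample standard deviation of the raw accident data.
s_37 = statistics.stdev(accidents)

# Example 3.8: n = 50, sum of x = 20, sum of x squared = 33.
s_38 = math.sqrt((33 - 20 ** 2 / 50) / (50 - 1))

# Example 3.9: deviations from the mean are given directly.
deviations = [-5, 1, -3, 2, 3, 2]
s_39 = math.sqrt(sum(d ** 2 for d in deviations) / (len(deviations) - 1))

print(round(s_37, 2), round(s_38, 2), round(s_39, 2))  # prints 2.73 0.71 3.22
```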

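|»| Coefficient of Variation, CV The coefficient of variation measures the standard deviation in terms of the mean; that is, it answers the question: what percentage of x-bar is s? $CV = \frac{s}{\bar{x}} \cdot 100$. The Coefficient of Variation (CV) is used to compare the variation of two or more data sets where the values of the data differ greatly. Example 3.11: The scores for team 1 were 70, 60, 65, and 69. The scores for team 2 were 72, 58, 61, and 73. Compare the coefficients of variation for these two teams. For team 1: $\bar{x} = 66$, $s \approx 4.55$, so $CV \approx 6.9\%$. For team 2: $\bar{x} = 66$, $s \approx 7.62$, so $CV \approx 11.5\%$. Team 2's scores vary more relative to their mean.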
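Example 3.11 can be sketched with the standard library, taking the CV as the sample standard deviation expressed as a percentage of the sample mean. The function name is illustrative.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


team1 = [70, 60, 65, 69]
team2 = [72, 58, 61, 73]

cv1 = coefficient_of_variation(team1)
cv2 = coefficient_of_variation(team2)
print(round(cv1, 1), round(cv2, 1))  # prints 6.9 11.5
```

Both teams have the same mean (66), so the larger CV for team 2 reflects only its larger spread.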

|»| Percentile The P-th percentile is a number such that P% of the measurements fall below it and (100 - P)% fall above it. It is the most common measure of position. To calculate a percentile: (1) Arrange the data in ascending order. (2) Find the location of the P-th percentile, n · P/100. (3) Find the percentile using the following rules. Location Rule 1: If n · P/100 is not a counting number, round it up, and the P-th percentile will be the value in this position of the ordered data. Location Rule 2: If n · P/100 is a counting number, the P-th percentile is the average of the value in this location (of the ordered data) and the value in the next higher location.

|»| Percentile - Example Example 3.12: Find the 35th percentile from the following aptitude data (Aptitude Data). 22 25 28 31 34 35 39 40 42 44 46 48 49 51 53 55 56 57 59 60 61 63 65 66 68 69 71 72 74 75 76 78 80 82 83 85 88 90 92 96 Number of data values, n = 50. 35th percentile = P35. Location: n · P/100 = 50 · 35/100 = 17.5. Since 17.5 is NOT a counting number, Location Rule 1 applies: P35 = 18th ordered value = 53.
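The two location rules can be sketched as a small function. For illustration it is applied to the accident data from Example 3.1 and to the team 1 scores from Example 3.11. The function name, the restriction to whole-number P, and the use of integer arithmetic to test whether n · P/100 is a counting number are assumptions of this sketch.

```python
def percentile(values, p):
    """P-th percentile by the two location rules (p a whole number, 0 < p < 100)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    whole, remainder = divmod(len(ordered) * p, 100)   # n * P / 100
    if remainder:
        # Rule 1: not a counting number, so round up to the next position.
        return ordered[whole]          # 1-based position whole + 1
    # Rule 2: a counting number, so average this position and the next one.
    return (ordered[whole - 1] + ordered[whole]) / 2


accidents = [18, 10, 15, 13, 17, 15, 12, 15, 18, 16, 11]
print(percentile(accidents, 35))         # 11 * 35 / 100 = 3.85 -> 4th value, prints 13
print(percentile([70, 60, 65, 69], 50))  # 4 * 50 / 100 = 2 -> (65 + 69) / 2, prints 67.0
```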

|»| Quartiles and Interquartile Range Quartiles are particular percentiles that divide the data into quarters, namely:
Q1 = 1st quartile = 25th percentile (P25)
Q2 = 2nd quartile = 50th percentile (P50) = Median
Q3 = 3rd quartile = 75th percentile (P75)
Example 3.13: Determine the quartiles for the aptitude data. Q1 = 13th ordered value = 46. Q2 = Median = (61 + 63)/2 = 62. Q3 = 38th ordered value = 75.
Interquartile Range (IQR): the range of the middle 50% of the data, IQR = Q3 – Q1. For the aptitude data: IQR = 75 – 46 = 29.
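Because quartiles are just the 25th, 50th and 75th percentiles, the same location rules yield the interquartile range. The sketch below applies them to the accident data from Example 3.1. The helper `percentile` is an illustrative implementation of the location rules and assumes a whole-number P between 0 and 100.

```python
def percentile(values, p):
    """P-th percentile by the location rules (p a whole number, 0 < p < 100)."""
    ordered = sorted(values)
    whole, remainder = divmod(len(ordered) * p, 100)
    if remainder:
        return ordered[whole]                          # round the location up
    return (ordered[whole - 1] + ordered[whole]) / 2   # average two positions


accidents = [18, 10, 15, 13, 17, 15, 12, 15, 18, 16, 11]

q1 = percentile(accidents, 25)   # 11 * 25 / 100 = 2.75 -> 3rd ordered value
q2 = percentile(accidents, 50)   # 11 * 50 / 100 = 5.5  -> 6th ordered value
q3 = percentile(accidents, 75)   # 11 * 75 / 100 = 8.25 -> 9th ordered value
iqr = q3 - q1

print(q1, q2, q3, iqr)  # prints 12 15 17 5
```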

|»| Z-Scores A Z-score determines the relative position of any particular data value x and is based on the mean and standard deviation of the data set: $z = \frac{x - \bar{x}}{s}$. The Z-score expresses the number of standard deviations the value x is from the mean. A negative Z-score implies that x is to the left of the mean and a positive Z-score implies that x is to the right of the mean. Example 3.14: Find the z-score for an aptitude test score of 83.
|»| Standardizing Sample Data The process of subtracting the mean and dividing by the standard deviation is referred to as standardizing the sample data. The corresponding z-score is the standardized score.
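A sketch of Example 3.14 follows, assuming the aptitude-data mean (60.36) and standard deviation (18.61) that appear in the skewness calculation of Example 3.15. The second half standardizes the accident data from Example 3.1 to show that standardized scores have mean 0 and standard deviation 1.

```python
import statistics


def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


# Example 3.14, using the mean and s quoted for the aptitude data in Example 3.15.
z = z_score(83, 60.36, 18.61)
print(round(z, 2))  # prints 1.22

# Standardizing a whole sample: subtract the mean, divide by s.
accidents = [18, 10, 15, 13, 17, 15, 12, 15, 18, 16, 11]
x_bar = statistics.mean(accidents)
s = statistics.stdev(accidents)
standardized = [z_score(x, x_bar, s) for x in accidents]
```

A score of 83 therefore lies about 1.22 standard deviations to the right of the mean.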

|»| Skewness, Sk Skewness measures the tendency of a distribution to stretch out in a particular direction. Pearson's coefficient of skewness is used to calculate skewness: $Sk = \frac{3(\bar{x} - Md)}{s}$. Example 3.15: Find the skewness for the aptitude data. Sk = 3(60.36 – 62)/18.61 = 3(-1.64)/18.61 = -4.92/18.61 = -0.26. The values of Sk will always fall between -3 and 3. A positive Sk implies a shape that is skewed right, with mode < median < mean. In a data set with a negative Sk, mean < median < mode.
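The arithmetic of Example 3.15 can be sketched as a one-line function implementing the form used in the example, three times the difference between mean and median, divided by the standard deviation. The function name is illustrative.

```python
def pearson_skewness(mean, median, std_dev):
    """Pearson's coefficient of skewness: 3 * (mean - median) / s."""
    return 3 * (mean - median) / std_dev


# Aptitude data summary values from Example 3.15.
sk = pearson_skewness(60.36, 62, 18.61)
print(round(sk, 2))  # prints -0.26
```

The negative sign reflects a mean that lies below the median, i.e. a slight left skew.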

|»| Skewness, Sk – In Graphs [Figure: histogram of symmetric data, where $\bar{x}$ = Md = Mo; vertical axis: frequency.]

|»| Skewness, Sk – In Graphs [Figure: histogram with right (positive) skew, Sk > 0, where Mo < Md < $\bar{x}$; vertical axis: relative frequency.]

|»| Skewness, Sk – In Graphs [Figure: histogram with left (negative) skew, Sk < 0, where $\bar{x}$ < Md < Mo; vertical axis: relative frequency.]

|»| Kurtosis Kurtosis is a measure of the peakedness of a distribution. Large values occur when there is a high frequency of data near the mean and in the tails. The calculation is cumbersome and the measure is used infrequently.
|»| Interpreting X-bar and S How many, or what percentage, of the data values are within two standard deviations of the mean? There are usually three ways to answer this: (1) the actual percentage based on the sample, (2) Chebyshev's Inequality, and (3) the Empirical Rule.

|»| Interpreting X-bar and S (cont.) According to Chebyshev, in general, at least $(1 - 1/k^2) \cdot 100\%$ of the data values lie between $\bar{x} - ks$ and $\bar{x} + ks$ (that is, have z-scores between -k and k) for any k > 1. Chebyshev's Inequality is usually conservative but makes no assumption about the distribution of the population. The Empirical Rule assumes a bell-shaped distribution of the population, i.e., a normal population.
Between                                   Actual Percentage (Aptitude Data)   Chebyshev's Inequality Percentage   Empirical Rule Percentage
$\bar{x} - s$ and $\bar{x} + s$           66% (33 out of 50)                  n/a                                 ≈ 68%
$\bar{x} - 2s$ and $\bar{x} + 2s$         98% (49 out of 50)                  ≥ 75%                               ≈ 95%
$\bar{x} - 3s$ and $\bar{x} + 3s$         100% (50 out of 50)                 ≥ 89%                               ≈ 100%
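The Chebyshev column of the table follows from the bound 1 - 1/k². The sketch below computes that bound and, for comparison, the actual fraction of a sample within k standard deviations of its mean. The accident data from Example 3.1 is used for illustration, and the function names are illustrative.

```python
import statistics


def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's inequality is stated for k > 1")
    return 1 - 1 / k ** 2


def fraction_within(values, k):
    """Actual fraction of the sample lying between x_bar - k*s and x_bar + k*s."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    inside = [x for x in values if x_bar - k * s <= x <= x_bar + k * s]
    return len(inside) / len(values)


accidents = [18, 10, 15, 13, 17, 15, 12, 15, 18, 16, 11]
for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k), 2), fraction_within(accidents, k))
```

For k = 2 and k = 3 the bounds are 0.75 and about 0.89, matching the table, and the actual fractions for any sample can never fall below them.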

|»| A Bell-Shaped (Normal) Population [Figure: a bell-shaped (normal) population curve.]

|»| Bivariate Data Bivariate data are data collected on two variables for each item. Example 3.16: Data for 10 families on income (thousands of dollars) and square footage of home (hundreds of square feet) (Income-Footage Data).
Income (000s), X    Sq Footage of Home (00s), Y
32                  16
36                  17
55                  26
47                  24
38                  22
60                  21
66                  32
44                  18
70                  30
50                  20

|»| Scatter Diagram A scatter diagram is a graphical illustration of bivariate data. Each observation is represented by a point, where the X-axis is always horizontal and the Y-axis is vertical. [Figure: scatter diagrams (a) and (b) of square footage (hundreds), Y, against income (thousands), X.]

|»| Coefficient of Correlation, r The coefficient of correlation measures the strength of the linear relationship between the X variable and the Y variable. r ranges from -1 to 1. The larger |r| is, the stronger the linear relationship between X and Y. If r = 1 or r = -1, X and Y are perfectly correlated. If r > 0, X and Y have a positive relationship (i.e., large values of X are associated with large values of Y). If r < 0, X and Y have a negative relationship (i.e., large values of X are associated with small values of Y).

|»| Coefficient of Correlation – Example Example 3.17: Calculate r for the Income-Footage Data.
Income, X    Footage, Y    XY                  X²              Y²
32           16            32 × 16 = 512       (32)² = 1024    (16)² = 256
36           17            36 × 17 = 612       (36)² = 1296    (17)² = 289
55           26            55 × 26 = 1430      3025            676
47           24            47 × 24 = 1128      2209            576
38           22            38 × 22 = 836       1444            484
60           21            60 × 21 = 1260      3600            441
66           32            66 × 32 = 2112      4356            1024
44           18            44 × 18 = 792       1936            324
70           30            70 × 30 = 2100      4900            900
50           20            50 × 20 = 1000      2500            400
Total: 498   226           11782               26290           5370
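The column totals in Example 3.17 are what a sums-based formula for r needs. The sketch below assumes one standard computational form of Pearson's r, $r = \frac{\sum XY - (\sum X)(\sum Y)/n}{\sqrt{(\sum X^2 - (\sum X)^2/n)(\sum Y^2 - (\sum Y)^2/n)}}$; the transcript does not show which form the slide itself uses, and the function name is illustrative.

```python
import math

# Income-Footage Data from Example 3.16 (income in $000s, footage in 00s sq ft).
income = [32, 36, 55, 47, 38, 60, 66, 44, 70, 50]
footage = [16, 17, 26, 24, 22, 21, 32, 18, 30, 20]


def correlation(xs, ys):
    """Pearson's r computed from the five column totals."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


r = correlation(income, footage)
print(round(r, 2))  # prints 0.84
```

A value of about 0.84 indicates a fairly strong positive linear relationship between income and square footage.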

|»| Coefficient of Correlation, r – In Graphs [Figure: scatter plots of y against x with (a) r = 0, (b) r = 1, (c) r = -1, (d) r = .9.]

|»| Coefficient of Correlation, r – In Graphs [Figure: scatter plots of y against x with (e) r = -.8, (f) r = .5.]