Copyright © Cengage Learning. All rights reserved. 1 Overview and Descriptive Statistics.

Slides:



Advertisements
Similar presentations
Describing Quantitative Variables
Advertisements

Lesson Describing Distributions with Numbers parts from Mr. Molesky’s Statmonkey website.
Descriptive Measures MARE 250 Dr. Jason Turner.
Statistics It is the science of planning studies and experiments, obtaining sample data, and then organizing, summarizing, analyzing, interpreting data,
© 2008 McGraw-Hill Higher Education The Statistical Imagination Chapter 4. Measuring Averages.
The mean for quantitative data is obtained by dividing the sum of all values by the number of values in the data set.
Descriptive Statistics Summarizing data using graphs.
Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chap 3-1 Business Statistics: A Decision-Making Approach 7 th Edition Chapter.
Calculating & Reporting Healthcare Statistics
Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson2-1 Lesson 2: Descriptive Statistics.
Chap 3-1 EF 507 QUANTITATIVE METHODS FOR ECONOMICS AND FINANCE FALL 2008 Chapter 3 Describing Data: Numerical.
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 2-1 Statistics for Business and Economics 7 th Edition Chapter 2 Describing Data:
Sullivan – Statistics: Informed Decisions Using Data – 2 nd Edition – Chapter 3 Introduction – Slide 1 of 3 Topic 16 Numerically Summarizing Data- Averages.
Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chap 3-1 Business Statistics: A Decision-Making Approach 7 th Edition Chapter.
Business Statistics: A Decision-Making Approach, 6e © 2005 Prentice-Hall, Inc. Chap 3-1 Introduction to Statistics Chapter 3 Using Statistics to summarize.
Copyright © Cengage Learning. All rights reserved. 1 Overview and Descriptive Statistics.
Chap 3-1 Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chapter 3 Describing Data: Numerical Statistics for Business and Economics.
Descriptive Statistics  Summarizing, Simplifying  Useful for comprehending data, and thus making meaningful interpretations, particularly in medium to.
1 Measures of Central Tendency Greg C Elvers, Ph.D.
Measures of Central Tendency
Measures of Central Tendency
Describing Data: Numerical
Chapter 2 Describing Data with Numerical Measurements
Chapter 3 Descriptive Measures
AP Statistics Chapters 0 & 1 Review. Variables fall into two main categories: A categorical, or qualitative, variable places an individual into one of.
Describing distributions with numbers
Descriptive Statistics  Summarizing, Simplifying  Useful for comprehending data, and thus making meaningful interpretations, particularly in medium to.
Descriptive Statistics Used to describe the basic features of the data in any quantitative study. Both graphical displays and descriptive summary statistics.
Chapter 2 Describing Data with Numerical Measurements General Objectives: Graphs are extremely useful for the visual description of a data set. However,
Numerical Descriptive Techniques
Chapter 3 – Descriptive Statistics
© Copyright McGraw-Hill CHAPTER 3 Data Description.
Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 1 Chapter 4 Numerical Methods for Describing Data.
© 2008 Brooks/Cole, a division of Thomson Learning, Inc. 1 Chapter 4 Numerical Methods for Describing Data.
STAT 211 – 019 Dan Piett West Virginia University Lecture 1.
STAT 280: Elementary Applied Statistics Describing Data Using Numerical Measures.
Chapter 2 Describing Data.
Describing distributions with numbers
1 1 Slide © 2007 Thomson South-Western. All Rights Reserved.
Describing and Displaying Quantitative data. Summarizing continuous data Displaying continuous data Within-subject variability Presentation.
Copyright © 2014 by Nelson Education Limited. 3-1 Chapter 3 Measures of Central Tendency and Dispersion.
Categorical vs. Quantitative…
Numerical Statistics Given a set of data (numbers and a context) we are interested in how to describe the entire set without listing all the elements.
INVESTIGATION 1.
To be given to you next time: Short Project, What do students drive? AP Problems.
1 Descriptive Statistics 2-1 Overview 2-2 Summarizing Data with Frequency Tables 2-3 Pictures of Data 2-4 Measures of Center 2-5 Measures of Variation.
1 Chapter 4 Numerical Methods for Describing Data.
Lecture 5 Dustin Lueker. 2 Mode - Most frequent value. Notation: Subscripted variables n = # of units in the sample N = # of units in the population x.
Copyright © Cengage Learning. All rights reserved. 5 Joint Probability Distributions and Random Samples.
Honors Statistics Chapter 3 Measures of Variation.
Chapter 3 Descriptive Statistics: Numerical Methods.
Descriptive Statistics(Summary and Variability measures)
Data Description Chapter 3. The Focus of Chapter 3  Chapter 2 showed you how to organize and present data.  Chapter 3 will show you how to summarize.
Chapter 5 Describing Distributions Numerically Describing a Quantitative Variable using Percentiles Percentile –A given percent of the observations are.
Copyright © Cengage Learning. All rights reserved. 1 Overview and Descriptive Statistics.
Copyright © 2016 Brooks/Cole Cengage Learning Intro to Statistics Part II Descriptive Statistics Intro to Statistics Part II Descriptive Statistics Ernesto.
Copyright © Cengage Learning. All rights reserved. 2 Descriptive Analysis and Presentation of Single-Variable Data.
Chapter 4: Measures of Central Tendency. Measures of central tendency are important descriptive measures that summarize a distribution of different categories.
Slide 1 Copyright © 2004 Pearson Education, Inc.  Descriptive Statistics summarize or describe the important characteristics of a known set of population.
Lecture #3 Tuesday, August 30, 2016 Textbook: Sections 2.4 through 2.6
Descriptive Statistics
Descriptive Statistics (Part 2)
CHAPTER 3 Data Description 9/17/2018 Kasturiarachi.
Description of Data (Summary and Variability measures)
Analyzing One-Variable Data
Chapter 3 Describing Data Using Numerical Measures
STA 291 Spring 2008 Lecture 5 Dustin Lueker.
STA 291 Spring 2008 Lecture 5 Dustin Lueker.
Overview and Descriptive Statistics
LESSON 3: CENTRAL TENDENCY
Presentation transcript:

Copyright © Cengage Learning. All rights reserved. 1 Overview and Descriptive Statistics

Copyright © Cengage Learning. All rights reserved. 1.3 Measures of Location

3 More formal data analysis often requires the calculation and interpretation of numerical summary measures. Measurements that characterize the data set and convey some of its salient features. Focus on numerical data primarily, a little categorical data at the end. Two important location measures are both for the center: Mean and Median

4 The Mean

5 For a given set of numbers x 1, x 2,..., x n, the most familiar and useful measure of the center is the mean, or arithmetic average of the set. The sample mean x of observations x 1, x 2,…, x n, is Recommend using decimal accuracy of one digit more than the accuracy of the x i ’s.

6 Example 14 x 1 = 16.1 x 2 = 9.6 x 3 = 24.9 x 4 = 20.4 x 5 = 12.7 x 6 = 21.2 x 7 = 30.2 x 8 = 25.8 x 9 = 18.5 x 10 = 10.3 x 11 = 25.3 x 12 = 14.0 x 13 = 27.1 x 14 = 45.0 x 15 = 23.3 x 16 = 24.2 x 17 = 14.6 x 18 = 8.9 x 19 = 32.4 x 20 = 11.8 x 21 = 28.5 With  x i = 444.8, the sample mean is The mean as the balance point for a system of weights

7 The Mean for population The population mean, , is the average of all population values,  = (Σ N values)/N, for a finite population of size N. Later for statistical inference, we will learn methods based on the sample mean to draw conclusions for a population mean. For example, we can use the sample mean x = computed in Example 14 as a point estimate of  = crack length for all specimens.

8 The Mean: one weakness One deficiency to measure center: Its value can be greatly affected by an outlier (unusually large or small observation). In Example 14, x 14 = Without x 14, x = 399.8/20 = 19.99; the outlier increases the mean by more than 1  m. If the 45.0  m observation were replaced by the catastrophic value  m a really extreme outlier, then x = 694.8/21 = 33.09, which is larger than all but one of the observations!

9 The Mean A sample of incomes often produces such outlying values (those lucky few who earn astronomical amounts), and the use of average income is often misleading. There is a measure that is less sensitive to outlying values than x, the median. However, the mean is still the most widely used measure, because there are many populations that don’t have extreme outliers in the sample often. Sampling from such a population (a normal or bell-shaped population), the sample mean will tend to be stable and quite representative of the sample.

10 The Median

11 The Median The sample median, is the middle value once the observations are ordered from smallest to largest, for x 1 ≤ x 2 ≤…, ≤ x n.

12 Example 15 A sample of 12 recordings of Beethoven’s Symphony #9, yielding the following durations (min) listed ascendingly: n = 12, the sample median is the average of the 6 th and 7 th values from the ordered list: Had the largest observation 79.0 not been there, the resulting sample median for the n = 11 remaining observations would have been the single middle (6 th ) value Dotplot of the data from Example 14 Figure 1.16 cont’d Mean: 816.1/12 = Median: 66.9

13 The Median The data in Example 15 illustrates an important property of in contrast to x: The sample median is insensitive to outliers. If, for example, we increased the two largest x i s from 75.7 and 79.0 to 85.7 and 89.0, respectively, would be unaffected. Thus, in the treatment of outlying data values, x and are at opposite ends of a spectrum. Both measures describe where the data is centered, but they will not in general be equal because they focus on different aspects of the sample.

14 The Median Analogous to as the middle value in the sample is a middle value in the population, the population median, denoted by As with and , we can think of using the sample median to make an inference about In Example 15, we might use = as an estimate of the median time for the population of all recordings. A median is often used to describe income or salary data (because it is not greatly influenced by a few large salaries).

15 The Median The population mean  and median will not generally be identical. If the population distribution is positively or negatively skewed, as pictured in Figure 1.17, then When this is the case, in making inferences we must first decide which of the two population characteristics is of greater interest and then proceed accordingly. (a) Negative skew(b) Symmetric(c) Positive skew Three different shapes for a population distribution Figure 1.17

16 Other Measures of Location: Quartiles, Percentiles, and Trimmed Means

17 Other Measures of Location: Quartiles, Percentiles, and Trimmed Means The median (population or sample) divides the data set into two parts of equal size. To obtain finer measures of location, we could divide the data into more than two such parts. Roughly speaking, quartiles divide the data set into four equal parts, with the observations above the third quartile constituting the upper quarter of the data set, the second quartile being identical to the median, and the first quartile separating the lower quarter from the upper three-quarters.

18 Other Measures of Location: Quartiles, Percentiles, and Trimmed Means Similarly, a data set (sample or population) can be even more finely divided using percentiles; the 99th percentile separates the highest 1% from the bottom 99%, and so on. Unless the number of observations is a multiple of 100, care must be exercised in obtaining percentiles.

19 Other Measures of Location: Quartiles, Percentiles, and Trimmed Means The mean is quite sensitive to a single outlier, whereas the median is impervious to many outliers. Since extreme behavior of either type might be undesirable, we briefly consider alternative measures that are neither as sensitive as nor as insensitive as. To motivate these alternatives, note that and are at opposite extremes of the same “family” of measures. The mean is the average of all the data, whereas the median results from eliminating all but the middle one or two values and then averaging.

20 Other Measures of Location: Quartiles, Percentiles, and Trimmed Means To paraphrase, the mean involves trimming 0% from each end of the sample, whereas for the median the maximum possible amount is trimmed from each end. A trimmed mean is a compromise between and. A 10% trimmed mean, for example, would be computed by eliminating the smallest 10% and the largest 10% of the sample and then averaging what remains.

21 Example 16 Consider the following observations on copper content (%) for a sample of alloy: The sample mean and median are 3.65 and 3.35, respectively. A trimmed mean with a trimming percentage of 100(2/26) = 7.7% results from eliminating the two smallest and two largest observations; Dotplot of copper contents from Example 16 Figure 1.18 Outlier

22 Trimmed Mean: when desired % is not integer If the desired trimming percentage is 100  % and n  is not an integer, the trimmed mean must be calculated by interpolation. For example, consider  =.10 for a 10% trimming percentage and n = 26 as in Example 16. Then x tr(10) would be the appropriate weighted average of the 7.7% trimmed mean calculated there and the 11.5% trimmed mean resulting from trimming three observations from each end.

23 Categorical Data and Sample Proportions

24 Categorical Data and Sample Proportions When the data is categorical, the natural numerical summary quantities are the individual frequencies and the relative frequencies. A dichotomous population—one having only two categories. If we let x denote the number in the sample in category 1 (the one we are more interested), then the number in category 2 is n – x. The relative frequencies or sample proportions in categories 1 and 2 are x/n and 1 – x/n.

25 Categorical Data and Sample Proportions Let’s denote a response that falls in category 1 by a 1 and a response that falls in category 2 by a 0. A sample size of n = 10 might then yield the responses 1, 1, 0, 1, 1, 1, 0, 0, 1, 1. The sample mean for this numerical sample is: Then the sample proportion of observations in the category is the sample mean of the sequence of 1s and 0s. Thus a sample mean can be used to summarize the results of a categorical sample.

26 Categorical Data and Sample Proportions Analogous to the sample proportion x/n of individuals or objects falling in a particular category, let p represent the proportion of the entire population falling in the category. As with x/n, p is also between 0 and 1, and while x/n is a sample characteristic, p is a population characteristic. The relationship between the two parallels the relationship between and, and between x and . In particular, we will subsequently use x/n to make inferences about p.

27 Categorical Data and Sample Proportions With k categories (k > 2), we can use the k sample proportions to answer questions about the population proportions p 1,...., p k.