Intro to Statistics Part II Descriptive Statistics


Ernesto Diaz, Assistant Professor of Mathematics

14.2 Descriptive Statistics
Copyright © Cengage Learning. All rights reserved.

Descriptive Statistics
Descriptive statistics is concerned with the accumulation of data, measures of central tendency, and measures of dispersion.

Measures of Central Tendency

Measures of Central Tendency
When we add up a list of numbers in statistics, we use the symbol $\sum x$ to mean the sum of all the values that x can assume. Similarly, $\sum x^2$ means to square each value that x can assume and then add the results; $(\sum x)^2$ means to first add the values and then square the result. The symbol $\Sigma$ is the Greek capital letter sigma, chosen because it corresponds to S, which reminds us of “sum.” The measure that most of us think of when we hear someone use the word average is called the mean. For n values, the mean is
$$\bar{x} = \frac{\sum x}{n}$$
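To make the notation concrete, here is a small Python sketch that computes $\sum x$, $\sum x^2$, $(\sum x)^2$, and the mean for Set A = {8, 9, 9, 9, 10}, a data set used later in this section. The function name `summation_terms` is illustrative only. Note that $\sum x^2$ and $(\sum x)^2$ give very different numbers.

```python
def summation_terms(values):
    """Return (Σx, Σx², (Σx)², mean) for a list of numbers."""
    sum_x = sum(values)                    # Σx: add the values
    sum_x_sq = sum(x * x for x in values)  # Σx²: square each value, then add
    square_of_sum = sum_x ** 2             # (Σx)²: add first, then square
    mean = sum_x / len(values)             # mean = Σx / n
    return sum_x, sum_x_sq, square_of_sum, mean


set_a = [8, 9, 9, 9, 10]
print(summation_terms(set_a))  # (45, 407, 2025, 9.0)
```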

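Measures of Central Tendency
Other statistical measures, called averages or measures of central tendency, are the median and the mode. The median is the middle value when the data are arranged in order (the mean of the two middle values when there is an even number of values), and the mode is the value that occurs most frequently.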
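The median and mode definitions translate directly into code. The sketch below uses illustrative helper names (`median`, `modes`) and cross-checks them against Python's standard `statistics.median` and `statistics.multimode`. The sample lists are Hannah's test scores and Set B, both from later in this section.

```python
import statistics
from collections import Counter


def median(values):
    """Middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """All values that occur most often; a data set can have more than one mode."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


scores = [92, 85, 65, 89, 96, 71]  # even count: median = (85 + 89) / 2
print(median(scores), statistics.median(scores))  # 87.0 87.0
print(modes([2, 9, 9, 12, 13]), statistics.multimode([2, 9, 9, 12, 13]))  # [9] [9]
```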

Example 3 – Mean, median, and mode for table values
Consider Table 14.5, which shows the number of days one must wait for a marriage license in the various states of the United States. What are the mean, the median, and the mode for these data?
(Table 14.5: Wait Time for a U.S. Marriage License)

Example 3 – Solution
Mean: To find the mean, we could, of course, add all 50 individual numbers, but instead, notice that
0 occurs 25 times, so write 0 · 25
1 occurs 1 time, so write 1 · 1
2 occurs 1 time, so write 2 · 1
3 occurs 19 times, so write 3 · 19
4 occurs 1 time, so write 4 · 1
5 occurs 3 times, so write 5 · 3
Thus, the mean is
$$\bar{x} = \frac{0 \cdot 25 + 1 \cdot 1 + 2 \cdot 1 + 3 \cdot 19 + 4 \cdot 1 + 5 \cdot 3}{50} = \frac{79}{50} = 1.58$$

Example 3 – Solution cont’d
Median: Since the median is the middle number and there are 50 values, the median is the mean of the 25th and 26th numbers when they are arranged in order. The 25th term is 0 and the 26th term is 1, so the median is (0 + 1)/2 = 0.5.
Mode: The mode is the value that occurs most frequently, which is 0.
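The same three answers can be checked by expanding the tallies from this solution back into the 50 individual values. This is a minimal sketch; the helper name `expand` is illustrative, and the median line assumes an even number of values, as in this example.

```python
# Wait times (days) and the number of states with each wait time, as tallied in the solution.
freq = {0: 25, 1: 1, 2: 1, 3: 19, 4: 1, 5: 3}


def expand(freq_table):
    """Turn {value: count} into the ordered list of individual values."""
    return [value for value in sorted(freq_table) for _ in range(freq_table[value])]


data = expand(freq)
n = len(data)                   # 50 states
mean = sum(data) / n            # 79 / 50
# n is even here, so the median is the mean of the 25th and 26th values
# (zero-based indices 24 and 25).
median = (data[n // 2 - 1] + data[n // 2]) / 2
mode = max(freq, key=freq.get)  # value with the largest count

print(n, mean, median, mode)    # 50 1.58 0.5 0
```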

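Measures of Central Tendency
When finding the mean from a frequency distribution, you are finding what is called a weighted mean. If each value x occurs with weight (frequency) w, then
$$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$$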
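A weighted mean needs only the values and their weights. In this sketch `weighted_mean` is a hypothetical helper name. Since the entries of Table 14.6 are not given in the text, the example reuses the Example 3 tallies (x = days to wait, w = number of states) and reproduces the mean of 1.58 found there.

```python
def weighted_mean(values, weights):
    """Weighted mean: Σ(w·x) / Σw."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Example 3 as a frequency distribution: x = days to wait, w = number of states.
days = [0, 1, 2, 3, 4, 5]
states = [25, 1, 1, 19, 1, 3]
print(weighted_mean(days, states))  # 1.58
```

Computing from (value, weight) pairs avoids listing every repeated value, which is exactly the shortcut the Example 3 solution takes by hand.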

Example 4 – Find a weighted mean
A sociology class is studying family structures, and the professor asks each student to state the number of children in his or her family. The results are summarized in Table 14.6. What is the average number of children in the families of students in this sociology class?
(Table 14.6: Family Data)

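Example 4 – Solution
We need to find the weighted mean, where x represents the number of children in a family and w represents the number of students who reported that number of children (the weight):
$$\bar{x} = \frac{\sum (w \cdot x)}{\sum w} = 2.12$$
There is an average of about two children per family.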

Measures of Position

Measures of Position
The median divides the data into two equal parts, with half the values above the median and half below the median, so the median is called a measure of position. Sometimes we use benchmark positions that divide the data into more than two parts. Quartiles, denoted by Q1 (first quartile), Q2 (second quartile), and Q3 (third quartile), divide the data into four equal parts. Deciles are nine values that divide the data into ten equal parts, and percentiles are 99 values that divide the data into 100 equal parts.

Measures of Position
Measures of position are often used to make comparisons. Two measures of position are percentiles and quartiles.

To Find the Quartiles of a Set of Data
1. Order the data from smallest to largest.
2. Find the median, or 2nd quartile, of the set of data. If there is an odd number of pieces of data, the median is the middle value. If there is an even number of pieces of data, the median is halfway between the two middle pieces of data.
3. The first quartile, Q1, is the median of the lower half of the data; that is, Q1 is the median of the data less than Q2.
4. The third quartile, Q3, is the median of the upper half of the data; that is, Q3 is the median of the data greater than Q2.

Example: Quartiles
The weekly grocery bills for 23 families are as follows. Determine Q1, Q2, and Q3.
170 210 270 270 280 330 80 170 240 270 225 225 215 310 50 75 160 130 74 81 95 172 190

Example: Quartiles continued
Order the data:
50 74 75 80 81 95 130 160 170 170 172 190 210 215 225 225 240 270 270 270 280 310 330
Q2 is the median of the entire data set (the 12th of the 23 values), which is 190.
Q1 is the median of the 11 numbers from 50 to 172, which is 95.
Q3 is the median of the 11 numbers from 210 to 330, which is 270.

Example 5 – Divide exam scores into quartiles
The test results for Professor Hunter’s midterm exam are summarized in Table 14.7. Divide these scores into quartiles.
(Table 14.7: Grade Distribution)

Example 5 – Solution
The quartiles are the three scores that divide the data into four parts. The first quartile is the data value that separates the lowest 25% of the scores from the remaining scores; the 2nd quartile is the value that separates the lower 50% of the scores from the remainder. Note that the 2nd quartile is the same as the median, since the median divides the scores so that 50% are above and 50% are below. The 3rd quartile is the value that separates the lower 75% of the scores from the upper 25%. Begin by noting the number of scores: 4 + 7 + 16 + 3 = 30.

Example 5 – Solution cont’d
First quartile: 0.25(30) = 7.5, which is not a whole number, so we round up: Q1 (the first quartile) is the 8th lowest score. From Table 14.7, we see that this score is 69.
Second quartile: Q2, the second quartile score, is the median, which is the mean of the 15th and 16th scores from the bottom.

Example 5 – Solution cont’d
Third quartile: 0.75(30) = 22.5, so Q3 (the third quartile score) is the 23rd score from the bottom (or the 8th from the top). From Table 14.7, we see this score is 85.
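The position rule applied in this solution can be sketched in Python: compute p·n; if it is not a whole number, round up to get the position; if it is a whole number k, average the k-th and (k + 1)-th values. Since the scores in Table 14.7 are not reproduced in the text, the sketch reports positions for n = 30 and, as a check, applies the rule to the grocery-bill data from the earlier quartile example, where it agrees with the median-of-halves answer. The helper names are illustrative. Exact fractions are used so that a product such as 10% of 30 is not misjudged by floating-point rounding.

```python
import math
from fractions import Fraction


def percentile_positions(percent, n):
    """1-based position(s) in n ordered values for the given percentile (0 < percent < 100).

    If percent*n/100 is not a whole number, round it up and use that one position.
    If it is a whole number k, use the mean of the k-th and (k+1)-th values.
    """
    if not 0 < percent < 100:
        raise ValueError("percent must be strictly between 0 and 100")
    locator = Fraction(percent * n, 100)  # exact, e.g. 25% of 30 = 15/2
    if locator.denominator == 1:
        k = int(locator)
        return (k, k + 1)
    return (math.ceil(locator),)


def percentile_value(percent, ordered):
    """Percentile of data already sorted from smallest to largest."""
    positions = percentile_positions(percent, len(ordered))
    return sum(ordered[p - 1] for p in positions) / len(positions)


# Professor Hunter's exam has n = 30 scores.
for pct in (25, 50, 75):
    print(pct, percentile_positions(pct, 30))
# 25 (8,)
# 50 (15, 16)
# 75 (23,)

bills = [170, 210, 270, 270, 280, 330, 80, 170, 240, 270, 225, 225,
         215, 310, 50, 75, 160, 130, 74, 81, 95, 172, 190]
print([percentile_value(p, sorted(bills)) for p in (25, 50, 75)])  # [95.0, 190.0, 270.0]
```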

Measures of Dispersion

Measures of Dispersion
The measures we’ve been discussing can help us interpret information, but they do not give the entire story. For example, consider these sets of data:
Set A: {8, 9, 9, 9, 10}   Mean: 9   Median: 9   Mode: 9
Set B: {2, 9, 9, 12, 13}   Mean: 9   Median: 9   Mode: 9

Measures of Dispersion
Notice that, for sets A and B, the measures of central tendency do not distinguish the data. However, if you look at the data placed on planks, as shown in Figure 14.29, you will see that the data in Set B are relatively widely dispersed along the plank, whereas the data in Set A are clumped around the mean.
(Figure 14.29: Visualization of dispersion of sets of data. a. A = {8, 9, 9, 9, 10}; b. B = {2, 9, 9, 12, 13})
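Python's standard `statistics` module offers a quick check that the three measures of central tendency coincide for both sets, so none of them can tell the sets apart.

```python
import statistics

set_a = [8, 9, 9, 9, 10]
set_b = [2, 9, 9, 12, 13]

for name, data in (("A", set_a), ("B", set_b)):
    # Mean, median, and mode all equal 9 for both sets.
    print(name, statistics.mean(data), statistics.median(data), statistics.mode(data))
```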

Measures of Dispersion
We’ll consider three measures of dispersion: the range, the standard deviation, and the variance.

Example 6 – Find the range
Find the ranges for the data sets in Figure 14.29:
a. Set A = {8, 9, 9, 9, 10}
b. Set B = {2, 9, 9, 12, 13}
Solution: Notice from Figure 14.29 that the mean for each of these sets of data is the same.

Example 6 – Solution cont’d
The range is the difference between the largest and smallest values in the set.
a. 10 – 8 = 2
b. 13 – 2 = 11
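In code, the range is simply the maximum minus the minimum. The helper name `data_range` is illustrative (it avoids shadowing Python's built-in `range`).

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


set_a = [8, 9, 9, 9, 10]
set_b = [2, 9, 9, 12, 13]
print(data_range(set_a), data_range(set_b))  # 2 11
```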

Measures of Dispersion
The range is used, along with quartiles, to construct a statistical tool called a box plot. For a given set of data, a box plot consists of a rectangular box positioned above a numerical scale, drawn from Q1 (the first quartile) to Q3 (the third quartile). The median (Q2, or second quartile) is shown as a dashed line, and a segment is extended to the left to show the distance to the minimum value; another segment is extended to the right for the maximum value.
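A box plot drawn this way needs five numbers: the minimum, Q1, the median, Q3, and the maximum. The sketch below computes them with the median-of-halves quartiles described earlier, applied to the grocery-bill data from the quartile example; the function names are illustrative. Many plotting libraries end the whiskers at a cutoff such as 1.5 times the interquartile range by default rather than at the minimum and maximum, so check their settings before comparing pictures.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in increasing order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """(minimum, Q1, median, Q3, maximum): the values a box plot is drawn from."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = median_of_sorted(ordered[: n // 2])         # left edge of the box
    q2 = median_of_sorted(ordered)                   # line inside the box
    q3 = median_of_sorted(ordered[(n + 1) // 2 :])   # right edge of the box
    return ordered[0], q1, q2, q3, ordered[-1]       # whiskers end at min and max


bills = [170, 210, 270, 270, 280, 330, 80, 170, 240, 270, 225, 225,
         215, 310, 50, 75, 160, 130, 74, 81, 95, 172, 190]
print(five_number_summary(bills))  # (50, 95, 190, 270, 330)
```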

Measures of Dispersion
Figure 14.30 shows a box plot for the data in Example 5.
(Figure 14.30: Box plot for grade distribution)

Measures of Dispersion
Sometimes a box plot is called a box-and-whisker plot. Its usefulness should be clear when you look at Figure 14.31. A box plot shows:
1. the median (a measure of central tendency);
2. the location of the middle half of the data (represented by the extent of the box);
3. the range (a measure of dispersion);
4. the skewness (the nonsymmetry of both the box and the whiskers).
(Figure 14.31: Box plot)

Measures of Dispersion
The variance and standard deviation are measures that use all the numbers in the data set to give information about the dispersion. When finding the variance, we must distinguish between the variance of the entire population and the variance of a random sample from the population.

Measures of Dispersion
When the variance is based on a set of sample scores, it is denoted by $s^2$; when it is based on all scores in a population, it is denoted by $\sigma^2$ ($\sigma$ is the lowercase Greek letter sigma). The variance for a random sample of n scores with mean $\bar{x}$ is found by
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Measures of Dispersion
To understand this formula for the sample variance, we will consider an example before summarizing a procedure. Again, let’s use the data sets we worked with in Example 6:
Set A = {8, 9, 9, 9, 10}; the mean is 9.
Set B = {2, 9, 9, 12, 13}; the mean is 9.

Measures of Dispersion
Find the deviations by subtracting the mean from each term:
Set A: 8 – 9 = –1, 9 – 9 = 0, 9 – 9 = 0, 9 – 9 = 0, 10 – 9 = 1
Set B: 2 – 9 = –7, 9 – 9 = 0, 9 – 9 = 0, 12 – 9 = 3, 13 – 9 = 4
If we sum these deviations (to obtain a measure of the total deviation), in each case we obtain 0, because the positive and negative differences “cancel each other out.”

Measures of Dispersion
Next we calculate the square of each of these deviations:
Set A: (8 – 9)² = (–1)² = 1, (9 – 9)² = 0² = 0, (9 – 9)² = 0² = 0, (9 – 9)² = 0² = 0, (10 – 9)² = 1² = 1
Set B: (2 – 9)² = (–7)² = 49, (9 – 9)² = 0² = 0, (9 – 9)² = 0² = 0, (12 – 9)² = 3² = 9, (13 – 9)² = 4² = 16

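Measures of Dispersion
Finally, we find the sum of these squares and divide by one less than the number of items to obtain the variance:
Set A: $s^2 = \frac{1 + 0 + 0 + 0 + 1}{5 - 1} = \frac{2}{4} = 0.5$
Set B: $s^2 = \frac{49 + 0 + 0 + 9 + 16}{5 - 1} = \frac{74}{4} = 18.5$
The larger the variance, the more dispersion there is in the original data.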
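The variance calculation above can be sketched in a few lines of Python. The helper names are illustrative; the results are cross-checked against `statistics.variance`, which divides by n – 1. For contrast, the sketch also computes the population variance $\sigma^2$, which divides by the number of values N instead of N – 1 and is appropriate only when the data are the entire population.

```python
import statistics


def sample_variance(values):
    """s² = Σ(x - mean)² / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def population_variance(values):
    """σ² = Σ(x - mean)² / N, used when the values are the whole population."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


set_a = [8, 9, 9, 9, 10]
set_b = [2, 9, 9, 12, 13]
print(sample_variance(set_a), sample_variance(set_b))          # 0.5 18.5
print(statistics.variance(set_a), statistics.variance(set_b))  # same values
print(population_variance(set_a), population_variance(set_b))  # 0.4 14.8
```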

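Measures of Dispersion
To find the standard deviation s of a sample of n values:
Step 1: Find the mean $\bar{x}$.
Step 2: Subtract the mean from each value to find the deviations.
Step 3: Square each deviation.
Step 4: Add the squared deviations.
Step 5: Divide this sum by n – 1; the result is the sample variance $s^2$.
Step 6: Take the square root of the variance: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$.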

Example 8 – Find the standard deviation for a math test
Suppose that Hannah received the following test scores in a math class: 92, 85, 65, 89, 96, and 71. Find s, the standard deviation, for her test scores.
Solution:
Step 1: Find the mean: $\bar{x} = \frac{92 + 85 + 65 + 89 + 96 + 71}{6} = \frac{498}{6} = 83$.

Example 8 – Solution
Steps 2–4: We summarize these steps in table format:
Score   Square of the deviation from the mean
92      (92 – 83)² = 9² = 81
85      (85 – 83)² = 2² = 4
65      (65 – 83)² = (–18)² = 324
89      (89 – 83)² = 6² = 36
96      (96 – 83)² = 13² = 169
71      (71 – 83)² = (–12)² = 144
Sum     758

Example 8 – Solution cont’d
Step 5: Divide the sum by 5 (one less than the number of scores): $\frac{758}{5} = 151.6$.
This number, 151.6, is called the variance. If you do not have access to a calculator, you can use the variance as a measure of dispersion. However, we assume you have a calculator and can find the standard deviation.

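Example 8 – Solution cont’d
Step 6: Take the square root of the variance to find the standard deviation: $s = \sqrt{151.6} \approx 12.3$.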
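Example 8 can be followed step by step in Python, with the comments matching the step numbers used in the solution. The last line cross-checks the result against `statistics.stdev`, which also divides by n – 1.

```python
import math
import statistics

scores = [92, 85, 65, 89, 96, 71]

n = len(scores)
mean = sum(scores) / n                    # Step 1: 498 / 6 = 83
squares = [(x - mean) ** 2 for x in scores]  # Steps 2-4: deviations, squared...
total = sum(squares)                      # ...and added: 758
variance = total / (n - 1)                # Step 5: 758 / 5 = 151.6
s = math.sqrt(variance)                   # Step 6: standard deviation

print(mean, total, variance, round(s, 2))  # 83.0 758.0 151.6 12.31
print(round(statistics.stdev(scores), 2))  # 12.31
```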

Interpreting Measures of Dispersion
A main use of dispersion is to compare the amounts of spread in two (or more) data sets. A common technique in inferential statistics is to draw comparisons between populations by analyzing samples that come from those populations.

Example: Interpreting Measures
Two companies, A and B, sell small packs of sugar for coffee. The mean and standard deviation for samples from each company are given below. Which company consistently provides more sugar in its packs? Which company fills its packs more consistently?
(Table: sample mean and standard deviation for Company A and Company B)

Example: Interpreting Measures
Solution: We infer that Company A most likely provides more sugar than Company B (greater mean). We also infer that Company B is more consistent than Company A (smaller standard deviation).

Symmetry in Data Sets
The most useful way to analyze a data set often depends on whether the distribution is symmetric or non-symmetric. In a “symmetric” distribution, as we move out from a central point, the pattern of frequencies is the same (or nearly so) to the left and right. In a “non-symmetric” distribution, the patterns to the left and right are different.
© 2008 Pearson Addison-Wesley. All rights reserved.

Some Symmetric Distributions

Non-symmetric Distributions
A non-symmetric distribution with a tail extending out to the left, shaped like a J, is called skewed to the left. If the tail extends out to the right, the distribution is skewed to the right.

Some Non-symmetric Distributions

Chebyshev’s Theorem
For any set of numbers, regardless of how they are distributed, the fraction of them that lie within k standard deviations of their mean (where k > 1) is at least
$$1 - \frac{1}{k^2}$$

Example: Chebyshev’s Theorem
What is the minimum percentage of the items in a data set that lie within 3 standard deviations of the mean?
Solution: With k = 3, we calculate
$$1 - \frac{1}{3^2} = 1 - \frac{1}{9} = \frac{8}{9} \approx 0.889$$
so at least about 88.9% of the items lie within 3 standard deviations of the mean.
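Chebyshev’s bound is easy to compute, and it can be compared with the fraction actually observed in a data set. This is a minimal sketch with illustrative helper names. It measures spread with the population standard deviation of the values themselves (`statistics.pstdev`) and uses the grocery-bill data from the quartile example. Whatever the data, the observed fraction should never fall below the bound.

```python
import statistics


def chebyshev_min_fraction(k):
    """Chebyshev's lower bound 1 - 1/k² on the fraction within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("the theorem is stated for k > 1")
    return 1 - 1 / k ** 2


def fraction_within(values, k):
    """Observed fraction of values within k (population) standard deviations of their mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(abs(x - mean) <= k * sd for x in values) / len(values)


print(f"{chebyshev_min_fraction(3):.1%}")  # 88.9%

bills = [170, 210, 270, 270, 280, 330, 80, 170, 240, 270, 225, 225,
         215, 310, 50, 75, 160, 130, 74, 81, 95, 172, 190]
for k in (1.5, 2, 3):
    # bound vs. observed fraction for this particular data set
    print(k, round(chebyshev_min_fraction(k), 3), round(fraction_within(bills, k), 3))
```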