Section 3.2 Measures of Spread

Objectives
1. Compute the range of a data set
2. Compute the variance of a population and a sample
3. Compute the standard deviation of a population and a sample
4. Approximate the standard deviation using grouped data
5. Use the Empirical Rule to summarize data that are unimodal and approximately symmetric
6. Use Chebyshev’s Inequality to describe a data set
7. Compute the coefficient of variation

Objective 1 Compute the range of a data set

The Range
The range of a data set is a measure of spread. That is, it measures how spread out the data are. The range of a data set is the difference between the largest and the smallest value.
Range = Largest Value – Smallest Value

Example
The following table presents the average monthly temperature, in degrees Fahrenheit, for the cities of San Francisco and St. Louis. Compute the range for each city.

                Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
San Francisco    51   54   55   56   58   60   60   61   63   62   58   52
St. Louis        30   35   44   57   66   75   79   78   70   59   45   35

Source: National Weather Service

Solution
The largest value for San Francisco is 63 and the smallest is 51. The range for San Francisco is 63 – 51 = 12. The largest value for St. Louis is 79 and the smallest is 30. The range for St. Louis is 79 – 30 = 49.
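As a quick check, the range can be computed in a few lines of Python. This is a minimal sketch: the helper name `data_range` is our own, and the two lists hold the twelve monthly averages (January through December) for each city.

```python
# Monthly average temperatures (degrees F), January through December.
san_francisco = [51, 54, 55, 56, 58, 60, 60, 61, 63, 62, 58, 52]
st_louis = [30, 35, 44, 57, 66, 75, 79, 78, 70, 59, 45, 35]

def data_range(values):
    """Return the range: the largest value minus the smallest value."""
    return max(values) - min(values)

print("San Francisco range:", data_range(san_francisco))  # 63 - 51 = 12
print("St. Louis range:", data_range(st_louis))           # 79 - 30 = 49
```

Only the two extreme values enter the result, which is exactly the weakness discussed next.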

The Range Is Not Often Used in Practice
Although the range is easy to compute, it is not often used in practice. The reason is that the range involves only two values from the data set: the largest and the smallest. The measures of spread that are most often used are the variance and the standard deviation, which use every value in the data set.

Objective 2 Compute the variance of a population and a sample

Variance
When a data set has a small amount of spread, like the San Francisco temperatures, most of the values will be close to the mean. When a data set has a larger amount of spread, more of the data values will be far from the mean. The variance is a measure of how far the values in a data set are from the mean, on average. The variance is computed slightly differently for populations and samples. The population variance is presented first.

Definition: Population Variance
Let x1, x2, x3, …, xN denote the values in a population of size N. Let μ denote the population mean. The population variance, denoted by σ², is
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$

Procedure for Computing the Population Variance
Following is the procedure for computing the population variance of a data set:
Step 1: Compute the population mean μ.
Step 2: For each population value xi, compute xi – μ. This is called the deviation for the value xi.
Step 3: Square the deviations to obtain the quantity (xi – μ)².
Step 4: Sum the squared deviations to obtain the quantity Σ(xi – μ)².
Step 5: Divide the sum obtained in Step 4 by the population size N to obtain the population variance σ².
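The five steps translate directly into code. The sketch below is illustrative only: `population_variance` is a hypothetical helper name, and the list holds the twelve monthly San Francisco temperatures (mean 57.5) used in the example that follows. Python's standard `statistics.pvariance` is called only as a cross-check.

```python
import statistics

def population_variance(values):
    """Population variance computed step by step, following Steps 1-5."""
    n = len(values)                          # population size N
    mu = sum(values) / n                     # Step 1: population mean
    deviations = [x - mu for x in values]    # Step 2: deviations x_i - mu
    squared = [d ** 2 for d in deviations]   # Step 3: squared deviations
    total = sum(squared)                     # Step 4: sum of squared deviations
    return total / n                         # Step 5: divide by N

# San Francisco monthly temperatures (degrees F), January through December.
san_francisco = [51, 54, 55, 56, 58, 60, 60, 61, 63, 62, 58, 52]
print(round(population_variance(san_francisco), 3))   # 169 / 12, about 14.083
print(round(statistics.pvariance(san_francisco), 3))  # standard-library check
```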

Example
Compute the population variance for the San Francisco temperatures.

Solution
Step 1: Compute the population mean: μ = (51 + 54 + … + 52)/12 = 690/12 = 57.5.
Step 2: For each population value xi, compute xi – μ. These values are shown in the second row below.

Month     Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
xi         51    54    55    56    58    60    60    61    63    62    58    52
xi – μ   –6.5  –3.5  –2.5  –1.5   0.5   2.5   2.5   3.5   5.5   4.5   0.5  –5.5

Solution (continued)
Step 3: Square the deviations to obtain the quantity (xi – μ)². These values are shown in the third row.
Step 4: Sum the squared deviations: Σ(xi – μ)² = 169.
Step 5: Divide the sum obtained in Step 4 by the population size N = 12 to obtain the population variance: σ² = 169/12 ≈ 14.083.

xi            51     54     55     56     58     60     60     61     63     62     58     52
xi – μ      –6.5   –3.5   –2.5   –1.5    0.5    2.5    2.5    3.5    5.5    4.5    0.5   –5.5
(xi – μ)²  42.25  12.25   6.25   2.25   0.25   6.25   6.25  12.25  30.25  20.25   0.25  30.25

Sample Variance
When the data values come from a sample rather than a population, the variance is called the sample variance. The procedure for computing the sample variance is a bit different from the one used to compute a population variance: the population mean μ is replaced by the sample mean x̄, and the denominator is n – 1 instead of N. The sample variance is denoted by s²:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

Why Divide by n – 1?
When computing the sample variance, we use the sample mean x̄ to compute the deviations; for the population variance, we use the population mean μ. The sum of squared deviations from the sample mean is never larger than the sum of squared deviations from the population mean, and it is usually smaller. As a result, if we divided by n when computing a sample variance, the value would tend to be a bit smaller than the population variance. It can be shown mathematically that the appropriate correction is to divide the sum of the squared deviations by n – 1 rather than n.
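A small simulation can make the n – 1 correction concrete. This sketch assumes normally distributed data with population variance 100 and samples of size 5 (both arbitrary choices for illustration), and averages the two versions of the sample variance over many samples.

```python
import random
import statistics

random.seed(1)       # fixed seed so the run is repeatable
POP_SD = 10.0        # assumed population standard deviation (variance 100)
n = 5                # small sample size makes the bias easy to see
trials = 20_000

divide_by_n = []
divide_by_n_minus_1 = []
for _ in range(trials):
    sample = [random.gauss(50.0, POP_SD) for _ in range(n)]
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)  # squared deviations from the sample mean
    divide_by_n.append(ss / n)
    divide_by_n_minus_1.append(ss / (n - 1))

print("true population variance:", POP_SD ** 2)
print("average when dividing by n:    ", round(statistics.fmean(divide_by_n), 1))
print("average when dividing by n - 1:", round(statistics.fmean(divide_by_n_minus_1), 1))
```

With these settings, the divide-by-n average should land near 80, that is (n – 1)/n × 100, while the divide-by-(n – 1) average should land near the true value of 100.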

Example
A company that manufactures batteries is testing a new type of battery designed for laptop computers. They measure the lifetimes, in hours, of six batteries, and the results are presented in the following table. Find the sample variance of the lifetimes.

Battery Lifetime (hours)   3   4   6   5   4   2

Solution
Step 1: Compute the sample mean: x̄ = (3 + 4 + 6 + 5 + 4 + 2)/6 = 24/6 = 4.
Step 2: For each sample value xi, compute xi – x̄. These values are shown in the second row below.

xi         3   4   6   5   4   2
xi – x̄   –1   0   2   1   0  –2

Solution (continued)
Step 3: Square the deviations to obtain the quantity (xi – x̄)². These values are shown in the third row.
Step 4: Sum the squared deviations: Σ(xi – x̄)² = 10.
Step 5: Divide the sum obtained in Step 4 by n – 1 = 5 to obtain the sample variance: s² = 10/5 = 2.

xi            3   4   6   5   4   2
xi – x̄      –1   0   2   1   0  –2
(xi – x̄)²    1   0   4   1   0   4
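The same steps apply to the sample variance, with x̄ in place of μ and n – 1 in place of N. A minimal sketch using the six battery lifetimes (3, 4, 6, 5, 4, 2 hours); `sample_variance` is a hypothetical helper, checked against the standard library's `statistics.variance`, which also divides by n – 1.

```python
import statistics

lifetimes = [3, 4, 6, 5, 4, 2]   # battery lifetimes in hours

def sample_variance(values):
    """Sample variance: squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n                       # Step 1: sample mean
    ss = sum((x - xbar) ** 2 for x in values)    # Steps 2-4: sum of squared deviations
    return ss / (n - 1)                          # Step 5: divide by n - 1

print(sample_variance(lifetimes))        # 10 / 5 = 2.0
print(statistics.variance(lifetimes))    # standard-library check
```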

Objective 3 Compute the standard deviation of a population and a sample

Standard Deviation
Because the variance is computed using squared deviations, the units of the variance are the squared units of the data. For example, in the battery lifetime example, the units of the data are hours, so the units of the variance are squared hours. In most situations, it is better to use a measure of spread that has the same units as the data. We obtain one simply by taking the square root of the variance. This quantity is called the standard deviation. The standard deviation of a sample is denoted by s = √s², and the standard deviation of a population is denoted by σ = √σ².

Example
Recall that in the battery lifetime example, the sample variance was computed as s² = 2. Find the sample standard deviation.

Solution
The sample standard deviation, s, is the square root of the sample variance: s = √2 ≈ 1.414 hours.
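In Python, the standard deviation is the square root of the variance, and the `statistics` module can also compute it directly. This sketch assumes the same six battery lifetimes; `statistics.stdev` gives the sample standard deviation s, while `statistics.pstdev` would give the population version σ.

```python
import math
import statistics

lifetimes = [3, 4, 6, 5, 4, 2]   # battery lifetimes in hours

s_squared = statistics.variance(lifetimes)   # sample variance, in squared hours
s = math.sqrt(s_squared)                     # sample standard deviation, in hours

print(round(s, 9))                                     # 1.414213562
print(math.isclose(s, statistics.stdev(lifetimes)))    # stdev gives the same value directly
```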

Standard Deviation on the TI-84 PLUS
The following steps will compute the standard deviation for both sample data and population data on the TI-84 PLUS Calculator:
Enter the data into L1 in the data editor.
Run the 1-Var Stats command (the same command used for means and medians), selecting L1 as the location of the data.
In the output, Sx is the sample standard deviation and σx is the population standard deviation.

Example
Using the TI-84 PLUS Calculator, find the sample variance. A company that manufactures batteries is testing a new type of battery designed for laptop computers. They measure the lifetimes, in hours, of six batteries, and the results are presented in the following table.

Battery Lifetime (hours)   3   4   6   5   4   2

Solution
Running the 1-Var Stats command, we find the sample standard deviation to be s = 1.414213562. We square this quantity to find the sample variance: s² = (1.414213562)² ≈ 2.

Standard Deviation & Resistance
Recall that a statistic is resistant if its value is not affected much by extreme data values. The standard deviation is not resistant. That is, the standard deviation is affected by extreme data values.
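To see that the standard deviation is not resistant, compare it with the median (a resistant statistic) before and after introducing one extreme value. The altered data set below is hypothetical: the last battery lifetime is changed from 2 to 20 hours purely for illustration.

```python
import statistics

original = [3, 4, 6, 5, 4, 2]         # battery lifetimes in hours
with_outlier = [3, 4, 6, 5, 4, 20]    # hypothetical: last value changed from 2 to 20

for label, data in [("original", original), ("with outlier", with_outlier)]:
    print(label,
          "median =", statistics.median(data),
          "standard deviation =", round(statistics.stdev(data), 2))
```

The median moves only from 4 to 4.5, while the standard deviation jumps from about 1.41 to about 6.45 hours.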

Objective 4 Approximate the standard deviation using grouped data

Approximating the Standard Deviation
Sometimes we don’t have access to the raw data in a data set, but we are given a frequency distribution. In these cases we can approximate the standard deviation.

Approximating the Standard Deviation
Following is the procedure for approximating the standard deviation from a frequency distribution:
Step 1: Compute the midpoint of each class and approximate the mean of the frequency distribution.
Step 2: For each class, subtract the mean from the class midpoint to obtain (Midpoint – Mean).
Step 3: For each class, square the difference obtained in Step 2 to obtain (Midpoint – Mean)², and multiply by the class frequency to obtain (Midpoint – Mean)² × Frequency.
Step 4: Add the products (Midpoint – Mean)² × Frequency over all classes.
Step 5: To compute the population variance, divide the sum obtained in Step 4 by n, the sum of the frequencies. To compute the sample variance, divide the sum obtained in Step 4 by n – 1.
Step 6: Take the square root of the variance obtained in Step 5. The result is the approximate standard deviation.

Example
The following table presents the number of text messages sent via cell phone by a sample of 50 high school students. Approximate the sample standard deviation of the number of messages sent.

Number of Text Messages Sent   Frequency
0 – 49                         10
50 – 99                        5
100 – 149                      13
150 – 199                      11
200 – 249                      7
250 – 299                      4

Solution
Step 1: Compute the midpoint of each class. Recall from the last section that the sample mean was computed as 137.

Number of Text Messages Sent   Class Midpoint
0 – 49                         25
50 – 99                        75
100 – 149                      125
150 – 199                      175
200 – 249                      225
250 – 299                      275

Solution
Step 2: For each class, subtract the mean from the class midpoint to obtain (Midpoint – Mean).

Number of Text Messages Sent   Class Midpoint   (Midpoint – Mean)
0 – 49                         25               –112
50 – 99                        75               –62
100 – 149                      125              –12
150 – 199                      175              38
200 – 249                      225              88
250 – 299                      275              138

Solution
Step 3: For each class, square the difference obtained in Step 2 to obtain (Midpoint – Mean)², and multiply by the frequency to obtain (Midpoint – Mean)² × Frequency.

Number of Text Messages Sent   Frequency   (Midpoint – Mean)   (Midpoint – Mean)² × Frequency
0 – 49                         10          –112                125,440
50 – 99                        5           –62                 19,220
100 – 149                      13          –12                 1,872
150 – 199                      11          38                  15,884
200 – 249                      7           88                  54,208
250 – 299                      4           138                 76,176

Solution
Step 4: Add the products (Midpoint – Mean)² × Frequency over all classes:
125,440 + 19,220 + 1,872 + 15,884 + 54,208 + 76,176 = 292,800

Solution
Step 5: Since we are computing the sample variance, we divide the sum obtained in Step 4 by n – 1 = 49: s² = 292,800/49 ≈ 5975.51.
Step 6: Take the square root of the variance to obtain the standard deviation: s = √5975.51 ≈ 77.30.
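The six steps can be sketched in Python as follows. `grouped_std` is a hypothetical helper name; it uses the class midpoints and frequencies from the text-message example, and the `sample` flag selects division by n – 1 (sample) or by n (population) in Step 5.

```python
import math

# Class midpoints and frequencies from the text-message example (n = 50 students).
midpoints = [25, 75, 125, 175, 225, 275]
frequencies = [10, 5, 13, 11, 7, 4]

def grouped_std(midpoints, frequencies, sample=True):
    """Approximate standard deviation from a frequency distribution (Steps 1-6)."""
    n = sum(frequencies)
    # Step 1: approximate mean from the class midpoints.
    mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n
    # Steps 2-4: (midpoint - mean)^2 times frequency, summed over all classes.
    total = sum((m - mean) ** 2 * f for m, f in zip(midpoints, frequencies))
    # Step 5: divide by n - 1 for a sample, by n for a population.
    variance = total / (n - 1) if sample else total / n
    # Step 6: square root gives the approximate standard deviation.
    return math.sqrt(variance)

print(round(grouped_std(midpoints, frequencies), 5))   # 77.30142
```

Because the result is based on class midpoints rather than the raw data, it is only an approximation of the standard deviation of the original values.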

Grouped Data on the TI-84 PLUS
The same procedure used to compute the mean for grouped data in a frequency distribution may be used to compute the standard deviation. Enter the midpoint for each class into L1 and the corresponding frequencies into L2. Next, if you are using Stats Wizards, select the 1-Var Stats command and enter L1 in the List field and L2 in the FreqList field. If you are not using Stats Wizards, you may run the 1-Var Stats command followed by L1, comma, L2.

Example

Class Midpoint   Frequency
25               10
75               5
125              13
175              11
225              7
275              4

In the TI-84 PLUS output for the last example, the value of Sx represents the approximate sample standard deviation. In this example Sx = 77.30142. Therefore the approximate standard deviation is 77.30142.

Objective 5 Use the Empirical Rule to summarize data that are unimodal and approximately symmetric

Bell-Shaped Histogram
Many histograms have a single mode near the center of the data, and are approximately symmetric. Such histograms are often referred to as bell-shaped.

The Empirical Rule
When a data set has a bell-shaped histogram, it is often possible to use the standard deviation to provide an approximate description of the data using a rule known as the Empirical Rule. When a population has a histogram that is approximately bell-shaped, then:
Approximately 68% of the data will be within one standard deviation of the mean.
Approximately 95% of the data will be within two standard deviations of the mean.
All, or almost all, of the data will be within three standard deviations of the mean.

Example
The following table presents the U.S. Census Bureau projection for the percentage of the population aged 65 and over for each state and the District of Columbia. Compute the population mean and standard deviation, and use the Empirical Rule to describe the data.

Alabama 14.1; Rhode Island; Nevada 12.3; Kentucky 13.1; Arkansas 14.3; Tennessee 13.3; New Mexico; Maryland 12.2; Connecticut 14.4; Vermont; North Dakota 15.3; Minnesota 12.4; Florida 17.8; West Virginia 16; Oregon 13; Montana 15; Idaho 12; Alaska 8.1; South Carolina 13.6; New Hampshire 12.6; Iowa 14.9; California 11.5; Texas 10.5; New York; Louisiana; Delaware; Virginia; Ohio 13.7; Massachusetts; Georgia 10.2; Wisconsin 13.5; Pennsylvania 15.5; Mississippi 12.8; Illinois; Arizona 13.9; South Dakota 14.6; Nebraska 13.8; Kansas 13.4; Colorado 10.7; Utah 9; New Jersey; Maine 15.6; D.C.; Washington; North Carolina; Michigan; Hawaii; Wyoming 14; Oklahoma; Missouri; Indiana 12.7

Solution
We first note that the histogram is approximately bell-shaped. We may use the TI-84 PLUS Calculator, or other technology, to compute the population mean and population standard deviation.
Mean: μ = 13.249
Standard Deviation: σ = 1.6827

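Solution (continued)
Approximately 68% of the data values are between μ – σ = 11.57 and μ + σ = 14.93.
Approximately 95% of the data values are between μ – 2σ = 9.88 and μ + 2σ = 16.61.
All, or almost all, of the data values are between μ – 3σ = 8.20 and μ + 3σ = 18.30.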
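The interval endpoints above are μ ± kσ for k = 1, 2, 3. A minimal sketch that reproduces them from the mean and standard deviation reported in the solution; `empirical_intervals` is a hypothetical helper name.

```python
def empirical_intervals(mu, sigma):
    """Return (k, low, high) for the intervals mu - k*sigma to mu + k*sigma, k = 1, 2, 3."""
    return [(k, mu - k * sigma, mu + k * sigma) for k in (1, 2, 3)]

# Empirical Rule shares for k = 1, 2, 3 (meaningful only for bell-shaped data).
shares = {1: "Approximately 68%", 2: "Approximately 95%", 3: "All, or almost all,"}

mu, sigma = 13.249, 1.6827   # population mean and standard deviation from the solution
for k, low, high in empirical_intervals(mu, sigma):
    print(f"{shares[k]} of the values lie between {low:.2f} and {high:.2f}")
```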

Objective 6 Use Chebyshev’s Inequality to describe a data set

Any Data Set
When a distribution is bell-shaped, we use the Empirical Rule to approximate the proportion of data within one or two standard deviations. Another rule, called Chebyshev’s Inequality, holds for any data set.

Chebyshev’s Inequality
In any data set, the proportion of the data that is within K standard deviations of the mean is at least 1 – 1/K², for any K greater than 1. Specifically, by setting K = 2 or K = 3, we obtain the following results:
At least 3/4 (75%) of the data are within two standard deviations of the mean.
At least 8/9 (about 89%) of the data are within three standard deviations of the mean.
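Chebyshev’s Inequality makes no assumption about the shape of the data, so it should hold even for strongly skewed data. The sketch below checks it on a simulated, right-skewed data set (exponential values, an arbitrary choice for illustration), using the population standard deviation of that data set; `chebyshev_bound` is a hypothetical helper name.

```python
import random
import statistics

def chebyshev_bound(k):
    """Chebyshev lower bound on the proportion within k standard deviations (k > 1)."""
    return 1 - 1 / k ** 2

# A deliberately skewed, non-bell-shaped data set, assumed for illustration.
random.seed(7)
data = [random.expovariate(1.0) for _ in range(1000)]
mu = statistics.fmean(data)
sigma = statistics.pstdev(data)   # population standard deviation of this data set

for k in (2, 3):
    within = sum(abs(x - mu) < k * sigma for x in data) / len(data)
    print(f"k = {k}: guaranteed at least {chebyshev_bound(k):.3f}, observed {within:.3f}")
```

For skewed data like this, the observed proportions are typically well above the bounds; Chebyshev’s Inequality guarantees only the minimum.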

Example
As part of a public health study, systolic blood pressure was measured for a large group of people. The mean was 120 and the standard deviation was 10. What information does Chebyshev’s Inequality provide about these data?

Solution
We compute the following:
Mean – 2(SD) = 120 – 2(10) = 100 and Mean + 2(SD) = 120 + 2(10) = 140
Mean – 3(SD) = 120 – 3(10) = 90 and Mean + 3(SD) = 120 + 3(10) = 150
We conclude:
At least 3/4 (75%) of the people had systolic blood pressures between 100 and 140.
At least 8/9 (about 89%) of the people had systolic blood pressures between 90 and 150.

Objective 7 Compute the coefficient of variation

Coefficient of Variation
The coefficient of variation (CV for short) tells how large the standard deviation is relative to the mean. It can be used to compare the spreads of data sets whose values have different units. The coefficient of variation is found by dividing the standard deviation by the mean: CV = σ/μ for a population, or CV = s/x̄ for a sample.

Example
National Weather Service records show that over a thirty-year period, the annual precipitation in Atlanta, Georgia had a mean of 49.8 inches with a standard deviation of 7.6 inches, and the annual temperature had a mean of 62.2 degrees Fahrenheit with a standard deviation of 1.3 degrees. Compute the coefficient of variation for precipitation and for temperature. Which has greater spread relative to its mean?

Solution
We compute the following:
CV for precipitation = 7.6/49.8 ≈ 0.153
CV for temperature = 1.3/62.2 ≈ 0.021
The CV for precipitation is larger than the CV for temperature. Therefore precipitation has a greater spread relative to its mean.
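The two coefficients of variation can be computed as follows; `coefficient_of_variation` is a hypothetical helper name. Because each CV is a ratio of two quantities in the same units, the result is unitless, which is what allows precipitation and temperature to be compared.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = standard deviation / mean, a unitless ratio."""
    return std_dev / mean

cv_precip = coefficient_of_variation(7.6, 49.8)   # inches / inches
cv_temp = coefficient_of_variation(1.3, 62.2)     # degrees F / degrees F
print(f"precipitation CV = {cv_precip:.3f}")      # 0.153
print(f"temperature CV   = {cv_temp:.3f}")        # 0.021
```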

Do You Know…
How to compute the range of a data set?
How to compute the variance of a population and a sample, and the appropriate notation?
How to compute the standard deviation of a population and a sample, and the appropriate notation?
How to approximate the standard deviation using grouped data?
How to use the Empirical Rule to summarize data?
How to use Chebyshev’s Inequality to describe a data set?
How to compute the coefficient of variation?