
Chapter 4 Numerical Methods for Describing Data

Parameter: a fixed value about a population, typically unknown. Suppose we want to know the MEAN length of all the fish in Lake Lewisville. Is this a value that is known? Can we find it out? At any given point in time, how many values are there for the mean length of fish in the lake?

Statistic: a value calculated from a sample. Suppose we want to know the MEAN length of all the fish in Lake Lewisville. What can we do to estimate this unknown parameter?

Measures of Central Tendency
Mode: the observation that occurs most often
– There can be more than one mode
– If all values occur only once, there is no mode
– Not used as often as the mean and median

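Measures of Central Tendency
Median: the middle value of the data; it divides the observations in half.
To find: list the observations in numerical order. If n is odd, the median is the middle observation; if n is even, the median is the average of the two middle observations (where n = sample size).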

Suppose we catch a sample of 5 fish from the lake. The lengths of the fish (in inches) are listed below. Find the median length of fish. The numbers are in order and n is odd, so find the middle observation. The median length of fish is 5 inches.

Suppose we caught a sample of 6 fish from the lake, with lengths of 3, 4, 5, 6, 8, and 10 inches. The numbers are in order and n is even, so find the middle two observations (5 and 6). Now average these two values: the median length is 5.5 inches.
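The odd/even recipe for the median can be sketched in a few lines of Python. This is only an illustration: the function name `median_by_hand` is made up here, and the data are the six fish lengths (3, 4, 5, 6, 8, 10 inches) that the transcript lists for this sample.

```python
from statistics import median

# Lengths (in inches) of the six fish in the sample.
six_fish = [3, 4, 5, 6, 8, 10]

def median_by_hand(values):
    """Sort the data, then take the middle value (n odd)
    or the average of the two middle values (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_by_hand(six_fish))  # 5.5
print(median(six_fish))          # 5.5, the standard-library result agrees
```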

Measures of Central Tendency
Mean: the arithmetic average.
– Use $\mu$ to represent a population mean (a parameter). $\mu$ is the lowercase Greek letter mu.
– Use $\bar{x}$ to represent a sample mean (a statistic).
Formula: $\bar{x} = \frac{\sum x}{n}$
$\Sigma$ is the capital Greek letter sigma; it means to sum the values that follow.

Suppose we caught a sample of 6 fish from the lake. Find the mean length of the fish. To find the mean length, add the observations and divide by n. With lengths of 3, 4, 5, 6, 8, and 10 inches, $\bar{x} = 36/6 = 6$ inches.

Now find how each observation deviates from the mean. The quantity $(x - \bar{x})$ is the deviation from the mean. Find the rest of the deviations from the mean.
$x$      $(x - \bar{x})$
3        -3
4        -2
5        -1
6        0
8        2
10       4
Sum      0
What is the sum of the deviations from the mean? 0. Will this sum always equal zero? YES. The mean is considered the balance point of the distribution because it “balances” the positive and negative deviations.

Imagine a ruler with pennies placed at 3”, 4”, 5”, 6”, 8”, and 10”. To balance the ruler on your finger, you would need to place your finger at the mean of 6”. The mean is the balance point of a distribution.
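A short Python sketch of the balance-point idea, using the six fish lengths from the transcript (variable names are illustrative): the deviations from the mean cancel out, so their sum is zero.

```python
# Lengths (in inches) of the six fish in the sample.
six_fish = [3, 4, 5, 6, 8, 10]

# Mean: add the observations and divide by n.
mean_length = sum(six_fish) / len(six_fish)

# Deviation of each observation from the mean.
deviations = [x - mean_length for x in six_fish]

print(mean_length)      # 6.0
print(deviations)       # [-3.0, -2.0, -1.0, 0.0, 2.0, 4.0]
print(sum(deviations))  # 0.0
```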

What happens to the median and mean if the length of 10 inches were 15 inches instead? The median is 5.5. The mean is about 6.83. What happened?

What happens to the median and mean if the 15 inches were 20 inches? The median is 5.5. The mean is about 7.67. What happened?

Statistics that are not affected by extreme values are said to be resistant. Is the median resistant? YES. Is the mean resistant? NO.
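To see resistance in action, the sketch below replaces the largest fish (10 inches) with 15 and then 20 inches, as in the preceding slides, and recomputes both measures. The variable names are illustrative only.

```python
from statistics import median

# Lengths (in inches) of the six fish in the sample.
original = [3, 4, 5, 6, 8, 10]

results = {}
for largest in (10, 15, 20):
    # Swap the largest observation for a more extreme value.
    sample = original[:-1] + [largest]
    sample_mean = sum(sample) / len(sample)
    results[largest] = (median(sample), round(sample_mean, 3))
    print(largest, results[largest])

# The median stays at 5.5 each time, while the mean moves from 6.0 up to about 7.667.
```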

Suppose we caught a sample of 20 fish with the following lengths. Create a histogram for the lengths of fish. (Use a class width of 1.) Calculate the mean and median. Mean = Median = 6.5. Look at the placement of the mean and median in this symmetrical distribution.

Suppose we caught a sample of 20 fish with the following lengths. Create a histogram for the lengths of fish. (Use a class width of 1.) Calculate the mean and median. Mean = ________ Median = ________ Look at the placement of the mean and median in this skewed distribution.

Suppose we caught a sample of 20 fish with the following lengths. Create a histogram for the lengths of fish. (Use a class width of 1.) Calculate the mean and median. Mean = ________ Median = ________ Look at the placement of the mean and median in this skewed distribution.

Recap: In a symmetrical distribution, the mean and median are equal. In a skewed distribution, the mean is pulled in the direction of the skewness. In a symmetrical distribution, you should report the mean! In a skewed distribution, the median should be reported as the measure of center!

Trimmed mean: its purpose is to remove outliers from a data set.
To calculate a trimmed mean:
– Multiply the percent to trim by n
– Truncate that many observations from BOTH ends of the distribution (when listed in order)
– Calculate the mean with the shortened data set

Find the mean of the following set of data. Mean = 23.8. Find a 10% trimmed mean. 10%(10) = 1, so remove one observation from each side!
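A minimal sketch of the trimmed-mean recipe in Python. The data set from the slide is not preserved in the transcript, so the ten values below are made up, chosen only so that the untrimmed mean is 23.8; the function name `trimmed_mean` is likewise illustrative.

```python
def trimmed_mean(values, trim_fraction):
    """Drop trim_fraction * n observations from EACH end, then average the rest."""
    ordered = sorted(values)
    # Number of observations to drop per side (truncated to a whole number).
    k = int(trim_fraction * len(ordered))
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Hypothetical data (NOT the slide's data): ten values with one large outlier.
made_up_data = [12, 15, 16, 18, 19, 20, 21, 22, 23, 72]

print(sum(made_up_data) / len(made_up_data))  # 23.8
print(trimmed_mean(made_up_data, 0.10))       # 19.25
```

With 10 observations and 10% trimming, one value is removed from each end (12 and 72 here), so the outlier no longer pulls the mean upward.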

What values are used to describe categorical data? Suppose that each person in a sample of 15 cell phone users is asked if he or she is satisfied with the cell phone service. What would be the possible responses? Here are the responses: YNYYYNNYYYNYYYNNYY NYYYNNNYYYNN
Find the sample proportion of the people who answered “yes”: $\hat{p}$ = (number of “yes” responses) / n. The symbol $\hat{p}$ is pronounced p-hat. 60% of the sample was satisfied with their cell phone service. The population proportion is denoted by the letter p.
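The sample proportion is just a count divided by the sample size. In the sketch below the individual responses are an assumption: the transcript's response string is garbled, so the list is rebuilt from the stated totals (n = 15 and 60% satisfied, which implies 9 "yes" answers).

```python
# Assumed responses: 9 "Y" and 6 "N", reconstructed from n = 15 and 60% "yes".
# The order of the actual responses is not known.
responses = ["Y"] * 9 + ["N"] * 6

# p-hat = number of "yes" responses divided by the sample size.
p_hat = responses.count("Y") / len(responses)

print(len(responses))  # 15
print(p_hat)           # 0.6
```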

Why is the study of variability important?
– There is variability in virtually everything
– It allows us to distinguish between usual and unusual values
– Reporting only a measure of center doesn’t provide a complete picture of the distribution
Does this can of soda contain exactly 12 ounces?

What are the mean and median of these three graphs (A, B, and C)?

Measures of Variability
The simplest numeric measure of variability is the range. Range = largest observation – smallest observation. What is the range of these data sets (A, B, and C)? The first two data sets have a range of 50 (70 – 20), but the third data set has a much smaller range of 10.

Measures of Variability How would a dotplot look if the average deviation was 0? What does it mean to have an average deviation of 0?

Measures of Variability
Another measure of the variability in a data set uses the deviations from the mean, $(x - \bar{x})$. What is a deviation from the mean? For data set A, what is the mean of this distribution? 45.

Measures of Variability
Another measure of the variability in a data set uses the deviations from the mean, $(x - \bar{x})$. Remember the sample of 6 fish that we caught from the lake. They were the following lengths: 3”, 4”, 5”, 6”, 8”, 10”. The mean length was 6 inches. Recall that we calculated the deviations from the mean. What was the sum of these deviations? Can we find an average deviation? What can we do to the deviations so that we could find an average?
The estimated average of the squared deviations is called the variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$. The divisor $n - 1$ is the degrees of freedom (explained later). The population variance is denoted by $\sigma^2$ and is divided by n.

Remember the sample of 6 fish that we caught from the lake. Find the variance of the length of fish. The average of the deviations would always equal 0, because their sum is 0! What could we do so that we would be able to find an average deviation? First square the deviations.
$x$      $(x - \bar{x})$      $(x - \bar{x})^2$
3        -3                   9
4        -2                   4
5        -1                   1
6        0                    0
8        2                    4
10       4                    16
Sum      0                    34
What is the sum of the deviations squared? 34. Divide this by 5: $s^2 = 34/5 = 6.8$.

Measures of Variability
The square root of the variance is called the standard deviation. The standard deviation is a typical deviation from the mean. $s^2 = 6.8$ inches², so $s = \sqrt{6.8} \approx 2.608$ inches. The fish in our sample deviate from the mean of 6 inches by an average of about 2.608 inches.
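The variance and standard deviation of the six fish lengths can be checked step by step in Python. This is a sketch with illustrative variable names; the last two lines compare against the standard library's `statistics.variance` and `statistics.stdev`, which also divide by n - 1.

```python
from math import sqrt
from statistics import stdev, variance

# Lengths (in inches) of the six fish in the sample.
six_fish = [3, 4, 5, 6, 8, 10]
n = len(six_fish)
mean_length = sum(six_fish) / n

# Square each deviation from the mean, then divide the total by n - 1.
squared_deviations = [(x - mean_length) ** 2 for x in six_fish]
s_squared = sum(squared_deviations) / (n - 1)
s = sqrt(s_squared)

print(sum(squared_deviations))  # 34.0
print(s_squared)                # 6.8
print(round(s, 3))              # 2.608

# Cross-check with the standard library (sample versions, divisor n - 1).
print(variance(six_fish), round(stdev(six_fish), 3))
```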

Calculation of the standard deviation of a sample: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$. The population standard deviation is denoted by $\sigma$ (where n is used in the denominator). The most commonly used measures of center and variability are the mean and standard deviation, respectively.

Degrees of Freedom (df): the number of independent observations that are free to vary.
Suppose we consider the sample of 6 fish where the mean is 6 inches. Five of these values are free to be any possible length of fish! However, once these five values occur, the sixth value is no longer free to vary. It MUST be a specific value in order for the deviations from the mean (of 6) to have a sum of zero. Thus, out of a sample of n, n - 1 observations are free to vary.
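A small Python sketch of the degrees-of-freedom argument: once the mean and five of the six values are fixed, the sixth is forced. The helper name `forced_last_value` and the second set of five values are made up for illustration.

```python
def forced_last_value(free_values, target_mean):
    """Given n - 1 freely chosen values and the required mean of all n values,
    return the one value the last observation must take."""
    n = len(free_values) + 1
    return target_mean * n - sum(free_values)

# First five fish from the sample, with the mean fixed at 6 inches.
print(forced_last_value([3, 4, 5, 6, 8], 6))  # 10

# Any other five lengths (hypothetical here) still leave no choice for the sixth.
print(forced_last_value([2, 7, 7, 9, 5], 6))  # 6
```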

Measures of Variability
Interquartile range (IQR) is the range of the middle half of the data.
– Lower quartile ($Q_1$) is the median of the lower half of the data
– Upper quartile ($Q_3$) is the median of the upper half of the data
IQR = $Q_3 - Q_1$

The Chronicle of Higher Education ( issue) published the accompanying data on the percentage of the population with a bachelor’s or higher degree in 2007 for each of the 50 states and the District of Columbia. Find the interquartile range for this set of data.

First put the data in order and find the median. Find the lower quartile ($Q_1$) by finding the median of the lower half: $Q_1 = 24$. Find the upper quartile ($Q_3$) by finding the median of the upper half: $Q_3 = 30$. IQR = 30 – 24 = 6.
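The median-of-halves procedure can be sketched as follows. Two assumptions are worth flagging: the 51 state percentages are not preserved in the transcript, so the nine values below are made up, and the sketch excludes the overall median from both halves when n is odd, which is one common textbook convention (other software may use a different quartile rule).

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.
    When n is odd, the middle observation is left out of both halves."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]
    upper_half = ordered[(n + 1) // 2 :]
    return median(lower_half), median(upper_half)

# Hypothetical percentages, NOT the Chronicle data.
made_up_percentages = [19, 22, 24, 25, 26, 27, 30, 31, 35]

q1, q3 = quartiles_by_halves(made_up_percentages)
iqr = q3 - q1
print(q1, q3, iqr)  # 23.0 30.5 7.5
```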

Which measure(s) of variability (spread) is/are resistant? Only the IQR!

Wolf Stat Company Activity: How do the mean and standard deviation change with linear transformations?

Linear transformation rule
When adding a constant to a random variable, the mean changes but not the standard deviation. When multiplying a random variable by a constant, both the mean and the standard deviation change.

An appliance repair shop charges a $30 service call to go to a home for a repair. It also charges $25 per hour for labor. From past history, the average length of repairs is 1 hour 15 minutes (1.25 hours) with a standard deviation of 20 minutes (1/3 hour). Including the charge for the service call, what are the mean and standard deviation of the charges for labor?
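One way to work the repair-shop problem is to write the charge as a linear transformation, charge = 30 + 25 × hours, and apply the rule: the constant shifts only the mean, while the multiplier scales both the mean and the standard deviation. The sketch below uses the numbers from the problem; the variable names are illustrative.

```python
service_call = 30.0   # flat fee in dollars (added constant)
hourly_rate = 25.0    # dollars per hour (multiplier)
mean_hours = 1.25     # average repair time in hours
sd_hours = 1 / 3      # standard deviation of repair time in hours

# Mean: multiply by the rate, then add the constant.
mean_charge = service_call + hourly_rate * mean_hours

# Standard deviation: only the multiplier matters; the added constant has no effect.
sd_charge = abs(hourly_rate) * sd_hours

print(mean_charge)          # 61.25
print(round(sd_charge, 2))  # 8.33
```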

Stat Land Game Activity: How do you combine the mean and standard deviation of two independent random variables?

Rules for combining two variables
– To find the mean for the sum (or difference), add (or subtract) the two means
– To find the standard deviation of the sum (or difference), ALWAYS add the variances, then take the square root (if the variables are independent)

Bicycles arrive at a bike shop in boxes. Before they can be sold, they must be unpacked, assembled, and tuned (lubricated, adjusted, etc.). Based on past experience, the times for each setup phase are independent with the following means and standard deviations (in minutes).
Phase        Mean    SD
Unpacking
Assembly
Tuning
What are the mean and standard deviation for the total bicycle setup times?
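The combining rules can be sketched in Python as below. The means and standard deviations for the three phases are not preserved in the transcript, so the numbers here are purely hypothetical placeholders chosen to keep the arithmetic easy to follow.

```python
from math import sqrt

# Hypothetical (mean, SD) pairs in minutes; NOT the values from the slide.
phases = {
    "Unpacking": (4.0, 1.0),
    "Assembly": (20.0, 2.0),
    "Tuning": (12.0, 2.0),
}

# Means add directly.
total_mean = sum(mean for mean, _ in phases.values())

# For independent phases, add the VARIANCES, then take the square root.
total_variance = sum(sd ** 2 for _, sd in phases.values())
total_sd = sqrt(total_variance)

print(total_mean)  # 36.0
print(total_sd)    # 3.0
```

Note that simply adding the standard deviations (1 + 2 + 2 = 5) would overstate the spread of the total.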

Another graph: Boxplots
What are some advantages of boxplots?
– Ease of construction
– Convenient handling of outliers
– Construction is not subjective (unlike histograms)
– Used with medium or large size data sets (n > 10)
– Useful for comparative displays

Boxplots
When to use: univariate numerical data. Use for moderate to large data sets. Don’t use with data sets of n < 10.
How to construct a skeleton boxplot:
– Calculate the five-number summary (the minimum value, first quartile, median, third quartile, and maximum value)
– Draw a horizontal (or vertical) scale
– Construct a rectangular box from the lower quartile ($Q_1$) to the upper quartile ($Q_3$)
– Draw lines from the lower quartile to the smallest observation and from the upper quartile to the largest observation
To describe: comment on the center, spread, and shape of the distribution and whether there are any unusual features.

Remember the data on the percentage of the population with a bachelor’s or higher degree in 2007 for each of the 50 states and the District of Columbia.
– First draw a scale
– Draw a box from $Q_1$ to $Q_3$
– Draw a line for the median
– Draw lines for the whiskers

Modified boxplots
To display outliers, identify mild and extreme outliers:
– An observation is an outlier if it is more than 1.5(IQR) away from the nearest quartile
– An outlier is extreme if it is more than 3(IQR) away from the nearest quartile
– Whiskers extend to the largest (or smallest) observation that is not an outlier
Modified boxplots are generally preferred because they provide more information about the data distribution.

Remember the data on the percentage of the population with a bachelor’s or higher degree in 2007 for each of the 50 states and the District of Columbia. First, draw the scale, the box, and the line for the median. Next calculate the fences for outliers: 24 - 1.5(6) = 15, 30 + 1.5(6) = 39, and 30 + 3(6) = 48. There is one outlier at the upper end of the distribution, but none at the lower end. Is it extreme? Draw lines for the whiskers and place a solid dot for the outlier.
To describe: The distribution of the percent of the population with a bachelor’s degree or higher for the U.S. states and the District of Columbia is positively skewed with an outlier at 47%. The median percentage is 26% with a range of 30%.
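The fence calculation can be sketched in Python using the quartiles reported for this data set ($Q_1 = 24$, $Q_3 = 30$). The function names are made up for illustration; the 1.5(IQR) and 3(IQR) cutoffs are the ones stated in the modified-boxplot rules.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) fences for mild and extreme outliers."""
    iqr = q3 - q1
    return {
        "mild": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "extreme": (q1 - 3 * iqr, q3 + 3 * iqr),
    }

def classify(value, q1, q3):
    """Label a single observation using the fences above."""
    fences = outlier_fences(q1, q3)
    low_extreme, high_extreme = fences["extreme"]
    low_mild, high_mild = fences["mild"]
    if value < low_extreme or value > high_extreme:
        return "extreme outlier"
    if value < low_mild or value > high_mild:
        return "mild outlier"
    return "not an outlier"

print(outlier_fences(24, 30))  # {'mild': (15.0, 39.0), 'extreme': (6, 48)}
print(classify(47, 24, 30))    # mild outlier
```

The observation at 47% lies beyond the upper mild fence (39) but inside the extreme fence (48), so under these rules it is an outlier but not an extreme one.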

Symmetrical boxplots: Notice that all 3 boxplots are identical, but their corresponding histograms are very different. Can you determine the number of modes from a boxplot?
Approximately symmetrical boxplot: Notice that the range of the lower half and the range of the upper half of this distribution are approximately equal, so we can say that it is approximately symmetrical.
Skewed boxplot: However, the ranges of the two halves of this distribution are definitely different sizes, so it would be skewed in the direction of the longer side.

The salaries of NBA players published on the web site hoopshype.com were used to construct the comparative boxplot of salary data for five teams. Discuss the similarities and differences.

Normal Curve
– Bell-shaped, symmetrical, unimodal curve
– Transition points between cupping upward and downward occur at $\mu \pm \sigma$
– As the standard deviation increases, the curve flattens and spreads
– As the standard deviation decreases, the curve gets taller and thinner
Let’s use our calculator to graph some normal curves. Put the following into your calculator (Window: x: [0,20] & y: [0,0.3]):
Y1: normalpdf(X,10,2)
Y2: normalpdf(X,10,1.5)
Y3: normalpdf(X,10,3)
What happens?

What’s my area? Input the following command into a graphing calculator in order to graph a normal curve with a mean of 20 and standard deviation of 3: Y1 = normalpdf(X,20,3) (Window x: [10,30] y: [0,0.2]). Use the command 2nd trace, 7 to find the area under the curve for the following (round to 3 decimal places):
Lower limit: 17   Upper limit: 23   Area: ________
Lower limit: 14   Upper limit: 26   Area: ________
Lower limit: 11   Upper limit: 29   Area: ________

What’s my area? Graph a normal curve with a mean of 50 and standard deviation of 5: Y1 = normalpdf(X,50,5) (Window x: [30,70] y: [0,0.1]). Find the area under the curve for the following:
Lower limit: 45   Upper limit: 55   Area: ________
Lower limit: 40   Upper limit: 60   Area: ________
Lower limit: 35   Upper limit: 65   Area: ________
What pattern do you notice?
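For readers without a graphing calculator, the same areas can be approximated in Python with the standard library's `statistics.NormalDist` (available in Python 3.8 and later). This is a sketch, not the calculator's method: it takes the area between two limits as the difference of two cumulative probabilities, and the helper name `area_between` is made up.

```python
from statistics import NormalDist

def area_between(mean, sd, lower, upper):
    """Area under the normal curve between lower and upper."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)

areas = {}
for mean, sd in [(20, 3), (50, 5)]:
    for k in (1, 2, 3):
        # Limits that sit k standard deviations on either side of the mean.
        lower, upper = mean - k * sd, mean + k * sd
        areas[(mean, sd, k)] = round(area_between(mean, sd, lower, upper), 3)
        print(mean, sd, lower, upper, areas[(mean, sd, k)])

# Both curves give the same three areas: 0.683, 0.954, 0.997.
```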

Interpreting Center & Variability
Empirical Rule:
– Approximately 68% of the observations are within 1 standard deviation of the mean
– Approximately 95% of the observations are within 2 standard deviations of the mean
– Approximately 99.7% of the observations are within 3 standard deviations of the mean
Can ONLY be used with distributions that are mound shaped!

The height of male students at PWSH is approximately normally distributed with a mean of 71 inches and standard deviation of 2.5 inches.
a) What percent of the male students are shorter than 66 inches? About 2.5%
b) Taller than 73.5 inches? About 16%
c) Between 66 & 73.5 inches? About 81.5%
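The empirical-rule answers can be reproduced, and compared with exact normal-curve areas, in a short Python sketch. The key observation is that 66 inches is 2 standard deviations below the mean and 73.5 inches is 1 standard deviation above it. The exact areas use `statistics.NormalDist` (Python 3.8+); variable names are illustrative.

```python
from statistics import NormalDist

# Empirical-rule estimates (in percent).
rule_below_66 = (100 - 95) / 2      # half of what lies outside 2 SDs
rule_above_73_5 = (100 - 68) / 2    # half of what lies outside 1 SD
rule_between = 100 - rule_below_66 - rule_above_73_5
print(rule_below_66, rule_above_73_5, rule_between)  # 2.5 16.0 81.5

# Exact areas under a normal curve with mean 71 and SD 2.5 (in percent).
heights = NormalDist(mu=71, sigma=2.5)
exact_below_66 = round(100 * heights.cdf(66), 2)
exact_above_73_5 = round(100 * (1 - heights.cdf(73.5)), 2)
exact_between = round(100 * (heights.cdf(73.5) - heights.cdf(66)), 2)
print(exact_below_66, exact_above_73_5, exact_between)  # 2.28 15.87 81.86
```

The small differences arise because 68% and 95% are themselves rounded approximations.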

Measures of Relative Standing
Z-score: a z-score tells us how many standard deviations the value is from the mean, z = (value - mean) / (standard deviation). It is one example of a standardized score.

What do these z-scores mean? standard deviations below the mean 1.8 standard deviations above the mean 4.3 standard deviations below the mean

Sally is taking two different math achievement tests with different means and standard deviations. The mean score on test A was 56 with a standard deviation of 3.5, while the mean score on test B was 65 with a standard deviation of 2.8. Sally scored a 62 on test A and a 69 on test B. On which test did Sally score better?
Z-score on test A: (62 - 56) / 3.5 ≈ 1.71
Z-score on test B: (69 - 65) / 2.8 ≈ 1.43
She did better on test A.
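Sally's comparison can be checked with a two-line z-score helper in Python (the function name `z_score` is illustrative):

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

z_test_a = z_score(62, 56, 3.5)
z_test_b = z_score(69, 65, 2.8)

print(round(z_test_a, 2))  # 1.71
print(round(z_test_b, 2))  # 1.43
print("A" if z_test_a > z_test_b else "B")  # A
```

Although 69 is the higher raw score, 62 sits further above its test's mean in standard-deviation units, which is why test A is the better relative performance.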

Measures of Relative Standing
Percentiles: the rth percentile is a value in the data set where r percent of the observations fall AT or BELOW that value.

In addition to weight and length, head circumference is another measure of health in newborn babies. The National Center for Health Statistics reports the following summary values for head circumference (in cm) at birth for boys. (Table columns: Head circumference (cm), Percentile.)
What percent of newborn boys had head circumferences greater than 37.0 cm? 25%
10% of newborn babies have head circumferences bigger than what value? 38.2 cm