Published by Madison Butler. Modified over 9 years ago.
1
Copyright © 2004 Pearson Education, Inc.
2
Chapter 2 Descriptive Statistics: Describe, Explore, and Compare Data
2-1 Overview
2-2 Frequency Distributions
2-3 Visualizing Data
2-4 Measures of Center
2-5 Measures of Variation
2-6 Measures of Relative Standing
2-7 Exploratory Data Analysis
3
Created by Tom Wegleitner, Centreville, Virginia
Section 2-1 Overview
4
Overview
Descriptive statistics: describe the important characteristics of a set of data. Organize, present, and summarize data:
1. Graphically
2. Numerically
5
Important Characteristics of Quantitative Data: “Shape, Center, and Spread”
1. Center: a representative or average value that indicates where the middle of the data set is located.
2. Variation: a measure of the amount that the values vary among themselves.
3. Distribution: the nature or shape of the distribution of data (such as bell-shaped, uniform, or skewed).
6
Sections 2-2 and 2-3: Frequency Distributions and Visualizing Data
7
Frequency Distributions and Histograms
Frequency distribution: a table that organizes data values into classes, along with the number of data values that fall in each class (the frequency, f).
1. Ungrouped frequency distribution: for data sets with few distinct values. Each value is in its own class.
2. Grouped frequency distribution: for data sets with many distinct values, which are grouped together into classes.
8
Ungrouped Frequency Distributions
Number of Peas in a Pea Pod (sample size: 50; each digit below is the count for one pod)
55464 37635 65455 62355 55743 45456 51626 66664 45453 55765

Peas per pod   Freq, f
1              1
2              2
3              5
4              9
5              18
6              12
7              3
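The frequency column can be reproduced by tallying each distinct value. A minimal Python sketch, assuming the 50 counts are read digit by digit from the data above (the names `pea_counts` and `freq` are illustrative, not from the slides):

```python
from collections import Counter

# The 50 pod counts from the slide; each digit is one pod (spaces only group the digits).
pea_counts = [int(d) for d in "55464 37635 65455 62355 55743 45456 51626 66664 45453 55765" if d != " "]

# Ungrouped frequency distribution: each distinct value is its own class.
freq = Counter(pea_counts)
for value in sorted(freq):
    print(f"{value}  {freq[value]}")
```

The printed rows match the table, and the frequencies sum to the sample size of 50.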
9
Frequency Histogram
A bar graph that represents the frequency distribution of a data set. It has the following properties:
1. The horizontal scale is quantitative and measures the data values.
2. The vertical scale measures the frequencies of the classes.
3. Consecutive bars must touch.
10
Frequency Histogram Example: Peas per Pod
[Histogram of the peas-per-pod frequency table]
11
Relative Frequency Distributions and Relative Frequency Histograms
Relative frequency distribution: shows the proportion (or percentage) of data values that fall into each class.
Relative frequency: rf = f/n
Relative frequency histogram: has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies.
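Applying rf = f/n to the pea-pod data is a one-line division per class. A minimal sketch, assuming the same digit-by-digit reading of the pea data (variable names are illustrative):

```python
from collections import Counter

# The 50 pod counts from the pea-pod slide; each digit is one pod.
pea_counts = [int(d) for d in "55464 37635 65455 62355 55743 45456 51626 66664 45453 55765" if d != " "]

n = len(pea_counts)
freq = Counter(pea_counts)

# Relative frequency of each class: rf = f / n
rel_freq = {value: f / n for value, f in sorted(freq.items())}
for value, rf in rel_freq.items():
    print(f"{value}  {rf:.2f}  ({rf:.0%})")
```

For example, 18 of the 50 pods contain 5 peas, so rf = 18/50 = 0.36 (36%). The relative frequencies always sum to 1.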
12
Relative Frequency Histogram (Figure 2-2)
13
Grouped Frequency Distributions
Group data into 5 to 20 classes of equal width.

Exam Scores   Freq, f
30-39         1
40-49         0
50-59         4
60-69         9
70-79         13
80-89         10
90-99         3
14
Definitions
Lower class limits (LCL): the smallest numbers that can actually belong to the different classes.
Upper class limits (UCL): the largest numbers that can actually belong to the different classes.
Class width: the difference between two consecutive lower class limits or two consecutive lower class boundaries.
Class midpoints: the values halfway between the LCL and UCL of each class.
Class boundaries: the values halfway between a UCL and the next LCL.
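These definitions can be checked against the exam-score table. A minimal sketch; the outermost boundaries (below the first class and above the last) are an assumption here, extended by the same half-gap as the inner ones:

```python
# Class limits (LCL, UCL) from the exam-score table.
classes = [(30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89), (90, 99)]

# Class width: difference between two consecutive lower class limits.
width = classes[1][0] - classes[0][0]

# Class midpoint: halfway between the LCL and UCL of each class.
midpoints = [(lcl + ucl) / 2 for lcl, ucl in classes]

# Class boundaries: halfway between each UCL and the next LCL.
inner = [(classes[i][1] + classes[i + 1][0]) / 2 for i in range(len(classes) - 1)]
# Assumption: extend the outermost boundaries by the same half-gap (0.5 here).
half_gap = (classes[1][0] - classes[0][1]) / 2
boundaries = [classes[0][0] - half_gap] + inner + [classes[-1][1] + half_gap]

print(width)       # 10
print(midpoints)   # [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
print(boundaries)  # [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5, 99.5]
```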
15
Constructing a Grouped Frequency Table
1. Calculate the range of values the classes must span: Range = High − Low (may round up).
2. Decide on the number of classes (between 5 and 20).
3. Calculate the class width (may round up):
$$\text{class width} = \frac{(\text{highest value}) - (\text{lowest value})}{\text{number of classes}}$$
4. Choose the first LCL (less than or equal to the smallest value).
5. Write all LCLs by repeatedly adding the class width.
6. Enter all the UCLs.
7. Find the frequency of each class.
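The seven steps can be sketched for integer-valued data as follows. This is an illustration under stated assumptions, not the textbook's procedure verbatim: the scores are hypothetical, the UCL is taken as one less than the next LCL (valid only for whole-number data), and the function widens the classes if they would stop short of the highest value (a safeguard the slides do not mention).

```python
import math

def grouped_frequency_table(data, num_classes, first_lcl):
    """Return (LCL, UCL, f) rows for integer-valued data."""
    low, high = min(data), max(data)
    if first_lcl > low:
        raise ValueError("first LCL must not exceed the smallest value")  # step 4
    # Steps 1-3: range, then class width = range / number of classes, rounded up.
    width = math.ceil((high - low) / num_classes)
    # Assumption: widen the classes if, starting from first_lcl, the last UCL
    # would fall below the highest value.
    while first_lcl + num_classes * width <= high:
        width += 1
    rows = []
    for i in range(num_classes):
        lcl = first_lcl + i * width                   # step 5: add the width
        ucl = lcl + width - 1                         # step 6: whole numbers, so next LCL - 1
        f = sum(1 for x in data if lcl <= x <= ucl)   # step 7: count values in the class
        rows.append((lcl, ucl, f))
    return rows

# Hypothetical exam scores, used only to illustrate the procedure.
scores = [38, 52, 57, 61, 64, 66, 70, 72, 75, 77, 79, 81, 84, 88, 93, 97]
for lcl, ucl, f in grouped_frequency_table(scores, num_classes=7, first_lcl=30):
    print(f"{lcl}-{ucl}  {f}")
```

Here the raw width is ceil(59/7) = 9, but starting at 30 the seventh class would end at 92, so the sketch widens it to 10, giving classes 30-39 through 90-99.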
16
“Shape” of a Distribution
Symmetric: data are symmetric if the left half of the histogram is roughly a mirror image of its right half.
Skewed: data are skewed if they are not symmetric and extend more to one side than the other.
Uniform: data are uniform if the values are equally distributed (on a histogram, all the bars are the same height).
17
Shape: Figure 2-11
18
Outliers
Outliers are “unusual” data values compared with the rest of the set. They may be distinguished by gaps in a histogram.
19
Other Graphs
Besides histograms, there are other ways to graph quantitative data:
1. Stem-and-leaf plots
2. Dot plots
3. Time-series graphs
20
Stem-and-Leaf Plot
Represents data by separating each value into two parts: the stem (such as the leftmost digit) and the leaf (such as the rightmost digit).
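A minimal sketch of the stem/leaf split, assuming non-negative whole numbers where the leaf is the last digit and the stem is everything before it. The data values are hypothetical:

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return one text line per stem; leaf = last digit, stem = remaining digits."""
    plot = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        plot[stem].append(leaf)
    lines = []
    # Include empty stems between the smallest and largest so gaps stay visible.
    for stem in range(min(plot), max(plot) + 1):
        leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

# Hypothetical scores, for illustration only.
for line in stem_and_leaf([67, 72, 85, 75, 89, 89, 88, 90, 99]):
    print(line)
```

Because the leaves are sorted, the plot doubles as an ordered listing of the data, and its profile on its side resembles a histogram.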
21
Dot Plot
A graph in which each data value is plotted as a point along a scale of values (Figure 2-5).
22
Time-Series Graph
A graph of data that have been collected at different points in time (Figure 2-8).
Example: www.eia.doe.gov/oil_gas/petroleum/
23
Qualitative Data
The two most common graphs for qualitative data are:
1. Pareto charts (bar charts)
2. Pie charts
24
Pareto Chart
A bar graph for qualitative data, with the bars arranged in order of decreasing frequency (Figure 2-6).
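The Pareto ordering can be sketched with `collections.Counter.most_common`, which lists categories from most to least frequent. The responses below are hypothetical, and the `#` bars are only a text stand-in for a drawn chart:

```python
from collections import Counter

# Hypothetical qualitative data (e.g., survey answers), for illustration only.
responses = ["B", "A", "C", "B", "B", "A", "D", "B", "C", "A"]

# Pareto chart: one bar per category, tallest (most frequent) first.
for category, f in Counter(responses).most_common():
    print(f"{category}  {'#' * f}  {f}")
```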
25
Pie Chart
A graph depicting qualitative data as slices of a pie (Figure 2-7).
26
Section 2-4 Measures of Center
27
Measures of Center
Measure of center: a number representing a “typical” or central value of a data set; an “average”. There are four common “averages”:
1. Mean
2. Median
3. Mode
4. Midrange
28
The Mean
Mean: the measure of center obtained by adding the values and dividing the total by the number of values.
29
Notation
Σ denotes the addition (sum) of a set of values.
x is the variable usually used to represent the individual data values.
n represents the number of values in a sample.
N represents the number of values in a population.
30
Notation
$$\bar{x} = \frac{\sum x}{n}$$
x̄ is pronounced ‘x-bar’ and denotes the mean of a set of sample values.
$$\mu = \frac{\sum x}{N}$$
µ is pronounced ‘mu’ and denotes the mean of all values in a population.
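As a worked instance of x̄ = Σx / n, here is the sample mean of the pea-pod data, read digit by digit as on the pea-pod slide (variable names are illustrative):

```python
# The 50 pod counts from the pea-pod slide; each digit is one pod.
pea_counts = [int(d) for d in "55464 37635 65455 62355 55743 45456 51626 66664 45453 55765" if d != " "]

n = len(pea_counts)
x_bar = sum(pea_counts) / n         # sample mean: Σx / n
print(sum(pea_counts), n, x_bar)    # 239 50 4.78

# Round-off rule for measures of center: the data are whole numbers,
# so report one more decimal place.
print(round(x_bar, 1))              # 4.8
```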
31
Round-off Rule for Measures of Center
Carry one more decimal place than is present in the original set of values.
32
Median
Median: the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude.
It is often denoted by x̃ (pronounced ‘x-tilde’).
It is not affected by an extreme value.
33
Finding the Median
If the number of values is odd, the median is the number located in the exact middle of the list.
If the number of values is even, the median is found by computing the mean of the two middle numbers.
34
Examples
2 5 6 11 13: odd number of values, so the median is the exact middle value. MEDIAN is 6.
2 5 6 9 11 13: even number of values, so the median is the mean of the two middle numbers, (6 + 9)/2. MEDIAN is 7.5.
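The odd/even rule translates directly into code. A minimal sketch (the function name `median` is illustrative; Python's `statistics.median` applies the same rule):

```python
def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                  # odd: the exact middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2    # even: mean of the two middle values

print(median([2, 5, 6, 11, 13]))      # 6
print(median([2, 5, 6, 9, 11, 13]))   # 7.5
```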
35
Mode
Mode: the value that occurs most frequently. The mode is not always unique. A data set may be:
Bimodal (two modes)
Multimodal (more than two modes)
No mode (no value is repeated)
Examples:
a. 5.40 1.10 0.42 0.73 0.48 1.10: the mode is 1.10.
b. 27 27 27 55 55 55 88 88 99: bimodal, 27 and 55.
c. 1 2 3 6 7 8 9 10: no mode.
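A sketch that returns every mode, or an empty list when no value repeats. Python's `statistics.multimode` returns all values when every value occurs once, so this illustrative `modes` helper treats that case as “no mode” explicitly, matching example c:

```python
from collections import Counter

def modes(values):
    """Return all values with the highest frequency; [] if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                   # no mode: every value occurs once
    return sorted(v for v, c in counts.items() if c == top)

print(modes([5.40, 1.10, 0.42, 0.73, 0.48, 1.10]))   # [1.1]
print(modes([27, 27, 27, 55, 55, 55, 88, 88, 99]))   # [27, 55]
print(modes([1, 2, 3, 6, 7, 8, 9, 10]))              # []
```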
36
Midrange
Midrange: the value midway between the highest and lowest values in the original data set.
$$\text{Midrange} = \frac{\text{highest score} + \text{lowest score}}{2}$$
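The midrange formula applied to the odd-sized median example (2 5 6 11 13). It uses only the two extreme values, so it can differ noticeably from the median:

```python
def midrange(values):
    """(highest value + lowest value) / 2"""
    return (max(values) + min(values)) / 2

data = [2, 5, 6, 11, 13]   # the odd-sized median example
print(midrange(data))      # 7.5, while the median of these values is 6
```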
37
Best Measure of Center
38
Picking the Best “Average”
The shape of your data may help determine the best measure of center. Outliers may affect the mean, making it too high or too low to represent a “typical” value. If so, the median may be the best choice.
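A quick illustration of how one outlier pulls the mean but barely moves the median. The added value 130 is hypothetical:

```python
from statistics import mean, median

data = [2, 5, 6, 9, 11, 13]       # the even-sized median example
with_outlier = data + [130]       # hypothetical extreme value added

print(round(mean(data), 2), median(data))                   # 7.67 7.5
print(round(mean(with_outlier), 2), median(with_outlier))   # 25.14 9
```

The mean more than triples, while the median moves from 7.5 to 9. The new mean of about 25 is larger than every value except the outlier, so it no longer describes a “typical” value.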
39
Shape: Figure 2-11
40
Section 2-5 Measures of Variation
41
Measures of Variation: “Spread”
Because this section introduces the concept of variation, it is one of the most important sections in the entire book. The two most common methods of measuring spread are:
1. Range
2. Standard deviation and variance
42
Definition
The range of a set of data is the difference between the highest value and the lowest value:
Range = (highest value) − (lowest value)
43
Standard Deviation and Variance
The standard deviation and variance measure the amount that data values vary (or deviate) from the mean.
Sample variance:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Sample standard deviation:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{s^2}$$
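The sample formulas transcribed term by term, with a cross-check against Python's `statistics.variance` and `statistics.stdev` (both divide by n − 1). The data are hypothetical, chosen so that x̄ = 5 and Σ(x − x̄)² = 32:

```python
import math
import statistics

def sample_variance(values):
    """s^2 = Σ(x - x̄)^2 / (n - 1)"""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_std(values):
    """s = sqrt(s^2)"""
    return math.sqrt(sample_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample
print(sample_variance(data))      # 32 / 7, about 4.571
print(sample_std(data))           # about 2.138
# Cross-check with the standard library, which uses the same n - 1 divisor.
print(statistics.variance(data), statistics.stdev(data))
```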
44
Round-off Rule for Measures of Variation
Carry one more decimal place than is present in the original set of data. Round only the final answer, not values in the middle of a calculation.
45
Notation
                     Sample (statistics)   Population (parameters)
Mean                 x̄                     µ
Standard deviation   s                     σ
Variance             s²                    σ²
46
Sample vs. Population Standard Deviation
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \qquad \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
Note: unlike x̄ and µ, the formulas for s and σ are not mathematically the same. The sample formula divides by n − 1, while the population formula divides by N.
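To see the effect of the different divisors, compute both on the same hypothetical values: `statistics.pstdev` divides by N (population) and `statistics.stdev` divides by n − 1 (sample):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values; Σ(x - mean)^2 = 32

# Treat the values as an entire population: divide by N = 8.
sigma = statistics.pstdev(data)   # sqrt(32 / 8) = 2.0
# Treat the same values as a sample: divide by n - 1 = 7.
s = statistics.stdev(data)        # sqrt(32 / 7), about 2.138

print(sigma, s)
```

Because n − 1 < N for the same data, s is always slightly larger than σ. The gap shrinks as the number of values grows.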
47
Standard Deviation: Key Points
The standard deviation is a measure of the variation of all values from the mean.
The larger s is, the more the data vary. (When would s = 0?)
The value of the standard deviation s can increase dramatically with the inclusion of one or more outliers (data values far away from all others).
The units of the standard deviation s are the same as the units of the original data values. (The variance has units squared.)
48
Standard Deviation and “Spread”
How does s show how much the data vary? Three methods:
1. Range Rule of Thumb
2. Chebyshev’s Theorem
3. The Empirical Rule
49
The Range Rule of Thumb
Range rule: for most data sets, the majority of the data lie within 2 standard deviations of the mean, so the range spans about 4 standard deviations.
Recall: Range = High − Low.
Estimate: Range ≈ 4s.
Alternatively, if the range is known, you can use the range rule to estimate the standard deviation:
$$s \approx \frac{\text{Range}}{4}$$
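Compare the range-rule estimate with the actual sample standard deviation for the pea-pod data (read digit by digit as on the pea-pod slide):

```python
import statistics

# The 50 pod counts from the pea-pod slide; each digit is one pod.
pea_counts = [int(d) for d in "55464 37635 65455 62355 55743 45456 51626 66664 45453 55765" if d != " "]

estimate = (max(pea_counts) - min(pea_counts)) / 4   # s ≈ range / 4
actual = statistics.stdev(pea_counts)                # sample standard deviation

print(estimate)          # 1.5
print(round(actual, 2))  # 1.31
```

For this data the rough estimate is within about 0.2 of the actual value. It is only a quick check, not a replacement for the formula.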
50
Chebyshev’s Theorem
For data with any distribution, the proportion (or fraction) of any set of data lying within K standard deviations of the mean is always at least 1 − 1/K², where K is any positive number greater than 1.
For K = 2, at least 3/4 (or 75%) of all values lie within 2 standard deviations of the mean.
For K = 3, at least 8/9 (or 89%) of all values lie within 3 standard deviations of the mean.
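Chebyshev’s bound can be checked against the pea-pod data. The theorem guarantees only a minimum, so the observed proportion should meet or exceed 1 − 1/K². Here `chebyshev_bound` is an illustrative helper name:

```python
import statistics

def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is stated for K > 1")
    return 1 - 1 / k**2

# The 50 pod counts from the pea-pod slide; each digit is one pod.
pea_counts = [int(d) for d in "55464 37635 65455 62355 55743 45456 51626 66664 45453 55765" if d != " "]
x_bar = statistics.mean(pea_counts)
s = statistics.stdev(pea_counts)

observed = {}
for k in (2, 3):
    inside = sum(1 for x in pea_counts if abs(x - x_bar) <= k * s)
    observed[k] = inside / len(pea_counts)
    print(f"K={k}: at least {chebyshev_bound(k):.3f}, observed {observed[k]:.2f}")
```

With K = 2, 47 of 50 pods (94%) fall within two standard deviations, well above the guaranteed 75%.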
51
The Empirical Rule
Empirical (68-95-99.7) Rule: for data sets having an approximately bell-shaped distribution:
About 68% of all values fall within 1 standard deviation of the mean.
About 95% of all values fall within 2 standard deviations of the mean.
About 99.7% of all values fall within 3 standard deviations of the mean.
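The 68-95-99.7 percentages come from the normal (bell-shaped) distribution. Treating “bell-shaped” as exactly normal, which is an idealization, the proportions can be computed with `statistics.NormalDist`:

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, standard deviation 1

# Proportion of values within k standard deviations of the mean.
props = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, p in props.items():
    print(f"within {k} sd: {p:.4f}")
```

The printed values (about 0.6827, 0.9545, and 0.9973) round to the rule's 68%, 95%, and 99.7%. Chebyshev’s bounds (75% and 89%) are smaller because they must hold for every distribution.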
52
The Empirical Rule
53
The Empirical Rule
54
The Empirical Rule
55
Sections 2-6 and 2-7: Measures of Position (Relative Standing)
56
Measures of Position
Sometimes we want to know the “relative standing” or “relative position” of a particular data value in the set. Some measures of position:
1. Standard scores (z-scores)
2. Median, quartiles, percentiles
57
z-score
The z-score (or standard score) for a data value x is the number of standard deviations that x is above or below the mean.
58
Computing z-scores
To convert a data value x to a z-score:
Sample:
$$z = \frac{x - \bar{x}}{s}$$
Population:
$$z = \frac{x - \mu}{\sigma}$$
Round to 2 decimal places.
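The sample z-score formula applied to the smallest and largest pea-pod counts (data read digit by digit as on the pea-pod slide; `z_score` is an illustrative helper):

```python
import statistics

# The 50 pod counts from the pea-pod slide; each digit is one pod.
pea_counts = [int(d) for d in "55464 37635 65455 62355 55743 45456 51626 66664 45453 55765" if d != " "]
x_bar = statistics.mean(pea_counts)   # 4.78
s = statistics.stdev(pea_counts)      # about 1.314

def z_score(x):
    """Standard deviations that x lies above (+) or below (-) the sample mean, to 2 places."""
    return round((x - x_bar) / s, 2)

print(z_score(1))   # -2.88
print(z_score(7))   # 1.69
```

The pod with a single pea lies almost 2.9 standard deviations below the mean, and its z-score is negative because 1 is below the mean.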
59
Interpreting z-Scores
Whenever a value is less than the mean, its corresponding z-score is negative.
Ordinary values: z-score between −2 and 2.
Unusual values: z-score less than −2 or greater than 2.
(Figure 2-14)
60
Other Measures of Position
Median, quartiles, percentiles.
Recall: the median separates ranked data into 2 equal parts.
61
Quartiles
Quartiles separate ranked data into 4 equal parts:
Q1 (first quartile) separates the bottom 25% of sorted values from the top 75%.
Q2 (second quartile) is the same as the median; it separates the bottom 50% of sorted values from the top 50%.
Q3 (third quartile) separates the bottom 75% of sorted values from the top 25%.
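Quartiles of the pea-pod data, computed with `statistics.quantiles(..., n=4)`, which returns the three cut points [Q1, Q2, Q3]. Textbooks and software use several slightly different rules for locating quartiles. Python's default (`method="exclusive"`) is only one of them, so on other data sets a hand calculation may differ a little:

```python
import statistics

# The 50 pod counts from the pea-pod slide; each digit is one pod.
pea_counts = [int(d) for d in "55464 37635 65455 62355 55743 45456 51626 66664 45453 55765" if d != " "]

q1, q2, q3 = statistics.quantiles(pea_counts, n=4)
print(q1, q2, q3)                            # 4.0 5.0 6.0
print(q2 == statistics.median(pea_counts))   # True: Q2 is the median
```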
62
Quartiles
Q1, Q2, and Q3 divide ranked scores into four equal parts, each containing 25% of the values:
Low | 25% | Q1 | 25% | Q2 (median) | 25% | Q3 | 25% | High
63
Percentiles
Just as quartiles separate data into four parts, there are 99 percentiles, denoted P1, P2, ..., P99, which partition the data into 100 groups.
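The same standard-library function with `n=100` returns the 99 percentile cut points P1 through P99. The same caveat applies: percentile-locating conventions vary, and this sketch uses Python's default method:

```python
import statistics

# The 50 pod counts from the pea-pod slide; each digit is one pod.
pea_counts = [int(d) for d in "55464 37635 65455 62355 55743 45456 51626 66664 45453 55765" if d != " "]

percentiles = statistics.quantiles(pea_counts, n=100)   # [P1, P2, ..., P99]
print(len(percentiles))                                  # 99
# P25, P50, P75 correspond to Q1, Q2 (median), Q3.
print(percentiles[24], percentiles[49], percentiles[74]) # 4.0 5.0 6.0
```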
64
Tukey’s Five-Number Summary
Tukey’s five-number summary: Low, Q1, Median, Q3, High.
These 5 numbers can also give another representation of “center and spread.”
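A sketch that assembles the five-number summary, with the quartiles computed by Python's default `statistics.quantiles` rule (other quartile conventions may give slightly different Q1 and Q3 on other data). The helper name is illustrative:

```python
import statistics

def five_number_summary(values):
    """Tukey's five-number summary: (Low, Q1, Median, Q3, High)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)

# The 50 pod counts from the pea-pod slide; each digit is one pod.
pea_counts = [int(d) for d in "55464 37635 65455 62355 55743 45456 51626 66664 45453 55765" if d != " "]
print(five_number_summary(pea_counts))   # (1, 4.0, 5.0, 6.0, 7)
```

These are the five values a boxplot draws: whiskers at Low and High, the box from Q1 to Q3, and a line at the median.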
65
Boxplots
A boxplot (or box-and-whisker plot) is a graphical representation of Tukey’s five-number summary.
Example: Figure 2-16
66
Boxplots: Figure 2-17