1
Statistics for Business and Economics
Chapter 2 Methods for Describing Sets of Data
2
Learning Objectives
Describe Qualitative Data Graphically
Describe Quantitative Data Graphically
Explain Numerical Data Properties
Describe Summary Measures
Analyze Numerical Data Using Summary Measures
3
Thinking Challenge
[Bar chart: market share for X, Y, and Us, with a vertical axis running from 30% to 36%]
"Our market share far exceeds all competitors!" - VP
Problem: no zero point. Maybe a pie chart would be better.
4
Data Presentation
Qualitative Data: Summary Table, Bar Graph, Pie Chart, Pareto Diagram
Quantitative Data: Stem-&-Leaf Display, Frequency Distribution, Histogram
5
Presenting Qualitative Data
7
Summary Table
Lists categories & number of elements in category
Obtained by tallying responses in category
May show frequencies (counts), % or both
Each row is a category.
Major         Count
Accounting    130
Economics     20
Management    50
Total         200
9
Bar Graph
Equal bar widths
Bar height shows frequency or % (percent used also)
Vertical bars for qualitative variables
Zero point
Note: Horizontal bars are used for categorical variables. Vertical bars are used for numerical variables. Still, some variation exists on this point in the literature. Also, there are many variations on the bar (e.g., stacked bar).
11
Pie Chart
Shows breakdown of total quantity into categories
Useful for showing relative differences
Angle size = (360°)(percent)
Majors: Acct. 65%, Mgmt. 25%, Econ. 10%
Example (Econ.): (360°)(10%) = 36°
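The angle rule can be checked with a short Python sketch. It uses the counts from the Summary Table slide; the function name `pie_angles` is only illustrative and does not come from the slides.

```python
# Counts taken from the Summary Table slide (majors).
counts = {"Accounting": 130, "Economics": 20, "Management": 50}

def pie_angles(counts):
    """Return {category: (percent, angle in degrees)} for a pie chart."""
    total = sum(counts.values())
    return {
        name: (100 * c / total, 360 * c / total)
        for name, c in counts.items()
    }

angles = pie_angles(counts)
for name, (pct, deg) in angles.items():
    print(f"{name}: {pct:.0f}% -> {deg:.0f} degrees")
```

The slice angles always sum to 360°, which is a quick sanity check on the percentages.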
13
Pareto Diagram
Like a bar graph, but with the categories arranged by height in descending order from left to right.
Equal bar widths
Bar height shows frequency or % (percent used also)
Vertical bars for qualitative variables
Zero point
14
Thinking Challenge
You’re an analyst for IRI. You want to show the market shares held by Web browsers. Construct a bar graph, pie chart, & Pareto diagram to describe the data.
Browser              Mkt. Share (%)
Firefox              14
Internet Explorer    81
Safari               4
Others               1
Instructor note: Allow students time to complete this before revealing answers.
15
Bar Graph Solution*
[Bar graph: Market Share (%) by Browser]
16
Pie Chart Solution*
[Pie chart: Market Share by browser]
17
Pareto Diagram Solution*
[Pareto diagram: Market Share (%) by Browser, bars in descending order]
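Since the solution charts did not survive, the ordering step behind the Pareto diagram can be sketched in Python. The shares are those from the Thinking Challenge slide; the helper name `pareto_order` and the text-bar output are illustrative choices, not part of the deck.

```python
# Browser market shares (%) from the Thinking Challenge slide.
shares = {"Firefox": 14, "Internet Explorer": 81, "Safari": 4, "Others": 1}

def pareto_order(shares):
    """Return (category, value) pairs sorted by value, largest first."""
    return sorted(shares.items(), key=lambda kv: kv[1], reverse=True)

rows = pareto_order(shares)
for name, value in rows:
    # One '#' per 2 percentage points, at least one so small bars stay visible.
    bar = "#" * max(1, value // 2)
    print(f"{name:<18} {value:>3}% {bar}")
```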
18
Presenting Quantitative Data
20
Stem-and-Leaf Display
1. Divide each observation into stem value and leaf value
   Stem value defines class
   Leaf value defines frequency (count)
   Example: 26 has stem 2 and leaf 6
2. Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
Stem | Leaves
2    | 144677
3    | 028
4    | 1
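A minimal Python sketch of the stem-and-leaf split, assuming two-digit whole numbers so that the tens digit is the stem and the units digit is the leaf. The function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

# Data from the Stem-and-Leaf Display slide.
data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]

def stem_and_leaf(values):
    """Group sorted values by tens digit (stem); units digits are the leaves."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)
    return {
        stem: "".join(str(leaf) for leaf in leaves)
        for stem, leaves in sorted(groups.items())
    }

display = stem_and_leaf(data)
for stem, leaves in display.items():
    print(f"{stem} | {leaves}")
```

The number of leaves on each stem is the class frequency, which is why the display doubles as a rough histogram.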
22
Frequency Distribution Table Steps
1. Determine range
2. Select number of classes (usually between 5 & 15 inclusive)
3. Compute class intervals (width)
4. Determine class boundaries (limits)
5. Compute class midpoints
6. Count observations & assign to classes
23
Frequency Distribution Table Example
Raw Data: 24, 26, 24, 21, ..., 41, 32, 38
Class (Boundaries)    Midpoint    Frequency
15.5 – 25.5           20.5        3
25.5 – 35.5           30.5        5
35.5 – 45.5           40.5        2
Width = upper boundary – lower boundary
Midpoint = (Lower + Upper Boundaries) / 2
24
Relative Frequency & % Distribution Tables
Relative Frequency Distribution
Class          Prop.
15.5 – 25.5    .3
25.5 – 35.5    .5
35.5 – 45.5    .2
Percentage Distribution
Class          %
15.5 – 25.5    30.0
25.5 – 35.5    50.0
35.5 – 45.5    20.0
Note: The number of classes is usually between 5 and 15. Only 3 are used here for illustration purposes.
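The counting step can be sketched in Python. This assumes the ten observations listed on the Stem-and-Leaf slide are the same data set (the class frequencies 3, 5, 2 agree), and it treats each class as (lower, upper]; the boundaries end in .5, so no whole-number observation falls on one. The function name `frequency_table` is illustrative.

```python
# Assumed data: the ten observations from the Stem-and-Leaf slide.
data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
boundaries = [15.5, 25.5, 35.5, 45.5]  # class boundaries from the example

def frequency_table(values, boundaries):
    """Build midpoint, frequency, proportion and percent for each class."""
    n = len(values)
    rows = []
    for lower, upper in zip(boundaries, boundaries[1:]):
        freq = sum(1 for v in values if lower < v <= upper)
        rows.append({
            "class": (lower, upper),
            "midpoint": (lower + upper) / 2,
            "frequency": freq,
            "proportion": freq / n,
            "percent": 100 * freq / n,
        })
    return rows

table = frequency_table(data, boundaries)
for row in table:
    print(row)
```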
26
Histogram
Class          Freq.
15.5 – 25.5    3
25.5 – 35.5    5
35.5 – 45.5    2
[Histogram of the three classes: vertical axis shows count (0 to 5) and can instead show frequency, relative frequency, or percent; bars touch; horizontal axis is marked at the lower boundary of each class]
27
Numerical Data Properties
28
Thinking Challenge
[Figure: salaries of $400,000, $70,000, $50,000, $30,000, and $20,000]
... employees cite low pay -- most workers earn only $20,000.
... President claims average pay is $70,000!
Note: 11 total employees; total salaries are $770,000. The mode is $20,000 (union argument). The median is $30,000. The mean is $70,000 (president’s argument). Different measures are used!
29
Standard Notation
Throughout this chapter, we will be using the following notation, which I will introduce now.
Measure               Sample       Population
Mean                  $\bar{X}$    $\mu$
Standard Deviation    $S$          $\sigma$
Variance              $S^2$        $\sigma^2$
Size                  $n$          $N$
30
Numerical Data Properties
Central Tendency (Location): concerned with where values are concentrated.
Variation (Dispersion): concerned with the extent to which values vary.
Shape: concerned with the extent to which values are symmetrically distributed.
31
Numerical Data Properties & Measures
Central Tendency: Mean, Median, Mode
Variation: Range, Interquartile Range, Variance, Standard Deviation
Relative Standing: Percentiles, Z–scores
32
Central Tendency
34
Mean
Measure of central tendency
Most common measure
Acts as ‘balance point’
Affected by extreme values (‘outliers’)
Formula (sample mean):
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \dots + X_n}{n}$
35
Mean Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + X_3 + X_4 + X_5 + X_6}{6} = \frac{10.3 + 4.9 + 8.9 + 11.7 + 6.3 + 7.7}{6} = 8.30$
37
Median
Measure of central tendency
Middle value in ordered sequence
If n is odd, middle value of sequence
If n is even, average of 2 middle values
Position of median in sequence: Positioning Point = $\frac{n + 1}{2}$
Not affected by extreme values
38
Median Example Odd-Sized Sample
[Raw data, ordered values, and positions appeared on the slide]
Positioning Point = $\frac{n + 1}{2} = \frac{5 + 1}{2} = 3$
Median = 22.6
39
Median Example Even-Sized Sample
[Raw data, ordered values, and positions appeared on the slide]
Positioning Point = $\frac{n + 1}{2} = \frac{6 + 1}{2} = 3.5$
Median = $\frac{7.7 + 8.9}{2} = 8.30$
41
Mode
Measure of central tendency
Value that occurs most often
Not affected by extreme values
May be no mode or several modes
May be used for quantitative or qualitative data
42
Mode Example
No Mode: Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
One Mode: [raw data appeared on the slide]
More Than 1 Mode: [raw data appeared on the slide]
43
Thinking Challenge
You’re a financial analyst for Prudential-Bache Securities. You have collected the following closing stock prices of new stock issues: 17, 16, 21, 18, 13, 16, 12, 11. Describe the stock prices in terms of central tendency.
Instructor note: This is the data from problem 3.54 in BL5ed. Give the class time to compute before showing the answer.
44
Central Tendency Solution*
Mean
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \dots + X_8}{8} = \frac{17 + 16 + 21 + 18 + 13 + 16 + 12 + 11}{8} = 15.5$
45
Central Tendency Solution*
Median
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8
Positioning Point = $\frac{n + 1}{2} = \frac{8 + 1}{2} = 4.5$
Median = $\frac{16 + 16}{2} = 16$
46
Central Tendency Solution*
Mode
Raw Data: 17 16 21 18 13 16 12 11
Mode = 16
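The three answers can be checked with Python's standard `statistics` module. This is only a verification sketch; `multimode` (Python 3.8+) is used because it returns every most-frequent value, so it also handles data with several modes.

```python
import statistics

# Closing stock prices from the Thinking Challenge slide.
prices = [17, 16, 21, 18, 13, 16, 12, 11]

mean = statistics.mean(prices)        # sum of values divided by n
median = statistics.median(prices)    # average of the 2 middle values when n is even
modes = statistics.multimode(prices)  # list of the most frequent value(s)

print(f"mean={mean}, median={median}, modes={modes}")
```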
47
Summary of Central Tendency Measures
Measure    Formula                  Description
Mean       $\sum X_i / n$           Balance Point
Median     Position $(n + 1)/2$     Middle Value When Ordered
Mode       none                     Most Frequent
48
Shape
49
Shape
Describes how data are distributed
Measures of shape: skew (symmetry) and kurtosis
Skew: the extent to which a distribution is symmetric or has a tail. Values are 0 for a normal distribution. Negative values indicate a negative (left) skew.
Kurtosis: the extent to which a distribution is peaked (flatter or taller). A distribution can be more peaked than a normal distribution and still be bell-shaped. Negative values indicate a distribution that is less peaked than a normal distribution.
Left-Skewed: Mean < Median
Symmetric: Mean = Median
Right-Skewed: Median < Mean
50
Variation
52
Range
Measure of dispersion
Difference between largest & smallest observations
Range = Xlargest – Xsmallest
Ignores how data are distributed
[Two data sets plotted on an axis from 7 to 10 with different distributions; for both, Range = 10 – 7 = 3]
54
Variance & Standard Deviation
1. Measures of dispersion
2. Most common measures
3. Consider how data are distributed
4. Show variation about mean ($\bar{X}$ or $\mu$)
[Dot plot on an axis from 4 to 12 with the mean marked at $\bar{X} = 8.3$]
55
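Sample Variance Formula
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n - 1}$
n - 1 in denominator! (Use N if Population Variance)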
56
Sample Standard Deviation Formula
$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n - 1}}$
57
Variance Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$ where $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = 8.3$
$S^2 = \frac{(10.3 - 8.3)^2 + (4.9 - 8.3)^2 + \dots + (7.7 - 8.3)^2}{6 - 1} = 6.368$
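The computation can be written out directly in Python. This sketch follows the sample formula with n – 1 in the denominator and uses the six observations from the Mean Example slide; the function names are illustrative.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_std(values):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(values))

# Data from the Mean Example slide (mean 8.3, n = 6).
data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
variance = sample_variance(data)
std = sample_std(data)
print(f"S^2 = {variance:.3f}, S = {std:.3f}")
```

Python's built-in `statistics.variance` and `statistics.stdev` use the same n – 1 denominator, so they can serve as a cross-check.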
58
Thinking Challenge
You’re a financial analyst for Prudential-Bache Securities. You have collected the following closing stock prices of new stock issues: 17, 16, 21, 18, 13, 16, 12, 11. What are the variance and standard deviation of the stock prices?
Instructor note: This is the data from problem 3.54 in BL5ed. Give the class time to compute before showing the answer.
59
Variation Solution*
Sample Variance
Raw Data: 17 16 21 18 13 16 12 11
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$ where $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = 15.5$
$S^2 = \frac{(17 - 15.5)^2 + (16 - 15.5)^2 + \dots + (11 - 15.5)^2}{8 - 1} = 11.14$
60
Variation Solution*
Sample Standard Deviation
$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{11.14} = 3.34$
61
Summary of Variation Measures
Measure                            Formula                                                         Description
Range                              $X_{largest} - X_{smallest}$                                    Total Spread
Standard Deviation (Sample)        $\sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$         Dispersion about Sample Mean
Standard Deviation (Population)    $\sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$                 Dispersion about Population Mean
Variance (Sample)                  $\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$                Squared Dispersion about Sample Mean
62
Interpreting Standard Deviation
63
Interpreting Standard Deviation: Chebyshev’s Theorem
Applies to any shape data set
No useful information about the fraction of data in the interval $\bar{x} - s$ to $\bar{x} + s$
At least 3/4 of the data lies in the interval $\bar{x} - 2s$ to $\bar{x} + 2s$
At least 8/9 of the data lies in the interval $\bar{x} - 3s$ to $\bar{x} + 3s$
In general, for $k > 1$, at least $1 - 1/k^2$ of the data lies in the interval $\bar{x} - ks$ to $\bar{x} + ks$
64
Interpreting Standard Deviation: Chebyshev’s Theorem
[Figure: intervals around the mean labeled "No useful information", "At least 3/4 of the data", and "At least 8/9 of the data"]
65
Chebyshev’s Theorem Example
Previously we found the mean closing stock price of new stock issues is 15.5 and the standard deviation is 3.34. Use this information to form an interval that will contain at least 75% of the closing stock prices of new stock issues.
66
Chebyshev’s Theorem Example
At least 75% of the closing stock prices of new stock issues will lie within 2 standard deviations of the mean.
$\bar{x} = 15.5$, $s = 3.34$
$(\bar{x} - 2s,\ \bar{x} + 2s) = (15.5 - 2 \cdot 3.34,\ 15.5 + 2 \cdot 3.34) = (8.82,\ 22.18)$
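The same bound can be computed for any k > 1. The following Python sketch uses the mean and standard deviation from the example; `chebyshev_interval` is an illustrative name, not something defined in the deck.

```python
def chebyshev_interval(mean, sd, k):
    """Return (minimum fraction of data, (low, high)) for mean +/- k*sd."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    fraction = 1 - 1 / k**2
    return fraction, (mean - k * sd, mean + k * sd)

# Mean and standard deviation of the stock prices from the example.
fraction, (low, high) = chebyshev_interval(15.5, 3.34, k=2)
print(f"At least {fraction:.0%} of the data lies in ({low:.2f}, {high:.2f})")
```

With k = 3 the same call gives the 8/9 bound stated in the theorem.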
67
Interpreting Standard Deviation: Empirical Rule
Applies to data sets that are mound shaped and symmetric
Approximately 68% of the measurements lie in the interval μ – σ to μ + σ
Approximately 95% of the measurements lie in the interval μ – 2σ to μ + 2σ
Approximately 99.7% of the measurements lie in the interval μ – 3σ to μ + 3σ
68
Interpreting Standard Deviation: Empirical Rule
[Figure: mound-shaped curve with the axis marked at μ – 3σ, μ – 2σ, μ – σ, μ, μ + σ, μ + 2σ, μ + 3σ; brackets show approximately 68%, 95%, and 99.7% of the measurements]
69
Empirical Rule Example
Previously we found the mean closing stock price of new stock issues is 15.5 and the standard deviation is 3.34. If we can assume the data are symmetric and mound shaped, calculate the percentage of the data that lie within the intervals $\bar{x} \pm s$, $\bar{x} \pm 2s$, $\bar{x} \pm 3s$.
70
Empirical Rule Example
According to the Empirical Rule:
Approximately 68% of the data will lie in the interval $(\bar{x} - s,\ \bar{x} + s) = (15.5 - 3.34,\ 15.5 + 3.34) = (12.16,\ 18.84)$
Approximately 95% of the data will lie in the interval $(\bar{x} - 2s,\ \bar{x} + 2s) = (15.5 - 2 \cdot 3.34,\ 15.5 + 2 \cdot 3.34) = (8.82,\ 22.18)$
Approximately 99.7% of the data will lie in the interval $(\bar{x} - 3s,\ \bar{x} + 3s) = (15.5 - 3 \cdot 3.34,\ 15.5 + 3 \cdot 3.34) = (5.48,\ 25.52)$
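The three intervals follow one pattern, which a short Python sketch makes explicit. The percentages are the approximate Empirical Rule values from the slides; the mean 15.5 and standard deviation 3.34 are the stock-price figures found earlier, and the names below are illustrative.

```python
# Approximate percentages from the Empirical Rule, keyed by k standard deviations.
EMPIRICAL_RULE = {1: 68.0, 2: 95.0, 3: 99.7}

def empirical_intervals(mean, sd):
    """Return {k: (approx. percent, low, high)} for mean +/- k*sd, k = 1, 2, 3."""
    return {
        k: (pct, mean - k * sd, mean + k * sd)
        for k, pct in EMPIRICAL_RULE.items()
    }

intervals = empirical_intervals(15.5, 3.34)
for k, (pct, low, high) in intervals.items():
    print(f"about {pct}% within {k} sd: ({low:.2f}, {high:.2f})")
```

These percentages are only reasonable when the data are roughly mound shaped and symmetric; otherwise Chebyshev's weaker bounds are the safe choice.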
71
Numerical Measures of Relative Standing
73
Numerical Measures of Relative Standing: Percentiles
Describes the relative location of a measurement compared to the rest of the data
The pth percentile is a number such that p% of the data falls below it and (100 – p)% falls above it
Median = 50th percentile
74
Percentile Example You scored 560 on the GMAT exam. This score puts you in the 58th percentile. What percentage of test takers scored lower than you did? What percentage of test takers scored higher than you did?
75
Percentile Example What percentage of test takers scored lower than you did? 58% of test takers scored lower than 560. What percentage of test takers scored higher than you did? (100 – 58)% = 42% of test takers scored higher than 560.
77
Numerical Measures of Relative Standing: Z–Scores
Describes the relative location of a measurement compared to the rest of the data
Sample z–score: $z = \frac{x - \bar{x}}{s}$
Population z–score: $z = \frac{x - \mu}{\sigma}$
Measures the number of standard deviations away from the mean a data value is located
78
Z–Score Example The mean time to assemble a product is 22.5 minutes with a standard deviation of 2.5 minutes. Find the z–score for an item that took 20 minutes to assemble. Find the z–score for an item that took 27.5 minutes to assemble.
79
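Z–Score Example
x = 20, μ = 22.5, σ = 2.5:
$z = \frac{x - \mu}{\sigma} = \frac{20 - 22.5}{2.5} = -1.0$
x = 27.5, μ = 22.5, σ = 2.5:
$z = \frac{x - \mu}{\sigma} = \frac{27.5 - 22.5}{2.5} = 2.0$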
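Both answers come from the same one-line formula, sketched here in Python with the assembly-time figures from the example. The function name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Assembly times from the example: mean 22.5 minutes, standard deviation 2.5 minutes.
z_fast = z_score(20, 22.5, 2.5)
z_slow = z_score(27.5, 22.5, 2.5)
print(z_fast, z_slow)
```

A negative z–score means the item was assembled faster than average; a positive one means slower.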
80
Quartiles & Box Plots
81
Quartiles
1. Measure of noncentral tendency
2. Split ordered data into 4 quarters (25% each) at Q1, Q2, Q3
3. Position of i-th quartile: Positioning Point of $Q_i = \frac{i(n + 1)}{4}$
82
Quartile (Q1) Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
Position: 1 2 3 4 5 6
$Q_1$ Position = $\frac{1(n + 1)}{4} = \frac{1(6 + 1)}{4} = 1.75 \approx 2$
$Q_1 = 6.3$
83
Quartile (Q2) Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
Position: 1 2 3 4 5 6
$Q_2$ Position = $\frac{2(n + 1)}{4} = \frac{2(6 + 1)}{4} = 3.5$
$Q_2 = \frac{7.7 + 8.9}{2} = 8.3$
84
Quartile (Q3) Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
Position: 1 2 3 4 5 6
$Q_3$ Position = $\frac{3(n + 1)}{4} = \frac{3(6 + 1)}{4} = 5.25 \approx 5$
$Q_3 = 10.3$
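The three quartile examples can be reproduced with a small Python sketch. The rounding rule is an assumption inferred from the worked examples rather than stated on the slides: a position ending in .5 averages the two neighboring values (3.5 gives the mean of the 3rd and 4th), and any other position is rounded to the nearest integer (1.75 becomes 2, 5.25 becomes 5). The function name `quartile` is illustrative.

```python
import math

def quartile(values, i):
    """i-th quartile (i = 1, 2, 3) using the positioning point i*(n + 1)/4.

    Assumed rule: halfway positions average the two neighbors; all other
    positions are rounded to the nearest integer position.
    """
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two observations")
    position = i * (len(ordered) + 1) / 4  # 1-based position in the ordered data
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0.5:
        return (ordered[lower - 1] + ordered[lower]) / 2
    nearest = lower if fraction < 0.5 else lower + 1
    return ordered[nearest - 1]

data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
q1, q2, q3 = (quartile(data, i) for i in (1, 2, 3))
print(q1, q2, q3)
```

Other textbooks and libraries interpolate between neighbors instead of rounding, so quartiles from software may differ slightly from these values.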
85
Numerical Data Properties & Measures
Central Tendency: Mean, Median, Mode
Variation: Range, Interquartile Range, Variance, Standard Deviation
Shape: Skew
86
Interquartile Range
1. Measure of dispersion
2. Also called midspread
3. Difference between third & first quartiles: Interquartile Range = Q3 – Q1
4. Spread in middle 50%
5. Not affected by extreme values
87
Thinking Challenge
You’re a financial analyst for Prudential-Bache Securities. You have collected the following closing stock prices of new stock issues: 17, 16, 21, 18, 13, 16, 12, 11. What are the quartiles, Q1 and Q3, and the interquartile range?
Instructor note: This is the data from problem 3.54 in BL5ed. Give the class time to compute before showing the answer.
88
Quartile Solution* Q1
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8
$Q_1$ Position = $\frac{1(n + 1)}{4} = \frac{1(8 + 1)}{4} = 2.25$ (the slide shows 2.5 and takes the value halfway between the 2nd and 3rd ordered values)
$Q_1 = \frac{12 + 13}{2} = 12.5$
89
Quartile Solution* Q3
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8
$Q_3$ Position = $\frac{3(n + 1)}{4} = \frac{3(8 + 1)}{4} = 6.75 \approx 7$
$Q_3 = 18$
90
Interquartile Range Solution*
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Interquartile Range = $Q_3 - Q_1 = 18 - 12.5 = 5.5$
91
Box Plot
1. Graphical display of data using 5-number summary: Xsmallest, Q1, Median, Q3, Xlargest
[Box plot drawn above an axis running from 4 to 12]
92
Shape & Box Plot
[Three box plots labeled Left-Skewed, Symmetric, and Right-Skewed, each marking Q1, Median, and Q3]
93
Graphing Bivariate Relationships
94
Graphing Bivariate Relationships
Describes a relationship between two quantitative variables
Plot the data in a scattergram
[Three scattergrams of y against x: positive relationship, negative relationship, no relationship]
95
Scattergram Example
You’re a marketing analyst for Hasbro Toys. You gather the following data:
Ad $ (x)    Sales (Units) (y)
Draw a scattergram of the data.
96
Scattergram Example
[Scattergram: Sales (vertical axis, 1 to 4) against Advertising (horizontal axis, 1 to 5)]
97
Time Series Plot
98
Time Series Plot
Used to graphically display data produced over time
Shows trends and changes in the data over time
Time recorded on the horizontal axis
Measurements recorded on the vertical axis
Points connected by straight lines
99
Time Series Plot Example
The following data show the average retail price of regular gasoline in New York City for 8 weeks in 2006. Draw a time series plot for these data.
Date            Average Price
Oct 16, 2006    $2.219
Oct 23, 2006    $2.173
Oct 30, 2006    $2.177
Nov 6, 2006     $2.158
Nov 13, 2006    $2.185
Nov 20, 2006    $2.208
Nov 27, 2006    $2.236
Dec 4, 2006     $2.298
100
Time Series Plot Example
[Time series plot: Price (vertical axis) against Date (horizontal axis)]
101
Distorting the Truth with Descriptive Techniques
102
Errors in Presenting Data
Using ‘chart junk’
No relative basis in comparing data batches
Compressing the vertical axis
No zero point on the vertical axis
103
‘Chart Junk’
Minimum Wage: 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80
[Two charts of the same data, a bad presentation and a good presentation; one plots $ (0 to 4) against year (1960 to 1990)]
104
No Relative Basis
[Two charts of A’s by Class for FR, SO, JR, SR. Bad presentation: vertical axis in frequency (100 to 300). Good presentation: vertical axis in percent (0% to 30%)]
105
Compressing Vertical Axis
[Two charts of Quarterly Sales ($) for Q1 to Q4. Bad presentation: vertical axis marked at 100 and 200. Good presentation: vertical axis marked at 25 and 50]
106
No Zero Point on Vertical Axis
[Two charts of Monthly Sales ($) for J, M, M, J, S, N. Bad presentation: vertical axis marked at 36, 39, 42, 45. Good presentation: vertical axis marked at 20, 40, 60]
107
Conclusion
Described Qualitative Data Graphically
Described Numerical Data Graphically
Explained Numerical Data Properties
Described Summary Measures
Analyzed Numerical Data Using Summary Measures