Statistics for Business and Economics

Presentation on theme: "Statistics for Business and Economics"— Presentation transcript:

1 Statistics for Business and Economics
Chapter 2 Methods for Describing Sets of Data

2 Learning Objectives
Describe Qualitative Data Graphically
Describe Quantitative Data Graphically
Explain Numerical Data Properties
Describe Summary Measures
Analyze Numerical Data Using Summary Measures

3 Thinking Challenge
"Our market share far exceeds all competitors!" - VP
[Figure: bar graph of market share for X, Y, and Us; the vertical axis runs from 30% to 36%]
Problem - no zero point. Maybe a pie chart would be better.

4 Data Presentation
Qualitative Data: Summary Table, Bar Graph, Pie Chart, Pareto Diagram
Quantitative Data: Stem-&-Leaf Display, Frequency Distribution, Histogram

5 Presenting Qualitative Data

7 Summary Table
Lists categories & number of elements in each category
Obtained by tallying responses in each category (e.g., Tally: |||| |||| |||| ||||)
May show frequencies (counts), % or both
Each row is a category

Major        Count
Accounting   130
Economics    20
Management   50
Total        200

9 Bar Graph
Equal bar widths
Bar height shows frequency or % (percent may be used in place of frequency)
Vertical axis starts at a zero point
Vertical bars for qualitative variables
Note: Some texts use horizontal bars for categorical variables and vertical bars for numerical variables, but usage varies in the literature. There are also many variations on the bar graph (e.g., stacked bar).

11 Pie Chart
Shows breakdown of total quantity into categories
Useful for showing relative differences
Angle size = (360°)(percent)
Majors: Acct. 65%, Mgmt. 25%, Econ. 10%
Example (Econ.): (360°)(10%) = 36°
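
The angle rule can be checked against the counts in the summary table for majors. The following is a minimal sketch, not part of the original deck; the function name `pie_angles` is an illustrative choice.

```python
def pie_angles(counts):
    """Return the pie-slice angle in degrees for each category.

    Each angle is 360 degrees times the category's share of the total.
    """
    total = sum(counts.values())
    return {name: 360 * count / total for name, count in counts.items()}


# Counts taken from the Summary Table slide (Accounting, Economics, Management).
majors = {"Accounting": 130, "Economics": 20, "Management": 50}
angles = pie_angles(majors)

for name, angle in angles.items():
    share = 100 * majors[name] / sum(majors.values())
    print(f"{name:<11} {share:5.1f}%  {angle:6.1f} degrees")
```

Economics holds 10% of the total, so its slice spans 36°, matching the slide.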

13 Pareto Diagram
Like a bar graph, but with the categories arranged by height in descending order from left to right
Equal bar widths
Bar height shows frequency or % (percent may be used in place of frequency)
Vertical axis starts at a zero point
Vertical bars for qualitative variables

14 Thinking Challenge
You’re an analyst for IRI. You want to show the market shares held by Web browsers. Construct a bar graph, pie chart, & Pareto diagram to describe the data.

Browser             Mkt. Share (%)
Firefox             14
Internet Explorer   81
Safari              4
Others              1

Note: Allow students time to complete this before revealing answers.

15 Bar Graph Solution* Market Share (%) Browser

16 Pie Chart Solution* Market Share

17 Pareto Diagram Solution*
Market Share (%) Browser
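
Since the solution charts did not survive in this transcript, the ordering step behind the Pareto diagram can be sketched in code. This is an illustrative sketch only: `pareto_order` is a hypothetical helper, the text bars stand in for the chart, and the cumulative column is an optional extra that the slide does not show.

```python
def pareto_order(shares):
    """Sort categories by share, largest first, and add a running total."""
    ordered = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    rows = []
    cumulative = 0
    for name, pct in ordered:
        cumulative += pct
        rows.append((name, pct, cumulative))
    return rows


# Market shares from the Thinking Challenge slide.
shares = {"Firefox": 14, "Internet Explorer": 81, "Safari": 4, "Others": 1}
rows = pareto_order(shares)

for name, pct, cumulative in rows:
    bar = "#" * (pct // 2)  # one '#' per two percentage points
    print(f"{name:<18}{pct:>4}%  {bar}  (cumulative {cumulative}%)")
```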

18 Presenting Quantitative Data

20 Stem-and-Leaf Display
1. Divide each observation into stem value and leaf value (e.g., 26 has stem 2 and leaf 6)
   Stem value defines class
   Leaf value defines frequency (count)
2. Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41

Stem | Leaf
2    | 144677
3    | 028
4    | 1
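
The stem/leaf split can be reproduced mechanically. The sketch below assumes non-negative two-digit integers, with the tens digit as the stem and the units digit as the leaf; `stem_and_leaf` is an illustrative name, not something from the deck.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group sorted values by tens digit (stem) and collect units digits (leaves)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        stems[stem].append(leaf)
    return {stem: "".join(str(leaf) for leaf in leaves) for stem, leaves in stems.items()}


# Data from the Stem-and-Leaf Display slide.
data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
display = stem_and_leaf(data)

for stem, leaves in display.items():
    print(f"{stem} | {leaves}")
```

The number of leaves on each row is the class frequency, which is why the display doubles as a rough histogram.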

22 Frequency Distribution Table Steps
1. Determine range
2. Select number of classes (usually between 5 & 15 inclusive)
3. Compute class intervals (width)
4. Determine class boundaries (limits)
5. Compute class midpoints
6. Count observations & assign to classes

23 Frequency Distribution Table Example
Raw Data: 24, 26, 24, 21, ..., 41, 32, 38

Class (Boundaries)   Midpoint   Frequency
15.5 – 25.5          20.5       3
25.5 – 35.5          30.5       5
35.5 – 45.5          40.5       2

Width = upper boundary – lower boundary (here 25.5 – 15.5 = 10)
Midpoint = (Lower + Upper Boundaries) / 2

24 Relative Frequency & % Distribution Tables
Relative Frequency Distribution
Class          Prop.
15.5 – 25.5    .3
25.5 – 35.5    .5
35.5 – 45.5    .2

Percentage Distribution
Class          %
15.5 – 25.5    30.0
25.5 – 35.5    50.0
35.5 – 45.5    20.0

Note: The number of classes is usually between 5 and 15. Only 3 are used here for illustration purposes.
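
The table-building steps can be sketched as follows. The raw data line in the example is incomplete in this transcript, so the sketch assumes the ten values from the stem-and-leaf slide, which are consistent with the frequencies 3, 5 and 2. The function name and the dictionary keys are illustrative.

```python
def frequency_table(data, lower, width, n_classes):
    """Build class boundaries, midpoints, counts, proportions and percentages.

    Boundaries end in .5, so no integer observation falls exactly on one.
    """
    rows = []
    for k in range(n_classes):
        lo = lower + k * width
        hi = lo + width
        freq = sum(1 for x in data if lo < x <= hi)
        rows.append({
            "class": (lo, hi),
            "midpoint": (lo + hi) / 2,
            "freq": freq,
            "prop": freq / len(data),
            "pct": 100 * freq / len(data),
        })
    return rows


# Assumed raw data: the ten values shown on the Stem-and-Leaf Display slide.
data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
table = frequency_table(data, lower=15.5, width=10, n_classes=3)

for row in table:
    lo, hi = row["class"]
    print(f"{lo} - {hi}  mid={row['midpoint']}  freq={row['freq']}  "
          f"prop={row['prop']:.1f}  pct={row['pct']:.1f}")
```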

26 Histogram
Class          Freq.
15.5 – 25.5    3
25.5 – 35.5    5
35.5 – 45.5    2

[Figure: histogram of the three classes. Vertical axis: count from 0 to 5 (frequency, relative frequency, or percent). Horizontal axis: lower boundary of each class. Bars touch.]

27 Numerical Data Properties

28 Thinking Challenge
... employees cite low pay -- most workers earn only $20,000.
... President claims average pay is $70,000!
[Figure: salary levels of $400,000, $70,000, $50,000, $30,000, and $20,000]
11 total employees; total salaries are $770,000.
The mode is $20,000 (Union argument).
The median is $30,000.
The mean is $70,000 (President’s argument).
Different measures are used!

29 Standard Notation
Measure              Sample   Population
Mean                 X̄        μ
Standard Deviation   S        σ
Variance             S²       σ²
Size                 n        N

Note: Throughout this chapter, we will be using the following notation, which I will introduce now.

30 Numerical Data Properties
Central Tendency (Location): concerned with where values are concentrated.
Variation (Dispersion): concerned with the extent to which values vary.
Shape: concerned with the extent to which values are symmetrically distributed.

31 Numerical Data Properties & Measures
Central Tendency: Mean, Median, Mode
Variation: Range, Interquartile Range, Variance, Standard Deviation
Relative Standing: Percentiles, Z–scores

32 Central Tendency

34 Mean
Measure of central tendency
Most common measure
Acts as ‘balance point’
Affected by extreme values (‘outliers’)
Formula (sample mean):
\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n} \]

35 Mean Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + X_3 + X_4 + X_5 + X_6}{6} = \frac{10.3 + 4.9 + 8.9 + 11.7 + 6.3 + 7.7}{6} = 8.30 \]

37 Median
Measure of central tendency
Middle value in ordered sequence
   If n is odd, middle value of sequence
   If n is even, average of 2 middle values
Position of median in sequence:
\[ \text{Positioning Point} = \frac{n + 1}{2} \]
Not affected by extreme values

38 Median Example Odd-Sized Sample
Raw Data: Ordered: Position:
\[ \text{Positioning Point} = \frac{n + 1}{2} = \frac{5 + 1}{2} = 3.0 \]
Median = 22.6

39 Median Example Even-Sized Sample
Raw Data: Ordered: Position:
\[ \text{Positioning Point} = \frac{n + 1}{2} = \frac{6 + 1}{2} = 3.5 \]
\[ \text{Median} = \frac{7.7 + 8.9}{2} = 8.30 \]

41 Mode
Measure of central tendency
Value that occurs most often
Not affected by extreme values
May be no mode or several modes
May be used for quantitative or qualitative data

42 Mode Example
No Mode Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
One Mode Raw Data:
More Than 1 Mode Raw Data:

43 Thinking Challenge You’re a financial analyst for Prudential-Bache Securities. You have collected the following closing stock prices of new stock issues: 17, 16, 21, 18, 13, 16, 12, 11. Describe the stock prices in terms of central tendency.
Note: This is the data from problem 3.54 in BL5ed. Give the class time to compute before showing the answer.

44 Central Tendency Solution*
Mean
\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_8}{8} = \frac{17 + 16 + 21 + 18 + 13 + 16 + 12 + 11}{8} = 15.5 \]

45 Central Tendency Solution*
Median
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8
\[ \text{Positioning Point} = \frac{n + 1}{2} = \frac{8 + 1}{2} = 4.5 \]
\[ \text{Median} = \frac{16 + 16}{2} = 16 \]

46 Central Tendency Solution*
Mode
Raw Data: 17 16 21 18 13 16 12 11
Mode = 16
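
The three central tendency answers can be verified with Python's standard `statistics` module. This is only a quick check of the arithmetic on the slides, not part of the original deck.

```python
import statistics

# Closing stock prices from the Thinking Challenge slide.
prices = [17, 16, 21, 18, 13, 16, 12, 11]

mean = statistics.mean(prices)        # sum of the values divided by n
median = statistics.median(prices)    # average of the two middle values when n is even
modes = statistics.multimode(prices)  # list of every most frequent value

print(f"mean   = {mean}")
print(f"median = {median}")
print(f"mode   = {modes}")
```

`multimode` returns a list, which covers the "several modes" case mentioned on the Mode slide; a data set with no repeated value returns every value.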

47 Summary of Central Tendency Measures
Measure   Formula                  Description
Mean      ΣX_i / n                 Balance Point
Median    Position (n + 1) / 2     Middle Value When Ordered
Mode      none                     Most Frequent

48 Shape

49 Shape
Describes how data are distributed
Measures of shape: Skew = Symmetry

Left-Skewed: Mean < Median
Symmetric: Mean = Median
Right-Skewed: Median < Mean

Note: Shape is concerned with the extent to which values are symmetrically distributed.
Skew: the extent to which a distribution is symmetric or has a tail. Values are 0 for a normal distribution. If the values are negative, the distribution is negative or left-skewed.
Kurtosis: the extent to which a distribution is peaked (flatter or taller). For example, a distribution could be more peaked than a normal distribution (and still be ‘bell-shaped’). If values are negative, the distribution is less peaked than a normal distribution.

50 Variation

52 Range
Measure of dispersion
Difference between largest & smallest observations: Range = Xlargest – Xsmallest
Ignores how data are distributed
[Figure: two dot plots on a 7 to 10 axis with different distributions; each has Range = 10 – 7 = 3]

54 Variance & Standard Deviation
1. Measures of dispersion
2. Most common measures
3. Consider how data are distributed
4. Show variation about mean (X̄ or μ)
[Figure: dot plot on a 4 to 12 axis with X̄ = 8.3 marked]

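55 Sample Variance Formula
\[ S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1} \]
n - 1 in denominator! (Use N if Population Variance)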

56 Sample Standard Deviation Formula
\[ S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}} \]

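57 Variance Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
\[ S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \quad \text{where} \quad \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = 8.3 \]
\[ S^2 = \frac{(10.3 - 8.3)^2 + (4.9 - 8.3)^2 + \cdots + (7.7 - 8.3)^2}{6 - 1} = 6.368 \]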

58 Thinking Challenge You’re a financial analyst for Prudential-Bache Securities. You have collected the following closing stock prices of new stock issues: 17, 16, 21, 18, 13, 16, 12, 11. What are the variance and standard deviation of the stock prices?
Note: This is the data from problem 3.54 in BL5ed. Give the class time to compute before showing the answer.

59 Variation Solution*
Sample Variance
Raw Data: 17 16 21 18 13 16 12 11
\[ S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \quad \text{where} \quad \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = 15.5 \]
\[ S^2 = \frac{(17 - 15.5)^2 + (16 - 15.5)^2 + \cdots + (11 - 15.5)^2}{8 - 1} = 11.14 \]

60 Variation Solution*
Sample Standard Deviation
\[ S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{11.14} = 3.34 \]
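
Both variance examples can be checked with the standard `statistics` module. The sketch below also computes the population variance for contrast, since the formula slide notes that N replaces n - 1 in that case; treating the eight prices as a whole population is purely for illustration.

```python
import statistics

# Closing stock prices from the Thinking Challenge slide.
prices = [17, 16, 21, 18, 13, 16, 12, 11]

sample_var = statistics.variance(prices)  # divides by n - 1
sample_sd = statistics.stdev(prices)      # square root of the sample variance
pop_var = statistics.pvariance(prices)    # divides by N (shown only for contrast)

print(f"sample variance      = {sample_var:.2f}")
print(f"sample std deviation = {sample_sd:.2f}")
print(f"population variance  = {pop_var:.2f}")

# The six-value data set from the Variance Example slide.
example = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
example_var = statistics.variance(example)
print(f"example variance     = {example_var:.3f}")
```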

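61 Summary of Variation Measures
Range
   Formula: Xlargest – Xsmallest
   Description: Total Spread
Standard Deviation (Sample)
   Formula: \[ \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} \]
   Description: Dispersion about Sample Mean
Standard Deviation (Population)
   Formula: \[ \sqrt{\frac{\sum (X_i - \mu)^2}{N}} \]
   Description: Dispersion about Population Mean
Variance (Sample)
   Formula: \[ \frac{\sum (X_i - \bar{X})^2}{n - 1} \]
   Description: Squared Dispersion about Sample Mean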

62 Interpreting Standard Deviation

63 Interpreting Standard Deviation: Chebyshev’s Theorem
Applies to any shape data set
No useful information about the fraction of data in the interval x̄ – s to x̄ + s
At least 3/4 of the data lies in the interval x̄ – 2s to x̄ + 2s
At least 8/9 of the data lies in the interval x̄ – 3s to x̄ + 3s
In general, for k > 1, at least 1 – 1/k² of the data lies in the interval x̄ – ks to x̄ + ks

64 Interpreting Standard Deviation: Chebyshev’s Theorem
[Figure: intervals about the mean. x̄ ± s: no useful information. x̄ ± 2s: at least 3/4 of the data. x̄ ± 3s: at least 8/9 of the data.]

65 Chebyshev’s Theorem Example
Previously we found the mean closing stock price of new stock issues is 15.5 and the standard deviation is 3.34. Use this information to form an interval that will contain at least 75% of the closing stock prices of new stock issues.

66 Chebyshev’s Theorem Example
At least 75% of the closing stock prices of new stock issues will lie within 2 standard deviations of the mean.
x̄ = 15.5, s = 3.34
(x̄ – 2s, x̄ + 2s) = (15.5 – 2∙3.34, 15.5 + 2∙3.34) = (8.82, 22.18)
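
The general rule, at least 1 – 1/k² of the data within k standard deviations for k > 1, is easy to express in code. This is a minimal sketch; `chebyshev_interval` is an illustrative name.

```python
def chebyshev_interval(mean, sd, k):
    """Return ((low, high), minimum fraction of data inside) for k > 1."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem gives no useful bound for k <= 1")
    return (mean - k * sd, mean + k * sd), 1 - 1 / k**2


# Mean and standard deviation of the closing stock prices from the earlier slides.
(low, high), fraction = chebyshev_interval(15.5, 3.34, k=2)
print(f"At least {fraction:.0%} of the data lies in ({low:.2f}, {high:.2f})")

(_, _), fraction3 = chebyshev_interval(15.5, 3.34, k=3)
print(f"k = 3 guarantees at least {fraction3:.3f} (that is, 8/9)")
```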

67 Interpreting Standard Deviation: Empirical Rule
Applies to data sets that are mound shaped and symmetric
Approximately 68% of the measurements lie in the interval μ – σ to μ + σ
Approximately 95% of the measurements lie in the interval μ – 2σ to μ + 2σ
Approximately 99.7% of the measurements lie in the interval μ – 3σ to μ + 3σ

68 Interpreting Standard Deviation: Empirical Rule
[Figure: mound-shaped curve with the horizontal axis marked at μ – 3σ, μ – 2σ, μ – σ, μ, μ + σ, μ + 2σ, μ + 3σ. Approximately 68% of the measurements lie within μ ± σ, approximately 95% within μ ± 2σ, and approximately 99.7% within μ ± 3σ.]

69 Empirical Rule Example
Previously we found the mean closing stock price of new stock issues is 15.5 and the standard deviation is 3.34. If we can assume the data are symmetric and mound shaped, calculate the percentage of the data that lie within the intervals x̄ ± s, x̄ ± 2s, x̄ ± 3s.

70 Empirical Rule Example
According to the Empirical Rule:
Approximately 68% of the data will lie in the interval (x̄ – s, x̄ + s) = (15.5 – 3.34, 15.5 + 3.34) = (12.16, 18.84)
Approximately 95% of the data will lie in the interval (x̄ – 2s, x̄ + 2s) = (15.5 – 2∙3.34, 15.5 + 2∙3.34) = (8.82, 22.18)
Approximately 99.7% of the data will lie in the interval (x̄ – 3s, x̄ + 3s) = (15.5 – 3∙3.34, 15.5 + 3∙3.34) = (5.48, 25.52)
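
As a rough illustration, the three intervals can be computed and compared with how many of the eight stock prices actually fall inside each one. The comparison is an addition for illustration only; with just eight observations the observed shares are coarse and need not match the rule closely.

```python
def empirical_intervals(mean, sd):
    """Return {k: (low, high, approximate percent)} for k = 1, 2, 3."""
    coverage = {1: 68.0, 2: 95.0, 3: 99.7}
    return {k: (mean - k * sd, mean + k * sd, pct) for k, pct in coverage.items()}


# Stock prices, mean and standard deviation from the earlier slides.
prices = [17, 16, 21, 18, 13, 16, 12, 11]
intervals = empirical_intervals(15.5, 3.34)

counts = {}
for k, (low, high, pct) in intervals.items():
    counts[k] = sum(low < p < high for p in prices)
    observed = 100 * counts[k] / len(prices)
    print(f"k={k}: ({low:.2f}, {high:.2f})  rule ~{pct}%  "
          f"observed {counts[k]}/{len(prices)} = {observed:.1f}%")
```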

71 Numerical Measures of Relative Standing

73 Numerical Measures of Relative Standing: Percentiles
Describes the relative location of a measurement compared to the rest of the data
The pth percentile is a number such that p% of the data falls below it and (100 – p)% falls above it
Median = 50th percentile

74 Percentile Example You scored 560 on the GMAT exam. This score puts you in the 58th percentile. What percentage of test takers scored lower than you did? What percentage of test takers scored higher than you did?

75 Percentile Example What percentage of test takers scored lower than you did? 58% of test takers scored lower than 560. What percentage of test takers scored higher than you did? (100 – 58)% = 42% of test takers scored higher than 560.

77 Numerical Measures of Relative Standing: Z–Scores
Describes the relative location of a measurement compared to the rest of the data
Measures the number of standard deviations away from the mean a data value is located
Sample z–score:
\[ z = \frac{x - \bar{x}}{s} \]
Population z–score:
\[ z = \frac{x - \mu}{\sigma} \]

78 Z–Score Example The mean time to assemble a product is 22.5 minutes with a standard deviation of 2.5 minutes. Find the z–score for an item that took 20 minutes to assemble. Find the z–score for an item that took 27.5 minutes to assemble.

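79 Z–Score Example
x = 20, μ = 22.5, σ = 2.5:
\[ z = \frac{x - \mu}{\sigma} = \frac{20 - 22.5}{2.5} = -1.0 \]
x = 27.5, μ = 22.5, σ = 2.5:
\[ z = \frac{x - \mu}{\sigma} = \frac{27.5 - 22.5}{2.5} = 2.0 \]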
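
The same calculation as a small function, offered as a sketch; `z_score` is an illustrative name.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# Assembly-time example: mean 22.5 minutes, standard deviation 2.5 minutes.
z_fast = z_score(20, 22.5, 2.5)
z_slow = z_score(27.5, 22.5, 2.5)
print(f"20 minutes   -> z = {z_fast}")
print(f"27.5 minutes -> z = {z_slow}")
```

A negative z-score marks a value below the mean and a positive one a value above it.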

80 Quartiles & Box Plots

81 Quartiles
1. Measure of noncentral tendency
2. Split ordered data into 4 quarters (25% each), separated by Q1, Q2, Q3
3. Position of i-th quartile:
\[ \text{Positioning Point of } Q_i = \frac{i \cdot (n + 1)}{4} \]

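82 Quartile (Q1) Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
Position: 1 2 3 4 5 6
\[ Q_1 \text{ Position} = \frac{1 \cdot (n + 1)}{4} = \frac{1 \cdot (6 + 1)}{4} = 1.75 \approx 2 \]
Q1 = 6.3

83 Quartile (Q2) Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
Position: 1 2 3 4 5 6
\[ Q_2 \text{ Position} = \frac{2 \cdot (n + 1)}{4} = \frac{2 \cdot (6 + 1)}{4} = 3.5 \]
\[ Q_2 = \frac{7.7 + 8.9}{2} = 8.3 \]

84 Quartile (Q3) Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
Position: 1 2 3 4 5 6
\[ Q_3 \text{ Position} = \frac{3 \cdot (n + 1)}{4} = \frac{3 \cdot (6 + 1)}{4} = 5.25 \approx 5 \]
Q3 = 10.3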
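
The three examples follow a positioning rule that can be sketched in code. The rounding convention below is inferred from the worked examples (1.75 goes to position 2, 5.25 goes to position 5, and a position ending in .5 averages the two neighbouring values); it is an assumption rather than a rule stated on the slides, and the function name is illustrative. The sketch assumes at least three observations.

```python
import math


def quartile(data, i):
    """i-th quartile (i = 1, 2, 3) using positioning point i * (n + 1) / 4.

    Positions ending in .5 average the two neighbouring ordered values;
    any other position is rounded to the nearest whole position.
    """
    ordered = sorted(data)
    n = len(ordered)
    position = i * (n + 1) / 4
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0.5:
        return (ordered[lower - 1] + ordered[lower]) / 2
    index = lower if fraction < 0.5 else lower + 1
    index = min(max(index, 1), n)  # keep the position inside the data
    return ordered[index - 1]


# Six-value data set from the quartile example slides.
data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
q1, q2, q3 = (quartile(data, i) for i in (1, 2, 3))
print(f"Q1 = {q1}, Q2 = {q2:.1f}, Q3 = {q3}")
```

Other textbooks and software packages interpolate between positions instead of rounding, so quartiles from different sources can differ slightly for the same data.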

85 Numerical Data Properties & Measures
Central Tendency: Mean, Median, Mode
Variation: Range, Interquartile Range, Variance, Standard Deviation
Shape: Skew

86 Interquartile Range
1. Measure of dispersion
2. Also called midspread
3. Difference between third & first quartiles: Interquartile Range = Q3 – Q1
4. Spread in middle 50%
5. Not affected by extreme values

87 Thinking Challenge You’re a financial analyst for Prudential-Bache Securities. You have collected the following closing stock prices of new stock issues: 17, 16, 21, 18, 13, 16, 12, 11. What are the quartiles, Q1 and Q3, and the interquartile range?
Note: This is the data from problem 3.54 in BL5ed. Give the class time to compute before showing the answer.

88 Quartile Solution*
Q1
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8
\[ Q_1 \text{ Position} = \frac{1 \cdot (n + 1)}{4} = \frac{1 \cdot (8 + 1)}{4} = 2.25 \approx 2 \]
Q1 = 12

89 Quartile Solution*
Q3
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8
\[ Q_3 \text{ Position} = \frac{3 \cdot (n + 1)}{4} = \frac{3 \cdot (8 + 1)}{4} = 6.75 \approx 7 \]
Q3 = 18

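90 Interquartile Range Solution*
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8
Q1 is at position 1∙(8 + 1)/4 = 2.25, which rounds to position 2, so Q1 = 12. Q3 is at position 3∙(8 + 1)/4 = 6.75, which rounds to position 7, so Q3 = 18.
Interquartile Range = Q3 – Q1 = 18 – 12 = 6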

91 Box Plot
1. Graphical display of data using 5-number summary: Xsmallest, Q1, Median, Q3, Xlargest
[Figure: box plot drawn above an axis marked 4, 6, 8, 10, 12]
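
A five-number summary can be computed with the standard library, as in the sketch below. It assumes the box plot uses the six-value data set from the quartile examples (the 4 to 12 axis is consistent with that, but the slide does not say so). Note that `statistics.quantiles` interpolates between positions rather than rounding them, so its Q1 and Q3 differ slightly from the 6.3 and 10.3 obtained on the quartile example slides.

```python
import statistics


def five_number_summary(data):
    """Return (smallest, Q1, median, Q3, largest).

    Quartiles come from statistics.quantiles with its default 'exclusive'
    method, which places quartiles at i * (n + 1) / 4 and interpolates.
    """
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, q2, q3, max(data)


# Assumed data: the six values from the quartile example slides.
data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]
summary = five_number_summary(data)

labels = ["Xsmallest", "Q1", "Median", "Q3", "Xlargest"]
for label, value in zip(labels, summary):
    print(f"{label:<10}{value:6.2f}")
```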

92 Shape & Box Plot
[Figure: box plots for Left-Skewed, Symmetric, and Right-Skewed distributions, each marking Q1, Median, and Q3]

93 Graphing Bivariate Relationships

94 Graphing Bivariate Relationships
Describes a relationship between two quantitative variables
Plot the data in a Scattergram
[Figure: three scattergrams of y against x showing a positive relationship, a negative relationship, and no relationship]

95 Scattergram Example
You’re a marketing analyst for Hasbro Toys. You gather the following data:
Ad $ (x)   Sales (Units) (y)
Draw a scattergram of the data

96 Scattergram Example
[Figure: scattergram with Advertising on the horizontal axis (1 to 5) and Sales on the vertical axis (1 to 4)]

97 Time Series Plot

98 Time Series Plot
Used to graphically display data produced over time
Shows trends and changes in the data over time
Time recorded on the horizontal axis
Measurements recorded on the vertical axis
Points connected by straight lines

99 Time Series Plot Example
The following data show the average retail price of regular gasoline in New York City for 8 weeks in 2006. Draw a time series plot for these data.

Date           Average Price
Oct 16, 2006   $2.219
Oct 23, 2006   $2.173
Oct 30, 2006   $2.177
Nov 6, 2006    $2.158
Nov 13, 2006   $2.185
Nov 20, 2006   $2.208
Nov 27, 2006   $2.236
Dec 4, 2006    $2.298
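
Before plotting, the week-to-week changes can be tabulated to see the trend that the connected points would show. This sketch only does the arithmetic on the eight prices in the table; it does not draw the plot, and the variable names are illustrative.

```python
from datetime import date

# (week, average price in dollars) from the gasoline price table.
prices = [
    (date(2006, 10, 16), 2.219),
    (date(2006, 10, 23), 2.173),
    (date(2006, 10, 30), 2.177),
    (date(2006, 11, 6), 2.158),
    (date(2006, 11, 13), 2.185),
    (date(2006, 11, 20), 2.208),
    (date(2006, 11, 27), 2.236),
    (date(2006, 12, 4), 2.298),
]

# Change from each week to the next, rounded to tenths of a cent.
changes = [
    (later_day, round(later_price - earlier_price, 3))
    for (_, earlier_price), (later_day, later_price) in zip(prices, prices[1:])
]

low_day, low_price = min(prices, key=lambda item: item[1])

for day, change in changes:
    print(f"{day:%b %d}: {change:+.3f}")
print(f"Lowest price: ${low_price} on {low_day:%b %d, %Y}")
```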

100 Time Series Plot Example
Price Date

101 Distorting the Truth with Descriptive Techniques

102 Errors in Presenting Data
Using ‘chart junk’
No relative basis in comparing data batches
Compressing the vertical axis
No zero point on the vertical axis

103 ‘Chart Junk’
[Figure: bad presentation vs. good presentation of the Minimum Wage. Data: 1960: $1.00; 1970: $1.60; 1980: $3.10; 1990: $3.80. The good presentation plots $ on the vertical axis against 1960, 1970, 1980, 1990 on the horizontal axis.]

104 No Relative Basis
[Figure: A’s by Class for FR, SO, JR, SR. Bad presentation: vertical axis shows Freq. (ticks at 100, 200, 300). Good presentation: vertical axis shows % (ticks at 0%, 10%, 20%, 30%).]

105 Compressing Vertical Axis
[Figure: Quarterly Sales ($) for Q1, Q2, Q3, Q4. Bad presentation: vertical axis ticks at 100 and 200. Good presentation: vertical axis ticks at 25 and 50.]

106 No Zero Point on Vertical Axis
[Figure: Monthly Sales ($) for J, M, M, J, S, N. Bad presentation: vertical axis ticks at 36, 39, 42, 45. Good presentation: vertical axis ticks at 20, 40, 60.]
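
The effect of a missing zero point can be quantified by comparing how tall two bars look with how large the values really are. In the sketch below the two sales figures (39 and 42) are hypothetical, chosen only to fall within the tick range of the bad presentation; `apparent_ratio` is an illustrative name.

```python
def apparent_ratio(smaller, larger, axis_min=0.0):
    """Ratio of drawn bar heights when the vertical axis starts at axis_min."""
    if min(smaller, larger) <= axis_min:
        raise ValueError("both values must lie above the axis minimum")
    return (larger - axis_min) / (smaller - axis_min)


# Hypothetical monthly sales values, not taken from the slide.
low_month, high_month = 39, 42

true_ratio = apparent_ratio(low_month, high_month)                    # axis starts at zero
truncated_ratio = apparent_ratio(low_month, high_month, axis_min=36)  # axis starts at 36

print(f"True ratio of values:        {true_ratio:.3f}")
print(f"Ratio of bar heights from 36: {truncated_ratio:.1f}")
```

With these assumed values, a difference of about 8% is drawn as a bar twice as tall, which is the kind of distortion the slide warns about.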

107 Conclusion
Described Qualitative Data Graphically
Described Numerical Data Graphically
Explained Numerical Data Properties
Described Summary Measures
Analyzed Numerical Data Using Summary Measures

