Descriptive Statistics:


1 Descriptive Statistics:
Part II Each slide has its own narration in an audio file. For the explanation of any slide click on the audio icon to start it. Professor Friedman's Statistics Course by H & L Friedman is licensed under a  Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. 

2 Shape
A third important property of data, after location and dispersion, is its shape. Shape can be described by the degree of asymmetry (i.e., skewness).
mean > median: positive or right-skewness
mean = median: symmetric or zero-skewness
mean < median: negative or left-skewness
Positive skewness can arise when the mean is increased by some unusually high values. Negative skewness can arise when the mean is decreased by some unusually low values. Descriptive Statistics II

3 Skewness
[Figure: three example distributions, labeled left skewed, right skewed, and symmetric.]
Source: Levine et al., Business Statistics, Pearson, 2013.

4 Example: # hours to complete a task
Data (for n = 12 employees): ┋ ┋ ┋
This guy took a VERY long time! (callout on the 63-hour value)
\(\bar{X}\) = 180/12 = 15 hours
Median = 10 hours
The (extremely slow) employee who took 63 hours to complete the task skewed the entire distribution to the right.
s² = 2868 / 11 = 260.73
s = 16.15 hours
CV = 107.7%
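The summary figures on this slide can be rechecked with a few lines of Python. This is a minimal sketch that uses only the totals quoted on the slide (sum of 180 hours, n = 12, sum of squared deviations 2868, median 10); the variable names are illustrative assumptions, not part of the course material.

```python
import math

# Totals quoted on the slide (the individual data values are not all listed).
n = 12
total_hours = 180
sum_sq_dev = 2868      # sum of squared deviations from the mean
median = 10

mean = total_hours / n                 # sample mean
variance = sum_sq_dev / (n - 1)        # sample variance uses n - 1
std_dev = math.sqrt(variance)          # sample standard deviation
cv_percent = std_dev / mean * 100      # coefficient of variation, in percent

# Compare mean and median to describe the direction of skew.
if mean > median:
    shape = "right-skewed"
elif mean < median:
    shape = "left-skewed"
else:
    shape = "symmetric"

print(f"mean={mean:.2f}  s^2={variance:.2f}  s={std_dev:.2f}  "
      f"CV={cv_percent:.1f}%  shape={shape}")
```

Carried at full precision the CV comes out near 107.6%; the slide's 107.7% follows from rounding s to 16.15 before dividing by the mean.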

5 Example Using MS Excel
Scores of 17 students on a national calculus exam.
Data: 0, 0, 10, 12, 15, 18, 20, 25, 30, 33, 34, 41, 56, 87, 92, 94, 95
Open MS Excel. Go to Data Analysis > Analysis Tools > Descriptive Statistics. If Data Analysis is not available, use the Add-ins feature to add it (the Analysis ToolPak) to MS Excel. Make sure to check the Summary Statistics box once you are in the Descriptive Statistics dialog. See the MS Excel output on the next slide.

6 Using MS Excel
MS Excel uses a formula, a bias-adjusted version of the Pearson moment coefficient of skewness, to calculate skewness. You do not have to know the formula. If the coefficient is 0 or very close to it, you have a symmetric distribution.
From the output:
mean is 38.94
median is 30
mode is 0
standard deviation is 33.44
variance is 1118.43
skewness is .78 (positive)
range is 95
n is 17
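For readers without Excel, the same summary can be approximated in Python. The sketch below uses the standard `statistics` module on the 17 exam scores from the previous slide. The `sample_skewness` helper is a hypothetical name; it implements the bias-adjusted moment formula that Excel documents for its SKEW function, and it is an assumption here that the Descriptive Statistics tool reports the same quantity.

```python
import statistics

scores = [0, 0, 10, 12, 15, 18, 20, 25, 30, 33, 34, 41, 56, 87, 92, 94, 95]

def sample_skewness(data):
    """Bias-adjusted moment skewness: n/((n-1)(n-2)) * sum(((x - mean)/s)^3)."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    cubed = sum(((x - mean) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * cubed

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)
std_dev = statistics.stdev(scores)
variance = statistics.variance(scores)
skew = sample_skewness(scores)
value_range = max(scores) - min(scores)

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"s={std_dev:.2f} variance={variance:.2f} skewness={skew:.2f}")
print(f"range={value_range} n={len(scores)}")
```

A positive result agrees with the mean (38.94) sitting above the median (30).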

7 Standardizing Data: Z-Scores
We can convert the original scores to new scores with \(\bar{X} = 0\) and s = 1. This will give us a pure number with no units of measurement.
Any score below the mean will now be negative.
Any score at the mean will be 0.
Any score above the mean will be positive.

8 Standardizing Data: Z-Scores
To compute the Z-scores: \( Z = \frac{X - \bar{X}}{s} \)
Example. Data: 0, 2, 4, 6, 8, 10
\(\bar{X}\) = 30/6 = 5; s = 3.74

X     (X - \(\bar{X}\))/s    Z
0     (0 - 5)/3.74     -1.34
2     (2 - 5)/3.74     -.80
4     (4 - 5)/3.74     -.27
6     (6 - 5)/3.74      .27
8     (8 - 5)/3.74      .80
10    (10 - 5)/3.74    1.34
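The worked example can be reproduced programmatically. The following is a minimal sketch using Python's `statistics` module; the function name `z_scores` is an illustrative assumption, and the sample standard deviation (n - 1 divisor) is used because that is what yields the slide's s = 3.74.

```python
import statistics

def z_scores(data):
    """Standardize each value: subtract the mean, divide by the sample s."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return [(x - mean) / s for x in data]

data = [0, 2, 4, 6, 8, 10]
z = z_scores(data)

for x, zi in zip(data, z):
    print(f"{x:>3}  {zi:6.2f}")
```

After standardizing, the new scores have mean 0 and standard deviation 1, whatever the original units were.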

9 Standardizing Data: Z-Scores
Data: Exam Scores

X      Z (original data)    Z (7 changed to 97)    Z (23 also changed to 93)
65     -0.45                -0.81                  -1.40
73     -0.11                -0.38                  -0.79
78      0.10                -0.10                  -0.40
69     -0.28                -0.60                  -1.09
7      -2.89                 0.94 (X = 97)          1.07
23     -2.21                -3.12                   0.76 (X = 93)
98      0.94                 0.99                   1.14
99      0.99                 1.05                   1.22
97      0.90                 0.94                   1.07
75     -0.02                -0.27                  -0.63
79      0.14                -0.05                  -0.32
85      0.40                 0.28                   0.14
63     -0.53                -0.92                  -1.56
67     -0.36                -0.70                  -1.25
72     -0.15                -0.43                  -0.86
93      0.73                 0.72                   0.76
95      0.82                 0.83                   0.91
Mean   75.57                79.86                  83.19
s      23.75                18.24                  12.96

10 Standardizing Data: Z-Scores
No matter what you are measuring, a Z-score of more than +5 or less than -5 would indicate a very, very unusual score.
If the standardized data follow a normal distribution, about 95% of the data will fall within ±2 standard deviations of the mean (between -2 and +2), 99.7% will fall between -3 and +3, and 99.99% will fall between -4 and +4.
Worst case scenario: for any distribution, at least 75% of the data lie within 2 standard deviations of the mean. [Chebyshev.]
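Both sets of percentages can be checked numerically. This sketch assumes the standard results: for a normal distribution the share within k standard deviations is erf(k/√2), and Chebyshev's inequality guarantees at least 1 - 1/k² for any distribution. The function names are illustrative.

```python
import math

def normal_coverage(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

def chebyshev_bound(k):
    """Minimum share within k standard deviations, for any distribution (k > 1)."""
    return 1 - 1 / k**2

for k in (2, 3, 4):
    print(f"k={k}: normal {normal_coverage(k):.4%}, "
          f"Chebyshev at least {chebyshev_bound(k):.2%}")
```

The gap between the two columns shows how much the normality assumption buys: with k = 2, the guarantee rises from 75% to roughly 95%.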

11 Five Number Summary
When examining a distribution for shape, sometimes the five number summary is useful:
Smallest | Q1 | Median | Q3 | Largest
Example: \(\bar{X}\) = 15
5-number summary: 2 | 8 | 10 | 16.5 | 63
This data set is right-skewed. In right-skewed distributions, the distance from Q3 to Xlargest (16.5 to 63) is significantly greater than the distance from Xsmallest to Q1 (2 to 8).
[Figure: the values 2 3 8 9 10 12 15 18 22 63 on a number line, with Smallest, Q1, Median, Q3, and Largest marked.]
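A five-number summary is straightforward to compute by hand or in code. The sketch below is one possible approach, not necessarily the course's: quartile conventions differ between textbooks and software, and this version takes Q1 and Q3 as the medians of the lower and upper halves (dropping the middle value when n is odd), which is an assumption. Because the task-time data are not fully listed in this transcript, it is applied to the 17 calculus exam scores from the Excel example instead.

```python
import statistics

def five_number_summary(data):
    """Return (smallest, Q1, median, Q3, largest) using the median-of-halves rule."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # When n is odd, leave the middle value out of both halves.
    upper = values[half + 1:] if n % 2 else values[half:]
    return (values[0], statistics.median(lower), statistics.median(values),
            statistics.median(upper), values[-1])

scores = [0, 0, 10, 12, 15, 18, 20, 25, 30, 33, 34, 41, 56, 87, 92, 94, 95]
smallest, q1, median, q3, largest = five_number_summary(scores)

print(f"{smallest} | {q1} | {median} | {q3} | {largest}")
print("upper tail:", largest - q3, " lower tail:", q1 - smallest)
```

The upper tail (Q3 to largest) is longer than the lower tail (smallest to Q1), which matches the positive skewness reported for these scores.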

12 Boxplot
The boxplot is a way to graphically portray a distribution of data by means of its five-number summary. A boxplot can be drawn horizontally or vertically. In a horizontal boxplot:
Vertical line drawn within the box is the median
Vertical line at the left side of the box is Q1
Vertical line at the right side of the box is Q3
Line on the left connects the left side of the box with Xsmallest (lower 25% of data)
Line on the right connects the right side of the box with Xlargest (upper 25% of data)

13 Boxplot
A “bell-shaped” symmetric data distribution would look like this:
[Figure: boxplot of a symmetric distribution.]

14 Categorical Data We summarize categorical data using frequencies and graphical methods.

15 Working with Frequencies
A frequency distribution records data grouped into classes and the number of observations that fell into each class. A frequency distribution can be used for:
categorical data
numerical data that can be grouped into intervals
numerical data with repeated observations
A percentage distribution records the percent of the observations that fell into each class.

16 Working with Frequencies
Example. A sample was taken of 200 professors at a (fictitious) local college. Each was asked for his or her (take-home) weekly salary. The responses ranged from about $520 to $590. If we wanted to display the data in, say, 7 equal intervals, we would use an interval width of $10.
\( \text{Width of interval} = \frac{\text{Range}}{\text{Number of classes}} = \frac{\$70}{7} = \$10 \) per class.
The Frequency / Percentage Distribution:

Take-home pay          frequency    percentage
520 and under 530          6            3 %
530  "    "   540         30           15
540  "    "   550         38           19
550  "    "   560         52           26
560  "    "   570         42           21
570  "    "   580         24           12
580 to 590                 8            4
Total                    200          100
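The interval width and the percentage column follow directly from the frequencies. Here is a minimal sketch of that arithmetic; the variable names are illustrative, and the class frequencies are the ones given in the example.

```python
low, high, num_classes = 520, 590, 7
width = (high - low) / num_classes          # range / number of classes

frequencies = [6, 30, 38, 52, 42, 24, 8]    # observations per class
total = sum(frequencies)
percentages = [100 * f / total for f in frequencies]

for i, (f, p) in enumerate(zip(frequencies, percentages)):
    lower = low + i * width
    print(f"{lower:.0f} and under {lower + width:.0f}: {f:>3}  {p:4.0f}%")
print(f"Total: {total}  {sum(percentages):.0f}%")
```

Dividing each frequency by the total (200) and multiplying by 100 turns the frequency distribution into the percentage distribution.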

17 Working with Frequencies
A Cumulative Distribution focuses on the number or percentage of cases that lie below or above specified values rather than within intervals.

Take-home pay      cumulative frequency    cumulative percentage
less than 520             0                       0
 "    "   530             6                       3
 "    "   540            36                      18
 "    "   550            74                      37
 "    "   560           126                      63
 "    "   570           168                      84
 "    "   580           192                      96
 "    "   590           200                     100
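A cumulative distribution is a running total of the class frequencies. The sketch below builds it with `itertools.accumulate` from the salary frequencies in the preceding example; the variable names are illustrative assumptions.

```python
from itertools import accumulate

frequencies = [6, 30, 38, 52, 42, 24, 8]            # per-class counts
upper_bounds = [530, 540, 550, 560, 570, 580, 590]   # upper limit of each class

cumulative = list(accumulate(frequencies))           # running totals
total = cumulative[-1]
cumulative_pct = [100 * c / total for c in cumulative]

for bound, c, p in zip(upper_bounds, cumulative, cumulative_pct):
    print(f"less than {bound}: {c:>3}  {p:4.0f}%")
```

Each row answers "how many professors take home less than this amount?", so the last row always equals the sample size and 100%.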

18 Working with Frequencies
The Frequency Histogram:

19 Working with Frequencies
The Frequency Polygon

20 Working with Frequencies
The Cumulative Frequency Distribution

21 Descriptive Statistics – 2 variables
Categorical Data: graphical representation
  Contingency Table
  Side-by-Side Bar Chart
Numerical Data: looking for relationships in bivariate data
  Scatter Plot
  Correlation
  The Regression Line

22 The Contingency Table
Two categorical variables are most easily displayed in a contingency table. This is a table of two-way frequencies.
Example: “Who would you vote for in the next election?”

                        Male    Female    Total
Republican Candidate     250      250       500
Democrat Candidate       150      350       500
Total                    400      600      1000

This also works for two-way percentages.
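The totals and the two-way percentages can be derived from the four cell counts. This sketch assumes the cells are 250 and 250 for the Republican candidate and 150 and 350 for the Democrat candidate, which is the reading consistent with the column totals of 400 and 600 and the grand total of 1000; the dictionary layout is an illustrative choice.

```python
# Cell counts: rows are candidates, columns are respondent gender (assumed reading).
counts = {
    "Republican Candidate": {"Male": 250, "Female": 250},
    "Democrat Candidate": {"Male": 150, "Female": 350},
}

row_totals = {row: sum(cells.values()) for row, cells in counts.items()}

column_totals = {}
for cells in counts.values():
    for column, count in cells.items():
        column_totals[column] = column_totals.get(column, 0) + count

grand_total = sum(row_totals.values())

# Two-way percentages: each cell as a percent of the grand total.
percentages = {
    row: {column: 100 * count / grand_total for column, count in cells.items()}
    for row, cells in counts.items()
}

print(row_totals, column_totals, grand_total)
print(percentages)
```

Dividing by a row total or a column total instead of the grand total would give row or column percentages, which answer different questions about the same table.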

23 The Side-by-Side Bar Chart
Chart: Relative Performance (Source: Microsoft.com)

24 The Scatter Plot
What can we do with 2 numerical variables? We can graph them.
Example: Grade and Height (in inches)
Y (Grade): 100 95 90 80 70 65 60 40 30 20
X (Height): 73 79 62 69 74 77 81 63 68

25 The Scatter Plot
Correlation coefficient is r = .12
Coefficient of determination is r² = .01
We will learn about the above measures, as well as more about scatter plots, in the topic on CORRELATION.

26 Homework Practice, practice, practice.
As always, do lots and lots of problems. You can find these in the online lecture notes and homework assignments.

