Published by Τερέντιος Βασιλόπουλος. Modified over 6 years ago.
1
Week 2 Lecture 2 Chapter 3. Displaying and Summarizing Quantitative Data
2
The spread of a distribution
Range: a simple measure of spread, calculated as maximum value – minimum value.
Consider the calories example (ordered data below).
Range = 400 – 180 = 220
Note: the range is affected by extremely low or extremely high values in the data.
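As a minimal sketch of the range calculation: the function name `value_range` is our own, and the only calories values used are the minimum (180) and maximum (400) reported in this lecture's five-number summary. The small lists in the second part are made up purely to illustrate how one extreme value changes the range.

```python
def value_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)

# Only the two extremes matter, so the reported min and max are enough.
calories_min, calories_max = 180, 400
print(value_range([calories_min, calories_max]))  # 220

# Hypothetical data: a single extreme value changes the range a lot.
print(value_range([2, 3, 7]))    # 5
print(value_range([2, 3, 70]))   # 68
```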
3
The Spread of Distribution
Think about the distances between the observations and the mean. We want to find the distance of each observation from the sample mean.
Example: Data: 2, 3, 7 (mean = 4)
Some distances are below the mean (negative); some are above the mean (positive):
2 – 4 = -2
3 – 4 = -1
7 – 4 = +3
Add these distances: (-2) + (-1) + (+3) = 0 (this does not help us find the sample variation!)
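The cancellation can be checked directly. This short sketch uses the slide's data (2, 3, 7); the variable names are our own.

```python
data = [2, 3, 7]
mean = sum(data) / len(data)            # 4.0

# Signed distance of each observation from the mean.
deviations = [y - mean for y in data]
print(deviations)                       # [-2.0, -1.0, 3.0]

# Negative and positive distances cancel, so the sum says nothing about spread.
print(sum(deviations))                  # 0.0
```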
4
The Spread of Distribution
Think about the distances between the observations and the mean.
Population variance: the average of the squared deviations of the observations from the population (true) mean.
Sample variance: the average of the squared deviations of the observations from the sample mean.
What would be the variance for the following data sets?
Data set #1: 7, 7, 7
Data set #2: 2, 3, 7
5
The Spread of Distribution
Borrowing from the Pythagorean theorem:
Sum of squared distances in the data (2, 3, 7):
$(2 - 4)^2 = 4$
$(3 - 4)^2 = 1$
$(7 - 4)^2 = 9$
Add these squared distances: 4 + 1 + 9 = 14
Take an “adjusted” average: divide by n – 1 = 3 – 1 = 2 (because we estimated the mean, we subtract 1 from n).
In the data example above, the sample variance is $\frac{14}{2} = 7$.
In general, the sample variance is $\frac{\text{sum of squared distances}}{n - 1}$.
We denote the sample variance by $S^2$:
$S^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$
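A minimal sketch of the sample variance formula, applied to the two small data sets from these slides. The function name `sample_variance` is our own. Python's standard `statistics.variance` also divides by n – 1, so it is used here only as a cross-check.

```python
import statistics

def sample_variance(values):
    """Sum of squared distances from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_distances = [(y - mean) ** 2 for y in values]
    return sum(squared_distances) / (n - 1)

print(sample_variance([2, 3, 7]))   # 7.0  (14 / 2)
print(sample_variance([7, 7, 7]))   # 0.0  (no spread at all)

# Cross-check against the standard library, which uses the same n - 1 divisor.
print(statistics.variance([2, 3, 7]) == sample_variance([2, 3, 7]))  # True
```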
6
The Spread of Distribution
However, the sample variance is a “squared idea”.
Recall: the sample variance is $\frac{\text{sum of squared distances}}{n - 1}$.
We take its square root to get the “typical spread”. We call this typical spread of the data the sample standard deviation, and we denote it by S:
$S = \sqrt{S^2}$
For example, in our data, $S = \sqrt{7} \approx 2.65$.
S measures the spread about the mean $\bar{y}$.
S can be 0 or positive; S is 0 when $S^2$ is 0, e.g., quiz data for 3 students: 10, 10, 10; $\bar{y} = 10$; $S^2 = 0$; S = 0.
Note: variance and standard deviation are affected by (not resistant to) outliers.
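The following sketch computes the sample standard deviation for the slide's examples. The function name `sample_std` is our own, and the data set containing 70 is a made-up illustration of how one outlier inflates S.

```python
import math

def sample_std(values):
    """Square root of the sample variance (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((y - mean) ** 2 for y in values) / (n - 1)
    return math.sqrt(variance)

print(round(sample_std([2, 3, 7]), 2))   # 2.65
print(sample_std([10, 10, 10]))          # 0.0, identical quiz scores

# Hypothetical data: replacing 7 with an outlier of 70 makes S much larger.
print(sample_std([2, 3, 70]) > sample_std([2, 3, 7]))  # True
```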
7
The Spread of Distribution
In our calories example:
$S^2 = \frac{(180 - 251)^2 + (190 - 251)^2 + \dots + (400 - 251)^2}{30 - 1} = 2505.8621$
$S = +\sqrt{2505.8621} = 50.06$
StatCrunch: stat>summary stats>Select Column(s)>Calories
8
The Spread of Distribution
Interquartile Range (IQR): We can divide the data into quartiles.
Recall that the median is the 50th percentile (the second quartile, Q2). The median has half the data below it and half the data above it.
We define the median of the first half as the first quartile (Q1, 25th percentile). It has 1/4 (25%) of the observations below it and 3/4 (75%) of the observations above it.
We define the median of the second half as the third quartile (Q3, 75th percentile). It has 3/4 (75%) of the observations below it and 1/4 (25%) of the observations above it.
IQR = 75th percentile (Q3) – 25th percentile (Q1)
In our calories example:
Q1 is in the 8th position in the first half of the data set (among the first 15 observations): 210 calories.
Q3 is in the 8th position in the second half of the data set (among the second 15 observations): 280 calories.
So, we can calculate the IQR = 280 – 210 = 70
Note: IQR is NOT affected by (is resistant to) outliers.
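The “median of each half” rule can be sketched as follows. The function names and the eight-value data set are our own illustrations, not the donut data. For an odd number of observations this sketch leaves the overall median out of both halves, which is one common convention; the slides only cover the even case, and statistical software may use a different interpolation rule and report slightly different quartiles.

```python
def median(sorted_values):
    """Median of an already sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule."""
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower = s[:half]
    # Assumption: for odd n, the middle value belongs to neither half.
    upper = s[half:] if n % 2 == 0 else s[half + 1:]
    return median(lower), median(s), median(upper)

data = [1, 3, 4, 6, 8, 9, 11, 15]          # hypothetical data
q1, q2, q3 = quartiles(data)
print(q1, q2, q3, q3 - q1)                  # 3.5 7.0 10.0 6.5

# Turning the largest value into an extreme outlier leaves the IQR unchanged.
q1_out, _, q3_out = quartiles([1, 3, 4, 6, 8, 9, 11, 150])
print(q3_out - q1_out)                      # 6.5
```

With 15 observations in each half, as in the calories example, `median` picks the 8th value of that half, which matches the positions quoted above.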
9
The Spread of Distribution
The 5-number summary (needed for the boxplot, later):
Minimum
Lower quartile (Q1)
Median (Q2)
Upper quartile (Q3)
Maximum
In our calories example: min=180, Q1=210, median=250, Q3=280, max=400
StatCrunch: stat>summary stats>Select Column(s)>Calories
10
Boxplot of Calories of Donuts
Recall the 5-number summary from our Tim Horton’s example: calories of 30 donuts.
min=180, Q1=210, median=250, Q3=280, max=400
StatCrunch: Graph>boxplot>Select Column(s)>Calories
Select: use fences to identify outliers
11
Boxplot of Calories of Donuts
You can go back and edit the image in StatCrunch and select the option: draw boxes horizontally.
Shape: slightly right skewed.
Centre: the median is almost in the middle of the box.
Spread (IQR): the middle 50% of the donuts have between 210 (Q1) and 280 (Q3) calories.
Extreme value(s): one point is plotted individually at 400 (one donut with 400 calories; we will check later whether this value is a potential or definite outlier).
12
Boxplot of Calories of Donuts
Locate the min and the max on a horizontal line.
Locate the median and draw a vertical line above the median (away from the horizontal line).
Locate Q1 and Q3, and draw vertical lines above these values.
Draw a rectangle using the vertical lines from Q1, Q2, and Q3.
Calculate IQR = Q3 – Q1 = 280 – 210 = 70
Calculate Rule (R) = 1.5 IQR = 1.5 x 70 = 105
Calculate inner fences:
Lower inner fence = Q1 – 1.5(IQR) = 210 – 1.5(70) = 105
Upper inner fence = Q3 + 1.5(IQR) = 280 + 1.5(70) = 385
Draw lines (whiskers) connecting the box to the most extreme values within the fences:
From Q1, draw a line to the smallest value ≥ 105 in the data (in our example it will be 180).
From Q3, draw a line to the largest value ≤ 385 in the data (in our example it will be 340).
Plot values outside the fences individually. These points are suspected outliers. They are also known as extreme values. In our example, 400 is plotted individually.
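The fence and whisker calculations above can be sketched as follows. The function name `boxplot_fences` and its dictionary keys are our own. The list `mentioned_values` contains only the calorie values quoted in this lecture, not the full set of 30 donuts, and Q1 and Q3 are passed in as reported rather than recomputed.

```python
def boxplot_fences(values, q1, q3, k=1.5):
    """Inner fences, whisker ends, and suspected outliers for a boxplot."""
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outside = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "lower_whisker": min(inside),   # most extreme value within the fences
        "upper_whisker": max(inside),
        "suspected_outliers": outside,  # plotted individually
    }

# Only the values quoted in the lecture (an assumption, not the full data set).
mentioned_values = [180, 190, 210, 250, 280, 340, 400]
result = boxplot_fences(mentioned_values, q1=210, q3=280)
print(result["lower_fence"], result["upper_fence"])      # 105.0 385.0
print(result["lower_whisker"], result["upper_whisker"])  # 180 340
print(result["suspected_outliers"])                      # [400]
```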
13
Summary Statistics for Calories of Donuts
StatCrunch: stat>summary stats>Select Column(s)>Calories