Data Analysis and Statistical Software I Quarter: Spring 2003


Data Analysis and Statistical Software I Quarter: Spring 2003 Daniela Stan Raicu School of CTI, DePaul University 1/18/2019 Daniela Stan - CSC323

Outline
- Describing distributions with numbers (continued from the previous lecture)
- The 1.5 X IQR criterion for suspected outliers
- Measuring spread: the standard deviation
- Normal Distribution
- Standard Normal Distribution

Describing Distributions (cont.)
Measuring spread: the quartiles
The pth percentile of a distribution is the value such that p percent of the observations fall at or below it.
- The 50th percentile = median, M
- The 25th percentile = first quartile, Q1
- The 75th percentile = third quartile, Q3

Describing Distributions (cont.)
To calculate the quartiles:
1. Arrange the observations in increasing order and locate the median M in the list of observations.
2. The first quartile Q1 is the median of the observations whose position in the ordered list is to the left of the location of the overall median.
3. The third quartile Q3 is the median of the observations whose position in the ordered list is to the right of the location of the overall median.
Example: 1.13 13 16 19 21 21 23 23 24 26 26 27 27 27 28 28 30 30
M = ?, Q1 = ?, Q3 = ?

Describing Distributions (cont.)
The Five-Number Summary of a set of observations consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest. In symbols, the five-number summary is
Minimum  Q1  M  Q3  Maximum
A boxplot is a graph of the five-number summary:
- A central box spans the quartiles Q1 and Q3.
- A line in the box marks the median M.
- Lines extend from the box out to the smallest and largest observations.

Weight Data: Sorted

Weight Data: Quartiles
Stemplot (stem = hundreds and tens, leaf = ones):
10 | 0166
11 | 009
12 | 0034578
13 | 00359
14 | 08
15 | 00257
16 | 555
17 | 000255
18 | 000055567
19 | 245
20 | 3
21 | 025
22 | 0
23 |
24 |
25 |
26 | 0
first quartile: Q1 = 127.5
median or second quartile: Q2 = 165
third quartile: Q3 = 185

Five-Number Summary
minimum = 100
first quartile = 127.5
second quartile = 165
third quartile = 185
maximum = 260
interquartile range = Q3 - Q1 = 57.5
range = max - min = 160
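As a check on the numbers above, here is a minimal Python sketch that rebuilds the weight data from the stemplot and applies the quartile rule from the earlier slide (median of the observations to the left and to the right of the overall median). The names `stemplot`, `weights`, and `five_number_summary` are illustrative and not part of the lecture; statistical packages may use slightly different quartile conventions.

```python
from statistics import median

# Weights read off the stemplot (stem = hundreds and tens, leaf = ones).
stemplot = {
    10: "0166", 11: "009", 12: "0034578", 13: "00359", 14: "08",
    15: "00257", 16: "555", 17: "000255", 18: "000055567", 19: "245",
    20: "3", 21: "025", 22: "0", 26: "0",
}
weights = sorted(
    stem * 10 + int(leaf) for stem, leaves in stemplot.items() for leaf in leaves
)


def five_number_summary(sorted_values):
    """Return (min, Q1, M, Q3, max) using the lecture's quartile rule."""
    n = len(sorted_values)
    half = n // 2
    lower = sorted_values[:half]  # observations left of the median
    # For odd n the median itself is excluded from both halves.
    upper = sorted_values[half + 1:] if n % 2 else sorted_values[half:]
    return (
        sorted_values[0],
        median(lower),
        median(sorted_values),
        median(upper),
        sorted_values[-1],
    )


minimum, q1, m, q3, maximum = five_number_summary(weights)
iqr = q3 - q1                      # interquartile range
data_range = maximum - minimum     # range = max - min
print(minimum, q1, m, q3, maximum, iqr, data_range)
```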

Five-Number Summary: Boxplot
[Figure: boxplot of the weight data with min, Q1, M, Q3, and max marked; horizontal axis: Weight, from 100 to 275]

Recommended Problems
Chapter 1: Section 1.1
IPS web site: http://www.whfreeman.com/ips4e

The 1.5 X IQR criterion
The interquartile range (IQR) is the distance between the first and third quartiles: IQR = Q3 - Q1
The 1.5 X IQR criterion for outliers: An observation is a suspected outlier if it falls more than 1.5 X IQR above the third quartile or below the first quartile.
Modified boxplot:
- the lines extend out from the central box only to the smallest and largest observations that are not suspected outliers.
- the suspected outliers are plotted as individual points.

The 1.5 X IQR criterion (cont.)
Examples 1.9/page 14 & 1.17/page 46

The 1.5 X IQR criterion (cont.)
[Figure: histogram of the percent of Hispanics in the adult population]
Shape? Skewed to the right, with a single peak at the left.
Outliers? The one state that stands out is New Mexico, with 38.7%.

The 1.5 X IQR criterion (cont.)
The five-number summary is:
Minimum = 0.6, Q1 = 2.0, M = 4.1, Q3 = 7.0, Maximum = 38.7
The 1.5 X IQR criterion for outliers:
IQR = Q3 - Q1 = 5
1.5 X IQR = 7.5
Suspected outlier: any value below Q1 - 1.5 X IQR or above Q3 + 1.5 X IQR
Q1 - 1.5 X IQR = 2.0 - 7.5 = -5.5
Q3 + 1.5 X IQR = 7.0 + 7.5 = 14.5
There are 7 suspected outliers.
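The arithmetic on this slide can be reproduced with a short Python sketch. The function names `iqr_fences` and `is_suspected_outlier` are illustrative, not from the lecture; only the quartiles Q1 = 2.0 and Q3 = 7.0 and the New Mexico value 38.7 come from the slides.

```python
def iqr_fences(q1, q3):
    """Return the (lower, upper) cutoffs of the 1.5 X IQR criterion."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_suspected_outlier(x, q1, q3):
    """True if x lies more than 1.5 X IQR below Q1 or above Q3."""
    lower, upper = iqr_fences(q1, q3)
    return x < lower or x > upper


# Quartiles of the percent-Hispanic data from the slide.
lower, upper = iqr_fences(2.0, 7.0)
print(lower, upper)                          # cutoffs for suspected outliers
print(is_suspected_outlier(38.7, 2.0, 7.0))  # New Mexico
print(is_suspected_outlier(4.1, 2.0, 7.0))   # the median
```

Because no percentage can be negative, only the upper cutoff matters for this data set.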

The 1.5 X IQR criterion (cont.)
Modified boxplot: The points represent the suspected outliers.

Measuring the spread: Variance and Standard Deviation
- If all values are the same, what is the variation in the data?
- Variation exists when some values are above or below the mean.
- Each data value has an associated deviation from the mean.

Deviations and Variance
A deviation: what is a typical deviation from the mean?
- small values of this typical deviation indicate small variation in the data;
- large values of this typical deviation indicate large variation in the data.
Variance:
1. Find the mean.
2. Find the deviation of each value from the mean.
3. Square the deviations.
4. Sum the squared deviations.
5. Divide the sum by n-1.

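Measuring Spread: The standard deviation
The variance $s^2$ of a set of observations $x_1, x_2, \ldots, x_n$ is the average of the squares of the deviations of the observations from their mean $\bar{x}$:
$s^2 = \dfrac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$
or, in more compact notation,
$s^2 = \dfrac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$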

Measuring Spread: The standard deviation
The standard deviation $s$ is the square root of the variance $s^2$:
$s = \sqrt{\dfrac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
The number n-1 is called the degrees of freedom of the variance or standard deviation.
When is the standard deviation s equal to zero?
Is the standard deviation s a resistant measure?
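The recipe for the variance (mean, deviations, squares, sum, divide by n-1) followed by a square root can be sketched in Python as below. The function names and the small data set are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
from math import sqrt


def sample_variance(xs):
    """Variance s^2 with divisor n-1, following the five steps on the slide."""
    n = len(xs)
    mean = sum(xs) / n                     # 1. find the mean
    deviations = [x - mean for x in xs]    # 2. deviation of each value
    squared = [d * d for d in deviations]  # 3. square the deviations
    total = sum(squared)                   # 4. sum the squared deviations
    return total / (n - 1)                 # 5. divide by n-1


def sample_std(xs):
    """Standard deviation s, the square root of the variance."""
    return sqrt(sample_variance(xs))


data = [1, 2, 3, 4, 5]           # hypothetical observations, mean 3
print(sample_variance(data))     # squared deviations 4+1+0+1+4 = 10, divided by 4
print(sample_std(data))
print(sample_std([4, 4, 4]))     # all values equal: no spread at all
```

The last line illustrates the first question on the slide: when every observation equals the mean, all deviations are zero and so is s.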

The standard deviation (cont.)
Example: Problem 1.59
Choosing measures for center and spread:
- if the distribution is skewed, choose the five-number summary
- if the distribution is symmetric and free of outliers, choose the mean and the standard deviation

Density curves
Sometimes the overall pattern of a large number of observations is so regular that we can describe it by a smooth curve. The curve is a mathematical model for the distribution.
A density curve is a curve that is always on or above the horizontal axis and has area exactly 1 underneath it.
[Figure: histogram for all 947 seventh-grade students in Gary, Indiana, on the vocabulary part of the Iowa test, overlaid with a symmetric density curve]

The normal distributions
Normal curves are density curves that are:
- Symmetric
- Unimodal
- Bell-shaped

The normal distributions (cont.)
A normal distribution is specified by:
- Mean $\mu$
- Standard deviation $\sigma$
Notation: $N(\mu, \sigma)$
The equation of the normal distribution (it gives the height of the normal curve at x):
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$
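The height of the normal curve for a given mean and standard deviation is f(x) = exp(-((x - mu)/sigma)^2 / 2) / (sigma * sqrt(2*pi)). A minimal Python sketch of that formula follows; the function name `normal_pdf` is illustrative, and the example parameters are the men's heights (mean 70.0, standard deviation 2.8) used later in the lecture.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mu, sigma):
    """Height of the N(mu, sigma) density curve at x."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))


# The curve is symmetric about the mean and highest there.
print(normal_pdf(70.0, 70.0, 2.8))   # peak height
print(normal_pdf(67.2, 70.0, 2.8))   # one standard deviation below the mean
print(normal_pdf(72.8, 70.0, 2.8))   # one standard deviation above the mean
```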


The normal distributions (cont.)
[Figure: example of two normal curves f(x), specified by their mean and standard deviation]
Can we locate the standard deviation by eye?

The 68-95-99.7 rule
In the normal distribution $N(\mu, \sigma)$:
- Approximately 68% of the observations are between $\mu - \sigma$ and $\mu + \sigma$
- Approximately 95% of the observations are between $\mu - 2\sigma$ and $\mu + 2\sigma$
- Approximately 99.7% of the observations are between $\mu - 3\sigma$ and $\mu + 3\sigma$
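The three percentages can be checked numerically. For any normal distribution, the proportion within k standard deviations of the mean equals erf(k / sqrt(2)), where erf is the error function from Python's standard `math` module. The helper name `proportion_within` is illustrative.

```python
from math import erf, sqrt


def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    # Prints values close to 68%, 95%, and 99.7%.
    print(k, round(100 * proportion_within(k), 2))
```

The result does not depend on the mean or standard deviation, which is why the rule holds for every normal curve.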

Empirical Rule for Any Normal Curve
[Figure: normal curves showing 68% of the area within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations]

Health and Nutrition Examination Study of 1976-1980 (HANES)
Heights of adults, aged 18-24
- women: mean 65.0 inches, standard deviation 2.5 inches
- men: mean 70.0 inches, standard deviation 2.8 inches

Health and Nutrition Examination Study of 1976-1980 (HANES)
Empirical Rule
women
- 68% are between 62.5 and 67.5 inches [mean ± 1 std dev = 65.0 ± 2.5]
- 95% are between 60.0 and 70.0 inches
- 99.7% are between 57.5 and 72.5 inches
men
- 68% are between 67.2 and 72.8 inches
- 95% are between 64.4 and 75.6 inches
- 99.7% are between 61.6 and 78.4 inches
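These intervals are just mean ± 1, 2, and 3 standard deviations. A small Python sketch, with the illustrative function name `empirical_rule_intervals`, reproduces them from the HANES means and standard deviations; values are rounded to one decimal place to match the slide.

```python
def empirical_rule_intervals(mean, sd):
    """Return the 68%, 95%, and 99.7% intervals as (low, high) pairs."""
    return {
        label: (round(mean - k * sd, 1), round(mean + k * sd, 1))
        for k, label in ((1, "68%"), (2, "95%"), (3, "99.7%"))
    }


women = empirical_rule_intervals(65.0, 2.5)
men = empirical_rule_intervals(70.0, 2.8)
for label in ("68%", "95%", "99.7%"):
    print(label, "women:", women[label], "men:", men[label])
```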

With the Mean and Standard Deviation of the Normal Distribution We Can Determine:
- What proportion of individuals fall into any range of values. Example: What proportion of men are less than 68 inches tall?
- At what percentile a given individual falls, if you know their value
- What value corresponds to a given percentile
[Figure: normal curve over the height values, with 68 and 70 marked and the proportion below 68 shown as "?"]
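All three questions can be answered with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch under the assumption that men's heights follow N(70.0, 2.8) exactly; the lecture itself may have used a table of the standard normal distribution instead, and the variable names are illustrative.

```python
from statistics import NormalDist

# Assumed model for men's heights: N(70.0, 2.8), from the HANES slide.
men = NormalDist(mu=70.0, sigma=2.8)

# 1. Proportion in a range: men shorter than 68 inches (roughly 0.24).
p_below_68 = men.cdf(68)

# 2. Percentile of a given individual: a man who is 75.6 inches tall.
percentile_of_75_6 = 100 * men.cdf(75.6)

# 3. Value at a given percentile: the height at the 25th percentile.
height_at_25th = men.inv_cdf(0.25)

print(round(p_below_68, 4), round(percentile_of_75_6, 1), round(height_at_25th, 2))
```

Since 75.6 is two standard deviations above the mean, its percentile should agree with the 68-95-99.7 rule: about 95% within two standard deviations leaves about 2.5% above.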