Data Analysis and Statistical Software I, Quarter: Autumn 02/03


Data Analysis and Statistical Software I (323-21-403)
Quarter: Autumn 02/03
Daniela Stan, PhD
Course homepage: http://facweb.cs.depaul.edu/Dstan/csc323
Office hours (no appointment needed):
- M, 3:00pm - 3:45pm at LOOP, CST 471
- W, 3:00pm - 3:45pm at LOOP, CST 471
1/13/2019 Daniela Stan - CSC323

Outline
- The 1.5 × IQR criterion for suspected outliers
- Measuring spread: the standard deviation
- Normal Distribution
- Standard Normal Distribution
- Introduction to SAS

The 1.5 × IQR criterion
The interquartile range (IQR) is the distance between the first and third quartiles: IQR = Q3 - Q1.
The 1.5 × IQR criterion for outliers: an observation is a suspected outlier if it falls more than 1.5 × IQR above the third quartile or below the first quartile.
Modified boxplot:
- the lines extend out from the central box only to the smallest and largest observations that are not suspected outliers.
- the suspected outliers are plotted as individual points.

The 1.5 × IQR criterion (cont.)
Examples 1.9/page 14 & 1.17/page 46

The 1.5 × IQR criterion (cont.)
Figure: histogram of the percent of Hispanics in the adult population.
Shape? Skewed to the right, with a single peak at the left.
Outliers? The one state that stands out is New Mexico, with 38.7%.

The 1.5 × IQR criterion (cont.)
The five-number summary is:
Minimum   Q1    M     Q3    Maximum
0.6       2.0   4.1   7.0   38.7
The 1.5 × IQR criterion for outliers:
- IQR = Q3 - Q1 = 7.0 - 2.0 = 5
- 1.5 × IQR = 7.5
- Suspected outlier: any value below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR
- Q1 - 1.5 × IQR = 2.0 - 7.5 = -5.5
- Q3 + 1.5 × IQR = 7.0 + 7.5 = 14.5
There are 7 suspected outliers.
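The cutoff arithmetic above can be sketched in a few lines of Python. This is only an illustration, not part of the course material: the function names `iqr_fences` and `is_suspected_outlier` are hypothetical, and the quartiles are taken from the slide rather than computed from the raw data, which the transcript does not include.

```python
# Five-number summary copied from the slide (percent Hispanic among adults).
minimum, q1, median, q3, maximum = 0.6, 2.0, 4.1, 7.0, 38.7


def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cutoffs of the k x IQR criterion."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_suspected_outlier(x, q1, q3, k=1.5):
    """True if x lies below the lower cutoff or above the upper cutoff."""
    lower, upper = iqr_fences(q1, q3, k)
    return x < lower or x > upper


lower, upper = iqr_fences(q1, q3)
print(lower, upper)                             # cutoffs: -5.5 and 14.5
print(is_suspected_outlier(maximum, q1, q3))    # 38.7 is above the upper cutoff
print(is_suspected_outlier(minimum, q1, q3))    # 0.6 is inside the cutoffs
```

Because the lower cutoff is negative and a percentage cannot be, only large values can be flagged in this data set.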

The 1.5 × IQR criterion (cont.)
Modified boxplot: the points represent the suspected outliers.

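Measuring Spread: The standard deviation
The variance $s^2$ of a set of observations $x_1, x_2, \ldots, x_n$ is the average of the squares of the deviations of the observations from their mean $\bar{x}$:
$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$
or, in more compact notation,
$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$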

Measuring Spread: The standard deviation
The standard deviation $s$ is the square root of the variance $s^2$:
$s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
The number n - 1 is called the degrees of freedom of the variance or standard deviation.
When is the standard deviation s equal to zero?
Is the standard deviation s a resistant measure?
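The definitions of the variance and standard deviation can be written out directly in Python, as a minimal sketch. The data values below are made up for illustration, and `sample_variance` and `sample_std` are hypothetical helper names; the standard library's `statistics.variance` uses the same n - 1 divisor and serves as a cross-check.

```python
import math
import statistics


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_std(xs):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # made-up observations
print(sample_variance(data), statistics.variance(data))

# Exploring the two questions on the slide numerically:
print(sample_std([3.0, 3.0, 3.0]))     # no spread at all gives s = 0
print(sample_std(data))                # s for the original data
print(sample_std(data + [100.0]))      # one extreme value inflates s
```

The last two lines suggest how a single extreme observation affects s, which is one way to think about whether s is resistant.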

The standard deviation (cont.)
Example: Problem 1.59
Choosing measures for center and spread:
- if the distribution is skewed, choose the five-number summary
- if the distribution is symmetric and free of outliers, choose the mean and the standard deviation

The normal distributions
Sometimes the overall pattern of a large number of observations is so regular that we can describe it by a smooth curve. The curve is a mathematical model for the distribution.
A density curve is a curve that is always on or above the horizontal axis and has area exactly 1 underneath it.
Figure: histogram of the scores of all 947 seventh-grade students in Gary, Indiana, on the vocabulary part of the Iowa test.
Figure: a symmetric density curve.
Notation:
- Mean: μ
- Standard deviation: σ

The normal distributions (cont.)
Normal curves are density curves that are:
- Symmetric
- Unimodal
- Bell-shaped
A normal distribution is specified by:
- Mean μ
- Standard deviation σ
Notation: N(μ, σ)
The equation of the normal density curve is:
$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$

The normal distributions (cont.)
Figure: example of two normal curves f(x), each specified by its mean and standard deviation.
Can we locate the standard deviation by eye?

The 68-95-99.7 rule
In the normal distribution N(μ, σ):
- Approximately 68% of the observations are between μ - σ and μ + σ
- Approximately 95% of the observations are between μ - 2σ and μ + 2σ
- Approximately 99.7% of the observations are between μ - 3σ and μ + 3σ
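The three percentages in the rule can be checked numerically. The sketch below relies on the standard identity that, for any normal distribution, the probability of falling within k standard deviations of the mean equals erf(k / √2); the function name `prob_within` is a hypothetical choice for this example.

```python
import math


def prob_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for X ~ N(mu, sigma)."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    # Prints roughly 68.27, 95.45 and 99.73 percent.
    print(f"within {k} standard deviation(s): {100 * prob_within(k):.2f}%")
```

The result does not depend on μ or σ, which is why the rule applies to every normal distribution.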

Standardizing and z-Score
If x is an observation from a distribution N(μ, σ), the standardized value of x, called the z-value (or z-score), is:
$z = \frac{x - \mu}{\sigma}$
- If the z-value is negative, the observation x is less than the mean.
- If the z-value is positive, the observation x is greater than the mean.

The standard normal distribution
The standard normal distribution N(0, 1) is the normal distribution with mean 0 and standard deviation 1.
If a variable X has any normal distribution N(μ, σ), then the standardized variable Z = (X - μ)/σ has the standard normal distribution N(0, 1).
Why are normal distributions so important?
- Many statistical inference procedures based on normal distributions work well for other roughly symmetric distributions.
- They are good descriptions of real data.

Normal distribution calculations
Example: The heights of young women are approximately normal with mean μ = 64.5 inches and standard deviation σ = 2.5 inches. What proportion of women are less than 68 inches tall?
1. State the problem: X = height, and we want the proportion with X < 68.
2. Standardize: z = (68 - 64.5)/2.5 = 1.4, so X < 68 corresponds to Z < 1.4.

Normal distribution calculations
3. What proportion of observations of the standard normal variable Z take values less than 1.4?
Table A at the end of the book gives areas (proportions of observations) under the standard normal curve. Each table entry is the area to the left of z.
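As a software alternative to looking up Table A, the height example can be worked with Python's standard library. This is a sketch assuming Python 3.8 or later, where `statistics.NormalDist` is available; the variable names are chosen for this illustration only.

```python
from statistics import NormalDist

mu, sigma = 64.5, 2.5      # mean and standard deviation of heights (inches)
x = 68.0                   # height of interest

# Step 2: standardize the observation.
z = (x - mu) / sigma
print(round(z, 2))         # 1.4

# Step 3: area to the left of z under the standard normal curve.
p = NormalDist().cdf(z)
print(round(p, 4))         # 0.9192

# The same area computed directly on the original N(64.5, 2.5) scale.
p_direct = NormalDist(mu, sigma).cdf(x)
print(round(p_direct, 4))  # 0.9192
```

So about 92% of young women are shorter than 68 inches under this model, and standardizing first gives the same answer as working on the original scale.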

Assignment #1
Due date: 09/25/02 at 1:30pm
Chapter 1:
- Problem 1.124/page 95
- Problem 1.134/page 99