The Normal Distribution Chapter 3 College of Education and Health Professions

Assumption of Normality Many statistical tests (t, ANOVA) assume that the sampling distribution is normally distributed. This is a problem because we do not have access to the sampling distribution. But, according to the central limit theorem, if the sample data are approximately normal, then the sampling distribution will be normal. Also from the central limit theorem, in large samples (n > 30) the sampling distribution tends to be normal regardless of the shape of the data in our sample. Our task is to decide when a distribution is approximately normal.
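As an illustration of this claim, here is a minimal Python sketch (the exponential population, the sample size of 30, and the number of replications are arbitrary choices): it draws repeated samples from a strongly skewed population and shows that the means of those samples are far less skewed than the population itself.

```python
import numpy as np
from scipy import stats

# Illustrative CLT check: a skewed (exponential) population, samples of n = 30.
rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)

sample_means = np.array([
    rng.choice(population, size=30).mean() for _ in range(5_000)
])

print("population skew:      ", stats.skew(population))     # clearly skewed
print("skew of sample means: ", stats.skew(sample_means))   # close to 0
```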

A Z Score is a Standardized Statistic. 95.0% of the scores fall between a Z of -1.96 and +1.96. 99.0% of the scores fall between a Z of -2.58 and +2.58. 99.9% of the scores fall between a Z of -3.30 and +3.30.
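These benchmarks can be verified on simulated normal data by standardizing the scores and counting how many fall inside each cutoff; the following is an illustrative Python sketch with arbitrary simulated data.

```python
import numpy as np

# Standardize: z = (x - mean) / SD, then check the 95.0 / 99.0 / 99.9% benchmarks.
rng = np.random.default_rng(1)
x = rng.normal(loc=50, scale=10, size=100_000)
z = (x - x.mean()) / x.std(ddof=1)

for cut, expected in [(1.96, 95.0), (2.58, 99.0), (3.30, 99.9)]:
    inside = np.mean(np.abs(z) <= cut) * 100
    print(f"|z| <= {cut:.2f}: {inside:.1f}% (expected about {expected}%)")
```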

How do we decide if a distribution is approximately normal? In a normal distribution, 99.9% of the scores fall between a Z of -3.30 and +3.30.

Normality Statistics are not Reliable for Large Samples. Z > 1.96 is significant at p < .05, Z > 2.58 is significant at p < .01, and Z > 3.29 is significant at p < .001. Significance tests for normality (Kolmogorov-Smirnov, Shapiro-Wilk, and z tests of skew and kurtosis) should not be used in large samples, because they are likely to be significant even when skew and kurtosis are not too different from normal.
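A small simulation makes the point concrete. In this illustrative sketch (the amount of skew and the sample sizes are arbitrary choices), the same mildly skewed variable that passes the Shapiro-Wilk test at a small n is flagged as significantly non-normal once n is large.

```python
import numpy as np
from scipy import stats

# A mildly skewed variable tested at increasing sample sizes.
rng = np.random.default_rng(2)
for n in (50, 500, 5_000):
    x = rng.normal(size=n) + 0.2 * rng.exponential(size=n)   # slight positive skew
    w, p = stats.shapiro(x)
    print(f"n = {n:5d}: skew = {stats.skew(x):+.2f}, Shapiro-Wilk p = {p:.4f}")
```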

Small Data Sets (n < 20) and Normality? Small data sets are adversely affected by occasional extreme scores, even when the extreme score has a Z below the 3.3 cutoff.

Approximately Normal? We will use the following to determine if a distribution is approximately normal (a rough Python equivalent is sketched after this list):
1. Q-Q plot values should lie close to the 45-degree line.
2. The distribution should be similar in shape to the normal curve.
3. Skew and kurtosis should be reasonably close to 0.
4. Data points with a Z score > +3.3 or < -3.3 will be considered outliers and removed.
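The slides perform these checks with SPSS; the sketch below shows roughly equivalent checks in Python, assuming the scores sit in a one-dimensional array (the reaction_time values here are simulated placeholders, not the deck's data).

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(3)
reaction_time = rng.normal(450, 75, size=300)        # placeholder scores

# 1. Q-Q plot: points should lie close to the 45-degree line.
stats.probplot(reaction_time, dist="norm", plot=plt)
plt.show()

# 2. Histogram should be similar in shape to the normal curve.
plt.hist(reaction_time, bins=30)
plt.show()

# 3. Skew and (excess) kurtosis should be reasonably close to 0.
print("skew:", stats.skew(reaction_time))
print("kurtosis:", stats.kurtosis(reaction_time))

# 4. Flag cases with |z| > 3.3 as outliers to be removed.
z = (reaction_time - reaction_time.mean()) / reaction_time.std(ddof=1)
print("outlier case indices:", np.where(np.abs(z) > 3.3)[0])
```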

Settings to Superimpose a Normal Curve on the Histogram

Histogram with Normal Curve. The distribution may be leptokurtic (peaked) and positively skewed; we need to run Explore to check.

Explore Settings

We are 95% sure that the true mean lies between the lower and upper bounds shown in the output. The distribution is positively skewed. The distribution is leptokurtic. One case may be an outlier; we can use the Descriptives command to generate Z scores.

Don’t use the K-S test for normality. If the Shapiro-Wilk p-value is less than 0.05, the data may not be normal, but it is not a perfect test. For small data sets (n < 100), a Shapiro-Wilk p < .05 suggests the data may not be normal; the test is unreliable for large data sets (n > 100).
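For reference, both tests are available in scipy. This is an illustrative sketch with placeholder data; note that kstest with parameters estimated from the sample is the uncorrected K-S procedure the slide advises against.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(100, 15, size=80)          # placeholder data, n < 100

w, p_sw = stats.shapiro(x)                                          # Shapiro-Wilk
d, p_ks = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))   # K-S
print(f"Shapiro-Wilk p = {p_sw:.3f}, K-S p = {p_ks:.3f}")
# Treat Shapiro-Wilk p < .05 as a flag, not a verdict: combine it with the
# Q-Q plot, histogram, and skew/kurtosis, and distrust it for large n.
```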

The circles should fall on the 45-degree line. For this data set the ends deviate from the line, again suggesting a problem with normality.

Case number 282 has a star, indicating that it is an extreme score. An extreme value, E, is defined as a value more than 3 box lengths below or above the box. We need to convert the data to Z scores to examine the Z for case 282.
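In SPSS boxplots the box length is the interquartile range, so a star marks a case more than 3 box lengths beyond the box edges. Here is a minimal sketch of that rule, assuming the scores are in a NumPy array.

```python
import numpy as np

def extreme_cases(x):
    """Return indices of values more than 3 box lengths (IQRs) beyond the box."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1                              # one box length
    lower, upper = q1 - 3 * iqr, q3 + 3 * iqr  # extreme-value fences
    return np.where((x < lower) | (x > upper))[0]
```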

Check this box to generate Z scores for any variables in the Variable(s) box. The output is shown on the next slide; SPSS places the Z scores in your data sheet as new columns, naming each by adding a Z in front of the selected variable.

The new column ZReactionTime contains the Z scores for ReactionTime. The Z score for case 282 is much greater than our arbitrary cutoff of 3.3, so we will delete this data point and then re-run Explore: copy the column, rename the new variable RTimeTrimmed, then delete case 282.
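The copy, standardize, and delete cycle used on these slides can also be expressed as a short pandas loop. This is only an illustrative sketch: the column name ReactionTime and the 3.3 cutoff mirror the slides, but the data frame here is simulated.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
df = pd.DataFrame({"ReactionTime": rng.normal(450, 75, size=300)})

# ZReactionTime: the standardized copy of ReactionTime.
df["ZReactionTime"] = (
    (df["ReactionTime"] - df["ReactionTime"].mean()) / df["ReactionTime"].std(ddof=1)
)

# Trim: repeatedly drop cases whose |z| exceeds 3.3 and re-standardize,
# mirroring the re-run of Explore after each deletion.
trimmed = df["ReactionTime"].copy()
while True:
    z = (trimmed - trimmed.mean()) / trimmed.std(ddof=1)
    over = z.abs() > 3.3
    if not over.any():
        break
    trimmed = trimmed[~over]

print("cases removed:", len(df) - len(trimmed))
```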

The top descriptive output is the original data set. Here is the data set with case 282 removed. Look at the changes in the 95% CI, Skew and Kurtosis.

The Shapiro-Wilk is significant, indicating there may be a problem with normality. It looks like cases 84 and 100 may be the cause. We need to generate Z scores again.

The Q-Q plot and the box plot both suggest a problem. We need to run Z scores to look at cases 84 and 100.

Compute Z scores for the RTimeTrimmed variable. The new variable will be in the data sheet, labeled ZRTimeTrimmed.

Both cases 84 and 100 have Z scores above 3.3, our arbitrary cutpoint. We can delete them and then re-run Explore: copy the column RTimeTrimmed and make a new variable, RTimeTrimmed2.

The original data set is on top. RTimeTrimmed2 now has 3 data points that have been deleted.

We still may have normality problems.

Looking better. Run Z scores again on RTimeTrimmed2 and check cases 235 and 290.

The Z score for case 235 is 3.17, and the Z score for case 290 is also below 3.3. Since both are below our arbitrary cutoff, the distribution is now approximately normally distributed.