Standard Deviation Z Scores

Learning Objectives
By the end of this lecture, you should be able to:
– Describe the importance that variation plays in interpreting a distribution
– Understand how the standard deviation is calculated
– Describe what is meant by a z-score and be able to calculate one
– Start learning Greek: be comfortable with the Greek letters for mean and standard deviation

Another way of describing variation
Recall that we usually describe a distribution in terms of its shape, center, and variation. One method of describing the variation has already been discussed: quartiles and the related 5-number summary. Another extremely important method of describing variation is the ‘variance’ or ‘standard deviation’ (SD).
– Variance and SD carry the same information: the standard deviation is simply the square root of the variance.
– E.g. if the variance is 25, then SD = 5. If SD = 7, then the variance is 49.
Generally, you should only use the SD with data that has a symmetrical distribution (i.e. it doesn’t work well with skewed data).
– This is because in order to calculate the SD, we need to use the mean of that data, and the mean is itself pulled toward a skewed tail and toward outliers.
– To review a term: we say that the SD is not resistant to skew or outliers.
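The two claims on this slide (SD is the square root of the variance, and SD is not resistant) can be checked with a small sketch. The data values below are made up for illustration and are not from the lecture; the functions come from Python's standard `statistics` module.

```python
import statistics

# Hypothetical, roughly symmetric dataset (illustrative values only)
symmetric = [60, 61, 62, 63, 64, 65, 66]

# The same data with one extreme value added, to mimic skew / an outlier
with_outlier = symmetric + [120]

variance = statistics.variance(symmetric)  # sample variance (divides by n - 1)
sd = statistics.stdev(symmetric)           # sample standard deviation

# SD is the square root of the variance, so squaring it recovers the variance
print("variance:", round(variance, 3), " sd squared:", round(sd ** 2, 3))

# A single outlier inflates the SD dramatically: the SD is not resistant
sd_outlier = statistics.stdev(with_outlier)
print("sd without outlier:", round(sd, 2), " with outlier:", round(sd_outlier, 2))
```

One added value is enough to make the SD many times larger, which is why the SD is a poor summary for skewed data.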

Variation
In every set of values (a “dataset”: know this term!), e.g. ages, heights, incomes, crop yields, spitball distances, etc., there will always be variation. The question is, how do we describe just how varied our data is? Are all the points closely clustered around some value, or are they all over the map? Having a sense of how much the various datapoints are spread out around the center (mean or median) is an important piece of information.
Consider employee pay at DePaul University: if we looked at every income from student workers to the Provost/President, we would see a tremendous variation around the center, as some people make a great deal more than the center and some make much less. However, if we focused only on student employees, we would find that the variation around the center is considerably less.
Suppose we were looking at a widget used in the construction of a high-performance laser, and suppose this widget is intended to be exactly 2.3 inches in diameter. A manufacturer must make their parts extremely close to this diameter in order for them to fit properly. We would assume that the average diameter of a sample of parts would indeed be very close to 2.3. However, the question from a manufacturing perspective would be: how much variation is there among the parts? The manufacturer would hope that the variation is extremely small, that is, that nearly all the parts are very close in size to the mean. A large variation would mean that many parts are the wrong size, and many of our very expensive lasers are going to be faulty.
The most common statistic we use for measuring variability (spread) is called the standard deviation.

How to calculate the SD
(This is for explanatory purposes only. You will not have to do this by hand!)
1. Find the mean (e.g. 63.5).
2. Take one datapoint (e.g. 63), calculate its difference from the mean (e.g. -0.5), and square that value (0.25).
3. Repeat for all datapoints.
4. Add up all the squared values from steps 2 and 3.
5. Divide by n - 1.
6. You now have the variance. To get from the variance to ‘s’, simply take the square root of the variance.
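The six steps can be sketched directly in code. This is a minimal illustration, not the software used in the course: the function name `sample_sd` and the `heights` values are made up here, and the result is cross-checked against the standard library's `statistics.stdev`.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation, following the six steps on the slide."""
    n = len(values)
    mean = sum(values) / n                             # step 1: find the mean
    squared_devs = [(x - mean) ** 2 for x in values]   # steps 2-3: squared differences
    total = sum(squared_devs)                          # step 4: add them up
    variance = total / (n - 1)                         # step 5: divide by n - 1
    return math.sqrt(variance)                         # step 6: square root of the variance

# Hypothetical heights in inches (illustrative only)
heights = [61, 63, 63, 64, 66]

print(round(sample_sd(heights), 4))
print(round(statistics.stdev(heights), 4))  # library result, should match
```

In practice you would call `statistics.stdev` (or your statistics package) directly; the hand-written version is only there to mirror the steps.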

Calculating standard deviation: ‘s’
Calculating these by hand can be tedious. We will do a couple of examples, but will quickly switch over to doing them using a computer.
Example (women's heights, in inches):
– Mean = 63.4
– Sum of squared deviations from the mean = 85.2
– Degrees of freedom (df) = (n − 1) = 13
– s² = variance = 85.2/13 = 6.55 inches squared
– s = standard deviation = √6.55 = 2.56 inches
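The arithmetic in this example can be reproduced in a few lines. The sum of squared deviations (85.2) and the degrees of freedom (13) are taken from the slide; `n = 14` is inferred from df = n − 1 and the variable names are chosen here for illustration.

```python
import math

sum_squared_deviations = 85.2   # from the slide
n = 14                          # inferred: df = n - 1 = 13

variance = sum_squared_deviations / (n - 1)   # s squared, in inches squared
sd = math.sqrt(variance)                      # s, in inches

print(round(variance, 2))  # 6.55
print(round(sd, 2))        # 2.56
```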

Review: Why do we care so much about Normal distributions?
Much of the data we examine in the real world turns out to have an approximately Normal distribution (exam scores, survival length for cancer patients, crop yields, SAT results, people’s heights, birth rate of rabbits, games of chance, etc.). As a result, a great deal of study and research has gone into the properties of Normal distributions/curves, and this has given us many tools that we can use to come up with statistics for analysis of our data. In fact, the various tools discussed in this lecture (standard deviation, z-score, z-tables) should only be applied, in the way we use them here, to distributions that are Normal.

Calculating areas under the Normal curve
In addition to being a measure of spread, the standard deviation is a number we can use to accurately determine the area under any segment of the Normal curve. We will discuss this over the next couple of lectures.

Overview of determining the area under the curve
A fairly straightforward process:
1. Using the mean and standard deviation, convert the value you are interested in (e.g. a Grade Equivalent score of 6.0) into a z-score.
2. Look up your z-score in a Normal probability table (also called a z-table).
3. The value you find in the z-table is the area under the curve to the left of your score (e.g. 6.0).
While you should spend a few minutes thinking about this slide, don’t worry too much about it for now. We will revisit it later.
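The three steps can be sketched in code, with the standard Normal cumulative distribution function standing in for the printed z-table. This is only an illustration: the mean of 7 and SD of 1 are assumed values (the same ones used in the first GEQ practice slide), and `NormalDist` is from Python's standard `statistics` module (Python 3.8+).

```python
from statistics import NormalDist

# Assumed distribution parameters (illustrative; same as the first GEQ practice slide)
mu = 7.0      # mean
sigma = 1.0   # standard deviation

x = 6.0       # the score we are interested in

# Step 1: convert the value to a z-score
z = (x - mu) / sigma

# Steps 2-3: "look up" z; the CDF of the standard Normal gives the area to the left
area_left = NormalDist().cdf(z)

print("z-score:", z)
print("area to the left:", round(area_left, 4))
```

With these assumed values, about 16% of the area lies to the left of a score of 6.0.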

The z-score
Know the following two facts:
1. A z-score measures the number of standard deviations that some value ‘x’ is from the mean.
   E.g. a z-score of +1.0 means the value lies 1 standard deviation above the mean.
   E.g. a z-score of +0.5 means the value lies one-half SD above the mean.
   E.g. a z-score of -1.23 means the value lies 1.23 standard deviations below the mean.
2. As long as you know the mean and the SD, any value within a Normal distribution can be converted to a z-score. That is, any value along the x-axis can be converted to a z-score.

It’s all Greek to me
…and it’s gonna get a little worse, but we’ll start slow… Still, in the math world, there are numerous ‘shortcuts’ that are used to represent certain concepts. Such symbols are both widespread and useful (though it’s true that sometimes geeks just like to try to impress us), so you should get comfortable with them as they are introduced. Here are two to begin with:
– σ (sigma): standard deviation
– μ (mu): mean

Practice: z-score using the GEQ (Grade Equivalent Score) dataset
If I tell you that: Mean (μ) = 7, Standard Deviation (σ) = 1
Example: Your score is 6. How many standard deviations does your score lie above or below the mean?
– Answer: z = -1. Your score is exactly 1 standard deviation below the mean.
Example: Your score is 8. How many standard deviations does your score lie above or below the mean?
– Answer: z = +1. Your score is exactly 1 standard deviation above the mean.
Example: Your score is 5.5. How many standard deviations does your score lie above or below the mean?
– Answer: z = -1.5. Your score is 1.5 standard deviations below the mean.

z-score = number of SDs
Recall that a z-score is simply a value that tells us the number of standard deviations above or below the mean. So if I ask you “How many SDs above or below the mean?”, that is exactly the same thing as asking you for “the z-score”. If you are not clear on this concept, be sure to review the previous slide (and the next one).

Practice: z-score using the GEQ (Grade Equivalent Score) dataset
Given: Mean (μ) = 7, Standard Deviation (σ) = 0.5
Example: x = 6. How many standard deviations does this observation lie above or below the mean? I.e. what is the z-score?
– Answer: z = -2. Your score is exactly 2 standard deviations below the mean.
Example: x = 8.5. What is the z-score?
– Answer: z = +3. Your score is 3 standard deviations above the mean.
Example: x = 5. What is the z-score?
– Answer: z = -4. Your score is 4 standard deviations below the mean.
Example: x = 5.423. What is the z-score?
– Answer: Uh-oh, this one is a bit trickier… See next slide.

Formula for calculating the z-score
You will find this relatively simple formula quite useful. Recall that a z-score measures the number of standard deviations that a data value x is from the mean μ.
z = (x - μ) / σ
– x: the observation we are asking about
– μ: the mean of the population
– σ: the standard deviation of the population
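The formula z = (x - μ) / σ translates into a one-line function. The sketch below uses a made-up function name, `z_score`, and checks it against the practice answers from the GEQ slide with μ = 7 and σ = 0.5.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma

# Practice values from the GEQ slide: mean 7, standard deviation 0.5
mu, sigma = 7, 0.5
for x in (6, 8.5, 5):
    print(x, "->", z_score(x, mu, sigma))
```

A negative result means the value is below the mean; a positive result means it is above.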

Practice: z-score using the GEQ (Grade Equivalent Score) dataset
Example: Your score is 5.423. The distribution has μ = 7, σ = 0.5. What is your z-score?
– Answer: Use the formula z = (x - μ) / σ.
– So x = 5.423, μ = 7, σ = 0.5. Therefore: z = (5.423 - 7) / 0.5 = -3.154
– In other words, our value lies 3.154 standard deviations below the mean.

Example: Calculate the z-score for x = 200, given μ = 197 and σ = 342.
Answer: Because this is not a Normal distribution, the mean and standard deviation are NOT good choices of statistics to use in an analysis. And without a trustworthy mean and standard deviation, we cannot calculate a meaningful z-score!
– Take-home point #1: Standard deviations should be avoided for distributions that are skewed and/or have outliers.
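One way to see why these numbers are a warning sign is to ask what a Normal model with μ = 197 and σ = 342 would imply. The sketch below assumes, purely for illustration, that the measured quantity cannot be negative; the slide does not say what was measured, so that is an assumption made here.

```python
from statistics import NormalDist

mu, sigma = 197, 342   # values from the slide
x = 200

# The formula still produces a number...
z = (x - mu) / sigma
print("z =", round(z, 3))

# ...but a Normal curve with this mean and SD would place a large share of
# its area below zero, which is impossible if the quantity is non-negative
# (an assumption made for this illustration).
share_below_zero = NormalDist(mu, sigma).cdf(0)
print("area below zero under a Normal model:", round(share_below_zero, 2))
```

A model that puts more than a quarter of its area on impossible values is a poor description of the data, so a z-score built on it says little.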

Punching numbers into a calculator (or statistical software) is easy!! It is ALWAYS possible to calculate a mean (and sd, and other statistics) for a dataset. However, just because you can punch numbers into a calculator does not mean that those numbers mean anything useful!!!! One of the most important things I want you to take away from this course is the ability to recognize the right and wrong tool for the job! This takes a little bit of review and practice of concepts, but it is absolutely doable!! Recall from our first lectures that the mean is usually the WRONG tool for the center of a distribution if the distribution is skewed or has significant outliers.
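A short sketch of the "right tool" point, using made-up incomes (in thousands of dollars) where one value is an extreme outlier. The numbers are hypothetical; the functions are from Python's standard `statistics` module.

```python
import statistics

# Hypothetical incomes in thousands of dollars; the last value is an outlier
incomes = [28, 30, 31, 33, 35, 36, 400]

mean = statistics.mean(incomes)
median = statistics.median(incomes)

# The software happily computes both, but only one describes a "typical" income
print("mean:", round(mean, 1))
print("median:", median)
```

The mean lands far above six of the seven values, while the median stays in the middle of the bulk of the data.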