The Sample Variance © Christine Crisp Edited by Dr Mike Hughes.



Can you find the medians and means for the following 3 data sets?
[Table: Set A, Set B and Set C, with the mean and median of each]
Although the medians and means are the same, the data sets are not really alike. The spread or variability of the numbers is quite different. How can we measure the spread within the data sets?
ANS: The range and interquartile range both measure spread, but neither uses all the data items. (We will do the interquartile range later, with cumulative frequency.)

If you had to invent a method of measuring spread that used all the data items, what could you do?
One thing we could do is find out how far each item is from the mean and add up these differences.
[Table: the differences $x - \bar{x}$ for Set A, Set B and Set C]
e.g. Set A: $\sum (x - \bar{x}) = 0$
Data sets B and C give the same result. The negative and positive values have cancelled each other out.

To avoid the effect of the negative values we can either ignore the negative signs, or square each difference (since the squares will all be positive). Squaring is more convenient for developing theory, so, e.g. for Set A, we find $\sum (x - \bar{x})^2$.
Let's do this calculation for all 3 data sets:

[Table: $x$, the mean $\bar{x}$ and $\sum (x - \bar{x})^2$ for Set A, Set B and Set C]
The larger value for Set B shows greater variability. Set C has the least variability.
Can you see a snag with this measurement?
ANS: The calculated value increases if we have more data, so comparing data sets with different numbers of items would not be possible. To allow for this, we need to take $n$, the number of items, into account.
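The snag can be seen numerically with a short Python sketch. The values of Sets A, B and C are not available here, so the sketch assumes the three values 7, 9, 14 used in a later worked example of this presentation; the function names are illustrative only.

```python
def sum_of_deviations(data):
    """Add up the signed differences from the mean (these always cancel)."""
    mean = sum(data) / len(data)
    return sum(x - mean for x in data)


def sum_of_squared_deviations(data):
    """Add up the squared differences from the mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data)


sample = [7, 9, 14]        # assumed data; the mean is 10
doubled = sample + sample  # same spread, twice as many items

print(sum_of_deviations(sample))           # 0.0
print(sum_of_squared_deviations(sample))   # 26.0
print(sum_of_squared_deviations(doubled))  # 52.0

# Dividing by the number of items removes the dependence on n.
print(sum_of_squared_deviations(sample) / len(sample))
print(sum_of_squared_deviations(doubled) / len(doubled))
```

The sum of squares doubles when the data are repeated, even though the spread has not changed; the last two printed values agree.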

There are 2 formulae that can be used:
the mean square deviation, $\text{msd} = \dfrac{\sum (x - \bar{x})^2}{n}$,
or the sample variance, $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$.
Our data is nearly always a sample from a large unknown set of data (the population) and we take samples to find out about the population. The 1st formula does not give the best estimate of the variance of the population so is not used.

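So, there are 2 quantities and their square roots that we need to be clear about:
the mean square deviation (POPULATION VARIANCE), $\text{msd} = \dfrac{\sum (x - \bar{x})^2}{n}$,
and the root mean square deviation (POPULATION STANDARD DEVIATION), $\text{rmsd} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$.
Also the sample variance, $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$,
and the sample standard deviation, $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$.
We nearly ALWAYS use the two sample formulae.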

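e.g. Find the rmsd and msd of the following data:
x: 7, 9, 14
Mean, $\bar{x} = 10$
(i) $\text{msd} = \dfrac{\sum (x - \bar{x})^2}{n} = \dfrac{(-3)^2 + (-1)^2 + 4^2}{3} = \dfrac{26}{3} \approx 8.67$, so $\text{rmsd} = \sqrt{26/3} \approx 2.94$
(ii) $\text{msd} = \dfrac{1}{n}\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right) = \dfrac{1}{3}\left(326 - \dfrac{900}{3}\right) = \dfrac{26}{3} \approx 8.67$
The 2nd form is exactly the same as the first form but quicker to use!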

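e.g. Find the sample SD and variance of the following data:
x: 7, 9, 14
Mean, $\bar{x} = 10$
(i) $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} = \dfrac{(-3)^2 + (-1)^2 + 4^2}{2} = 13$, so $s = \sqrt{13} \approx 3.61$
(ii) $s^2 = \dfrac{1}{n - 1}\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right) = \dfrac{1}{2}\left(326 - \dfrac{900}{3}\right) = 13$
The 2nd form is in general quicker to use.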
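As a check that the two forms agree, here is a small Python sketch for the data 7, 9, 14. It assumes the expanded form is $\sum x^2 - (\sum x)^2 / n$; the function names are illustrative, not taken from the slides.

```python
import math


def sxx_definition(data):
    """Sum of squared deviations from the mean (1st form)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data)


def sxx_expanded(data):
    """Expanded form: sum of x^2 minus (sum of x)^2 / n (2nd form)."""
    n = len(data)
    return sum(x * x for x in data) - sum(data) ** 2 / n


data = [7, 9, 14]
n = len(data)
sxx = sxx_expanded(data)

msd = sxx / n                    # divide by n
sample_variance = sxx / (n - 1)  # divide by n - 1

print(sxx_definition(data), sxx_expanded(data))               # 26.0 26.0
print(round(msd, 2), round(math.sqrt(msd), 2))                # 8.67 2.94
print(sample_variance, round(math.sqrt(sample_variance), 2))  # 13.0 3.61
```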

This all seems very complicated but help is at hand. Both the quantities, rmsd and s, are given by your calculator.
e.g. Find the root mean square deviation, rmsd, and the sample standard deviation, s, for the following data:
x: 7, 9, 14
Use the Statistics function on your calculator and enter the data. Select the list of calculations. You will be able to find the following, correct to 3 s.f.:
$\text{rmsd} = 2.94$ and $s = 3.61$
The rmsd is smaller than s (because we are dividing by a larger number).

So, for the data
x: 7, 9, 14
we have $\text{rmsd} \approx 2.94$ and $s \approx 3.61$.
Squaring the full calculator values gives $\text{msd} \approx 8.67$ (mean square deviation) and $s^2 = 13$ (sample variance).
The part of the formula, $\sum (x - \bar{x})^2$, is in your formulae sheet, labelled $S_{xx}$ (said as "sum of squares x x"). An expanded form of the expression is also given. All you have to do is divide by the correct quantity.
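Where a calculator is not to hand, the same two quantities can be obtained in software. The sketch below uses Python's standard `statistics` module as a stand-in for the calculator's statistics function (an assumption of this example, not something the slides mention): `pstdev` divides by n and so gives the rmsd, while `stdev` divides by n - 1 and gives s.

```python
import statistics

data = [7.0, 9.0, 14.0]

rmsd = statistics.pstdev(data)  # divides by n
s = statistics.stdev(data)      # divides by n - 1

print(round(rmsd, 2), round(s, 2))  # 2.94 3.61

# Square the full value of s, not the rounded one, to get the variance.
print(round(s ** 2, 2))            # 13.0
print(round(round(s, 2) ** 2, 2))  # 13.03
```

The last line shows why the rounded value of s should not be squared: 3.61 squared gives 13.03 rather than 13.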

SUMMARY
• The mean square deviation, msd, and sample variance, $s^2$, both measure the spread or variability in the data.
• If we have raw data we use the statistical functions on the calculator to find the rmsd or sample standard deviation.
• To find the msd or sample variance, we square the relevant quantity given by the calculator: $\text{msd} = (\text{rmsd})^2$, sample variance $= s^2$.
• Your formulae sheet gives the formula $S_{xx} = \sum (x - \bar{x})^2$ or the equivalent $S_{xx} = \sum x^2 - \dfrac{(\sum x)^2}{n}$. Then, we divide by $n$ for the msd or $(n - 1)$ for $s^2$.
The sample standard deviation is larger than the rmsd because we divide by $(n - 1)$, which is smaller than $n$.

Frequency Data
The formula for the variance can be easily adapted to find the variance of frequency data.
$S_{xx} = \sum (x - \bar{x})^2$ becomes, for FREQUENCY DATA, $S_{xx} = \sum f(x - \bar{x})^2$.
But note that $n$ becomes $\sum f$.
So $\text{msd} = \dfrac{S_{xx}}{n}$ and variance $s^2 = \dfrac{S_{xx}}{n - 1}$ become $\text{msd} = \dfrac{S_{xx}}{\sum f}$ and $s^2 = \dfrac{S_{xx}}{\sum f - 1}$.
We usually only use the formulae if we are given summary data. With raw data we enter the data into the calculator and use the statistical functions to get the answers directly.

e.g.1 Find the mean and sample standard deviation of the following data:
x: 1, 2, 5, 10
Frequency, f: 3, 5, 8, 4
Solution: Using the calculator functions, the mean, $\bar{x} = 4.65$, and the sample standard deviation, $s = 3.17$ (3 s.f.).
Although we don't need the formula for this question, let's check we have the correct value by using the formula:

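e.g.1 Find the mean and sample standard deviation of the following data:
x: 1, 2, 5, 10
Frequency, f: 3, 5, 8, 4
Solution: $\sum f = 20$, $\sum fx = 93$, $\sum fx^2 = 623$
$S_{xx} = \sum fx^2 - \dfrac{(\sum fx)^2}{\sum f} = 623 - \dfrac{93^2}{20} = 190.55$
So, $s^2 = \dfrac{190.55}{19} \approx 10.03$ and $s \approx 3.17$ (3 s.f.)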
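The same check can be sketched in Python for this frequency table. It assumes the expanded form of $S_{xx}$ with each term weighted by its frequency, and $n = \sum f$; the variable names are illustrative.

```python
import math

x_values = [1, 2, 5, 10]
frequencies = [3, 5, 8, 4]

n = sum(frequencies)  # n is the total frequency
sum_fx = sum(f * x for x, f in zip(x_values, frequencies))
sum_fx2 = sum(f * x * x for x, f in zip(x_values, frequencies))

mean = sum_fx / n
sxx = sum_fx2 - sum_fx ** 2 / n  # expanded form of S_xx
sample_variance = sxx / (n - 1)
s = math.sqrt(sample_variance)

print(n, sum_fx, sum_fx2)  # 20 93 623
print(mean)                # 4.65
print(round(s, 2))         # 3.17
```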

e.g.2 Find the sample standard deviation of the following lengths:
[Table: Length (cm); Frequency, f]

e.g.2 Find the sample standard deviation of the following lengths:
[Table: Length (cm); class mid-value, x; Frequency, f; with working columns $xf$, $x^2$ and $x^2 f$]
Solution: We need the class mid-values, x.
Standard deviation, s = …
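The mid-value method can be sketched in Python as follows. The table of lengths is not available here, so the classes and frequencies below are hypothetical, chosen only so that the mid-values end in ·5; the mid-value is taken as the average of the two class limits shown.

```python
import math

# Hypothetical grouped data: (lower limit, upper limit) in cm, with frequencies.
classes = [(1, 10), (11, 20), (21, 30)]
frequencies = [2, 5, 3]

# Each class is represented by its mid-value.
mid_values = [(low + high) / 2 for low, high in classes]

n = sum(frequencies)
sum_fx = sum(f * x for x, f in zip(mid_values, frequencies))
sum_fx2 = sum(f * x * x for x, f in zip(mid_values, frequencies))

sxx = sum_fx2 - sum_fx ** 2 / n
s = math.sqrt(sxx / (n - 1))

print(mid_values)   # [5.5, 15.5, 25.5]
print(round(s, 2))  # 7.38
```

Because every item in a class is treated as if it sat at the mid-value, the result is an estimate of the standard deviation rather than an exact value.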

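e.g.3 Find the mean and sample variance of 20 values of x given the following: …
Solution: Since we only have summary data, we must use the formulae, with $n = 20$:
sample mean, $\bar{x} = \dfrac{\sum x}{n}$
and sample variance, $s^2 = \dfrac{S_{xx}}{n - 1}$, where $S_{xx} = \sum x^2 - \dfrac{(\sum x)^2}{n}$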
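A summary-data calculation of this kind might be sketched in Python as below. The summary values for the 20 observations are not available here, so the usage line instead reuses the earlier sample 7, 9, 14 (n = 3, $\sum x = 30$, $\sum x^2 = 326$); the function name is hypothetical.

```python
def mean_and_sample_variance(n, sum_x, sum_x2):
    """Return (mean, sample variance) from n, the sum of x and the sum of x^2."""
    mean = sum_x / n
    sxx = sum_x2 - sum_x ** 2 / n  # expanded form of S_xx
    return mean, sxx / (n - 1)


# Summary of the sample 7, 9, 14 used earlier in the presentation.
mean, sample_variance = mean_and_sample_variance(3, 30, 326)
print(mean, sample_variance)  # 10.0 13.0
```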

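SUMMARY
Raw data: $\text{msd} = \dfrac{\sum (x - \bar{x})^2}{n}$ and $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
Frequency data: $\text{msd} = \dfrac{\sum f(x - \bar{x})^2}{\sum f}$ and $s^2 = \dfrac{\sum f(x - \bar{x})^2}{\sum f - 1}$
Take the square root for the rmsd and the sample standard deviation.
MSD is called POPULATION VARIANCE. RMSD is called POPULATION STANDARD DEVIATION.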

Exercise
Find the mean, sample standard deviation and sample variance for each of the following samples, using calculator functions where appropriate.
[Table: x = 1, 2, 3, 4, 5; Frequency, f]
[Table: Time (mins); Frequency, f]
… observations where … and …

Answer:
[Table: x = 1, 2, 3, 4, 5; Frequency, f] mean, $\bar{x}$ = …; standard deviation, s = …; variance, $s^2$ = …
[Table: Time (mins), mid-value x; Frequency, f] mean, $\bar{x}$ = …; standard deviation, s = …; variance, $s^2$ = …
N.B. To find $s^2$ we need to use the full calculator value for s, not the answer to 3 s.f.

… observations where … and …
Solution: mean, $\bar{x}$ = …; variance, $s^2$ = …; standard deviation, s = …

There are 2 formulae that can be used to measure spread:
the mean square deviation, $\text{msd} = \dfrac{\sum (x - \bar{x})^2}{n}$,
or the sample variance, $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$.
In many books you will find the word variance used for the 1st of these formulae and you may have used it at GCSE. However, our data is nearly always a sample from a large unknown set of data (the population) and we take the sample to find out about the population. The 1st formula does not give the best estimate of the variance of the population so is not used.

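So, there are 2 quantities and their square roots that we need to be clear about:
the mean square deviation, $\text{msd} = \dfrac{\sum (x - \bar{x})^2}{n}$, and the root mean square deviation, $\text{rmsd} = \sqrt{\text{msd}}$.
Also the sample variance, $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$, and the sample standard deviation, $s = \sqrt{s^2}$.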

e.g. Find the root mean square deviation, rmsd, and the sample standard deviation, s, for the following data:
x: 7, 9, 14
Use the Statistics function on your calculator and enter the data. Select the list of calculations. You will be able to find the following (ignore the calculator notation). Correct to 3 s.f. we have
$\text{rmsd} = 2.94$ and $s = 3.61$
The rmsd is smaller than s (because we are dividing by a larger number).

Squaring the rmsd and s for the data 7, 9, 14 gives $\text{msd} \approx 8.67$ (mean square deviation) and $s^2 = 13$ (variance).
Using the formulae: If summary data are given, you will need to use the formulae instead of the calculator functions.
The part of the formula, $\sum (x - \bar{x})^2$, is in your formulae booklet (see correlation and regression), labelled $S_{xx}$. An expanded form of the expression is also given. All you have to do is divide by the correct quantity, $n$ or $n - 1$.

SUMMARY
• The mean square deviation, msd, and sample variance, $s^2$, both measure the spread or variability in the data.
• If we have raw data we use the stats functions on the calculator to find the rmsd or sample standard deviation.
• To find the msd or sample variance, we square the relevant quantity given by the calculator: $\text{msd} = (\text{rmsd})^2$, sample variance $= s^2$.
• For summary data, we use the formulae book, choosing the appropriate form of $S_{xx}$. Then, we divide by $n$ for the msd or $(n - 1)$ for $s^2$.
The sample standard deviation is the larger of these quantities.

e.g.1 For the following sample data, find
(a) the root mean square deviation, rmsd,
(b) the mean square deviation, msd,
(c) the sample standard deviation, s, and
(d) the sample variance, $s^2$.
x: …
Answer: Using the calculator functions, (a) … (b) … (c) … (d) …

e.g.2 Given the following summary of data for a sample of size 5, find
(a) the mean square deviation, msd,
(b) the root mean square deviation, rmsd,
(c) the sample variance, $s^2$, and
(d) the sample standard deviation, s.
Solution: Using the formulae book,
(a) msd = … (b) rmsd = … (c) $s^2$ = … (d) s = …

Frequency Data
The formula for the variance can be easily adapted to find the variance of frequency data.
$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$ becomes $s^2 = \dfrac{\sum f(x - \bar{x})^2}{\sum f - 1}$
As before, we only use the formulae if we are given summary data.

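e.g.1 Find the mean and sample standard deviation of the following data:
x: 1, 2, 5, 10
Frequency, f: 3, 5, 8, 4
Solution: Using the calculator functions, the mean, $\bar{x} = 4.65$.
So, the sample standard deviation, $s = 3.17$ (3 s.f.)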

e.g.2 Find the sample standard deviation of the following lengths:
[Table: Length (cm); Frequency, f]
Solution: We need the class mid-values, x.
[Table: mid-value, x; Frequency, f]
We can now enter the values of x and f on our calculators.
Standard deviation, s = …

SUMMARY
To find the root mean square deviation, rmsd, or the sample standard deviation, s, using the calculator functions,
• the values of x (and f) are entered and checked,
• the table of calculations gives both values,
• the larger value is the sample standard deviation, s, and this is the value that is most often used by statisticians,
• the variance is the square of the standard deviation.

Outliers
We've already seen that an outlier is a data item that lies well away from the other data. It may be a genuine observation or an error in the data.
e.g. 1 Consider the following data: …
With this data set, we would immediately suspect an error. The value 81 was likely to have been 18. If so, correcting it would have a large effect on the mean and standard deviation, although the median would not be affected and there would be little effect on the IQR.
The presence of possible outliers is an argument in favour of using the median and IQR as measures of the data.
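The effect described here can be illustrated with a short Python sketch. The data set from the slide is not available, so the values below are hypothetical: one list contains a suspect 81, and the other has it corrected to 18.

```python
import statistics

# Hypothetical data: 81 is the suspected recording error for 18.
recorded = [12, 14, 15, 17, 19, 20, 81]
corrected = [12, 14, 15, 17, 19, 20, 18]

for label, data in (("recorded", recorded), ("corrected", corrected)):
    print(
        label,
        "mean:", round(statistics.mean(data), 2),
        "median:", statistics.median(data),
        "s:", round(statistics.stdev(data), 2),
    )
```

In this made-up example the median is 17 in both cases, while the mean and the sample standard deviation both fall sharply once the suspect value is corrected.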

In an earlier section, we met a method of identifying outliers using a measure of 1·5 × IQR above or below the median.
A 2nd method used to identify outliers is to find points that are further than 2 standard deviations from the mean.
e.g. 2 Consider the following sample: …
The sample mean and sample standard deviation are: mean, $\bar{x}$ = …, standard deviation, s = …
So, $\bar{x} - 2s$ = … and $\bar{x} + 2s$ = …
The point 33 is more than 2 standard deviations above the mean so, using this measure, it is an outlier.
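A minimal Python sketch of the 2 standard deviation rule follows. The sample from the slide is not available, so the data below are hypothetical, chosen to include the value 33; the function name is illustrative, and the sample standard deviation s (dividing by n - 1) is used.

```python
import statistics


def two_sd_outliers(data):
    """Return the items lying more than 2 sample standard deviations from the mean."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    lower, upper = mean - 2 * s, mean + 2 * s
    return [x for x in data if x < lower or x > upper]


# Hypothetical sample: the mean is 16.25 and s is roughly 7.13.
sample = [10, 12, 13, 14, 15, 16, 17, 33]
print(two_sd_outliers(sample))  # [33]
```

Note that the suspect point itself inflates both the mean and s, so with very small samples this rule can fail to flag a value that looks extreme.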