Measures of Location Statistics of location Statistics of dispersion

Presentation transcript:

Measures of Location: INFERENTIAL STATISTICS & DESCRIPTIVE STATISTICS. Statistics of location summarise a central point; statistics of dispersion summarise the distribution around that central point. When a set of data has been collected, the first thing we will want to do is to summarise that data. This can be done with frequency distributions, as we discussed in the previous chapter on data types. However, we often want a numerical summary of the data. These numerical summaries are referred to as descriptive stats, and they are divided into two categories: stats of location and stats of dispersion. Stats of location summarise the central point of the data along a number line, and stats of dispersion summarise how the observations are distributed about that central point. You will remember that we use descriptive stats to summarise the important characteristics of a data set, and inferential stats to generalise about a larger population from what we observe in a smaller sample of that population. In this chapter, therefore, we will discuss a few different measures of location, or central tendency as they are sometimes known, and we will also look at the ways in which data are dispersed around those measures.

Measures of Location: ARITHMETIC MEAN. Sum all observations, then divide by the number of observations. For a sample: $\bar{X} = \frac{\sum X}{n}$, where $n$ is the number of observations in the sample. For a population: $\mu = \frac{\sum X}{N}$, where $N$ is the number of observations in the population. The first of the measures of location we will examine is the arithmetic mean, known to lay people as simply "the average". This represents the centre of the observations in a sample frequency distribution. Calculating the mean is very simple: you sum, or add, all the observations, and then divide by the number of observations. If X is the letter we use to denote our sample variable, then X with a bar over it represents the mean of all of our sample observations. Remember that we use different notation when talking about a sample than we do when talking about a population. We use Greek letters for population parameters, and Roman letters for sample statistics. The sample mean is therefore designated as $\bar{X}$ ("X-bar"), and the population mean as $\mu$ (mu). Since calculating the mean is so simple, and because it has other properties that are useful when it comes to inferential stats, it is the most commonly reported statistic of location. One problem with the mean, though, is that extreme values greatly influence it.
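
As a small illustration of the calculation, and of how a single extreme value shifts the mean, here is a minimal sketch in Python. The sleep figures are made-up values, not the data behind the slides, and the function name `arithmetic_mean` is chosen here purely for illustration.

```python
def arithmetic_mean(observations):
    """Sum all observations, then divide by the number of observations."""
    return sum(observations) / len(observations)


# Hypothetical nightly hours of sleep for seven people (illustrative only).
hours = [6.5, 7.0, 7.5, 7.0, 8.0, 6.5, 7.5]
mean_hours = arithmetic_mean(hours)

# Adding one extreme value pulls the mean away from the bulk of the data.
hours_with_outlier = hours + [16.0]
mean_with_outlier = arithmetic_mean(hours_with_outlier)

print(mean_hours, mean_with_outlier)
```

In this made-up sample, one person sleeping 16 hours raises the mean by more than an hour, even though the other seven values are unchanged.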

Measures of Location. [Figure: histogram of Nightly Hours of Sleep (x-axis, 3.5 to 11.5) against No. of People (y-axis); $\bar{X} = 7.07$.] In the example we have here, we have added all the observation values, in other words the number of hours each person slept, and divided by the number of people in the sample, to get an average of 7.07 hours of sleep per night for this particular sample group.

Measures of Location: MEDIAN. The value that has an equal number of observations (n) on either side. [Figure: frequency distribution of Score (x-axis, 33 to 40) against Frequency (y-axis). For N = 15 the median is the eighth score = 37.] The second measure we will examine is called the median. This is defined as the value that has an equal number of observations on either side of it. It divides the frequency distribution in half, relative to the number of observations.

Measures of Location: MEDIAN. The value that has an equal number of observations (n) on either side. [Figure: frequency distribution of Score (x-axis, 33 to 40) against Frequency (y-axis). For N = 16 the median is the average of the eighth and ninth scores = 37.5.] If there is an even number of observations, then no single observation has as many observations above it as below it. In this case, the median must be calculated by averaging the middle two observations. In the example we have here, the average of the eighth and ninth observations was calculated. You will also note that, unlike the mean, the median is unaffected by a few very large or very small values.
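
The odd and even cases can be sketched in a few lines of Python. The scores below are hypothetical (the raw data behind the slides are not given), and `median` is simply a name chosen for this sketch.

```python
def median(observations):
    """Return the middle value, averaging the two middle values when n is even."""
    ordered = sorted(observations)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hypothetical scores, not the data behind the slides.
odd_scores = [33, 35, 36, 37, 38, 39, 40]       # n = 7: the fourth score is the median
even_scores = [33, 35, 36, 37, 38, 39, 40, 41]  # n = 8: average the fourth and fifth scores

median_odd = median(odd_scores)
median_even = median(even_scores)

# Replacing the largest score with an extreme value leaves the median unchanged.
median_with_outlier = median([33, 35, 36, 37, 38, 39, 400])

print(median_odd, median_even, median_with_outlier)
```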

Measures of Location: MODE. The most frequently occurring score value; it corresponds to the highest point on the frequency distribution. For a given sample, N = 16: 33 35 36 37 38 38 38 39 39 39 39 40 40 41 41 45. The mode = 39. [Figure: frequency distribution of Score (x-axis, 33 to 45) against Frequency (y-axis).] The last measure of location that we will examine is the mode. This is simply the most common observation in the data. If there are two most common values, then the distribution is said to be bimodal and it has two distinct peaks. The mode conveys little information about the data, so you seldom see it reported in the scientific literature, although it is often interesting to report the number of modes detected in a population or sample if there is more than one.
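
The mode of the sample listed on the slide can be checked with a short Python sketch. The helper name `modes` is an assumption made for this example; it returns a list so that a bimodal sample reports both peaks.

```python
from collections import Counter

# The N = 16 scores listed on the slide.
scores = [33, 35, 36, 37, 38, 38, 38, 39, 39, 39, 39, 40, 40, 41, 41, 45]


def modes(observations):
    """Return every value that shares the highest frequency, in sorted order."""
    counts = Counter(observations)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


print(modes(scores))           # [39]
print(modes([1, 1, 2, 3, 3]))  # [1, 3], a bimodal sample
```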

Measures of Location: Measures of central tendency, summary.
Mode. Advantages: quick & easy to compute; useful for nominal data. Disadvantages: poor sampling stability.
Median. Advantages: not affected by extreme scores. Disadvantages: somewhat poor sampling stability.
Mean. Advantages: sampling stability; related to variance. Disadvantages: inappropriate for discrete data; affected by skewed distributions.
The mean is by far the most commonly reported statistic. It is easy to work with mathematically, but its disadvantage is that it is greatly affected by extreme values. The median is less commonly reported; its advantage is that it is not greatly affected by outliers. The mode is rarely used since it does not convey much information about the set of data.

Measures of Location: DISPERSION. These are measures of how the observations are distributed around the mean. Besides measures of location such as the mean, we also need a way to measure how the observations are distributed around the mean. In other words, we want to know whether most of the observations lie close to the mean or far from it. This characteristic is called the dispersion of the population, and there are several ways in which it can be calculated. As usual, most of the time we will be dealing with only a sample of the population, so what we calculate will be a sample statistic that estimates the population parameter that actually measures the dispersion.

Measures of Location: DISPERSION: Range. The first of these measures that we will examine is called the range. This is simply the lowest value subtracted from the highest value. It is therefore greatly affected by any outliers, and gives very little specific information about how the observations cluster around the mean. It is a poor estimator of the dispersion of the population, and is therefore seldom used. If it is reported, it should be reported together with other measures of dispersion.
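
To show how strongly one outlier drives the range, here is a brief Python sketch that reuses the 16 scores from the mode slide. The function name `value_range` is an assumption made for this example.

```python
def value_range(observations):
    """Range: the lowest value subtracted from the highest value."""
    return max(observations) - min(observations)


# The N = 16 scores from the mode slide.
scores = [33, 35, 36, 37, 38, 38, 38, 39, 39, 39, 39, 40, 40, 41, 41, 45]

full_range = value_range(scores)

# Dropping the single largest score (45) changes the range by a third.
range_without_largest = value_range(scores[:-1])

print(full_range, range_without_largest)  # 12 8
```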

Measures of Location: DISPERSION: Variance. mean = 50.
Name       Score  Deviation
Amy        10     -40
Theo       20     -30
Max        30     -20
Henry      40     -10
Leticia    50       0
Charlotte  60      10
Pedro      70      20
Tricia     80      30
Lulu       90      40
SUM                 0
To see how 'deviant' the distribution is relative to another, we could sum these deviations. But this would leave us with a big fat zero. The second of these measures that we will examine is the variance. This measure describes the dispersion of the data about an estimate of central tendency such as the mean. If the data points are all close to the mean, then variability is low. If data points are dispersed widely around the mean, then variability is high. If we want an estimate of dispersion about the mean, the first thing to do is to take each data point in the sample and subtract the mean. This quantifies the distance of each point from the mean. To get an overall picture of variability, we could sum these deviations. Unfortunately, the positive and negative deviations cancel, so the sum is always zero.
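
Why the deviations from the mean always sum to zero follows in one line from the definition of the sample mean, $\bar{X} = \frac{\sum X}{n}$, which gives $\sum X = n\bar{X}$. The indexed notation $X_i$ is introduced here only for this derivation.

```latex
\sum_{i=1}^{n} \left( X_i - \bar{X} \right)
  = \sum_{i=1}^{n} X_i - n\bar{X}
  = n\bar{X} - n\bar{X}
  = 0
```

For the nine scores on this slide, the deviations are -40, -30, -20, -10, 0, 10, 20, 30 and 40, which cancel exactly.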

Measures of Location: DISPERSION: Variance. $SS = \sum (X - \bar{X})^2$
Name       Score  Deviation  Sq. of deviation
Amy        10     -40        1600
Theo       20     -30         900
Max        30     -20         400
Henry      40     -10         100
Leticia    50       0           0
Charlotte  60      10         100
Pedro      70      20         400
Tricia     80      30         900
Lulu       90      40        1600
SUM                          6000
So we use squared deviations from the mean, which are then summed. This is the sum of squares (SS). In order to calculate the variance, therefore, we must first calculate the squared deviation of each observation from the mean, and then sum these values. This gives the sample sum of squares, which we commonly abbreviate to SS. This is a very important term, and will be used often. It is our first estimate of variability.
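
The table can be reproduced with a short Python sketch using the nine scores from the slide. The variable names are chosen for this example only.

```python
# Scores from the slide, keyed by name.
scores = {
    "Amy": 10, "Theo": 20, "Max": 30, "Henry": 40, "Leticia": 50,
    "Charlotte": 60, "Pedro": 70, "Tricia": 80, "Lulu": 90,
}

mean = sum(scores.values()) / len(scores)

# Deviation of each score from the mean, and its square.
deviations = {name: score - mean for name, score in scores.items()}
squared_deviations = {name: d ** 2 for name, d in deviations.items()}

sum_of_deviations = sum(deviations.values())       # positive and negative cancel
sum_of_squares = sum(squared_deviations.values())  # SS

print(mean, sum_of_deviations, sum_of_squares)  # 50.0 0.0 6000.0
```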

Measures of Location: DISPERSION: Variance. We take the "average" squared deviation from the mean and call it VARIANCE. For a sample: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$ (dividing by $n - 1$ rather than $n$ corrects for the fact that the sample variance would otherwise tend to underestimate the population variance). For a population: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$. Next, we divide the sample sum of squares by the sample size minus one in order to get the sample variance, which we denote as s squared. If we wanted the population variance, we would divide the population sum of squares by the size of the population; this is denoted by sigma squared. Measuring the whole population is often impossible, though, and the best estimate of the population variance is to take the sample SS and divide by the sample size minus one, as already described. The variance is also referred to as the "mean square". Dividing the sample SS by the sample size minus one yields an unbiased estimator of the population variance, and the term (n-1) is called the degrees of freedom. As the sum of squares can vary from zero to infinity, the variance itself can vary from zero to infinity. You can never have a variance with a negative value.
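
The two divisors can be compared in a minimal Python sketch using the nine scores from the previous slides. The manual calculation follows the formulas above; `statistics.variance` and `statistics.pvariance` from the standard library are used only as a cross-check.

```python
import statistics

# The nine scores from the variance slides.
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]

n = len(scores)
mean = sum(scores) / n
ss = sum((x - mean) ** 2 for x in scores)  # sum of squares, 6000.0

# Sample variance: divide SS by the degrees of freedom, n - 1.
sample_variance = ss / (n - 1)

# Population variance: divide SS by N. This is appropriate only if these
# nine scores are treated as the entire population.
population_variance = ss / n

print(sample_variance, statistics.variance(scores))
print(population_variance, statistics.pvariance(scores))
```

Dividing by n - 1 gives 750, while dividing by n gives roughly 666.67, so the uncorrected figure is the smaller of the two.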

Measures of Location: DISPERSION: Standard deviation. The standard deviation is the square root of the variance: $s = \sqrt{s^2}$. The standard deviation measures spread in the original units of measurement, while the variance does so in units squared. Variance is good for inferential stats. Standard deviation is nice for descriptive stats. The sample variance is an excellent estimate of variability, but it has the square of the original units of the data, which can be difficult to interpret. For example, if you have data in grams, the variance has the unit "square grams", and who knows what that is? The solution here is simple: just take the square root of the sample variance, and this we call the standard deviation. Since the sample variance is $s^2$, the standard deviation is symbolised as $s$.
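
Continuing with the nine scores from the variance slides, a short Python sketch shows the square-root step. `statistics.stdev` is the standard-library sample standard deviation and is used here as a cross-check.

```python
import math
import statistics

# The nine scores from the variance slides.
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]

sample_variance = statistics.variance(scores)  # in squared units
sample_sd = math.sqrt(sample_variance)         # back in the original units

print(sample_variance, round(sample_sd, 2))  # 750 27.39
```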

Measures of Location: DISPERSION. [Figure: two bar charts of Scores (x-axis, 20 to 100) against # of People (y-axis). First data set: N = 28, $\bar{X} = 50$, $s^2 = 140.74$, $s = 11.86$. Second data set: N = 28, $\bar{X} = 50$, $s^2 = 555.55$, $s = 23.57$.] Here we have two sets of data, with the distributions represented graphically as bar charts. They each have the same number of observations and the same sample mean, but the distribution of the data in each set is clearly different, and one would not know that by looking at the sample mean only. By calculating the sample variance, and then the standard deviation, we see immediately that the two sets of data differ in terms of dispersion.

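Measures of Location: DISPERSION.
Mean. For a sample: $\bar{X} = \frac{\sum X}{n}$. For a population: $\mu = \frac{\sum X}{N}$.
Variance. For a sample: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$. For a population: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$.
Standard Deviation. For a sample: $s = \sqrt{s^2}$. For a population: $\sigma = \sqrt{\sigma^2}$.
Remember that the SS, variance, and standard deviation calculated from a sample are all statistics: they are estimates of population parameters. We generally use the sample formulas when working with data, because we are generally working at the sample level and seldom, if ever, at the population level. We do, however, need to be aware of the formulas for the population parameters.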

Measures of Location: DISPERSION: Standard error. $s_{\bar{X}} = \sqrt{\frac{s^2}{n}} = \frac{s}{\sqrt{n}}$. The Standard Error, or Standard Error of the Mean, is an estimate of the standard deviation of the sampling distribution of means, based on the data from one or more random samples, e.g. 15 students each compile data sets of the heights of 20 people. Numerically, it is equal to the square root of the quantity obtained when s squared is divided by the size of the sample. Up until now, we have discussed the frequency distribution, or dispersion, of values of data. However, we are also often concerned with the frequency distribution, or dispersion, of statistics. For example, suppose every person in this class went outside, stopped 20 random people and asked how tall they were. If there were 15 people in this class, and each now had a sample of n = 20, there would be 20 x 15 = 300 data points. We could construct a frequency distribution of the values of the 300 data points. Or we could have each person in the class calculate the mean of their particular set of data points. We would then have 15 different means, and we could prepare a frequency distribution of the values of the means; in other words, how often did each value of the mean occur? Since the mean is a statistic, this would be a frequency distribution, or dispersion, of the values of a sample statistic. Since this is a mouthful, we have a much shorter term that we use: sampling distribution. A sampling distribution is therefore a frequency distribution of the values of a sample statistic. Next, we could calculate the standard deviation of our 15 means. We would use the formula we have used up until now, except that our mean would be a "mean of the means", and each X value would be one of the 15 means (this standard deviation is symbolised as $s_{\bar{X}}$, s subscript X-bar). The standard deviation of the values of a statistic is called the standard error. In our particular example, we have calculated the standard error of the mean.
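
The classroom example can be imitated with a small simulation in Python. This is only a sketch under stated assumptions: the heights are drawn from a hypothetical normal population (mean 170 cm, standard deviation 10 cm), the seed is fixed for repeatability, and `standard_error` is a name chosen for this example.

```python
import math
import random
import statistics


def standard_error(sample):
    """Estimate the standard error of the mean from one sample: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# 15 students each measure 20 people from a hypothetical population of heights
# (assumed normal with mean 170 cm and standard deviation 10 cm).
samples = [[rng.gauss(170, 10) for _ in range(20)] for _ in range(15)]

# One mean per student: 15 values drawn from the sampling distribution of the mean.
sample_means = [statistics.mean(sample) for sample in samples]

# Standard deviation of the 15 means, computed directly.
sd_of_means = statistics.stdev(sample_means)

# Estimate of the same quantity from a single student's 20 measurements.
se_from_one_sample = standard_error(samples[0])

print(round(sd_of_means, 2), round(se_from_one_sample, 2))
```

The two printed numbers will not match exactly, since both are estimates, but both approximate the spread of the sampling distribution of the mean, which for this assumed population is 10 / sqrt(20), or about 2.24 cm.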