Published by Charleen Joseph. Modified over 9 years ago.
1
Sociology 5811: Lecture 3: Measures of Central Tendency and Dispersion
Copyright © 2005 by Evan Schofer
Do not copy or distribute without permission
2
Announcements
First math problem set will be handed out in Lab on Monday…
Due September 20
Today’s Class:
The Mean (and relevant mathematical notation)
Measures of Dispersion
3
Review: Variables / Notation
Each column of a dataset is considered a variable
We’ll refer to a column generically as “Y”

Person #   Guns owned (the variable “Y”)
1          0
2          3
3          0
4          1
5          1

Note: The total number of cases in the dataset is referred to as “N”. Here, N = 5.
4
Equation of Mean: Notation
Each case can be identified by a subscript
$Y_i$ represents the “ith” case of variable Y
i goes from 1 to N
$Y_1$ = value of Y for first case in spreadsheet
$Y_2$ = value for second case, etc.
$Y_N$ = value for last case

Person #   Guns owned (Y)
1          $Y_1 = 0$
2          $Y_2 = 3$
3          $Y_3 = 0$
4          $Y_4 = 1$
5          $Y_5 = 1$
5
Calculating the Mean
Equation: $\bar{Y} = \frac{\sum_{i=1}^{N} Y_i}{N}$
1. Mean of variable Y represented by Y with a line on top, called “Y-bar”
2. Equals sign means equals: “is calculated by the following…”
3. N refers to the total number of cases for which there is data
Summation ($\Sigma$) will be explained next…
6
Equation of Mean: Summation
Sigma (Σ): Summation, as in $\sum_{i=1}^{N} Y_i$
–Indicates that you should add up a series of numbers
The term to the right of the sigma ($Y_i$) is the item to be added repeatedly
The expressions above and below the sigma (N and i = 1) tell you how many times to add up Y-sub-i… AND what numbers to substitute for i.
7
Equation of Mean: Summation
1. Start with bottom: i = 1.
–The first number to add is Y-sub-1
2. Then, allow i to increase by 1
–The second number to add is Y-sub-2 (i = 2), then Y-sub-3 (i = 3)
3. Keep adding numbers until i = N
–In this case N = 5, so stop at Y-sub-5
8
Equation of the Mean: Example 2
Can you calculate mean for gun ownership?

Person #   Guns owned (Y)
1          $Y_1 = 0$
2          $Y_2 = 3$
3          $Y_3 = 0$
4          $Y_4 = 1$
5          $Y_5 = 1$

Answer: $\bar{Y} = \frac{0 + 3 + 0 + 1 + 1}{5} = \frac{5}{5} = 1$
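The summation steps can also be traced in code. This is a minimal sketch in Python (not part of the original lecture); the names `guns_owned` and `mean_of` are illustrative choices, and the data are the five cases from the table.

```python
# Guns owned by each of the five people in the table (Y_1 ... Y_5).
guns_owned = [0, 3, 0, 1, 1]


def mean_of(values):
    """Return the arithmetic mean: the sum of all cases divided by N."""
    n = len(values)      # N, the number of cases with data
    total = 0
    for y_i in values:   # i runs from 1 to N
        total += y_i     # add Y_i to the running sum
    return total / n


y_bar = mean_of(guns_owned)
print(y_bar)  # 1.0
```

The loop mirrors the sigma notation: start at the first case, add each value in turn, stop after case N, then divide by N.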
9
Properties of the Mean
The mean takes into account the value of every case to determine what is “typical”
–In contrast to the mode & median
–Probably the most commonly used measure of “central tendency”
But, it is often good to look at median & mode also!
Disadvantages
–Every case influences outcome… even unusual ones
–Extreme cases affect results a lot
–The mean doesn’t give you any information on the shape of the distribution
Cases could be very spread out, or very tightly clustered
10
The Mean and Extreme Values

Case   Num CDs   Num CDs (2)
1      20        20
2      40        40
3      0         0
4      70        1000
Mean   32.5      265

Extreme values affect the mean a lot: changing just one case (case 4, from 70 to 1000) raises the mean from 32.5 to 265
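A short Python check of the two columns, offered as a sketch rather than as part of the original slides. The variable names are illustrative, and the median is included only as a point of comparison, since earlier slides suggest looking at it alongside the mean.

```python
from statistics import mean, median

cds_original = [20, 40, 0, 70]    # first column of the table
cds_extreme = [20, 40, 0, 1000]   # same cases, with case 4 changed to 1000

print(mean(cds_original))   # 32.5
print(mean(cds_extreme))    # 265

# The median depends only on the middle cases, so it does not move here.
print(median(cds_original), median(cds_extreme))  # 30.0 30.0
```

One changed case multiplies the mean by about eight, while the median stays at 30.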
11
Example 1 And, very different groups can have the same mean:
12
Example 2
13
Example 3
14
Interpreting Dispersion
Question: What are possible social interpretations of the different distributions (all with the same mean)?
Example 1: Individuals cluster around 100
Example 2: Individuals distributed sporadically over range 0-200
Example 3: Individuals in two groups, near zero and near 200
15
Measures of Dispersion
Remember: Goal is to understand your variable…
Center of the distribution is only part of the story
Important issue: How “spread out” are the cases around the mean?
–How “dispersed”, “varied” are your cases?
–Are most cases like the “typical” case? Or not?
16
Measures of Dispersion
Some measures of dispersion:
1. Range
–Also related: Minimum and Maximum
2. Average Absolute Deviation
3. Variance
4. Standard deviation
17
Minimum and Maximum
Minimum: the lowest value of a variable represented in your data
Maximum: the highest value of a variable represented in your data
Example: In previous histograms about number of CDs owned, the minimum was 0, the maximum was 200.
18
The Range
The Range is calculated as the maximum minus the minimum
–In case of CD ownership, 200 - 0 = 200
Advantage:
–Easy to calculate
Disadvantages:
–1. Easily influenced by extreme values… may not be representative
–2. Doesn’t tell you anything about the middle cases
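As a small illustration, the minimum, maximum, and range can be computed in Python. This sketch assumes the four-case CD data used elsewhere in the lecture (20, 40, 0, 70) rather than the histogram data, which is not reproduced in the text.

```python
# Assumption: the four-case CD example, not the 0-200 histogram data.
cds = [20, 40, 0, 70]

minimum = min(cds)             # lowest value in the data
maximum = max(cds)             # highest value in the data
value_range = maximum - minimum

print(minimum, maximum, value_range)  # 0 70 70
```

Replacing the 70 with 1000 would change the range to 1000 even though three of the four cases are untouched, which is the first disadvantage listed above.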
19
The Idea of Deviation
Deviation: How much a particular case differs from the mean of all cases
Deviation of zero indicates the case has the same value as the mean of all cases
–Positive deviation: case has higher value than mean
–Negative deviation: case has lower value than mean
Extreme positive/negative deviations indicate cases further from the mean.
20
Deviation of a Case
Formula: $d_i = Y_i - \bar{Y}$
Literally, it is the distance from the mean (Y-bar)
21
Deviation Example

Case   Num CDs   Deviation from mean (32.5)
1      20        -12.5
2      40        7.5
3      0         -32.5
4      70        37.5
22
Turning the Deviation into a Useful Measure of Dispersion
Idea #1: Add it all up
–The sum of deviation for all cases: $\sum_{i=1}^{N} d_i = \sum_{i=1}^{N} (Y_i - \bar{Y})$
What is sum of the following? -12.5, 7.5, -32.5, 37.5
Problem: Sum of deviation is always zero
–Because mean is the exact center of all cases
–Cases equally deviate positively and negatively
–Conclusion: You can’t measure dispersion this way
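The zero sum is easy to confirm numerically. A minimal Python sketch using the four CD cases (variable names are illustrative):

```python
cds = [20, 40, 0, 70]
y_bar = sum(cds) / len(cds)            # mean = 32.5

# Deviation of each case: d_i = Y_i - Y-bar
deviations = [y - y_bar for y in cds]

print(deviations)        # [-12.5, 7.5, -32.5, 37.5]
print(sum(deviations))   # 0.0
```

With other data, floating-point rounding can leave a sum that is very close to zero rather than exactly zero; the algebraic result is still zero.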
23
Turning the Deviation into a Useful Measure of Dispersion
Idea #2: Sum up “absolute value” of deviation
–Absolute value makes negative values positive
–Designated by vertical bars: $\sum_{i=1}^{N} |d_i| = \sum_{i=1}^{N} |Y_i - \bar{Y}|$
What is sum? -12.5, 7.5, -32.5, 37.5
Answer: 12.5 + 7.5 + 32.5 + 37.5 = 90
–These 4 cases deviate by a total of 90 CDs from the mean
Problem: Sum of Absolute Deviation grows larger if you have more cases…
–Doesn’t allow comparison across samples
24
Turning the Deviation into a Useful Measure of Dispersion
Idea #3: The Average Absolute Deviation
–Calculate the sum, divide by total N of cases
–Gives the deviation of the average case
Formula: $AAD = \frac{\sum_{i=1}^{N} |Y_i - \bar{Y}|}{N}$
25
Turning the Deviation into a Useful Measure of Dispersion
Digression: Here we have used the mean to determine “typical” size of case deviations
–Originally, I introduced the mean as a way to analyze actual case values (e.g. # of CDs owned)
–Now: Instead of looking at typical case values, we want to know what sort of deviation is typical
In other words, a statistic (the mean) is being used to analyze another statistic (a deviation)
–This is a general principle that we will use often: statistics can help us understand our raw data and also further summarize our statistical calculations!
26
Average Absolute Deviation
Example: Total Deviation = 90, N = 4
–What is Average absolute deviation?
–Answer: 90 / 4 = 22.5
Advantages
–Very intuitive interpretation: Tells you how much cases differ from the mean, on average
Disadvantages
–Has non-ideal properties, according to statisticians
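The same calculation as a short Python sketch, again using the four CD cases. The function name `average_absolute_deviation` is an illustrative choice, not a standard library function.

```python
def average_absolute_deviation(values):
    """Mean of |Y_i - Y-bar| over all N cases."""
    n = len(values)
    y_bar = sum(values) / n
    total_abs_dev = sum(abs(y - y_bar) for y in values)
    return total_abs_dev / n


cds = [20, 40, 0, 70]
aad = average_absolute_deviation(cds)
print(aad)  # 22.5
```

Dividing by N is what makes the result comparable across samples of different sizes, unlike the raw sum of 90.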
27
Turning the Deviation into a Useful Measure of Dispersion
Idea #4: Square the deviation to avoid problem of negative values
–Sum of “squared” deviation
–Divide by “N-1” (instead of N) to get the average
Result: The “variance”: $s^2 = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N - 1}$
28
Calculating the Variance 1

Case   Num CDs (Y)
1      20
2      40
3      0
4      70
29
Calculating the Variance 2

Case   Num CDs (Y)   Mean (Y-bar)
1      20            32.5
2      40            32.5
3      0             32.5
4      70            32.5
30
Calculating the Variance 3

Case   Num CDs (Y)   Mean (Y-bar)   Deviation (d)
1      20            32.5           -12.5
2      40            32.5           7.5
3      0             32.5           -32.5
4      70            32.5           37.5
31
Calculating the Variance 4

Case   Num CDs (Y)   Mean (Y-bar)   Deviation (d)   Squared Deviation ($d^2$)
1      20            32.5           -12.5           156.25
2      40            32.5           7.5             56.25
3      0             32.5           -32.5           1056.25
4      70            32.5           37.5            1406.25
32
Calculating the Variance 5
Variance = Average of “squared deviation”
–Average = mean = sum up, divide by N
–In this case, use N-1
Sum of squared deviations: 156.25 + 56.25 + 1056.25 + 1406.25 = 2675
Divide by N-1
–N-1 = 4-1 = 3
Compute variance: 2675 / 3 = 891.7 = variance = $s^2$
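The five steps can be reproduced in Python as a check on the arithmetic. This is a sketch with illustrative variable names; `statistics.variance` from the standard library also divides by N-1 and is used only as a cross-check.

```python
from statistics import variance

cds = [20, 40, 0, 70]
n = len(cds)
y_bar = sum(cds) / n                               # step 2: mean = 32.5

# Steps 3 and 4: deviation of each case, then squared.
# Note that (-12.5) ** 2 is 156.25.
squared_devs = [(y - y_bar) ** 2 for y in cds]
sum_sq = sum(squared_devs)

# Step 5: divide by N - 1 rather than N.
s2 = sum_sq / (n - 1)

print(squared_devs)   # [156.25, 56.25, 1056.25, 1406.25]
print(sum_sq)         # 2675.0
print(round(s2, 1))   # 891.7

# Cross-check against the standard library's sample variance.
print(abs(s2 - variance(cds)) < 1e-9)  # True
```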
33
The Variance
Properties of the variance
–Zero if all points cluster exactly on the mean
–Increases the further points lie from the mean
–Comparable across samples of different size
Advantages
–1. Provides a good measure of dispersion
–2. Better mathematical characteristics than the Average Absolute Deviation (AAD)
Disadvantages:
–1. Not as easy to interpret as AAD
–2. Values get large, due to “squaring”
34
Turning the Deviation into a Useful Measure of Dispersion
Idea #5: Take square root of Variance to shrink it back down
Result: Standard Deviation
–Denoted by lower-case s
–Most commonly used measure of dispersion
Formula: $s = \sqrt{\frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N - 1}}$
35
Calculating the Standard Deviation
Simply take the square root of the variance
Example:
–Variance = 2675 / 3 = 891.7
–Square root of 891.7 = 29.9
Properties:
–Similar to Variance
–Zero for perfectly concentrated distribution
–Grows larger if cases are spread further from the mean
–Comparable across different sample sizes
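A final Python sketch for the standard deviation of the four CD cases, computed from the definition and cross-checked against `statistics.stdev` (which, like the formula in this lecture, divides by N-1). Variable names are illustrative.

```python
import math
from statistics import stdev

cds = [20, 40, 0, 70]
n = len(cds)
y_bar = sum(cds) / n

# Sample variance (divide by N - 1), then its square root.
s2 = sum((y - y_bar) ** 2 for y in cds) / (n - 1)
s = math.sqrt(s2)

print(round(s, 2))                    # 29.86
print(math.isclose(s, stdev(cds)))    # True
```

Unlike the variance (about 891.7, in squared CDs), the standard deviation is back in the original units: a typical case lies roughly 30 CDs from the mean.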
36
Example 1: s = 21.72
37
Example 2: s = 67.62
38
Example 3: s = 102.15
39
Thinking About Dispersion
Suppose we observe that the standard deviation of wealth is greater in the U.S. than in Sweden…
–What can we conclude about the two countries?
Guess which group has a higher standard deviation for income: Men or Women? Why?
The standard deviation of a stock’s price is sometimes considered a measure of “risk”. Why?
Suppose we polled people on two political issues and the S.D. was much higher for one
–What are some possible interpretations?
What are some other examples where the deviation would provide useful information?