Descriptive Statistics II: Measures of Dispersion.

How typical is ‘typical’? The mean, median and mode are measures designed to convey the “typical” observation to the reader. It is often useful for the reader to know just how typical the “typical” observation is!  If most of the observations fall in the mode, then the mode is very “typical”.  If most of the observations are close to the mean or median, then the mean/median is very “typical”, or indicative of the distribution.

Measures of Dispersion Measures of dispersion give us an idea of how representative the measures of central tendency (mode, median, mean) are of the entire distribution. The idea is that the more the data are dispersed, or spread out, from the central measure (mode, median or mean), the less indicative the central measure is. In other words, a high measure of dispersion tells us that the mode/median/mean is not very typical and many observations are quite different!

Measures of Dispersion Measures of dispersion: how dispersed are the observations?  Variation ratio  Range  Interquartile range  Variance & standard deviation  Skewness  Kurtosis

Nominal: Variation Ratio For nominal variables, the variation ratio is the proportion of cases that are not in the mode.  Variation ratio = 1 − (number of observations in the mode / total number of observations) Infrequently used, since the variation ratio does not tell the reader anything that the share of cases in the mode does not already tell the reader.
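The variation ratio can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `variation_ratio` and the response data are hypothetical, and the data are not the Canadian Election Study figures.

```python
from collections import Counter

def variation_ratio(responses):
    """Return the modal category and the share of cases NOT in the mode."""
    counts = Counter(responses)
    mode_value, mode_count = counts.most_common(1)[0]
    return mode_value, 1 - mode_count / len(responses)

# Hypothetical nominal responses (made up for illustration only).
responses = ["agree"] * 6 + ["disagree"] * 3 + ["not sure"] * 1

mode_value, ratio = variation_ratio(responses)
print(mode_value, round(ratio, 2))
```

With 6 of 10 cases in the mode, the variation ratio is 0.4, which is simply the complement of the mode's share.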

Example: Variation Ratio and Mode Canadian Election Study, MBS_B1: Please circle the number that best reflects your opinion. The government should: 1. See to it that everyone has a decent standard of living……1090 (65.7%) = Mode 2. Leave people to get ahead on their own… 384 (23.1%) 8. Not sure (11.1%) Variation Ratio = 34.2% Note: Unweighted responses are not reflective of the population.

Range Minimum value to maximum value.  Useful when you want to know all the possible values, for aggregate policy measures like GDP or other interval/ratio data.  Not very useful for closed-ended survey responses. In the real GDP example, the range runs from $338 to $48,589.  What does the range tell us about the mean of $9,089 or the median of $5,194?

Percentiles, Quantiles and Quartiles By ordering the values in the distribution, one can classify observations by where they are in the distribution. Quantiles is the general term for cut points that divide the ordered distribution into equal parts. Percentiles divide the distribution into 100 equal parts.  Lowest values are in the 1st percentile, largest values are in the 99th or 100th percentile.  The median is the 50th percentile. Deciles divide the distribution into 10 equal parts. Quartiles divide the distribution into 4 equal parts.  1st quartile = 25th percentile, 2nd quartile = median, 3rd quartile = 75th percentile.  This matters because quartiles provide us with a measure of dispersion…
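As a rough illustration of these divisions, Python's standard `statistics.quantiles` function returns the cut points for any number of equal parts. The data below are hypothetical. Note that this function interpolates between observations by default, so its cut points can differ slightly from values found by hand-counting ordered observations.

```python
import statistics

# Hypothetical interval-level data: 11 made-up values.
values = [3, 7, 8, 12, 13, 14, 18, 21, 25, 30, 41]

quartiles = statistics.quantiles(values, n=4)      # 3 cut points -> 4 parts
deciles = statistics.quantiles(values, n=10)       # 9 cut points -> 10 parts (deciles)
percentiles = statistics.quantiles(values, n=100)  # 99 cut points -> 100 parts

# The 2nd quartile, 5th decile and 50th percentile all coincide with the median.
print(quartiles[1], deciles[4], percentiles[49], statistics.median(values))
```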

Interquartile Range For closed-ended survey responses, like rating the Conservative Party, finding the interquartile range (or IQR) between the observation value at the 25th percentile and the observation value at the 75th percentile provides more useful information than the full range. IQR measures the range of the middle half of all observations.  A high IQR relative to the range tells the reader that there are many observations far from the median.  A low IQR relative to the range tells the reader that at least half of all observations are very close to the median.

Calculating the interquartile range Order all of the responses. Identify the observation at the 25th percentile.  Recall: 50th percentile = median.  Take the value of this observation. Identify the observation at the 75th percentile.  Take the value of this observation. Interquartile range = the value of the observation at the 75th percentile minus the value of the observation at the 25th percentile.
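The steps above can be sketched as follows. This assumes the "nearest-rank" rule for locating the observation at a percentile, which is one common convention; the slides do not say which rule they use. The function names and the ratings are hypothetical.

```python
import math

def value_at_percentile(values, p):
    """Nearest-rank rule: value of the ordered observation at the p-th percentile."""
    ordered = sorted(values)                           # step 1: order the responses
    rank = max(1, math.ceil(p * len(ordered) / 100))   # 1-based position in the ordering
    return ordered[rank - 1]                           # take the value of that observation

def interquartile_range(values):
    q1 = value_at_percentile(values, 25)
    q3 = value_at_percentile(values, 75)
    return q1, q3, q3 - q1

# Hypothetical ratings on a 0-10 scale (made up for illustration).
ratings = [5, 2, 7, 0, 10, 5, 1, 8, 2, 6, 9, 3, 7, 4, 5, 10]

q1, q3, iqr = interquartile_range(ratings)
print(q1, q3, iqr)
```

Statistical packages often interpolate between neighbouring observations instead, so their quartiles may differ slightly from this hand method.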

Ex: Finding 25th and 75th Percentile [Table: frequency, percent and cumulative percent of responses on the scale from “Strongly dislike” to “Strongly like”, with the 25th percentile, the median (50th percentile) = 5, and the 75th percentile marked; cell values not preserved.] Source: Canadian Election Study, 2008, CES_MBS_I10a [National Weight]

Ex: Calculating the IQR [Table: frequency, percent and cumulative percent of responses on the scale from “Strongly dislike” to “Strongly like”; cell values not preserved.] 25th percentile value = 2. 75th percentile value = 7. IQR = 7 – 2 = 5. Source: Canadian Election Study, 2008, CES_MBS_I10a [National Weight]

Interpreting IQR The interquartile range for opinions of the Conservative Party (2008) was 5. An IQR of 5 (on an 11-point scale) tells us that the middle half of all observations falls within a span of 5 points.  There are few observations with extremely low or extremely high opinions of the Conservative Party.

Interpreting IQR: Real GDP Example In the real GDP example, values ran from $338 to $48,589, a range of $48,251.  The median was $5,194.  The value of the observation at the 25th percentile is $2,018.  The value of the observation at the 75th percentile is $13,532.  The interquartile range is $13,532 - $2,018 = $11,514. This tells us that half of all observations are in a relatively narrow range, since about 11,500 is much smaller than about 48,000.  Most countries are nowhere near as rich as the richest countries…
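The comparison of IQR to range can be checked with simple arithmetic. The sketch below uses only the real GDP figures quoted on the slide; the variable names are illustrative.

```python
# Figures quoted on the slide (real GDP, in dollars).
minimum, maximum = 338, 48_589
q1, q3 = 2_018, 13_532

full_range = maximum - minimum   # distance from the poorest to the richest country
iqr = q3 - q1                    # spread of the middle half of countries
share = iqr / full_range         # IQR as a fraction of the full range

print(full_range, iqr, round(share, 2))
```

The middle half of countries occupies roughly a quarter of the full range, which is the sense in which the IQR is "low relative to the range" here.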

Interpreting IQR: % Pop on $2/day Value of observation at 25th percentile = 13.1%. Value of observation at 75th percentile = 73.9%. What is the interquartile range? Since the range was between 2% and 96.6%, what does the interquartile range tell us?

Variance Rather than relying on the location of the value, variance measures dispersion by calculating how far observations are from the mean. Variance = the average of the squared distances of each observation from the mean.  High variance means that many/most observations are far from the mean, but it can also be heavily influenced by outliers.  Low variance means that many/most observations are close to the mean.

Formula: Variance
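Written out from the verbal definition of variance as the average squared distance from the mean, and assuming the divide-by-N (population) form, the formula would read as below. The notation (x_i for each observation, x-bar for the mean, N for the number of observations) is an assumption; many textbooks and software packages divide by N − 1 instead when working with a sample.

```latex
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} \left( x_i - \bar{x} \right)^2
\qquad \text{(sample version: } s^2 = \frac{1}{N-1} \sum_{i=1}^{N} \left( x_i - \bar{x} \right)^2 \text{)}
```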

Standard deviation Takes the square root of the variance to put the measure in the same unit as the observations.  Example: The average rating of the Conservatives is 4.8 and the standard deviation is 2.8.  This tells us that ratings typically differ from the mean by about 2.8 points on the 11-point scale used to measure feeling towards the Conservative Party. In contrast, the variance is 8.0, which would have to be interpreted as 8 squared points on the 11-point scale. That interpretation is confusing and has little intuitive power.
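A short sketch of both calculations, using hypothetical ratings rather than the survey data. It computes the variance by hand as the average squared distance from the mean, then cross-checks against the standard library's population functions; `statistics.variance` is shown for comparison because it divides by N − 1.

```python
import math
import statistics

# Hypothetical ratings (made up for illustration).
ratings = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(ratings) / len(ratings)
squared_distances = [(x - mean) ** 2 for x in ratings]
variance = sum(squared_distances) / len(ratings)   # average squared distance (divide by N)
std_dev = math.sqrt(variance)                      # back in the units of the ratings

print(mean, variance, std_dev)
print(statistics.pvariance(ratings), statistics.pstdev(ratings))  # same, divide-by-N form
print(statistics.variance(ratings))                               # sample form, divides by N - 1
```

Here the standard deviation of 2 rating points is directly readable on the scale, while the variance of 4 "squared points" is not.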

Formula: Standard Deviation
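Following the description of the standard deviation as the square root of the variance, and again assuming the divide-by-N form and the notation x_i, x-bar and N (none of which appear in the slide text), the formula can be written as:

```latex
\sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( x_i - \bar{x} \right)^2}
```

For the Conservative Party ratings, a variance of 8.0 gives a standard deviation of about 2.8, matching the figures reported in the example.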

Deviation qualities and normality If observations are normally distributed (a common statistical assumption), about two-thirds (roughly 68%) of observations are within 1 standard deviation of the mean. In a “normal” distribution, the observations are distributed symmetrically around the mean, with the same number of observations above the mean as there are below the mean.  Mean = Median
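The "about two-thirds" figure can be verified with Python's `statistics.NormalDist`. This is only a check of the theoretical normal curve, not of any survey data.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution within 1 and within 2 standard deviations of the mean.
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_two_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

print(round(within_one_sd, 3), round(within_two_sd, 3))
```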

Skewness If observations are symmetric around the mean, there are as many observations less than the mean as there are observations greater than the mean. Skewness measures the extent to which the observations are asymmetric.  In other words, skewness tells us whether there are many more observations above or below the mean.  However, skewness does not simply count the observations; it considers the values of the observations.  Like the mean, skew is sensitive to extreme values.

Skewness Implications Skewness could have normative implications for policy outcomes and public opinion. Some bi- and multivariate analyses become more complicated with a skewed distribution.

Interpreting Skewness Negative skew = most of the observation values are above the mean.  Usually this means that most of the observations (including the median) are above the mean. Positive skew = most of the observation values are below the mean.  Usually this means that most of the observations (including the median) are below the mean. Skew values close to zero mean that the distribution is nearly symmetrical.
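A minimal sketch of these sign conventions, assuming the moment-based definition of skewness (the average cubed standardized deviation). Statistical packages may apply small-sample corrections, so their values can differ slightly. The data sets are hypothetical.

```python
import statistics

def skewness(values):
    """Moment-based skewness: mean of cubed standardized deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((x - mean) / sd) ** 3 for x in values) / len(values)

right_tail = [1, 2, 2, 3, 3, 3, 4, 20]     # one very high value pulls the mean up
left_tail = [21 - x for x in right_tail]   # mirror image: one very low value
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right tail", right_tail), ("left tail", left_tail), ("symmetric", symmetric)]:
    print(name, statistics.fmean(data), statistics.median(data), round(skewness(data), 2))
```

In the right-tailed data most observations (and the median) sit below the mean and the skew is positive; the mirror image gives a negative skew with most observations above the mean.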

Skew Caution When the mean is less than the median, the skew is usually negative. When the mean is greater than the median, the skew is usually positive. Usually does not mean ALWAYS! If it did, why bother looking at the skew? Skew may not follow these general tendencies when:  The distribution is multimodal.  The range on one side of the median is much shorter than on the other.  One tail is long but the other is heavy.  The variable is discrete (which is usually the case in social statistics).
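A small hypothetical data set shows how the rule of thumb can fail. It is discrete, with a heavy lower tail and a single long upper tail, and it again assumes the moment-based definition of skewness.

```python
import statistics

def skewness(values):
    """Moment-based skewness: mean of cubed standardized deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((x - mean) / sd) ** 3 for x in values) / len(values)

# Hypothetical discrete data: four 0s (heavy left side), five 2s, one 9 (long right tail).
data = [0] * 4 + [2] * 5 + [9]

mean = statistics.fmean(data)
median = statistics.median(data)
skew = skewness(data)
print(mean, median, round(skew, 2))
```

The mean (1.9) is below the median (2), yet the skew is positive, because the single distant value of 9 dominates the cubed deviations.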

Conservative Party Skew Source: Canadian Election Study, 2008, CES_MBS_I10a [National Weight] Mean = 4.8 Median = 5 Are more observations above or below the mean?

Conservative Party Skew Source: Canadian Election Study, 2008, CES_MBS_I10a [National Weight] Mean < Median More above; therefore skew is negative!

Real GDP Skew Source: Gleditsch, K. S. via Quality of Government (QoG) v6, April 2011 Mean = $9,089 Median = $5,194 Here, mean > median & skew is positive (1.35).

Kurtosis Kurtosis measures how tall or flat the distribution of the variable is. Even with the same variance, one distribution may have its observations concentrated in a tall, narrow peak near the mean with the remainder spread far out, while another has its observations in a shallower, broader peak near the mean.  Rarely used in social science.

Kurtosis – Illustrated Relative to a ‘normal’ mesokurtic distribution (excess kurtosis = 0):  Positive kurtosis (“leptokurtic”) means that the observations have a tall peak near the mean.  Negative kurtosis (“platykurtic”, which sounds like ‘flat’) means that the observations are very spread apart, with a broad, shallow peak.
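The sign convention can be illustrated with a sketch that assumes the moment-based definition of excess kurtosis (the fourth standardized moment minus 3, so that a normal distribution scores 0). Both data sets are hypothetical, and packages may apply small-sample corrections that change the exact values.

```python
import statistics

def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (a normal distribution scores 0)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((x - mean) / sd) ** 4 for x in values) / len(values) - 3

flat = [1, 2, 3, 4, 5]          # observations spread evenly: broad, shallow shape
peaked = [3] * 8 + [1, 5]       # most observations at the mean, a few far away

print(round(excess_kurtosis(flat), 2), round(excess_kurtosis(peaked), 2))
```

The evenly spread data give a negative (platykurtic) value, while the data piled up at the mean give a positive (leptokurtic) value.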

Using Descriptive Statistics to Make Comparisons

Compare distributions Responses are opinions of Canadian adults.

How would you describe the opinions portrayed in the previous slide? What would you say? It may not be very easy.  There is no clear, standard or normal way to make the descriptions. This is where descriptive statistics proves its use.  It is possible to discuss the overall distribution, the mode, any apparent differences.

How much have these institutions done to help resolve the conflict in Lebanon? [Table: mean, standard deviation and skewness of responses for the U.N., the U.S.A. and the E.U.; cell values not preserved.] Scale: 1 = A lot, 2 = A little, 3 = Not very much, 4 = Nothing at all. Note: the median = 3 for all three variables.

Comparing Medians The median for all these variables is three, indicating that:  Most Canadians think that the UN, USA and EU are doing not very much or nothing at all.  There are NOT large differences in opinion between variables. But there are some differences, and the table clearly indicates what those differences are in a concise manner.

Comparing Means [Table: mean, standard deviation and skewness of responses for the U.N., the U.S.A. and the E.U.; cell values not preserved.] Scale: 1 = A lot, 2 = A little, 3 = Not very much, 4 = Nothing at all. The mean for the U.N. is lower than the mean response for the USA and the EU, telling us that Canadians thought that the U.N. was doing [slightly] more to resolve the conflict than the EU and the USA. The lower U.N. mean reflects the relatively high number of respondents who said the U.N. was doing “a lot”, since the mean is sensitive to such responses.

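Comparing dispersion [Table: mean, standard deviation and skewness of responses for the U.N., the U.S.A. and the E.U.; cell values not preserved.] Scale: 1 = A lot, 2 = A little, 3 = Not very much, 4 = Nothing at all. The standard deviations are about the same for all three variables, indicating that the dispersion of opinion is about the same.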

How much have these institutions done to help resolve the conflict in Lebanon? [Table: mean, standard deviation and skewness of responses for the U.N., the U.S.A. and the E.U.; cell values not preserved.] Scale: 1 = A lot, 2 = A little, 3 = Not very much, 4 = Nothing at all. All three variables skew negative, indicating that more opinions are “above” the mean. With the scale used for this variable, this means that more than half of all respondents thought that the UN, US and EU were doing “not very much” or “nothing at all.” In particular, the U.S.A. was seen by many as not doing very much. Can you see this in the chart?

Comparing attitudes towards the federal parties [Table: mean, median, standard deviation, IQR and skew of ratings for the Conservative, Liberal, NDP, Greens and Bloc Quebecois; cell values not preserved.] Which party, on average, was the most popular in 2008? Least popular?  Is one party much more or much less popular than the others? Source: Canadian Election Study, 2008, CES_MBS_I10a-e [National Weight]

Comparing attitudes towards the federal parties [Table: mean, median, standard deviation, IQR and skew of ratings for the Conservative, Liberal and NDP; cell values not preserved.] Towards which of the three largest parties is the widest range of feelings? Narrowest?  From this table, could you conclude that most Canadians feel much the same way about one party?  Do Canadians seem badly divided about any party? Source: Canadian Election Study, 2008, CES_MBS_I10a-e [National Weight]
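A comparison table of this kind can be produced with a short helper. The sketch below is illustrative only: the party labels and ratings are hypothetical (not the Canadian Election Study data), the function name `describe` is made up, quartiles use the nearest-rank rule, and skew uses the moment-based definition.

```python
import math
import statistics

def describe(values):
    """Summary row: mean, median, standard deviation, IQR and skew."""
    ordered = sorted(values)
    n = len(ordered)
    mean = statistics.fmean(ordered)
    sd = statistics.pstdev(ordered)
    q1 = ordered[max(1, math.ceil(25 * n / 100)) - 1]   # nearest-rank 25th percentile
    q3 = ordered[max(1, math.ceil(75 * n / 100)) - 1]   # nearest-rank 75th percentile
    skew = sum(((x - mean) / sd) ** 3 for x in ordered) / n
    return {"mean": mean, "median": statistics.median(ordered),
            "sd": sd, "iqr": q3 - q1, "skew": skew}

# Hypothetical 0-10 ratings for two made-up parties.
ratings = {
    "Party A": [0, 1, 2, 5, 5, 5, 8, 9, 10, 10],   # opinions spread across the scale
    "Party B": [4, 4, 5, 5, 5, 5, 6, 6, 6, 7],     # opinions clustered mid-scale
}

summary = {party: describe(values) for party, values in ratings.items()}
for party, row in summary.items():
    print(party, {key: round(value, 2) for key, value in row.items()})
```

The two made-up parties have similar means, but the larger standard deviation and IQR for Party A indicate that respondents are far more divided about it.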