Published by Aileen Wells. Modified over 8 years ago.
1
Describing Data September 14, 2016
2
Updates
This week – Lab sections begin
  Wed: 2-4pm (Today!)
  Wed: 4-6pm (Today!)
  Mon: 4-6pm
Next week: Eric Glass, guest speaker from DSSC (part of class)
The following week: another speaker talking about Zotero
3
Updates to assignments
Updated LiPS assignment
  Still have to do seven write-ups
  One must be either Fulong Wu (Monday evening, Nov. 14th) or Malo Hutson (Tuesday evening, Sept. 20th)
Assignment 2 posted to CourseWorks
  Due at the start of your lab in 2 weeks
  Hand in a paper copy to your TA and also post it to CourseWorks
4
Today: Statistics
Descriptive: Describe and summarize our data to give insights
Inferential: Use statistics to make generalizations about a broader population
5
Types of Variables
Categorical
  Nominal (not ranked): college major, type of property, color of car
  Ordinal (ordered or ranked): useful for preferences, though no value assigned
  Dichotomous (two categories, not ranked): yes/no
Numerical
  Discrete (values are counts)
  Continuous (values are measures)
6
Variables
Nominal: Exclusive but not ordered or ranked
Ordinal: Ranked
Interval: Equally spaced values
7
Nominal Examples
Think of nominal scales as “labels”
No quantitative value
9
Nominal Examples
Think of nominal scales as “labels”
No quantitative value

Color            Count
Blue             10
Black            8
Red              6
blue             5
Purple           3
Green            2
Purple           2
White            2
BLUE             1
Brown            1
Burgundy         1
Gray             1
Pink             1
Red              1
Yellow           1
nav              1
orange           1
purple           1
red              1
seafoam green    1
turquoise        1
white            1
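One way to read the table above is that the same color was typed in several ways ("Blue", "blue", "BLUE"), so the labels need tidying before counting. The sketch below assumes that reading; the names `raw_counts` and `cleaned` are illustrative, and merging labels by trimming spaces and ignoring case is an assumption about how the responses should be grouped, not something the slide states.

```python
from collections import Counter

# (label, count) rows exactly as they appear in the table on the slide.
raw_counts = [
    ("Blue", 10), ("Black", 8), ("Red", 6), ("blue", 5), ("Purple", 3),
    ("Green", 2), ("Purple", 2), ("White", 2), ("BLUE", 1), ("Brown", 1),
    ("Burgundy", 1), ("Gray", 1), ("Pink", 1), ("Red", 1), ("Yellow", 1),
    ("nav", 1), ("orange", 1), ("purple", 1), ("red", 1),
    ("seafoam green", 1), ("turquoise", 1), ("white", 1),
]

# Assumption: labels that differ only in case or surrounding spaces
# refer to the same category, so we merge them.
cleaned = Counter()
for label, n in raw_counts:
    cleaned[label.strip().lower()] += n

for label, n in cleaned.most_common():
    print(f"{label:<15}{n}")
```

Because the categories are nominal, the only meaningful summary here is a count per label; the order of the rows carries no information.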
10
Nominal Examples
Think of nominal scales as “labels”
No quantitative value
Other examples: Gender, Hair color, Neighborhood
When there are only two categories, we call this “dichotomous.”
Examples – Heads/Tails, On/Off, Rural/Urban, In poverty / Not in poverty
Q: What about gender? Is that a dichotomous variable?
11
Ordinal
Ranked in order of values, but the difference between values is not always known
Example: Educational attainment
12
Ordinal example: educational attainment
13
Interval
Numerical scales where the order of, and the differences between, values are known
Examples: Money or income, Height, Weight
14
Likert items Allow people to respond according to some scale
15
Likert items
Allow people to respond according to some scale
Examples:
Question: How frequently do you think you need to come to class to get a high pass?
o Always
o Often
o Occasionally
o Rarely
o Never
16
Likert items
Allow people to respond according to some scale
Examples:
Question: I already know everything there is to know about “Planning Techniques”
o Agree Strongly
o Agree Slightly
o Neutral
o Disagree Slightly
o Disagree Strongly
17
Likert items
Allow people to respond according to some scale
Examples – four point scale
Question: I read emails from Nick Klein
o Most of the time
o Some of the time
o Seldom
o Never
18
Likert items
Allow people to respond according to some scale
Examples – four point scale
Question: I read emails from Nick Klein
o Most of the time – ALL OF THE TIME
o Some of the time
o Seldom
o Never
19
Likert Scales What types of variables are these? How can we interpret them?
20
Descriptive stats
21
We need some data to describe
22
Lucky us!
23
What year were you born? 50 responses: 1993, 1991, 1960, 1993, 1994, 1992, 1989, 1992, 1993, 1993, 1994, 1991, 1990, 1992, 1987, 1989, 1994, 1992, 1989, 1992, 1994, 1985, 1994, 1991, 1991, 1992, 1993, 1993, 1993, 1992, 1991, 1985, 1992, 1992, 1992, 1985, 1994, 1993, 1995, 1991, 1985, 1993, 1990, 1992, 1994, 1994, 1994, 1994, 1992, 1990
24
Hard to make sense of this…
25
We can use a “frequency table”

Year born   Frequency   Percent
1960                1      2.00
1985                4      8.00
1987                1      2.00
1989                3      6.00
1990                3      6.00
1991                6     12.00
1992               12     24.00
1993                9     18.00
1994               10     20.00
1995                1      2.00
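A frequency table like this one can be built directly from the raw responses. The following is a minimal sketch using Python's standard library; the variable names (`years`, `counts`, `table`) are illustrative choices.

```python
from collections import Counter

# The 50 responses to "What year were you born?" from the earlier slide.
years = [
    1993, 1991, 1960, 1993, 1994, 1992, 1989, 1992, 1993, 1993,
    1994, 1991, 1990, 1992, 1987, 1989, 1994, 1992, 1989, 1992,
    1994, 1985, 1994, 1991, 1991, 1992, 1993, 1993, 1993, 1992,
    1991, 1985, 1992, 1992, 1992, 1985, 1994, 1993, 1995, 1991,
    1985, 1993, 1990, 1992, 1994, 1994, 1994, 1994, 1992, 1990,
]

counts = Counter(years)
total = len(years)

# One row per distinct year: (year, frequency, percent of all responses).
table = [(year, n, 100 * n / total) for year, n in sorted(counts.items())]

print(f"{'Year born':<10}{'Frequency':>10}{'Percent':>9}")
for year, n, pct in table:
    print(f"{year:<10}{n:>10}{pct:>9.2f}")
```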
26
Let’s represent it another way, graphically
27
We can use a “dot plot” where each dot represents a response
28
This is similar to a histogram
29
But a histogram is more flexible
30
We can change the number of “bins”
31
And change the y-axis to a measure of “relative frequency” rather than a count.
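To make the idea of "bins" and "relative frequency" concrete, here is a small sketch that regroups the frequency table into bins of a chosen width. The helper name `histogram` and the bin widths (10 and 5 years) are assumptions chosen for illustration; the slides do not say which bin widths were plotted.

```python
from collections import Counter

# Frequencies from the frequency table slide (year born -> count).
freq = {1960: 1, 1985: 4, 1987: 1, 1989: 3, 1990: 3,
        1991: 6, 1992: 12, 1993: 9, 1994: 10, 1995: 1}

def histogram(freq, bin_width):
    """Group years into bins of `bin_width` years; return {bin_start: count}."""
    bins = Counter()
    for year, n in freq.items():
        bins[year - year % bin_width] += n
    return dict(sorted(bins.items()))

total = sum(freq.values())
by_decade = histogram(freq, 10)   # wide bins: fewer, taller bars
by_5yr = histogram(freq, 5)       # narrower bins: more detail

# Relative frequency: each bin's share of all responses instead of a count.
relative = {start: n / total for start, n in by_decade.items()}

for start, share in relative.items():
    print(f"{start}-{start + 9}: {'#' * by_decade[start]} ({share:.0%})")
```

Changing the bin width changes how the same data looks, which is why it is worth trying more than one setting.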
32
Another approach is a “stem and leaf”

195. |
196. |
197. |
198. |
199. |
200. |

The stem consists of the numbers with the last digit omitted. For our years, this means dropping the final digit of the year and keeping the decade, so “1975” would become “197”.
33
Another approach is a “stem and leaf”

195. |
196. | 0
197. |
198. | 55557999
199. | 00011111122222222222233333333344444444445
200. |

Then add the final digits (the leaf or leaves) back in to the corresponding stem
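The two steps (split off the stem, then append the leaves) can be sketched in a few lines. This is an illustrative version, with `stems` as an assumed name; it prints only the stems between the smallest and largest observed decade.

```python
from collections import defaultdict

# Frequencies from the frequency table slide (year born -> count).
freq = {1960: 1, 1985: 4, 1987: 1, 1989: 3, 1990: 3,
        1991: 6, 1992: 12, 1993: 9, 1994: 10, 1995: 1}
years = sorted(year for year, n in freq.items() for _ in range(n))

stems = defaultdict(str)
for year in years:
    stem, leaf = divmod(year, 10)   # e.g. 1992 -> stem 199, leaf 2
    stems[stem] += str(leaf)        # leaves stay in sorted order

# Print every stem in the observed range, including empty ones.
for stem in range(min(stems), max(stems) + 1):
    print(f"{stem}. | {stems.get(stem, '')}")
```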
34
Summary Statistics
35
Central Tendency and Spread
Two of the simplest and most important measures
36
Central Tendency
There are a number of measures of central tendency. The most common are:
  Mean
  Median
  Mode
Let’s focus on the first two
37
Mean
39
Median
The median is the middle-most value. We can identify it by placing our data in order.
Let’s use the same five values: 1985 1985 1992 1992 1992
The mean (1989.2) and median (1992) are often different.
The median has a nice attribute in that it is generally not sensitive to outliers.
40
Median
If there are two middle-most values, we take the average of the two.
Let’s add our outlier (1960) to our data set and figure out the median: 1960 1985 1985 1992 1992 1992
The median is now (1985 + 1992) / 2 = 1988.5
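The five-value example, and the effect of adding the 1960 outlier, can be checked with Python's `statistics` module. The list names are illustrative.

```python
from statistics import mean, median

five = [1985, 1985, 1992, 1992, 1992]
with_outlier = [1960] + five   # add the outlier from the slide

# Without the outlier: mean 1989.2, median 1992.
print(mean(five), median(five))

# With the outlier: the median is the average of the two middle values,
# (1985 + 1992) / 2 = 1988.5, while the mean is pulled down further.
print(round(mean(with_outlier), 2), median(with_outlier))
```

In this small sample the single outlier moves the mean by almost five years and the median by three and a half; with more observations the median typically moves far less.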
41
Mean and Median
Mean
● Easy to understand. It’s the average
● Affected by extreme high or low values (outliers)
● May not best characterize skewed distributions
Median
● Generally not sensitive to outliers
● May better characterize skewed distributions
42
What about mode?
Mode
● The most frequent value
● Less often used in social science
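For completeness, the mode of the birth-year data can be read off the frequency table or computed directly. A short sketch, with `freq` and `years` as illustrative names:

```python
from statistics import mode, multimode

# Frequencies from the frequency table slide (year born -> count).
freq = {1960: 1, 1985: 4, 1987: 1, 1989: 3, 1990: 3,
        1991: 6, 1992: 12, 1993: 9, 1994: 10, 1995: 1}
years = [year for year, n in freq.items() for _ in range(n)]

most_common_year = mode(years)   # the single most frequent value
all_modes = multimode(years)     # a list, in case of ties

print(most_common_year, all_modes)
```

`multimode` is useful when two values tie for most frequent, a situation in which a single "mode" would be misleading.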
44
Percentiles
Imagine a chart with all the observable values in a population; it contains 100 percent of the possible values.
The pth percentile is the value of a given distribution such that p% of the distribution is less than or equal to that value.
Quartiles: the 25th, 50th, and 75th percentiles
Quintiles: the 20th, 40th, 60th, and 80th percentiles
Deciles: the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles
The 50th percentile is the MEDIAN
45
10th percentile = -1.2816
10 percent under curve (shaded red)
46
Basic descriptive statistics
25th percentile = -0.67
25 percent under curve (shaded red)
47
Basic descriptive statistics
50th percentile = 0.00
50 percent under curve (shaded red)
48
75th percentile = 0.6745
75 percent under curve (shaded red)
49
Basic descriptive statistics
90th percentile = 1.2816
90 percent under curve (shaded red)
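The values on these slides match the percentiles of a standard normal curve (mean 0, standard deviation 1). Assuming that is the curve shown, they can be reproduced with `statistics.NormalDist`; the dictionary name `z` is illustrative.

```python
from statistics import NormalDist

# Assumption: the shaded curve is the standard normal distribution.
standard_normal = NormalDist(mu=0, sigma=1)

# inv_cdf(p) returns the value below which a share p of the curve lies.
z = {p: standard_normal.inv_cdf(p / 100) for p in (10, 25, 50, 75, 90)}

for p, value in z.items():
    print(f"{p}th percentile = {value:.4f}")
```

The symmetry of the curve shows up in the numbers: the 10th and 90th percentiles, and the 25th and 75th, are equal in size and opposite in sign.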
50
Percentiles from our data
51
50th Percentile / the median value is 1992
25th Percentile is 1991
75th Percentile is 1993
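These quartiles can be reproduced from the birth-year data. Software packages interpolate percentiles in slightly different ways; the sketch below uses the `"inclusive"` method of `statistics.quantiles`, which is an assumption on our part (the slides do not say which method was used) and happens to match the values shown.

```python
from statistics import quantiles

# Frequencies from the frequency table slide (year born -> count).
freq = {1960: 1, 1985: 4, 1987: 1, 1989: 3, 1990: 3,
        1991: 6, 1992: 12, 1993: 9, 1994: 10, 1995: 1}
years = sorted(year for year, n in freq.items() for _ in range(n))

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(years, n=4, method="inclusive")

print(f"25th: {q1}, 50th (median): {q2}, 75th: {q3}")
```

With a different interpolation method the 25th percentile can come out slightly different, so it is worth noting the method when reporting percentiles.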
52
Measures of Spread
53
How do we describe the different distributions?
54
Measures
  Range
  Interquartile range
  Index of dispersion
  Standard Deviation
55
Interquartile Range (IQR)
The IQR is a simple measure of spread: it is the difference between the 75th and 25th percentile values.
The IQR tells us about the spread around the median
56
Interquartile Range (IQR)
50th Percentile / the median value is 1992
25th Percentile is 1991
75th Percentile is 1993
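With the quartiles above, the IQR for our data is 1993 - 1991 = 2 years. The sketch below computes it alongside the range for comparison; it assumes the `"inclusive"` quantile method, and the names `iqr` and `value_range` are illustrative.

```python
from statistics import quantiles

# Frequencies from the frequency table slide (year born -> count).
freq = {1960: 1, 1985: 4, 1987: 1, 1989: 3, 1990: 3,
        1991: 6, 1992: 12, 1993: 9, 1994: 10, 1995: 1}
years = sorted(year for year, n in freq.items() for _ in range(n))

q1, _, q3 = quantiles(years, n=4, method="inclusive")

iqr = q3 - q1                            # spread of the middle half of the data
value_range = max(years) - min(years)    # spread between the two extremes

print(f"IQR = {iqr}, range = {value_range}")
```

The range (35 years) is driven almost entirely by the single 1960 response, while the IQR ignores the extremes and describes where the bulk of the class sits.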
57
Boxplots
58
Standard Deviation
Often, we will use and talk about st. dev., represented by sigma: σ
The st. dev. tells us about the spread from the mean
(The IQR tells us about the spread from the median)
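As a rough illustration, the standard deviation of the birth-year data can be computed by hand and checked against the library. The sketch assumes the population form (divide by n); the sample form divides by n - 1 and is available as `statistics.stdev`. Which form the course uses is not stated on this slide.

```python
import math
from statistics import mean, pstdev

# Frequencies from the frequency table slide (year born -> count).
freq = {1960: 1, 1985: 4, 1987: 1, 1989: 3, 1990: 3,
        1991: 6, 1992: 12, 1993: 9, 1994: 10, 1995: 1}
years = [year for year, n in freq.items() for _ in range(n)]

# By hand: square root of the average squared distance from the mean.
mu = mean(years)
sigma_by_hand = math.sqrt(sum((y - mu) ** 2 for y in years) / len(years))

# Same quantity from the standard library (population st. dev.).
sigma = pstdev(years)

# Like the mean, the st. dev. is sensitive to outliers: drop 1960 and compare.
sigma_no_outlier = pstdev([y for y in years if y != 1960])

print(f"mean = {mu:.2f}, st. dev. = {sigma:.2f}, "
      f"without 1960 = {sigma_no_outlier:.2f}")
```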
59
Standard Deviation
66
But the st. dev. is really useful.
If we have normally distributed data, we can expect about 68% of values to fall within 1 st. dev. of the mean and about 95% within 2.
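The 68% and 95% figures are rounded properties of the normal curve, and can be checked with `statistics.NormalDist`. A short sketch, with illustrative variable names:

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal: mean 0, st. dev. 1

# Share of the distribution within k standard deviations of the mean.
within_1 = nd.cdf(1) - nd.cdf(-1)
within_2 = nd.cdf(2) - nd.cdf(-2)

print(f"within 1 st. dev.: {within_1:.1%}")
print(f"within 2 st. dev.: {within_2:.1%}")
```

These shares hold only for normally distributed data; for skewed data, such as the birth years with the 1960 outlier, they are at best a rough guide.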
67
Other ways to describe spread
68
Skewness and Symmetry
71
Why might data be skewed? Why might data be bimodal?
72
Skewed data example: Family Income
73
Q: Guess the mean
74
$71,840
75
Q: Guess the mean $71,840
76
Q: Guess the mean $71,840 Q: Guess the median
77
Q: Guess the mean $71,840 Q: Guess the median $55,000
78
Interpreting Tables
79
Elements of a Table
Title describes content
Sample size presented
Actual and percentage shares presented
80
Assumptions stated
Source of calculations stated
81
Interpreting Tables
From Manski (2014)
Death penalty moratorium was lifted in the U.S. in 1976
Three ways to interpret data presented
82
Interpreting Tables
1) “Before and after”
Average effect of death penalty is -0.6 (calculated as 9.7 - 10.3)
83
Interpreting Tables
2) Compare treated and untreated
Assumes all else equal, e.g. propensity to kill is the same everywhere
Average effect in 1977 is 2.8 (= 9.7 - 6.9)
84
Interpreting Tables
3) Difference in difference
Changes in effects over time to account for policy changes
Treated states declined from 10.3 to 9.7 = -0.6
Untreated states declined from 8.0 to 6.9 = -1.1
Effect = 0.5 = [(9.7 - 10.3) - (6.9 - 8.0)]
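The three readings of the same four numbers can be laid side by side. This is only arithmetic on the rates quoted on these slides; the variable names are illustrative, and rounding to one decimal is used to hide floating-point noise.

```python
# Homicide rates per 100,000 as quoted on the slides (from Manski 2014).
treated_before, treated_after = 10.3, 9.7      # states with the death penalty
untreated_before, untreated_after = 8.0, 6.9   # states without it

# 1) Before and after: change over time in treated states only.
before_after = round(treated_after - treated_before, 1)

# 2) Treated vs. untreated: gap between the groups in the later year.
treated_vs_untreated = round(treated_after - untreated_after, 1)

# 3) Difference in difference: change in treated minus change in untreated.
diff_in_diff = round(
    (treated_after - treated_before) - (untreated_after - untreated_before), 1
)

print(before_after, treated_vs_untreated, diff_in_diff)
```

The same table yields a negative, a large positive, and a small positive "effect" depending on which comparison is made, which is the point of the exercise.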
85
Interpreting Tables
Before and after shows reduced homicide rates
Comparison of treated and untreated shows an increase in rate of 2.8
Difference in difference shows an increase in rate of 0.5 per 100,000
Explanations?
86
Presenting Data Tables Charts Graphs
87
Problems with Pie Charts
No sample size
Similarly sized pies suggest all groups are equal and all response rates are about the same
Were yes/no the only options?
What are “enough transportation options”?
88
When Pie Charts Are Appropriate
89
Bar Chart
91
Measures of association