Describing Data September 14, 2016. Updates This week – Lab sections begin Wed: 2-4pm (Today!) Wed: 4-6pm (Today!) Mon: 4-6pm Next week Eric Glass, guest.

Describing Data September 14, 2016

Updates
This week: Lab sections begin
Wed: 2-4pm (Today!)
Wed: 4-6pm (Today!)
Mon: 4-6pm
Next week: Eric Glass, guest speaker from DSSC (part of class)
The following week: another speaker talking about Zotero

Updates to assignments
Updated LiPS assignment
You still have to do seven write-ups
One must be either Fulong Wu (Monday evening, Nov. 14th) or Malo Hutson (Tuesday evening, Sept. 20th)
Assignment 2 posted to CourseWorks
Due at the start of your lab in 2 weeks. Hand in a paper copy to your TA and also post it to CourseWorks.

Today: Statistics Descriptive Describe and summarize our data to give insights Inferential Use statistics to make generalizations about a broader population

Types of Variables
Categorical
  Nominal (not ranked): college major, type of property, color of car
  Ordinal (ordered or ranked): useful for preferences, though no value assigned
  Dichotomous (two categories, not ranked): yes/no
Numerical
  Discrete (values are counts)
  Continuous (values are measures)

Variables
Nominal: exclusive categories, but not ordered or ranked
Ordinal: ranked
Interval: equally spaced values

Nominal Examples Think of nominal scales as “labels” No quantitative value

Color            Count
Blue             10
Black            8
Red              6
blue             5
Purple           3
Green            2
Purple           2
White            2
BLUE             1
Brown            1
Burgundy         1
Gray             1
Pink             1
Red              1
Yellow           1
nav              1
orange           1
purple           1
red              1
seafoam green    1
turquoise        1
white            1

Nominal Examples Think of nominal scales as “labels” No quantitative value Other Examples: Gender Hair color Neighborhood When there are only two categories, we call this “dichotomous.” Examples – Heads/Tails, On/Off, Rural/Urban, In poverty / Not in poverty Q: What about gender? Is that a dichotomous variable?

Ordinal Ranked in order of values, but the difference between values is not always known Example: Educational attainment

Ordinal example: educational attainment

Interval Numerical scales where both the order of values and the differences between them are known Examples: Money or income Height Weight

Likert items Allow people to respond according to some scale Examples: Question: How frequently do you think you need to come to class to get a high pass? o Always o Often o Occasionally o Rarely o Never

Likert items Allow people to respond according to some scale Examples: Question: I already know everything there is to know about “Planning Techniques” o Agree Strongly o Agree Slightly o Neutral o Disagree Slightly o Disagree Strongly

Likert items Allow people to respond according to some scale Examples – four point scale Question: I read emails from Nick Klein o Most of the time o Some of the time o Seldom o Never

Likert items Allow people to respond according to some scale Examples – four point scale Question: I read emails from Nick Klein o Most of the time – ALL OF THE TIME o Some of the time o Seldom o Never

Likert Scales What types of variables are these? How can we interpret them?

Descriptive stats

We need some data to describe

Lucky us!

What year were you born? 50 responses: 1993, 1991, 1960, 1993, 1994, 1992, 1989, 1992, 1993, 1993, 1994, 1991, 1990, 1992, 1987, 1989, 1994, 1992, 1989, 1992, 1994, 1985, 1994, 1991, 1991, 1992, 1993, 1993, 1993, 1992, 1991, 1985, 1992, 1992, 1992, 1985, 1994, 1993, 1995, 1991, 1985, 1993, 1990, 1992, 1994, 1994, 1994, 1994, 1992, 1990

Hard to make sense of this… 50 responses: 1993, 1991, 1960, 1993, 1994, 1992, 1989, 1992, 1993, 1993, 1994, 1991, 1990, 1992, 1987, 1989, 1994, 1992, 1989, 1992, 1994, 1985, 1994, 1991, 1991, 1992, 1993, 1993, 1993, 1992, 1991, 1985, 1992, 1992, 1992, 1985, 1994, 1993, 1995, 1991, 1985, 1993, 1990, 1992, 1994, 1994, 1994, 1994, 1992, 1990

We can use a “frequency table”

Year born    Frequency    Percent
1960         1            2.00
1985         4            8.00
1987         1            2.00
1989         3            6.00
1990         3            6.00
1991         6            12.00
1992         12           24.00
1993         9            18.00
1994         10           20.00
1995         1            2.00
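A frequency table like the one above can be built by counting each distinct response. The sketch below uses the 50 responses from the earlier slide; the function name `frequency_table` is an illustrative choice, not something from the lecture.

```python
from collections import Counter

# The 50 "What year were you born?" responses from the slide.
years = [
    1993, 1991, 1960, 1993, 1994, 1992, 1989, 1992, 1993, 1993,
    1994, 1991, 1990, 1992, 1987, 1989, 1994, 1992, 1989, 1992,
    1994, 1985, 1994, 1991, 1991, 1992, 1993, 1993, 1993, 1992,
    1991, 1985, 1992, 1992, 1992, 1985, 1994, 1993, 1995, 1991,
    1985, 1993, 1990, 1992, 1994, 1994, 1994, 1994, 1992, 1990,
]


def frequency_table(values):
    """Return (value, frequency, percent) rows sorted by value."""
    counts = Counter(values)
    n = len(values)
    return [(v, counts[v], 100 * counts[v] / n) for v in sorted(counts)]


print("Year born  Frequency  Percent")
for year, freq, pct in frequency_table(years):
    print(f"{year:<9}  {freq:>9}  {pct:>7.2f}")
```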

Let’s represent it another way, graphically

We can use a “dot plot” where each dot represents a response

This is similar to a histogram

But a histogram is more flexible

We can change the number of “bins”

And change the y-axis to a measure of “relative frequency” rather than a count.
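Both histogram choices, the bin width and whether the y-axis shows counts or relative frequency, can be sketched in a few lines. This is a minimal illustration under assumed names (`histogram`, `bin_width`, `relative`); it prints a text histogram rather than drawing a chart, and the responses are rebuilt from the frequency table.

```python
from collections import Counter

# Class responses rebuilt from the frequency table: {year born: frequency}.
counts = {1960: 1, 1985: 4, 1987: 1, 1989: 3, 1990: 3,
          1991: 6, 1992: 12, 1993: 9, 1994: 10, 1995: 1}
years = [year for year, n in counts.items() for _ in range(n)]


def histogram(values, bin_width, relative=False):
    """Group values into equal-width bins keyed by the bin's lower edge.

    With relative=True the heights are shares of all responses
    (relative frequency) instead of raw counts.
    """
    binned = Counter((v // bin_width) * bin_width for v in values)
    total = len(values)
    return {
        start: (binned[start] / total if relative else binned[start])
        for start in sorted(binned)
    }


# Changing the bin width changes how much detail the picture shows.
for width in (5, 10):
    print(f"bin width = {width}")
    for start, count in histogram(years, width).items():
        print(f"  {start}-{start + width - 1}: {'#' * count}")
```

Wider bins hide detail (all of the 1990s collapse into one bar), while relative frequency makes groups of different sizes comparable.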

Another approach is a “stem and leaf”

195. |
196. |
197. |
198. |
199. |
200. |

The stem consists of the number with the last digit omitted. For our years, this means dropping the final digit of the year and keeping the decade, so “1975” becomes “197”.

Another approach is a “stem and leaf”

195. |
196. | 0
197. |
198. | 55557999
199. | 00011111122222222222233333333344444444445
200. |

Then add the final digits (the leaf or leaves) back in to the corresponding stem
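The two steps, splitting off the stem and then attaching the leaves, can be sketched as follows. The function name `stem_and_leaf` is an assumption for illustration, and the sketch only prints stems between the smallest and largest one that occur (196 to 199), so the empty 195 and 200 rows from the slide are omitted.

```python
from collections import defaultdict

# Class responses rebuilt from the frequency table: {year born: frequency}.
counts = {1960: 1, 1985: 4, 1987: 1, 1989: 3, 1990: 3,
          1991: 6, 1992: 12, 1993: 9, 1994: 10, 1995: 1}
years = [year for year, n in counts.items() for _ in range(n)]


def stem_and_leaf(values):
    """Map each stem (value without its last digit) to its sorted leaves."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)  # stem = decade, leaf = final digit
    lo, hi = min(leaves), max(leaves)
    # Keep empty stems between lo and hi so gaps in the data stay visible.
    return {
        stem: "".join(str(d) for d in leaves.get(stem, []))
        for stem in range(lo, hi + 1)
    }


for stem, leaf in stem_and_leaf(years).items():
    print(f"{stem}. | {leaf}")
```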

Summary Statistics

Central Tendency and Spread Two of the most simple and most important measures

Central Tendency There are a number of measures of central tendency The most common are: Mean Median Mode Let’s focus on the first two

Median The median is the middle-most value. We can identify it by placing our data in order. Let’s use the same five values: 1985 1985 1992 1992 1992 The mean and median are often different: here the mean is 1989.2 and the median is 1992. The median has a nice attribute in that it is generally not sensitive to outliers.

Median If there are two middle-most values, we take the average of the two. Let’s add our outlier (1960) to our data set and figure out the median: 1960 1985 1985 1992 1992 1992 The median is now (1985 + 1992) / 2 = 1988.5
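The median rule from the last two slides (take the middle value, or average the two middle values) can be written out directly. This is a small sketch; the helper name `median_of` is illustrative, and Python's built-in `statistics.median` applies the same rule.

```python
from statistics import mean


def median_of(values):
    """Middle value of the sorted data; average the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


five = [1985, 1985, 1992, 1992, 1992]
with_outlier = [1960] + five  # add the 1960 response

print(mean(five), median_of(five))                  # 1989.2 1992
print(mean(with_outlier), median_of(with_outlier))  # mean drops to about 1984.3, median is 1988.5
```

Adding the single 1960 response pulls the mean down by about five years, while the median moves by 3.5.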

Mean and Median Mean ● Easy to understand. It’s the average ● Affected by extreme high or low values (outliers) ● May not best characterize skewed distributions Median ● Largely unaffected by outliers ● May better characterize skewed distributions

What about mode? Mode ● The most frequent value ● Less often used in social science

Percentiles
Imagine a chart with all the observable values in a population; it contains 100 percent of the possible values.
The pth percentile is the value of a given distribution such that p% of the distribution is less than or equal to that value.
Quartiles: the 25th, 50th, and 75th percentiles
Quintiles: the 20th, 40th, 60th, and 80th percentiles
Deciles: the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles
The 50th percentile is the MEDIAN

Basic descriptive statistics

10th percentile = -1.2816: 10 percent of the area under the curve (shaded red)

25th percentile = -0.67: 25 percent of the area under the curve (shaded red)

50th percentile = 0.00: 50 percent of the area under the curve (shaded red)

75th percentile = 0.6745: 75 percent of the area under the curve (shaded red)

90th percentile = 1.2816: 90 percent of the area under the curve (shaded red)
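The percentile values on these slides match a standard normal curve (mean 0, standard deviation 1). Assuming that is the curve shown, they can be reproduced with the inverse cumulative distribution function from Python's standard library:

```python
from statistics import NormalDist

# Assumption: the curve on the slides is the standard normal distribution.
standard_normal = NormalDist(mu=0, sigma=1)

for p in (10, 25, 50, 75, 90):
    # inv_cdf(q) returns the value with a share q of the area to its left.
    value = standard_normal.inv_cdf(p / 100)
    print(f"{p}th percentile = {value:.4f}")
```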

Percentiles from our data

50th percentile (the median value) is 1992
25th percentile is 1991
75th percentile is 1993
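These percentiles follow from the definition given earlier: the pth percentile is the smallest value with at least p% of the responses at or below it. The sketch below uses that "nearest rank" reading, which is an assumption about the method; statistical packages often interpolate between values and may report slightly different quartiles.

```python
import math

# Class responses rebuilt from the frequency table: {year born: frequency}.
counts = {1960: 1, 1985: 4, 1987: 1, 1989: 3, 1990: 3,
          1991: 6, 1992: 12, 1993: 9, 1994: 10, 1995: 1}
years = [year for year, n in counts.items() for _ in range(n)]


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of data <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based position
    return ordered[rank - 1]


for p in (25, 50, 75):
    print(f"{p}th percentile: {percentile(years, p)}")
```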

Measures of Spread

How do we describe the different distributions?

Measures Range Interquartile range Index of dispersion Standard Deviation

Interquartile Range (IQR) The IQR is a simple measure of spread: it is the difference between the 75th and 25th percentile values. The IQR tells us about the spread around the median.

Interquartile Range (IQR)
50th percentile (the median value) is 1992
25th percentile is 1991
75th percentile is 1993
So the IQR is 1993 - 1991 = 2 years
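The range and the IQR for the class data can be computed side by side. This sketch assumes nearest-rank percentiles, which reproduce the quartiles on the slide; the function names are illustrative.

```python
import math

# Class responses rebuilt from the frequency table: {year born: frequency}.
counts = {1960: 1, 1985: 4, 1987: 1, 1989: 3, 1990: 3,
          1991: 6, 1992: 12, 1993: 9, 1994: 10, 1995: 1}
years = [year for year, n in counts.items() for _ in range(n)]


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of data <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)


def iqr(values):
    """Interquartile range: 75th percentile minus 25th percentile."""
    return percentile(values, 75) - percentile(values, 25)


print("Range:", value_range(years))  # 1995 - 1960 = 35
print("IQR:", iqr(years))            # 1993 - 1991 = 2
```

The single 1960 response stretches the range to 35 years, while the IQR stays at 2 because it only looks at the middle half of the data.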

Boxplots

Standard Deviation Often, we will use and talk about the st. dev. Represented by sigma: σ The st. dev. tells us about the spread from the mean (The IQR tells us about the spread from the median)

Standard Deviation

But the st. dev. is really useful. If we have normally distributed data, we can expect about 68% of values to fall within 1 st. dev. of the mean and about 95% within 2.
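The 68% and 95% figures can be checked against the normal distribution, and the same calculation can be run on the class data for comparison. This is a sketch with illustrative function names; it uses the population standard deviation (`pstdev`), which is an assumption about which formula the slide's σ refers to.

```python
import math
from statistics import mean, pstdev

# Class responses rebuilt from the frequency table: {year born: frequency}.
counts = {1960: 1, 1985: 4, 1987: 1, 1989: 3, 1990: 3,
          1991: 6, 1992: 12, 1993: 9, 1994: 10, 1995: 1}
years = [year for year, n in counts.items() for _ in range(n)]


def normal_share(k):
    """Share of a normal distribution within k st. dev. of its mean."""
    return math.erf(k / math.sqrt(2))


def share_within(values, k):
    """Share of observed values within k st. dev. of their mean."""
    mu, sigma = mean(values), pstdev(values)
    return sum(abs(v - mu) <= k * sigma for v in values) / len(values)


for k in (1, 2):
    print(f"within {k} st. dev.: normal {normal_share(k):.1%}, "
          f"class data {share_within(years, k):.1%}")
```

The class shares need not match 68% and 95%: the birth years are not normally distributed, and the 1960 response inflates the standard deviation.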

Other ways to describe spread

Skewness and Symmetry

Why might data be skewed? Why might data be bimodal?

Skewed data example: Family Income

Q: Guess the mean
$71,840
Q: Guess the median
$55,000

Interpreting Tables

Elements of a Table Title describes content Sample size presented Actual and percentage shares presented

Assumptions stated Source of calculations stated

Interpreting Tables From Manski (2014) The death penalty moratorium was lifted in the U.S. in 1976 Three ways to interpret the data presented

Interpreting Tables 1) “Before and after” Average effect of the death penalty is -0.6 (calculated as 9.7 - 10.3)

Interpreting Tables 2) Compare treated and untreated Assumes all else is equal, e.g. the propensity to kill is the same everywhere Average effect in 1977 is 2.8 (= 9.7 - 6.9)

Interpreting Tables 3) Difference in difference
Compares changes over time to account for policy changes
Treated states declined from 10.3 to 9.7, a change of -0.6
Untreated states declined from 8.0 to 6.9, a change of -1.1
Effect = (9.7 - 10.3) - (6.9 - 8.0) = 0.5
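The three readings of the table use the same four numbers. The arithmetic can be laid out as below; the variable names are illustrative, and the rates are the homicide rates quoted on the slides.

```python
# Homicide rates quoted on the slides (treated = states under the policy).
treated_before, treated_after = 10.3, 9.7
untreated_before, untreated_after = 8.0, 6.9

# 1) Before and after: change within the treated states only.
before_after = treated_after - treated_before

# 2) Treated vs. untreated: gap between the groups in the later year.
treated_vs_untreated = treated_after - untreated_after

# 3) Difference in difference: change in treated minus change in untreated.
untreated_change = untreated_after - untreated_before
diff_in_diff = before_after - untreated_change

print(f"Before and after:         {before_after:+.1f}")
print(f"Treated vs. untreated:    {treated_vs_untreated:+.1f}")
print(f"Untreated change:         {untreated_change:+.1f}")
print(f"Difference in difference: {diff_in_diff:+.1f}")
```

The same data give a negative, a large positive, and a small positive estimate, depending on which comparison is treated as the counterfactual.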

Interpreting Tables
Before and after shows a reduced homicide rate (-0.6)
Comparison of treated and untreated shows a rate higher by 2.8
Difference in difference shows an increase of 0.5 per 100,000
Explanations?

Presenting Data Tables Charts Graphs

Problems with Pie Charts No sample size Similarly sized pies suggest all groups are equal and all response rates are about the same Were yes/no the only options? What are “enough transportation options”?

When Pie Charts Are Appropriate

Bar Chart

Measures of association
