
1 Describing Data September 14, 2016

2 Updates
This week: lab sections begin
  Wed: 2-4pm (Today!)
  Wed: 4-6pm (Today!)
  Mon: 4-6pm
Next week: Eric Glass, guest speaker from DSSC (part of class)
The following week: another speaker, talking about Zotero

3 Updates to assignments
Updated LiPS assignment: you still have to do seven write-ups. One must be on either Fulong Wu (Monday evening, Nov. 14th) or Malo Hutson (Tuesday evening, Sept. 20th).
Assignment 2 is posted to CourseWorks. It is due at the start of your lab in 2 weeks. Hand in a paper copy to your TA and also post it to CourseWorks.

4 Today: Statistics
Descriptive: describe and summarize our data to give insights
Inferential: use statistics to make generalizations about a broader population

5 Types of Variables
Categorical
  Nominal (not ranked): college major, type of property, color of car
  Ordinal (ordered or ranked): useful for preferences, though no numeric value is assigned
  Dichotomous (two categories, not ranked): yes/no
Numerical
  Discrete (values are counts)
  Continuous (values are measures)

6 Variables
Nominal: mutually exclusive categories, not ordered or ranked
Ordinal: ranked
Interval: equally spaced values

7 Nominal Examples
Think of nominal scales as “labels”: they have no quantitative value.


9 Nominal Examples
Think of nominal scales as “labels”: they have no quantitative value.
Color            Count
Blue             10
Black            8
Red              6
blue             5
Purple           3
Green            2
Purple           2
White            2
BLUE             1
Brown            1
Burgundy         1
Gray             1
Pink             1
Red              1
Yellow           1
nav              1
orange           1
purple           1
red              1
seafoam green    1
turquoise        1
white            1
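The color counts above show why nominal labels often need cleaning before they are tabulated: “Blue”, “blue”, and “BLUE” are the same answer typed three ways. A minimal Python sketch, assuming that capitalization and surrounding whitespace carry no meaning (the repeated “Red” and “Purple” rows may come from stray spaces, which is only a guess); `normalize_label` and `merge_counts` are hypothetical helper names:

```python
from collections import Counter

# (label, count) pairs exactly as they appear in the table above.
RAW_COUNTS = [
    ("Blue", 10), ("Black", 8), ("Red", 6), ("blue", 5), ("Purple", 3),
    ("Green", 2), ("Purple", 2), ("White", 2), ("BLUE", 1), ("Brown", 1),
    ("Burgundy", 1), ("Gray", 1), ("Pink", 1), ("Red", 1), ("Yellow", 1),
    ("nav", 1), ("orange", 1), ("purple", 1), ("red", 1),
    ("seafoam green", 1), ("turquoise", 1), ("white", 1),
]


def normalize_label(label: str) -> str:
    """Lower-case a label and collapse runs of whitespace (assumed cosmetic)."""
    return " ".join(label.split()).lower()


def merge_counts(pairs):
    """Sum the counts of labels that are identical after normalization."""
    merged = Counter()
    for label, count in pairs:
        merged[normalize_label(label)] += count
    return merged


if __name__ == "__main__":
    for color, count in merge_counts(RAW_COUNTS).most_common():
        print(f"{color:<15}{count}")
```

Normalization merges “Blue”, “blue”, and “BLUE” into 16 responses, but it cannot decide what an entry like “nav” was meant to be; that still takes a human judgment.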

10 Nominal Examples
Think of nominal scales as “labels”: they have no quantitative value.
Other examples: gender, hair color, neighborhood
When there are only two categories, we call the variable “dichotomous.” Examples: heads/tails, on/off, rural/urban, in poverty/not in poverty
Q: What about gender? Is that a dichotomous variable?

11 Ordinal
Values are ranked in order, but the differences between values are not always known.
Example: educational attainment

12 Ordinal example: educational attainment

13 Interval
Numerical scales where both the order of values and the differences between them are known
Examples: money or income, height, weight

14 Likert items
Allow people to respond on a scale

15 Likert items
Allow people to respond on a scale
Example question: How frequently do you think you need to come to class to get a high pass?
o Always
o Often
o Occasionally
o Rarely
o Never

16 Likert items
Allow people to respond on a scale
Example question: I already know everything there is to know about “Planning Techniques”
o Agree Strongly
o Agree Slightly
o Neutral
o Disagree Slightly
o Disagree Strongly

17 Likert items
Allow people to respond on a scale
Example (four-point scale) question: I read emails from Nick Klein
o Most of the time
o Some of the time
o Seldom
o Never

18 Likert items
Allow people to respond on a scale
Example (four-point scale) question: I read emails from Nick Klein
o Most of the time – ALL OF THE TIME
o Some of the time
o Seldom
o Never

19 Likert Scales What types of variables are these? How can we interpret them?
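The slide leaves these questions open. One common answer treats Likert responses as ordinal: the categories are coded in order and summarized with the median category or the share in each category, rather than with a mean. A minimal Python sketch using the five-point scale from slide 16; the numeric codes 1 to 5, the helper names, and the sample responses are hypothetical:

```python
from collections import Counter
from statistics import median_low

# Five-point scale from slide 16, ordered from most to least agreement.
SCALE = ["Agree Strongly", "Agree Slightly", "Neutral",
         "Disagree Slightly", "Disagree Strongly"]
CODE = {label: rank for rank, label in enumerate(SCALE, start=1)}


def median_category(responses):
    """Median answer of an ordinal item, returned as a scale label.

    median_low always returns an observed code, so the result is a real
    category rather than a half-way point between two categories.
    """
    codes = [CODE[r] for r in responses]
    return SCALE[median_low(codes) - 1]


def category_shares(responses):
    """Share of responses falling in each category, in scale order."""
    counts = Counter(responses)
    return {label: counts[label] / len(responses) for label in SCALE}


if __name__ == "__main__":
    # Hypothetical responses, for illustration only.
    sample = ["Agree Slightly", "Neutral", "Disagree Strongly",
              "Disagree Slightly", "Disagree Strongly"]
    print(median_category(sample))
    print(category_shares(sample))
```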

20 Descriptive stats

21 We need some data to describe

22 Lucky us!

23 What year were you born? 50 responses: 1993, 1991, 1960, 1993, 1994, 1992, 1989, 1992, 1993, 1993, 1994, 1991, 1990, 1992, 1987, 1989, 1994, 1992, 1989, 1992, 1994, 1985, 1994, 1991, 1991, 1992, 1993, 1993, 1993, 1992, 1991, 1985, 1992, 1992, 1992, 1985, 1994, 1993, 1995, 1991, 1985, 1993, 1990, 1992, 1994, 1994, 1994, 1994, 1992, 1990

24 Hard to make sense of this… 50 responses: 1993, 1991, 1960, 1993, 1994, 1992, 1989, 1992, 1993, 1993, 1994, 1991, 1990, 1992, 1987, 1989, 1994, 1992, 1989, 1992, 1994, 1985, 1994, 1991, 1991, 1992, 1993, 1993, 1993, 1992, 1991, 1985, 1992, 1992, 1992, 1985, 1994, 1993, 1995, 1991, 1985, 1993, 1990, 1992, 1994, 1994, 1994, 1994, 1992, 1990

25 We can use a “frequency table”
Year born   Frequency   Percent
1960            1          2.00
1985            4          8.00
1987            1          2.00
1989            3          6.00
1990            3          6.00
1991            6         12.00
1992           12         24.00
1993            9         18.00
1994           10         20.00
1995            1          2.00
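The frequency table can be built directly from the 50 raw responses. A short Python sketch using only the standard library; `frequency_table` is a hypothetical helper name:

```python
from collections import Counter

# The 50 responses to "What year were you born?", in the order listed above.
BIRTH_YEARS = [
    1993, 1991, 1960, 1993, 1994, 1992, 1989, 1992, 1993, 1993,
    1994, 1991, 1990, 1992, 1987, 1989, 1994, 1992, 1989, 1992,
    1994, 1985, 1994, 1991, 1991, 1992, 1993, 1993, 1993, 1992,
    1991, 1985, 1992, 1992, 1992, 1985, 1994, 1993, 1995, 1991,
    1985, 1993, 1990, 1992, 1994, 1994, 1994, 1994, 1992, 1990,
]


def frequency_table(values):
    """Return (value, frequency, percent) rows sorted by value."""
    counts = Counter(values)
    total = len(values)
    return [(v, counts[v], 100 * counts[v] / total) for v in sorted(counts)]


if __name__ == "__main__":
    print(f"{'Year born':<10}{'Frequency':>10}{'Percent':>9}")
    for year, freq, pct in frequency_table(BIRTH_YEARS):
        print(f"{year:<10}{freq:>10}{pct:>9.2f}")
```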

26 Let’s represent it another way, graphically

27 We can use a “dot plot” where each dot represents a response

28 This is similar to a histogram

29 But a histogram is more flexible

30 We can change the number of “bins”

31 And change the y-axis to a measure of “relative frequency” rather than a count.
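Binning and relative frequency can be sketched in a few lines. A minimal Python sketch, assuming equal-width bins that start at a chosen year (1960 here) and integer data; `histogram` is a hypothetical helper, not a library function. The birth years are rebuilt from the frequency table, since their order does not matter for a histogram:

```python
# Birth years rebuilt from the frequency table (order is irrelevant here).
FREQ = {1960: 1, 1985: 4, 1987: 1, 1989: 3, 1990: 3,
        1991: 6, 1992: 12, 1993: 9, 1994: 10, 1995: 1}
YEARS = [year for year, n in FREQ.items() for _ in range(n)]


def histogram(values, start, width):
    """Bin integer values into [start + k*width, start + (k+1)*width).

    Returns the bin left edges, the counts, and the relative
    frequencies (count divided by the number of values).
    """
    if min(values) < start:
        raise ValueError("start must not exceed the smallest value")
    n_bins = (max(values) - start) // width + 1
    counts = [0] * n_bins
    for v in values:
        counts[(v - start) // width] += 1
    edges = [start + k * width for k in range(n_bins)]
    rel_freq = [c / len(values) for c in counts]
    return edges, counts, rel_freq


if __name__ == "__main__":
    # Changing `width` changes the number of bins.
    for width in (5, 10):
        edges, counts, rel = histogram(YEARS, start=1960, width=width)
        for edge, c, r in zip(edges, counts, rel):
            print(f"{edge}-{edge + width - 1}: {c:2d}  ({r:.2f})")
        print()
```

With a width of 5 years there are 8 bins; with 10 years there are 4. Either way the relative frequencies sum to 1.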

32 Another approach is a “stem and leaf”
195. |
196. |
197. |
198. |
199. |
200. |
The stem consists of each number with its last digit omitted. For our years, this means dropping the final digit (the year within the decade) and keeping the decade, so “1975” becomes “197”.

33 Another approach is a “stem and leaf”
195. |
196. | 0
197. |
198. | 55557999
199. | 00011111122222222222233333333344444444445
200. |
Then add the final digits (the leaf or leaves) back next to the corresponding stem.
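The same display can be generated programmatically. A minimal Python sketch, assuming integer data split into stem = value // 10 and leaf = value % 10; the function names are hypothetical and the data are rebuilt from the frequency table:

```python
from collections import defaultdict

# Birth years rebuilt from the frequency table.
FREQ = {1960: 1, 1985: 4, 1987: 1, 1989: 3, 1990: 3,
        1991: 6, 1992: 12, 1993: 9, 1994: 10, 1995: 1}
YEARS = [year for year, n in FREQ.items() for _ in range(n)]


def stem_and_leaf(values):
    """Map each stem (value // 10) to its sorted leaves (value % 10)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    return leaves


def render(leaves, first_stem, last_stem):
    """One line per stem, including empty stems, formatted as 'stem. | leaves'."""
    lines = []
    for stem in range(first_stem, last_stem + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem}. | {digits}".rstrip())
    return lines


if __name__ == "__main__":
    print("\n".join(render(stem_and_leaf(YEARS), 195, 200)))
```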

34 Summary Statistics

35 Central Tendency and Spread
Two of the simplest and most important measures

36 Central Tendency
There are a number of measures of central tendency. The most common are:
  Mean
  Median
  Mode
Let’s focus on the first two.

37 Mean
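The mean is the sum of all values divided by the number of values. As a worked example, using the same five birth years that appear on the median slide (the notation $\bar{x}$, $x_i$, $n$ is ours, not the slide’s):

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
        = \frac{1985 + 1985 + 1992 + 1992 + 1992}{5}
        = \frac{9946}{5}
        = 1989.2
```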

38

39 Median
The median is the middle value. We can identify it by placing our data in order. Let’s use the same five values:
1985 1985 1992 1992 1992
Here the mean (1989.2) and the median (1992) differ, as they often do. A useful property of the median is that it is generally not sensitive to outliers.

40 Median
If there is an even number of values, there are two middle values, and we take their average.
Let’s add our outlier (1960) to the data set and find the median:
1960 1985 1985 1992 1992 1992
The median is now (1985 + 1992) / 2 = 1988.5
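Python’s standard `statistics` module computes both measures. A short sketch that reproduces the slide values and then compares the full set of 50 responses (rebuilt from the frequency table) with and without the 1960 response:

```python
from statistics import mean, median

# The five values from the median slide, then the same values plus the 1960 outlier.
five = [1985, 1985, 1992, 1992, 1992]
six = sorted(five + [1960])

# All 50 responses, rebuilt from the frequency table, with and without 1960.
FREQ = {1960: 1, 1985: 4, 1987: 1, 1989: 3, 1990: 3,
        1991: 6, 1992: 12, 1993: 9, 1994: 10, 1995: 1}
all_years = [year for year, n in FREQ.items() for _ in range(n)]
without_outlier = [y for y in all_years if y != 1960]

if __name__ == "__main__":
    for label, data in (("five values", five), ("six values", six),
                        ("all 50", all_years), ("49 without 1960", without_outlier)):
        print(f"{label:>16}: mean = {mean(data):.2f}, median = {median(data)}")
```

With only five or six values, one extra observation still moves the median (1992 to 1988.5) because the middle position shifts. In the full set of 50, dropping the 1960 response moves the mean from 1990.92 to about 1991.55, while the median stays at 1992.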

41 Mean and Median
Mean
● Easy to understand: it’s the average
● Affected by extreme high or low values (outliers)
● May not best characterize skewed distributions
Median
● Largely unaffected by outliers
● May better characterize skewed distributions

42 What about mode?
Mode
● The most frequent value
● Less often used in social science
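Python’s standard library can find the mode directly. Unlike the mean and median, the mode also applies to nominal data, so this sketch uses both the 50 birth years (rebuilt from the frequency table) and a hypothetical list of nominal responses; `multimode` (Python 3.8+) returns every value tied for most frequent:

```python
from statistics import mode, multimode

# Birth years rebuilt from the frequency table.
FREQ = {1960: 1, 1985: 4, 1987: 1, 1989: 3, 1990: 3,
        1991: 6, 1992: 12, 1993: 9, 1994: 10, 1995: 1}
YEARS = [year for year, n in FREQ.items() for _ in range(n)]

if __name__ == "__main__":
    print(mode(YEARS))  # the single most frequent birth year
    # Hypothetical nominal responses: multimode lists every tied value,
    # in the order each first appears.
    print(multimode(["Rural", "Urban", "Urban", "Rural", "Suburban"]))
```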


44 Percentiles
Imagine a chart with all the observable values in a population; it contains 100 percent of the possible values. The pth percentile is the value of a given distribution such that p% of the distribution is less than or equal to that value.
Quartiles: the 25th, 50th, and 75th percentiles
Quintiles: the 20th, 40th, 60th, and 80th percentiles
Deciles: the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th percentiles
The 50th percentile is the MEDIAN.
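The definition above, the smallest value with at least p% of the data at or below it, is often called the nearest-rank method. A minimal Python sketch (the function name is ours) that applies it to the 50 birth years, rebuilt from the frequency table since order does not matter for percentiles:

```python
import math

# Birth years rebuilt from the frequency table.
FREQ = {1960: 1, 1985: 4, 1987: 1, 1989: 3, 1990: 3,
        1991: 6, 1992: 12, 1993: 9, 1994: 10, 1995: 1}
YEARS = [year for year, n in FREQ.items() for _ in range(n)]


def nearest_rank_percentile(values, p):
    """Smallest observed value with at least p% of the values <= it (0 < p <= 100)."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based position in sorted order
    return ordered[rank - 1]


if __name__ == "__main__":
    for p in (10, 25, 50, 75, 90):
        print(f"{p}th percentile: {nearest_rank_percentile(YEARS, p)}")
```

It returns 1991, 1992, and 1993 for the 25th, 50th, and 75th percentiles, matching the values reported for our data. Many statistical packages interpolate between neighboring values instead, so their results can differ slightly.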

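45 10th percentile = -1.2816: 10 percent of the area under the standard normal curve (shaded red) lies below this value

46 Basic descriptive statistics
25th percentile = -0.6745: 25 percent of the area under the curve (shaded red) lies below this value

47 Basic descriptive statistics
50th percentile = 0.00: 50 percent of the area under the curve (shaded red) lies below this value

48 75th percentile = 0.6745: 75 percent of the area under the curve (shaded red) lies below this value

49 Basic descriptive statistics
90th percentile = 1.2816: 90 percent of the area under the curve (shaded red) lies below this value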
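The values on these slides are percentiles of the standard normal distribution (mean 0, standard deviation 1). A short sketch reproducing them with `statistics.NormalDist` from Python’s standard library (3.8+); `inv_cdf(p)` returns the value below which a fraction p of the area lies:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

if __name__ == "__main__":
    for p in (0.10, 0.25, 0.50, 0.75, 0.90):
        z = standard_normal.inv_cdf(p)
        print(f"{p:.0%} percentile: z = {z:.4f}")
```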

50 Percentiles from our data

51 50th percentile (the median): 1992
25th percentile: 1991
75th percentile: 1993

52 Measures of Spread

53 How do we describe the different distributions?

54 Measures
Range
Interquartile range
Index of dispersion
Standard deviation

55 Interquartile Range (IQR)
The IQR is a simple measure of spread: it is the difference between the 75th and 25th percentile values. Because it covers the middle 50% of the data, the IQR tells us about the spread around the median.

56 Interquartile Range (IQR)
50th percentile (the median): 1992
25th percentile: 1991
75th percentile: 1993
IQR = 1993 - 1991 = 2
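Python’s `statistics.quantiles` (standard library, Python 3.8+) can reproduce these quartiles. A sketch using the 50 birth years rebuilt from the frequency table; `iqr` is a hypothetical helper. With `method="inclusive"` the quartiles match the slide; the default `"exclusive"` method interpolates differently and gives 1990.75 for the 25th percentile, a reminder that software packages do not all define percentiles the same way:

```python
from statistics import quantiles

# Birth years rebuilt from the frequency table.
FREQ = {1960: 1, 1985: 4, 1987: 1, 1989: 3, 1990: 3,
        1991: 6, 1992: 12, 1993: 9, 1994: 10, 1995: 1}
YEARS = [year for year, n in FREQ.items() for _ in range(n)]


def iqr(values, method="inclusive"):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q1, _, q3 = quantiles(values, n=4, method=method)
    return q3 - q1


if __name__ == "__main__":
    print(quantiles(YEARS, n=4, method="inclusive"))  # 25th, 50th, 75th
    print(iqr(YEARS))
    print(quantiles(YEARS, n=4))  # default "exclusive" method, for comparison
```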

57 Boxplots
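A boxplot draws the quartiles as a box and marks unusually distant values separately. A sketch of the numbers behind one, assuming the common Tukey convention that points more than 1.5 × IQR beyond the quartiles are plotted as outliers (plotting libraries differ in the details); `boxplot_stats` is a hypothetical helper:

```python
from statistics import quantiles

# Birth years rebuilt from the frequency table.
FREQ = {1960: 1, 1985: 4, 1987: 1, 1989: 3, 1990: 3,
        1991: 6, 1992: 12, 1993: 9, 1994: 10, 1995: 1}
YEARS = [year for year, n in FREQ.items() for _ in range(n)]


def boxplot_stats(values, k=1.5):
    """Quartiles, whisker ends, and outliers under the k * IQR fence rule."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low_fence, high_fence = q1 - k * spread, q3 + k * spread
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "q1": q1, "median": med, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }


if __name__ == "__main__":
    for key, value in boxplot_stats(YEARS).items():
        print(f"{key}: {value}")
```

Because the middle half of these data is so tight (IQR = 2 years), the fences sit at 1988 and 1996, so this rule flags the 1985 and 1987 responses as well as 1960.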

58 Standard Deviation
We will often use and talk about the standard deviation (st. dev.), represented by sigma: σ.
The st. dev. tells us about the spread around the mean. (The IQR tells us about the spread around the median.)

59 Standard Deviation

60

61

62

63

64

65
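The standard deviation is the square root of the average squared distance from the mean. A sketch in Python that computes it by hand for the 50 birth years (rebuilt from the frequency table) and checks the result against the standard library. Dividing by n gives the population standard deviation σ (`statistics.pstdev`); dividing by n - 1 gives the sample version (`statistics.stdev`), the usual choice when the data are a sample. `std_dev` is a hypothetical helper:

```python
import math
from statistics import pstdev, stdev

# Birth years rebuilt from the frequency table.
FREQ = {1960: 1, 1985: 4, 1987: 1, 1989: 3, 1990: 3,
        1991: 6, 1992: 12, 1993: 9, 1994: 10, 1995: 1}
YEARS = [year for year, n in FREQ.items() for _ in range(n)]


def std_dev(values, sample=False):
    """Square root of the mean squared deviation from the mean.

    sample=True divides by n - 1 instead of n.
    """
    n = len(values)
    mu = sum(values) / n
    squared = sum((v - mu) ** 2 for v in values)
    return math.sqrt(squared / (n - 1 if sample else n))


if __name__ == "__main__":
    print(f"population: {std_dev(YEARS):.3f}  (pstdev: {pstdev(YEARS):.3f})")
    print(f"sample:     {std_dev(YEARS, sample=True):.3f}  (stdev: {stdev(YEARS):.3f})")
```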

66 But the st. dev. is really useful: if data are normally distributed, we can expect about 68% of values to fall within 1 st. dev. of the mean and about 95% within 2.
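The 68% and 95% figures follow from the normal distribution itself. A short sketch with `statistics.NormalDist` (Python 3.8+); `share_within` is a hypothetical helper:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"within {k} st. dev.: {share_within(k):.1%}")
```

The rule is only a guide for data that are far from normal. For the birth years, with a mean of 1990.92 and a population standard deviation of about 5.08, 45 of the 50 responses (90%) fall within one standard deviation of the mean.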

67 Other ways to describe spread

68 Skewness and Symmetry
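One common numeric summary of skewness is the Fisher-Pearson coefficient: the average cubed deviation from the mean divided by the cubed (population) standard deviation. Negative values indicate a long left tail, positive values a long right tail. A Python sketch applied to the birth years (rebuilt from the frequency table), where the lone 1960 response produces a strong left skew, consistent with the mean (1990.92) sitting below the median (1992); `skewness` is a hypothetical helper:

```python
# Birth years rebuilt from the frequency table.
FREQ = {1960: 1, 1985: 4, 1987: 1, 1989: 3, 1990: 3,
        1991: 6, 1992: 12, 1993: 9, 1994: 10, 1995: 1}
YEARS = [year for year, n in FREQ.items() for _ in range(n)]


def skewness(values):
    """Fisher-Pearson coefficient g1 = m3 / m2 ** 1.5, using population moments."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    print(f"skewness of birth years: {skewness(YEARS):.2f}")
```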

69

70

71 Why might data be skewed? Why might data be bimodal?

72 Skewed data example: Family Income

73 Q: Guess the mean

74 $71,840

75 Q: Guess the mean $71,840

76 Q: Guess the mean $71,840 Q: Guess the median

77 Q: Guess the mean $71,840 Q: Guess the median $55,000

78 Interpreting Tables

79 Elements of a Table
Title describes content
Sample size presented
Actual and percentage shares presented

80 Assumptions stated
Source of calculations stated

81 Interpreting Tables
From Manski (2014). The death penalty moratorium in the U.S. was lifted in 1976.
Three ways to interpret the data presented:

82 Interpreting Tables
1) “Before and after”
Average effect of the death penalty is -0.6 (calculated as 9.7 - 10.3)

83 Interpreting Tables
2) Compare treated and untreated
Assumes all else is equal, e.g. the propensity to kill is the same everywhere
Average effect in 1977 is 2.8 (= 9.7 - 6.9)

84 Interpreting Tables
3) Difference in differences
Uses the change over time in untreated states to account for changes unrelated to the policy
Treated states declined from 10.3 to 9.7: -0.6
Untreated states declined from 8.0 to 6.9: -1.1
Effect = 0.5 = [(9.7 - 10.3) - (6.9 - 8.0)]

85 Interpreting Tables
Before and after shows a reduced homicide rate (-0.6 per 100,000)
Comparison of treated and untreated shows a rate 2.8 per 100,000 higher
Difference in differences shows an increase of 0.5 per 100,000
Explanations?
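The three interpretations use the same four numbers: homicide rates per 100,000 in treated states (10.3 before, 9.7 after) and untreated states (8.0 before, 6.9 after). A Python sketch of the arithmetic; the variable names are ours:

```python
# Homicide rates per 100,000 from the table discussed above.
treated_before, treated_after = 10.3, 9.7
untreated_before, untreated_after = 8.0, 6.9

# 1) Before and after: change in the treated states only.
before_after = treated_after - treated_before

# 2) Treated vs. untreated: compare the two groups after the policy.
cross_section = treated_after - untreated_after

# 3) Difference in differences: change in treated minus change in untreated.
diff_in_diff = (treated_after - treated_before) - (untreated_after - untreated_before)

if __name__ == "__main__":
    print(f"before/after:        {before_after:+.1f}")
    print(f"treated - untreated: {cross_section:+.1f}")
    print(f"diff-in-diff:        {diff_in_diff:+.1f}")
```

Each estimate rests on a different assumption: the first that nothing else changed over time, the second that the two groups are otherwise alike, and the third that both groups would have followed the same trend without the policy.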

86 Presenting Data Tables Charts Graphs

87 Problems with Pie Charts
No sample size
Similarly sized pies suggest that all groups are equal and that all response rates are about the same
Were yes/no the only options?
What are “enough transportation options”?

88 When Pie Charts Are Appropriate

89 Bar Chart

90

91 Measures of association

