Introduction to Statistics for the Social Sciences SBS200, COMM200, GEOG200, PA200, POL200, or SOC200 Lecture Section 001, Spring 2015 Room 150 Harvill.

2 Introduction to Statistics for the Social Sciences
SBS200, COMM200, GEOG200, PA200, POL200, or SOC200
Lecture Section 001, Spring 2015
Room 150 Harvill Building
8:00 - 8:50 Mondays, Wednesdays & Fridays
http://courses.eller.arizona.edu/mgmt/delaney/d15s_database_weekone_screenshot.xlsx

3 Schedule of readings
Before next exam (February 13th):
Please read chapters 1 - 4 in Ha & Ha textbook.
Please read Appendix D, E & F online (on the syllabus these are referred to as online readings 1, 2 & 3).
Please read Chapters 1, 5, 6 and 13 in Plous:
Chapter 1: Selective Perception
Chapter 5: Plasticity
Chapter 6: Effects of Question Wording and Framing
Chapter 13: Anchoring and Adjustment

5 Everyone will want to be enrolled in one of the lab sessions.
Labs continue this week with Project 1.

6 One positive correlation
One negative correlation
One t-test

8 By the end of lecture today (2/6/15)
Use this as your study guide:
Dot Plots
Frequency Distributions - Frequency Histograms
Frequency, relative frequency
Guidelines for constructing frequency distributions
Characteristics of a distribution: Central Tendency, Dispersion, Shape

9 No homework due Monday, Feb 9th

10 Review of Homework Worksheet
Notice Gillian asked 1300 people: 130 + 104 + 325 + 455 + 286 = 1300

Count   Count / 1300   x 100   x 1,000,000
130     .10            10      100,000
104     .08            8       80,000
325     .25            25      250,000
455     .35            35      350,000
286     .22            22      220,000

For the first row: 130 / 1300 = .10, then .10 x 100 = 10 and .10 x 1,000,000 = 100,000
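
The worksheet arithmetic can be sketched in a few lines of Python. This is only an illustration: the counts and the 1,000,000 multiplier come from the slide, while the variable names are made up for the example.

```python
# Counts from the worksheet: 130 + 104 + 325 + 455 + 286 = 1300 people asked.
counts = [130, 104, 325, 455, 286]
multiplier = 1_000_000  # the worksheet scales each proportion by 1,000,000

total = sum(counts)
proportions = [count / total for count in counts]   # e.g. 130 / 1300 = .10
percents = [p * 100 for p in proportions]           # e.g. .10 x 100 = 10
scaled = [p * multiplier for p in proportions]      # e.g. .10 x 1,000,000 = 100,000

for count, p, pct, big in zip(counts, proportions, percents, scaled):
    print(f"{count:>4}  {p:.2f}  {pct:>3.0f}  {big:>9,.0f}")
```

Each row is computed the same way: divide the count by the total, then rescale.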

13 [Scatterplot: Age on the x-axis (10 to 50), Dollars Spent on the y-axis (1 to 9)]
Strong, negative (slopes down), r = -.9

14 Review of Homework Worksheet
=correl(A2:A11,B2:B11) = -0.9226648007
Strong, negative (slopes down), r = -0.9227

15 Review of Homework Worksheet
This shows a strong negative relationship (r = -0.92) between the amount spent on snacks and the age of the moviegoer.
Description includes:
Both variables
Strength (weak, moderate, strong)
Direction (positive, negative)
Correlation r (actual number)

16 Review of Homework Worksheet
Hand in your homework. Must be complete and must be stapled.
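
For anyone working outside Excel, here is a rough Python sketch of the same idea: compute Pearson's r, then write a description that names both variables, the strength, the direction and r. The worksheet data are not in this transcript, so the ages and dollar amounts below are hypothetical, and the cutoffs for weak, moderate and strong are an assumption, not something the lecture specifies.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient, the quantity Excel's CORREL returns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def describe(r, var_x, var_y):
    """Build a description with both variables, strength, direction and r."""
    size = abs(r)
    # Assumed cutoffs (not from the lecture): 0.7 and up strong, 0.3 and up moderate.
    strength = "strong" if size >= 0.7 else "moderate" if size >= 0.3 else "weak"
    direction = "positive" if r > 0 else "negative"
    return (f"This shows a {strength} {direction} relationship "
            f"(r = {r:.2f}) between {var_x} and {var_y}")

# Hypothetical data, only to exercise pearson_r.
ages = [10, 20, 30, 40, 50]
dollars = [9, 7, 6, 3, 1]
print(pearson_r(ages, dollars))

# The r reported on the slide.
print(describe(-0.9226648007, "the amount spent on snacks", "the age of the moviegoer"))
```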

17 Frequency distributions (Review)
Frequency distribution: an organized list of observations and their frequency of occurrence

18 Another example: How many kids in your family? (Review)
Responses: 3 4 8 2 2 1 4 1 14 2
Number of kids in family, in order: 1 1 2 2 2 3 4 4 8 14

19 Frequency distributions
Crucial guidelines for constructing frequency distributions:
1. Classes should be mutually exclusive: Each observation should be represented only once (no overlap between classes)
   Wrong:   0 - 5, 5 - 10, 10 - 15
   Correct: 0 - 4, 5 - 9, 10 - 14
   Correct: 0 - under 5, 5 - under 10, 10 - under 15
2. Set of classes should be exhaustive: Should include all possible data values (no data points should fall outside range)
   Wrong:   0 - 3, 8 - 11, 12 - 15 (no place for our families of 4, 5, 6 or 7)
   Correct: 0 - 3, 4 - 7, 8 - 11, 12 - 15
How many kids are in your family? What is the most common family size?
Number of kids in family: 1 1 2 2 2 3 4 4 8 14

20 Frequency distributions
Crucial guidelines for constructing frequency distributions:
3. All classes should have equal intervals (even if the frequency for that class is zero)
   Wrong:   0 - 1, 2 - 12, 14 - 15
   Correct: 0 - 4, 5 - 9, 10 - 14
   Correct: 0 - under 5, 5 - under 10, 10 - under 15
How many kids are in your family? What is the most common family size?
Number of kids in family: 1 1 2 2 2 3 4 4 8 14
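
The first three guidelines can be checked mechanically. The sketch below is one possible way to do it, assuming whole-number data and classes written as inclusive (low, high) pairs. The function name and the choice of 0 to 14 as the possible family sizes are assumptions made for the example.

```python
def check_classes(classes, possible_values):
    """Return the guidelines a set of inclusive (low, high) classes breaks."""
    problems = []
    ordered = sorted(classes)

    # Guideline 1: mutually exclusive (no class starts inside the previous one).
    if any(nxt[0] <= prev[1] for prev, nxt in zip(ordered, ordered[1:])):
        problems.append("overlap")

    # Guideline 2: exhaustive (every possible value falls in some class).
    def covered(value):
        return any(low <= value <= high for low, high in ordered)
    if not all(covered(v) for v in possible_values):
        problems.append("not exhaustive")

    # Guideline 3: equal intervals (every class spans the same number of values).
    if len({high - low + 1 for low, high in ordered}) > 1:
        problems.append("unequal intervals")

    return problems

family_sizes = range(0, 15)  # assumed possible values: 0 through 14 kids

print(check_classes([(0, 5), (5, 10), (10, 15)], family_sizes))   # overlapping classes
print(check_classes([(0, 4), (5, 9), (10, 14)], family_sizes))    # a clean set
print(check_classes([(0, 3), (8, 11), (12, 15)], family_sizes))   # a gap from 4 to 7
print(check_classes([(0, 1), (2, 12), (14, 15)], family_sizes))   # unequal widths
```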

21 4. Selecting number of classes is subjective
Generally 5 to 15 classes will often work.
Data (sorted, reading down the columns):
 8  12  14  17  19  24
 8  12  14  17  20  25
 9  13  15  17  20  25
10  13  15  17  20  25
11  13  16  17  20  27
11  13  16  17  21  28
11  14  16  18  21  29
11  14  16  18  22
11  14  16  18  23
11  14  16  19  24
How about 6 classes (“bins”)? How about 8 classes (“bins”)? How about 16 classes (“bins”)?

22 5. Class width should be round (easy) numbers
Round numbers: 5, 10, 15, 20 etc. or 3, 6, 9, 12 etc.
Lower boundary can be a multiple of the interval size.
Clear & easy (same data as the previous slide): 8 - 11, 12 - 15, 16 - 19, 20 - 23, 24 - 27, 28 - 31
6. Try to avoid open-ended classes
For example: 10 and above, Greater than 100, Less than 50
Remember: This is all about helping readers understand quickly and clearly.
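
One way to get class limits whose lower boundary is a multiple of the interval size is to round the smallest value down to the nearest multiple and step up from there. The sketch below assumes whole-number data, and the helper name is made up for the example.

```python
def class_limits(smallest, largest, width):
    """Inclusive (low, high) classes starting at a multiple of the width."""
    low = (smallest // width) * width  # round the smallest value down to a multiple
    limits = []
    while low <= largest:
        limits.append((low, low + width - 1))
        low += width
    return limits

# Data running from 8 to 29 with an interval size of 4.
print(class_limits(8, 29, 4))

# Exam scores running from 53 to 99 with an interval size of 5.
print(class_limits(53, 99, 5))
```

The first call gives the 8 - 11 through 28 - 31 classes listed on the slide. The second gives ten classes from 50 - 54 through 95 - 99.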

23 Let’s do one: Scores on an exam
Step 1: List scores
82 58 64 80
75 72 87 73
88 94 84 78
93 69 70 60
53 84 76 87
84 61 89 95
87 91 75 99
Step 2: List scores in order (distinct values shown)
53 58 60 61 64 69 70 72 73 75 76 78 80 82 84 87 88 89 91 93 94 95 99
Step 3: Decide whether grouped or ungrouped
If fewer than 10 groups, “ungrouped” is fine. If more than 10 groups, “grouped” might be better.
How to figure how many values: largest number - smallest number + 1, so 99 - 53 + 1 = 47
Step 4: Generate number and size of intervals (or size of bins)
Sample size (n)   Number of classes
10 - 16           5
17 - 32           6
33 - 64           7
65 - 128          8
129 - 255         9
256 - 511         10
512 - 1,024       11
If we have 6 bins, we’d have intervals of 8. Whaddya think? Would intervals of 5 be easier to read? Let’s just try it and see which we prefer…
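
Steps 3 and 4 can be sketched as a small lookup plus a division. The table below is copied from the slide. Rounding the interval size up to a whole number is an assumption, since the slide only says that 6 bins gives intervals of 8.

```python
import math

# (smallest n, largest n, number of classes), as listed on the slide.
CLASS_TABLE = [
    (10, 16, 5), (17, 32, 6), (33, 64, 7), (65, 128, 8),
    (129, 255, 9), (256, 511, 10), (512, 1024, 11),
]

def suggested_classes(n):
    """Look up the suggested number of classes for a sample of size n."""
    for low, high, classes in CLASS_TABLE:
        if low <= n <= high:
            return classes
    raise ValueError("sample size is outside the table")

scores = [82, 58, 64, 80, 75, 72, 87, 73, 88, 94, 84, 78, 93, 69,
          70, 60, 53, 84, 76, 87, 84, 61, 89, 95, 87, 91, 75, 99]

n = len(scores)
value_count = max(scores) - min(scores) + 1   # largest - smallest + 1
bins = suggested_classes(n)
width = math.ceil(value_count / bins)         # assumed: round up to a whole number

print(n, value_count, bins, width)
```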

24 Scores on an exam
Let’s just try it and see which we prefer…

10 bins, interval of 5
Score     Frequency
95 - 99   2
90 - 94   3
85 - 89   5
80 - 84   5
75 - 79   4
70 - 74   3
65 - 69   1
60 - 64   3
55 - 59   1
50 - 54   1

6 bins, interval of 8
Score      Frequency
93 - 100   4
85 - 92    6
77 - 84    6
69 - 76    7
61 - 68    2
53 - 60    3

Remember: This is all about helping readers understand quickly and clearly.
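
Both tables can be rebuilt by counting how many scores fall in each class. This is a minimal sketch assuming whole-number scores and inclusive class limits. The function name is made up for the example.

```python
def frequency_table(values, start, width, n_classes):
    """Count values in n_classes consecutive inclusive classes of equal width."""
    table = []
    for i in range(n_classes):
        low = start + i * width
        high = low + width - 1
        freq = sum(1 for v in values if low <= v <= high)
        table.append((low, high, freq))
    return table

scores = [82, 58, 64, 80, 75, 72, 87, 73, 88, 94, 84, 78, 93, 69,
          70, 60, 53, 84, 76, 87, 84, 61, 89, 95, 87, 91, 75, 99]

ten_bins = frequency_table(scores, start=50, width=5, n_classes=10)
six_bins = frequency_table(scores, start=53, width=8, n_classes=6)

# Print highest class first, the way the slide lists them.
for low, high, freq in reversed(ten_bins):
    print(f"{low} - {high}   {freq}")
```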

25 Scores on an exam
Let’s make a frequency histogram using 10 bins and a bin width of 5!!

26 Step 6: Complete the Frequency Table (Review)
Scores on an exam: 82 58 64 80 75 72 87 73 88 94 84 78 93 69 70 60 53 84 76 87 84 61 89 95 87 91 75 99

Score     Frequency   Cumulative   Relative    Relative Cumulative
                      Frequency    Frequency   Frequency
95 - 99   2           28           .0714       1.0000
90 - 94   3           26           .1071       .9286
85 - 89   5           23           .1786       .8214
80 - 84   5           18           .1786       .6429
75 - 79   4           13           .1429       .4643
70 - 74   3           9            .1071       .3214
65 - 69   1           6            .0357       .2143
60 - 64   3           5            .1071       .1786
55 - 59   1           2            .0357       .0714
50 - 54   1           1            .0357       .0357

Cumulative frequency: Just adding up the frequency data from the smallest to largest numbers.
Relative frequency: Just dividing each frequency by total number to get a ratio (like a percent). Please note: 1/28 = .0357, 3/28 = .1071, 4/28 = .1429
Relative cumulative frequency: Just adding up the relative frequency data from the smallest to largest numbers. Please note: this is also just dividing cumulative frequency by total number: 1/28 = .0357, 2/28 = .0714, 5/28 = .1786
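
The three extra columns follow directly from the frequencies. A short sketch, with the frequencies listed from the lowest class (50 - 54) to the highest (95 - 99) and variable names chosen for the example:

```python
from itertools import accumulate

# Frequencies for 50 - 54, 55 - 59, ..., 95 - 99.
freqs = [1, 1, 3, 1, 3, 4, 5, 5, 3, 2]
n = sum(freqs)

cumulative = list(accumulate(freqs))               # running total, smallest class first
relative = [f / n for f in freqs]                  # each frequency divided by n
relative_cumulative = [c / n for c in cumulative]  # cumulative frequency divided by n

for f, c, r, rc in zip(freqs, cumulative, relative, relative_cumulative):
    print(f"{f:>2}  {c:>2}  {r:.4f}  {rc:.4f}")
```

Dividing the cumulative frequency by n avoids the small rounding drift that comes from adding up already-rounded relative frequencies.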

27 Scores on an exam
Step 1: List scores
Step 2: List scores in order
Step 3: Decide grouped
Step 4: Decide 10 for # bins (classes), 5 for bin width (interval size)
Step 5: Generate frequency histogram
Remember Dot Plots
[Frequency histogram: “Score on exam” on the x-axis in classes 50 - 54 through 95 - 99; frequency on the y-axis, 1 to 6]
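
A quick text version of Step 5 can be sketched by printing one mark per observation in each class. This turns the histogram on its side (classes down the page rather than along the x-axis), which is a choice made for the example, as is the helper name.

```python
def histogram_rows(table, mark="#"):
    """One text row per class: the class limits, then one mark per observation."""
    return [f"{low} - {high} | {mark * freq}" for low, high, freq in table]

# Frequency table with 10 bins and a bin width of 5.
table = [(50, 54, 1), (55, 59, 1), (60, 64, 3), (65, 69, 1), (70, 74, 3),
         (75, 79, 4), (80, 84, 5), (85, 89, 5), (90, 94, 3), (95, 99, 2)]

for row in histogram_rows(table):
    print(row)
```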

32 Generate frequency polygon
Plot midpoint of histogram intervals
Connect the midpoints
[Frequency polygon drawn over the histogram: “Score on exam” on the x-axis in classes 50 - 54 through 95 - 99; frequency on the y-axis, 1 to 6]
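
The points of the frequency polygon are just (midpoint, frequency) pairs. A small sketch, assuming the 10 classes of width 5 used for the histogram:

```python
# Inclusive class limits 50 - 54, 55 - 59, ..., 95 - 99 and their frequencies.
classes = [(low, low + 4) for low in range(50, 100, 5)]
freqs = [1, 1, 3, 1, 3, 4, 5, 5, 3, 2]

# Midpoint of each interval, paired with that interval's frequency.
points = [((low + high) / 2, freq) for (low, high), freq in zip(classes, freqs)]

for midpoint, freq in points:
    print(midpoint, freq)
```

Connecting these points in order with straight line segments gives the polygon.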

33 Generate frequency ogive (“oh-jive”)
Frequency ogive is used for cumulative data
Plot midpoint of histogram intervals
Connect the midpoints

Score     Cumulative Frequency
95 - 99   28
90 - 94   26
85 - 89   23
80 - 84   18
75 - 79   13
70 - 74   9
65 - 69   6
60 - 64   5
55 - 59   2
50 - 54   1

[Ogive: “Score on exam” on the x-axis in classes 50 - 54 through 95 - 99; cumulative frequency on the y-axis, 5 to 30]

34 Pareto Chart: Categories are displayed in descending order of frequency
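
The ordering behind a Pareto chart is a descending sort by frequency. In the sketch below the counts are borrowed from the homework worksheet earlier in the lecture, and the category labels are hypothetical placeholders, since the transcript does not give them.

```python
# Counts from the homework worksheet; the labels are made up for the example.
counts = {
    "Response A": 130,
    "Response B": 104,
    "Response C": 325,
    "Response D": 455,
    "Response E": 286,
}

# Pareto order: categories sorted from most frequent to least frequent.
pareto = sorted(counts.items(), key=lambda item: item[1], reverse=True)

for label, count in pareto:
    print(f"{label}  {count}")
```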

35 Stacked Bar Chart: Bar Height is the sum of several subtotals

36 Simple Line Charts: Often used for time series data (continuous data); the space between data points implies a continuous flow
Note: Can use a two-scale chart with caution
Note: Fewer grid lines can be more effective
Note: For multiple variables, lines can be better than a bar graph

37 Pie Charts: General idea of data that must sum to a total (these are problematic and overused; use with much caution)
Bar charts can often be more effective.
Exploded 3-D pie charts look cool, but a simple 2-D chart may be clearer.

38 Overview
Frequency distributions
The normal curve
Mean, Median, Mode, Trimmed Mean
Standard deviation, Variance, Range, Mean Absolute Deviation
Skewed right, skewed left, unimodal, bimodal, symmetric
Challenge yourself as we work through characteristics of distributions to try to categorize each concept as a measure of 1) central tendency, 2) dispersion or 3) shape

39 Another example: How many kids in your family?
Responses: 3 4 8 2 2 1 4 1 14 2

40 Measures of Central Tendency (Measures of location)
The mean, median and mode
Measures of “location”: Where on the number line the scores tend to cluster
Mean: The balance point of a distribution. Found by adding up all observations and then dividing by the number of observations
Mean for a sample: Σx / n = mean = x̄ (x-bar)
Mean for a population: ΣX / N = mean = µ (mu)
Note: Σ = add up; x or X = scores; n or N = number of scores

41 Measures of Central Tendency (Measures of location)
Mean for a sample: Σx / n = mean = x̄ (x-bar)
Number of kids in family: 1, 3, 1, 4, 2, 4, 2, 8, 2, 14
41 / 10 = mean = 4.1
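
The same calculation as a minimal Python sketch, using the ten family sizes from the example:

```python
kids = [3, 4, 8, 2, 2, 1, 4, 1, 14, 2]

total = sum(kids)    # Σx: add up the scores
n = len(kids)        # n: number of scores
mean = total / n     # Σx / n

print(total, n, mean)
```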

42 Median: The middle value when observations are ordered from least to most (or most to least)
How many kids are in your family? What is the most common family size?
Number of kids in family: 1, 3, 1, 4, 2, 4, 2, 8, 2, 14

43 Median: Put the observations in order
1, 1, 2, 2, 2, 3, 4, 4, 8, 14

44 Median: With 10 observations, the two middle values are 2 and 3
If there appear to be two medians, take the mean of the two: (2 + 3) / 2 = 2.5
Median always has a percentile rank of 50% regardless of shape of distribution
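
A short sketch of the median rule, including the case where two values share the middle. The function name is chosen for the example; Python's standard library also provides statistics.median.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

kids = [3, 4, 8, 2, 2, 1, 4, 1, 14, 2]

print(sorted(kids))
print(median(kids))
```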

45 Mode: The value of the most frequent observation
Number of kids in family: 1 1 2 2 2 3 4 4 8 14

Score   f
1       2
2       3
3       1
4       2
5       0
6       0
7       0
8       1
9       0
10      0
11      0
12      0
13      0
14      1

Please note: The mode is “2” because it is the most frequently occurring score. It occurs “3” times. “3” is not the mode, it is just the frequency for the value that is the mode.
Bimodal distribution: If there are two most frequent observations
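
The distinction between the mode and its frequency can be sketched with a tally. Returning a list of modes is a choice made for the example so that a bimodal set of scores shows both values.

```python
from collections import Counter

def modes(values):
    """Return (list of most frequent values, how often they occur)."""
    tally = Counter(values)
    top = max(tally.values())
    return sorted(v for v, f in tally.items() if f == top), top

kids = [3, 4, 8, 2, 2, 1, 4, 1, 14, 2]

mode_values, frequency = modes(kids)
print(mode_values, frequency)   # the mode is the value, not the frequency

# A bimodal example: two values tie for most frequent.
print(modes([1, 1, 2, 2, 3]))
```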

46 What about central tendency for qualitative data?
Mode is good for nominal or ordinal data
Median can be used with ordinal data
Mean can be used with interval or ratio data
