Published by Daniel Lang. Modified over 9 years ago.
2
Introduction to Statistics for the Social Sciences
SBS200, COMM200, GEOG200, PA200, POL200, or SOC200
Lecture Section 001, Fall 2014
Room 120, Integrated Learning Center (ILC)
10:00 - 10:50, Mondays, Wednesdays & Fridays
http://www.youtube.com/watch?v=oSQJP40PcGI
3
Preview of Questionnaire Homework. There are four parts: (1) statement of objectives; (2) the questionnaire itself (which is the operational definitions of the objectives); (3) data collection and creation of the database; (4) creation of graphs representing the results. Hand in the questionnaire project (Homework assignments 3 & 4).
4
Everyone will want to be enrolled in one of the lab sessions. Labs are next week. Remember: bring an electronic copy of your data (on a flash drive, or email it to yourself). Your data should have the correct formatting; see the Lab Materials link on the class website to double-check that the formatting of your Excel file is exactly consistent.
5
Schedule of readings. Before the next exam (September 26th): please read chapters 1 - 4 in the Ha & Ha textbook; please read Appendix D, E & F online (on the syllabus these are referred to as online readings 1, 2 & 3); please read Chapters 1, 5, 6 and 13 in Plous. Chapter 1: Selective Perception. Chapter 5: Plasticity. Chapter 6: Effects of Question Wording and Framing. Chapter 13: Anchoring and Adjustment.
6
Reminder: a note on doodling
7
By the end of lecture today (9/12/14). Use this as your study guide: dot plots; frequency distributions and frequency histograms; frequency and relative frequency; guidelines for constructing frequency distributions; correlational methodology; positive, negative and zero correlation.
8
Homework due Monday (September 15th). On the class website: please print and complete homework worksheet #5.
9
Descriptive vs inferential statistics. Descriptive statistics: organizing and summarizing data. Inferential statistics: generalizing beyond actual observations, making “inferences” based on data collected. Sample versus population.
10
Descriptive or inferential?
Descriptive statistics: organizing and summarizing data.
Inferential statistics: generalizing beyond actual observations, making “inferences” based on data collected.
What is the average height of the basketball team?
   Measured all of the players and reported the average height.
   Measured only a sample of the players and reported the average height for the team.
In this class, what percentage of students support the death penalty?
   Measured all of the students in class and reported the percentage who said “yes”.
   Measured only a sample of the students in class and reported the percentage who said “yes”.
Based on the data collected from the students in this class, we can conclude that 60% of the students at this university support the death penalty.
11
Descriptive or inferential?
Descriptive statistics: organizing and summarizing data.
Inferential statistics: generalizing beyond actual observations, making “inferences” based on data collected.
Men are in general taller than women.
   Measured all of the citizens of Arizona and reported heights.
Shoe size is not a good predictor of intelligence.
   Measured all of the shoe sizes and IQ of students of 20 universities.
Blondes have more fun.
   Asked 500 actresses to complete a happiness survey.
The average age of students at the U of A is 21.
   Asked all students in the fraternities and sororities their age.
12
Descriptive vs inferential statistics. Descriptive statistics: organizing and summarizing data. Inferential statistics: generalizing beyond actual observations, making “inferences” based on data collected. To determine which one applies, we have to consider the methodologies used in collecting the data.
13
You’ve gathered your data…what’s the best way to display it??
14
Describing Data Visually
Lists of numbers: too hard to see patterns.
14 17 20 25 21 29
16 25 27 18 16 13
11 21 19 24 20 11
20 28 16 13 17 14
14 16  8 17 17 11
11 14 17 19 24  8
16 12 25  9 20 17
11 14 16 18 22
14 18 23 12 15
10 13 15 11 11
Organizing numbers helps (sorted, reading down each column).
 8 11 14 17 19 24
 8 12 14 17 20 25
 9 12 15 17 20 25
10 13 15 17 20 25
11 13 16 17 20 27
11 13 16 17 21 28
11 14 16 18 21 29
11 14 16 18 22
11 14 16 18 23
11 14 16 19 24
Graphical representation is even more clear. This is a dot plot.
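As a rough illustration of the idea on this slide, here is a minimal sketch that turns the raw list into a text dot plot. The function name `dot_plot` is made up for this example, and the plot is rotated (one row per value) because that is easier to print than stacked dots.

```python
from collections import Counter

# The 57 observations listed on the slide, in their original order.
data = [
    14, 17, 20, 25, 21, 29, 16, 25, 27, 18, 16, 13,
    11, 21, 19, 24, 20, 11, 20, 28, 16, 13, 17, 14,
    14, 16, 8, 17, 17, 11, 11, 14, 17, 19, 24, 8,
    16, 12, 25, 9, 20, 17, 11, 14, 16, 18, 22,
    14, 18, 23, 12, 15, 10, 13, 15, 11, 11,
]

def dot_plot(values):
    """Return one text row per value from min to max, one dot per observation."""
    counts = Counter(values)
    rows = []
    for v in range(min(values), max(values) + 1):
        # Values that never occur (such as 26 here) get an empty row.
        rows.append(f"{v:>3} | " + "." * counts[v])
    return rows

plot_rows = dot_plot(data)
print("\n".join(plot_rows))
```

Reading the printed rows, the longest run of dots marks the most frequent value, which is much harder to spot in the unsorted list.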
15
Describing Data Visually
Measuring the “frequency of occurrence”: we’ve got to put these data into groups (“bins”), then figure the “frequency of occurrence” for the bins.
 8 12 14 17 19 24
 8 12 14 17 20 25
 9 13 15 17 20 25
10 13 15 17 20 25
11 13 16 17 20 27
11 13 16 17 21 28
11 14 16 18 21 29
11 14 16 18 22
11 14 16 18 23
11 14 16 19 24
16
Frequency distributions: an organized list of observations and their frequency of occurrence. How many kids are in your family? What is the most common family size?
17
Another example: How many kids in your family?
Responses: 3 4 8 2 2 1 4 1 14 2
Number of kids in family (in order): 1 1 2 2 2 3 4 4 8 14
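A frequency distribution for these ten responses can be sketched in a few lines. This is only an illustration using Python's standard `Counter`; the variable names are made up for the example.

```python
from collections import Counter

# Responses to "How many kids in your family?" from the slide.
kids = [3, 4, 8, 2, 2, 1, 4, 1, 14, 2]

freq = Counter(kids)

# Organized list of observations and their frequency of occurrence.
table = sorted(freq.items())
for size, count in table:
    print(f"{size:>2} kids: {count}")

# The most common family size is the value with the highest frequency.
most_common_size, most_common_count = freq.most_common(1)[0]
print("Most common family size:", most_common_size)
```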
18
Frequency distributions: crucial guidelines for constructing frequency distributions
1. Classes should be mutually exclusive: each observation should be represented only once (no overlap between classes).
   Wrong: 0 - 5, 5 - 10, 10 - 15
   Correct: 0 - 4, 5 - 9, 10 - 14
   Correct: 0 - under 5, 5 - under 10, 10 - under 15
2. Set of classes should be exhaustive: should include all possible data values (no data points should fall outside the range).
   How many kids are in your family? What is the most common family size?
   Number of kids in family: 1 1 2 2 2 3 4 4 8 14
   Wrong: 0 - 3, 4 - 7, 8 - 11 (no place for our family of 14!)
   Correct: 0 - 3, 4 - 7, 8 - 11, 12 - 15
19
Frequency distributions: crucial guidelines for constructing frequency distributions
3. All classes should have equal intervals (even if the frequency for that class is zero).
   Wrong: 0 - 1, 2 - 12, 14 - 15
   Correct: 0 - 4, 5 - 9, 10 - 14
   Correct: 0 - under 5, 5 - under 10, 10 - under 15
   How many kids are in your family? What is the most common family size?
   Number of kids in family: 1 1 2 2 2 3 4 4 8 14
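Guidelines 1 to 3 can be checked mechanically. The sketch below assumes integer data and classes written as inclusive (low, high) pairs; `check_classes` is a hypothetical helper for this example, not part of any library.

```python
def check_classes(classes, data):
    """Check guidelines 1-3 for a list of (low, high) classes with inclusive bounds."""
    ordered = sorted(classes)
    # Guideline 1: each class must start above the end of the previous one.
    mutually_exclusive = all(
        prev_high < low
        for (_, prev_high), (low, _) in zip(ordered, ordered[1:])
    )
    # Guideline 2: every observation must fall inside some class.
    exhaustive = all(
        any(low <= x <= high for low, high in ordered) for x in data
    )
    # Guideline 3: every class must span the same number of values.
    widths = {high - low + 1 for low, high in ordered}
    equal_intervals = len(widths) == 1
    return mutually_exclusive, exhaustive, equal_intervals

kids = [1, 1, 2, 2, 2, 3, 4, 4, 8, 14]

print(check_classes([(0, 5), (5, 10), (10, 15)], kids))           # overlapping classes
print(check_classes([(0, 3), (4, 7), (8, 11)], kids))             # no place for 14
print(check_classes([(0, 1), (2, 12), (14, 15)], kids))           # unequal intervals
print(check_classes([(0, 3), (4, 7), (8, 11), (12, 15)], kids))   # passes all three
```

Each call returns three flags in the order (mutually exclusive, exhaustive, equal intervals), so a single False points at the guideline that was broken.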
20
4. Selecting the number of classes is subjective. Generally, 5 to 15 classes will often work.
 8 12 14 17 19 24
 8 12 14 17 20 25
 9 13 15 17 20 25
10 13 15 17 20 25
11 13 16 17 20 27
11 13 16 17 21 28
11 14 16 18 21 29
11 14 16 18 22
11 14 16 18 23
11 14 16 19 24
How about 6 classes (“bins”)? How about 8 classes (“bins”)? How about 16 classes (“bins”)?
21
5. Class width should be round (easy) numbers. Round numbers: 5, 10, 15, 20 etc. or 3, 6, 9, 12 etc. The lower boundary can be a multiple of the interval size.
   Clear & easy: 8 - 11, 12 - 15, 16 - 19, 20 - 23, 24 - 27, 28 - 31
6. Try to avoid open-ended classes, for example: 10 and above; greater than 100; less than 50.
Remember: This is all about helping readers understand quickly and clearly.
22
Let’s do one: scores on an exam
Step 1: List scores
   82 58 64 80
   75 72 87 73
   88 94 84 78
   93 69 70 60
   53 84 76 87
   84 61 89 95
   87 91 75 99
Step 2: List scores in order (each distinct value shown once)
   53 58 60 61 64 69 70 72 73 75 76 78 80 82 84 87 88 89 91 93 94 95 99
Step 3: Decide whether grouped or ungrouped. How to figure how many values: largest number - smallest number + 1, so 99 - 53 + 1 = 47. If less than 10 groups, “ungrouped” is fine. If more than 10 groups, “grouped” might be better.
Step 4: Generate number and size of intervals (or size of bins)
   Sample size (n)   Number of classes
   10 - 16           5
   17 - 32           6
   33 - 64           7
   65 - 128          8
   129 - 255         9
   256 - 511         10
   512 - 1,024       11
With n = 28 the table suggests 6 classes, and 47 / 6 is about 7.8. If we have 6 bins, we’d have intervals of 8. Whaddya think? Would intervals of 5 be easier to read? Let’s just try it and see which we prefer…
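The arithmetic in Steps 3 and 4 can be sketched as follows. The lookup table is copied from the slide; the helper name `suggested_classes` and the choice to round the width up with `math.ceil` are assumptions made for this example.

```python
import math

# Sample-size table from the slide: (largest n in the row, suggested number of classes).
CLASS_TABLE = [(16, 5), (32, 6), (64, 7), (128, 8), (255, 9), (511, 10), (1024, 11)]

def suggested_classes(n):
    """Look up the suggested number of classes for a sample of size n."""
    if n < 10:
        raise ValueError("the table starts at n = 10")
    for upper, k in CLASS_TABLE:
        if n <= upper:
            return k
    raise ValueError("the table stops at n = 1,024")

scores = [
    82, 58, 64, 80, 75, 72, 87, 73, 88, 94, 84, 78, 93, 69,
    70, 60, 53, 84, 76, 87, 84, 61, 89, 95, 87, 91, 75, 99,
]

n = len(scores)
# Largest number - smallest number + 1 counts how many values the data span.
value_range = max(scores) - min(scores) + 1
k = suggested_classes(n)
# Assumption: round up so that k classes are wide enough to cover every value.
width = math.ceil(value_range / k)

print(n, value_range, k, width)
```

The table is only a starting point; as the slide says, a rounder width such as 5 may be easier to read even though it produces more bins.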
23
Scores on an exam
   82 58 64 80
   75 72 87 73
   88 94 84 78
   93 69 70 60
   53 84 76 87
   84 61 89 95
   87 91 75 99
Scores in order (each distinct value shown once): 53 58 60 61 64 69 70 72 73 75 76 78 80 82 84 87 88 89 91 93 94 95 99
Let’s just try it and see which we prefer…
10 bins, interval of 5
   Score      Frequency
   95 - 99    2
   90 - 94    3
   85 - 89    5
   80 - 84    5
   75 - 79    4
   70 - 74    3
   65 - 69    1
   60 - 64    3
   55 - 59    1
   50 - 54    1
6 bins, interval of 8
   Score      Frequency
   93 - 100   4
   85 - 92    6
   77 - 84    6
   69 - 76    7
   61 - 68    2
   53 - 60    3
Remember: This is all about helping readers understand quickly and clearly.
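Both tables can be reproduced by counting how many scores fall in each class. This is a minimal sketch for integer scores; `grouped_frequencies` is a made-up helper name, and the starting points 50 and 53 are read off the two tables.

```python
scores = [
    82, 58, 64, 80, 75, 72, 87, 73, 88, 94, 84, 78, 93, 69,
    70, 60, 53, 84, 76, 87, 84, 61, 89, 95, 87, 91, 75, 99,
]

def grouped_frequencies(values, start, width, n_bins):
    """Count values in n_bins consecutive integer classes [low, low + width - 1]."""
    table = []
    for i in range(n_bins):
        low = start + i * width
        high = low + width - 1
        count = sum(low <= v <= high for v in values)
        table.append((low, high, count))
    return table

ten_bins = grouped_frequencies(scores, start=50, width=5, n_bins=10)
six_bins = grouped_frequencies(scores, start=53, width=8, n_bins=6)

# Print highest class first, matching the layout of the tables.
for name, table in (("10 bins, interval of 5", ten_bins), ("6 bins, interval of 8", six_bins)):
    print(name)
    for low, high, count in reversed(table):
        print(f"  {low:>3} - {high:<3} {count}")
```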
24
Scores on an exam: the same 28 scores and the 10-bin frequency table (50 - 54 through 95 - 99). Let’s make a frequency histogram using 10 bins and a bin width of 5!
25
Step 6: Complete the Frequency Table
Scores on an exam: 82 58 64 80 75 72 87 73 88 94 84 78 93 69 70 60 53 84 76 87 84 61 89 95 87 91 75 99
   Score      Frequency   Cumulative   Relative    Relative cumulative
                          frequency    frequency   frequency
   95 - 99    2           28           .0714       1.0000
   90 - 94    3           26           .1071       .9286
   85 - 89    5           23           .1786       .8214
   80 - 84    5           18           .1786       .6429
   75 - 79    4           13           .1429       .4643
   70 - 74    3            9           .1071       .3214
   65 - 69    1            6           .0357       .2143
   60 - 64    3            5           .1071       .1786
   55 - 59    1            2           .0357       .0714
   50 - 54    1            1           .0357       .0357
Cumulative frequency: just adding up the frequency data from the smallest to largest numbers.
Relative frequency: just dividing each frequency by the total number to get a ratio (like a percent). Please note: 1/28 = .0357, 3/28 = .1071, 4/28 = .1429.
Relative cumulative frequency: just adding up the relative frequency data from the smallest to largest numbers. Please note: this is also just dividing the cumulative frequency by the total number: 1/28 = .0357, 2/28 = .0714, 5/28 = .1786.
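The three extra columns follow directly from the frequency column. A minimal sketch, assuming the 10-bin frequencies are listed from the lowest class (50 - 54) to the highest (95 - 99); values are rounded to four decimals only when printed.

```python
from itertools import accumulate

# Frequencies for classes 50-54, 55-59, ..., 95-99 (lowest class first).
labels = [f"{low} - {low + 4}" for low in range(50, 100, 5)]
freqs = [1, 1, 3, 1, 3, 4, 5, 5, 3, 2]
total = sum(freqs)

# Cumulative frequency: running total from the smallest class upward.
cumulative = list(accumulate(freqs))
# Relative frequency: each frequency divided by the total number of scores.
relative = [f / total for f in freqs]
# Relative cumulative frequency: each cumulative frequency divided by the total.
cumulative_relative = [c / total for c in cumulative]

# Print highest class first, matching the table layout.
for row in reversed(list(zip(labels, freqs, cumulative, relative, cumulative_relative))):
    label, f, c, r, cr = row
    print(f"{label:<8} {f:>2} {c:>3} {r:.4f} {cr:.4f}")
```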
26
Cumulative Frequency Data
Scores on an exam
   Score      Frequency   Cumulative frequency
   95 - 99    2           28
   90 - 94    3           26
   85 - 89    5           23
   80 - 84    5           18
   75 - 79    4           13
   70 - 74    3            9
   65 - 69    1            6
   60 - 64    3            5
   55 - 59    1            2
   50 - 54    1            1
[Figure: cumulative frequency histogram]
Where are we?
27
Step 1: List scores. Step 2: List scores in order. Step 3: Decide grouped. Step 4: Decide 10 for # bins (classes) and 5 for bin width (interval size). Step 5: Generate frequency histogram.
[Figure: frequency histogram of scores on an exam. x-axis: score on exam, classes 50 - 54 through 95 - 99; y-axis: frequency, 1 to 6]
28
Generate frequency polygon: plot the midpoint of each histogram interval, then connect the midpoints.
[Figure: frequency polygon of scores on an exam. x-axis: score on exam, classes 50 - 54 through 95 - 99; y-axis: frequency, 1 to 6]
29
Frequency ogive is used for cumulative data. Generate frequency ogive (“oh-jive”): plot the midpoint of each histogram interval, then connect the midpoints.
Cumulative frequency (highest class first): 28 26 23 18 13 9 6 5 2 1
[Figure: frequency ogive of scores on an exam. x-axis: score on exam, classes 50 - 54 through 95 - 99; y-axis: cumulative frequency, 5 to 30]
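The points to plot for the polygon and the ogive can be sketched as below. This follows the slides in using class midpoints for both; the variable names are made up for the example, and drawing the lines is left to whatever charting tool you use.

```python
from itertools import accumulate

# The ten classes 50-54, ..., 95-99 and their frequencies (lowest class first).
classes = [(low, low + 4) for low in range(50, 100, 5)]
freqs = [1, 1, 3, 1, 3, 4, 5, 5, 3, 2]

# Midpoint of each histogram interval, e.g. (50 + 54) / 2 = 52.
midpoints = [(low + high) / 2 for low, high in classes]

# Frequency polygon: (midpoint, frequency) pairs, connected in order.
polygon_points = list(zip(midpoints, freqs))
# Ogive: (midpoint, cumulative frequency) pairs, connected in order.
ogive_points = list(zip(midpoints, accumulate(freqs)))

print(polygon_points)
print(ogive_points)
```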
30
Pareto Chart: Categories are displayed in descending order of frequency
31
Stacked Bar Chart: Bar Height is the sum of several subtotals
32
Simple Line Charts: often used for time series data (continuous data); the space between data points implies a continuous flow. Note: can use a two-scale chart with caution. Note: fewer grid lines can be more effective. Note: for multiple variables, lines can be better than a bar graph.
33
Pie Charts: give a general idea of data that must sum to a total (these are problematic and overused; use with much caution). Bar charts can often be more effective. Exploded 3-D pie charts look cool, but a simple 2-D chart may be clearer.
34
Simple Frequency Table: Qualitative Data
Data based on Gallup poll on 8/24/11. We asked 100 Republicans “Who is your favorite candidate?”
   Candidate          Frequency   Relative    Percent   Number expected
                                  frequency             to vote
   Rick Perry         29          .2900       29%       6,380,000
   Mitt Romney        17          .1700       17%       3,740,000
   Ron Paul           13          .1300       13%       2,860,000
   Michelle Bachman   10          .1000       10%       2,200,000
   Herman Cain         4          .0400        4%         880,000
   Newt Gingrich       4          .0400        4%         880,000
   No preference      23          .2300       23%       5,060,000
Relative frequency: just divide each frequency by the total number. Please note: 29/100 = .2900, 17/100 = .1700, 13/100 = .1300, 4/100 = .0400.
Percent: just multiply each relative frequency by 100. Please note: .2900 x 100 = 29%, .1700 x 100 = 17%, .1300 x 100 = 13%, .0400 x 100 = 4%.
If 22 million Republicans voted today, how many would vote for each candidate? Just multiply each relative frequency by 22 million. Please note: .2900 x 22m = 6,380k, .1700 x 22m = 3,740k, .1300 x 22m = 2,860k, .0400 x 22m = 880k.
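The three derived columns can be sketched from the raw counts alone. Candidate names and counts are taken from the slide; the variable names, and rounding the projection to whole voters, are assumptions made for this example.

```python
# Poll counts from the slide (100 respondents in total).
poll = {
    "Rick Perry": 29,
    "Mitt Romney": 17,
    "Ron Paul": 13,
    "Michelle Bachman": 10,
    "Herman Cain": 4,
    "Newt Gingrich": 4,
    "No preference": 23,
}

total = sum(poll.values())
voters = 22_000_000

rows = []
for candidate, count in poll.items():
    relative = count / total            # frequency divided by total number
    percent = relative * 100            # relative frequency times 100
    expected = round(relative * voters) # relative frequency times 22 million
    rows.append((candidate, count, relative, percent, expected))
    print(f"{candidate:<18}{count:>4}{relative:>8.4f}{percent:>6.0f}%{expected:>12,}")
```

The projection treats the 100 respondents as representative of all 22 million voters, which is an inferential step rather than a descriptive one.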
38
Designed our study / observation / questionnaire. Collected our data. Organize and present our results.
39
Scatterplot: displays relationships between two continuous variables. Correlation: a measure of how two variables co-occur; it can also be used for prediction. Ranges between -1 and +1. The closer to zero, the weaker the relationship and the worse the prediction. Can be positive or negative.
40
Correlation: ranges between -1 and +1
   +1.00   perfect relationship = perfect predictor
   +0.80   strong relationship = good predictor
   +0.20   weak relationship = poor predictor
    0      no relationship = very poor predictor
   -0.20   weak relationship = poor predictor
   -0.80   strong relationship = good predictor
   -1.00   perfect relationship = perfect predictor
41
Positive Correlation: Height of Mothers by Height of Daughters
[Figure: scatterplot of Height of Daughters against Height of Mothers]
Positive correlation: as values on one variable go up, so do values for the other variable. Negative correlation: as values on one variable go up, the values for the other variable go down.
42
Negative Correlation: Brushing Teeth by Number of Cavities
[Figure: scatterplot of Number of Cavities against Brushing Teeth]
Positive correlation: as values on one variable go up, so do values for the other variable. Negative correlation: as values on one variable go up, the values for the other variable go down.
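To make the sign and the -1 to +1 range concrete, here is a minimal sketch of the Pearson correlation coefficient, one common way to compute a correlation. The slides do not name a formula, and all four data lists are hypothetical numbers invented for illustration, not measurements from the lecture.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical heights in inches: taller mothers tend to have taller daughters.
mothers = [60, 62, 64, 66, 68]
daughters = [61, 62, 65, 65, 69]

# Hypothetical data: more brushing per day goes with fewer cavities.
brushing_per_day = [0, 1, 2, 3, 4]
cavities = [6, 5, 3, 2, 1]

r_heights = pearson_r(mothers, daughters)
r_teeth = pearson_r(brushing_per_day, cavities)

print(f"mothers vs daughters: r = {r_heights:+.2f}")
print(f"brushing vs cavities: r = {r_teeth:+.2f}")
```

The sign of r gives the direction of the relationship, and its distance from zero gives the strength.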
43
Perfect correlation = +1.00 or -1.00: one variable perfectly predicts the other. Positive correlation: height in inches and height in feet. Negative correlation: speed (mph) and time to finish race.
44
Correlation. Perfect correlation = +1.00 or -1.00: one variable perfectly predicts the other, there is no variability in the scatterplot, and the dots form a straight line. The more closely the dots approximate a straight line (the less spread out they are), the stronger the relationship is.
45
Correlation