Introduction to Statistics for the Social Sciences
SBS200, COMM200, GEOG200, PA200, POL200, or SOC200
Lecture Section 001, Fall 2014
Room 120 Integrated Learning Center (ILC)
10:00 - 10:50 Mondays, Wednesdays & Fridays
Preview of Questionnaire Homework. There are four parts:
1. Statement of objectives
2. Questionnaire itself (which provides the operational definitions of the objectives)
3. Data collection and creation of database
4. Creation of graphs representing results
Hand in questionnaire project (Homework assignments 3 & 4)
Labs next week. Everyone will want to be enrolled in one of the lab sessions.
Remember: Bring an electronic copy of your data (on a flash drive, or email it to yourself).
Your data should have correct formatting: see the Lab Materials link on the class website to double-check that your Excel formatting is exactly consistent.
Schedule of readings before next exam (September 26th):
Please read chapters in Ha & Ha textbook.
Please read Appendices D, E & F online (on the syllabus, these are referred to as online readings 1, 2 & 3).
Please read Chapters 1, 5, 6 and 13 in Plous:
Chapter 1: Selective Perception
Chapter 5: Plasticity
Chapter 6: Effects of Question Wording and Framing
Chapter 13: Anchoring and Adjustment
Reminder: A note on doodling
By the end of lecture today (9/12/14). Use this as your study guide:
Dot plots
Frequency distributions: frequency histograms
Frequency, relative frequency
Guidelines for constructing frequency distributions
Correlational methodology
Positive, negative and zero correlation
Homework due Monday (September 15th): On the class website, please print and complete homework worksheet #5.
Descriptive vs. inferential statistics
Descriptive statistics: organizing and summarizing data
Inferential statistics: generalizing beyond actual observations, making “inferences” based on data collected
Sample versus population
Descriptive or inferential?
Descriptive statistics: organizing and summarizing data
Inferential statistics: generalizing beyond actual observations, making “inferences” based on data collected
What is the average height of the basketball team?
(a) Measured all of the players and reported the average height.
(b) Measured only a sample of the players and reported the average height for the team.
In this class, what percentage of students support the death penalty?
(a) Measured all of the students in class and reported the percentage who said “yes.”
(b) Measured only a sample of the students in class and reported the percentage who said “yes.”
Based on the data collected from the students in this class, we can conclude that 60% of the students at this university support the death penalty.
Descriptive or inferential?
Descriptive statistics: organizing and summarizing data
Inferential statistics: generalizing beyond actual observations, making “inferences” based on data collected
Men are in general taller than women: measured all of the citizens of Arizona and reported heights.
Shoe size is not a good predictor of intelligence: measured all of the shoe sizes and IQ of students of 20 universities.
Blondes have more fun: asked 500 actresses to complete a happiness survey.
The average age of students at the U of A is 21: asked all students in the fraternities and sororities their age.
Descriptive vs. inferential statistics
Descriptive statistics: organizing and summarizing data
Inferential statistics: generalizing beyond actual observations, making “inferences” based on data collected
To determine which is which, we have to consider the methodologies used in collecting the data.
You’ve gathered your data… what’s the best way to display it?
Describing Data Visually
Lists of numbers: too hard to see patterns.
Organizing numbers helps.
A graphical representation is even more clear.
This is a dot plot.
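As a rough text-only stand-in for the missing dot plot, the sketch below draws one row per distinct value with one mark per observation (a sideways dot plot). The function name `dot_plot` and the family-size data are hypothetical, invented only for illustration.

```python
from collections import Counter


def dot_plot(values):
    """Return a sideways text dot plot: one row per distinct value, one '*' per observation."""
    counts = Counter(values)
    rows = []
    for value in sorted(counts):
        # Right-align the value label so the marks line up in a column
        rows.append(f"{value:>3} | " + "*" * counts[value])
    return "\n".join(rows)


# Hypothetical family sizes, invented for illustration only
family_sizes = [2, 3, 1, 2, 4, 2, 3, 1, 2, 5]
print(dot_plot(family_sizes))
```

Each row is one value; the longest row (here, families with 2 kids) is the most common value, which is the pattern a dot plot is meant to reveal at a glance.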
Describing Data Visually: Measuring the “frequency of occurrence”
We’ve got to put these data into groups (“bins”), then figure the “frequency of occurrence” for the bins.
Frequency distributions: an organized list of observations and their frequency of occurrence.
How many kids are in your family? What is the most common family size?
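A frequency distribution is just each observed value paired with how many times it occurs. A minimal sketch using Python's standard `collections.Counter` follows; the function names and the survey answers are hypothetical.

```python
from collections import Counter


def frequency_distribution(observations):
    """Return (value, frequency) pairs, sorted from smallest to largest value."""
    return sorted(Counter(observations).items())


def most_common_value(observations):
    """Return the value that occurs most often (the mode).

    If several values tie, Counter.most_common returns the one seen first."""
    return Counter(observations).most_common(1)[0][0]


# Hypothetical answers to "How many kids are in your family?"
answers = [2, 3, 1, 2, 4, 2, 3]
print(frequency_distribution(answers))  # [(1, 1), (2, 3), (3, 2), (4, 1)]
print(most_common_value(answers))       # 2
```

The second function answers the slide's question "What is the most common family size?" directly from the same counts.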
Another example: How many kids are in your family?
[Figure: frequency distribution of number of kids in family]
Frequency distributions
Crucial guidelines for constructing frequency distributions:
1. Classes should be mutually exclusive: each observation should be represented only once (no overlap between classes).
2. The set of classes should be exhaustive: it should include all possible data values (no data points should fall outside the range).
[Example tables: “Wrong” vs. “Correct” classes for number of kids in family (How many kids are in your family? What is the most common family size?), with classes written as “0 - under …” through “under 15”]
Wrong: No place for our family of 14!
Frequency distributions
Crucial guidelines for constructing frequency distributions:
3. All classes should have equal intervals (even if the frequency for that class is zero).
[Example tables: “Wrong” vs. “Correct” classes for number of kids in family, with classes written as “0 - under …” through “under 15”]
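Guidelines 1 through 3 can be checked mechanically. The sketch below assumes each class is written as a pair `(lower, upper)` meaning “lower to under upper”; the function name `check_classes` and the example classes are hypothetical illustrations, not the tables from the slides.

```python
def check_classes(classes, data):
    """Check guidelines 1-3 for classes given as (lower, upper) = "lower to under upper"."""
    classes = sorted(classes)
    # 1. Mutually exclusive: no class starts before the previous one ends
    exclusive = all(classes[i][1] <= classes[i + 1][0] for i in range(len(classes) - 1))
    # 2. Exhaustive: every data value falls into some class
    exhaustive = all(any(lower <= x < upper for lower, upper in classes) for x in data)
    # 3. Equal intervals: every class has the same width
    widths = {upper - lower for lower, upper in classes}
    equal_width = len(widths) == 1
    return {"mutually_exclusive": exclusive, "exhaustive": exhaustive, "equal_width": equal_width}


# Hypothetical classes for "number of kids in family"
too_short = [(0, 3), (3, 6), (6, 9), (9, 12)]
print(check_classes(too_short, [2, 4, 14]))
# {'mutually_exclusive': True, 'exhaustive': False, 'equal_width': True}

fixed = too_short + [(12, 15)]
print(check_classes(fixed, [2, 4, 14]))
# {'mutually_exclusive': True, 'exhaustive': True, 'equal_width': True}
```

The first set of classes fails the exhaustive test for the same reason as the slide’s “Wrong” table: there is no place for a family of 14.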
4. Selecting the number of classes is subjective. Generally, … classes will often work.
How about 6 classes (“bins”)?
How about 8 classes (“bins”)?
How about 16 classes (“bins”)?
5. Class width should be a round (easy) number. Clear & easy round numbers: 5, 10, 15, 20, etc., or 3, 6, 9, 12, etc. The lower boundary can be a multiple of the interval size.
6. Try to avoid open-ended classes, for example “10 and above,” “Greater than 100,” or “Less than 50.”
Remember: This is all about helping readers understand quickly and clearly.
Let’s do one: Scores on an exam
Step 1: List scores.
Step 2: List scores in order.
Step 3: Decide whether grouped or ungrouped. If fewer than 10 groups, “ungrouped” is fine; if more than 10 groups, “grouped” might be better.
Step 4: Generate the number and size of intervals (the size of the bins).
How to figure how many values: largest number - smallest number + 1 = 47
[Table: sample size (n), from 10 up to 1,024, vs. suggested number of classes]
If we have 6 bins, we’d have intervals of 8 (47 / 6 is about 7.8, rounded up to 8).
Whaddya think? Would intervals of 5 be easier to read? Let’s just try it and see which we prefer…
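The interval arithmetic in Step 4 can be written out as a small sketch: count the whole-number values in the range, divide by the number of bins, and round up so every value fits. The function names are hypothetical; the span of 47 comes from the exam example.

```python
import math


def value_span(largest, smallest):
    """Number of whole-number values from smallest to largest, inclusive."""
    return largest - smallest + 1


def interval_width(span, num_bins):
    """Smallest whole-number interval width so num_bins bins cover the whole span."""
    return math.ceil(span / num_bins)


# The exam example has a span of 47 values
print(interval_width(47, 6))   # 8  (47 / 6 is about 7.8)
print(interval_width(47, 10))  # 5  (47 / 10 is 4.7)
```

Rounding up (rather than to the nearest whole number) is the design choice that keeps the classes exhaustive; after that, guideline 5 may still nudge the width to a rounder number.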
[Two frequency tables of scores on an exam (Score, Frequency): 10 bins with an interval of 5, and 6 bins with an interval of 8]
Remember: This is all about helping readers understand quickly and clearly.
[Frequency table: scores on an exam (Score, Frequency)]
Let’s make a frequency histogram using 10 bins and a bin width of 5!
Step 6: Complete the frequency table.
[Frequency table: scores on an exam (interval of 8), with columns Score, Frequency, Cumulative Frequency, Relative Frequency, and Relative Cumulative Frequency]
Cumulative frequency: just adding up the frequency data from the smallest to the largest numbers.
Relative frequency: just dividing each frequency by the total number to get a ratio (like a percent). Please note: for example, 1/28 = .0357 and 4/28 = .1429.
Relative cumulative frequency: just adding up the relative frequency data from the smallest to the largest numbers. Please note: this is also just dividing each cumulative frequency by the total number, for example 1/28 = .0357 and 5/28 = .1786.
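The four columns of Step 6 follow from the frequencies alone. The sketch below computes them in one pass; the function name and the bin frequencies are hypothetical (chosen only to total 28, like the exam example), not the lecture’s actual data.

```python
def frequency_table(frequencies):
    """Given bin frequencies ordered from smallest to largest scores, return rows of
    (frequency, cumulative frequency, relative frequency, relative cumulative frequency)."""
    n = sum(frequencies)
    rows = []
    running = 0
    for f in frequencies:
        running += f  # cumulative frequency: running total of the frequencies
        rows.append((f, running, round(f / n, 4), round(running / n, 4)))
    return rows


# Hypothetical bin frequencies, invented for illustration (they total 28)
for row in frequency_table([1, 4, 0, 6, 9, 8]):
    print(row)
```

The last row’s cumulative frequency always equals the total number of scores, and its relative cumulative frequency is 1.0, which is a quick check that the table is complete.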
[Table: cumulative frequency data for scores on an exam (Score, Frequency, Cumulative Frequency, Relative Frequency, Cumulative Relative Frequency) and a cumulative frequency histogram]
Where are we?
Scores on an exam
Step 1: List scores.
Step 2: List scores in order.
Step 3: Decide grouped.
Step 4: Decide 10 for # bins (classes) and 5 for bin width (interval size).
Step 5: Generate frequency histogram.
[Frequency histogram: x-axis Score on exam, y-axis Frequency]
[Frequency histogram: scores on an exam; x-axis Score on exam, y-axis Frequency]
Generate frequency polygon: Plot the midpoint of each histogram interval, then connect the midpoints.
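The frequency polygon needs one point per interval: the interval’s midpoint paired with its frequency. A minimal sketch, assuming whole-number scores (so a width-5 bin starting at 50 holds 50 through 54); the function name and frequencies are hypothetical.

```python
def polygon_points(lowest, width, frequencies):
    """Return (midpoint, frequency) points for a frequency polygon.

    Assumes whole-number scores, so the bin starting at `lower` holds
    lower, lower + 1, ..., lower + width - 1."""
    points = []
    for k, freq in enumerate(frequencies):
        lower = lowest + k * width
        upper = lower + width - 1
        points.append(((lower + upper) / 2, freq))
    return points


# Hypothetical: bins of width 5 starting at 50 (50-54, 55-59, 60-64)
print(polygon_points(50, 5, [2, 3, 6]))  # [(52.0, 2), (57.0, 3), (62.0, 6)]
```

Connecting these points in order gives the polygon; if the scores were treated as continuous (“50 to under 55”), the midpoints would shift to 52.5, 57.5, and so on.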
[Table: scores on an exam (Score, Cumulative Frequency)]
Frequency ogive (“oh-jive”): used for cumulative data.
Generate frequency ogive: Plot the cumulative frequency for each histogram interval at the interval’s midpoint, then connect the points.
[Figure: frequency ogive; x-axis Score on exam, y-axis Cumulative Frequency]
Pareto Chart: Categories are displayed in descending order of frequency
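Ordering the categories for a Pareto chart is a single sort by frequency, largest first. The sketch below reuses the Gallup counts from the next table; the function name is hypothetical.

```python
def pareto_order(counts):
    """Sort (category, frequency) pairs from most to least frequent.

    Python's sort is stable, so tied categories keep their original order."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


gallup = {
    "Rick Perry": 29, "Mitt Romney": 17, "Ron Paul": 13, "Michele Bachmann": 10,
    "Herman Cain": 4, "Newt Gingrich": 4, "No preference": 23,
}
for category, freq in pareto_order(gallup):
    print(f"{category:<17} {'#' * freq}")
```

“No preference” moves up to second place, which is exactly the kind of reordering a Pareto chart makes visible.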
Stacked Bar Chart: Bar height is the sum of several subtotals.
Simple Line Charts: Often used for time series data (continuous data); the space between data points implies a continuous flow.
Note: Can use a two-scale chart, with caution.
Note: Fewer grid lines can be more effective.
Note: For multiple variables, lines can be better than a bar graph.
Pie Charts: Give a general idea of data that must sum to a total (these are problematic and overused; use with much caution).
Bar charts can often be more effective.
Exploded 3-D pie charts look cool, but a simple 2-D chart may be clearer.
Simple Frequency Table: Qualitative Data
Data based on Gallup poll on 8/24/11. We asked 100 Republicans “Who is your favorite candidate?”

Candidate          Frequency   Relative frequency   Percent   Number expected to vote
Rick Perry         29          .2900                29%       6,380,000
Mitt Romney        17          .1700                17%       3,740,000
Ron Paul           13          .1300                13%       2,860,000
Michele Bachmann   10          .1000                10%       2,200,000
Herman Cain         4          .0400                 4%         880,000
Newt Gingrich       4          .0400                 4%         880,000
No preference      23          .2300                23%       5,060,000

Relative frequency: just divide each frequency by the total number. Please note: 29/100 = .2900, 17/100 = .1700, 13/100 = .1300, 4/100 = .0400.
Percent: just multiply each relative frequency by 100. Please note: .2900 x 100 = 29%, .1700 x 100 = 17%, .1300 x 100 = 13%, .0400 x 100 = 4%.
If 22 million Republicans voted today, how many would vote for each candidate? Just multiply each relative frequency by 22 million. Please note: .2900 x 22m = 6,380k, .1700 x 22m = 3,740k, .1300 x 22m = 2,860k, .0400 x 22m = 880k.
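The three derived columns of the Gallup table come from the same division by the total. The sketch below recomputes them from the slide’s counts; the function name `poll_table` is hypothetical. Expected voters are computed as frequency x voters / total, which equals relative frequency x voters but stays exact in floating point for these numbers.

```python
# Counts from the slide: 100 Republicans asked "Who is your favorite candidate?"
poll = {
    "Rick Perry": 29, "Mitt Romney": 17, "Ron Paul": 13, "Michele Bachmann": 10,
    "Herman Cain": 4, "Newt Gingrich": 4, "No preference": 23,
}


def poll_table(counts, voters):
    """Return {candidate: (relative frequency, percent, expected voters)}."""
    n = sum(counts.values())
    table = {}
    for candidate, freq in counts.items():
        relative = freq / n
        percent = 100 * freq / n
        expected = round(freq * voters / n)  # same as relative frequency x voters
        table[candidate] = (relative, percent, expected)
    return table


for candidate, (rel, pct, expected) in poll_table(poll, 22_000_000).items():
    print(f"{candidate:<17} {rel:.4f} {pct:>4.0f}% {expected:>11,}")
```

The expected-voter column sums back to 22,000,000, a useful check that the relative frequencies add to 1.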
Designed our study / observation / questionnaire Collected our data Organize and present our results
Scatterplot: displays the relationship between two continuous variables.
Correlation: a measure of how two variables co-occur; it can also be used for prediction.
Ranges between -1 and +1.
Can be positive or negative.
The closer to zero, the weaker the relationship and the worse the prediction.
Correlation ranges between -1 and +1:
-1.00 or +1.00: perfect relationship = perfect predictor
Strong relationship (close to -1 or +1) = good predictor
Weak relationship (close to 0) = poor predictor
0: no relationship = very poor predictor
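The slides do not show how the correlation coefficient is computed; the sketch below assumes the usual Pearson formula, r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²), which always lands between -1 and +1. The function name is hypothetical; Python 3.10+ also ships `statistics.correlation`, which computes the same Pearson r.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length, with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]  # deviations from the mean of x
    dy = [b - mean_y for b in y]  # deviations from the mean of y
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator  # fails if either list has no variability


print(pearson_r([1, 2, 3], [2, 4, 6]))        # 1.0  (perfect positive)
print(pearson_r([1, 2, 3], [6, 4, 2]))        # -1.0 (perfect negative)
print(pearson_r([1, 2, 3, 4], [1, 3, 3, 1]))  # 0.0  (no linear relationship)
```

The sign of r gives the direction (positive or negative), and its distance from zero gives the strength, matching the scale above.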
[Scatterplot: Height of Mothers by Height of Daughters (Positive Correlation); axes: Height of Daughters, Height of Mothers]
Positive correlation: as values on one variable go up, so do values for the other variable.
Negative correlation: as values on one variable go up, the values for the other variable go down.
[Scatterplot: Brushing Teeth by Number of Cavities (Negative Correlation); axes: Number of Cavities, Brushing Teeth]
Perfect correlation = +1.00 or -1.00: one variable perfectly predicts the other.
Positive correlation: height in inches and height in feet.
Negative correlation: speed (mph) and time to finish a race.
Correlation
Perfect correlation = +1.00 or -1.00: one variable perfectly predicts the other. There is no variability in the scatterplot; the dots fall on a straight line.
The more closely the dots approximate a straight line (the less spread out they are), the stronger the relationship is.
Correlation