GE5 Tutorial 4
Rules of engagement
- no computer or no power → no lesson
- no SPSS → no lesson
- no homework done → no lesson
Topics
Relationship between two variables
- diagrams and tables
- Pearson correlation coefficient
- Spearman's rho
Content seminar 4
1. Quiz
2. I Hate Statistics Game
3. Relationship between two variables
4. SPSS workshop
5. Discussion of the homework for next week
1. Quiz
Quiz 10 questions Password:
2. I HATE STATISTICS GAME
3. Chapters of Howitt & Cramer
Raw data: just two variables
[Screenshot: SPSS data sheet with the default column names var00001, var00002, and so on]
Contingency tables
Pro: complete overview
Con: hard to read, especially when there are many columns or rows
In SPSS: Analyze > Descriptive Statistics > Crosstabs
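A contingency table (cross table) is simply a count of how often each combination of two values occurs in the raw data. As a minimal sketch in Python rather than SPSS, assuming hypothetical raw data with one (gender, smoking) pair per case, the counting that Crosstabs does could look like this:

```python
from collections import Counter

# Hypothetical raw data, made up for illustration: one (gender, smoking) pair per case.
cases = [
    ("man", "smoker"), ("man", "non-smoker"), ("woman", "non-smoker"),
    ("woman", "smoker"), ("man", "smoker"), ("woman", "non-smoker"),
]

columns = ["man", "woman"]        # categories shown across (horizontal)
rows = ["smoker", "non-smoker"]   # categories shown down (vertical)

# Count every (column, row) combination that occurs in the raw data.
table = Counter(cases)

# Print the cell frequencies together with the marginal totals.
print(f"{'':12}" + "".join(f"{c:>8}" for c in columns) + f"{'TOTAL':>8}")
for r in rows:
    cells = [table[(c, r)] for c in columns]
    print(f"{r:12}" + "".join(f"{n:>8}" for n in cells) + f"{sum(cells):>8}")
col_totals = [sum(table[(c, r)] for r in rows) for c in columns]
print(f"{'TOTAL':12}" + "".join(f"{n:>8}" for n in col_totals) + f"{sum(col_totals):>8}")
```

With real data you would replace the `cases` list with the two columns from the data sheet.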
Bivariate diagrams: pretty, but hard to interpret
Scatter plot: only for two scalar (interval or ratio) variables
CONTINGENCY TABLES
Contingency table
Contingency tables are constructed from ordinary frequency tables:
Value   Total n (%)
1       3 (0.7)
2       7 (1.6)
3       37 (8.2)
4       25 (5.6)
5       33 (7.3)
6       33 (7.3)
7       43 (9.6)
8       83 (18.4)
9       141 (31.3)
10      45 (10.0)
TOTAL   450 (100.0)
Contingency table
Value   Male n (%)    Female n (%)   Total n (%)
1       2 (0.8)       1 (0.5)        3 (0.7)
2       4 (1.6)       3 (1.5)        7 (1.6)
3       15 (6.0)      22 (11.0)      37 (8.2)
4       11 (4.4)      14 (7.0)       25 (5.6)
5       16 (6.4)      17 (8.5)       33 (7.3)
6       18 (7.2)      15 (7.5)       33 (7.3)
7       25 (10.0)     18 (9.0)       43 (9.6)
8       43 (17.2)     40 (20.0)      83 (18.4)
9       91 (36.4)     50 (25.0)      141 (31.3)
10      25 (10.0)     20 (10.0)      45 (10.0)
TOTAL   250 (100.0)   200 (100.0)    450 (100.0)
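The percentages in parentheses are column percentages: each count divided by its column total. A short Python sketch that recomputes them from the counts in this male/female table (the dictionary layout is just an assumption made for this example):

```python
# Frequencies from the table: value -> (male, female)
freq = {
    1: (2, 1), 2: (4, 3), 3: (15, 22), 4: (11, 14), 5: (16, 17),
    6: (18, 15), 7: (25, 18), 8: (43, 40), 9: (91, 50), 10: (25, 20),
}

# Column totals (the marginal totals at the bottom of the table)
n_male = sum(m for m, _ in freq.values())
n_female = sum(f for _, f in freq.values())
n_total = n_male + n_female

def col_pct(count, column_total):
    """Column percentage, rounded to one decimal as in the table."""
    return round(100 * count / column_total, 1)

column_pct = {
    value: (col_pct(m, n_male), col_pct(f, n_female), col_pct(m + f, n_total))
    for value, (m, f) in freq.items()
}

for value, (pm, pf, pt) in column_pct.items():
    print(f"{value:>5}  {pm:5.1f}  {pf:5.1f}  {pt:5.1f}")
```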
Two ways to show the same simple contingency table (2x2)
Table 1: Relation between gender and smoking (columns: man, woman, TOTAL; rows: smoker, non-smoker, TOTAL)
Table 2: Relation between smoking and gender (columns: smoker, non-smoker, TOTAL; rows: man, woman, TOTAL)
A simple contingency table (2x2)
Table: Relation between gender and smoking (columns: man, woman, TOTAL; rows: smoker, non-smoker, TOTAL)
Independent variable (gender): horizontal. Dependent variable (smoking): vertical.
Independent and dependent variables
An independent variable is assumed to have a causal influence on the dependent variable: independent variable → dependent variable.
Convention:
- independent variable: horizontal dimension (the columns)
- dependent variable: vertical dimension (the rows)
In other words, the horizontal dimension is assumed to cause the vertical dimension. Remember: Dependent Down.
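Sometimes that doesn't work
Table: Relation between wearing sunglasses and using sunscreen (columns: sunglasses, no sunglasses, TOTAL; rows: sunscreen, no sunscreen, TOTAL)
Neither variable is plausibly the cause of the other, so there is no clear independent or dependent variable.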
marginal totals and relative frequencies
Contingency table (3x2)
Table: Intention to watch the TV show (numbers for men and women)
                          Man    Woman   TOTAL
Will certainly watch it   50     50      100
Will maybe watch it       25     75      100
Will not watch it         25     175     200
TOTAL                     100    300     400
The cells inside the table hold the frequencies of the contingency table; the TOTAL row and the TOTAL column hold the marginal totals.
Column percentages
Table: Intention to watch the TV show (numbers for men and women)
                          Man          Woman        TOTAL
Will certainly watch it   50 (50%)     50 (17%)     100 (25%)
Will maybe watch it       25 (25%)     75 (25%)     100 (25%)
Will not watch it         25 (25%)     175 (58%)    200 (50%)
TOTAL                     100 (100%)   300 (100%)   400 (100%)
If the dependent variable is indeed listed down, this should be the most informative way of showing relative frequencies.
Row percentages
Table: Intention to watch the TV show (numbers for men and women)
                          Man          Woman        TOTAL
Will certainly watch it   50 (50%)     50 (50%)     100 (100%)
Will maybe watch it       25 (25%)     75 (75%)     100 (100%)
Will not watch it         25 (13%)     175 (88%)    200 (100%)
TOTAL                     100 (25%)    300 (75%)    400 (100%)
This is usually less informative. Exceptions can be made for tables without an independent/dependent distinction.
Total percentages
Table: Intention to watch the TV show (numbers for men and women)
                          Man          Woman        TOTAL
Will certainly watch it   50 (13%)     50 (13%)     100 (25%)
Will maybe watch it       25 (6%)      75 (19%)     100 (25%)
Will not watch it         25 (6%)      175 (44%)    200 (50%)
TOTAL                     100 (25%)    300 (75%)    400 (100%)
This helps you interpret the overall numbers, but the contingency is lost (of the men, how many will maybe watch it?).
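The three kinds of percentages differ only in what you divide by: the column total, the row total or the grand total. A minimal Python sketch using the counts from the TV-show table; rounding half up to whole percentages is an assumption chosen to match the slides (Python's built-in `round()` would turn 12.5 into 12):

```python
import math

# Counts from the TV-show table: intention -> (men, women)
counts = {
    "will certainly watch it": (50, 50),
    "will maybe watch it": (25, 75),
    "will not watch it": (25, 175),
}
col_totals = [sum(row[i] for row in counts.values()) for i in range(2)]  # men, women
grand_total = sum(col_totals)

def pct(n, d):
    """Whole-number percentage, rounded half up (assumed to match the slides)."""
    return math.floor(100 * n / d + 0.5)

def percentages(kind):
    """Return {row label: (% men, % women)} for kind 'column', 'row' or 'total'."""
    result = {}
    for label, (men, women) in counts.items():
        if kind == "column":
            denominators = col_totals            # divide by each gender's total
        elif kind == "row":
            denominators = [men + women] * 2     # divide by the row total
        elif kind == "total":
            denominators = [grand_total] * 2     # divide by the grand total
        else:
            raise ValueError(f"unknown kind: {kind}")
        result[label] = tuple(pct(n, d) for n, d in zip((men, women), denominators))
    return result

for kind in ("column", "row", "total"):
    print(kind, percentages(kind))
```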
limitations of contingency tables
Applicability of contingency tables
Contingency tables can be used with any level of measurement: dichotomous, other nominal, ordinal, interval and ratio.
The table can become unclear for numerical variables with too many different values. Ways to solve this problem:
- present grouped frequencies for one or both variables
- use a measure of central tendency for one variable
If that doesn't help, you can:
- present the results graphically (scatterplots etc.)
- use a measure of association
SPSS likes to put all relative frequencies (row and column percentages) into one table, which is just plain confusing.
Original table
Value   Male n (%)    Female n (%)   Total n (%)
1       2 (0.8)       1 (0.5)        3 (0.7)
2       4 (1.6)       3 (1.5)        7 (1.6)
3       15 (6.0)      22 (11.0)      37 (8.2)
4       11 (4.4)      14 (7.0)       25 (5.6)
5       16 (6.4)      17 (8.5)       33 (7.3)
6       18 (7.2)      15 (7.5)       33 (7.3)
7       25 (10.0)     18 (9.0)       43 (9.6)
8       43 (17.2)     40 (20.0)      83 (18.4)
9       91 (36.4)     50 (25.0)      141 (31.3)
10      25 (10.0)     20 (10.0)      45 (10.0)
TOTAL   250 (100.0)   200 (100.0)    450 (100.0)
Grouping rows for clarity
Value   Male n (%)    Female n (%)   Total n (%)
1-5     48 (19.2)     57 (28.5)      105 (23.3)
6-8     86 (34.4)     73 (36.5)      159 (35.3)
9-10    116 (46.4)    70 (35.0)      186 (41.3)
TOTAL   250 (100.0)   200 (100.0)    450 (100.0)
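Grouping rows means adding up the frequencies of the values in each group. A Python sketch with the counts from the original table; the group boundaries 1-5, 6-8 and 9-10 follow the slide. Computing each percentage from the grouped count, rather than adding up the already rounded percentages, avoids rounding errors:

```python
# Frequencies from the original table: value -> (male, female)
freq = {
    1: (2, 1), 2: (4, 3), 3: (15, 22), 4: (11, 14), 5: (16, 17),
    6: (18, 15), 7: (25, 18), 8: (43, 40), 9: (91, 50), 10: (25, 20),
}

# Group boundaries as on the slide
groups = {"1-5": range(1, 6), "6-8": range(6, 9), "9-10": range(9, 11)}

def group_counts(freq, groups):
    """Add up male, female and total frequencies for the values in each group."""
    result = {}
    for label, values in groups.items():
        male = sum(freq[v][0] for v in values)
        female = sum(freq[v][1] for v in values)
        result[label] = (male, female, male + female)
    return result

grouped = group_counts(freq, groups)
n_male = sum(m for m, _, _ in grouped.values())
n_female = sum(f for _, f, _ in grouped.values())
n_total = n_male + n_female

for label, (m, f, t) in grouped.items():
    # Percentages are computed from the grouped counts, not from rounded row percentages.
    print(f"{label:>5}  {m:4d} ({100 * m / n_male:.1f})"
          f"  {f:4d} ({100 * f / n_female:.1f})"
          f"  {t:4d} ({100 * t / n_total:.1f})")
```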
Adding summary statistics
Value   Male n (%)         Female n (%)       Total n (%)
1       2 (0.8)            1 (0.5)            3 (0.7)
2       4 (1.6)            3 (1.5)            7 (1.6)
3       15 (6.0)           22 (11.0)          37 (8.2)
4       11 (4.4)           14 (7.0)           25 (5.6)
5       16 (6.4)           17 (8.5)           33 (7.3)
6       18 (7.2)           15 (7.5)           33 (7.3)
7       25 (10.0)          18 (9.0)           43 (9.6)
8       43 (17.2)          40 (20.0)          83 (18.4)
9       91 (36.4)          50 (25.0)          141 (31.3)
10      25 (10.0)          20 (10.0)          45 (10.0)
MEANS   M = 7.5 (n = 250)  M = 7.0 (n = 200)  M = 7.3 (n = 450)
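Each mean in the MEANS row is a weighted mean of the frequency table: multiply every value by its frequency, add these products up and divide by n. A Python sketch using the counts from the table:

```python
# Frequencies from the table: value -> (male, female)
freq = {
    1: (2, 1), 2: (4, 3), 3: (15, 22), 4: (11, 14), 5: (16, 17),
    6: (18, 15), 7: (25, 18), 8: (43, 40), 9: (91, 50), 10: (25, 20),
}

def weighted_mean(pairs):
    """Mean of a frequency distribution given (value, frequency) pairs; also returns n."""
    n = sum(f for _, f in pairs)
    return sum(v * f for v, f in pairs) / n, n

male = [(v, m) for v, (m, _) in freq.items()]
female = [(v, f) for v, (_, f) in freq.items()]
everyone = [(v, m + f) for v, (m, f) in freq.items()]

for label, pairs in [("male", male), ("female", female), ("total", everyone)]:
    mean, n = weighted_mean(pairs)
    print(f"{label}: M = {mean:.1f} (n = {n})")
```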
Ways to present relationships
- cross tables with simple or grouped frequencies
- tables comparing means
- graphs showing simple or grouped frequencies
- graphs presenting means
- graphs showing individual data (scatterplots)
- measures of association
contingency graphs: clustered bar chart and stacked bar chart
Bar chart (one variable)
Number of people who prefer a short (red), medium (blue) or long workout (green).
Clustered bar chart (two variables)
Number of people who prefer a short (red), medium (blue) or long workout (green), split by whether the respondent visits a gym each week (never, incidentally, always).
Stacked bar chart (gym visits: never, incidentally, always)
presenting means and medians in a graph and scatterplots
Means or medians in a bar graph
Means or medians in a line graph
The obvious drawback of this graph (and the previous one) is that you cannot see the individual cases.
Scatterplot (aka scattergram)
Scatter plot: each dot is one case
No correlation
Perfect correlation
Imperfect correlation
MEASURES OF ASSOCIATION
Measures of association
Spearman's rho (ρ) and Pearson's correlation coefficient (Pearson's r) show both the strength and the direction of a relationship. They are not applicable if at least one variable is nominal.
There are other measures that we will not discuss in this class.
Pearson's correlation coefficient (r)
- used for the relation between two interval/ratio variables
- varies between -1 and +1:
  +1: perfect positive correlation
  -1: perfect negative correlation
  0: no (linear) correlation at all
  +0.8: a realistic positive correlation
Pearson's correlation coefficient (r)
The strength of the linear association between two interval variables is quantified by Pearson's correlation coefficient. The formula for Pearson's correlation takes on many forms. This one is used most frequently: [formula] OR: [formula]
These formulas are not on the exam.
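For reference, one common way to write Pearson's r is the definitional form based on deviations from the means (not necessarily the exact version shown on the slide), where \bar{x} and \bar{y} are the means of X and Y and n is the number of cases:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```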
OR, the easy way
A simple-looking formula can be used if the scores are first converted into z-scores: [formula]
where z_x is variable X converted into z-scores and z_y is variable Y converted into z-scores. This shows why z-scores are so useful.
This formula is on the exam.
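One common version of the z-score formula, assuming the z-scores are computed with the sample standard deviation (dividing by n - 1), is r = Σ z_x·z_y / (n - 1); if the population standard deviation is used, you divide by n instead. Check which version your textbook uses. A Python sketch that computes r both ways on hypothetical data and shows that they agree:

```python
import math
import statistics

def z_scores(values):
    """Convert raw scores to z-scores using the sample standard deviation (n - 1)."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]

def pearson_r_z(x, y):
    """Pearson's r as the sum of z-score products divided by n - 1."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

def pearson_r_deviation(x, y):
    """Pearson's r from deviations around the means (definitional form), for comparison."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores for five cases, made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r_z(x, y), 3), round(pearson_r_deviation(x, y), 3))  # 0.775 0.775
```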
Spearman's rho (ρ)
- used for the relation between an ordinal variable and another variable (not nominal)
- varies between -1 and +1 (indicating the direction and strength of the relationship)
Spearman's rho (ρ) and Pearson's r
In principle, ρ is simply a special case of Pearson's correlation coefficient (r) in which the two variables are converted to rankings before calculating the coefficient.
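Because ρ is Pearson's r computed on ranks, it can be sketched in Python by ranking both variables and then applying Pearson's formula to the ranks. Giving tied values the average of their ranks is a common convention and the one assumed here. With the hypothetical data below, Y rises with every step of X but not along a straight line, so ρ is exactly 1 while r is clearly lower:

```python
import math
import statistics

def ranks(values):
    """Rank values from 1 to n; tied values get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson_r(x, y):
    """Pearson's r from deviations around the means."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: y always increases with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
print(f"rho = {spearman_rho(x, y):.2f}, r = {pearson_r(x, y):.2f}")
```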
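Levels of measurement and measures of association (05/21/12)
- nominal with nominal, ordinal or dichotomy: Cramér's V
- nominal with interval or ratio: Cramér's V or eta
- ordinal with ordinal, interval or ratio, or dichotomy: rho
- interval or ratio with interval or ratio: r
- interval or ratio with dichotomy: Cramér's V or eta
- dichotomy with dichotomy: phi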
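Cramér's V, the measure used when a nominal variable is involved, is built on the chi-square statistic, which these slides do not cover. The sketch below assumes the standard definition V = sqrt(χ² / (n · (k - 1))), where k is the smaller of the number of rows and columns. Applied to the counts of the TV-show table (3x2), it gives χ² = 50 and V ≈ 0.35. For a 2x2 table the same formula gives the absolute value of phi.

```python
import math

def cramers_v(table):
    """Cramér's V (and chi-square) for a contingency table given as rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were unrelated
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1))), chi2

# TV-show table: rows = certainly / maybe / not; columns = men, women
tv = [[50, 50], [25, 75], [25, 175]]
v, chi2 = cramers_v(tv)
print(f"chi-square = {chi2:.1f}, Cramér's V = {v:.2f}")
```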
Positive correlation coefficient (+1)
Negative correlation coefficient (-1)
No correlation (coefficient 0)
Realistic positive and strong correlation coefficient (about .7)
4. SPSS workshop
SPSS Topics
- Open a new data sheet.
- Create two new variables, shoe size and height (your teacher may choose different variables).
- Enter the values for 10 persons sitting in the back rows.
- Create all plots and correlation coefficients that we discussed so far. Some of those are not applicable to the data.
5. Homework
Homework assignment