Published by Basil Gardner. Modified over 9 years ago.
2
GE5 Tutorial 4
Rules of engagement
- no computer or no power → no lesson
- no SPSS → no lesson
- no homework done → no lesson
3
Topics
Relationship between two variables
- diagrams and tables
- Pearson correlation coefficient
- Spearman's rho
4
Content seminar 4
1. Quiz
2. I Hate Statistics Game
3. Relationship between two variables
4. SPSS workshop
5. Discussion homework next week
5
1. Quiz
6
Quiz 10 questions Password:
7
2. I HATE STATISTICS GAME
8
3. Chapters 6 + 7 of Howitt & Cramer
9
Raw data – just two variables
var00001 var00002 var00003 var00004 var00005 var00006 var00007 var00008 var00009 var00010 var00011 var00012
121211312033175
2441314312102310
2193122111271812
227321212030173
15721323142205
227311311334170
1395215115174810
23631222112518
137211312034150
2512224320161811
124521311104216
2292123120342212
120511212022317
2222134213371412
24631152202694
241312231422710
125511311121294
246312421418811
13031151202396
125521311123190
10
Contingency tables
Pro: complete overview
Con: hard to read, especially when there are many columns or rows
In SPSS: Analyze > Descriptive Statistics > Crosstabs
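Outside SPSS, the same kind of table can be sketched by counting how often each combination of two values occurs. The snippet below is only an illustration: the `cases` data and the `crosstab` helper are hypothetical and not part of the course material.

```python
from collections import Counter

# Hypothetical case-level data: one (column value, row value) pair per respondent.
cases = [
    ("man", "smoker"), ("man", "smoker"), ("man", "non-smoker"),
    ("woman", "smoker"), ("woman", "non-smoker"),
    ("woman", "non-smoker"), ("woman", "non-smoker"),
]


def crosstab(pairs):
    """Count the cases for every (column value, row value) combination."""
    counts = Counter(pairs)
    col_values = sorted({col for col, _ in pairs})
    row_values = sorted({row for _, row in pairs})
    # table[row][col] = number of cases with that combination (0 if none)
    return {row: {col: counts[(col, row)] for col in col_values}
            for row in row_values}


table = crosstab(cases)
for row, cells in table.items():
    print(row, cells)
```

Each inner dictionary is one row of the contingency table; combinations that never occur are reported as 0 rather than left out.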
11
3. Bivariate diagrams
Pretty, but hard to interpret
12
Scatter plot
Only for two scale (interval or ratio) variables
13
CONTINGENCY TABLES
14
Contingency table
These are constructed from ordinary frequency tables:

1          3 (0.7)
2          7 (1.6)
3         37 (8.2)
4         25 (5.6)
5         33 (7.3)
6         33 (7.3)
7         43 (9.6)
8         83 (18.4)
9        141 (31.3)
10        45 (10.0)
TOTAL    450 (100.0)
15
Contingency table

         total
1          3 (0.7)
2          7 (1.6)
3         37 (8.2)
4         25 (5.6)
5         33 (7.3)
6         33 (7.3)
7         43 (9.6)
8         83 (18.4)
9        141 (31.3)
10        45 (10.0)
TOTAL    450 (100.0)
16
Contingency table

         male          female        total
1          2 (0.8)       1 (0.5)       3 (0.7)
2          4 (1.6)       3 (1.5)       7 (1.6)
3         15 (6.0)      22 (11.0)     37 (8.2)
4         11 (4.4)      14 (7.0)      25 (5.6)
5         16 (6.4)      17 (8.5)      33 (7.3)
6         18 (7.2)      15 (7.5)      33 (7.3)
7         25 (10.0)     18 (9.0)      43 (9.6)
8         43 (17.2)     40 (20.0)     83 (18.4)
9         91 (36.4)     50 (25.0)    141 (31.3)
10        25 (10.0)     20 (10.0)     45 (10.0)
TOTAL    250 (100.0)   200 (100.0)   450 (100.0)
17
Two ways to show the same simple contingency table (2x2)

Table 1: Relation between gender and smoking
              man    woman   TOTAL
smoker         75       50     125
non-smoker     25      250     275
TOTAL         100      300     400

Table 2: Relation between smoking and gender
           smoker   non-smoker   TOTAL
man            75           25     100
woman          50          250     300
TOTAL         125          275     400
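The two layouts hold identical counts; only rows and columns are swapped. A minimal sketch of that swap, using the counts from Table 1 (the nested-dictionary layout and the `transpose` helper are illustrative choices, not something the slides prescribe):

```python
# Counts from Table 1: rows are smoking status, columns are gender.
table1 = {
    "smoker": {"man": 75, "woman": 50},
    "non-smoker": {"man": 25, "woman": 250},
}


def transpose(table):
    """Swap the rows and columns of a nested-dict contingency table."""
    flipped = {}
    for row, cells in table.items():
        for col, count in cells.items():
            flipped.setdefault(col, {})[row] = count
    return flipped


# Rows are now gender, columns are smoking status (Table 2).
table2 = transpose(table1)
print(table2)
```

Transposing twice returns the original table, which is one way to check that no information is lost by choosing either layout.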
18
A simple contingency table (2x2)

Table: Relation between gender and smoking
              man    woman   TOTAL
smoker         75       50     125
non-smoker     25      250     275
TOTAL         100      300     400

           smoker   non-smoker   TOTAL
man            75           25     100
woman          50          250     300
TOTAL         125          275     400

independent: horizontal
dependent: vertical
19
Independent and dependent variables
An independent variable is supposed to have a causal influence on the dependent variable:
independent variable → dependent variable
Convention:
- independent variable: horizontal dimension
- dependent variable: vertical dimension
In other words: the horizontal dimension causes the vertical dimension.
Remember: Dependent Down
20
Sometimes that doesn't work

Table: Relation between wearing sunglasses and using sunscreen
                sunglasses   no sunglasses   TOTAL
sunscreen               75              50     125
no sunscreen            25             250     275
TOTAL                  100             300     400
21
marginal totals and relative frequencies
22
Contingency table (3x2)

Table: Intention to watch the TV show (numbers for men and women)
                           Man   Woman   TOTAL
Will certainly watch it     50      50     100
Will maybe watch it         25      75     100
Will not watch it           25     175     200
TOTAL                      100     300     400
23
Contingency table

Table: Intention to watch the TV show (numbers for men and women)
                           Man   Woman   TOTAL
Will certainly watch it     50      50     100
Will maybe watch it         25      75     100
Will not watch it           25     175     200
TOTAL                      100     300     400

The inner cells hold the frequencies in the contingency table.
24
Contingency table

Table: Intention to watch the TV show (numbers for men and women)
                           Man   Woman   TOTAL
Will certainly watch it     50      50     100
Will maybe watch it         25      75     100
Will not watch it           25     175     200
TOTAL                      100     300     400

The inner cells hold the frequencies in the contingency table.
The TOTAL row and TOTAL column hold the marginal totals.
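The marginal totals are just sums over the inner cells. A small sketch with the TV-show counts (the shortened row labels and variable names are illustrative):

```python
# Inner cells of the TV-show table: counts[row][column].
counts = {
    "certainly": {"man": 50, "woman": 50},
    "maybe": {"man": 25, "woman": 75},
    "not": {"man": 25, "woman": 175},
}

# Row marginals: add up each row across the columns.
row_totals = {row: sum(cells.values()) for row, cells in counts.items()}

# Column marginals: add up each column across the rows.
col_totals = {}
for cells in counts.values():
    for col, n in cells.items():
        col_totals[col] = col_totals.get(col, 0) + n

# The grand total is the same whichever marginal is summed.
grand_total = sum(row_totals.values())

print(row_totals, col_totals, grand_total)
```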
25
Column percentages

Table: Intention to watch the TV show (numbers for men and women)
                           Man           Woman         TOTAL
Will certainly watch it     50 (50%)      50 (17%)     100 (25%)
Will maybe watch it         25 (25%)      75 (25%)     100 (25%)
Will not watch it           25 (25%)     175 (58%)     200 (50%)
TOTAL                      100 (100%)    300 (100%)    400 (100%)

IF the dependent variable is indeed listed down, this should be the most informative way of showing relative frequencies.
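Column percentages divide every cell by the total of its own column, so each column adds up to 100%. A minimal sketch, assuming the same TV-show counts (the `column_percentages` helper is hypothetical):

```python
counts = {
    "certainly": {"man": 50, "woman": 50},
    "maybe": {"man": 25, "woman": 75},
    "not": {"man": 25, "woman": 175},
}


def column_percentages(table):
    """Express each cell as a percentage of its column total."""
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {row: {col: 100 * n / col_totals[col] for col, n in cells.items()}
            for row, cells in table.items()}


col_pct = column_percentages(counts)
for row, cells in col_pct.items():
    print(row, {col: f"{p:.0f}%" for col, p in cells.items()})
```

Reading down a column then answers questions such as "of the women, what share will certainly watch?".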
26
Row percentages

Table: Intention to watch the TV show (numbers for men and women)
                           Man           Woman         TOTAL
Will certainly watch it     50 (50%)      50 (50%)     100 (100%)
Will maybe watch it         25 (25%)      75 (75%)     100 (100%)
Will not watch it           25 (13%)     175 (88%)     200 (100%)
TOTAL                      100 (25%)     300 (75%)     400 (100%)

This is usually less informative. Exceptions can be made for tables without an independent / dependent distinction.
27
Total percentages

Table: Intention to watch the TV show (numbers for men and women)
                           Man           Woman         TOTAL
Will certainly watch it     50 (13%)      50 (13%)     100 (25%)
Will maybe watch it         25 (6%)       75 (19%)     100 (25%)
Will not watch it           25 (6%)      175 (44%)     200 (50%)
TOTAL                      100 (25%)     300 (75%)     400 (100%)

This helps you interpret the overall numbers, but the contingency is lost (of the men, how many will maybe watch it?)
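To see why the contingency is lost, it helps to put a total percentage next to the matching within-column percentage. The sketch below assumes the same TV-show counts; the variable names are illustrative.

```python
counts = {
    "certainly": {"man": 50, "woman": 50},
    "maybe": {"man": 25, "woman": 75},
    "not": {"man": 25, "woman": 175},
}

grand_total = sum(n for cells in counts.values() for n in cells.values())

# Total percentages: every cell relative to all 400 respondents.
total_pct = {row: {col: 100 * n / grand_total for col, n in cells.items()}
             for row, cells in counts.items()}

# The contingent question "of the men, how many will maybe watch it?"
# needs the column total for men instead of the grand total.
men_total = sum(cells["man"] for cells in counts.values())
maybe_among_men = 100 * counts["maybe"]["man"] / men_total

print(total_pct["maybe"]["man"])  # share of all respondents
print(maybe_among_men)            # share of the men only
```

The first figure says how large the group "men who will maybe watch" is overall; only the second answers the question about men.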
28
limitations of contingency tables
29
Applicability of contingency tables
Contingency tables can be used with any level of measurement: dichotomous, other nominal, ordinal, interval and ratio.
The table can be unclear for numerical variables with too many different values. Ways to solve this problem:
– present grouped frequencies for one variable or both variables
– use a measure of central tendency for one variable
If that doesn't help you can
– present the results graphically (scatterplots etc.)
– use a measure of association
SPSS likes to put all the relative frequencies (row and column percentages) into one table, which is just plain confusing.
30
Original table

         male          female        total
1          2 (0.8)       1 (0.5)       3 (0.7)
2          4 (1.6)       3 (1.5)       7 (1.6)
3         15 (6.0)      22 (11.0)     37 (8.2)
4         11 (4.4)      14 (7.0)      25 (5.6)
5         16 (6.4)      17 (8.5)      33 (7.3)
6         18 (7.2)      15 (7.5)      33 (7.3)
7         25 (10.0)     18 (9.0)      43 (9.6)
8         43 (17.2)     40 (20.0)     83 (18.4)
9         91 (36.4)     50 (25.0)    141 (31.3)
10        25 (10.0)     20 (10.0)     45 (10.0)
TOTAL    250 (100.0)   200 (100.0)   450 (100.0)
31
Grouping rows for clarity

         male          female        total
1-5       48 (19.2)     57 (28.5)    105 (23.3)
6-8       86 (34.4)     73 (36.5)    159 (35.3)
9-10     116 (46.4)     70 (35.0)    186 (41.3)
TOTAL    250 (100.0)   200 (100.0)   450 (100.0)
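Grouping simply adds up the frequencies of the rows that fall into each band. A short sketch using the male and female counts from the ungrouped table (the `group_counts` helper and the dictionary layout are illustrative):

```python
# Frequencies per value (1-10) from the ungrouped table.
male = {1: 2, 2: 4, 3: 15, 4: 11, 5: 16, 6: 18, 7: 25, 8: 43, 9: 91, 10: 25}
female = {1: 1, 2: 3, 3: 22, 4: 14, 5: 17, 6: 15, 7: 18, 8: 40, 9: 50, 10: 20}

# The three bands used on the slide.
groups = {"1-5": range(1, 6), "6-8": range(6, 9), "9-10": range(9, 11)}


def group_counts(freq, bands):
    """Sum the frequencies of all values that belong to each band."""
    return {label: sum(freq[v] for v in values)
            for label, values in bands.items()}


male_grouped = group_counts(male, groups)
female_grouped = group_counts(female, groups)

for label in groups:
    m, f = male_grouped[label], female_grouped[label]
    print(f"{label:5} {m:4} ({100 * m / 250:.1f})  {f:4} ({100 * f / 200:.1f})")
```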
32
Adding summary statistics

         male          female        total
1          2 (0.8)       1 (0.5)       3 (0.7)
2          4 (1.6)       3 (1.5)       7 (1.6)
3         15 (6.0)      22 (11.0)     37 (8.2)
4         11 (4.4)      14 (7.0)      25 (5.6)
5         16 (6.4)      17 (8.5)      33 (7.3)
6         18 (7.2)      15 (7.5)      33 (7.3)
7         25 (10.0)     18 (9.0)      43 (9.6)
8         43 (17.2)     40 (20.0)     83 (18.4)
9         91 (36.4)     50 (25.0)    141 (31.3)
10        25 (10.0)     20 (10.0)     45 (10.0)
MEANS    M=7.5 (n=250) M=7.0 (n=200) M=7.3 (n=450)
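The means in the bottom row can be recovered from the frequencies alone: multiply each value by its count, add up, and divide by the number of cases. A minimal sketch (the `mean_from_freq` helper is hypothetical):

```python
male = {1: 2, 2: 4, 3: 15, 4: 11, 5: 16, 6: 18, 7: 25, 8: 43, 9: 91, 10: 25}
female = {1: 1, 2: 3, 3: 22, 4: 14, 5: 17, 6: 15, 7: 18, 8: 40, 9: 50, 10: 20}


def mean_from_freq(freq):
    """Weighted mean of the values, using the frequencies as weights."""
    n = sum(freq.values())
    return sum(value * count for value, count in freq.items()) / n


# Combine the two groups to get the frequencies for the total column.
total = {value: male[value] + female[value] for value in male}

print(f"M={mean_from_freq(male):.1f} (n={sum(male.values())})")
print(f"M={mean_from_freq(female):.1f} (n={sum(female.values())})")
print(f"M={mean_from_freq(total):.1f} (n={sum(total.values())})")
```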
33
Ways to present relationships
- cross tables with simple or grouped frequencies
- tables comparing means
- graphs showing simple or grouped frequencies
- graphs presenting means
- graphs showing individual data (scatterplots)
- measures of association
34
contingency graphs: clustered bar chart and stacked bar chart
35
Bar chart (one variable)
Number of people who prefer a short (red), medium (blue) or long workout (green).
36
Clustered bar chart (two variables)
Number of people who prefer a short (red), medium (blue) or long workout (green), broken down by how often the respondent visits a gym each week (never, incidentally, always).
37
Stacked bar chart
[Chart categories: never, incidentally, always]
38
presenting means and medians in a graph and scatterplots
39
Means or medians in a bar graph
40
Means or medians in a line graph
The obvious drawback of this graph (and the previous one) is that you cannot see the individual cases.
41
Scatterplot (aka scattergram)
42
Scatter plot
Each dot is one case
43
No correlation
44
Perfect correlation
45
Imperfect correlation
46
MEASURES OF ASSOCIATION
47
Measures of association
- not applicable if at least one variable is nominal
- strength of the relationship
- direction of the relationship
- Spearman's rho (ρ)
- Pearson's correlation coefficient (Pearson's r)
There are others that we will not discuss in this class.
48
3. Pearson's correlation coefficient (r)
- used for the relation between two interval/ratio variables
- varies between -1 and 1
  +1: perfect positive correlation
  -1: perfect negative correlation
  0: no correlation at all
  +0.8: realistic positive correlation
49
Pearson's correlation coefficient (r)
The strength of the linear association between two interval variables is quantified by Pearson's correlation coefficient.
The formula for Pearson's correlation takes on many forms. This one is used most frequently:
[formula not preserved]
OR:
[formula not preserved]
These formulas are not on the exam.
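The slide's own formulas are not preserved here, so the sketch below falls back on the standard textbook definition: the sum of the products of the deviations from the means, divided by the square root of the product of the two sums of squared deviations. Treat it as one common form, not necessarily the one shown in class; the function name and the data are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviations around the two means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 cases")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of the deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: y rises or falls exactly linearly with x.
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])
r_down = pearson_r(x, [10, 8, 6, 4, 2])
print(r_up, r_down)
```

A perfectly increasing line gives a coefficient of +1 and a perfectly decreasing line gives -1, matching the two extremes of the scale.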
50
OR: the easy way
A simple-looking formula can be used if the numbers are converted into z-scores:
[formula not preserved]
where z_x is the variable X converted into z-scores and z_y is the variable Y converted into z-scores.
So this is the importance of the z-scores.
This formula is on the exam.
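A sketch of the z-score route follows. It assumes z-scores computed with the population standard deviation, in which case r is the mean of the products z_x · z_y (the sum divided by N). If the course uses the sample standard deviation instead, the sum is divided by N - 1 and the resulting r is the same. The data and function names are hypothetical.

```python
import math
import statistics


def z_scores(values):
    """Convert raw values to z-scores using the population standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def pearson_r_from_z(xs, ys):
    """Pearson's r as the average product of paired z-scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


# Hypothetical scores for five cases.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r_from_z(x, y)
print(round(r, 2))
```

Cases that sit above (or below) the mean on both variables contribute positive products, while cases that are high on one and low on the other contribute negative ones, which is what gives the coefficient its sign.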
51
2. Spearman's rho (ρ)
- used for the relation between an ordinal variable and another variable (not nominal)
- varies between -1 and 1 (indicating the direction and strength of the relationship)
52
Spearman's rho (ρ) and Pearson's r
In principle, ρ is simply a special case of Pearson's correlation coefficient (r) in which the two variables are converted to rankings before calculating the coefficient.
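That description translates almost directly into code: rank both variables, then compute Pearson's r on the ranks. The sketch below assumes tied values share the average of their rank positions, which is a common convention but not something the slides specify; all names and data are illustrative.

```python
import math


def ranks(values):
    """Rank values from 1 (smallest); tied values get their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson_r(xs, ys):
    """Pearson's r from deviations around the two means."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r applied to the rankings."""
    return pearson_r(ranks(xs), ranks(ys))


# Hypothetical data: y always increases with x, but not along a straight line.
x = [10, 20, 30, 40, 50]
y = [1, 4, 9, 16, 100]
print(spearman_rho(x, y), pearson_r(x, y))
```

Because only the order of the values matters, rho reaches 1 for any strictly increasing relationship, whereas Pearson's r on the raw values stays below 1 when the relationship is not linear.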
53
05/21/12
Levels of measurement and measures of association
Columns: nominal, ordinal, interval or ratio, dichotomy
nominal: Cramér's V or eta; Cramér's V
ordinal: Cramér's V; rho
interval or ratio: Cramér's V or eta; rho; r
dichotomy: Cramér's V; rho; phi
54
Positive correlation coefficient (+1)
55
Negative correlation coefficient (-1)
56
No correlation coefficient (0)
57
Realistic strong positive correlation coefficient (.7 or so)
58
4. SPSS workshop
59
SPSS Topics
- Open a new data sheet
- Create two new variables, shoe size and height (your teacher may choose different variables)
- For 10 persons sitting in the back rows, enter values
- Create all plots and correlation coefficients that we discussed so far (some of those are not applicable to the data)
60
5. Homework
61
Homework Assignment