Simple tests in SPSS

Download the slides from the MASH website MASH > Resources > Statistics Resources > Workshop materials

Learning outcomes
By the end of this session you should understand:
- The difference between paired and unpaired data
- When to use some simple statistical tests and the types of data that they apply to
By the end of this session you should be able to:
- Undertake a t-test in SPSS
- Undertake a chi-squared test in SPSS
- Check the assumptions underlying these tests
- Appropriately report the results of these tests

Steps for choosing the right test
- Clearly define your research question
- Decide which are the outcome (dependent) and explanatory (independent) variables
- What data types are they? How are these summarised? What charts can you use to display them?

Recap: Data types
Within a data set there are observations (individuals), and for each observation there are data variables. Variables fall into two main categories: categorical and numerical.
Categorical variables indicate categories, for example gender (male or female) or marital status (single, married, divorced or widowed). They are sometimes coded as numbers, e.g. 1 = male. Categorical variables divide into two subtypes:
- Ordinal: the categories are meaningfully ordered, e.g. satisfaction level (dissatisfied, satisfied, highly satisfied) or education level (secondary, sixth form, undergraduate, postgraduate).
- Nominal: the order of the categories does not matter, e.g. religion (Christian, Muslim, Hindu, etc.) or gender (male, female).
Numerical variables are meaningful, comparable numbers, such as blood pressure, height, weight, income or age. They also divide into two subtypes:
- Continuous: can take any value within a range; the most common type, e.g. body weight, height, income.
- Discrete: can only take whole numbers, e.g. the number of students in a class or the number of new patients each day, but are treated as continuous for statistical analysis when they cover a large range.
There is also a 'label' variable type, which identifies observations uniquely, such as student ID or subject name.

Steps for choosing the right test
Are you interested in comparing groups (if so, how many groups are there?) or in assessing/modelling the relationship between variables?
Are the observations paired? Is the pairing due to having repeated measurements of the same variable for each subject?
Does the test you have chosen make any assumptions, and are they met? e.g. the assumption of normality for the t-test.
Today we are looking at how to test for a difference between two groups, with either a continuous outcome (t-test) or a categorical outcome (chi-squared test).

Comparing 2 groups
- Continuous outcome, unpaired: independent samples t-test (assumptions not met: Mann-Whitney U test)
- Continuous outcome, paired: paired t-test (assumptions not met: Wilcoxon signed rank test)
- Categorical outcome, unpaired: chi-squared test (assumptions not met: Fisher's exact test)
- Categorical outcome, paired: McNemar's test (assumptions not met: no simple answer; speak to a statistician!)

Paired data
Most commonly, measurements from the same individuals collected on more than one occasion. Paired tests can be used to look at differences in mean score across:
- 2 or more time points, e.g. before/after a diet
- 2 or more conditions, e.g. a hearing test at different frequencies
For example, if each person listened to a sound until they could no longer hear it at two different frequencies, you would use a paired t-test.

Continuous outcome
- Continuous outcome, unpaired: independent samples t-test (assumptions not met: Mann-Whitney U test)
- Continuous outcome, paired: paired t-test (assumptions not met: Wilcoxon signed rank test)
- Categorical outcome, unpaired: chi-squared test (assumptions not met: Fisher's exact test)
- Categorical outcome, paired: McNemar's test (assumptions not met: no simple answer; speak to a statistician!)

Continuous outcome: Example
Is there a difference in the average commute time between London and the Yorkshire & Humber region?
Outcome: journey time (continuous data)
Two groups: London / Yorkshire & Humber region
Test: independent samples t-test

Example: Independent samples t-test
Analyze  Compare Means  Independent-Samples T Test…

Example: Independent samples t-test
Analyze  Compare Means  Independent-Samples T Test…
Move the outcome variable Commute time into the 'Test Variable(s)' box.
Move the grouping variable Region into the 'Grouping Variable' box.

Example: Independent samples t-test
Click on 'Define Groups' to tell SPSS which values of the grouping variable to use. In this case the groups are coded as 1: London and 2: Yorkshire & Humber.
Click Continue, then OK.
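SPSS does all of this calculation for you, but it can help to see the arithmetic. The sketch below computes the independent samples t statistic from first principles in Python; the commute times are hypothetical, invented purely for illustration.

```python
# Independent samples t statistic computed by hand (illustration only).
# The commute times below are hypothetical, not the workshop data.
import math
from statistics import mean, stdev

london = [45, 60, 38, 52, 70, 41, 55, 48]        # minutes (hypothetical)
yorkshire = [25, 32, 40, 28, 35, 30, 22, 38]     # minutes (hypothetical)

n1, n2 = len(london), len(yorkshire)
s1, s2 = stdev(london), stdev(yorkshire)

# Pooled standard deviation: assumes equal variances, matching the
# 'Equal variances assumed' row of the SPSS output
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
t = (mean(london) - mean(yorkshire)) / (sp * math.sqrt(1 / n1 + 1 / n2))
df = n1 + n2 - 2
print(f"t({df}) = {t:.2f}")
```

SPSS converts this t value and its degrees of freedom into the p-value shown in the output table.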

Example: Understanding the output
The output contains basic statistics comparing the two groups, followed by the key output table.


Statistical significance
Null: there is NO difference in the commute times between London and Yorkshire & Humber (mean_L = mean_YH)
Alternative: there is A difference in the commute times between London and Yorkshire & Humber (mean_L ≠ mean_YH)
If p < 0.05: the result is statistically significant. Reject the null in favour of the alternative, and conclude that there is evidence to suggest a difference in the mean commute times between London and Yorkshire & Humber.
If p ≥ 0.05: the result is not statistically significant. There is insufficient evidence to reject the null hypothesis, and we conclude that there is a LACK OF EVIDENCE to suggest a difference in the mean commute times between London and Yorkshire & Humber.

Statistical significance
The significance level is usually set at 5% (0.05). The smaller the p-value, the more confident we are in our decision to reject the null hypothesis:
- p > 0.05: do not reject
- 0.01 - 0.05: evidence to reject
- 0.001 - 0.01: strong evidence to reject
- p < 0.001: overwhelming evidence to reject

Example: Reporting the results
An independent samples t-test was carried out to examine whether there was a difference in the average commute time between London and Yorkshire & Humber. There is evidence to suggest that on average it takes longer to travel to work in London than in Yorkshire & Humber (p = 0.001).
As p = 0.001, if the null hypothesis were true there would be only a 1 in 1000 chance of a result this extreme (a type 1 error is claiming a significant difference when there is none).
What is the difference? For my sample, the average commute in London is 16 minutes longer than in Yorkshire & Humber (95% CI: 6.8 to 24.5 minutes).

Assumptions for the independent samples t-test
- The data in each group are approximately normally distributed.
- The variability in the two groups is the same. You can either look at the standard deviations (they should be similar; as a rule of thumb, the larger should be no more than twice the smaller), or test for this formally using Levene's test.
- The groups are independent (there is no way to test for this; it should be implicit in the design).

Assumption 1: Data in the groups are approximately normally distributed
Graphs  Legacy Dialogs  Histogram
Add 'Region' to the Rows box.
The distributions don't need to be perfect, just approximately symmetrical.

Assumption 2: Variances are the same
Two options:
- Look at the standard deviations in the Group Statistics table. These should look similar (don't expect them to be exactly the same).
- Look at Levene's test in the main output table: if it is not significant (i.e. p > 0.05), you can assume the variances do not differ from each other.
If the variances are not equal, report the results from the second line of the output ('Equal variances not assumed').
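The rule of thumb in the first option is easy to automate. A minimal Python sketch (the function name is invented for this example):

```python
# Rule of thumb from the slide: the larger SD should be no more than
# twice the smaller for the equal-variances t-test to be reasonable.
def sds_similar(sd_a: float, sd_b: float) -> bool:
    """Return True if the larger SD is at most twice the smaller."""
    lo, hi = sorted((abs(sd_a), abs(sd_b)))
    return hi <= 2 * lo

print(sds_similar(10.5, 6.3))   # similar enough: passes the check
print(sds_similar(3.0, 9.0))    # one SD three times the other: fails
```

Levene's test, which SPSS reports automatically, is the formal alternative to this informal check.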

Exercise 1: Open the file 'Birthweight_reduced'
Recode mnocig 'Number of cigarettes smoked per day' into smoker/non-smoker (tip: use 'Transform  Recode into different variable' and create a new variable 'Smoker' with codes 0: non-smoker; 1: smoker).
Conduct a t-test to examine whether birthweight differed between women who smoked and women who did not smoke. Don't forget to check whether the assumptions are met.
What do you conclude?

Independent t-test options
Is the dependent variable normally distributed in both groups?
- No: use the Mann-Whitney U test and report medians.
- Yes: is one standard deviation more than twice the other?
  - Yes: use the adjusted t-test ('Equal variances not assumed') and report means.
  - No: use the standard t-test and report means.
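The same decision flow can be written down as a small helper. Everything here (the function name and its boolean inputs) is invented for illustration:

```python
# The flowchart above as code: normality first, then the SD rule of thumb.
def choose_independent_test(normally_distributed: bool,
                            one_sd_more_than_twice_other: bool) -> str:
    if not normally_distributed:
        return "Mann-Whitney U test (report medians)"
    if one_sd_more_than_twice_other:
        return "adjusted t-test, equal variances not assumed (report means)"
    return "standard independent samples t-test (report means)"

print(choose_independent_test(True, False))
```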

Example: Mann-Whitney U test
Cost of ticket on the Titanic: the data are highly skewed. There appear to be a few individuals who paid more than £300 for their ticket. What happens if we exclude them?

Example: Mann-Whitney U test. Cost of ticket on the Titanic (excluding > £300) Even excluding them, the data are still highly skewed

Example: Mann-Whitney U test
Analyze  Nonparametric Tests  Independent Samples…

Example: Mann-Whitney U test In the Objective tab, make sure that ‘Automatically compare distributions across groups’ is selected Click on the Fields tab Move ‘Cost of ticket’ to the Test Fields box Move ‘Survived’ to the Groups box Click Run

Example: Understanding the output
The key (only!) output table: the p-value is labelled 'Sig'. It is recorded in the table as 0.000, but should be reported as p < 0.001.
Note that the Mann-Whitney U test is a test of the distributions of the data in the two groups: it tests the null hypothesis that the distribution of the data is the same in both groups.
As the p-value is < 0.001, the result is statistically significant. To make sense of the results, look at key statistics, such as the medians in each group, and report these.

Example: Reporting the results A Mann-Whitney U test was carried out to see if there was a difference in the price paid for a ticket between passengers who died and passengers who survived on the Titanic. There is very strong evidence (p < 0.001) to suggest that ticket price differed between the two groups What is the difference? The median price paid for a ticket was much lower for those who died (£10.50) compared to those who survived (£26)
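SPSS reports the U statistic and its p-value automatically. For anyone curious, the sketch below shows how the U statistic itself is built from ranks; the fares are hypothetical stand-ins for the Titanic data, and the p-value step (which SPSS adds on top) is omitted.

```python
# Mann-Whitney U from ranks (illustration only; hypothetical fares).
def ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

died = [7.25, 8.05, 13.0, 7.9, 26.0]          # hypothetical fares (£)
survived = [26.55, 71.28, 30.0, 13.0, 151.55]
combined = died + survived
r = ranks(combined)

R1 = sum(r[:len(died)])                        # rank sum for group 1
U1 = R1 - len(died) * (len(died) + 1) / 2
U2 = len(died) * len(survived) - U1
U = min(U1, U2)                                # the Mann-Whitney U statistic
print(U)
```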

Paired data: weight loss after diet
The manufacturers claim that their drug will reduce weight without making any dietary changes. Weights before and after the trial were compared for each person.
Test: paired t-test (before/after weights)
Null: the average change in weight is 0
Alternative: the average change in weight is less than 0 (if after - before is calculated)

Paired t-test
A paired t-test is a test of the paired differences (d), NOT the original data. It tests the null hypothesis that the mean of the differences is 0.
For each subject, the difference (change) is calculated, and then the mean (x̄) and standard deviation (s) of those differences. If there is no change, the mean difference is roughly 0.
These differences need to be normally distributed.
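In other words, a paired t-test is just a one-sample t-test on the column of differences. A Python sketch with hypothetical before/after weights:

```python
# Paired t statistic as a one-sample t-test on the differences
# (illustration only; the weights below are hypothetical).
import math
from statistics import mean, stdev

before = [82.0, 95.5, 70.2, 88.1, 76.4, 91.0]   # kg (hypothetical)
after = [78.5, 91.0, 69.8, 83.2, 74.0, 88.5]

d = [a - b for a, b in zip(after, before)]       # after - before
t = mean(d) / (stdev(d) / math.sqrt(len(d)))     # one-sample t on d
df = len(d) - 1
print(f"mean difference = {mean(d):.2f} kg, t({df}) = {t:.2f}")
```

The negative mean difference indicates weight loss; SPSS would turn this t value into the p-value reported in its output table.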

Example: Paired t-test
Note that the paired data need to be organised in two separate columns, 'After' and 'Before'.
Analyze  Compare Means  Paired-Samples T Test…

Example: Paired t-test Select the pair of variables to be compared. In this case ‘Weight after diet’ and ‘Weight before diet’ Click OK

Example: Understanding the output
The output shows basic statistics comparing the two measurements, a correlations table that can be IGNORED, and the key output table.

Example: Reporting the results
A paired t-test was conducted to examine whether a particular diet had an impact on weight loss. There is strong evidence to suggest that the diet did have an impact on weight loss (p = 0.004).
As p = 0.004, there is only a 0.4% chance (or 1 in 250) of rejecting the null when it is true (type 1 error: claiming a significant difference when there is none).
What is the difference? For these data, the average weight loss was 4 kg (95% CI for weight loss: 1.48 to 6.69 kg).

Key assumptions for paired t-test The paired differences are approximately normally distributed Each individual is independent of every other individual (can’t check this statistically, it should be implicit in the design)

Assumption 1: Paired differences are approximately normally distributed
Graphs  Legacy Dialogs  Histogram
The distribution doesn't need to be perfect, just approximately symmetrical.

Exercise 2: Open the file 'Journey Time'
This file contains data on my journey to and from work. These data are paired by day, as each to/from combination represents a particular day.
Conduct a paired t-test to examine whether it takes me longer to cycle home than it does to cycle to work. Don't forget to check the assumption that the paired differences are approximately normally distributed. You will need to calculate the differences to do this (tip: use 'Transform  Compute variable' and create a new variable of the differences).
What do you conclude?

Paired t-test options
Are the paired differences normally distributed?
- No: use the non-parametric Wilcoxon signed rank test (Analyze  Nonparametric Tests  Related Samples…) and report the median of the differences.
- Yes: use the paired t-test and report the mean difference.

Example: Wilcoxon signed rank test In the Objective tab, make sure that ‘Automatically compare distributions across groups’ is selected Click on the Fields tab Move the two time variables to the Test Fields box Click Run

Example: Understanding the output
The key (only!) output table: the p-value is labelled 'Sig': 0.013.
Note that the Wilcoxon signed rank test is a test of the median of the differences: it tests the null hypothesis that the median difference is 0.
As the p-value is 0.013, the result is statistically significant. To make sense of the results, look at key statistics, such as the median difference, and report this.
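For anyone curious what the Wilcoxon signed rank test actually ranks, here is a sketch with hypothetical differences; the p-value step, which SPSS performs, is omitted.

```python
# What the Wilcoxon signed rank test ranks (illustration only).
# Zero differences are dropped; the remaining |d| are ranked and the
# positive and negative signed rank sums are compared.
diffs = [4.0, -1.5, 3.0, 6.5, -2.0, 5.0, 0.0, 2.5]   # hypothetical
nonzero = [d for d in diffs if d != 0]
by_abs = sorted(nonzero, key=abs)            # no ties among |d| here
rank_of = {d: i + 1 for i, d in enumerate(by_abs)}

w_plus = sum(rank_of[d] for d in nonzero if d > 0)
w_minus = sum(rank_of[d] for d in nonzero if d < 0)
print(w_plus, w_minus)   # the test statistic is min(w_plus, w_minus)
```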

Categorical outcome
- Continuous outcome, unpaired: independent samples t-test (assumptions not met: Mann-Whitney U test)
- Continuous outcome, paired: paired t-test (assumptions not met: Wilcoxon signed rank test)
- Categorical outcome, unpaired: chi-squared test (assumptions not met: Fisher's exact test)
- Categorical outcome, paired: McNemar's test (assumptions not met: no simple answer; speak to a statistician!)

Titanic
The Titanic sank in 1912 with the loss of most of its passengers.
Data: details are available for 1309 passengers and crew on board the ship.

Example: Chi-squared test Research question: Did class affect survival? Have two categorical variables ‘Class’: First / Second / Third ‘Survival’: Alive / Died Null: There is no relationship between class and survival Alternative: There is a relationship between class and survival This is called a 3 x 2 contingency table

Example: Chi-squared test
Analyze  Descriptive Statistics  Crosstabs…
Move Class to the Rows box and Survived to the Columns box.
Click on Statistics to open the Statistics dialogue box.

Example: Chi-squared test
Tick the Chi-squared box, then click Continue; this will close the Statistics dialogue box.
Back in the main Crosstabs dialogue box, click on the Cells box.

Example: Chi-squared test
Under Percentages, tick the Row box. This will give you the percentages across each row, i.e. within each level of class (in the rows) it will show the percentage who survived and died.
Click Continue; this will close the Cell Display dialogue box.
Back in the main Crosstabs dialogue box, click OK.

Example: Understanding the output
The first table gives information on the number of observations and whether any are missing.
Next comes the basic cross-tabulation of class by survival; note the percentage surviving/dying in each class. This is a 3 x 2 table, as it has three rows (class) and 2 columns (survival).
Finally, the results of the chi-squared test: the chi-squared statistic (127.859), its degrees of freedom (2) and the p-value (0.000). Note that when writing up, you should write p < 0.001, NOT p = 0.000.

Example: Reporting the results A chi-squared test was conducted to examine whether there was a relationship between travel class and survival for passengers on board the Titanic. There is very strong evidence (p < 0.001) to suggest that class and survival were linked What is the difference? 38% of passengers who travelled first class died (123/323) compared to 57% in second class (158/277) and 74.5% in third class (528/709)
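The chi-squared statistic in the SPSS output can be checked by hand from the counts quoted above (died/survived by class). A Python sketch of the calculation:

```python
# Chi-squared statistic from the counts reported on this slide:
# died / survived by travel class on the Titanic.
observed = {
    "First": (123, 200),
    "Second": (158, 119),
    "Third": (528, 181),
}

row_totals = {c: sum(v) for c, v in observed.items()}
col_totals = [sum(v[i] for v in observed.values()) for i in (0, 1)]
n = sum(row_totals.values())

chi2 = 0.0
for c, counts in observed.items():
    for j, o in enumerate(counts):
        e = row_totals[c] * col_totals[j] / n   # expected count
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (2 - 1)
print(f"chi-squared({df}) = {chi2:.3f}")        # ≈ 127.859, as in SPSS
```

The expected counts `e` computed here are also what the 20% rule on the assumptions slide refers to.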

Example: Reporting the results Can also use a stacked barchart to summarise the results:

Key assumptions for the chi-squared test
The chi-squared test is a 'large sample' test. It works by calculating the expected number in each cell (based on the row and column totals). No more than 20% of these expected numbers should be < 5, and all of them should be > 1.
The data should be uncorrelated. For example, matched data, or the responses of the same individuals under two different conditions, should not be analysed using a chi-squared test. This is inherent in the design of the study.

Assumption 1: Expected frequencies
In this case the assumption that no more than 20% of the expected frequencies are < 5 is met.
If it is not met, you can either combine categories or use Fisher's exact test.

Example: Fisher's exact test (for info)
Analyze  Descriptive Statistics  Crosstabs…
Move Class to the Rows box and Survived to the Columns box.
Click on Exact to open the Exact Tests dialogue box, tick Exact, and then click Continue.
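For info: SPSS's Exact option handles tables of any size, but for a simple 2x2 table the whole calculation fits in a few lines. A from-scratch sketch using the hypergeometric distribution; the counts are hypothetical, chosen small enough that a chi-squared test would not be appropriate.

```python
# Two-sided Fisher's exact test for a 2x2 table (illustration only).
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def p(k):  # P(top-left cell = k) given fixed margins (hypergeometric)
        return comb(r1, k) * comb(r2, c1 - k) / comb(n, c1)

    p_obs = p(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum the probabilities of all tables as or more extreme than observed
    return sum(p(k) for k in range(lo, hi + 1) if p(k) <= p_obs * (1 + 1e-9))

print(fisher_exact_2x2(1, 9, 11, 3))   # ≈ 0.0028 for these counts
```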

Exercise 3: Open the file ‘Titanic’ Conduct a Chi-squared test to examine whether there is a relationship between gender and survival. Remember to check the assumption that no more than 20% of the expected frequencies are < 5 and none are < 1 What do you conclude?

Recap: Comparing 2 groups
- Continuous outcome, unpaired: independent samples t-test (assumptions not met: Mann-Whitney U test)
- Continuous outcome, paired: paired t-test (assumptions not met: Wilcoxon signed rank test)
- Categorical outcome, unpaired: chi-squared test (assumptions not met: Fisher's exact test)
- Categorical outcome, paired: McNemar's test (assumptions not met: no simple answer; speak to a statistician!)

Learning outcomes
You should now understand:
- The difference between paired and unpaired data
- When to use some simple statistical tests and the types of data that they apply to
You should now be able to:
- Undertake a t-test in SPSS
- Undertake a chi-squared test in SPSS
- Check the assumptions underlying these tests
- Appropriately report the results of these tests

Maths And Statistics Help Statistics appointments: Mon-Fri (10am-1pm) Statistics drop-in: Mon-Fri (10am-1pm), Weds (4-7pm) http://www.sheffield.ac.uk/mash

Resources: All resources are available in paper form at MASH or on the MASH website

Contacts
Staff:
Jenny Freeman (j.v.freeman@sheffield.ac.uk)
Basile Marquier (b.marquier@sheffield.ac.uk)
Marta Emmett (m.emmett@sheffield.ac.uk)
Website: http://www.sheffield.ac.uk/mash
Follow MASH on twitter: @mash_uos