Exercise 1 (a): producing individual tables, using the cross-tabs menu


Choosing the right test: solutions

Exercise 1 (a): using Custom Tables
Analyse → Tables → Custom Tables
Drag Biopsy to the Columns field.
Drag Smoking, HPV16 and HPV18 to the Rows field. Note that HPV16 and HPV18 must be added underneath Smoking; otherwise SPSS nests them within Smoking and splits the table into subsets.

Exercise 1 (a): using Custom Tables
To compare the outcome groups, it is useful to display the percentages in each level of the potential explanatory variables (Smoking, HPV16 and HPV18):
Click on Biopsy to highlight this variable.
In the bottom left of the dialogue box, click on Summary Statistics.
As you are interested in comparing between Biopsy groups, click on the box to the left of Column Percent to see the options, then select Column N % and move it to the Display field on the right.
Click on Apply to Selection and then Close.
Back in the main Custom Tables dialogue box, click on OK to obtain the table.
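For readers working outside SPSS, the column-percentage table can be sketched in a few lines of Python. This is only an illustration: the records below are invented, and the function name column_percentages is hypothetical, not part of the exercise materials.

```python
from collections import Counter

# Hypothetical records: (smoking status, biopsy result). Not the course data.
records = [
    ("Never", "Normal"), ("Never", "Normal"), ("Current", "Normal"),
    ("Ex", "Normal"), ("Never", "Abnormal"), ("Current", "Abnormal"),
    ("Current", "Abnormal"), ("Ex", "Abnormal"),
]

def column_percentages(pairs):
    """Return {column: {row: percent}}; percentages sum to 100 within each column."""
    cell_counts = Counter(pairs)                      # count per (row, column) cell
    column_totals = Counter(col for _, col in pairs)  # count per column
    table = {}
    for (row, col), n in cell_counts.items():
        table.setdefault(col, {})[row] = 100 * n / column_totals[col]
    return table

percentages = column_percentages(records)
for biopsy, rows in percentages.items():
    for smoking, pct in sorted(rows.items()):
        print(f"{biopsy:9s} {smoking:8s} {pct:5.1f}%")
```

Percentages are taken within each biopsy group (the columns), which is what makes the two outcome groups directly comparable.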


Exercise 1 (a): Graphs
Graphs → Legacy Dialogs → Stacked bar chart…
Note: once you have obtained your graph, you can edit it by double-clicking on it to open the Chart Editor. To make the stacks represent percentages that sum to 100%, click on the button shown in the image on the top right. You can also change the colours of the bars, the font size, etc.


Exercise 1 (a): Summary
Overall, the percentages in each smoking category are broadly similar in the two biopsy groups. The same is true of HPV18, with almost all patients testing negative for HPV18 infection in both biopsy groups. However, there is a marked difference for HPV16 infection: 11% of the abnormal biopsy group tested positive, compared to less than 1% of the normal biopsy group. Note that a great deal of data on smoking status is missing: 65% of individuals (432/664) have no smoking status recorded.

Exercise 1 (b): Chi-squared test
Both the outcome (Biopsy status) and the infection types (HPV16 and HPV18) are categorical. To see whether HPV infection and Biopsy status are related, we use a chi-squared test, provided its assumptions are met. If they are not met, we use Fisher's exact test.
HPV16 results: A chi-squared test was carried out to examine whether there was a relationship between HPV16 infection and biopsy result. The result was statistically significant at the 5% level (p < 0.001), indicating strong evidence of a relationship: less than 1% of individuals in the normal biopsy group had an HPV16 infection, compared to over 11% in the abnormal biopsy group.
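As a rough illustration of what the chi-squared test computes for a 2x2 table, here is a minimal Python sketch. The counts are invented for illustration (they are not the exercise data), the function name chi_squared_2x2 is hypothetical, and no continuity correction is applied, so the figures may differ from the corrected values that SPSS also reports.

```python
import math

def chi_squared_2x2(table):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table of counts.

    Returns (statistic, p_value, expected_counts).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count in each cell if the two variables were unrelated.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value, expected

# Hypothetical counts: rows = biopsy (normal, abnormal), columns = HPV16 (negative, positive).
observed = [[90, 10], [70, 30]]
statistic, p_value, expected = chi_squared_2x2(observed)
print(f"chi-squared = {statistic:.2f}, p = {p_value:.4f}")
```

The expected counts are returned as well because they are what the test's assumptions refer to.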

Exercise 1 (b): Chi-squared test
HPV18 results: As more than 25% of cells had expected counts of less than 5, the assumptions underlying the chi-squared test were not met. Fisher's exact test was therefore carried out to examine the relationship between HPV18 infection and biopsy result. The result was not statistically significant at the 5% level (p = 0.183), indicating a lack of evidence of a relationship between HPV18 infection and biopsy result. The HPV18 infection rate was low in both groups (0.5% in the normal biopsy group and 2% in the abnormal biopsy group).
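The two steps described here, checking the expected counts and then falling back to Fisher's exact test, can be sketched as follows. This is an illustrative sketch only: the table is invented, and the function names small_expected_share and fisher_exact_two_sided are hypothetical. The two-sided p-value sums the probabilities of all tables that are no more likely than the observed one, which is one common definition.

```python
from math import comb

def small_expected_share(table, threshold=5):
    """Fraction of cells whose expected count is below the threshold."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    expected = [r * c / total for r in row_totals for c in col_totals]
    return sum(e < threshold for e in expected) / len(expected)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    observed = prob(a)
    # The small relative tolerance guards against floating-point ties.
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )

# Hypothetical small table, not the exercise data.
table = [[3, 1], [1, 3]]
share = small_expected_share(table)
p_exact = fisher_exact_two_sided(3, 1, 1, 3)
print(f"share of small expected counts = {share:.2f}, Fisher p = {p_exact:.4f}")
```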

Exercise 2: Machines
As measured here, the time to error is continuous, and the standard method for comparing a continuous outcome between two groups is the t-test. However, the histograms of the time to error for the two machines show that the data are highly skewed. One of the key assumptions underlying the t-test is therefore violated, and we should use the Mann-Whitney U test to compare the groups.

Exercise 2: Machines
A Mann-Whitney U test was conducted to examine whether there was a difference in the distribution of the time to error between the two machines. The p-value was 0.042, indicating a statistically significant difference at the 5% level. Machine 1 had a median time to error of 2.34 days and Machine 2 had a median time to error of 4.58 days.
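To show what the Mann-Whitney U test does with the data, here is a minimal Python sketch. The times are invented, and the function name mann_whitney_u is hypothetical. It uses the normal approximation for the p-value without tie or continuity corrections, so statistical software (which may use exact or corrected methods) can give somewhat different p-values, especially for small samples.

```python
import math
from statistics import median

def mann_whitney_u(sample_a, sample_b):
    """Mann-Whitney U with a two-sided p-value from the normal approximation.

    Ties receive average ranks; no tie or continuity correction is applied.
    """
    values = sorted(list(sample_a) + list(sample_b))
    # Average rank for each distinct value (ranks start at 1).
    positions = {}
    for index, value in enumerate(values, start=1):
        positions.setdefault(value, []).append(index)
    rank = {value: sum(idx) / len(idx) for value, idx in positions.items()}

    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(rank[v] for v in sample_a)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u = min(u_a, n_a * n_b - u_a)

    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return u, p_value

# Hypothetical times to error in days, not the exercise data.
machine_1 = [1.2, 2.0, 2.3, 2.9, 3.1]
machine_2 = [3.5, 4.1, 4.6, 5.2, 6.0]
u, p_value = mann_whitney_u(machine_1, machine_2)
print(f"U = {u}, p = {p_value:.4f}")
print(f"medians: {median(machine_1)} vs {median(machine_2)} days")
```

Reporting the medians alongside the p-value, as the solution does, gives the direction and rough size of the difference.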

Exercise 3: Is there a relationship between age and bone density?
Categorised age and categorised bone density
Both variables are categorical, so we can use a chi-squared test to investigate the relationship between them (note: if the assumptions do not hold, we use Fisher's exact test). The result of the chi-squared test is statistically significant (p = 0.002), indicating strong evidence of an association between age and bone density. As age increases, bone density tends to decrease: 15.4% (2/13) of individuals in the older age group were in the higher bone density group, compared to 67.9% (19/28) in the younger age group.

Exercise 3: Is there a relationship between age and bone density?
Categorised age and continuous bone density
Consider carefully which variable is the outcome. People tend to lose bone density as they age, so bone density is the outcome and age is the explanatory variable. Here we have two groups (young and old) and are comparing them with respect to a continuous outcome variable (bone density), so we would use a t-test. The numbers in each group are small, so it is difficult to say whether bone density is normally distributed, but the t-test is reasonably robust to departures from normality, so we can use it.

Exercise 3: Is there a relationship between age and bone density?
Categorised age and continuous bone density
The result of the t-test is statistically significant at the 5% level (p < 0.001), and we conclude that mean bone density differs between people aged under 50 years and those aged 50 years or more. On average, bone density is 0.16 g/cm² lower in the older age group than in the younger age group (95% confidence interval: 0.09 to 0.24 g/cm²).
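The quantities behind this result (the mean difference, its standard error and the t statistic) can be sketched in Python as below. The bone density values are invented, and the function name pooled_t_test is hypothetical. The sketch assumes equal variances in the two groups and stops short of the p-value, because the Python standard library has no t distribution; a 95% confidence interval would be the difference plus or minus the tabulated critical t value times the standard error.

```python
import math
from statistics import mean, variance

def pooled_t_test(group_a, group_b):
    """Equal-variance two-sample t-test quantities.

    Returns (mean difference, standard error, t statistic, degrees of freedom).
    """
    n_a, n_b = len(group_a), len(group_b)
    difference = mean(group_a) - mean(group_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return difference, standard_error, difference / standard_error, df

# Hypothetical bone densities in g/cm^2, not the exercise data.
younger = [1.0, 1.1, 0.9, 1.0]
older = [0.8, 0.9, 0.7, 0.8]
difference, standard_error, t_statistic, df = pooled_t_test(younger, older)
print(f"difference = {difference:.2f}, t = {t_statistic:.2f} on {df} df")
```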

Exercise 3: Is there a relationship between age and bone density?
Continuous age and continuous bone density
Start by producing a scatter plot. As age is the predictor (explanatory variable) and bone density is the outcome variable, plot age on the horizontal axis and bone density on the vertical axis. Always plot the explanatory variable on the horizontal axis and the outcome on the vertical axis.
There is a negative linear relationship between age and bone density: as age increases, bone density decreases.

Exercise 3: Which approach is most appropriate?
In general, the third approach, with continuous age and continuous bone density, works best. The first approach (both variables categorised) tells you only that there is a relationship between age and bone density, and the second (categorised age and continuous bone density) tells you the average difference between young and old. The third also allows you to quantify the average loss in bone density for each year of ageing.
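Quantifying the average loss per year amounts to estimating the slope of a straight line fitted to bone density against age. A minimal least-squares sketch in Python follows; the data are invented and the function name least_squares_line is hypothetical.

```python
from statistics import mean

def least_squares_line(x, y):
    """Slope and intercept of the ordinary least-squares line of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical ages (years) and bone densities (g/cm^2), not the exercise data.
ages = [30, 40, 50, 60, 70]
densities = [1.00, 0.95, 0.90, 0.85, 0.80]
slope, intercept = least_squares_line(ages, densities)
print(f"estimated change in bone density per year of age: {slope:.4f} g/cm^2")
```

A negative slope corresponds to the downward trend seen in the scatter plot, and its size is the estimated average loss per year.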

Exercise 4: Cholesterol
We are comparing a continuous measure between three groups (South, Midlands and North), so the most appropriate analysis is a one-way analysis of variance (ANOVA). The numbers in each region are quite small, so it is difficult to say whether cholesterol is normally distributed, but ANOVA is quite robust to departures from normality.

The results of the ANOVA suggest that cholesterol levels differ significantly by region (p-value for the overall F-test < 0.001). The average cholesterol is 192.5 in the South, 203.8 in the Midlands and 248.1 in the North. Post hoc comparisons between the groups showed that the South and the Midlands were not significantly different from each other, but both were significantly different from the North.
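The overall ANOVA test compares the variation between the regional means with the variation within regions. A minimal Python sketch of the F statistic follows; the cholesterol values are invented and the function name one_way_anova_f is hypothetical. The p-value and the post hoc comparisons are left to statistical software, since the standard library has no F distribution.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA.

    Returns (F, between-groups df, within-groups df).
    """
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within

# Hypothetical cholesterol values by region, not the exercise data.
south = [188, 195, 192, 196]
midlands = [199, 206, 204, 207]
north = [244, 251, 247, 250]
f_statistic, df_between, df_within = one_way_anova_f([south, midlands, north])
print(f"F = {f_statistic:.1f} on {df_between} and {df_within} df")
```

A large F statistic means the regional means differ by more than the within-region scatter would explain, which is what a small overall p-value reflects.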