Choosing Appropriate Descriptive Statistics, Graphs and Statistical Tests Brian Yuen 15 January 2013.


Using appropriate statistics and graphs
Which statistics and graphs to report depends on the types of the variables of interest:
- For continuous (Normally distributed) variables
  - N, mean, standard deviation, minimum, maximum
  - histograms, dot plots, box plots, scatter plots
- For continuous (skewed) variables
  - N, median, lower quartile, upper quartile, minimum, maximum, geometric mean
- For categorical variables
  - frequency counts, percentages
  - one-way tables, two-way tables
  - bar charts
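The summary statistics listed above can be computed with Python's standard library. This is only a minimal sketch: the function name `describe` and the sample data are assumptions made for illustration, and the quartiles use the default method of `statistics.quantiles`, which may differ slightly from the values SPSS reports.

```python
import statistics

def describe(values):
    """Summary statistics named on the slide for one continuous variable."""
    # Three cut points: lower quartile, median, upper quartile.
    lq, med, uq = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "median": med,
        "lower_quartile": lq,
        "upper_quartile": uq,
        "min": min(values),
        "max": max(values),
        # Only defined for strictly positive data.
        "geometric_mean": statistics.geometric_mean(values),
    }

# Hypothetical, right-skewed treadmill times in minutes (not from the slides).
times = [1, 2, 4, 8, 16]
summary = describe(times)
for name, value in summary.items():
    print(f"{name}: {value}")
```

For skewed data like this, the mean (6.2) is pulled above the median and geometric mean (both 4), which is why the slide recommends the median and quartiles in that case.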

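Using appropriate statistics and graphs (continued)
[Table: suggested graph for each combination of Y (categorical or continuous), X (categorical, continuous or time) and an optional categorical Z; entries include "Use 3-Way Table" and "N/A".]
All these graphs are available in Chart Builder, from the "Choose from:" list.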

Flow chart of commonly used descriptive statistics and graphical illustrations
Exploring data
- Descriptive statistics
  - Categorical data: Frequency; Percentage (Row, Column or Total)
  - Continuous data, measure of location: Mean; Median
  - Continuous data, measure of variation: Standard deviation; Range (Min, Max); Inter-quartile range (LQ, UQ)
- Graphical illustrations
  - Categorical data: Bar chart; Clustered bar charts (two categorical variables); Bar charts with error bars
  - Continuous data: Histogram (can be plotted against a categorical variable); Box & Whisker plot (can be plotted against a categorical variable); Dot plot (can be plotted against a categorical variable); Scatter plot (two continuous variables)

Choosing appropriate statistical test
Having a well-defined hypothesis helps to distinguish the outcome variable and the exposure variable.
Answer the following questions to decide which statistical test is appropriate to analyse your data:
- What is the variable type for the outcome variable?
  - Continuous (Normal, Skew) / Binary / Time dependent
  - If there is more than one outcome, are they paired or related?
- What is the variable type for the main exposure variable?
  - Categorical (1 group, 2 groups, >2 groups) / Continuous
  - For 2 or >2 groups: Independent (Unrelated) / Paired (Related)
- Any other covariates, confounding factors?

Flow chart of commonly used statistical tests

Outcome variable: Continuous

| Exposure variable | Normal | Skew |
|---|---|---|
| 1 group | One-sample t test | Sign test / Signed rank test |
| 2 groups | Two-sample t test | Mann-Whitney U test |
| Paired | Paired t test | Wilcoxon signed rank test |
| >2 groups | One-way ANOVA test | Kruskal Wallis test |
| Continuous | Pearson Corr / Linear Reg | Spearman Corr / Linear Reg |

Outcome variable: Categorical

| Exposure variable | Test |
|---|---|
| 1 group | Chi-square test / Exact test |
| 2 groups | Chi-square test / Fisher’s exact test / Logistic regression |
| Paired | McNemar’s test / Kappa statistic |
| >2 groups | Chi-square test / Fisher’s exact test / Logistic regression |
| Continuous | Logistic regression / Sensitivity & specificity / ROC |

Outcome variable: Survival

| Exposure variable | Test |
|---|---|
| 2 groups | KM plot with Log-rank test |
| >2 groups | KM plot with Log-rank test |
| Continuous | Cox regression |
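The flow chart is essentially a lookup from (outcome type, exposure type) to a test. A minimal sketch of that lookup follows; the dictionary keys and the function name `choose_test` are labels invented here for illustration, while the test names are transcribed from the chart.

```python
# (outcome type, exposure type) -> test(s), transcribed from the flow chart.
# For continuous outcomes the entry is (Normal choice, skewed choice).
TESTS = {
    ("continuous", "1 group"): ("One-sample t test", "Sign test / Signed rank test"),
    ("continuous", "2 groups"): ("Two-sample t test", "Mann-Whitney U test"),
    ("continuous", "paired"): ("Paired t test", "Wilcoxon signed rank test"),
    ("continuous", ">2 groups"): ("One-way ANOVA test", "Kruskal Wallis test"),
    ("continuous", "continuous"): ("Pearson Corr / Linear Reg", "Spearman Corr / Linear Reg"),
    ("categorical", "1 group"): "Chi-square test / Exact test",
    ("categorical", "2 groups"): "Chi-square test / Fisher's exact test / Logistic regression",
    ("categorical", "paired"): "McNemar's test / Kappa statistic",
    ("categorical", ">2 groups"): "Chi-square test / Fisher's exact test / Logistic regression",
    ("categorical", "continuous"): "Logistic regression / Sensitivity & specificity / ROC",
    ("survival", "2 groups"): "KM plot with Log-rank test",
    ("survival", ">2 groups"): "KM plot with Log-rank test",
    ("survival", "continuous"): "Cox regression",
}

def choose_test(outcome, exposure, normal=True):
    """Return the flow chart's suggestion for the given variable types.

    `normal` only matters for continuous outcomes.
    """
    entry = TESTS[(outcome, exposure)]
    if isinstance(entry, tuple):
        return entry[0] if normal else entry[1]
    return entry

print(choose_test("continuous", "2 groups", normal=False))
print(choose_test("survival", "continuous"))
```

A lookup like this is only a starting point: it does not cover covariates or confounders, which the previous slide lists as a separate question.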

http://www.som.soton.ac.uk/learn/resmethods/statisticalnotes/which_test.htm

Case Studies

CONTINUOUS & ORDINAL DATA: Case Study 1
A simple study comparing the fitness level of our locally selected group of healthy volunteers with the published average fitness level from an earlier national study.
- Fitness level was measured by the length of time walking on a treadmill before stopping through tiredness.
Objective: any difference between the group average and the published value
Outcome & type:
Exposure & type:
If the continuous outcome is
- Normally distributed:
- Not Normally distributed:

CONTINUOUS & ORDINAL DATA: Case Study 2
A clinical trial investigating the effect of two physiotherapy treatments (standard and enhanced exercise) for patients with a broken leg on their fitness level (length of time walking on a treadmill before stopping through tiredness).
Objective: any difference between the 2 group averages
Outcome & type:
Exposure & type:
If the continuous outcome is
- Normally distributed:
- Not Normally distributed:

CONTINUOUS & ORDINAL DATA: Case Study 3
Now each patient performs the walking test before and after enhanced physiotherapy treatment.
- The data might be presented as two variables, one holding the before data and the other the after data, but the values for individual patients are paired.
Objective: any difference between the before and the after averages
Number of outcomes:
Outcomes & type:
If the difference in outcomes (e.g. after - before) is
- Normally distributed:
- Not Normally distributed:

CONTINUOUS & ORDINAL DATA: Case Study 4
Based on Case Study 2 (standard vs. enhanced exercises), but now with a control group, i.e. patients without a broken leg.
Objective: any difference among the 3 group averages
Outcome & type:
Exposure & type:
If the continuous outcome is
- Normally distributed:
- Not Normally distributed:

CONTINUOUS & ORDINAL DATA: Case Study 5
Now a group of patients each perform the walking test 3 times:
- firstly when the cast is removed
- after six weeks of physiotherapy
- at six months after the physiotherapy treatment
Objective: any improvement over time
Number of outcomes:
Outcomes & type:
If the continuous outcome is
- Normally distributed:
- Not Normally distributed:
Note:

CONTINUOUS & ORDINAL DATA: Case Study 6
Before the participants started their fitness test, their blood pressure (BP) was recorded by two different machines.
- Machine 1 was the ‘gold standard’.
- Machine 2 was newly made and claimed to be more accurate.
- The aim is to validate the measurements recorded from machine 2 by assessing the level of agreement with those obtained from machine 1.
Objective: any agreement between measuring tools
Number of outcomes:
Outcomes & type:
Choice of test:
Note:

When the continuous outcome is not Normally distributed?
- If the outcome is Normally distributed, use t-tests / ANOVA
  - easy to obtain a confidence interval for differences
- So far we’ve recommended using non-parametric tests when the data are not Normal
  - often less powerful
  - non-parametric confidence intervals are problematic
- Recall another possibility: take logs (natural log) of the outcome
  - check to see if the outcome looks Normal after logging
  - can then use t-tests / ANOVA
  - the estimate of the difference and its confidence interval are easily available on the log scale
  - back transform to get an estimate of the percent change between groups
  - back transform the confidence interval
  - it is better to analyse on the log scale, if the data become Normally distributed, than to use a non-parametric test
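The back-transformation step can be sketched as follows. A difference in mean logs, once exponentiated, becomes a ratio of geometric means, and (ratio - 1) x 100 is the percent change between groups. The function name, the sample data and the default critical value of 1.96 (a large-sample Normal approximation) are all assumptions made for this illustration; for small samples the t quantile with n_a + n_b - 2 degrees of freedom should be supplied instead.

```python
import math
import statistics

def ratio_of_geometric_means(group_a, group_b, crit=1.96):
    """Compare two groups on the log scale, then back-transform.

    crit=1.96 is a large-sample Normal approximation (an assumption here);
    pass the appropriate t quantile for small samples.
    """
    log_a = [math.log(x) for x in group_a]
    log_b = [math.log(x) for x in group_b]
    n_a, n_b = len(log_a), len(log_b)
    diff = statistics.mean(log_b) - statistics.mean(log_a)
    # Pooled variance, as in the equal-variance two-sample t test.
    pooled_var = ((n_a - 1) * statistics.variance(log_a)
                  + (n_b - 1) * statistics.variance(log_b)) / (n_a + n_b - 2)
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    lower, upper = diff - crit * se, diff + crit * se
    # exp() of a difference in mean logs is a ratio of geometric means.
    return math.exp(diff), math.exp(lower), math.exp(upper)

# Hypothetical treadmill times in minutes (not from the slides).
standard = [10, 20, 40]
enhanced = [20, 40, 80]
ratio, lower, upper = ratio_of_geometric_means(standard, enhanced)
percent_change = (ratio - 1) * 100
print(f"ratio {ratio:.2f} ({lower:.2f}, {upper:.2f}), change {percent_change:.0f}%")
```

Note that the back-transformed interval is not symmetric around the ratio, which is expected for an interval built on the log scale.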

BINARY DATA: Case Study 7
Fitness is now assessed only as Unfit / Fit.
- This could be the result of dichotomising the previous continuous outcome (0-5 minutes = Unfit; >5 minutes = Fit).
- Investigate whether the proportions of Unfit and Fit are equal (i.e. 50% each) after the standard treatment,
- or compare the proportions to specific values (e.g. 10% Fit, 90% Unfit).
Objective: any difference in proportion within the group (or any difference from the specific proportions)
Outcome & type:
Exposure & type:
Choice of test:
Data layout: one row (Standard) by two columns (Unfit, Fit)

BINARY DATA: Case Study 8
Similar setting to Case Study 2, but with the binary outcome defined in Case Study 7 (Unfit / Fit).
- The aim is to find out if the enhanced treatment is better than the standard treatment, i.e. moves more patients into the Fit category.
Objective: any difference in proportion between the groups
Outcome & type:
Exposure & type:
Choice of test:
Note:
Data layout: two rows (Standard, Enhanced) by two columns (Unfit, Fit)

BINARY DATA: Case Study 9
Fitness is still assessed as Unfit / Fit, but we now have only one group of patients assessed before and after enhanced physiotherapy.
- Each patient was measured before and after treatment.
- Their fitness status may change.
- Similar to Case Study 3.
Objective: any change in status
Number of outcomes:
Outcomes & type:
Choice of test:
Data layout: Before (Unfit, Fit) by After (Unfit, Fit)

BINARY DATA: Case Study 10
Recall that resting blood pressure (BP) was recorded by two different machines (machine 1 and 2) on our participants from Case Study 6.
- The measurements are now categorised as Low BP and High BP.
- This could be the result of dichotomising the previous continuous outcome using the default settings of the two machines.
- The aim is to validate the status recorded from machine 2 by assessing the level of agreement with that obtained from machine 1.
Objective: any agreement between measuring tools
Number of outcomes:
Outcomes & type:
Choice of test:
Data layout: Mac. 1 (Low, High) by Mac. 2 (Low, High)

SURVIVAL DATA: Case Study 11
A clinical trial investigating the survival time of patients with a particular cancer.
- Patients are randomised into a number of treatment groups.
- They are then monitored until the end of the study.
- The length of time between first diagnosis and death is recorded.
- Some people will still be alive at the end of the study and we don’t want to exclude them.
Objective: any difference in the average survival time between groups
Outcome & type:
Exposure & type:
Choice of test:
Note:

Comparing a binary outcome between two groups: data presented as a 2x2 table

| | Unfit | Fit | Total |
|---|---|---|---|
| Standard | 80 (a) | 140 (b) | 220 (a+b) |
| Enhanced | 20 (c) | 220 (d) | 240 (c+d) |

Table shows results from our trial (number of patients).
- Difference in proportion of Fit between groups (absolute difference): d/(c+d) - b/(a+b)
- An alternative parameter is the relative risk (multiplicative difference): [d/(c+d)] / [b/(a+b)]
- Another alternative is the odds ratio: (d/c) / (b/a) = ad/bc
The Chi-square test and Fisher’s exact test show whether there is any association between the two variables, but they do not provide the effect size between the groups for the outcome of interest, e.g. Fit.

Percentage of Fit in standard group: 140/220 (63.6%)
Percentage of Fit in enhanced group: 220/240 (91.7%)

| Parameter | Formula | Estimate (95% CI) |
|---|---|---|
| Absolute difference in proportions | d/(c+d) - b/(a+b) | 28.1% (21%, 35%)* |
| Relative risk | [d/(c+d)] / [b/(a+b)] | 1.44 (1.29, 1.60) |
| Odds ratio | ad/bc | 6.29 (3.69, 10.72) |

* Asymptotic 95% confidence interval (calculated in CIA); other 95% confidence intervals calculated in SPSS.

Reminder: report confidence intervals for ALL key parameter estimates.
- If the 95% confidence interval for a difference excludes 0, the result is statistically significant (e.g. absolute difference).
- If the 95% confidence interval for a ratio excludes 1, the result is statistically significant (e.g. relative risk and odds ratio).
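The three parameters can be recomputed from the trial counts (a = 80, b = 140, c = 20, d = 220). The sketch below uses standard Wald-type large-sample intervals (on the log scale for the two ratios); this choice of method is an assumption, although it reproduces the intervals quoted on the slide. The function name `two_by_two` is made up for the example.

```python
import math

def two_by_two(a, b, c, d, z=1.96):
    """Effect sizes for Fit, enhanced vs standard, with Wald-type 95% CIs.

    Layout follows the slide: a, b = Unfit, Fit in the standard group;
    c, d = Unfit, Fit in the enhanced group.
    """
    p_std = b / (a + b)
    p_enh = d / (c + d)

    # Absolute difference in proportions.
    diff = p_enh - p_std
    se_diff = math.sqrt(p_std * (1 - p_std) / (a + b)
                        + p_enh * (1 - p_enh) / (c + d))
    diff_ci = (diff - z * se_diff, diff + z * se_diff)

    # Relative risk, interval built on the log scale.
    rr = p_enh / p_std
    se_log_rr = math.sqrt(1 / d - 1 / (c + d) + 1 / b - 1 / (a + b))
    rr_ci = (math.exp(math.log(rr) - z * se_log_rr),
             math.exp(math.log(rr) + z * se_log_rr))

    # Odds ratio, interval built on the log scale.
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    or_ci = (math.exp(math.log(odds_ratio) - z * se_log_or),
             math.exp(math.log(odds_ratio) + z * se_log_or))

    return {"diff": diff, "diff_ci": diff_ci,
            "rr": rr, "rr_ci": rr_ci,
            "or": odds_ratio, "or_ci": or_ci}

res = two_by_two(80, 140, 20, 220)
print(f"difference {res['diff']:.1%} ({res['diff_ci'][0]:.0%}, {res['diff_ci'][1]:.0%})")
print(f"relative risk {res['rr']:.2f} ({res['rr_ci'][0]:.2f}, {res['rr_ci'][1]:.2f})")
print(f"odds ratio {res['or']:.2f} ({res['or_ci'][0]:.2f}, {res['or_ci'][1]:.2f})")
```

With unrounded proportions the absolute difference comes out at 28.0%; the slide's 28.1% appears to come from subtracting the already rounded percentages (91.7% - 63.6%).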

Advantages and disadvantages of absolute and relative changes, and odds ratios
Absolute difference
- simplest to calculate and to interpret
- when applied to the number of subjects in a group, gives the number of subjects expected to benefit
- 1/(absolute difference) gives the NNT, the ‘number needed to treat’ to see one additional positive response
Relative risk
- intuitively appealing
- a multiplicative effect: the proportion (risk) of failure in the treatment group relative to (or compared with) that in the reference group
- gives a different result depending on whether the risks of ‘Fit’ or ‘Unfit’ are examined and whether the ‘Standard exercise’ group is selected as the reference level
- natural parameter for cohort studies
Odds ratio
- difficult to understand, unless you’re a betting person!
- ratio of ‘number of successes expected per number of failures’ between the treatment group of interest and the reference group
- invariant to whether the rate of ‘Fit’, ‘Unfit’, or the rate of taking ‘Enhanced exercise’ is examined
- logistic regression works in terms of odds ratios
- natural parameter for case-control studies
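Two of these points can be checked numerically with the trial counts from the 2x2 table (standard: 80 Unfit, 140 Fit; enhanced: 20 Unfit, 220 Fit). In this sketch the helper names are invented for illustration. Switching the outcome from ‘Fit’ to ‘Unfit’ gives a relative risk that is not simply the reciprocal of the original, whereas the odds ratio only inverts.

```python
def risk_ratio(events_trt, n_trt, events_ref, n_ref):
    """Risk in the treatment group divided by risk in the reference group."""
    return (events_trt / n_trt) / (events_ref / n_ref)

def odds_ratio(events_trt, n_trt, events_ref, n_ref):
    """Odds in the treatment group divided by odds in the reference group."""
    odds_trt = events_trt / (n_trt - events_trt)
    odds_ref = events_ref / (n_ref - events_ref)
    return odds_trt / odds_ref

# Enhanced group as treatment, standard group as reference.
rr_fit = risk_ratio(220, 240, 140, 220)    # outcome = Fit
rr_unfit = risk_ratio(20, 240, 80, 220)    # outcome = Unfit
or_fit = odds_ratio(220, 240, 140, 220)
or_unfit = odds_ratio(20, 240, 80, 220)

# Number needed to treat = 1 / absolute difference in proportion Fit.
nnt = 1 / (220 / 240 - 140 / 220)

print(f"RR Fit {rr_fit:.2f}, RR Unfit {rr_unfit:.2f}, 1/RR Fit {1 / rr_fit:.2f}")
print(f"OR Fit {or_fit:.2f}, OR Unfit {or_unfit:.2f}, 1/OR Fit {1 / or_fit:.2f}")
print(f"NNT {nnt:.1f}")
```

An NNT of about 3.6 would usually be reported rounded up, i.e. roughly four patients given enhanced exercise for one additional patient to become Fit.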

CONTINUOUS & ORDINAL DATA: Case Study 12
Now, in the physiotherapy trial, we want to investigate whether there is any relationship between the participants’ fitness level and their age at assessment.
- We suspect that age at assessment affects fitness level regardless of treatment group.
- Quantify the relationship by its direction, strength and magnitude.
Objective: assess and quantify the relationship between two variables
Outcome & type:
Exposure & type:
Choice of test:
- If any of the variables is Normally distributed:
- If both variables are not Normally distributed:

CONTINUOUS & ORDINAL DATA: Case Study 13
In Case Study 12 we found that age at assessment had some linear relationship with participants’ fitness level.
- We need to quantify this relationship, i.e. what is the average fitness level at different ages at assessment?
- We also want to predict fitness level for future patients, given their age at assessment.
Objective: set up a statistical model to quantify the effect of the exposure variable on the outcome variable
Outcome & type:
Exposure & type:
Choice of test:
Note:

BINARY DATA: Case Study 14
A similar analysis to Case Study 13, but with the binary fitness level (Unfit / Fit) in place of the continuous fitness level.
- We want to predict the fitness status (Unfit / Fit) of future patients, given their age at assessment.
Objective: set up a statistical model to quantify the effect of the exposure variable on the outcome variable
Outcome & type:
Exposure & type:
Choice of test:
Note:

BINARY DATA: Case Study 15
Using the logistic regression model from Case Study 14, we can aim to:
- evaluate the predictive performance of the regression model, given that we know the true fitness status of each participant
- investigate the optimal predictive performance of the model
- relate the results to an individual participant, indicating the likelihood of them having a specific fitness status
Objective: (1) assess the predictive performance of the model; (2) determine the probability that an individual test result is accurate
Outcome & type:
Exposure & type:
Choice of test: (1) (2)
Note:

SURVIVAL DATA: Case Study 16
Recall the clinical trial investigating the survival time of patients with a particular cancer (Case Study 11).
- Age at randomisation is now considered an important factor in this relationship, regardless of treatment group.
- We are still interested in the length of time between first diagnosis and death.
- Censored data are still present, because some people dropped out during follow-up or are still alive at the end of the study, and we want to make use of this information.
Objective: set up a statistical model to quantify the relationship between the exposure variable and the survival status / time
Outcome & type:
Exposure & type:
Choice of test:
Note:

References
- Altman D.G. Practical Statistics for Medical Research. Chapman and Hall, 1991.
- Kirkwood B.R. & Sterne J.A.C. Essential Medical Statistics. 2nd Edition. Oxford: Blackwell Science Ltd, 2003.
- Bland M. An Introduction to Medical Statistics. 3rd Edition. Oxford: Oxford Medical Publications, 2000.
- Altman D.G., Machin D., Bryant T.N. & Gardner M.J. Statistics with Confidence. 2nd Edition. BMJ Books, 2000.
- Campbell M.J. & Machin D. Medical Statistics: A Commonsense Approach. 3rd Edition, 1999.
- Field A. Discovering Statistics Using SPSS for Windows. 2nd Edition. London: Sage Publications, 2005.
- Bland J.M. & Altman D.G. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, 1986, i, 307-310.
- Matthews J.N.S., Altman D.G., Campbell M.J. & Royston P. Analysis of serial measurements in medical research. British Medical Journal, 1990, 300, 230-235.

Other web and software resources
- UCLA: What statistical analysis should I use? http://www.ats.ucla.edu/stat/mult_pkg/whatstat/default.htm
- DISCUS (Discovering Important Statistical Concepts Using Spreadsheets): interactive spreadsheets designed for teaching statistics. Download and information: http://www.coventry.ac.uk/ec/research/discus/discus_home.html
- Choosing the correct statistical test: http://bama.ua.edu/~jleeper/627/choosestat.html
- SPSS for Windows Help: Statistics Coach
- Statistics for the Terrified

Solutions to Case Studies

CONTINUOUS & ORDINAL DATA: Case Study 1
A simple study comparing the fitness level of our locally selected group of healthy volunteers with the published average fitness level from an earlier national study.
- Fitness level was measured by the length of time walking on a treadmill before stopping through tiredness.
Objective: any difference between the group average and the published value
Outcome & type: fitness level (length of time), continuous
Exposure & type: one group only
If the continuous outcome is
- Normally distributed: One-sample t test
- Not Normally distributed: Sign test / Signed rank test
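Both options for Case Study 1 are simple enough to sketch with the standard library. The treadmill times and the published value of 5.0 minutes below are hypothetical, and the function names are made up for the example. The t statistic is returned without a p-value because the standard library has no t distribution; the statistic would be referred to a t table with n - 1 degrees of freedom. The sign test p-value is exact.

```python
import math
import statistics

def one_sample_t(values, mu0):
    """t statistic and degrees of freedom for H0: population mean = mu0."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu0) / se, n - 1

def sign_test(values, m0):
    """Exact two-sided sign test p-value for H0: population median = m0."""
    above = sum(v > m0 for v in values)
    below = sum(v < m0 for v in values)
    n = above + below              # observations equal to m0 are dropped
    k = min(above, below)
    # Binomial(n, 0.5) probability of k or fewer in the smaller group.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical treadmill times (minutes) and published average.
times = [6.1, 7.4, 5.9, 8.2, 6.8]
published = 5.0

t_stat, df = one_sample_t(times, published)
p_sign = sign_test(times, published)
print(f"t = {t_stat:.2f} on {df} df; sign test p = {p_sign}")
```

With all five volunteers above the published value the sign test cannot go below p = 0.0625, which illustrates the lower power of non-parametric tests in small samples.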

CONTINUOUS & ORDINAL DATA: Case Study 2
A clinical trial investigating the effect of two physiotherapy treatments (standard and enhanced exercise) for patients with a broken leg on their fitness level (length of time walking on a treadmill before stopping through tiredness).
Objective: any difference between the 2 group averages
Outcome & type: fitness level, continuous
Exposure & type: treatment group, binary, independent (or unrelated)
If the continuous outcome is
- Normally distributed: Two-sample t test
- Not Normally distributed: Mann-Whitney U test

CONTINUOUS & ORDINAL DATA: Case Study 3
Now each patient performs the walking test before and after enhanced physiotherapy treatment.
- The data might be presented as two variables, one holding the before data and the other the after data, but the values for individual patients are paired.
Objective: any difference between the before and the after averages
Number of outcomes: 2 (before and after)
Outcomes & type: fitness level, continuous, paired (or related)
If the difference in outcomes (e.g. after - before) is
- Normally distributed: Paired t test
- Not Normally distributed: Wilcoxon signed rank test

CONTINUOUS & ORDINAL DATA: Case Study 4
Based on Case Study 2 (standard vs. enhanced exercises), but now with a control group, i.e. patients without a broken leg.
Objective: any difference among the 3 group averages
Outcome & type: fitness level, continuous
Exposure & type: treatment group, categorical (more than two levels), independent (or unrelated)
If the continuous outcome is
- Normally distributed: One-way ANOVA test
- Not Normally distributed: Kruskal-Wallis test

CONTINUOUS & ORDINAL DATA: Case Study 5
Now a group of patients each perform the walking test 3 times:
- firstly when the cast is removed
- after six weeks of physiotherapy
- at six months after the physiotherapy treatment
Objective: any improvement over time
Number of outcomes: 3 (time points)
Outcomes & type: fitness level, continuous, related (more than two repeated measures per patient)
If the continuous outcome is
- Normally distributed: Repeated measures ANOVA test
- Not Normally distributed: Friedman’s test
Note: might have a problem with patients dropping out
Note: both approaches only use patients with measures at all three time points

CONTINUOUS & ORDINAL DATA: Case Study 6
Before the participants started their fitness test, their blood pressure (BP) was recorded by two different machines.
- Machine 1 was the ‘gold standard’.
- Machine 2 was newly made and claimed to be more accurate.
- The aim is to validate the measurements recorded from machine 2 by assessing the level of agreement with those obtained from machine 1.
Objective: any agreement between measuring tools
Number of outcomes: 2 (machines)
Outcomes & type: blood pressure, continuous, paired (or related)
Choice of test: Bland-Altman method (& Paired t-test)
Note: the Bland-Altman method is not a statistical test
Note: see the Bland and Altman paper for details
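The core of the Bland-Altman method is a summary of the paired differences: their mean (the bias) and the 95% limits of agreement, taken here as the mean difference plus or minus 1.96 standard deviations. The sketch below uses hypothetical blood pressure readings and a made-up function name; the usual accompanying plot of differences against pairwise means is left out.

```python
import statistics

def bland_altman(machine1, machine2):
    """Bias and 95% limits of agreement for paired measurements."""
    diffs = [m2 - m1 for m1, m2 in zip(machine1, machine2)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical systolic BP readings (mmHg) on the same four participants.
machine1 = [120, 130, 140, 150]
machine2 = [122, 131, 143, 152]

bias, lower, upper = bland_altman(machine1, machine2)
print(f"bias {bias:.1f} mmHg, limits of agreement ({lower:.2f}, {upper:.2f})")
```

Whether limits of this width are acceptable is a clinical judgement rather than a statistical one, which is why the slide notes that the method is not a statistical test.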

BINARY DATA: Case Study 7
Fitness is now assessed only as Unfit / Fit.
- This could be the result of dichotomising the previous continuous outcome (0-5 minutes = Unfit; >5 minutes = Fit).
- Investigate whether the proportions of Unfit and Fit are equal (i.e. 50% each) after the standard treatment,
- or compare the proportions to specific values (e.g. 10% Fit, 90% Unfit).
Objective: any difference in proportion within the group (or any difference from the specific proportions)
Outcome & type: fitness level category, binary
Exposure & type: one group only
Choice of test:
- Chi-square test (large sample size)
- Exact test (small sample size)
Data layout: one row (Standard) by two columns (Unfit, Fit)
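As an illustration of the large-sample option, the sketch below applies a chi-square goodness-of-fit test to the standard-group counts from the trial's 2x2 table (80 Unfit, 140 Fit) against the 50% / 50% hypothesis. The function name is made up, and the p-value formula is specific to two categories (1 degree of freedom), where the chi-square tail probability equals erfc(sqrt(x / 2)).

```python
import math

def chi_square_gof_two_categories(observed, expected_props):
    """Chi-square goodness-of-fit test for two categories (1 df)."""
    total = sum(observed)
    expected = [total * p for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 df, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Standard group: 80 Unfit, 140 Fit; H0: 50% in each category.
stat, p_value = chi_square_gof_two_categories([80, 140], [0.5, 0.5])
print(f"chi-square = {stat:.2f}, p = {p_value:.5f}")
```

The same function handles other specified proportions, for example `[0.9, 0.1]` for 90% Unfit and 10% Fit.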

BINARY DATA: Case Study 8
Similar setting to Case Study 2, but with the binary outcome defined in Case Study 7 (Unfit / Fit).
- The aim is to find out if the enhanced treatment is better than the standard treatment, i.e. moves more patients into the Fit category.
Objective: any difference in proportion between the groups
Outcome & type: fitness level category, binary
Exposure & type: treatment groups, binary, independent (or unrelated)
Choice of test:
- Chi-square test (large sample size)
- Fisher’s exact test (small sample size)
Note: same tests for more than 2 groups
Data layout: two rows (Standard, Enhanced) by two columns (Unfit, Fit)

BINARY DATA: Case Study 9
Fitness is still assessed as Unfit / Fit, but we now have only one group of patients assessed before and after enhanced physiotherapy.
- Each patient was measured before and after treatment.
- Their fitness status may change.
- Similar to Case Study 3.
Objective: any change in status
Number of outcomes: 2 (before and after)
Outcomes & type: fitness level category, binary, paired (or related)
Choice of test: McNemar’s test
Data layout: Before (Unfit, Fit) by After (Unfit, Fit)
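McNemar's test uses only the discordant pairs, i.e. patients whose status changed. A minimal sketch of the exact (binomial) version follows; the counts are hypothetical and the function name is an assumption. Statistical packages often report a chi-square approximation instead, which will differ slightly.

```python
import math

def mcnemar_exact(unfit_to_fit, fit_to_unfit):
    """Exact two-sided McNemar test from the two discordant-pair counts."""
    n = unfit_to_fit + fit_to_unfit
    k = min(unfit_to_fit, fit_to_unfit)
    # Under H0 each discordant pair is equally likely to go either way.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical: 9 patients changed Unfit -> Fit, 1 changed Fit -> Unfit.
p_value = mcnemar_exact(9, 1)
print(f"p = {p_value:.4f}")
```

Patients who were Unfit both times or Fit both times carry no information about change, so they do not enter the calculation.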

BINARY DATA: Case Study 10
Recall that resting blood pressure (BP) was recorded by two different machines (machine 1 and 2) on our participants from Case Study 6.
- The measurements are now categorised as Low BP and High BP.
- This could be the result of dichotomising the previous continuous outcome using the default settings of the two machines.
- The aim is to validate the status recorded from machine 2 by assessing the level of agreement with that obtained from machine 1.
Objective: any agreement between measuring tools
Number of outcomes: 2 (machines)
Outcomes & type: blood pressure status (from each machine), binary, paired (or related)
Choice of test: Kappa statistic
Data layout: Mac. 1 (Low, High) by Mac. 2 (Low, High)
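The kappa statistic compares the observed agreement with the agreement expected by chance from the marginal totals. The sketch below computes Cohen's kappa for a 2x2 agreement table; the counts and the function name are hypothetical.

```python
def cohens_kappa(both_low, m1_low_m2_high, m1_high_m2_low, both_high):
    """Cohen's kappa for two raters (machines) and two categories."""
    n = both_low + m1_low_m2_high + m1_high_m2_low + both_high
    observed = (both_low + both_high) / n
    m1_low = (both_low + m1_low_m2_high) / n   # machine 1 says Low
    m2_low = (both_low + m1_high_m2_low) / n   # machine 2 says Low
    # Agreement expected if the two machines classified independently.
    expected = m1_low * m2_low + (1 - m1_low) * (1 - m2_low)
    return (observed - expected) / (1 - expected)

# Hypothetical counts for 100 participants.
kappa = cohens_kappa(both_low=40, m1_low_m2_high=10,
                     m1_high_m2_low=10, both_high=40)
print(f"kappa = {kappa:.2f}")
```

Here the machines agree on 80% of participants, but half of that agreement would be expected by chance, giving kappa = 0.6.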

SURVIVAL DATA: Case Study 11
A clinical trial investigating the survival time of patients with a particular cancer.
- Patients are randomised into a number of treatment groups.
- They are then monitored until the end of the study.
- The length of time between first diagnosis and death is recorded.
- Some people will still be alive at the end of the study and we don’t want to exclude them.
Objective: any difference in the average survival time between groups
Outcome & type: time monitored & death status, survival
Exposure & type: treatment group, binary, independent (or unrelated)
Choice of test: KM plot with Log-rank test
Note: we can also apply this to our physiotherapy example, to look at the “survival time”, that is the time to stop walking on the treadmill through tiredness, for both groups of patients in the presence of censored data
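The Kaplan-Meier (KM) estimate behind the plot can be sketched in a few lines for a single group. Censored patients (still alive when last seen) stay in the risk set up to their censoring time and then leave it without counting as deaths. The data and the function name are hypothetical, and the log-rank comparison between groups is not shown.

```python
def kaplan_meier(times, died):
    """Return [(time, survival probability)] at each distinct death time.

    times: follow-up time for each patient
    died:  True if the patient died at that time, False if censored
    """
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, di in zip(times, died) if ti == t and di)
        leaving = sum(1 for ti in times if ti == t)  # deaths + censored
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve

# Hypothetical follow-up in months; False marks a censored patient.
times = [2, 3, 3, 5, 8]
died = [True, True, False, True, False]

curve = kaplan_meier(times, died)
for t, s in curve:
    print(f"month {t}: S(t) = {s:.2f}")
```

Simply dropping the two censored patients would discard the information that they survived at least 3 and 8 months, which is the point made on the slide.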

CONTINUOUS & ORDINAL DATA: Case Study 12
Now, in the physiotherapy trial, we want to investigate whether there is any relationship between the participants’ fitness level and their age at assessment.
- We suspect that age at assessment affects fitness level regardless of treatment group.
- Quantify the relationship by its direction, strength and magnitude.
Objective: assess and quantify the relationship between two variables
Outcome & type: fitness level, continuous
Exposure & type: age at assessment, continuous
Choice of test:
- If any of the variables is Normally distributed: Pearson correlation
- If both variables are not Normally distributed: Spearman’s rank correlation
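Both coefficients can be sketched without external libraries: Spearman's rank correlation is Pearson's correlation applied to the ranks of the data. The ages and fitness times below are hypothetical, the function names are chosen for the example, and no p-values are computed.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical age (years) and treadmill time (minutes).
age = [25, 35, 45, 55, 65]
fitness = [9.0, 8.5, 6.0, 5.5, 2.0]

r = pearson(age, fitness)
rho = spearman(age, fitness)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

In this made-up example fitness falls steadily with age, so Spearman's coefficient is exactly -1 while Pearson's is about -0.96 because the decline is not perfectly linear.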

CONTINUOUS & ORDINAL DATA: Case Study 13
In Case Study 12 we found that age at assessment had some linear relationship with participants’ fitness level.
- We need to quantify this relationship, i.e. what is the average fitness level at different ages at assessment?
- We also want to predict fitness level for future patients, given their age at assessment.
Objective: set up a statistical model to quantify the effect of the exposure variable on the outcome variable
Outcome & type: fitness level, continuous
Exposure & type: age at assessment, continuous
Choice of test: (Simple) Linear regression
Note: Linear regression is also appropriate when the exposure variable is categorical, e.g. exercise treatment group (standard & enhanced), as well as for controlling for other covariates
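A least-squares fit of simple linear regression needs only the sums of squares and cross-products. The sketch below fits fitness = intercept + slope x age and predicts for a new patient; the data and the function name are hypothetical, and standard errors and confidence intervals, which a full analysis would report, are omitted.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical age (years) and treadmill time (minutes).
age = [25, 35, 45, 55, 65]
fitness = [9.0, 8.5, 6.0, 5.5, 2.0]

intercept, slope = fit_line(age, fitness)
predicted_at_50 = intercept + slope * 50
print(f"fitness = {intercept:.2f} + ({slope:.2f}) * age")
print(f"predicted fitness at age 50: {predicted_at_50:.2f} minutes")
```

The slope of -0.17 reads as an average drop of 0.17 minutes of treadmill time per additional year of age in these made-up data.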

BINARY DATA: Case Study 14
A similar analysis to Case Study 13, but with the binary fitness level (Unfit / Fit) in place of the continuous fitness level.
- We want to predict the fitness status (Unfit / Fit) of future patients, given their age at assessment.
Objective: set up a statistical model to quantify the effect of the exposure variable on the outcome variable
Outcome & type: fitness level category, binary
Exposure & type: age at assessment, continuous
Choice of test: (Simple) Logistic regression
Note: Logistic regression is also appropriate when the exposure variable is categorical, e.g. exercise treatment group (standard & enhanced), as well as for controlling for other covariates

BINARY DATA: Case Study 15
Using the logistic regression model from Case Study 14, we can aim to:
- evaluate the predictive performance of the regression model, given that we know the true fitness status of each participant
- investigate the optimal predictive performance of the model
- relate the results to an individual participant, indicating the likelihood of them having a specific fitness status
Objective: (1) assess the predictive performance of the model; (2) determine the probability that an individual test result is accurate
Outcome & type: fitness level category, binary
Exposure & type: age at assessment, continuous
Choice of test:
- (1) Sensitivity and specificity, ROC curve
- (2) PPV and NPV
Note: none of the above methods are statistical tests
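Once the model's predictions have been cut at a threshold, the four measures follow from the cross-tabulation of predicted against true status. The counts and the function name below are hypothetical, with ‘Fit’ treated as the positive result; an ROC curve would repeat the sensitivity and specificity calculation over a range of thresholds.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 classification table.

    tp: predicted Fit, truly Fit      fp: predicted Fit, truly Unfit
    fn: predicted Unfit, truly Fit    tn: predicted Unfit, truly Unfit
    """
    return {
        "sensitivity": tp / (tp + fn),  # truly Fit who are predicted Fit
        "specificity": tn / (tn + fp),  # truly Unfit who are predicted Unfit
        "ppv": tp / (tp + fp),          # predicted Fit who are truly Fit
        "npv": tn / (tn + fn),          # predicted Unfit who are truly Unfit
    }

# Hypothetical counts for 100 participants.
summary = diagnostic_summary(tp=45, fp=10, fn=5, tn=40)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity describe the model, whereas PPV and NPV answer the second objective about an individual result and depend on how common ‘Fit’ is in the population.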

SURVIVAL DATA: Case Study 16
Recall the clinical trial investigating the survival time of patients with a particular cancer (Case Study 11).
- Age at randomisation is now considered an important factor in this relationship, regardless of treatment group.
- We are still interested in the length of time between first diagnosis and death.
- Censored data are still present, because some people dropped out during follow-up or are still alive at the end of the study, and we want to make use of this information.
Objective: set up a statistical model to quantify the relationship between the exposure variable and the survival status / time
Outcome & type: time monitored & death status, survival
Exposure & type: age at randomisation, continuous
Choice of test: Cox regression
Note: Cox regression is also appropriate when the exposure variable is categorical, e.g. treatment groups (active & placebo), as well as for controlling for other covariates