Published by Robert Mercier. Modified over 5 years ago.
1
Statistical Analysis Planning
Mathematics & Statistics Help
University of Sheffield
2
Learning outcomes
By the end of this session you should know about:
- Statistical Analysis Cycle
- Some useful approaches to analysing data
By the end of this session you should be able to:
- Recognise different data types
- Use a flowchart to decide which analysis method to use
- Write a statistical analysis plan
3
Statistical analysis cycle
Plan: Main research question (what are you interested in?)
Data collection: What data will you need to investigate your research topic?
Process/analyse data: Summary statistics/graphs; hypothesis testing
Report: Summarise the results in context, not just as statistical output

This slide shows us the statistical analysis cycle. We always start with a question: what are you interested in and what is your main research question? Based on the question you want to answer, you will need to collect the relevant data. Once you have your data, you have to analyse it. This is where we transform data into information by applying statistical methods. Finally you need to interpret and summarise your information. It is important that you present it within the context of your own discipline, not just as statistical results. Sometimes your results will lead you to ask further questions and to start the cycle all over again.
4
Simple example
PLAN: How many hours on average does a baby sleep per night?
COLLECT: I record the number of hours my baby slept each night for a week. The rows will be the nights (Night 1, Night 2, Night 3, etc.) and the first column or variable will be "Night".
ANALYSE: Apply a method for summarising the data (e.g. calculate the average by dividing the total hours my baby slept at night that week by 7).
REPORT: I report the average as a way of summarising the data.

Suppose I am interested in knowing how many hours on average I sleep per night. It is very important that you consider carefully what information is needed to answer your questions. There is no point in counting how many times I turn over when I am sleeping when all I am interested in is the average number of hours I slept. I start by collecting my data, that is, the number of hours I slept each night during the week. Then I apply a method to calculate the average: I divide the total number of hours I slept in the week by 7, as there are 7 days in the week. And that's it! This is a very simple example, but it shows how you can use a method to statistically analyse data to draw out some information and answer a question.
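As a minimal sketch of the ANALYSE step, the average can be computed as below. The seven values are hypothetical and purely illustrative; they are not data from the presentation.

```python
# Hypothetical hours slept on each of seven nights (illustrative values only).
hours_slept = {
    "Night 1": 9.5,
    "Night 2": 10.0,
    "Night 3": 8.5,
    "Night 4": 11.0,
    "Night 5": 9.0,
    "Night 6": 10.5,
    "Night 7": 9.5,
}


def mean_hours(hours):
    """Divide the total hours slept by the number of nights recorded."""
    return sum(hours.values()) / len(hours)


average = mean_hours(hours_slept)
print(f"Average sleep per night: {average:.2f} hours")
```

Dividing by `len(hours)` rather than a hard-coded 7 keeps the calculation correct if a night is missing from the record.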
5
Plan: main research question
- What do you want to investigate and why?
- What are your aims?
- How are you going to investigate it?
- How will you collect your data?
- Who/what is in the sample?
- How will you summarise your data?
- How will you analyse your data?
6
Plan: Data types
What types of data are there? Within the data structure there are observations or individuals, and for each observation there are data variables. Data variables can be continuous, nominal or ordinal.

Variables can be divided into two main categories: numerical and categorical.

Categorical variables indicate categories, for example gender (Male or Female) and marital status (Single, Married, Divorced or Widowed). Sometimes they are coded as numbers, e.g. 1 = male. Categorical variables can be divided into two types: ordinal and nominal. If the categories are meaningfully ordered, the variable is ordinal; if it doesn't matter in which way the categories are ordered, then the variable is nominal. For example, satisfaction levels (dissatisfied, satisfied and highly satisfied) and education level (secondary, sixth form, undergraduate and postgraduate) are ordinal variables; student's religion (Christian, Muslim, Hindu, etc.) and gender (Male, Female) are nominal variables.

Numerical variables appear as meaningful, comparable numbers, such as blood pressure, height, weight, income, age and probability of illness. Numerical variables can be further divided into two subtypes: continuous and discrete. Continuous variables can take any value within a range and are the most common, e.g. body weight, height, income. Discrete variables can only take whole numbers, such as number of students in a class or number of new patients every day, but are treated as continuous for statistical analysis if there is a large range of values.

There is another variable type called a 'Label' variable, which identifies observations uniquely, such as Student ID or subject's name.
7
Data types: Categorical
Nominal: No natural ordering, e.g. blood group, ethnic group
Binary: when there are only two categories, e.g. alive / dead
8
Data types: Ordinal
- Has a recognisable order, e.g. 1st, 2nd, 3rd
- Likert scales are ordinal, e.g. strongly disagree to strongly agree
- Can be numbered, but the numbers are no different to names/labels
- The gap between 1st and 2nd may be different to the gap between 2nd and 3rd
9
Data types: Numerical
DISCRETE: Can only take whole numbers
- Number of children in a family; how many times you have been on holiday this year
CONTINUOUS: Measured on any scale, limited only by precision
- Height; anything that can have decimals
- Time to run a race
Discrete data are often treated as continuous in analysis
10
Exercise
The ship Titanic sank in 1912 with the loss of most of its passengers. Details can be obtained on 1309 passengers and crew on board the ship Titanic.
TITANIC: What are the data types of the variables listed below?
Variables: Name; class; Survived = died; Gender; Age; No. of siblings/spouses on board; No. of parents/children on board; price of ticket
Sample records:
Abbing, Anthony 3 male 42 7.55
Abbott, Rosa 1 female 35 20.25
Abelseth, Karen 16 7.65
Type of variable (Name): Nominal

Here we have a table with a subset of passenger records from the Titanic. What types of variables do we have here and, given the whole dataset, what research questions could be investigated? Identify the data types for the variables in the data set and the key dependent variable (the one being influenced by other explanatory variables).
11
Exercise
TITANIC: What are the data types of the variables?
Variable | Type of variable
Name | Nominal
class | Ordinal
Survived | Nominal (binary)
Gender | Nominal (binary)
Age | Continuous
No. of siblings/spouses on board | Discrete
No. of parents/children on board | Discrete
price of ticket | Continuous
12
Different ways of measuring
Have a think about the following variables. Can you think of at least three ways in which each could be measured?
- Fear of statistics?
- Age?
- Time?
What impact do you think this could have on your study?
13
Fear of statistics?
How could 'fear' be measured?
NOMINAL: Are you afraid of statistics? Yes / No
ORDINAL: How much do you agree or disagree with the following: "I am afraid of statistics"
Strongly agree / Agree / Neither agree nor disagree / Disagree / Strongly disagree
14
Fear of statistics?
CONTINUOUS: Indicate your fear of statistics on a scale of 1 to 100
As 10.3 is a possibility, the answer gives continuous data
Finally, I could measure heart rate before and after asking someone to use this equation to calculate standard deviation:
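The equation itself does not survive in this transcript. Assuming the slide showed the usual sample standard deviation (an assumption, not something the source confirms), for n observations x_1, ..., x_n with mean x̄ it would read:

```latex
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
```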
15
Examples?
16
Outcome (dependent) variable
What is your main outcome variable and which other variables may affect it?
- Does attendance have an effect on exam score?
- Does it take longer to commute to work in London compared to Yorkshire?
INDEPENDENT (explanatory/predictor) variable affects DEPENDENT (outcome) variable

You will need to distinguish between independent (explanatory) variables and dependent (outcome) variables for your research question. The explanatory variables are thought to have an effect on the dependent variable, and the distinction between the two is important when carrying out statistical analysis. For example, if you were investigating the effect attendance has on exam score, the independent variable is attendance and the dependent variable is exam score. If you wanted to collect data to test the theory 'do women drink more coffee than men', you would randomly select some men and women and ask them how many coffees they drink per day. The mean number of cups of coffee per day would then be compared. Here the dependent variable is the amount of coffee drunk per day and the independent variable is gender: we think that gender is affecting the amount of coffee drunk.

JMR: What if your key research question is which European languages are most alike?
17
Exercise
Were wealthy people more likely to die on the Titanic?
- Which variables from the data set would help answer this question?
Are young people better at parking than older people?
- How would you investigate this?
- What would the dependent/independent variables be and what data types?
- What else may affect how good someone is at parking?
18
Exercise
Research question | Dependent | Independent
Were wealthy people more likely to die on the Titanic? | Survival (binary) | Class (ordinal); Price of ticket (continuous)
Are young or old people better at parking? | Parking speed or accuracy (scale) | Age (binary)
Other factors affecting parking: experience, drink, tiredness, gender, others in car, ...
19
Data collection: points to consider
- Who/what is in the sample?
- How are you going to collect your data?
  - primary data: data collected by the researcher
  - secondary data: data collected by someone else, e.g. data from the National Student Survey
- If primary data collection, consider doing a pilot study. Why is this a good idea?
- Think about data quality. How might you assess this?

During the planning stage, think very carefully about what your specific research question is and how you are going to collect the data needed to answer it. Data are measurements on individuals which can be turned into information. When planning research, are you going to collect data yourself (primary data) using an experiment or questionnaire, or use someone else's data (secondary data)? An example of primary research would be the observation of the number of hours slept on average, or designing your own questionnaire. Questionnaires can be used to collect quantitative data which can be analysed in some way (e.g. rating a course from 1 to 5) or qualitative responses to open-ended questions. When designing a questionnaire, get advice at the planning stage to avoid getting to the analysis and finding out there were problems.
20
Analyse: Exploratory data analysis
What is the advantage of this?
21
EDA: useful summary measures
Data type | Summary statistics
Nominal | Mode, %'s
Ordinal | Mode, median, %'s
Discrete (count) | %'s; can also calculate means and medians as you would for continuous data, but this depends on how many separate counts you have
Continuous, normally distributed | Mean, standard deviation
Continuous, skewed | Median, interquartile range
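The summary measures above can be sketched with Python's standard `statistics` module. The function name `summarise`, the type labels and the sample values are all assumptions made for illustration; discrete counts are left out because, as noted above, they are summarised either as percentages or as continuous data depending on how many distinct counts there are.

```python
import statistics
from collections import Counter


def percentages(values):
    """Percentage of observations falling in each category."""
    counts = Counter(values)
    return {category: 100 * n / len(values) for category, n in counts.items()}


def summarise(values, data_type):
    """Return the summary statistics suggested for the given data type."""
    if data_type == "nominal":
        return {"mode": statistics.mode(values), "percentages": percentages(values)}
    if data_type == "ordinal":
        # Values are assumed to be integer codes given in category order.
        return {
            "mode": statistics.mode(values),
            "median": statistics.median_low(values),
            "percentages": percentages(values),
        }
    if data_type == "continuous_normal":
        return {"mean": statistics.mean(values), "sd": statistics.stdev(values)}
    if data_type == "continuous_skewed":
        q1, _, q3 = statistics.quantiles(values, n=4)
        return {"median": statistics.median(values), "iqr": q3 - q1}
    raise ValueError(f"unknown data type: {data_type}")


# Hypothetical values, for illustration only.
blood_groups = ["O", "A", "O", "B", "AB", "O"]
ticket_prices = [7.25, 7.9, 8.05, 13.0, 26.0, 71.3, 151.55]

nominal_summary = summarise(blood_groups, "nominal")
skewed_summary = summarise(ticket_prices, "continuous_skewed")
print(nominal_summary)
print(skewed_summary)
```

Note that quartiles can be defined in several ways; `statistics.quantiles` uses its default "exclusive" method here, so other packages may report a slightly different interquartile range.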
22
EDA: Useful chart types:
- Bar chart: 1 categorical
- Histogram: 1 continuous
- Clustered bar chart: 2 categorical
- Bar chart: 1 continuous, 1 categorical
- Boxplot: 1 continuous, 1 categorical
- Scatter plot: 2 continuous
23
EDA: useful chart types
One variable
- Categorical: pie chart, bar chart
- Numerical discrete: bar chart
- Numerical continuous: histogram, boxplots
Two variables
- Both categorical: stacked bar chart, clustered bar chart, multiple pie charts
- One categorical / one numerical discrete: boxplots (sometimes!), multiple bar charts
- One categorical / one numerical continuous: boxplots, multiple histograms
- Both numerical: scatterplot
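The chart choices listed above amount to a small lookup. A minimal sketch follows; the function name `suggest_charts` and the three type labels are hypothetical, chosen only for this example.

```python
# Charts for a single variable, keyed by data type.
ONE_VARIABLE = {
    "categorical": ["pie chart", "bar chart"],
    "discrete": ["bar chart"],
    "continuous": ["histogram", "boxplot"],
}


def suggest_charts(first, second=None):
    """Suggest chart types for one variable, or for a pair of variables."""
    for kind in (first, second):
        if kind is not None and kind not in ONE_VARIABLE:
            raise ValueError(f"unknown data type: {kind}")
    if second is None:
        return ONE_VARIABLE[first]
    kinds = {first, second}
    if kinds == {"categorical"}:
        return ["stacked bar chart", "clustered bar chart", "multiple pie charts"]
    if kinds == {"categorical", "discrete"}:
        return ["boxplots (sometimes)", "multiple bar charts"]
    if kinds == {"categorical", "continuous"}:
        return ["boxplots", "multiple histograms"]
    # Remaining case: both variables are numerical.
    return ["scatterplot"]


print(suggest_charts("continuous", "categorical"))
```

Using a set for the pair means the order in which the two variables are given does not matter.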
24
Analyse: Main data analysis
Consider:
- What are the outcome (dependent) and explanatory (independent) variables?
- What data types are they?
- What is the null hypothesis?
- What is the significance level? (i.e. the threshold used to decide whether or not to reject the null hypothesis)
25
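Null hypothesis (H0)
The null hypothesis of your research question is usually the hypothesis that there is no relationship (or no difference). If the p-value from the test is less than the chosen significance level (commonly 0.05), then the null hypothesis is rejected and the relationship is described as "significant". Otherwise we fail to reject the null hypothesis; we never "accept" it.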
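The decision rule can be written in a few lines. This is a sketch only; the function name `decide` is hypothetical and the default threshold of 0.05 is the conventional value mentioned on the slide.

```python
def decide(p_value, alpha=0.05):
    """Compare a test's p-value with the significance level alpha."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < alpha:
        return "reject H0"
    # A large p-value is not evidence that H0 is true.
    return "fail to reject H0"


print(decide(0.03))
print(decide(0.20))
```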
26
Steps for choosing the right test (1)
- Clearly define your research question
- What is your main outcome of interest? There may be more than one.
  - What data type is it? The data type will determine the type of analysis
  - Are the observations paired?
  - Can it be characterised using a known distribution (i.e. parametric vs non-parametric test)?
- What may affect the outcome of interest?
  - What data type is it/are they?
- How will your results be summarised?
- What charts can you use to display your results?
27
Steps for choosing the right test (2)
- Are you interested in:
  - Testing differences between groups? How many groups are there?
  - Assessing/modelling the relationship between variables?
- Are the observations paired?
  - Is the pairing due to having repeated measurements of the same variable for each subject?
- Does the test you have chosen make any assumptions?
  - Are the assumptions met? e.g. assumption of normality for the t-test
28
Test assumptions
Parametric tests: generally assume the data, or some function of the data, follow a known distribution, e.g. normal
Non-parametric tests: usually based on ranks/signs rather than the actual data

All tests have assumptions, and one of the main assumptions for many of the most common tests is that the data are normally distributed. Tests with this assumption are called parametric tests. Non-parametric tests do not require this assumption, as they are based on the ranks of the data rather than the actual data.
29
Non-parametric methods are used when:
- Dependent variable is ordinal
- A plot of the data appears to be very skewed, or the data do not seem to follow any particular shape or distribution (e.g. Normal)
- Assumptions underlying the parametric test are not met
- There are potentially influential outliers in the dataset
- Sample size is small
30
Paired data
- Most commonly, measurements from the same individuals collected on more than one occasion
- Can be used to look at differences in mean score:
  - 2 or more time points, e.g. before/after a diet
  - 2 or more conditions, e.g. hearing test at different frequencies

Each person listened to a sound until they could no longer hear it, at three different frequencies. A repeated measures ANOVA would be used to test for a difference between the frequencies.
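For the simplest paired case (two time points, such as weight before and after a diet), the analysis works on the within-person differences. The sketch below computes the paired t statistic by hand from hypothetical weights; it stops short of a p-value, which in practice would come from a statistics package.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (mean difference, t statistic, degrees of freedom) for paired data."""
    if len(before) != len(after):
        raise ValueError("paired data need one 'before' and one 'after' per subject")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference.
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, mean_diff / se, n - 1


# Hypothetical weights in kg for five people, for illustration only.
weight_before = [82.0, 75.5, 90.2, 68.4, 77.1]
weight_after = [80.1, 74.0, 88.0, 68.9, 75.3]

mean_diff, t_stat, df = paired_t_statistic(weight_before, weight_after)
print(f"mean change = {mean_diff:.2f} kg, t = {t_stat:.2f}, df = {df}")
```

Because each person acts as their own control, only the five differences enter the calculation, not the ten raw measurements.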
31
Comparing averages
Comparing | Dependent (outcome) variable | Independent (explanatory) variable | Parametric test | Non-parametric test (ordinal/skewed data)
Two INDEPENDENT groups | Continuous | Nominal (binary) | Independent t-test | Mann-Whitney test / Wilcoxon rank sum
3+ INDEPENDENT groups | Continuous | Nominal | One-way ANOVA | Kruskal-Wallis test
2 measurements on the same subject, e.g. weight before and after a diet | Continuous | Time/condition variable | Paired t-test | Wilcoxon signed rank test
3+ measurements on the same subject | Continuous | Time/condition variable | Repeated measures ANOVA | Friedman test
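The "Comparing averages" table can be read as a flowchart with three questions: how many groups or measurements, are they paired, and are the parametric assumptions met? A minimal sketch, with the hypothetical function name `choose_comparison_test`:

```python
# Lookup built from the "Comparing averages" table.
# Key: (paired, three_or_more); value: (parametric test, non-parametric test).
COMPARISON_TESTS = {
    (False, False): ("Independent t-test", "Mann-Whitney test"),
    (False, True): ("One-way ANOVA", "Kruskal-Wallis test"),
    (True, False): ("Paired t-test", "Wilcoxon signed rank test"),
    (True, True): ("Repeated measures ANOVA", "Friedman test"),
}


def choose_comparison_test(n_groups, paired, parametric):
    """Pick a test for comparing averages of a continuous or ordinal outcome."""
    if n_groups < 2:
        raise ValueError("need at least two groups or measurements to compare")
    parametric_test, nonparametric_test = COMPARISON_TESTS[(paired, n_groups >= 3)]
    return parametric_test if parametric else nonparametric_test


print(choose_comparison_test(2, paired=False, parametric=False))
```

The `parametric` flag stands in for the judgement described elsewhere in the deck (ordinal outcome, skewed data, outliers or a small sample all point to the non-parametric column).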
32
Exercise: Did gender affect ticket price paid on the Titanic?
- What is the outcome variable?
- What is the grouping / explanatory variable?
- What methods are available to analyse these data?
- What test do you think would be appropriate?
33
Exercise: Did gender affect ticket price paid on the Titanic?
- What is the outcome variable? Ticket price
- What is the grouping / explanatory variable? Gender
- What methods are available to analyse these data? We are comparing ticket price between two independent groups (male and female), so the standard parametric method would be the independent samples t-test. However, as the data are highly skewed, the assumption of normality is not met, so the Mann-Whitney U test is more appropriate here.
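To show why skewness matters less for the Mann-Whitney test, the U statistic can be computed by hand: it only counts how often a value in one group exceeds a value in the other, so an extreme ticket price counts no more than any other larger value. The prices below are hypothetical, not taken from the Titanic data, and the sketch does not compute a p-value (a package routine such as `scipy.stats.mannwhitneyu` would normally be used for that).

```python
def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: pairs where a > b count 1, ties count 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical ticket prices, for illustration only.
male_prices = [7.25, 8.05, 7.9, 26.0, 13.0]
female_prices = [7.65, 20.25, 71.3, 10.5, 151.55]

u_male = mann_whitney_u(male_prices, female_prices)
u_female = mann_whitney_u(female_prices, male_prices)
print(u_male, u_female)
```

The two U values always sum to the number of pairs (here 5 × 5 = 25), which is a quick check on the calculation.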
34
Examples from your field?
35
Investigating relationships
Investigating | Dependent (outcome) variable | Independent (explanatory) variable | Parametric test | Non-parametric test (ordinal/skewed data)
Relationship between two continuous variables | Continuous | Continuous | Pearson's correlation | Spearman's correlation
Predicting the value of one variable from the value of a predictor variable, or looking for significant relationships | Continuous | Any | Simple linear regression | Transform the data
(as above) | Nominal (binary) | Any | Logistic regression |
Assessing the relationship between two categorical variables | Categorical | Categorical | Chi-squared test |
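The difference between the two correlation coefficients can be sketched directly: Pearson's works on the actual values, while Spearman's is Pearson's correlation applied to the ranks. The attendance and exam-score values below are hypothetical, and the helper names are chosen for this example only.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    ordered = sorted(values)
    return [(2 * ordered.index(v) + 1 + ordered.count(v)) / 2 for v in values]


def spearman(x, y):
    """Spearman's correlation: Pearson's correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical attendance (%) and exam scores, for illustration only.
attendance = [50, 60, 70, 80, 90, 100]
exam_score = [40, 48, 55, 61, 75, 90]

r = pearson(attendance, exam_score)
rho = spearman(attendance, exam_score)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Here the scores rise with attendance at every step, so the rank-based coefficient is exactly 1 even though the relationship is not perfectly linear.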
36
Examples from your field?
37
Statistical Analysis Plan: background
- Randomised Controlled Trials: need to improve reproducibility, transparency and validity
- Useful for designing the statistical analysis and describing the planned analyses as they relate to study objectives
- Recommended to be completed alongside the study protocol
38
Statistical Analysis Plan: Exercise
What do you think are the key elements of an analysis plan?
39
Statistical Analysis Plan: key elements
- Clear title
- Version control (with dates)
- Revision history with reasons for each revision
- List of key contributors, including author of SAP, study principal investigator, senior statistician/statistical advisor
- Background and rationale for study
- Objectives and hypotheses
- Study type
- Randomisation details, if applicable
- Sample size details, if applicable
- Timing and time interval for assessing each outcome
40
Statistical Analysis Plan: key elements
- Significance level (p values) and whether one- or two-sided
- Plan and rationale for adjustment for multiple testing, if applicable
- Confidence intervals to be reported
- Definition of population being analysed
- Inclusion and exclusion criteria
- Presentation of included and excluded data, if applicable
- Presentation of withdrawal data, if applicable
- Baseline patient characteristics
- Definitions of outcomes and sequence of measurement, including units used
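As one example of an adjustment for multiple testing that a plan might specify, the Bonferroni correction divides the significance level by the number of tests. The slides do not name a particular method, so this choice is an assumption made for illustration, as are the p-values.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag which p-values remain significant after a Bonferroni correction."""
    threshold = alpha / len(p_values)
    return [(p, p < threshold) for p in p_values]


# Hypothetical p-values from three planned tests, for illustration only.
results = bonferroni([0.001, 0.02, 0.04])
for p, significant in results:
    print(f"p = {p}: {'significant' if significant else 'not significant'}")
```

With three tests the per-test threshold becomes 0.05 / 3, so a p-value of 0.02 that would pass an unadjusted 0.05 threshold no longer does.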
41
Statistical Analysis Plan: key elements
- Calculation or transformations used to derive outcome
- Analysis methods used
- Covariates and adjustments
- Methods for checking distributional assumptions, including model fit
- Alternatives if assumptions are false
- Sensitivity analyses, if applicable
- Subgroup definitions and analyses, if applicable
- Method for handling missing data, if applicable
- Additional secondary analyses, if applicable
- Safety data analyses, if applicable
- Statistical packages to be used
42
Learning outcomes
You should now know about:
- Statistical Analysis Cycle
- Some useful approaches to analysing data
You should now be able to:
- Recognise different data types
- Use a flowchart to decide which analysis method to use
- Write a statistical analysis plan
43
Download the slides from the MASH website
MASH > statistics> workshop_statistics
44
Maths And Statistics Help
Statistics appointments: Mon-Fri (10am-1pm)
Statistics drop-in: Mon-Fri (10am-1pm), Weds (4-7pm)
45
Resources: All resources are available in paper form at MASH or on the MASH website
46
Contacts
Follow MASH on Twitter: @mash_uos
Staff (stats): Jenny Freeman, Basile Marquier, Marta Emmett