Published by William Dominic Summers. Modified over 6 years ago.
1
ASPIRE Class 5 Biostatistics and Data Collection Tools
Daniel M. Witt, PharmD, BCPS, FCCP
2
Learning Objectives
ASPIRE Class 5: Biostatistics
Differentiate between descriptive and inferential statistics
Choose an appropriate statistical test based on the type of data being analyzed
Describe the concepts of normal distribution, population, and sample
Formulate the analytic plan for a research study
Evaluate various data collection tools and databases to collect study data
3
Elements of a Research Protocol
Background Population Design Objectives Procedures Analytical Plan
4
Class 5 Assignment
Please come prepared with the above items
October 19th, 2:30-5:00
Kaiser Permanente Central Support Services
5
Biostatistics
Daniel M. Witt, PharmD, FCCP, BCPS, CACP
Kaiser Permanente Colorado
6
Why Biostatistics?
Which medical practices actually help?
Determining which therapies are helpful based on simple experience alone doesn’t work, because of:
Biologic variability
Placebo effect
7
An Example
Drug appears to be effective at increasing cardiac output (CO)
[Figure: cardiac output plotted against drug dose]
8
An Example
Clearly no relationship between drug dose and CO
[Figure: cardiac output plotted against drug dose]
9
Biostatistics: A useful tool
Turns clinical and laboratory experience into quantitative statements
Determines whether and by how much a treatment or procedure affected a group of patients
Turns boring data into an interesting story
10
Learning Point
Experiments rarely include the entire population
Selecting unrepresentative samples (bad luck) is unlikely but possible
Biostatistical procedures permit estimation of the chance of such bad luck
Tell a story (who, what, why, where, how)
11
General Research Goals
Obtain descriptive information about a population based on a sample of that population
Test hypotheses about the population
Minimize bias
12
Random Variables
Definition: outcomes of an experiment or observation whose values cannot be anticipated with certainty
Two types: discrete and continuous
“This is important because…” choosing (and evaluating) statistical methods depends, in part, on the type of data (variables) used
13
Discrete (counting) Variables
Two types:
Nominal: classified into groups in no particular order, and with no indication of relative severity (e.g., sex, mortality, disease state, bleeding, stroke, MI)
Ordinal: ranked in a specific order, but with no consistent level of magnitude difference between ranks (e.g., NYHA class, trauma score)
Caution: mean and standard deviation are NOT reported with this type of data
14
Continuous (measuring) Variables
Data are ranked in a specific order with a consistent change in magnitude between units (e.g., heart rate, LDL cholesterol, blood glucose, INR, blood pressure, time, distance)
15
Summarizing Data
Normal distribution: bell-shaped frequency distribution (most common model for population distributions)
Landmarks: x̄ (mean) and SD (standard deviation)
[Figure: normal curve with N=200, mean=40, SD=5.0; x-axis from 30 to 50]
16
Mean (average)
Only used for continuous, normally distributed data
Sensitive to outliers
Most commonly used measure of central tendency
[Figure: distribution with SD=2.5; x-axis from 10 to 20]
17
Non-Normal Distributions
Although mean and SD can be calculated for any population, they do not summarize a non-normal distribution as well as they summarize a normal one
A better approach is to use percentiles
[Figure: non-normal distribution with N=100, mean=37.6, SD=4.5, annotated with mean ± SD]
18
Median (50th percentile)
Half of observations fall below and half lie above
Can be used for ordinal or continuous data
Insensitive to outliers
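The contrast between a mean that is sensitive to outliers and a median that is not can be illustrated with a minimal sketch. The heart-rate readings below are hypothetical values chosen for illustration; they are not data from the presentation.

```python
from statistics import mean, median

# Hypothetical heart-rate readings (beats/min), assumed for illustration only.
readings = [68, 70, 72, 74, 76]
with_outlier = readings + [180]  # add one extreme value

mean_before, median_before = mean(readings), median(readings)
mean_after, median_after = mean(with_outlier), median(with_outlier)

# The mean is pulled strongly toward the outlier; the median barely moves.
print(f"Without outlier: mean={mean_before}, median={median_before}")
print(f"With outlier:    mean={mean_after}, median={median_after}")
```

In this made-up sample a single extreme reading shifts the mean by 18 beats/min while the median changes by only 1.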
19
Percentiles (25th percentile, 75th percentile)
The point in a distribution at which a value is larger than 25% or 75% of the other values in the sample
Does not assume that the population has a normal distribution
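As a rough sketch of summarizing skewed data with percentiles, the snippet below uses Python's `statistics.quantiles`. The length-of-stay values are hypothetical, and the "inclusive" interpolation method is just one of several common percentile conventions.

```python
from statistics import mean, quantiles

# Hypothetical, right-skewed lengths of stay (days), assumed for illustration.
length_of_stay = [1, 2, 2, 3, 3, 4, 5, 8, 30]

# n=4 splits the data into quartiles: 25th, 50th (median), and 75th percentiles.
q25, q50, q75 = quantiles(length_of_stay, n=4, method="inclusive")
sample_mean = mean(length_of_stay)

print(f"25th percentile: {q25}")
print(f"Median:          {q50}")
print(f"75th percentile: {q75}")
print(f"Mean:            {sample_mean:.2f}")
```

Here the mean lies above the 75th percentile, which is one sign that the mean describes this sample poorly.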
20
Standard Deviation (SD)
Appropriately applied only to data that are normally or near normally distributed
Applicable only to continuous data
Within ± 1 SD of the mean are found 68% of the sample’s values
Within ± 2 SD of the mean are found 95% of the sample’s values
[Figure: normal curve marked at the mean, ± 1 SD (68%), and ± 2 SD (95%)]
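The 68% and 95% figures can be checked against a theoretical normal distribution. The sketch below assumes a normal distribution with mean 40 and SD 5 (the values shown on the earlier summarizing-data slide) and uses `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

# Assumed normal distribution: mean 40, SD 5.
dist = NormalDist(mu=40, sigma=5)

# Proportion of values within 1 SD (35 to 45) and within 2 SD (30 to 50).
within_1_sd = dist.cdf(45) - dist.cdf(35)
within_2_sd = dist.cdf(50) - dist.cdf(30)

print(f"Within 1 SD: {within_1_sd:.1%}")
print(f"Within 2 SD: {within_2_sd:.1%}")
```

The exact proportions are slightly above 68% and 95%; the slide values are the usual rounded rules of thumb.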
21
Hypothesis Testing
The null hypothesis (Ho):
Posits no difference between the groups being compared (Group A = Group B)
Is a statistical convention (but a good one)
Is used to assist in determining whether any observed difference between groups is due to chance alone (bad luck)
In other words, is any observed difference likely due to sampling variation?
22
Hypothesis Testing
Example: A new anti-obesity medication is compared to an existing one to determine if one agent is better at achieving goal BMI at the recommended starting dose.
Results:
Ho: success rate for new drug = success rate for old drug
23
Hypothesis Testing
Tests for statistical significance determine if the data are consistent with Ho
If Ho is “rejected” = statistically significant difference between groups (unlikely due to chance or ‘bad luck’)
If Ho is “accepted” (more precisely, not rejected) = no statistically significant difference between groups (results may be due to ‘bad luck’)
24
Hypothesis Testing
The distribution (range of values) of a test statistic when Ho is true is known
Depending on this statistic’s value, Ho is accepted or rejected
Choosing the appropriate statistical test depends on:
Type of data (nominal, ordinal, continuous)
Study design (parallel, cross-over, etc.)
Presence of confounding variables
25
Hypothesis Testing
For our example, the data are nominal and the design is parallel with no confounders, so the appropriate test is χ²
[Figure: frequency distribution of χ² when Ho is true, with critical values 3.84 (p = 0.05) and 6.64 (p = 0.01) marked]
26
Hypothesis Testing
Large values of χ² are possible when Ho is true, but they occur infrequently (χ² exceeds 3.84 only 5% of the time and exceeds 6.64 only 1% of the time)
These extreme values are used to demarcate the point(s) at which Ho is accepted or rejected
27
Hypothesis Testing
For our example:
Using the data in the formula for calculating χ² yields a value of 1.64
Because 1.64 < 3.84, accept Ho and say that the new drug is not statistically significantly better than the old drug in getting patients to their goal BMI with the recommended starting dose
[Figure: χ² distribution with the calculated value 1.64 falling below the critical value 3.84]
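The presentation's actual counts are not recoverable from this transcript, so the following is only a sketch of how a χ² statistic for a 2x2 table might be computed. The success/failure counts are hypothetical and do not reproduce the 1.64 reported on the slide. The calculation uses the Pearson formula without a continuity correction, and the p-value formula applies only to 1 degree of freedom.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if Ho (no difference between groups) is true.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic

def p_value_df1(statistic):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return erfc(sqrt(statistic / 2))

# Hypothetical counts: [reached goal BMI, did not reach goal BMI]
counts = [
    [30, 20],  # new drug (assumed)
    [24, 26],  # old drug (assumed)
]

chi2 = chi_square_2x2(counts)
p = p_value_df1(chi2)
CRITICAL_VALUE = 3.84  # chi-square critical value for alpha = 0.05, 1 df

print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
print("Reject Ho" if chi2 > CRITICAL_VALUE else "Do not reject Ho")
```

With these assumed counts the statistic falls below 3.84, so the decision matches the one on the slide even though the numbers differ.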
28
Decision Errors
29
Decision Errors
The probability of making a Type I error (rejecting Ho when it is true) is defined as the significance level α
Setting α at 0.05 effectively means that a Type I error will occur 1 out of 20 times when Ho is actually true
The calculated probability of obtaining a difference at least as large as the one observed, if Ho is true, is called the “p-value”
When the α level is set a priori, Ho is rejected when p < α
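One way to see what a significance level of 0.05 means is to simulate many studies in which Ho is true by construction and count how often Ho is rejected anyway. The sketch below is an illustration under assumed settings (two groups of 30 drawn from the same normal population with a known SD, compared with a two-sided z-test); none of these settings come from the presentation.

```python
import random
from statistics import NormalDist, mean

random.seed(0)  # fixed seed so the simulation is repeatable

ALPHA = 0.05
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-sided critical value

n, sigma, trials = 30, 1.0, 2000  # assumed group size, known SD, number of studies
rejections = 0
for _ in range(trials):
    # Both groups come from the same population, so Ho is true in every trial.
    group_a = [random.gauss(0, sigma) for _ in range(n)]
    group_b = [random.gauss(0, sigma) for _ in range(n)]
    z = (mean(group_a) - mean(group_b)) / (sigma * (2 / n) ** 0.5)
    if abs(z) > z_crit:
        rejections += 1  # a Type I error: Ho rejected although it is true

type_i_rate = rejections / trials
print(f"Observed Type I error rate: {type_i_rate:.3f}")
```

The observed rate should land near 0.05, that is, roughly 1 false rejection in every 20 studies where no real difference exists.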
30
Decision Errors
The probability of making a Type II error (accepting Ho when it should be rejected) is termed β
By convention, β should be < 0.20
31
Decision Errors
Power (1-β)
The ability to detect actual differences between groups
Power is increased by:
Increasing α
Increasing n
Large differences between populations
Power is decreased by:
Poor study design
Incorrect statistical tests
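The three factors listed as increasing power can be explored with a normal-approximation formula. This is a minimal sketch assuming a two-sided, two-sample z-test with equal group sizes and a known common SD; the function name and all numeric inputs are hypothetical.

```python
from statistics import NormalDist

def power_two_sample_z(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test (equal n, known sigma)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # True difference expressed in standard-error units.
    shift = abs(delta) / (sigma * (2 / n_per_group) ** 0.5)
    # Probability that the test statistic falls in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

# Assumed baseline: true difference 5, SD 10, 64 subjects per group, alpha 0.05.
baseline = power_two_sample_z(delta=5, sigma=10, n_per_group=64)
larger_alpha = power_two_sample_z(delta=5, sigma=10, n_per_group=64, alpha=0.10)
larger_n = power_two_sample_z(delta=5, sigma=10, n_per_group=128)
larger_delta = power_two_sample_z(delta=8, sigma=10, n_per_group=64)

print(f"Baseline power:          {baseline:.2f}")
print(f"With alpha = 0.10:       {larger_alpha:.2f}")
print(f"With n = 128 per group:  {larger_n:.2f}")
print(f"With larger difference:  {larger_delta:.2f}")
```

Under these assumptions the baseline power is close to the conventional 0.80, and each of the three changes raises it.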
32
Statistical Significance: Areas for Vigilance
Size of p-value is not related to the importance of the result
Statistically significant does not necessarily mean clinically significant
Lack of statistical significance does not mean results are unimportant
33
Choosing a Statistical Test
Parametric versus non-parametric
Parametric tests assume an underlying normal distribution
Non-parametric tests are used for:
Non-normally distributed data
Nominal or ordinal data
34
Choosing a Statistical Test: Continuous Data
Student’s t-test
1 sample: compares mean of study population to the mean of a population whose mean is known
2 sample (independent samples): compares the means of 2 normal distributions
Paired: compares the means of paired or matched samples
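A two-sample (independent samples) t statistic can be computed by hand as a sketch. The blood pressure values are hypothetical, the pooled-variance form assumes equal variances in the two groups, and the critical value 2.306 is the standard two-sided 0.05 table value for 8 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical systolic blood pressures (mm Hg), assumed for illustration.
group_a = [120, 125, 130, 135, 140]
group_b = [110, 115, 120, 125, 130]

n_a, n_b = len(group_a), len(group_b)
df = n_a + n_b - 2  # degrees of freedom

# Pooled variance: weighted average of the two sample variances.
pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))

t_stat = (mean(group_a) - mean(group_b)) / std_error
T_CRITICAL = 2.306  # two-sided alpha = 0.05 with 8 df (from a t table)

print(f"t = {t_stat:.2f} with {df} degrees of freedom")
print("Reject Ho" if abs(t_stat) > T_CRITICAL else "Do not reject Ho")
```

In this made-up sample a 10 mm Hg difference in means is not statistically significant because each group has only 5 observations.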
35
Choosing a Statistical Test: Continuous Data
Analysis of variance (ANOVA)
Compares the means of 3 or more groups in a study
Multiple comparison procedures are used to determine which groups actually differ from each other (e.g., Bonferroni, Tukey, Scheffe, others)
Analysis of covariance (ANCOVA)
Controls for the effects of confounding variables
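As a sketch of what a one-way ANOVA computes, the snippet below derives the F statistic from between-group and within-group variation, then shows the Bonferroni-adjusted significance level for the pairwise follow-up comparisons. The three groups of values are hypothetical, and 5.14 is the standard table value for F with 2 and 6 degrees of freedom at the 0.05 level.

```python
from statistics import mean

# Hypothetical measurements for three treatment groups (assumed data).
groups = [
    [1, 2, 3],
    [2, 3, 4],
    [5, 6, 7],
]

all_values = [value for group in groups for value in group]
grand_mean = mean(all_values)

# Between-group sum of squares: how far each group mean is from the grand mean.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of values around their own group mean.
ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

F_CRITICAL = 5.14  # alpha = 0.05 with (2, 6) degrees of freedom (from an F table)
print(f"F = {f_stat:.2f} with ({df_between}, {df_within}) degrees of freedom")
print("Reject Ho" if f_stat > F_CRITICAL else "Do not reject Ho")

# Bonferroni: divide alpha by the number of pairwise comparisons.
n_comparisons = len(groups) * (len(groups) - 1) // 2
bonferroni_alpha = 0.05 / n_comparisons
print(f"Bonferroni-adjusted alpha for {n_comparisons} comparisons: {bonferroni_alpha:.4f}")
```

A significant F only says that at least one group differs; the adjusted alpha is what each pairwise comparison would then be held to under the Bonferroni procedure.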
36
Choosing a Statistical Test: Ordinal Data
Wilcoxon rank sum
Mann-Whitney U
Wilcoxon signed rank
Kruskal-Wallis
Friedman
These tests may also be used for non-normally distributed continuous data
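These tests work on the ordering of values rather than on their magnitudes. As an illustration, the sketch below computes the Mann-Whitney U statistic by direct pairwise comparison. The ordinal scores are hypothetical, and the snippet stops at the statistic itself; converting U to a p-value requires tables or a statistical package.

```python
def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: count of pairs where a > b, with ties counted as 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Hypothetical ordinal scores (e.g., NYHA class) for two groups, assumed data.
treatment = [1, 2, 2, 3]
control = [3, 4, 4, 5]

u_treatment = mann_whitney_u(treatment, control)
u_control = mann_whitney_u(control, treatment)

# The two U values always sum to the number of pairs compared.
print(f"U (treatment) = {u_treatment}, U (control) = {u_control}")
```

A U near 0 for one group means almost every value in that group ranks below the values in the other group.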
37
Choosing a Statistical Test: Nominal Data
χ²: compares percentages between 2 or more groups
Fisher’s exact test: infrequent outcomes
McNemar’s: paired samples
Mantel-Haenszel: controls for influence of confounders
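When outcomes are infrequent, the expected counts become too small for χ² and Fisher's exact test is used instead. The sketch below computes a two-sided Fisher's exact p-value for a 2x2 table from the hypergeometric distribution. The counts are hypothetical, and the two-sided rule used here (summing all tables no more probable than the observed one) is one common convention.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row_1, row_2 = a + b, c + d
    col_1 = a + c
    total = row_1 + row_2

    def table_probability(k):
        # Hypergeometric probability of k in the top-left cell with margins fixed.
        return comb(row_1, k) * comb(row_2, col_1 - k) / comb(total, col_1)

    observed = table_probability(a)
    lowest = max(0, col_1 - row_2)
    highest = min(row_1, col_1)
    # Sum the probabilities of every table at least as extreme as the observed one.
    return sum(
        table_probability(k)
        for k in range(lowest, highest + 1)
        if table_probability(k) <= observed * (1 + 1e-9)
    )

# Hypothetical counts: 3 of 4 bleeds on drug A versus 1 of 4 on drug B (assumed).
p_fisher = fisher_exact_two_sided(3, 1, 1, 3)
print(f"Fisher's exact p = {p_fisher:.4f}")
```

With only 8 subjects, even a 75% versus 25% event rate gives a p-value near 0.49.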
38
95% Confidence Intervals
When the ABSOLUTE difference between groups is considered:
A 95% confidence interval that excludes zero is considered statistically significant
The 95% confidence interval also provides information regarding the MAGNITUDE of the difference between groups
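A 95% confidence interval for the absolute difference between two success rates can be sketched as follows. The counts are hypothetical, and the interval uses the simple normal-approximation (Wald) formula, which is an assumption that works reasonably only when group sizes are not small.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical results: successes out of patients treated (assumed data).
success_new, n_new = 30, 50
success_old, n_old = 24, 50

p_new = success_new / n_new
p_old = success_old / n_old
difference = p_new - p_old  # absolute difference in success rates

# Standard error of the difference between two independent proportions.
std_error = sqrt(p_new * (1 - p_new) / n_new + p_old * (1 - p_old) / n_old)
z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval

ci_lower = difference - z * std_error
ci_upper = difference + z * std_error

print(f"Difference = {difference:.2f}, 95% CI ({ci_lower:.3f} to {ci_upper:.3f})")
print("Significant" if ci_lower > 0 or ci_upper < 0 else "Not significant (CI includes zero)")
```

The interval includes zero, so the difference is not statistically significant, yet its width shows that the data are compatible with anything from a small disadvantage to a sizeable benefit.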
39
Regression
Regression is useful in constructing predictive models
Multiple regression involves modeling many possible predictor variables to ascertain which predict a particular target variable
Regression modeling is often used to control or adjust for the effects of confounding variables
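The simplest predictive model is a straight line with a single predictor, fitted by least squares. The sketch below uses hypothetical dose and cardiac output values; multiple regression extends the same idea to several predictors and normally requires a statistical package.

```python
from statistics import mean

# Hypothetical drug doses and cardiac output measurements (assumed data).
dose = [1, 2, 3, 4, 5]
cardiac_output = [2.1, 3.9, 6.2, 7.8, 10.0]

mean_x, mean_y = mean(dose), mean(cardiac_output)

# Least-squares slope: covariation of x and y divided by variation in x.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dose, cardiac_output))
s_xx = sum((x - mean_x) ** 2 for x in dose)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

def predict(x):
    """Predicted cardiac output for a given dose under the fitted line."""
    return intercept + slope * x

print(f"Fitted line: CO = {intercept:.2f} + {slope:.2f} x dose")
print(f"Predicted CO at dose 6: {predict(6):.2f}")
```

Predicting at dose 6 extrapolates beyond the observed doses, so such a prediction should be treated with caution.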
40
Example of predictive modeling
[Figure legend: expected performance derived from regression model; expected performance (99% CI); observed performance; observed differs from expected by >5%. Source: Circ Cardiovasc Qual Outcomes 2011;4:22-29]
41
Survival Analysis
Studies the time between entry into a study and some event (e.g., death)
Takes into account that some subjects leave the study due to reasons other than the ‘event’ (e.g., lost to follow up, study period ends)
May be utilized to arrive at different types of models:
Kaplan-Meier
Cox Regression Model
Proportional hazards regression analysis
42
Kaplan-Meier
Uses survival times (or censored survival times) to estimate the proportion of people who would survive a given length of time under the same circumstances
Allows for the production of a survival curve
Uses log-rank test to test for statistically significant differences between groups
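The Kaplan-Meier estimate can be sketched in a few lines: at each event time, the running survival proportion is multiplied by the fraction of at-risk subjects who survive that time. The follow-up data below are hypothetical, and the function name is illustrative rather than taken from any library.

```python
def kaplan_meier(observations):
    """Return [(time, estimated survival)] at each event time.

    observations: list of (time, event) pairs, where event is True if the
    subject had the event (e.g., death) and False if censored at that time.
    """
    event_times = sorted({time for time, event in observations if event})
    survival = 1.0
    curve = []
    for t in event_times:
        # Subjects still under observation just before time t.
        at_risk = sum(1 for time, _ in observations if time >= t)
        deaths = sum(1 for time, event in observations if time == t and event)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical follow-up (months): True = died, False = censored (assumed data).
follow_up = [(2, True), (3, False), (4, True), (5, True), (6, False)]

km_curve = kaplan_meier(follow_up)
for time, proportion in km_curve:
    print(f"Month {time}: estimated proportion surviving = {proportion:.3f}")
```

The subject censored at month 3 contributes to the at-risk count at month 2 but not afterwards, which is how censored survival times enter the estimate.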
43
Survival Analysis: Kaplan-Meier Survival Curve
[Figure: cumulative proportion surviving (0.0 to 1.0) versus time for treatment and control groups]
44
Cox Regression Modeling
Reported graphically like Kaplan-Meier
Investigates several variables at a time
Allows calculation of relative risk estimate while adjusting for differences between groups