Overview Biostatistics Applications of Statistics in Research

Overview Biostatistics: Applications of Statistics in Research
Bandit Thinkhamrop, Ph.D. (Statistics)
Department of Biostatistics and Demography, Faculty of Public Health, Khon Kaen University

Steps of Statistical Applications (Practical guides for beginners)
1. Begin at the conclusion
2. Identify the primary research question
3. Identify the primary study outcome
4. Identify the type of the study outcome
5. Identify the type of the study design
6. Generate a mock data set
7. Identify the type of the main statistical goal
8. List the choices of statistical methods
9. Select the most appropriate statistical method
10. Perform the data analysis using software
11. Report and interpret the results from the outputs

Begin at the conclusion

Identify the primary research question
- Where to find the research question?
  - Title of the study
  - The objective(s)
  - The conclusion(s)
- If more than one, find the primary aim.
- Try to make the question “quantifiable”.

Identify the primary study outcome
- It is the “primary” dependent variable.
- It is the main finding that was used as the basis for the conclusion of the study.
- It is the target of the statistical inference.
- It is the basis for the sample size calculation.
- It can be found in the:
  - Title
  - Research question
  - Objective
  - Sample size calculation
  - Main finding in the RESULTS section of the report
  - Conclusion

Type of the study outcome: key for selecting appropriate statistical methods
- The study outcome is the dependent variable or response variable.
- Focus on the primary study outcome if there is more than one.
- Types of study outcome:
  - Continuous
  - Categorical (dichotomous, polytomous, ordinal)
  - Numerical (Poisson) count
  - Event-free duration

Continuous outcome
- Primary target of estimation: mean (SD); median (min:max); correlation coefficient: r and ICC
- Modeling:
  - Linear regression: the model coefficient = mean difference
  - Quantile regression: the model coefficient = median difference
- Example: outcome = weight, BP, score of ?, level of ?, etc.
- RQ: factors affecting birth weight
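The statement that the linear regression coefficient equals the mean difference can be checked by hand for a single two-group predictor. The sketch below uses invented birth weights and a hypothetical helper, `ols_slope`; it is an illustration under those assumptions, not part of the original slides.

```python
from statistics import mean

# Hypothetical birth weights in grams (invented for illustration).
# exposed = 0 for the unexposed group, 1 for the exposed group.
weights = [3200, 3400, 3100, 3300, 2900, 3000, 2800, 3100]
exposed = [0, 0, 0, 0, 1, 1, 1, 1]

def ols_slope(x, y):
    """Least-squares slope of y on x: S_xy / S_xx."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

slope = ols_slope(exposed, weights)

# Plain difference between the two group means.
mean_exposed = mean(w for w, e in zip(weights, exposed) if e == 1)
mean_unexposed = mean(w for w, e in zip(weights, exposed) if e == 0)
mean_diff = mean_exposed - mean_unexposed

print(f"Regression coefficient: {slope:.1f} g")
print(f"Mean difference:        {mean_diff:.1f} g")
```

With a binary predictor coded 0/1, the two quantities coincide, which is why the slide treats the coefficient as a mean difference.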

Categorical outcome
- Primary target of estimation: proportion or risk
- Modeling: logistic regression; the exponentiated model coefficient = odds ratio (OR)
- Example: outcome = disease (y/n), dead (y/n), cured (y/n), etc.
- RQ: factors affecting low birth weight
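For a single dichotomous predictor, the odds ratio that logistic regression reports (as the exponentiated coefficient) equals the cross-product ratio of the 2x2 table. A minimal sketch with invented counts follows; the Woolf (log-scale) interval is one common choice of confidence interval and is an assumption here, not something the slides specify.

```python
import math

# Hypothetical 2x2 table (invented counts).
# Rows = exposure, columns = low birth weight (yes / no).
a, b = 30, 70   # exposed:   30 with the outcome, 70 without
c, d = 15, 85   # unexposed: 15 with the outcome, 85 without

risk_exposed = a / (a + b)      # proportion (risk) in the exposed group
risk_unexposed = c / (c + d)    # proportion (risk) in the unexposed group
odds_ratio = (a * d) / (b * c)  # cross-product ratio

# Woolf (log-scale) 95% confidence interval for the odds ratio.
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"Risk: {risk_exposed:.2f} vs {risk_unexposed:.2f}")
print(f"OR = {odds_ratio:.2f} (95% CI: {lower:.2f} to {upper:.2f})")
```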

Numerical (Poisson) count outcome
- Primary target of estimation: incidence rate (e.g., rate per person-time)
- Modeling: Poisson regression; the exponentiated model coefficient = incidence rate ratio (IRR)
- Example: outcome = total number of falls / total time at risk of falling
- RQ: factors affecting falls in the elderly
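The incidence rate and the incidence rate ratio are simple ratios of event counts to time at risk. The sketch below uses invented fall counts and follow-up times for two hypothetical groups; it only illustrates the arithmetic, not a Poisson regression fit.

```python
# Hypothetical data (invented): falls and person-years at risk in two groups.
falls_a, person_years_a = 24, 400.0
falls_b, person_years_b = 12, 400.0

rate_a = falls_a / person_years_a   # incidence rate in group A
rate_b = falls_b / person_years_b   # incidence rate in group B
irr = rate_a / rate_b               # incidence rate ratio, A vs B

print(f"Rate A = {rate_a * 100:.1f} falls per 100 person-years")
print(f"Rate B = {rate_b * 100:.1f} falls per 100 person-years")
print(f"IRR    = {irr:.2f}")
```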

Event-free duration outcome
- Primary target of estimation: median survival time
- Modeling: Cox regression; the exponentiated model coefficient = hazard ratio (HR)
- Example: outcome = overall survival, disease-free survival, progression-free survival, etc.
- RQ: factors affecting survival
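Median survival time is usually read off a Kaplan-Meier curve. The slides do not name the estimator, so the Kaplan-Meier method is an assumption here, and the follow-up data and function names (`kaplan_meier`, `median_survival`) are invented for illustration. The sketch covers only the estimation target, not Cox regression.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times  : follow-up time for each subject
    events : 1 if the event occurred at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        leaving = sum(1 for tt, _ in data if tt == t)  # events + censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
        i += leaving
    return curve

def median_survival(curve):
    """Smallest time at which the survival estimate is 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up

# Hypothetical follow-up in months; event = 0 marks a censored subject.
times = [2, 3, 3, 5, 6, 8, 9, 12]
events = [1, 1, 0, 1, 1, 0, 1, 1]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t:>2} months, S(t) = {s:.3f}")
print("Median survival:", median_survival(curve), "months")
```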

The outcome determines the statistics

Outcome type   Target of estimation                       Regression model
Continuous     Mean, median                               Linear regression
Categorical    Proportion (prevalence or risk)            Logistic regression
Count          Rate per “space”                           Poisson regression
Survival       Median survival; risk of events at T(t)    Cox regression

Statistics quantify errors for judgments
- Parameter estimation [95%CI]
- Hypothesis testing [P-value]

Types of Research
- Qualitative
  - Phenomenology
  - Grounded theory
  - Ethnography
  - Description
- Quantitative
  - Observational
    - Descriptive: cross-sectional descriptive, prevalence survey, poll
    - Analytical
      - Cross-sectional
      - Case-control: prevalence case-control, nested case-control, case-cohort case-control
      - Cohort: prospective cohort, retrospective cohort, ambi-spective cohort
  - Experimental
    - Quasi-experimental
    - Randomized-controlled
    - Clinical trial, field trial, community intervention trial
    - Parallel, cross-over, or factorial
    - Fixed length or group sequential
    - With or without baseline
- Systematic review, meta-analysis

Caution about biases
- Selection bias
- Information bias
- Confounding bias
Research design: prevent them, minimize them.

Caution about biases
- Selection bias (SB)
- Information bias (IB)
- Confounding bias (CB)
If data are available:
- SB and IB can be assessed.
- CB can be adjusted for using multivariable analysis.

Generate a mock data set
General format of the data layout (columns are variables, rows are data):

id    y    x1    x2    x3
1
2
3
4
5
…
n

Generate a mock data set: continuous outcome example
[Table: mock data laid out as id, y, x1, x2, x3 for subjects 1 to n; most cell values are lost in the transcript.]
Summary statistic for the outcome: mean (SD)

Generate a mock data set: continuous outcome example
[Table: mock data laid out as id, y, x1, x2, x3 for subjects 1 to n; most cell values are lost in the transcript.]
Summary statistic: n, percentage
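A mock data set in the id, y, x1, x2, x3 layout can be generated before any real data exist, so that the planned summaries can be rehearsed. The sketch below is one way to do it; every value, distribution, and name (`make_mock_data`, the meaning attached to x1 to x3) is invented for illustration.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the mock data are reproducible

def make_mock_data(n):
    """Build n mock records with the layout id, y, x1, x2, x3 (all invented)."""
    rows = []
    for i in range(1, n + 1):
        rows.append({
            "id": i,
            "y": round(random.gauss(20, 5), 1),    # continuous outcome
            "x1": random.randint(0, 1),            # dichotomous predictor
            "x2": random.randint(18, 60),          # e.g. age in years
            "x3": random.choice(["a", "b", "c"]),  # polytomous predictor
        })
    return rows

mock = make_mock_data(10)

# Continuous variable: summarize as mean (SD).
y_values = [row["y"] for row in mock]
print(f"y: mean (SD) = {mean(y_values):.1f} ({stdev(y_values):.1f})")

# Categorical variable: summarize as n and percentage.
n_x1 = sum(row["x1"] for row in mock)
print(f"x1 = 1: n = {n_x1} ({100 * n_x1 / len(mock):.0f}%)")
```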

Common types of statistical goals
- Single measurements (no comparison)
- Difference (compared by subtraction)
- Ratio (compared by division)
- Prediction (diagnostic test or predictive model)
- Correlation (examine a joint distribution)
- Agreement (examine concordance or similarity between pairs of observations)

Dependency of the study outcome requires special statistical methods to handle it
Examples of dependent or correlated data:
- Before-after or pre-post design
- Measuring paired organs, e.g., ears, eyes, arms, etc.
- Longitudinal data, repeated measurements
- Clustered data, many observation units within a cluster
Choices of approaches:
- Ignore it: use an ordinary analysis as if the data were independent (not safe)
- Simplify it: use a summary measure, then analyze the data as if they were independent (not efficient)
- Handle it: mixed model, multilevel modeling, GEE (recommended)

Dependency of the study outcome requires special statistical methods to handle it

Outcome type   Target of estimation                       Regression model
Continuous     Mean, median                               Linear regression
Categorical    Proportion (prevalence or risk)            Logistic regression
Count          Rate per “space”                           Poisson regression
Survival       Median survival; risk of events at T(t)    Cox regression

For dependent outcomes of any type: mixed model, multilevel model, GEE

Back to the conclusion

Outcome type   Target of estimation
Continuous     Mean, median
Categorical    Proportion (prevalence or risk)
Count          Rate per “space”
Survival       Median survival; risk of events at T(t)

Appropriate statistical methods give the magnitude of effect, the 95% CI, and the P-value.
Answer the research question based on the lower or upper limit of the CI.

Always report the magnitude of effect and its confidence interval
- Absolute effects:
  - Mean, mean difference
  - Proportion or prevalence, rate or risk, rate or risk difference
  - Median survival time
- Relative effects:
  - Relative risk, rate ratio, hazard ratio
  - Odds ratio
- Other magnitudes of effect:
  - Correlation coefficient (r), intra-class correlation (ICC)
  - Kappa
  - Diagnostic performance
  - Etc.

Touch the variability (uncertainty) to understand statistical inference id A (x- ) (x- ) 2 1 2 -2 4 3 -4 16 5 14 10 100 Sum () 20 128 Mean( ) 32.0 SD 5.66 Median X X 2+2+0+2+14 = 20 2+2+0+2+14 = 20 = 4 5 5 0 2 2 2 14 Variance = SD2 X Standard deviation = SD

Touch the variability (uncertainty) to understand statistical inference
For data set A (2, 2, 0, 2, 14):
- Measures of central tendency: mean = 4, median = 2
- Measures of variation: variance = 32.0, SD = 5.66

Degrees of freedom
Standard deviation (SD) = the average distance between each data item and the mean:
$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
The divisor $n - 1$ is the degrees of freedom.
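The hand calculation for data set A (2, 2, 0, 2, 14) can be reproduced in a few lines. This sketch uses Python's standard `statistics` module, whose `variance` and `stdev` functions divide by n - 1; the variable names are illustrative.

```python
from statistics import mean, median, stdev, variance

a = [2, 2, 0, 2, 14]  # data set A from the slides

x_bar = mean(a)
squared_dev = [(x - x_bar) ** 2 for x in a]
df = len(a) - 1  # degrees of freedom

var_by_hand = sum(squared_dev) / df  # 128 / 4
sd_by_hand = var_by_hand ** 0.5

print(f"Mean = {x_bar}, median = {median(a)}")
print(f"Sum of squared deviations = {sum(squared_dev)}")
print(f"Variance = {var_by_hand:.1f}, SD = {sd_by_hand:.2f}")

# The library functions use the same n - 1 divisor.
print(f"statistics.variance = {variance(a)}, statistics.stdev = {stdev(a):.2f}")
```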

Same mean BUT different variation

Data set   n   Sum   Mean   SD     Description
A          5   20    4      5.66   Heterogeneous data, skewed distribution
B          5   20    4      2.91   Heterogeneous data, symmetric distribution
C          5   20    4      0.71   Homogeneous data, symmetric distribution

Facts about Variation
- Because of variability, repeated samples will NOT yield the same statistic, such as a mean or proportion:
  - Statistics vary from study to study because of the role of chance.
  - It is hard to believe that the statistic is the parameter.
  - Thus we need statistical inference to estimate the parameter based on the statistics obtained from a study.
- Data that vary widely = heterogeneous data
- Heterogeneous data require a large sample size to achieve a conclusive finding.

The Histogram
[Figure: histograms of data sets A and B, each on a horizontal axis running from 1 to 14.]

The Frequency Curve
[Figure: frequency curves of data sets A and B, each on a horizontal axis running from 1 to 14.]

Area Under The Frequency Curve
[Figure: area under the frequency curves of data sets A and B, each on a horizontal axis running from 1 to 14.]

Central Limit Theorem
[Figure: three raw-data distributions (right-skewed $X_1$, symmetric $X_2$, left-skewed $X_3$); in each case the sample means $\bar{X}_1, \ldots, \bar{X}_n$ are normally distributed around $\mu$.]

Central Limit Theorem
[Figure: distribution of the raw data ($X_1$, $X_2$, $X_3$) compared with the distribution of the sampling mean ($\bar{X}_1, \ldots, \bar{X}_n$).]

Central Limit Theorem
[Figure: with a large sample, the distribution of the sampling mean follows the (theoretical) normal distribution, whatever the distribution of the raw data.]

Central Limit Theorem
- Raw data: many $X$, with mean $\mu$ and standard deviation SD
- Sampling means: many $\bar{X}$ ($\bar{X}_1, \ldots, \bar{X}_n$), with mean $\mu$ and standard deviation SE
- The standard deviation of the sampling mean is the standard error (SE), estimated by $SE = \frac{SD}{\sqrt{n}}$
- With a large sample, the sampling mean can be standardized for whatever n: mean = 0, standard deviation = 1
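The relationship SE = SD / sqrt(n) can be checked by simulation. The sketch below draws repeated samples from a right-skewed population (an exponential distribution with mean 1 and SD 1, chosen here purely for illustration) and compares the standard deviation of the sample means with the theoretical standard error. The sample size, number of repetitions, and seed are arbitrary assumptions.

```python
import random
from statistics import fmean, pstdev

random.seed(0)  # fixed seed so the simulation is reproducible

n = 30            # size of each sample
n_samples = 5000  # number of repeated samples

# Each entry is the mean of one sample of size n from a skewed population.
sample_means = [
    fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(n_samples)
]

observed_se = pstdev(sample_means)  # SD of the sample means
theoretical_se = 1.0 / n ** 0.5     # population SD / sqrt(n)

print(f"Mean of the sample means: {fmean(sample_means):.3f}")
print(f"Observed SE:    {observed_se:.3f}")
print(f"Theoretical SE: {theoretical_se:.3f}")
```

Even though the raw data are skewed, the sample means cluster around the population mean with a spread close to SD / sqrt(n).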

(Theoretical) Normal Distribution

99.73% of AUC Mean ± 3SD

95.45% of AUC Mean ± 2SD

68.27% of AUC Mean ± 1SD
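The areas under the normal curve within 1, 2, and 3 standard deviations of the mean can be computed from the error function in Python's standard `math` module. The helper name `area_within` is illustrative.

```python
import math

def area_within(k):
    """Area under the standard normal curve between -k and +k SD."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"Mean +/- {k}SD: {100 * area_within(k):.2f}% of AUC")
```

To two decimal places this gives 68.27%, 95.45%, and 99.73%.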

Sample: n = 25, $\bar{X} = 52$, SD = 5
Population:
- Parameter estimation [95%CI]
- Hypothesis testing [P-value]

$SE = \frac{SD}{\sqrt{n}} = \frac{5}{\sqrt{25}} = \frac{5}{5} = 1$
Z values: Z = 1.64 (90%), Z = 1.96 (95%), Z = 2.58 (99%)

Sample: n = 25, $\bar{X} = 52$, SD = 5, SE = 1
Population parameter estimation (Z = 1.64, 1.96, 2.58)
95% CI: 52 - 1.96(1) to 52 + 1.96(1) = 50.04 to 53.96
We are 95% confident that the population mean lies between 50.04 and 53.96.

Sample: n = 25, $\bar{X} = 52$, SD = 5, SE = 1
Population hypothesis testing: $H_A: \mu \neq 55$
$Z = \frac{55 - 52}{1} = 3$

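Hypothesis testing
$H_0: \mu = 55$; $H_A: \mu \neq 55$
[Figure: sampling distribution centered at 55 and spanning -3SE to +3SE, with the sample mean of 52 at -3SE.]
$Z = \frac{55 - 52}{1} = 3$
P-value = 1 - 0.9973 = 0.0027
If the true mean in the population is 55, the chance of obtaining a sample mean of 52 or more extreme is 0.0027.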
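The worked example (n = 25, sample mean 52, SD 5, null value 55) can be reproduced as follows. The sketch follows the slides in using the normal (Z) distribution; with a sample this small and an estimated SD, a t distribution would often be preferred, which is a point the slides do not address.

```python
import math

n, x_bar, sd = 25, 52.0, 5.0
mu_0 = 55.0  # population mean under the null hypothesis

se = sd / math.sqrt(n)  # standard error of the mean

# 95% confidence interval for the population mean.
ci_lower = x_bar - 1.96 * se
ci_upper = x_bar + 1.96 * se

# Z statistic and two-sided P-value from the standard normal distribution.
z = (mu_0 - x_bar) / se
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"SE = {se:.2f}")
print(f"95% CI: {ci_lower:.2f} to {ci_upper:.2f}")
print(f"Z = {z:.2f}, P-value = {p_value:.4f}")
```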

Report and interpret p-value appropriately
Example of over-reliance on the p-value:
- Real results: n = 5900; OR (Drug A vs Drug B) = 1.02 (P < 0.001)
- Inappropriate: quoting the p-value as < 0.05, or using stars (* to ****) to indicate significant results
- Wrong: "Drug A is highly significantly better than Drug B (P < 0.001)"
- What if the 95% CI were 1.001 to 1.300? This is not clinically meaningful at all.

Report and interpret p-value appropriately
Example of over-reliance on the p-value:
- Real results: n = 30; OR (Drug A vs Drug B) = 9.2 (P = 0.715)
- Inappropriate: quoting the p-value as > 0.05
- Wrong: "There is no statistically significant difference in the treatment effect (P > 0.05). Thus Drug A is as effective as Drug B."
- What if the 95% CI were 0.99 to 28.97? This study indicates low power; it does NOT suggest equivalence.
- Correct: "There was insufficient information to conclude that . . ." => inconclusive findings

P-value is the magnitude of chance, NOT the magnitude of effect
P-value < 0.05 = significant findings
- Small chance of being wrong in rejecting the null hypothesis
- If in fact there is no [effect], it is unlikely to get an [effect] = [magnitude of effect] or more extreme
- Significance DOES NOT MEAN importance
- Any extra-large study can give a very small P-value even if the [magnitude of effect] is very small

P-value is the magnitude of chance, NOT the magnitude of effect
P-value > 0.05 = non-significant findings
- High chance of being wrong in rejecting the null hypothesis
- If in fact there is no [effect], an [effect] = [magnitude of effect] or more extreme can occur by chance
- Non-significance DOES NOT MEAN no difference, equality, or no association
- Any small study can give a very large P-value even if the [magnitude of effect] is very large
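The dependence of the P-value on sample size can be seen with a simple two-group z-test for a difference in means. The effect sizes, SD, and group sizes below are invented, and `two_sided_p` is a hypothetical helper that assumes equal group sizes and a known common SD.

```python
import math

def two_sided_p(effect, sd, n_per_group):
    """Two-sided z-test P-value for a mean difference between two equal groups."""
    se = sd * math.sqrt(2 / n_per_group)
    z = effect / se
    return math.erfc(abs(z) / math.sqrt(2))

# Tiny effect (0.5 units, SD 10) in a very large study.
small_effect_large_study = two_sided_p(effect=0.5, sd=10, n_per_group=50_000)

# Large effect (8 units, SD 10) in a very small study.
large_effect_small_study = two_sided_p(effect=8.0, sd=10, n_per_group=5)

print(f"Effect 0.5, n = 50,000 per group: P = {small_effect_large_study:.2e}")
print(f"Effect 8.0, n = 5 per group:      P = {large_effect_small_study:.3f}")
```

The tiny effect is "significant" and the large effect is not, which is why the magnitude of effect and its confidence interval need to be reported alongside any P-value.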

P-value vs. 95%CI (1)
An example of a study with a dichotomous outcome: a study compared the cure rate between Drug A and Drug B.
- Setting:
  - Drug A = alternative treatment
  - Drug B = conventional treatment
- Results:
  - Drug A: n1 = 50, Pa = 80%
  - Drug B: n2 = 50, Pb = 50%
  - Pa - Pb = 30% (95%CI: 26% to 34%; P = 0.001)
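A confidence interval for a difference between two proportions can be sketched with the usual normal-approximation (Wald) formula. The choice of the Wald method is an assumption; the slides do not say how their interval was obtained. With only 50 patients per group, the interval this formula returns is wider than the illustrative interval quoted on the slide.

```python
import math

n1, pa = 50, 0.80  # Drug A: sample size and cure proportion
n2, pb = 50, 0.50  # Drug B: sample size and cure proportion

diff = pa - pb

# Wald standard error of the difference between two independent proportions.
se_diff = math.sqrt(pa * (1 - pa) / n1 + pb * (1 - pb) / n2)

ci_lower = diff - 1.96 * se_diff
ci_upper = diff + 1.96 * se_diff

print(f"Pa - Pb = {100 * diff:.0f}%")
print(f"95% CI: {100 * ci_lower:.1f}% to {100 * ci_upper:.1f}%")
```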

P-value vs. 95%CI (2)
[Figure: the difference Pa - Pb = 30% (95%CI: 26% to 34%; P < 0.05) plotted on an axis with "Pb > Pa" on one side and "Pa > Pb" on the other.]

P-value vs. 95%CI (3) Adapted from: Armitage, P. and Berry, G. Statistical methods in medical research. 3rd edition. Blackwell Scientific Publications, Oxford. 1994. page 99

P-value vs. 95%CI (4) Adapted from: Armitage, P. and Berry, G. Statistical methods in medical research. 3rd edition. Blackwell Scientific Publications, Oxford. 1994. page 99
There was a statistically significant difference between the two groups.

P-value vs. 95%CI (5) Adapted from: Armitage, P. and Berry, G. Statistical methods in medical research. 3rd edition. Blackwell Scientific Publications, Oxford. 1994. page 99
There was no statistically significant difference between the two groups.

P-value vs. 95%CI (6)
Safe tips:
- Always report the 95% CI with the p-value; do NOT report the p-value alone.
- Always interpret based on the lower or upper limit of the confidence interval; the p-value is optional.
- Never interpret a p-value > 0.05 as an indication of no difference or no association; only the CI can provide this message.

Perform the data analysis using software
- Use the generated (mock) data as if they were the data obtained after completion of the research.
- Analyze according to the analysis plan.
- Try to understand the computer output and to find out whether the research question has been answered:
  - What is the magnitude of effect and its 95% confidence interval?
  - Were the results due to the role of chance?

Writing Results Sections
- Outline sections:
  - Study algorithm
  - Characteristics of the study sample
  - Results of an exploratory analysis to support ways to answer the RQ
  - Results to answer the RQ
  - Results of an exploratory analysis to know more about the answer to the RQ
- Follow the formats required by the research sponsor or the target journal.
- Best done with a SAP (Statistical Analysis Plan).
- Narrate tables or figures with key messages and avoid repetition.
- Do not include explanations in the Results section.

Tips for Writing Results Section
- Report results with purpose.
- Refer to the corresponding table or figure early, at the beginning of the description.
- Report sufficient data to allow evaluation of the calculation while avoiding redundancy.
- Document the steps of data analysis from which the results were transcribed.
- Provide statistical inference for the main findings that are the basis for the conclusions.
- Always report the confidence intervals; the p-value is optional and not the main target.

Q & A Thank you