Doing Analyses on Binary Outcomes

On November 14th, Dr. Sainani talked about how the math works for binomial data.

Binomial Code
There is some SAS code on the website that shows how to play around with binomial probabilities. First run the macro code, which is the part from %macro binom(events, trials, prob); down to %mend; Then just plug in the number of events, the number of trials, and the probability, and run this line: %binom(3, 5,.5);
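The macro itself isn’t reproduced in these slides, so as a rough cross-check here is a minimal Python sketch of the exact binomial probabilities it should agree with. The function name binom_probs and the choice of which tail probabilities to report are assumptions for illustration, not the macro’s actual output layout.

```python
from math import comb

def binom_probs(events, trials, prob):
    """Exact binomial probabilities for X ~ Binomial(trials, prob) at X = events."""
    def pmf(k):
        return comb(trials, k) * prob**k * (1 - prob) ** (trials - k)
    return {
        "P(X = x)": pmf(events),
        "P(X <= x)": sum(pmf(k) for k in range(events + 1)),           # lower tail
        "P(X >= x)": sum(pmf(k) for k in range(events, trials + 1)),  # upper tail
    }

# Same inputs as %binom(3, 5,.5): 3 events in 5 trials with p = .5
for label, value in binom_probs(3, 5, 0.5).items():
    print(f"{label}: {value:.4f}")
```

Calling binom_probs(2, 6, 0.551) gives the analogue of the %binom(2, 6,.551) call used later on.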

Results
The code produces two summary tables. One has descriptive statistics and the binomial probabilities, plus a couple of checks on whether you can use a Z (normal) approximation for the confidence limits. Plug in several examples from the lectures on the 12th and 14th to validate the code.
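The slides don’t say which rule the macro uses to decide whether the Z approximation is OK. This sketch assumes the common rule of thumb n·p ≥ 5 and n·(1 − p) ≥ 5 (wald_check and min_count are hypothetical names) and shows the plain Wald interval, p̂ ± 1.96·sqrt(p̂(1 − p̂)/n), that the approximation produces.

```python
from math import sqrt

def wald_check(events, trials, z=1.96, min_count=5):
    """Rule-of-thumb check for the normal approximation, plus the Wald interval.

    Assumes the n*p >= 5 and n*(1 - p) >= 5 rule; the course macro may use another cutoff.
    """
    p_hat = events / trials
    ok = trials * p_hat >= min_count and trials * (1 - p_hat) >= min_count
    se = sqrt(p_hat * (1 - p_hat) / trials)
    return ok, (p_hat - z * se, p_hat + z * se)

print(wald_check(3, 5))     # too few counts: the upper limit even exceeds 1
print(wald_check(76, 475))  # counts large enough for the approximation
```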

Results (2)
The code also gives you the confidence limits (CLs) around the probability. Search the SAS online documentation for PROC RELIABILITY to see the details on the CLs and on setting your own limits if you want, say, 90% CLs: support.sas.com/onlinedoc/913/docMainpage.jsp
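I can’t tell from the slides exactly which interval method the macro or PROC RELIABILITY reports. One widely used exact choice is the Clopper-Pearson interval, sketched here in dependency-free Python by bisection on the binomial CDF; conf is the adjustable level (0.90 for 90% CLs), and the helper names are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _solve_decreasing(g, target, iters=100):
    """Bisection for p in [0, 1] where a decreasing function g(p) equals target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) > target:
            lo = mid  # still above target, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, conf=0.95):
    """Exact (Clopper-Pearson) confidence limits for x successes in n trials."""
    alpha = 1 - conf
    # Lower limit solves P(X >= x) = alpha/2, i.e. P(X <= x - 1) = 1 - alpha/2
    lower = 0.0 if x == 0 else _solve_decreasing(lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper limit solves P(X <= x) = alpha/2
    upper = 1.0 if x == n else _solve_decreasing(lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper

print(clopper_pearson(3, 5))             # 95% limits
print(clopper_pearson(3, 5, conf=0.90))  # 90% limits, narrower
```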

The Easy Answer
Once the macro has been run, you just need to run this line: %binom(2, 6,.551);

Math Stuff

Difference in Proportion Math
95% CI on the difference = (Δ – 1.96 * SE) up to (Δ + 1.96 * SE), where Δ = p1 – p2 and SE = sqrt(p1(1 – p1)/n1 + p2(1 – p2)/n2).
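The worked math on this slide didn’t survive as text, so here is a minimal Python sketch of the same formula, assuming the unpooled SE written above. It uses the AZT trial numbers from Motulsky’s example later in this talk; diff_ci is a hypothetical name.

```python
from math import sqrt

def diff_ci(events1, n1, events2, n2, z=1.96):
    """Difference in proportions (p1 - p2) with a Wald 95% CI using the unpooled SE."""
    p1, p2 = events1 / n1, events2 / n2
    delta = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return delta, (delta - z * se, delta + z * se)

# Placebo (129/461 progressed) minus AZT (76/475 progressed)
delta, (lo, hi) = diff_ci(129, 461, 76, 475)
print(f"difference = {delta:.3f}, 95% CI {lo:.3f} to {hi:.3f}")
```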

RR Math
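The RR formulas on this slide were images. A standard way to get the RR and its CI is to work on the log scale (the Katz log method), which this Python sketch assumes; the lecture may have shown a different variance formula. The argument names are hypothetical and refer to the exposed and unexposed groups.

```python
from math import exp, log, sqrt

def relative_risk(events_exp, n_exp, events_unexp, n_unexp, z=1.96):
    """Risk ratio with a 95% CI computed on the log scale (Katz log method)."""
    rr = (events_exp / n_exp) / (events_unexp / n_unexp)
    se_log = sqrt(1 / events_exp - 1 / n_exp + 1 / events_unexp - 1 / n_unexp)
    return rr, (exp(log(rr) - z * se_log), exp(log(rr) + z * se_log))

# AZT (76/475 progressed) vs. placebo (129/461 progressed)
rr, (lo, hi) = relative_risk(76, 475, 129, 461)
print(f"RR = {rr:.3f}, 95% CI {lo:.3f} to {hi:.3f}")
```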

OR Math
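Likewise for the odds ratio: this Python sketch assumes the usual Woolf (log-scale) CI, with SE(ln OR) = sqrt(1/a + 1/b + 1/c + 1/d). The cell labels are spelled out in the docstring because table layouts vary between slides.

```python
from math import exp, log, sqrt

def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) 95% CI.

    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    """
    or_ = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, (exp(log(or_) - z * se_log), exp(log(or_) + z * se_log))

# AZT: 76 progressed, 399 did not; placebo: 129 progressed, 332 did not
or_, (lo, hi) = odds_ratio(76, 399, 129, 332)
print(f"OR = {or_:.3f}, 95% CI {lo:.3f} to {hi:.3f}")
```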

Ummm… No Thanks
If you don’t want to do the algebra by hand, you don’t have to. SAS can do all this work for you easily.

Relative Risks
If you use SAS for analyses, be sure to set your tables up correctly. You will recall that Dr. Sainani showed this table with an RR of 1.61:

                     Smoker (E)   Non-smoker (~E)
Heart disease (D)        21             13
No disease (~D)

Making the table
There is an EG (SAS Enterprise Guide) project showing how to make the data.

A Contingency Table
Contingency table values have to be mutually exclusive counts: each subject is counted in exactly one cell. Swap these example counts for your real-life data!
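To see what "mutually exclusive counts" means in practice, this Python sketch tallies hypothetical subject-level records into a 2x2 table laid out like the smoking table (columns E / ~E, rows D / ~D). Because every record is one (exposed, diseased) pair, each person lands in exactly one cell and the cells sum to the number of subjects.

```python
from collections import Counter

# Hypothetical subject-level records: (exposed?, diseased?) for each person
records = [
    (True, True), (True, False), (False, True),
    (False, False), (True, True), (False, False),
]

counts = Counter(records)  # each person is counted in exactly one cell
table = [
    [counts[(True, True)], counts[(False, True)]],    # diseased: exposed, unexposed
    [counts[(True, False)], counts[(False, False)]],  # not diseased: exposed, unexposed
]
print(table)
```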

Getting a Relative Risk (sort of)
With the table set up this way, the output does not give you the RR!

Rotate the table
The risk factor was in the first column; rotate the table so that it is in the rows. Notice that the odds ratio is the same.
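Why rotating matters: assuming the software computes risks across rows (as PROC FREQ’s RELRISK output does, comparing row 1 to row 2 for the column-1 outcome), the "risk ratio" you get depends on which way the table faces, while the odds ratio does not. The Python sketch below uses the AZT counts; row_risk_ratio and table_odds_ratio are hypothetical helpers.

```python
def row_risk_ratio(t):
    """Risk of the column-1 outcome in row 1 divided by that in row 2."""
    return (t[0][0] / sum(t[0])) / (t[1][0] / sum(t[1]))

def table_odds_ratio(t):
    return (t[0][0] * t[1][1]) / (t[0][1] * t[1][0])

# Treatment in the rows; columns are [progressed, did not progress]
by_treatment = [[76, 399],   # AZT
                [129, 332]]  # placebo
transposed = [list(col) for col in zip(*by_treatment)]

print(row_risk_ratio(by_treatment), row_risk_ratio(transposed))      # differ
print(table_odds_ratio(by_treatment), table_odds_ratio(transposed))  # identical
```

Only the first orientation gives the meaningful RR (risk on AZT divided by risk on placebo).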

Risk Differences

                     Smoker (E)   Non-smoker (~E)
Heart disease (D)        21             13
No disease (~D)

Risk Difference

Association

Expected values in Chi Square
Which cells are above or below their expected values?
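The expected count for each cell is (row total × column total) / grand total, and the Pearson chi-square sums (observed − expected)² / expected over the cells. This Python sketch applies that to the AZT table without a continuity correction and reports which cells sit above or below expectation.

```python
def chi_square_2x2(t):
    """Pearson chi-square (no continuity correction) and expected counts for a 2x2 table."""
    rows = [sum(r) for r in t]
    cols = [t[0][j] + t[1][j] for j in range(2)]
    n = sum(rows)
    expected = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]
    chi2 = sum((t[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    return chi2, expected

observed = [[76, 399], [129, 332]]  # rows: AZT, placebo; columns: progressed, not
chi2, expected = chi_square_2x2(observed)
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        side = "above" if obs > expected[i][j] else "below"
        print(f"cell ({i}, {j}): observed {obs}, expected {expected[i][j]:.1f} -> {side}")
print(f"chi-square = {chi2:.2f}")
```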

R Version
This code does a quick analysis and then deletes the table. Rerun the code to keep the table.

Rerun these

Another Example
This example is taken from Motulsky’s Intuitive Biostatistics (a book I highly recommend when you encounter people who HATE math). The data are from Cooper et al.’s zidovudine (AZT) trial in people who are HIV+.
– 76 of 475 on AZT had disease progression
– 129 of 461 on placebo had disease progression

The effect
76/475, or 16%, progressed vs. 129/461, or 28%. Is the 12-percentage-point reduction a significant difference?
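One way to answer that question is a pooled two-proportion z test, sketched here in Python with only the standard library (two_proportion_z is a hypothetical name). The two-sided p-value uses erfc(|z|/√2), which equals 2·P(Z > |z|).

```python
from math import erfc, sqrt

def two_proportion_z(events1, n1, events2, n2):
    """Pooled two-proportion z test; returns z and the two-sided p-value."""
    p1, p2 = events1 / n1, events2 / n2
    pooled = (events1 + events2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # = 2 * P(Z > |z|)
    return z, p_value

z, p = two_proportion_z(129, 461, 76, 475)  # placebo vs. AZT
print(f"z = {z:.2f}, two-sided p = {p:.2g}")
```

For a 2x2 table, z² from this test equals the uncorrected Pearson chi-square (here about 4.43² ≈ 19.6).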

Grouped Data

In the output, look for the percents you care about and the difference you care about. The bad thing (disease progression) is in column 1.

Important stuff
Your subjects have to be randomly selected (independent of each other) from the population you wish to generalize to, and the only difference between the two groups should be exposure to the risk factor (in a cohort study) or treatment (in a trial).

Beyond the Relative Risk
Epidemiologists get very excited about relative risks, but look at the overall prevalence too.
– A risk factor that changes the risk from 1 in 1,000,000 to 2 in 1,000,000 is not very important compared to a risk factor that changes the risk from 1 in 10 to 2 in 10, yet the relative risk is the same: 0.5 for both risk factors (1 in 1,000,000 vs. 2 in 1,000,000, and 1 in 10 vs. 2 in 10).

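NNT
The number needed to treat (NNT) is the reciprocal of the difference in the probabilities between the two groups. It gives you a metric to judge the relative importance of the effects. In this case you need to treat a million people to make a difference in the rare disease vs. 10 people in the common one.
Risk of 1 in 1,000,000 = 0.000001; risk of 2 in 1,000,000 = 0.000002; difference = 0.000001, so NNT = 1/0.000001 = 1,000,000.
Risk of 1 in 10 = 0.1; risk of 2 in 10 = 0.2; difference = 0.1, so NNT = 1/0.1 = 10.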
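The NNT arithmetic above is easy to get wrong with tiny probabilities, so this Python sketch keeps it exact with fractions (nnt is a hypothetical name). It also applies the common convention of rounding a non-integer NNT up, shown here for the AZT trial.

```python
from fractions import Fraction
from math import ceil

def nnt(events1, n1, events2, n2):
    """Number needed to treat: 1 / |risk difference|, kept exact with fractions."""
    diff = abs(Fraction(events1, n1) - Fraction(events2, n2))
    return 1 / diff

print(nnt(1, 1_000_000, 2, 1_000_000))  # rare disease -> 1000000
print(nnt(1, 10, 2, 10))                # common disease -> 10
print(ceil(nnt(76, 475, 129, 461)))     # AZT trial, rounded up -> 9
```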

Probability vs Odds
So far I have talked about probabilities: the number of people progressing while on AZT relative to all the people on it. You can also look at odds by making a ratio of the people progressing while on AZT to those who are not progressing on AZT. Odds are not easy to think about, and because of this they are not ideal for these data.
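Converting between the two is simple: odds = p / (1 − p) and p = odds / (1 + odds). This short Python sketch checks that for the AZT group, where the odds of progression are 76/399.

```python
def prob_to_odds(p):
    return p / (1 - p)

def odds_to_prob(odds):
    return odds / (1 + odds)

p_azt = 76 / 475                # probability of progression on AZT
odds_azt = prob_to_odds(p_azt)  # same as 76 / 399
print(round(p_azt, 3), round(odds_azt, 3))
```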

Relative Risk vs. Odds Ratio

Why mess around with odds?
If your disease or outcome of interest is rare, you will not want to study hundreds of thousands of exposed people to find the few who get the disease. Instead, you will want to find people with the disease, match them to controls, and then look to see whether they were exposed.
– This is a retrospective case-control study.

Dangers of RR
You can get any relative risk you want by sampling different numbers of cases and controls in the case-control study!

[Figure panels: the original study of cat scratch fever; the same study with 100 times the controls; the same study with 100 times the cases.]
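The cat scratch fever counts aren’t in the transcript, so this Python sketch uses made-up counts to show the same effect: computing a "risk ratio" from a case-control table as if it were a cohort changes when you sample 100 times the controls or cases, while the odds ratio stays put. Fractions keep the comparison exact; the helper names are hypothetical.

```python
from fractions import Fraction

def naive_rr(a, b, c, d):
    """'Risk' ratio computed as if a case-control table were a cohort.
    a, b = exposed/unexposed cases; c, d = exposed/unexposed controls."""
    return Fraction(a, a + c) / Fraction(b, b + d)

def odds_ratio(a, b, c, d):
    return Fraction(a * d, b * c)

# Hypothetical counts, NOT the cat scratch fever data
studies = {
    "original": (30, 20, 15, 35),
    "100x controls": (30, 20, 1500, 3500),
    "100x cases": (3000, 2000, 15, 35),
}
for name, cells in studies.items():
    print(f"{name:14s} RR = {float(naive_rr(*cells)):.2f}  OR = {float(odds_ratio(*cells)):.2f}")
```

With 100 times the controls the disease becomes rare in the sample, and the naive RR moves toward the OR, which is the rare-disease approximation.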

Rare diseases
If the disease is rare, then the odds ratio approximates the relative risk.

Case Control
Finding the right controls is VERY tricky. Take a class in epidemiology from Dr. Rita Popat to learn about the problems associated with the different types of controls.

Contingency Table Analyses
You have seen contingency tables used to describe many kinds of studies:
– Experiments
– Cohorts
– Case-control studies
Contingency tables are also used to describe lab results.
– You will see a new test that calls people sick or not sick, and you will have a gold standard. You want statistics that describe how well the new test does.

Screening and Diagnostic
Sensitivity - correctly calling people sick when they are sick.
Specificity - correctly calling people healthy when they are healthy.
Predictive value of a positive test - the percentage of people with a positive test result who actually have the disease.
Predictive value of a negative test - the percentage of people with a negative test result who actually do not have the disease.

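Testing Formulas
With a = test positive and diseased, b = test positive and not diseased, c = test negative and diseased, and d = test negative and not diseased:
sensitivity = 100 * a / (a + c)
specificity = 100 * d / (b + d)
predictive value of positive = 100 * a / (a + b)
predictive value of negative = 100 * d / (c + d)
Given half a chance, I will mess up the algebra, so I wrote code to do it.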
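The course code itself isn’t in the transcript; here is a minimal Python sketch of the same four formulas, with the cell meanings in the docstring. The counts are hypothetical, chosen only to exercise the function.

```python
def screening_stats(a, b, c, d):
    """a = test+ & diseased, b = test+ & healthy, c = test- & diseased, d = test- & healthy."""
    return {
        "sensitivity": 100 * a / (a + c),
        "specificity": 100 * d / (b + d),
        "PPV": 100 * a / (a + b),
        "NPV": 100 * d / (c + d),
    }

# Hypothetical counts for illustration only
stats = screening_stats(90, 20, 10, 180)
for name, value in stats.items():
    print(f"{name}: {value:.1f}%")
```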

Reading from the Bible
Fleiss wrote the authoritative book on categorical data analysis (Statistical Methods for Rates and Proportions, 3rd edition, 2003, by Fleiss, Levin, and Paik). Get a copy if you are going to deal with categorical data in real life. I have coded up a lot of the book, so you don’t need to think about how to code up what goes with the brilliant prose.

Using the Code

McNemar’s Test
If you have pairs of matched data points (husband and wife each saying yes/no, or right-eye vs. left-eye vision good yes/no), your analysis has to take into account that the two data points in each pair are related. McNemar’s test uses the discordant pairs to check whether the proportion of "yes" answers differs between the two members of the pairs.
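McNemar’s statistic only uses the discordant pairs: with b pairs where only the first member says yes and c pairs where only the second does, chi-square = (b − c)² / (b + c) on 1 degree of freedom. This Python sketch uses hypothetical counts and offers the optional continuity-corrected version, (|b − c| − 1)² / (b + c).

```python
from math import erfc, sqrt

def mcnemar(b, c, correction=False):
    """McNemar chi-square from the discordant-pair counts b and c (1 df)."""
    diff = max(abs(b - c) - (1 if correction else 0), 0)
    chi2 = diff ** 2 / (b + c)
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value

# Hypothetical: 15 pairs where only the right eye was "yes", 5 where only the left was
print(mcnemar(15, 5))
print(mcnemar(15, 5, correction=True))
```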

Agreement
If you want to look at the degree of agreement between two raters, you need a statistic that accounts for how often the raters would agree by chance alone. You use the same SAS or EG code.
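The usual chance-corrected agreement statistic is Cohen’s kappa, (observed agreement − chance agreement) / (1 − chance agreement), where chance agreement comes from the raters’ marginal totals. This Python sketch assumes that is the statistic meant here and uses a hypothetical 2x2 ratings table.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square table: rows = rater 1, columns = rater 2."""
    n = sum(sum(row) for row in table)
    k = len(table)
    p_observed = sum(table[i][i] for i in range(k)) / n
    row_share = [sum(table[i]) / n for i in range(k)]
    col_share = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_chance = sum(row_share[i] * col_share[i] for i in range(k))
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical ratings: [[both yes, r1 yes/r2 no], [r1 no/r2 yes, both no]]
ratings = [[40, 10], [5, 45]]
print(round(cohens_kappa(ratings), 3))
```

Here the raters agree 85% of the time, but 50% agreement is expected by chance, so kappa is 0.7.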