Introduction to Bayesian Methods (I) C. Shane Reese Department of Statistics Brigham Young University.


Outline  Definitions  Classical or Frequentist  Bayesian  Comparison (Bayesian vs. Classical)  Bayesian Data Analysis  Examples

Definitions  Problem: Unknown population parameter (θ) must be estimated.  EXAMPLE #1:  θ = Probability that a randomly selected person will be a cancer survivor  Data are binary, parameter is unknown and continuous  EXAMPLE #2:  θ = Mean survival time of cancer patients.  Data are continuous, parameter is continuous.

Definitions  Step 1 of either formulation is to pose a statistical (or probability) model for the random variable which represents the phenomenon.  EXAMPLE #1:  a reasonable choice for f(y|θ) (the sampling density or likelihood function) is that the number of 6-month survivors (Y) follows a binomial distribution, with n subjects followed and probability θ that any one subject survives.  EXAMPLE #2:  a reasonable choice for f(y|θ) is that survival time (Y) has an exponential distribution with mean θ.
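The two sampling models above can be written down directly. A minimal sketch in Python (the data values here are hypothetical, chosen only for illustration):

```python
import math

def binomial_likelihood(theta, y, n):
    """f(y | theta): probability of y six-month survivors among n patients."""
    return math.comb(n, y) * theta**y * (1 - theta)**(n - y)

def exponential_likelihood(theta, y):
    """f(y | theta): density of a survival time y with mean theta."""
    return (1 / theta) * math.exp(-y / theta)

# Hypothetical data: 7 of 10 patients survive 6 months.
# The likelihood is highest near theta = y/n = 0.7.
print(binomial_likelihood(0.7, 7, 10))
print(binomial_likelihood(0.3, 7, 10))
```

Evaluating the likelihood at several values of θ shows why the maximum likelihood estimate sits at y/n for the binomial model.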

Classical (Frequentist) Approach  All pertinent information enters the problem through the likelihood function in the form of data (Y1, ..., Yn)  objective in nature  software packages all have this capability  maximum likelihood, unbiased estimation, etc.  confidence intervals, with a difficult interpretation

Bayesian Data Analysis  data enters through the likelihood function, with allowance for other information through the prior  reads: the posterior distribution is a constant multiplied by the likelihood multiplied by the prior distribution  posterior distribution: our updated view of the parameter in light of the data  prior distribution: our view of the parameter before any data collection

Additional Information  Prior Distributions  can come from expert opinion, historical studies, previous research, or general knowledge of a situation (see examples)  there also exists a “flat” or “noninformative” prior which represents a state of ignorance  the controversial piece of Bayesian methods  Objective Bayes, Empirical Bayes

Bayesian Data Analysis  inherently subjective (prior is controversial)  few software packages have this capability  result is a probability distribution  credible intervals use the language that everyone uses anyway. (Probability that θ is in the interval is 0.95)  see examples for demonstration

Mammography
                    Test Result
Patient Status      Positive    Negative
Cancer              88%         12%
Healthy             24%         76%
o Sensitivity: True Positive rate (Cancer ID’d!): 88%
o Specificity: True Negative rate (Healthy not ID’d!): 76%

Mammography Illustration  My friend (40!!!) heads into her OB/GYN for a mammography (according to Dr.’s orders) and finds a positive test result.  Does she have cancer?  Specificity, sensitivity both high! Seems likely... or does it?  Important points: incidence of breast cancer in 40 year old women is per 100,000 women.

Bayes Theorem for Mammography
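The formula on this slide did not survive transcription. A minimal sketch of the calculation, using the 88% sensitivity and 76% specificity from the table; the prevalence below is an assumption (roughly 126 per 100,000, a hypothetical figure chosen to be consistent with the 0.46% quoted on the tradeoffs slide):

```python
def posterior_cancer(prior, sensitivity, specificity):
    """P(cancer | positive test) via Bayes' theorem."""
    p_pos_given_cancer = sensitivity        # 0.88 from the table
    p_pos_given_healthy = 1 - specificity   # 0.24 from the table
    numerator = p_pos_given_cancer * prior
    denominator = numerator + p_pos_given_healthy * (1 - prior)
    return numerator / denominator

# Assumed prevalence for a 40-year-old (hypothetical; the slide's figure
# was lost): about 126 per 100,000.
print(round(posterior_cancer(126 / 100_000, 0.88, 0.76), 4))
```

Even with a positive test, the posterior probability of cancer stays below 1% because the disease is so rare in this age group: the false positives among the many healthy women swamp the true positives.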

Mammography Tradeoffs  Impacts of a false positive  Stress  Invasive follow-up procedures  Worth the trade-off with a less than 1% (0.46%) chance you actually have cancer???

Mammography Illustration  My mother-in-law received the same diagnosis. Holden, UT is a “downwinder” town, and she was 65.  Does she have cancer?  Specificity, sensitivity both high! Seems likely... or does it?  Important points: incidence of breast cancer in 65-year-old women is 470 per 100,000 women, and approximately 43% in “downwinder” cities.  Does this change our assessment?

Downwinder Mammography
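The downwinder slide's formula was also lost, but the same Bayes calculation applies with the priors the previous slide does give: a base rate of 470 per 100,000 for a 65-year-old, and roughly 43% in downwinder cities. A minimal sketch (same 88%/76% test characteristics as before):

```python
def posterior_cancer(prior, sensitivity=0.88, specificity=0.76):
    """P(cancer | positive test) via Bayes' theorem."""
    pos_rate = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / pos_rate

# Age-65 base rate from the slide: 470 per 100,000.
print(round(posterior_cancer(470 / 100_000), 3))   # about 0.017
# Treating the slide's 43% downwinder figure as the prior.
print(round(posterior_cancer(0.43), 2))            # about 0.73
```

The same positive test now carries a posterior near 73% rather than under 2%: the prior drives the conclusion, which is exactly the point of the illustration.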

Modified Example #1  One person in the class stands at the back and throws the ball to the target on the board (10 times).  before the person throws the ball ten times, does the choice of person change the a priori belief you have about the probability they will hit the target (θ)?  before the person throws the ball ten times, does the choice of target size change the a priori belief you have about the probability they will hit the target (θ)?

Prior Distributions  a convenient choice for this prior information is the Beta distribution, where the parameters are the number of a priori successes and failures. For example, if you believe your prior opinion on success or failure is worth 8 throws, and you think the person selected can hit the target drawn on the board 6 of those times, we would say that θ has a Beta(6,2) distribution.

Bayes for Example #1  if our data are Binomial(n, θ) then we would calculate Y/n as our estimate and use a confidence interval formula for a proportion.  If our data are Binomial(n, θ) and our prior distribution is Beta(a,b), then our posterior distribution is Beta(a+y,b+n−y).  thus, in our example:  a = b = n = y =  and so the posterior distribution is: Beta(, )
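The conjugate update above is a one-line computation. A minimal sketch using the Beta(6,2) prior from the previous slide; the data values (the slide leaves a, b, n, y blank) are hypothetical:

```python
import random

# Conjugate update: a Beta(a, b) prior with Binomial(n, theta) data having
# y successes gives a Beta(a + y, b + n - y) posterior.
a, b = 6, 2          # prior: worth 8 throws, 6 of them hits
y, n = 7, 10         # hypothetical data: 7 hits in 10 throws
post_a, post_b = a + y, b + n - y   # posterior: Beta(13, 5)

print("posterior mean:", post_a / (post_a + post_b))   # 13/18

# 95% credible interval via Monte Carlo quantiles of the posterior
random.seed(1)
draws = sorted(random.betavariate(post_a, post_b) for _ in range(100_000))
lo, hi = draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))]
print(f"95% credible interval: ({lo:.2f}, {hi:.2f})")
```

Note how the posterior mean, 13/18, sits between the prior mean (6/8) and the data estimate (7/10): the prior acts like 8 extra throws' worth of information.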

Bayesian Interpretation  Therefore we can say that the probability that θ is in the interval (, ) is  Notice that we don’t have to address the problem of “in repeated sampling”  this is a direct probability statement  relies on the prior distribution

Example: Phase II Dose Finding  Goal: fit dose-response models of the form given on the slide, where d = 1, …, D is the dose level

Definition of Terms  ED(Q):  Lowest dose for which Q% of efficacy is achieved  Multiple definitions:  Def. 1  Def. 2  Example: for Q = .95, the ED95 dose is the lowest dose for which 95% of efficacy is achieved
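The two definitions on the slide were not transcribed; a minimal sketch of one common reading (lowest dose achieving Q% of the maximum efficacy), with hypothetical doses and responses:

```python
def edq(doses, responses, q):
    """Lowest dose whose mean response reaches q * (maximum response).
    One reading of the slide's ED(Q); a definition relative to placebo
    would be analogous."""
    target = q * max(responses)
    for d, r in sorted(zip(doses, responses)):
        if r >= target:
            return d
    return None

doses = [0, 10, 25, 50, 100]           # hypothetical dose levels
responses = [0.5, 2.0, 4.5, 5.8, 6.0]  # hypothetical mean efficacy
print(edq(doses, responses, 0.95))     # ED95 for this curve
```

For this hypothetical curve the ED95 is 50, not 100: the top two doses are nearly equally effective, so the lower one qualifies.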

Classical Approach  Completely randomized design  Perform F-test for difference between groups  If significant at the pre-specified level, call the trial a “success”, and determine the most effective dose as the lowest dose that achieves some pre-specified criterion (ED95)

Bayesian Adaptive Approach  Assign patients to doses adaptively based on the amount of information about the dose-response relationship.  Goal: maximize the expected information gain  Allocation uses a weighted average of the posterior variances and the probability that a particular dose is the ED95 dose.

Probability of Allocation  Assign patients to doses based on the formula on the slide, which gives the probability of being assigned to each dose
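The allocation formula itself did not survive transcription. A hedged sketch of one common variant consistent with the description above, weighting each dose by the square root of (posterior variance × probability of being the ED95) and normalizing; all numbers are hypothetical:

```python
import math

def allocation_probs(post_vars, pr_ed95):
    """Normalized allocation probabilities: one plausible rule combining
    posterior variance and Pr(dose = ED95); not necessarily the slide's
    exact formula."""
    weights = [math.sqrt(v * p) for v, p in zip(post_vars, pr_ed95)]
    total = sum(weights)
    return [w / total for w in weights]

post_vars = [0.8, 0.5, 0.4, 0.6, 0.9]    # hypothetical posterior variances
pr_ed95 = [0.05, 0.10, 0.40, 0.30, 0.15] # hypothetical Pr(dose d = ED95)
probs = allocation_probs(post_vars, pr_ed95)
print([round(p, 2) for p in probs])
```

Doses that are both uncertain and plausibly the ED95 receive the most new patients, which is the behavior the slide describes.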

Four Decisions at Interim Looks  Stop trial for success: the trial is a success, let’s move on to the next phase.  Stop trial for futility: the trial is going nowhere, let’s stop now and cut our losses.  Stop trial because the maximum number of patients allowed is reached (stop for cap): trial outcome is still uncertain, but we can’t afford to continue the trial.  Continue

Stop for Futility  The dose-finding trial is stopped because there is insufficient evidence that any of the doses is efficacious.  If the posterior probability that the mean change for the most likely ED95 dose is within a “clinically meaningful amount” of the placebo response is greater than 0.99, the trial stops for futility.

Stop for Success  The dose-finding trial is stopped when the current probability that the ED95 is sufficiently efficacious is high enough.  If the posterior probability that the most likely ED95 dose is better than placebo reaches a high threshold (0.99), the trial stops early for success.  Note: the posterior (updated after the data) probability drives this decision.

Stop for Cap  Cap: If the sample size reaches the maximum (the cap) defined for all dose groups the trial stops.  Refine definition based on application. Perhaps one dose group reaching max is of interest.  Almost always $$$ driven.

Continue  Continue: If none of the above three conditions hold then the trial continues to accrue.  Decision to continue or stop is made at each interim look at the data (accrual is in batches)
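The four interim decisions above can be sketched as a single decision function. The 0.99 thresholds come from the slides; the cap value and the probabilities in the usage lines are hypothetical:

```python
def interim_decision(pr_futile, pr_success, n_enrolled, cap=250):
    """One interim look.
    pr_futile:  posterior Pr(best dose is within a clinically meaningful
                amount of placebo)
    pr_success: posterior Pr(most likely ED95 dose beats placebo)
    cap:        maximum enrollment (hypothetical value)"""
    if pr_futile > 0.99:
        return "stop for futility"
    if pr_success >= 0.99:
        return "stop for success"
    if n_enrolled >= cap:
        return "stop for cap"
    return "continue"

print(interim_decision(0.10, 0.995, 120))  # -> stop for success
print(interim_decision(0.50, 0.60, 120))   # -> continue
```

Because accrual happens in batches, this function would be evaluated once per interim look with the freshly updated posterior probabilities.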

Benefits of Approach  Statistical: weighting by the variance of the response at each dose allows quicker resolution of dose-response relationship.  Medical: Integrating over the probability that each dose is ED95 allows quicker allocation to more efficacious doses.

Example of Approach  Reduction in average number of events  Y = reduction in number of events  D = 6 (5 active, 1 placebo)  Potential exists for a non-monotonic dose-response relationship.  Let be the dose value for dose d.

Model for Example

Dynamic Model Properties  Allows for flexibility.  Borrows strength from “neighboring” doses and similarity of response at neighboring doses.  Simplified version of Gaussian Process models.  Potential problem: semi-parametric, thus only considers doses within the studied dose range

Example Curves (shown on slide)

Simulations  5000 simulated trials for each of the 5 scenarios  Fixed dose design  Bayesian adaptive approach as outlined above  Compare the two approaches for each of the 5 cases on sample size, power, and type-I error

Results (power & alpha): table with columns Case, Pr(S), Pr(F), Pr(cap), P(Rej) (values shown on slide)

Results (n): Fixed design 130 (remaining values shown on slide)

Observations  Adaptive design serves two purposes:  Get patients to efficacious doses  More efficient statistical estimation  Sample size considerations  Dose expansion -- inclusion of safety considerations  Incorporation of uncertainties!!! Predictive inference is POWERFUL!!!

Conclusions  Science is subjective (what about the choice of a likelihood?)  Bayes uses all available information  Makes interpretation easier  BAD NEWS: I have shown very simple cases... they get much harder.  GOOD NEWS: They are possible (and practical) with advanced computational procedures