Bayesian Model Selection in Factorial Designs

Bayesian Model Selection in Factorial Designs Seminal work is by Box and Meyer. Intuitive formulation and analytical approach, but the devil is in the details! Look at simplifying assumptions as we step through Box and Meyer's approach. One of the hottest areas in statistics for several years.

Bayesian Model Selection in Factorial Designs There are $2^{k-p}-1$ possible (fractional) factorial models, denoted as a set $\{M_l\}$. To simplify later calculations, we usually assume that the only active effects are main effects, two-way interactions, or three-way interactions. This assumption is already in place for low-resolution fractional factorials.
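For example (an illustration of ours, not from the slides): with $k=4$ factors and no fractionation, a saturated model contains $2^{4}-1=15$ candidate effects, while restricting attention to main effects and two-way interactions leaves only $4+6=10$ candidate effects, which greatly reduces the number of models that must be compared.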

Bayesian Model Selection in Factorial Designs Each $M_l$ denotes a set of active effects (both main effects and interactions) in a hierarchical model. We will use $X_{ik}=1$ for the high level of effect $k$ and $X_{ik}=-1$ for the low level of effect $k$.

Bayesian Model Selection in Factorial Designs We will assume that the response variables follow a linear model with normal errors given model $M$. $X_i$ and $\beta$ are model-specific, but we will use a saturated model in what follows.
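The equation on this slide did not survive transcription; a standard form consistent with the surrounding description (our reconstruction, not necessarily the slide's exact notation) is
$$ Y_i = X_i^{\top}\beta + \varepsilon_i, \qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0,\sigma^{2}), \quad i=1,\dots,n, $$
where $X_i$ collects the $\pm 1$ contrasts for the effects in $M$.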

Bayesian Model Selection in Factorial Designs The likelihood for the data given the parameters has the following form:
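(Reconstructing the missing display from the model above:)
$$ f(Y \mid \beta,\sigma) = (2\pi\sigma^{2})^{-n/2}\exp\!\left\{-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}\bigl(Y_i - X_i^{\top}\beta\bigr)^{2}\right\}. $$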

Bayesian Paradigm Unlike in classical inference, we assume the parameters, $\theta$, are random variables that have a prior distribution, $f_{\theta}(\theta)$, rather than being fixed unknown constants. In classical inference, we estimate $\theta$ by maximizing the likelihood $L(\theta \mid y)$.

Bayesian Paradigm Estimation using the Bayesian approach relies on updating our prior distribution for $\theta$ after collecting our data $y$. The posterior density, by an application of Bayes' rule, is proportional to the product of the familiar data density and the prior density:
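(The display itself was an image on the slide; this is the standard statement it describes.)
$$ f(\theta \mid y) = \frac{f(y \mid \theta)\, f_{\theta}(\theta)}{\int f(y \mid \theta')\, f_{\theta}(\theta')\, d\theta'} \;\propto\; L(\theta \mid y)\, f_{\theta}(\theta). $$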

Bayesian Paradigm The Bayes estimate of $\theta$ minimizes the Bayes risk, the expected value (with respect to the prior) of the loss function $L(\theta, \hat{\theta})$. Under squared error loss, the Bayes estimate is the mean of the posterior distribution:
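(Filling in the missing display: under squared error loss,)
$$ \hat{\theta}_{\text{Bayes}} = E(\theta \mid y) = \int \theta\, f(\theta \mid y)\, d\theta. $$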

Bayesian Model Selection in Factorial Designs The Bayesian prior for models is quite straightforward. If $r$ effects are in the model, then they are active with prior probability $\pi$:
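A standard independence prior consistent with this description (our reconstruction; $t$ denotes the total number of candidate effects) is
$$ P(M_l) = \pi^{\,r}\,(1-\pi)^{\,t-r}. $$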

Bayesian Model Selection in Factorial Designs Since we're using a Bayesian approach, we need priors for $\beta$ and $\sigma$ as well.

Bayesian Model Selection in Factorial Designs For non-orthogonal designs, it's common to use Zellner's g-prior for $\beta$. Note that we did not assign priors to $\gamma$ or $\pi$.
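The g-prior itself appeared as an image; a typical form consistent with the text (using $\gamma$ as the scaling constant and the conventional noninformative prior on $\sigma$, both assumptions on our part) is
$$ \beta \mid \sigma, M \;\sim\; N\!\bigl(0,\; \gamma\,\sigma^{2}\,(X^{\top}X)^{-1}\bigr), \qquad f(\sigma) \propto 1/\sigma. $$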

Bayesian Model Selection in Factorial Designs We can combine $f(\beta,\sigma,M)$ and $f(Y \mid \beta,\sigma,M)$ to obtain the full likelihood $L(\beta,\sigma,M,Y)$.
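(Schematically, under the priors above:)
$$ L(\beta,\sigma,M,Y) = f(Y \mid \beta,\sigma,M)\, f(\beta \mid \sigma,M)\, f(\sigma)\, P(M). $$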

Bayesian Model Selection in Factorial Designs

Our goal is to derive the posterior distribution of $M$ given $Y$, which first requires integrating out $\beta$ and $\sigma$.
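(The resulting expression is missing from the transcript; schematically,)
$$ P(M \mid Y) \;\propto\; P(M) \int\!\!\int f(Y \mid \beta,\sigma,M)\, f(\beta \mid \sigma,M)\, f(\sigma)\, d\beta\, d\sigma. $$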

Bayesian Model Selection in Factorial Designs The first term is a penalty for model complexity (smaller is better). The second term is a measure of model fit (smaller is better).

Bayesian Model Selection in Factorial Designs $\gamma$ and $\pi$ are still present. We will fix $\pi$; the method is robust to the choice of $\pi$. $\gamma$ is selected to minimize the probability of no active factors.

Bayesian Model Selection in Factorial Designs With $L(M \mid Y)$ in hand, we can actually evaluate $P(M_i \mid Y)$ for all $M_i$ for any prior choice of $\pi$, provided the number of $M_i$ is not burdensome. This is in part why we assume eligible $M_i$ only include lower-order effects.
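As a concrete illustration, here is a minimal Python sketch of this enumeration. It is not Box and Meyer's exact computation: it uses the closed-form marginal likelihood of a Zellner g-prior with a flat intercept, it does not enforce effect heredity, and all names (`posterior_model_probs`, `pi`, `g`) are our own.

```python
# Sketch: enumerate candidate models for a two-level factorial and compute
# normalized posterior model probabilities. The null model is included.
import itertools
import numpy as np

def model_matrix(X, terms):
    """Contrast columns for the given terms; each term is a tuple of factor indices."""
    cols = [np.prod(X[:, list(t)], axis=1) for t in terms]
    return np.column_stack(cols) if cols else np.empty((X.shape[0], 0))

def log_marginal(y, Xm, g):
    """Log marginal likelihood (up to a constant) under a g-prior with flat intercept."""
    n, p = Xm.shape
    if p == 0:
        return 0.0                               # null model: R^2 = 0
    yc = y - y.mean()
    Q, _ = np.linalg.qr(Xm - Xm.mean(axis=0))    # orthonormal basis of centered columns
    r2 = np.sum((Q.T @ yc) ** 2) / np.sum(yc ** 2)
    return 0.5 * (n - 1 - p) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1 - r2))

def posterior_model_probs(X, y, candidate_terms, pi=0.25, g=5.0):
    """Enumerate every subset of candidate_terms and return (models, posterior probs)."""
    t = len(candidate_terms)
    models, logs = [], []
    for r in range(t + 1):
        for terms in itertools.combinations(candidate_terms, r):
            lp = r * np.log(pi) + (t - r) * np.log(1 - pi)   # prior pi^r (1-pi)^(t-r)
            logs.append(lp + log_marginal(y, model_matrix(X, list(terms)), g))
            models.append(terms)
    logs = np.array(logs)
    probs = np.exp(logs - logs.max())
    return models, probs / probs.sum()
```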

Bayesian Model Selection in Factorial Designs Greedy search or MCMC algorithms are used to select models when they cannot be enumerated exhaustively. Selection criteria include the Bayes factor and the Schwarz criterion (also known as the Bayesian Information Criterion, BIC). Refer to the R package BMA and its function bic.glm for fitting more general models.

Bayesian Model Selection in Factorial Designs For each effect, we sum the probabilities for all $M_i$ that contain that effect and obtain a marginal posterior probability for that effect. These marginal probabilities are relatively robust to the choice of $\pi$.
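In symbols, $P(\text{effect } j \text{ active} \mid Y) = \sum_{i:\, j \in M_i} P(M_i \mid Y)$. A short continuation of the earlier sketch computes these sums (the function name is again our own):

```python
def marginal_effect_probs(models, probs, candidate_terms):
    """For each candidate term, sum the posterior probabilities of models containing it."""
    return {term: sum(p for m, p in zip(models, probs) if term in m)
            for term in candidate_terms}
```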

Case Study Violin data* ($2^4$ factorial design with $n=11$ replications). Response: decibels. Factors: A: Pressure (Low/High); B: Placement (Near/Far); C: Angle (Low/High); D: Speed (Low/High). *Carla Padgett, STAT 706 taught by Don Edwards
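To connect with the earlier sketch, a hypothetical setup for this design could look like the following. The response vector is not reproduced in the transcript, so the final calls are left commented out and `decibels` is a placeholder name.

```python
import itertools
import numpy as np

runs = np.array(list(itertools.product([-1, 1], repeat=4)))   # the 16 runs of the 2^4 design
X = np.repeat(runs, 11, axis=0)                                # n = 11 replicates per run
terms = [(i,) for i in range(4)] + list(itertools.combinations(range(4), 2))  # mains + two-way
# models, probs = posterior_model_probs(X, decibels, terms)
# effects = marginal_effect_probs(models, probs, terms)
```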

Case Study Fractional factorial design analysis: A, B, and D significant; AB marginal. Bayesian model selection: A, B, D, AB, AD, BD significant; all others negligible. *Carla Padgett, STAT 706 taught by Don Edwards