Part 24: Bayesian Estimation 24-1/35 Econometrics I Professor William Greene Stern School of Business Department of Economics.

Part 24: Bayesian Estimation 24-2/35 Econometrics I Part 24 – Bayesian Estimation

Part 24: Bayesian Estimation 24-3/35 Bayesian Estimators
• “Random Parameters” vs. Randomly Distributed Parameters
• Models of Individual Heterogeneity
  - Random Effects: Consumer Brand Choice
  - Fixed Effects: Hospital Costs

Part 24: Bayesian Estimation 24-4/35 Bayesian Estimation
• Specification of conditional likelihood: f(data | parameters)
• Specification of priors: g(parameters)
• Posterior density of parameters: f(parameters | data) = f(data | parameters) g(parameters) / f(data)
• Posterior mean = E[parameters | data]

Part 24: Bayesian Estimation 24-5/35 The Marginal Density for the Data is Irrelevant
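
The title refers to the denominator in Bayes' theorem. A sketch of the argument in the previous slide's notation, writing θ for the parameters and y for the data (both symbols are assumptions made here for compactness):

```latex
f(\theta \mid y) = \frac{f(y \mid \theta)\, g(\theta)}{f(y)},
\qquad
f(y) = \int f(y \mid \theta)\, g(\theta)\, d\theta
\quad\Longrightarrow\quad
f(\theta \mid y) \propto f(y \mid \theta)\, g(\theta)
```

Because f(y) does not involve θ, it only rescales the posterior so that it integrates to one. Any sampler that needs the posterior only up to a constant can therefore work with likelihood × prior.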

Part 24: Bayesian Estimation 24-6/35 Computing Bayesian Estimators
• First generation: Do the integration (math)
• Contemporary - Simulation:
  (1) Deduce the posterior
  (2) Draw random samples from the posterior and compute the sample means and variances of the draws. (Relies on the law of large numbers.)
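
The simulation route can be illustrated with a case where the posterior is known exactly, so the simulated moments can be checked. This is a minimal sketch, assuming a Bernoulli likelihood with a Beta(1, 1) prior (a textbook conjugate pair chosen for illustration; it is not taken from the slides). The function name and the data counts are hypothetical.

```python
import random

def posterior_draws(successes, failures, a=1.0, b=1.0, n_draws=20000, seed=1):
    """Draw from the Beta(a + successes, b + failures) posterior
    of a Bernoulli success probability under a Beta(a, b) prior."""
    rng = random.Random(seed)
    return [rng.betavariate(a + successes, b + failures) for _ in range(n_draws)]

# Hypothetical data: 30 successes and 70 failures.
draws = posterior_draws(successes=30, failures=70)

# Step (2) of the slide: sample mean and variance of the posterior draws.
sim_mean = sum(draws) / len(draws)
sim_var = sum((d - sim_mean) ** 2 for d in draws) / (len(draws) - 1)

# Exact moments of the Beta(31, 71) posterior, for comparison.
a_post, b_post = 31.0, 71.0
exact_mean = a_post / (a_post + b_post)
exact_var = a_post * b_post / ((a_post + b_post) ** 2 * (a_post + b_post + 1))

print("simulated mean/var:", sim_mean, sim_var)
print("exact mean/var:    ", exact_mean, exact_var)
```

With more draws the simulated moments settle on the exact ones, which is the law-of-large-numbers argument the slide relies on.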

Part 24: Bayesian Estimation 24-7/35 Modeling Issues
• As N → ∞, the likelihood dominates and the prior disappears.
• Bayesian and Classical MLE converge. (Needs the mode of the posterior to converge to the mean.)
• Priors
  - Diffuse → large variances imply little prior information. (NONINFORMATIVE)
  - INFORMATIVE priors – finite variances that appear in the posterior. “Taints” any final results.
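
How the prior fades can be seen in the simplest conjugate case. This is a sketch, assuming a normal mean with known variance and a normal prior (an illustrative model, not one from the slides). The function name and the numbers are hypothetical.

```python
def posterior_mean_normal(ybar, n, sigma2=1.0, prior_mean=0.0, prior_var=1.0):
    """Posterior mean of a normal mean with known variance sigma2
    under a N(prior_mean, prior_var) prior, plus the weight on the prior."""
    prior_precision = 1.0 / prior_var
    data_precision = n / sigma2
    weight_on_prior = prior_precision / (prior_precision + data_precision)
    mean = weight_on_prior * prior_mean + (1.0 - weight_on_prior) * ybar
    return mean, weight_on_prior

# Informative prior centered at 0, sample mean fixed at 2: the prior's weight shrinks with n.
for n in (1, 10, 100, 10000):
    mean, w = posterior_mean_normal(ybar=2.0, n=n)
    print(n, round(mean, 4), round(w, 4))

# A diffuse prior (large variance) carries almost no weight even in a small sample.
diffuse_mean, diffuse_w = posterior_mean_normal(ybar=2.0, n=10, prior_var=1.0e6)
print(diffuse_mean, diffuse_w)
```

The posterior mean is a precision-weighted average of the prior mean and the sample mean, so the prior's share falls at rate 1/n for an informative prior and is negligible from the start for a diffuse one.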

Part 24: Bayesian Estimation 24-8/35 A Practical Problem

Part 24: Bayesian Estimation 24-9/35 A Solution to the Sampling Problem

Part 24: Bayesian Estimation 24-10/35 The Gibbs Sampler
• Target: Sample from the marginals of f(x1, x2) = joint distribution
• The joint distribution is unknown or it is not possible to sample from the joint distribution.
• Assumed: f(x1|x2) and f(x2|x1) are both known and samples can be drawn from both.
• Gibbs sampling: Obtain one draw from (x1, x2) by many cycles between x1|x2 and x2|x1.
  - Start x1,0 anywhere in the right range.
  - Draw x2,0 from x2|x1,0.
  - Return to x1,1 from x1|x2,0 and so on.
  - Several thousand cycles produce the draws.
  - Discard the first several thousand to avoid initial conditions. (Burn in)
• Average the draws to estimate the marginal means.
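
The cycle above can be sketched for a standard bivariate normal with correlation rho, where both conditionals are known: x1|x2 ~ N(rho·x2, 1 − rho²) and symmetrically for x2|x1. The function name, starting value, and chain lengths below are illustrative assumptions.

```python
import random

def gibbs_bivariate_normal(rho, n_keep=20000, burn_in=2000, seed=7):
    """Gibbs sampler for a standard bivariate normal with correlation rho,
    cycling between the two univariate normal conditionals."""
    rng = random.Random(seed)
    cond_sd = (1.0 - rho * rho) ** 0.5
    x1 = 5.0  # deliberately poor starting value; burn-in removes its influence
    draws = []
    for cycle in range(burn_in + n_keep):
        x2 = rng.gauss(rho * x1, cond_sd)   # draw x2 | x1
        x1 = rng.gauss(rho * x2, cond_sd)   # draw x1 | x2
        if cycle >= burn_in:                # discard the burn-in cycles
            draws.append((x1, x2))
    return draws

draws = gibbs_bivariate_normal(rho=0.5)
n = len(draws)
mean1 = sum(d[0] for d in draws) / n
mean2 = sum(d[1] for d in draws) / n
cov12 = sum((d[0] - mean1) * (d[1] - mean2) for d in draws) / n

print("marginal means:", mean1, mean2)   # both should be near 0
print("covariance:", cov12)              # should be near rho = 0.5
```

The joint density is never evaluated or sampled directly; only the two conditionals are used, yet the retained draws reproduce the marginal means and the correlation.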

Part 24: Bayesian Estimation 24-11/35 Bivariate Normal Sampling

Part 24: Bayesian Estimation 24-12/35 Gibbs Sampling for the Linear Regression Model

Part 24: Bayesian Estimation 24-13/35 Application – the Probit Model

Part 24: Bayesian Estimation 24-14/35 Gibbs Sampling for the Probit Model

Part 24: Bayesian Estimation 24-15/35 Generating Random Draws from f(X)

Part 24: Bayesian Estimation 24-16/35
? Generate raw data
Calc ; Ran(13579) $
Sample ; $
Create ; x1 = rnn(0,1) ; x2 = rnn(0,1) $
Create ; ys = .2 + .5*x1 - .5*x2 + rnn(0,1) ; y = ys > 0 $
Namelist ; x = one,x1,x2 $
? xxi is used below as the covariance matrix of beta | y*
Matrix ; xxi = $
Calc ; Rep = 200 ; Ri = 1/(Rep-25) $
? Starting values and accumulate mean and variance matrices
Matrix ; beta=[0/0/0] ; bbar=init(3,1,0) ; bv=init(3,3,0) $
Proc = gibbs $
? Markov Chain – Monte Carlo iterations
Do for ; simulate ; r = 1,Rep $
? [ Sample y* | beta ]
Create ; mui = x'beta ; f = rnu(0,1)
       ; if(y=1) ysg = mui + inp(1-(1-f)*phi( mui))
       ; (else) ysg = mui + inp( f *phi(-mui)) $
? [ Sample beta | y* ]
Matrix ; mb = xxi*x'ysg ; beta = rndm(mb,xxi) $
? [ Sum posterior mean and variance. Discard burn in. ]
Matrix ; if[r > 25] ; bbar=bbar+beta ; bv=bv+beta*beta' $
Enddo ; simulate $
Endproc $
Execute ; Proc = Gibbs $
Matrix ; bbar=ri*bbar ; bv=ri*bv-bbar*bbar' $
Probit ; lhs = y ; rhs = x $
Matrix ; Stat(bbar,bv,x) $
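
For readers without LIMDEP, the same two-step cycle (draw y* | beta from a truncated normal by inverting the CDF, then draw beta | y* from a normal) can be sketched in plain Python. This is an illustrative translation under simplifying assumptions, not the slide's program. It uses a constant and a single regressor so the 2×2 inverse and Cholesky factor can be written by hand, it assumes a flat prior on beta, and `gibbs_probit` and its arguments are hypothetical names.

```python
import random
from statistics import NormalDist

def gibbs_probit(y, x, reps=300, burn=50, seed=13579):
    """Data-augmentation Gibbs sampler for y = 1[b0 + b1*x + e > 0], e ~ N(0,1).
    Returns the posterior means of (b0, b1) after discarding the burn-in."""
    rng = random.Random(seed)
    nd = NormalDist()
    n = len(y)
    # (X'X)^{-1} for X = [1, x], written out for the 2x2 case.
    sx = sum(x)
    sxx = sum(v * v for v in x)
    det = n * sxx - sx * sx
    v00, v01, v11 = sxx / det, -sx / det, n / det
    # Cholesky factor of (X'X)^{-1}, used to draw beta ~ N(mb, (X'X)^{-1}).
    l00 = v00 ** 0.5
    l10 = v01 / l00
    l11 = (v11 - l10 * l10) ** 0.5
    b0 = b1 = 0.0
    sum0 = sum1 = 0.0
    for r in range(reps):
        # [ Sample y* | beta ]: truncated normal draw by inverting the CDF.
        s1 = sxy = 0.0
        for yi, xi in zip(y, x):
            mu = b0 + b1 * xi
            f = rng.random()
            p_neg = nd.cdf(-mu)                 # P(y* <= 0 | beta)
            u = p_neg + f * (1.0 - p_neg) if yi == 1 else f * p_neg
            u = min(max(u, 1e-12), 1.0 - 1e-12) # keep inv_cdf inside (0, 1)
            ys = mu + nd.inv_cdf(u)
            s1 += ys
            sxy += xi * ys
        # [ Sample beta | y* ]: normal with mean (X'X)^{-1} X'y*.
        m0 = v00 * s1 + v01 * sxy
        m1 = v01 * s1 + v11 * sxy
        z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        b0 = m0 + l00 * z0
        b1 = m1 + l10 * z0 + l11 * z1
        if r >= burn:                           # accumulate after burn-in
            sum0 += b0
            sum1 += b1
    kept = reps - burn
    return sum0 / kept, sum1 / kept

# Simulated data with true coefficients (0.2, 0.5); sample size chosen for speed.
data_rng = random.Random(13579)
x = [data_rng.gauss(0.0, 1.0) for _ in range(500)]
y = [1 if 0.2 + 0.5 * xi + data_rng.gauss(0.0, 1.0) > 0 else 0 for xi in x]
b0_bar, b1_bar = gibbs_probit(y, x)
print("posterior means:", b0_bar, b1_bar)
```

The expression `p_neg + f * (1 - p_neg)` is algebraically the same as the listing's `1-(1-f)*phi(mui)`, since 1 − Φ(μ) = Φ(−μ).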

Part 24: Bayesian Estimation 24-17/35 Example: Probit MLE vs. Gibbs
--> Matrix ; Stat(bbar,bv); Stat(b,varb) $
|Number of observations in current sample = 1000 |
|Number of parameters computed here = 3 |
|Number of degrees of freedom = 997 |
|Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] |
BBAR_ BBAR_ BBAR_
|Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] |
B_ B_ B_

Part 24: Bayesian Estimation 24-18/35 A Random Effects Approach
• Allenby and Rossi, “Marketing Models of Consumer Heterogeneity”
  - Discrete Choice Model – Brand Choice
  - “Hierarchical Bayes”
  - Multinomial Probit
• Panel Data: Purchases of 4 brands of Ketchup

Part 24: Bayesian Estimation 24-19/35 Structure

Part 24: Bayesian Estimation 24-20/35 Bayesian Priors

Part 24: Bayesian Estimation 24-21/35 Bayesian Estimator
• Joint Posterior=
• Integral does not exist in closed form.
• Estimate by random samples from the joint posterior.
• The full joint posterior is not known, so it is not possible to sample from the joint posterior directly.

Part 24: Bayesian Estimation 24-22/35 Gibbs Cycles for the MNP Model
• Samples from the marginal posteriors

Part 24: Bayesian Estimation 24-23/35 Results
• Individual parameter vectors and disturbance variances
• Individual estimates of choice probabilities
• The same as the “random parameters model” with slightly different weights.
• Allenby and Rossi call the classical method an “approximate Bayesian” approach. (Greene calls the Bayesian estimator an “approximate random parameters model”) Who’s right?
  - Bayesian layers on implausible uninformative priors and calls the maximum likelihood results “exact” Bayesian estimators
  - Classical is strongly parametric and a slave to the distributional assumptions.
  - Bayesian is even more strongly parametric than classical.
  - Neither is right – Both are right.

Part 24: Bayesian Estimation 24-24/35 Comparison of Maximum Simulated Likelihood and Hierarchical Bayes
• Ken Train: “A Comparison of Hierarchical Bayes and Maximum Simulated Likelihood for Mixed Logit”
• Mixed Logit

Part 24: Bayesian Estimation 24-25/35 Stochastic Structure – Conditional Likelihood
Note individual specific parameter vector, βi

Part 24: Bayesian Estimation 24-26/35 Classical Approach

Part 24: Bayesian Estimation 24-27/35 Bayesian Approach – Gibbs Sampling and Metropolis-Hastings

Part 24: Bayesian Estimation 24-28/35 Gibbs Sampling from Posteriors: b

Part 24: Bayesian Estimation 24-29/35 Gibbs Sampling from Posteriors: Ω

Part 24: Bayesian Estimation 24-30/35 Gibbs Sampling from Posteriors: βi

Part 24: Bayesian Estimation 24-31/35 Metropolis – Hastings Method
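
As a generic illustration of the method named in the title (a textbook random-walk Metropolis sampler, not the slide's own derivation for the mixed logit), the sketch below draws from a density known only up to a constant. Random-walk Metropolis is the special case of Metropolis-Hastings with a symmetric proposal, so the acceptance ratio reduces to a ratio of target densities. The target, step size, and function names are assumptions chosen so the result can be checked.

```python
import math
import random

def random_walk_metropolis(log_target, start, step, n_draws, burn_in, seed=11):
    """Random-walk Metropolis: propose current + N(0, step^2) and accept
    with probability min(1, target(proposal) / target(current))."""
    rng = random.Random(seed)
    current = start
    current_lp = log_target(current)
    draws = []
    accepted = 0
    for it in range(burn_in + n_draws):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_target(proposal)
        # Compare on the log scale; 1 - random() lies in (0, 1], so log is defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        if it >= burn_in:
            draws.append(current)       # a rejected proposal repeats the current value
    return draws, accepted / (burn_in + n_draws)

def log_unnormalized_target(t):
    """Log of a N(3, 2^2) density with the normalizing constant dropped."""
    return -0.5 * ((t - 3.0) / 2.0) ** 2

mh_draws, accept_rate = random_walk_metropolis(
    log_unnormalized_target, start=0.0, step=2.5, n_draws=40000, burn_in=2000)
mh_mean = sum(mh_draws) / len(mh_draws)
mh_var = sum((d - mh_mean) ** 2 for d in mh_draws) / len(mh_draws)
print("mean, variance, acceptance rate:", mh_mean, mh_var, accept_rate)
```

Only differences of the log target enter the acceptance step, which is why the normalizing constant is never needed. In a hierarchical model this kind of step is typically used for a block, such as an individual parameter vector, whose conditional posterior has no convenient closed form.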

Part 24: Bayesian Estimation 24-32/35 Metropolis Hastings: A Draw of βi

Part 24: Bayesian Estimation 24-33/35 Application: Energy Suppliers
• N=361 individuals, 2 to 12 hypothetical suppliers
• X = (1) fixed rates, (2) contract length, (3) local (0,1), (4) well known company (0,1), (5) offer TOD rates (0,1), (6) offer seasonal rates (0,1).

Part 24: Bayesian Estimation 24-34/35 Estimates: Mean of Individual βi
              MSL Estimate       Bayes Posterior Mean
Price         -1.04 (0.396)      -1.04 (0.0374)
Contract            (0.0240)           (0.0224)
Local          2.40 (0.127)       2.41 (0.140)
Well Known     1.74 (0.0927)      1.71 (0.100)
TOD           -9.94 (0.337)      -10.0 (0.315)
Seasonal      -10.2 (0.333)      -10.2 (0.310)

Part 24: Bayesian Estimation 24-35/35 Reconciliation: A Theorem (Bernstein-von Mises)
• The posterior distribution converges to normal with covariance matrix equal to 1/N times the inverse of the information matrix (same as classical MLE). (The distribution that is converging is the posterior, not the sampling distribution of the estimator of the posterior mean.)
• The posterior mean (empirical) converges to the mode of the likelihood function. Same as the MLE. A proper prior disappears asymptotically.
• Asymptotic sampling distribution of the posterior mean is the same as that of the MLE.
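
The theorem can be checked numerically in a case where both sides are available in closed form. This is a sketch, assuming a Bernoulli model with a proper Beta(2, 2) prior (an illustrative choice, not from the slides). The function name and the fixed success share are hypothetical.

```python
def compare_posterior_to_mle(n, share=0.3, a=2.0, b=2.0):
    """For n Bernoulli trials with a fixed success share and a Beta(a, b) prior,
    return (posterior mean - MLE, posterior variance / MLE asymptotic variance)."""
    s = share * n
    mle = s / n
    mle_var = mle * (1.0 - mle) / n      # inverse of the information for n observations
    a_post, b_post = a + s, b + n - s    # Beta posterior parameters
    total = a_post + b_post
    post_mean = a_post / total
    post_var = a_post * b_post / (total ** 2 * (total + 1.0))
    return post_mean - mle, post_var / mle_var

for n in (10, 100, 1000, 100000):
    gap, ratio = compare_posterior_to_mle(n)
    print(n, round(gap, 6), round(ratio, 6))
```

As n grows, the gap between the posterior mean and the MLE goes to zero and the variance ratio goes to one, so the proper prior leaves no trace in the limit.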