[Topic 5-Bayesian Analysis] 1/77 Discrete Choice Modeling William Greene Stern School of Business New York University

[Topic 5-Bayesian Analysis] 2/77 5. BAYESIAN ECONOMETRICS

[Topic 5-Bayesian Analysis] 3/77 Bayesian Estimation Philosophical underpinnings: The meaning of statistical information How to combine information contained in the sample with prior information

[Topic 5-Bayesian Analysis] 4/77 Classical Inference. [Slide diagram: Population, Measurement, Econometrics, Characteristics, Behavior Patterns, Choices.] Imprecise inference about the entire population – sampling theory and asymptotics.

[Topic 5-Bayesian Analysis] 5/77 Bayesian Inference. [Slide diagram: Population, Measurement, Econometrics, Characteristics, Behavior Patterns, Choices.] Sharp, ‘exact’ inference about only the sample – the ‘posterior’ density.

[Topic 5-Bayesian Analysis] 6/77 Paradigms. Classical: Formulate the theory. Gather evidence. Evidence consistent with theory? Theory stands and waits for more evidence to be gathered. Evidence conflicts with theory? Theory falls. Bayesian: Formulate the theory. Assemble existing evidence on the theory. Form beliefs based on existing evidence (*). Gather new evidence. Combine beliefs with new evidence. Revise beliefs regarding the theory. Return to (*).

[Topic 5-Bayesian Analysis] 7/77 On Objectivity and Subjectivity Objectivity and “Frequentist” methods in Econometrics – The data speak Subjectivity and Beliefs Priors Evidence Posteriors Science and the Scientific Method

[Topic 5-Bayesian Analysis] 8/77 Foundational Result A method of using new information to update existing beliefs about probabilities of events Bayes Theorem for events. (Conceived for updating beliefs about games of chance)

[Topic 5-Bayesian Analysis] 9/77 Likelihoods (Frequentist) The likelihood is the density of the observed data conditioned on the parameters Inference based on the likelihood is usually “maximum likelihood” (Bayesian) A function of the parameters and the data that forms the basis for inference – not a probability distribution The likelihood embodies the current information about the parameters and the data

[Topic 5-Bayesian Analysis] 10/77 The Likelihood Principle The likelihood embodies ALL the current information about the parameters and the data Proportional likelihoods should lead to the same inferences, even given different interpretations.

[Topic 5-Bayesian Analysis] 11/77 “Estimation” Assembling information Prior information = out of sample. Literally prior or outside information Sample information is embodied in the likelihood Result of the analysis: “Posterior belief” = blend of prior and likelihood

[Topic 5-Bayesian Analysis] 12/77 Bayesian Investigation. No fixed “parameters”: θ is a random variable. Data are realizations of random variables. There is a marginal distribution p(data). Parameters are part of the random state of nature: p(θ) = distribution of θ independent of (prior to) the data, as understood by the analyst. (Two analysts could legitimately bring different priors to the study.) The investigation combines sample information with prior information. The outcome is a revision of the prior based on the observed information (the data).

[Topic 5-Bayesian Analysis] 13/77 The Bayesian Estimator The posterior distribution embodies all that is “believed” about the model. Posterior = f(model|data) = Likelihood(θ,data) * prior(θ) / P(data) “Estimation” amounts to examining the characteristics of the posterior distribution(s). Mean, variance Distribution Intervals containing specified probabilities

[Topic 5-Bayesian Analysis] 14/77 Priors and Posteriors The Achilles heel of Bayesian Econometrics Noninformative and Informative priors for estimation of parameters Noninformative (diffuse) priors: How to incorporate the total lack of prior belief in the Bayesian estimator. The estimator becomes solely a function of the likelihood Informative prior: Some prior information enters the estimator. The estimator mixes the information in the likelihood with the prior information. Improper and Proper priors P(θ) is uniform over the allowable range of θ Cannot integrate to 1.0 if the range is infinite. Salvation – improper, but noninformative priors will fall out of the posterior.

[Topic 5-Bayesian Analysis] 15/77 Symmetrical Treatment of Data and Parameters. Likelihood is p(data|θ). Prior summarizes nonsample information about θ in p(θ). Joint distribution is p(data,θ): p(data,θ) = p(data|θ)p(θ). Use Bayes theorem to get p(θ|data) = posterior distribution.

[Topic 5-Bayesian Analysis] 16/77 The Posterior Distribution
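The formula on this slide was an image and did not survive the transcript. The standard statement of the result described on the surrounding slides is:

```latex
p(\theta \mid \text{data})
  = \frac{p(\text{data}\mid\theta)\,p(\theta)}{p(\text{data})}
  = \frac{L(\theta\mid\text{data})\,p(\theta)}
         {\int_{\theta} L(\theta\mid\text{data})\,p(\theta)\,d\theta}
  \;\propto\; L(\theta\mid\text{data})\,p(\theta).
```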

[Topic 5-Bayesian Analysis] 17/77 Priors – Where do they come from? What does the prior contain? Informative priors – real prior information. Noninformative priors. Mathematical complications. Diffuse: uniform, or normal with a huge variance. Improper priors. Conjugate priors.

[Topic 5-Bayesian Analysis] 18/77 Application. Estimate θ, the probability that a production process will produce a defective product. Sampling design: choose N = 25 items from the production line; D = the number of defectives. Result of our experiment: D = 8. The likelihood for the sample of data is L(θ|data) = θ^D (1 − θ)^(25−D), 0 < θ < 1. The maximum likelihood estimator of θ is q = D/25 = 0.32, and the asymptotic variance of the MLE is estimated by q(1 − q)/25 ≈ 0.0087.
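A minimal numerical check of this slide's calculation in Python (numpy and scipy are assumed to be available; the variable names are illustrative, not from the slide):

```python
import numpy as np
from scipy.optimize import minimize_scalar

N, D = 25, 8  # sample size and number of defectives from the slide

# Negative log of the likelihood L(theta|data) = theta^D (1-theta)^(N-D)
def neg_loglik(theta):
    return -(D * np.log(theta) + (N - D) * np.log(1 - theta))

res = minimize_scalar(neg_loglik, bounds=(1e-6, 1 - 1e-6), method="bounded")
q = res.x                       # MLE; equals D/N = 0.32
avar = q * (1 - q) / N          # estimated asymptotic variance, about 0.0087
print(f"MLE = {q:.4f}, asymptotic variance = {avar:.6f}")
```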

[Topic 5-Bayesian Analysis] 19/77 Application: Posterior Density
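The posterior density itself was an image on this slide. Under the uniform (flat) prior discussed above, the standard result for this application is:

```latex
p(\theta \mid D) \;\propto\; \theta^{D}(1-\theta)^{N-D}\cdot 1
\quad\Longrightarrow\quad
\theta \mid D \;\sim\; \mathrm{Beta}(D+1,\; N-D+1) = \mathrm{Beta}(9,\,18).
```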

[Topic 5-Bayesian Analysis] 20/77 Posterior Moments
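The moments of that Beta posterior (again assuming the uniform prior; these are textbook formulas, not numbers recovered from the slide) are:

```latex
E[\theta \mid D] = \frac{D+1}{N+2} = \frac{9}{27} \approx 0.3333,
\qquad
\mathrm{Var}[\theta \mid D] = \frac{(D+1)(N-D+1)}{(N+2)^2 (N+3)}
 = \frac{9\cdot 18}{27^2 \cdot 28} \approx 0.0079 .
```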

[Topic 5-Bayesian Analysis] 21/77 Informative prior

[Topic 5-Bayesian Analysis] 22/77 Mixing Prior and Sample Information
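The formulas on this and the previous slide were images. A standard illustration of how an informative prior mixes with the sample information, using a Beta(a, b) prior for this application (the specific prior used on the slide is not recoverable from the transcript):

```latex
\theta \sim \mathrm{Beta}(a,b)
\;\Rightarrow\;
\theta \mid D \sim \mathrm{Beta}(D+a,\; N-D+b),
\qquad
E[\theta \mid D] = \frac{D+a}{N+a+b}
 = \frac{N}{N+a+b}\cdot\frac{D}{N} + \frac{a+b}{N+a+b}\cdot\frac{a}{a+b},
```

so the posterior mean is a weighted average of the MLE D/N and the prior mean a/(a+b), with the weight on the data growing with N.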

[Topic 5-Bayesian Analysis] 23/77 Modern Bayesian Analysis. Bayesian estimate of the distribution of θ, based on 5,000 draws from the posterior. [Slide table of simulation results (posterior mean and variance, sample mean, sample variance, standard deviation, skewness, excess kurtosis, minimum, maximum, and percentiles) did not survive the transcript.]
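A minimal sketch of how such a simulation-based summary can be produced, assuming the uniform-prior Beta(9, 18) posterior from the application above (the slide's own numbers were lost):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)
draws = rng.beta(9, 18, size=5000)   # 5000 posterior draws, as on the slide

print("Sample mean     ", draws.mean())
print("Sample variance ", draws.var(ddof=1))
print("Std. deviation  ", draws.std(ddof=1))
print("Skewness        ", stats.skew(draws))
print("Excess kurtosis ", stats.kurtosis(draws))
print("Min / Max       ", draws.min(), draws.max())
print("2.5% / 97.5%    ", np.percentile(draws, [2.5, 97.5]))
```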

[Topic 5-Bayesian Analysis] 24/77 Bayesian Estimator First generation: Do the integration (math)

[Topic 5-Bayesian Analysis] 25/77 The Linear Regression Model

[Topic 5-Bayesian Analysis] 26/77 Marginal Posterior for 

[Topic 5-Bayesian Analysis] 27/77 Modern Bayesian Analysis. Multiple parameter settings: derivation of the exact form of expectations and variances for p(θ1,θ2,…,θK|data) is hopelessly complicated even if the density is tractable. Strategy: sample joint observations (θ1,θ2,…,θK) from the posterior population and use marginal means, variances, quantiles, etc. How to sample the joint observations??? (Still hopelessly complicated.)

[Topic 5-Bayesian Analysis] 28/77 A Practical Problem

[Topic 5-Bayesian Analysis] 29/77 A Solution to the Sampling Problem

[Topic 5-Bayesian Analysis] 30/77 Magic Tool: The Gibbs Sampler. Problem: how to sample observations from a population, p(θ1,θ2,…,θK|data). Solution: the Gibbs sampler. Target: sample from f(x1, x2) = joint distribution. The joint distribution is unknown, or it is not possible to sample from it directly. Assumed: the conditional distributions f(x1|x2) and f(x2|x1) are both known, and samples can be drawn from both. Gibbs sampling: obtain one draw from (x1,x2) by many cycles between x1|x2 and x2|x1. Start x1,0 anywhere in the right range. Draw x2,0 from x2|x1,0. Return to x1,1 from x1|x2,0, and so on. Several thousand cycles produce a draw. Repeat several thousand times to produce a sample. Average the draws to estimate the marginal means.

[Topic 5-Bayesian Analysis] 31/77 Bivariate Normal Sampling

[Topic 5-Bayesian Analysis] 32/77 Application: Bivariate Normal. Obtain a bivariate normal sample (x,y) from Normal[(0,0),(1,1,ρ)]. N = . Conditionals: x|y is N[ρy, (1 − ρ²)]; y|x is N[ρx, (1 − ρ²)]. Gibbs sampler: y0 = 0. x1 = ρy0 + sqr(1 − ρ²)v, where v is a N(0,1) draw. y1 = ρx1 + sqr(1 − ρ²)w, where w is a N(0,1) draw. Repeat the cycle 60,000 times. Drop the first 10,000. Retain every 10th observation of the remainder.
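A runnable sketch of this scheme in Python (ρ is set to 0.5 here for concreteness, since the slide's value did not survive the transcript; numpy is assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.5                           # slide's value was lost; any -1 < rho < 1 works
n_cycles, burn_in, thin = 60_000, 10_000, 10

x, y = 0.0, 0.0                     # start anywhere in the right range
draws = []
for r in range(n_cycles):
    x = rho * y + np.sqrt(1 - rho**2) * rng.standard_normal()  # draw x | y
    y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal()  # draw y | x
    if r >= burn_in and (r - burn_in) % thin == 0:
        draws.append((x, y))

draws = np.array(draws)
print("means:", draws.mean(axis=0))                 # close to (0, 0)
print("correlation:", np.corrcoef(draws.T)[0, 1])   # close to rho
```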

[Topic 5-Bayesian Analysis] 33/77 Gibbs Sampling for the Linear Regression Model
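The algebra on this slide was an image. Below is a sketch of a standard Gibbs sampler for the linear regression model under diffuse priors (flat prior on β and p(σ²) proportional to 1/σ²); the slide's own setup may differ in details:

```python
import numpy as np

def gibbs_linear_regression(y, X, n_draws=10_000, burn_in=1_000, seed=0):
    """Gibbs sampler for y = X b + e, e ~ N(0, s2 I), under diffuse priors.
    Cycles between b | s2, y ~ N[b_ols, s2 (X'X)^-1] and s2 | b, y, where
    e'e / s2 ~ chi-square(n)."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b_ols = XtX_inv @ X.T @ y
    L = np.linalg.cholesky(XtX_inv)

    beta, s2 = b_ols.copy(), 1.0
    keep_beta, keep_s2 = [], []
    for r in range(n_draws + burn_in):
        beta = b_ols + np.sqrt(s2) * (L @ rng.standard_normal(k))  # beta | s2, y
        e = y - X @ beta
        s2 = (e @ e) / rng.chisquare(n)                            # s2 | beta, y
        if r >= burn_in:
            keep_beta.append(beta)
            keep_s2.append(s2)
    return np.array(keep_beta), np.array(keep_s2)

# Example usage with simulated data (illustrative):
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.standard_normal(200)])
y = X @ np.array([1.0, 0.5]) + rng.standard_normal(200)
betas, s2s = gibbs_linear_regression(y, X)
print(betas.mean(axis=0), s2s.mean())
```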

[Topic 5-Bayesian Analysis] 34/77 More General Gibbs Sampler. Objective: sample joint observations on θ1, θ2, …, θK from p(θ1,θ2,…,θK|data). (Let K = 3.) Derive p(θ1|θ2,θ3,data), p(θ2|θ1,θ3,data), p(θ3|θ1,θ2,data). Gibbs cycles produce joint observations: 0. Start θ1, θ2, θ3 at some reasonable values. 1. Sample a draw from p(θ1|θ2,θ3,data) using the draws of θ2, θ3 in hand. 2. Sample a draw from p(θ2|θ1,θ3,data) using the draw at step 1 for θ1. 3. Sample a draw from p(θ3|θ1,θ2,data) using the draws at steps 1 and 2. 4. Return to step 1. After a burn-in period (a few thousand cycles), start collecting the draws. The set of draws ultimately gives a sample from the joint distribution. Order within the chain does not matter.

[Topic 5-Bayesian Analysis] 35/77 Using the Gibbs Sampler to Estimate a Probit Model

[Topic 5-Bayesian Analysis] 36/77 Strategy: Data Augmentation. Treat yi* as unknown ‘parameters’ along with β: ‘estimate’ θ = (β, y1*,…,yN*) = (β, y*). Draw a sample of R observations from the joint population (β, y*). Use the marginal observations on β to estimate the characteristics (e.g., mean) of the distribution of β|y,X.

[Topic 5-Bayesian Analysis] 37/77 Gibbs Sampler Strategy. p(β|y*,(y,X)): if y* is known, y is known, so p(β|y*,(y,X)) = p(β|y*,X). p(β|y*,X) defines a linear regression with N(0,1) normal disturbances. Known result for β|y* when the disturbances are N[0,I]: p(β|y*,(y,X)) = N[b*, (X′X)⁻¹], with b* = (X′X)⁻¹X′y*. Deduce a result for y*|β.

[Topic 5-Bayesian Analysis] 38/77 Gibbs Sampler, Continued. yi*|β,xi is Normal[xi′β, 1]. yi is informative about yi*: If yi = 1, then yi* > 0, and p(yi*|β,xi,yi = 1) is truncated normal: p(yi*|β,xi,yi = 1) = φ(yi* − xi′β)/Φ(xi′β) for yi* > 0, denoted N+[xi′β,1]. If yi = 0, then yi* < 0, and p(yi*|β,xi,yi = 0) is truncated normal: p(yi*|β,xi,yi = 0) = φ(yi* − xi′β)/[1 − Φ(xi′β)] for yi* < 0, denoted N−[xi′β,1].

[Topic 5-Bayesian Analysis] 39/77 Generating Random Draws from f(x)

[Topic 5-Bayesian Analysis] 40/77 Sampling from the Truncated Normal
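The slide's formulas were images. One standard recipe, the inverse-CDF method for a unit-variance truncated normal, sketched in Python (scipy assumed; the function name is illustrative):

```python
import numpy as np
from scipy.stats import norm

def draw_truncated_normal(mu, lower=None, upper=None, rng=None):
    """Inverse-CDF draw from N(mu, 1) truncated to (lower, upper)."""
    rng = rng or np.random.default_rng()
    lo = 0.0 if lower is None else norm.cdf(lower - mu)
    hi = 1.0 if upper is None else norm.cdf(upper - mu)
    u = rng.uniform(lo, hi)          # uniform on the truncated CDF range
    return mu + norm.ppf(u)

rng = np.random.default_rng(1)
# N+[mu,1] is used when y_i = 1 (y* > 0); N-[mu,1] when y_i = 0 (y* < 0)
print(draw_truncated_normal(0.7, lower=0.0, rng=rng))
print(draw_truncated_normal(0.7, upper=0.0, rng=rng))
```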

[Topic 5-Bayesian Analysis] 41/77 Sampling from the Multivariate Normal
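The slide's derivation was an image. The usual device is the Cholesky factor of the covariance matrix, sketched below (numpy assumed):

```python
import numpy as np

def draw_multivariate_normal(mu, Sigma, rng):
    """Draw from N(mu, Sigma): if LL' = Sigma and z ~ N(0, I),
    then mu + L z ~ N(mu, Sigma)."""
    L = np.linalg.cholesky(Sigma)
    return mu + L @ rng.standard_normal(len(mu))

rng = np.random.default_rng(2)
mu = np.zeros(2)
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
print(draw_multivariate_normal(mu, Sigma, rng))
```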

[Topic 5-Bayesian Analysis] 42/77 Gibbs Sampler. Preliminary: obtain X′X, then L such that LL′ = (X′X)⁻¹. Preliminary: choose an initial value for β, such as β(0) = 0. Start with r = 1. (y* step) Sample N observations on y*(r) using β(r−1), xi, and yi and the transformations for the truncated normal distribution. (β step) Compute b*(r) = (X′X)⁻¹X′y*(r). Draw the observation on β(r) from the normal population with mean b*(r) and variance (X′X)⁻¹. Cycle between the two steps 50,000 times. Discard the first 10,000 and retain every 10th observation from the remaining 40,000.
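A compact Python sketch of the algorithm described on this slide (a diffuse prior on β is assumed, matching the N[b*, (X′X)⁻¹] conditional; the function and variable names are illustrative):

```python
import numpy as np
from scipy.stats import norm

def gibbs_probit(y, X, n_cycles=50_000, burn_in=10_000, thin=10, seed=0):
    """Data-augmentation Gibbs sampler for the probit model (diffuse prior on beta)."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)      # preliminary: (X'X)^{-1}
    L = np.linalg.cholesky(XtX_inv)       # preliminary: LL' = (X'X)^{-1}
    beta = np.zeros(k)                    # initial value beta(0) = 0
    keep = []
    for r in range(n_cycles):
        # (y* step): draw each y_i* from the appropriate truncated normal
        mu = X @ beta
        lo = np.where(y == 1, norm.cdf(-mu), 0.0)   # y=1: truncated below at 0
        hi = np.where(y == 1, 1.0, norm.cdf(-mu))   # y=0: truncated above at 0
        ystar = mu + norm.ppf(rng.uniform(lo, hi))
        # (beta step): b*(r) = (X'X)^{-1} X'y*(r); draw beta ~ N[b*(r), (X'X)^{-1}]
        bstar = XtX_inv @ X.T @ ystar
        beta = bstar + L @ rng.standard_normal(k)
        if r >= burn_in and (r - burn_in) % thin == 0:
            keep.append(beta)
    return np.array(keep)
```

The posterior mean of the retained draws plays the role of the point estimate and can be compared with the maximum likelihood probit coefficients.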

[Topic 5-Bayesian Analysis] 43/77 Frequentist and Bayesian Results. (Computation times: 0.37 seconds; 2 minutes.)

[Topic 5-Bayesian Analysis] 44/77 Appendix

[Topic 5-Bayesian Analysis] 45/77 Bayesian Model Estimation Specification of conditional likelihood: f(data | parameters) = L(parameters|data) Specification of priors: g(parameters) Posterior density of parameters: Posterior mean = E[parameters|data]

[Topic 5-Bayesian Analysis] 46/77 The Marginal Density for the Data is Irrelevant

[Topic 5-Bayesian Analysis] 47/77 Bayesian Estimators Bayesian “Random Parameters” vs. Classical Randomly Distributed Parameters Models of Individual Heterogeneity Sample Proportion Linear Regression Binary Choice Random Effects: Consumer Brand Choice Fixed Effects: Hospital Costs

[Topic 5-Bayesian Analysis] 48/77 A Random Effects Approach Allenby and Rossi, “Marketing Models of Consumer Heterogeneity” Discrete Choice Model – Brand Choice Hierarchical Bayes Multinomial Probit Panel Data: Purchases of 4 brands of ketchup

[Topic 5-Bayesian Analysis] 49/77 Structure

[Topic 5-Bayesian Analysis] 50/77 Priors

[Topic 5-Bayesian Analysis] 51/77 Bayesian Estimator. Joint posterior: the integral does not exist in closed form. Estimate by random samples from the joint posterior. The full joint posterior is not known, so it is not possible to sample from it directly. The Gibbs sampler is used to sample from the posterior.

[Topic 5-Bayesian Analysis] 52/77 Gibbs Cycles for the MNP Model

[Topic 5-Bayesian Analysis] 53/77 Results. Individual parameter vectors and disturbance variances. Individual estimates of choice probabilities. The same as the “random parameters probit model” with slightly different weights. Allenby and Rossi call the classical method an “approximate Bayesian” approach. (Greene calls the Bayesian estimator an “approximate random parameters model.”) Who’s right? The Bayesian layers on implausible uninformative priors and calls the maximum likelihood results “exact” Bayesian estimators. The classical approach is strongly parametric and a slave to the distributional assumptions. The Bayesian approach is even more strongly parametric than the classical. Neither is right; both are right.

[Topic 5-Bayesian Analysis] 54/77 A Comparison of Maximum Simulated Likelihood and Hierarchical Bayes Ken Train: “A Comparison of Hierarchical Bayes and Maximum Simulated Likelihood for Mixed Logit” Mixed Logit

[Topic 5-Bayesian Analysis] 55/77 Stochastic Structure – Conditional Likelihood

[Topic 5-Bayesian Analysis] 56/77 Classical Approach

[Topic 5-Bayesian Analysis] 57/77 Mixed Model Estimation. MLwiN: multilevel modeling for Windows. Uses mostly Bayesian MCMC methods. “Markov Chain Monte Carlo (MCMC) methods allow Bayesian models to be fitted, where prior distributions for the model parameters are specified. By default MLwiN sets diffuse priors which can be used to approximate maximum likelihood estimation.” (From their website.)

[Topic 5-Bayesian Analysis] 58/77 Bayesian Approach – Gibbs Sampling and Metropolis-Hastings

[Topic 5-Bayesian Analysis] 59/77 Gibbs Sampling from Posteriors: b

[Topic 5-Bayesian Analysis] 60/77 Gibbs Sampling from Posteriors: Ω

[Topic 5-Bayesian Analysis] 61/77 Gibbs Sampling from Posteriors: βi

[Topic 5-Bayesian Analysis] 62/77 Metropolis-Hastings Method

[Topic 5-Bayesian Analysis] 63/77 Metropolis-Hastings: A Draw of βi
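The proposal and acceptance rule on these slides were images. Below is a generic random-walk Metropolis-Hastings sketch in Python; it is illustrative only and is not the specific update for βi used in the hierarchical model:

```python
import numpy as np

def mh_random_walk(log_target, x0, n_draws, step, seed=0):
    """Random-walk Metropolis-Hastings. log_target returns the log of the
    (unnormalized) target density; the symmetric normal proposal cancels in
    the acceptance ratio."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    draws = []
    for _ in range(n_draws):
        proposal = x + step * rng.standard_normal(x.shape)
        log_alpha = log_target(proposal) - log_target(x)
        if np.log(rng.uniform()) < log_alpha:   # accept with prob min(1, alpha)
            x = proposal
        draws.append(x.copy())
    return np.array(draws)

# Example: sample from a standard normal target
draws = mh_random_walk(lambda z: -0.5 * float(z @ z), np.zeros(1), 20_000, step=0.5)
print(draws[5_000:].mean(), draws[5_000:].std())
```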

[Topic 5-Bayesian Analysis] 64/77 Application: Energy Suppliers. N = 361 individuals, 2 to 12 hypothetical suppliers. X = [(1) fixed rates, (2) contract length, (3) local (0,1), (4) well-known company (0,1), (5) offers TOD rates (0,1), (6) offers seasonal rates].

[Topic 5-Bayesian Analysis] 65/77 Estimates: Mean of Individual βi

              MSL Estimate (Asymptotic S.E.)   Bayes Posterior Mean (Posterior Std.Dev.)
Price           -1.04 (0.396)                    -1.04 (0.0374)
Contract              (0.0240)                         (0.0224)
Local            2.40 (0.127)                     2.41 (0.140)
Well Known       1.74 (0.0927)                    1.71 (0.100)
TOD             -9.94 (0.337)                    -10.0 (0.315)
Seasonal       -10.2  (0.333)                    -10.2 (0.310)

(The Contract coefficient means did not survive the transcript.)

[Topic 5-Bayesian Analysis] 66/77 Nonlinear Models and Simulation Bayesian inference over parameters in a nonlinear model: 1. Parameterize the model 2. Form the likelihood conditioned on the parameters 3. Develop the priors – joint prior for all model parameters 4. Posterior is proportional to likelihood times prior. (Usually requires conjugate priors to be tractable.) 5. Draw observations from the posterior to study its characteristics.

[Topic 5-Bayesian Analysis] 67/77 Simulation Based Inference

[Topic 5-Bayesian Analysis] 68/77 Large Sample Properties of Posteriors. Under a uniform prior, the posterior is proportional to the likelihood function. The Bayesian ‘estimator’ is the mean of the posterior; the MLE equals the mode of the likelihood. In large samples, the likelihood becomes approximately normal, so the mean equals the mode. Thus, in large samples, the posterior mean will be approximately equal to the MLE.

[Topic 5-Bayesian Analysis] 69/77 Conclusions Bayesian vs. Classical Estimation In principle, some differences in interpretation As practiced, just two different algorithms The religious debate is a red herring Gibbs Sampler. A major technological advance Useful tool for both classical and Bayesian New Bayesian applications appear daily

[Topic 5-Bayesian Analysis] 70/77 Applications of the Paradigm Classical econometricians doggedly cling to their theories even when the evidence conflicts with them – that is what specification searches are all about. Bayesian econometricians NEVER incorporate prior evidence in their estimators – priors are always studiously noninformative. (Informative priors taint the analysis.) As practiced, Bayesian analysis is not Bayesian.

[Topic 5-Bayesian Analysis] 71/77 Methodological Issues Priors: Schizophrenia Uninformative are disingenuous (and not Bayesian) Informative are not objective Using existing information? Received studies generally do not do this. Bernstein von Mises theorem and likelihood estimation. In large samples, the likelihood dominates The posterior mean will be the same as the MLE

[Topic 5-Bayesian Analysis] 72/77 Standard Criticisms. Of the classical approach: computationally difficult (ML vs. MCMC); no attention is paid to household-level parameters; there is no natural estimator of individual or household-level parameters. Responses: none are true. See, e.g., Train (2003, ch. 10). Of classical inference in this setting: asymptotics are “only approximate” and rely on “imaginary samples,” while Bayesian procedures are “exact.” Response: the inexactness results from acknowledging that we try to extend these results outside the sample. The Bayesian results are “exact” but have no generality and are useless except for this sample, these data and this prior. (Or are they? Trying to extend them outside the sample is a distinctly classical exercise.)

[Topic 5-Bayesian Analysis] 73/77 Modeling Issues. As N → ∞, the likelihood dominates and the prior disappears, so the Bayesian and classical MLE converge. (This needs the mode of the posterior to converge to the mean.) Priors: diffuse (noninformative) priors have large variances that imply little prior information; informative priors have finite variances that appear in the posterior and “taint” any final results.

[Topic 5-Bayesian Analysis] 74/77 Reconciliation: Bernstein-Von Mises Theorem The posterior distribution converges to normal with covariance matrix equal to 1/N times the information matrix (same as classical MLE). (The distribution that is converging is the posterior, not the sampling distribution of the estimator of the posterior mean.) The posterior mean (empirical) converges to the mode of the likelihood function. Same as the MLE. A proper prior disappears asymptotically. Asymptotic sampling distribution of the posterior mean is the same as that of the MLE.

[Topic 5-Bayesian Analysis] 75/77 Sources Lancaster, T.: An Introduction to Modern Bayesian Econometrics, Blackwell, 2004 Koop, G.: Bayesian Econometrics, Wiley, 2003 … “Bayesian Methods,” “Bayesian Data Analysis,” … (many books in statistics) Papers in Marketing: Allenby, Ginter, Lenk, Kamakura,… Papers in Statistics: Sid Chib,… Books and Papers in Econometrics: Arnold Zellner, Gary Koop, Mark Steel, Dale Poirier,…

[Topic 5-Bayesian Analysis] 76/77 Software. Stata, Limdep, SAS, etc. R, Matlab, Gauss. WinBUGS: Bayesian inference Using Gibbs Sampling.

[Topic 5-Bayesian Analysis] 77/77 bsu.cam.ac.uk/bugs/welcome.shtml