National Accounts and SAM Estimation Using Cross-Entropy Methods. Sherman Robinson.

Presentation transcript:

National Accounts and SAM Estimation Using Cross-Entropy Methods Sherman Robinson

Estimation Problem
Partial equilibrium models such as IMPACT require balanced and consistent datasets that represent disaggregated production and demand by commodity. Estimating such a dataset requires an efficient method to incorporate and reconcile information from a variety of sources.

Primary Data Sources for IMPACT Base Year
FAOSTAT for country totals for:
– Production: area, yields, and supply
– Demand: total, food, intermediate, feed, and other demands
– Trade: exports, imports, net trade
– Nutrition: calories per capita, calories per kg of commodity
AQUASTAT for country irrigation and rainfed production
SPAM pixel-level estimation of the global allocation of production

Estimating a Consistent and Disaggregated Database
[Flow diagram: FAOSTAT → Estimate IMPACT Country Database; IMPACT Country Database + FAO AQUASTAT → Estimate Technology-Disaggregated Production; Technology-Disaggregated Production + SPAM → Estimate Geographically Disaggregated Production]

Bayesian Work Plan [diagram]

Information Theory Approach
The goal is to recover parameters and data that we observe imperfectly: estimation rather than prediction.
Assume very little about the error-generating process and nothing about the functional form of the error distribution.
Very different from standard statistical approaches (e.g., econometrics), which usually have lots of data.

Estimation Principles
Use all the information you have.
Do not use or assume any information you do not have.
Arnold Zellner: an "Efficient Information Processing Rule (IPR)."
Close links to Bayesian estimation.

Information Theory
We need to be flexible in incorporating information into parameter/data estimation, since information comes in many different forms.
In classical statistics, the "information" in a data set can be summarized by the moments of the distribution of the data, which summarize what is needed for estimation.
We need a broader view of "estimation," and we need to define "information."

An analogy from physics: initial state of motion → [force] → final state of motion. Force is whatever induces a change of motion.

Inference is dynamics as well: old beliefs → [information] → new beliefs. "Information" is what induces a change in rational beliefs.

Information Theory
Suppose an event E will occur with probability p. What is the information content of a message stating that E occurs?
If p is high, the event's occurrence carries little information. If p is low, the event's occurrence is a surprise and contains a lot of information.
The content of the message is not the issue: what matters is the amount, not the meaning, of information.

Information Theory
Shannon (1948) developed a formal measure of the "information content" of the arrival of a message (he worked for AT&T).
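The formula on this slide did not survive transcription; the standard Shannon measure of the information content of observing an event with probability p is:
$$ h(p) = \log\frac{1}{p} = -\log p $$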

Information Theory
For a set of events, the expected information content of a message before it arrives is the entropy measure, given below.
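The entropy formula itself is missing from the transcript; Shannon's measure for events with probabilities p_1, ..., p_n is:
$$ H = -\sum_{i=1}^{n} p_i \log p_i $$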

Claude Shannon [photo]

E.T. Jaynes
Jaynes proposed using the Shannon entropy measure in estimation.
Maximum entropy (MaxEnt) principle: out of all probability distributions that are consistent with the constraints, choose the one that has maximum uncertainty (i.e., maximizes the Shannon entropy metric).
The idea is to estimate probabilities (or frequencies); in the absence of any constraints, entropy is maximized by the uniform distribution.
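In symbols, the MaxEnt problem (not shown on the slide) is the standard statement, with f_k the moment functions and y_k the observed moments:
$$ \max_{p} \; -\sum_i p_i \log p_i \quad \text{s.t.} \quad \sum_i p_i f_k(x_i) = y_k, \qquad \sum_i p_i = 1 $$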

E.T. Jaynes [photo]

Estimation With a Prior
The estimation problem is to estimate a set of probabilities that are "close" to a known prior and that satisfy various known moment constraints.
Jaynes suggested using the criterion of minimizing the Kullback-Leibler cross-entropy (CE) "divergence" between the estimated probabilities and the prior.

Cross-Entropy Estimation
"Divergence," not "distance": the measure is not symmetric and does not satisfy the triangle inequality, so it is not a "norm."
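The divergence formula was lost in transcription; the Kullback-Leibler cross-entropy between estimated probabilities p and prior q is:
$$ I(p,q) = \sum_i p_i \ln\frac{p_i}{q_i} $$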

MaxEnt vs Cross-Entropy
If the prior is specified as a uniform distribution, the CE estimate is equivalent to the MaxEnt estimate.
Laplace's Principle of Insufficient Reason: in the absence of any information, you should choose the uniform distribution, which has maximum uncertainty.
A uniform prior is an admission of "ignorance," not knowledge.

Cross-Entropy Measure
Two kinds of information:
– Prior distribution of the probabilities
– Moments of the distribution
Any moments can be used:
– Inequalities can also be specified
– Moments with error will be considered
– Summary statistics such as quantiles

Cross-Entropy Measure
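The constrained minimization on this slide is missing from the transcript; in the standard form (following Golan, Judge, and Miller) it is:
$$ \min_{p} \; \sum_i p_i \ln\frac{p_i}{q_i} \quad \text{s.t.} \quad \sum_i p_i f_k(x_i) = y_k \;\;(k = 1,\dots,K), \qquad \sum_i p_i = 1 $$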

Lagrangian
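A reconstruction of the missing Lagrangian, with multipliers λ_k on the moment constraints and μ on the adding-up constraint:
$$ \mathcal{L} = \sum_i p_i \ln\frac{p_i}{q_i} + \sum_k \lambda_k \Big( y_k - \sum_i p_i f_k(x_i) \Big) + \mu \Big( 1 - \sum_i p_i \Big) $$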

First-Order Conditions
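Differentiating with respect to p_i gives the (reconstructed) first-order conditions:
$$ \frac{\partial \mathcal{L}}{\partial p_i} = \ln\frac{p_i}{q_i} + 1 - \sum_k \lambda_k f_k(x_i) - \mu = 0 $$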

Solution
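Solving the first-order conditions yields the standard CE solution, with the Ω referenced on the next slide:
$$ p_i = \frac{q_i \exp\!\big(\sum_k \lambda_k f_k(x_i)\big)}{\Omega(\lambda)}, \qquad \Omega(\lambda) = \sum_i q_i \exp\!\Big(\sum_k \lambda_k f_k(x_i)\Big) $$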

Cross-Entropy (CE) Estimates
Ω is called the "partition function."
CE can be viewed as a limiting (non-parametric) form of a Bayesian estimator, transforming prior and sample information into posterior estimates of probabilities.
It is not strictly Bayesian because the prior is specified not as a frequency function but as a discrete set of probabilities.

From Probabilities to Parameters
From information theory, we now have a way to use "information" to estimate probabilities.
But in economics, we want to estimate the parameters of a model or a "consistent" data set.
How do we move from estimating probabilities to estimating parameters and/or data?

Types of Information
Values:
– Areas, production, demand, trade
Coefficients (technology):
– Crop and livestock yields
– Input-output coefficients for processed commodities (sugar, oils)
Prior distribution of measurement error:
– Mean
– Standard error of measurement
– "Informative" or "uninformative" prior distribution

Data Estimation
Generate a prior "best" estimate of all entries (values and/or coefficients): a "prototype" based on:
– Values and aggregates: historical and current data; expert knowledge
– Coefficients (technology and behavior): current and/or historical data; an assumption of behavioral and technical stability

Estimation Constraints
Nationally:
– Area times yield = production, by crop
– Total area = sum of area over crops
– Total demand = sum of demand over types of demand
– Net trade = supply − demand
Globally:
– Net trade sums to 0 (written out formally below)
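Written out, in notation of my own (A = area, Y = yield, Q = production, D = demand, NT = net trade; crop c, country r):
$$ A_{c,r}\,Y_{c,r} = Q_{c,r}, \qquad NT_{c,r} = Q_{c,r} - D_{c,r}, \qquad \sum_{r} NT_{c,r} = 0 $$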

Measurement Error
Error specification:
– Errors on coefficients or values
– Additive or multiplicative errors
Multiplicative errors: logarithmic distribution; errors cannot be negative.
Additive errors: allow the possibility of entries changing sign.

Error Specification
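The specification itself is missing from the transcript; in the Golan-Judge-Miller form described on the next slide, each error e is a weighted average of support-set values:
$$ e = \sum_j w_j v_j, \qquad \sum_j w_j = 1, \qquad w_j \ge 0 $$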

Error Specification
Errors are weighted averages of support-set values:
– The v parameters are fixed and have the units of the item being estimated.
– The W variables are probabilities that need to be estimated.
This converts the problem of estimating errors into one of estimating probabilities.

Error Specification
The technique provides a bridge between standard estimation, where the parameters to be estimated are in "natural" units, and the information approach, where the parameters are probabilities. The specified support set provides the link.

Error Specification
This converts a "standard" stochastic specification with continuous random variables into a specification with a discrete set of probabilities (Golan, Judge, and Miller).
The problem is then to estimate a discrete probability distribution.

Uninformative Prior
The prior incorporates only information about the bounds between which the errors must fall.
The uniform distribution is the continuous uninformative prior in Bayesian analysis (Laplace's principle of insufficient reason).
We specify a finite probability distribution that approximates the uniform distribution.

Uninformative Prior
Assume that the bounds are set at ±3s, where s is a constant. For a uniform distribution on these bounds, the variance is given below.
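The variance formula itself is missing from the transcript; for a uniform distribution on [−3s, 3s] it is:
$$ \operatorname{Var} = \frac{\big(3s - (-3s)\big)^2}{12} = \frac{(6s)^2}{12} = 3s^2 $$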

7-Element Support Set
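The support set was lost in transcription; on the stated ±3s bounds, a natural 7-element set (an assumption, but consistent with the surrounding slides) is:
$$ v = \{-3s, -2s, -s, 0, s, 2s, 3s\}, \qquad \bar{w}_j = \tfrac{1}{7} $$
With equal prior weights, this discrete prior has variance \( \sum_j \bar{w}_j v_j^2 = 4s^2 \), which falls toward the continuous-uniform limit of \(3s^2\) as elements are added, matching the next slide.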

Uninformative Prior
A finite uniform prior with a 7-element support set is a conservative uninformative prior.
Adding more elements would more closely approximate the continuous uniform distribution, reducing the prior variance toward the limit of 3s².
The posterior distribution is essentially unconstrained.

Informative Prior
Start with a prior on both the mean and the standard deviation of the error distribution:
– The prior mean is normally zero.
– The standard deviation of e is the prior on the standard error of measurement of the item.
Define the support set with s = σ, so that the bounds are now ±3σ.

Informative Prior, 2 Parameters (Mean and Variance)
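The two moment constraints named on the slide ("Mean," "Variance"), written out:
$$ \sum_j \bar{w}_j v_j = 0, \qquad \sum_j \bar{w}_j v_j^2 = \sigma^2 $$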

3-Element Support Set
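Assuming the symmetric 3-point support {−3σ, 0, 3σ} implied by the ±3σ bounds, the adding-up, mean, and variance constraints pin down the prior weights exactly:
$$ v = \{-3\sigma,\; 0,\; 3\sigma\}, \qquad \bar{w} = \Big\{\tfrac{1}{18},\; \tfrac{16}{18},\; \tfrac{1}{18}\Big\} $$
(check: \( 2 \cdot \tfrac{1}{18} \cdot 9\sigma^2 = \sigma^2 \)).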

Informative Prior, 2 Parameters [figure]

Informative Prior: 4 Parameters
Priors must be specified for additional statistics: skewness and kurtosis.
– Assume a symmetric distribution: skewness is zero.
– Specify a normal prior: kurtosis is a function of σ.
This recovers additional information about the error distribution.

Informative Prior, 4 Parameters (Mean, Variance, Skewness, Kurtosis)
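The four moment constraints named on the slide, written out under the zero-skewness, normal-kurtosis priors of the previous slide:
$$ \sum_j \bar{w}_j v_j = 0, \quad \sum_j \bar{w}_j v_j^2 = \sigma^2, \quad \sum_j \bar{w}_j v_j^3 = 0, \quad \sum_j \bar{w}_j v_j^4 = 3\sigma^4 $$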

5-Element Support Set [the specific support points did not survive transcription; five points spanning ±3σ, with weights determined by the four moment constraints plus adding-up]

Informative Prior, 4 Parameters [figure]

Implementation
The program is implemented in GAMS.
– It is a large, difficult estimation problem, but with major advances in solvers the solution is now robust and routine.
– The CE minimand is similar to maximum likelihood estimators.
An Excel front end for the GAMS program makes it easy to use. A minimal sketch of the CE setup follows.
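Below is a minimal, self-contained sketch of the cross-entropy setup in GAMS; it is not the IMPACT code itself. All names (cemin, w, v, q, ybar) are illustrative, and the single moment constraint stands in for the full set of national and global balance constraints.

* Minimal cross-entropy sketch (illustrative; not the IMPACT program).
* Choose posterior weights w close to a uniform prior q on a 7-element
* support set, subject to one observed moment.

Set j 'support-set elements' /1*7/;

Parameter
   v(j) 'support values, spanning -3s..3s with s = 1'
   q(j) 'prior weights (uninformative: uniform)';
v(j) = ord(j) - 4;
q(j) = 1/card(j);

Scalar ybar 'observed moment (assumed datum)' /0.5/;

Positive Variable w(j) 'posterior weights';
Variable ce 'cross-entropy objective';

Equation objdef, addup, moment;
objdef..  ce =e= sum(j, w(j)*log(w(j)/q(j)));
addup..   sum(j, w(j)) =e= 1;
moment..  sum(j, w(j)*v(j)) =e= ybar;

* Keep w away from zero so log(w/q) is defined; start at the prior.
w.lo(j) = 1e-8;
w.l(j)  = q(j);

Model cemin /all/;
solve cemin using nlp minimizing ce;
display w.l;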

Implementation
[Flow diagram: FAOSTAT Database → Data Collection (Commodity Balance, Food Balance) → Data Cleaning and Setting Priors (Crop Production; Livestock Production; Commodity Demand and Trade; Processed Commodities such as oilseeds and sugar) → Data Estimation with Cross Entropy (Nationally: Area × Yield = Supply; Nationally: Trade = Supply − Demand; Globally: Supply = Demand) → IMPACT 3]