Sampling bias in multi-agent simulation (MAS) models
Buysse, J. (1), Frija, A. (1), Van der Straeten, B. (1), Nolte, S. (1), Lauwers, L. (1,2), Claeys, D. (2) and Van Huylenbroeck, G.
(1) Ghent University, Department of Agricultural Economics
(2) Institute for Agricultural and Fisheries Research, Merelbeke, Belgium
122nd EAAE Seminar – Ancona, February 2011
UGent – Faculty of Bioscience Engineering – Department Agricultural Economics
Overview of the presentation
- Why MAS models?
- Problem statement: the sampling bias in MAS
- Objectives
- Methodology
- Results
- Perspectives
Why MAS models?
- Heterogeneity of opportunities and constraints at the individual level
- Accurate estimation of the distributional effects of policy
- Accurate estimation of agent interactions (spatial effects, TC, propensity to innovate, etc.)
- But…
Problem statement
- MAS models need full population data
- With sampling, a farm in the sample cannot interact with the real-world farms that fall outside the sample
Problem statement
[Figure. Labels: Full population; Farmer i; Farmer j]
Problem statement
[Figure. Labels: Sample; Farmer i; Farmer k]
Problem statement
- Systematic bias when TC between agents are simulated in MAS
- Most empirical MAS models rely on sample data
- Future large-scale MAS models on sampled data!
Problem statement
Illustration of sampling bias on a real model (Van der Straeten et al. 2010)
- A MAS model used to simulate PR exchange between 30,000 farmers in Flanders
- The bias is correlated with the sample size

Bootstrap | Number of repetitions | Average cost simulated | SD | Average cost simulated / Average cost of population (%)
S = 100 (0.26%) | | | |
S = 200 (0.52%) | | | |
S = 500 (1.31%) | | | |
S = 750 (2%) | | | |
Full population (100%) | | | |
Objectives
‣ To test, illustrate, and quantify the sampling bias that arises when TC exist
‣ To develop and discuss mechanisms that can remove this sampling bias
Methodology
- Simplified MAS model that minimizes the transport cost of emissions plus the cost of emission abatement:

$$\text{Minimize} \quad \sum_n \Big( \sum_m c_{nm}\,\tau_{nm} + \omega_n\, p \Big)$$
$$\text{s.t.} \quad e_n + \sum_m \tau_{mn} - \sum_m \tau_{nm} \le r_n + \omega_n$$

where
‣ $n$ and $m$ are farm indices,
‣ $\tau_{nm}$ is the amount of emission transported from farm $n$ to farm $m$,
‣ $\omega_n$ is the amount of emission abatement of agent $n$,
‣ $e_n$ is the amount of emission of farm $n$,
‣ $r_n$ is the amount of emission rights of farm $n$,
‣ $c_{nm}$ is the transport cost per unit of emission transported from farm $n$ to farm $m$,
‣ $p$ is the penalty per overused emission right.
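The model above is a linear program, so it can be sketched with an off-the-shelf LP solver. The block below is a minimal illustration, not the authors' implementation: the function name `solve_emission_model` is hypothetical, and non-negativity of $\tau_{nm}$ and $\omega_n$ is an assumption the slide does not state explicitly.

```python
import numpy as np
from scipy.optimize import linprog


def solve_emission_model(e, r, c, p):
    """Solve the cost-minimisation LP; return (total_cost, tau, omega).

    Assumption (not stated on the slide): tau >= 0 and omega >= 0.
    """
    e = np.asarray(e, dtype=float)
    r = np.asarray(r, dtype=float)
    c = np.asarray(c, dtype=float)
    n = e.size

    # Decision vector x = [tau_00, tau_01, ..., tau_(n-1)(n-1), omega_0, ...].
    # tau_nm is stored at position n * size + m (row-major).
    cost = np.concatenate([c.ravel(), np.full(n, float(p))])

    # One balance row per farm k, rearranged from the slide's constraint:
    #   sum_m tau_mk - sum_m tau_km - omega_k <= r_k - e_k
    A = np.zeros((n, n * n + n))
    for k in range(n):
        for m in range(n):
            if m == k:
                continue  # tau_kk cancels out of farm k's balance
            A[k, m * n + k] += 1.0  # inflow tau_mk
            A[k, k * n + m] -= 1.0  # outflow tau_km
        A[k, n * n + k] = -1.0      # abatement omega_k
    b = r - e

    res = linprog(cost, A_ub=A, b_ub=b, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    tau = res.x[: n * n].reshape(n, n)
    omega = res.x[n * n:]
    return res.fun, tau, omega
```

With two farms, `solve_emission_model([10, 2], [5, 8], [[0, 1], [1, 0]], p=3)` moves the five surplus units from farm 0 to farm 1, because transport at unit cost 1 is cheaper than the penalty of 3.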
Methodology
- Applied to synthetic population data (500 farmers), evaluated on the average cost per farm:
‣ Emission ($e_n$) is randomly sampled from a normal distribution,
‣ Emission right ($r_n$) is randomly sampled from a normal distribution,
‣ Transport costs ($c_{nm}$) are randomly sampled from a uniform distribution
- We bootstrap over different sample sizes
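The resampling experiment can be sketched as follows. Everything numeric here is an assumption made for illustration: the distribution parameters, the penalty, the population of 40 farms (kept small so the dense constraint matrix stays cheap), and drawing subsamples without replacement. The helper names are hypothetical, and the LP is the same model as on the previous slide with $\tau, \omega \ge 0$ assumed.

```python
import numpy as np
from scipy.optimize import linprog


def average_cost(e, r, c, p):
    """Average cost per farm from the transport/abatement LP."""
    n = e.size
    eye, ones = np.eye(n), np.ones((1, n))
    inflow = np.kron(ones, eye)    # tau_mk (column m*n+k) enters row k
    outflow = np.kron(eye, ones)   # tau_km (column k*n+m) leaves row k
    A = np.hstack([inflow - outflow, -eye])
    obj = np.concatenate([c.ravel(), np.full(n, float(p))])
    res = linprog(obj, A_ub=A, b_ub=r - e, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun / n


def make_population(n, rng):
    """Synthetic farms; all distribution parameters are illustrative."""
    e = rng.normal(100.0, 20.0, n).clip(min=0.0)   # emissions
    r = rng.normal(110.0, 20.0, n).clip(min=0.0)   # emission rights
    c = rng.uniform(0.5, 2.0, (n, n))              # pairwise transport costs
    return e, r, c


def bootstrap(e, r, c, p, size, reps, rng):
    """Mean and SD of the average cost over `reps` random subsamples."""
    costs = []
    for _ in range(reps):
        idx = rng.choice(e.size, size=size, replace=False)
        # Sampled farms can only trade with other sampled farms.
        costs.append(average_cost(e[idx], r[idx], c[np.ix_(idx, idx)], p))
    return float(np.mean(costs)), float(np.std(costs))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    e, r, c = make_population(40, rng)
    full = average_cost(e, r, c, p=5.0)
    for size in (5, 10, 20, 40):
        mean, sd = bootstrap(e, r, c, 5.0, size, reps=10, rng=rng)
        print(size, round(mean / full, 3), round(sd, 3))
```

The printed ratio compares the mean simulated cost of the subsamples with the full-population cost. For the 500-farm population of the slide, a sparse constraint matrix would be the more practical choice.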
Results
[Figure. Labels: Average cost of 27.7; Variations]
Results
- The order of magnitude of the sampling bias can be very large
- Nonlinear effect of the sample size on the bias
Cause:
- Subsamples do not always satisfy the real population balance
- Motivation for sampling bias correction via macrobalance coefficients
- The total amount of emission is smaller than the total amount of emission rights ($\sum_n e_n < \sum_n r_n$)
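The cause named above can be checked directly: even when the population satisfies $\sum_n e_n < \sum_n r_n$, a small subsample may not. The sketch below counts how often that happens and then applies one possible macrobalance coefficient. The slides do not define the coefficient, so the rescaling rule (scale sampled rights so that the sample's emission-to-rights ratio equals the population's) is an assumption, as are the function names.

```python
import random


def imbalance_share(e, r, size, reps, rng):
    """Share of subsamples whose total emission exceeds total rights."""
    flipped = 0
    for _ in range(reps):
        idx = rng.sample(range(len(e)), size)  # without replacement
        if sum(e[i] for i in idx) > sum(r[i] for i in idx):
            flipped += 1
    return flipped / reps


def macrobalance_rights(e_s, r_s, pop_e_total, pop_r_total):
    """Rescale sampled rights to restore the population balance.

    Assumed rule: choose coef so that sum(e_s) / sum(coef * r_s)
    equals pop_e_total / pop_r_total.
    """
    target_ratio = pop_e_total / pop_r_total
    coef = sum(e_s) / (target_ratio * sum(r_s))
    return [coef * x for x in r_s], coef


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative parameters only; the slides give no distribution details.
    e = [rng.gauss(100, 20) for _ in range(500)]
    r = [rng.gauss(110, 20) for _ in range(500)]
    for size in (5, 20, 100, 500):
        print(size, imbalance_share(e, r, size, 200, rng))
```

Only the population totals of emission and rights are needed for this correction, not the full population records.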
Results
[Figure. Labels: Average cost of 27.7; Average costs of samples]
Remove remaining bias with calibration
- Calibration is the comparison of two measurements:
  - the measurement of a device with known correctness (the full population model)
  - is used to correct another measurement made by another device (the sample model)
  - Once calibrated, the second device can make correct measurements: the sample model can be used for simulations
- Resampling data is used to estimate the calibration function: a prediction of the bias as a function of the sample size
Results
Coefficients of the polynomial regression of the simulated average costs on the sample size

Term | Estimate | Std. Error | t value | Pr(>|t|) | Signif.
(sample size)^0 | 6,11E+05 | 2,68E… | | < 2e-16 | ***
(sample size)^1 | -2,53E+04 | 2,62E… | | < 2e-16 | ***
(sample size)^2 | 5,64E+02 | 9,35E… | | …e-09 | ***
(sample size)^3 | -7,46E+00 | 1,69E… | | …e-05 | ***
(sample size)^4 | 6,24E-02 | 1,78E… | | | ***
(sample size)^5 | -3,42E-04 | 1,16E… | | | **
(sample size)^6 | 1,24E-06 | 4,84E… | | | *
(sample size)^7 | -2,95E-09 | 1,29E… | | | *
(sample size)^8 | 4,42E-12 | 2,13E… | | | *
(sample size)^9 | -3,78E-15 | 1,98E… | | |
(sample size)^10 | 1,41E-18 | 7,91E… | | |
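A calibration function of this kind can be sketched as a least-squares polynomial fitted to the resampling results and then used to rescale a sample-based estimate. The block is an illustration under assumptions: the function names are hypothetical, and the multiplicative correction (divide the sample result by the predicted ratio of sample cost to full-population cost) is one plausible reading of the calibration idea, not a rule stated in the slides.

```python
import numpy as np
from numpy.polynomial import Polynomial


def fit_calibration(sample_sizes, simulated_costs, degree):
    """Least-squares polynomial of simulated average cost on sample size."""
    # Polynomial.fit rescales the sample sizes internally, which keeps
    # high-degree fits better conditioned than raw powers of the size.
    return Polynomial.fit(
        np.asarray(sample_sizes, dtype=float),
        np.asarray(simulated_costs, dtype=float),
        deg=degree,
    )


def predicted_bias(calibration, sample_size, population_size):
    """Predicted ratio: cost at this sample size / cost at full population."""
    return float(calibration(sample_size) / calibration(population_size))


def calibrate(sample_cost, calibration, sample_size, population_size):
    """Assumed multiplicative correction of a sample-based average cost."""
    return sample_cost / predicted_bias(calibration, sample_size, population_size)
```

In use, `fit_calibration` would receive one (sample size, simulated average cost) pair per bootstrap run, and `calibrate` would then be applied to the result of a single sample model. The degree is a modelling choice; the table above uses degree 10.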
Results
[Figure. Label: Average cost of 27.7]
Conclusion
- Macrobalance correction is very useful
  - Only the macrobalance is necessary
  - Also useful in models without heterogeneous interactions
- Calibration correction is promising
  - such corrections are not possible if we do not have full population data
  - necessity to assign correction factors based on information available in sample datasets
- Corrected sampling in MAS is important
  - more complex analyses become possible
  - more datasets at sample level could be used
  - MAS can be applied in large-scale empirical models
Further research
- Check for:
  - Impact on variance
  - Impact of changes in model structure
  - Impact of using a synthetic full population as calibration reference
- Search for:
  - Calibration correction without availability of the full population (see first attempts in paper)
THANK YOU