Land productivity and plot size: Is measurement error driving the inverse relationship?
Sam Desiere, Ghent University, Belgium, sam.desiere@ugent.be
Dean Jolliffe, World Bank, IZA and NPC
Slides prepared for the Land and Poverty Conference 2018: Land Governance in an Interconnected World, Washington, DC, 2018
Outline
Background: why this matters
  Understanding land productivity is central to improving the wellbeing of the poor
  Extensive empirical history on the inverse relationship (IR) between productivity and land size
Data and methodology
  Yield = output / cultivated land size
  Focus on the output measure
  Compare crop cuts with self-reports of crop yields
Results
  Systematic measurement error in self-reports => IR
Implications
Conclusion
Wellbeing and land productivity
  40% of global extreme poverty is in sub-Saharan Africa (Ferreira et al., 2016)
  A large majority of these people are engaged in activities linked to farm land (Livingston et al., 2001)
  In the rest of the world, the vast majority of the extreme poor live in rural areas (PovcalNet)
  In Africa, a 10% increase in land productivity is estimated to reduce poverty by 7% (Irz et al., 2001)
The inverse relationship between yields and land size (IR)
  Yields decline with farm size
    Decades of empirical studies => fragmentation improves productivity
    => Suggests no tradeoff between equity and efficiency in land redistribution
  Yields decline with plot size (within a farm)
    Accepted as a stylized fact in developing countries
    => Farm households irrationally allocate labor and other inputs inefficiently across plots
[Figure: yields plotted against cultivated land area]
IR: some candidate explanations
  Missing markets: land, labor, credit, inputs
    Missing labor markets (households cannot hire labor in or out) => nonseparable production and consumption decisions; household size determines labor inputs
    => Can explain the IR for farm size, but not for plot size
  Soil quality
    Barrett (2010): evidence against soil quality as an explanation; only marginally linked to the IR
  Measurement error in land size
    Carletto et al. provide some evidence supporting this, but evidence from at least one country contradicts it
IR: a new explanation, systematic error in self-reports of production
  Most, if not all, studies of the IR estimate yields by dividing self-reported production by plot size
  Self-reported production is measured with error, and the error can be systematically correlated with plot size
  If farmers over-report production on small plots and under-report it on larger plots, the IR is an artifact of the measurement
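The mechanism can be illustrated with a small simulation. This is a sketch under hypothetical assumptions, not the authors' code: true log yields are drawn independently of plot size, and the ratio of reported to true production is assumed to scale as (A / 500 m²) raised to a hypothetical `reporting_elasticity` of -0.3, plus noise. The 2000 kg/ha yield level, the 500 m² reference point and the 50 to 5000 m² plot-size range are illustrative only. Regressing log yield on log plot size should then give a slope near zero for true yields and near -0.3 for self-reported yields.

```python
import math
import random

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def simulate_ir_artifact(n=5000, reporting_elasticity=-0.3, seed=1):
    """Hypothetical data: true yields do not depend on plot size, but the
    reported/true production ratio scales as (area / 500 m²)**reporting_elasticity."""
    rng = random.Random(seed)
    log_area, log_true_yield, log_reported_yield = [], [], []
    for _ in range(n):
        la = rng.uniform(math.log(50), math.log(5000))  # log plot size, m² (illustrative range)
        true_ly = math.log(2000) + rng.gauss(0, 0.5)    # log kg/ha, independent of plot size
        # Systematic reporting error (hypothetical): over-reporting on small plots
        error = reporting_elasticity * (la - math.log(500)) + rng.gauss(0, 0.3)
        log_area.append(la)
        log_true_yield.append(true_ly)
        log_reported_yield.append(true_ly + error)
    return (ols_slope(log_area, log_true_yield),
            ols_slope(log_area, log_reported_yield))

alpha_true, alpha_reported = simulate_ir_artifact()
print(f"slope with true yields: {alpha_true:.3f}, with self-reports: {alpha_reported:.3f}")
```

Under these assumptions the negative slope on self-reported yields is produced entirely by the reporting error.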
An alternative measure of yield allows us to test this: crop cuts
  Crop cuts are viewed as a gold standard for measuring yield
  Crop-cut methodology
    Stages 1 and 2: randomly select enumeration areas (EAs), then randomly select plots
    Stage 3: randomly select a subplot (2m x 2m or 4m x 4m)
    The subplot is harvested and weighed (both wet and dry) during the harvest by a trained enumerator
  Caveat: crop cuts also measure yield with error
    They measure maximum yield, ignoring crop loss
    Sampling error (a random subplot is selected from the plot)
    Instrument precision error
    Variation in 'dryness'
  But we argue there is no reason to expect this error to be correlated with plot size. This is a key assumption.
Data
  ESS1 and ESS2 (the first two waves of the Ethiopia Socioeconomic Survey) are pooled for statistical power; this also allows us to check temporal external validity
  Crop cuts are implemented on a subsample of plots during the harvest for 23 crops
    2m x 2m subplot in wave 1
    4m x 4m subplot in wave 2
  Self-reported production is reported by plot during the third visit (January–March); plot size is measured with GPS
  The average recall period for self-reports is 85 days
  5248 plots have both crop cuts and self-reported production
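Both yield measures reduce to a weight divided by an area, converted to kg/ha. The sketch below assumes the harvested or reported weight is already expressed in kg (whether the wet or dry crop-cut weight is used is left to the caller) and that plot area comes from the GPS measurement; the function and constant names are hypothetical.

```python
M2_PER_HA = 10_000

# Crop-cut subplot areas in m², by survey wave (2m x 2m in wave 1, 4m x 4m in wave 2).
SUBPLOT_AREA_M2 = {1: 2 * 2, 2: 4 * 4}

def crop_cut_yield(harvest_kg, wave):
    """Scale the weight harvested on the crop-cut subplot to kg/ha."""
    return harvest_kg / SUBPLOT_AREA_M2[wave] * M2_PER_HA

def self_reported_yield(reported_kg, gps_area_m2):
    """Self-reported plot production divided by GPS-measured plot area, in kg/ha."""
    return reported_kg / gps_area_m2 * M2_PER_HA
```

For example, 3.2 kg harvested on a wave-2 subplot (16 m²) corresponds to about 2000 kg/ha, as does 150 kg reported on a 750 m² plot.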
Descriptive statistics: maize yields
[Figure: median maize yields (kg/ha) by plot size and measurement method, wave 2 only; crop cuts in blue, self-reported production in red]
  1. The IR disappears if yields are based on crop cuts (blue)
  2. Yields based on self-reported production (red) are systematically overestimated on small plots
  Similar findings hold for other crops, for wave 1, and for means. They also hold in the regression analysis, which controls for household characteristics, labor intensity and EA attributes.
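A tabulation like the one behind this figure can be sketched in plain Python: group plots into plot-size bins and take the median of each yield measure within each bin. The tuple layout and the bin edges are assumptions for illustration; the bins actually used in the figure are not stated here.

```python
from statistics import median

def median_yield_by_bin(plots, bin_edges_m2):
    """Median yield (kg/ha) per plot-size bin for each measurement method.

    plots: iterable of (area_m2, self_reported_kg_ha, crop_cut_kg_ha) tuples (assumed layout).
    bin_edges_m2: increasing bin edges; a plot falls in [lo, hi).
    """
    plots = list(plots)
    result = {}
    for lo, hi in zip(bin_edges_m2[:-1], bin_edges_m2[1:]):
        in_bin = [p for p in plots if lo <= p[0] < hi]
        if not in_bin:
            continue  # skip empty bins rather than fail on median([])
        result[(lo, hi)] = {
            "self_reported": median(p[1] for p in in_bin),
            "crop_cut": median(p[2] for p in in_bin),
        }
    return result
```

Comparing the two medians bin by bin is the tabular counterpart of comparing the red and blue series.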
Methodology: generalizing the descriptive findings
  1. Does the IR disappear if yields are measured with crop cuts?
  $\log(\text{yield}_{ij}) = \alpha \log A_{ij} + \delta X_{ij} + \beta L_{i} + u_r + \varepsilon_{ij}$
  Estimate separately for each yield measure (crop cuts and self-reports) and compare α, where
    A is plot size;
    X: plot and household characteristics (controlling for possibly nonrandom allocation of plots);
    L: labor (controlling for a possibly nonrandom effort level by plot size);
    $u_r$: enumeration area fixed effects (controlling for the possibility that the EA is correlated with plot size and yield, e.g. soil, hilliness, rainfall)
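By the Frisch–Waugh–Lovell theorem, including enumeration-area fixed effects u_r is equivalent to demeaning log yield and log plot size within each EA before running OLS. The sketch below uses that equivalence to estimate α with EA fixed effects only; the controls X and L are omitted for brevity, so it is a simplified illustration rather than the full specification, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def demean_within(values, groups):
    """Subtract group means, which absorbs the enumeration-area fixed effects u_r."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

def alpha_with_ea_fixed_effects(yields_kg_ha, areas_m2, ea_ids):
    """Within-EA OLS estimate of alpha in log(yield) = alpha*log(A) + u_r + e."""
    y = demean_within([math.log(v) for v in yields_kg_ha], ea_ids)
    x = demean_within([math.log(a) for a in areas_m2], ea_ids)
    # Demeaned variables have mean zero, so no intercept is needed
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
```

Running the same function once with crop-cut yields and once with self-reported yields gives the two α estimates to compare. With additional regressors, a matrix OLS routine (for example from a statistics package) would replace the single-regressor formula.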
Regression results (full table)
The IR is strong if yields are measured with self-reported production, but disappears if yields are measured with crop cuts.

                      Self-reported measurement                   Crop cuts
                      (1)        (2)        (3)        (4)        (5)        (6)
Log plot size (m²)    -0.397***  -0.396***  -0.303***  -0.161***  0.104***   0.149***
                      (-33.26)   (-36.11)   (-12.65)   (-6.22)    (8.63)     (9.36)
R-squared             0.194      0.201      0.110      0.215      0.120      0.140
Observations: 25811; 25004; 5248; 5059
Specifications add, in varying combinations: household fixed effects, enumeration area fixed effects, household characteristics, labor input.
Notes: ***, **, * denote statistical significance at the 1%, 5% and 10% levels. T-statistics in parentheses. Errors clustered at the enumeration area. Full results are provided in the appendix. All regressions include a dummy for the wave and plot characteristics.
Regression results
The IR is strong if yields are measured with self-reported production, but disappears if yields are measured with crop cuts.

                               Self-reported production     Crop cuts
Log of plot size               -0.303***     -0.161***      0.104***     0.149***
Family labor during harvest                  0.292***                    0.0591***
R²                             11%           22%            12%          14%
Observations: 5248; 5059
Regressions include plot and household characteristics and enumeration area fixed effects. ***, **, * denote statistical significance at 1%, 5% and 10%.
Robustness checks
The findings hold in different subsamples.

Coefficient on log of plot size
Subsample                                   Self-reported    Crop cuts
Wave 1                                      -0.16***         0.17***
Wave 2                                      -0.13***         0.14***
Maize                                       -0.44***         0.06
Core set (sensitivity to extreme values)    -0.10***         0.11***
Observations: 1860; 3152; 722; 4664; 4557
R²: 20%; 10%; 25%; 36%; 18%; 17%; 13%
Regressions include plot and household characteristics, labor, and enumeration area fixed effects. ***, **, * denote statistical significance at 1%, 5% and 10%. The core set discards the bottom and top 5% of yields.
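The core-set trimming can be sketched as dropping the lowest and highest 5% of observations by yield. Because the self-reported and crop-cut columns report different observation counts, the sketch assumes trimming is applied separately to each yield measure through a `key` function; the exact quantile rule used by the authors is not stated, so this index-based rule is an assumption.

```python
def core_set(records, key, share=0.05):
    """Return records (sorted by key) with the bottom and top `share` removed.

    Drops exactly int(len(records) * share) observations at each end, so
    ties at the cut-off are broken by sort order (an assumption).
    """
    ordered = sorted(records, key=key)
    k = int(len(ordered) * share)
    return ordered[k:len(ordered) - k]
```

For plot tuples such as (area_m2, self_reported_kg_ha, crop_cut_kg_ha), `core_set(plots, key=lambda p: p[1])` would trim on self-reported yields and `key=lambda p: p[2]` on crop-cut yields.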
Supplemental regression analysis: supporting evidence that reporting error is correlated with plot size
Are yields based on self-reported production systematically overestimated on small plots relative to crop cuts?
  $\log\left(\frac{\text{Self-reported yield}_i}{\text{Crop-cut yield}_i}\right) = \alpha \log A_i + \delta X_i + \beta L_i + u_r + \varepsilon_i$
  where
    A is plot size;
    X: plot and household characteristics (controlling for possibly nonrandom allocation of plots);
    L: labor (controlling for a possibly nonrandom effort level by plot size);
    $u_r$: enumeration area fixed effects (controlling for the possibility that the EA is correlated with plot size and yield, e.g. soil, hilliness, rainfall)
  A negative α indicates that self-reports overstate yields, relative to crop cuts, by more on small plots than on large ones.
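Taking the log of the ratio turns the dependent variable into log(self-reported yield) minus log(crop-cut yield), so any component of yield that both measures share cancels out. A minimal sketch of the dependent variable and of its slope on log plot size, without the controls X and L or the EA fixed effects, might look like this; the tuple layout and function names are assumed.

```python
import math

def log_ratio(self_reported_kg_ha, crop_cut_kg_ha):
    """Dependent variable: log of self-reported over crop-cut yield."""
    return math.log(self_reported_kg_ha / crop_cut_kg_ha)

def ratio_slope(plots):
    """OLS slope of the log ratio on log plot size (no controls, no fixed effects).

    plots: iterable of (area_m2, self_reported_kg_ha, crop_cut_kg_ha) tuples.
    """
    plots = list(plots)
    x = [math.log(area) for area, _, _ in plots]
    y = [log_ratio(sr, cc) for _, sr, cc in plots]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx
```

If the crop-cut error is uncorrelated with plot size (the key assumption), a negative slope here points to reporting error in self-reports that declines with plot size.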
Results: supplemental evidence of systematic measurement error in self-reported production
  Assuming self-reports overestimate true production (as suggested by the existing empirical literature), a decline in the ratio of yields indicates a decline in the measurement error in self-reports.
  => The regression results support the suggestion that measurement error in self-reports declines with plot size.

Estimation of systematic measurement error in self-reported production
Dependent variable: ratio of self-reported over crop-cut yields
                               (1)          (2)
Log of plot size               -0.295***    -0.287***
Family labor during harvest                 0.226***
Observations                   5914         5053
Regressions include plot and household characteristics and enumeration area fixed effects. ***, **, * denote statistical significance at 1%, 5% and 10%.
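Because the specification is log-log, α is an elasticity: holding the other regressors fixed, the ratio of self-reported to crop-cut yields scales by (A_large / A_small)^α between two plot sizes. The back-of-the-envelope illustration below uses the first column's estimate (α = -0.295), with plot sizes of 100 m² and 3000 m² chosen to match those in the conclusion; it is not the authors' own calculation.

```python
def implied_ratio_change(area_small_m2, area_large_m2, alpha):
    """Factor by which the self-report/crop-cut yield ratio changes when plot size
    moves from area_small_m2 to area_large_m2, given the log-log slope alpha
    (other regressors held fixed)."""
    return (area_large_m2 / area_small_m2) ** alpha

print(round(implied_ratio_change(100, 3000, -0.295), 2))  # 0.37
```

On these assumptions, the ratio on a 3000 m² plot is roughly 37% of its value on a 100 m² plot.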
Measurement error in self-reports explains the IR in Ethiopia: is the finding valid for other countries?
  1. The findings hold across crops and waves
  2. The ESS data are representative of the rural population in Ethiopia
    Our analysis is based on a random subsample of the ESS: a household is in our sample if one of its plots was randomly selected for crop cuts
    The subsample with crop cuts looks similar to the rest of the population, except for landholdings, which is to be expected since households with more plots are more likely to be in the sample

                                          No crop cuts   Crop cuts   Difference (t-value)
Landholdings (ha)                         1.18           1.34        3.07***
Applied chemical fertilizer (%)           47             50          1.29
Asset index                               0.16           0.17        2.00**
Household size                            5.75           5.90        1.62
Age of household head                     46.47          45.67       1.42
Household head can read and write (%)     38             39          0.73
Female-headed household (%)               20             17          1.50
N (min/max)                               2954/3018      878/890
Notes: The number of observations differs by variable due to missing values. T-values estimated by clustering the standard errors at the enumeration area. ***, **, * denote statistical significance at the 1%, 5% and 10% levels.
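The t-values in this table come from a difference in means with standard errors clustered at the enumeration area. One way to compute such a test is to regress the variable on a constant and a crop-cut dummy and use a cluster-robust (CR0) variance; the sketch below does this in plain Python with hypothetical names. It applies no small-sample correction, so its t-values may differ slightly from those of packages that apply one.

```python
from collections import defaultdict

def clustered_diff_in_means(values, in_group, clusters):
    """Difference in means (group minus rest) and its cluster-robust (CR0) t-value.

    Equivalent to OLS of value on a constant and a group dummy, with standard
    errors clustered by `clusters` (e.g. enumeration areas). Both groups must
    be non-empty.
    """
    d = [1.0 if g else 0.0 for g in in_group]
    n, n1 = len(values), sum(d)
    mean1 = sum(v for v, di in zip(values, d) if di) / n1
    mean0 = sum(v for v, di in zip(values, d) if not di) / (n - n1)
    diff = mean1 - mean0
    # OLS residuals: each observation minus its own group mean
    resid = [v - (mean1 if di else mean0) for v, di in zip(values, d)]
    # Cluster scores s_g = sum over i in g of x_i * u_i, with x_i = (1, d_i)
    scores = defaultdict(lambda: [0.0, 0.0])
    for u, di, c in zip(resid, d, clusters):
        scores[c][0] += u
        scores[c][1] += di * u
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for s in scores.values():
        for r in range(2):
            for k in range(2):
                meat[r][k] += s[r] * s[k]
    # (X'X)^-1 for X = [1, d], where X'X = [[n, n1], [n1, n1]]
    det = n * n1 - n1 * n1
    bread = [[n1 / det, -n1 / det], [-n1 / det, n / det]]
    # Variance of the dummy coefficient: [1][1] element of bread @ meat @ bread
    var_b = sum(bread[1][r] * meat[r][k] * bread[k][1]
                for r in range(2) for k in range(2))
    return diff, diff / var_b ** 0.5
```

Passing landholdings as `values`, a crop-cut indicator as `in_group` and EA identifiers as `clusters` would give a test of the kind reported in the first row.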
Implications for the IR
Rejecting a negative correlation between plot size and land productivity implies that:
  Rejecting the plot-level IR aligns with the conventional wisdom that fragmentation increases (rather than reduces) inefficiencies
  It also refutes the inference that farmers behave irrationally in allocating inputs across plots
  With no IR at the plot level, missing markets may explain the inverse relationship between productivity and farm size
  Any regression with farm output as a regressor needs to account for the bias induced by measurement error in output that is correlated with plot size (or with anything related to plot size)
Systematic measurement error and dynamics: one example
  Systematic measurement error can invalidate time trends
  Example: strong population growth -> smaller plots per capita -> production is over-reported on smaller plots -> measured yields increase
  The statistician concludes: population growth increases yields (i.e., Boserup was right and Malthus was wrong)
  But here this conclusion is false
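The population-growth example can be made concrete with a hypothetical calculation: true yields stay fixed at 2000 kg/ha, every plot halves in size, and the ratio of reported to true yield is assumed to follow (A / 500 m²)^α with an assumed α of -0.3. Measured yields then rise by a factor of 2^0.3, about 23%, although true productivity has not changed.

```python
def mean_measured_yield(plot_sizes_m2, true_yield_kg_ha, alpha, ref_area_m2=500):
    """Average self-reported yield when the reported/true ratio is (A / ref_area)**alpha.

    alpha and ref_area_m2 are hypothetical parameters of the reporting error.
    """
    measured = [true_yield_kg_ha * (a / ref_area_m2) ** alpha for a in plot_sizes_m2]
    return sum(measured) / len(measured)

before = mean_measured_yield([1000, 2000, 4000], 2000, alpha=-0.3)
after = mean_measured_yield([500, 1000, 2000], 2000, alpha=-0.3)  # every plot halved
print(f"measured yield rises by {after / before - 1:.0%} with unchanged true yields")
```

Because every plot is halved, each measured yield scales by the same factor 2^0.3, so the apparent trend is entirely an artifact of the reporting error.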
Conclusion
  Measurement error in self-reported production is the primary source of the inverse relationship between productivity and plot size.
  Our estimates suggest that, for rural Ethiopian households, production is overstated by 40% to 100% on plots of 100 m² and understated by 10% to 35% on plots of 3000 m².
  Rejecting the IR at the plot level resurrects missing-market explanations for the IR at the farm-size level.
  Of course, more research is needed to establish whether this explanation for the plot-level IR is unique to Ethiopia (or to countries 'like' Ethiopia in some relevant dimension).
Acknowledgments and data
  The Ethiopia Socioeconomic Survey (ESS) data were collected in collaboration with Ethiopia's Central Statistical Agency
  This research was generously funded by the UK's Department for International Development, Ethiopia Office
  The ESS is publicly available: go.worldbank.org/ZK2ZDZYDD0
  ESS3 (third wave) coming soon
Implications beyond the IR
  A. Random error reduces precision
  B. Systematic error implies a bias between the 'true' value and the outcome of the measurement
    The bias is constant, or
    The bias correlates with observable and unobservable characteristics
  Systematic error correlated with important (household) characteristics is every statistician's nightmare
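The distinction between random and systematic error can be checked with a simulation in which true yields do not depend on plot size (true α = 0) and a hypothetical error is added to log yields: pure noise, a constant bias, or a bias that declines with log plot size. Under these assumptions, random error leaves the slope unbiased but inflates its standard error, a constant bias only shifts the intercept, and only the error correlated with plot size changes the slope. All parameter values are illustrative.

```python
import math
import random

def ols(x, y):
    """Intercept, slope and conventional standard error of the slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, math.sqrt(rss / (n - 2) / sxx)

def fit_with_error(kind, n=20000, seed=7):
    """Regress log yield plus a hypothetical measurement error on log plot size."""
    base_rng = random.Random(seed)     # same plots and true yields for every kind
    err_rng = random.Random(seed + 1)  # separate stream for the added error
    x, y = [], []
    for _ in range(n):
        log_area = base_rng.uniform(math.log(50), math.log(5000))
        log_true_yield = math.log(2000) + base_rng.gauss(0, 0.3)  # true alpha = 0
        if kind == "none":
            err = 0.0
        elif kind == "random":
            err = err_rng.gauss(0, 0.5)
        elif kind == "constant":
            err = 0.2
        elif kind == "correlated":
            err = -0.3 * (log_area - math.log(500))
        else:
            raise ValueError(f"unknown error type: {kind}")
        x.append(log_area)
        y.append(log_true_yield + err)
    return ols(x, y)

for kind in ("none", "random", "constant", "correlated"):
    a, b, se = fit_with_error(kind)
    print(f"{kind:>10}: intercept={a:.3f} slope={b:.3f} se={se:.4f}")
```

The correlated case reproduces, in stylized form, the situation described above: a spurious slope that looks like an inverse relationship.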
(Disturbing) implications of the IR
  Land redistribution improves equality and efficiency
  Policymakers should focus on small-scale farmers (rather than large-scale agriculture) to improve agricultural productivity
  Land consolidation will not increase total agricultural production
  Farmers behave irrationally, since they do not allocate labor and inputs efficiently across plots
Because of these important policy implications, an extensive theoretical and empirical literature on the IR has emerged
Desiere, S., Jolliffe, D., 2018. "Land productivity and plot size: Is measurement error driving the inverse relationship?" Journal of Development Economics 130, 84–98. doi.org/10.1016/j.jdeveco.2017.10.002
Analysis based on the ESS, publicly available at: http://go.worldbank.org/HWKE6FXHJ0