Improved County Level Estimation of Crop Yield Using Model-Based Methodology With a Spatial Component Michael E. Bellow, USDA/NASS.

2 Outline
- Background
- Simulation Methodology
- Results of Ten State Study
- Convergence Evaluation
- Summary

3 County Level Commodity Estimation
- NASS program since 1917
- Estimates used by private sector, academia, government
- Data from various sources used
- NASS County Estimates System developed to facilitate the estimation process

4 Available Data Sources
- Voluntary response surveys of farm operators
- List frame control data (lists of known farming operations)
- Previous year official estimates
- Census of Agriculture data (NASS conducts Census every five years)
- Earth resources satellite data

5 County Crop Yield Estimation
- Yield is ratio of crop production to harvested area (acres)
- Accurate estimation challenging due to:
  - reliable administrative data seldom available
  - high year-to-year variability of yields (weather sensitive)
  - lack of adequate sample survey data

6 Desirable Features of a County Yield Estimation Method
- Repeatability
- Accurate variance estimation
- Produce estimates for counties having no survey data

7 Ratio (R) Estimator
- Traditional crop yield estimator used by NASS
- Computed as ratio between production and harvested area estimates (with minor adjustment)
- Can produce inconsistent yields due to fluctuations in harvested acreage
- Makes no use of survey data from counties other than the one being estimated
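As a minimal sketch of the ratio idea, the function below divides a county's production estimate by its harvested-area estimate. The slide's "minor adjustment" is not described, so it is left out; `ratio_yield` and the numbers are hypothetical. They show how a change in the harvested-area estimate alone moves the implied yield, even when the production estimate is unchanged.

```python
def ratio_yield(production: float, harvested_acres: float) -> float:
    """County yield as production / harvested area.

    The slide's unspecified 'minor adjustment' is deliberately omitted.
    Only this county's own estimates are used, which is the limitation
    the slide points out.
    """
    if harvested_acres <= 0:
        raise ValueError("harvested area must be positive")
    return production / harvested_acres


# Hypothetical county: same production estimate, two harvested-area estimates.
production_bu = 50_000.0
yield_a = ratio_yield(production_bu, 1_000.0)  # 50.0 bu/acre
yield_b = ratio_yield(production_bu, 800.0)    # 62.5 bu/acre
```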

8 Model-Based County Estimation Methods
- Based on linear or non-linear models relating true yields to survey reported values
- Generally fit using an iterative algorithm
- Convergence not always guaranteed
- Estimates can be adjusted for consistency with published state figures
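One simple way to make county estimates consistent with a published state figure is proportional benchmarking. Every county yield is scaled by a common factor so that the harvested-area-weighted average matches the state yield. This is an assumed method for illustration, not necessarily the adjustment NASS applies. The sketch also assumes the county harvested acres add up to the state total, and `benchmark_to_state` and its inputs are hypothetical.

```python
def benchmark_to_state(county_yields, harvested_acres, state_yield):
    """Proportional benchmarking (assumed method): scale all county yields by
    one factor so their harvested-area-weighted average equals state_yield."""
    total_acres = sum(harvested_acres)
    implied_state = sum(y * a for y, a in zip(county_yields, harvested_acres)) / total_acres
    factor = state_yield / implied_state
    return [y * factor for y in county_yields]


# Hypothetical two-county state: implied state yield is 115, published is 121.
acres = [10.0, 30.0]
adjusted = benchmark_to_state([100.0, 120.0], acres, 121.0)
```

A single common factor keeps the ratios between county yields unchanged (here the second county stays 1.2 times the first). Other adjustment schemes would not necessarily preserve those ratios.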

9 Stasny-Goel (SG) Method
- Developed at Ohio State University under cooperative agreement with NASS
- Assumes mixed effects model with farm size group as fixed effect and county as random effect
- Random effect assumed multivariate normal, with covariance matrix reflecting spatial correlation among neighboring counties:
  - corr(i, j) is nonzero if county i borders county j
  - corr(i, j) = 0 otherwise
- EM algorithm used to fit model
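The neighbor-based correlation structure can be sketched as a matrix with 1 on the diagonal, a common value for bordering counties, and 0 elsewhere. The slide's neighbor-correlation value does not survive in this transcript, so the sketch treats it as a hypothetical free parameter `rho`. The helper names are also illustrative. A Cholesky check shows that not every `rho` gives a valid (positive definite) matrix.

```python
import math


def neighbor_correlation_matrix(n_counties, borders, rho):
    """1 on the diagonal, rho for bordering counties, 0 otherwise.
    `rho` is a hypothetical parameter, not a value from the study."""
    corr = [[1.0 if i == j else 0.0 for j in range(n_counties)] for i in range(n_counties)]
    for i, j in borders:
        corr[i][j] = corr[j][i] = rho
    return corr


def is_positive_definite(a):
    """Cholesky test: a symmetric matrix is positive definite iff all pivots are > 0."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                pivot = a[i][i] - s
                if pivot <= 0.0:
                    return False
                low[i][i] = math.sqrt(pivot)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return True


# Three counties in a row: county 0 borders 1, county 1 borders 2.
borders = [(0, 1), (1, 2)]
ok = is_positive_definite(neighbor_correlation_matrix(3, borders, 0.3))   # True
bad = is_positive_definite(neighbor_correlation_matrix(3, borders, 0.9))  # False
```

Because a covariance matrix must be positive definite, a neighbor correlation of this kind generally has to be kept within a range that depends on the county adjacency pattern.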

10 Stasny-Goel Method (cont.)
- Previous year county yields used to derive initial estimates of county and size group effects
- Processing continues until at least one of the following two conditions is satisfied:
  - relative group and log-likelihood distances fall below preset limits
  - maximum allowable number of iterations reached
- County yield estimates computed as weighted averages of individual farm level estimates (weights derived from Census of Agriculture data)
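The stopping rule above can be sketched as a loop that ends when both relative distances are small or the iteration cap is hit. The slide does not define the two distances. Here the group distance is assumed to be ||new − old|| / ||old||, and the log-likelihood distance is assumed to be |ΔlogL| / max(|logL|, 1). The tolerances are hypothetical. The default cap of 5000 matches the iteration limit quoted later for the convergence study. The toy `step` function stands in for an EM update and is not the SG algorithm.

```python
import math


def relative_distance(old, new):
    """||new - old|| / ||old|| (denominator taken as 1 when old is all zeros)."""
    num = math.sqrt(sum((n - o) ** 2 for o, n in zip(old, new)))
    den = math.sqrt(sum(o * o for o in old)) or 1.0
    return num / den


def fit_until_stop(g0, step, loglik, tol_group=1e-6, tol_ll=1e-8, max_iter=5000):
    """Iterate `step` until both (assumed) relative distances fall below their
    limits, or until max_iter iterations have run.
    Returns (effects, iterations_used, converged)."""
    g, ll = list(g0), loglik(g0)
    for it in range(1, max_iter + 1):
        g_new = step(g)
        ll_new = loglik(g_new)
        rel_group = relative_distance(g, g_new)
        rel_ll = abs(ll_new - ll) / max(abs(ll), 1.0)
        g, ll = g_new, ll_new
        if rel_group < tol_group and rel_ll < tol_ll:
            return g, it, True
    return g, max_iter, False


# Toy fixed-point update standing in for one EM step (hypothetical).
TARGET = [1.0, 2.0]


def step(g):
    return [gi + 0.5 * (ti - gi) for gi, ti in zip(g, TARGET)]


def loglik(g):
    return -sum((gi - ti) ** 2 for gi, ti in zip(g, TARGET))


effects, iters, converged = fit_until_stop([0.0, 0.0], step, loglik)
capped = fit_until_stop([0.0, 0.0], step, loglik, max_iter=5)  # hits the cap
```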

11 Griffith (G) Method
- Developed by Dr. Dan Griffith at Syracuse University under cooperative agreement with NASS
- Predicts yield values using published number of farms producing crop of interest
- Assumes autoregressive model
- Employs Box-Cox and Box-Tidwell transformations
- Spatial imputation routine can compute estimates for counties with missing survey data
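For reference, the two transformations named above can be written as follows. Box-Cox transforms a positive response y to (y^λ − 1)/λ, or to log y when λ = 0. Box-Tidwell applies a power x^α to a positive predictor, with α = 0 conventionally read as log x. In practice λ and α are estimated from the data. How the Griffith method estimates them and combines them with its autoregressive model is not described here, so this sketch covers only the transforms.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive response y."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def inverse_box_cox(z, lam):
    """Map a transformed value back to the original (yield) scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)


def box_tidwell_power(x, alpha):
    """Power transform of a positive predictor; alpha = 0 is taken as log x."""
    if x <= 0:
        raise ValueError("power transform requires x > 0")
    return math.log(x) if alpha == 0 else x ** alpha


z = box_cox(4.0, 0.5)             # (2 - 1) / 0.5 = 2.0
back = inverse_box_cox(z, 0.5)    # 4.0
```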

12 Previous Research on Model-Based Methods
- Stasny, Goel and Rumsey (1991) – early version of SG method tested on Kansas wheat production data
- Stasny et al. (1995) – improved version of SG tested on Ohio corn yield data
- Crouse (2000) – SG evaluated for Michigan corn and barley yield
- Griffith (2000) – Griffith method tested on Michigan corn yield data
- Bellow (2004) – SG and Griffith methods compared for North Dakota oats and barley yield (presented at FCSM Research Conference)

13 Ten-State Research Study
- Compare performance of Stasny-Goel, Griffith and ratio methods for various crops in ten geographically dispersed states: NY, OH, MI, TN, MS, FL, ND, OK, CO, WA
- Criteria for comparison – bias, variance, MSE, outlier properties, convergence percentage

14 States In Study Area

15 Post-Stratification Size Groups
- NASS statewide survey data post-stratified by county and farm size based on Census of Agriculture (COA) data (two or three size groups defined)
- Percentages of Census farm acres by size group used as weights for SG algorithm
- Equal total land in farms criterion used to form groups
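A minimal sketch of the grouping and weighting described above follows. Farms are sorted by size and cut so that each group holds roughly equal total land in farms. Each group's share of Census farm acres then serves as its weight in a weighted-average county yield. The function names, the tie rule (a farm exactly at a cutoff goes to the lower group) and the data are assumptions for illustration. The exact NASS grouping procedure may differ.

```python
from bisect import bisect_left


def equal_land_cutoffs(farm_acres, n_groups):
    """Upper acreage cutoffs so each size group holds roughly equal land in farms."""
    sizes = sorted(farm_acres)
    total = sum(sizes)
    cutoffs, cumulative, g = [], 0.0, 1
    for a in sizes:
        cumulative += a
        if g < n_groups and cumulative >= total * g / n_groups:
            cutoffs.append(a)
            g += 1
    return cutoffs


def size_group(acres, cutoffs):
    """Size-group index of a farm (a farm exactly at a cutoff goes to the lower group)."""
    return bisect_left(cutoffs, acres)


def group_weights(farm_acres, cutoffs):
    """Share of Census farm acres falling in each size group."""
    totals = [0.0] * (len(cutoffs) + 1)
    for a in farm_acres:
        totals[size_group(a, cutoffs)] += a
    land = sum(totals)
    return [t / land for t in totals]


def county_yield(group_yields, weights):
    """Weighted average of size-group yield estimates."""
    return sum(w * y for w, y in zip(weights, group_yields))


census_acres = [10, 20, 30, 40, 100]          # hypothetical Census farm sizes
cuts = equal_land_cutoffs(census_acres, 2)    # [40]
w = group_weights(census_acres, cuts)         # [0.5, 0.5]
est = county_yield([40.0, 50.0], w)           # 45.0
```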

16 Data Sources For Research Study
- 2002-03 Quarterly Agricultural Survey
- 2001-03 County Estimates Survey
- 2001-02 official crop yield estimates (previous year data)
- 2002 Census of Agriculture (number of farms, land in farms)

17 Simulation Procedure
- Multiple regression performed on survey reported yield vs. official county yields, weighted average neighbor yields, size group membership variables
- Artificial population of 10,000 simulated survey data sets used to compute true population parameter values
- 250 sample data sets selected at random from population
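Two pieces of this procedure are easy to sketch: the neighbor-average covariate used in the regression, and the random draw of 250 data sets from the 10,000-set population. The slide does not state the neighbor weights, so equal weights are the default here and custom weights are optional. Drawing without replacement is also an assumption. All names and data are hypothetical.

```python
import random


def neighbor_average(yields, neighbors, weights=None):
    """Weighted average of each county's neighbors' yields.
    Equal weights unless weights[(county, neighbor)] is supplied
    (the study's actual weighting is not stated)."""
    out = {}
    for county, nbrs in neighbors.items():
        w = [1.0 if weights is None else weights[(county, n)] for n in nbrs]
        out[county] = sum(wi * yields[n] for wi, n in zip(w, nbrs)) / sum(w)
    return out


yields = {"A": 100.0, "B": 120.0, "C": 140.0}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
nbr_avg = neighbor_average(yields, neighbors)  # every county's neighbors average 120.0

# Draw 250 of the 10,000 simulated data sets (by index), without replacement.
rng = random.Random(2003)  # arbitrary seed for reproducibility
sample_ids = rng.sample(range(10_000), 250)
```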

18 Simulation Procedure (cont.)
- Moran's I computed to test whether simulated data sets reflect spatial correlation of real survey data
- SG, G and R methods applied to each of the 250 sampled data sets
- Average simulated parameter values compared with corresponding population values for each estimation method
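Moran's I measures spatial autocorrelation as I = (n / S0) · Σᵢⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where S0 is the sum of all weights. The sketch below uses binary contiguity weights (wᵢⱼ = 1 when counties i and j border). That weighting is an assumption, since the slide does not give the scheme used. Values well above the no-correlation expectation of −1/(n − 1) indicate that neighboring counties have similar yields.

```python
def morans_i(values, neighbors):
    """Moran's I with binary contiguity weights.
    `neighbors` maps a county index to the indices of bordering counties
    and is assumed symmetric."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = 0.0     # sum of all weights
    cross = 0.0  # sum over i, j of w_ij * dev_i * dev_j
    for i, nbrs in neighbors.items():
        for j in nbrs:
            s0 += 1.0
            cross += dev[i] * dev[j]
    return (n / s0) * cross / sum(d * d for d in dev)


# Four counties in a row (hypothetical yields).
line4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
smooth = morans_i([1.0, 2.0, 3.0, 4.0], line4)       # about 0.333: neighbors alike
alternating = morans_i([1.0, 4.0, 1.0, 4.0], line4)  # about -1.0: neighbors differ
```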

19 Measures of Estimator Performance
- Absolute Bias – average absolute difference between simulated yield estimates and true (population) yield
- Variance – sample variance of simulated yield estimates
- Mean Square Error – average squared deviation between simulated estimates and true yield (SG program also computes analytic MSE)
- Lower (Upper) Tail Proximity – average absolute difference between 5th (95th) percentile of simulated yield estimates and true yield
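Read literally, the definitions above give one set of measures per county, computed from that county's simulated estimates and its true yield. A minimal sketch follows. Percentile conventions vary, so the linear-interpolation ("inclusive") rule used here for the 5th and 95th percentiles is an assumption. The function name and data are hypothetical.

```python
import statistics


def performance(estimates, true_yield):
    """Per-county measures from repeated simulated yield estimates."""
    errors = [e - true_yield for e in estimates]
    cuts = statistics.quantiles(estimates, n=20, method="inclusive")  # 5%, 10%, ..., 95%
    p05, p95 = cuts[0], cuts[-1]
    return {
        "abs_bias": sum(abs(e) for e in errors) / len(errors),
        "variance": statistics.variance(estimates),   # sample variance (n - 1)
        "mse": sum(e * e for e in errors) / len(errors),
        "ltp": abs(p05 - true_yield),                 # lower tail proximity
        "utp": abs(p95 - true_yield),                 # upper tail proximity
    }


# Hypothetical county: 21 simulated estimates 0, 1, ..., 20 around a true yield of 10.
m = performance([float(x) for x in range(21)], 10.0)
```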

20 Pairwise Estimator Comparison for Absolute Bias
Percent of counties favoring each method (* = better method)
Crop              SG vs. Ratio      SG vs. Griffith
                  SG       R        SG       G
Barley            90*      10       82*      18
Corn              92*      8        66*      34
Cotton (upland)   86*      14       58*      42
Dry Beans         93*      7        73*      27
Oats              88*      12       63*      37
Rye               83*      17       47       53*
Sorghum           84*      16       59*      41
Soybeans          88*      12       62*      38
Sunflower         94*      6        69*      31
Tobacco (burley)  98*      2        56*      44
Wheat (spring)    83*      17       78*      22
Wheat (winter)    83*      17       66*      34

21 Pairwise Estimator Comparison for Variance
Percent of counties favoring each method (* = better method)
Crop              SG vs. Ratio      SG vs. Griffith
                  SG       R        SG       G
Barley            100*     0        51*      49
Corn              99.9*    0.1      33       67*
Cotton (upland)   100*     0        13       87*
Dry Beans         100*     0        20       80*
Oats              100*     0        36       64*
Rye               97*      3        77*      23
Sorghum           98*      2        25       75*
Soybeans          100*     0        40       60*
Sunflower         100*     0        56*      44
Tobacco (burley)  100*     0        49       51*
Wheat (spring)    100*     0        62*      38
Wheat (winter)    100*     0        43       57*

22 Pairwise Estimator Comparison for MSE
Percent of counties favoring each method (* = better method)
Crop              SG vs. Ratio      SG vs. Griffith
                  SG       R        SG       G
Barley            92*      8        77*      23
Corn              94*      6        62.5*    37.5
Cotton (upland)   89*      11       55*      45
Dry Beans         96*      4        75*      25
Oats              90*      10       61*      39
Rye               87*      13       40       60*
Sorghum           84*      16       51*      49
Soybeans          89*      11       57*      43
Sunflower         95.5*    4.5      65*      35
Tobacco (burley)  100*     0        53*      47
Wheat (spring)    85*      15       80*      20
Wheat (winter)    86*      14       64*      36

23 Pairwise Estimator Comparison for LTP (Lower Tail Proximity)
Percent of counties favoring each method (* = better method)
Crop              SG vs. Ratio      SG vs. Griffith
                  SG       R        SG       G
Barley            92*      8        55*      45
Corn              93*      7        41       59*
Cotton (upland)   84*      16       41       59*
Dry Beans         96*      4        64*      36
Oats              94*      6        52*      48
Rye               90*      10       40       60*
Sorghum           97*      3        59*      41
Soybeans          85*      15       38       62*
Sunflower         96*      4        56*      44
Tobacco (burley)  100*     0        31       69*
Wheat (spring)    99*      1        69*      31
Wheat (winter)    89*      11       50       50*

24 Pairwise Estimator Comparison for UTP (Upper Tail Proximity)
Percent of counties favoring each method (* = better method)
Crop              SG vs. Ratio      SG vs. Griffith
                  SG       R        SG       G
Barley            93*      7        61*      39
Corn              98*      2        56*      44
Cotton (upland)   97*      3        53*      47
Dry Beans         98*      2        49       51*
Oats              92*      8        43       57*
Rye               97*      3        33       67*
Sorghum           84*      16       32       68*
Soybeans          99*      1        53*      47
Sunflower         91*      9        43       57*
Tobacco (burley)  98*      2        69*      31
Wheat (spring)    85*      15       47       53*
Wheat (winter)    90*      10       53*      47
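Each entry in the pairwise tables is a percentage of counties in which one method had the better (smaller) value of a measure. A minimal sketch of that tally follows, with hypothetical per-county values. Ties are counted for neither method, which is an assumption because the slides do not say how ties were handled.

```python
def percent_favoring(metric_a, metric_b):
    """Percent of counties where method A has the smaller (better) value,
    and percent where method B does; ties count for neither (assumption)."""
    n = len(metric_a)
    a_wins = sum(a < b for a, b in zip(metric_a, metric_b))
    b_wins = sum(b < a for a, b in zip(metric_a, metric_b))
    return 100.0 * a_wins / n, 100.0 * b_wins / n


# Hypothetical per-county MSE values for two methods over four counties.
sg_pct, r_pct = percent_favoring([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 5.0, 5.0])  # 75.0, 25.0
```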

25 Additional Bias Evaluation
- Wilcoxon Rank Sum Test – compare median absolute error (over simulation runs) of SG vs. R, SG vs. G for each county
- Wilcoxon Signed Rank Test – assess whether median error of SG, G, R is negative, positive or zero (two one-sided tests performed for each county)

26 Results of Rank Sum Tests on Absolute Bias
Percent of counties favoring each method (* = better method)
Crop              SG vs. Ratio               SG vs. Griffith
                  SG       R        Neither  SG       G        Neither
Barley            82*      9        10       74*      13       13
Corn              85*      7        8        62*      27       11
Cotton (upland)   78*      13       9        54*      33       13
Dry Beans         84*      7        9        67*      22       11
Oats              76*      11       13       61*      30       10
Rye               63*      10       27       40       40       20
Sorghum           65*      13       22       56*      35       10
Soybeans          80*      12       9        60*      32       7
Sunflower         85*      5        10       66*      25       8
Tobacco (burley)  95*      2        3        45*      38       16
Wheat (spring)    78*      15       7        75*      17       9
Wheat (winter)    72*      16       12       61*      27       12
All               79*      11       10       62*      27       11

27 Summary of Signed Rank Test Results (All Crops Combined)
Number of counties (percent) by test result
Method         Bias < 0       Bias > 0       Bias = 0
Stasny-Goel    1607 (59%)     887 (32%)      243 (9%)
Griffith       1456 (54%)     1174 (43%)     82 (3%)
Ratio          292 (11%)      245 (9%)       2200 (80%)

28 Percent of Counties With Average Underestimate Less Than 10% of True Yield (* = best method)
Crop              Stasny-Goel  Griffith     Ratio
Barley            81*          62           46
Corn              83*          71           42
Cotton (upland)   79*          78           64.5
Dry Beans         95*          74           62.5
Oats              70.5*        54           21
Rye               41           52*          13
Sorghum           52*          41           11
Soybeans          84*          76           62
Sunflower         80*          63.5         50
Tobacco (burley)  93           98*          27
Wheat (spring)    94*          55           54
Wheat (winter)    86*          75           51.5

29 Convergence Issues
- SG algorithm not guaranteed to converge within fixed limit on number of iterations
- Non-convergence associated with numerical instability conditions
- Yield estimates produced for non-convergent runs may be suspect
- Convergence generally most reliable for highly prevalent crops, least reliable for rare crops

30 Algorithm Convergence Percentage by Crop (Limit of 5000 Iterations)
Crop              Stasny-Goel  Griffith
Barley            93           68
Corn              87           77
Cotton (upland)   81           89
Dry Beans         89           75
Oats              80           71
Rye               74           83
Sorghum           85           66
Soybeans          93           73
Sunflower         90.5         80
Tobacco (burley)  41           52
Wheat (spring)    63           52.5
Wheat (winter)    88           65

31 Two Approaches to Dealing With SG Non-Convergence
- SG(1) - use estimate generated at final allowable iteration (N0)
- SG(2) - keep track of which iteration (i*) maximized the log-likelihood
  - if i* < N0, rerun algorithm to i* and use that estimate
  - if i* = N0, resume processing at iteration N0 + 1 and continue until either
    o convergence occurs (use that estimate), OR
    o log-likelihood decreases from one iteration to next (use estimate at next-to-last iteration)
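The SG(2) rule can be sketched as a wrapper around any iterative fitting routine. In the sketch, `step`, `loglik` and `has_converged` are hypothetical callables standing in for one SG iteration, its log-likelihood, and its stopping test. Keeping a copy of the iterate at i* is assumed equivalent to rerunning a deterministic algorithm to i*. The `extension_limit` cap is an added safeguard that the slide does not describe. SG(1) would simply return the iterate reached at N0.

```python
def sg2_estimate(theta0, step, loglik, has_converged, n0, extension_limit=100_000):
    """SG(2)-style handling of non-convergence (sketch).
    Returns (estimate, status). `step` must return a new object, not mutate theta."""
    theta = theta0
    best_theta, best_ll, best_i = theta0, float("-inf"), 0
    # Ordinary run of up to N0 iterations, remembering i* (max log-likelihood).
    for i in range(1, n0 + 1):
        new_theta = step(theta)
        if has_converged(theta, new_theta):
            return new_theta, "converged"
        theta = new_theta
        ll = loglik(theta)
        if ll > best_ll:
            best_theta, best_ll, best_i = theta, ll, i
    if best_i < n0:
        # i* < N0: use the iterate at i*.
        return best_theta, "max log-likelihood before N0"
    # i* = N0: resume at N0 + 1 until convergence or a log-likelihood decrease.
    prev_theta, prev_ll = theta, best_ll
    for _ in range(extension_limit):
        new_theta = step(prev_theta)
        if has_converged(prev_theta, new_theta):
            return new_theta, "converged after extension"
        new_ll = loglik(new_theta)
        if new_ll < prev_ll:
            return prev_theta, "log-likelihood decreased"  # next-to-last iterate
        prev_theta, prev_ll = new_theta, new_ll
    return prev_theta, "extension limit reached"


# Toy usage: a slowly converging scalar update whose log-likelihood keeps rising,
# so i* = N0 and the run is extended until it converges near 3.
est, status = sg2_estimate(
    0.0,
    step=lambda x: x + 0.5 * (3.0 - x),
    loglik=lambda x: -(x - 3.0) ** 2,
    has_converged=lambda old, new: abs(new - old) < 1e-12,
    n0=5,
)
```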

32 Non-Convergence Study
- Does SG(1) or SG(2) outperform ratio estimator in cases where SG failed to converge?
- Six cases with high non-convergence percentage selected for comparison of SG(1), SG(2) and R:
  - 2002 CO barley (37 simulation runs)
  - 2002 MS soybeans (105)
  - 2002 NY winter wheat (39)
  - 2002 ND dry beans (38)
  - 2002 OH oats (50)
  - 2003 OK rye (59)

33 Combined Pairwise Estimator Comparison for Non-Convergence Test Cases
Percent of counties favoring each method (* = better method)
Measure         SG(1) vs. Ratio     SG(2) vs. Ratio     SG(1) vs. SG(2)
                SG(1)     R         SG(2)     R         SG(1)     SG(2)
Absolute Bias   78*       22        80*       20        23        77*
Variance        95*       5         99*       1         0         100*
MSE             81*       19        83*       17        15        85*
LTP             74*       26        88*       12        13        87*
UTP             84*       16        90*       10        15        85*

34 Summary
- SG yield estimation method outperforms R in all efficiency categories and G in most categories (G outperforms R)
- Convergence problems can be alleviated using enhanced SG approach
- SG method recommended for integration into NASS County Estimates System

