General linear models in small area estimation: an assessment in agricultural surveys. Carlo Russo, Massimo Sabbatini, and Renato Salvatore, University of Cassino, Italy


General linear models in small area estimation: an assessment in agricultural surveys Carlo Russo, Massimo Sabbatini, and Renato Salvatore University of Cassino, Italy The MEXSAI Conference

Some small area estimation references
Ghosh M., Rao J. N. K. (1994), Small area estimation: an appraisal, Statistical Science, Vol. 9, No. 1.
He Z., Sun D. (2000), Hierarchical Bayes estimation of hunting success rates with spatial correlations, Biometrics, 56.
Malec D., Sedransk J., Moriarity C. L., LeClere F. B. (1997), Small area inference for binary variables in the National Health Interview Survey, Journal of the American Statistical Association, Vol. 92, No. 439.
Rao J. N. K. (2002), Small area estimation with applications to agriculture, Proceedings of the Conference on Agricultural and Environmental Statistical Applications in Rome, Vol. III.
Rao J. N. K. (2003), Small area estimation, Wiley, London.

Small area estimation: a simple outline
The term small area usually denotes a small geographical area, such as a county, a province, an administrative area or a census division.
From a statistical point of view, a small area is a small domain, that is, a small subpopulation made up of a specific demographic and socioeconomic group of people within a larger geographical area.
Sample survey data provide reliable estimators of totals and means for large areas and domains. However, the usual direct survey estimators for a small area have unacceptably large standard errors, because the sample size within the area is small.

Small area estimation: a simple outline
In fact, sample sizes in small areas are small because the overall sample size of a survey is usually chosen to provide a specified accuracy at a macro level of aggregation, such as national territories, regions and so on (Ghosh and Rao, 1994).
Small area statistics are important tools for planning agricultural policies in specific regional and administrative areas.
There is also demand for this information from other sectors, such as the private sector, especially for questions about local social and economic conditions, local area marketing research, and so on.

Small area estimation: a simple outline
Small area statistics are based on a collection of statistical methods that "borrow strength" from related or similar small areas, through statistical models that link the variables of interest in small areas to vectors of supplementary data, such as demographic, behavioral and economic information from administrative, census and specific sample survey records.
In addition, efficient small area statistics provide good local estimates of population, farms and other characteristics of interest in post-censal years.

Small area estimation: a simple outline
The most commonly used techniques for small area estimation are the empirical Bayes (EB), hierarchical Bayes (HB) and empirical best linear unbiased prediction (EBLUP) procedures (Rao, 2003).
Some uses of these techniques in agricultural statistics involve satellite data and, more generally, differently oriented sample surveys within model-based frameworks.
There are two types of small area models that include random area-specific effects. In the first type, the basic area level model, the response is linked to area-specific auxiliary variables, because such data are rarely available at the unit level.

Small area estimation: a simple outline
The second type is the unit level model, in which element-specific auxiliary data are available for the population elements (Ghosh and Rao, 1994; Rao, 2002).
The simplest way to produce small area statistics, however, is to derive synthetic estimates by applying large-area data to related local areas under suitable assumptions. Synthetic estimators are widely used because they apply to general sampling designs and gain efficiency by exploiting information from similar small areas.
The problem is that estimators of this type are potentially design-biased. In the composite approach to small area analysis, the bias of a synthetic estimator is balanced against the instability of a direct estimator by taking a weighted average of the two. The result is a composite estimator.
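The weighted average behind a composite estimator can be sketched in a few lines. This is only an illustration: the function name, the numbers and the fixed weight are assumptions, and in practice the weight would be chosen from the relative precision of the two estimators.

```python
def composite_estimate(direct, synthetic, weight):
    """Weighted average of a direct and a synthetic estimate.

    `weight` is the share given to the direct estimator, between 0 and 1.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * direct + (1.0 - weight) * synthetic


# Hypothetical numbers: a noisy direct estimate for one small area and a
# synthetic estimate borrowed from the surrounding large area.
direct_mean = 12.0
synthetic_mean = 9.0
estimate = composite_estimate(direct_mean, synthetic_mean, weight=0.25)
print(estimate)
```

A weight close to 1 trusts the unstable but design-unbiased direct estimate; a weight close to 0 trusts the stable but potentially biased synthetic one.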

Small area estimation: a simple outline
Small area models include random area-specific effects in regression-synthetic area estimators.
The basic area level model is specified by formulas 1) to 3):
1) is the vector of auxiliary data;
2) gives the parameters of interest, which are assumed to be related to the vector in 1);
3) gives the iid (normal) random effects.
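The formulas numbered 1) to 3) are not preserved in this transcript. As a hedged reconstruction, assuming the standard notation for the basic area level model in Rao (2003), they can be written as follows. The symbols $x_i$, $\theta_i$, $\beta$, $v_i$, $\sigma_v^2$, $p$ and $m$ are assumptions and may differ from those on the original slide.

```latex
\begin{align}
x_i &= (x_{i1}, \dots, x_{ip})^{\top}, \qquad i = 1, \dots, m \tag{1}\\
\theta_i &= x_i^{\top}\beta + v_i \tag{2}\\
v_i &\overset{\text{iid}}{\sim} N(0, \sigma_v^2) \tag{3}
\end{align}
```

Here $m$ is the number of small areas, $\theta_i$ is the parameter of interest in area $i$, and $v_i$ is the random area-specific effect.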

Small area estimation: a simple outline
4) is the direct area estimator with its sampling errors.
Combining 2) and 4) gives 5).
Model 5) involves design-based random variables and, at the same time, model-based random variables. It is an example of a general linear mixed model (GLMM) with diagonal covariance structure.
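Formulas 4) and 5) are also missing from the transcript. A hedged reconstruction, again assuming the standard area level notation of Rao (2003), with linking model $\theta_i = x_i^{\top}\beta + v_i$ and known sampling variances $\psi_i$ (all symbols are assumptions), is:

```latex
\begin{align}
\hat{\theta}_i &= \theta_i + e_i, \qquad e_i \overset{\text{ind}}{\sim} N(0, \psi_i) \tag{4}\\
\hat{\theta}_i &= x_i^{\top}\beta + v_i + e_i \tag{5}
\end{align}
```

Under these assumptions, with $v_i$ and $e_i$ independent, $\operatorname{Var}(\hat{\theta}_i) = \sigma_v^2 + \psi_i$ and the $\hat{\theta}_i$ are uncorrelated across areas, which is the diagonal covariance structure the slide refers to.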

Small area estimation: a simple outline
The BLUP (best linear unbiased prediction) estimator is a weighted average of the design-based estimator and the regression-synthetic estimator.
The MSE of the BLUP estimator depends on the variance parameter of the random area effects.
In practical applications this parameter is unknown and is replaced by an estimator. The result is a two-stage estimator, called the empirical BLUP (EBLUP).
The MSE of the EBLUP estimator is larger than that of the BLUP estimator, because estimating the variance of the random area effects adds uncertainty, although it is relatively insensitive to the choice of variance estimator.
Assuming normality of the random effects, the area variance parameters can be estimated by either maximum likelihood (ML) or restricted maximum likelihood (REML).
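The weighted-average form of the BLUP can be illustrated with a short sketch. It assumes the standard area level setting with a known sampling variance `psi` and a known area-effect variance `sigma_v2`, and it treats the regression-synthetic value as given; the function name and numbers are hypothetical.

```python
def blup_area_level(direct, synthetic, sigma_v2, psi):
    """BLUP for one area under an area level model (illustrative sketch).

    direct    : design-based estimate for the area
    synthetic : regression-synthetic value (x' beta), taken as given
    sigma_v2  : variance of the random area effects (assumed known)
    psi       : sampling variance of the direct estimate (assumed known)
    """
    # Shrinkage weight: large when the area effects vary a lot relative
    # to the sampling noise, small when the direct estimate is noisy.
    gamma = sigma_v2 / (sigma_v2 + psi)
    estimate = gamma * direct + (1.0 - gamma) * synthetic
    # MSE when the regression coefficients are treated as known.
    mse = gamma * psi
    return gamma, estimate, mse


# Hypothetical areas: same estimates, different sampling variances.
noisy = blup_area_level(direct=12.0, synthetic=9.0, sigma_v2=4.0, psi=12.0)
precise = blup_area_level(direct=12.0, synthetic=9.0, sigma_v2=4.0, psi=1.0)
print(noisy)
print(precise)
```

The EBLUP follows the same formula with `sigma_v2` replaced by an ML or REML estimate, which is where the extra MSE comes from.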

Small area estimation: a simple outline
The first step of the EB (empirical Bayes) approach is to derive the posterior distribution of the parameters of interest given the data, assuming that the model parameters are known.
The model parameters are then estimated from the marginal distribution of the data, and inference on the small area parameters of interest is based on the estimated posterior distribution.
Various methods, such as bootstrap methods, have been proposed to deal with a known problem of the EB approach: it underestimates the true posterior variance.

Small area estimation: a simple outline
Under the HB (hierarchical Bayes) approach, an alternative to EBLUP and EB, a prior distribution on the model parameters is specified first, and the posterior distribution of the parameters of interest is then obtained.
The usual small area estimation problems are solved within this posterior framework. A parameter of interest is estimated by its posterior mean, and the precision of the estimate, in terms of its MSE, is measured by the posterior variance.
The HB approach is computationally intensive and in many cases involves high-dimensional integration.

Small area estimation: a simple outline
Tools such as Gibbs sampling and importance sampling, the latter used together with Monte Carlo numerical integration, are commonly used to overcome these computational problems.
In recent years, comparative studies of the EBLUP, EB and HB approaches have generally found close values of the predictors. Each of the three can work better than the others in particular situations.
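As a minimal sketch of how Gibbs sampling stands in for high-dimensional integration in an HB analysis, assume an intercept-only area level model with known sampling variances, a flat prior on the common mean and an inverse-gamma prior on the area-effect variance. The model, the priors, the data and all names below are illustrative assumptions, not the authors' specification.

```python
import random


def gibbs_area_means(direct, psi, n_iter=4000, burn_in=1000,
                     a=2.0, b=1.0, seed=0):
    """Gibbs sampler for an assumed intercept-only area level model.

    direct[i] ~ N(theta[i], psi[i]),  theta[i] ~ N(mu, sigma2),
    flat prior on mu, inverse-gamma(a, b) prior on sigma2.
    Returns posterior means and variances of the theta[i].
    """
    rng = random.Random(seed)
    m = len(direct)
    theta = list(direct)
    mu = sum(direct) / m
    sigma2 = 1.0
    draws = [[] for _ in range(m)]
    for it in range(n_iter):
        # theta[i] | rest: shrink the direct estimate towards mu.
        for i in range(m):
            gamma = sigma2 / (sigma2 + psi[i])
            mean = gamma * direct[i] + (1.0 - gamma) * mu
            theta[i] = rng.gauss(mean, (gamma * psi[i]) ** 0.5)
        # mu | rest: normal around the average of the current thetas.
        mu = rng.gauss(sum(theta) / m, (sigma2 / m) ** 0.5)
        # sigma2 | rest: inverse gamma, drawn as 1 / gamma variate.
        ss = sum((t - mu) ** 2 for t in theta)
        sigma2 = 1.0 / rng.gammavariate(a + m / 2.0, 1.0 / (b + ss / 2.0))
        if it >= burn_in:
            for i in range(m):
                draws[i].append(theta[i])
    post_mean = [sum(d) / len(d) for d in draws]
    post_var = [sum((x - pm) ** 2 for x in d) / (len(d) - 1)
                for d, pm in zip(draws, post_mean)]
    return post_mean, post_var


# Hypothetical direct estimates and sampling variances for five areas.
direct = [10.0, 14.0, 9.0, 20.0, 12.0]
psi = [1.0, 4.0, 2.0, 16.0, 3.0]
post_mean, post_var = gibbs_area_means(direct, psi)
for d, pm, pv in zip(direct, post_mean, post_var):
    print(f"direct={d:5.1f}  posterior mean={pm:6.2f}  posterior var={pv:5.2f}")
```

The posterior mean plays the role of the point estimate and the posterior variance the role of its MSE, as described for the HB approach.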

Qualitative data
Qualitative data are becoming relevant in agricultural economics for two major reasons: first, theoretical developments stress the relevance of discrete and intrinsically qualitative phenomena; second, the increasing sophistication of statistical methods in the field allows economists to draw quantitative conclusions from discrete data.
Qualitative data about households, including the role of women, the availability of services and the presence or absence of infrastructure, are relevant factors that require close consideration in any economic model in the field.

Qualitative data
Agricultural economists are also interested in the social analysis of rural territory. Segmenting the universe on qualitative variables (such as gender, age and education) becomes relevant for defining the dynamics of specific groups and analyzing issues of interest.
The shift of policy focus from producer support to rural development in high-income countries is one of the major factors behind the new interest in the qualitative aspects of agriculture.
In the context of qualitative data analysis, continuous, binary and other non-numeric data are all available from large data sets such as agricultural censuses. Fully exploiting this large amount of information about farms is often feasible only through exploratory data analysis, in particular homogeneity and correspondence analysis.

Qualitative data
On the other hand, it is recognized that some complex aspects of farm structure emerge correctly only if all the available information is brought into the economic models at the same time.
Small area statistics are powerful methods for estimating the characteristics of farms in small areas, but some agricultural policies need further information, especially policies aimed at particular classes of farms.
The membership of farms in well-recognized classes, used jointly with other area information, is therefore a basic tool for policy makers.
From this standpoint, it is very useful to develop small area random effects models that combine continuous and categorical predictors and use binary response variables. The goal is to estimate the proportions of farms that fall in given qualitative classes within given small areas.

The general logistic-linear mixed model small area analysis
The extension of GLM models to small area analysis with binary response variables is given in Malec et al. (1997). In the application example of that paper, the unit level model combines small area-specific covariates with unit level demographic and socioeconomic data. Estimates relating individuals and classes were then obtained using an HB approach.
In that GLM model, each individual in the population is assumed to be assigned to one of a set of mutually exclusive and exhaustive classes, based on the individual's demographic and socioeconomic status.
Given a vector of random effects, estimating the parameters of a GLMM for binary responses requires the computation of high-dimensional integrals, with dimension equal to the number of levels of the random factors.

The general logistic-linear mixed model small area analysis
One approach in the literature works within the HB framework. He and Sun (2000) give an example of a hierarchical Bayes estimation procedure for a logistic-linear mixed model of hunting success rates at the sub-area level, for post-season harvest surveys.
The model includes fixed week effects and random geographic effects, using autoregressive (AR) and conditional autoregressive (CAR) approaches to the spatial correlation between neighboring sub-areas. As with the logistic-linear GLM described above, estimation requires Gibbs sampling procedures.

The general logistic-linear mixed model small area analysis
In the paper we introduce a Monte Carlo Newton-Raphson ML procedure (McCulloch, 1997) for estimating the parameters of the general logistic-linear mixed model.
The likelihood involves integrals that have no closed-form expression, and we propose to evaluate them numerically by a Monte Carlo approach. A further problem is how to generate starting values for the parameters in the likelihood expressions when the vector of random effects has not been specified beforehand. A natural solution is to adopt the Metropolis algorithm, a simple Markov chain Monte Carlo (MCMC) algorithm.
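The model formula itself is not preserved in the transcript. A commonly used form of a logistic-linear mixed model with random area effects, given here only as an assumed illustration (the symbols $y_{ij}$, $p_{ij}$, $x_{ij}$, $\beta$, $u_i$ and $\sigma_u^2$ are not taken from the slides), is:

```latex
\begin{align}
y_{ij} \mid u_i &\sim \text{Bernoulli}(p_{ij}), \\
\log\frac{p_{ij}}{1 - p_{ij}} &= x_{ij}^{\top}\beta + u_i, \qquad
u_i \overset{\text{iid}}{\sim} N(0, \sigma_u^2), \\
L(\beta, \sigma_u^2) &= \int f(y \mid u, \beta)\, f(u \mid \sigma_u^2)\, du .
\end{align}
```

Here $i$ indexes small areas and $j$ units within an area. The integral over the random effects $u$ in the last line is the one with no closed form, which is why Monte Carlo methods are used to approximate it.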

The general logistic-linear mixed model small area analysis
The basic characteristic of MCMC is that the sequence of generated points performs a kind of random walk in parameter space, instead of each point being generated independently of the others.
Moreover, the probability of jumping from one point to another depends only on the last point and not on the entire previous history (the defining property of a Markov chain).
The paper shows how the Monte Carlo approach is combined with the Newton-Raphson procedure to estimate the logistic-linear parameters through an iterative scheme that leads to convergent ML estimates, under the assumption of normality.
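The random-walk behaviour described above can be seen in a small Metropolis sketch. It draws one random area effect `u` given binary outcomes in a single area, under an assumed logistic model with a fixed intercept `beta0` and a normal density with standard deviation `sigma_u` for `u`. The model, the step size and all names are illustrative assumptions, and this is not the full Monte Carlo Newton-Raphson procedure of the paper.

```python
import math
import random


def log_target(u, successes, trials, beta0, sigma_u):
    """Log of p(u | y), up to a constant, for one area.

    Binomial-logistic likelihood with linear predictor beta0 + u,
    times a N(0, sigma_u^2) density for the random effect u.
    """
    eta = beta0 + u
    # Numerically stable log(1 + exp(eta)).
    log1pexp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
    log_lik = successes * eta - trials * log1pexp
    log_prior = -0.5 * (u / sigma_u) ** 2
    return log_lik + log_prior


def metropolis(successes, trials, beta0=0.0, sigma_u=1.0,
               n_draws=5000, step=0.8, seed=1):
    """Random-walk Metropolis: each proposal depends only on the last point."""
    rng = random.Random(seed)
    u = 0.0
    current = log_target(u, successes, trials, beta0, sigma_u)
    chain, accepted = [], 0
    for _ in range(n_draws):
        proposal = u + rng.gauss(0.0, step)
        candidate = log_target(proposal, successes, trials, beta0, sigma_u)
        # Accept with probability min(1, target ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            u, current = proposal, candidate
            accepted += 1
        chain.append(u)
    return chain, accepted / n_draws


# Hypothetical area: 8 farms out of 10 fall in the class of interest.
chain, acceptance_rate = metropolis(successes=8, trials=10)
chain_mean = sum(chain) / len(chain)
print(round(acceptance_rate, 2), round(chain_mean, 2))
```

In a Monte Carlo Newton-Raphson scheme, draws like these would be used to approximate the expectations over the random effects that appear in each update of the parameters.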

Conclusions
In this paper, a Monte Carlo Newton-Raphson algorithm has been outlined, assuming normality of the random area effects, to address ML estimation for the logistic-linear mixed model in the context of qualitative small area estimation.
As is generally recognized, the focus of recent economic theory on qualitative data can be summarized in two major points: the increasing interest in the analysis of discrete phenomena, and the explanatory power of qualitative variables in describing current trends in the agricultural sector.
Statistical methods that can bring qualitative information into the estimation models can increase efficiency.

Conclusions
Large sets of information about sample units and, more generally, about local areas of interest are available from the full exploitation and analysis of survey questionnaires. One of the most important questions is therefore how to include large sets of continuous and categorical variables in small area models.
In fact, much basic information about units and areas is both continuous and categorical, and in many cases only the categorical information can lead to an appropriate assessment of specific issues.
From this point of view, the logistic-linear mixed model can be a useful tool for measuring random area-specific effects and performing satisfactory area level analyses.

Thank you
Many more methodological details can be found in the paper, which is available on the conference website.