New Techniques and Technologies for Statistics 2017: Estimation of Response Propensities and Indicators of Representative Response Using Population-Level Information


1 New Techniques and Technologies for Statistics 2017: Estimation of Response Propensities and Indicators of Representative Response Using Population-Level Information
Annamaria Bianchi, Natalie Shlomo, Barry Schouten, Damiao Da Silva and Chris Skinner

2 Contents
- Introduction
- Population-Based Response Propensities
- Population-Based R-indicators
- Evaluation Study
- Real Application
- Discussion

3 Introduction
- Indirect measures of nonresponse bias supplement the response rate.
- These measures come at a time of increased interest in adapting data collection: the level of effort targeted at different subgroups may be varied over time, possibly through a change of strategy, according to the observed patterns of response.
- Taxonomy of measures: indicators that include only observed auxiliary variables, and indicators that also include observed survey variables, which may or may not account for nonresponse weighting.
- Indicators that use observed auxiliary variables are R-indicators (Schouten, Cobben and Bethlehem 2009; Schouten, Shlomo and Skinner 2011) and balance indicators (Särndal 2011; Lundquist and Särndal 2013).
- R-indicators presume that auxiliary variables are available through data linked from sampling frames, registers, etc. Such data are not always available, especially to users outside of national statistical institutes (NSIs).
- We develop R-indicators that are based on population statistics and that can be computed without any knowledge about the non-respondents.

4 Introduction
- R-indicators and their statistical properties (Shlomo, Skinner and Schouten 2012) relate to the case where linked sample-level auxiliary information is available for non-respondents.
- For R-indicators based on population statistics, we propose a new method for estimating response propensities that does not need auxiliary information for non-respondents: population-based response propensities. The auxiliary information for population-based response propensities is obtained from population tables and population counts.
- We distinguish two settings: (1) the auxiliary vector $\mathbf{x}_i$ is known for all sample units, respondents and non-respondents (sample-based auxiliary information); (2) $\mathbf{x}_i$ is known only at the aggregate level, i.e. through the population totals and/or the population cross-products (population-based auxiliary information).

5 Population-Based Response Propensities
- Response propensities are $\rho(\mathbf{x}_i) = P(r_i = 1 \mid \mathbf{x}_i)$, where $r_i$ is the response indicator and we assume that missing at random holds given the auxiliary variables (Little and Rubin 2002).
- Response propensities are generally modelled by a generalized linear model, e.g. logistic regression.
- In the population-based setting, it is convenient to use the identity link function. It is a good approximation to the logistic link when response rates are mid-range, between 30% and 70%, which is typical of national and other surveys. The identity link also forms the basis for other representativeness indicators, such as the imbalance and distance indicators proposed by Särndal (2011).
- The true response propensities satisfy the linear probability model $\rho(\mathbf{x}_i) = \mathbf{x}_i^\top \boldsymbol{\beta}$, and $\boldsymbol{\beta}$ is estimated by weighted least squares:
$$\hat{\boldsymbol{\beta}} = \Big(\sum_{i \in s} d_i \mathbf{x}_i \mathbf{x}_i^\top\Big)^{-1} \sum_{i \in s} d_i r_i \mathbf{x}_i,$$
where $d_i$ is the design weight and $s$ is the sample.
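The weighted least squares step above can be sketched in Python with NumPy. This is a minimal illustration, assuming the sample-based setting (auxiliary values known for every sample unit) and a toy design with an intercept and one binary variable; the function name `linear_response_propensities` and the data are hypothetical, not the authors' code.

```python
import numpy as np

def linear_response_propensities(X, r, d):
    """Identity-link (linear probability model) response propensities by WLS.

    X : (n, p) auxiliary variables for every sample unit, respondents and
        non-respondents alike (sample-based setting)
    r : (n,) response indicator, 1 = respondent, 0 = non-respondent
    d : (n,) design weights d_i
    Returns (rho_hat, beta_hat) with rho_hat_i = x_i^T beta_hat.
    """
    Xw = X * d[:, None]                  # each row x_i scaled by d_i
    A = Xw.T @ X                         # sum_{i in s} d_i x_i x_i^T
    b = Xw.T @ r                         # sum_{i in s} d_i r_i x_i
    beta_hat = np.linalg.solve(A, b)     # weighted least squares solution
    return X @ beta_hat, beta_hat

# Hypothetical toy sample: intercept plus one binary auxiliary variable g
g = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = np.column_stack([np.ones(8), g])
r = np.array([1, 1, 0, 0, 1, 1, 1, 0], dtype=float)
d = np.full(8, 10.0)                     # equal design weights, as under SRS
rho_hat, beta_hat = linear_response_propensities(X, r, d)
```

Because the toy design is saturated, the fitted propensities reproduce the weighted response rates within each cell: 0.5 for g = 0 and 0.75 for g = 1.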

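6 Population-Based R-indicators
- Replace the sample sums and/or cross-products in $\hat{\boldsymbol{\beta}}$ with population-based quantities, for example the known population cross-products $\sum_{i \in U} \mathbf{x}_i \mathbf{x}_i^\top$ in place of $\sum_{i \in s} d_i \mathbf{x}_i \mathbf{x}_i^\top$. The remaining term, $\sum_{i \in s} d_i r_i \mathbf{x}_i = \sum_{i \in r} d_i \mathbf{x}_i$, involves respondents only. The exact replacement depends on the type of population information available (T1 or T2).
- Population-based R-indicator: $R(\rho) = 1 - 2S(\rho)$, where $S(\rho)$ is the standard deviation of the response propensities in the population; $R(\rho)$ takes values on the interval $[0, 1]$ (we linearize for ease of bias computation).
- Population-based coefficient of variation: $CV(\rho) = S(\rho)/\bar{\rho}$, where $\bar{\rho}$ is the population mean response propensity.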
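A sketch of the population-based idea, assuming the population cross-product matrix and population totals of the auxiliary variables are known and that the auxiliary vector includes an intercept. The helper names, the synthetic population and the response model are hypothetical; the sketch does not reproduce the specific T1/T2 estimators or the linearization used by the authors. Under the linear model, the propensity variance over the population can be written as $\boldsymbol{\beta}^\top(\sum_U \mathbf{x}_i\mathbf{x}_i^\top/N)\boldsymbol{\beta} - (\boldsymbol{\beta}^\top \sum_U \mathbf{x}_i/N)^2$, so no unit-level data on non-respondents is needed.

```python
import numpy as np

def population_based_beta(X_resp, d_resp, pop_crossprod):
    """Population-based linear response-propensity coefficients (sketch).

    X_resp        : (n_r, p) auxiliary values of respondents only
    d_resp        : (n_r,) design weights of respondents
    pop_crossprod : (p, p) population cross-products sum_{i in U} x_i x_i^T,
                    assumed known from population tables
    """
    numerator = X_resp.T @ d_resp                    # sum_{i in r} d_i x_i
    return np.linalg.solve(pop_crossprod, numerator)

def population_based_indicators(beta, pop_crossprod, pop_total, N):
    """R = 1 - 2S and CV = S / mean, from population aggregates only.

    Assumes rho_i = x_i^T beta for every population unit, so that
    S^2 = beta^T (T / N) beta - (beta^T t / N)^2   (variance with divisor N).
    """
    mean_rho = beta @ pop_total / N
    var_rho = beta @ pop_crossprod @ beta / N - mean_rho ** 2
    s = np.sqrt(max(var_rho, 0.0))                   # guard against rounding below 0
    return 1.0 - 2.0 * s, s / mean_rho, mean_rho

# Hypothetical population: intercept, a binary and a 3-level categorical variable
rng = np.random.default_rng(2017)
N = 20_000
g1 = rng.integers(0, 2, N)
g2 = rng.integers(0, 3, N)
X_pop = np.column_stack([np.ones(N), g1, g2 == 1, g2 == 2]).astype(float)
T = X_pop.T @ X_pop                                  # population cross-products
t = X_pop.sum(axis=0)                                # population totals

# Simple random sample; respondents drawn from a hypothetical propensity model
n = 2_000
sample = rng.choice(N, size=n, replace=False)
true_rho = 0.45 + 0.15 * g1[sample] - 0.10 * (g2[sample] == 2)
responded = rng.random(n) < true_rho
X_resp = X_pop[sample][responded]
d_resp = np.full(responded.sum(), N / n)             # SRS design weight N/n

beta_pop = population_based_beta(X_resp, d_resp, T)
R_pop, cv_pop, mean_pop = population_based_indicators(beta_pop, T, t, N)
```

With an intercept in $\mathbf{x}_i$, the mean population-based propensity equals $\sum_{i\in r} d_i / N$, the weighted response rate, and evaluating $\mathbf{x}_i^\top\hat{\boldsymbol{\beta}}$ unit by unit over the synthetic population gives the same R-indicator as the aggregate formula.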

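7 R-indicators
- Estimated R-indicator, sample-based: $\hat{R} = 1 - 2\hat{S}(\hat{\rho})$, where $\hat{S}^2(\hat{\rho}) = \frac{1}{N-1}\sum_{i \in s} d_i (\hat{\rho}_i - \hat{\bar{\rho}})^2$ and $\hat{\bar{\rho}} = \frac{1}{N}\sum_{i \in s} d_i \hat{\rho}_i$; estimated CV: $\widehat{CV} = \hat{S}(\hat{\rho})/\hat{\bar{\rho}}$.
- Population-based: $\hat{R}$ and the estimated CV are computed from the population-based response propensities, according to the T1 or T2 type of information.
- Empirical results show that population-based R-indicators have standard errors and biases that increase with higher response rates. The population-based propensities ignore the sampling: in the sample-based estimator, the sample covariances in the denominator of the estimated response propensities vary from sample to sample along with the numerator, whereas plugging in a fixed population covariance in the denominator removes this variation arising from sampling.
- Proposal: a composite response propensity with smoothing parameter $\lambda$, where $\lambda$ should be an increasing function of the response rate and converge to 1 at higher response rates (so that estimated response propensities greater than 1, which arise from the linear link function under high response rates, are brought closer to 1).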
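The sample-based estimator of the R-indicator and CV can be sketched as follows, assuming the usual design-weighted variance with divisor $N - 1$ and estimating $N$ by $\sum_{i\in s} d_i$ when it is not supplied. The function name is hypothetical, and the composite propensity with $\lambda$ is not implemented here.

```python
import numpy as np

def sample_based_r_indicator(rho_hat, d, N=None):
    """Estimated R-indicator and CV from sample-based propensities (sketch).

    rho_hat : (n,) estimated response propensities for all sample units
    d       : (n,) design weights
    N       : population size; estimated by sum(d) when not supplied
    Uses S^2 = 1/(N-1) * sum_s d_i (rho_i - rho_bar)^2,
    with rho_bar = sum_s d_i rho_i / N.
    """
    N = d.sum() if N is None else N
    rho_bar = (d * rho_hat).sum() / N
    s2 = (d * (rho_hat - rho_bar) ** 2).sum() / (N - 1)
    s = np.sqrt(s2)
    return 1.0 - 2.0 * s, s / rho_bar

# Hypothetical propensities for four units with equal design weights
rho_hat = np.array([0.5, 0.5, 0.75, 0.75])
d = np.full(4, 10.0)
R_hat, cv_hat = sample_based_r_indicator(rho_hat, d)
```

For these toy values, $\hat N = 40$, $\hat{\bar\rho} = 0.625$ and $\hat R = 1 - 2\sqrt{0.625/39} \approx 0.747$.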

8 R-indicators
In addition:
- Analytical expressions for bias correction under SRS and complex survey designs
- Variance estimation using resampling methods
- Estimation of an optimal $\lambda_{\mathrm{opt}}$ for the composite response propensity

Evaluation study: based on the 1995 Census Sample of Israel, N = 322,411 households. We defined response probabilities and a response indicator (1 = response, 0 = non-response) for three overall response rates. Next, we fitted linear and logistic regression models on the population, with the {0,1} indicator as the response variable, under the true model (Model 1) and a misspecified model (Model 2).

| | RR1 | RR2 | RR3 |
|---|---|---|---|
| Overall response rate (%) | 27.1 | 67.0 | 87.0 |
| Population R-indicator (logistic), Model 1 | 0.9031 | 0.9005 | 0.9063 |
| Population R-indicator (logistic), Model 2 | 0.9103 | 0.9074 | 0.9137 |
| Population R-indicator (linear), Model 1 | 0.9033 | 0.9006 | 0.9076 |
| Population R-indicator (linear), Model 2 | 0.9104 | – | 0.9145 |
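The comparison in the table above, population R-indicators from logistic and linear models fitted to the whole population, can be mimicked on a synthetic population. This is a sketch: the population, the response probabilities and the main-effects model are hypothetical stand-ins for the Israeli census data and the authors' Model 1 and Model 2, and the logistic model is fitted by a plain Newton-Raphson loop rather than a library routine.

```python
import numpy as np

def linear_fit(X, y):
    """OLS fit of a linear probability model; returns fitted propensities."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

def logistic_fit(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; returns fitted propensities."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        W = p * (1.0 - p)                          # Newton / IRLS weights
        grad = X.T @ (y - p)
        hess = (X * W[:, None]).T @ X
        beta += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-(X @ beta)))

def r_indicator(rho):
    """Population R-indicator R = 1 - 2 * SD(rho), SD with divisor N."""
    return 1.0 - 2.0 * rho.std()

# Hypothetical population: response probabilities depend on two categorical variables
rng = np.random.default_rng(1995)
N = 50_000
a = rng.integers(0, 4, N)
b = rng.integers(0, 2, N)
true_p = 0.55 + 0.05 * a - 0.10 * b                # mid-range probabilities (0.45 to 0.70)
resp = (rng.random(N) < true_p).astype(float)       # 1 = response, 0 = non-response

# Main-effects dummy design: intercept, 3 dummies for a, 1 dummy for b
X = np.column_stack([np.ones(N)] + [a == k for k in (1, 2, 3)] + [b]).astype(float)
R_linear = r_indicator(linear_fit(X, resp))
R_logistic = r_indicator(logistic_fit(X, resp))
```

With mid-range response probabilities the two R-indicators come out close, in line with the identity link being a good approximation to the logistic link in that range.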

9 Evaluation Study
Draw 500 samples under three sampling fractions: 1%, 2% and 4%.
[Figures: 1% and 4% samples, Model 1 and RR1; 1% and 4% samples, Model 1 and RR3]
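The repeated-sampling design can be sketched as a small Monte Carlo experiment: draw 500 simple random samples at each sampling fraction, compute a sample-based and a population-based R-indicator for each, and summarize the empirical bias (against the population R-indicator from a linear fit on the full population) and the standard error. The population, response probabilities and helper names are hypothetical, and the composite estimator and bias corrections are not included.

```python
import numpy as np

rng = np.random.default_rng(9)

# Hypothetical population with a fixed 0/1 response indicator
N = 20_000
a = rng.integers(0, 4, N)                              # one 4-category auxiliary variable
X_pop = np.column_stack([np.ones(N)] + [a == k for k in (1, 2, 3)]).astype(float)
true_p = 0.4 + 0.1 * a                                 # hypothetical response probabilities
resp = (rng.random(N) < true_p).astype(float)
T = X_pop.T @ X_pop                                    # population cross-products
t = X_pop.sum(axis=0)                                  # population totals

def pop_r(beta):
    """R = 1 - 2 SD of x_i^T beta over U (divisor N), computed from T and t only."""
    mean = beta @ t / N
    return 1.0 - 2.0 * np.sqrt(max(beta @ T @ beta / N - mean ** 2, 0.0))

# Benchmark: linear propensities fitted on the whole population
R_U = pop_r(np.linalg.solve(T, X_pop.T @ resp))

def simulate(fraction, n_rep=500):
    """Empirical bias and standard error over n_rep SRS samples (sketch)."""
    n = int(round(fraction * N))
    d = N / n                                          # SRS design weight
    draws = {"sample-based": [], "population-based": []}
    for _ in range(n_rep):
        s = rng.choice(N, size=n, replace=False)
        Xs, rs = X_pop[s], resp[s]
        b = d * (Xs.T @ rs)                            # sum over respondents of d_i x_i
        # Sample-based: denominator uses x for respondents and non-respondents
        rho_s = Xs @ np.linalg.solve(d * (Xs.T @ Xs), b)
        s2 = d * ((rho_s - rho_s.mean()) ** 2).sum() / (N - 1)
        draws["sample-based"].append(1.0 - 2.0 * np.sqrt(s2))
        # Population-based: fixed population cross-products in the denominator
        draws["population-based"].append(pop_r(np.linalg.solve(T, b)))
    return {k: (np.mean(v) - R_U, np.std(v, ddof=1)) for k, v in draws.items()}

results = {f: simulate(f) for f in (0.01, 0.02, 0.04)}
```

`results[f]` maps each estimator to a (bias, standard error) pair; the standard errors shrink as the sampling fraction grows from 1% to 4%.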

10 Dutch Health Survey: 2002 HS data
The net sample size is 33,584 persons. The distribution of the respondents differs from both the sample and the population distributions. This affects the use of population estimates in the R-indicators, as seen in the estimation of $\lambda_{\mathrm{opt}}$.

Distribution of auxiliary variables (%):

| Variable | Category | Respondents | Sample | Population |
|---|---|---|---|---|
| Age | 20–24 | 7.5 | 7.9 | 8.1 |
| | 25–29 | 7.3 | 8.2 | 8.9 |
| | 30–34 | 9.9 | 10.2 | 10.9 |
| | 45–49 | 9.7 | 9.4 | 9.6 |
| | 60–64 | 7.1 | 6.7 | 6.3 |
| | 65–69 | 5.9 | 5.6 | 5.4 |
| | 75+ | 7.7 | 7.8 | 7.2 |
| Gender | Male | 48.9 | 49.8 | 49.2 |
| | Female | 51.1 | 50.2 | 50.8 |
| Marital status | Not married | 23.7 | 26.8 | 26.9 |
| | Married | 63.3 | 59.3 | 58.8 |

Incomplete rows (surviving values only, in their original order): age 35–39: 10.8, 11; age 40–44: 10.3, 10.4; age 50–54: 9.5; age 55–59: 8.8, 8; age 70–74: 4.7, 4.6; widowed: 6.5; divorced: 6.4, 7.6.

Smoothing parameter $\lambda_{\mathrm{opt}}$:

| | Type 1 | Type 2 |
|---|---|---|
| Population-based response propensities | 0.043 | 0.038 |
| Sample-based response propensities | 0.076 | 0.095 |
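The effect of respondent vs population differences can be illustrated with one categorical variable. Assuming simple random sampling (weights $N/n$), a saturated one-hot coding and a known population cross-product matrix $\mathrm{diag}(N_k)$, the population-based propensity in category $k$ reduces to the overall response rate times the ratio of the respondent share to the population share. The sketch applies this to the gender shares in the table; the response rate of 0.6 is hypothetical, since the overall response rate is not given here, and the function name is illustrative.

```python
import numpy as np

def single_variable_population_r(resp_share, pop_share, response_rate):
    """Population-based R-indicator for one categorical auxiliary variable (sketch).

    With one-hot coding, SRS weights N/n and population cross-products diag(N_k),
    the population-based propensity in category k is
    response_rate * resp_share[k] / pop_share[k].
    resp_share, pop_share : category shares among respondents and in the population
    response_rate         : overall response rate n_r / n (assumed known)
    """
    resp_share = np.asarray(resp_share, dtype=float)
    pop_share = np.asarray(pop_share, dtype=float)
    rho = response_rate * resp_share / pop_share       # propensity per category
    mean_rho = (pop_share * rho).sum()                 # equals response_rate if shares sum to 1
    s = np.sqrt((pop_share * (rho - mean_rho) ** 2).sum())
    return rho, 1.0 - 2.0 * s

# Gender shares (respondents vs population) from the table, as proportions;
# the response rate 0.6 is a hypothetical value, not taken from the survey.
rho_gender, R_gender = single_variable_population_r([0.489, 0.511], [0.492, 0.508], 0.6)
```

For gender alone the population-based R-indicator is about 0.993, since the respondent and population shares differ by only 0.3 percentage points.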

11 Dutch Health Survey

| Estimator | R-indicator (unadjusted) | 95% CI | R-indicator (bias-adjusted) | 95% CI |
|---|---|---|---|---|
| Sample-based | 0.899 | (0.888, 0.909) | 0.901 | (0.890, 0.912) |
| Type 1 – original | 0.876 | (0.860, 0.891) | 0.879 | (0.864, 0.895) |
| Type 2 – original | 0.873 | (0.858, 0.889) | 0.877 | (0.861, 0.894) |

Composite estimators (surviving values only, in their original order): Type 1 – composite population-based: 0.880, 0.865, 0.896; Type 1 – composite sample-based: 0.883, 0.868, 0.898; Type 2 – composite population-based: 0.878, 0.863, 0.862, 0.893; Type 2 – composite sample-based: 0.881, 0.866, 0.897.

Population-based R-indicators are lower than sample-based R-indicators as a result of the large differences between the respondent distributions of the auxiliary variables and the corresponding sample and population distributions.

12 Discussion
Caveats from this work:
- The survey measures have to be the same quantities as in the population information, i.e. the survey questions must have the same definitions and classifications as the population tables.
- It is best to avoid questions that are prone to measurement error, such as questions that require a strong cognitive effort or that may lead to socially desirable answers.
- It is strongly recommended to use population statistics that are based on registers or administrative data. The population-based R-indicators can be used with population statistics that are based on surveys, but these statistics may not reflect the true population distribution accurately, and one would draw erroneous conclusions about the representativeness of the response if the population estimates are biased.

13 Discussion
Caveats from this work:
- In settings where only population information is available, the options to improve representativeness during data collection through adaptive survey designs are much more limited, since no individual auxiliary information is available for the non-respondents.
- In these settings, assessments of representativeness may still be useful in the design of advance and reminder letters, in interviewer training and in paradata collection.
Extensions for future research that are relatively straightforward:
- Consider hybrid settings where the R-indicator is based on both linked data and population tables.
- Develop the case where no aggregated population information is available and weighted survey estimates are used instead. This will affect the bias and variance estimates for the population-based R-indicators.

14 Thank you for your attention