ICES 2007: Labour Cost Index and Sample Allocation. Outi Ahti-Miettinen and Seppo Laaksonen, Statistics Finland (+ University of Helsinki)

Content
- Labour cost index and its measurement requirements
- Sample allocation: Power, Neyman-Tschuprow
- Empirical results and the principles followed in the first-year sample selection

Labour Cost Index (1)
The Labour Cost Index (LCI) is a new quarterly index that will be published for the first time in Finland in 2009. The first base year is 2007, so the data collection has to cover that year. The index follows an EU regulation, but the regulation gives no detailed instructions on many basic methodological choices such as sampling and questionnaire design. It does, however, specify the level at which the index should be published, namely the NACE Rev. 1 classification established by the European Council. In Finland, the LCI is based on 23 private-sector industries, in line with the wishes of future users of the index.

Labour Cost Index (2)
The structure of the index formula as given in the regulation: [formula not preserved in the transcript]
The quantities in the formula are (their symbols are not preserved either):
- labour costs per worked and paid hour per worker in industry i in quarter t of year j
- labour costs per worked and paid hour in industry i in year k
- worked and paid hours of all workers in industry i in year k
- labour costs of all workers in industry i in year k
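Since the formula itself is missing here, the following is only a sketch of a base-weighted (Laspeyres-type) index that is consistent with the four quantities listed above. The symbol names w, h and W are assumptions chosen for illustration, not taken from the slide.

```latex
% Assumed notation (hypothetical symbol names):
%   w_i^{tj} : labour costs per hour in industry i, quarter t of year j
%   w_i^{k}  : labour costs per hour in industry i in base year k
%   h_i^{k}  : worked and paid hours in industry i in base year k
%   W_i^{k}  : labour costs of all workers in industry i in base year k
\[
  LCI_{tj} \;=\; \frac{\sum_i w_i^{tj}\, h_i^{k}}{\sum_i w_i^{k}\, h_i^{k}},
  \qquad
  W_i^{k} \;=\; w_i^{k}\, h_i^{k}.
\]
```

In this form the denominator is simply the sum of base-year labour costs over industries, which is why annual totals of both hours and costs are needed per industry.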

Labour Cost Index (3)
The formula can be developed further as follows: [formula not preserved in the transcript]
Here the sampling weight follows from the number of respondents m_ih, calculated by stratum ih, in which h is the size band. The symbol c_l refers to the labour costs of sample enterprise l.
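As an illustration of how such a weight typically enters the estimation, here is a sketch of a stratum-weighted estimate of industry-level labour costs. The frame count N_ih, the respondent set s_ih and the weight symbol are assumptions introduced here; the slide's own expression may differ.

```latex
% Assumed: N_{ih} = number of frame enterprises in stratum ih,
%          s_{ih} = set of responding sample enterprises in stratum ih,
%          m_{ih} = number of respondents in stratum ih (from the slide).
\[
  \omega_{ih} \;=\; \frac{N_{ih}}{m_{ih}},
  \qquad
  \widehat{C}_i \;=\; \sum_{h} \omega_{ih} \sum_{l \in s_{ih}} c_l .
\]
```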

Labour Cost Index (4)
The index formula has to be applied at industry level, but the data are collected from a sample. The budget is fixed, so the sampling design requires careful planning. This paper focuses on sample allocation, in which we try to take the requirements of the LCI into account. But how can this be done, given that the index is mainly longitudinal in nature? Is there any way to build this feature into the sampling design? Our answer is: not completely, but if we can estimate the key cross-sectional parameters well, we believe the index calculations will succeed too. What are these key parameters? See the next slide.

Labour Cost Index (5)
Key parameters in the LCI, all at industry level:
- Totals of worked and paid hours
  - for each quarter
  - for each calendar year (= some type of sum of quarters)
  Note that many enterprises find it difficult to report worked and paid hours, whereas paid hours are easier to report; hence we ask for only one of these for monthly paid workers. This leads to a demanding imputation task, which is not considered in this paper.
- Totals of labour costs (wages etc.)
  - for each quarter
  - for each calendar year (= some type of sum of quarters)
- Changes from one quarter to the next, keeping the same enterprises in the sample with a reasonable probability.

Labour Cost Index (6)
The index formula already includes the size band h of each index industry i. This is usual in enterprise surveys, since the formula implicitly gives larger enterprises a higher weight in the index, both cross-sectionally and longitudinally. Hence, when we allocate a gross sample size to strata, the strata are defined by both industry i and size band h. Our target in the rest of the paper is to find the best realistic allocation strategy for the given gross sample size of approximately 2000 enterprises. The allocation should make the industry-level estimates unbiased and reasonably accurate while, naturally, the index as a whole should be of high quality too.

Sample Allocation (1)
There are many possible allocation strategies when the design follows simple random sampling (srs) within strata. We exclude pps (probability proportional to size) sampling, as is often done in business surveys, because of problems in sampling frames: business registers are not always kept well up to date, so many changes in businesses are discovered only during survey fieldwork. In particular, if the pps size variable is incorrect, the estimates will suffer. In the LCI such changes are more likely than usual because of the four-quarter follow-up period. These problems are present under srs as well, but they are easier to handle.

Sample Allocation (2)
We exclude the simplest allocations, such as proportional or absolute allocation, and instead try to take into account enterprise size and the variation of the target variable within strata. In addition, some type of minimum allocation is needed, since some strata are known to contain few frame units and may therefore require a take-all sample. After the key allocation we revised the gross sample sizes using the anticipated response rate in each stratum. Here we exploited experience from the structural earnings statistics (SES), although the new LCI survey is not completely similar to the SES. Our response-rate allocation is rather rough, and we do not present its results in this paper.
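The adjustment steps described here can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name, the minimum of 5 units per stratum and all numbers are hypothetical.

```python
import math


def adjust_allocation(n_alloc, frame_sizes, response_rates, n_min=5):
    """Adjust a net allocation stratum by stratum (illustrative sketch).

    n_alloc        : dict stratum -> allocated net sample size
    frame_sizes    : dict stratum -> number of frame units N_h
    response_rates : dict stratum -> anticipated response rate in (0, 1]
    n_min          : assumed minimum net sample size per stratum
    """
    adjusted = {}
    for stratum, n in n_alloc.items():
        n = max(n, n_min)                               # minimum allocation
        gross = math.ceil(n / response_rates[stratum])  # inflate for nonresponse
        adjusted[stratum] = min(gross, frame_sizes[stratum])  # take-all cap
    return adjusted


# Hypothetical strata: A is tiny (ends up take-all), B is large, C is nearly exhausted.
example = adjust_allocation(
    n_alloc={"A": 2, "B": 40, "C": 12},
    frame_sizes={"A": 4, "B": 500, "C": 15},
    response_rates={"A": 0.5, "B": 0.5, "C": 0.75},
)
print(example)
```

Capping at the frame size is what turns a small stratum into a take-all stratum once the minimum and the nonresponse inflation exceed the number of available units.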

Sample Allocation (3)
The literature gives three alternatives for sample allocation in this case (in our interpretation):
- Neyman-Tschuprow allocation
- Power allocation (Bankier 1988, et al.)
- Lavallée-Hidiroglou algorithm (1988)
We looked at the last of these but could not follow it completely. Their strategy, however, proceeds in two phases, as ours does. So we mainly tested the first two alternatives, and our suggestion for practice is a two-phase allocation in which
- first, the sample is allocated to industries, and
- second, the first-phase allocation is continued to size bands.
Note that we also tested a one-phase allocation, but it did not work as well as the two-phase approach.

Data set in pilot study (1)
We could not obtain a data set similar to the one that will eventually be available, but we were able to construct something similar enough to evaluate how the allocations behave, to decide which sample allocation could work in practice, and finally to draw a real sample for the first year. The sample for the ongoing year cannot differ much, but the revisions needed will be decided later. Our test data set was constructed from the business register and the structural earnings statistics. This was not an easy job, but we do not go into details. Its variables are annual, not quarterly. We could not construct a good longitudinal data set for analysing change effects because of too many incorrect values, so our tests are cross-sectional.

Data set in pilot study (2)
For testing the sample, we constructed a sampling frame for the year 2004 from the register and previous surveys. The precision of the point estimates was evaluated by the coefficient of variation (cv), defined as the standard error as a proportion of the estimate. A total of K (= 1000) independent samples were drawn from the frame population. Measure of efficiency: [formula not preserved in the transcript]
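The repeated-sampling evaluation can be sketched as below. This is only an assumed reading of the procedure: the frame is synthetic, the expansion estimator of a total and the empirical cv (standard deviation of the K estimates divided by their mean) are illustrative choices, and the slide's own efficiency measure is not reproduced.

```python
import random
import statistics


def simulate_cv(frame, allocation, K=1000, seed=1):
    """Empirical cv of an estimated total over K stratified srs samples.

    frame      : dict stratum -> list of y values for all frame units
    allocation : dict stratum -> sample size n_h (n_h <= stratum size)
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(K):
        total = 0.0
        for stratum, values in frame.items():
            n_h = allocation[stratum]
            sample = rng.sample(values, n_h)          # srs without replacement
            total += len(values) / n_h * sum(sample)  # expansion estimator
        estimates.append(total)
    return statistics.pstdev(estimates) / statistics.mean(estimates)


# Synthetic frame (hypothetical values): many small units, a few large ones.
gen = random.Random(0)
frame = {
    "small": [gen.randint(10, 50) for _ in range(200)],
    "large": [gen.randint(200, 2000) for _ in range(20)],
}
cv_partial = simulate_cv(frame, {"small": 20, "large": 10})
cv_take_all = simulate_cv(frame, {"small": 200, "large": 20})
print(cv_partial, cv_take_all)
```

When every unit of every stratum is drawn, all K estimates coincide and the cv is zero, which is the take-all situation reported for some small industries in the results.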

Power and Neyman-Tschuprow allocation (1)
For the reference year 2007 of the LCI in Finland, concerning the first phase.
Bankier 1988: [formula not preserved in the transcript]
- X = size variable, 'number of employees' in our example
- Y = target variable, 'earnings' in our example
- S_h = standard deviation of Y in stratum h
- q = power, varying from 0 to 1
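The allocation formula is missing from this transcript. A common way of writing Bankier's power allocation, given here as a sketch under assumed notation (n is the total sample size, X_h a stratum-level size measure, and the stratum mean of Y is written with a bar), is:

```latex
% Assumed notation: n = total sample size, N_h = stratum frame size,
% \bar{Y}_h = stratum mean of Y, Y_h = N_h \bar{Y}_h = stratum total of Y.
\[
  n_h \;=\; n \,
  \frac{S_h X_h^{\,q} / \bar{Y}_h}{\sum_{g} S_g X_g^{\,q} / \bar{Y}_g},
  \qquad 0 \le q \le 1 .
\]
% Special case q = 1 and X_h = Y_h:
\[
  \frac{S_h Y_h}{\bar{Y}_h} \;=\; N_h S_h
  \quad\Longrightarrow\quad
  n_h \;=\; n \, \frac{N_h S_h}{\sum_g N_g S_g}
  \quad \text{(Neyman-Tschuprow)} .
\]
```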

Power and Neyman-Tschuprow allocation (2)
We tested q values from 0 to 1, and also q = 1 with X_h = Y_h, which leads to Neyman-Tschuprow allocation. On the following four pages we present our basic results from the first phase of sampling. The x-axis shows the different powers q, with Neyman-Tschuprow (N-T) at the end. The y-axis indicates the accuracy as a coefficient of variation (cv). Different types of industries are shown on different pages.
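To make the role of q concrete, here is a small sketch of a power allocation in which the stratum share is proportional to S_h * X_h**q / Ybar_h. That proportionality is an assumed form of Bankier's rule, and the function names and stratum figures are hypothetical. With q = 1 and X_h equal to the stratum total of Y, it reproduces the Neyman-Tschuprow shares.

```python
import math


def power_allocation(n, X, S, Ybar, q):
    """Unrounded power allocation: n_h proportional to S_h * X_h**q / Ybar_h."""
    scores = {h: S[h] * X[h] ** q / Ybar[h] for h in X}
    total = sum(scores.values())
    return {h: n * scores[h] / total for h in scores}


def neyman_allocation(n, N, S):
    """Unrounded Neyman-Tschuprow allocation: n_h proportional to N_h * S_h."""
    total = sum(N[h] * S[h] for h in N)
    return {h: n * N[h] * S[h] / total for h in N}


# Hypothetical first-phase strata (industries).
N = {"a": 100, "b": 50}          # frame sizes
Ybar = {"a": 10.0, "b": 40.0}    # stratum means of the target variable
S = {"a": 2.0, "b": 8.0}         # stratum standard deviations
Y_total = {h: N[h] * Ybar[h] for h in N}

alloc_q1 = power_allocation(60, Y_total, S, Ybar, q=1)    # X_h = Y_h, q = 1
alloc_q0 = power_allocation(60, Y_total, S, Ybar, q=0)    # size ignored
alloc_sqrt = power_allocation(60, Y_total, S, Ybar, q=0.5)
alloc_nt = neyman_allocation(60, N, S)
print(alloc_q1, alloc_q0, alloc_sqrt, alloc_nt)
```

Lower values of q pull the allocation away from the largest strata and toward a more even spread, which is the trade-off explored across industries in the results.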

Power and Neyman-Tschuprow allocation (3)
These figures are the strangest ones, often because the industry is small. In some cases cv = 0, since all enterprises have to be drawn.

Power and Neyman-Tschuprow allocation (4)
In these cases the cv's decrease as q increases, and N-T gives the lowest values. These are the large industry classes, where N-T gives large sample sizes.

Power and Neyman-Tschuprow allocation (5)
In these cases the results are roughly the opposite of the previous ones, including rather high values for N-T.

Power and Neyman-Tschuprow allocation (6)
In these cases the results are mainly similar to the previous ones, except that the cv's increase more and the N-T results are close to those with large q.

Conclusion
The allocation results are not uniform over industries, so it is not obvious which allocation should be used in practice. The best solution might be to apply slightly different allocations to different groups of industries, but this can be problematic in practice. For the first phase (allocation to industries) we therefore decided to use the allocation with q = 0.5, also called square-root allocation. Everything will be checked again after the first-year fieldwork period (2007), but the sampling design cannot be changed much.