The Challenge of Integrating New Surveys into an Existing Business Survey Infrastructure Éric Pelletier Statistics Canada ICES-III Montréal, Québec, Canada June 18-21, 2007
2 Outline: Introduction to the Unified Enterprise Survey (UES); Culture surveys environment; Integration steps to UES; From Culture to UES: frame, sampling, etc.; Special case: Film Production survey; UES estimation process; Back-casting for the previous two years; Conclusion and future work
3 Unified Enterprise Survey (UES) UES comprises many business surveys which use unified concepts and processes 1997: 7 surveys … 2005: 45 surveys 2006: 54 surveys 2007: 62 surveys The goal of UES: produce reliable estimates at the provincial and industrial levels
4 Objectives of the UES Promote an increasing use of tax data Reduce the cost of the surveys Reduce the response burden Produce estimates for the financial variables (revenue, expenses, salaries and wages, etc.) and non-financial variables for all UES industrial sectors
5 UES Sampling Process Sampling frame: Business Register of Statistics Canada (list of establishments) Sampling unit: Within a given enterprise, a cluster of establishments within the same province and industrial group For example: establishments A and B of one enterprise, in the same province and industry, form a single sampling unit Simple units (activity in one province and one industry) and complex units
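The sampling-unit definition above can be sketched in a few lines. This is only an illustration: the record layout (`id`, `enterprise`, `province`, `industry`) and the sample values are hypothetical and are not taken from the Business Register.

```python
from collections import defaultdict

def build_sampling_units(establishments):
    """Cluster establishments by (enterprise, province, industry)."""
    units = defaultdict(list)
    for est in establishments:
        key = (est["enterprise"], est["province"], est["industry"])
        units[key].append(est["id"])
    return dict(units)

def is_simple(enterprise, units):
    """An enterprise is treated as simple if it yields exactly one sampling unit,
    i.e. activity in one province and one industry."""
    return sum(1 for key in units if key[0] == enterprise) == 1

# Hypothetical establishments, for illustration only.
establishments = [
    {"id": "A", "enterprise": "E1", "province": "QC", "industry": "5121"},
    {"id": "B", "enterprise": "E1", "province": "QC", "industry": "5121"},
    {"id": "C", "enterprise": "E1", "province": "ON", "industry": "5121"},
    {"id": "D", "enterprise": "E2", "province": "QC", "industry": "5121"},
]
units = build_sampling_units(establishments)
```

Here establishments A and B fall into the same sampling unit, enterprise E1 is complex (two provinces) and enterprise E2 is simple.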
6 UES Sampling Process Stratification: Province, Industry, Revenue Strata: 1 take-all stratum, 2 take-some strata, 1 take-none stratum (below thresholds, tax data) Exclusion thresholds: Delimit the take-none units from the take-some units (no questionnaire is sent to the take-none units)
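A minimal sketch of the revenue stratification within one province-by-industry cell follows. The threshold names and values are hypothetical, and the labels "lower" and "upper" for the two take-some strata are an assumption made for illustration; the slides do not say how the two take-some strata are delimited.

```python
def assign_stratum(revenue, exclusion_threshold, take_some_split, take_all_threshold):
    """Classify one sampling unit by revenue within its province x industry cell."""
    if revenue < exclusion_threshold:
        return "take-none"           # no questionnaire is sent; tax data are used
    if revenue >= take_all_threshold:
        return "take-all"            # selected with certainty
    if revenue >= take_some_split:
        return "take-some (upper)"   # hypothetical label
    return "take-some (lower)"       # hypothetical label

# Hypothetical thresholds for a single cell.
cell = {
    "exclusion_threshold": 30_000,
    "take_some_split": 250_000,
    "take_all_threshold": 5_000_000,
}
labels = [assign_stratum(r, **cell) for r in (12_000, 80_000, 900_000, 7_500_000)]
```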
7 UES Sample Design [Diagram labels: T2 (corporations), T1 (unincorporated); Take-alls, Take-some, Take-none; Survey, Tax; Stratum=2, Stratum=1]
8 UES schedule For example, for reference year 2006 (RY2006): Sampling: October 2006 Collection: February to October 2007 Edit & Imputation: July 2007 to December 2007 Estimation: November 2007 to March 2008 The estimates are produced within 15 months (January 2007 to March 2008) The estimation is done one year after the selection of the sample
9 Culture surveys environment Activity based frames (e.g. list of books) Census surveys Occasional surveys (annual surveys, not necessarily every year) Maintained by Culture Division The Culture Streamlining Initiative was put in place to reduce the duplication in annual survey processes while promoting the use of the business survey infrastructure
10 Culture environment versus UES environment In the UES, the frame is based on industrial structure (economic survey) rather than activity (e.g. list of books, list of films, etc.) For the analysts, it is a change in the way they analyse the data More flexibility in the UES environment All the steps of a survey were compared to facilitate the integration
11 Advantages of the integration to UES Common methodologies for all annual enterprise surveys Possible to adapt some of the parameters for the needs of the surveys (at the sampling, imputation or estimation process) Infrastructure was established in 1997 with the Enterprise Statistics Division Relatively easy to integrate new surveys
12 Integration of surveys into UES Two sets of surveys: Wave 1 surveys in RY2006 (Book Publishers, Heritage Institutions and Performing Arts) Wave 2 surveys in RY2007 (Film Distribution, Film Production, Film Post-Production, Movie Theatres and Sound Recording) Integration in two steps: Step 1: From culture environment to industry-based survey, the years before UES (called UES_lite) Step 2: Integration to UES
13 Integration schedule (RY2004 to RY2007) Wave 1 surveys: UES_lite, then UES from RY2006 Wave 2 surveys: Culture, then UES_lite from RY2005, then UES from RY2007
14 UES_lite environment Concepts are similar to the UES surveys The processing is done outside the UES infrastructure The surveys are processed by the subject matter division and the methodology division As opposed to UES processing, which is primarily handled by another Statistics Canada division called the Enterprise Statistics Division
15 From Culture to UES Sampling, Frame: 1.Culture: Census - Activity based 2.UES_lite: Sample - Establishments 3.UES: Sample - Establishments within the same enterprise, same province, same industry code The analysts were able to create reconciliation files between the frames Some other minor differences
16 Special case: Film Production survey Collection: Special case with the Film Production survey for RY2005 The Business Register (BR) is not up-to-date enough for this survey Links were discovered between the sampled establishments and establishments outside the sampling frame
17 Special case: Film Production survey Pre-contact was done for all the units Approximately 400 units were added to the sample (these units were not on the Business Register) Indirect sampling was used to address this problem A different estimation program was created for this survey
18 UES and UES_lite Estimation Process Total estimate = Survey portion + Tax portion Survey portion: Horvitz-Thompson estimator Outlier detection and treatment Final weight calculation Tax portion (take-none portion): Below the exclusion thresholds: Tax data Domain estimations: Industry, Province, etc. Variance and coefficient of variation (CV)
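The decomposition on this slide can be illustrated with a short sketch. It assumes that the tax portion is a plain sum of tax values over the take-none units and it omits outlier detection and treatment; the function names and figures are hypothetical.

```python
import math

def horvitz_thompson_total(values, selection_probs):
    """Horvitz-Thompson total: each sampled value weighted by 1 / selection probability."""
    return sum(y / p for y, p in zip(values, selection_probs))

def total_estimate(sample_values, selection_probs, take_none_tax_values):
    """Total estimate = survey portion + tax portion."""
    survey_portion = horvitz_thompson_total(sample_values, selection_probs)
    tax_portion = sum(take_none_tax_values)  # assumption: tax data summed as-is
    return survey_portion + tax_portion

def coefficient_of_variation(estimate, variance):
    """CV = standard error divided by the estimate."""
    return math.sqrt(variance) / estimate

# Hypothetical figures: one take-all unit (p = 1.0) and two take-some units.
estimate = total_estimate([100.0, 200.0, 50.0], [1.0, 0.5, 0.25], [10.0, 20.0])
cv = coefficient_of_variation(estimate, 5329.0)
```

With these figures the survey portion is 100 + 400 + 200 = 700, the tax portion is 30, and the total is 730. Domain estimates follow the same pattern, with the sums restricted to the units in a given industry or province.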
19 Special case: Film Production survey Estimation: The Film Production survey RY2005 was a special case Due to the application of indirect sampling, the inverse probability method was implemented (see Choudhry (2006)) Without going into all the details, the inverse probability method determines the probability that at least one of the sampling units on the frame that lead to the reporting unit is sampled The base weight is computed as the inverse of this selection probability
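As a rough illustration of the idea, the sketch below computes the probability that at least one linked frame unit is selected, under the simplifying assumption that the linked units are selected independently. That assumption is made here for illustration only; it does not hold exactly for units drawn without replacement from the same stratum, and this is not presented as the method of Choudhry (2006).

```python
import math

def inclusion_probability(link_probs):
    """Probability that at least one linked frame unit is sampled,
    ASSUMING independent selections (a simplification)."""
    return 1.0 - math.prod(1.0 - p for p in link_probs)

def base_weight(link_probs):
    """Base weight = inverse of the selection probability of the reporting unit."""
    return 1.0 / inclusion_probability(link_probs)

# Hypothetical reporting unit reachable from two frame units.
pi = inclusion_probability([0.5, 0.2])
w = base_weight([0.5, 0.2])
```

For selection probabilities 0.5 and 0.2, the probability of reaching the reporting unit is 1 - (0.5 x 0.8) = 0.6, so the base weight is 1 / 0.6.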
20 Special case: Film Production survey The complex weighting procedure led to the use of replicates in estimating the variance of the estimates More precisely, the jackknife replication method is used to calculate the variance The estimates will be produced within the next few weeks: the release date for RY2005 is July 2007 (same release date as the other Wave 2 surveys), a little bit behind schedule…
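To show what jackknife replication involves, here is a minimal delete-one jackknife for a generic estimator. It is an unstratified simplification chosen for brevity; the replication scheme actually used for the Film Production survey is not described in the slides, and the weighted-mean estimator and data are hypothetical.

```python
def jackknife_variance(units, estimator):
    """Delete-one jackknife: recompute the estimate with each unit left out,
    then scale the spread of the replicates by (n - 1) / n."""
    n = len(units)
    replicates = [estimator(units[:i] + units[i + 1:]) for i in range(n)]
    mean_rep = sum(replicates) / n
    return (n - 1) / n * sum((r - mean_rep) ** 2 for r in replicates)

def weighted_mean(units):
    """Hypothetical estimator: weighted mean of (weight, value) pairs."""
    total_weight = sum(w for w, _ in units)
    return sum(w * y for w, y in units) / total_weight

# Hypothetical sample with equal weights.
sample = [(1.0, 1.0), (1.0, 2.0), (1.0, 3.0), (1.0, 4.0)]
var_hat = jackknife_variance(sample, weighted_mean)
```

With equal weights the result reduces to the usual variance of a sample mean, s^2 / n, which gives a convenient check on the implementation.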
21 Special case: Film Production survey The Film Production survey for RY2007 (integration year in UES) could not be put into the UES process because: Cost of the post-selection additions Timeliness Different processes, like the jackknife replication method for the variance calculations Instead of the inverse probability method, the weight share method will be used With this method, we assign an average weight based on the sampled units and the number of links
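The averaging described for the weight share method can be sketched as follows. The form shown, the sum of the weights of the sampled linked units divided by the total number of links, is a simplified reading of the method; the function name and figures are hypothetical.

```python
def weight_share(sampled_link_weights, total_links):
    """Weight of a reporting unit: the design weights of the SAMPLED frame units
    linked to it, summed and divided by the total number of frame units linked
    to it (sampled or not)."""
    if total_links <= 0:
        raise ValueError("a reporting unit needs at least one link")
    return sum(sampled_link_weights) / total_links

# Hypothetical reporting unit linked to three frame units, two of them sampled
# with design weights 4 and 2.
w = weight_share([4.0, 2.0], total_links=3)
```

Only the number of links is needed, not the selection probabilities of the unsampled linked units, which is what makes the method easier to apply than the inverse probability method.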
22 Special case: Film Production survey The weight share method cannot be integrated directly into the UES process A way to integrate the weight share method into the UES process was derived (see Beaumont (2007)) With this, it will be adaptable to the regular UES estimation program The difference from the inverse probability method is that with the weight share method, we expect a slight increase in the variance This special integration will be done at the end of 2007 / beginning of 2008
23 Estimation – Back-casting For RY2005 (first year in UES_lite) for the Wave 2 surveys, the previous estimates were produced in the Culture environment As was previously shown, the frame is different for RY2005 (Business Register) Potential break in the series Back-casting procedure is used to reproduce historical estimates using the Business Register
24 Estimation – Back-casting Back-casting is done for the two previous reference years (for example, RY2003 and RY2002) The units from the RY2005 sample are matched to the units from the previous Culture files using the reconciliation files If a unit is not matched to the previous years' Culture files, its data are imputed
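The matching step can be sketched as a lookup through the reconciliation file. The data structures and identifiers below are hypothetical, and the sketch only flags the unmatched units; the imputation method itself is not described in the slides and is not shown.

```python
def match_to_culture_files(sample_ids, reconciliation, culture_data):
    """Split sampled units into those with historical Culture data and those to impute.

    reconciliation: maps a Business Register unit id to a Culture file id.
    culture_data:   maps a Culture file id to the historical reported value.
    """
    matched, to_impute = {}, []
    for unit_id in sample_ids:
        culture_id = reconciliation.get(unit_id)
        if culture_id in culture_data:
            matched[unit_id] = culture_data[culture_id]
        else:
            to_impute.append(unit_id)  # no historical record: flag for imputation
    return matched, to_impute

# Hypothetical identifiers and values.
matched, to_impute = match_to_culture_files(
    ["BR1", "BR2", "BR3"],
    {"BR1": "C10", "BR2": "C20"},
    {"C10": 500.0},
)
```

Here BR1 is matched, while BR2 (linked to a Culture id with no data) and BR3 (no link at all) are both flagged for imputation.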
25 Estimation – Back-casting Adjustments to the weights will be done based on the population counts from the Business Register for the two back-casting years (for example, RY2003 and RY2002) Estimates are produced by domain, and the CVs are calculated, for the two back-casting years for the Wave 2 surveys (release date is July 2007)
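One simple way to adjust weights to population counts is a ratio adjustment within each cell, sketched below. This is an assumption made for illustration; the slides do not specify the form of the adjustment, and the figures are hypothetical.

```python
def adjust_weights(weights, population_count):
    """Rescale the weights in one cell so that they sum to the
    Business Register population count for the back-casting year."""
    factor = population_count / sum(weights)
    return [w * factor for w in weights]

# Hypothetical cell: weights summing to 40, population count of 50.
adjusted = adjust_weights([10.0, 10.0, 20.0], population_count=50)
```

The adjustment factor is 50 / 40 = 1.25, so the adjusted weights sum to the population count while keeping their relative sizes.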
26 Infrastructure - Processing One of the main challenges in the integration of those surveys is the communication between the three parties: Methodology division (responsible for the survey methods) Subject matter division (responsible for the content, the analysis and the publication) Enterprise Statistics Division (responsible for the business survey infrastructure) Started in October 2006, the process will be completed in March 2009
27 Conclusion and future work Presently, three Wave 1 surveys are being integrated into UES for RY2006 (sample was selected in October 2006, estimation is being prepared) Next year, for RY2007, the Wave 2 surveys will be integrated Because of the infrastructure, some modifications will be made to the UES estimation program for the Film Production survey, in order to integrate this survey into UES
28 Thanks Special thanks to everyone who worked on those surveys, and who helped me in the preparation of this presentation
For more information, please contact Éric Pelletier (613)