Editing And Imputation For Manufacturing Statistics At Statistics Canada
  Marie Brodeur, Director General, Industry Statistics Branch
  Santiago, Chile, March 15 to 17, 2011
Outline Of The Presentation
  – Overview of the Manufacturing Program
  – Centralized Process Surveys
  – Overview of the UES Survey Process
  – Post Collection Processing Inputs & Tools
  – Use of Tax Data
  – The many phases of UES Post Collection Process
  – Managing the UES Post Collection Process
Statistics Canada
Business and Trade Statistics
  – Industry Statistics; Economy-wide Statistics; Agriculture, Technology and Transportation Statistics
  – Manufacturing and Energy; Distributive Trades; Service Industries; Enterprise Statistics; Consumer Prices; International Trade; Producer Prices; Investment and Capital Stock; Enterprise Statistics; Agriculture; Small Business And Special Surveys; Science, Innovation And Electronic Information; Transportation
Manufacturing Distribution Of Sales
Who Are Manufacturers?
  – Establishments primarily engaged in the physical or chemical transformation of materials and substances into new products
  – Includes assembly of the component parts of manufactured goods, blending of materials, and finishing of manufactured products by dyeing, heat treating, plating and similar operations
  – Transformation of own materials or those owned by others
  – Service outputs: custom work, repair and maintenance
  – Product outputs: finished goods, intermediate goods
Manufacturing Program At Statistics Canada (STC)
  – Monthly Survey of Manufacturing (MSM)
  – Annual Survey of Manufactures and Logging (ASML)
  – Series of sub-annual commodity surveys
General Characteristics Of The MSM
  – Monthly indicator of manufacturing activity
  – Last redesign in 1999
  – Designed to be a reliable indicator for both trends and levels
  – Establishment survey (n = 10,500)
  – Stratified by province, NAICS and size
MSM Concepts
  – Sales
      – Goods of own manufacture
  – Inventories
      – Raw materials
      – Goods-in-process
      – Finished products
  – Orders
      – New orders
      – Unfilled orders
  – Goods purchased for resale (revenue and inventory)
      – These data are collected but not released
  – Sales is the main concept; exceptionally, production is used for some industries (aerospace and shipbuilding)
Frame And Coverage
                                                                           Simple              Complex
  Total number of establishments on the Business Register                  2,278,730           110,557
  Value of sales of all establishments on the Business Register            $2,214.9 billion    $1,859.1 billion
  Total number of manufacturing establishments on the Business Register    84,215              6,648
  Value of sales of manufacturing establishments on the Business Register  $340.8 billion      $234.5 billion
MSM Sampling Plan
  [Figure: sampling strata labelled Take-Some, Take-All and Take-None; legend: Tax replaced, Survey]
MSM Sampling Plan: Use Of Tax
  Background
  – The Goods and Services Tax (GST) is the federal value added tax
  – GST is collected by the Canada Revenue Agency (CRA)
  – The CRA provides tax data to Statistics Canada
  – Information received includes the Business Number, revenue, tax remitted and input tax credit
Use Of Tax Data
  Who is replaced?
  – Single establishment enterprises
      – Replace 50% of sampled data with GST data
  – Chronic refusals
  Who is not replaced?
  – Very large single enterprise establishments
  – Complex units (i.e. multiple establishments), as found in the GST database
What Is The Annual Survey Of Manufactures And Logging (ASML)?
  – Measures the contribution of manufacturing industries to economic activity in Canada
  – In 2010, manufacturing accounted for 15% of GDP and 12% of total employment (SEPH)
  – Key input to SNA Input-Output tables
  – Survey collects data on
      – what commodities are produced (Make matrix)
      – where commodities are destined (provincial I/O tables)
      – what commodities and primary inputs are used in production (Use matrix)
Survey Coverage
  – ASML is conducted under the umbrella of Statistics Canada’s Unified Enterprise Survey Program (UES)
  – Same as MSM
  – Establishments primarily engaged in manufacturing and logging activities and classified to NAICS 31, 32 and 33 as well as NAICS 113
  – Estimates produced for 261 NAICS6 level industries
  – Estimates produced for the 10 provinces and 3 territories
Content: Commodity Variables
  – Revenue variables (16), expense variables (43), detailed opening and closing inventories (12), other financial (5)
  – Sales or output variables are valued at producer or FOB factory gate prices, as required by SNA
  – Commodities consumed (inputs) and produced (outputs), both goods and services
  – Collect commodity values and quantities (for selected goods)
  – Services produced and consumed are collected as expense items and classified based on COA
Types Of Administrative (Tax) Data
  – From the Canada Revenue Agency (CRA)
  – Agreement between two agencies
  – T1 (unincorporated businesses)
  – T2 (incorporated businesses)
  – T4 (pay slips)
  – GST (Goods and Services Tax)
  – PD7 (payroll deduction accounts)
Editing And Imputation For Manufacturing Surveys
Why A Centralized Process?
  – Best Practices
  – Standardization of Processes
  – Cross Survey Comparisons
  – Enterprise Centric Processing/Coherence Analysis
  – Efficient use of Resources
  – Transportable Knowledge Across Survey Programs
Challenges Of A Centralized Process
  – Remain Centralized
  – Distribute processing
  – Priority Setting
  – Communication and Coordination
UES Post-Collection Processing
  [Diagram; labels: Pre-Grooming; Allocation / Estimation; Edit & Imputation; “Clean” Records; Central Data Store; Subject Matter Review & Correction Tool; Tax Data; USTART]
Collection
  – Collection Period: February to early October
  – Collection Processing System: Blaise
  – Blaise can be seen as a Collection Control Center
  – Blaise has many functions:
      – Call Scheduler
      – Transaction history files
      – Audit Trail Files
      – And more
Blaise: Variables
  – Questionnaire number
  – Mail-out date
  – Number of calls
  – Length of the call
  – Number of contact attempts
  – Response code
  – And more
Blaise: Bonuses Over The Years
  – Blaise Transaction History (BTH) Files
      – Collection data analysis:
          – Produced a paper on best time to call
          – Produced a paper on maximum number of attempts
  – Audit Trail Files
      – Find outliers
      – Difficult to answer questions
Collection
  – Precontact (Dec-Jan)
      – Mostly for Business Register (BR) births; verification of contact information (name, address, …)
      – By phone (in a few cases, a letter or a fact sheet is sent)
  – Mail-out of questionnaires (Jan-March)
      – 2 or 3 mail-out dates
  – Follow-up in case of non-response for some units (begins about a month after mail-out)
      – Phone call, r or fax
  – Mail-back of questionnaires
  – Verifications of received questionnaires / Edits
      – Is the questionnaire complete or are some key variables missing? (Edit follow-up by phone in some cases)
Collection
  – Coding of questionnaires (about 20 response codes)
      – Response, non-response, out-of-scope, …
  – Imaging / Data capture (CADI - Computer Assisted Data Input)
Centralized Collection
  [Diagram; labels: Mailout (38K CEs); Pre-Contact (17K Businesses); Edit / Verification (BLAISE); Receipt (75% target); Delinquent Follow-Up; Capture / Imaging; “Clean” Records; Score Function]
UES: Data Collection / Score Function
  – Introduced in 2002, the UES score function is the main tool used at the collection stage to set the follow-up priority of about 23,000 Collection Entities (CEs) each year.
  – Reduces collection costs yet retains data quality
  – Similar to the collection goal of obtaining a high weighted coverage response rate.
  – PRIORITY 1: Extensive follow-up for the larger revenue CEs in cases of non-response.
  – PRIORITY 0: Minimum follow-up for the smaller CEs in cases of non-response.
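The slide does not give the score function's formula. The following is a minimal sketch of one way a revenue-based priority split could work, assuming priority 1 is given to the largest CEs until a target share of weighted revenue is covered. The function name `assign_priority`, the `target_coverage` parameter and the sample data are hypothetical and are not taken from the UES.

```python
def assign_priority(units, target_coverage=0.75):
    """Assign follow-up priority 1 or 0 to each collection entity (CE).

    units: list of (ce_id, weighted_revenue) pairs.
    target_coverage: hypothetical share of total weighted revenue that
    priority-1 units should cover (an assumption, not a UES parameter).
    """
    total = sum(revenue for _, revenue in units)
    covered = 0.0
    priorities = {}
    # Largest CEs first, so extensive follow-up goes where revenue is concentrated.
    for ce_id, revenue in sorted(units, key=lambda u: u[1], reverse=True):
        if total > 0 and covered < target_coverage * total:
            priorities[ce_id] = 1   # extensive follow-up on non-response
            covered += revenue
        else:
            priorities[ce_id] = 0   # minimum follow-up on non-response
    return priorities


sample = [("A", 500.0), ("B", 300.0), ("C", 150.0), ("D", 50.0)]
print(assign_priority(sample))
```

Under this assumption, a small number of large CEs absorbs most of the follow-up effort, which is consistent with the weighted coverage goal stated on the slide.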
Chart Of Accounts
  [Diagram linking COLLECTION and DISSEMINATION concepts; labels: Sales; Operating revenue; Cost of sales; Gross profit; Expenses; EBIT; Outputs; Inputs; Value added; Shipments; Operating Surplus; GDP; LINK, BRIDGE, CONCORDANCE]
Expected Benefits Of A Chart Of Accounts
  – Standardization in business data collection
  – Higher survey response
  – Increase in quality of data
  – Comparison of data from various sources
  – Increase efficiency in using administrative data
Links To Chart Of Accounts
  [Diagram; labels: CHART OF ACCOUNT; Establishment; Legal entity; Enterprise]
UES: Use Of Tax Data
  – Validation (comparison)
      – Verify dubious collected data against the equivalent tax data record
  – Imputation
      – One of the methods used for non-response
  – Estimation
      – Below take-none
  – Direct Data Replacement
  – Update Business Register
  – Allocation of survey data (use tax revenues, salaries and expenses)
Centralized Processing Systems And Databases
  – Develop centralized systems
  – Move away from stand-alone
  – Single point of access for security
  – Integrated Questionnaire Metadata System
  – Edit and imputation
  – Allocation and Estimation
  – Data Warehouse
Enterprise Portfolio Managers
  – Top 350 enterprises in Canada
  – Status: Platinum, Gold, Silver, Bronze
  – Personal visits
  – Enterprise Profiling
  – Coordination of mail-out and collection
  – Enterprise/Establishment coherence
  – Holistic Response Management
  – Strategic Response Unit
  – Escalation Process / Statistics Act
Review and Correction (Post-Capture)
  – Done via an application which is a micro-editing tool
  – Opportunity to perform edits and to manually correct data before the automated edit and imputation process
  – Opportunity to gain an understanding of the quality of data coming in from the field
What Is Generally Done By SMOs During This Process?
  – Ensure that industry codes are valid and response codes are correct
  – Ensure that equivalent survey cells have consistent data
  – Enter data for records that came in after the collection cut-off date
  – Review high impact outliers in terms of profit, average salary, etc.
  – Check comments made by respondents and collection staff
Why Is This Process Necessary?
  – Reviewing and correcting records will increase the number and quality of donors for the automated edit and imputation (E&I) stage. This will improve the quality of data coming out of E&I.
  – Need to assess the quality of collected data
      – Determine if there are problems with the questionnaire
      – Inability of respondent to provide a given data point
      – Determine if there is enough data for E&I
What Should Not Be Done During This Process?
  – Do not plug data for non-response records. They will be imputed during the automated E&I.
What Is E & I?
  – Editing
      – Verify that parts add up to total
      – Ensure that there are no missing values where parts add up to total
      – There must be consistency between related variables
  – Imputation
      – Changing values in fields which fail edit rules with a view to ensuring that the resulting data satisfy all edit rules. In practice, reported data will rarely be changed
      – Impute for missing data or partially responded data
      – Impute entire records in the case of total non-response
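The first two editing rules (parts add up to the total, and no missing values among those fields) can be illustrated with a small check. This is a sketch only; the function name, the field names and the `tolerance` parameter are hypothetical.

```python
def check_sum_of_parts(record, part_fields, total_field, tolerance=0):
    """Apply a sum-of-parts edit to one record (a dict of field -> value).

    Returns ("missing", [fields]) if any part or the total is missing,
    ("sum_mismatch", difference) if the parts do not add up to the total,
    and ("ok", None) otherwise.
    """
    fields = list(part_fields) + [total_field]
    missing = [f for f in fields if record.get(f) is None]
    if missing:
        return ("missing", missing)
    difference = sum(record[f] for f in part_fields) - record[total_field]
    if abs(difference) > tolerance:
        return ("sum_mismatch", difference)
    return ("ok", None)


clean = {"sales": 80, "other_revenue": 20, "total_revenue": 100}
print(check_sum_of_parts(clean, ["sales", "other_revenue"], "total_revenue"))
```

A record that fails such a check would be a candidate for imputation; a record that passes could serve as a donor.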
Why Is E&I Necessary?
  – To produce a complete and consistent data file that accounts for all sampled units
  – Both units that did not respond to the survey and units that did not provide a complete response must be imputed
  – Correct erroneous responses
E&I Terminology
  – Data Group
      – Groupings (defined by SM) of records that will be kept together for imputation purposes
      – These groupings are based on multiple dimensions:
          – industry (NAICS)
          – geography (province)
      – Data groups that will be used for a specific survey will depend on:
          – initial sample design (number of units sampled and the level of stratification used)
          – number of records that respond to the survey (a minimum of 5 or 10 records is required in a data group)
      – May be changed during production if not enough donors
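As an illustration of how a data group might be widened when it has too few respondents, here is a minimal sketch. The collapsing rule (dropping the province dimension and keeping NAICS) is an assumption; the slide only says that groups may be changed during production. The function name and record layout are hypothetical.

```python
from collections import Counter


def assign_data_groups(records, min_donors=5):
    """Return one data group per record, in the same order as `records`.

    records: list of dicts with keys "naics", "province", "responded".
    A record stays in its (naics, province) group when that group has at
    least `min_donors` respondents; otherwise it falls back to a wider
    (naics, "ALL") group. The fallback rule is an assumption.
    """
    respondents = Counter(
        (r["naics"], r["province"]) for r in records if r["responded"]
    )
    groups = []
    for r in records:
        key = (r["naics"], r["province"])
        if respondents[key] >= min_donors:
            groups.append(key)
        else:
            groups.append((r["naics"], "ALL"))
    return groups


records = (
    [{"naics": "311", "province": "ON", "responded": True}] * 5
    + [{"naics": "311", "province": "ON", "responded": False}]
    + [{"naics": "311", "province": "QC", "responded": True}] * 2
)
print(assign_data_groups(records))
```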
E&I Terminology (continued)
  – Edit Group
      – Grouping of variables within a record that will be processed together in an imputation method
      – Generally edit groups may be defined as follows for most surveys:
          – revenue and expense sections
          – employment section
          – provincial distribution of goods/services sold
      – Allows for a record to be a donor if it has clean data in one section even when other sections are blank; this increases the donor pool
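The effect of edit groups on donor eligibility can be sketched as follows: a record qualifies as a donor for each edit group in which all of its fields are reported, regardless of other sections. The group names and field names in `EDIT_GROUPS` are hypothetical, and "clean" is simplified here to "not missing".

```python
# Hypothetical edit groups: each maps a section name to the fields it covers.
EDIT_GROUPS = {
    "revenue_expense": ["total_revenue", "total_expenses"],
    "employment": ["employees"],
    "provincial_distribution": ["pct_sold_in_province"],
}


def donor_edit_groups(record, edit_groups=EDIT_GROUPS):
    """Return the set of edit groups for which `record` can act as a donor.

    A record is eligible for a group when every field of that group is
    present, even if fields belonging to other groups are blank.
    """
    return {
        name
        for name, fields in edit_groups.items()
        if all(record.get(f) is not None for f in fields)
    }


partial = {"total_revenue": 900, "total_expenses": 700, "employees": None}
print(donor_edit_groups(partial))
```

Here the record has a blank employment section but can still donate revenue and expense values, which is how the donor pool grows.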
E&I Terminology (continued)
  – Key variables
      – Total operating revenue
      – Total operating expenses
      – Salaries
      – Cost of goods sold
The Stages Of The E&I System
  – Pre-processing
  – BANFF E & I System
  – Post-Processing
  – Allocation
Preprocessing
  – Deterministic Edits
  – Conditional edits - If A then B
  – Sum of Parts (SOP)
  – Assign 100% to percentage totals
  – Impute reporting period
  – Donor Outlier Detection
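A deterministic sum-of-parts edit can be sketched as follows: when exactly one part is missing and the total is known, the missing part is fully determined; when all parts are known and the total is missing, the total is their sum. This is an illustrative assumption about what such an edit does, not the UES specification; the function and field names are hypothetical.

```python
def fill_from_sum_of_parts(record, part_fields, total_field):
    """Return a copy of `record` with a deterministic sum-of-parts fix applied.

    - One part missing, total known: part = total - sum(known parts),
      applied only when the result is non-negative.
    - No part missing, total missing: total = sum(parts).
    - Otherwise the record is returned unchanged for later imputation.
    """
    fixed = dict(record)
    missing = [f for f in part_fields if fixed.get(f) is None]
    total = fixed.get(total_field)
    known = sum(fixed[f] for f in part_fields if fixed.get(f) is not None)
    if total is not None and len(missing) == 1 and total >= known:
        fixed[missing[0]] = total - known
    elif total is None and not missing:
        fixed[total_field] = known
    return fixed


print(fill_from_sum_of_parts({"a": 60, "b": None, "t": 100}, ["a", "b"], "t"))
```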
BANFF E & I System
  – Impute for missing key variables as specified by subject matter (i.e. total revenue, total expenses)
  – Impute for other missing variables:
      – Apply Historical Trend
      – Apply Current Year Trend
      – Use donor (for partial imputation)
  – Select a donor for massive imputation for total non-response
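Donor imputation can be illustrated with a simple nearest-neighbour sketch: pick the donor whose matching variable is closest to the recipient's, then copy the donor's values into the recipient's missing fields. This is a generic illustration, not Banff's donor procedure; the function names and the choice of a single matching field are assumptions.

```python
def nearest_donor(recipient, donors, match_field):
    """Return the donor whose `match_field` is closest to the recipient's.

    Returns None when no donor (or the recipient) has a value for
    `match_field`.
    """
    candidates = [d for d in donors if d.get(match_field) is not None]
    if not candidates or recipient.get(match_field) is None:
        return None
    return min(
        candidates,
        key=lambda d: abs(d[match_field] - recipient[match_field]),
    )


def impute_from_donor(recipient, donor, fields):
    """Copy donor values into the recipient's missing `fields` only."""
    imputed = dict(recipient)
    for field in fields:
        if imputed.get(field) is None:
            imputed[field] = donor[field]
    return imputed


donors = [
    {"id": "d1", "tax_revenue": 100, "salaries": 30},
    {"id": "d2", "tax_revenue": 480, "salaries": 150},
]
recipient = {"id": "r1", "tax_revenue": 500, "salaries": None}
donor = nearest_donor(recipient, donors, "tax_revenue")
print(impute_from_donor(recipient, donor, ["salaries"]))
```

In practice donor values are often rescaled to the recipient's size rather than copied as-is; the copy here keeps the sketch short.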
BANFF Algorithms
  – DIFTREND - Historical trend imputation
  – CURRATIO - Current ratio imputation
  – PREVALUE - Value from the previous period for the same unit is imputed
  – PREAUX - Historical value of a proxy variable for the same unit
  – CURAUX - Current value of a proxy variable for the same unit
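The five estimators can be sketched as plain functions. The formulas below are one reading of the slide's descriptions (a trend applied to the unit's previous value, and a ratio of totals among current respondents) and should be treated as assumptions rather than Banff's documented definitions. The functions are illustrative and are not Banff's interface.

```python
def prevalue(prev_value):
    """PREVALUE: carry forward the unit's own value from the previous period."""
    return prev_value


def preaux(prev_aux):
    """PREAUX: use the previous-period value of a proxy (auxiliary) variable."""
    return prev_aux


def curaux(cur_aux):
    """CURAUX: use the current-period value of a proxy (auxiliary) variable."""
    return cur_aux


def diftrend(prev_value, cur_aux, prev_aux):
    """DIFTREND (assumed form): previous value times the unit's own
    growth in the auxiliary variable between the two periods."""
    return prev_value * cur_aux / prev_aux


def curratio(cur_aux, respondents):
    """CURRATIO (assumed form): the unit's current auxiliary value times
    the ratio of totals (value / auxiliary) among current respondents.

    respondents: list of (value, aux) pairs from responding units.
    """
    total_value = sum(value for value, _ in respondents)
    total_aux = sum(aux for _, aux in respondents)
    return cur_aux * (total_value / total_aux)


print(diftrend(prev_value=100, cur_aux=60, prev_aux=50))
print(curratio(cur_aux=50, respondents=[(200, 100), (100, 50)]))
```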
Post-Processing
  – Prorate components to ensure that they sum exactly to totals
  – Perform a number of consistency checks to ensure that micro-data are valid
  – Assign customer location (percentage cells)
  – Massive imputation (donor selected in the processor but applied in the post-processor)
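Prorating can be sketched as scaling each component by total / sum(components) and then absorbing the rounding residual so that the components sum exactly to the total. Putting the residual on the largest component is an assumption chosen for simplicity; the function name is hypothetical and integer totals are assumed.

```python
def prorate(parts, total):
    """Scale integer-valued `parts` so that they sum exactly to `total`.

    Each part is multiplied by total / sum(parts) and rounded; any
    rounding residual is added to the largest rounded component.
    Parts are returned unchanged if they are empty or sum to zero.
    """
    current = sum(parts)
    if not parts or current == 0:
        return list(parts)
    rounded = [round(p * total / current) for p in parts]
    residual = total - sum(rounded)
    largest = max(range(len(rounded)), key=lambda i: rounded[i])
    rounded[largest] += residual
    return rounded


print(prorate([30, 50, 10], 100))
```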
Allocation - Definition & Purpose
  – Definition: Allocation is the distribution of survey and administrative data from their acquisition level (Collection Entity) to the targeted statistical units (Establishments or Locations) as defined on the survey frame.
  – Purpose:
      – To provide fully-processed micro data on a fiscal year basis, for establishments or locations in-sample for the UES
      – Determine the distribution of value added by province
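A minimal sketch of allocation, assuming a value reported at the Collection Entity level is distributed to establishments in proportion to an auxiliary size measure such as tax revenue. The proportional rule, the equal-share fallback and the function name are assumptions for illustration.

```python
def allocate(ce_value, establishments):
    """Distribute a CE-level value across establishments.

    establishments: dict mapping establishment id -> auxiliary size
    measure (for example tax revenue). Shares are proportional to the
    auxiliary measure; if all measures are zero, the value is split
    equally. Returns a dict of establishment id -> allocated value.
    """
    if not establishments:
        return {}
    total_aux = sum(establishments.values())
    if total_aux == 0:
        equal_share = ce_value / len(establishments)
        return {est: equal_share for est in establishments}
    return {
        est: ce_value * aux / total_aux
        for est, aux in establishments.items()
    }


print(allocate(1000, {"plant_ON": 300, "plant_QC": 100}))
```

Summing the allocated values by the province of each establishment would then give a provincial distribution of the reported total.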
Sample Survey Allocation
Managing The UES Post Collection Process
  – Post Collection Operations Committee
      – Discuss production issues of common interest
      – Provide status reports on production and production readiness
  – Divisional Production meetings
      – Working group level dealing with production issues relating to a specific subject matter division, including planning and ad hoc requests
  – Post Collection Processing Teams
      – Structured by Subject Matter Division to provide the best support and to maximise subject matter expertise
  – Change Management Requests
      – Improvements
  – Service Request Management Portal (SRM)
      – Corrections
Future Directions
  – IBSP (Integrated Business Statistics Project)
      – New and improved UES, to consolidate and standardise processing for more annual and sub-annual business surveys
      – Start RY2013. To be completed for RY2015
      – Number of surveys to increase from 60 annual surveys to 120 annual and sub-annual surveys.