International Collaboration on Industrialization of Editing: Business Case (Part 1, WP38)
  Li-Chun Zhang
  Statistics Norway
Industrialization of Editing: Some issues to be dealt with
  Overall objective, principles and guidelines (e.g. the “new” paradigm of editing)
  Conceptual reference framework with regard to GSBPM
  Conceptual reference framework with regard to GSIM to-be
  Design of generic functionality
  Minimum set of standard methods
  IT tools and platforms
Objectives & principles
  Example: Objectives (the “new” paradigm)
    –Error-source identification and error prevention
    –Collect information about quality
    –Identification and adjustment of critical errors in data
  Example: Objectives (SNZ proposal)
    –Efficiency as quality against cost
    –Continuous quality improvement
    –Provide quality information
  Example: Principles
    –Original data as much as possible (“old” Fellegi-Holt paradigm)
    –Maximum automated processing
    –Analysis of (editing) process efficiency
    –Training, documentation
    –…
Generic Statistical Data Editing Process (GSDEP)
  GSBPM ≠ Flow Chart
  An example from EDIMBUS
  Mapping GSDEP with GSBPM
    –Micro vs. macro editing
    –“Editing & Imputation” (E&I) vs. “Editing & Estimation” (E&E)
  Connections to GSIM to-be
Common Statistical Data Reference (CSDR): Interface between SDE and GSIM to-be
  Statistical production as transformations of data => steady / major states of data
  Common Micro Data Format (CMDF) for database management
  Common Functional Data Format for method library
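
The step from micro data to functional data can be illustrated with a minimal sketch. The slides do not specify the layout of either format, so everything here is an assumption: the long (unit, variable, value) layout standing in for CMDF, the per-unit record layout standing in for a functional data file, and the hypothetical helper `to_functional`.

```python
from collections import defaultdict

# Assumed CMDF-style layout: one row per (unit, variable, value).
# The real CMDF structure is not given in the slides.
cmdf_rows = [
    ("u1", "turnover", 120.0),
    ("u1", "employees", 4),
    ("u2", "turnover", 75.5),
]


def to_functional(rows, variables):
    """Pivot long micro data into one record per unit with only the requested variables."""
    by_unit = defaultdict(dict)
    for unit, variable, value in rows:
        if variable in variables:
            by_unit[unit][variable] = value
    # Variables with no reported value are kept explicitly as None,
    # so that a method can see item non-response.
    return {
        unit: {v: values.get(v) for v in variables}
        for unit, values in by_unit.items()
    }


functional = to_functional(cmdf_rows, ["turnover", "employees"])
```

The point of the sketch is only that a method library works on a view shaped for the method, while the micro database keeps a single generic layout.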
Design of generic functionality
  Databases:
    –Micro database of CMDF data files (M-Base)
    –Functional database of functional data files and alignment tables (F-Base)
    –Function library (F-Lib) contains all available standardized generic (program) tools.
  Builders:
    –Functional data builder (D-Build) transforms relevant CMDF data files into the required functional data files, and updates the relevant alignment tables.
    –Function builder (F-Build) takes functional data files as the input data and tools from the F-Lib, and configures the necessary parameters according to a given specification for machine-based or automated data processing.
    –Screen builder (S-Build) takes functional and/or CMDF data files as the input data, and configures an environment for manual inspection/editing of individual records/questionnaires according to a given specification.
  Runners:
    –Batch processor is the environment for executing automated/machine-based SDE processes, chiefly relying on functions that are configured in the F-Build.
    –Manual processor is the environment for manually executing SDE processes, chiefly relying on the interface provided through the S-Build.
    –Selection and Drilling are the dedicated environments for carrying out selective editing and drilling up and down among hierarchically structured aggregations.
    –Data processor supports the necessary administration of data and metadata.
  Managers:
    –ANOPE is the environment for quality assessment of the editing processes.
    –Response manager provides the interface for re-contact with the data providers, and with other generally related production processes (such as Process 4, Collect).
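
How F-Lib, F-Build and the batch processor could fit together is sketched below. This is an illustration under stated assumptions, not the design of the actual system: the tool names, the specification format (a list of step dictionaries) and the constant-value imputation are all hypothetical and chosen only to keep the example short.

```python
from functools import partial


# Hypothetical F-Lib tools: each takes a list of records plus parameters.
def flag_out_of_range(records, variable, low, high):
    """Mark records whose value for `variable` falls outside [low, high]."""
    return [
        {**r, "flagged": r.get("flagged", False) or not (low <= r[variable] <= high)}
        for r in records
    ]


def impute_constant(records, variable, value):
    """Replace `variable` in flagged records with a fixed value (illustrative only)."""
    return [{**r, variable: value} if r.get("flagged") else r for r in records]


# F-Lib: registry of the available generic tools.
F_LIB = {"flag_out_of_range": flag_out_of_range, "impute_constant": impute_constant}


def f_build(specification):
    """F-Build: bind parameters to F-Lib tools according to a specification."""
    return [partial(F_LIB[step["tool"]], **step["params"]) for step in specification]


def batch_process(records, configured_steps):
    """Batch processor: run the configured functions in order, without manual input."""
    for step in configured_steps:
        records = step(records)
    return records


spec = [
    {"tool": "flag_out_of_range",
     "params": {"variable": "employees", "low": 0, "high": 1000}},
    {"tool": "impute_constant", "params": {"variable": "employees", "value": 0}},
]
data = [{"id": "u1", "employees": 4}, {"id": "u2", "employees": -3}]
result = batch_process(data, f_build(spec))
```

Keeping the specification as plain data separates what a survey configures from the generic tools it reuses, which is the separation the builder/runner split above describes.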
Claude Poirier, Statistics Canada
Next steps
  Objectives, guidelines and principles
    –Finalize user requirements
    –Identify existing methods
    –React to functional gaps
    –Set up the framework
    –Develop the toolset
    –Deliver training
Finalizing user requirements
  Prioritizing edit and imputation requirements
    –Micro-editing methods
      Automated E&I on numerical and categorical data
    –Macro-editing methods
      Selective editing; Macro editing; Editing of macro data
    –On-line editing
      Collection edits and self-administered edits
    –Data confrontation and certification
      Methods using multiple data sources
    –Standardized platform
      Common architecture
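
As an illustration of the selective editing requirement listed above, the sketch below ranks units by a local score and keeps only those above a threshold for manual review. The score used (weight times absolute deviation from a predicted value, relative to an estimated total) is one common form and is an assumption here, as are the threshold, the field names and the sample figures; the slides do not prescribe a particular score.

```python
def local_score(weight, reported, predicted, estimated_total):
    """Weighted absolute deviation from the predicted value, relative to the total."""
    return weight * abs(reported - predicted) / estimated_total


def select_for_manual_review(units, estimated_total, threshold):
    """Return ids of units whose score exceeds the threshold, highest score first."""
    scored = [
        (local_score(u["weight"], u["reported"], u["predicted"], estimated_total), u["id"])
        for u in units
    ]
    return [uid for score, uid in sorted(scored, reverse=True) if score > threshold]


# Made-up example data; "predicted" could come from a previous period or a model.
units = [
    {"id": "u1", "weight": 10.0, "reported": 120.0, "predicted": 115.0},
    {"id": "u2", "weight": 10.0, "reported": 900.0, "predicted": 80.0},
    {"id": "u3", "weight": 2.0, "reported": 50.0, "predicted": 55.0},
]
estimated_total = sum(u["weight"] * u["predicted"] for u in units)
to_review = select_for_manual_review(units, estimated_total, threshold=0.05)
```

Units below the threshold would be left to automated processing, in line with the "maximum automated processing" principle.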
Existing tools and platforms
  Identifying and analysing existing products
    –SigEE (Australia)
    –BANFF, CANCEIS (Canada)
    –BEST, POSS (New Zealand)
    –ISEE, DYNAREV (Norway)
    –TRITON, SELEKT (Sweden)
Reacting to functional gaps
  Not all requirements will be satisfied
  Brainstorming sessions are being organised
  Development priorities will be discussed
Developing the tool set
  Consolidate preferred tools
    –Adapt existing tools to the environment
    –Develop pre/post processors to fit the environment
  Develop missing functions
Delivering training material
  User guide
  Methodology documentation
  System documentation
Comments / Questions
  It’s your turn
Frequently asked questions (FAQ)
  Q1: What governance model drives the project?
  Q2: When do we expect the suite of editing functions to be delivered?
  Q3: As a member of the collaboration network, will my agency have to pay any fees for accessing and using released functions?
  Q4: My statistical agency is not part of the network. Are any fees planned for agencies like mine to use the products?
  Q5: My agency would like to join the network. Is this possible? How?
Frequently asked questions (FAQ) (continued)
  Q6: I understand from your presentation that a common environment is being planned. Would I be able to use the functions in another environment?
  Q7: My agency is willing to share a system, but its foundation software is not compliant with the proposed environment. What will happen?
  Q8: My agency is willing to offer a system or a module for the network. Who will own the module?
  Q9: Will the resulting products become open-source?