Metadata used throughout statistics production Max Booleman Statistics Netherlands
Content Recapitulate: functions of metadata Generic Statistical Business Process Model Statistical production cycle Phases of the process
Functions of metadata (1) (once more) Input data + transformation = output data Describing Data Process Quality (data and process)
Functions of metadata (2) (once more) Information for users, producers inside and outside the office What does it mean? (Automatic) Rules for producers inside the office Ex ante vs ex post: What should you do? What did you do?
2-5-2019 Quality Management / Metadata Management (overarching processes)
1 Specify Needs: 1.1 Determine need for information; 1.2 Consult and confirm need; 1.3 Establish output objectives; 1.4 Identify concepts and variables; 1.5 Check data availability; 1.6 Prepare business case
2 Design: 2.1 Design outputs; 2.2 Design frame and sample methodology; 2.3 Design data acquisition methodology; 2.4 Design statistical processing methodology; 2.5 Design processing systems and workflow
3 Build: 3.1 Build data collection instrument; 3.2 Build process components; 3.3 Configure workflows; 3.4 Test production system; 3.5 Test statistical business process; 3.6 Finalize production system
4 Collect: 4.1 Select sample; 4.2 Set up collection; 4.3 Run collection; 4.4 Finalize collection
5 Process: 5.1 Integrate data; 5.2 Classify and code; 5.3 Validate and edit; 5.4 Impute; 5.5 Derive new variables and statistical units; 5.6 Calculate weights; 5.7 Calculate aggregates; 5.8 Finalize data files
6 Analyse: 6.1 Prepare draft outputs; 6.2 Verify outputs; 6.3 Scrutinize and explain; 6.4 Apply disclosure control; 6.5 Finalize outputs for dissemination
7 Disseminate: 7.1 Update output systems; 7.2 Produce dissemination products; 7.3 Manage release of dissemination products; 7.4 Promote dissemination products; 7.5 Manage user support
8 Archive: 8.1 Define archive rules; 8.2 Manage archive repository; 8.3 Preserve data and associated metadata; 8.4 Dispose of data and associated metadata
9 Evaluate: 9.1 Gather evaluation inputs; 9.2 Conduct evaluation; 9.3 Agree action plan
Generic Statistical Business Process Model Non-linear! All kinds of processes: stovepipe, register-based International communication Internal communication Generic Tools development
Design phase (1) Develop metadata: Conceptual: tune with users Process: tune with IT, methodology, producers Quality: tune with users and producers Ex ante metadata: what should you do
Design phase (2) The fundamentals NSIs are describing: Reality; A model of reality; Registrations (input = output)? (target population versus survey population)
Design phase (3) The Statistical Cube: Timeliness by coherence by revision
The statistical cube (1) Single source / Multiple source / Integrated; Before (tendency) / Month / Quarter / Annual
Populations and concepts often differ, but can be related to each other
Single source, usually: input concepts = output concepts; recognizable to respondents; small corrections; report what the source delivers
Integrated, usually: new output concepts; recognizable to specific users; larger corrections; theoretical model
Relation between single source and integrated: check the quality of the single source
The statistical cube (2) Third dimension: versions, corrections, revisions Multiple indicators Later = more accuracy Later = more coherence Later = more comparability Later: should be ‘better’
The statistical cube (3) past today time tendency monthly quarterly annual integrated
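A minimal sketch of how a cell of the statistical cube could be represented, assuming the three dimensions named on the slides (timeliness, coherence, revision). The class name `CubeCell`, the helper `latest`, and all figures are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CubeCell:
    period: str      # reference period, e.g. "2018"
    timeliness: str  # "tendency", "monthly", "quarterly" or "annual"
    coherence: str   # "single source", "multiple source" or "integrated"
    revision: int    # 1 = first release; higher numbers are later revisions
    value: float     # the published figure (made-up numbers below)


def latest(cells, period, timeliness, coherence):
    """Return the cell with the highest revision for the given coordinates, or None."""
    matches = [
        c for c in cells
        if (c.period, c.timeliness, c.coherence) == (period, timeliness, coherence)
    ]
    return max(matches, key=lambda c: c.revision, default=None)


cells = [
    CubeCell("2018", "quarterly", "single source", 1, 2.4),
    CubeCell("2018", "quarterly", "single source", 2, 2.6),
    CubeCell("2018", "annual", "integrated", 1, 2.7),
]
best = latest(cells, "2018", "quarterly", "single source")
```

Keeping every revision as its own cell, rather than overwriting earlier figures, is what makes the third dimension (versions, corrections, revisions) visible to users.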
Use of Statistical Cube (1) Coherence: Presentation guide (related indicators) Explanation guide (how does it work) Conceptual differences
Use of Statistical Cube (2) Consistency and Coherence: Quality declaration inside office Quality declaration outside office ‘Allowed’ differences between indicators
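The idea of an 'allowed' difference between related indicators can be sketched as a simple tolerance check. This is an illustration only: the function name, the relative-difference measure, and the 2% margin are assumptions, not rules stated in the slides.

```python
def within_allowed_difference(a, b, max_relative_diff):
    """True if two related estimates differ by no more than the agreed relative margin."""
    if a == b:
        return True  # also covers the case where both are zero
    scale = max(abs(a), abs(b))
    return abs(a - b) / scale <= max_relative_diff


# Hypothetical figures: the sum of twelve monthly estimates versus the annual estimate.
monthly_sum = 1015.0
annual = 1000.0
ok = within_allowed_difference(monthly_sum, annual, 0.02)
```

A check like this could back a quality declaration: differences inside the margin are explained as expected, differences outside it call for investigation.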
Use of Statistical Cube (3) The easy part: Building cubes based on equal concepts Presentation of differences Challenges: Building cubes based on fuzzy relations between concepts Agreements between departments Changing concepts
Pre-input phase Receive data and metadata from the outside world Primary inputs Secondary inputs External terminology External observation units External formats Information layers (paper, files, CD, etc.) External metadata, external data formats
Input phase input + process = output Internal data format External metadata Ex post metadata: what did you do Quality metadata: compare ex ante with ex post External metadata, internal data formats
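Comparing ex ante metadata (what should you do) with ex post metadata (what did you do) can be sketched as a dictionary comparison. The keys and values below are hypothetical examples, and the function name is an assumption made for illustration.

```python
def compare_ex_ante_ex_post(planned, realised):
    """Return {item: (planned, realised)} for every item where the two differ."""
    keys = sorted(set(planned) | set(realised))
    return {
        k: (planned.get(k), realised.get(k))
        for k in keys
        if planned.get(k) != realised.get(k)
    }


# Hypothetical process metadata for one collection round.
planned = {"sample_size": 5000, "response_rate": 0.60, "editing_rule_set": "v3"}
realised = {"sample_size": 5000, "response_rate": 0.52, "editing_rule_set": "v3"}
deviations = compare_ex_ante_ex_post(planned, realised)
```

The resulting deviations are one possible form of quality metadata: they record where the process departed from its design.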
Micro phase input + process = output Internal metadata (re-use metadata) Linking sources Linking to Population base on individual level Editing data Internal metadata, internal data formats
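Linking a source to the population base at the individual level can be sketched as a join on a unit identifier. This assumes both sides share a common identifier; the identifiers, attribute names, and the function name are hypothetical.

```python
def link_to_population(population, source):
    """Attach source records to population units by unit id.

    Returns the linked records and the source ids that are not in the population base.
    """
    linked = {
        uid: {**attrs, **source[uid]}
        for uid, attrs in population.items()
        if uid in source
    }
    unmatched = sorted(set(source) - set(population))
    return linked, unmatched


# Hypothetical population base and one hypothetical source.
population = {"u1": {"region": "North"}, "u2": {"region": "South"}}
source = {"u1": {"turnover": 120}, "u9": {"turnover": 75}}
linked, unmatched = link_to_population(population, source)
```

Reporting the unmatched identifiers alongside the linked records gives the editing step something concrete to follow up on.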
Analytical phase input + process = output Combining data (re-use data) Raising (grossing up), summing up, etc. Indicators, indexes, averages, etc. Statistical Quality (confidence intervals, etc.)
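The weighting and statistical quality steps of the analytical phase can be sketched as follows. This is a minimal illustration that assumes simple random sampling, a normal approximation for the 95% interval, and no finite population correction; the sample values and weights are made up.

```python
import math
import statistics


def weighted_total(values, weights):
    """Estimate a population total by multiplying each observation by its weight."""
    return sum(v * w for v, w in zip(values, weights))


def mean_with_ci(values, z=1.96):
    """Return (mean, lower, upper) using the standard error of the sample mean."""
    n = len(values)
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return mean, mean - z * se, mean + z * se


# Hypothetical sample: three respondents, each representing 100 population units.
values = [10.0, 12.0, 14.0]
weights = [100.0, 100.0, 100.0]
total = weighted_total(values, weights)
mean, lower, upper = mean_with_ci(values)
```

The interval bounds are the kind of statistical quality metadata that can accompany the published figure.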
Output phase Transform into output format: Presenting Statistical tables (data and metadata) Metadata itself Methods Disclosure Special language for external users (external terminology)
Lessons learned (1) Internal and external use of Homonyms and Synonyms Code lists Classifications Populations
Lessons learned (2) The fundamentals NSIs are describing: Reality; A model of reality; Registrations?