Metadata Models in Survey Computing Some Results of MetaNet – WG 2 METIS 2004, Geneva W. Grossmann University of Vienna.



Slide 2: Contents
- MetaNet
- Requirements for Models
- Key Features of the Model
- Implications for Terminology

Slide 3: METANET 1
A network of excellence funded by EUROSTAT, 2000/01 – 2003, with 5 work groups:
- WG 1: Methodology and Tools
- WG 2: Harmonisation of Metadata – Structure and Definitions
- WG 3: Best Practice for Migration
- WG 4: Adoption Issues
- WG 5: Terminology (ad hoc)

Slide 4: METANET 2
Within WG 2, two different approaches:
- Terminology Model (cf. WP 12)
- Unified Metadata Architecture for Statistics (UMAS model)

Slide 5: METANET 3
Intention of the UMAS model:
- Statistics deals with different kinds of data, e.g. surveys, registers, classifications, …
- These data show a dynamic defined by statistical processing activities
- Define a model which supports, besides the description of the data, a description of the statistical dynamic

Slide 6: Requirement Analysis 1
Method
Requirement analysis is based on:
- Examination of a number of activities in survey processing
- Documentation of these activities inside statistical systems, in particular the proposals of Banca d’Italia, DDI, OECD, SCB-DOK, SDDS, Statistics Netherlands (Input-Throughput-Output model)
- General methods for documentation, e.g. Dublin Core, Facet Classifications, ISO-Standards

Slide 7: Requirement Analysis 2
Example A: Sampling
- Terminology view: Sampling is the process of selecting a number of cases from all the cases in a particular group or universe
- Operational view:
  - Input: Sampling Frame
  - Output: Sample

Slide 8: Requirement Analysis 3
Example A: Sampling
Details of operational view:
- What is an appropriate definition for the sampling frame given the problem (e.g. coverage)?
- What kind of additional information should be available for the sampling frame (e.g. auxiliary variables)?
- How can we obtain an appropriate representation of the desired sampling frame (e.g. merging existing frames, selecting from existing frames)?
- Who is responsible for the frame in the future?

Slide 9: Requirement Analysis 4
Example A: Sampling
- Which sampling technique is appropriate for our problem? (Note that there are some relations between the structure of the sampling frame and the possible sampling techniques.)
- In which form is the output (i.e. the sample) represented in the system?
- Who is responsible for the sampling procedure?
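
The operational view of sampling (frame in, sample out, plus a record of technique and responsibility) can be illustrated with a minimal Python sketch. It assumes simple random sampling without replacement, which is only one of many possible techniques; the names `SampleResult` and `draw_simple_random_sample` and the example values are hypothetical and not part of the presentation.

```python
import random
from dataclasses import dataclass


@dataclass
class SampleResult:
    """Output of the sampling step together with minimal process metadata."""
    units: list        # identifiers of the selected units
    frame_size: int    # size of the sampling frame used as input
    method: str        # sampling technique that was applied
    responsible: str   # who is responsible for the sampling procedure


def draw_simple_random_sample(frame, n, responsible, seed=None):
    """Select n units from the frame without replacement (assumed technique)."""
    rng = random.Random(seed)  # local generator so the draw is reproducible
    units = rng.sample(list(frame), n)
    return SampleResult(
        units=units,
        frame_size=len(frame),
        method="simple random sampling without replacement",
        responsible=responsible,
    )


# Hypothetical frame of 100 unit identifiers.
frame = [f"unit-{i}" for i in range(1, 101)]
result = draw_simple_random_sample(frame, n=10, responsible="sampling team", seed=1)
```

Keeping the method and the responsible party next to the selected units is one way to answer the questions on the slide in machine-readable form.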

Slide 10: Requirement Analysis 5
Example B: Editing
- Terminology view: Editing is the process of detecting and adjusting individual errors in data records resulting from data collection and capture…
- Operational view:
  - Input: A variable together with a set of admissible values for the variable within a specific context
  - Output: A summary statement about quality of the variable or a listing of errors for each case

Slide 11: Requirement Analysis 6
Example B: Editing
Details of operational view
Context may be defined in various ways:
- By subject matter considerations, e.g. there is only one person in a household who can claim to be head of household
- By more technical reasons, e.g. use of 1000€ as the measurement unit for annual income
- By purely technical reasons, e.g. “f” for female and “m” for male

Slide 12: Requirement Analysis 7
Example B: Editing
Context defines rules for the admissible values of the variable:
- Within one data set
- Within one infological model (e.g. person-household)
- Within a time series
Rules may be formulated:
- As strong constraints, i.e. logical conditions on combinations of values
- As soft constraints, i.e. statistical conditions on combinations of values
Rules have to be processed in algorithmic form and maintained by an administrative procedure
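
As an illustration of rules in algorithmic form, the following Python sketch separates strong constraints (logical conditions that produce errors) from a soft constraint (a statistical condition that only produces a warning). The record layout, the function name `check_records`, and the choice of "more than ten times the median income" as the soft rule are assumptions made for this example; the presentation does not prescribe any particular rule.

```python
from statistics import median

ADMISSIBLE_SEX = {"f", "m"}  # purely technical context: coded values


def check_records(records, soft_factor=10):
    """Return (errors, warnings) for a list of person records.

    errors   come from strong constraints (logical conditions),
    warnings come from a soft constraint (statistical condition).
    """
    errors, warnings = [], []

    # Strong constraint within one record: sex must be an admissible code.
    for r in records:
        if r["sex"] not in ADMISSIBLE_SEX:
            errors.append((r["id"], "sex not in admissible values"))

    # Strong constraint within the person-household model:
    # exactly one head per household.
    heads = {}
    for r in records:
        heads[r["household"]] = heads.get(r["household"], 0) + int(r["is_head"])
    for household, count in heads.items():
        if count != 1:
            errors.append((household, f"household has {count} heads"))

    # Soft constraint: income (in 1000 EUR) far above the median is suspicious.
    med = median(r["income"] for r in records)
    for r in records:
        if r["income"] > soft_factor * med:
            warnings.append((r["id"], "income unusually high"))

    return errors, warnings


# Hypothetical records; income is measured in units of 1000 EUR.
records = [
    {"id": 1, "household": "A", "sex": "f", "is_head": True, "income": 30},
    {"id": 2, "household": "A", "sex": "m", "is_head": True, "income": 28},
    {"id": 3, "household": "B", "sex": "x", "is_head": True, "income": 25},
    {"id": 4, "household": "B", "sex": "f", "is_head": False, "income": 900},
]
errors, warnings = check_records(records)
```

The listing of errors per case corresponds to the output described in the operational view; maintaining the rule set itself would still require the administrative procedure mentioned on the slide.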

Slide 13: Requirement Analysis 8
Example C: Weighting
- Terminology view: Weight is the importance of an object in relation to a set of objects to which it belongs; …
- Operational view:
  - Input: A statistical dataset together with appropriate information
  - Output: Statistical dataset augmented by the weight information

Slide 14: Requirement Analysis 9
Example C: Weighting
Details of operational view:
- Which subject matter problem should be solved by weighting (e.g. representation of strata, post-stratification, …)?
- Which procedure should be used for weighting (e.g. base weights, calibration weights, …)?
- In which form are the data and the additional information about the population available (e.g. population data as summary table or as register with auxiliary variables)?

Slide 15: Requirement Analysis 10
Example C: Weighting
- How can we access and combine the different data?
- Who is responsible for the different datasets?
- How is the output represented (e.g. as weight for the dataset, as weights for the sampling procedure, as summary table)?
- Are we interested in reuse of the procedure for new data sets (e.g. the same weighting procedure within a series)?
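
A small Python sketch may make the operational view of weighting concrete. It assumes post-stratification with population information available as a summary table (stratum counts); the weight of a record is the population count of its stratum divided by the number of sample records in that stratum. The function name, field names and figures are hypothetical.

```python
from collections import Counter


def poststratification_weights(sample, population_counts, stratum_key="stratum"):
    """Return a copy of the sample augmented by a 'weight' field.

    weight = population count of the stratum / sample count of the stratum
    """
    sample_counts = Counter(r[stratum_key] for r in sample)
    weighted = []
    for r in sample:
        stratum = r[stratum_key]
        weight = population_counts[stratum] / sample_counts[stratum]
        weighted.append({**r, "weight": weight})  # input records stay untouched
    return weighted


# Hypothetical input: a dataset plus population information as a summary table.
sample = [
    {"id": 1, "stratum": "urban"},
    {"id": 2, "stratum": "urban"},
    {"id": 3, "stratum": "rural"},
]
population_counts = {"urban": 100, "rural": 60}

weighted_sample = poststratification_weights(sample, population_counts)
```

Because the procedure is a function of the dataset and the summary table only, it can be reused for new datasets in a series, which is one of the questions raised on the slide.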

Slide 16: Requirement Analysis 11
Example D: Analytical Units
- Terminology view: Analytical units represent real or artificially constructed units for which statistics are compiled
- Operational view:
  - Input: Two or more statistical units
  - Output: A new statistical unit

Slide 17: Requirement Analysis 12
Example D: Analytical Units
Details of operational view:
- What is the conceptual definition of the statistical units?
- How are the conceptual definitions captured by operational characteristics (e.g. auxiliary variables)?
- How can we access and manipulate the operational characteristics in order to produce the new analytical unit?
- How is the new analytical unit embedded into an existing administrative framework?
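
The construction of an analytical unit can be sketched in Python as follows. The example assumes that persons are the input units, that a shared household identifier is the operational characteristic linking them, and that the household is the new unit; all names and values are hypothetical.

```python
from collections import defaultdict


def build_households(persons):
    """Derive household units from person units sharing a household_id."""
    groups = defaultdict(list)
    for p in persons:
        groups[p["household_id"]].append(p)

    households = []
    for household_id, members in sorted(groups.items()):
        households.append({
            "household_id": household_id,
            "size": len(members),
            "total_income": sum(m["income"] for m in members),
            "derived_from": "person",  # record of the source unit type
        })
    return households


# Hypothetical person records.
persons = [
    {"person_id": 1, "household_id": "A", "income": 30},
    {"person_id": 2, "household_id": "A", "income": 28},
    {"person_id": 3, "household_id": "B", "income": 25},
]
households = build_households(persons)
```

The `derived_from` field is a minimal stand-in for the metadata that would document how the new unit relates to its source units.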

Slide 18: Requirement Analysis 13
Summary
The examples show that:
- Models should be based on terminology but are more than terminology
- Models have to consider different types of “statistical” objects
- For these objects we have to know the concepts represented as data, together with the relations between the concepts
- We have to know the statistical meaning of the objects, together with their statistical relations

Slide 19: Requirement Analysis 14
Summary
- We have to take into account the specific format of realisation of the objects as physical datasets
- We must include statements about responsibility, access rights and other administrative details
- We need a flexible coupling mechanism for the objects according to processing needs
- We have to develop a description formalism for statistical processing
- We have to take into account information requirements of external users

Slide 20: Key Features of the Model 1
In order to meet the different aspects of the requirement analysis, a model with four different facets was designed, resembling the idea of facet classifications used by librarians and archivists

Slide 21: Key Features of the Model 2
“Structure Facet”
- The objects of interest, so-called “statistical categories”: statistical unit, statistical population, statistical variables, statistical values, together with a number of related objects like classifications, statistical datasets,

Slide 22: Key Features of the Model 3
  “statistical domains” for coupling objects according to processing needs (basically a system of catalogues for the other objects)
- Each instance of the structure has a twofold representation inside a system:
  - As data (“Category-Instance data”)
  - As description (“Category-Instance model”, i.e. metadata)

Slide 23: Key Features of the Model 4
“View Facet” describes the instances:
- “Conceptual point of view”: subject matter definition
- “Statistical point of view”: the statistical properties of the instances necessary for processing
- “Data management point of view”: all information necessary for machine-supported storage and manipulation
- “Administrative point of view”: management and bookkeeping of the structures
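
One way to picture the four views is as four slots in the metadata description of a single category instance. The Python sketch below is an illustration only: the class name `CategoryInstanceModel`, the use of plain dictionaries for each view, and the example content are assumptions and do not reproduce the actual UMAS specification.

```python
from dataclasses import dataclass, field


@dataclass
class CategoryInstanceModel:
    """Metadata description of one instance of a statistical category."""
    category: str  # e.g. "statistical variable"
    name: str
    conceptual: dict = field(default_factory=dict)       # subject matter definition
    statistical: dict = field(default_factory=dict)      # properties needed for processing
    data_management: dict = field(default_factory=dict)  # storage and manipulation details
    administrative: dict = field(default_factory=dict)   # management and bookkeeping

    def missing_views(self):
        """List the views for which no information has been recorded yet."""
        views = {
            "conceptual": self.conceptual,
            "statistical": self.statistical,
            "data_management": self.data_management,
            "administrative": self.administrative,
        }
        return [view for view, content in views.items() if not content]


# Hypothetical instance: an income variable described from three of four views.
income = CategoryInstanceModel(
    category="statistical variable",
    name="annual income",
    conceptual={"definition": "gross annual income of a person"},
    statistical={"scale": "ratio", "unit": "1000 EUR"},
    data_management={"type": "integer"},
)
```

A check such as `income.missing_views()` shows which views still lack a description for this instance.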

Slide 24: Key Features of the Model 5
“Stage Facet” describes processing at the data as well as at the metadata level

Slide 25: Key Features of the Model 6
- “Production blueprint”: keeps the information on how the instance is set up inside the system according to the four different views of the view facet
- “Processing blueprint”: describes the processing activities for the instances according to the four different views of the view facet

Slide 26: Key Features of the Model 7
“Function Facet”
- All aspects of communication and usage of meta-information by humans inside the system as well as in connection with dissemination and exchange:
  - Who is involved in communication?
  - What information is communicated?
  - How is the information communicated?

Slide 27: Implications for Terminology 1
- Statistical information systems use terminology from different sources: Statistics, Computer Science, Economics, Social Sciences, …
- We can at best collect terminology and bring it into an order according to some model
- This ordering defines a “statistical ontology”, which is above terminology and has to use a few commonly agreed terms

Slide 28: Implications for Terminology 2
Activities of METANET WG 4 showed that there is rather broad agreement among statisticians about the main terms for important structures:
- Statistical Unit
- Statistical Variable
- Statistical Values

Slide 29: Implications for Terminology 3
Based on such agreement we can assign each terminology item a specification according to the ontology.
UMAS proposes the following classification:
- What is the source of the terminology (e.g. statistics, general standards, application area, …)?
- To which structure does the term apply?
- For which view is the term used?
- In which processing stage is the term used?
- Which function aspects does the term cover?
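
The five classification questions can be read as five attributes attached to each term. The Python sketch below illustrates that reading; the class `TermSpecification`, the helper `terms_for`, and the three example entries with their attribute values are hypothetical and are not taken from UMAS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermSpecification:
    """A term together with its position in the (assumed) ontology."""
    term: str
    source: str     # where the terminology comes from
    structure: str  # structure the term applies to
    view: str       # view in which the term is used
    stage: str      # processing stage in which the term is used
    function: str   # function aspect the term covers


def terms_for(specs, **criteria):
    """Return the terms whose specification matches all given attributes."""
    return [
        s.term
        for s in specs
        if all(getattr(s, key) == value for key, value in criteria.items())
    ]


# Hypothetical entries, classified for illustration only.
specs = [
    TermSpecification("sampling frame", "statistics", "statistical population",
                      "statistical", "production", "exchange"),
    TermSpecification("record layout", "computer science", "statistical dataset",
                      "data management", "production", "exchange"),
    TermSpecification("calibration weight", "statistics", "statistical variable",
                      "statistical", "processing", "dissemination"),
]

statistical_terms = terms_for(specs, view="statistical")
processing_terms = terms_for(specs, view="statistical", stage="processing")
```

Querying by any combination of attributes is what makes such a specification more useful than a flat glossary.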

Slide 30: Implications for Terminology 4
MCV (SDMX) proposes the following classification:
- Administration (close relation to administrative view)
- Concepts, Definitions, Standards (close relation to conceptual view and structure)
- Data Collection, manipulating/accounting convention (close relation to stage facet)
- Quality and performance metadata (close relation to function facet in connection with dissemination and exchange)

Slide 31: Summary
- Statistical processing activities define metadata requirements usually not considered in traditional data modelling
- The processing activities require a model which supports flexible coupling of entities according to processing needs
- Besides terminology, we also need a specification of the terminology in the context of a statistical ontology

Thank you!