Metadata Models in Survey Computing: Some Results of MetaNet – WG 2. METIS 2004, Geneva. W. Grossmann, University of Vienna.


1 Metadata Models in Survey Computing: Some Results of MetaNet – WG 2
METIS 2004, Geneva
W. Grossmann, University of Vienna

2 Contents (10.02.2004, METIS 2004 Geneva)
- Metanet
- Requirements for Models
- Key Features of the Model
- Implications for Terminology

3 METANET 1
A network of excellence funded by EUROSTAT, 2000/01 – 2003, with 5 work groups:
- WG 1: Methodology and Tools
- WG 2: Harmonisation of Metadata – Structure and Definitions
- WG 3: Best Practice for Migration
- WG 4: Adoption Issues
- WG 5: Terminology (ad hoc)

4 METANET 2
Within WG 2, two different approaches:
- Terminology Model (cf. WP 12)
- Unified Metadata Architecture for Statistics (UMAS model)

5 METANET 3
Intention of the UMAS model:
- Statistics deals with different kinds of data, e.g. surveys, registers, classifications, …
- These data show a dynamic defined by statistical processing activities
- Define a model that supports, besides the description of data, a description of the statistical dynamic

6 Requirement Analysis 1
Method: the requirement analysis is based on
- Examination of a number of activities in survey processing
- Documentation of these activities inside statistical systems, in particular the proposals of
  - Banca d’Italia, DDI, OECD, SCB-DOK, SDDS, Statistics Netherlands (Input-Throughput-Output model)
- General methods for documentation, e.g.
  - Dublin Core, Facet Classifications, ISO standards

7 Requirement Analysis 2
Example A: Sampling
- Terminology view: Sampling is the process of selecting a number of cases from all the cases in a particular group or universe
- Operational view
  - Input: Sampling Frame
  - Output: Sample

8 Requirement Analysis 3
Example A: Sampling, details of operational view
- What is an appropriate definition for the sampling frame given the problem (e.g. coverage)?
- What kind of additional information should be available for the sampling frame (e.g. auxiliary variables)?
- How can we obtain an appropriate representation of the desired sampling frame (e.g. merging existing frames, selecting from existing frames)?
- Who is responsible for the frame in the future?

9 Requirement Analysis 4
Example A: Sampling
- Which sampling technique is appropriate for our problem? (Note that there are some relations between the structure of the sampling frame and the possible sampling techniques.)
- In which form is the output (i.e. the sample) represented in the system?
- Who is responsible for the sampling procedure?
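
The operational view of Example A (input: sampling frame, output: sample) can be sketched in a few lines of Python. This is only an illustration, not part of the UMAS model: the frame layout, the field names `unit_id` and `region`, and the choice of simple random sampling without replacement are all assumptions made for the example.

```python
import random


def draw_sample(frame, n, seed=None):
    """Draw a simple random sample without replacement from a sampling frame.

    The technique (simple random sampling) is an assumption for this sketch;
    the slides leave the choice of technique open.
    """
    if n > len(frame):
        raise ValueError("sample size exceeds frame size")
    rng = random.Random(seed)  # seeded so the procedure can be documented and repeated
    return rng.sample(frame, n)


# Hypothetical frame: ten units with one auxiliary variable (region).
frame = [
    {"unit_id": i, "region": "north" if i % 2 else "south"}
    for i in range(1, 11)
]
sample = draw_sample(frame, n=3, seed=42)
```

The sample is returned in the same representation as the frame, which is one possible answer to the question of how the output is represented in the system.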

10 Requirement Analysis 5
Example B: Editing
- Terminology view: Editing is the process of detecting and adjusting individual errors in data records resulting from data collection and capture…
- Operational view
  - Input: A variable together with a set of admissible values for the variable within a specific context
  - Output: A summary statement about quality of the variable or a listing of errors for each case

11 Requirement Analysis 6
Example B: Editing, details of operational view
Context may be defined in various ways:
- By subject matter considerations, e.g. only one person in a household can claim to be head of household
- By more technical reasons, e.g. using 1000€ as the measurement unit for annual income
- By purely technical reasons, e.g. “f” for female and “m” for male

12 Requirement Analysis 7
Example B: Editing
Context defines rules for the admissible values of the variable
- Within one data set
- Within one infological model (e.g. person-household)
- Within a time series
Rules may be formulated
- As strong constraints, i.e. logical conditions on combinations of values
- As soft constraints, i.e. statistical conditions on combinations of values
Rules have to be processed in algorithmic form and maintained by an administrative procedure
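
The distinction between strong and soft constraints in Example B can be illustrated with a minimal Python sketch. The record layout, the rule names, and the soft rule itself (income more than five times the dataset median) are assumptions chosen for illustration; the slides only name the “f”/“m” coding, the head-of-household rule, and income measured in 1000€ as examples of context.

```python
from collections import Counter
from statistics import median

ADMISSIBLE_SEX = {"f", "m"}  # purely technical context taken from the slide


def hard_errors(records):
    """Strong constraints: logical conditions; every violation is an error."""
    errors = []
    for r in records:
        if r["sex"] not in ADMISSIBLE_SEX:
            errors.append(("person", r["person_id"], "sex_not_admissible"))
    # Subject matter rule: only one person per household can be head.
    heads = Counter(r["household_id"] for r in records if r["is_head"])
    for household_id, n_heads in sorted(heads.items()):
        if n_heads > 1:
            errors.append(("household", household_id, "more_than_one_head"))
    return errors


def soft_warnings(records, max_ratio=5.0):
    """Soft constraint (assumed rule): flag incomes far above the median for review."""
    typical = median(r["income_k_eur"] for r in records)
    return [
        ("person", r["person_id"], "income_unusually_high")
        for r in records
        if r["income_k_eur"] > max_ratio * typical
    ]


# Hypothetical records; income is measured in 1000 EUR as on the slide.
records = [
    {"person_id": 1, "household_id": "A", "sex": "f", "is_head": True, "income_k_eur": 30},
    {"person_id": 2, "household_id": "A", "sex": "m", "is_head": True, "income_k_eur": 28},
    {"person_id": 3, "household_id": "B", "sex": "x", "is_head": True, "income_k_eur": 25},
    {"person_id": 4, "household_id": "B", "sex": "f", "is_head": False, "income_k_eur": 400},
]
errors = hard_errors(records)
warnings = soft_warnings(records)
```

Keeping the rules as plain functions over records is one way to process them in algorithmic form; the listing of errors per case corresponds to the output named in the operational view.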

13 Requirement Analysis 8
Example C: Weighting
- Terminology view: Weight is the importance of an object in relation to a set of objects to which it belongs; …
- Operational view
  - Input: A statistical dataset together with appropriate information
  - Output: Statistical dataset augmented by the weight information

14 Requirement Analysis 9
Example C: Weighting, details of operational view
- Which subject matter problem should be solved by weighting (e.g. representation of strata, post-stratification, …)?
- Which procedure should be used for weighting (e.g. base weights, calibration weights, …)?
- In which form are the data and the additional information about the population available (e.g. population data as summary table or as register with auxiliary variables)?

15 Requirement Analysis 10
Example C: Weighting
- How can we access and combine the different data?
- Who is responsible for the different datasets?
- How is the output represented (e.g. as weight for the dataset, as weights for the sampling procedure, as summary table)?
- Are we interested in reuse of the procedure for new data sets (e.g. the same weighting procedure within a series)?
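
As a hedged illustration of Example C, the sketch below computes post-stratification weights when the population information is available as a summary table. The function name, the field names, and the totals are hypothetical; the weight for a unit in stratum h is taken as N_h / n_h, the population count divided by the sample count in that stratum.

```python
from collections import Counter


def poststratification_weights(sample, population_totals, stratum_key="stratum"):
    """Return a copy of the sample with a post-stratification weight per unit.

    Input: a statistical dataset plus population totals per stratum.
    Output: the dataset augmented by the weight information.
    """
    counts = Counter(unit[stratum_key] for unit in sample)
    missing = set(counts) - set(population_totals)
    if missing:
        raise KeyError(f"no population total for strata: {sorted(missing)}")
    return [
        dict(unit, weight=population_totals[unit[stratum_key]] / counts[unit[stratum_key]])
        for unit in sample
    ]


# Hypothetical sample and population summary table.
sample_units = [
    {"unit_id": 1, "stratum": "north"},
    {"unit_id": 2, "stratum": "north"},
    {"unit_id": 3, "stratum": "south"},
]
population_totals = {"north": 100, "south": 50}
weighted = poststratification_weights(sample_units, population_totals)
```

Because the procedure takes the dataset and the totals as arguments and leaves the input untouched, the same function could be reused for later datasets in a series.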

16 Requirement Analysis 11
Example D: Analytical Units
- Terminology view: Analytical units represent real or artificially constructed units for which statistics are compiled
- Operational view
  - Input: Two or more statistical units
  - Output: A new statistical unit

17 Requirement Analysis 12
Example D: Analytical Units, details of operational view
- What is the conceptual definition of the statistical units?
- How are the conceptual definitions captured by operational characteristics (e.g. auxiliary variables)?
- How can we access and manipulate the operational characteristics in order to produce the new analytical unit?
- How is the new analytical unit embedded into an existing administrative framework?
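
One possible reading of Example D is the construction of households from person records, in line with the person-household model mentioned under Editing. The sketch below is an assumption-laden illustration: the linking variable `household_id` and the derived characteristics (size, total income) are hypothetical and not prescribed by the slides.

```python
from collections import defaultdict


def derive_households(persons):
    """Combine person units into household units (the new analytical unit)."""
    grouped = defaultdict(list)
    for person in persons:
        grouped[person["household_id"]].append(person)
    return [
        {
            "household_id": household_id,
            "size": len(members),
            "income_k_eur": sum(m["income_k_eur"] for m in members),
            "member_ids": [m["person_id"] for m in members],
        }
        for household_id, members in sorted(grouped.items())
    ]


# Hypothetical person units with an operational linking characteristic.
persons = [
    {"person_id": 1, "household_id": "A", "income_k_eur": 30},
    {"person_id": 2, "household_id": "A", "income_k_eur": 28},
    {"person_id": 3, "household_id": "B", "income_k_eur": 25},
]
households = derive_households(persons)
```

Keeping `member_ids` on the derived unit records which input units it was built from, so the link to the original statistical units is not lost.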

18 Requirement Analysis 13
Summary: the examples show that
- Models should be based on terminology but are more than terminology
- Models have to consider different types of “statistical” objects
- For these objects we have to know the concepts represented as data, together with the relations between the concepts
- We have to know the statistical meaning of the objects, together with their statistical relations

19 Requirement Analysis 14
Summary
- We have to take into account the specific format of realisation of the objects as physical datasets
- We must include statements about responsibility, access rights and other administrative details
- We need a flexible coupling mechanism for the objects according to processing needs
- We have to develop a description formalism for statistical processing
- We have to take into account information requirements of external users

20 Key Features of the Model 1
In order to meet the different aspects of the requirement analysis, a model with four different facets was designed, resembling the idea of facet classifications used by librarians and archivists

21 Key Features of the Model 2
“Structure Facet”
- The objects of interest, so-called “statistical categories”: statistical unit, statistical population, statistical variables, statistical values together with a number of related objects like classifications, statistical datasets,

22 Key Features of the Model 3
“statistical domains” for coupling objects according to processing needs (basically a system of catalogues for the other objects)
- Each instance of the structure has a twofold representation inside a system
  - As data (“Category-Instance data”)
  - As description (“Category-Instance model”, i.e. metadata)

23 Key Features of the Model 4
“View Facet” describes the instances
- “Conceptual point of view”: subject matter definition
- “Statistical point of view”: the statistical properties of the instances necessary for processing
- “Data management point of view”: all information necessary for machine-supported storage and manipulation
- “Administrative point of view”: management and bookkeeping of the structures
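
The twofold representation of an instance (data plus description) and the four points of view can be pictured as a small data structure. The sketch below is an interpretation, not the UMAS specification: the class names, attribute names, and all example values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CategoryInstanceModel:
    """Description (metadata) of an instance, organised by the four views."""
    conceptual: dict = field(default_factory=dict)       # subject matter definition
    statistical: dict = field(default_factory=dict)      # properties needed for processing
    data_management: dict = field(default_factory=dict)  # storage and manipulation details
    administrative: dict = field(default_factory=dict)   # management and bookkeeping


@dataclass
class CategoryInstance:
    """An instance of a statistical category: data together with its description."""
    category: str                 # e.g. "statistical variable"
    name: str
    data: list                    # the "Category-Instance data"
    model: CategoryInstanceModel  # the "Category-Instance model"


# Hypothetical instance of the category "statistical variable".
income = CategoryInstance(
    category="statistical variable",
    name="annual_income",
    data=[30, 28, 25],
    model=CategoryInstanceModel(
        conceptual={"definition": "annual income of a person"},
        statistical={"scale": "ratio", "unit": "1000 EUR"},
        data_management={"storage_type": "integer"},
        administrative={"responsible": "unit X (placeholder)"},
    ),
)
```

Separating `data` from `model` mirrors the idea that every instance exists both as data and as metadata inside the system.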

24 Key Features of the Model 5
“Stage Facet” describes processing at the data as well as at the metadata level

25 Key Features of the Model 6
- “Production blueprint”: keeps the information on how the instance is set up inside the system according to the four different views of the view facet
- “Processing blueprint”: describes the processing activities for the instances according to the four different views of the view facet

26 Key Features of the Model 7
“Function Facet”
- All aspects of communication and usage of meta-information by humans inside the system as well as in connection with dissemination and exchange
  - Who is involved in communication?
  - What information is communicated?
  - How is the information communicated?

27 Implications for Terminology 1
- Statistical information systems use terminology from different sources: Statistics, Computer Science, Economy, Social Sciences, …
- We can at best collect terminology and bring it into an order according to some model
- This ordering defines a “statistical ontology” which is above terminology and has to use a few commonly agreed terms

28 Implications for Terminology 2
Activities of METANET WG 4 showed that there is rather large agreement between statisticians about the main terms for important structures:
- Statistical Unit
- Statistical Variable
- Statistical Values

29 Implications for Terminology 3
Based on such agreement we can assign each terminology item a specification according to the ontology
UMAS proposes the following classification:
- What is the source of the terminology (e.g. statistics, general standards, application area, …)?
- To which structure does the term apply?
- For which view is the term used?
- In which processing stage is the term used?
- Which function aspects does the term cover?
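
The five classification questions can be read as the fields of a record attached to each terminology item. The following sketch assumes that reading; the class names, the free-text fields, and the two example entries are hypothetical and only meant to show how such a specification could be stored and queried.

```python
from dataclasses import dataclass
from enum import Enum


class View(Enum):
    """The four points of view of the view facet."""
    CONCEPTUAL = "conceptual"
    STATISTICAL = "statistical"
    DATA_MANAGEMENT = "data management"
    ADMINISTRATIVE = "administrative"


@dataclass(frozen=True)
class TermSpecification:
    term: str
    source: str     # e.g. statistics, general standards, application area
    structure: str  # the structure the term applies to
    view: View      # the view in which the term is used
    stage: str      # the processing stage in which the term is used
    function: str   # the function aspect the term covers


def terms_for_view(specifications, view):
    """List the terms that are used in a given view."""
    return [s.term for s in specifications if s.view is view]


# Hypothetical glossary entries, classified for illustration only.
glossary = [
    TermSpecification("sampling frame", "statistics", "statistical population",
                      View.CONCEPTUAL, "sampling", "internal communication"),
    TermSpecification("calibration weight", "statistics", "statistical variable",
                      View.STATISTICAL, "weighting", "internal communication"),
]
conceptual_terms = terms_for_view(glossary, View.CONCEPTUAL)
```

Using an enumeration for the view keeps the set of admissible values closed, while source, structure, stage, and function stay free text because the slides do not enumerate them.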

30 Implications for Terminology 4
MCV (SDMX) proposes the following classification:
- Administration (close relation to administrative view)
- Concepts, Definitions, Standards (close relation to conceptual view and structure)
- Data Collection, manipulating/accounting convention (close relation to stage facet)
- Quality and performance metadata (close relation to function facet in connection with dissemination and exchange)

31 Summary
- Statistical processing activities define metadata requirements usually not considered in traditional data modelling
- The processing activities require a model which supports flexible coupling of entities according to processing needs
- Besides terminology, we also need a specification of the terminology in the context of a statistical ontology

32 Thank you! wilfried.grossmann@univie.ac.at

