The implementation of a more efficient way of collecting data

Prof. Ene-Margit Tiit, Senior Methodologist, Methodology Department
Heli Jaago, Leading Methodologist, Methodology Department

Our general aims:
- development of estimation methods for the SBS investment and financial leasing variables;
- elaboration of the target for the data load programs (the Extract, Transform and Load process for data warehouses, ETL);
- creation and development of the warehouse metadata (repository) and of a normalized data model for collecting data from businesses.
3/12/18

Project overview (1)
- analysis and technical specifications of the administrative data sources;
- general description of the Metadata Framework (data describing data);
- data warehouse dictionary (metadata);
- normalized data model as the main structure of the administrative data repository;
- data load programs (the Extract, Transform and Load process for data warehouses, ETL).

Project overview (2)
- Improvement of the existing estimation methods for producing the SBS investment and financial leasing variables.
- Creation of statistical models for estimating the distribution of investments in small enterprises (fewer than 20 persons employed).
- Increased use of administrative data for small enterprises, with the help of the statistical models created. As a result, the need to collect empirical data will decrease.

The project is divided into two sub-projects:
- development of estimation methods for the SBS investment and financial leasing variables (project leader: Prof. Ene-Margit Tiit);
- creation of a metadata warehouse (repository) of the collected (meta)data (project leader: Leading Methodologist Heli Jaago).

Prof. Ene-Margit Tiit
Main target of the project: to develop the estimation methods for the SBS investment and financial leasing variables, i.e. to create statistical models for the breakdown of investments by kind of fixed asset. The project considers very small enterprises (fewer than 10 persons employed) and small enterprises (10–19 persons employed) in manufacturing, construction, trade and real estate.

Data
To solve the task we have a sample gathered in 2000–2006: 18 235 enterprises in total (about 3000 per year). In general, each enterprise in the sample represents 4–5 enterprises of the total population. Their gross investment (E15000) is available from administrative sources.

The components of investment to be estimated are:
- E15120 gross investment in land;
- E15130 gross investment in existing buildings and structures;
- E15140 gross investment in construction and alteration of buildings;
- E15150 gross investment in machinery and equipment;
- E15440 gross investment in intangible goods.

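To estimate the structure of investment, the following ratios were calculated for each unit of the sample:

$$y_1 = \frac{\mathrm{E15120}}{\mathrm{E15000}},\quad y_2 = \frac{\mathrm{E15130}}{\mathrm{E15000}},\quad y_3 = \frac{\mathrm{E15140}}{\mathrm{E15000}},\quad y_4 = \frac{\mathrm{E15150}}{\mathrm{E15000}},\quad y_5 = \frac{\mathrm{E15440}}{\mathrm{E15000}},$$

where

$$0 \le y_i \le 1, \quad i = 1, \dots, 5, \tag{1}$$

$$y_1 + y_2 + y_3 + y_4 + y_5 = 1. \tag{2}$$

The task is to create a model for the 5-dimensional vector $Y = (y_1, y_2, y_3, y_4, y_5)$.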
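A minimal Python sketch of the ratio computation for one sample unit. It assumes each unit is a dict keyed by the variable codes above. It also assumes, as condition (2) requires, that the five components add up to E15000. The function name `investment_ratios` and the example amounts are hypothetical.

```python
# Component codes in the order y1..y5 (from the list of components above).
COMPONENTS = ["E15120", "E15130", "E15140", "E15150", "E15440"]


def investment_ratios(record, tol=1e-9):
    """Return Y = (y1, ..., y5), where y_i = component_i / E15000."""
    total = record["E15000"]
    if total <= 0:
        raise ValueError("gross investment E15000 must be positive")
    # Missing components are treated as zero investment.
    y = [record.get(code, 0.0) / total for code in COMPONENTS]
    # Condition (1): every ratio lies in [0, 1].
    if not all(-tol <= yi <= 1 + tol for yi in y):
        raise ValueError("condition (1) violated")
    # Condition (2): the ratios sum to 1.
    if abs(sum(y) - 1.0) > tol:
        raise ValueError("condition (2) violated")
    return y


# Hypothetical unit: 50 in construction, 150 in machinery, 200 in total.
print(investment_ratios({"E15000": 200.0, "E15140": 50.0, "E15150": 150.0}))
# → [0.0, 0.0, 0.25, 0.75, 0.0]
```

A unit whose components do not add up to E15000 is rejected rather than silently rescaled. Such a record points to a data problem, not to a valid structure.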

Grouping the data
The enterprises are grouped into 8 presumably homogeneous groups (number of sampled enterprises):

| Activity | <10 persons employed | 10–19 persons employed |
|---|---|---|
| Manufacturing | 2883 | 2086 |
| Construction | 983 | 889 |
| Trade | 4683 | 2346 |
| Real estate | 4021 | 344 |

The structure of investments differs somewhat from group to group, so the models may have different parameters in different groups.

Possible types of model
In principle, there are two main ways to create a model:
- regression-type (prognostic) models, where the values of the predicted variable are calculated from known values of explanatory (background) variables;
- simulation models, where the model consists of random numbers that have the same distribution as the predicted variable or vector.

Choice between the types of model
Regression-type models are usable only when there are statistical dependencies between measurable explanatory variables and the predicted variables. In this task, all dependencies between the investment ratios and the background variables (year, size of investment, number of employees, etc.) were weak. The description rates (coefficients of determination, R²) of the models were also quite small; see the following figure.

Average description rate (R²) of the regression models for the investment ratios: 6.1%

Creating a simulation model
Because the description rate of the regression-type models is too small, it is rational to use simulation models. This means creating a series of 5-dimensional random vectors $Y = (y_1, y_2, y_3, y_4, y_5)$ whose distribution is similar to the empirical distribution of the sample. The starting point is a study of the structure of investments in the sample.

Structure of investments in an enterprise
It became evident that small enterprises mostly concentrate their investments in one sphere. In other words, in many cases most components of the investment vector Y are equal to zero. There are 31 possible combinations: 5 single-component structures, 10 two-component structures, 10 three-component structures, 5 four-component structures and one structure with all 5 components non-zero. About ¾ of the cases were one-component structures, where the only non-zero ratio equals one; see the following table.

The most frequent combinations of investments

| Non-zero components | Share, % | Cumulative share, % |
|---|---|---|
| Machinery | 68.72 | 68.72 |
| Construction, machinery | 9.08 | 77.80 |
| Buildings, machinery | 3.66 | 81.46 |
| Land, machinery | 3.04 | 84.50 |
| Construction | 2.77 | 87.27 |
| Machinery, intangible goods | 2.36 | 89.63 |
| Buildings | 1.73 | 91.36 |
| Land, construction, machinery | 1.69 | 93.05 |
| Land | 1.44 | 94.49 |
| Land, buildings, machinery | 1.25 | 95.74 |
| Buildings, construction, machinery | 1.03 | 96.77 |
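The figure of 31 possible structures is the number of non-empty subsets of the five components, $2^5 - 1 = 5 + 10 + 10 + 5 + 1$. A short Python check follows. The component labels are shortened names chosen for illustration.

```python
from itertools import combinations
from math import comb

# Shortened labels for E15120, E15130, E15140, E15150, E15440.
COMPONENTS = ("land", "buildings", "construction", "machinery", "intangible goods")

# Number of structures with exactly k non-zero components, k = 1..5.
counts = {k: comb(len(COMPONENTS), k) for k in range(1, len(COMPONENTS) + 1)}
# The structures themselves: every non-empty subset of the five components.
structures = [s for k in counts for s in combinations(COMPONENTS, k)]

print(counts)           # {1: 5, 2: 10, 3: 10, 4: 5, 5: 1}
print(len(structures))  # 31
```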

Modelling the structure of investments
The simulation consists of two modelling steps:
1. The structure of a random vector (which components are non-zero) is modelled using a multinomial distribution with the empirical probabilities.
2. If the vector contains only one non-zero component, that component equals 1 and the others equal 0. If it contains several non-zero components, their values are simulated (using a Normal, Beta or Uniform distribution), checking that conditions (1) and (2) are fulfilled.
The parameters of these distributions are estimated from the sample data.
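The two steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's model:
- Structure probabilities are the pooled shares from the table of most frequent combinations. `random.choices` renormalizes them, so the remaining 3.23% of structures are ignored. The real model works with all structures within each of the 8 groups.
- Only the Beta option is shown, and its parameters `BETA_A`, `BETA_B` are hypothetical. The project estimates them from the sample.
- The way conditions (1) and (2) are enforced is an assumption: draw k − 1 values, give the remainder to the last component, and redraw if it is not positive.

```python
import random

COMPONENTS = ("land", "buildings", "construction", "machinery", "intangible")

# Pooled empirical shares (%) of the most frequent structures (from the table).
STRUCTURE_SHARES = {
    ("machinery",): 68.72,
    ("construction", "machinery"): 9.08,
    ("buildings", "machinery"): 3.66,
    ("land", "machinery"): 3.04,
    ("construction",): 2.77,
    ("machinery", "intangible"): 2.36,
    ("buildings",): 1.73,
    ("land", "construction", "machinery"): 1.69,
    ("land",): 1.44,
    ("land", "buildings", "machinery"): 1.25,
    ("buildings", "construction", "machinery"): 1.03,
}

# Hypothetical Beta parameters; in practice estimated from sample data.
BETA_A, BETA_B = 2.0, 5.0


def simulate_vector(rng):
    """Simulate one investment-structure vector Y as a dict of ratios."""
    # Step 1: choose which components are non-zero (one multinomial trial).
    structures = list(STRUCTURE_SHARES)
    nonzero = rng.choices(structures, weights=list(STRUCTURE_SHARES.values()))[0]
    y = dict.fromkeys(COMPONENTS, 0.0)
    if len(nonzero) == 1:
        # A single non-zero component carries the whole investment.
        y[nonzero[0]] = 1.0
        return y
    # Step 2: draw k-1 ratios, give the remainder to the last component,
    # and redraw until the remainder is positive, so (1) and (2) hold.
    while True:
        head = [rng.betavariate(BETA_A, BETA_B) for _ in nonzero[:-1]]
        last = 1.0 - sum(head)
        if last > 0.0:
            break
    for name, value in zip(nonzero, head + [last]):
        y[name] = value
    return y


rng = random.Random(2007)
simulated = [simulate_vector(rng) for _ in range(1000)]
```

Using a seeded `random.Random` instance makes a simulated series reproducible. That helps when the simulated distribution is later compared with empirical data.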

Checking the model
The model was checked using data from 2007. The distributions of the simulated and the empirical data were compared, and the results were satisfactory.

Heli Jaago (Leading Methodologist)
A recap: what is metadata?
Metadata is data about data:
- concepts;
- definitions;
- data processing rules;
- references;
- classifications (descriptions of structure, versions);
- XML-based ontologies (XBRL, SDMX, HL7, etc.);
- workflow descriptions;
- data model specifications;
- information system specifications.

Metadata without metadata models

- META-METAMODEL: allows the metamodel to be described;
- METAMODEL: describes the data model objects and the relationships between them;
- DATA MODEL: describes the data and the links between them;
- DATA: describes real-world objects.

MMX Framework (interpretation of Statistics Estonia)
- MMX Framework: architectural solution for the metadata;
- Neuchâtel model: allows statistical activities to be described;
- description of the survey: sample, time period, collected variables, statistical indicators;
- survey data: number, values, decimal points, ...

Neuchâtel model: variable system
Key words:
- subject area
- concept family, conceptual variable
- statistical activity, statistical activity instance
- statistical unit type
- statistical characteristic
- variable (contextual variable)
- measurement unit type

User interface: MMX Metadata Navigator

Related objects in the Metadata Navigator

MMX Metadata Navigator
Key points:
- completely metadata-driven;
- the metamodel is not hard-coded into the application;
- new metamodels (classifications, workflows, etc.) require no changes to the application;
- always works in the context of a single metadata object;
- the full context of the object (details, properties, relations) is visible;
- simple navigation: a single click leads to any related object.
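The following toy sketch in Python illustrates what "the metamodel is not hard-coded" can mean. It is not the MMX implementation. The classes `MetaObject` and `Repository`, the relation name `measures` and the example objects are all hypothetical. Object types and permitted relations are stored as data rather than as classes. `context()` returns what a navigator page would show for a single object: its details and its related objects.

```python
from dataclasses import dataclass, field


@dataclass
class MetaObject:
    """A metadata object; its type is plain data, not a Python class."""
    oid: str
    type_name: str
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Repository:
    # Metamodel stored as data: (source type, relation) -> allowed target type.
    metamodel: dict = field(default_factory=dict)
    objects: dict = field(default_factory=dict)  # oid -> MetaObject
    links: list = field(default_factory=list)    # (source oid, relation, target oid)

    def add(self, obj):
        self.objects[obj.oid] = obj

    def link(self, src, relation, dst):
        # Reject links that the metamodel does not permit.
        allowed = self.metamodel.get((self.objects[src].type_name, relation))
        if allowed != self.objects[dst].type_name:
            raise ValueError(f"metamodel does not allow {relation!r} from {src} to {dst}")
        self.links.append((src, relation, dst))

    def context(self, oid):
        """Details of one object plus its outgoing and incoming relations."""
        obj = self.objects[oid]
        related = [(rel, self.objects[d].name) for s, rel, d in self.links if s == oid]
        related += [(f"{rel} (incoming)", self.objects[s].name)
                    for s, rel, d in self.links if d == oid]
        return {"type": obj.type_name, "name": obj.name,
                "properties": obj.properties, "related": related}


# Hypothetical metamodel fragment using Neuchâtel-style terms.
repo = Repository(metamodel={("Variable", "measures"): "Statistical characteristic"})
repo.add(MetaObject("v1", "Variable", "Gross investment (E15000)"))
repo.add(MetaObject("c1", "Statistical characteristic", "Gross investment"))
repo.link("v1", "measures", "c1")
print(repo.context("c1")["related"])  # [('measures (incoming)', 'Gross investment (E15000)')]
```

Supporting a new metamodel, for example a classification, would only mean adding entries to `metamodel`, such as `("Classification item", "belongs to"): "Classification"`. The `Repository` code itself stays unchanged.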

Why a metadata repository and not a wiki?
- Structured rather than based on free text: formalized, so it can be queried.
- Constraints and business rules can be captured.
- Associations carry rich semantics and are not merely navigational links.
- It can be linked to other systems (SQL for database applications, RDF etc. for semantic web applications).

What is MMX?
- Data layer (data model, database objects and APIs)
- Metamodels, and a methodology for creating them and implementing them in the data layer
- Technology stack (ORM, application server, AJAX, ...) for creating web applications based on the data layer and metamodels
- Experience in creating and deploying such applications in specific customer environments
