Published by Walter Tate. Modified over 9 years ago.
1
ESSnet ON MICRO DATA LINKING AND DATA WAREHOUSING IN STATISTICAL PRODUCTION
RESULTS OF STOCKTAKING, CONCLUSIONS OF FIRST YEAR
Pieter Vlag, Senior Statistical Researcher, Statistics Netherlands
p.vlag@cbs.nl
2
ESSnet DWH: Main conclusions first year
Contents
- Answers on questionnaire
- Results of visit to Statistics Finland
- Results of visit to CSO-Ireland
- Conclusions of the ESSnet DWH group
- Implications for work in 2012/2013
3
Questionnaire
- Sent to all National Statistical Institutes (NSIs) of the ESS and Switzerland
- 24 NSIs responded
- Response is representative (no specific group of countries missing)
- In the interpretation, a distinction is made between questions on
  - opportunities/barriers
  - implementation
  - definition of a DataWareHouse
4
Answers on questionnaire (opportunities/barriers)
Do you think that the results of this ESSnet are useful for your work?
5
Answers on questionnaire (opportunities/barriers)
What do/did you see as the main motivation to start DWH in your business statistics systems?
Note: more than one answer per NSI possible.
6
Answers on questionnaire (opportunities/barriers)
What do you see as the main general methodological barriers to implementing an integrated system?
Note: more than one answer per NSI possible.
7
Answers on questionnaire (opportunities/barriers)
What do you see as the main technical methodological barriers to implementing an integrated system?
Note: more than one answer per NSI possible.
8
Answers on questionnaire (opportunities/barriers)
What do you see as the main IT barriers to implementing an integrated system?
Note: more than one answer per NSI possible.
9
Answers on questionnaire (implementation)
No NSI answered ‘YES’ to all four of these questions:
- Do you have a single coherent system which covers most of your data in the production of business statistics?
- Is your metadata currently integrated into your data systems?
- Is your data input for current needs integrated into your data systems?
- Are your current output requirements integrated into your data systems?
CONCLUSION: No NSI has a finished DWH system
10
Answers on questionnaire (implementation)
On the other hand, the answers suggest that all responding NSIs are at one of the following stages:
- considering developing an integrated data warehouse system
- developing a data warehouse system
- implementing parts of a (prototype) data warehouse system
11
Answers on questionnaire (1st conclusions)
NSIs
- recognise the opportunities of DWH systems
- consider the high investments, or investment-related issues, the most important barrier
- are considering or developing DWH systems
- mention similar methodological and IT issues
- expect “sharing knowledge and experiences” as the outcome of this ESSnet
Hence, there is a business case for this ESSnet.
12
Answers on questionnaire (definition of a DWH)
The questionnaire presented two extremes:
- Data model
- Process model
13
Questionnaire (‘process model’ DWH)
In the “process model” perspective, the DWH is primarily a set of databases to store the data between the statistical data-processing steps. Statistical processing (weighting, consistency) is done outside. The DWH system is not primarily designed to produce flexible output, but is intended more to harmonise the statistical processes.
14
Questionnaire (‘data model’ DWH)
In the “data model” perspective, the DWH is primarily a unit for storing, processing and linking all available data, irrespective of where they have come from or where they are going to. Data acquisition is driven by availability of sources; output production is driven by availability of data in the store. Business registers and metadata are even more important in this model than in regular statistical processes, because they are essential for storing, processing and linking the data and for producing flexible output.
15
Answers on questionnaire (definition of a DWH)
- How would you describe your single conceptual approach?
16
Answers on questionnaire (definition of a DWH)
However, the answers to this question conflicted with
- follow-up inquiries
- follow-up visits
HENCE,
- the presented models were open to multiple interpretations
- a stricter definition of a statistical DWH system was needed.
17
Main conclusion from visit to Statistics Finland (figure)
[Figure: Input I, Input II -> Processing base -> Actual DWH (integrated stat. data) -> Output I, Output II, etc.]
18
Main conclusion from visit to Statistics Finland (in words)
The Statistical DataWareHouse consists of two parts:
- A processing (data)base in which all used input data are processed and integrated.
- A publication (data)base, used for (micro)analyses and calculation of the aggregates (for publication).
Data are transferred to the publication base after they have been approved in the processing database.
In contrast to the DataWareHouse concept at commercial enterprises, the processing part is much more emphasized at NSIs.
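The two-part structure described for Statistics Finland can be illustrated with a minimal Python sketch. It is not Statistics Finland's implementation: the class names, the `approved` flag, the `turnover` variable and the choice to move (rather than copy) approved records are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    unit_id: str
    turnover: float
    approved: bool = False  # hypothetical approval flag set in the processing base


@dataclass
class StatisticalDWH:
    # Part 1: input data are processed and integrated here.
    processing_base: list = field(default_factory=list)
    # Part 2: approved data used for analyses and aggregates.
    publication_base: list = field(default_factory=list)

    def load(self, record: Record) -> None:
        """Put an input record into the processing base."""
        self.processing_base.append(record)

    def approve(self, unit_id: str) -> None:
        """Mark all records of one unit as approved."""
        for record in self.processing_base:
            if record.unit_id == unit_id:
                record.approved = True

    def transfer_approved(self) -> int:
        """Move approved records to the publication base; return how many moved."""
        approved = [r for r in self.processing_base if r.approved]
        self.publication_base.extend(approved)
        self.processing_base = [r for r in self.processing_base if not r.approved]
        return len(approved)

    def total_turnover(self) -> float:
        """Aggregate computed from the publication base only."""
        return sum(r.turnover for r in self.publication_base)


dwh = StatisticalDWH()
dwh.load(Record("A", 10.0))
dwh.load(Record("B", 20.0))
dwh.approve("A")
moved = dwh.transfer_approved()
```

In this sketch only unit A reaches the publication base, so aggregates never include data that are still being processed.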
19
Main conclusion from visit to CSO-Ireland (figure)
[Figure: Input I, Input II -> Proc. base (shown three times) -> Actual DWH (integrated stat. data) -> Output I, Output II, etc. Caption: Architecture for data processing (depending on data?)]
20
Main conclusion from visit to CSO-Ireland (in words)
CSO has two integrated processing systems:
- An older one, in which data are stored after each processing step. This system is used for survey data.
- A newer one (to be implemented), in which admin data are stored once, after all processing steps have been performed.
A reason for reducing the number of data storages might be the less extensive data cleaning needed for admin data.
Hence, the nature of the data (survey or admin data) might be a factor when defining a business architecture for the integrated processing system.
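The difference between the two storage strategies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the step names (`edit`, `scale`), the in-memory dictionaries standing in for data stores, and the `store_each_step` switch are hypothetical and do not describe the actual CSO systems.

```python
def drop_missing(rows):
    """Hypothetical editing step: remove missing values."""
    return [r for r in rows if r is not None]


def to_thousands(rows):
    """Hypothetical transformation step: rescale the values."""
    return [r * 1000 for r in rows]


STEPS = [("edit", drop_missing), ("scale", to_thousands)]


def run_pipeline(data, steps, store, store_each_step):
    """Apply the steps in order.

    store_each_step=True  -> write a copy to the store after every step
                             (the older, survey-style system).
    store_each_step=False -> write once, after all steps
                             (the newer, admin-data-style system).
    """
    for name, step in steps:
        data = step(data)
        if store_each_step:
            store[name] = list(data)
    if not store_each_step:
        store["final"] = list(data)
    return data


survey_store = {}
admin_store = {}
survey_result = run_pipeline([1, None, 3], STEPS, survey_store, store_each_step=True)
admin_result = run_pipeline([1, None, 3], STEPS, admin_store, store_each_step=False)
```

Both runs produce the same final data; they differ only in how many intermediate copies are kept, which is the trade-off the slide links to the amount of data cleaning.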
21
Main conclusion of the ESSnet DWH group (in figure)
[Figure labels: Input I, Input II; Stat BR (pop. frame); cleaning; Imp. + aggr; Integrated systems; Actual DWH; Integrated; Output I, Output II, etc.; Processing issues; DWH; Confidentiality issues; Out of scope]
22
General conclusion of the ESSnet DWH group (in words)
A Statistical DataWareHouse consists of two parts:
Part I
A processing phase in which statistical input data are
- at a 1st stage linked to the Business Register
- at a 2nd stage cleaned (between data source)
- at a 3rd stage made consistent between the sources by imputing missing data and correcting for inconsistencies between the sources
before being transferred to the actual DataWareHouse.
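The three stages of the processing phase can be sketched as a small Python pipeline. Everything specific here is an assumption for illustration and not the ESSnet's method: the register contents, the `turnover` variable, the rule that negative values are invalid, the source priority (survey before admin) and median imputation within the same activity code.

```python
from statistics import median

# Hypothetical Business Register: unit id -> activity code.
REGISTER = {"U1": "47.11", "U2": "47.11", "U3": "62.01"}


def link_to_register(records, register):
    """Stage 1: keep records of units found in the register, attach the activity code."""
    return [{**r, "activity": register[r["unit_id"]]}
            for r in records if r["unit_id"] in register]


def clean(records):
    """Stage 2: treat negative turnover as invalid and mark it as missing (assumed rule)."""
    cleaned = []
    for r in records:
        value = r["turnover"]
        if value is not None and value < 0:
            value = None
        cleaned.append({**r, "turnover": value})
    return cleaned


def make_consistent(sources, register):
    """Stage 3: one value per unit; earlier sources take priority, gaps are imputed."""
    integrated = {}
    for unit_id in register:
        candidates = [r["turnover"] for records in sources for r in records
                      if r["unit_id"] == unit_id and r["turnover"] is not None]
        integrated[unit_id] = candidates[0] if candidates else None
    # Impute missing values with the median of observed units in the same activity.
    observed = {u: v for u, v in integrated.items() if v is not None}
    for unit_id, value in integrated.items():
        if value is None:
            donors = [v for u, v in observed.items() if register[u] == register[unit_id]]
            if donors:
                integrated[unit_id] = median(donors)
    return integrated


survey = [{"unit_id": "U1", "turnover": 100.0}, {"unit_id": "U9", "turnover": 5.0}]
admin = [{"unit_id": "U1", "turnover": 120.0},
         {"unit_id": "U2", "turnover": -1.0},
         {"unit_id": "U3", "turnover": 300.0}]

prepared = [clean(link_to_register(source, REGISTER)) for source in (survey, admin)]
integrated = make_consistent(prepared, REGISTER)
```

In this toy run, unit U9 is dropped because it is not in the register, the conflict for U1 is resolved in favour of the survey value, and U2 is imputed from U1, the only observed unit with the same activity code.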
23
General conclusion of the ESSnet DWH group (in words)
A Statistical DataWareHouse consists of two parts:
Part II
An actual DataWareHouse from which flexible aggregated data and microdata, meant for output, can be generated. These generated aggregated data and microdata themselves do not belong to the Statistical DataWareHouse system.
The data in this DataWareHouse are completely integrated. Interpretation of (the quality of) these data should theoretically be independent of the input source.
Part II is more recognisable for commercial enterprises.
24
Main conclusion of the ESSnet DWH group (static SBR or SBR integral part of DWH)
[Figure labels: Input I, Input II; SBR; cleaning; Imp. + aggr; Actual DWH; Integrated data; Output I, Output II, etc.]
SBR preferably an integral part: feedback from other sources, but with moderation of the feedback.
25
Main conclusion of the ESSnet DWH group (metadata)
[Figure labels: Input I, Input II; SBR; cleaning; Imp. + aggr; Actual DWH; Integrated data; Output I, Output II, etc.; Confidentiality issues; Input descr.; Process (step) descr.; (Output) var. descr.]
26
Relationship with DWH architectural models (e.g. Kimball)
[Figure labels]
- Sources: Revenue Agency, Chambers of Commerce, Customs Agency, Survey 1 to Survey N, SBR, Employees
- Layers: Sources Layer, Integration Layer, Data Access Layer, Interpretation and data analysis layer
- Data: Staging Data, SBR, Domains, Estimation, Universe/Census, Primary Micro Data, Data Mart, Meta Data
- Alimentation: Extraction, Transformation, Loading
- Outputs: Institutional Output, Dashboards, Analysis, Reporting, Data Mining
- Annotations: Independent process, Integrated systems, Actual DWH
27
Implications for work 2012/2013
Metadata
- Fitting the statistical DWH into current metadata models.
- Keep it manageable!
Methodology
- Fitting current (ESSnet) methodology into the statistical DWH:
  1. data linking & feedback to BR
  2. (selective?) editing + (repeated) weighting
  3. data confidentiality
IT and Architecture
- Fitting ‘methodology’ into the ‘adapted’ GSBPM model
- Relating the ‘adapted’ GSBPM to the statistical DWH architecture.
28
Summary
- Business case for ESSnet DWH is present
- Questionnaire: sorry for the confusing DWH-model extremes
- Visits to Finland and Ireland useful for feedback, ideas, etc.
- Statistical DWH model developed, consisting of
  - part 1: integrated systems
  - part 2: actual DataWareHouse
- Statistical DWH <> ‘commercial’ DWH, as more emphasis is placed on part 1
- Actions defined for 2012/2013 on
  - metadata
  - methodology
  - IT and Architecture
29
Statistical System in the Netherlands
Thank you for your attention!
Questions?