ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.


1 ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands

2 Outline of the presentation
- Data warehouse and the importance of a population frame
- Relationship between population frame and business register
  - (default) target population, statistical units
- Other crucial data sources: “backbones”
  - turnover + employment
- Data linking: the statistical unit base
- Conflicting information between data sources
  - when to correct in the statistical DWH
  - when to correct in the backbones
  - when to give feedback to the business register
ESSnet DWH – business register

3 Definition of a statistical data warehouse (according to the FPA)
The broad definition of a data warehouse to be used in this ESSnet is therefore: ‘A common conceptual model for managing all available data of interest, enabling the NSI to (re)use this data to create new data/new outputs, to produce the necessary information and perform reporting and analysis, regardless of the data’s source.’

4 A data warehouse: the general idea
Because the staging area is “core business” for NSIs, the term statistical DWH is used for the staging area plus the warehouse.

5 The statistical data warehouse: architecture and layers

6 The statistical data warehouse: processing steps in the GSBPM model
[Figure labels: process input; DWH / int. data; 5.1a: link data; 5.1b: integrate data (see the presentation by Fursova); calculate aggregates]

7 [Figure labels: datasource 1, datasource 2, datasource 3; Linking; Processing (integration layer); Integrated data; p.analyse; Output 1, Output 2, Output 3. GSBPM steps shown: 4, 5.1, 5.2-5.6, 5.7, 6, 7]

8 A data warehouse without a population frame
- Data sources: admin data, survey 1, survey 2, big data
- Different sources cover different enterprises -> about which enterprises do we have information?
- The timing of availability differs between sources -> when is a complete description available?

9 A data warehouse with a population frame
- Population
- Datasource 1: admin data 1
- Datasource 2: big data
- Datasource 3: survey 1
- Datasource 4: survey 2
Advantage: the coverage of the DWH is known (e.g. which enterprises are included in the DWH).
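The coverage advantage can be illustrated with a minimal sketch. Everything below is hypothetical: the enterprise IDs, the source contents and the function name `coverage_report` are made up for illustration and do not come from the ESSnet material.

```python
# Hypothetical population frame: the enterprise IDs the DWH should cover.
population_frame = {"E001", "E002", "E003", "E004", "E005"}

# Hypothetical data sources, each covering a different set of enterprises.
sources = {
    "admin data 1": {"E001", "E002", "E003", "E004"},
    "big data": {"E002", "E999"},  # E999 does not occur in the frame
    "survey 1": {"E001", "E003"},
    "survey 2": {"E004"},
}


def coverage_report(frame, sources):
    """Return per-source coverage of the frame and the frame units no source covers."""
    report = {}
    covered_by_any = set()
    for name, ids in sources.items():
        covered = ids & frame
        covered_by_any |= covered
        report[name] = {
            "coverage": len(covered) / len(frame),  # share of the frame present in this source
            "outside_frame": sorted(ids - frame),   # units that cannot be placed in the frame
        }
    uncovered = sorted(frame - covered_by_any)
    return report, uncovered


report, uncovered = coverage_report(population_frame, sources)
for name, stats in report.items():
    print(name, stats)
print("frame units without any data:", uncovered)
```

Without the frame, neither the coverage share nor the list of uncovered enterprises could be computed, which is the point of the slide.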

10 Units and target population
The population should be known for:
- the data warehouse, e.g. “about which enterprises do we have information?”
- its preparation phase, e.g. when linking data sources
Challenges are:
- units may differ between the data sources
  - decision: which unit is used for linking
- what is the reference population
  - decision: how is the default target population defined

11 Proposals
Only the statistical unit (= enterprise) is used
- for data linking
- in the processing phase of the statistical DWH
- justification: most obvious, ESSnet on Consistency, maintenance
Default target population: all enterprises with economic activity in the reference period (e.g. year)
- justification: SBS regulation
- widest definition of enterprises, from which flexible outputs for subpopulations can be derived
- the term “default” is used because subpopulations have a target population, too

12 [Figure; GSBPM steps shown: 4, 5.1, 5.2-5.6, 5.7, 6, 7]
- flexible data sources with different populations and units
- linked data, integrated data: processing on the statistical unit + default target population only
- weighting to flexible populations
- flexible output for different populations and units

13 Population frame and the Business Register
Determination of the default target population in the S-DWH takes 2 steps:
1. the population frame, i.e. a list of enterprises with a certain kind of activity during a period
2. confirmation of which enterprises on the list really performed economic activities during the period
The business register provides information for the population frame. Therefore, the statistical Business Register is an indirect data source for the statistical DWH.
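The two steps can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the names `BackboneRecord` and `default_target_population`, and the activity rule (positive VAT turnover or at least one employee) are hypothetical and not prescribed by the presentation.

```python
from dataclasses import dataclass


@dataclass
class BackboneRecord:
    """Hypothetical backbone record for one enterprise in the reference period."""
    enterprise_id: str
    vat_turnover: float  # turnover taken from VAT data
    employees: int       # employment taken from social security data


def default_target_population(frame_ids, backbone_records):
    """Step 2: keep only the frame enterprises whose activity is confirmed.

    frame_ids is the result of step 1 (the list derived from the business register).
    Assumed activity rule: positive VAT turnover or at least one employee.
    """
    confirmed = set()
    for rec in backbone_records:
        is_active = rec.vat_turnover > 0 or rec.employees > 0
        if rec.enterprise_id in frame_ids and is_active:
            confirmed.add(rec.enterprise_id)
    return confirmed


# Step 1: population frame from the business register (hypothetical IDs).
frame = {"A", "B", "C"}

# Backbone data: B shows no activity, C has no record, D is not in the frame.
backbone = [
    BackboneRecord("A", vat_turnover=100.0, employees=3),
    BackboneRecord("B", vat_turnover=0.0, employees=0),
    BackboneRecord("D", vat_turnover=50.0, employees=1),
]

target = default_target_population(frame, backbone)
print(sorted(target))
```

Enterprises in the frame without any backbone record stay unconfirmed in this sketch; how to treat them in practice is a design decision the slides leave open.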

14 Information needed from the statistical business register
Recommended information for the population frame:
- the frame reference year
- the statistical enterprise unit, including national ID and EGR ID
- the name and address of the enterprise
- the national identification number (ID) of the enterprise
- the date in population (mm/yr)
- the date out of population (mm/yr)
- the NACE code
- the institutional sector code
- a size class

15 Other backbones
ESSnet AdminData:
- VAT and social security admin data are almost complete for quarterly and annual statistics
- they can be used for high-quality estimates of turnover and employment, respectively
ESSnet DWH: VAT and social security data are crucial
- to confirm the activity status of enterprises, and so implicitly to determine the default target population
- to integrate data suitable for flexible outputs
- measurement errors of a sample survey (or of data about a subpopulation) are reduced when weighting to population numbers + VAT turnover + employment
Proposal: include these admin data as backbones in a statistical DWH
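Weighting a survey to a backbone total can be illustrated with a plain ratio estimator. This is a swapped-in textbook technique chosen for brevity, not necessarily the weighting method the ESSnet has in mind; the function name and the sample values are hypothetical.

```python
def ratio_estimate(sample, population_vat_total):
    """Estimate a population total of a survey variable using VAT turnover as auxiliary.

    sample: list of (design_weight, vat_turnover, survey_value) per sampled enterprise.
    population_vat_total: VAT turnover summed over the whole target population (backbone).
    """
    weighted_y = sum(w * y for w, _x, y in sample)
    weighted_x = sum(w * x for w, x, _y in sample)
    if weighted_x == 0:
        raise ValueError("weighted VAT turnover of the sample is zero")
    # Scale the survey/VAT ratio observed in the sample to the known VAT total.
    return weighted_y / weighted_x * population_vat_total


# Hypothetical sample of two enterprises, each with design weight 10.
sample = [
    (10, 100.0, 20.0),
    (10, 200.0, 40.0),
]
estimate = ratio_estimate(sample, population_vat_total=3500.0)
print(round(estimate, 2))
```

Because the VAT total is known for the full population, the estimate no longer depends only on the design weights, which is the sense in which a backbone can reduce error in survey-based figures.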

16 Backbones in a statistical DWH
[Figure labels: Source layer; Int. + Analyses layer; Access layer; Integration layer; SBR, Pop-frame, VAT, empl., data 1, data 2; GSBPM 5.1: link & integrate; GSBPM 5.2-5.6: “process”; GSBPM 5.7-5.8: calculate aggregates; check processing; GSBPM 6: analyse / “DATAWAREHOUSE”; GSBPM 7-9: disseminate]
Backbones are crucial for data linking and data integration -> they need to be checked/cleaned by source in the source layer.

17 Observed: admin data incorporated in the BR
[Figure labels: Source layer; Int. + Analyses layer; Integration layer; SBR, Pop-frame, VAT, empl., data 1, data 2; GSBPM 5.1: link & integrate; GSBPM 5.2-5.6: “process”; GSBPM 5.7-5.8: calculate aggregates; check processing; GSBPM 6: analyse; GSBPM 7-9: disseminate]
When choosing this option:
- an important part of the linking process takes place outside the S-DWH
- unless the SBR is an integral part of the S-DWH (maintenance?)

18 Determining the default target population
If the statistical DWH covers annual statistics only: relatively straightforward
- derive the population frame from the business register at the end of reference year t
- determine active or non-active as soon as VAT and/or employment data become available
If STS is included in the statistical DWH: more complicated
- updating is necessary!

19 Updating the population

20 The largest enterprises
[Figure labels: SBR, Pop-frame, VAT, empl., L.E., data 1, data 2; GSBPM 5.1: link & integrate; GSBPM 5.2-5.6: “process”; GSBPM 5.7-5.8: calculate aggregates; check processing; “DATAWAREHOUSE”; output 1, output 2, output 3]
If a team within an NSI produces consistent microdata for the largest enterprises -> consider this source as a backbone.

21 Units: ideal situation
- the enterprise has a unique ID
- the enterprise group has a unique ID
- enterprise and enterprise group correspond with the statistical definitions
- these units are used in all data sources
In practice, more complex situations exist (especially when using more admin data).

22 [Figure; GSBPM steps shown: 4, 5.1, 5.2-5.6, 5.7, 6, 7]
- flexible data sources with different populations and units
- linked data, integrated data: processing on one unit + one population only
- flexible output for different populations and units
Key question: how to manage these different input and output units and their relationships to the statistical unit?

23 [Figure with columns INPUT, IN S-DWH processing, OUTPUT. Unit labels: ENTERPRISE (= statistical unit); ENTERPRISE GROUP; legal unit; “accounting” unit; “VAT unit”; other units; “other tax” units; enterprise; enterprise, local unit, LKAU, KAU, enterprise group]

24 The unit base
Some remarks:
- The complexity of the unit base depends on
  - the scope of the statistical DWH
  - national legislation (practices) with respect to enterprise units
- The unit base is closely related to the Business Register.
- Main motivation to place this base outside the Business Register:
  - more flexible in case of new inputs and outputs
  - more transparent in case of linking errors
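A unit base can be pictured as a lookup from input units to the statistical unit. The sketch below assumes the simplest case, where every input unit belongs to exactly one enterprise; the IDs, the dictionary layout and the function name `aggregate_to_enterprise` are hypothetical.

```python
from collections import defaultdict

# Hypothetical unit base: (unit type, unit ID) -> enterprise ID (the statistical unit).
unit_base = {
    ("vat", "VAT-1"): "ENT-A",
    ("vat", "VAT-2"): "ENT-A",  # two VAT units belong to the same enterprise
    ("legal", "LU-9"): "ENT-B",
}


def aggregate_to_enterprise(records, unit_type, unit_base):
    """Sum input-unit values per enterprise and report units that cannot be linked.

    records: list of (unit_id, value) pairs from one data source.
    """
    totals = defaultdict(float)
    unlinked = []
    for unit_id, value in records:
        enterprise_id = unit_base.get((unit_type, unit_id))
        if enterprise_id is None:
            unlinked.append(unit_id)  # kept visible so linking errors stay transparent
        else:
            totals[enterprise_id] += value
    return dict(totals), unlinked


# Hypothetical VAT records; VAT-7 is unknown to the unit base.
vat_records = [("VAT-1", 100.0), ("VAT-2", 50.0), ("VAT-7", 10.0)]
totals, unlinked = aggregate_to_enterprise(vat_records, "vat", unit_base)
print(totals, unlinked)
```

Keeping the mapping in its own structure means a new input source only needs new entries in `unit_base`, and unlinked units surface explicitly instead of disappearing during processing.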

25 Position of the Business Register in the statistical DWH
[Figure labels: SBR, Pop-frame, VAT, empl., L.E., survey, units, tax, BIG DATA, other; GSBPM 5.1: link & integrate; GSBPM 5.2-5.6: “process”; GSBPM 5.7-5.8: calculate aggregates; check processing; “DATAWAREHOUSE”; output 1, output 2, output 3]

26 Feedback to the Business Register
In case of conflicting information between data sources, where the conclusion is an influential error in the backbones (and indirectly the SBR):
- when to incorporate corrections in the statistical DWH?
- when to incorporate corrections in the backbones?
- when to incorporate corrections in the SBR?

27 Correction of information
[Figure labels: SBR, Pop-frame, survey, other units, VAT, empl., L.E.; GSBPM 5.1: link & integrate; GSBPM 5.2-5.6: “process”; GSBPM 5.7-5.8: calculate aggregates; check processing; “DATAWAREHOUSE”; output 1, output 2, output 3]
- In the S-DWH: corrections at 5.6
- In the backbones themselves: at the time of the most important revisions
- In the SBR: after the end of the year (for consistency); exception: corrections with a major impact

28 Conclusions
Requirements for a statistical DWH:
- population well defined
- use of one unit in processing
Backbones desired for: populations, VAT turnover, admin data employment, large enterprises
The Business Register is an indirect input for the statistical DWH: population frame, unit base, survey
Timing of corrections of errors (backbone information):
- in the DWH: before weighting
- in the backbone: when revising
- in the Business Register: end of year

