Working on coherence and consistency of an output database

Presentation on theme: "Working on coherence and consistency of an output database"— Presentation transcript:

1 Working on coherence and consistency of an output database
Max Booleman (Statistics Netherlands) IMAODBC September 2005

2 How to improve dissemination (policy) and the transparency of the statistical system?
Starting point: our mission; introduction to the metadata model; dissemination policy; transformation process of StatLine; relation with other processes.

3 Mission of Statistics Netherlands
Core business: dissemination of undisputed, coherent statistical information on Dutch society.

4 Dissemination policy (1)
Coherent information: harmonisation and standardization of concepts & definitions; models (equations, accounting systems); comparability (time, space, between concepts).
Undisputed information: consistent output; quality (reliable, timely, complete); transparency (communication with users).

5 Metadata model (1) (output oriented)
Based on ISO standards and results of the Neuchatel group: conceptual metadata; process metadata; quality metadata; technical metadata.

6 Metadata model (2)
Trade, 2001 (*1000 euro)
               Turnover  Costs  Profit
Size class 1          9
Size class 2         12
Total                21
What should be the metadata of '21'?

7 Metadata model (2)
Trade, 2001 (*1000 euro); columns: Turnover, Costs, Profit. Size class 1: 9 8 5 3; Size class 2: 12 15 10; Total: 21 23.
What should be the metadata of '21'?
Data element: conceptual metadata = metadata of the data element; process metadata = metadata within the data element.
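The distinction between metadata of a data element and metadata within it can be sketched as a small data structure. This is only an illustration: the class names, the field names, and the values filled in for the cell '21' are assumptions, not the actual StatLine metadata model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConceptualMetadata:
    """Metadata *of* a data element: what the figure means (hypothetical fields)."""
    variable: str
    population: str
    period: str
    unit: str


@dataclass
class DataElement:
    """One table cell: a value plus conceptual and process metadata."""
    value: float
    conceptual: ConceptualMetadata
    # Metadata *within* the element: how the figure was produced (assumed shape).
    process: dict = field(default_factory=dict)


# One possible answer to "what should be the metadata of '21'?" (assumed values).
cell_21 = DataElement(
    value=21,
    conceptual=ConceptualMetadata(
        variable="Turnover",
        population="Trade, all size classes",
        period="2001",
        unit="1000 euro",
    ),
    process={"derivation": "sum of size class 1 and size class 2"},
)
```

Keeping the conceptual part immutable reflects the idea that many cells can share one description, while process metadata may differ from cell to cell.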

8 Metadata model (3)
Object server: enterprise; establishment; person; household.
Count Variable server: Turnover; empl.; Cap. inv.
Meas. Unit server: Euro’s; D.gld; number; meters.
Class. servers: Act. (textile, metal, transport, ...); Time (1997, 1998, ...); Area (Amst., Rott., ...).

9 The Statistical Process
Figure: the statistical process, from respondents and registrations to users.
Design stage (metadata): conceptual metadata (input variables, output variables) in meta servers for the input-world and the output-world; process metadata in its own meta servers.
Implementation stage (data): input sphere; throughput sphere; output sphere.
Process steps: data collection & data entry; editing & imputation; aggregation & disclosure control; selection & tabulation; publication & dissemination.
Databases: micro level input register (BaseLine); micro level output register (MicroBase); macro level cube (StatBase); output database (StatLine).

10 Dissemination policy (2)
How to prevent incomparable and inconsistent information? How to present incomparable and inconsistent information?

11 Purpose Statistical Co-ordination (1)
Variable X: 0-10 (A); 11-20 (B); >20 (C).
Variable close to X: 0-15 (D); 16-30 (E); >30 (F).

12 Purpose Statistical Co-ordination (2)
Variable X: 0-10 (A); 11-20 (B); >20 (C).
Variable close to X: 0-15 (D); 16-30 (E); >30 (F).

13 Purpose Statistical Co-ordination (3)
Standard classification (labels on the slide: G H I J D K E L F).
Variable X: 0-10 (A); 11-20 (B); >20 (C).
Variable close to X (optional): 0-15 (D); 16-30 (E); >30 (F).
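One way two nearly identical variables could be made comparable is to publish both on a common, finer set of classes from which each original classification can be aggregated. The sketch below merges the class bounds 0-10 / 11-20 / >20 and 0-15 / 16-30 / >30; it is an assumption about how a standard classification might be derived, not the method shown on the slide, and the function names are hypothetical.

```python
def common_refinement(*upper_bounds):
    """Merge the upper class bounds of several classifications into one sorted list."""
    return sorted(set().union(*upper_bounds))


def class_labels(bounds, start=0):
    """Turn upper bounds into labels such as '0-10', '11-15', ..., '>30'."""
    labels = []
    lower = start
    for upper in bounds:
        labels.append(f"{lower}-{upper}")
        lower = upper + 1
    labels.append(f">{bounds[-1]}")  # open-ended top class
    return labels


x_bounds = [10, 20]        # Variable X: 0-10, 11-20, >20
near_x_bounds = [15, 30]   # Variable close to X: 0-15, 16-30, >30

standard_bounds = common_refinement(x_bounds, near_x_bounds)
standard_classes = class_labels(standard_bounds)
```

With these inputs `standard_classes` is `['0-10', '11-15', '16-20', '21-30', '>30']`; either original classification can be rebuilt by adding up adjacent classes.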

14 Coherence and comparability
Within the standard part of StatLine, the (relative) number of empty cells is a good indicator of the degree of coherence and comparability.
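The indicator can be sketched as the share of empty cells in a table. The table below is made up for illustration, and representing an empty cell as `None` is an assumption.

```python
def empty_cell_share(table):
    """Return the fraction of cells that are empty (None) in a list-of-rows table."""
    cells = [value for row in table for value in row]
    if not cells:
        return 0.0
    return sum(value is None for value in cells) / len(cells)


# Hypothetical 3 x 3 table with four empty cells.
example_table = [
    [4, 7, None],
    [5, None, None],
    [9, 12, None],
]
share = empty_cell_share(example_table)
```

Here four of nine cells are empty, so `share` is 4/9; a lower share would point to better coverage of the standard classifications.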

15 Appearance of inconsistencies
Functional vs institutional statistics; make vs use statistics; single source vs integrated (multi) source statistics; technical differences within the production process.

16 Kind of inconsistencies (1)
Conceptual: variable (unneeded differences); population (classification, operational definition); time (reference period); measurement unit.

17 Kind of inconsistencies (2)
Data: soft, a (calculated) combination is unlikely; hard, a combination is impossible.
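The difference between hard and soft data inconsistencies can be sketched as two kinds of checks on one record. Both rules are assumptions chosen for illustration: the accounting identity profit = turnover - costs as the hard rule, and a hypothetical threshold `max_margin` on the profit margin as the soft rule.

```python
def check_record(turnover, costs, profit, max_margin=0.9):
    """Return a list of (severity, message) tuples for one record."""
    issues = []
    if turnover - costs != profit:
        # Hard: the combination is impossible under the assumed identity.
        issues.append(("hard", "profit differs from turnover minus costs"))
    elif turnover > 0 and profit / turnover > max_margin:
        # Soft: the combination is possible but unlikely.
        issues.append(("soft", "profit margin is unusually high"))
    return issues


consistent = check_record(100, 60, 40)
hard_case = check_record(100, 60, 50)
soft_case = check_record(100, 5, 95)
```

A hard issue would normally block publication, whereas a soft issue is a signal to look at the figures again.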

18 Kind of inconsistencies (3)
Textual: homonym (equal naming, different meaning); synonym (different naming, equal meaning).
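Screening for textual inconsistencies can be sketched as grouping metadata entries by name and by definition. The entries below are invented, and comparing definitions by exact string equality is a simplifying assumption; real screening would need a judgement about meaning.

```python
from collections import defaultdict


def find_textual_inconsistencies(entries):
    """entries: iterable of (name, definition) pairs. Returns (homonyms, synonyms)."""
    definitions_by_name = defaultdict(set)
    names_by_definition = defaultdict(set)
    for name, definition in entries:
        definitions_by_name[name].add(definition)
        names_by_definition[definition].add(name)
    # Homonym: one name, several meanings. Synonym: one meaning, several names.
    homonyms = {n: sorted(d) for n, d in definitions_by_name.items() if len(d) > 1}
    synonyms = {d: sorted(n) for d, n in names_by_definition.items() if len(n) > 1}
    return homonyms, synonyms


# Hypothetical metadata entries.
entries = [
    ("income", "gross income of a household"),
    ("income", "net income of a person"),
    ("turnover", "value of sales excluding VAT"),
    ("sales", "value of sales excluding VAT"),
]
homonyms, synonyms = find_textual_inconsistencies(entries)
```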

19 Towards StatBase cubes (1)
Social Statistics Database (operational); Economic Statistics Database (under construction).

20 Towards StatBase cubes (2)
For each statistical unit one (virtual) StatBase cube.
Three dimensions of a StatBase cube: time; population (classifications, subdivisions); variable.
StatLine: a view on a StatBase cube.
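The idea of a three-dimensional cube with StatLine as a view on it can be sketched with a dictionary keyed by (time, population, variable). The storage layout and the `view` function are assumptions for illustration; the turnover figures are the ones from the Trade, 2001 example.

```python
# Cube for the statistical unit "enterprise": (time, population, variable) -> value.
cube = {
    ("2001", "Size class 1", "Turnover"): 9,
    ("2001", "Size class 2", "Turnover"): 12,
    ("2001", "Total", "Turnover"): 21,
}


def view(cube, time=None, population=None, variable=None):
    """Select the cells that match the given dimension values (None = any)."""
    wanted = (time, population, variable)
    return {
        key: value
        for key, value in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    }


totals = view(cube, population="Total", variable="Turnover")
```

A published table is then just one such selection; the cube itself stays the single source of the figures.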

21 Why a central metadata server?
Part of the new StatLine data model, but also available for input and throughput processes.
The aims: transparency; harmonisation and standardization; consistency; coherence; efficiency; Bureau of Standards (dissemination of metadata standards).

22 Metadata server: process
Convert existing metadata from the present StatLine into the new model; new deliveries to StatLine can only point to existing metadata; new metadata can only be added via a special group (SC); existing metadata will be screened (textual consistency) by SC; existing metadata can be set to ‘expired’ by SC.
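These rules can be sketched as a small gatekeeping class. Everything here is hypothetical: the class and method names, the `by_sc` flag standing in for membership of the special group, and the assumption that a delivery may not point to expired metadata.

```python
class MetadataServer:
    """Toy central metadata server enforcing the rules listed above."""

    def __init__(self):
        self._status = {}  # concept id -> "active" or "expired"

    def add(self, concept_id, by_sc=False):
        """Only the special group (SC) may add new metadata."""
        if not by_sc:
            raise PermissionError("new metadata can only be added via SC")
        self._status[concept_id] = "active"

    def expire(self, concept_id, by_sc=False):
        """Only SC may set existing metadata to 'expired'."""
        if not by_sc:
            raise PermissionError("only SC can expire metadata")
        if concept_id not in self._status:
            raise KeyError(concept_id)
        self._status[concept_id] = "expired"

    def accept_delivery(self, referenced_ids):
        """A delivery may only point to existing (here: active) metadata."""
        missing = [c for c in referenced_ids if self._status.get(c) != "active"]
        if missing:
            raise ValueError(f"unknown or expired metadata: {missing}")
        return True


server = MetadataServer()
server.add("turnover", by_sc=True)
accepted = server.accept_delivery(["turnover"])
```

Rejecting a delivery outright, rather than silently creating the missing concept, is what keeps uncoordinated metadata from entering the output database.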

23 Statistics Netherlands is going to
have a transparent StatLine and statistical system; have a textually consistent StatLine; maximize the consistency between figures and minimize the number of empty data elements; present a clear view on dissemination policy; be recognized by external organizations as the Bureau of Standards for statistical concepts.

24 In the future StatLine will contain two parts:
The coherent and consistent part; others.
Thank you for your attention.
