Documentation of statistics Metadata


1 Documentation of statistics Metadata

2 We and our users get lost without metadata
Why metadata? I work in dissemination, and metadata sounds boring and/or like a job for librarians. It is necessary to explain the origin and meaning of data, and it supports "findability": navigation and search engines.

3 Metadata is everywhere in the Generic Statistical Business Process Model
Source: UNECE Secretariat, April 2009

4 A never-ending demand
Annual Danish user surveys since 2001: every year users have placed more / better documentation as their number 1 priority. A number of improvements have been made, with no effect whatsoever. Documentation is mainly about metadata.

5 Why we can’t live without metadata

6 Many (different) ways of looking at metadata
Let's focus on those relevant to dissemination. Purpose: descriptive (to explain the meaning of data) and 'findability' (navigation and search engines). Metadata can be data related, variable related or publication related.

7 Data related / reference metadata
Description of source of data; methodology used to produce data; status of data (provisional / revised / etc.). Implemented as: quality declarations; footnotes attached to cells / tables. Based on: Edwin de Jonge (CBS)

8 Quality declarations – reference metadata
Administrative info; contents; time; accuracy; comparability; accessibility

9 Quality declarations – Reference metadata
Source:

10 Footnote attached to table

11 Variable related metadata
Name and description of variable; aggregation method used; unit (1,000, euro, kg, etc.); name and description of classification; name and description of classification items (categories). Variable related metadata is partly descriptive, but names are also important for 'findability'. Based on: Edwin de Jonge (CBS)

12 OECD example
©Statistics Denmark

13 Eurostat example

14 Metadata is readily available and useable in context of client's information need
What is a projection? What is the difference between immigrants and descendants? Which countries are Western?

15 Presenting metadata – selective needs
Ancestry click!

16 Metadata on variable – marital status (civilstatus)

17 Publication metadata
Metadata related to publishing: release calendars. Also for search engines: Dublin Core, a standard for document metadata on the Internet, provides hidden metadata information supporting search engines.

18 Publication metadata – release calendars

19 Publication metadata – release calendars
Contact information; links to metadata; other publications

20 Publication metadata – release calendars

21 Publication metadata
Many publication metadata are Dublin Core (dc) related and support search engines: Title (dc), Spatial (dc), Author (dc), Temporal (dc) – reporting period, Created (dc), Subject (dc), Modified (dc), Frequency, Source (dc), Language, Description (dc), Subject Area, Summary (dc), Statistical theme, Published (dc)
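As a rough illustration of how such "hidden" metadata can reach search engines, the sketch below renders a few Dublin Core elements as HTML meta tags using the common `DC.` name prefix. The record contents and the function name `dublin_core_meta` are made up for this example; only the element names title, description, subject and language come from the Dublin Core element set.

```python
from html import escape

# Hypothetical publication record; the values are invented for illustration.
record = {
    "title": "Population projections",
    "description": "Projected population by ancestry",
    "subject": "Population",
    "language": "en",
}


def dublin_core_meta(record):
    """Render a dict of Dublin Core elements as HTML <meta> tags."""
    # The link element declares which schema the "DC." prefix refers to.
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, value in record.items():
        # Escape the value so quotes or angle brackets cannot break the tag.
        content = escape(str(value), quote=True)
        lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)


print(dublin_core_meta(record))
```

The output would be placed in the head of the published page, where it is invisible to readers but available to crawlers.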

22 Dublin Core supporting search

23 Metadata challenges
Terminology / linguistics; coherence; output databases / changes over time; audience / target groups

24 Terminology – What are our users talking about?
Statistical terms: CPI, Employed, Salary, Income, Household, Family. Layman terms: Inflation, Working, Income/Salary, Family.

25 Dissemination metadata issues - Linguistics
'Findability': users search with synonyms or hyponyms of the terms we use and find nothing. Synonym: job <> occupation, business <> enterprises. Hyper/hyponym: vehicles <> car <> SUV. "Musical instrument" is a hypernym of "guitar" because musical instruments include guitars.
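One possible way to soften this problem is to expand a user's search term with synonyms and broader terms before querying. The sketch below is only an illustration: the two lookup tables and the function `expand_query` are hypothetical, and the entries are limited to the examples given on these slides.

```python
# Hypothetical thesaurus: alternative or layman term -> term used in the
# statistical database. A real thesaurus would be far larger.
SYNONYMS = {
    "job": "occupation",
    "business": "enterprises",
    "inflation": "cpi",
}

# Hypothetical hypernym table: narrower term -> broader term.
HYPERNYMS = {
    "suv": "car",
    "car": "vehicles",
    "guitar": "musical instrument",
}


def expand_query(term):
    """Return the search term plus its synonym and all broader terms."""
    term = term.lower().strip()
    expanded = [term]
    if term in SYNONYMS:
        expanded.append(SYNONYMS[term])
    # Walk up the hypernym chain; stop if a term repeats (guards cycles).
    current = term
    while current in HYPERNYMS and HYPERNYMS[current] not in expanded:
        current = HYPERNYMS[current]
        expanded.append(current)
    return expanded


print(expand_query("SUV"))
print(expand_query("job"))
```

A search for "SUV" would then also match tables indexed under "car" or "vehicles", and a search for "job" would match "occupation".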

26 Metadata should ensure coherence in contents
The same definitions, aggregations and classifications must be used across all subject areas and media; metadata should build on internationally recognized nomenclatures; data sources must be technically coordinated => Statisticians <> Dissemination

27 Inconsistent tables
Motorbike owner        Car owner
18 – 25   A            18 – 29   D
26 – 45   B            30 – 41   E
> 46      C            41+       F
When creating tables / compiling statistics, detailed attention should be given to the harmonization of variable values, even across different subject areas. Otherwise you will end up being inconsistent both across time and across subject areas. In the example above we have two different variables, Car owner and Motorbike owner, distributed by age. Even if the data come from different surveys, so that cross tabulations of Motorbike owner and Car owner are not possible, it is still much better dissemination to use the same age groupings across all tables compiled by your organization. In real life this is, of course, a nearly impossible task.
1/17/2019
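A minimal sketch of what harmonized variable values could look like in practice: one shared age classification that every table is compiled against. The group boundaries, the function names and the survey ages below are all invented for illustration and do not reflect any actual standard.

```python
import bisect

# Hypothetical shared age classification, defined once and reused by
# every table. Boundaries are illustrative only.
AGE_GROUP_LOWER_BOUNDS = [18, 26, 46]
AGE_GROUP_LABELS = ["18-25", "26-45", "46+"]


def age_group(age):
    """Map an age in whole years to the shared age-group label."""
    if age < AGE_GROUP_LOWER_BOUNDS[0]:
        raise ValueError(f"age {age} is below the lowest group")
    # bisect_right counts how many lower bounds are <= age.
    index = bisect.bisect_right(AGE_GROUP_LOWER_BOUNDS, age) - 1
    return AGE_GROUP_LABELS[index]


def tabulate(ages):
    """Count respondents per shared age group, keeping empty groups."""
    counts = {label: 0 for label in AGE_GROUP_LABELS}
    for age in ages:
        counts[age_group(age)] += 1
    return counts


# Two separate (made-up) surveys, tabulated with the same groups.
motorbike_owners = tabulate([19, 30, 52])
car_owners = tabulate([25, 26, 45, 46, 70])
print(motorbike_owners)
print(car_owners)
```

Because both tables share the same row labels, users can compare them side by side even though the underlying surveys cannot be cross tabulated.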

28 Consistent structural metadata in the Danish model
Centralized: variables, values, unit, "by"/"and", time, template for quality declaration. Decentralized: contents, footnote, contact person, quality declaration. Decentralized metadata is highly standardized through templates, guides and editorial overview.

29 Time dependency – output databases
Definitions of variables may change. Almost all cubes have a time dimension. If a measure changes, a new measure is added. If a dimension changes, new categories are added. Change in dimension depends on selection in time dimension -> many empty cells (region).
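The empty-cell effect can be sketched by giving each category a validity period and checking it against the selected years. Everything in the example is hypothetical: the region codes R1 to R3, the merge year 2007 and the helper names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Category:
    code: str
    valid_from: int            # first year the category is in use
    valid_to: Optional[int]    # last year in use, None if still valid

    def is_valid(self, year):
        return self.valid_from <= year and (
            self.valid_to is None or year <= self.valid_to
        )


# Hypothetical region dimension: R1 and R2 are replaced by R3 in 2007.
REGIONS = [
    Category("R1", 1990, 2006),
    Category("R2", 1990, 2006),
    Category("R3", 2007, None),
]


def categories_for(years, categories=REGIONS):
    """Return codes that are valid in at least one selected year."""
    return [c.code for c in categories if any(c.is_valid(y) for y in years)]


def empty_cells(years, categories=REGIONS):
    """List (code, year) pairs that would be empty in a full cross table."""
    return [
        (c.code, y)
        for c in categories
        for y in years
        if not c.is_valid(y)
    ]


print(categories_for([2005]))
print(empty_cells([2006, 2007]))
```

Offering only the categories returned by `categories_for` for the chosen period is one way to keep users from selecting combinations that can only produce empty cells.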

30 Metadata … for what?
Metadata play a role when the users browse, search, select, comprehend and compare.

31 Metadata – for whom?
Staff: database administrators, statisticians, developers, managers. Users, internal/external: news media, international organisations, researchers, occasional users.

32 Documentation – metadata principles*
Ensure customers are identified for all metadata processes; make metadata 'active' to the greatest extent possible, also to Google (*); single authoritative source ('registration authority'); reuse metadata; metadata is readily available and useable in context of client's information need (*)

33 And now back to work … card sorting

