1
Documentation of statistics Metadata
2
We and our users get lost without metadata
Why metadata? "I work in dissemination. Metadata sounds boring and/or like a job for librarians." Necessary to explain the origin and meaning of data. Supports “findability”: navigation, search engines. We and our users get lost without metadata.
3
Metadata is everywhere in the Generic Statistical Business Process Model
Source: UNECE Secretariat - April 2009
4
A never-ending demand
Annual Danish user surveys since 2001. Every year users have ranked more and better documentation as their number 1 priority. A number of improvements have been made, with no effect whatsoever. Documentation is mainly about metadata.
5
Why we can’t live without metadata
6
Many (different) ways of looking at metadata
Let’s focus on those relevant to dissemination. Purpose: descriptive, to explain the meaning of data; ‘findability’, for navigation and search engines. Data related, variable related, publication related.
7
Data related / reference metadata
Description of source of data; methodology used to produce data; status of data (provisional / revised / etc.). Implemented as: quality declarations; footnotes attached to cells / tables. Based on: Edwin de Jonge (CBS)
8
Quality declarations – reference metadata
Administrative info, contents, time, accuracy, comparability, accessibility
9
Quality declarations – Reference metadata
Source:
10
Footnote attached to table
11
Variable related metadata
Name and description of variable; aggregation method used; unit (1,000, euro, kg, etc.); name and description of classification; name and description of classification items (categories). Variable related metadata is partly descriptive, but names are also important for ‘findability’. Based on: Edwin de Jonge (CBS)
12
©Statistics Denmark
OECD example
13
Eurostat example
14
Metadata is readily available and useable in context of client's information need
What is a projection? What is the difference between immigrants and descendants? Which countries are Western?
15
Presenting metadata – selective needs
Ancestry click!
16
Metadata on variable - marital status
17
Publication metadata
Metadata related to publishing: release calendars. Also for search engines: Dublin Core, a standard for document metadata on the Internet; hidden metadata information supporting search engines.
18
Publication metadata – release calendars
19
Publication metadata – release calendars
Contact information, links to metadata, other publications
20
Publication metadata – release calendars
21
Publication metadata
Many publication metadata elements are Dublin Core (dc) related and support search engines: Title (dc), Author (dc), Created (dc), Modified (dc), Source (dc), Description (dc), Summary (dc), Published (dc), Spatial (dc), Temporal (dc) – reporting period, Subject (dc), Frequency, Language, Subject Area, Statistical theme.
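As a rough illustration of the "hidden" metadata mentioned above, the sketch below renders a publication record as Dublin Core `<meta>` tags for an HTML page head. The record, its values and the function name `dublin_core_meta` are assumptions made for this example; they are not taken from the Statistics Denmark site. The `DC.` / `DCTERMS.` prefixes follow the common DCMI convention for embedding Dublin Core in HTML.

```python
from html import escape

# Hypothetical publication record; the keys follow the slide's list,
# the values are invented for illustration.
record = {
    "title": "Population projections",
    "description": "Projected population by ancestry, sex and age.",
    "subject": "Population",
    "modified": "2019-01-17",
    "spatial": "Denmark",
}

# Names in the original 15-element Dublin Core set get the "DC." prefix;
# refinements such as "modified" and "spatial" belong to DCMI Metadata Terms.
DC_ELEMENTS = {"title", "description", "subject", "source", "language"}


def dublin_core_meta(record):
    """Return HTML <head> lines that embed the record as Dublin Core metadata."""
    lines = [
        '<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">',
        '<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/">',
    ]
    for name, value in record.items():
        prefix = "DC" if name in DC_ELEMENTS else "DCTERMS"
        content = escape(value, quote=True)  # keep attribute values well-formed
        lines.append(f'<meta name="{prefix}.{name}" content="{content}">')
    return lines


if __name__ == "__main__":
    print("\n".join(dublin_core_meta(record)))
```

The tags are invisible to a reader of the page but can be read by crawlers and site search, which is the sense in which this metadata is "hidden".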
22
Dublin core supporting search
23
Metadata challenges
Terminology / linguistics; coherence; output databases / changes over time; audience / target groups
24
Terminology – What are our users talking about?
Statistical terms: CPI, employed, salary, income, household, family. Layman terms: inflation, working, income/salary, family.
25
Dissemination metadata issues - Linguistics
‘Findability’: users search with synonyms or hyponyms of the terms we use and find nothing. Synonym: job <> occupation, business <> enterprises. Hypernym/hyponym: vehicles <> car <> SUV. "Musical instrument" is a hypernym of "guitar" because musical instruments include guitars.
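One possible way to soften this problem is to expand the user's search term with a small thesaurus before matching it against table titles. The sketch below is only an illustration under that assumption: the thesaurus entries come from the examples on this slide, while the table titles and the function names `expand` and `search` are hypothetical.

```python
import re

# Synonym pairs from the slide, stored in both directions.
SYNONYMS = {
    "job": {"occupation"},
    "occupation": {"job"},
    "business": {"enterprises"},
    "enterprises": {"business"},
}

# Hypernym -> direct hyponyms (broader term -> narrower terms).
HYPONYMS = {
    "vehicles": {"car"},
    "car": {"suv"},
    "musical instrument": {"guitar"},
}

# Reverse lookup: hyponym -> direct hypernyms.
HYPERNYMS = {}
for broad, narrow_terms in HYPONYMS.items():
    for narrow in narrow_terms:
        HYPERNYMS.setdefault(narrow, set()).add(broad)


def _closure(start, graph):
    """All terms reachable from start by following graph edges."""
    seen, stack = set(), [start]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def expand(term):
    """Return the term plus its synonyms, narrower terms and broader terms."""
    term = term.lower()
    return ({term} | SYNONYMS.get(term, set())
            | _closure(term, HYPONYMS) | _closure(term, HYPERNYMS))


def search(query, titles):
    """Return the titles that contain any expanded term as a whole word."""
    patterns = [re.compile(r"\b" + re.escape(t) + r"\b") for t in expand(query)]
    return [title for title in titles
            if any(p.search(title.lower()) for p in patterns)]


if __name__ == "__main__":
    titles = ["Vehicles by type", "Employment by occupation"]  # invented titles
    print(search("SUV", titles))
    print(search("job", titles))
```

With this expansion, a search for "SUV" reaches a table titled with the broader term "vehicles", and "job" reaches "occupation"; without it, both searches return nothing.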
26
Metadata should ensure coherence in contents
The same definitions, aggregations and classifications must be used across all subject areas and media; they should build on internationally recognized nomenclatures; data sources must be technically coordinated => Statisticians <> Dissemination
27
Inconsistent tables
Motorbike owner      Car owner
18 – 25   A          18 – 29   D
26 – 45   B          30 – 41   E
> 46      C          41+       F
When creating tables or compiling statistics, detailed attention should be given to harmonizing variable values, even across different subject areas. Otherwise you will end up being inconsistent both across time and across subject areas. In the example above we have two different variables, Car owner and Motorbike owner, distributed by age. Even if the data come from different surveys, so that it is not possible to cross-tabulate Motorbike owner and Car owner, it is still much better dissemination to use the same age groupings across all tables compiled by your organization. In real life this is, of course, a nearly impossible task. 1/17/2019
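The harmonization idea can be sketched as a single shared age classification that every table imports instead of defining its own cut points. Everything in the example below is an assumption made for illustration: the cut points, the labels, the survey data and the function names `age_group` and `tabulate`.

```python
from bisect import bisect_right

# Hypothetical shared age classification, maintained in one place and
# reused by every table. Cut points and labels are illustrative only.
AGE_BOUNDS = [18, 26, 46]              # lower bound of each group
AGE_LABELS = ["18-25", "26-45", "46+"]


def age_group(age):
    """Map an age in years to the shared age group, or None if below range."""
    index = bisect_right(AGE_BOUNDS, age) - 1
    return AGE_LABELS[index] if index >= 0 else None


def tabulate(ages):
    """Count respondents per shared age group, keeping empty groups visible."""
    counts = {label: 0 for label in AGE_LABELS}
    for age in ages:
        label = age_group(age)
        if label is not None:
            counts[label] += 1
    return counts


if __name__ == "__main__":
    # Two separate (invented) surveys, tabulated with the same classification.
    motorbike_owners = [19, 23, 31, 52]
    car_owners = [25, 26, 45, 46, 70]
    print("Motorbike owner:", tabulate(motorbike_owners))
    print("Car owner:      ", tabulate(car_owners))
```

Because both tables use the same groups, a user can compare them row by row even though the underlying surveys cannot be cross-tabulated.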
28
Consistent structural metadata in the Danish model
Centralized: variables, values, unit, ”by”/”and”, time, template for quality declaration. Decentralized: contents, footnote, contact person, quality declaration. Decentralized metadata is highly standardized through templates, guides and editorial overview.
29
Time dependency – output databases
Definitions of variables may change. Almost all cubes have a time dimension. If a measure changes: a new measure is added. If a dimension changes: new categories are added. A change in a dimension depends on the selection in the time dimension -> many empty cells (region).
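To make the empty-cell effect concrete, the sketch below models a region dimension whose categories are each valid only for part of the time dimension. The category names, the years and the function names are hypothetical; the point is only that a full cross product of region and time contains cells that can never hold data.

```python
# Hypothetical region dimension: each category has a validity period
# (first year, last year). None means the category is still valid.
REGION_VALIDITY = {
    "County A": (2000, 2006),
    "County B": (2000, 2006),
    "Region X": (2007, None),
}


def valid_categories(year):
    """Categories of the region dimension that exist in the given year."""
    return [name for name, (start, end) in REGION_VALIDITY.items()
            if start <= year and (end is None or year <= end)]


def empty_cells(years):
    """(category, year) cells that stay empty in a full cross product."""
    return [(name, year)
            for year in years
            for name in REGION_VALIDITY
            if name not in valid_categories(year)]


if __name__ == "__main__":
    print(valid_categories(2006))
    print(valid_categories(2007))
    print(empty_cells([2006, 2007]))
```

An output database could use such validity periods to offer only the categories that match the selected time period, rather than showing the empty cells.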
30
Metadata … for what?
Metadata play a role when the users browse, search, select, comprehend and compare.
31
Metadata – for whom?
Staff: database administrators, statisticians, developers, managers. Users, internal/external: news media, international organisations, researchers, occasional users.
32
Documentation – metadata principles*
Ensure customers are identified for all metadata processes; make metadata 'active' to the greatest extent possible, also to Google (*); single authoritative source ('registration authority'); reuse metadata; metadata is readily available and usable in the context of the client's information need (*)
33
And now back to work …. card sorting