Published by Megan Cunningham. Modified over 9 years ago.
1
Introduction to metadata management, quality and licensing
PwC firms help organisations and individuals create the value they’re looking for. We’re a network of firms in 158 countries with close to 180,000 people who are committed to delivering quality in assurance, tax and advisory services. Tell us what matters to you and find out more by visiting us at www.pwc.com. PwC refers to the PwC network and/or one or more of its member firms, each of which is a separate legal entity. Please see www.pwc.com/structure for further details.
2
This presentation has been created by PwC.
Authors: Makx Dekkers, Michiel De Keyzer, Nikolaos Loutas and Stijn Goedertier
Presentation metadata
Open Data Support is funded by the European Commission under SMART 2012/0107 ‘Lot 2: Provision of services for the Publication, Access and Reuse of Open Public Data across the European Union, through existing open data portals’ (Contract No. 30-CE-0530965/00-17).
© 2014 European Commission
Disclaimers
1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative.
2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.
Slide 2
3
Learning objectives
By the end of this training module you should have an understanding of:
- What metadata is;
- The terminology and objectives of metadata management;
- The use of controlled vocabularies for metadata;
- The creation and publication of description metadata of datasets on the EU ODP;
- What (open) data quality means;
- The open data quality determinants and criteria;
- Good practices for publishing high-quality (linked) open data;
- The importance of licensing;
- The meaning of licensing in the world of Open Data;
- Reuse principles and conditions for European Commission documents;
- The licensing option for data and metadata published via the EU ODP.
Slide 3
4
Content
This module contains...
- An explanation of what metadata is;
- An outline of how to create and publish metadata on the EU ODP;
- A definition of data quality;
- An overview of the dimensions of data and metadata quality;
- A selection of best practices for publishing good quality data and metadata;
- The importance of licensing;
- Licensing in the Open Data principles;
- Reuse principles for European Commission documents;
- The licensing option for publishing data and metadata via the EU ODP.
Slide 4
Find more on: training.opendatasupport.eu
5
What is metadata? Definition, examples and reusable standards. Slide 5
6
What is metadata?
“Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information.” -- National Information Standards Organization, http://www.niso.org/publications/press/UnderstandingMetadata.pdf
Metadata provides the information needed to make sense of data (e.g. documents, images, datasets), concepts (e.g. classification schemes) and real-world entities (e.g. people, organisations, places, paintings, products).
Slide 6
7
Examples of metadata
- A label provides metadata on a can.
- A catalogue card provides metadata on a book.
- A dataset description (DCAT) provides metadata on a dataset.
Slide 7
8
Example: description of an open dataset with the DCAT-AP
[Figure: a DCAT-AP description in three parts: description of the Catalogue, description of the Dataset, description of the Distribution]
Slide 8
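The three-part structure of such a description (Catalogue, Dataset, Distribution) can be sketched in JSON-LD, one of the RDF serialisations in which DCAT descriptions can be exchanged. This is a minimal sketch using only Python's standard library; all URIs, titles and the selection of properties are illustrative assumptions, not a complete DCAT-AP record.

```python
import json

# Hypothetical URIs: replace them with your own persistent identifiers.
CATALOGUE_URI = "http://example.org/catalogue"
DATASET_URI = "http://example.org/dataset/air-quality-2013"
DISTRIBUTION_URI = "http://example.org/dataset/air-quality-2013/csv"

# Namespace prefixes for the DCAT and Dublin Core terms vocabularies.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def describe():
    """Nest Catalogue -> Dataset -> Distribution descriptions in one JSON-LD document."""
    distribution = {
        "@id": DISTRIBUTION_URI,
        "@type": "dcat:Distribution",
        "dcat:accessURL": {"@id": "http://example.org/files/air-quality-2013.csv"},
        "dcat:mediaType": "text/csv",
    }
    dataset = {
        "@id": DATASET_URI,
        "@type": "dcat:Dataset",
        "dct:title": "Air quality measurements 2013",
        "dct:description": "Hourly measurements from monitoring stations (illustrative).",
        "dcat:distribution": [distribution],
    }
    return {
        "@context": CONTEXT,
        "@id": CATALOGUE_URI,
        "@type": "dcat:Catalog",
        "dct:title": "Example open data catalogue",
        "dcat:dataset": [dataset],
    }


if __name__ == "__main__":
    print(json.dumps(describe(), indent=2))
```

Nesting keeps the example readable; a real catalogue with many datasets might instead list them flat and link them by `@id`.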
9
Reuse existing vocabularies for providing metadata to your datasets
- DCAT application profile for data portals in Europe, http://joinup.ec.europa.eu/asset/dcat_application_profile/description
- Based on DCAT, a W3C Recommendation, http://www.w3.org/TR/vocab-dcat/
- Defines mandatory, recommended and optional classes and properties.
- Recommends a number of controlled vocabularies for assigning values to properties, e.g. EuroVoc for dcat:theme.
- Currently implemented in the context of Open Data Support;
- A number of Member States are considering its adoption;
- The metadata model of the EU ODP will also converge towards it.
Slide 9
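The split into mandatory, recommended and optional properties lends itself to an automatic check before publication. The sketch below assumes a small subset of mandatory properties (title and description for datasets, access URL for distributions) based on DCAT-AP as commonly described; check the current DCAT-AP specification for the authoritative lists. The input is assumed to be a JSON-LD-style Python dict, and the example URIs are hypothetical.

```python
# Assumption: a minimal subset of DCAT-AP mandatory properties. Recommended and
# optional levels could be added as further dicts in the same shape.
MANDATORY = {
    "dcat:Dataset": ("dct:title", "dct:description"),
    "dcat:Distribution": ("dcat:accessURL",),
}


def missing_mandatory(resource):
    """Return mandatory properties that are absent or empty in a JSON-LD-style dict."""
    required = MANDATORY.get(resource.get("@type"), ())
    return [prop for prop in required if not resource.get(prop)]


def check_dataset(dataset):
    """Check a dataset and each of its distributions; map each @id to its problems."""
    report = {}
    for resource in [dataset, *dataset.get("dcat:distribution", [])]:
        missing = missing_mandatory(resource)
        if missing:
            report[resource.get("@id", "<no @id>")] = missing
    return report


example = {
    "@id": "http://example.org/dataset/bus-stops",  # hypothetical URI
    "@type": "dcat:Dataset",
    "dct:title": "Bus stops",
    "dcat:distribution": [{"@id": "http://example.org/dist/1", "@type": "dcat:Distribution"}],
}
print(check_dataset(example))
```

For this example the report flags the missing `dct:description` on the dataset and the missing `dcat:accessURL` on its distribution.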
10
Controlled vocabularies Using thesauri, taxonomies and standardised lists of terms for assigning values to metadata properties. Slide 10
11
What are controlled vocabularies?
A controlled vocabulary is a predefined list of values to be used for a specific property in your metadata schema. In addition to careful design of schemas, the value spaces of metadata properties are important for the exchange of information, and thus for interoperability. Common controlled vocabularies for value spaces make metadata understandable across systems.
Slide 11
12
Which controlled vocabulary to use for which type of property
- Use code lists as controlled vocabulary for free text or “string” properties.
[Figure: example DCAT-AP property; example code list: ObjectInCrimeClass (ListPoint)]
- Use concepts identified by a URI for references to “things”.
[Figure: example DCAT-AP property; example taxonomy with terms having a URI: EuroVoc]
Slide 12
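The difference between the two kinds of value can be illustrated with a small validator: a string property is checked against a code list, while a "thing" is referenced by a concept URI, so the same stored value can be rendered with labels in several languages. Everything here (the status codes, the example.org theme URIs and their labels) is hypothetical; in practice the concept URIs would come from a scheme such as EuroVoc.

```python
# Hypothetical code list for a string-valued property (a status field).
STATUS_CODE_LIST = frozenset({"planned", "published", "withdrawn"})

# Hypothetical concept scheme: each concept is a URI with labels in several languages.
THEME_SCHEME = {
    "http://example.org/theme/environment": {"en": "Environment", "fr": "Environnement", "de": "Umwelt"},
    "http://example.org/theme/transport": {"en": "Transport", "fr": "Transports", "de": "Verkehr"},
}


def validate(record):
    """Report values that fall outside the controlled vocabularies."""
    errors = []
    if record.get("status") not in STATUS_CODE_LIST:
        errors.append(f"status: {record.get('status')!r} is not in the code list")
    for uri in record.get("themes", []):
        if uri not in THEME_SCHEME:
            errors.append(f"theme: {uri} is not a known concept")
    return errors


def theme_labels(record, lang):
    """Render the stored concept URIs as labels in the requested language."""
    return [THEME_SCHEME[uri][lang] for uri in record.get("themes", []) if uri in THEME_SCHEME]


record = {"status": "published", "themes": ["http://example.org/theme/environment"]}
print(validate(record), theme_labels(record, "de"))
```

Storing the URI rather than a label is what makes the multilingual rendering possible: the label is looked up at display time.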
13
Example – Publications Office’s Named Authority Lists
The Named Authority Lists offer reusable controlled vocabularies for:
- Countries
- Corporate bodies
- File types
- Interinstitutional procedures
- Languages
- Multilingual
- Resource types
- Roles
- Treaties
Slide 13
See also: http://publications.europa.eu/mdr/authority/
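Because each entry in a Named Authority List has its own URI, metadata can point at the entry instead of repeating a free-text string such as "English" or "CSV". The sketch below assumes the URI pattern `http://publications.europa.eu/resource/authority/<list>/<CODE>` and the list names `language`, `country` and `file-type`; treat these as assumptions to verify against the lists published by the Publications Office.

```python
# Assumption: NAL entries follow the pattern <base>/<list>/<CODE>; verify list
# names and codes against the Publications Office's published authority lists.
NAL_BASE = "http://publications.europa.eu/resource/authority"


def nal_uri(authority_list, code):
    """Build the URI of an entry in a Named Authority List."""
    return f"{NAL_BASE}/{authority_list}/{code.upper()}"


dataset_values = {
    "dct:language": nal_uri("language", "eng"),  # English
    "dct:spatial": nal_uri("country", "bel"),    # Belgium
}
distribution_values = {
    "dct:format": nal_uri("file-type", "csv"),
}
print(dataset_values, distribution_values)
```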
14
EuroVoc for labelling the themes of datasets
- Managed by the Publications Office
- Thesaurus covering the activities of the EU
- Terms in 23 EU languages
- Users include:
  - the European Parliament
  - the Publications Office
  - national and regional parliaments and governments in Europe
  - private users around the world
Slide 14
See also: http://eurovoc.europa.eu/
15
Creating and publishing description metadata of datasets on the EU ODP Slide 15
16
Metadata management is important
Metadata needs to be managed to ensure...
- Availability: metadata needs to be stored where it can be accessed and indexed so it can be found.
- Quality: metadata needs to be of consistent quality so users know that it can be trusted.
- Persistence: metadata needs to be kept over time.
- Open License: metadata should be available under a public domain license to enable its reuse.
The metadata lifecycle is longer than the data lifecycle:
- Metadata may be created before data is created or captured, e.g. to inform about data that will be available in the future.
- Metadata needs to be kept after data has been removed, e.g. to inform about data that has been decommissioned or withdrawn.
Slide 16
17
Creating and publishing your metadata on the EU ODP
Manually creating your metadata using a spreadsheet template
- Use a spreadsheet template that conforms to the metadata model of the EU ODP to create description metadata for your datasets.
Metadata creation using (semi-)automatic processes
- Develop an exporter that exports the description metadata of your datasets from your database/system in a format that conforms to the requirements of the EU ODP.
- Develop a screen-scraper/harvester that collects the description metadata of your datasets from your portal and transforms it into a format that conforms to the requirements of the EU ODP.
Slide 17
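As a rough illustration of the two routes, the sketch below reads rows from a spreadsheet saved as CSV and exports each row as a DCAT-style dataset description in JSON-LD. The column names, the example.org namespace and the output shape are hypothetical; the actual EU ODP template and its required format are not reproduced here.

```python
import csv
import io
import json

# Hypothetical spreadsheet template saved as CSV; the column names are illustrative.
TEMPLATE_CSV = """identifier,title,description,download_url,media_type
air-2013,Air quality 2013,Hourly air quality measurements,http://example.org/files/air-2013.csv,text/csv
bus-stops,Bus stops,Locations of all bus stops,http://example.org/files/bus-stops.json,application/json
"""

BASE_URI = "http://example.org/dataset/"  # hypothetical namespace for dataset URIs


def row_to_dataset(row):
    """Map one spreadsheet row to a DCAT-style dataset with one distribution."""
    return {
        "@id": BASE_URI + row["identifier"],
        "@type": "dcat:Dataset",
        "dct:title": row["title"],
        "dct:description": row["description"],
        "dcat:distribution": [{
            "@type": "dcat:Distribution",
            "dcat:accessURL": {"@id": row["download_url"]},
            "dcat:mediaType": row["media_type"],
        }],
    }


def export(csv_text):
    """Convert the whole sheet into a single JSON-LD document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        "@context": {"dcat": "http://www.w3.org/ns/dcat#", "dct": "http://purl.org/dc/terms/"},
        "@graph": [row_to_dataset(row) for row in reader],
    }


print(json.dumps(export(TEMPLATE_CSV), indent=2))
```

The same `row_to_dataset` mapping could sit behind a database exporter or a harvester; only the source of the rows changes.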
18
Updating your metadata – planning for change
Metadata operates in a global context that is subject to change!
- Organisation – departments are established, merge with others, responsibilities are handed over.
- Usage of the data – new applications emerge around data.
- Reference data – controlled vocabularies evolve and get linked.
- Data standards and technologies – the technology lifecycle is getting shorter all the time; what will tomorrow’s Web look like?
The description metadata of your datasets on the EU ODP needs to be kept up-to-date to the extent possible, taking into account the available time and budget.
Slide 18
19
Storing your metadata
The description metadata of your datasets to be published on the EU ODP should be stored separately from the data, but should be linked to it. This makes metadata management, including sharing, easier.
Depending on the availability of tools and requirements on performance and capacity, metadata can be stored in a ‘classic’ relational database, a file on a Web location or an RDF triple store.
Slide 19
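For the relational option, a minimal sketch with Python's built-in `sqlite3` module shows the idea of keeping metadata in its own store while linking to the data through a URL column. The table layout, column names and example values are hypothetical.

```python
import sqlite3


def create_store():
    """Create an in-memory metadata store; pass a file path instead for persistence."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE dataset_metadata (
               uri         TEXT PRIMARY KEY,  -- persistent identifier of the dataset
               title       TEXT NOT NULL,
               description TEXT NOT NULL,
               data_url    TEXT NOT NULL,     -- link to where the data itself is stored
               modified    TEXT               -- ISO 8601 date of the last change
           )"""
    )
    return conn


def add(conn, uri, title, description, data_url, modified=None):
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO dataset_metadata VALUES (?, ?, ?, ?, ?)",
            (uri, title, description, data_url, modified),
        )


def find(conn, word):
    """Search the metadata only; the data itself is reached through data_url."""
    cur = conn.execute(
        "SELECT uri, data_url FROM dataset_metadata WHERE title LIKE ?", (f"%{word}%",)
    )
    return cur.fetchall()


conn = create_store()
add(conn, "http://example.org/dataset/air-2013", "Air quality 2013",
    "Hourly air quality measurements", "http://example.org/files/air-2013.csv", "2014-01-15")
print(find(conn, "air"))
```

Because the data is only referenced, the metadata can be searched, shared or kept after the data file is withdrawn.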
20
Conclusions
- Description metadata provides information on your datasets.
- The quality of the description metadata directly affects the discoverability and reuse of your datasets.
- A structured approach should be followed for metadata management.
- The metadata lifecycle extends beyond the lifecycle of datasets (metadata exists before publication and after deletion).
- Homogenised metadata enables the operation of metadata brokers, which can in turn lower the access barriers to your resources, improving their visibility and discoverability and thus increasing their reuse potential.
Slide 20
21
(Meta-)data quality Dimensions, principles, recommendations and best practices for publishing high-quality (meta-)data Slide 21
22
What is data (and metadata) quality?
Data are of high quality "if they are fit for their intended uses in operations, decision making and planning."
Or, more specifically: “High quality data are accurate, available, complete, conformant, consistent, credible, processable, relevant and timely.”
Slide 22
23
Metadata is data about data… “Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data” -- National Information Standards Organization We observe that metadata is a type of data. The same quality considerations apply to data and metadata alike. Slide 23
24
The data quality dimensions What are the main dimensions to be taken into account for delivering good quality (meta)data? Slide 24
25
Data quality dimensions
- Accuracy: Does the data correctly represent the real-world entity or event?
- Availability: Can the data be accessed now and over time?
- Completeness: Does the data include all data items representing the entity or event?
- Conformance: Does the data follow accepted standards?
- Consistency: Is the data free of contradictions?
- Credibility: Is the data based on trustworthy sources?
- Processability: Is the data machine-readable?
- Relevance: Does the data contain the necessary information to support usage and the application?
- Timeliness: Does the data represent the actual situation, and is it published soon enough?
Slide 25
26
Accuracy by example
[Figure: higher accuracy vs. less accuracy: OpenStreetMap, City of Utrecht, The Netherlands (2011 vs. 2007)]
Recommendations:
- Balance the accuracy of your data against the cost in the context of the application; it needs to be good enough for the intended use.
- Make sure that there is organisational commitment and investment in procedures and tools to maintain accuracy.
Slide 26
27
Availability by example
[Figure: high availability vs. less availability]
Recommendations:
- Follow best practices for the assignment and maintenance of URIs.
- Make sure that responsibility for the maintenance of data is clearly assigned in the organisation.
Slide 27
28
Completeness by example
[Figure: high completeness vs. less completeness]
Recommendations:
- Design the capture and publication process to include the necessary data points.
- Monitor the update mechanisms on a continuous basis.
Slide 28
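One simple way to monitor completeness is to measure the share of required data points that are actually filled, both overall and per field. This sketch assumes records are Python dicts and that `None`, empty strings and empty lists count as missing; the field list and records are hypothetical.

```python
def is_filled(value):
    """Treat None, empty strings and empty lists as missing."""
    return value not in (None, "", [])


def completeness(records, fields):
    """Share of (record, field) cells that are filled, from 0.0 to 1.0."""
    cells = len(records) * len(fields)
    if cells == 0:
        return 1.0  # nothing is expected, so nothing is missing
    filled = sum(is_filled(r.get(f)) for r in records for f in fields)
    return filled / cells


def missing_by_field(records, fields):
    """Count, per field, how many records lack a value."""
    return {f: sum(not is_filled(r.get(f)) for r in records) for f in fields}


FIELDS = ["dct:title", "dct:description", "dct:license"]  # hypothetical required data points
records = [
    {"dct:title": "Bus stops", "dct:description": "All stops", "dct:license": "http://example.org/licence"},
    {"dct:title": "Air quality", "dct:description": ""},
]
print(completeness(records, FIELDS), missing_by_field(records, FIELDS))
```

The per-field counts point at which part of the capture process is leaving gaps, which an overall score alone would hide.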
29
Conformance by example
[Figure: high conformance vs. less conformance]
See also: https://joinup.ec.europa.eu/asset/adms_foss/news/just-released-admssw-validator-verify-and-visualise-rdf-software-metadata
Recommendations:
- Apply the most used standards in the domain that is most relevant for the data or metadata.
- Define local vocabularies if no standard is available, but publish your vocabularies according to best practice (e.g. dereferenceable URIs).
Slide 29
30
Consistency by example
[Figure: high consistency vs. less consistency]
Recommendations:
- Process all data before publication to detect conflicting statements and other errors (in particular if data is aggregated from different sources).
Slide 30
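When data is aggregated from different sources, one basic pre-publication check is to look for the same subject and property receiving different values. The sketch assumes each source is a list of (subject, property, value) statements and that the checked properties are single-valued, so differing values count as a conflict; the identifiers are hypothetical.

```python
from collections import defaultdict


def find_conflicts(*sources):
    """Return {(subject, property): values} for every pair that has more than one value.

    Assumes the checked properties are single-valued, so differing values conflict.
    """
    values = defaultdict(set)
    for source in sources:
        for subject, prop, value in source:
            values[(subject, prop)].add(value)
    return {key: vals for key, vals in values.items() if len(vals) > 1}


source_a = [("ds:bus-stops", "dct:issued", "2013-05-01")]
source_b = [
    ("ds:bus-stops", "dct:issued", "2013-06-01"),
    ("ds:bus-stops", "dct:title", "Bus stops"),
]
print(find_conflicts(source_a, source_b))
```

Multi-valued properties (such as keywords) would need to be excluded from this check, since several values there are expected.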
31
Credibility by example
[Figure: high credibility: data coming from the Publications Office of the EU; less credibility: data coming from Lexvo]
Recommendations:
- Base data on sources that can be trusted or on explicit Service Level Agreements where possible and appropriate.
- Make appropriate attributions so that re-users can determine whether or not they can trust the data.
Slide 31
32
Processability by example
[Figure: higher processability vs. less processability]
Recommendations:
- Identify the source of terminology and codes used in the data in a machine-readable manner.
- Apply recommendations for syntax of data given in common standards and application profiles.
Slide 32
33
Relevance by example
[Figure: less relevance vs. high relevance]
Recommendations:
- Match coverage and granularity of data to its intended use within constraints of available time and money.
- However, also consider potential future usages of the data.
Slide 33
34
Timeliness: examples
[Figure: less timeliness vs. high timeliness]
Recommendations:
- Adapt the update frequency of data to the nature of the data and its intended use.
- Make sure that processes and tools are in place to support the updating.
Slide 34
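Tooling to support updating can start with a check that flags datasets whose last modification is older than their declared update frequency allows. The frequency labels and tolerances below are hypothetical; they should be aligned with whatever frequency vocabulary your metadata actually uses.

```python
from datetime import date, timedelta

# Hypothetical frequency labels and tolerances.
MAX_AGE = {
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=31),
    "annual": timedelta(days=366),
}


def is_overdue(last_modified, frequency, today):
    """True if the data is older than its declared update frequency allows."""
    return today - last_modified > MAX_AGE[frequency]


def overdue_datasets(catalogue, today):
    """List the identifiers of datasets whose update is overdue."""
    return [ds_id for ds_id, (modified, freq) in catalogue.items()
            if is_overdue(modified, freq, today)]


catalogue = {
    "air-quality": (date(2014, 1, 1), "monthly"),
    "bus-stops": (date(2013, 9, 1), "annual"),
}
print(overdue_datasets(catalogue, today=date(2014, 3, 1)))
```

Passing `today` explicitly keeps the check deterministic and easy to test; a scheduled job would pass `date.today()`.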
35
Best practices Best practices for publishing high-quality data and metadata. Slide 35
36
Best practices for publishing high-quality data and metadata
- Provide appropriate descriptions of data (i.e. metadata).
- Use standard vocabularies for metadata and data whenever such vocabularies exist.
- Specify the license under which the data may be re-used.
- Adhere to legal requirements concerning protection of personal and other sensitive data.
- Represent metadata and data according to the Linked Data principles using persistent URIs for identifying things.
- Provide information about the source of the data.
Maintenance of metadata and data is critical!
Slide 36
See also: http://www.slideshare.net/OpenDataSupport/introduction-to-metadata-management
37
Conclusions
- The quality of data is determined by its fitness for (re-)use by data consumers.
- Metadata is “data about data”, i.e. metadata is a type of data. The same quality considerations apply to data and metadata alike.
- Data quality has multiple dimensions and is about more than the correctness of data: accuracy, availability, completeness, conformance, consistency, credibility, processability, relevance, timeliness.
Slide 37
38
The importance of Licensing Slide 38
39
Clear licence information is important because...
- It tells users and reusers exactly what they can do with your data and metadata.
- It encourages the use and reuse of your data and metadata the way you want them to be used and reused.
- It creates visibility of your efforts downstream (if you ask for attribution).
If no explicit licence is provided, a user does not know what can be done with the data/metadata – the default legal position is that nothing can be done without contacting the owner on a case-by-case basis.
Slide 39
40
Licensing is the first star...
- One star: publish data under an open licence
- Two stars: publish in machine-readable format
- Three stars: publish in open format
- Four stars: assign URIs to data
- Five stars: create links to other data
See also: http://www.slideshare.net/OpenDataSupport/introduction-to-linked-data-23402165
Slide 40
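The five-star scheme is cumulative: each star presupposes the ones before it, so data in an open format without an open licence still earns no stars. That rule can be written as a short function; the boolean parameter names are illustrative.

```python
def open_data_stars(open_licence, machine_readable, open_format, uses_uris, links_to_other_data):
    """Count stars, stopping at the first unmet criterion (the scheme is cumulative)."""
    stars = 0
    for criterion in (open_licence, machine_readable, open_format, uses_uris, links_to_other_data):
        if not criterion:
            break
        stars += 1
    return stars


# A CSV file under an open licence, without URIs or links: three stars.
print(open_data_stars(True, True, True, False, False))
```

Because the licence check comes first, a dataset without an open licence scores zero no matter how well it is formatted, which is the point of the slide.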
41
Reuse principles for EC documents Slide 41
42
Commission decision of 12 December 2011 on the reuse of Commission documents (2011/833/EU)
Article 4
Public documents produced by the Commission or by public and private entities on its behalf are available for reuse as a general principle:
(a) for commercial or non-commercial purposes;
(b) without charge; and
(c) without the need to make an individual application.
Slide 42
43
Commission decision of 12 December 2011 on the reuse of Commission documents (2011/833/EU) Article 5 “The Commission shall set up a data portal as a single point of access to its structured data so as to facilitate linking and reuse for commercial and non-commercial purposes. Commission services will identify and progressively make available suitable data in their possession. The data portal may provide access to data of other Union institutions, bodies, offices and agencies at their request.” Slide 43
44
Commission decision of 12 December 2011 on the reuse of Commission documents (2011/833/EU)
Article 6
Conditions for reuse:
(a) the obligation for the reuser to acknowledge the source of the documents;
(b) the obligation not to distort the original meaning or message of the documents;
(c) the non-liability of the Commission for any consequence stemming from the reuse.
Slide 44
45
Licensing datasets to be published via the EU ODP Slide 45
46
Licensing datasets
- To attach no restrictions to your data, you need to say so explicitly.
- Every dataset should have a licence associated with it.
  - Without an explicit licence, reuse is restricted.
- The objective is to make data(sets) as openly available as possible, within the boundaries of the law.
Slide 46
But which licensing option applies to datasets published via the EU ODP?
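Checking that every dataset carries an explicit licence is easy to automate. In the W3C DCAT vocabulary the licence (`dct:license`) is a property of each distribution, so this sketch flags every distribution that lacks one. The input shape (JSON-LD-style dicts) and the licence URI are hypothetical placeholders, not the EU ODP legal notice itself.

```python
def unlicensed(datasets):
    """Return (dataset @id, distribution @id) pairs that lack an explicit dct:license."""
    problems = []
    for ds in datasets:
        for dist in ds.get("dcat:distribution", []):
            if not dist.get("dct:license"):
                problems.append((ds.get("@id"), dist.get("@id")))
    return problems


datasets = [
    {
        "@id": "http://example.org/dataset/bus-stops",
        "dcat:distribution": [
            {"@id": "http://example.org/dist/bus-csv",
             "dct:license": {"@id": "http://example.org/licence/open"}},  # hypothetical licence URI
            {"@id": "http://example.org/dist/bus-json"},  # no licence: reuse is unclear
        ],
    },
]
print(unlicensed(datasets))
```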
47
Licensing option for datasets published via the EU ODP
Datasets published via the EU ODP are subject to the legal notice establishing that:
1. Reuse is authorised, provided the source is acknowledged;
2. The general principle of reuse can be subject to conditions which may be specified in individual copyright notices;
3. Reuse is not applicable to documents subject to intellectual property rights of third parties.
Slide 47
48
Conclusions
- Data and metadata should be provided with an explicit licence so that reusers know what they may do with them, allowing for maximum interoperability.
  - For datasets published via the EU ODP, the relevant legal notice applies.
And don’t forget... If no explicit licence is provided, a user does not know what (if anything) can be done with the data. No reuse = no social and economic value.
Slide 48
49
Thank you!...and now YOUR questions? Slide 49
50
References
See training.opendatasupport.eu:
- TM 1.4 – Introduction to metadata management
- TM 2.2 – Introduction to Open Data Quality
- TM 2.5 – License your Data and Metadata
Slide 50
51
Further reading – Metadata Management
- Understanding Metadata, NISO. http://www.niso.org/publications/press/UnderstandingMetadata.pdf
- Ben Jareo and Malcolm Saldanha. The value proposition of a metadata driven data governance program. Best Practices Metadata. May 2012. https://community.informatica.com/mpresources/Communities/IW2012/Docs/bos_30.pdf
- John R. Friedrich, II. Metadata Management Best Practices and Lessons Learned. The 10th Annual Wilshire Meta-Data Conference and the 18th Annual DAMA International Symposium. April 2006. http://www.metaintegration.net/Publications/2006-Wilshire-DAMA-MetaIntegrationBestPractices.pdf
Slide 51
52
Further reading – Open Data Quality
- Joshua Tauberer. Open Government Data. http://opengovdata.io/
- Joseph M. Juran and A. Blanton Godfrey. Juran's Quality Handbook.
Slide 52
53
Further reading – (Meta)data Licensing
- N. Korn and C. Oppenheim. Licensing Open Data: A Practical Guide. http://discovery.ac.uk/businesscase/principles/
Slide 53
54
Related initiatives – Metadata Management
- Metadata Management. Trainer screencasts, http://managemetadata.com/screencasts/msa/
- MIT Libraries. Data Management and Publishing. Reasons to Manage and Publish Your Data, http://libraries.mit.edu/guides/subjects/data-management/why.html
- ISA Programme. DCAT Application Profile for European Data Portals, https://joinup.ec.europa.eu/asset/dcat_application_profile/description
- Generating ADMS-based descriptions of assets using Open Refine RDF, https://joinup.ec.europa.eu/asset/adms/document/generate-adms-asset-descriptions-spreadsheet-refine-rdf
- The Dublin Core Metadata Initiative, http://dublincore.org/
Slide 54
55
Related initiatives – Open Data Quality
- Best Practices for Publishing Linked Data. https://dvcs.w3.org/hg/gld/raw-file/default/bp/index.html
- OPQUAST. Open data good practices. http://checklists.opquast.com/en/opendata
Slide 55
56
Related initiatives – (Meta)data Licensing
- Revision of the PSI Directive, http://ec.europa.eu/information_society/policy/psi/revision_directive/index_en.htm
- Europeana Licensing Framework, http://pro.europeana.eu/documents/858566/7f14c82a-f76c-4f4f-b8a7-600d2168a73d
- Creative Commons Licenses, http://creativecommons.org/licenses/
- Open Data Commons – Licenses, http://opendatacommons.org/licenses/
- The European Thematic Network on Legal Aspects of Public Sector Information, http://www.lapsi-project.eu/
- EC ISA Programme, ISA Open Metadata licence v1.1. https://joinup.ec.europa.eu/category/license/isa-open-metadata-license-v11
Slide 56
57
Be part of our team...
Find us on, join us on, follow us or contact us: Open Data Support (http://www.slideshare.net/OpenDataSupport), http://www.opendatasupport.eu, Open Data Support (http://goo.gl/y9ZZI), @OpenDataSupport, contact@opendatasupport.eu
Slide 57