Metadata: why and how for social science Louise Corti Online Resources Day 15 November 2005, London.

What Do Social Researchers Want?
- Discover available datasets (globally, not just in their own country) and related research literature
- Understand in detail the origin, methodology and structure of datasets (social science datasets are modest in size but big in complexity)
- Compare and link data from different sources
- Model the social phenomena underlying the data
- Publish their findings with all the supporting evidence (no 'iceberg' publishing) and reproduce published results
- Connect to other experts and share informal comments and advice
- Enforce confidentiality and intellectual property rights while maintaining accuracy and access to data sources
- ... and more

How?
- through rich and systematic description
- through a language that humans and computers can both understand
- using commonly agreed or mappable vocabularies and standards, which must be flexible and adaptable
In a word: metadata

What are metadata?
Metadata are structured data that describe the characteristics of an object or resource. They share many characteristics with the cataloguing that takes place in libraries, museums and archives. The term "meta" derives from the Greek word denoting a nature of a higher order or more fundamental kind. A metadata record typically consists of a number of pre-defined elements representing specific attributes of a resource, and each element can have one or more values.

[Slide image: Grasshopper]

Metadata schema

Element name   Value
Title          Web UKDA Catalogue
Creator        Louise Corti
Publisher      UK Data Archive
Identifier
Format         text/html
Relation       Data Archive Web site

Each metadata schema will usually have the following characteristics:
- a limited number of elements
- the name of each element
- the meaning of each element
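To make the three schema characteristics concrete, here is a minimal Python sketch, assuming a schema can be modelled as a fixed mapping from element names to their meanings, with each element in a record holding one or more values. `SCHEMA`, `MetadataRecord` and the wording of the element meanings are illustrative assumptions, not part of any standard or of the UKDA catalogue.

```python
from dataclasses import dataclass, field

# Hypothetical schema: a limited number of elements, each with a name and a meaning.
SCHEMA = {
    "Title": "A name given to the resource",
    "Creator": "The entity primarily responsible for making the resource",
    "Publisher": "The entity responsible for making the resource available",
    "Identifier": "An unambiguous reference to the resource",
    "Format": "The file format or medium of the resource",
    "Relation": "A related resource",
}


@dataclass
class MetadataRecord:
    # Element name -> list of one or more values.
    elements: dict = field(default_factory=dict)

    def add(self, name: str, value: str) -> None:
        """Add a value, rejecting names that the schema does not define."""
        if name not in SCHEMA:
            raise KeyError(f"{name!r} is not an element of this schema")
        self.elements.setdefault(name, []).append(value)

    def missing(self) -> list:
        """Return schema elements that still have no value, in schema order."""
        return [name for name in SCHEMA if name not in self.elements]


record = MetadataRecord()
record.add("Title", "Web UKDA Catalogue")
record.add("Creator", "Louise Corti")
record.add("Publisher", "UK Data Archive")
record.add("Format", "text/html")
record.add("Relation", "Data Archive Web site")
print(record.missing())  # ['Identifier']
```

Because the schema fixes the element names, a misspelt or unknown element is caught when the record is built rather than when someone later tries to search it.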

International standards for metadata schema
- ensure that every element of information pertaining to the lifecycle of an object (collection) can be captured:
  - creation, appraisal, accessioning, conservation, preservation, availability and access
- must be dynamic and open to amendment
- aim to be consistent, appropriate and self-explanatory in description
- facilitate the retrieval and exchange of information
- enable the sharing of authority data
- enable the integration of descriptions from different locations into a unified information system

Common metadata schemas
- Dublin Core: the minimum number of elements required to facilitate the discovery of document-like objects in a networked environment (e.g. the Internet). Currently 15:
  - Content: Title, Subject, Description, Source, Language, Relation, Coverage
  - Intellectual Property: Author/Creator, Publisher, Contributor, Rights
  - Electronic/Physical Manifestation: Date, Type, Format, Identifier
- ISAD(G): General International Standard Archival Description
- E-GIF: E-Government Interoperability Framework
- OAIS: Reference Model for an Open Archival Information System
- OAI: Open Archives Initiative Protocol for Metadata Harvesting
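Dublin Core records are often exchanged as XML, for example in the `oai_dc` format used by the OAI Protocol for Metadata Harvesting. The sketch below uses Python's standard `xml.etree.ElementTree` and assumes a record is a dictionary of lower-case element names mapped to lists of values; `to_oai_dc` is a hypothetical helper, and the `xsi:schemaLocation` attribute that a real harvesting response would carry is omitted for brevity.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# The 15 Dublin Core elements, grouped as on the slide.
DC_GROUPS = {
    "content": ["title", "subject", "description", "source", "language", "relation", "coverage"],
    "intellectual property": ["creator", "publisher", "contributor", "rights"],
    "manifestation": ["date", "type", "format", "identifier"],
}
DC_ELEMENTS = {name for group in DC_GROUPS.values() for name in group}


def to_oai_dc(record):
    """Serialise {element: [values]} as an oai_dc XML string."""
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("oai_dc", OAI_DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in record.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for value in values:  # every Dublin Core element is repeatable
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(to_oai_dc({"title": ["Web UKDA Catalogue"],
                 "creator": ["Louise Corti"],
                 "format": ["text/html"]}))
```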

No shortage of statistical metadata standards
- The Common Warehouse Metamodel (CWM) from OMG: data warehousing and business intelligence
- ISO: data elements in a metadata repository
- SDMX: multidimensional data and time series
- IQML, AskXML and Triple-S: questionnaire data
- The Data Documentation Initiative (DDI): a general metadata standard for statistical data (micro as well as aggregated)
- and many other related standards
e-Social Science requires more than simple "data" metadata:
- thesauri, classifications

Encoding schemes
- HTML (HyperText Markup Language, used in Web pages; version 3.2 or 4.0)
- SGML (Standard Generalized Markup Language)
- XML (eXtensible Markup Language)
- RDF (Resource Description Framework)
- MARC (MAchine-Readable Cataloging)
- MIME (Multipurpose Internet Mail Extensions)
- Z39.50 (protocol for distributed information retrieval)
- LDAP (Lightweight Directory Access Protocol)
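To show that one description can move between encodings, this sketch writes Dublin Core elements as RDF in N-Triples, a line-based RDF syntax with one subject-predicate-object statement per line. The subject URI `http://example.org/ukda-catalogue` and the helper names are hypothetical; a production system would more likely rely on an RDF library than on hand-written serialisation.

```python
DC = "http://purl.org/dc/elements/1.1/"


def literal(text):
    """Quote a plain N-Triples literal, escaping backslashes, quotes and line breaks."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(subject_uri, record):
    """Emit one triple per (element, value): <subject> <dc:element> "value" ."""
    lines = []
    for name, values in record.items():
        for value in values:
            lines.append(f"<{subject_uri}> <{DC}{name}> {literal(value)} .")
    return "\n".join(lines)


print(to_ntriples("http://example.org/ukda-catalogue",
                  {"title": ["Web UKDA Catalogue"], "publisher": ["UK Data Archive"]}))
```

The element names are the same as in an XML or HTML encoding; only the syntax changes, which is what makes mappable vocabularies useful across schemes.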

Examples of deploying metadata for a simple web resource
- embedded in the Web page by its creator, using META tags in the page's HTML
- as a separate document (e.g. XML) linked to the web resource it describes
- in a database linked to the web resource; the records may have been created directly within the database or extracted from another source, such as Web pages
But what about complex social science data?
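The first and third options can be sketched together: embed Dublin Core elements in a page head as META tags, then harvest them back out of the HTML, as a tool feeding a database might. This assumes the `schema.DC` link and `DC.element` naming convention for Dublin Core in HTML (described in RFC 2731); `render_meta_tags` and `DCMetaHarvester` are hypothetical names, built only on Python's standard `html` and `html.parser` modules.

```python
from html import escape
from html.parser import HTMLParser


def render_meta_tags(record):
    """Embed {element: [values]} in a page head using the DC.* naming convention."""
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for name, values in record.items():
        for value in values:
            lines.append(f'<meta name="DC.{escape(name)}" content="{escape(value)}">')
    return "\n".join(lines)


class DCMetaHarvester(HTMLParser):
    """Collect DC.* META tags from a page, e.g. before loading them into a database."""

    def __init__(self):
        super().__init__()
        self.record = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        if name.startswith("DC.") and content is not None:
            self.record.setdefault(name[3:], []).append(content)


head = render_meta_tags({"title": ["Web UKDA Catalogue"], "format": ["text/html"]})
page = f"<html><head>{head}</head><body></body></html>"
harvester = DCMetaHarvester()
harvester.feed(page)
print(harvester.record)  # {'title': ['Web UKDA Catalogue'], 'format': ['text/html']}
```

This works for a simple web page because a handful of flat elements is enough; a complex social science dataset needs far more structure than a page head can carry.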

Stepping back: the Standard Study Description
- devised in the 1970s to describe academically created sociological/political science datasets
- recommended key bibliographic elements
- informally 'adopted' by CESSDA in the 1980s
- often adapted to suit local needs

The Standard Study Description: recommended elements
- subject category
- title
- depositor
- principal investigator
- abstract and main topics
- kind of data
- dimensions of dataset
- universe sampled
- sampling procedures
- method of data collection
- dates of coverage, fieldwork and deposit
- availability and access conditions
- references to reports and related datasets
Controlled vocabulary adopted for some elements:
- e.g. sampling, kind of data
- subject and geographical keywords from a broad social science thesaurus (HASSET)
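Controlled vocabulary is what lets an element such as "kind of data" be searched reliably across studies. A minimal sketch of checking a study description against it, assuming descriptions are flat dictionaries keyed by the element names above; the term lists in `CONTROLLED` are invented for illustration and are much smaller than any real vocabulary such as HASSET.

```python
# Element names from the Standard Study Description slide.
SSD_ELEMENTS = {
    "subject category", "title", "depositor", "principal investigator",
    "abstract and main topics", "kind of data", "dimensions of dataset",
    "universe sampled", "sampling procedures", "method of data collection",
    "dates of coverage, fieldwork and deposit",
    "availability and access conditions",
    "references to reports and related datasets",
}

# Hypothetical term lists; the real controlled vocabularies are larger.
CONTROLLED = {
    "kind of data": {"numeric", "text", "still images", "audio"},
    "sampling procedures": {"simple random sample", "stratified sample",
                            "quota sample", "total universe"},
}


def check_study(study):
    """Return a list of problems; an empty list means the description passes."""
    problems = [f"unknown element: {name}" for name in study if name not in SSD_ELEMENTS]
    for name, terms in CONTROLLED.items():
        value = study.get(name)
        if value is not None and value not in terms:
            problems.append(f"{name}: {value!r} is not a controlled term")
    return problems


study = {"title": "Example survey", "kind of data": "numeric",
         "sampling procedures": "random-ish"}
print(check_study(study))  # ["sampling procedures: 'random-ish' is not a controlled term"]
```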

The first step towards interoperability
- driven by the need to search across European data archive holdings
- development of a core element set for the Integrated Data Catalogue (IDC)
- catalogue records marked with standard tags for inclusion in WAIS (Wide Area Information Servers) indexes
- enabled multi-site searching via the WAIS protocol
- simplistic, and excluded links to additional metadata, documentation, thesaurus help and browsing
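The core-element-set idea can be illustrated without WAIS: each archive keeps its own field names, a mapping translates them to the shared core elements, and one query then runs across every catalogue. This is an in-memory stand-in, not the WAIS protocol; the archives, field names, mappings and records are all hypothetical.

```python
# Hypothetical catalogues: each archive uses its own field names.
ARCHIVES = {
    "Archive A": {
        "mapping": {"ttl": "title", "pi": "creator", "kw": "keywords"},
        "records": [
            {"ttl": "Health survey 1998", "pi": "Example PI", "kw": "health; housing"},
            {"ttl": "Election study 1997", "pi": "Example PI", "kw": "voting"},
        ],
    },
    "Archive B": {
        "mapping": {"titel": "title", "autor": "creator", "schlagwort": "keywords"},
        "records": [
            {"titel": "Housing panel 1999", "autor": "Example PI",
             "schlagwort": "housing; income"},
        ],
    },
}


def to_core(record, mapping):
    """Rename local fields to the shared core element set; drop anything unmapped."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def search_all(term, element="keywords"):
    """Search one core element across every archive's catalogue."""
    hits = []
    for archive, catalogue in ARCHIVES.items():
        for local in catalogue["records"]:
            core = to_core(local, catalogue["mapping"])
            if term.lower() in core.get(element, "").lower():
                hits.append((archive, core["title"]))
    return hits


print(search_all("housing"))
# [('Archive A', 'Health survey 1998'), ('Archive B', 'Housing panel 1999')]
```

Anything outside the core set is dropped by `to_core`, which mirrors the limitation noted on the slide: richer metadata and documentation were not reachable through the shared catalogue.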

The DDI
- widely adopted by social science data archives all over the world, which provide many of the datasets used by social scientists for secondary analysis
- initiated and organised by the Inter-University Consortium for Political and Social Research (USA) in 1995 to create a metadata standard for the social science community
- members come from social science data archives and libraries in the USA, Canada and Europe, and from major producers of statistical data
- first in SGML, then in XML
- DDI 1.0 published in …; currently at version 2; version 3 is being designed and is scheduled for 2006

The Structure of a DDI Codebook
- Document Description: description of the codebook document itself (author, sources, etc.)
- Study Description: information about the entire study or data collection (content, collection methods, processing, sources, access conditions, etc.)
- File Description: description of each single file of the data collection (formats, dimensions, processing information, etc.)
- Data Description: description of each single variable in a data file (format, variable and value labels, definitions, question texts, imputations, etc.)
- Other Study-related Materials: references to reports and publications and other machine-readable documentation
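The five parts map onto five top-level XML elements in a DDI Codebook (version 2) instance. The sketch below builds an empty skeleton in that order with Python's `xml.etree.ElementTree`; it omits the DDI namespace and most required content, so it is an outline of the structure rather than a schema-valid document. `codebook_skeleton`, `add_title`, "Example Study" and `example.sav` are hypothetical.

```python
import xml.etree.ElementTree as ET

# The five top-level parts of a DDI Codebook (version 2) instance, in schema order.
SECTIONS = {
    "docDscr": "Document Description",
    "stdyDscr": "Study Description",
    "fileDscr": "File Description",
    "dataDscr": "Data Description",
    "otherMat": "Other Study-related Materials",
}


def add_title(parent, text):
    """Add citation/titlStmt/titl under parent."""
    citation = ET.SubElement(parent, "citation")
    ET.SubElement(ET.SubElement(citation, "titlStmt"), "titl").text = text


def codebook_skeleton(study_title, file_names):
    codebook = ET.Element("codeBook")
    add_title(ET.SubElement(codebook, "docDscr"), f"Codebook for {study_title}")
    add_title(ET.SubElement(codebook, "stdyDscr"), study_title)
    for number, name in enumerate(file_names, start=1):
        file_dscr = ET.SubElement(codebook, "fileDscr", ID=f"F{number}")
        ET.SubElement(ET.SubElement(file_dscr, "fileTxt"), "fileName").text = name
    ET.SubElement(codebook, "dataDscr")  # one <var> per variable goes here
    ET.SubElement(codebook, "otherMat")  # related reports and publications
    return codebook


codebook = codebook_skeleton("Example Study", ["example.sav"])
for part in codebook:
    print(part.tag, "-", SECTIONS[part.tag])
```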

Data description: variables
Example variables: Case Number, Sex, Age, Country, Occupation, Question Responses

DDI in XML
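At the variable level, the Data Description holds one `var` element per variable, carrying its label, question text and value labels. A small sketch, assuming DDI Codebook element names (`var`, `labl`, `qstn`/`qstnLit`, `catgry`, `catValu`) and no namespace; the "sex" variable and its question wording are hypothetical, and `describe_variables` is an illustrative helper that reads the metadata back into Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical variable description, using DDI Codebook element names.
DATA_DSCR = """
<dataDscr>
  <var ID="V2" name="sex">
    <labl>Sex of respondent</labl>
    <qstn><qstnLit>Are you male or female?</qstnLit></qstn>
    <catgry><catValu>1</catValu><labl>Male</labl></catgry>
    <catgry><catValu>2</catValu><labl>Female</labl></catgry>
  </var>
</dataDscr>
""".strip()


def describe_variables(xml_text):
    """Pull label, question text and value labels for each variable."""
    variables = {}
    for var in ET.fromstring(xml_text).iter("var"):
        variables[var.get("name")] = {
            "label": var.findtext("labl"),  # direct child only, not category labels
            "question": var.findtext("qstn/qstnLit"),
            "values": {cat.findtext("catValu"): cat.findtext("labl")
                       for cat in var.findall("catgry")},
        }
    return variables


print(describe_variables(DATA_DSCR)["sex"]["values"])  # {'1': 'Male', '2': 'Female'}
```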

Understanding Statistical Metadata
Different approaches to understanding:
- What is it for? Statistical metadata has no value in itself; it is just a means to an end. Its progress should be measured by the extent to which it facilitates social research.
- What is it like? Is there anything familiar we can relate it to? A form of communication might be a good choice.

Benefits
- interoperability: homogeneous, exchangeable documents
- richer content: a comprehensive set of elements that gives the potential data analyst broader knowledge
- single document, multiple purposes: repurposed for different needs and applications (preservation, discovery and dissemination)
- online subsetting and analysis: a standard, uniform structure and content for variables ensures easy import into online analysis systems
- precision in searching: field-specific searches across documents are enabled
- and more: human-readable and computer-actionable; an essential foundation for e-Science and the Grid
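The precision benefit comes from searching a named field instead of the whole text. A minimal comparison, assuming study descriptions are dictionaries of named fields; the two studies and their producers are invented for illustration.

```python
# Hypothetical study descriptions with named fields.
STUDIES = [
    {"title": "Essex household survey", "producer": "Example Institute, London",
     "abstract": "Living conditions of households in Essex."},
    {"title": "National time use study", "producer": "Example Institute, Essex",
     "abstract": "Time use diaries from a national sample."},
]


def full_text_search(term):
    """Match the term anywhere in a description."""
    term = term.lower()
    return [s["title"] for s in STUDIES if any(term in value.lower() for value in s.values())]


def field_search(field, term):
    """Match the term only within one named field."""
    term = term.lower()
    return [s["title"] for s in STUDIES if term in s.get(field, "").lower()]


print(full_text_search("Essex"))          # ['Essex household survey', 'National time use study']
print(field_search("producer", "Essex"))  # ['National time use study']
```

Full-text search cannot tell a study *about* Essex from one *produced* in Essex; a uniform element structure lets the query say which one is meant.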

EU MADIERA Portal
[Screenshots: meta(data) browsing; search; multilingual browsing]

Summary: the DDI
- The DDI can serve as the foundation for the content, distribution, use and preservation of data collections in the social and behavioural sciences, across institutions, countries and disciplines.
- With cooperation from both data producers and statistical software manufacturers, the DDI specification can readily become the basis for the entire research process, from the generation of a data collection instrument to the production of research articles.
- It serves the social science community well with a specification that produces quality metadata with multiple purposes. It fully documents the details of datasets, it is user-friendly and accessible, it integrates into the infrastructure of the Web, and it supports the automatic generation of statistical software system files.
- Widespread adoption of the DDI will vastly improve access to a range of varied datasets. Expanded use will greatly enhance comparative research; the ability to harmonize datasets over time and geography will lead to a significant improvement in our understanding of societies.
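As an illustration of generating statistical software files from the metadata, this sketch turns variable descriptions (of the kind held in a DDI Data Description) into SPSS `VARIABLE LABELS` and `VALUE LABELS` commands. It assumes numeric category codes and a simple dictionary form for each variable; `spss_label_syntax`, `quote` and the example variables are hypothetical, and a real converter would also need to handle formats, missing values and string variables.

```python
def quote(text):
    """SPSS string literal: wrap in single quotes, doubling embedded ones."""
    return "'" + text.replace("'", "''") + "'"


def spss_label_syntax(variables):
    """Build VARIABLE LABELS and VALUE LABELS commands from variable metadata."""
    specs = [f"{v['name']} {quote(v['label'])}" for v in variables]
    commands = ["VARIABLE LABELS " + "\n  /".join(specs) + "."]
    coded = [v for v in variables if v.get("categories")]
    if coded:  # VALUE LABELS only for variables with coded categories
        specs = [v["name"] + " " + " ".join(f"{code} {quote(label)}"
                                            for code, label in v["categories"].items())
                 for v in coded]
        commands.append("VALUE LABELS " + "\n  /".join(specs) + ".")
    return "\n".join(commands)


variables = [
    {"name": "sex", "label": "Sex of respondent", "categories": {1: "Male", 2: "Female"}},
    {"name": "age", "label": "Age in years"},
]
print(spss_label_syntax(variables))
```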

The future
Statistical metadata is here and is already changing the way people locate and make sense of data, but it does not yet support most use cases of interest to social scientists. To move forward we will need:
- Grammar: a standard semantic infrastructure (e.g. as provided by the Semantic Web):
  - semantic extensibility
  - the ability to integrate (merge and override) descriptions from different sources
- a large Vocabulary, by integrating different flavours of metadata:
  - unique identifiers for data and research literature
  - statistical data metadata (full life cycle)
  - ontologies, thesauri and classifications (and mappings among them)
  - statistical processing metadata
  - "secondary metadata": annotations, quality assessment, links to research literature
  - experts metadata (FOAF)
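Merging and overriding descriptions can be sketched with one simple policy: sources are applied in order, a later source overrides an earlier one element by element, and `None` means "no statement". The sketch also records which source supplied each value. The policy, the function name and both example sources are assumptions for illustration; a Semantic Web approach would typically keep all statements and resolve conflicts with richer rules.

```python
from typing import Dict, List, Optional, Tuple


def merge_descriptions(sources: List[Tuple[str, Dict[str, Optional[str]]]]):
    """Merge descriptions in order; later sources override earlier ones.

    None means "no statement" and never overrides. Returns the merged
    description and, for each element, the name of the source that supplied it.
    """
    merged, provenance = {}, {}
    for source_name, description in sources:
        for element, value in description.items():
            if value is None:
                continue
            merged[element] = value
            provenance[element] = source_name
    return merged, provenance


archive = {"title": "Example survey", "keywords": "health", "quality": None}
annotations = {"keywords": "health; housing", "quality": "weights revised"}
merged, provenance = merge_descriptions([("archive catalogue", archive),
                                         ("user annotations", annotations)])
print(merged)
print(provenance)
```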

Not Even Half Way There...
[Diagram: DDI standard; TEI for QD; RDF; Semantic Web; Nesstar (Data Web); ELSST; Integrated Data Catalogue; USI; cooperative markup; annotations; comparable variables; unified authentication; mappings; references extraction]
Future developments:
- progress in metadata and technical standardisation
- latent knowledge capture and extraction
- Grid

Qualitative data and the DDI
- in October 2001 ESDS Qualidata formally adopted the DDI to describe data
- in 2000, began to explore standards for archiving and web representation of qualitative data
- expertise from the text processing/arts and humanities communities: TEI
- ESDS Qualidata Online shows the basic potential of what can be achieved by a common standard
- need to catch up with the statistical community!
- working model that will be presented today
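For qualitative data, TEI supplies markup for the text itself; for instance, spoken turns in an interview transcript can be encoded as `u` (utterance) elements with a `who` attribute. This is a TEI-flavoured sketch only: it omits the TEI namespace and required header parts, so it is not schema-valid TEI. `transcript_to_tei` and the interview content are hypothetical.

```python
import xml.etree.ElementTree as ET


def transcript_to_tei(title, turns):
    """Wrap (speaker, words) turns as TEI-style <u> utterances."""
    tei = ET.Element("TEI")
    file_desc = ET.SubElement(ET.SubElement(tei, "teiHeader"), "fileDesc")
    ET.SubElement(ET.SubElement(file_desc, "titleStmt"), "title").text = title
    body = ET.SubElement(ET.SubElement(tei, "text"), "body")
    for speaker, words in turns:
        ET.SubElement(body, "u", who=f"#{speaker}").text = words
    return ET.tostring(tei, encoding="unicode")


print(transcript_to_tei("Example interview",
                        [("int", "How long have you lived here?"),
                         ("resp", "About ten years.")]))
```

Study-level description could still come from the DDI, while the TEI-style markup carries the structure of the transcript itself.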