Why metadata is AWESOME! Jon Johnson May 21 2015.

Slides:



Advertisements
Similar presentations
Dr. Leo Obrst MITRE Information Semantics Information Discovery & Understanding Command & Control Center February 6, 2014February 6, 2014February 6, 2014.
Advertisements

Database Planning, Design, and Administration
Lukas Blunschi Claudio Jossen Donald Kossmann Magdalini Mori Kurt Stockinger.
DDI3 Uniform Resource Names: Locating and Providing the Related DDI3 Objects Part of Session: DDI 3 Tools: Possibilities for Implementers IASSIST Conference,
Why, what were the idea ? 1.Create a data infrastructure, 2.Data + the knowledge products that are produced on the basis of data a) Efficiant access to.
Prentice Hall, Database Systems Week 1 Introduction By Zekrullah Popal.
Analyzing Systems Using Data Dictionaries Systems Analysis and Design, 7e Kendall & Kendall 8 © 2008 Pearson Prentice Hall.
Dr Gordon Russell, Napier University Unit Data Dictionary 1 Data Dictionary Unit 5.3.
Entering A New ERA : The European Research Area Ken Miller UK Data Archive University Of Essex June 11-15, 2002.
DDI 3.0 Conceptual Model Chris Nelson. Why Have a Model Non syntactic representation of the business domain Useful for identifying common constructs –Identification,
The MetaDater Model and the formation of a GRID for the support of social research John Kallas Greek Social Data Bank National Center for Social Research.
Managing the Metadata Lifecycle The Future of DDI at GESIS and ICPSR Peter Granda, ICPSR Meinhard Moschner, GESIS Mary Vardigan, ICPSR Joachim Wackerow,
Reducing Metadata Objects Dan Gillman November 14, 2014.
The education variables in the European Social Survey: Advantages in using the DDI for documentation Hilde Orten and Hege Midtsæter Norwegian Social Science.
System Design/Implementation and Support for Build 2 PDS Management Council Face-to-Face Mountain View, CA Nov 30 - Dec 1, 2011 Sean Hardman.
Vocabulary Services “Huuh - what is it good for…” (in WDTS anyway…) 4 th September 2009 Jonathan Yu CSIRO Land and Water.
Locating objects identified by DDI3 Uniform Resource Names Part of Session: Concurrent B2: Reports and Updates on DDI activities 2nd Annual European DDI.
Database System Development Lifecycle © Pearson Education Limited 1995, 2005.
Data Exchange Tools (DExT) DExT PROJECTAN OPEN EXCHANGE FORMAT FOR DATA enables long-term preservation and re-use of metadata,
Chapter 6 System Engineering - Computer-based system - System engineering process - “Business process” engineering - Product engineering (Source: Pressman,
NSI 1 Collect Process AnalyseDisseminate Survey A Survey B Historically statistical organisations have produced specialised business processes and IT.
Case Studies: Statistics Canada (WP 11) Alice Born Statistics UNECE Workshop on Statistical Metadata.
Controlled Vocabularies (Term Lists). Controlled Vocabs Literally - A list of terms to choose from Aim is to promote the use of common vocabularies so.
Kamaljit Caulton Strategic Research Team Questionnaire Design.
3 rd Annual European DDI Users Group Meeting, 5-6 December 2011 The Ongoing Work for a Technical Vocabulary of DDI and SDMX Terms Marco Pellegrino Eurostat.
IMS Proof of Concept for Data Capture using Metadata Bryan Fitzpatrick Rapanea Consulting Limited June 2014.
DDI-RDF Discovery Vocabulary A Metadata Vocabulary for Documenting Research and Survey Data Linked Data on the Web (LDOW 2013) Thomas Bosch.
9/14/2012ISC329 Isabelle Bichindaritz1 Database System Life Cycle.
Chuck Humphrey Data Library Co-ordinator University of Alberta May 16, Capitalising on Metadata Tool development plans IASSIST 2007.
U.S. Department of the Interior U.S. Geological Survey CDI Webinar Sept. 5, 2012 Kevin T. Gallagher and Linda C. Gundersen September 5, 2012 CDI Science.
CHRIS NELSON METADATA TECHNOLOGY WORK SESSION ON STATISTICAL METADATA GENEVA 6-8 MAY 2013 Designing a Metadata Repository Metadata Technology Ltd.
Metadata and Geographical Information Systems Adrian Moss KINDS project, Manchester Metropolitan University, UK
Data documentation and metadata for data archiving and sharing Managing research data well workshop London, 30 June 2009 Manchester, 1 July 2009.
DDI-RDF Leveraging the DDI Model for the Linked Data Web.
1/26/2004TCSS545A Isabelle Bichindaritz1 Database Management Systems Design Methodology.
Ocean Observatories Initiative Data Management (DM) Subsystem Overview Michael Meisinger September 29, 2009.
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall Analyzing Systems Using Data Dictionaries Systems Analysis and Design, 8e Kendall.
SKOS. Ontologies Metadata –Resources marked-up with descriptions of their content. No good unless everyone speaks the same language; Terminologies –Provide.
DDI and the Lifecycle of Longitudinal Surveys Larry Hoyle, IPSR, Univ. of Kansas Joachim Wackerow, GESIS - Leibniz Institute for the Social Sciences.
1 Technical & Business Writing (ENG-715) Muhammad Bilal Bashir UIIT, Rawalpindi.
Eurostat SDMX and Global Standardisation Marco Pellegrino Eurostat, Statistical Office of the European Union Bangkok,
Digital Libraries1 David Rashty. Digital Libraries2 “A library is an arsenal of liberty” Anonymous.
Eurostat 4. SDMX: Main objects for data exchange 1 Raynald Palmieri Eurostat Unit B5: “Central data and metadata services” SDMX Basics course, October.
SDMX IT Tools Introduction
Metadata By N.Gopinath AP/CSE Metadata and it’s role in the lifecycle. The collection, maintenance, and deployment of metadata Metadata and tool integration.
2.An overview of SDMX (What is SDMX? Part I) 1 Edward Cook Eurostat Unit B5: “Central data and metadata services” SDMX Basics course, October 2015.
MetaPlus Klas Blomqvist Statistics Sweden Research and Development – Central Methods
The Role of International Standards for National Statistical Offices Andrew Hancock Statistics New Zealand Prepared for 2013 Meeting of the UN Expert Group.
Knowledge Modeling and Discovery. About Thetus Thetus develops knowledge modeling and discovery infrastructure software for customers who: Have high-value.
COMMON COMMUNICATION FORMAT (CCF). Dr.S. Surdarshan Rao Professor Dept. of Library & Information Science Osmania University Hyderbad
1 Open Ontology Repository initiative - Planning Meeting - Thu Co-conveners: PeterYim, LeoObrst & MikeDean ref.:
Achieving Semantic Interoperability at the World Bank Designing the Information Architecture and Programmatically Processing Information Denise Bedford.
The Application of Semantic Technologies to Scientific Archives J. Steven Hughes Daniel J. Crichton J. Steven Hughes Daniel J. Crichton Science Archives.
TIC Updates EDDI 2010 Wendy Thomas – 6 Dec Schedule and Process Changes Production schedule is moving to: – Summer / Winter release schedule January.
1 The Metadata Groups - Keith G Jeffery. 2 Positioning  Raise profile of metadata  Data first  Also software, resources, users  Achieve outputs/outcomes.
26/02/ WSMO – UDDI Semantics Review Taxonomies and Value Sets Discussion Paper Max Voskob – February 2004 UDDI Spec TC V4 Requirements.
Informatics for Scientific Data Bio-informatics and Medical Informatics Week 9 Lecture notes INF 380E: Perspectives on Information.
>> Metadata What is it, and what could it be? EU Twinning Project Activity E.2 26 May 2013.
Metadata standards Using DDI to Inform, Organize, and Drive Survey Data Production.
M&CML: A Monitoring & Control Specification Modeling Language
Process 4 Hours.
knowledge organization for a food secure world
Data Management: Documentation & Metadata

ece 627 intelligent web: ontology and beyond
2. An overview of SDMX (What is SDMX? Part I)
2. An overview of SDMX (What is SDMX? Part I)
Statistical Information Technology
in the data production process
Australian and New Zealand Metadata Working Group
Presentation transcript:

Why metadata is AWESOME! Jon Johnson May

Overview A short digression on metadata Questionnaire inputs and outputs Metadata standards Implementing processes in a complex environment CLOSER Search Platform Development The integration of metadata and data management New research possibilities

Discovery & Classification

Structure, navigation and meaning + Tables of contents + Indexes + Glossaries + References + Citations + Keywords

Describing the complex

Metadata, semantics and ontology Metadata is a mechanism for expressing the ‘semantics’ of information, as a means to facilitate information seeking, retrieval, understanding, and use. But meaning is as a ‘locally constructed’ artefact, [..], so that some form of agreement is required to maintain a common space of understanding. In consequence, metadata languages require shared representations of knowledge as the basic vocabulary from which metadata statements can be asserted. Scilia, M. (2006) Metadata, semantics, and ontology: providing meaning to information resources Int. J. Metadata, Semantics and Ontologies, 1(1) p83

Metadata, semantics and ontology Ontology as considered in modern knowledge engineering is intended to convey that kind of shared understanding. In consequence, ontology along with (carefully designed) metadata languages can be considered as the foundation for a new landscape of information management. Scilia, M. (2006) Metadata, semantics, and ontology: providing meaning to information resources Int. J. Metadata, Semantics and Ontologies, 1(1) p83

Lets start with the survey

What are survey questions trying to achieve Accurate Communication & Accurate Response Most important considerations are: Language used Frame of reference Arrangement of questions Length of the questionnaire Form of the response Dichotomous Multiple choice Check lists Open Ended Pictorial From Young, Pauline (1956) “Scientific Social Surveys & Research”, 3 rd Edition. Prentice Hall

What are we trying to capture How the survey was communicated & how participants responded Most important considerations are: Language used in the questions Frame of reference Arrangement of questions Length of the questionnaire Form of the response Dichotomous Multiple choice Check lists Open Ended Pictorial Who was asked Who responded Is the question asked related to another question Who was responsible for the collection

A Common mechanism for communication Capture what was intended What, where it came from and why Capture exactly what was used in the survey implementation How, the logic employed and under what conditions To specify what the data output will be That is mirrors what was captured and its source To keep the connection between the survey implementation through to the data received -> data management at CLS -> to the archive Generalised solution So that is can be actioned efficiently and is self-describing So that it can be rendered in different forms for different purposes

A Framework to work within

Current Longitudinal Survey Landscape

Barriers to sharing data and metadata Different agencies and clients have different systems Taking over a survey from another agency often requires re-inputting everything Questionnaire specification quality and format differences Different clients have different requirements Barriers are also internal within organisations Different disciplines have different attitudes to what is most important Different departments speak different languages Communication is always an issue Manual processes reduce transparency within and between organisations Survey Metadata: Barriers and Opportunities” Meeting June 26, 2014, London sought to address some of these issues

Adopting the standard o The scale and complexity of the CAI instruments is a significant barrier to making the survey collection transparent and comprehensible to survey managers, researchers and analysts and for its subsequent data management o CLS view the capturing of the implementation of the CAI.. in a standardised manner, to allow for version changes … during survey development and for later usage in data management and discovery as key output o Survey contractors will be required to provide as a minimum a DDI-L XML compliant file of the CAI instruments within four months after the start of fieldwork …. and a mapping between survey questions and data outputs o.. work with contractors to produce a ‘human readable version’ to improve usability of the questionnaire for end users

Learn DDI-L in 60 seconds

Study Concepts Concepts measures Survey Instruments using Questions made up of Universes about Copyright © GESIS – Leibniz Institute for the Social Sciences, 2010 Published under Creative Commons Attribute-ShareAlike 3.0 Unported

Questions Responses collect resulting in Data Files Variables made up of Categories/ Codes, Numbers with values of Copyright © GESIS – Leibniz Institute for the Social Sciences, 2010 Published under Creative Commons Attribute-ShareAlike 3.0 Unported

THAT’S PRETTY MUCH IT!

CLOSER Metadata Search Platform

Building the Repository Questionnaires from Studies Metadata extracted from data by studies Mapping by studies Correspondences by studies and CLOSER Reuse UKDA metadata and existing sources e.g. Life and Understanding Society Metadata Officer and Assistants input and co-ordination Ingest into Repository

Building the Repository Each data collection is treated as a separate entity Data collections are being added in sequentially (birth – latest) Each captured element, variable, question, instrument, study has its own persistent identifier Relationships are maintained by internal references Each study has its own identity

Building the Repository Group studies by owner Connections between studies can be established to an item level Provenance is ‘built in’

Building the Repository Some things are maintained centrally Topics Health, etc Life Stage(s) Reference data Occupation coding Geography Schemes Canonical instruments GHQ SDQ Rutter

Building the Repository Ownership is returned to the studies Control by studies of what is pushed to the centre? Long term maintenance and management planning Resourcing Training Capacity planning New Studies can be brought in

Metadata management -> Data Management All objects in a DDI have a URN. These are intended to serve as persistent, location- independent identifiers, allowing the simple mapping of namespaces into a single URN namespace. The existence of such a URI does not imply availability of the identified resource, but such URIs are required to remain globally unique and persistent, even when the resource ceases to exist or becomes unavailable urn:ddi:DDIAgencyID:BaseID:Version e.g. urn:ddi:uk.closer:thingamajig:1.0.0

Information, Meaning and Relationships ETHNIC White / Black / Asian / Other Universe ETHNICE == respondents England ETHNICN == respondents (N Ireland) Concept “self defined ethnic identity” Based on 2000 ONS self defined ethnic identity Equal to 2010 ONS self defined ethnic identity Comparison ETHNICE (3) == ETHNICN (2) Agency uk.ons:ethnic2000:1.0 = ETHNIC 2000 uk.ons.ethnic = ETHNIC 2010 White 1. English/Welsh/Scottish/Northern Irish/British 2. Irish 3. Gypsy or Irish Traveller 4. Any other White background, please describe Mixed/Multiple ethnic groups 5. White and Black Caribbean 6. White and Black African 7. White and Asian 8. Any other Mixed/Multiple ethnic background, please describe Asian/Asian British 9. Indian 10. Pakistani 11. Bangladeshi 12. Chinese 13. Any other Asian background, please describe Black/ African/Caribbean/Black British 14. African 15. Caribbean 16. Any other Black/African/Caribbean background, please describe Other ethnic group 17. Arab 18. Any other ethnic group, please describe

Question and variable organisation

Code List Mapping

Metadata Code generation Use Cases Harmonisation Common code base from same metadata Platform independence Reproducibility of outputs

Understanding change ICD9 to ICD10 Is There a One-to-One Match Between ICD-9-CM and ICD-10? No, there is not a one-to-one match between ICD-9-CM and ICD-10, for which there are a variety of reasons including: ➤ There are new concepts in ICD-10 that are not present in ICD-9-CM; ➤ For a small number of codes, there is no matching code in the GEMs; ➤ There may be multiple ICD-9-CM codes for a single ICD-10 code; and ➤ There may be multiple ICD-10 codes for a single ICD-9-CM code. Comparison mapping between different codes (“things that mean something”) Concepts e.g. laterality in ICD10 Processing instructions leverage meaning and concepts

Tracking Version ICD-10-CM to ICD-9-CM GEM entry for “Toxic effect of lead, cause undetermined”

Metadata Driven Pipeline

Let’s DISCO PREFIX disco: PREFIX rdfs: PREFIX dcterms: PREFIX skos: SELECT COUNT(?universe) AS ?no ?universeDefinition WHERE { ?universe a disco:Universe. ?universe skos:definition ?universeDefinition. FILTER(langMatches(lang(?universeDefinition ), "EN")) FILTER regex(?universeDefinition, "SEARCHWORD", "i") } GROUP BY ?universeDefinition ORDER BY DESC(?no) LIMIT 10

Some final thoughts Reduction in manual processes Enables distributed data collection Enables distributed research Increased quality of documentation of data collection Raises visibility of needs Encourages users to better understand the data and the data collection process New tools to think in more interesting ways can be built