IASSIST Conference 2006 – Ann Arbor, May 24-26. Metadata as report and support: A case for distinguishing expected from fielded metadata. Reto Hadorn, SIDOS.


Metadata as report and support
A case for distinguishing expected from fielded metadata
Reto Hadorn, SIDOS, Neuchâtel – Switzerland

Steps
- Two ways of looking at metadata
  - Metadata as reporting about data: information for the data user
  - Metadata as supporting work with data, specifically the work of the data publisher
- Example
  - Comparing expected metadata with fielded metadata (processing)
- Questions

Background: VarInfo
- A prototype for managing metadata, used at SIDOS
- Concepts further developed for the MetaDater project, but not integrated into the final model

Reporting

I – The 'reporting' perspective
- Metadata as a report on data construction...
  - Meaning (wordings)
  - Representativity (collection method)
  - Relevance (indexes)
  - Intention (concepts and hypotheses)
- ...published to meet the needs of data users
  - Publication: one dataset with the matching metadata
- Characteristics of those metadata
  - Static – final state, even if there are successive versions
  - Selective – only published data are documented
  - 'Passive' – they don't work for you, they just describe data

Once upon a time... the life cycle stance
- Need to simplify the presentation of the DDI model, which grows more and more complex
- Observation: not all metadata are needed at every stage of the data definition, collection, processing and analysis processes
- Response: split the model up into modules
  - Study, data collection, logical product, physical data product, physical instance, archive...
  - Phase in process and/or levels of information

Life cycle report

The life cycle report: take a questionnaire
- Modalities of the report
  - Printout of the questionnaire
  - File (PDF or text editor)
  - Object in the DDI 3 'data collection module'
- Variables appear as part of another object
  - Data definition file (classical)
  - Logical Data Product module in DDI 3
- Questions and variables can be linked
  - Textual or electronic reference
  - The link is descriptive
  - Questions belong to a questionnaire, variables to a data file

Life cycle support

II – The supporting perspective
- The supporting perspective presupposes a life cycle approach
  - No support is needed for a fixed object (data/metadata as they are to be published)
  - Support: various activities must be supported over time
  - Action: there is a 'before' and an 'after'
  - It is a cycle of actions, not only a cycle of states
- Use cases: you need a description of the action to get a model that will really support that action

Excursus: Behind the 'support' idea, a system
- Documenting means reporting on something
  - Only a format is needed (e.g. DDI 2)
- Supporting work means having a system capable of action
  - Store (database)
  - Procedures (application)
  - A data model including elements to control procedures... various states of the data and metadata (not only versions!)
  - A process model defining the steps to go through

Rescuing endangered metadata (a use case)
- Data publishers (archives) often get metadata and data in a poorly coordinated way
  - Some version of a printed questionnaire
  - A data file the primary researcher worked with (constructions, recodes, badly documented variables)
- Primary researchers may get from the data collector a data file which does not match the questionnaire
  - Variations in variable names, codes, variable lists
- Both need a consistent data/metadata set
- Matching information with a pencil-and-paper method can be very time-consuming and leaves nothing behind that is of any further use

Introducing: Expected metadata – The Q/V
- Questions imply a variable definition
  - You ask a question to get a specific kind of measure. The basic metadata unit is not just a question, but a question & variables element
  - Those variable definitions have the status of expectations
  - The link between a question and the expected variables is an organic, not a casual one. Q and expected V's belong together
- The link between the fielded and the expected variables (and hence the questions) has to be assessed
  - Consistent variable names?
  - All expected variables present?
  - Are there additional fielded variables?
- The link between a question and the fielded variables is composed of an organic and an assessed part
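
The three assessment questions on this slide can be sketched in a few lines of Python. This is an illustration only, not the VarInfo implementation: the class names `QV` and `ExpectedVariable`, the function `assess` and the sample questions and variable names are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ExpectedVariable:
    name: str
    # code -> answer category text (hypothetical sample content)
    categories: dict = field(default_factory=dict)


@dataclass
class QV:
    """One Q/V element: a question plus the variables it is expected to yield."""
    question: str
    expected: list


def assess(qv_elements, fielded_names):
    """Compare expected variable names with the names found in the fielded file."""
    expected = {v.name for qv in qv_elements for v in qv.expected}
    fielded = set(fielded_names)
    return {
        "matched": sorted(expected & fielded),      # consistent variable names
        "missing": sorted(expected - fielded),      # expected but not present
        "additional": sorted(fielded - expected),   # fielded but not expected
    }


job = QV(
    question="How satisfied are you with your job?",
    expected=[ExpectedVariable("jobsat", {1: "Very satisfied", 2: "Not satisfied"})],
)
age = QV(question="How old are you?", expected=[ExpectedVariable("age")])

report = assess([job, age], ["jobsat", "weight"])
print(report)
```

The organic link is carried by the `QV` object itself (a question always travels with its expected variables), while the assessed link is only computed once a fielded file is at hand.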

The schema
[Diagram: a question (Q) with its expected variables (V), joined by organic relationships; the expected variables are joined to the fielded variables (V) by assessed relationships.]

Data processing use case: the setting
- Given:
  - System, study, questions & expected variables
  - A semi-documented data file of the SPSS kind, coming from the field
- Metadata construct:
  - Two distinct stores for variable-level metadata
    - Expected metadata, expressed as a question and response categories or another kind of variable definition
    - Fielded metadata, expressed as a file definition
  - Tables establishing correspondence between expected and actual metadata, where a mismatch occurs
    - Establish mediated match
    - Define correction
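
A minimal sketch of this metadata construct, using an in-memory SQLite database. The table and column names (`expected_variable`, `fielded_variable`, `name_correspondence`) and the sample rows are assumptions for illustration; the slides do not describe the actual VarInfo schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Store 1: expected metadata (a question and its variable definition)
CREATE TABLE expected_variable (
    name TEXT PRIMARY KEY,
    question_text TEXT NOT NULL
);
-- Store 2: fielded metadata (the file definition as it came from the field)
CREATE TABLE fielded_variable (
    name TEXT PRIMARY KEY,
    label TEXT
);
-- Correspondence table: one row per mismatch, holding the mediated
-- match and the correction that should be applied
CREATE TABLE name_correspondence (
    fielded_name TEXT NOT NULL REFERENCES fielded_variable(name),
    expected_name TEXT NOT NULL REFERENCES expected_variable(name),
    correction TEXT NOT NULL
);
""")
conn.executemany(
    "INSERT INTO expected_variable VALUES (?, ?)",
    [("jobsat", "How satisfied are you with your job?"), ("age", "How old are you?")],
)
conn.executemany(
    "INSERT INTO fielded_variable VALUES (?, ?)",
    [("q12_sat", "job satisf."), ("age", "age")],
)
conn.execute(
    "INSERT INTO name_correspondence VALUES (?, ?, ?)",
    ("q12_sat", "jobsat", "rename"),
)

# Resolve every fielded variable to its expected variable, either
# directly (same name) or through the correspondence table.
rows = conn.execute("""
    SELECT f.name, COALESCE(c.expected_name, e.name) AS expected_name
    FROM fielded_variable f
    LEFT JOIN name_correspondence c ON c.fielded_name = f.name
    LEFT JOIN expected_variable e ON e.name = f.name
    ORDER BY f.name
""").fetchall()
print(rows)
```

Keeping the two stores separate means that neither the expectation nor the fielded definition is overwritten when a mismatch is resolved; the resolution lives in the correspondence table.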

Data processing: the procedures
- Identify mismatches
  - Variable names (lists of non-matching names)
  - Values of coded variables: lists of non-matching codes; example: list of values in a data file which are not defined in the expected variable definition
- Correct mismatches
  - Variable names
  - Values of coded variables
- Run corrections
  - Procedure depends on the data store used
  - SPSS files: the program computes and executes a syntax file
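
The identify / correct / run sequence might look roughly like the following. The function names and sample data are hypothetical, and the layout of the syntax that VarInfo actually generated is not given in the slides; the sketch only assumes the standard SPSS commands `RENAME VARIABLES`, `RECODE` and `EXECUTE`.

```python
def undefined_codes(observed_values, defined_codes):
    """Values present in the data file but absent from the expected definition."""
    return sorted(set(observed_values) - set(defined_codes))


def build_syntax(renames, recodes):
    """Build SPSS-style correction syntax as a string.

    renames: {fielded_name: expected_name}
    recodes: {variable_name: {old_code: new_code}}, using the corrected names
    """
    lines = []
    # Renames come first so that recodes can refer to the expected names.
    for old, new in renames.items():
        lines.append(f"RENAME VARIABLES ({old} = {new}).")
    for variable, mapping in recodes.items():
        pairs = " ".join(f"({old}={new})" for old, new in mapping.items())
        lines.append(f"RECODE {variable} {pairs}.")
    lines.append("EXECUTE.")
    return "\n".join(lines)


# Step 1: identify codes found in the file that the expectation does not define
bad = undefined_codes([1, 2, 2, 9, 8], {1: "Very satisfied", 2: "Not satisfied"})

# Steps 2 and 3: define the corrections, then compute the syntax to run
syntax = build_syntax({"q12_sat": "jobsat"}, {"jobsat": {9: 8}})
print(bad)
print(syntax)
```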

- Sometimes it is the expectations which have to be amended...
- The same information is used for
  - correction (supporting)
  - documentation of the correction (reporting)
- There is no additional reporting work to do ('documentation')
- Just process; the process will leave a trace ('documentation')
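
To illustrate how one record can serve both purposes, here is a small sketch in which the correction definition drives the recode and also produces the trace. The `Recode` class, the `apply_recodes` function and the sample data are assumptions for the example, not part of the original system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recode:
    variable: str
    old_code: int
    new_code: int
    reason: str


def apply_recodes(rows, recodes):
    """Apply recodes to a list of row dicts; return corrected rows and a trace."""
    corrected = [dict(row) for row in rows]  # leave the input untouched
    trace = []
    for rc in recodes:
        changed = 0
        for row in corrected:
            if row.get(rc.variable) == rc.old_code:
                row[rc.variable] = rc.new_code
                changed += 1
        # The same record that drove the correction also documents it.
        trace.append(
            f"{rc.variable}: {rc.old_code} -> {rc.new_code} "
            f"({changed} cases; {rc.reason})"
        )
    return corrected, trace


data = [{"jobsat": 1}, {"jobsat": 9}, {"jobsat": 9}]
fixes = [Recode("jobsat", 9, 8, "undefined code mapped to 'no answer'")]
corrected, trace = apply_recodes(data, fixes)
print(trace)
```

No separate documentation step is written here: the trace falls out of processing, which is the point made on the slide.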

Expected metadata: Answer categories directly related to variable labels
- The Q/V concept integrates answer categories (questions) and variable labels (variable definitions)
  - Functionally equivalent
  - Only difference: length, because of limited storage for labels
- Answer categories and expected labels:
  - Answer categories should be the labels if they don't exceed the allowed length
  - Either store all short versions, and long versions only if necessary...
  - ...or store answer categories of any length, and additional short versions if the answer category is too long
- Possible action: label any data file with expected labels (instead of « correcting the file »)
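
The second storage option (answer categories of any length, plus a short version only where needed) could be sketched as follows. The limit of 20 characters is a placeholder, since the allowed length depends on the data store, and the names `AnswerCategory` and `value_labels` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

MAX_LABEL_LENGTH = 20  # placeholder; the real limit depends on the data store


@dataclass
class AnswerCategory:
    code: int
    text: str                         # full answer category, any length
    short_text: Optional[str] = None  # only needed when text is too long

    def label(self, max_length=MAX_LABEL_LENGTH):
        """Return the answer category itself if it fits, else its short version."""
        if len(self.text) <= max_length:
            return self.text
        if self.short_text is None or len(self.short_text) > max_length:
            raise ValueError(f"code {self.code}: no short version that fits")
        return self.short_text


def value_labels(categories, max_length=MAX_LABEL_LENGTH):
    """Expected labels for a variable, ready to be applied to any data file."""
    return {c.code: c.label(max_length) for c in categories}


categories = [
    AnswerCategory(1, "Yes"),
    AnswerCategory(2, "No, and I do not intend to in the future", "No, never"),
]
labels = value_labels(categories)
print(labels)
```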

Closing questions
- Shall we stay with reporting metadata, or add supporting metadata?
- Which use cases are central enough?
- Can we, as a small community, manage the way from the format to the system?
- Which organisation, which funding?

Next generation support