Workshop on Metadata Standards and Best Practices November 19-20th, 2007 Session 1 Leveraging Metadata Standards in RDC Pascal Heus Open Data Foundation.

Slides:



Advertisements
Similar presentations
IASSIST 2007 Montreal, May , 2007 Session A2 Open Data and the Common Good Technology Solutions for Difficult Challenges Pascal Heus Open Data Foundation.
Advertisements

Workshop on Metadata Standards and Best Practices November th, 2007 Session 2 Metadata specifications for socio-economic science and supporting initiatives.
Workshop on Metadata Standards and Best Practices November th, 2007 Session 3 Researcher Metadata in RDCs Pascal Heus Open Data Foundation
11th Annual Federal CASIC Workshops Washington, DC, March 6 - 8, 2007 Session WP4 Metadata challenges and solutions for socio-economic data Pascal Heus.
Workshop on Metadata Standards and Best Practices November th, 2007 Session 4 The Data Documentation Initiative Technical Overview Pascal Heus Open.
10th Annual Open Forum for Metadata Registries New York, NY, July 9-11, 2007 Track 3 – Future Directions Metadata challenges and solutions for socio-economic.
3rd International Digital Curation Conference Washington, DC, Dec 2007 Paper Presentations: Interoperability, Metadata & Standards Data Documentation Initiative:
The SDMX Registry Model April 2, 2009 Arofan Gregory Open Data Foundation.
Status on the Mapping of Metadata Standards
ODaF Europe 2008 Colchester, UK, April 14-15, 2008 Metadata in social science and the Open Data Foundation Pascal Heus Open Data Foundation
ICPSR-SRO Shared Data Model Project Mary Vardigan Director, DDI Alliance.
ODaF Europe 2009 Virtual Research and Collaborative Center Pascal Heus, Open Data Foundation Tim Mulcahy, National Opinion Research Center
National Institute of Statistics, Geography and Informatics (INEGI) Implementation of SDMX in Mexico.
Archiving Trevor Croft MICS3 Data Archiving, Dissemination and Further Analysis Workshop Geneva - November 6th, 2006.
International Household Survey Network (IHSN) Microdata Management Toolkit Trevor Croft MICS3 Data Archiving, Dissemination and Further.
MICS4 Survey Design Workshop Multiple Indicator Cluster Surveys Survey Design Workshop Data Archiving.
Chuck Humphrey University of Alberta RDC CFI Projects Building the next generation of metadata tools.
Subject Based Information Gateways in The UK Coordinated Activities in The UK Within the UK Higher Education community, the JISC (Joint Information Systems.
Maines Sustainability Solutions Initiative (SSI) Focuses on research of the coupled dynamics of social- ecological systems (SES) and the translation of.
Near East Plant Protection Network for Regional Cooperation & Knowledge Sharing Food and Agriculture Organization of the United Nations An Overview on.
Introduction to SDMX Seminar Eurostat/ECLAC 02 October 2012 August Götzfried Head of Unit, Eurostat B5 Management of statistical data and metadata.
Enhancing Data Quality of Distributive Trade Statistics Workshop for African countries on the Implementation of International Recommendations for Distributive.
6th MSDI Working Group Meeting
1 Introduction to XML. XML eXtensible implies that users define tag content Markup implies it is a coded document Language implies it is a metalanguage.
Virtual Center for Collaborative Research (ViCtoR) IASSIST 2010 – Session D3: Virtual Research Environments Pascal Heus, Metadata Technology North America.
INDEPTH Network INDEPTH Data Systems Kobus Herbst.
The International Household Survey Network IHSN IHSN Secretariat PARIS21 Steering Committee, 14 November 2007.
The European Statistical System Vision Infrastructure Programme Daniel Defays, Director Directorate B, Eurostat Eurostat Workshop on the Modernisation.
IPUMS to IHSN: Leveraging structured metadata for discovering multi-national census and survey data Wendy L. Thomas 4 th Conference of the European Survey.
Data Documentation Initiative (DDI): Goals and Benefits Mary Vardigan Director, DDI Alliance.
World Bank, Africa Region, Africa Household Survey Databank - The World Bank - Africa.
Summary of workshop Workshop on Writing Metadata for Development Indicators Lusaka, Zambia 30 July – 1 August 2012.
Changing the culture: Ethiopia’s commitment to dissemination and the multi-media approach By Yakob Mudesir Seid
Case Studies: Statistics Canada (WP 11) Alice Born Statistics UNECE Workshop on Statistical Metadata.
Overview of SDMX: Statistical Data and Metadata eXchange Technical and Content Standards for Statistical Data Ann McPhail, Division Chief Statistics Department,
SDMX and DDI Working Together Technical Workshop 5-7 June 2013
Chuck Humphrey Data Library Co-ordinator University of Alberta May 16, Capitalising on Metadata Tool development plans IASSIST 2007.
U.S. Department of the Interior U.S. Geological Survey CDI Webinar Sept. 5, 2012 Kevin T. Gallagher and Linda C. Gundersen September 5, 2012 CDI Science.
IHSN International Household Survey Network Strategy for the Development of Data: Improve the Availability, Accessibility, and Quality of Survey Data Mahesh.
CASE STUDY: STATISTICS NORWAY (SSB) Jenny Linnerud and Anne Gro Hustoft Joint UNECE/Eurostat/OECD work session on statistical metadata (METIS) Luxembourg.
SERPent Project Secure Epidemiology Research Platform January – October 2010 Virtual Research Environment Rapid Innovation Project Funded.
The Brain Project – Building Research Background Part of JISC Virtual Research Environments (Phase 3) Programme Based at Coventry University with Leeds.
IFAP Special Event: Information and Knowledge for All, Emerging Trends and Challenges Information Preservation 4000 Years of Traditions Challenged by Digital.
Documenting and disseminating census and survey data sets Ilpo Survo, United Nations ESCAP, Bangkok, for UNECE.
International Initiatives International Household Survey Network and Accelerated Data Program Olivier Dupriez World Bank / IHSN.
Catawba County Board of Commissioners Retreat June 11, 2007 It is a great time to be an innovator 2007 Technology Strategic Plan *
ADP SUPPORT IN UGANDA BUILDING A NATIONAL DATA ARCHIVE Presented by Kizito Kasozi Director Information Technology Uganda Bureau of Statistics PARIS21.
Secure Epidemiology Research Platform (SERPent) Kick Start Meeting - April 15 th, 2010 Pascal Heus
OVERVIEW OF ARCHIVING OF MICRODATA SILAS M. MULWA Kenya National Bureau of Statistics United Nations Regional Seminar on Census Data Archiving for Africa.
Archiving microdata Standards and good practices United Nations Statistics Commission New York, February 26, 2009 Olivier Dupriez World Bank, Development.
SDMX IT Tools Introduction
1 The NORC Data Enclave for Sensitive Microdata Timothy M. Mulcahy Senior Research Scientist, NORC/University of Chicago,
National Geospatial Enterprise Architecture N S D I National Spatial Data Infrastructure An Architectural Process Overview Presented by Eliot Christian.
1 Joint UNECE/EUROSTAT/OECD METIS Work Session (Geneva, March 2010) The On-Going Review of the SDMX Technical Specifications Marco Pellegrino, Håkan.
ΕΚΤ Access to Knowledge ΕΚΤ Access to Knowledge R&D Statistics Information System: An Interoperability Tail between CERIF and SDMX Dimitris Karaiskos Dimitrios.
Setting Up a National Data Archive: The Ugandan Experience
International Household Survey Network
System Overview Training on the use of the new countrystat
System Overview Training on the use of the new countrystat
Metadata in the modernization of statistical production at Statistics Canada Carmen Greenough June 2, 2014.
Scanning the environment: The global perspective on the integration of non-traditional data sources, administrative data and geospatial information Sub-regional.
SDMX in the S-DWH Layered Architecture
SDMX: an Overview Abdulla Gozalov UNSD.
Statistical Information Technology
Capitalising on Metadata
The role of metadata in census data dissemination
IASSIST 2007 Montreal, May , 2007 Session A2 Open Data and the Common Good Technology Solutions for Difficult Challenges Pascal Heus Open Data.
The Role of Metadata in Census Data Dissemination
Palestinian Central Bureau of Statistics
Presentation transcript:

Workshop on Metadata Standards and Best Practices November 19-20th, 2007 Session 1 Leveraging Metadata Standards in RDC Pascal Heus Open Data Foundation pheus@opendatafoundation.org http://www.opendatafoundation.org

Open Data Foundation – IZA 2007/11 Outline PART 1: General issues & ODaF Needs and challenges in statistical data and metadata management Metadata and XML solutions Selecting specifications Need for tools Open Data Foundation PART 2: RDC Specific issues Metadata in RDCs Solutions and benefits Tools and ongoing initiatives Conclusions / Q&A http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

Open Data Foundation – IZA 2007/11 What is Metadata? Common definition: Data about Data Labeled stuff The bean example is taken from: A Manager’s Introduction to Adobe eXtensible Metadata Platform, http://www.adobe.com/products/xmp/pdfs/whitepaper.pdf Unlabeled stuff Provide descriptive information about of an object or concept Properties, characteristics (in XML: elements and attributes) It does not alter the content or nature of the object It can be carried around without having to share the underlying object: catalogs, cars, libraries, etc. It is usually public domain (important for sensitive data) http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

Managing data and metadata is challenging! We are in charge of the data. We support our users but also need to protect our respondents! We have an information management problem We want easy access to high quality and well documented data! We need to collect the information from the producers, preserve it, and provide access to our users! General Public Policy Makers Sponsors Media/Press Academic Business Government Producers Users Librarians Many actors & communities with different needs and perspectives Users: want open access to high quality and well documented data. Need discovery tools. Public sector, private sector, academics Producers: prepare the data and need to comply with privacy laws Data Archives: need to interface with both communities Policy Makers: need data to measure results and impact and to plan ahead Sponsors: want to support the most relevant data collection Public and Media: want access to simple, easy to understand statistics Solving Information management issues is what ICT & XML are for http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

Open Data Foundation – IZA 2007/11 XML to the rescue! XML is driving today’s web service oriented architecture of the Internet and Intranets Using XML, we can capture, structure, transform, discover, exchange, query, edit and secure metadata and data XML is platform & language independent and can be used by everyone XML is both machine and human readable XML is non-proprietary, public domain and many open tools exist Domain specific standards are available! Technologies Capture: XML Structure: XSchema Transform: XSL, XSLT, XSL-FO Discover: Registries Exchange: SOAP, REST, etc. Query: XPath, XQuery Edit: XForms Secure: WS-Security (OASIS), etc. http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

XML Technical Overview Is eXtensible and is concerned with capturing information (unlike HTML who is not extensible a and focuses on representation) It’s a Markup system Is a Language with a syntax and grammar XML is also a complete set of technologies for managing information/knowledge: Capture: metadata and/or data can be expressed using the XML language. Structure: Document Type Definition (DTD) and XSchema are use to validate an XML document by defining namespaces, elements, rules. Transform: XML separates the metadata storage from its presentation. XML documents can be transformed into something else, like HTML, PDF, XML, other) through the use of the Discover: using registries and/or native or relational databases Exchange: XML separates the metadata storage from its presentation. XML documents can be transformed into something else, like HTML, PDF, XML, other) through the use of the eXtensible Stylesheet Language, XSL Transformations (XSLT) and XSL Formatting Objects (XSL-FO) Search: Very much like a database system, XML documents can be searched and queried through the use of XPath. There is no need to create or maintain tables, indexes or define relationships! Manage: Specialized software and can be used to create and edit XML documents. The XForms specification can also be used http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

Open Data Foundation – IZA 2007/11 XML Solutions XML Specs Well documented data, here we come! Great, I can provide public metadata! Academic Use our specifications and your will be happy! It will harmonize everything. Producers Users Now we can talk to each other! Government Sponsors Librarians Business A new actor: the specification/standard settings agencies, consortiums, alliances, etc. Use XML specifications will solve your problems User, Producers and Librarians have many reasons to cheer But…. Policy Makers General Public Media/Press http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

Open Data Foundation – IZA 2007/11 Let’s use XML, but…. XML Specs Open Data Foundation Producers ? Users Librarians Which specifications should we adopt? How do we do this? Where are the tools and guidelines?  Perfect, let’s use XML But… Which XML specification should we adopt? Where are the tools? How do we do this?  The Open Data Foundation has been established to answered these issues and needs http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

Open Data Foundation (ODaF) US Based non-profit organization, established 2006 Directors, advisors and managers from statistical and ICT communities Project oriented Mission Focus on socio-economic data Adoption of global metadata standards Coordinated development of open-source tools Capacity building Improving data and metadata accessibility and overall quality Operate at the global level GLOBAL: Same issues present in all countries/agencies XML solutions are global solutions Data without borders: Global understanding of socio-economic issues requires global data (population/economic growths) Directors: Ernie Boyko - President of the International Association for Social Science Information Service and Technology (IASSIST) Rune Gloersen - Head of Information Technology, Statistics Norway Robert Glushko, PhD - Member of the OASIS Board of Directors, and the founder and leader of Berkeley's Center for Document Engineering Julia Lane - Senior Vice President Director, Economics, Labor and Population, National Opinion Research Center (NORC) / University of Chicago Advisors: Sandra Cannon - Board of Governors of the Federal Reserve System Gilles Collette - Visual Communications, Pan-American Health Organization Daniel Gillman - US Bureau of Labor Statistics Eduardo Gutentag - Member of the the OASIS Board of Directors Paul Johanis - Statistics Canada Graeme Oakley - Australian Bureau of Statistics Ken Miller - UK Data Archive / Economic and Social Data Service Juraj Riecan - United Nations Economic Commission for Europe (UNECE) Gerard Salou - European Central Bank Professor Bo Sundgren, Ph.D - Statistics Sweden Wendy Thomas - Minnesota Population Center, University of Minnesota Mary Vardigan - Inter-University Consortium for Political and Social Research Management Team: Arofan Gregory - specialist in SGML and XML-based open standards in the areas of publishing, e-commer ce, and statistics. Recent work includes participation in ebXML and related initiatives, and acting as a technical expert for SDMX and DDI. Pascal Heus - an experienced IT specialist with a focus in microdata management systems. He has worked with international agencies such as the World Bank and the International Household Survey Network, and with national statistical agencies in developing countries. He is also active in the DDI initiative. Chris Nelson - a modeling specialist who was a significant contributor to the OMG's Common Warehouse Metamodel, he has also worked for many years with GESMES (a statistical standard in EDIFACT syntax) and as a technical expert in the SDMX initiative. Jostein Ryssevik - active within the DDI community, he was a key player in the development and sucess of Nesstar, the pre-eminent DDI-based toolkit. http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

Selecting XML specifications A single specification is not enough! XML specifications commonly focus on a specific area of knowledge and/or set of functionalities Cannot answer the needs of all actors XML mappings between specifications are possible Information can be converted from one domain to another and be carried across communities Which ones should we use? Fit for purpose Widely accepted and supported Can be mapped to a cross-domain family Mappings: remember that , XML is easy to convert to another XML, it’s build in the technology http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

A suggested set for socio-economic data Statistical Data and Metadata Exchange (SDMX) Macrodata, time series, indicators, registries http://www.sdmx.org Data Documentation Initiative (DDI) Microdata (surveys, studies) http://www.ddialliance.org ISO 11179 Semantic modeling, concepts, registries http://metadata-standards.org/11179/ ISO 19115 Geography http://www.isotc211.org/ Dublin Core Resources (documentation, images, multimedia) http://www.dublincore.org This is a set of specifications for socio-economic data When it comes to implementation, these are complemented with commonly used ICT specifications such as the XML family of recommendations, SOAP, OASIS WS-* security specifications, SVG, etc. http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

Open Data Foundation – IZA 2007/11 The need for Tools We set specifications and standards. Tools are not our mandate We produce data not tools! We don’t have the expertise. XML Specs Open Data Foundation Producers Users Librarians We preserve and disseminate data not software! We don’t have the expertise Software application and guidelines are crucial for the adoption of XML specifications But very few organizations are developing such tools: Lack of mandate: most of the agencies are not in the business of developing software Lack of expertise: even if the would want to, they seldom have the ICT capacity to do so Lack of coordination: agencies are often locked into their own world and are not particularly interested in the big picture. Someone must be there to coordinate efforts and ensure compatibility Lack of funding: since the mandate is not there, the money rarely follows. We need a way to raise awareness and funding for tool development Liability issues: agencies do not want to be held responsible  Open Data Foundation Coordinated development of open source tools in an harmonized framework We use data and software but we don’t build tools! We don’t have the expertise http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

The need for Tools Mandated to develop tools Provide cross-domain expertise in ICT and statistics Provide umbrella for coordinated development Ensure inter-operability Outline harmonized architecture and environment Promote open source / maximize reusability Foster global registries Resources/Fund raising … Open Data Foundation Software application and guidelines are crucial for the adoption of XML specifications But very few organizations are developing such tools: Lack of mandate: most of the agencies are not in the business of developing software Lack of expertise: even if the would want to, they seldom have the ICT capacity to do so Lack of coordination: agencies are often locked into their own world and are not particularly interested in the big picture. Someone must be there to coordinate efforts and ensure compatibility Lack of funding: since the mandate is not there, the money rarely follows. We need a way to raise awareness and funding for tool development Liability issues: agencies do not want to be held responsible  Open Data Foundation Coordinated development of open source tools in an harmonized framework http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

Open Data Foundation – IZA 2007/11 ODaF Vision Promote and facilitate the production and use of “open data” Public metadata, high quality, fully documented, respondent protected, easy to find, accessible in accordance to statistical principles and legislations Foster a global harmonized framework Facilitate the flow of data and metadata Promotes dialog between all stakeholders The harmonized framework is the key to unlock the data Unlock the Data! http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

Open Data Foundation – IZA 2007/11 Some Projects & Ideas Guidelines for an harmonized architecture and development environment Roadmap for tools development XML mappings Facility to host development of open source projects (GForge) Provide hosting services for agencies Implement registries / catalogs Produce training and reference material Technical support & capacity building Advocacy http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

ODaF partners / clients Statistical agencies / producers Data Archives Academic & Research communities Standard settings agencies & consortiums Governmental organizations International organizations Open source community Software developers IT Vendors A few examples: DDI Alliance US Census Interuniversity Consortium for Political and Social Research (ICPSR) National Opinion Research Center (NORC) UNESCO Institute for Statistics International Household Survey Network Food and Agriculture Organization (FAO) UN Economic Commission for Europe (UNECE) http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

Growing solutions in a complex environment XML-DB Programming XSLT XPath SOAP Databases Warehouse Web SDMX Infrastructure GIS XML DDI ISO 11179 TECHNOLOGY ISO 19115 SAS METADATA DCMI Stata Excel Registries Accessibility ANALYSIS DISSEMINATION SPSS Legal Toolkit DISCOVERY Privacy Disclosure PRESERVATION Access SECURITY Blaise SDDS PRODUCTION QUALITY GDDS CSPro USE DQAF What are we concerned with? We are looking at the many dimensions of the socio-economic landscape Rooted in several communities Need to grow solutions in a very complex environment (through communities) http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

Growing solutions in a complex environment XML-DB Programming XSLT XPath SOAP Databases Warehouse Web SDMX Infrastructure GIS XML DDI ISO 11179 TECHNOLOGY ISO 19115 SAS METADATA DCMI Stata Excel Registries Accessibility ANALYSIS DISSEMINATION SPSS Legal Toolkit DISCOVERY Privacy Disclosure PRESERVATION Access SECURITY Blaise SDDS PRODUCTION QUALITY GDDS CSPro USE DQAF CHALLENGE We need a set of tools that work together in an harmonized framework. This requires coordinated efforts and expertise from the various communities OPEN DATA FOUNDATION Provide cross-domain & IT expertise Coordinate and support development Knowledge sharing Capacity Building Provide global vision and guidance We need a harmonized toolbox ODaF’s role: Provide cross-domain expertise Coordinate and support development of open tools Share knowledge Capacity Building Provide a global vision http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

Open Data Foundation – IZA 2007/11 Challenges The technology is available today The right people are available today The need and the will are there The real challenges are: Tools availability Awareness / Understanding of technology Change management Coordination & Guidance Focused resources and funding Institutional commitment Learn for the past for a better future It’s not about data, it’s about people Change management This is the biggest challenge Need to overcome resistance to change and take the first steps Collecting and compiling the right information Metadata quality control Awareness & Understanding Need to be aware that solutions exists Need to understand what the technology can do Promote ODaF and partners Coordination & Guidance Need for the right expertise Need for training, best practices, champions Focused resources and fund raising Need to commit resources to achieve this It won’t be free but certainly worth the return on investment! Investment is a small percentage of what is invested in the data production efforts Institutional commitment Success cannot be achieved individual efforts only, institutions and people need to come together Successful projects require upper management support Rapid results are possible! Learn from the past for a better future: We cannot change what has been produced & done so far but we can decide for a different future today We’ve done the best we can so far. Let’s accept that we have not been perfect, transparency is vital. Integrating new tools and techniques in the data life-cycle will improve the overall quality and usefulness Our data is about people We need to develop a sense of urgency, the sooner the better Policy makes need access to better data for better results Evidence based policies Results based framework This has serious impact on living conditions http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

Open Data Foundation – IZA 2007/11 Summary Managing data and metadata is challenging Solutions exist to make it easier and provide better information to unlock the data Adopt a set of specifications that answer your requirements and can connect across domains DDI, SDMX, ISO 11179, Dublin Core, ISO 19115 Promote the use and development of open tools, do not work in isolation, get the appropriate expertise Open Data Foundation http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

PART 2: Metadata & RDCs

Open Data Foundation – IZA 2007/11 PART 2 RDC metadata perspective List of stakeholders / initiatives Benefits of adopting metadata Challenges Tools demo (IHSN Toolkit) http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

Open Data Foundation – IZA 2007/11 RDC Objectives Provide a secure environment for the researcher to perform the in depth analysis of sensitive/confidential data in a cost effective way Facilitate the capture, sharing and dissemination of research knowledge Provide feedback to the producer on data usage and quality Exchange information with other RDC’s / agencies / public Overall: benefit all stakeholders: producers, librarians, researcher, general public, etc. http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

Open Data Foundation – IZA 2007/11 RDC metadata Simple access to data file and codebook is insufficient. Researcher need high quality comprehensive metadata and a collaborative environment to promote dynamic research Traditionally, survey metadata has focused on archiving/preservation (current DDI 1/2.x) This however insufficient and should extended into both the survey production process and the secondary use of the data New DDI 3.0 meets such requirements RDC ideal environment for capture of researcher metadata http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

DDI 3.0 and the Survey Life Cycle A survey is not a static process It dynamically evolved across time and involves many agencies/individuals DDI 2.x is about archiving, DDI 3.0 extends to life cycle 3.0 is a modular framework available for multiple purposes (use cases) Metadata is key to comprehensive capture of knowledge http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

Open Data Foundation – IZA 2007/11 RDC issues Without producer metadata researchers can’t work discover data or perform efficient work Without researcher metadata producer don’t know about data usage and quality issues Other researcher are not aware of what has been done Without standards Information can’t be properly managed and exchanged between agencies or with the public Without tools: Can’t capture and preserve/share knowledge http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

When to capture metadata? The first figure outlines the various stages of a survey production process The graph on the right illustrates the amount of metadata that is typically recovered if the knowledge capture occurs after the fact (blue line) versus the amount of metadata actually generated throughout the process (red line) Metadata must be captured at the time the event occurs! Documenting after the facts leads to considerable loss of information This is true for producers and researchers http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

RDC Metadata Framework Researcher 1. Producer provide data & basic docs 2. Need to enhance existing metadata 3. Start capturing researcher metadata RDC 4. Knowledge grows and gets reused RDC 5. Provides usage and quality feedback to producer / RDC RDC 6. Repeat across surveys/topics 7. Metadata facilitates output Research Output 8. Public metadata facilitates data discovery / fosters global knowledge Research Metadata External users 9. Metadata exchange between agencies Public Use metadata Producers Producer/Archive Metadata In typical RDC, researchers have access to data files with minimal metadata provided by the producer (codebook) 2. This is insufficient and we must first improve on the producer/archive metadata (DDI 1/2.x) 3. Metadata should also include the research process to ensure preservation of the information and facilitate dissemination. This can be achieve with DDI 3.0 and web collaborative tools 4. As the researcher's contributions grows, so does the knowledge base and the reusable secondary metadata, leading to dynamic environment that maximizes reusability and minimizes duplication of efforts 5. This metadata also provides valuable feedback to the RDC and producer on data usage and quality, leading to better data production 6. The process can be repeated across surveys or topics, creating comparative / cross domain knowledge 7. The research output is made publicly available and metadata facilitates its production, dissemination and discovery 8. Both producer and researcher metadata can be made available publicly providing a rich knowledge and facilitating the discovery of the data 9. The process is replicated across RDC and metadata can be easily exchange between standard compliant partners Data http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

Open Data Foundation – IZA 2007/11 Metadata Components Producer metadata: Codebook, questionnaires, reports, methodologies, processing, scripts, quality, admin, etc. Research metadata Recodes, analysis, table, scripts, papers, references, logs, quality, usage Activities, discussions, knowledge base Outputs Papers, presentations, tables Public metadata Metadata stripped out of sensitive information (summary statistics, sensitive variables, etc.) Metadata capture can be manual, semi-automated, automated http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

Open Data Foundation – IZA 2007/11 RDC Solutions Metadata management Adopt standards and provide researcher with comprehensive metadata Use related tools to capture research process Metadata mining and reporting utilities Collaborative environment Used web technologies to foster a dynamic research environment Connected and Remote enclaves Connect RDCs through secure networks Consider virtual data enclave or batch analysis Data disclosure Protect respondent through sound data disclosure techniques (using metadata as well) Train producers/researchers (methods and data) http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

Open Data Foundation – IZA 2007/11 Solution Examples Simple solutions: use good practices File and variable naming conventions, sound statistical methods Comment source code Document the work Metadata solutions: DDI tools, citation database, source code level metadata capture, variable recodes, table disclosure, data quality feedback, comparability Web based collaboration environment Wiki, blogs, discussion groups, events/todo http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

Open Data Foundation – IZA 2007/11 Benefits (1) Comprehensive data documentation Through good metadata practices, comprehensive documentation is available to the researchers Preservation, integration and sharing of knowledge Research process is captured and preserved in harmonized formats Research knowledge becomes integrant part of the survey and available to all Reduce duplication of efforts and facilitates reuse Producer gets feedback from the data users (usage, quality issues) http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

Open Data Foundation – IZA 2007/11 Benefits (2) Research outputs and dissemination Facilitate production of research outputs Facilitate dissemination and fosters broader visibility of research outputs Exchange of information Metadata exchange between RDC, producers, librarians Importance of public metadata for sensitive datasets Facilitate data discovery (inside and outside RDC) Advanced metadata mining / comparability http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

Answering the tools challenge Metadata standards are available but there is a lack of tools for metadata management Several efforts are ongoing DDI Alliance, International Household Survey Network, UK Data archive, NORC Data Enclave, Canada RDC, Open Data Foundation DDI Foundation Tools Program, UK DExT, Canada RDC, EU Framework 7 Joint efforts will minimize costs, maximize reusability and foster tool harmonization / interoperability Open source model: availability & sustainability http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

Open Data Foundation – IZA 2007/11 RDC challenges Adopting good metadata management framework takes effort Survey metadata must first be compiled ICT capacity building and tools development Producer and researchers need to be trained Not only a technological challenge change management, training Leads to better research, shared knowledge, better user/producer dialog, improved data quality Meets the mandate of RDC http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

IHSN Toolkit Quick Demo 1 Import data and compile metadata 3 Generate HTML based CD-ROM 2 Import metadata and prepare CD-ROM http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11