Workshop on Metadata Standards and Best Practices November 19-20th, 2007 Session 1 Leveraging Metadata Standards in RDC Pascal Heus Open Data Foundation.


1 Workshop on Metadata Standards and Best Practices, November 19-20th, 2007. Session 1: Leveraging Metadata Standards in RDC. Pascal Heus, Open Data Foundation

2 Outline
PART 1: General issues & ODaF
  Needs and challenges in statistical data and metadata management
  Metadata and XML solutions
  Selecting specifications
  Need for tools
  Open Data Foundation
PART 2: RDC-specific issues
  Metadata in RDCs
  Solutions and benefits
  Tools and ongoing initiatives
Conclusions / Q&A
Open Data Foundation – IZA 2007/11

3 What is Metadata?
Common definition: data about data.
[Figure: unlabeled stuff versus labeled stuff. The bean example is taken from: A Manager's Introduction to Adobe eXtensible Metadata Platform.]
Metadata provides descriptive information about an object or concept: its properties and characteristics (in XML: elements and attributes).
It does not alter the content or nature of the object.
It can be carried around without having to share the underlying object: catalogs, cars, libraries, etc.
It is usually public domain (important for sensitive data).

4 Managing data and metadata is challenging!
Speech bubbles on the slide:
  "We are in charge of the data. We support our users but also need to protect our respondents!"
  "We want easy access to high quality and well documented data!"
  "We need to collect the information from the producers, preserve it, and provide access to our users!"
We have an information management problem.
Actors shown: General Public, Policy Makers, Sponsors, Media/Press, Academic, Business, Government, Producers, Users, Librarians.
Many actors and communities with different needs and perspectives:
  Users (public sector, private sector, academics): want open access to high quality and well documented data. Need discovery tools.
  Producers: prepare the data and need to comply with privacy laws.
  Data Archives: need to interface with both communities.
  Policy Makers: need data to measure results and impact and to plan ahead.
  Sponsors: want to support the most relevant data collection.
  Public and Media: want access to simple, easy to understand statistics.
Solving information management issues is what ICT and XML are for.

5 XML to the rescue!
XML is driving today's web service oriented architecture of the Internet and intranets.
Using XML, we can capture, structure, transform, discover, exchange, query, edit and secure metadata and data.
XML is platform and language independent and can be used by everyone.
XML is both machine and human readable.
XML is non-proprietary, public domain, and many open tools exist.
Domain specific standards are available!
Technologies:
  Capture: XML
  Structure: XML Schema
  Transform: XSL, XSLT, XSL-FO
  Discover: Registries
  Exchange: SOAP, REST, etc.
  Query: XPath, XQuery
  Edit: XForms
  Secure: WS-Security (OASIS), etc.

6 XML Technical Overview
XML is eXtensible and is concerned with capturing information (unlike HTML, which is not extensible and focuses on presentation).
It is a markup system.
It is a language with a syntax and grammar.
XML is also a complete set of technologies for managing information/knowledge:
  Capture: metadata and/or data can be expressed using the XML language.
  Structure: Document Type Definitions (DTD) and XML Schema are used to validate an XML document by defining namespaces, elements and rules.
  Transform: XML separates the metadata storage from its presentation. XML documents can be transformed into something else (HTML, PDF, other XML, etc.) through the use of the eXtensible Stylesheet Language: XSL Transformations (XSLT) and XSL Formatting Objects (XSL-FO).
  Discover: using registries and/or native or relational databases.
  Exchange: XML documents can be passed between systems using web service protocols such as SOAP or REST.
  Search: very much like a database system, XML documents can be searched and queried through the use of XPath. There is no need to create or maintain tables, indexes or define relationships!
  Manage: specialized software can be used to create and edit XML documents. The XForms specification can also be used.
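To illustrate the "Search" point above, here is a minimal Python sketch that queries a small metadata document with an XPath-style expression. The codebook structure, its element and attribute names, and the helper find_variables are all hypothetical and much simpler than a real DDI document; the standard library's ElementTree only supports a subset of XPath, which is enough for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified codebook. Element and attribute names are
# illustrative only and do not follow the real DDI schema.
CODEBOOK = """<codebook survey="Household Survey 2005">
  <variable name="hhsize" type="numeric">
    <label>Household size</label>
  </variable>
  <variable name="region" type="categorical">
    <label>Region of residence</label>
  </variable>
  <variable name="income" type="numeric">
    <label>Monthly household income</label>
  </variable>
</codebook>"""


def find_variables(xml_text, var_type):
    """Return (name, label) pairs for all variables of the given type."""
    root = ET.fromstring(xml_text)
    # Attribute predicates such as [@type='numeric'] are part of the
    # XPath subset that ElementTree understands.
    matches = root.findall(f".//variable[@type='{var_type}']")
    return [(v.get("name"), v.findtext("label")) for v in matches]


numeric_vars = find_variables(CODEBOOK, "numeric")
print(numeric_vars)
```

No tables, indexes or relationships had to be declared: the query runs directly against the document structure.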

7 XML Solutions
[Figure: a new actor, "XML Specs", joins Producers, Users, Librarians, Academic, Government, Sponsors, Business, Policy Makers, General Public and Media/Press]
Speech bubbles on the slide:
  "Use our specifications and you will be happy! It will harmonize everything."
  "Well documented data, here we come!"
  "Great, I can provide public metadata!"
  "Now we can talk to each other!"
A new actor: the specification/standard-setting agencies, consortiums, alliances, etc.
Using XML specifications will solve your problems.
Users, Producers and Librarians have many reasons to cheer.
But...

8 Let's use XML, but...
[Figure: Producers, Users and Librarians face the "XML Specs" with a question mark; the Open Data Foundation steps in]
Perfect, let's use XML. But:
  Which XML specifications should we adopt?
  How do we do this?
  Where are the tools and guidelines?
The Open Data Foundation has been established to answer these issues and needs.

9 Open Data Foundation (ODaF)
US-based non-profit organization, established 2006
Directors, advisors and managers from statistical and ICT communities
Project oriented
Mission:
  Focus on socio-economic data
  Adoption of global metadata standards
  Coordinated development of open-source tools
  Capacity building
  Improving data and metadata accessibility and overall quality
Operate at the global level:
  The same issues are present in all countries/agencies
  XML solutions are global solutions
  Data without borders: global understanding of socio-economic issues requires global data (population/economic growth)
Directors:
  Ernie Boyko - President of the International Association for Social Science Information Service and Technology (IASSIST)
  Rune Gloersen - Head of Information Technology, Statistics Norway
  Robert Glushko, PhD - Member of the OASIS Board of Directors, and the founder and leader of Berkeley's Center for Document Engineering
  Julia Lane - Senior Vice President and Director, Economics, Labor and Population, National Opinion Research Center (NORC) / University of Chicago
Advisors:
  Sandra Cannon - Board of Governors of the Federal Reserve System
  Gilles Collette - Visual Communications, Pan-American Health Organization
  Daniel Gillman - US Bureau of Labor Statistics
  Eduardo Gutentag - Member of the OASIS Board of Directors
  Paul Johanis - Statistics Canada
  Graeme Oakley - Australian Bureau of Statistics
  Ken Miller - UK Data Archive / Economic and Social Data Service
  Juraj Riecan - United Nations Economic Commission for Europe (UNECE)
  Gerard Salou - European Central Bank
  Professor Bo Sundgren, Ph.D - Statistics Sweden
  Wendy Thomas - Minnesota Population Center, University of Minnesota
  Mary Vardigan - Inter-University Consortium for Political and Social Research
Management Team:
  Arofan Gregory - specialist in SGML and XML-based open standards in the areas of publishing, e-commerce, and statistics. Recent work includes participation in ebXML and related initiatives, and acting as a technical expert for SDMX and DDI.
  Pascal Heus - an experienced IT specialist with a focus on microdata management systems. He has worked with international agencies such as the World Bank and the International Household Survey Network, and with national statistical agencies in developing countries. He is also active in the DDI initiative.
  Chris Nelson - a modeling specialist who was a significant contributor to the OMG's Common Warehouse Metamodel. He has also worked for many years with GESMES (a statistical standard in EDIFACT syntax) and as a technical expert in the SDMX initiative.
  Jostein Ryssevik - active within the DDI community, he was a key player in the development and success of Nesstar, the pre-eminent DDI-based toolkit.

10 Selecting XML specifications
A single specification is not enough!
  XML specifications commonly focus on a specific area of knowledge and/or set of functionalities
  A single specification cannot answer the needs of all actors
XML mappings between specifications are possible
  Information can be converted from one domain to another and be carried across communities
Which ones should we use?
  Fit for purpose
  Widely accepted and supported
  Can be mapped to a cross-domain family
Mappings: remember that XML is easy to convert to another XML; it is built into the technology.
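As a rough illustration of such a mapping, the Python sketch below converts a small study description into a Dublin Core style record. It swaps plain Python in for the XSLT stylesheet that would typically do this job. The source document, the MAPPING table, the wrapper element "record" and the helper to_dublin_core are assumptions made for the example; only the Dublin Core element names and namespace URI come from the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical, simplified study description (not the real DDI schema).
STUDY = """<study>
  <title>Household Survey 2005</title>
  <producer>National Statistical Office</producer>
  <year>2005</year>
</study>"""

# Assumed mapping from source element names to Dublin Core element names.
MAPPING = {"title": "title", "producer": "creator", "year": "date"}


def to_dublin_core(xml_text):
    """Build a Dublin Core style record from the simplified study XML."""
    source = ET.fromstring(xml_text)
    record = ET.Element("record")  # illustrative wrapper element
    for src_name, dc_name in MAPPING.items():
        value = source.findtext(src_name)
        if value is not None:
            ET.SubElement(record, f"{{{DC_NS}}}{dc_name}").text = value
    return record


dc_record = to_dublin_core(STUDY)
print(ET.tostring(dc_record, encoding="unicode"))
```

Keeping the mapping in a table makes it easy to review with subject-matter experts, which matters more than the conversion code itself.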

11 A suggested set for socio-economic data
Statistical Data and Metadata Exchange (SDMX): macrodata, time series, indicators, registries
Data Documentation Initiative (DDI): microdata (surveys, studies)
ISO 11179: semantic modeling, concepts, registries
ISO 19115: geography
Dublin Core: resources (documentation, images, multimedia)
This is a set of specifications for socio-economic data. When it comes to implementation, these are complemented with commonly used ICT specifications such as the XML family of recommendations, SOAP, OASIS WS-* security specifications, SVG, etc.

12 The need for Tools
Speech bubbles on the slide:
  "We set specifications and standards. Tools are not our mandate."
  "We produce data, not tools! We don't have the expertise."
  "We preserve and disseminate data, not software! We don't have the expertise."
  "We use data and software but we don't build tools! We don't have the expertise."
Actors shown: XML Specs, Open Data Foundation, Producers, Users, Librarians.
Software applications and guidelines are crucial for the adoption of XML specifications, but very few organizations are developing such tools:
  Lack of mandate: most of the agencies are not in the business of developing software.
  Lack of expertise: even if they wanted to, they seldom have the ICT capacity to do so.
  Lack of coordination: agencies are often locked into their own world and are not particularly interested in the big picture. Someone must be there to coordinate efforts and ensure compatibility.
  Lack of funding: since the mandate is not there, the money rarely follows. We need a way to raise awareness and funding for tool development.
  Liability issues: agencies do not want to be held responsible.
Open Data Foundation: coordinated development of open source tools in a harmonized framework.

13 The need for Tools
Open Data Foundation:
  Mandated to develop tools
  Provide cross-domain expertise in ICT and statistics
  Provide umbrella for coordinated development
  Ensure interoperability
  Outline harmonized architecture and environment
  Promote open source / maximize reusability
  Foster global registries
  Resources / fund raising

14 ODaF Vision
Promote and facilitate the production and use of "open data"
  Public metadata, high quality, fully documented, respondent protected, easy to find, accessible in accordance with statistical principles and legislation
Foster a global harmonized framework
  Facilitate the flow of data and metadata
  Promote dialog between all stakeholders
The harmonized framework is the key to unlock the data
Unlock the Data!

15 Some Projects & Ideas
Guidelines for a harmonized architecture and development environment
Roadmap for tools development
XML mappings
Facility to host development of open source projects (GForge)
Provide hosting services for agencies
Implement registries / catalogs
Produce training and reference material
Technical support & capacity building
Advocacy

16 ODaF partners / clients
Statistical agencies / producers
Data archives
Academic & research communities
Standard-setting agencies & consortiums
Governmental organizations
International organizations
Open source community
Software developers
IT vendors
A few examples:
  DDI Alliance
  US Census
  Interuniversity Consortium for Political and Social Research (ICPSR)
  National Opinion Research Center (NORC)
  UNESCO Institute for Statistics
  International Household Survey Network
  Food and Agriculture Organization (FAO)
  UN Economic Commission for Europe (UNECE)

17 Growing solutions in a complex environment
[Figure: diagram of the socio-economic data landscape. Themes: TECHNOLOGY, METADATA, ANALYSIS, DISSEMINATION, DISCOVERY, PRESERVATION, SECURITY, PRODUCTION, QUALITY, USE. Terms shown: XML-DB, Programming, XSLT, XPath, SOAP, Databases, Warehouse, Web, SDMX, Infrastructure, GIS, XML, DDI, ISO 11179, ISO 19115, SAS, DCMI, Stata, Excel, Registries, Accessibility, SPSS, Legal, Toolkit, Privacy, Disclosure, Access, Blaise, SDDS, GDDS, CSPro, DQAF]
What are we concerned with?
  We are looking at the many dimensions of the socio-economic landscape
  Rooted in several communities
  Need to grow solutions in a very complex environment (through communities)

18 Growing solutions in a complex environment
[Figure: the same diagram of the socio-economic data landscape as on the previous slide, with the challenge and the role of the Open Data Foundation overlaid]
CHALLENGE: We need a set of tools that work together in a harmonized framework (a harmonized toolbox). This requires coordinated efforts and expertise from the various communities.
OPEN DATA FOUNDATION:
  Provide cross-domain & IT expertise
  Coordinate and support development of open tools
  Knowledge sharing
  Capacity building
  Provide global vision and guidance

19 Challenges
The technology is available today. The right people are available today. The need and the will are there.
The real challenges are:
  Tools availability
  Awareness / understanding of technology
  Change management
  Coordination & guidance
  Focused resources and funding
  Institutional commitment
  Learn from the past for a better future
  It's not about data, it's about people
Change management:
  This is the biggest challenge
  Need to overcome resistance to change and take the first steps
  Collecting and compiling the right information
  Metadata quality control
Awareness & understanding:
  Need to be aware that solutions exist
  Need to understand what the technology can do
  Promote ODaF and partners
Coordination & guidance:
  Need for the right expertise
  Need for training, best practices, champions
Focused resources and fund raising:
  Need to commit resources to achieve this
  It won't be free but it is certainly worth the return on investment!
  The investment is a small percentage of what is invested in the data production efforts
Institutional commitment:
  Success cannot be achieved through individual efforts only; institutions and people need to come together
  Successful projects require upper management support
  Rapid results are possible!
Learn from the past for a better future:
  We cannot change what has been produced and done so far, but we can decide for a different future today
  We've done the best we can so far. Let's accept that we have not been perfect; transparency is vital.
  Integrating new tools and techniques in the data life cycle will improve the overall quality and usefulness
Our data is about people:
  We need to develop a sense of urgency; the sooner the better
  Policy makers need access to better data for better results
  Evidence based policies
  Results based framework
  This has a serious impact on living conditions

20 Summary
Managing data and metadata is challenging
Solutions exist to make it easier and provide better information to unlock the data
Adopt a set of specifications that answer your requirements and can connect across domains
  DDI, SDMX, ISO 11179, Dublin Core, ISO 19115
Promote the use and development of open tools, do not work in isolation, get the appropriate expertise
  Open Data Foundation

21 PART 2: Metadata & RDCs

22 PART 2
RDC metadata perspective
List of stakeholders / initiatives
Benefits of adopting metadata
Challenges
Tools demo (IHSN Toolkit)

23 RDC Objectives
Provide a secure environment for the researcher to perform in-depth analysis of sensitive/confidential data in a cost effective way
Facilitate the capture, sharing and dissemination of research knowledge
Provide feedback to the producer on data usage and quality
Exchange information with other RDCs / agencies / the public
Overall: benefit all stakeholders: producers, librarians, researchers, general public, etc.

24 RDC metadata
Simple access to a data file and codebook is insufficient. Researchers need high quality, comprehensive metadata and a collaborative environment to promote dynamic research.
Traditionally, survey metadata has focused on archiving/preservation (current DDI 1/2.x).
This, however, is insufficient and should be extended into both the survey production process and the secondary use of the data.
The new DDI 3.0 meets such requirements.
The RDC is an ideal environment for the capture of researcher metadata.

25 DDI 3.0 and the Survey Life Cycle
A survey is not a static process
It evolves dynamically over time and involves many agencies/individuals
DDI 2.x is about archiving; DDI 3.0 extends to the life cycle
DDI 3.0 is a modular framework available for multiple purposes (use cases)
Metadata is key to comprehensive capture of knowledge

26 RDC issues
Without producer metadata:
  Researchers can't discover data or perform efficient work
Without researcher metadata:
  Producers don't know about data usage and quality issues
  Other researchers are not aware of what has been done
Without standards:
  Information can't be properly managed and exchanged between agencies or with the public
Without tools:
  Can't capture and preserve/share knowledge

27 When to capture metadata?
[Figure: the first figure outlines the various stages of a survey production process. The graph on the right illustrates the amount of metadata that is typically recovered if the knowledge capture occurs after the fact (blue line) versus the amount of metadata actually generated throughout the process (red line).]
Metadata must be captured at the time the event occurs!
Documenting after the fact leads to considerable loss of information.
This is true for producers and researchers.

28 RDC Metadata Framework
[Figure: RDC metadata framework linking Producers, RDC, Researcher, External users and Data through Producer/Archive Metadata, Research Metadata, Public Use metadata and Research Output]
Steps shown on the slide:
  1. Producer provides data & basic docs
  2. Need to enhance existing metadata
  3. Start capturing researcher metadata
  4. Knowledge grows and gets reused
  5. Provides usage and quality feedback to producer / RDC
  6. Repeat across surveys/topics
  7. Metadata facilitates output
  8. Public metadata facilitates data discovery / fosters global knowledge
  9. Metadata exchange between agencies
Notes:
  1. In a typical RDC, researchers have access to data files with minimal metadata provided by the producer (codebook).
  2. This is insufficient and we must first improve on the producer/archive metadata (DDI 1/2.x).
  3. Metadata should also include the research process to ensure preservation of the information and facilitate dissemination. This can be achieved with DDI 3.0 and web collaborative tools.
  4. As the researchers' contributions grow, so do the knowledge base and the reusable secondary metadata, leading to a dynamic environment that maximizes reusability and minimizes duplication of efforts.
  5. This metadata also provides valuable feedback to the RDC and producer on data usage and quality, leading to better data production.
  6. The process can be repeated across surveys or topics, creating comparative / cross-domain knowledge.
  7. The research output is made publicly available and metadata facilitates its production, dissemination and discovery.
  8. Both producer and researcher metadata can be made publicly available, providing rich knowledge and facilitating the discovery of the data.
  9. The process is replicated across RDCs and metadata can be easily exchanged between standard-compliant partners.

29 Metadata Components
Producer metadata:
  Codebook, questionnaires, reports, methodologies, processing, scripts, quality, admin, etc.
Research metadata:
  Recodes, analysis, tables, scripts, papers, references, logs, quality, usage
  Activities, discussions, knowledge base
Outputs:
  Papers, presentations, tables
Public metadata:
  Metadata stripped of sensitive information (summary statistics, sensitive variables, etc.)
Metadata capture can be manual, semi-automated or automated
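The idea of public metadata, that is, metadata stripped of sensitive information, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the codebook layout, the "sensitive" attribute, the "summary" element and the helper make_public are all hypothetical and are not taken from DDI or from any existing tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical full (restricted) codebook. The "sensitive" flag and the
# "summary" element are assumptions made for this illustration.
FULL_METADATA = """<codebook>
  <variable name="region">
    <label>Region of residence</label>
    <summary mean="3.2" min="1" max="9"/>
  </variable>
  <variable name="income" sensitive="true">
    <label>Monthly household income</label>
    <summary mean="812.5" min="0" max="25000"/>
  </variable>
</codebook>"""


def make_public(xml_text):
    """Return a copy of the codebook that is safe to publish.

    Sensitive variables are dropped entirely, and summary statistics
    are removed from the variables that remain.
    """
    root = ET.fromstring(xml_text)
    for variable in root.findall("variable"):  # findall returns a list
        if variable.get("sensitive") == "true":
            root.remove(variable)
            continue
        for summary in variable.findall("summary"):
            variable.remove(summary)
    return root


public = make_public(FULL_METADATA)
public_names = [v.get("name") for v in public.findall("variable")]
print(public_names)
```

Which statistics and variables count as sensitive is a disclosure-control decision for the producer and the RDC; the code only applies a rule that has already been agreed.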

30 RDC Solutions
Metadata management:
  Adopt standards and provide researchers with comprehensive metadata
  Use related tools to capture the research process
  Metadata mining and reporting utilities
Collaborative environment:
  Use web technologies to foster a dynamic research environment
Connected and remote enclaves:
  Connect RDCs through secure networks
  Consider virtual data enclave or batch analysis
Data disclosure:
  Protect respondents through sound data disclosure techniques (using metadata as well)
  Train producers/researchers (methods and data)

31 Solution Examples
Simple solutions: use good practices
  File and variable naming conventions, sound statistical methods
  Comment source code
  Document the work
Metadata solutions:
  DDI tools, citation database, source code level metadata capture, variable recodes, table disclosure, data quality feedback, comparability
Web based collaboration environment:
  Wiki, blogs, discussion groups, events/todo
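One way to picture source code level metadata capture for variable recodes is a helper that records what was done at the moment the recode runs. The Python sketch below is an assumption-laden illustration, not an existing tool: the function recode, the in-memory recode_log and the field names of each log entry are all invented for the example.

```python
import json
from datetime import datetime, timezone

# In-memory log for the example; a real setup would persist entries,
# for instance alongside the survey metadata.
recode_log = []


def recode(values, mapping, source, target, note):
    """Apply a recode and document it at the time it happens."""
    recode_log.append({
        "source": source,
        "target": target,
        "mapping": {str(k): v for k, v in mapping.items()},
        "note": note,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    # Values without an entry in the mapping become None (missing).
    return [mapping.get(v) for v in values]


region = [1, 3, 2, 1]
urban_rural = recode(
    region,
    {1: "urban", 2: "rural", 3: "rural"},
    source="region",
    target="urban_rural",
    note="Collapse region codes into an urban/rural indicator",
)
print(urban_rural)
print(json.dumps(recode_log[0]["mapping"]))
```

Because the log entry is written by the same call that performs the recode, the documentation cannot drift from what was actually done, which is the point of capturing metadata when the event occurs.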

32 Benefits (1)
Comprehensive data documentation:
  Through good metadata practices, comprehensive documentation is available to the researchers
Preservation, integration and sharing of knowledge:
  The research process is captured and preserved in harmonized formats
  Research knowledge becomes an integral part of the survey and is available to all
  Reduces duplication of efforts and facilitates reuse
  The producer gets feedback from the data users (usage, quality issues)

33 Benefits (2)
Research outputs and dissemination:
  Facilitates production of research outputs
  Facilitates dissemination and fosters broader visibility of research outputs
Exchange of information:
  Metadata exchange between RDCs, producers, librarians
  Importance of public metadata for sensitive datasets
  Facilitates data discovery (inside and outside the RDC)
  Advanced metadata mining / comparability

34 Answering the tools challenge
Metadata standards are available but there is a lack of tools for metadata management
Several efforts are ongoing:
  DDI Alliance, International Household Survey Network, UK Data Archive, NORC Data Enclave, Canada RDC, Open Data Foundation
  DDI Foundation Tools Program, UK DExT, Canada RDC, EU Framework 7
Joint efforts will minimize costs, maximize reusability and foster tool harmonization / interoperability
Open source model: availability & sustainability

35 RDC challenges
Adopting a good metadata management framework takes effort:
  Survey metadata must first be compiled
  ICT capacity building and tools development
  Producers and researchers need to be trained
Not only a technological challenge: change management, training
Leads to better research, shared knowledge, better user/producer dialog, improved data quality
Meets the mandate of the RDC

36 IHSN Toolkit Quick Demo
1. Import data and compile metadata
2. Import metadata and prepare CD-ROM
3. Generate HTML based CD-ROM

