From Gutenberg to Berners-Lee: The need for metadata. Ed Simons

Some personal data: Dr. Eduard (Ed) Jozef Simons
◦ Senior Staff, Radboud University Nijmegen
◦ Board Member, EuroCRIS
◦ IT project manager for:
  ◦ International Federation of Catholic Universities (IFCU)
  ◦ OPUS-College project (Mozambique/Zambia)
  ◦ KE interoperability project

Intro: the nature of the presentation. This presentation concentrates on metadata for (research) publications. It therefore covers only part of the Academic Information Domain, though from the point of view of this conference it is the most important part. The presentation is general in nature: it offers a vision of current and future developments rather than dealing with concrete problems and technologies in the use of publication metadata. Along the way a new concept is introduced that may be useful for future discussions on this matter, or serve as inspiration for them.

Importance of metadata. Metadata allow us to describe and classify information systematically. As such they are indispensable for searching for and finding academic information and research outcomes (publications). Metadata are often called "data about data", which is an appealing, catchy phrase. A better description might be "data about objects", as opposed to "data of objects".

Is there really a need for (more) metadata? As the title of this presentation suggests, there seems to be a need for more, and more appropriate, metadata on publications. This need results from the shift from paper (printed) to electronic (online) publication of research information.

Is there really a need for (more) metadata? In contrast to the gentlemen on the previous slide, I am of the opinion that our online, networked research information space positively requires an extended set of metadata in order to be used optimally. This set must go beyond the metadata commonly used in IRs up to now. Why is this so, and what should these new or extended metadata be? To answer these questions, let's first take a closer look at this allegedly important shift, mentioned in the presentation's title.

We are in transit... We are currently in a transition between two eras:
◦ the "Gutenberg Era" (starting in the 1450s), the past five centuries of paper-based publication of research information;
◦ the "Berners-Lee Era" (from the 1980s) of online storage and supply of information.
A characteristic of the Gutenberg Era was (and is) that research information in a given field is laid down in a collection of individual, stand-alone research information objects (publications). These are not directly linked, or rather not linkable, to one another. With the arrival of the Berners-Lee Era, that is, the Internet, it becomes possible to go beyond individual publications and make networks of publications the new focal point. Later in this presentation we will see that such networks bring significant added value to a user looking for scientific information, in particular one specific type of network of publications.

The minds of passengers in transit (still) live in two worlds... As in any transition period, this one already implements parts of the new era but still bears characteristics of the previous one. Applied to research information: publications, as in the IRs, are already put online (in line with the Berners-Lee Era). Their metadata, however, still reflect and stem from the Gutenberg Era. A big challenge for the near future will be to complete the transition. In other words, we must implement metadata that are fully suited to the possibilities of the Internet and make optimal use of them.

A Tale of Two Cultures... Before going into more detail on the new and extra metadata needed in the future, let's first look at the two communities that deal with research information metadata, and at their different cultures:
◦ Library community: creators of repositories, using metadata that stem from the paper library culture. In principle these metadata are an electronic version of the old catalogue cards.
◦ CRIS community: creators of administrative or management research information systems. Their culture focuses on the context of research and its publications (e.g. the project a publication results from, the unit that produced it, etc.).
Until very recently the two communities were isolated from each other. They did not communicate and often did not even know of each other's existence. Fortunately this has been changing in the last few years, and the two cultures are gradually coming together and discovering each other, as this workshop illustrates. An important difference: the CRIS's set of metadata is substantially more extensive than that of the Library community's repositories. This is understandable, since a CRIS deals not only with publication metadata as such but also (has to) take the context of the publication into account.
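The difference between the two metadata sets can be sketched as two record types. In this sketch the CRIS record is a catalogue-card record plus research context. The class names and fields below are hypothetical illustrations chosen for this sketch. They are not taken from any repository schema or from CERIF.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields


@dataclass
class RepositoryRecord:
    """Catalogue-card style description of one publication (hypothetical fields)."""
    identifier: str
    title: str
    creators: list[str]
    date: str


@dataclass
class CrisRecord(RepositoryRecord):
    """The same publication as a CRIS sees it: the card plus its research context."""
    project: str | None = None       # project the publication resulted from
    org_unit: str | None = None      # unit that produced it
    co_researchers: list[str] = field(default_factory=list)


def context_only_fields() -> list[str]:
    """Names of the fields a CRIS record holds that the repository record lacks."""
    card_fields = {f.name for f in fields(RepositoryRecord)}
    return [f.name for f in fields(CrisRecord) if f.name not in card_fields]


print(context_only_fields())  # ['project', 'org_unit', 'co_researchers']
```

Subclassing expresses the point of the slide: everything on the catalogue card is still there, and the CRIS adds the context on top.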

From SIO to NETRINO. The "traditional" metadata used by the repositories are still mainly focused on searching for and detecting (a collection of) stand-alone publications that are not linked to one another. In short, they describe SIOs (Stand-alone Information Objects). What we need instead are metadata that really get the best out of the possibilities of the Internet. Such metadata can detect the networks of publications mentioned earlier, and especially networks of related publications. These hold special value for the user and could generically be called a NETRINO: Network of Related Information Objects. The power of the Internet, combined with the appropriate metadata, to detect this kind of network of publications can hardly be overestimated. The added value lies in the term "related". In a NETRINO the publications are, one way or another, directly related to one another, based on parameters we will discuss in a moment. This substantially raises the probability that any given publication in the NETRINO is relevant to the user, compared with a harvested list of publications that are not directly related.
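One possible reading of "network of related publications" is the set of publications reachable from a starting publication through a chain of relations. A harvested list, by contrast, has no such structure. The minimal sketch below makes that reading concrete with a breadth-first search, under the following assumptions:
◦ relations are plain pairs of publication identifiers;
◦ relations are treated as undirected;
◦ the function name `netrino` is hypothetical.

```python
from collections import deque


def netrino(seed: str, relations: list[tuple[str, str]]) -> set[str]:
    """Return every publication reachable from `seed` via the given relations.

    Relations are treated as undirected: if A is related to B, B is related to A.
    """
    neighbours: dict[str, set[str]] = {}
    for a, b in relations:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    found = {seed}
    queue = deque([seed])
    while queue:                      # breadth-first walk over the relations
        current = queue.popleft()
        for other in neighbours.get(current, ()):
            if other not in found:
                found.add(other)
                queue.append(other)
    return found


relations = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
print(sorted(netrino("P1", relations)))  # ['P1', 'P2', 'P3']
```

Note that P4 and P5 are left out. They may appear in the same harvested result list, but nothing relates them to P1.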

NETRINO: a new focal point in the Research Information Space. For research publications there are two types or sets of metadata on which networks of related information objects can be based: citation metadata on the one hand and context metadata on the other. Correspondingly, there are two types of publication NETRINOs.
◦ Citation NETRINO: based on the mutual (incoming and outgoing) citations of the publications involved. Detecting this kind of network requires one of two things. Either the citation data of the publications are available in a structured way (e.g. in a database), or a "citation harvesting" tool exists that can automatically extract the citation information from the publications.
◦ Context NETRINO: based on parameters of the research context in which the publication came about, e.g. co-researchers, co-authors, the research project the publication resulted from, etc. The metadata needed to detect these context networks are available in CRISs.
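The two kinds of NETRINO differ only in where the relations come from. The sketch below derives both kinds of relations from simple in-memory metadata:
◦ citation edges from each publication's reference list, where incoming and outgoing citations both count;
◦ context edges from shared context values, such as a common co-author or project.

The data layout and the function names are hypothetical. The rule "share any context value" is only one possible choice; a real system might weight or filter parameters.

```python
from __future__ import annotations

from itertools import combinations


def citation_edges(references: dict[str, list[str]]) -> set[frozenset[str]]:
    """Undirected edges from citations: A cites B relates A and B either way."""
    edges = set()
    for citing, cited_list in references.items():
        for cited in cited_list:
            if cited != citing:            # ignore self-citations
                edges.add(frozenset((citing, cited)))
    return edges


def context_edges(context: dict[str, dict[str, set[str]]]) -> set[frozenset[str]]:
    """Relate two publications if they share any value for any context parameter
    (e.g. a co-author or the project they resulted from)."""
    edges = set()
    for a, b in combinations(sorted(context), 2):
        shared = any(
            context[a][key] & context[b].get(key, set())
            for key in context[a]
        )
        if shared:
            edges.add(frozenset((a, b)))
    return edges


refs = {"A": ["B"], "C": ["A"], "D": []}
print(sorted(tuple(sorted(e)) for e in citation_edges(refs)))  # [('A', 'B'), ('A', 'C')]
```

Either edge set could then be handed to a reachability search to obtain the Citation NETRINO or the Context NETRINO of a publication.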

Graphical representation of a Citation NETRINO

Graphical representation of a Context NETRINO

Citation and context compared

Where should the metadata be stored? A CERIF-CRIS offers:
◦ an extensive metadata set, already including all the context metadata needed;
◦ a well-thought-out architecture and metadata model;
◦ a granular, relational (database) structure.
Given these properties, the metadata needed to describe and detect NETRINOs should be stored in a CERIF-CRIS. The CRIS should then be the driving entity for the repositories. So the model to develop in the future is that of "CRIS-driven Repositories".
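CERIF's granular, relational structure is what makes context relations easy to query. Each relation between two entities is a row in a link table, so "other publications from the same project or by the same person" becomes a pair of self-joins. The sketch below uses an in-memory SQLite database. Its table names are modelled loosely on the CERIF entities cfResPubl, cfPers and cfProj and the link entities cfPers_ResPubl and cfProj_ResPubl, but the columns are heavily simplified. Real CERIF link entities also carry role classifications and validity dates, which are omitted here. The identifiers `proj1` and `pers1` are hypothetical.

```python
import sqlite3

# Heavily simplified, CERIF-inspired schema: base entities plus link entities.
SCHEMA = """
CREATE TABLE cfResPubl (cfResPublId TEXT PRIMARY KEY);
CREATE TABLE cfProj    (cfProjId    TEXT PRIMARY KEY);
CREATE TABLE cfPers    (cfPersId    TEXT PRIMARY KEY);
-- One row per relation keeps the context granular.
CREATE TABLE cfProj_ResPubl (cfProjId TEXT REFERENCES cfProj, cfResPublId TEXT REFERENCES cfResPubl);
CREATE TABLE cfPers_ResPubl (cfPersId TEXT REFERENCES cfPers, cfResPublId TEXT REFERENCES cfResPubl);
"""


def context_neighbours(conn: sqlite3.Connection, publ_id: str) -> list[str]:
    """Publications that share a project or a person with `publ_id`."""
    rows = conn.execute(
        """
        SELECT other.cfResPublId FROM cfProj_ResPubl AS me
        JOIN cfProj_ResPubl AS other ON other.cfProjId = me.cfProjId
        WHERE me.cfResPublId = ? AND other.cfResPublId <> ?
        UNION
        SELECT other.cfResPublId FROM cfPers_ResPubl AS me
        JOIN cfPers_ResPubl AS other ON other.cfPersId = me.cfPersId
        WHERE me.cfResPublId = ? AND other.cfResPublId <> ?
        ORDER BY 1
        """,
        (publ_id, publ_id, publ_id, publ_id),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO cfResPubl VALUES (?)", [("p1",), ("p2",), ("p3",), ("p4",)])
conn.execute("INSERT INTO cfProj VALUES ('proj1')")
conn.execute("INSERT INTO cfPers VALUES ('pers1')")
conn.executemany("INSERT INTO cfProj_ResPubl VALUES (?, ?)", [("proj1", "p1"), ("proj1", "p2")])
conn.executemany("INSERT INTO cfPers_ResPubl VALUES (?, ?)", [("pers1", "p1"), ("pers1", "p3")])
print(context_neighbours(conn, "p1"))  # ['p2', 'p3']
```

`UNION` removes duplicates, so a publication that shares both a project and a person is listed once.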

METIS: firing at ISI. Within the framework of the Dutch research information system METIS, an experiment is currently being carried out to obtain citation data from ISI automatically and continuously for the publications registered in METIS. Requests are "fired" 24/7 at a web service of ISI. The citation data returned are added automatically to the metadata already held in METIS for the publication in question. For the moment only quantitative citation data (numbers of citations) are obtained this way. In the future, information on the publications themselves could also be harvested. This is an example of registering citation data in a CRIS from which a Citation NETRINO could be created.
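The general pattern of such an experiment can be sketched as one polling pass over the publications in a CRIS. For each publication, the pass asks a citation service for a count and stores the answer next to the existing metadata. This is not the METIS implementation, and it does not call any real ISI API. The following names are hypothetical stand-ins:
◦ `fetch_count`, for whatever client the real service requires;
◦ `store`, for the CRIS;
◦ the `times_cited` key.

```python
from __future__ import annotations

from typing import Callable, Optional


def refresh_citation_counts(
    store: dict[str, dict],
    fetch_count: Callable[[str], Optional[int]],
) -> list[str]:
    """One polling pass: fetch a citation count per publication and store it.

    `fetch_count` is a hypothetical stand-in for the citation web service client.
    Returns the ids for which no count was obtained, so a scheduler can retry them.
    """
    failed = []
    for publ_id, metadata in store.items():
        try:
            count = fetch_count(publ_id)
        except OSError:           # network trouble: keep the old value, retry later
            failed.append(publ_id)
            continue
        if count is None:         # the service does not know this publication
            failed.append(publ_id)
        else:
            metadata["times_cited"] = count
    return failed
```

Running "24/7" then amounts to calling this function repeatedly from a scheduler, for example a loop with `time.sleep` between passes, and retrying the returned ids.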

Work to do... To set up an efficient system for detecting NETRINOs, the following steps and activities are (in my view) necessary:
◦ Define an optimal set of context metadata (what should be on the list and what not).
◦ Make sure these metadata are stored in your CRIS, and create an automatic transfer of these metadata to the repositories (CRIS-driven repositories).
◦ Create a metadata standard for interoperability that allows flexible granularity (cf. the KE CRIS-OAR project).
◦ Create unique identifiers for the main objects in the research information space, i.e. the core CERIF entities. Apart from DOIs and DAIs (see the NARCIS project in the Netherlands), we probably also need DIIs and DPIs (Digital Institution Identifiers and Digital Project Identifiers).
◦ Make automatic detection of citation metadata within publications possible (cf. the experiments in METIS mentioned earlier).
◦ Work out and implement various controlled vocabularies (content classifications) for the information objects in the Academic Information Domain (AID). In other words: work out (or fill) the CERIF semantic layer.
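Automatically detecting citation metadata inside publications can start with something as simple as finding DOIs in reference lists. The sketch below uses a regular expression similar to the DOI pattern Crossref has published. It is an approximation and can over- or under-match unusual DOIs. Trailing sentence punctuation is stripped on the assumption that real DOIs rarely end in `.`, `,`, `;` or `:`. The function name `extract_dois` is hypothetical.

```python
import re

# Approximate DOI pattern: "10." + 4-9 digit registrant code + "/" + suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def extract_dois(text: str) -> list[str]:
    """Return the DOIs found in `text`, in order of first appearance, without duplicates."""
    seen: dict[str, None] = {}            # dict preserves insertion order
    for match in DOI_PATTERN.finditer(text):
        doi = match.group(0).rstrip(".,;:")   # drop trailing sentence punctuation
        seen.setdefault(doi, None)
    return list(seen)


references = (
    "Example A. (2008). A paper. doi:10.1000/xyz123. "
    "See also https://doi.org/10.1038/nphys1170, and again doi:10.1000/xyz123."
)
print(extract_dois(references))  # ['10.1000/xyz123', '10.1038/nphys1170']
```

Resolving each extracted DOI against a registry would be the next step toward structured citation data. That step is outside this sketch.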

Summarizing. The transition from paper-based to online publication makes it possible to concentrate on networks of publications as a meaningful unit of information, instead of only on individual publications. These networks, and especially one particular kind, the NETRINO (Network of Related Information Objects), bring important added value to the user. The probability that a publication in a NETRINO is significant to him or her is substantial. One can distinguish two types of networks of related information objects: the Citation NETRINO and the Context NETRINO. The metadata for the latter are commonly stored in a CRIS. For NETRINOs to work, two things are necessary. First, a standardized metadata set for repositories must be defined. Second, a mechanism (technology) is needed to automatically extract citation metadata from publications.

Thank you for your attention! (Images taken from “Head First” IT-training books: “Head First Java” and “Servlets and JSP”)