NEEO Technical Workshop 2 Exchange of usage metadata Sciences Po, Paris January 15th, 2009 Benoit PAUWELS Université Libre de Bruxelles (ULB) Brussels.


Plan
– Reminder of planning
– Problem description
– First proposal: OAI exchange of SWUP
– Current implementation (OAI/SWUP) at ULB (DSpace)
– Proposal variants / issues

Reminder
Task | Start | End | Responsible
Allow for creation and maintenance of NEEO compatible usage metadata (core partners) | 14 | 20 | IR implementors group
Allow for exchange of NEEO compatible usage metadata (core partners) | 14 | 20 | IR implementors group
Allow for creation and maintenance of NEEO compatible usage metadata (non-core partners) | 20 | 24 | IR implementors group
Allow for exchange of NEEO compatible usage metadata (non-core partners) | 20 | 24 | IR implementors group

EO usage data service
Current ideas for EO:
– how many times every item in the IR has been read
– which item (and by extrapolation which author, department, …) is the most popular within the institution or within a given research domain
– the evolution of the usage of the IR as a whole
– ranking of search results by download frequency of the object files
In more advanced environments, mining of the usage data could yield other very interesting value-added services, like:
– the creation of a network of (clusters of) related publications: publications that are read by the same person within a certain amount of time can be considered to be similar in some way
– recommender systems, in which the end user gets a recommendation on which other publications are of possible interest in relation to a document he wishes to retrieve

Information of interest
– an identification of the object file that was downloaded
– an identification of the corresponding item
– an indication of the date and time at which this item was downloaded
– an identifier of an end user who downloaded this item
– an indication of what type of usage has been done (abstract view, download request)
– identification of the service from where the usage request was made by the end user
– identification of the web page from which the request was initiated
– application that has sent the request
Example:
–
– Download request for same object file from EO portal search result for phrase "wage dispersion and firm performance"

Need for harmonization
This information is stored on the IR platform in log files of all sorts, with all sorts of formatting (Apache log, DSpace log, …)
We want to get this information in the EO gateway in a normalized way:
– Decide on exchange format
– Decide on way to exchange
First proposal:
– SWUP ContextObject
– OAI-PMH

OpenURL ContextObject
An OpenURL ContextObject is defined as a data structure that holds information on the following 6 entities:
– Referent: this entity corresponds to the resource which this ContextObject is about
– ReferringEntity: an entity that references the Referent
– Requester: an entity that describes the resource that requests services pertaining to the Referent
– ServiceType: type of service requested
– Resolver: a resource that can deliver the requested services
– Referrer: a resource that generates the ContextObject

OpenURL ContextObject
Each of these entities is described through descriptors, which can be of 4 different types:
– identifier: identifier for the entity
– metadata-by-val: metadata about the entity; the metadata is included 'by-value' in the ContextObject
– metadata-by-ref: metadata about the entity; the metadata is available at a network location
– private-data: metadata about the entity; the format is not defined within the OpenURL Framework (but rather defined within a specific community)
SWUP = proposal on how to use the OpenURL ContextObject concepts to describe usage events
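
The six entities and four descriptor types above can be sketched as a small data structure. This is only an illustration: the class and field names are assumptions made for this example, not part of the OpenURL Framework or of SWUP, and treating every entity other than the Referent as optional is a simplification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entity:
    """One ContextObject entity, described through the four descriptor types."""
    identifiers: List[str] = field(default_factory=list)          # identifier
    metadata_by_val: Dict[str, str] = field(default_factory=dict) # metadata-by-val
    metadata_by_ref: Optional[str] = None   # metadata-by-ref: network location
    private_data: Optional[str] = None      # private-data: community-defined format


@dataclass
class ContextObject:
    """Hypothetical in-memory container for the six entities."""
    referent: Entity                          # the resource the ContextObject is about
    referring_entity: Optional[Entity] = None
    requester: Optional[Entity] = None
    service_type: Optional[Entity] = None
    resolver: Optional[Entity] = None
    referrer: Optional[Entity] = None


# The identifier below is a made-up placeholder, not a real NEEO identifier.
co = ContextObject(referent=Entity(identifiers=["info:example/object-file-1"]))
```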

WARNING
– This is a draft proposal
– There are outstanding issues

Information mapped to SWUP
– an identification of the object file that was downloaded, and an identification of the corresponding item → Referent
– an indication of the date and time at which this item was downloaded → ContextObject attribute
– an identifier of an end user who downloaded this item → Requester
– an indication of what type of usage has been done (abstract view, download request) → ServiceType
– identification of the service from where the usage request was made by the end user → Referrer
– identification of the web page from which the request was initiated → ReferringEntity
– application that has sent the request → ?
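
The mapping above could be expressed as a small function that sorts the fields of a usage event into the ContextObject entities. This is a sketch under assumptions: the input keys (`object_file_id`, `item_id`, and so on) and the output layout are invented for illustration and do not reproduce the SWUP serialization in the guidelines. The requesting application is left out because the draft leaves its mapping open.

```python
def map_event_to_entities(event: dict) -> dict:
    """Sort one usage event into ContextObject entities (hypothetical key names)."""
    return {
        # date and time of the download: an attribute of the ContextObject itself
        "timestamp": event["datetime"],
        # the Referent carries both the object file and its item
        "referent": [event["object_file_id"], event["item_id"]],
        "requester": event.get("user_id"),
        "service_type": event.get("usage_type"),    # e.g. abstract view, download
        "referrer": event.get("service_id"),        # service the request came from
        "referring_entity": event.get("web_page"),  # page that initiated the request
    }


example = map_event_to_entities({
    "datetime": "2009-01-15T10:00:00Z",
    "object_file_id": "file-1",
    "item_id": "item-1",
    "usage_type": "download",
})
```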

Example
– See guidelines

Exchange of SWUPs
– OAI-PMH

Implementation in DSpace
University of Minho (PT)
– Statistics add-on module
– Automatically transforms a DSpace log entry into a specific database entry
– [Massaging within database permitting all sorts of usage reports]
ULB
– Minimal adaptation: HTTP Referer and User-Agent added to db entry
– Example of database entries
– OAICat software: OAI-PMH data provider (DP)
– Crosswalk which transforms db entry into SWUP ContextObject
– downloads/request?verb=ListRecords&metadataPrefix=swup
– More info: cs/index.html
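
A harvester on the gateway side could build its ListRecords requests along the following lines. This is a minimal sketch: the base URL is a placeholder, not the ULB endpoint, and the function name is invented here. The `verb`, `metadataPrefix`, `from` and `resumptionToken` arguments are standard OAI-PMH; `swup` is the metadata prefix used in the request above.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="swup",
                     from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it is sent with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # selective harvesting by datestamp
    return base_url + "?" + urlencode(params)


# Placeholder base URL, for illustration only.
BASE_URL = "https://repository.example.org/oai/request"
first_page = list_records_url(BASE_URL, from_date="2009-01-15")
```

Passing a `from` date lets the gateway harvest only the usage events recorded since its previous run.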

Proposal variants / issues
Other information of interest
– application that has sent the request (User Agent) → Referrer
– the repository to which the request was sent → Resolver
– baseUrl of the fileserver: use?
– URL of the request: use?
– geographical info in requester: unnecessary? Can be determined in EO gateway, based on IP address of requester (if not encrypted)
– OAI identifier: use?
– institution identifier: is this not already available on the EO Gateway?

Proposal variants / issues
"Primary" identifier is the one of the object file
– JISC: publication
– Irrelevant discussion? The two identifiers need to be there, however encoded
"For the publication a new namespace is introduced: The idea is that acting on the URI of the publication results in a redirection to the metadata as stored in the EO gateway."
– Using original+enriched metadata, instead of original metadata from IR?

Proposal variants / issues
Could be a big XML payload → minimally needed:
– Identifier of the request
– Datetime of the request
– Referent: identifier for item and object file
– Requester: (encrypted) IP address
– Referrer: identifier for User Agent or originating web service
– ReferringEntity: identifier for originating web page
– ServiceType: identifier
– Resolver: identifier for repository
→ Alternative format to SWUP: one line containing all information (as a variant of the Combined Log Format)
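
To make the one-line alternative concrete, here is one possible encoding of the minimally needed fields. The slides do not define this format: the tab separator, the field order and the field names are all assumptions for illustration. The only convention borrowed from the Combined Log Format is the use of "-" for a missing value.

```python
# Hypothetical field order for a one-line usage record.
FIELDS = [
    "request_id", "datetime", "item_id", "object_file_id",
    "requester", "referrer", "referring_entity", "service_type", "resolver",
]


def to_line(event: dict) -> str:
    """Serialize one usage event as a single tab-separated line."""
    return "\t".join(str(event.get(name) or "-") for name in FIELDS)


def from_line(line: str) -> dict:
    """Parse a line written by to_line back into a dict ('-' becomes None)."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    return {name: (None if value == "-" else value)
            for name, value in zip(FIELDS, values)}


record = {"request_id": "r1", "datetime": "2009-01-15T10:00:00Z",
          "item_id": "item-1", "object_file_id": "file-1",
          "service_type": "download"}
line = to_line(record)
```

A real format would also need an escaping rule for values that contain the separator, which this sketch omits.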

Proposal variants / issues
Alternatives for exchange:
1. HTTP / FTP Get of files containing one-line log entries
2. OAI exchange of files containing one-line log entries
3. HTTP / FTP Get of OAI-ListRecords-Response formatted files containing SWUP ContextObjects
– File nomenclature?
– Option 2 requires administration of files (filename - datetime)?
– If file exchange, size is less of an issue: we should go for XML formatted information?
Filtering out double clicks
→ No agreement on double click period (COUNTER, Eprints, LogEC).
– What do we do in EO?
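
Whatever period is chosen, the double-click filter itself can be sketched as below. Since there is no agreed period, the window is a required parameter rather than a default. The event keys are hypothetical, and the policy shown (keep the first request of a burst, measure the gap against the previous request by the same requester for the same object file) is one possible choice, not an EO decision.

```python
from datetime import datetime, timedelta


def filter_double_clicks(events, window_seconds):
    """Drop repeated requests by the same requester for the same object file
    that follow the previous such request within window_seconds."""
    window = timedelta(seconds=window_seconds)
    last_seen = {}  # (requester, object file) -> time of the previous request
    kept = []
    for ev in sorted(events, key=lambda e: e["datetime"]):
        key = (ev["requester"], ev["object_file_id"])
        previous = last_seen.get(key)
        if previous is None or ev["datetime"] - previous > window:
            kept.append(ev)
        # Remember every request, kept or not, so a rapid series collapses to one.
        last_seen[key] = ev["datetime"]
    return kept


t0 = datetime(2009, 1, 15, 10, 0, 0)
sample = [
    {"requester": "a", "object_file_id": "f1", "datetime": t0},
    {"requester": "a", "object_file_id": "f1", "datetime": t0 + timedelta(seconds=5)},
    {"requester": "b", "object_file_id": "f1", "datetime": t0 + timedelta(seconds=2)},
    {"requester": "a", "object_file_id": "f1", "datetime": t0 + timedelta(seconds=60)},
]
kept = filter_double_clicks(sample, window_seconds=10)
```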

Proposal variants / issues
Filtering out robot requests
→ We must set up (and maintain) a filtering algorithm to be used by all partners for distinguishing real downloads from downloads by machines.
– Authoritative list of robots?
– List of regular expressions, rules
– Remove all HEAD requests
– Some bots can be recognized by their IP address
– Discover bots from mining EO database with usage log entries:
  – bots can be active day and night
  – bots generate much more events than human beings
  – bots regularly visit the same URLs
  – LogEC eliminates users who access more than 10% of all items in RePEc within one month
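
Several of the heuristics above are simple enough to sketch. The patterns below are illustrative only and are not an authoritative robot list; the event keys are hypothetical. The share threshold mirrors the LogEC rule quoted above and assumes the caller passes in the events of a single month.

```python
import re
from collections import defaultdict

# Illustrative patterns only; a shared list would have to be agreed and maintained.
BOT_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (r"bot", r"crawler", r"spider")]


def is_robot_request(event, bot_ips=frozenset()):
    """Per-request rules: HEAD requests, known bot IPs, User-Agent patterns."""
    if event.get("method") == "HEAD":
        return True
    if event.get("ip") in bot_ips:
        return True
    user_agent = event.get("user_agent") or ""
    return any(p.search(user_agent) for p in BOT_PATTERNS)


def heavy_requesters(events, total_items, max_share=0.10):
    """Requesters who accessed more than max_share of all items in the given events."""
    items_by_requester = defaultdict(set)
    for ev in events:
        items_by_requester[ev["requester"]].add(ev["item_id"])
    return {requester for requester, items in items_by_requester.items()
            if len(items) > max_share * total_items}


month = [
    {"requester": "a", "item_id": "i1"},
    {"requester": "a", "item_id": "i2"},
    {"requester": "b", "item_id": "i1"},
]
suspects = heavy_requesters(month, total_items=10)
```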

Proposal variants / issues
Exchange of IP addresses of requesters
→ Infringement on privacy laws?
→ How to anonymize requester information? Level of encryption?
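
One option for the anonymization question is a keyed hash of the IP address, sketched here with HMAC-SHA256 from the Python standard library. This technique is a suggestion for discussion, not something the proposal specifies, and whether it satisfies privacy law is exactly the open question above. Note that once addresses are hashed, the gateway can no longer derive geographical information from them, and identical keys would be needed wherever the same requester should map to the same value.

```python
import hashlib
import hmac


def anonymize_ip(ip_address: str, secret_key: bytes) -> str:
    """Replace an IP address by a keyed hash.

    The same address and key always give the same value, so events by one
    requester can still be grouped (e.g. for double-click filtering).
    """
    return hmac.new(secret_key, ip_address.encode("utf-8"), hashlib.sha256).hexdigest()


# Placeholder key and documentation-range address, for illustration only.
KEY = b"replace-with-a-secret-key"
token = anonymize_ip("192.0.2.1", KEY)
```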