NEEO Technical Workshop 2 Exchange of usage metadata Sciences Po, Paris January 15th, 2009 Benoit PAUWELS Université Libre de Bruxelles (ULB) Brussels.

1 NEEO Technical Workshop 2 Exchange of usage metadata Sciences Po, Paris January 15th, 2009 Benoit PAUWELS Université Libre de Bruxelles (ULB) Brussels

2 Plan
– Reminder of planning
– Problem description
– First proposal: OAI exchange of SWUP
– Current implementation (OAI/SWUP) at ULB (DSpace)
– Proposal variants / issues

3 Reminder
– Allow for creation and maintenance of NEEO compatible usage metadata (core partners) | 14 | 20 | IR implementors group
– Allow for exchange of NEEO compatible usage metadata (core partners) | 14 | 20 | IR implementors group
– Allow for creation and maintenance of NEEO compatible usage metadata (non-core partners) | 20 | 24 | IR implementors group
– Allow for exchange of NEEO compatible usage metadata (non-core partners) | 20 | 24 | IR implementors group

4

5 EO usage data service
Current ideas for EO:
– how many times every item in the IR has been read
– which item (and by extrapolation which author, department, …) is the most popular within the institution or within a given research domain
– the evolution of the usage of the IR as a whole
– search results ranked by frequency of download of the object files
In more advanced environments, mining of the usage data could yield other very interesting value-added services, like:
– the creation of a network of (clusters of) related publications: publications that are read by the same person within a certain amount of time can be considered to be similar in some way
– recommender systems, in which the end user gets a recommendation on which other publications are of possible interest in relation to a document he wishes to retrieve

6 Information of interest
– an identification of the object file that was downloaded
– an identification of the corresponding item
– an indication of the date and time at which this item was downloaded
– an identifier of an end user who downloaded this item
– an indication of what type of usage has been done (abstract view, download request)
– identification of the service from where the usage request was made by the end user
– identification of the web page from which the request was initiated
– application that has sent the request
Example:
– http://bib11.ulb.ac.be:8080/dspace/handle/2013/781
– Download request for same object file from EO portal search result for phrase "wage dispersion and firm performance"
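The pieces of information listed on this slide can be pictured as one record per usage event. The following is only a minimal sketch: the class name, the field names and the two usage-type labels are assumptions made for illustration, not part of any NEEO specification.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical labels for the two usage types named on the slide.
USAGE_TYPES = ("abstract_view", "download")


@dataclass(frozen=True)
class UsageEvent:
    """One usage event; all field names are illustrative assumptions."""
    object_file_id: str                     # the object file that was downloaded
    item_id: str                            # the corresponding item
    timestamp: datetime                     # date and time of the request
    requester_id: str                       # identifier of the end user
    usage_type: str                         # one of USAGE_TYPES
    referrer_service: Optional[str] = None  # service the request came from
    referring_page: Optional[str] = None    # web page that initiated the request
    user_agent: Optional[str] = None        # application that sent the request

    def __post_init__(self):
        # Reject usage types other than the two mentioned on the slide.
        if self.usage_type not in USAGE_TYPES:
            raise ValueError(f"unknown usage type: {self.usage_type!r}")
```

The last three fields are optional because a log entry will not always carry a referrer or user agent.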

7 Need for harmonization
This information is stored on the IR platform in log files of all sorts, with all sorts of formatting (Apache log, DSpace log, …)
We want to get this information in the EO gateway in a normalized way:
– Decide on exchange format
– Decide on way to exchange
First proposal:
– SWUP ContextObject
– OAI-PMH

8 OpenURL ContextObject
An OpenURL ContextObject is defined as a data structure that holds information on the following 6 entities:
– Referent: this entity corresponds to the resource which this ContextObject is about
– ReferringEntity: an entity that references the Referent
– Requester: an entity that describes the resource that requests services pertaining to the Referent
– ServiceType: type of service requested
– Resolver: a resource that can deliver the requested services
– Referrer: a resource that generates the ContextObject

9 OpenURL ContextObject
Each of these entities is described through descriptors, which can be of 4 different types:
– identifier: identifier for the entity
– metadata-by-val: metadata about the entity; the metadata is included ‘by-value’ in the ContextObject
– metadata-by-ref: metadata about the entity; the metadata is available at a network location
– private-data: metadata about the entity; the format is not defined within the OpenURL Framework (but rather defined within a specific community)
SWUP = proposal on how to use the OpenURL ContextObject concepts to describe usage events
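The six entities and four descriptor types can be modelled in memory as follows. This is a sketch of the concepts only: the class and method names are assumptions, and it does not reproduce the XML serialization defined by the OpenURL Framework or by SWUP.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Entity and descriptor names as listed on the slides.
ENTITIES = ("Referent", "ReferringEntity", "Requester",
            "ServiceType", "Resolver", "Referrer")
DESCRIPTOR_TYPES = ("identifier", "metadata-by-val",
                    "metadata-by-ref", "private-data")


@dataclass
class Descriptor:
    kind: str   # one of DESCRIPTOR_TYPES
    value: str  # identifier, inline metadata, or network location

    def __post_init__(self):
        if self.kind not in DESCRIPTOR_TYPES:
            raise ValueError(f"unknown descriptor type: {self.kind!r}")


@dataclass
class ContextObject:
    # Maps an entity name to the descriptors that describe it.
    entities: Dict[str, List[Descriptor]] = field(default_factory=dict)

    def describe(self, entity: str, kind: str, value: str) -> None:
        """Attach one descriptor to one of the six entities."""
        if entity not in ENTITIES:
            raise ValueError(f"unknown entity: {entity!r}")
        self.entities.setdefault(entity, []).append(Descriptor(kind, value))
```

An entity may carry several descriptors, which is why each entity maps to a list.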

10 WARNING
– This is a draft proposal
– There are outstanding issues

11 Information mapped to SWUP
– an identification of the object file that was downloaded; and an identification of the corresponding item → Referent
– an indication of the date and time at which this item was downloaded → ContextObject attribute
– an identifier of an end user who downloaded this item → Requester
– an indication of what type of usage has been done (abstract view, download request) → ServiceType
– identification of the service from where the usage request was made by the end user → Referrer
– identification of the web page from which the request was initiated → ReferringEntity
– application that has sent the request → ?
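The mapping on this slide can be written down as a lookup table. In the sketch below the field names on the left are hypothetical labels for the logged information; the entity names on the right are the ones from the slide. Fields without an agreed target, such as the requesting application, are returned separately instead of being forced into an entity.

```python
# Hypothetical field names mapped to the SWUP entities named on the slide.
SWUP_MAPPING = {
    "object_file_id": "Referent",
    "item_id": "Referent",
    "timestamp": "ContextObject attribute",
    "requester_id": "Requester",
    "usage_type": "ServiceType",
    "referrer_service": "Referrer",
    "referring_page": "ReferringEntity",
}


def group_by_entity(event):
    """Split an event dict into (fields grouped per entity, unmapped fields)."""
    grouped = {}
    unmapped = {}
    for name, value in event.items():
        entity = SWUP_MAPPING.get(name)
        if entity is None:
            # No target entity agreed yet (the "?" on the slide).
            unmapped[name] = value
        else:
            grouped.setdefault(entity, {})[name] = value
    return grouped, unmapped
```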

12 Example See guidelines

13 Exchange of SWUPs OAI-PMH

14 Implementation in DSpace
University Of Minho (PT)
– Statistics add-on module
– Automatically transforms a DSpace log entry into a specific database entry
– [ Massaging within database permitting all sorts of usage reports ]
ULB
– Minimal adaptation: HTTP Referer and User-Agent added to db entry
– Example of database entries
– OAICat software: OAI-PMH data provider (DP)
– Crosswalk which transforms db entry into SWUP ContextObject
– http://bib15.ulb.ac.be:8080/dspace-oai-downloads/request?verb=ListRecords&metadataPrefix=swup
– More info: http://www.bibhost.ulb.ac.be/RDIB/DISpace/DIfusion%201.4.2/Statistics/index.html
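A harvester talking to such a data provider would issue ListRecords requests and follow resumption tokens. The sketch below only builds the request URL and reads the token from a response; it performs no network access. The function names are assumptions; the verb, the `metadataPrefix`, `from`, `until` and `resumptionToken` arguments and the namespace are those of OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def list_records_url(base_url, metadata_prefix="swup",
                     from_date=None, until_date=None, token=None):
    """Build a ListRecords request URL for an OAI-PMH data provider."""
    if token:
        # In OAI-PMH, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
        if until_date:
            params["until"] = until_date
    return base_url + "?" + urlencode(params)


def extract_resumption_token(response_xml):
    """Return the resumption token of a response, or None if the list is complete."""
    root = ET.fromstring(response_xml)
    element = root.find(f".//{{{OAI_NS}}}resumptionToken")
    if element is None or not (element.text or "").strip():
        return None
    return element.text.strip()
```

Passing `from_date` lets a gateway harvest only the usage events added since its previous run, provided the data provider supports selective harvesting by date.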

15 Proposal variants / issues
Other information of interest
– application that has sent the request (User Agent) → Referrer
– the repository to which the request was sent → Resolver
– baseUrl of the fileserver: use?
– URL of the request: use?
– geographical info in requester: unnecessary? Can be determined in EO gateway, based on IP address of requester (if not encrypted)
– OAI identifier: use?
– institution identifier: is this not already available on the EO Gateway?

16 Proposal variants / issues
“Primary” identifier is the one of the object file
– JISC: publication
– Irrelevant discussion? The two identifiers need to be there, however encoded
“For the publication a new namespace is introduced: http://identifier.economistsonline.org/. The idea is that acting on the URI of the publication results in a redirection to the metadata as stored in the EO gateway.”
– Using original+enriched metadata, instead of original metadata from IR?

17 Proposal variants / issues
Could be a big XML payload → minimally needed:
– Identifier of the request
– Datetime of the request
– Referent: identifier for item and object file
– Requester: (encrypted) IP address
– Referrer: identifier for User Agent or originating web service
– ReferringEntity: identifier for originating web page
– ServiceType: identifier
– Resolver: identifier for repository
→ Alternative format to SWUP: one line containing all information (as a variant of the Combined Log Format)
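To give an idea of what the one-line alternative could look like, here is a sketch that writes and reads the minimal fields as a single tab-separated line. The field order, the tab separator and the use of "-" for a missing value (borrowed from web server log conventions) are all assumptions; the slides do not fix a layout.

```python
# Hypothetical field order for a one-line usage log entry.
FIELDS = ("request_id", "datetime", "item_id", "object_file_id", "requester",
          "referrer", "referring_entity", "service_type", "resolver")


def format_line(event):
    """Serialize an event dict as one tab-separated line; '-' marks a missing value."""
    values = []
    for name in FIELDS:
        value = event.get(name)
        text = "-" if value is None else str(value)
        if "\t" in text or "\n" in text:
            raise ValueError(f"field {name!r} contains a separator character")
        values.append(text)
    return "\t".join(values)


def parse_line(line):
    """Parse a line written by format_line back into a dict."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return {name: (None if value == "-" else value)
            for name, value in zip(FIELDS, values)}
```

Such a line is far smaller than an XML ContextObject, at the cost of a fixed field set that every partner has to agree on.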

18 Proposal variants / issues
Alternatives for exchange:
1. HTTP / FTP Get of files containing one-line log entries
2. OAI exchange of files containing one-line log entries
3. HTTP / FTP Get of OAI-ListRecords-Response formatted files containing SWUP ContextObjects
– File nomenclature?
– Option 2 requires administration of files (filename - datetime)?
– If file exchange, size is less of an issue: we should go for XML formatted information?
Filtering out double clicks
→ No agreement on double click period (COUNTER, Eprints, LogEC).
– What do we do in EO?
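Whatever double-click period is eventually chosen, the filtering step itself could look like the sketch below. The period is left as a parameter because the slide notes there is no agreement on it. The tuple layout and the rule of comparing each request with the previous request from the same requester for the same object file (so that a burst of rapid clicks collapses into one) are assumptions.

```python
from datetime import timedelta


def filter_double_clicks(events, window_seconds):
    """Drop repeated requests by the same requester for the same object file.

    events: iterable of (timestamp, requester, object_file) tuples.
    A request is dropped when it follows the previous request for the same
    (requester, object_file) pair by no more than window_seconds.
    """
    window = timedelta(seconds=window_seconds)
    last_seen = {}
    kept = []
    for event in sorted(events, key=lambda e: e[0]):
        timestamp, requester, object_file = event
        key = (requester, object_file)
        previous = last_seen.get(key)
        if previous is None or timestamp - previous > window:
            kept.append(event)
        # Remember every request, kept or not, so a burst counts once.
        last_seen[key] = timestamp
    return kept
```

Applying this filter at the gateway rather than at each repository would keep the rule identical for all partners, but it requires the raw events to be exchanged.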

19 Proposal variants / issues
Filtering out robot requests
→ We must set up (and maintain) a filtering algorithm to be used by all partners for distinguishing real downloads from downloads by machines.
– Authoritative list of robots?
– List of regular expressions, rules
– Remove all HEAD requests
– Some bots can be recognized by their IP address
– Discover bots from mining EO database with usage log entries:
   – bots can be active day and night
   – bots generate many more events than human beings
   – bots regularly visit the same URLs
   – LogEC eliminates users who access more than 10% of all items in RePEc within one month
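Several of these rules are simple enough to sketch. The code below combines the HEAD-request rule, a list of known robot IP addresses and a list of User-Agent regular expressions, and adds a separate check modelled on the LogEC rule. The three patterns are illustrative placeholders, not an authoritative robot list, and the function names are assumptions.

```python
import re
from collections import defaultdict

# Illustrative patterns only; a shared, maintained list would be needed.
ROBOT_PATTERNS = [re.compile(p, re.IGNORECASE)
                  for p in (r"bot", r"crawl", r"spider")]


def is_robot_request(method, user_agent, ip, robot_ips=frozenset()):
    """Apply the per-request rules: HEAD requests, known IPs, User-Agent patterns."""
    if method.upper() == "HEAD":
        return True
    if ip in robot_ips:
        return True
    return any(p.search(user_agent or "") for p in ROBOT_PATTERNS)


def heavy_requesters(events, total_items, max_share=0.10):
    """Return requesters that accessed more than max_share of all items.

    events: iterable of (requester, item) pairs, assumed to be already
    restricted to the period under study (one month in the LogEC rule).
    """
    items_by_requester = defaultdict(set)
    for requester, item in events:
        items_by_requester[requester].add(item)
    return {requester for requester, items in items_by_requester.items()
            if len(items) / total_items > max_share}
```

The per-request rules can run at harvest time, whereas the share-based rule needs the accumulated events and therefore fits better in the EO database.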

20 Proposal variants / issues
Exchange of IP addresses of requesters
→ Infringement on privacy laws?
→ How to anonymize requester information? Level of encryption?
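One possible answer to the anonymization question is a keyed hash of the IP address. The sketch below uses HMAC-SHA256 from the Python standard library; the function name and the idea of a per-repository secret key are assumptions, and whether this level of protection satisfies privacy legislation is exactly the open issue raised on the slide.

```python
import hashlib
import hmac


def pseudonymize_ip(ip_address, secret_key):
    """Replace an IP address by a keyed hash (HMAC-SHA256, hex encoded).

    The same address and key always give the same token, so events from one
    requester can still be correlated without exposing the address itself.
    secret_key: bytes known only to the party doing the hashing.
    """
    return hmac.new(secret_key, ip_address.encode("utf-8"),
                    hashlib.sha256).hexdigest()
```

Note the trade-off: once the address is hashed at the repository, the EO gateway can no longer derive geographical information from it.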

