Presentation is loading. Please wait.

Presentation is loading. Please wait.

IRUS-UK: standardised institutional repository usage statistics

Similar presentations


Presentation on theme: "IRUS-UK: standardised institutional repository usage statistics"— Presentation transcript:

1 IRUS-UK: standardised institutional repository usage statistics
John Salter, White Rose University Consortium, on behalf of IRUS-UK and COUNTER Panel 5: Usage Statistics Open Repositories 2016, Dublin, 16 June

2 What is IRUS-UK? A national aggregation service for UK Institutional Repository Usage Statistics: Collects raw download data from UK IRs for *all item types* within repositories Processes raw data into COUNTER-compliant statistics - so produced on the same basis as ‘traditional’ scholarly publishers IRUS-UK is a national aggregator of repository usage statistics. Its purpose is to collate and provide institutions with access to reliable, accurate standard-based usage statistics for their institutional repository. This enables them to gain a better understanding of the usage of their institution's research, which they can then share with key internal and external stakeholders. IRUS-UK collects data on all downloads from the institutional repository, (not just articles), and processes the data into COUNTER-compliant usage statistics.

3 Who is responsible for IRUS-UK?
Funded by Jisc Team Members: Jisc – Service Management & Host Cranfield University – Service Development & Maintenance Evidence Base, Birmingham City University – User Engagement & Evaluation Bringing together key repository services to deliver a connected national infrastructure to support OA IRUS-UK is funded by Jisc as part of the Repository Shared Services Project, which includes services to support all stages of workflows for UK institutional repositories such as deposit, metadata, discovery, and usage statistics. The IRUS-UK team is a consortium utilising expertise from the same group supporting JUSP, the Journal Usage Statistics Portal. This includes: Jisc, who host the service and provide project and service management Cranfield University, who provide technical development Evidence Base at Birmingham City University, who support evaluation and user engagement

4 About COUNTER Counting Online Usage of Networked Electronic Resources
an international initiative serving librarians, publishers and intermediaries setting standards that facilitate the recording and reporting of online usage statistics consistent, credible and comparable COUNTER Codes of Practice: Release 4 of the COUNTER Code of Practice for e-Resources Books, databases , journals and multimedia content Release 1 of the COUNTER Code of Practice for Articles Journal articles Release 1 of the COUNTER Code of Practice for Usage Factors Alternative metric to the Journal Impact Factor The Codes specify: How raw data should be processed into statistics How statistics reports should be formatted and delivered

5 The Tracker Protocol To enable repositories to participate in IRUS-UK a small piece of code is added to repository software, which employs the ‘Tracker Protocol’ pdf Patches for DSpace (1.8.x, 3.x, 4.x) and Plug-ins for Eprints ( x), implementation guidelines for Fedora In development for PURE portal and figshare The Tracker Protocol was devised in collaboration with and is endorsed by COUNTER Gathers basic data for each raw download and sends to IRUS- UK server Raw download data is stored in daily logfiles before being ingested into IRUS-UK database IRUS works by adding a small piece of code to repository software which employs the ‘Tracker Protocol’. The tracker is available for DSpace, EPrints, and we have implementation guidelines for Fedora. The tracker gathers basic data for each download and sends it to the IRUS-UK server. The tracker protocol adheres to Release 4 of the COUNTER Code of Practice for e-Resources and Release 1 of the COUNTER Code of Practice for Articles. On a daily basis, the data goes through a number of filters and checks before being added to the portal, and is then reviewed again at the end of each month. These process help ensure robot activity or other unusual activity is spotted and removed to improve the accuracy of the data.

6 IRUS-UK ingest procedures
The IRUS-UK daily ingest follows processing rules specified in: Release 4 of the COUNTER Code of Practice for e-Resources Release 1 of the COUNTER Code of Practice for Articles Processes entries from recognised IRs Sorts and filters entries following COUNTER rules to remove robot entries using the COUNTER User Agent Exclusion List regarded as a minimum requirement for excluding bots, etc. Filters entries using additional IRUS-UK filters Filters entries by applying the COUNTER 30 second double-click rule Consolidates raw usage data for each item into daily statistics At the end of each month, daily statistics are consolidated into monthly statistics IRUS works by adding a small piece of code to repository software which employs the ‘Tracker Protocol’. The tracker is available for DSpace, EPrints, and we have implementation guidelines for Fedora. The tracker gathers basic data for each download and sends it to the IRUS-UK server. The tracker protocol adheres to Release 4 of the COUNTER Code of Practice for e-Resources and Release 1 of the COUNTER Code of Practice for Articles. On a daily basis, the data goes through a number of filters and checks before being added to the portal, and is then reviewed again at the end of each month. These process help ensure robot activity or other unusual activity is spotted and removed to improve the accuracy of the data.

7 Robots & rogue usage – IRUS-UK current approach
As stated, the COUNTER Robot Exclusion list is a minimum requirement for eliminating robots and rogue usage It worked well for traditional scholarly publishers with resources behind a subscription barrier As we move to an OA world and newer players – repositories – more needs to be done So in addition IRUS-UK: excludes several IP ranges that are associated with some known robots and spiders, which don’t identify themselves in their User Agent string And based on several years of empirical evidence: exclude downloads from IP addresses with 40+ downloads in a day Exclude downloads from IP address/User Agent where a single item is downloaded 10+ times a day Though we are aware that more can still be done More on robots later, but next a diagrammatic summary of the ingest . . . IRUS works by adding a small piece of code to repository software which employs the ‘Tracker Protocol’. The tracker is available for DSpace, EPrints, and we have implementation guidelines for Fedora. The tracker gathers basic data for each download and sends it to the IRUS-UK server. The tracker protocol adheres to Release 4 of the COUNTER Code of Practice for e-Resources and Release 1 of the COUNTER Code of Practice for Articles. On a daily basis, the data goes through a number of filters and checks before being added to the portal, and is then reviewed again at the end of each month. These process help ensure robot activity or other unusual activity is spotted and removed to improve the accuracy of the data.

8 Recent Daily Stats Table
IRUS-UK Daily Ingest Daily Ingest Scripts IP Excludes (textfile) 40+ downloads PreStep1 Check Repository details. Generate exclude files. List unknown hosts 108 repositories Raw download data IP/UA/ID Excludes (textfile) 10+ downloads of a single item Daily log files Step1 Remove COUNTER & IRUS-UK exclusions & dbl-clicks Consolidate raw data into stats Intermediate stats (textfile) Step2 Insert new, unknown items into Item table. Lookup irus-uk item ids via OAI identifiers and Insert stats into Daily Stats table Insert new manifest data Manifest (textfile) Manifest Table Item Table Daily Stats Table Recent Daily Stats Table IRUS-UK Database RecentDailyStats Regenerate latest 30 days table (for performance in UI) Step3 Harvest bibliographic metadata for unknown items 108 OAI-PMH interfaces

9 IRUS-UK Daily Files – example snippets
IP Excludes (textfile) IP address Downloads 205 52 42 1506 1546 386 1524 103 59 1301 47 IP/UA/ID Excludes (textfile) IP address~Useragent~oai identifier Downloads ~Mozilla/5.0+(Windows+NT+6.1;+WOW64;+rv:31.0)+Gecko/ Firefox/31.0~oai:sure.sunderland.ac.uk:3880 10 ~Mozilla/5.0+(Windows+NT+6.1;+WOW64;+rv:31.0)+Gecko/ Firefox/31.0~oai:sure.sunderland.ac.uk:3904 32 ~Mozilla/5.0+(Windows+NT+6.1;+WOW64;+rv:31.0)+Gecko/ Firefox/31.0~oai:sure.sunderland.ac.uk:4542 ~Mozilla/5.0+(Windows+NT+6.1;+WOW64;+rv:31.0)+Gecko/ Firefox/31.0~oai:sure.sunderland.ac.uk:4575 25 ~Mozilla/5.0+(Windows+NT+6.1;+WOW64;+rv:31.0)+Gecko/ Firefox/31.0~oai:sure.sunderland.ac.uk:4682 ~Mozilla/5.0+(compatible;+MSIE+10.0;+Windows+NT+6.2;+WOW64;+Trident/6.0)~oai:dspace.lib.cranfield.ac.uk:1826/8508 31 ~Mozilla/5.0+(Windows+NT+10.0;+WOW64;+Trident/7.0;+rv:11.0)+like+Gecko~oai:bradscholars.brad.ac.uk:10454/4308 27 ~Mozilla/5.0+(Macintosh;+Intel+Mac+OS+X)+Word/ ~oai:open.ac.uk.OAI2:17219 12 ~AHC/2.0~oai:dspace.lib.cranfield.ac.uk:1826/3570 11 ~PSEARCH/2.0~oai:dspace.lib.cranfield.ac.uk:1826/3570 ~PSEARCH/2.0~oai:dspace.lib.cranfield.ac.uk:1826/3570

10 IRUS-UK Daily Files – example snippets
Manifest (textfile) Logfile name Repo id Platform id Raw data in Counter exclusions IRUS exclusions Double clicks Filtered data out txt 1 2 3463 1309 239 24 1891 3570 1966 318 9 1277 3 6352 3597 613 202 1940 4 3246 1363 634 45 1204 5 8589 5885 427 130 2147 6 6758 5805 180 28 745 7 12997 7639 929 90 4339 Intermediate stats (textfile) Repo id Platform id OAI id Date Downloads 1 2 oai:eprints.bournemouth.ac.uk:9773 oai:eprints.bournemouth.ac.uk:9808 oai:eprints.bournemouth.ac.uk:9836 oai:eprints.bournemouth.ac.uk:9849 oai:eprints.bournemouth.ac.uk:990 10 oai:gala.gre.ac.uk:10259 4 oai:gala.gre.ac.uk:10506 oai:gala.gre.ac.uk:10689 3 oai:gala.gre.ac.uk:10708 oai:gala.gre.ac.uk:10728 oai:gala.gre.ac.uk:10738

11 Monthly Summary Stats Table
IRUS-UK Monthly Update Monthly Update Script Daily Stats Table Monthly Stats Table Monthly Summary Stats Table IRUS-UK Database Monthly-Update yyyy-mm Consolidate daily stats for yyyy-mm into monthly stats (required COUNTER granularity) and update Monthly Stats table Consolidate daily stats for yyyy-mm into monthly summary stats and update Monthly Summary Stats table (for performance in UI)

12 Viewing/retrieving IRUS-UK statistics
Web User Interface - The IRUS-UK Portal Access currently behind Shibboleth authentication/authorisation Wide range of views, slicing and dicing stats from the IRUS-UK database Reports available for download as TSV/CSV/Excel spreadsheet files SUSHI service XML SOAP service for more info see: SUSHI Lite API Under development by the NISO SUSHI Lite Technical Report Working Group ( Allows retrieval of stats reports and snippets as JSON to be embedded into Repository (and other) web pages, or ingested into other services Example client: Once the data has been checked it is accessible via the web user interface in the IRUS-UK portal, which is accessed via Shibboleth authentication. Within the portal there are a wide range of views, showing the data in various different forms of reports. All the reports are available for download as CSV/Excel spreadsheets. The IRUS data can also be viewed via SUSHI service, and a SUSHI Lite API which is currently being developed in collaboration with the NISO SUSHI Lite Technical Report Working Group. The API allows retrieval of statistics to embed in other places (e.g. repository).

13 IRUS-UK: Home A few example screenshots . . .

14 IRUS-UK: Repository Report 1
The Repository Report 1 shows the number of successful item downloads requests by month and repository. You can select on the previous screen which months you would like to look at, and the report will show usage across all repositories for those months. You can sort within the portal, or you can export to Excel for further analysis (e.g. to support benchmarking).

15 IRUS-UK: Item Report 1 The Item Report 1 goes into more detail for each repository. From this report you are able to view downloads at an item level. Again you can select a date range, and sort within the portal (e.g. to see which items have received the most downloads). From this page you can also get to a more detailed view in the Item Statistics report which I’ll show in a few minutes.

16 IRUS-UK: Item Statistics
The Item Statistics shows a more detailed view of usage for a particular item. You can see in the screenshot that there is both tabular and graphical representation of downloads, as well as more metadata. For some items (including this one) there is also Altmetrics information which includes use on social media. Hopefully that quick tour has shown you some of the data you can get from IRUS - I haven’t included all reports, and more are being added all the time via user requests and feedback.

17 What next – COUNTER and IRUS-UK?
IRUS-UK follows the COUNTER Codes of Practice for its ingest . . . But reports we produce are not covered by the existing releases So, they are COUNTER-like rather than official reports We’re working with COUNTER and the COUNTER auditors to define requirements for additional reports to cater for Repositories and other new services An OA landscape COUNTER and the NISO SUSHI Committee are developing a new, comprehensive R5 Code of Practice Simplifying and harmonising reporting requirements Fewer reports with increased flexibility Integrating requirements for repositories Incorporating procedures and statistics for Research Data As mentioned earlier, we regularly collect user feedback on IRUS-UK. In the 2015 user survey, we asked people what the best thing about IRUS-UK was and the responses were largely within the themes of reliable, authoritative statistics, the ability to benchmark against others, and the ease of use. We asked about improvements to their work due to IRUS-UK and found that 92% of respondents reported that IRUS-UK has improved statistical reporting; 63% reported that IRUS-UK has saved time collecting statistics, and 83% reported that IRUS-UK has enabled reporting that they were previously unable to do. 84% of respondents either currently use IRUS-UK for benchmarking or plan to.

18 What next – COUNTER and robots?
Members of this panel are in broad agreement: Static lists of IPs and UAs, while useful to a certain extent, are not sufficient to eliminate robots and rogue usage A better approach is to look at dynamic filtering based on observed behaviour of IPs and UAs Panel members, COUNTER and others interested parties are Working to identify the most practical and generally applicable robot detection techniques Attempting to determine and agree various thresholds for elimination of non-human usage (with a justifiable statistical basis) COUNTER intends to incorporate the results of this work into the R5 Code of Practice As mentioned earlier, we regularly collect user feedback on IRUS-UK. In the 2015 user survey, we asked people what the best thing about IRUS-UK was and the responses were largely within the themes of reliable, authoritative statistics, the ability to benchmark against others, and the ease of use. We asked about improvements to their work due to IRUS-UK and found that 92% of respondents reported that IRUS-UK has improved statistical reporting; 63% reported that IRUS-UK has saved time collecting statistics, and 83% reported that IRUS-UK has enabled reporting that they were previously unable to do. 84% of respondents either currently use IRUS-UK for benchmarking or plan to.

19 What next – COUNTER and robots?
Members of this panel are in broad agreement: Static lists of IPs and UAs, while useful to a certain extent, are not sufficient to eliminate robots and rogue usage A better approach is to look at dynamic filtering based on observed behaviour of IPs and UAs Panel members, COUNTER and others interested parties are Working to identify the most practical and generally applicable robot detection techniques Attempting to determine and agree various thresholds for elimination of non-human usage (with a justifiable statistical basis) COUNTER intends to incorporate the results of this work into the R5 Code of Practice As mentioned earlier, we regularly collect user feedback on IRUS-UK. In the 2015 user survey, we asked people what the best thing about IRUS-UK was and the responses were largely within the themes of reliable, authoritative statistics, the ability to benchmark against others, and the ease of use. We asked about improvements to their work due to IRUS-UK and found that 92% of respondents reported that IRUS-UK has improved statistical reporting; 63% reported that IRUS-UK has saved time collecting statistics, and 83% reported that IRUS-UK has enabled reporting that they were previously unable to do. 84% of respondents either currently use IRUS-UK for benchmarking or plan to.

20 Final caveat! At certain levels of activity machine/non-genuine usage is practically indistinguishable from genuine human activity. As mentioned earlier, we regularly collect user feedback on IRUS-UK. In the 2015 user survey, we asked people what the best thing about IRUS-UK was and the responses were largely within the themes of reliable, authoritative statistics, the ability to benchmark against others, and the ease of use. We asked about improvements to their work due to IRUS-UK and found that 92% of respondents reported that IRUS-UK has improved statistical reporting; 63% reported that IRUS-UK has saved time collecting statistics, and 83% reported that IRUS-UK has enabled reporting that they were previously unable to do. 84% of respondents either currently use IRUS-UK for benchmarking or plan to.

21 Contacts & Information
IRUS-UK web site: Contact IRUS-UK: @IRUSNEWS COUNTER web site: Contact COUNTER: @projectCOUNTER You can find lots more information about IRUS on our website, and if you’d like to contact us you can us. We’re also on Twitter so feel free to follow us and contact us that way if you prefer.


Download ppt "IRUS-UK: standardised institutional repository usage statistics"

Similar presentations


Ads by Google