Published by Tyrone Benson. Modified over 6 years ago.
1
RoMEO and CRIS: Technical Issues & Efficiency Tips
Peter Millington
Centre for Research Communications, University of Nottingham
RoMEO and CRIS in Practice, Birmingham, 1st April 2011
2
Outline
- Patterns of usage
- Do we have a crisis?
- Approaches to using RoMEO in CRIS
  - Real time queries
  - Caching and reusing RoMEO query results
- Rates of change – Reality Check
  - And their implications
- Other efficiency tips
3
Usage of Interactive RoMEO
4
Usage of Interactive RoMEO
5
Usage of Interactive RoMEO
- Similar curve shapes for other measures
- Distinct weekly pattern
- ~4,500 Page views per day
- ~1,000 Visits per day
- ~ Unique visitors per day
- Seems to be a stable seasonal pattern
6
Usage of the RoMEO API – All Users
7
Usage of the RoMEO API – All Users
8
Usage of the RoMEO API – Requests
9
Usage of the RoMEO API – Requests
10
Usage of the RoMEO API
- Much more variable pattern
  - Weekly cycle of visits less distinct
  - Number of requests very highly variable
- More usage by fewer users
  - ~60 Unique visitors per day
  - Over 250,000 hits per day (>50 times interactive)
- Significant growth
  - Steady growth in number of API users
  - Rapid growth in number of requests
11
Do we have a Crisis?
- Do you ever think RoMEO is slow?
- Most API usage is by CRIS-like applications
- How can we improve things?
  - Higher capacity server? Funding? Unnecessary?
  - Improve efficiency? Optimise the API? More efficient usage?
  - Put a cap on number of requests per day? What level? 1000? 2000?
  - Block commercial software users (N.B. Creative Commons License)
12
API approaches in CRIS applications
- Real time requests when displaying data
  - Acceptable for individual article displays
  - Latency too high for lists of articles
- Caching RoMEO data for rapid local re-use
  - Initial (bulk) checks against RoMEO
  - Store the results locally
  - Periodically recheck for updated policies
    - Whole bibliography
    - Additions and updates only
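The caching approach can be sketched as a small wrapper that answers from local storage and only goes back to RoMEO when an entry is older than a chosen age. This is a minimal illustration, not any CRIS vendor's implementation: `PolicyCache`, `fetch` and `max_age` are hypothetical names, and `fetch` stands in for whatever function actually queries the API.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Tuple


class PolicyCache:
    """Keep RoMEO lookups locally and refetch only when an entry is stale."""

    def __init__(self, fetch: Callable[[str], str], max_age: timedelta) -> None:
        self._fetch = fetch          # hypothetical function that queries RoMEO
        self._max_age = max_age      # how long a cached answer is trusted
        self._entries: Dict[str, Tuple[str, datetime]] = {}

    def get(self, issn: str, now: datetime) -> str:
        cached = self._entries.get(issn)
        if cached is not None and now - cached[1] < self._max_age:
            return cached[0]         # served locally, no API request
        policy = self._fetch(issn)   # one remote request, then stored
        self._entries[issn] = (policy, now)
        return policy
```

With a `max_age` of a week, displaying a list of articles costs at most one remote request per journal per week, rather than one per article per page view.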
13
Real Time Usage Pattern
14
Real Time Usage Pattern
15
Real Time Usage Pattern
- Levels vary day by day
- Arguably high usage for one installation
- Occasional peaks
  - Special system jobs
  - Special end user projects
16
Caching with Monthly Updates
17
Caching with Monthly Updates
- Rechecking the whole database each cycle
  - Seems to take three days. Low priority setting?
- Scheduled job – starts 1st of the month
  - Could it be a weekend instead? Faster. Less intrusive.
- What is being checked?
  - Each reference?
  - Groups of records for each journal title?
- What about additions between cycles?
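The difference between checking each reference and checking groups of records per journal can be illustrated with a short sketch. It assumes each bibliographic record carries an ISSN; the function name, record IDs and ISSNs are made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_issn(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each ISSN to the IDs of the references published in that journal."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for record_id, issn in records:
        groups[issn].append(record_id)
    return dict(groups)


# Illustrative data only: (record ID, ISSN) pairs.
bibliography = [
    ("ref-1", "1111-1111"),
    ("ref-2", "2222-2222"),
    ("ref-3", "1111-1111"),
]
groups = group_by_issn(bibliography)
print(len(bibliography), "references need", len(groups), "journal lookups")
# prints: 3 references need 2 journal lookups
```

One lookup per group, with the result applied to every record in the group, replaces one lookup per reference.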
18
Caching with Daily Updates (1)
19
Caching with Daily Updates (1)
20
Caching with Daily Updates (1)
- Whole database checked every day
  - Institutions can easily have lists of 50,000 items!
  - Lists constantly growing, slowing things down
- What is being checked?
  - Each reference? Probably
- Additions and updates between checks?
  - No accuracy problems
- Sledgehammer to crack a nut
21
Is the nut cracking the sledgehammer?
22
Caching with Daily Updates (2)
23
Caching with Daily Updates (2)
- Note the logarithmic scale
- Large initial check of the whole database
- Daily check of added & changed items only
- Welcome low loading on the API
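A daily check of added and changed items only might be selected along these lines. The sketch assumes the local system records a last-modified date per reference; `Reference` and `needs_check` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass
class Reference:
    record_id: str
    issn: str
    last_modified: date   # date the record was added or last edited


def needs_check(records: Iterable[Reference], last_run: date) -> List[Reference]:
    """Return only the records added or changed since the previous run.

    The comparison is inclusive, so a record edited later on the day of the
    previous run is checked again rather than missed.
    """
    return [r for r in records if r.last_modified >= last_run]
```

Everything older than the previous run is left alone, which is what keeps the daily load on the API low after the large initial check.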
24
Rates of Change – Reality Check
- Institutional Bibliographies
  - Up to 2,000 additions per year (<40 per week)
  - Few bibliographic changes after initial QA
- RoMEO Publishers’ Policies
  - c.25 additions or substantive changes per week
- Journal - Publisher Correlations
  - Change of publisher - infrequent - mostly January
  - Bulk changes - Business take-over or name change
- Expiry of archiving embargos
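The scale of the mismatch can be worked through with the figures quoted in the slides: up to 2,000 additions per year, and lists of 50,000 items rechecked in full every day. The calculation assumes one API request per item checked, which is a simplification.

```python
ADDITIONS_PER_YEAR = 2_000      # upper figure quoted above
LIST_SIZE = 50_000              # large institutional list, from the daily-update slide
WEEKS_PER_YEAR = 52

additions_per_week = ADDITIONS_PER_YEAR / WEEKS_PER_YEAR
full_recheck_per_week = LIST_SIZE * 7   # whole list checked once a day

print(f"new items per week: {additions_per_week:.1f}")
print(f"requests per week, daily full recheck: {full_recheck_per_week:,}")
print(f"ratio: about {full_recheck_per_week / additions_per_week:,.0f} to 1")
# prints:
# new items per week: 38.5
# requests per week, daily full recheck: 350,000
# ratio: about 9,100 to 1
```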
25
RoMEO Implications of Change Rates
- Institutional Bibliographies
  - Only need to check additions & changes
  - Weekly check probably sufficient, or on first use
- RoMEO Publishers’ Policies
  - Recheck when the RoMEO record changes
  - Store RoMEO ID with article/journal for bulk updates
- Journal - Publisher Correlations
  - Full recheck annually on rolling cycle
  - Specific rechecks for known business/name changes
- Expiry of archiving embargos
  - Scope for improvement in RoMEO
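Storing the RoMEO ID alongside each journal makes a bulk update a simple lookup: given the publisher IDs whose records have changed, find the cached journals that need refreshing. This is a sketch under assumptions; how the set of changed IDs is obtained is left open, and the function name, ISSNs and IDs are invented for illustration.

```python
from typing import Dict, List, Set


def journals_to_refresh(publisher_by_issn: Dict[str, str], changed_ids: Set[str]) -> List[str]:
    """List the cached journals whose RoMEO publisher record has changed."""
    return sorted(
        issn for issn, publisher_id in publisher_by_issn.items()
        if publisher_id in changed_ids
    )


# Made-up ISSNs and publisher IDs, for illustration only.
publisher_by_issn = {"1111-1111": "101", "2222-2222": "202", "3333-3333": "101"}
print(journals_to_refresh(publisher_by_issn, {"101"}))
# prints: ['1111-1111', '3333-3333']
```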
26
Caching of RoMEO Publisher Data
- Download the whole database with “?all=yes”
  - Relatively fast
  - Download as often as you wish
  - Suggest weekly
- And/Or…
- Store key RoMEO data with bibliographic records
- Provide links to interactive RoMEO
  - Full publisher records using RoMEO ID, or
  - Journal level data using ISSN
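A weekly bulk download could be scheduled roughly as follows. Only the `all=yes` query string comes from the slide; the base address is a placeholder to be replaced with the real API address from the RoMEO documentation, and both function names are hypothetical.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Placeholder only: substitute the real RoMEO API address from its documentation.
API_BASE = "https://example.org/romeo/api"


def bulk_download_url(api_base: str) -> str:
    """Append the "all=yes" query string that requests the whole database."""
    return f"{api_base}?{urlencode({'all': 'yes'})}"


def download_is_due(last_download: date, today: date,
                    every: timedelta = timedelta(days=7)) -> bool:
    """True once the suggested weekly interval has elapsed."""
    return today - last_download >= every


print(bulk_download_url(API_BASE))
# prints: https://example.org/romeo/api?all=yes
```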
27
Caching Journal-level Data
- Schema/Organisation
  - Per journal (efficient)
  - Per article (probably inefficient)
- Fields
  - Journal title
  - ISSN and ESSN
  - RoMEO Persistent Publisher ID
  - RoMEO Colour and/or Version-specific permissions
    - Normal – i.e. At the time of publication
    - Adjusted after the completion of any embargo period
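One possible shape for a per-journal cache record, using the fields listed above, is sketched here. The class and field names are assumptions, the sample values are invented, and version-specific permissions are reduced to a single colour for brevity.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class JournalPolicy:
    """One cached row per journal, shared by every article in that journal."""
    title: str
    issn: Optional[str]
    essn: Optional[str]
    romeo_publisher_id: str                      # RoMEO persistent publisher ID
    colour_at_publication: str                   # "normal" permissions
    colour_after_embargo: Optional[str] = None   # only when an embargo applies

    def colour(self, embargo_over: bool) -> str:
        """Pick the adjusted value once any embargo period has ended."""
        if embargo_over and self.colour_after_embargo is not None:
            return self.colour_after_embargo
        return self.colour_at_publication


# Articles hold only the journal key, so the policy is stored once per journal.
cache: Dict[str, JournalPolicy] = {}
example = JournalPolicy("Journal of Examples", "1111-1111", None, "9999", "yellow", "green")
cache["1111-1111"] = example
```

Keying the cache by journal rather than by article means a policy change touches one row, however many articles the institution holds from that journal.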
28
Most Efficient RoMEO Queries
- Journals
  - ISSN/ESSN or Exact Title
    - Unique or far fewer results, so faster
    - May avoid the overhead of needing to search Zetoc
- Publishers
  - RoMEO ID
    - Unique result. It gets no faster.
  - Exact publisher name
    - May sometimes find multiple results.
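The preference order above can be expressed as a small selection step that picks the most specific key a record offers before any request is built. The labels returned (`"issn"`, `"exact_title"` and so on) are internal tags chosen for this sketch, not RoMEO API parameter names.

```python
from typing import Optional, Tuple


def _first_present(candidates) -> Optional[Tuple[str, str]]:
    """Return the first (label, value) pair whose value is non-blank."""
    for label, value in candidates:
        if value and value.strip():
            return label, value.strip()
    return None


def best_journal_key(issn: Optional[str], essn: Optional[str],
                     title: Optional[str]) -> Optional[Tuple[str, str]]:
    """Prefer ISSN, then ESSN, then an exact title match."""
    return _first_present((("issn", issn), ("essn", essn), ("exact_title", title)))


def best_publisher_key(romeo_id: Optional[str],
                       name: Optional[str]) -> Optional[Tuple[str, str]]:
    """Prefer the RoMEO ID (unique result) over an exact publisher name."""
    return _first_present((("romeo_id", romeo_id), ("exact_name", name)))
```

A record with no usable key returns `None`, which is a cue to fix the source data rather than fall back to a slow fuzzy search.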
29
What to do with failed requests?
- Don’t just keep rechecking!
- Not a journal article?
  - Outside RoMEO’s scope. Prevent rechecking
- Data error (e.g. typo, bad abbreviation)?
  - Correct the source data, then recheck
- No publisher or no policy in RoMEO?
  - Feedback to RoMEO – if important
  - Recheck infrequently – say annually or quarterly
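These rules amount to a small decision table, which might be coded as below. The `Failure` categories and `next_recheck` are hypothetical names; the annual interval is one of the two options suggested above, and a quarterly interval would work the same way.

```python
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class Failure(Enum):
    OUT_OF_SCOPE = "not a journal article"
    DATA_ERROR = "typo or bad abbreviation in the source data"
    NOT_IN_ROMEO = "no publisher or no policy in RoMEO"


def next_recheck(failure: Failure, today: date,
                 source_corrected: bool = False) -> Optional[date]:
    """Return the date of the next lookup, or None to stop rechecking."""
    if failure is Failure.OUT_OF_SCOPE:
        return None                      # outside RoMEO's scope: never recheck
    if failure is Failure.DATA_ERROR:
        # Retrying unchanged bad data cannot succeed, so wait for a correction.
        return today if source_corrected else None
    return today + timedelta(days=365)   # not yet in RoMEO: recheck annually
```

Storing the returned date with the record lets the scheduled job skip failed items until they are due, instead of retrying them on every cycle.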
30
Any Questions? RoMEO: API: Blog: Peter Millington: