RoMEO and CRIS Technical Issues & Efficiency Tips

1 RoMEO and CRIS Technical Issues & Efficiency Tips
Peter Millington, Centre for Research Communications, University of Nottingham. RoMEO and CRIS in Practice, Birmingham, 1st April 2011

2 Outline
Patterns of usage: do we have a crisis? Approaches to using RoMEO in CRIS: real-time queries; caching and reusing RoMEO query results. Rates of change (a reality check) and their implications. Other efficiency tips.

3 Usage of Interactive RoMEO

4 Usage of Interactive RoMEO

5 Usage of Interactive RoMEO
Similar curve shapes for other measures. Distinct weekly pattern: ~4,500 page views per day; ~1,000 visits per day; ~ Unique visitors per day. Seems to be a stable seasonal pattern.

6 Usage of the RoMEO API – All Users

7 Usage of the RoMEO API – All Users

8 Usage of the RoMEO API – Requests

9 Usage of the RoMEO API – Requests

10 Usage of the RoMEO API
Much more variable pattern: weekly cycle of visits less distinct; number of requests very highly variable. More usage by fewer users: ~60 unique visitors per day; over 250,000 hits per day (>50 times interactive). Significant growth: steady growth in the number of API users; rapid growth in the number of requests.

11 Do we have a Crisis?
Do you ever think RoMEO is slow? Most API usage is by CRIS-like applications. How can we improve things? Higher capacity server? Funding? Unnecessary? Improve efficiency? Optimise the API? More efficient usage? Put a cap on the number of requests per day? What level? 1000? 2000? Block commercial software users? N.B. Creative Commons License.

12 API approaches in CRIS applications
Real-time requests when displaying data: acceptable for individual article displays, but the latency is too high for lists of articles. Caching RoMEO data for rapid local re-use: initial (bulk) checks against RoMEO; store the results locally; periodically recheck for updated policies, covering either the whole bibliography or additions and updates only.
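The caching approach on this slide can be sketched as a small local cache that only goes back to RoMEO when an entry is missing or older than a chosen age. This is a minimal illustration, not the method of any particular CRIS: the class name `PolicyCache`, the `fetch` callback and the ISSN key are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class PolicyCache:
    """Hypothetical local cache of RoMEO lookups, keyed by ISSN."""

    def __init__(self, fetch: Callable[[str], str], max_age_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch              # function that really queries RoMEO
        self._max_age = max_age_seconds  # how long a stored result is trusted
        self._clock = clock              # injectable clock, handy for testing
        self._store: Dict[str, Tuple[float, str]] = {}
        self.remote_calls = 0            # counts requests that reached RoMEO

    def get(self, issn: str) -> str:
        now = self._clock()
        cached = self._store.get(issn)
        if cached is not None and now - cached[0] < self._max_age:
            return cached[1]             # fresh enough: no request sent
        self.remote_calls += 1
        result = self._fetch(issn)
        self._store[issn] = (now, result)
        return result
```

Displaying a list of articles then costs one remote request per distinct journal per recheck period, rather than one per article per page view.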

13 Real Time Usage Pattern

14 Real Time Usage Pattern

15 Real Time Usage Pattern
Levels vary day by day. Arguably high usage for one installation. Occasional peaks: special system jobs; special end-user projects.

16 Caching with Monthly Updates

17 Caching with Monthly Updates
Rechecking the whole database each cycle. Seems to take three days. Low priority setting? Scheduled job that starts on the 1st of the month. Could it be a weekend instead? Faster. Less intrusive. What is being checked? Each reference? Groups of records for each journal title? What about additions between cycles?
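If a monthly job were moved from the 1st of the month to a weekend, as the slide asks, the start date could be worked out along these lines. The function name `first_saturday` is made up for this sketch, and choosing Saturday rather than Sunday is an assumption.

```python
import datetime as dt


def first_saturday(year: int, month: int) -> dt.date:
    """Return the first Saturday of the given month."""
    first = dt.date(year, month, 1)
    # date.weekday(): Monday is 0, so Saturday is 5.
    offset = (5 - first.weekday()) % 7
    return first + dt.timedelta(days=offset)


# 1st April 2011 was a Friday, so the job would start a day later.
print(first_saturday(2011, 4))  # 2011-04-02
```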

18 Caching with Daily Updates (1)

19 Caching with Daily Updates (1)

20 Caching with Daily Updates (1)
Whole database checked every day. Institutions can easily have lists of 50,000 items! Lists are constantly growing, slowing things down. What is being checked? Each reference? Probably. Additions and updates between checks? No accuracy problems. Sledgehammer to crack a nut.

21 Is the nut cracking the sledgehammer?

22 Caching with Daily Updates (2)

23 Caching with Daily Updates (2)
Note the logarithmic scale. Large initial check of the whole database. Daily check of added and changed items only. Welcome low loading on the API.
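The "added and changed items only" pattern amounts to filtering the local bibliography by modification time before any request is sent. A minimal sketch, assuming each local record carries a last-modified timestamp; the `Item` fields and the function name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass
class Item:
    item_id: str        # local identifier in the CRIS (assumed)
    issn: str
    modified: datetime  # when the record was added or last edited


def items_to_check(items: Iterable[Item], last_check: datetime) -> List[Item]:
    """Keep only records added or changed since the previous run."""
    return [item for item in items if item.modified > last_check]
```

Only the returned items need a RoMEO lookup; everything else keeps its stored result from the initial bulk check.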

24 Rates of Change – Reality Check
Institutional Bibliographies: up to 2,000 additions per year (<40 per week); few bibliographic changes after initial QA. RoMEO Publishers’ Policies: c. 25 additions or substantive changes per week. Journal - Publisher Correlations: change of publisher is infrequent, mostly in January; bulk changes follow a business take-over or name change. Expiry of archiving embargos.
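Putting the figures from the slides side by side shows the scale of the mismatch. The calculation below assumes one API request per item checked, which the slides suggest but do not confirm.

```python
ITEMS_PER_INSTITUTION = 50_000  # list size quoted on slide 20
ADDITIONS_PER_YEAR = 2_000      # upper figure quoted on slide 24

# Requests per week if the whole list is rechecked every day.
full_daily_per_week = ITEMS_PER_INSTITUTION * 7

# Requests per week if only new additions are checked.
additions_per_week = ADDITIONS_PER_YEAR / 52

ratio = full_daily_per_week / additions_per_week
print(f"full daily recheck: {full_daily_per_week:,} requests per week")
print(f"additions only:     {additions_per_week:.1f} requests per week")
print(f"ratio:              about {ratio:,.0f} to 1")
```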

25 RoMEO Implications of Change Rates
Institutional Bibliographies: only need to check additions and changes; a weekly check is probably sufficient, or check on first use. RoMEO Publishers’ Policies: recheck when the RoMEO record changes; store the RoMEO ID with the article/journal for bulk updates. Journal - Publisher Correlations: full recheck annually on a rolling cycle; specific rechecks for known business/name changes. Expiry of archiving embargos: scope for improvement in RoMEO.
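Storing the RoMEO ID alongside each journal makes the bulk update straightforward: when a publisher record changes, look up which local journals point at it. The sketch below assumes the list of changed RoMEO IDs is already known by some other means; both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def index_by_romeo_id(journal_to_romeo: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert a local ISSN -> RoMEO ID mapping into RoMEO ID -> ISSNs."""
    index: Dict[str, List[str]] = defaultdict(list)
    for issn, romeo_id in journal_to_romeo.items():
        index[romeo_id].append(issn)
    return dict(index)


def journals_to_refresh(index: Dict[str, List[str]],
                        changed_ids: Iterable[str]) -> Set[str]:
    """ISSNs whose cached policy depends on a changed RoMEO record."""
    affected: Set[str] = set()
    for romeo_id in changed_ids:
        affected.update(index.get(romeo_id, []))
    return affected
```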

26 Caching of RoMEO Publisher Data
Download the whole database with “?all=yes”: relatively fast; download as often as you wish (suggest weekly). And/or store key RoMEO data with bibliographic records and provide links to interactive RoMEO: full publisher records using the RoMEO ID, or journal-level data using the ISSN.
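A weekly full download could be driven by something like the following. Only the `all=yes` parameter comes from the slide; the base URL is a placeholder, since the transcript does not give the API address, and the helper names are invented for the example.

```python
from datetime import datetime, timedelta
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://romeo.example.org/api"  # placeholder, not the real address


def full_download_url(base: str = API_BASE) -> str:
    """Build the request for the whole publisher database."""
    return f"{base}?{urlencode({'all': 'yes'})}"


def download_due(last_download: Optional[datetime], now: datetime,
                 max_age: timedelta = timedelta(days=7)) -> bool:
    """True if there is no local copy yet or it is at least a week old."""
    return last_download is None or now - last_download >= max_age
```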

27 Caching Journal-level Data
Schema/organisation: per journal (efficient) or per article (probably inefficient). Fields: journal title; ISSN and ESSN; RoMEO persistent publisher ID; RoMEO colour and/or version-specific permissions, both normal (i.e. at the time of publication) and adjusted after the completion of any embargo period.
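One possible shape for a per-journal cache record with the fields listed on this slide is sketched below. The field names, the use of dictionaries for permissions and the `store` helper are assumptions for illustration, not a schema prescribed by RoMEO.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class JournalPolicy:
    title: str
    issn: Optional[str]
    essn: Optional[str]
    romeo_publisher_id: str  # RoMEO persistent publisher ID
    colour: str              # RoMEO colour
    # Version-specific permissions, keyed by version name (assumed layout).
    permissions_at_publication: Dict[str, str] = field(default_factory=dict)
    permissions_after_embargo: Dict[str, str] = field(default_factory=dict)


def store(cache: Dict[str, JournalPolicy], record: JournalPolicy) -> None:
    """File one record per journal under both its ISSN and its ESSN."""
    for key in (record.issn, record.essn):
        if key:
            cache[key] = record
```

Keying on the journal means every article in the same journal shares one record, which is the efficiency the slide points to.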

28 Most Efficient RoMEO Queries
Journals: ISSN/ESSN or exact title. Unique or far fewer results, so faster; may avoid the overhead of needing to search Zetoc. Publishers: RoMEO ID gives a unique result (it gets no faster); exact publisher name may sometimes find multiple results.
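The preference order on this slide can be expressed as a helper that always picks the most specific key available. The parameter names used here (`issn`, `jtitle`, `qtype`, `id`, `pub`) are assumptions for the sketch and should be checked against the API documentation before use.

```python
from typing import Dict, Optional


def journal_query(issn: Optional[str] = None,
                  title: Optional[str] = None) -> Dict[str, str]:
    """Prefer ISSN/ESSN; fall back to an exact title match."""
    if issn:
        return {"issn": issn}
    if title:
        return {"jtitle": title, "qtype": "exact"}
    raise ValueError("need an ISSN/ESSN or a journal title")


def publisher_query(romeo_id: Optional[str] = None,
                    name: Optional[str] = None) -> Dict[str, str]:
    """Prefer the RoMEO ID; fall back to an exact publisher name."""
    if romeo_id:
        return {"id": romeo_id}
    if name:
        return {"pub": name, "qtype": "exact"}
    raise ValueError("need a RoMEO ID or a publisher name")
```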

29 What to do with failed requests?
Don’t just keep rechecking! Not a journal article? It is outside RoMEO’s scope: prevent rechecking. Data error (e.g. typo, bad abbreviation)? Correct the source data, then recheck. No publisher or no policy in RoMEO? Feed back to RoMEO if important, and recheck infrequently, say annually or quarterly.
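The three cases on this slide map naturally onto a rule that decides when, if ever, a failed lookup should be retried. The enum, the function name and the 365-day default are illustrative choices; the slide only says "annually or quarterly".

```python
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class Failure(Enum):
    NOT_ARTICLE = "not a journal article"      # outside RoMEO's scope
    DATA_ERROR = "typo or bad abbreviation"    # fix the source data first
    NOT_IN_ROMEO = "no publisher or no policy in RoMEO"


def next_recheck(failure: Failure, today: date,
                 source_corrected: bool = False,
                 slow_interval_days: int = 365) -> Optional[date]:
    """Return the next date to retry, or None to stop rechecking."""
    if failure is Failure.NOT_ARTICLE:
        return None
    if failure is Failure.DATA_ERROR:
        # Retry only once the record has actually been corrected.
        return today if source_corrected else None
    return today + timedelta(days=slow_interval_days)
```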

30 Any Questions? RoMEO: API: Blog: Peter Millington:

