Published by Nathaniel Blair. Modified over 10 years ago.
1
Counting on OpenDOAR
Peter Millington
SHERPA Technical Development Officer
CRC, University of Nottingham
peter.millington@nottingham.ac.uk
2
http://www.opendoar.org/
Background to OpenDOAR
Created in 2005
– Lists over 2320 repositories (2013-07-02)
Manually validated
– High quality…
– …but we didn't like to talk about the record counts
Counts not updated after the initial entry
– Unless prompted by users
Fixed in 2012
– Record counts updated about every 2 weeks
3
Established counting methods
Manual inspection
– Labour-intensive
Counting OAI-PMH record identifiers
– Inefficient: handling big files; iterative
– Unreliable: file size limits and timeouts
– Inaccurate: need to account for deleted records
4
How difficult can it be?
SELECT COUNT(*) FROM repository;
– Still fast even with added complexity
– Statuses, breakdown by date, etc.
The number is often there on the web page
– Headline number, or
– "x to y of z" tally, or
– Adding up numbers on a "Browse by year" page
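To illustrate why a database-level count stays cheap even with the "added complexity" the slide mentions, here is a minimal sketch using an in-memory SQLite database. Only the table name `repository` comes from the slide; the `status` and `date_added` columns and the sample rows are assumptions made up for the example, not the schema of any real repository package.

```python
import sqlite3

# Hypothetical schema: only the table name `repository` is from the slide.
# The `status` and `date_added` columns are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE repository (id INTEGER PRIMARY KEY, status TEXT, date_added TEXT)"
)
conn.executemany(
    "INSERT INTO repository (status, date_added) VALUES (?, ?)",
    [("live", "2011-03-01"), ("live", "2012-06-15"), ("deleted", "2012-07-02")],
)

# Headline count, as on the slide.
total = conn.execute("SELECT COUNT(*) FROM repository").fetchone()[0]

# Count per status.
by_status = dict(
    conn.execute("SELECT status, COUNT(*) FROM repository GROUP BY status")
)

# Breakdown of live records by year (first four characters of the date).
by_year = dict(
    conn.execute(
        "SELECT substr(date_added, 1, 4), COUNT(*) FROM repository "
        "WHERE status = 'live' GROUP BY substr(date_added, 1, 4)"
    )
)
```

The repository software can answer all three questions with one query each, which is the contrast the slide draws with harvesting every identifier from outside.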
5
OpenDOAR's Strategy
Avoid OAI-PMH whenever possible
Use other machine-to-machine (m2m) interfaces, if available/suitable
Screen scrape numbers from web pages
If all else fails, use manual methods
Counts for full texts as well, where possible
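One way to read this strategy is as an ordered list of fallbacks: try the preferred method first and move on when it fails or finds nothing. The sketch below shows that pattern only. The function name `first_successful_count` and the idea of passing each method as a callable are assumptions for illustration; the slides do not describe how OpenDOAR's harvester is actually organised.

```python
def first_successful_count(methods):
    """Try (name, callable) pairs in order of preference.

    Each callable returns an int, or None when it finds no count.
    Returns (name, count) for the first method that yields a count,
    or None if every method fails (the cue for manual counting).
    """
    for name, method in methods:
        try:
            count = method()
        except Exception:
            # Timeouts, HTTP errors, parse failures: fall through to the next method.
            continue
        if count is not None:
            return name, count
    return None
```

A caller would list cheap scraping methods first and iterative OAI-PMH last, in line with "avoid OAI-PMH whenever possible".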
6
Some examples…
7
Generic n records
Example: "Documents avec texte intégral 229181" (French: documents with full text)
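A "generic n records" scrape can be sketched as a regular expression that looks for a known label and takes the first number after it. The function name `count_after_label` and the 40-character gap limit are assumptions for this example; the slides do not show the patterns OpenDOAR uses.

```python
import re

# A number with optional thousands separators (comma, dot, or whitespace),
# or a plain run of digits.
NUMBER = r"(\d{1,3}(?:[,.\s]\d{3}(?!\d))+|\d+)"


def count_after_label(page_text, label):
    """Return the first integer that follows `label`, or None if absent."""
    # Allow a short run of non-digits (colon, spaces, markup remnants)
    # between the label and the number.
    pattern = re.escape(label) + r"\D{0,40}?" + NUMBER
    match = re.search(pattern, page_text)
    if match is None:
        return None
    # Strip separators before converting.
    return int(re.sub(r"\D", "", match.group(1)))
```

For the slide's example, `count_after_label("Documents avec texte intégral 229181", "Documents avec texte intégral")` yields the integer 229181. The label has to be configured per repository, which is why distinctive wording on the page helps.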
8
Generic x to y of z counters
DSpace Browse Counter is a special case
Example: "Showing results 1 to 20 of 6727"
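The "x to y of z" tally can be picked out with a single regular expression that keeps only z. This is a minimal sketch: the name `total_from_tally` is hypothetical, and the pattern assumes plain digits without thousands separators, as in the slide's example.

```python
import re

# Matches tallies such as "1 to 20 of 6727" or "1-20 of 6727".
X_TO_Y_OF_Z = re.compile(r"(\d+)\s*(?:to|-|–)\s*(\d+)\s+of\s+(\d+)", re.IGNORECASE)


def total_from_tally(page_text):
    """Return z from the first 'x to y of z' tally in the text, or None."""
    match = X_TO_Y_OF_Z.search(page_text)
    return int(match.group(3)) if match else None
```

Only the first results page needs to be fetched, since the total is repeated on every page of the listing.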
9
DSpace totalCnt Add-on
Example: NCKUR [40782/74662]
10
Generic Sum of List Counters
EPrints count Browse List is a special case
Add up the numbers in brackets
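Adding up the bracketed numbers on a browse list can be sketched as follows. The example assumes the counts appear in round brackets after each heading, for instance `2013 (120)`; that layout and the function name `sum_bracketed_counts` are assumptions, since the slide's screenshot is not part of this transcript.

```python
import re


def sum_bracketed_counts(page_text):
    """Sum every integer that appears alone inside round brackets."""
    return sum(int(n) for n in re.findall(r"\((\d+)\)", page_text))


# Hypothetical browse-by-year listing.
listing = "2013 (120)\n2012 (340)\n2011 (5)"
total = sum_bracketed_counts(listing)
```

This only gives a sensible total when each item is listed under exactly one heading, so a list where items can appear under several headings would be over-counted.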
11
EPrints V.3 Counter
http://eprints.nonesuch.ac.uk/cgi/counter
Number of items
12
Generic Sum of Numbers
Add up the numbers
13
Generic HTML tag counting
Count item tags in HTML source code
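When a page lists every item but shows no total, the items themselves can be counted in the HTML source. The sketch below uses Python's standard `html.parser`; the class name `TagCounter`, the helper `count_item_tags`, and the choice of filtering by a CSS class are assumptions for illustration.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start tags with a given name and, optionally, a CSS class."""

    def __init__(self, tag, css_class=None):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        # The class attribute may be missing or hold several class names.
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class is None or self.css_class in classes:
            self.count += 1


def count_item_tags(html, tag, css_class=None):
    parser = TagCounter(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.count
```

Filtering on a class keeps navigation links and other page furniture out of the count, provided the repository marks its item entries consistently.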
14
Counting multiple pages
Separate pages per letter, document type, etc.
Issues with Greenstone – lack of predictability
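Where the items are spread over one page per letter or document type, the per-page counts have to be fetched and summed. The sketch below keeps fetching and per-page counting as parameters so that any of the single-page methods can be plugged in. The names `count_across_pages`, `fetch_page` and `count_page`, and the `{key}` placeholder in the URL template, are all hypothetical.

```python
def count_across_pages(url_template, keys, fetch_page, count_page):
    """Sum per-page counts over a set of pages.

    url_template: URL containing a "{key}" placeholder (an assumption).
    keys:         the letters, document types, etc. that select each page.
    fetch_page:   callable taking a URL and returning the page text.
    count_page:   callable taking page text and returning an int or None.
    """
    total = 0
    for key in keys:
        text = fetch_page(url_template.format(key=key))
        count = count_page(text)
        if count is None:
            # Fail loudly rather than report a partial total.
            raise ValueError(f"no count found on page for key {key!r}")
        total += count
    return total
```

Raising on a missing count is a deliberate choice here: if a site's page layout is unpredictable, a silent partial sum would look like a valid but wrong total.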
15
OAI-PMH ListIdentifiers: Simple
http://... /oai?verb=ListIdentifiers&metadataPrefix=oai_dc
Count the identifiers in the response
No resumptionToken
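For the simple case, where the whole list arrives in one response, counting comes down to parsing the XML and counting record headers. This sketch uses the standard OAI-PMH 2.0 namespace and skips headers marked `status="deleted"`, in line with the earlier point about accounting for deleted records. The function name `count_identifiers` is hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def count_identifiers(xml_text):
    """Count non-deleted record headers in one ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    headers = root.findall(".//oai:ListIdentifiers/oai:header", OAI_NS)
    # Deleted records are flagged with status="deleted" on the header element.
    return sum(1 for header in headers if header.get("status") != "deleted")
```

This is only a complete count when the response carries no resumptionToken; otherwise it covers just the first block.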
16
OAI-PMH ListIdentifiers: Iterative
resumptionToken for blocks of identifiers
e.g. 193114FUS
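The iterative case repeats the request, passing back each resumptionToken until the repository returns none or an empty one. The sketch below follows the OAI-PMH rule that `resumptionToken` is sent as the only argument besides `verb`. The names `count_iteratively` and `fetch_page`, the 60-second timeout, and the request cap are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def fetch(url):
    """Default fetcher: return the raw response body."""
    with urlopen(url, timeout=60) as response:
        return response.read()


def count_iteratively(base_url, fetch_page=fetch, max_requests=10000):
    """Count non-deleted headers across all blocks of a ListIdentifiers list."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": "oai_dc"}
    total = 0
    for _ in range(max_requests):
        root = ET.fromstring(fetch_page(base_url + "?" + urlencode(params)))
        headers = root.findall(".//oai:header", OAI_NS)
        total += sum(1 for h in headers if h.get("status") != "deleted")
        token = root.find(".//oai:resumptionToken", OAI_NS)
        # A missing or empty resumptionToken marks the last block.
        if token is None or not (token.text or "").strip():
            return total
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListIdentifiers", "resumptionToken": token.text.strip()}
    raise RuntimeError("request limit reached before the list ended")
```

Every block costs one HTTP round trip, so a large repository needs hundreds or thousands of requests for a single number, and any one timeout spoils the count.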
17
OAI-PMH completeListSize
<resumptionToken completeListSize="89805"
Bingo!
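When the repository supplies `completeListSize`, a single request is enough: the total is read from the attribute on the first resumptionToken. A minimal sketch follows, with the hypothetical function name `complete_list_size`. The attribute is optional in OAI-PMH and may be an estimate, so a caller should be ready for `None`.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def complete_list_size(xml_text):
    """Return completeListSize from the first resumptionToken, or None."""
    root = ET.fromstring(xml_text)
    token = root.find(".//oai:resumptionToken", OAI_NS)
    if token is None:
        return None
    size = token.get("completeListSize")
    # The attribute is optional, so it may be absent even when a token exists.
    return int(size) if size is not None and size.isdigit() else None
```

Note that the figure covers the whole list, so it presumably includes any deleted records the repository still exposes.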
18
Twelve count harvesting methods
Generic
– Generic n records
– Generic x to y of z counters
– Generic Sum of List Counters
– Generic HTML tag counting
– Generic Sum of Numbers
DSpace
– DSpace Browse Counter
– DSpace totalCnt Add-on
EPrints
– EPrints count Browse List
– EPrints V.3 Counter
OAI-PMH ListIdentifiers
– Simple
– Iterative
– completeListSize
Manual counting
19
Efficiency of the methods
Iterative OAI-PMH is so much slower
20
Relative Frequency of Methods
21
Ugent: numbers galore
DSpace and EPrints: easily scrapeable counts
22
Count harvesting issues
No counts visible or harvestable
Static counts
– Often approx.
– e.g. "over 2m items"
Connectivity issues
– Infrastructure limitations
– e.g. heavy internet traffic
– HTTP 401 (unauthorised) & 403 (forbidden) errors
Data hidden in include files (e.g. JavaScript)
– Not visible in View Source code
No direct URL known for the pages with counts
– Only accessible to human navigators
Remodelled websites
– Requiring updated settings
23
Help OpenDOAR count your repository
Display record counts on your home page
– Using distinctive wording & highlighting
– Ideally in or tags
Ensure numbers can be seen in View Source code
Ensure pages & files are not blocked to robots
– Grant read-only access if necessary
Implement OAI-PMH properly
– Return ListIdentifiers in chunks, not one big file
– Include completeListSize in the resumptionToken
Tell us about any changes, so we can update settings
24
Ideas for the Future
Comparing counts from OpenDOAR & ROAR
– E.g. Nottm ePrints: 1,239 < 1,277
– E.g. HAL-Inserm: 7,498 > 2,773
OpenDOAR
– Growth charts
– Full text counts
Extending OAI-PMH
– Statistical features
– Trial PSH