Published by Patrick Cowan. Modified over 10 years ago.
1
http://crc.nottingham.ac.uk/
Sometimes, I just want to count things
Peter Millington
SHERPA Technical Development Officer
CRC, University of Nottingham
peter.millington@nottingham.ac.uk
2
Sometimes, I just want to count things
Actually, that's a lie
How difficult can it be?
It should be as easy as 1 - 2 - 3
  1. One – Simplicity
  2. Two – High performance
  3. Three – Acceptable limits
Actions speak louder than words
Some datasets to play with
3
Actually, that's a lie
Just give me numbers for OpenDOAR
  – No. of items in ~1,800 repositories
  – Growth rates
  – Number of full texts vs. metadata-only records
More generally (any database or resource)
  – No. of records in the database
  – No. of records by year, month, etc.
  – No. of records by category
4
How difficult can it be?
Screen scraping? – Uh-uh-uh
OAI-PMH – counting identifiers
  – BIG files, e.g. DSpace – Time out!
  – Iterative chunks, e.g. EPrints – Yawn
  – completeListSize attribute (optional in OAI-PMH) – If only…
ORE is no better – Whatever…
select count(*) from TABLE; – Duh!
So back to screen scraping – Sigh
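To make the "iterative chunks" complaint concrete, here is a minimal sketch of counting records by walking OAI-PMH ListIdentifiers responses. The `header` and `resumptionToken` elements and the namespace come from OAI-PMH 2.0; the `fetch` callable and the two sample pages are hypothetical stand-ins for real HTTP requests, so this is an illustration rather than a working harvester.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def count_identifiers(fetch):
    """Count record headers by following resumption tokens.

    `fetch(token)` is a hypothetical callable that returns one
    ListIdentifiers response as XML text (token is None for the first call).
    Returns (total identifiers, number of requests made).
    """
    total, requests, token = 0, 0, None
    while True:
        root = ET.fromstring(fetch(token))
        requests += 1
        total += len(root.findall(f".//{OAI}header"))
        node = root.find(f".//{OAI}resumptionToken")
        token = node.text.strip() if node is not None and node.text else None
        if not token:  # an empty or missing token marks the last chunk
            return total, requests


def sample_page(identifiers, token):
    """Build a tiny, made-up ListIdentifiers response for the demo."""
    headers = "".join(
        f"<header><identifier>{i}</identifier></header>" for i in identifiers
    )
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
        f"<ListIdentifiers>{headers}"
        f"<resumptionToken>{token}</resumptionToken>"
        "</ListIdentifiers></OAI-PMH>"
    )


# Two made-up chunks: the first points to the second via token "t1".
pages = {
    None: sample_page(["oai:example:1", "oai:example:2"], "t1"),
    "t1": sample_page(["oai:example:3"], ""),
}
total, requests = count_identifiers(pages.__getitem__)
```

Every chunk costs one round trip, so the number of requests grows with the size of the repository, whereas a single `select count(*)` costs one query regardless of size.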
6
It should be as easy as …one… Simplicity
Single SQL SELECT statement
  – Anything more is too complex and so too slow
Single call/file
  – No iteration
Single simple schema
  – XML (+ optional JSON, and other renditions)
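As an illustration of the "single SQL SELECT statement" rule, the sketch below gets a total and a per-year breakdown with one statement each, using Python's built-in sqlite3 module. The `records` table, its `date_added` column, and the three sample rows are assumptions made up for the example; a real resource would substitute its own table and date field.

```python
import sqlite3


def count_by_year(conn):
    """Return [(year, count), ...] from one SELECT with GROUP BY."""
    sql = (
        "SELECT strftime('%Y', date_added) AS year, COUNT(*) AS n "
        "FROM records GROUP BY year ORDER BY year"
    )
    return conn.execute(sql).fetchall()


# Hypothetical in-memory table standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, date_added TEXT)")
conn.executemany(
    "INSERT INTO records (date_added) VALUES (?)",
    [("2008-03-01",), ("2008-11-20",), ("2009-06-15",)],
)

total = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
by_year = count_by_year(conn)
```

The database does all the counting, so the script never loops over individual records.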
7
…two… Target Performance - Rules of Two
<= 0.2 seconds
  – SQL execution
<= 2 seconds
  – Rendering the output file
<= 20
  – Data points
8
…three Maximum limits - Rules of Twenty (?)
<= 2 seconds
  – SQL execution
<= 20 seconds
  – Rendering the output file
<= 200
  – Data points
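The target and maximum limits can be read as two tiers that a response either meets or misses. The sketch below encodes the slide values and classifies a measured response; the names `Limits`, `TARGET`, `MAXIMUM`, and `classify`, and the tier labels, are hypothetical and not part of the protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    sql_seconds: float     # time to execute the SQL
    render_seconds: float  # time to render the output file
    data_points: int       # number of data points returned


TARGET = Limits(sql_seconds=0.2, render_seconds=2.0, data_points=20)     # Rules of Two
MAXIMUM = Limits(sql_seconds=2.0, render_seconds=20.0, data_points=200)  # Rules of Twenty


def within(limits, sql_seconds, render_seconds, data_points):
    """True if all three measurements are at or below the given limits."""
    return (
        sql_seconds <= limits.sql_seconds
        and render_seconds <= limits.render_seconds
        and data_points <= limits.data_points
    )


def classify(sql_seconds, render_seconds, data_points):
    """Label a response as meeting the target, the maximum, or neither."""
    if within(TARGET, sql_seconds, render_seconds, data_points):
        return "target"
    if within(MAXIMUM, sql_seconds, render_seconds, data_points):
        return "acceptable"
    return "over limit"
```

For instance, `classify(0.5, 1.0, 12)` misses the 0.2-second SQL target but stays inside the maximum, so it is labelled "acceptable".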
9
Actions speak louder than words
Protocol for Statistical Harvesting (PSH)
  – Base URL + verb + optional arguments
Specification & Examples
  – http://opendoar.org/demos/psh_prototype.php
Example Base URL:
  – http://opendoar.org/demos/psh.php
10
Simplest case - [base url]?verb=Count
Values returned in the response (XML markup not preserved):
  – 2011-02-11T00:05:26Z
  – http://www.opendoar.org/demos/psh.php
  – 1860
11
Optional Count Arguments
&countType – units for counts
  – e.g. records, repositories, groups, genera, etc.
&setType – some sort of category
  – e.g. subject, region, social class, etc.
&dateUnit
  – e.g. decade, year, month
&dateType
  – e.g. date added, updated, performed, extinct, etc.
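A request is the base URL, the verb, and any of the optional arguments above, so a client only needs to assemble a query string. The sketch below does that with the standard library; `build_count_url` is a hypothetical helper, and the value `year` is taken from the slide's examples, so the exact spellings a given service accepts are an assumption.

```python
from urllib.parse import urlencode

# Optional Count arguments named on the slide.
COUNT_ARGUMENTS = ("countType", "setType", "dateUnit", "dateType")


def build_count_url(base_url, **arguments):
    """Build a Count request URL; reject argument names not listed above."""
    unknown = sorted(set(arguments) - set(COUNT_ARGUMENTS))
    if unknown:
        raise ValueError(f"unsupported Count arguments: {unknown}")
    params = {"verb": "Count"}
    # Fixed argument order, so the same request always yields the same URL.
    for name in COUNT_ARGUMENTS:
        if arguments.get(name) is not None:
            params[name] = arguments[name]
    return f"{base_url}?{urlencode(params)}"


BASE_URL = "http://opendoar.org/demos/psh.php"
simplest = build_count_url(BASE_URL)              # verb only
by_year = build_count_url(BASE_URL, dateUnit="year")
```

Rejecting unknown argument names early keeps typos from silently falling back to the plain total.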
12
Breakdown by year added
Values returned in the response (XML markup not preserved):
  – 2011-02-11T00:36:24Z
  – http://www.opendoar.org/demos/psh.php
  – 2008: 298
  – 2009: 278
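Once a yearly breakdown is available, the growth figures asked for earlier reduce to simple arithmetic. The sketch below computes the year-on-year change from the two values shown on the slide; `year_over_year` is a hypothetical helper, and the slide may show only an excerpt of the full response.

```python
def year_over_year(counts):
    """Return [(year, absolute change, relative change), ...] between
    consecutive years present in `counts` (a {year: count} mapping)."""
    years = sorted(counts)
    changes = []
    for previous, current in zip(years, years[1:]):
        delta = counts[current] - counts[previous]
        changes.append((current, delta, delta / counts[previous]))
    return changes


# The two data points shown on the slide.
added_per_year = {2008: 298, 2009: 278}
changes = year_over_year(added_per_year)
```

With at most 20 data points per response under the target limits, this kind of post-processing is trivial on the client side.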
13
Other verbs
Verbs for listing available argument values
  – ListSetTypes
  – ListDateUnits
  – ListDateTypes
  – ListCountTypes
Help – Technical help
Identify – Information about the resource
14
Some datasets to play with
OpenDOAR – open access repositories
  – http://opendoar.org/demos/psh.php
SHERPA/RoMEO – Publishers' policies
  – http://www.sherpa.ac.uk/romeo/psh.php
Folk Play Scripts database
  – http://mastermummers.org/scripts/psh.php
Folk Play Groups & Events
  – http://mastermummers.org/groups/psh.php
15
How could this be improved?
http://opendoar.org/demos/psh_prototype.php
peter.millington@nottingham.ac.uk