Sometimes, I just want to count things Peter Millington SHERPA Technical Development Officer CRC, University of Nottingham.


1 Sometimes, I just want to count things
Peter Millington, SHERPA Technical Development Officer
CRC, University of Nottingham
peter.millington@nottingham.ac.uk
http://crc.nottingham.ac.uk/

2 Sometimes, I just want to count things
Actually, that's a lie
How difficult can it be?
It should be as easy as 1 - 2 - 3
1. One – Simplicity
2. Two – High performance
3. Three – Acceptable limits
Actions speak louder than words
Some datasets to play with

3 Actually, that's a lie
Just give me numbers for OpenDOAR
– No. of items in ~1,800 repositories
– Growth rates
– Number of full texts v metadata-only records
More generally (any database or resource)
– No. of records in the database
– No. of records by year, month, etc.
– No. of records by category

4 How difficult can it be?
Screen scraping? – Uh-uh-uh
OAI-PMH – counting identifiers
– BIG files – e.g. DSpace – Time out!
– Iterative chunks – e.g. EPrints – Yawn
– completeListSize argument – If only…
ORE is no better – Whatever…
select count(*) from TABLE; – Duh!
So back to screen scraping – Sigh
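For reference, completeListSize is an optional attribute of the resumptionToken element in OAI-PMH 2.0, so a harvester can read the total from the first chunk only when the repository chooses to supply it. A minimal Python sketch of that check follows. The sample response and the function name `complete_list_size` are illustrative assumptions, not output from any real repository.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical, trimmed ListIdentifiers response used only for illustration.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:example.org:1</identifier></header>
    <resumptionToken completeListSize="1234" cursor="0">token-1</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""


def complete_list_size(response_xml):
    """Return the advertised total as an int, or None if it is not supplied."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None:
        return None
    size = token.get("completeListSize")
    return int(size) if size is not None else None
```

When the function returns None, the only remaining option is the iteration through every chunk that the slide complains about.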


6 It should be as easy as … one … Simplicity
Single SQL SELECT statement
– Anything more is too complex and so too slow
Single call/file
– No iteration
Single simple schema
– XML (+ optional JSON, and other renditions)
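The "single SQL SELECT statement" rule can be sketched in Python with the standard sqlite3 module. The table name `records`, its columns and the sample rows are hypothetical stand-ins, not the OpenDOAR schema.

```python
import sqlite3


def count_records(conn):
    """Run one SELECT and return the total number of rows."""
    (total,) = conn.execute("SELECT COUNT(*) FROM records").fetchone()
    return total


# Hypothetical in-memory table standing in for a real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO records (title) VALUES (?)", [("a",), ("b",), ("c",)]
)
total = count_records(conn)
```

One statement, one round trip, no iteration: the whole answer comes back as a single row.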

7 … two … Target performance – Rules of Two
<= 0.2 seconds – SQL execution
<= 2 seconds – Rendering the output file
<= 20 – Data points

8 … three … Maximum limits – Rules of Twenty (?)
<= 2 seconds – SQL execution
<= 20 seconds – Rendering the output file
<= 200 – Data points
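The two sets of limits can be expressed as a simple check. This is only a sketch: the function name `rate_response` and the returned labels are assumptions made for illustration, while the numbers are the ones given on the Rules of Two and Rules of Twenty slides.

```python
# Limits from the slides, in the order:
# (SQL execution seconds, rendering seconds, data points).
TARGET = (0.2, 2.0, 20)
MAXIMUM = (2.0, 20.0, 200)


def rate_response(sql_seconds, render_seconds, data_points):
    """Classify one measured response against the target and maximum limits."""
    measured = (sql_seconds, render_seconds, data_points)
    if all(value <= limit for value, limit in zip(measured, TARGET)):
        return "target"
    if all(value <= limit for value, limit in zip(measured, MAXIMUM)):
        return "acceptable"
    return "over limit"
```

For example, `rate_response(0.5, 1.0, 10)` misses the 0.2 second SQL target but stays inside every maximum, so it is rated "acceptable".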

9 Actions speak louder than words
Protocol for Statistical Harvesting (PSH)
– Base URL + verb + optional arguments
Specification & examples
– http://opendoar.org/demos/psh_prototype.php
Example base URL:
– http://opendoar.org/demos/psh.php

10 Simplest case – [base url]?verb=Count
Values from the example response (its XML markup did not survive in this transcript):
– 2011-02-11T00:05:26Z
– http://www.opendoar.org/demos/psh.php
– 1860

11 Optional Count arguments
&countType – units for counts
– e.g. records, repositories, groups, genera, etc.
&setType – some sort of category
– e.g. subject, region, social class, etc.
&dateUnit
– e.g. decade, year, month
&dateType
– e.g. date added, updated, performed, extinct, etc.
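A request is just the base URL, a verb and any optional arguments, so building one can be sketched in a few lines of Python. The helper name `build_psh_url` is hypothetical, and the exact spelling of argument values such as `year` is an assumption based on the examples listed above. The argument names themselves come from the slides.

```python
from urllib.parse import urlencode


def build_psh_url(base_url, verb, **arguments):
    """Join the base URL, the verb and any optional arguments into one URL."""
    params = {"verb": verb}
    # Skip arguments left as None so only the requested options are sent.
    params.update({k: v for k, v in arguments.items() if v is not None})
    return f"{base_url}?{urlencode(params)}"


BASE_URL = "http://opendoar.org/demos/psh.php"

# Simplest case: verb only.
simple = build_psh_url(BASE_URL, "Count")

# Count broken down by year (value spelling assumed).
by_year = build_psh_url(BASE_URL, "Count", dateUnit="year")
```

Here `simple` is `http://opendoar.org/demos/psh.php?verb=Count`, matching the simplest case shown on the earlier slide.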

12 Breakdown by year added
Values from the example response (its XML markup did not survive in this transcript):
– 2011-02-11T00:36:24Z
– http://www.opendoar.org/demos/psh.php
– 2008: 298
– 2009: 278
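A breakdown like this still fits the single-SELECT rule. The sketch below shows one way such counts might be produced with SQLite. The table `records`, the column `date_added` and the sample rows are hypothetical, and nothing here is taken from the actual PSH implementation.

```python
import sqlite3


def counts_by_year(conn):
    """Return (year, count) pairs from one grouped SELECT."""
    sql = (
        "SELECT strftime('%Y', date_added) AS year, COUNT(*) "
        "FROM records GROUP BY year ORDER BY year"
    )
    return conn.execute(sql).fetchall()


# Hypothetical in-memory data standing in for a real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, date_added TEXT)")
conn.executemany(
    "INSERT INTO records (date_added) VALUES (?)",
    [("2008-03-01",), ("2008-11-20",), ("2009-06-15",)],
)
breakdown = counts_by_year(conn)
```

Grouping by year keeps the number of data points small, which is what the limits on data points are meant to encourage.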

13 Other verbs
Verbs for listing available argument values
– ListSetTypes
– ListDateUnits
– ListDateTypes
– ListCountTypes
Help – Technical help
Identify – Information about the resource

14 Some datasets to play with
OpenDOAR – open access repositories
– http://opendoar.org/demos/psh.php
SHERPA/RoMEO – Publishers' policies
– http://www.sherpa.ac.uk/romeo/psh.php
Folk Play Scripts database
– http://mastermummers.org/scripts/psh.php
Folk Play Groups & Events
– http://mastermummers.org/groups/psh.php

15 How could this be improved?
http://opendoar.org/demos/psh_prototype.php
peter.millington@nottingham.ac.uk

