Harvesting Data Mark Doyle APS AAHEP 7 – April 2, 2014.


1 Harvesting Data
Mark Doyle, APS
AAHEP 7 – April 2, 2014

2 OAI-PMH
+ Widely used
+ Relatively easy to implement (server and client)
+ Allows consumer to easily keep up to date or re-harvest as necessary
+ Self-identifying (available formats, etc.)
- Responses require XML
  – Have to embed metadata XML into response
- Non-XML data difficult
  – DIDL (complex!) or URLs in responses
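The "keep up to date or re-harvest" point rests on two OAI-PMH features: selective harvesting with `from`/`until`, and `resumptionToken` for paging through long result sets. The following is a minimal sketch of the client side, using only the standard library. The base URL `http://example.org/oai` and the `oai_dc` default are placeholders, not an endpoint named in the talk.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, until_date=None, resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH repository."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument: it is sent with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
        if until_date:
            params["until"] = until_date
    return base_url + "?" + urlencode(params)


def next_token(response_xml):
    """Return the resumptionToken of a response, or None on the last page."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


if __name__ == "__main__":
    # Hypothetical repository URL, for illustration only.
    print(list_records_url("http://example.org/oai",
                           from_date="2014-02-20", until_date="2014-02-28"))
```

A harvester would fetch the first URL, process the records, and keep requesting `list_records_url(base, resumption_token=next_token(body))` until `next_token` returns `None`.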

3 RESTful API
+ Simple – just HTTP requests
+ Data can be in any format (JSON)
+ Pagination based on Link: HTTP header (borrowed from GitHub API)
+ Return data in a zip file (BagIt format)
  – Includes manifest with checksums
- No real data model for file types/names
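Pagination through the `Link:` header means the client never builds page URLs itself; it follows whatever the server labels `rel="next"`. Here is a small sketch of parsing such a header with the standard library. The header value in the example is made up for illustration, and the exact `rel` names the APS service returns are an assumption based on the GitHub convention the slide mentions.

```python
import re

# Matches one entry of the form: <URL>; rel="name"
_LINK_ENTRY = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def parse_link_header(value):
    """Map each rel name in a Link header to its URL."""
    return {rel: url for url, rel in _LINK_ENTRY.findall(value)}


if __name__ == "__main__":
    # Hypothetical header value; example.org is a placeholder host.
    header = ('<http://example.org/articles?page=2>; rel="next", '
              '<http://example.org/articles?page=9>; rel="last"')
    links = parse_link_header(header)
    print(links.get("next"))
```

A harvesting loop would request a page, process it, and continue with `links.get("next")` until that key is absent. Clients that use the third-party `requests` library can read the same information from `response.links`.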

4 curl 'http://harvest.aps.org/content/journals/articles?from=2014-02-20&until=2014-02-28'

[
  …
  {
    "doi": "10.1103/PhysRevB.88.235414",
    "metadata_last_modified_at": "2014-02-24T19:00:00-0500",
    "last_modified_at": "2013-12-11T10:24:59-0500",
    "bagit_urls": {
      "complete": "http://harvest.aps.org/bagit/articles/10.1103/PhysRevB.88.235414/complete",
      "apsxml": "http://harvest.aps.org/bagit/articles/10.1103/PhysRevB.88.235414/apsxml",
      "adsfulltext": "http://harvest.aps.org/bagit/articles/10.1103/PhysRevB.88.235414/adsfulltext",
      "pdfxml": "http://harvest.aps.org/bagit/articles/10.1103/PhysRevB.88.235414/pdfxml"
    }
  }
  …
]
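The listing response can be turned into a download plan with a few lines of code. The sketch below builds the query URL shown on the slide and picks one BagIt variant per DOI from the returned JSON. The field names (`doi`, `bagit_urls`) are taken from the slide; the function names and the choice of `apsxml` as default are illustrative assumptions.

```python
import json
from urllib.parse import urlencode

# Listing endpoint as shown on the slide.
LISTING_BASE = "http://harvest.aps.org/content/journals/articles"


def listing_url(from_date, until_date):
    """Build the date-window query for the article listing."""
    return LISTING_BASE + "?" + urlencode({"from": from_date, "until": until_date})


def bag_urls(listing_json, kind="apsxml"):
    """Map each DOI in a listing response to its BagIt URL of the given kind."""
    records = json.loads(listing_json)
    return {
        rec["doi"]: rec["bagit_urls"][kind]
        for rec in records
        if kind in rec.get("bagit_urls", {})
    }


if __name__ == "__main__":
    sample = json.dumps([{
        "doi": "10.1103/PhysRevB.88.235414",
        "bagit_urls": {
            "apsxml": "http://harvest.aps.org/bagit/articles/10.1103/PhysRevB.88.235414/apsxml",
        },
    }])
    print(listing_url("2014-02-20", "2014-02-28"))
    print(bag_urls(sample))
```

Records that lack the requested variant are skipped rather than raising an error, which seems a reasonable default given that the slide notes there is no real data model for file types.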

5 curl 'http://harvest.aps.org/bagit/articles/10.1103/PhysRevLett.106.014301/apsxml' >! PhysRevLett.106.014301.zip
unzip -l PhysRevLett.106.014301.zip

Archive:  PhysRevLett.106.014301.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
       74  03-13-12 08:11   manifest-md5.txt
       82  03-13-12 08:11   manifest-sha1.txt
       64  03-13-12 08:11   bag-info.txt
       55  03-13-12 08:11   bagit.txt
        0  03-13-12 08:11   data/
        0  03-13-12 08:11   data/PhysRevLett.106.014301/
    60948  03-13-12 08:11   data/PhysRevLett.106.014301/fulltext.xml
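The manifest files are what make a downloaded bag verifiable: each line of `manifest-md5.txt` pairs a checksum with a payload path. The following sketch checks a downloaded zip against its manifest. It assumes the manifests sit at the root of the archive, as in the listing above, and that each manifest line has the form `<hex digest> <path>`; `verify_bag` is an illustrative name, not part of any APS tooling.

```python
import hashlib
import io
import zipfile


def verify_bag(zip_bytes, algorithm="md5"):
    """Return the payload paths whose checksum differs from the manifest."""
    failures = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as bag:
        # Assumption: the manifest is at the archive root, e.g. manifest-md5.txt.
        manifest = bag.read(f"manifest-{algorithm}.txt").decode("utf-8")
        for line in manifest.splitlines():
            if not line.strip():
                continue
            expected, path = line.split(None, 1)
            path = path.strip()
            actual = hashlib.new(algorithm, bag.read(path)).hexdigest()
            if actual != expected.lower():
                failures.append(path)
    return failures


if __name__ == "__main__":
    with open("PhysRevLett.106.014301.zip", "rb") as handle:
        bad = verify_bag(handle.read())
    print("OK" if not bad else f"checksum mismatch: {bad}")
```

Passing `algorithm="sha1"` checks against `manifest-sha1.txt` instead. An empty return value means every file listed in the manifest matched.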

6 {
  "identifier": [
    { "type": "doi", "id": "10.1103/PhysRevD.89.042001" }
  ],
  "link": [
    { "url": "http://link.aps.org/doi/10.1103/PhysRevD.89.042001" }
  ],
  "type": "article",
  "title": "Dark matter constraints from observations of 25 Milky Way satellite galaxies with the Fermi Large Area Telescope",
  "journal": {
    "id": "PRD",
    "name": "Physical Review D",
    "shortcode": "Phys. Rev. D"
  },
  "volume": "89",
  "issue": "4",
  "pages": "042001",

7 "author": [
    { "collaboration": "Fermi-LAT Collaboration" },
    {
      "name": "M. Ackermann",
      "firstname": "M.",
      "lastname": "Ackermann",
      "affiliations": [ "a1" ]
    },
    {
      "name": "A. Albert",
      "firstname": "A.",
      "lastname": "Albert",
      "affiliations": [ "a2" ]
    },

8 "affiliation": [
    {
      "id": "a1",
      "name": "Deutsches Elektronen Synchrotron DESY, D-15738 Zeuthen, Germany"
    },
    {
      "id": "a2",
      "name": "W. W. Hansen Experimental Physics Laboratory, Kavli Institute for Particle Astrophysics and Cosmology, Department of Physics and SLAC National Accelerator Laboratory, Stanford University, Stanford, California 94305, USA"
    },
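In the metadata JSON, authors do not carry affiliation text directly. Each author lists affiliation ids (`a1`, `a2`, …) that point into the top-level `affiliation` array, and a collaboration appears as an `author` entry with only a `collaboration` key. A consumer therefore needs a small join step, sketched below. The field names come from the slides; the function name and the returned shape are illustrative choices.

```python
def authors_with_affiliations(record):
    """Return (display name, [affiliation names]) for each author entry."""
    by_id = {aff["id"]: aff["name"] for aff in record.get("affiliation", [])}
    resolved = []
    for author in record.get("author", []):
        if "collaboration" in author:
            # Collaboration entries carry no personal name or affiliations.
            resolved.append((author["collaboration"], []))
            continue
        names = [by_id[ref] for ref in author.get("affiliations", []) if ref in by_id]
        resolved.append((author["name"], names))
    return resolved


if __name__ == "__main__":
    record = {
        "author": [
            {"collaboration": "Fermi-LAT Collaboration"},
            {"name": "M. Ackermann", "firstname": "M.", "lastname": "Ackermann",
             "affiliations": ["a1"]},
        ],
        "affiliation": [
            {"id": "a1",
             "name": "Deutsches Elektronen Synchrotron DESY, D-15738 Zeuthen, Germany"},
        ],
    }
    for name, affs in authors_with_affiliations(record):
        print(name, affs)
```

Unknown affiliation ids are silently dropped in this sketch; a stricter harvester might prefer to raise an error so that incomplete records are noticed.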

9 CrossRef TDM (née Prospect)
Support for authenticated text data mining
  – Via tokens
Click-through licenses
Rate limiting API
Example: Researcher with ORCID at subscribing institution
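On the client side, token authentication and rate limiting usually come down to two things: attaching the token as a request header, and pausing when the server reports that the quota is exhausted. The sketch below shows that logic in isolation. The header names are deliberately passed in as parameters, and the `X-...` names in the example are hypothetical placeholders. It also assumes the reset header carries a Unix timestamp, so the real names and semantics should be taken from the service's documentation.

```python
import time


def auth_headers(token, token_header):
    """Build the request headers that carry a client token."""
    return {token_header: token}


def wait_seconds(response_headers, now, remaining_header, reset_header):
    """Seconds to pause before the next request; 0.0 if quota remains.

    Assumes reset_header holds a Unix timestamp. Lookup is case-sensitive,
    so pass names exactly as they appear in response_headers.
    """
    remaining = int(response_headers.get(remaining_header, 1))
    if remaining > 0:
        return 0.0
    reset_at = float(response_headers.get(reset_header, now))
    return max(0.0, reset_at - now)


if __name__ == "__main__":
    # All header names below are hypothetical placeholders.
    print(auth_headers("my-token", "X-Client-Token"))
    headers = {"X-Rate-Limit-Remaining": "0",
               "X-Rate-Limit-Reset": str(time.time() + 5)}
    pause = wait_seconds(headers, time.time(),
                         "X-Rate-Limit-Remaining", "X-Rate-Limit-Reset")
    print(f"sleeping for {pause:.1f}s")
```

Keeping the wait computation separate from the HTTP call makes it easy to test without a network connection and to reuse with any HTTP client.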

