Programmatic interaction with the Invenio-based NADRE Repository Mr. Mario Torrisi (PI4 – Italy – mario.torrisi@ct.infn.it) 16 October 2018 – Third NADRE Training Workshop – Jimma (Ethiopia)
Outline
  Overview: GitHub repository
  Search engine API: XML API, JSON API, Search engine API hands-on
  Upload records: MARCXML file, Upload records hands-on (curl, php)
Overview
For this tutorial you can refer to the GitHub project repository, which collects all the examples you will see: https://github.com/nadre-project/nadre-tutorial
Clone or download this repository on your system.
Search engine API
Allows you to search digital assets on the NADRE Repository by sending HTTP requests.
Invenio offers three different kinds of APIs:
XML API - returns output in MARCXML
JSON API - internally, Invenio records are represented in JSON, so you can ask for JSON output format
Python API - the Invenio Search Engine can be called from within your Python programs (this API is not covered in this tutorial)
XML API
XML API
Using the XML API, Invenio replies with an XML document containing the records found.
Syntax: GET /search?p=...&of=...&ot=...&jrec=...&rg=...
Example: get the first 10 records in XML format
http://nadre.ethernet.edu.et/search?jrec=1&rg=10&of=xm
Parameters
jrec - jump to record number in the result list (e.g. 1 for the first hit)
rg - records in group (e.g. 10 hits per page)
of - output format (e.g. xm for XML format)
Full list of parameters: see the search engine API guide.
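As a minimal sketch, the request URL above can be assembled programmatically instead of typed by hand. The helper name `build_search_url` is hypothetical (it is not part of Invenio); it only encodes the parameters described on this slide using the Python standard library.

```python
from urllib.parse import urlencode

# Search endpoint of the NADRE Repository, as used in the slides.
BASE_URL = "http://nadre.ethernet.edu.et/search"

def build_search_url(**params):
    """Return a search URL for the given query parameters (hypothetical helper)."""
    # Skip parameters set to None so optional ones can simply be omitted.
    query = {name: value for name, value in params.items() if value is not None}
    return BASE_URL + "?" + urlencode(query)

# First 10 records in XML (MARCXML) format.
url = build_search_url(jrec=1, rg=10, of="xm")
print(url)
```

The resulting URL can be opened in a browser or passed to any HTTP client.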
Paginate results (XML API)
Set jrec and rg to paginate the output: increase jrec by rg for each successive page.
Example
http://nadre.ethernet.edu.et/search?of=xm&jrec=1&rg=10
http://nadre.ethernet.edu.et/search?of=xm&jrec=11&rg=10
http://nadre.ethernet.edu.et/search?of=xm&jrec=21&rg=10
Do not set rg too high: there is a server-wide safety limit for it.
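The pagination rule can be sketched as a small generator that steps jrec forward by rg. The function name `page_urls` and the `total_hits` argument are assumptions made for illustration; in practice the number of hits has to come from your own knowledge of the result set.

```python
from urllib.parse import urlencode

BASE_URL = "http://nadre.ethernet.edu.et/search"

def page_urls(total_hits, rg=10, of="xm"):
    """Yield one search URL per page of results (hypothetical helper)."""
    # jrec is 1-based: the pages start at hits 1, 1 + rg, 1 + 2*rg, ...
    for jrec in range(1, total_hits + 1, rg):
        yield BASE_URL + "?" + urlencode({"of": of, "jrec": jrec, "rg": rg})

# 25 hits with 10 records per page give three pages.
urls = list(page_urls(total_hits=25))
for page_url in urls:
    print(page_url)
```

The same helper works for the JSON output format by passing `of="recjson"`.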
Look for patterns in fields (XML API)
Get the first 10 records that contain the string "Hackfest" in the title:
http://nadre.ethernet.edu.et/search?p=Hackfest&f=title&jrec=1&rg=10&of=xm
Parameters
p - pattern (i.e. your query)
f - field to search within (e.g. "title", "authors", etc.)
Get the first 10 records in the 'PRESENTATIONSNADRE' collection that contain 'NADRE' in a keyword:
http://nadre.ethernet.edu.et/search?p1=collection:PRESENTATIONSNADRE+keyword:NADRE&of=xm&jrec=1&rg=10
p1 - first pattern to search for
Filter records and outputs in NADRE Repository (XML API)
Get all records uploaded between two given dates (e.g. from 2018-01-01 to 2018-02-22):
http://nadre.ethernet.edu.et/search?of=xm&d1=2018-01-01&d2=2018-02-22
Parameters
d1 - the first date, in `YYYY-mm-dd` format
d2 - the second date, in `YYYY-mm-dd` format
Get only the abstract, title and authors of the resources:
http://nadre.ethernet.edu.et/search?of=xm&ot=abstract,title,authors
ot - output tags, a comma-separated list of the tags that should be shown (e.g. '' to get all fields, 'title' to get titles only)
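Once an XML reply has been downloaded, it can be read with a standard XML parser. The sketch below uses an inline sample rather than a live response; the sample record and the helper name `extract_titles` are made up for illustration. It assumes the reply uses the standard MARC21 slim namespace and stores the title in field 245, subfield a, as in standard MARC 21.

```python
import xml.etree.ElementTree as ET

# Standard MARCXML namespace (assumed to be the one used in the reply).
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical sample standing in for a reply obtained with of=xm.
SAMPLE_RESPONSE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">42</controlfield>
    <datafield tag="245" ind1=" " ind2=" ">
      <subfield code="a">Hackfest report</subfield>
    </datafield>
  </record>
</collection>"""

def extract_titles(marcxml_text):
    """Return the 245$a title of every record in a MARCXML collection."""
    root = ET.fromstring(marcxml_text)
    titles = []
    for record in root.findall("marc:record", NS):
        path = "marc:datafield[@tag='245']/marc:subfield[@code='a']"
        for subfield in record.findall(path, NS):
            titles.append(subfield.text)
    return titles

titles = extract_titles(SAMPLE_RESPONSE)
print(titles)
```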
JSON API
JSON API
Internally, Invenio records are represented in JSON. You can ask for JSON output format (`of=recjson`).
Syntax: GET /search?p=...&of=...&ot=...&jrec=...&rg=...
Example: get the first 10 records in JSON format
http://nadre.ethernet.edu.et/search?jrec=1&rg=10&of=recjson
Parameters
jrec - jump to record number in the result list (e.g. 1 for the first hit)
rg - records in group (e.g. 10 hits per page)
of - output format (e.g. recjson for JSON format)
Paginate results (JSON API)
Set jrec and rg to paginate the output: increase jrec by rg for each successive page.
Example
http://nadre.ethernet.edu.et/search?of=recjson&jrec=1&rg=10
http://nadre.ethernet.edu.et/search?of=recjson&jrec=11&rg=10
http://nadre.ethernet.edu.et/search?of=recjson&jrec=21&rg=10
Do not set rg too high: there is a server-wide safety limit for it.
Look for patterns in fields (JSON API)
Get the first 10 records that contain the string "Hackfest" in the title:
http://nadre.ethernet.edu.et/search?p=Hackfest&f=title&jrec=1&rg=10&of=recjson
Parameters
p - pattern (i.e. your query)
f - field to search within (e.g. "title", "authors", etc.)
Get the first 10 records in the 'PRESENTATIONSNADRE' collection that contain 'NADRE' in a keyword:
http://nadre.ethernet.edu.et/search?p1=collection:PRESENTATIONSNADRE+keyword:NADRE&of=recjson&jrec=1&rg=10
p1 - first pattern to search for
Filter records and outputs in NADRE Repository (JSON API)
Get all records uploaded between two given dates (e.g. from 2018-01-01 to 2018-02-22):
http://nadre.ethernet.edu.et/search?of=recjson&d1=2018-01-01&d2=2018-02-22
Parameters
d1 - the first date, in `YYYY-mm-dd` format
d2 - the second date, in `YYYY-mm-dd` format
Get only the abstract, title and authors of the resources:
http://nadre.ethernet.edu.et/search?of=recjson&ot=abstract,title,authors
ot - output tags, a comma-separated list of the tags that should be shown (e.g. '' to get all fields, 'title' to get titles only)
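A reply requested with `of=recjson` can be decoded with any JSON library. The sketch below works on an inline sample whose structure (a list of record objects with `recid` and `title` keys) is an assumption made for illustration, not a documented schema; inspect a real reply to see the actual field names. The helper `list_record_fields` is hypothetical.

```python
import json

# Hypothetical sample standing in for a reply obtained with of=recjson.
SAMPLE_RESPONSE = '[{"recid": 42, "title": {"title": "Hackfest report"}}]'

def list_record_fields(json_text):
    """Return the sorted top-level field names of each record in the reply."""
    records = json.loads(json_text)
    # Tolerate a single record object as well as a list of records.
    if isinstance(records, dict):
        records = [records]
    return [sorted(record.keys()) for record in records]

fields = list_record_fields(SAMPLE_RESPONSE)
print(fields)
```

Listing the field names first is a cautious way to discover which keys a given repository actually returns before writing code that depends on them.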
Search engine API hands-on https://github.com/nadre-project/nadre-tutorial/tree/master/search
Search engine references
To learn more about the XML, JSON and Python APIs of an Invenio-based OAR, see this guide: http://nadre.ethernet.edu.et/help/hacking/search-engine-api
Upload records
Upload records
1. Send an IP address authorization request.
2. Create a MARCXML file as input (e.g. your_file.xml) that describes the resources you are going to upload to the NADRE Repository.
3. Submit this XML file to the Repository:
curl -T your_file.xml http://nadre.ethernet.edu.et/batchuploader/robotupload/insert -A invenio_webupload -H "Content-Type: application/marcxml+xml"
A generic file you can use as a template for your submission can be found at: https://github.com/nadre-project/nadre-tutorial/blob/master/submit/xml/0-generic-submission-to-OAR.xml
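For readers who prefer Python to curl, the same submission can be sketched with the standard library. This mirrors the curl options shown above: `-T` becomes an HTTP PUT with the file as body, `-A` becomes the User-Agent header. The helper name `build_upload_request` is hypothetical, and the example only builds the request without sending it, since sending works only from an authorized IP address.

```python
import urllib.request

UPLOAD_URL = "http://nadre.ethernet.edu.et/batchuploader/robotupload/insert"

def build_upload_request(marcxml_bytes):
    """Build (but do not send) the PUT request equivalent to the curl command."""
    return urllib.request.Request(
        UPLOAD_URL,
        data=marcxml_bytes,
        method="PUT",  # curl -T uploads the file with HTTP PUT
        headers={
            "User-Agent": "invenio_webupload",  # curl -A
            "Content-Type": "application/marcxml+xml",  # curl -H
        },
    )

# Placeholder body; in practice read the bytes of your_file.xml instead.
request = build_upload_request(b"<collection></collection>")
print(request.get_method(), request.full_url)

# To actually send it (only from an authorized IP address):
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode())
```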
your_file.xml (1/3)
Must be compliant with the MARCXML standard
Must have only one <collection…> tag
<collection…> can have one or more <record…> tags, each representing a resource
your_file.xml (2/3)
Each record has many <datafield…> tags
The tag attribute value refers to the corresponding MARCXML metadata field
Each <datafield…> can have many <subfield…> tags, which hold the metadata values identified by the code attribute value
your_file.xml (3/3)
Digital Object Identifier (MAN) (NR) tag="024"
Main author (MAN) (NR) tag="100"
Other authors (R) tag="700"
Keyword (R) tag="653"
Collection (MAN) (NR) tag="980"
(MAN) mandatory tag, (NR) not repeatable, (R) repeatable
https://nadre.ethernet.edu.et/help/admin/howto-marc
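The file structure described in these three slides can be sketched in Python as follows. The tags come from the list above; the subfield code "a", the blank indicators, the helper names and all sample values are assumptions made for illustration, so check the generic submission template in the tutorial repository for the exact subfields the Repository expects.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"

def add_datafield(record, tag, value, code="a"):
    """Append <datafield tag=...><subfield code=...>value</subfield></datafield>."""
    # Blank indicators and subfield code "a" are assumptions, not from the slides.
    field = ET.SubElement(record, "datafield", {"tag": tag, "ind1": " ", "ind2": " "})
    ET.SubElement(field, "subfield", {"code": code}).text = value
    return field

def build_collection(doi, main_author, other_authors, keywords, collection):
    """Return a MARCXML string with one <collection> holding one <record>."""
    root = ET.Element("collection", {"xmlns": MARC_NS})
    record = ET.SubElement(root, "record")
    add_datafield(record, "024", doi)            # mandatory, not repeatable
    add_datafield(record, "100", main_author)    # mandatory, not repeatable
    for author in other_authors:                 # repeatable
        add_datafield(record, "700", author)
    for keyword in keywords:                     # repeatable
        add_datafield(record, "653", keyword)
    add_datafield(record, "980", collection)     # mandatory, not repeatable
    return ET.tostring(root, encoding="unicode")

# Hypothetical sample values.
xml_text = build_collection(
    doi="10.1234/example",
    main_author="Doe, Jane",
    other_authors=["Roe, Richard", "Poe, Paula"],
    keywords=["NADRE"],
    collection="PRESENTATIONSNADRE",
)
print(xml_text)
```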
Upload records hands-on https://github.com/nadre-project/nadre-tutorial/tree/master/submit
Upload records references
BibUpload admin guide: http://nadre.ethernet.edu.et/help/admin/bibupload-admin-guide#2
MARCXML: http://nadre.ethernet.edu.et/help/admin/howto-marc
Thank you! አመሰግናለሁ!