Digitometric Services for Open Archives Environments Tim Brody Simon Kampa, Stevan Harnad, Les Carr, Steve Hitchcock {tdb01r,srk,harnad,lac,sh94r}@ecs.soton.ac.uk University of Southampton, Intelligence, Agents, Multimedia Group 08 December 2018 ECDL 2003, Trondheim, Norway
Open Archives Initiative: The protocol is openly documented, and metadata is “exposed” to at least some peer group (note: rights management can still apply!). “Archive” is defined as a “collection of stuff”, not the archivist’s definition of “archive”; “repository” is used in most OAI documents. The goal is promoting interoperability.
OAI Data Model: Resources/Items/Records. Item = all available (meta)data about the resource, addressed by an OAI identifier. Each item can be expressed as XML records in different metadata formats (Dublin Core metadata, MARC metadata, ???). Record = metadata + identifier + datestamp.
Protocol Responses
Protocol: the Service Provider sends HTTP URL requests to the Data Provider, which answers with XML responses.
1. Identify returns a collection-level description.
2. ListRecords?metadataPrefix=xyz returns all repository xyz records.
3. ListRecords?from=2003-04-02&… returns all repository xyz records since 2003-04-02.
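The three requests above can be sketched as plain URL construction. This is a minimal illustration, not code from the talk: the base URL and the helper name `oai_request_url` are hypothetical, and `oai_dc` stands in for the slide's `xyz`. Note that the slide's `ListRecords?…` is shorthand; in OAI-PMH 2.0 the command is passed as the `verb` argument.

```python
from urllib.parse import urlencode

BASE_URL = "http://example.org/oai"  # hypothetical data provider base URL


def oai_request_url(base_url, verb, **params):
    """Build an OAI-PMH request URL.

    A trailing underscore is stripped from argument names so that the
    reserved word 'from' can be passed as 'from_'.
    """
    query = {"verb": verb}
    for name, value in params.items():
        if value is not None:
            query[name.rstrip("_")] = value
    return base_url + "?" + urlencode(query)


# 1. Collection-level description
identify_url = oai_request_url(BASE_URL, "Identify")

# 2. All records in one metadata format
full_url = oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")

# 3. Only records added or changed since a given date
incremental_url = oai_request_url(
    BASE_URL, "ListRecords", metadataPrefix="oai_dc", from_="2003-04-02"
)
```

A service provider would fetch each URL over HTTP and parse the XML that comes back; the fetching step is left out here so the sketch runs without network access.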
Other Commands: ListIdentifiers returns only the identifier/datestamp/set membership. ListMetadataFormats returns the available data formats. ListSets returns the set structure (if there is one). GetRecord returns a record given by OAI identifier.
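As an illustration of what a ListIdentifiers response carries, the sketch below parses the identifier, datestamp and set membership out of a response. The sample XML is hand-written for this example (the identifiers and set name are invented), and `parse_headers` is a hypothetical helper, not part of any OAI toolkit.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample response (illustrative data only)
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:1</identifier>
      <datestamp>2003-04-02</datestamp>
      <setSpec>physics</setSpec>
    </header>
    <header status="deleted">
      <identifier>oai:example.org:2</identifier>
      <datestamp>2003-04-05</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>
""".strip()


def parse_headers(xml_text):
    """Return one dict per <header> element found in the response."""
    root = ET.fromstring(xml_text)
    headers = []
    for header in root.iterfind(".//oai:header", OAI_NS):
        headers.append({
            "identifier": header.findtext("oai:identifier", namespaces=OAI_NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=OAI_NS),
            "sets": [s.text for s in header.findall("oai:setSpec", OAI_NS)],
            "deleted": header.get("status") == "deleted",
        })
    return headers


headers = parse_headers(SAMPLE_RESPONSE)
```

The same header structure appears inside each record returned by ListRecords and GetRecord, so a parser like this can be reused there.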
Interest in OAI: 111 registered OAI repositories. Many unregistered (e.g. all GNU EPrints.org and DSpace archives). 4,500,000 public records (http://arc.cs.odu.edu/). NSDL project, UK’s JISC Information Environment. OLAC (language community built on OAI).
Why OAI? 1. Mandated Dublin Core allows the quick establishment of basic services and tools. 2. A simple and metadata-neutral protocol allows more interesting possibilities (without breaking 1.) and extensions …
Adding Caching to OAI-PMH
Celestial (OAI Cache): Developed to maintain a local metadata copy and so avoid repeated, large harvests during development. Provides an abstraction over multiple OAI versions (hence acts as a gateway to older implementations). Useful for testing OAI implementations and improving performance. Uses XSLT to provide a Web interface to OAI. Provides redundancy.
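The idea of keeping a local metadata copy so that later harvests can be incremental can be sketched with a toy in-memory store. This is only an illustration under stated assumptions: the class `MetadataCache` and its methods are hypothetical and say nothing about how Celestial is actually implemented.

```python
class MetadataCache:
    """Toy in-memory stand-in for a local metadata cache (hypothetical design)."""

    def __init__(self):
        self._records = {}  # (repo, prefix, identifier) -> (datestamp, xml)
        self._latest = {}   # (repo, prefix) -> latest datestamp stored so far

    def store(self, repo, prefix, identifier, datestamp, xml):
        """Insert or overwrite one harvested record."""
        self._records[(repo, prefix, identifier)] = (datestamp, xml)
        # ISO 8601 datestamps of equal granularity sort correctly as strings.
        if datestamp > self._latest.get((repo, prefix), ""):
            self._latest[(repo, prefix)] = datestamp

    def resume_from(self, repo, prefix):
        """Datestamp to use as 'from' in the next harvest, or None if never harvested."""
        return self._latest.get((repo, prefix))

    def list_records(self, repo, prefix, from_=None):
        """Serve cached records, optionally only those at or after 'from_'."""
        hits = []
        for (r, p, identifier), (datestamp, xml) in self._records.items():
            if r == repo and p == prefix and (from_ is None or datestamp >= from_):
                hits.append((identifier, datestamp, xml))
        return sorted(hits)


cache = MetadataCache()
cache.store("repo1", "oai_dc", "oai:example.org:1", "2003-04-01", "<dc/>")
cache.store("repo1", "oai_dc", "oai:example.org:2", "2003-04-05", "<dc/>")
```

Because the cache answers the same kind of question as the source repository ("records since date X"), downstream services can be developed against it without re-harvesting the original archive each time.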
Citebase Search – Data Model e-Services
Content: 250,000 full-text resources, 240,000 of which are from arXiv.org. 6 million references. 29 mean refs/paper (therefore failed to extract references for 18% of papers) (n.b. modal refs is 19). 1 million references linked internally to the full-text (15%).
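One reading of the "therefore" in the figures above is that the mean is taken over papers with at least one extracted reference, so dividing total references by the mean estimates how many papers yielded references. The sketch below works this through with the rounded numbers from the slide; the interpretation is an assumption, and the exact counts are not given.

```python
papers = 250_000        # full-text resources
references = 6_000_000  # extracted references (rounded on the slide)
mean_refs = 29          # mean references per paper (assumed: over papers with references)
linked = 1_000_000      # references linked internally (rounded on the slide)

# Estimated number of papers for which references were extracted
papers_with_refs = references / mean_refs

# Share of papers with no extracted references
failed_share = 1 - papers_with_refs / papers

# Share of references resolved to a full text in the collection
linked_share = linked / references

print(f"papers with references: {papers_with_refs:,.0f}")
print(f"extraction failed for:  {failed_share:.1%}")
print(f"internally linked:      {linked_share:.1%}")
```

With these rounded inputs the failure share comes out near 17% and the linked share near 17%, in the same range as the quoted 18% and 15%; the differences are presumably due to rounding of the headline figures.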
Citebase Search: The abstract page shows the usual title, authors and abstract, plus some analysis of the current article. The graph shows when the paper has been cited and when it has been downloaded over time.
Citebase Search: Navigation by Citation Links. (Diagram: the current article with its reference list, with links to the past, to the future, and sideways to related and co-cited articles.) Following the abstract are links to related pages by citations. These links can go backwards in time using the reference list, forwards in time through the articles that have cited the current one, and sideways through either related or co-cited articles. Related papers are papers that have a similar reference list, often where an author has used the same references more than once. Co-cited papers are papers that have been cited together by the same later article, the same idea as author co-citation. However, co-cited papers can only be found for articles that have already been cited, so this link cannot be used for new articles.
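The three directions of navigation can be sketched over a toy citation graph. The data and function names below are hypothetical and chosen for illustration; "related" is implemented as a simple count of shared references (often called bibliographic coupling), which is an assumption about how similarity of reference lists might be scored.

```python
from collections import defaultdict

# Hypothetical toy data: paper -> list of papers it references (backwards in time)
references = {
    "A": ["X", "Y"],
    "B": ["X", "Y", "Z"],
    "C": ["Z"],
    "X": [],
    "Y": [],
    "Z": [],
}


def cited_by(references):
    """Forwards in time: invert the reference lists."""
    inverse = defaultdict(set)
    for paper, refs in references.items():
        for ref in refs:
            inverse[ref].add(paper)
    return inverse


def co_cited(references, paper):
    """Sideways: papers appearing in the same reference lists as 'paper'."""
    counts = defaultdict(int)
    for refs in references.values():
        if paper in refs:
            for other in set(refs) - {paper}:
                counts[other] += 1
    return dict(counts)


def related(references, paper):
    """Sideways: papers whose reference lists overlap with that of 'paper'."""
    own = set(references.get(paper, []))
    scores = {}
    for other, refs in references.items():
        if other != paper:
            shared = len(own & set(refs))
            if shared:
                scores[other] = shared
    return scores
```

In this toy graph `co_cited(references, "A")` is empty because nothing cites "A" yet, which mirrors the limitation noted above: co-citation only becomes available once an article has been cited, whereas `related` works from the article's own reference list.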
Citebase Search (cites): This is the reference list, as parsed from the full text. The “eprint” link takes the user to the Citebase abstract page of the cited article; the “journal” links are bespoke links for the American Physical Society journals.
Citebase Search (cites): These are the articles that have cited the current article. Following these links takes the user towards newer papers.
Citebase Search (“Co-cited”): The co-cited articles are listed as well. The development version of Citebase also includes Related articles.
Read/Cite Cycle
Digitometric Services for OAI: Tools for visualising research metadata. Builds an analysis service on Citebase. Knowledge mapping (co-authors, co-citation, etc.).
Co-Citation Network: A co-citation map embedded within the Digitometric user interface. The nodes on the map represent individual publications. By hovering the mouse pointer over a node, the user can display details (title, author, abstract) in the information box. The arcs between the nodes represent a co-citation relationship. A cluster of related publications is evident in the centre of the map. Four distinct paths emanate from this cluster, indicating the possibility of specialty fields arising out of the main cluster.
Full Co-Citation Map: A full-sized co-citation map with a lower co-citation threshold, resulting in more nodes being included. Several clusters (research fronts) are evident, in particular the large cluster towards the bottom right of the map. Researchers may get a better understanding of their research landscape by exploring these clusters and the relationships between them. Different colours are also used to indicate which nodes have recently been highly cited, paving the way for up-and-coming (or dying) research fronts to be identified. There are also several occurrences of 5 or 6 nodes emanating sequentially out of a single node, indicating a sequence of published papers that address a common problem or theme.
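The effect of the co-citation threshold on map size can be sketched as follows. This is a minimal illustration with invented data, not the Digitometric implementation: the function `co_citation_edges` is hypothetical, and an arc is kept only if two publications are cited together at least `threshold` times.

```python
from collections import Counter
from itertools import combinations


def co_citation_edges(reference_lists, threshold=1):
    """Count how often each pair of papers is cited together, then filter.

    Returns a dict mapping (paper_a, paper_b) -> co-citation count for
    pairs whose count is at least 'threshold'.
    """
    counts = Counter()
    for refs in reference_lists:
        # Sort so each unordered pair always gets the same key.
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= threshold}


def nodes(edges):
    """Publications that appear in at least one surviving arc."""
    return {paper for pair in edges for paper in pair}


# Hypothetical reference lists of four citing papers
reference_lists = [
    ["X", "Y", "Z"],
    ["X", "Y"],
    ["Y", "Z"],
    ["X", "Y", "W"],
]

strict = co_citation_edges(reference_lists, threshold=2)
loose = co_citation_edges(reference_lists, threshold=1)
```

Here the stricter map keeps only the X–Y and Y–Z arcs, while lowering the threshold to 1 brings in the weakly co-cited publication "W" and more arcs, which is the trade-off between a readable map and a complete one.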
Digitometric Services for Open Archives Environments http://www.openarchives.org/ http://opcit.eprints.org/ http://citebase.eprints.org/ http://www.eprints.org/ http://www.hyphen.info/ AKT Project (knowledge) Thank you for listening! Tim Brody