Richard Jones, Systems Developer
Technical Issues for Repository Software
Theses Alive! (Edinburgh University Library)
SHERPA (Nottingham University)
Funded by JISC (Joint Information Systems Committee)
My Role Within These Projects
- Evaluate, adapt and develop an open-source package for use across the UK
- Produce an OAI-compliant E-Thesis repository
- Develop a pilot national service with the aim of supporting E-Theses creation and management for UK universities
- Provide support for the creation of an E-Prints service at Edinburgh University Library
This Presentation
- The Institutional Repository
- Common Popular Open-Source Packages
- Generic Software Issues
- The Open Archives Initiative: OAI-PMH
- Specific Repository Software Issues
- Final Remarks
The Institutional Repository
"A set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members."
Clifford Lynch, Executive Director, Coalition for Networked Information (CNI)
Common Popular Open-Source Packages
- DSpace (MIT, HP, DSpace Federation)
- EPrints.org (University of Southampton)
- Fedora (University of Virginia, Cornell University)
- ETD-db (Virginia Tech), endorsed by NDLTD for E-Theses
Generic Software Issues (1): Support & Development
- Support from the authors: documentation is essential, as are mailing lists etc.
- Continued development: bug fixes and feature requests handled by the developers, so that minimal local development is needed
Generic Software Issues (2): System Architecture
- Modular architecture: easy to upgrade, develop and customise
- Appropriate programming languages
- Stable and appropriate database system
- Easy to integrate into current web services: templates and styles using language standards (e.g. HTML/CSS, XML/XSLT)
Generic Software Issues (3): System Security
- Authentication methods
- Authorisation methods
- Authenticate-able content
- Secure supporting systems: well-known, open security systems and coherent standard architectures
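Authentication is easiest to adapt to local needs when it is pluggable. The following Python sketch is a hypothetical illustration, not code from any of the packages named earlier: `Authenticator`, `LocalPasswordAuthenticator` and `AuthenticationStack` are invented names. Each configured method is tried in turn, so an institution could add its own method (for example one backed by its directory service) without changing the rest of the system.

```python
from typing import Optional, Protocol


class Authenticator(Protocol):
    """Hypothetical plug-in interface: return a user id, or None to defer."""

    def authenticate(self, username: str, password: str) -> Optional[str]: ...


class LocalPasswordAuthenticator:
    """Illustration only: checks credentials against an in-memory table.

    A real deployment would delegate to a well-known, open security system
    rather than storing passwords like this.
    """

    def __init__(self, users: dict[str, str]) -> None:
        self._users = users

    def authenticate(self, username: str, password: str) -> Optional[str]:
        if self._users.get(username) == password:
            return username
        return None


class AuthenticationStack:
    """Tries each configured method in order; the first success wins."""

    def __init__(self, methods: list[Authenticator]) -> None:
        self._methods = methods

    def authenticate(self, username: str, password: str) -> Optional[str]:
        for method in self._methods:
            user = method.authenticate(username, password)
            if user is not None:
                return user
        return None  # no configured method accepted the credentials
```

Adding an institutional method then means appending one more object with an `authenticate` method to the list passed to `AuthenticationStack`.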
Generic Software Issues (4): System Administration
- Coherent user administration: different types of user and user groups
- Granular, distributable administration: delegate areas of the system to different administrators
- Access policies
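Delegating areas of the system to different administrators can be modelled by treating the repository as a tree and letting a grant on a node cover everything beneath it. This is a minimal sketch under that assumption; `AdminRegistry` and the path-tuple representation of areas are hypothetical, not taken from any particular package.

```python
from dataclasses import dataclass, field


@dataclass
class AdminRegistry:
    """Hypothetical delegated-administration table.

    Areas are paths such as ("science", "physics"). A grant on an area also
    covers every sub-area; a grant on () covers the whole system.
    """

    grants: dict[str, set[tuple[str, ...]]] = field(default_factory=dict)

    def delegate(self, admin: str, area: tuple[str, ...]) -> None:
        self.grants.setdefault(admin, set()).add(area)

    def can_administer(self, admin: str, area: tuple[str, ...]) -> bool:
        # Allowed if some granted area is a prefix of (or equal to) the requested one.
        return any(area[: len(g)] == g for g in self.grants.get(admin, set()))
```

For example, after `delegate("carol", ("science",))`, Carol can manage `("science", "physics")` but not `("arts",)`.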
Generic Software Issues (5): Additional Functionality
- Public API (Application Programming Interface): providing additional services from the same code base
- Coherent internal data structuring
Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
- Lightweight web-service protocol: requests over HTTP, responses in XML
- Dublin Core based: every repository must support unqualified Dublin Core (oai_dc)
- Data provider/service provider architecture
OAI-PMH Example (1): Request
Verb: ListRecords
From:
Until:
Metadata Prefix: oai_dc
until= &metadataPrefix=oai_dc
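An OAI-PMH request is an ordinary HTTP query string built from the verb and its arguments. This Python sketch assembles a ListRecords request like the one above; the base URL and dates in the usage note are hypothetical placeholders, since the slide's own values are not available.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    from_date: Optional[str] = None,
    until_date: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL (dates as YYYY-MM-DD)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # from/until are optional selective-harvesting arguments.
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    return base_url + "?" + urlencode(params)
```

With the hypothetical endpoint `http://repo.example.org/oai`, calling `list_records_url("http://repo.example.org/oai", from_date="2003-01-01", until_date="2003-12-31")` gives `http://repo.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2003-01-01&until=2003-12-31`.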
OAI-PMH Example (2): Response (extract)
Set: journalgroup:bmcbiology
Title: Microarray analysis of...
Creator: Schwamborn, Jens
Description: Abstract Background Tumor necrosis factor...
Publisher: BioMed Central Ltd.
Rights: Copyright 2003 Schwamborn
The metadata made available via this...
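A harvester reads records like the one above by walking three XML namespaces: OAI-PMH itself, the oai_dc container and the Dublin Core elements. This sketch uses Python's standard `xml.etree.ElementTree`; the sample record is abbreviated from the response above and its identifier `oai:example.org:1` is invented. A full response would wrap records in `OAI-PMH/ListRecords`, and deleted records carry no metadata, which the code tolerates.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Abbreviated sample modelled on the response above; the identifier is hypothetical.
SAMPLE = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:1</identifier>
    <setSpec>journalgroup:bmcbiology</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Microarray analysis of...</dc:title>
      <dc:creator>Schwamborn, Jens</dc:creator>
      <dc:publisher>BioMed Central Ltd.</dc:publisher>
    </oai_dc:dc>
  </metadata>
</record>"""


def parse_record(xml_text: str) -> dict:
    """Extract the identifier, set memberships and DC fields from one <record>."""
    record = ET.fromstring(xml_text)
    header = record.find("oai:header", NS)
    dc = record.find("oai:metadata/oai_dc:dc", NS)
    fields: dict[str, list[str]] = {}
    if dc is not None:  # deleted records have no <metadata>
        for child in dc:
            # "{http://purl.org/dc/elements/1.1/}title" -> "title"
            name = child.tag.split("}", 1)[1]
            fields.setdefault(name, []).append((child.text or "").strip())
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "sets": [s.text for s in header.findall("oai:setSpec", NS)],
        "dc": fields,
    }
```

Each DC element maps to a list because Dublin Core elements are repeatable (an item may have several creators).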
OAI-PMH Data/Service Providers
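A service provider harvests a data provider page by page: when a ListRecords response is incomplete it ends with a `resumptionToken`, which the next request sends as its only argument besides the verb. The sketch below follows that rule; `harvest` is a hypothetical helper, and the `fetch` callable is an assumption that keeps the example testable (in practice it could wrap `urllib.request.urlopen`, since `ET.fromstring` accepts bytes as well as text).

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest(
    base_url: str,
    fetch: Callable[[str], str],
    metadata_prefix: str = "oai_dc",
) -> Iterator[ET.Element]:
    """Yield every <record> in a ListRecords sequence, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        list_records = root.find(OAI + "ListRecords")
        if list_records is None:
            return  # e.g. an <error code="noRecordsMatch"/> response
        yield from list_records.findall(OAI + "record")
        token = list_records.findtext(OAI + "resumptionToken")
        if not token:
            return  # missing or empty token: the list is complete
        # resumptionToken is an exclusive argument: only the verb accompanies it.
        params = {"verb": "ListRecords", "resumptionToken": token}
```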
Specific Repository Software Issues (1): System Architecture
- Web services protocols for data retrieval: OAI-PMH, Z39.50, SRW/U, SOAP
- Appropriate database system: PostgreSQL (open-source), Oracle (proprietary)
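Of the retrieval protocols listed, SRU (the URL-based form of SRW/U) is, like OAI-PMH, a plain HTTP query, but it carries a CQL search rather than a date range. This sketch builds an SRU 1.1 `searchRetrieve` request; the endpoint and the `dc` record schema name are assumptions, since each server advertises its own supported schemas.

```python
from urllib.parse import urlencode


def sru_search_url(
    base_url: str,
    cql_query: str,
    max_records: int = 10,
    record_schema: str = "dc",  # assumed schema name; servers define their own
) -> str:
    """Build an SRU 1.1 searchRetrieve request URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": str(max_records),
        "recordSchema": record_schema,
    }
    return base_url + "?" + urlencode(params)
```

For instance, `sru_search_url("http://repo.example.org/sru", 'dc.title = "microarray"')` percent-encodes the CQL query so it can travel safely in the URL.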
Specific Repository Software Issues (2): System Security
- Authentication methods: most importantly, the one already used at your institution, with the option to plug in your own
- Authorisation methods: integrate-able with current institutional information systems such as staff, student or course lists
- Authenticate-able content: provenance metadata, paper trails, data checksums (e.g. MD5)
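A data checksum makes content authenticate-able: the digest is recorded at ingest and recomputed later to confirm the file has not changed. This sketch uses Python's standard `hashlib`; `md5_checksum` and `verify_fixity` are hypothetical helper names. MD5 detects accidental corruption well but is no longer collision-resistant, so `hashlib.sha256` is a drop-in alternative where deliberate tampering is a concern.

```python
import hashlib


def md5_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: str, recorded: str) -> bool:
    """Compare a file's current digest with the one recorded at ingest."""
    return md5_checksum(path) == recorded.lower()
```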
Specific Repository Software Issues (3): System Administration
- Coherent user administration
- Granular administration system; possible administrator types: Collection admin, User admin, User Group admin, Structure admin, Database Content admin, System Administrator
- Licensing system: related to access policies, with separate submitter, institution and user licences, ideally with a time-dependent facility
- Access policies; possible requirements: domain restrictions, time-dependent restrictions, partial restrictions
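The three kinds of access restriction listed above combine naturally: a domain list, an embargo date for time-dependent restriction, and per-file policies for partial restriction. The following is a minimal sketch under those assumptions; `AccessPolicy` and its fields are hypothetical, and real systems would identify clients more robustly than by host name.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical policy attached to an item or to an individual file."""

    allowed_domains: tuple[str, ...] = ()  # empty: no domain restriction
    embargo_until: Optional[date] = None   # time-dependent restriction

    def permits(self, client_host: str, today: date) -> bool:
        if self.embargo_until is not None and today < self.embargo_until:
            return False
        if self.allowed_domains:
            # Match the domain itself or any host beneath it.
            return any(
                client_host == d or client_host.endswith("." + d)
                for d in self.allowed_domains
            )
        return True
```

Partial restriction then means giving each file in an item its own policy, for example an open policy on the abstract and a domain-limited, embargoed policy on the full text.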
Specific Repository Software Issues (4): Record Handling (1)
- Metadata capture: what metadata do you need? A flexible, appropriate schema (e.g. Qualified DC, ETD-MS for E-Theses, MARC21)
- Customisable submission system: collects the relevant metadata and can be modified conditionally on the fly
- Ingest methods: standard submission, batch import, harvesting (e.g. OAI-PMH, metadata only), customised insert using the native API
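A submission system that "can be modified conditionally on the fly" can be as simple as choosing the required fields from earlier answers. In this sketch the field names are hypothetical, loosely modelled on DSpace-style Qualified DC and on ETD-MS degree elements; `form_fields` and `missing_fields` are invented helpers.

```python
# Hypothetical field lists; a real system would load these from configuration.
BASE_FIELDS = ["dc.type", "dc.title", "dc.contributor.author", "dc.date.issued"]
EXTRA_FIELDS = {
    "thesis": ["thesis.degree.name", "thesis.degree.grantor"],
    "article": ["dc.identifier.citation"],
}


def form_fields(answers: dict[str, str]) -> list[str]:
    """Fields to show, depending on the item type chosen so far."""
    return BASE_FIELDS + EXTRA_FIELDS.get(answers.get("dc.type", ""), [])


def missing_fields(answers: dict[str, str]) -> list[str]:
    """Required fields the submitter has not yet filled in."""
    return [f for f in form_fields(answers) if not answers.get(f)]
```

Choosing "thesis" as the type adds the degree fields to the form; choosing "article" asks for a citation instead.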
Specific Repository Software Issues (5): Record Handling (2)
- Extract methods: native viewing system, batch export, metadata crosswalk, harvesting (e.g. OAI-PMH, metadata only), customised extract using the API
- Item wrappers: multiple files, multiple metadata records/schemas, internal structure mapping (e.g. METS, DIDL)
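A metadata crosswalk maps one schema onto another, usually losing detail. This sketch crosswalks DSpace-style Qualified DC field names to the 15 unqualified Dublin Core elements used by oai_dc; the override table and `to_simple_dc` are hypothetical, and fields outside the `dc.` namespace are simply dropped.

```python
SIMPLE_DC = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}
# Hypothetical override: authors become dc:creator rather than dc:contributor.
OVERRIDES = {"dc.contributor.author": "creator"}


def to_simple_dc(record: dict[str, list[str]]) -> dict[str, list[str]]:
    """Strip qualifiers, apply overrides and drop non-DC fields."""
    out: dict[str, list[str]] = {}
    for field, values in record.items():
        parts = field.split(".")
        default = parts[1] if len(parts) > 1 and parts[0] == "dc" else ""
        element = OVERRIDES.get(field, default)
        if element in SIMPLE_DC:
            out.setdefault(element, []).extend(values)
    return out
```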
Specific Repository Software Issues (6): Digital Preservation (1)
- Persistent identifiers: some available systems are Handle, PURL, URN, DOI and ARK
- Migration: on ingest (migrate the submission to an open format) or on request (preserve the migration tool)
- Viewers: the tools that render the format are preserved
- Emulation: the original viewer is emulated in the new system
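The point of a persistent identifier is that the name stays fixed while the location it resolves to can change. This sketch models a Handle-style `prefix/suffix` scheme resolved through the public proxy at `https://hdl.handle.net/`; `HandleMinter` and the prefix `12345` are hypothetical, and real Handles also require a registered prefix and a running handle server.

```python
import itertools


class HandleMinter:
    """Hypothetical local minter mapping Handle-style names to current URLs."""

    RESOLVER = "https://hdl.handle.net/"

    def __init__(self, prefix: str, start: int = 1) -> None:
        self.prefix = prefix
        self._counter = itertools.count(start)
        self._targets: dict[str, str] = {}

    def mint(self, target_url: str) -> str:
        handle = f"{self.prefix}/{next(self._counter)}"
        self._targets[handle] = target_url
        return handle

    def update(self, handle: str, new_url: str) -> None:
        # The identifier is unchanged; only its current location moves.
        self._targets[handle] = new_url

    def resolve(self, handle: str) -> str:
        return self._targets[handle]

    def public_url(self, handle: str) -> str:
        return self.RESOLVER + handle
```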
Specific Repository Software Issues (7): Digital Preservation (2)
- Universal Virtual Computer (UVC): on ingest (migrate the submission to an open format) or on request (preserve the migration tool)
- Representation information: metadata describing the representation of the file format; e.g. Global Digital Format Registry (GDFR), Typed Object Model (TOM)
Source: Wheatley, P. A way forward for developments in the digital preservation functions of DSpace: options, issues and recommendations
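A format registry starts from identification: recognising a file's format so its representation information can be looked up. This sketch matches a few well-known leading byte signatures; the tiny `REGISTRY` table and `identify` function are hypothetical stand-ins for a real registry, which (like PRONOM with its DROID tool) uses far richer signatures. Formats such as plain text and LaTeX have no fixed signature and fall through to "unknown" here.

```python
# Hypothetical miniature registry: leading bytes -> (format name, MIME type).
REGISTRY = [
    (b"%PDF-", ("Portable Document Format", "application/pdf")),
    (b"{\\rtf", ("Rich Text Format", "application/rtf")),
    (b"<?xml", ("Extensible Markup Language", "application/xml")),
    (b"\x89PNG\r\n\x1a\n", ("Portable Network Graphics", "image/png")),
]


def identify(data: bytes) -> tuple[str, str]:
    """Return (format name, MIME type) for the first matching signature."""
    for signature, info in REGISTRY:
        if data.startswith(signature):
            return info
    return ("unknown", "application/octet-stream")
```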
Specific Repository Software Issues (8): Digital Preservation (3)
File formats:
- Open standards: Rich Text Format (RTF)
- Widely used: LaTeX (TEX)
- Likely to be preserved: Portable Document Format (PDF)
- Human-readable contents: Plain Text (TXT)
- Human-readable format: Extensible Markup Language (XML)
Specific Repository Software Issues (9): Additional Functionality
- Coherent data structuring: an internal structure that can represent your institution in one or more overlaid schemas
- Native browse: hierarchical browsing, filtering by structure and metadata; this also aids indexing by search engines
- Native search: search constrained to particular locations, using the browse functionality to display results
- Full-text indexing
- Public API (Application Programming Interface): creating portal-like services within the institution
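Full-text indexing and constrained search fit together through an inverted index: each term points to the items containing it, and a search can then be limited to one part of the repository structure. This in-memory sketch is hypothetical (`FullTextIndex` and the slash-separated collection paths are invented) and ignores ranking, stemming and persistence, which real indexers provide.

```python
import re
from collections import defaultdict
from typing import Optional


class FullTextIndex:
    """Hypothetical in-memory inverted index with collection-scoped search."""

    def __init__(self) -> None:
        self._postings: defaultdict[str, set[str]] = defaultdict(set)  # term -> item ids
        self._collection: dict[str, str] = {}  # item id -> "community/collection"

    @staticmethod
    def _terms(text: str) -> list[str]:
        return re.findall(r"\w+", text.lower())

    def add(self, item_id: str, collection: str, text: str) -> None:
        self._collection[item_id] = collection
        for term in self._terms(text):
            self._postings[term].add(item_id)

    def search(self, query: str, within: Optional[str] = None) -> set[str]:
        """Items containing every query term, optionally only under `within`."""
        terms = self._terms(query)
        if not terms:
            return set()
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        if within is not None:
            hits = {
                i for i in hits
                if self._collection[i] == within
                or self._collection[i].startswith(within + "/")
            }
        return hits
```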
Final Remarks
- No system yet deals with all of these issues
- Some good development work is ongoing with the various packages
- Not all of the issues need to be solved to provide an Institutional Repository for your institution
- The Institutional Repository is still in its infancy, and may not mature for another 10 years
- There are significant policy and community issues that also need to be addressed
Thanks for Listening
Richard Jones
JISC:
This presentation: