Lawrence Webley, Hussein Suleman, Tatenda Chipeperekwa
University of Cape Town
Department of Computer Science
Digital Libraries Laboratory

Present
History
Requirements
ETD Environment in SA
Design Principles
System Architecture
Screen Snaps
Future work

Mid-1990s: Establishment of NDLTD
Late-1990s: Early South African ETD sites at Rhodes/Wits/Pretoria
2001: OAI-PMH developed to interconnect repositories
2007: SA National working group formed
2009: First hosted collections at NRF
2010: First version of Portal + Repository
2011: ETD 2011 in Cape Town!

To link South Africa into international efforts
To gather data on university output
To deal with specific local issues
To showcase local accomplishments locally and internationally
To promote local universities
To motivate institutions to have active ETD projects

Metadata only
Metadata standards
What to expose – Masters/PhD only?
Who will provide support?
What about small institutions?
How to provide access? – OAI-PMH

Create reusable, customisable, open source ETD portal management software
◦ Preferably without reinventing the wheel!
◦ Composed entirely of open source components
◦ Can be customised to meet other use cases
Scalability
◦ National archives are constantly growing

[Diagram: ETD environment in SA, in three groups. SA Universities and Technikons: Institution X, Institution Y, ..., plus institution collections hosted at the NRF. SA NRF: NRF Central ETD Archive, NRF ETD Portal. International Partners: NDLTD Union Archive, SCIRUS, ...]

ETD collections at approximately 12 institutions
◦ Mostly larger, research-intensive institutions
Various software packages in use
◦ EPrints, DSpace, ETD-db, other
OAI-PMH support in all systems

ETD collections hosted remotely at the NRF
For smaller institutions with few resources and few ETDs
Multiple instances of DSpace
Temporary arrangement
Technical support from NRF – collection management from institutions

Our repository software fits in here
Collection of metadata records from all institutions
Any/all metadata formats
Harvested from institutions using OAI-PMH
Provides OAI-PMH and RSS interfaces
No digital objects

Web interface to collection
Search/Browse/View metadata
Statistics for collections
Latest entries
Administrative interface
◦ For managing source repositories

NDLTD Union Archive
◦ International collection
SCIRUS
◦ Science-specific search engine

All modern Linux-based software components
Multi-tiered, simple architecture of complex components
Clean separation between components
◦ Scalability
◦ More easily customised (simply replace a component)
◦ Failure-resistant
Any metadata
Simplicity (minimal dependencies)
Java/Tomcat/Lucene

[Diagram: repository architecture. Components: Harvester, Database, Harvester Web Interface, OAI-PMH data provider, RSS Feed, Summary Info. External parties: Institutions, Portal, higher-up repositories.]

Retrieves metadata from a set of ETD repositories
◦ Via OAI-PMH interfaces
Performs incremental harvests
Performs record validation
◦ Simple validation checks
Performs twice-daily harvests
Configurable via web frontend

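The incremental harvest described above can be sketched in Java as follows. This is a minimal illustration, not the project's actual harvester: the class and method names are hypothetical, and the validation rule (non-empty identifier and title) is an assumption standing in for the "simple validation checks" mentioned on the slide. The request parameters (`verb`, `metadataPrefix`, `from`, `resumptionToken`) are those defined by OAI-PMH itself.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds OAI-PMH ListRecords request URLs for a harvester.
class OaiRequestBuilder {

    // First request of a harvest. Passing a non-null lastHarvest date adds the
    // OAI-PMH "from" argument, so only records changed since then are returned.
    static String listRecords(String baseUrl, String metadataPrefix, String lastHarvest) {
        StringBuilder url = new StringBuilder(baseUrl)
                .append("?verb=ListRecords&metadataPrefix=")
                .append(encode(metadataPrefix));
        if (lastHarvest != null) {
            url.append("&from=").append(encode(lastHarvest));
        }
        return url.toString();
    }

    // Follow-up request for the next batch. In OAI-PMH the resumptionToken is
    // an exclusive argument, so it is sent with the verb and nothing else.
    static String resume(String baseUrl, String resumptionToken) {
        return baseUrl + "?verb=ListRecords&resumptionToken=" + encode(resumptionToken);
    }

    // Assumed validation rule: a record needs a non-empty identifier and title.
    static boolean looksValid(String identifier, String title) {
        return identifier != null && !identifier.trim().isEmpty()
                && title != null && !title.trim().isEmpty();
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

A scheduler running twice a day would call `listRecords` with the date of the previous successful harvest, then keep calling `resume` while the response carries a non-empty resumption token.
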
Provides machine access points to metadata harvested
◦ OAI-PMH interface
Can use any SQL-compliant DB
◦ Our implementation used MySQL
Additional services provided
◦ RSS feed of latest records
◦ Summary statistics for records from each institution
Designed to fit into a hierarchy of OAI-PMH compliant DLs

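The per-institution summary statistics could be computed along the following lines. This is only a sketch: the `HarvestedRecord` type and its fields are hypothetical, and the real repository presumably derives these counts from its SQL database rather than from an in-memory list.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the "summary statistics" service.
class SummaryStats {

    // Assumed minimal shape of a harvested metadata record.
    static final class HarvestedRecord {
        final String institution;
        final String title;

        HarvestedRecord(String institution, String title) {
            this.institution = institution;
            this.title = title;
        }
    }

    // Counts records per institution; TreeMap keeps institutions sorted by name.
    static Map<String, Integer> countByInstitution(List<HarvestedRecord> records) {
        Map<String, Integer> counts = new TreeMap<>();
        for (HarvestedRecord record : records) {
            counts.merge(record.institution, 1, Integer::sum);
        }
        return counts;
    }
}
```

Against a SQL database the same result is a single grouped count over the records table, which is one reason any SQL-compliant database is sufficient for this service.
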
[Diagram: portal architecture. Components: Portal Web Interface (search, browse, statistics), Lucene, Portal Database, Harvester, Harvester Web Admin, RSS. Data source: repository.]

Harvests from Repository into portal DB
Lucene indexes records
Portal provides human interface
◦ Allows keyword searching, browsing, category searches
◦ Also offers links to OAI-PMH and RSS interfaces

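To illustrate what indexing records for keyword search involves, the sketch below uses a plain in-memory inverted index as a stand-in for Lucene. It does not use the Lucene API and is far simpler than what Lucene does; the class name, method names, and record identifiers are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Stand-in for the portal's search index: maps each term to the record IDs containing it.
class TinyIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Splits the text into lower-cased terms and records which record they came from.
    void add(String recordId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, key -> new TreeSet<>()).add(recordId);
            }
        }
    }

    // Case-insensitive single-keyword lookup; returns an empty set when nothing matches.
    Set<String> search(String keyword) {
        return postings.getOrDefault(keyword.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}
```

The portal would add each record to the index as it is harvested into the portal database, and the web interface would then answer keyword queries from the index rather than scanning the database.
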
Packaging into Ubuntu repository
Generic browsing categories
Content Management System
◦ Favourites, citation
Social media buttons
◦ Facebook Like, Google Plus
Bug fixes

Questions?
Links
◦ Live
◦ Source Code