Data Wrangling at Rice University. Denis Galvin, Rice University. MetaArchive Annual Membership Meeting, Houston, Texas.
ETDs at Rice: DSpace; collection in a database, driven by programming; 42,581 G; brief and full records
ETD Structure: brief record; full record (?show=full); PDFs (/13401/ PDF?sequence=1)
Testing: all testing done on CentOS using VMware; plugin tool testing; Run One Daemon; copying other sites' plugins
Manifest Page
Dublin Core: request?verb=ListRecords&metadataPrefix=oai_dc&set=hdl_1911_8299
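The ListRecords request on this slide can be assembled and its Dublin Core read back with a few lines of Python. This is only a minimal sketch: the endpoint host `repository.example.edu`, the helper names, and the sample response are assumptions made for illustration, while the `verb`, `metadataPrefix`, and `set` values come from the slide.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint: the slide only shows "request?verb=...", not the host.
OAI_ENDPOINT = "https://repository.example.edu/oai/request"
DC_TITLE = "{http://purl.org/dc/elements/1.1/}title"

def build_list_records_url(endpoint, set_spec, metadata_prefix="oai_dc",
                           resumption_token=None):
    """Build an OAI-PMH ListRecords URL for one set."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it replaces metadataPrefix and set on follow-up requests.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords",
                  "metadataPrefix": metadata_prefix,
                  "set": set_spec}
    return endpoint + "?" + urlencode(params)

def extract_titles(xml_text):
    """Return every dc:title found in an OAI-PMH response body."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(DC_TITLE)]

# Made-up response fragment, used only to exercise the parser.
SAMPLE = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
    '<record><metadata>'
    '<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<dc:title>Sample thesis</dc:title>'
    '</oai_dc:dc></metadata></record></ListRecords></OAI-PMH>'
)

if __name__ == "__main__":
    print(build_list_records_url(OAI_ENDPOINT, "hdl_1911_8299"))
    print(extract_titles(SAMPLE))
```

Fetching the URL (for example with `urllib.request.urlopen`) is left out so the sketch runs without network access.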
Sub-Manifest Page: links to ETDs within DSpace
Plugin Configuration parameters: Base URL; for the sub-manifest pages, Part (integer)
Crawl Rules
Crawl rules explained: include the master manifest page; include the sub-manifest page; include items under /bitstream; include the OAI-PMH link
Crawl rules explained: include the full record; include the OAI-PMH link on the master manifest, which pulls in Dublin Core: oai/request?verb=ListRecords&metadataPrefix=oai_dc&set=hdl_1911_8299
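The include rules listed on these two slides can be modeled as an ordered list of regular expressions built from the two plugin parameters. The sketch below is not LOCKSS plugin syntax. The host, the `lockss/manifest` paths, and the `handle/` prefix are hypothetical placeholders, and only `/bitstream`, `?show=full`, and the OAI-PMH query string are taken from the slides.

```python
import re

# Hypothetical base URL; the slides do not give the repository host.
BASE_URL = "https://repository.example.edu/"

def build_rules(base_url, part):
    """Return ordered (action, pattern) pairs for one archival unit.

    base_url and part mirror the plugin parameters Base URL and Part (integer).
    All manifest paths below are illustrative assumptions.
    """
    base = re.escape(base_url)
    return [
        # Master manifest page (hypothetical path).
        ("include", re.compile(rf"^{base}lockss/manifest\.html$")),
        # Sub-manifest page for this part (hypothetical path).
        ("include", re.compile(rf"^{base}lockss/manifest_{part}\.html$")),
        # Items under /bitstream (the PDFs).
        ("include", re.compile(rf"^{base}bitstream/")),
        # Full record view (assumed to live under handle/).
        ("include", re.compile(rf"^{base}handle/[^?]+\?show=full$")),
        # OAI-PMH link that pulls in Dublin Core.
        ("include", re.compile(
            rf"^{base}oai/request\?verb=ListRecords"
            r"&metadataPrefix=oai_dc&set=hdl_1911_8299$")),
    ]

def should_fetch(url, rules):
    """First matching rule decides; anything unmatched is skipped."""
    for action, pattern in rules:
        if pattern.search(url):
            return action == "include"
    return False

if __name__ == "__main__":
    rules = build_rules(BASE_URL, 1)
    for url in (
        BASE_URL + "bitstream/example.pdf?sequence=1",
        BASE_URL + "handle/example?show=full",
        BASE_URL + "browse?type=author",
    ):
        print(url, should_fetch(url, rules))
```

Defaulting to "skip" for unmatched URLs keeps the crawl confined to the manifest pages, the ETD files, and the metadata feed.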
Collection Sizes: recommended AU size between 1G and 10G; 5 AUs between 7 and 10G; create new AUs as the collection grows
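One way to keep each AU under the recommended ceiling is to walk the items in order and start a new part whenever the next item would push the running total past 10G. The talk does not say how items were actually assigned to AUs, so the function name, the in-order strategy, and the sample sizes below are assumptions for illustration only.

```python
def assign_parts(item_sizes_gb, max_au_gb=10.0):
    """Assign each item a part number (1, 2, ...) so no AU exceeds max_au_gb.

    Items are taken in the order given. A single item larger than the
    limit still gets a part of its own.
    """
    parts = []
    current_part = 1
    current_total = 0.0
    for size in item_sizes_gb:
        # Open a new AU when this item would overflow a non-empty one.
        if current_total > 0 and current_total + size > max_au_gb:
            current_part += 1
            current_total = 0.0
        current_total += size
        parts.append(current_part)
    return parts

if __name__ == "__main__":
    # Made-up sizes in gigabytes.
    print(assign_parts([4, 4, 4, 4]))
```

Because existing parts are never reshuffled, new items only ever extend the last AU or open a new one, which matches the advice to create new AUs as the collection grows.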
Tips: don’t trust testing with the plugin tool; read the documentation; test with Run One Daemon; test on the caches; use expert mode to write the plugin
Questions?