Brief Notes from Kew
Mark Jackson, Software Applications Manager

Focussing on...
• Herbarium digitisation
• electronic Plant Information Centre (ePIC)

Kew Herbarium
• Guesstimated
  - 7 million specimens
  - 250,000 types
• Less than 5% of specimens databased
• A variety of personal databases

Preparation for Digitisation
• Computerise transactions
• Agree and document policy and procedures
• Establish core fields (HISPID pending ABCD)
• Develop hardware and software infrastructure (e.g. catalogue database, mass storage)

Digitisation Strategy
• Curators to barcode, database and image types for loan
• Repatriation & research projects
  - to use infrastructure and core fields
  - data to be imported into Catalogue (eventually)
• Pursue digitisation projects

Specimen imaging
• Decision to try to match Cibachrome prints in terms of quality (e.g. suitable for many diagnostic purposes)
  - 600 dpi delivers 200 MB images
• Stored as TIFFs with no internal compression (the files are then bzipped)
• Acquisition of mass storage

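The 200 MB figure can be checked with simple arithmetic. The sketch below assumes a full A3 scan area (297 mm x 420 mm) and 24-bit RGB colour, neither of which is stated on the slide; under those assumptions a 600 dpi scan comes to about 209 MB (roughly 199 MiB) before bzipping. The class and method names are illustrative only.

```java
class ScanSize {
    // Assumption: the whole A3 bed is scanned (297 mm x 420 mm).
    static final double A3_WIDTH_MM = 297.0;
    static final double A3_HEIGHT_MM = 420.0;
    static final double MM_PER_INCH = 25.4;

    /** Uncompressed size in bytes of a full A3 scan at the given resolution. */
    static long uncompressedBytes(int dpi, int bytesPerPixel) {
        long widthPx = Math.round(A3_WIDTH_MM / MM_PER_INCH * dpi);
        long heightPx = Math.round(A3_HEIGHT_MM / MM_PER_INCH * dpi);
        return widthPx * heightPx * bytesPerPixel;
    }

    /** Total uncompressed storage in terabytes (10^12 bytes) for a number of images. */
    static double totalTerabytes(long images, long bytesPerImage) {
        return images * (double) bytesPerImage / 1e12;
    }

    public static void main(String[] args) {
        // Assumption: 24-bit RGB, i.e. 3 bytes per pixel.
        long perImage = uncompressedBytes(600, 3);
        System.out.printf("One A3 scan at 600 dpi: %.0f MB%n", perImage / 1e6);
        // 250,000 is the guesstimated number of types in the Herbarium.
        System.out.printf("250,000 types: %.1f TB before compression%n",
                totalTerabytes(250_000, perImage));
    }
}
```

The second figure gives a rough sense of why mass storage appears on the list: imaging every type at this quality would need on the order of 50 TB before compression.
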
HerbScan
• A3 flatbed scanner, inverted
• Cradle for specimens
• Distributed throughout Herbarium

Pros and cons

Camera
• £30-40,000
• 200 MB images barely achievable
• 1 image per minute
• Fixed
• Versatile

HerbScan
• £7,500
• 200 MB images easily achievable
• 10 images per hour
• Some mobility
• Suited to flat items

200 MB master images (600 dpi scans), based on capturing the level of detail of Cibachromes.

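The cost and throughput figures can be compared on a like-for-like basis. The sketch below is a rough illustration only: it assumes a budget equal to the low end of the camera price (£30,000), assumes every device runs at its quoted rate without interruption, and ignores staffing and handling time. The class and method names are hypothetical.

```java
class ThroughputSketch {
    /** Number of whole devices affordable within a budget. */
    static int unitsForBudget(int budgetPounds, int unitCostPounds) {
        return budgetPounds / unitCostPounds;
    }

    /** Device-hours of wall-clock time needed to image a number of specimens. */
    static double hoursNeeded(long specimens, int units, double imagesPerHourPerUnit) {
        return specimens / (units * imagesPerHourPerUnit);
    }

    public static void main(String[] args) {
        long types = 250_000;   // guesstimated number of types
        int budget = 30_000;    // assumption: low end of the camera price

        // Camera: 1 image per minute = 60 images per hour, one unit.
        double cameraHours = hoursNeeded(types, 1, 60.0);

        // HerbScan: 10 images per hour per unit, as many units as the budget allows.
        int scanners = unitsForBudget(budget, 7_500);
        double herbScanHours = hoursNeeded(types, scanners, 10.0);

        System.out.printf("Camera: %.0f hours%n", cameraHours);
        System.out.printf("%d HerbScans: %.0f hours%n", scanners, herbScanHours);
    }
}
```

On these assumptions the same money buys four HerbScans running at 40 images per hour in total, against 60 per hour for one camera, so the comparison turns on image quality and mobility as much as on raw speed.
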
[Diagram: HerbCat client and image server. Labels: Images, Metadata, image enquiries, HerbCat enquiries]

Focussing on...
• Herbarium digitisation
• electronic Plant Information Centre

ePIC
• UK government funding for delivery of services electronically
• Resource-discovery interface to multiple Kew data sources (not necessarily at Kew)
• Data sources are heterogeneous
• Simple interface overlaying other systems

[Diagram: Interface, Data source]

Architecture

[Diagram labels]
• Interface (Java servlet/JSPs)
• Multi-threaded Java server
  - Request queue
  - Handlers: one per data source, one for logging, one for spell-checking
• Configuration files (XML)
• Data sources
• Flows: Requests, Results

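The handler arrangement in the diagram might look something like the following. This is a minimal sketch, not the ePIC code: the `Handler` interface, the `QueryDispatcher` class and the handler names ("specimens", "texts") are all hypothetical, and a standard thread-pool executor stands in for the request queue. XML configuration is left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class QueryDispatcher {
    /** Hypothetical contract: one implementation per data source or service. */
    interface Handler {
        String name();
        List<String> handle(String query);
    }

    /** Wraps a function as a named handler, for brevity in this sketch. */
    static Handler handler(String name, Function<String, List<String>> fn) {
        return new Handler() {
            public String name() { return name; }
            public List<String> handle(String query) { return fn.apply(query); }
        };
    }

    private final List<Handler> handlers;
    private final ExecutorService pool;

    QueryDispatcher(List<Handler> handlers, int threads) {
        this.handlers = handlers;
        // The executor's internal work queue stands in for the request queue.
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /** Sends the query to every handler in parallel; gathers results by handler name. */
    Map<String, List<String>> dispatch(String query) {
        Map<String, Future<List<String>>> pending = new LinkedHashMap<>();
        for (Handler h : handlers) {
            Callable<List<String>> task = () -> h.handle(query);
            pending.put(h.name(), pool.submit(task));
        }
        Map<String, List<String>> results = new LinkedHashMap<>();
        try {
            for (Map.Entry<String, Future<List<String>>> e : pending.entrySet()) {
                results.put(e.getKey(), e.getValue().get()); // waits for each handler
            }
        } catch (InterruptedException | ExecutionException ex) {
            throw new IllegalStateException("handler failed", ex);
        }
        return results;
    }

    void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        QueryDispatcher d = new QueryDispatcher(List.of(
                handler("specimens", q -> List.of("specimen record for " + q)),
                handler("texts", q -> List.of("text passage mentioning " + q)),
                handler("spellcheck", q -> List.of()),
                handler("logging", q -> {
                    System.err.println("query: " + q);
                    return List.of();
                })), 4);
        System.out.println(d.dispatch("quercus"));
        d.shutdown();
    }
}
```

Keeping logging and spell-checking behind the same interface as the data sources means a new source or service is added by registering one more handler, which fits the idea of a simple interface overlaying heterogeneous systems.
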
Texts
• Web documents indexed using Lucene
• Flora Zambesiaca digitised and marked up with XML
• Experimentation with options for query and output via Java servlet
  - using XSL to output selections
  - using Lucene to index the XML
  - importing the XML into a database
• Other texts: jury still out, but Lucene route looks promising

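As an illustration of the "XSL to output selections" option, the sketch below applies a small stylesheet to an XML fragment using the standard `javax.xml.transform` API. The element names (`flora`, `taxon`, `name`), the `rank` attribute and the placeholder taxon names are invented for the example; the actual Flora Zambesiaca markup is not shown on the slide.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XslSelection {
    // Hypothetical markup with placeholder names, for illustration only.
    static final String XML =
          "<flora>"
        + "<taxon rank='species'><name>Genus alpha</name></taxon>"
        + "<taxon rank='genus'><name>Genus</name></taxon>"
        + "<taxon rank='species'><name>Genus beta</name></taxon>"
        + "</flora>";

    // Stylesheet that prints the name of every taxon with the requested rank.
    static final String XSL =
          "<xsl:stylesheet version='1.0' "
        + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:param name='rank'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='flora/taxon[@rank=$rank]'>"
        + "<xsl:value-of select='name'/><xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Runs the stylesheet over the XML and returns the selected names, one per line. */
    static String select(String xml, String xsl, String rank) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            t.setParameter("rank", rank);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(select(XML, XSL, "species"));
    }
}
```

In a servlet the same call would take the rank (or any other selection criterion) from the request and write the result to the response, which is one way the XSL option could be wired up.
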
Feedback mechanisms
• Web usability testing/focus groups
• Logging
  - Quantitative success
      levels of usage, patterns & trends
      beware: crawlers, testing & development staff, harvesters
      referring URLs, Google link: popularity of site
      country, domain
  - Qualitative success
      success of queries, esp. zero hits (spelling, common names, families)
      performance & system monitoring
      number of queries per session, return visits
      results pages viewed

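Two of the logging points above, excluding crawlers and tracking zero-hit queries, can be sketched as follows. The log entry fields, the list of crawler markers and the class name are assumptions made for the example; they do not describe the actual ePIC log format.

```java
import java.util.List;
import java.util.Locale;

class QueryLogStats {
    /** Hypothetical log entry: who asked, what they asked, how many hits came back. */
    static final class Entry {
        final String userAgent;
        final String query;
        final int hits;

        Entry(String userAgent, String query, int hits) {
            this.userAgent = userAgent;
            this.query = query;
            this.hits = hits;
        }
    }

    // Assumption: crawlers and harvesters can be spotted by substrings in the user agent.
    static final List<String> CRAWLER_MARKERS = List.of("bot", "crawler", "spider", "harvest");

    static boolean isCrawler(String userAgent) {
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String marker : CRAWLER_MARKERS) {
            if (ua.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    /** Fraction of non-crawler queries that returned no hits (0.0 if there are none). */
    static double zeroHitRate(List<Entry> log) {
        long counted = 0;
        long zeroHits = 0;
        for (Entry e : log) {
            if (isCrawler(e.userAgent)) {
                continue; // crawlers would distort the usage figures
            }
            counted++;
            if (e.hits == 0) {
                zeroHits++;
            }
        }
        return counted == 0 ? 0.0 : (double) zeroHits / counted;
    }

    public static void main(String[] args) {
        List<Entry> log = List.of(
                new Entry("Mozilla/5.0", "quercus", 12),
                new Entry("Googlebot/2.1", "quercus", 12),
                new Entry("Mozilla/5.0", "quercsu", 0)); // misspelling, no hits
        System.out.printf("Zero-hit rate: %.2f%n", zeroHitRate(log));
    }
}
```

Traffic from testing and development staff would need a separate filter, for example on source address, which this sketch does not attempt.
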
World distribution of queries

Future
• More data sources, including texts and images
• Hierarchical browsing front-end based around revamped Brummitt Families & Genera with phylogenetic classification
• Looking forward to
  - using the GBIF Names Service…
  - links with DiGIR/BioCASE resources...