Brief Notes from Kew
Mark Jackson, Software Applications Manager



Focussing on...
- Herbarium digitisation
- electronic Plant Information Centre

Kew Herbarium
- Guesstimated
  - 7 million specimens
  - 250,000 types
- Less than 5% of specimens databased
- A variety of personal databases

Preparation for Digitisation
- Computerise transactions
- Agree and document policy and procedures
- Establish core fields (HISPID pending ABCD)
- Develop hardware and software infrastructure (e.g. catalogue database, mass storage)

Digitisation Strategy
- Curators to barcode, database and image types for loan
- Repatriation & research projects
  - to use infrastructure and core fields
  - data to be imported into Catalogue (eventually)
- Pursue digitisation projects

Specimen imaging
- Decision to try to match Cibachrome prints in terms of quality (e.g. suitable for many diagnostic purposes)
  - 600 dpi delivers 200 MB images
- Stored as uncompressed (but bzipped) TIFFs
- Acquisition of mass storage
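The 200 MB figure can be sanity-checked with simple arithmetic. The sketch below is an illustration only: it assumes an A3 scan area (297 x 420 mm, matching the A3 flatbed mentioned for HerbScan), 24-bit colour and no compression, and it assumes one image per type specimen when estimating storage. The class and method names are hypothetical.

```java
/** Rough size estimate for an uncompressed scan; every input is an assumption. */
final class ScanSizeEstimate {

    /** Uncompressed size in bytes of a scan covering the given physical area. */
    static long uncompressedBytes(double widthMm, double heightMm, int dpi, int bytesPerPixel) {
        // Convert millimetres to inches (25.4 mm per inch), then to pixels.
        long widthPx = Math.round(widthMm / 25.4 * dpi);
        long heightPx = Math.round(heightMm / 25.4 * dpi);
        return widthPx * heightPx * bytesPerPixel;
    }

    public static void main(String[] args) {
        // Assumed: A3 scan area, 600 dpi, 3 bytes per pixel (24-bit RGB).
        long bytes = uncompressedBytes(297, 420, 600, 3);
        System.out.printf("One 600 dpi A3 scan: about %d MB%n", bytes / 1_000_000);

        // Assumed: one master image for each of the 250,000 types,
        // before any saving from bzip compression.
        long types = 250_000;
        System.out.printf("All types: about %d TB%n", types * bytes / 1_000_000_000_000L);
    }
}
```

Under these assumptions a single scan comes out at a little over 200 MB, which is consistent with the figure on the slide and shows why mass storage had to be acquired.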

HerbScan
- A3 flatbed scanner, inverted
- Cradle for specimens
- Distributed throughout Herbarium

Pros and cons
200 MB master images (600 dpi scans), based on capturing the level of detail of Cibachromes.

Camera
- £30-40,000
- 200 MB images barely achievable
- 1 image per minute
- Fixed
- Versatile

HerbScan
- £7,500
- 200 MB images easily achievable
- 10 images per hour
- Some mobility
- Suited to flat items

[Diagram: HerbCat client and image server; the image server holds images and metadata and answers image enquiries, alongside HerbCat enquiries]

Focussing on...
- Herbarium digitisation
- electronic Plant Information Centre

ePIC
- UK government funding for delivery of services electronically
- Resource-discovery interface to multiple Kew data sources (not necessarily at Kew)
- Data sources are heterogeneous
- Simple interface overlaying other systems
[Diagram: Interface, Data source]

Architecture
[Diagram components: Requests, Results, Interface (Java servlet/JSPs), Multi-threaded Java server, Data sources]
- Request queue
- Handlers:
  - one per data source
  - one for logging
  - one for spell-checking
- Configuration files (XML)
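The slide gives only the component names, so the following is a minimal sketch of how a request queue with one handler per data source might look, not a description of the actual ePIC server. The `Handler` interface, the `FanOutSearch` class and the `fixed` helper are all hypothetical, and the thread pool's internal work queue stands in for the request queue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch: every handler receives the same query in parallel. */
final class FanOutSearch {

    /** A handler wraps one back end (a data source, the logger, the spell-checker). */
    interface Handler {
        String name();
        List<String> handle(String query) throws Exception;
    }

    private final List<Handler> handlers;
    // The executor's internal work queue plays the role of the request queue.
    private final ExecutorService pool;

    FanOutSearch(List<Handler> handlers, int threads) {
        this.handlers = new ArrayList<>(handlers);
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /** Queries all handlers and merges whatever arrives within the timeout. */
    List<String> search(String query, long timeoutMs) {
        List<Future<List<String>>> pending = new ArrayList<>();
        for (Handler h : handlers) {
            Callable<List<String>> task = () -> h.handle(query);
            pending.add(pool.submit(task));
        }
        List<String> merged = new ArrayList<>();
        for (int i = 0; i < pending.size(); i++) {
            String source = handlers.get(i).name();
            Future<List<String>> results = pending.get(i);
            try {
                for (String hit : results.get(timeoutMs, TimeUnit.MILLISECONDS)) {
                    merged.add(source + ": " + hit);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                merged.add(source + ": interrupted");
            } catch (ExecutionException | TimeoutException e) {
                // A slow or failing source should not break the whole results page.
                results.cancel(true);
                merged.add(source + ": unavailable");
            }
        }
        return merged;
    }

    void shutdown() {
        pool.shutdown();
    }

    /** Test helper: a handler that always returns the same canned hits. */
    static Handler fixed(String name, String... hits) {
        List<String> canned = Arrays.asList(hits);
        return new Handler() {
            public String name() { return name; }
            public List<String> handle(String query) { return canned; }
        };
    }

    public static void main(String[] args) {
        FanOutSearch server = new FanOutSearch(Arrays.asList(
                fixed("catalogue", "record 1"),
                fixed("texts", "page 12", "page 40"),
                fixed("spell-check")), 4);
        server.search("quercus", 2000).forEach(System.out::println);
        server.shutdown();
    }
}
```

Treating logging and spell-checking as handlers like any other keeps the dispatch loop uniform, which is one plausible reading of the "Handlers" list on the slide.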

Texts
- Web documents indexed using Lucene
- Flora Zambesiaca digitised and marked up with XML
- Experimentation with options for query and output via Java servlet
  - using XSL to output selections
  - using Lucene to index the XML
  - importing the XML into a database
- Other texts: jury still out, but Lucene route looks promising
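Of the three options listed, "using XSL to output selections" is the easiest to illustrate with the standard `javax.xml.transform` API. The sketch below is an assumption-laden example: the `<flora>`/`<taxon>` markup and the placeholder taxon names are invented for illustration and do not reflect the actual Flora Zambesiaca mark-up.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical sketch: select parts of a marked-up text with an XSL stylesheet. */
final class FloraXslDemo {

    // Invented markup and placeholder names, for illustration only.
    static final String FLORA_XML =
            "<flora>"
          + "<taxon rank=\"genus\"><name>Examplea</name></taxon>"
          + "<taxon rank=\"species\"><name>Examplea prima</name></taxon>"
          + "<taxon rank=\"species\"><name>Examplea secunda</name></taxon>"
          + "</flora>";

    // Outputs the name of every taxon whose rank matches the $rank parameter.
    static final String SELECT_NAMES_XSL =
            "<xsl:stylesheet version=\"1.0\""
          + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"text\"/>"
          + "<xsl:param name=\"rank\"/>"
          + "<xsl:template match=\"/\">"
          + "<xsl:for-each select=\"//taxon[@rank=$rank]\">"
          + "<xsl:value-of select=\"name\"/><xsl:text>&#10;</xsl:text>"
          + "</xsl:for-each>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    /** Runs the stylesheet over the XML and returns one selected name per line. */
    static String selectNames(String xml, String rank) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(SELECT_NAMES_XSL)));
            t.setParameter("rank", rank);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(selectNames(FLORA_XML, "species"));
    }
}
```

In a servlet the same transformation would write to the response stream instead of a `StringWriter`; compiling the stylesheet once and reusing it would also be the usual choice for anything beyond an experiment.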

Feedback mechanisms
- Web usability testing/focus groups
- Logging
  - Quantitative success
    - levels of usage, patterns & trends
    - beware: crawlers, testing & development staff, harvesters
    - referring URLs, Google link: popularity of site
    - country, domain
  - Qualitative success
    - success of queries, esp. zero hits (spelling, common names, families)
    - performance & system monitoring
    - number of queries per session, return visits
    - results pages viewed
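Two of the logging points above, excluding crawlers and watching for zero-hit queries, can be sketched as a small log analysis. Everything here is an assumption made for illustration: the `Entry` record layout, the user-agent markers used to spot robots, and the class name are hypothetical, and the slides do not say how the ePIC logs were actually processed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: query-log statistics with robot traffic filtered out. */
final class QueryLogStats {

    /** One logged query (assumed layout). */
    static final class Entry {
        final String userAgent;
        final String query;
        final int hits;

        Entry(String userAgent, String query, int hits) {
            this.userAgent = userAgent;
            this.query = query;
            this.hits = hits;
        }
    }

    // Assumed markers; a real list would be longer and maintained over time.
    private static final String[] ROBOT_MARKERS = {"bot", "crawler", "spider", "harvest"};

    static boolean isRobot(String userAgent) {
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String marker : ROBOT_MARKERS) {
            if (ua.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    /** Queries from non-robot traffic that returned no results. */
    static List<String> zeroHitQueries(List<Entry> log) {
        List<String> result = new ArrayList<>();
        for (Entry e : log) {
            if (!isRobot(e.userAgent) && e.hits == 0) {
                result.add(e.query);
            }
        }
        return result;
    }

    /** Fraction of non-robot queries with zero hits; 0.0 if there are none. */
    static double zeroHitRate(List<Entry> log) {
        int human = 0;
        int zero = 0;
        for (Entry e : log) {
            if (isRobot(e.userAgent)) {
                continue;
            }
            human++;
            if (e.hits == 0) {
                zero++;
            }
        }
        return human == 0 ? 0.0 : (double) zero / human;
    }

    public static void main(String[] args) {
        List<Entry> log = new ArrayList<>();
        log.add(new Entry("Mozilla/5.0", "bluebell", 12));
        log.add(new Entry("ExampleBot/1.0", "anything", 0));
        log.add(new Entry("Mozilla/5.0", "orchidacae", 0));
        System.out.println("Zero-hit queries: " + zeroHitQueries(log));
        System.out.println("Zero-hit rate: " + zeroHitRate(log));
    }
}
```

The list of zero-hit queries is the useful output: misspellings and common names that return nothing are exactly the cases the slide flags for attention.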

World distribution of queries

Future
- More data sources, including texts and images
- Hierarchical browsing front-end based around revamped Brummitt Families & Genera with phylogenetic classification
- Looking forward to
  - using the GBIF Names Service…
  - links with DiGIR/BioCASE resources...