Digitizing California Arthropod Collections Peter Oboyski, Phuc Nguyen, Serge Belongie, Rosemary Gillespie Essig Museum of Entomology University of California.

Presentation transcript:

Digitizing California Arthropod Collections
Peter Oboyski, Phuc Nguyen, Serge Belongie, Rosemary Gillespie
Essig Museum of Entomology, University of California, Berkeley, California, USA

Who is CalBug?
o Essig Museum of Entomology
o California Academy of Sciences
o California State Collection of Arthropods
o Bohart Museum, UC Davis
o Entomology Research Museum, UC Riverside
o San Diego Natural History Museum
o LA County Museum
o Santa Barbara Museum of Natural History

Digitization workflow
Handling & Imaging:
  o (Optional) Sort by locality, date, sex, etc.
  o Remove labels, add unique identifier
  o Take digital image, name and save file
  o Replace labels, return to collection
Data Capture:
  o Manually enter data into MySQL database
  o Online crowd-sourcing of manual data entry
  o Optical Character Recognition (OCR) & automated data parsing
Data Manipulation:
  o Error checking
  o Geographic referencing
  o Aggregate data in online cache
  o Temporospatial analyses

Why Image Specimens/Labels?
o Data capture can be done remotely
o Magnify difficult-to-read labels
o Potential for OCR
o Verbatim digital archive of label data

1st generation: DinoLite digital microscope

2nd generation: Digital camera (Canon G9)

o Higher resolution
o Labels flat & unobstructed
o Scale bar, controlled light
o Important to add species name to image or file name, e.g. "EMEC Paracotalpa ursina.jpg"
o ~150,000 images waiting to be databased
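The file-naming convention on this slide can be sketched in a few lines. This is only an illustration: the identifier "EMEC000001" is a made-up placeholder, and the pattern (identifier, genus, species epithet, extension, separated by single spaces) is an assumption based on the one example shown, not the museum's documented scheme.

```python
import re


def image_filename(catalog_id: str, genus: str, species: str, ext: str = "jpg") -> str:
    """Build a file name such as 'EMEC000001 Paracotalpa ursina.jpg'."""
    return f"{catalog_id} {genus} {species}.{ext}"


# Assumed layout: identifier, space, genus, space, species epithet, dot, extension.
_NAME = re.compile(
    r"^(?P<catalog_id>\S+) (?P<genus>[A-Z][a-z]+) (?P<species>[a-z-]+)\.(?P<ext>\w+)$"
)


def parse_image_filename(name: str) -> dict:
    """Split a file name back into its parts, or raise ValueError if it does not fit."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return match.groupdict()


# "EMEC000001" is a hypothetical identifier used only for this example.
name = image_filename("EMEC000001", "Paracotalpa", "ursina")
parts = parse_image_filename(name)
```

Keeping the species name in the file name means an image can still be sorted and matched to a taxon before any database record exists for it.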

Data capture
Manually enter data into MySQL database:
  o Using our own MySQL database (EssigDB)
  o Built-in error checking
  o Data carry-over from one record to the next
  o Taxonomy automatically added
Online crowd-sourcing of manual data entry:
  o "Notes from Nature"
  o Collaboration with Zooniverse
  o Citizen scientist transcription of labels
Optical Character Recognition (OCR) & automated data parsing:
  o Collaboration with UC San Diego
  o Improved word spotting & OCR
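As a rough illustration of "data carry-over" and "built-in error checking" during manual entry, the sketch below fills blank fields from the previous record and flags obvious problems. It is not EssigDB code: the field names, the list of carried fields, and the two checks are all assumptions chosen for the example.

```python
from datetime import date

# Hypothetical set of fields that usually repeat from one specimen to the next.
CARRY_FIELDS = ("country", "state", "county", "locality", "collector")


def carry_over(previous: dict, entered: dict, fields=CARRY_FIELDS) -> dict:
    """Return a copy of the new record with blank fields filled from the previous one."""
    record = dict(entered)
    for field in fields:
        if not record.get(field) and previous.get(field):
            record[field] = previous[field]
    return record


def check_record(record: dict) -> list:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    if not record.get("catalog_id"):
        problems.append("missing catalog_id")
    collected = record.get("date_collected")
    if collected is not None and collected > date.today():
        problems.append("date_collected is in the future")
    return problems


# Identifiers and values below are invented for the example.
previous = {"catalog_id": "EMEC000001", "state": "California", "county": "Alameda"}
new_record = carry_over(previous, {"catalog_id": "EMEC000002", "county": ""})
```

Only locality-style fields are carried over here; the identifier is never copied, since it must be unique to each specimen.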

Notes from Nature Citizen Science data transcription

Integrating OCR with crowd sourcing
o Spotting words within images
o Copy-paste, highlight-drag fields
o Auto-detecting repeated "words"
  o e.g. species, states, counties
o Providing an additional "vote" for transcription consensus
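One way to read the "additional vote" idea is as a simple majority count in which the OCR output is treated like one more transcriber. The sketch below assumes that reading; the normalisation step and the 50% threshold are illustrative choices, not the rule used by Notes from Nature.

```python
from collections import Counter


def normalise(text: str) -> str:
    """Collapse whitespace and ignore case so trivially different answers agree."""
    return " ".join(text.split()).casefold()


def consensus(transcriptions, ocr_guess=None, min_share=0.5):
    """Return the majority transcription, or None if no value exceeds min_share.

    The OCR guess, when present, is counted as one extra vote.
    """
    votes = [normalise(t) for t in transcriptions if t and t.strip()]
    if ocr_guess and ocr_guess.strip():
        votes.append(normalise(ocr_guess))
    if not votes:
        return None
    value, count = Counter(votes).most_common(1)[0]
    return value if count / len(votes) > min_share else None


# Two volunteers disagree; the OCR reading breaks the tie.
tied = consensus(["Alameda", "Almeda"])
resolved = consensus(["Alameda", "Almeda"], ocr_guess="ALAMEDA")
```

In this sketch the OCR vote carries the same weight as a human one; weighting it by the engine's confidence would be a natural refinement.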

The OCR challenge for specimen labels
DETECTION: Finding text in a complex matrix
o Machine-typed vs. hand-written labels
o Sliding window classifier creating text bounding boxes
o >95% detection and localization using pixel-overlap measures
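The slide does not say which pixel-overlap measure was used. A common choice for scoring a detected text box against a hand-drawn reference box is intersection over union, so the sketch below assumes that measure. It also shows how sliding-window positions can be enumerated; the text/non-text classifier itself is left out.

```python
def sliding_windows(width, height, win_w, win_h, step):
    """Yield (left, top, right, bottom) boxes that tile an image of the given size."""
    for top in range(0, height - win_h + 1, step):
        for left in range(0, width - win_w + 1, step):
            yield (left, top, left + win_w, top + win_h)


def overlap_ratio(a, b):
    """Intersection over union of two (left, top, right, bottom) boxes, in pixels."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


# A detection shifted half a box width from the reference box.
score = overlap_ratio((0, 0, 10, 10), (5, 0, 15, 10))
windows = list(sliding_windows(width=4, height=2, win_w=2, win_h=2, step=2))
```

A detection would then count as correct when its overlap with a reference box exceeds a chosen threshold; the threshold behind the >95% figure is not given on the slide.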

Current progress in OCR recognition
RECOGNITION: Using Tesseract OCR engine
Machine type:
  o 74% word-level accuracy
  o 82% character-level accuracy
Handwriting:
  o 5.4% word-level accuracy
  o 9.2% character-level accuracy
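The slide reports word-level and character-level accuracy without defining them. A minimal sketch, assuming both are computed as one minus the edit distance divided by the length of the reference transcription (counted in characters or in words), could look like this. The label strings are invented.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        cur = [i]
        for j, item_b in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,                        # deletion
                cur[j - 1] + 1,                     # insertion
                prev[j - 1] + (item_a != item_b),   # substitution or match
            ))
        prev = cur
    return prev[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Share of reference characters recovered, floored at 0."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    return max(0.0, 1 - edit_distance(reference, ocr_output) / len(reference))


def word_accuracy(reference: str, ocr_output: str) -> float:
    """Same measure computed over whitespace-separated words."""
    ref_words, out_words = reference.split(), ocr_output.split()
    if not ref_words:
        return 1.0 if not out_words else 0.0
    return max(0.0, 1 - edit_distance(ref_words, out_words) / len(ref_words))


# One dropped letter costs one character but a whole word.
char_score = character_accuracy("Berkeley", "Berkley")
word_score = word_accuracy("Alameda Co. Berkeley", "Alameda Co. Berkley")
```

Under this definition a single wrong character spoils the entire word, which is one reason word-level figures sit below character-level ones.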

Where do we go from here?
o Improved recognition of hand-writing
o Incorporate OCR into crowd sourcing
o Develop (semi-)automated data parsing
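To give a sense of what (semi-)automated data parsing might involve, the sketch below pulls a collection date and a county out of transcribed label text. The label string, the patterns, and the output field names are all hypothetical. It assumes the day-month-year style with Roman-numeral months that is common on insect labels, and leaves anything it cannot parse for a person to review.

```python
import re
from datetime import date

ROMAN_MONTHS = {
    "I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6,
    "VII": 7, "VIII": 8, "IX": 9, "X": 10, "XI": 11, "XII": 12,
}

# Assumed date style: day, Roman-numeral month, four-digit year (e.g. 12-V-1975).
_DATE = re.compile(r"\b(\d{1,2})[-. ]([IVX]{1,4})[-. ](\d{4})\b")
# Assumed county style: capitalised name followed by "Co."
_COUNTY = re.compile(r"\b([A-Z][a-z]+(?: [A-Z][a-z]+)*) Co\.")


def parse_label(text: str) -> dict:
    """Extract the fields this sketch understands; unparsed fields stay None."""
    fields = {"verbatim": text, "date_collected": None, "county": None}
    match = _DATE.search(text)
    if match and match.group(2) in ROMAN_MONTHS:
        day, year = int(match.group(1)), int(match.group(3))
        try:
            fields["date_collected"] = date(year, ROMAN_MONTHS[match.group(2)], day)
        except ValueError:
            pass  # impossible date, e.g. day 31 in a 30-day month: leave for review
    match = _COUNTY.search(text)
    if match:
        fields["county"] = match.group(1)
    return fields


# Invented label text, for illustration only.
parsed = parse_label("CALIF: Alameda Co. Berkeley 12-V-1975")
```

Keeping the verbatim text alongside the parsed fields lets a reviewer compare the two, which is what makes the process semi-automated rather than fully automatic.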

Thank you