Digitizing California Arthropod Collections Peter Oboyski, Phuc Nguyen, Serge Belongie, Rosemary Gillespie Essig Museum of Entomology University of California.

1 Digitizing California Arthropod Collections
Peter Oboyski, Phuc Nguyen, Serge Belongie, Rosemary Gillespie
Essig Museum of Entomology, University of California, Berkeley, California, USA

2 Who is CalBug?
o Essig Museum of Entomology
o California Academy of Sciences
o California State Collection of Arthropods
o Bohart Museum, UC Davis
o Entomology Research Museum, UC Riverside
o San Diego Natural History Museum
o LA County Museum
o Santa Barbara Museum of Natural History

3

4 Digitization workflow
Handling & Imaging
o (Optional) Sort by locality, date, sex, etc.
o Remove labels, add unique identifier
o Take digital image, name and save file
o Replace labels, return to collection
Data Capture
o Manually enter data into MySQL database
o Online crowd-sourcing of manual data entry
o Optical Character Recognition (OCR) & automated data parsing
Data Manipulation
o Error checking
o Geographic referencing
o Aggregate data in online cache
o Temporospatial analyses

5 Why Image Specimens/Labels?
o Data capture can be done remotely
o Magnify difficult-to-read labels
o Potential for OCR
o Verbatim digital archive of label data

6 1st generation: DinoLite digital microscope

7

8 2nd generation: Digital camera (Canon G9)

9 2nd generation imaging
o Higher resolution
o Labels flat & unobstructed
o Scale bar, controlled light
o Important to add species name to image or file name, e.g. "EMEC218958 Paracotalpa ursina.jpg"
o ~150,000 images waiting to be databased
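The file-naming convention on this slide (unique identifier followed by species name) can be read back by a script. The following is a minimal sketch, assuming every file name follows the pattern of the one example shown; the function name `parse_image_filename` and the regular expression are hypothetical and not part of the CalBug tooling.

```python
import re

# Assumed pattern, inferred from the single example on the slide:
# "EMEC" + digits, whitespace, species name, ".jpg" (or ".jpeg").
FILENAME_RE = re.compile(
    r"^(?P<catalog>EMEC\d+)\s+(?P<species>.+)\.jpe?g$", re.IGNORECASE
)


def parse_image_filename(filename):
    """Return (catalog_number, species_name), or None if the name does not match."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return None
    return match.group("catalog"), match.group("species").strip()


if __name__ == "__main__":
    print(parse_image_filename("EMEC218958 Paracotalpa ursina.jpg"))
```

Files that do not match return `None`, so they can be set aside for manual renaming rather than silently mislabeled.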

10 Data capture
Manually enter data into MySQL database
o Using our own MySQL database (EssigDB)
o Built-in error checking
o Data carry-over one record to next
o Taxonomy automatically added
Online crowd-sourcing of manual data entry
o “Notes from Nature”
o Collaboration with Zooniverse
o Citizen Scientist transcription of labels
Optical Character Recognition (OCR) & automated data parsing
o Collaboration with UC San Diego
o Improved word spotting & OCR
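"Data carry-over one record to next" can be illustrated with a small sketch. The slides do not describe how EssigDB implements it; the function `carry_over` and the field names below are assumptions chosen only for illustration.

```python
def carry_over(previous, current,
               fields=("country", "state", "county", "collector")):
    """Return a copy of `current` with blank fields filled from `previous`.

    Hypothetical helper: the field names are illustrative, not EssigDB's schema.
    """
    merged = dict(current)
    for field in fields:
        if not merged.get(field):
            merged[field] = previous.get(field, "")
    return merged


if __name__ == "__main__":
    last = {"country": "USA", "state": "California",
            "county": "Alameda", "collector": "A. Collector"}
    print(carry_over(last, {"county": "Marin"}))
```

Only empty fields are filled, so anything the operator types for the new record takes precedence over the carried-over value.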

11

12 Notes from Nature: citizen science data transcription

13

14

15 Integrating OCR with crowd sourcing
o Spotting words within images
o Copy-paste, highlight-drag fields
o Auto-detecting repeated “words”
o e.g. species, states, counties
o Providing an additional “vote” for transcription consensus
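The idea of OCR providing an additional "vote" can be sketched as a simple majority count. This is an illustration under stated assumptions, not the Notes from Nature algorithm: the normalization (lowercasing, collapsing whitespace) and the strict-majority threshold `min_share` are hypothetical choices.

```python
from collections import Counter


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not split votes."""
    return " ".join(text.lower().split())


def consensus(transcriptions, ocr_guess=None, min_share=0.5):
    """Return the value held by more than `min_share` of the votes, else None.

    `ocr_guess`, if given, counts as one extra vote alongside the volunteers.
    """
    votes = [normalize(t) for t in transcriptions if t and t.strip()]
    if ocr_guess and ocr_guess.strip():
        votes.append(normalize(ocr_guess))
    if not votes:
        return None
    value, count = Counter(votes).most_common(1)[0]
    return value if count / len(votes) > min_share else None


if __name__ == "__main__":
    # Two volunteers disagree; the OCR reading breaks the tie.
    print(consensus(["Marin", "Napa"], ocr_guess="marin"))
```

With two disagreeing volunteers there is no majority, so the field would stay unresolved; the OCR reading acts as the tie-breaker in that case.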

16 The OCR challenge for specimen labels
DETECTION: Finding text in a complex matrix
o Machine-typed vs. hand-written labels
o Sliding window classifier creating text bounding boxes
o >95% detection and localization using pixel-overlap measures
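The slide does not say which pixel-overlap measure was used. One common choice for comparing a detected bounding box with a ground-truth box is intersection over union (IoU); the sketch below assumes that measure and axis-aligned boxes given as `(x0, y0, x1, y1)` pixel coordinates.

```python
def box_area(box):
    """Area of an (x0, y0, x1, y1) box; degenerate or inverted boxes count as 0."""
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)


def intersection_over_union(a, b):
    """Overlap area divided by combined area, in the range 0.0 to 1.0."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]),
               min(a[2], b[2]), min(a[3], b[3]))
    inter = box_area(overlap)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union else 0.0


if __name__ == "__main__":
    detected = (0, 0, 10, 10)
    truth = (5, 0, 15, 10)
    print(intersection_over_union(detected, truth))
```

A detection is typically counted as correct when its overlap with the ground-truth box exceeds a chosen threshold; the threshold behind the >95% figure is not given in the slides.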

17 Current Progress in OCR recognition
RECOGNITION: Using Tesseract OCR engine
o Machine type: 74% word-level accuracy, 82% character-level accuracy
o Handwriting: 5.4% word-level accuracy, 9.2% character-level accuracy
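The slides do not define how word-level and character-level accuracy were computed. A minimal sketch under common assumptions: word accuracy is the share of reference words reproduced exactly at the same position, and character accuracy is one minus the edit distance divided by the reference length. Both definitions, and the function names, are assumptions for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute or match
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference, hypothesis):
    """1 - edit_distance / len(reference), floored at 0."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    return max(0.0, 1.0 - edit_distance(reference, hypothesis) / len(reference))


def word_accuracy(reference_words, hypothesis_words):
    """Share of reference words matched exactly at the same position."""
    if not reference_words:
        return 0.0
    hits = sum(r == h for r, h in zip(reference_words, hypothesis_words))
    return hits / len(reference_words)


if __name__ == "__main__":
    print(character_accuracy("Berkeley", "Berkely"))
    print(word_accuracy(["Alameda", "Co."], ["Alameda", "Go."]))
```

Under these definitions a single wrong character makes the whole word count as wrong, which is one reason word-level figures sit below character-level figures.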

18

19 Where do we go from here?
o Improved recognition of hand-writing
o Incorporate OCR into crowd sourcing
o Develop (semi-)automated data parsing

20 Thank you http://calbug.berkeley.edu

