Deb Paul, iDigBio aOCR WG

Presentation transcript:

Explore the data…SORT!

SORT
Images: ML, NLP, handwriting, label-finding
Sort by collector, country, place, language, researcher
Set creation takes advantage of human learning (mastery): geography, handwriting, morphology
Set creation respects human preferences (autonomy, purpose)
Humans are in the digitization loop
Faster transcription with ordered datasets (RBGE)
Humans like ordered datasets (RBGE)
Transcription is 30% faster than typing alone (SALIX)

PARSE
Getting better; share algorithms!
Label-type dependent
A new tool for parsing OCR output is coming

iDigBio aOCR WG
MaCC TCN
SALIX
The use of optical character recognition (OCR) in the digitisation of herbarium specimens. Robyn E Drinkwater, Robert Cubey, Elspeth Haston. Biodiversity Information Standards (TDWG) 2013 Conference, Florence, Italy, October 2013.
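The SORT idea (build ordered sets so transcribers see similar labels together) can be sketched as follows. This is a minimal illustration only, assuming each label image already has a country and collector guessed from its OCR text. The `LabelImage` record and `build_transcription_sets` function are hypothetical names, not part of any iDigBio, RBGE, or SALIX tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabelImage:
    """One specimen-label image plus hypothetical fields guessed from its OCR text."""
    image_id: str
    country: Optional[str] = None
    collector: Optional[str] = None


def build_transcription_sets(
    images: list[LabelImage],
) -> list[tuple[tuple[str, str], list[str]]]:
    """Group images by (country, collector) so similar labels are transcribed together.

    Values are lower-cased and stripped so "Smith" and "smith" share a set;
    missing values fall into an "unknown" bucket. Sets are returned in sorted
    key order, and image IDs within each set are sorted too.
    """
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for img in images:
        key = (
            (img.country or "unknown").strip().lower(),
            (img.collector or "unknown").strip().lower(),
        )
        groups[key].append(img.image_id)
    return [(key, sorted(ids)) for key, ids in sorted(groups.items())]


if __name__ == "__main__":
    images = [
        LabelImage("img3", "Scotland", "Smith"),
        LabelImage("img1", "Mexico", "Lopez"),
        LabelImage("img2", "Scotland", "smith"),
        LabelImage("img4"),  # no OCR-derived fields available
    ]
    for key, ids in build_transcription_sets(images):
        print(key, ids)
```

Grouping on a normalized key is the simplest way to produce "ordered datasets"; a real workflow would more likely derive the keys with the ML/NLP and label-finding steps listed on the slide, and might sort on place, language, or researcher instead.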