 Goals and Scope  Research Question  Overall Workflow  Imaging Approach  OCR, NLP, Geo-referencing  Outreach and Crowd Sourcing.

Presentation transcript:

 Goals and Scope  Research Question  Overall Workflow  Imaging Approach  OCR, NLP, Geo-referencing  Outreach and Crowd Sourcing

 Collections  16 digitization centers (collaborators)  > 60 non-governmental US herbaria (95%)  ~ 2.3 million specimens (90%) ▪ 900,000 lichens ▪ 1.4 million bryophytes  Mobilizing existing digital records

 Geographic Scope  Mexico  US  Canada

 National Portals  

 Lichen Consortium  Started in 2009  16 Collections  ~ 600,000 Records  Bryophyte Consortium  Started in 2010  7 Collections  > 800,000 Records

 Virtual Flora  Dynamic Checklists  Published Checklists  Dynamic Keys  Taxonomic Information  Species Pages  Distribution Maps  Observations

 How are changes in distribution patterns of lichens and bryophytes over time correlated with man-made environmental changes?  How accurately can we predict where specific species can be found using existing herbarium data?

 Different evolutionarily but similar in size and habitats occupied (epiphytes, soil mats, and rocks)  Both dominate much of the arctic and northern boreal regions (lichens in upland areas and bryophytes in wet habitats)  Both also occur commonly in many other ecosystems (deserts to tropics)  Bryophytes, particularly in peat bogs, store a major part of the world's organic carbon

 Take Image (file name: barcode)  Create Skeleton File (Darwin Core fields: barcode, species name +)  Upload to folder on FTP server (Florida)  Image processing  Create/Merge Record in Portal, Link Image (unprocessed)  Find Duplicates and Edit Record in Portal (pending review)  Existing Digitized Records (Darwin Core+)  Approve Record in Portal (reviewed)  Central Processing: OCR, NLP, bulk processing, preliminary geo-referencing (processed)  Manage Records in Portal  Upload to folder on FTP server (Florida)  Upload Record in Portal, Link Image (reviewed +)

 Image all specimens / specimen labels  Upload to portal  Record exists => link image to existing record  Record absent => create empty record  Automated OCR of label  Block of raw text => database  Automated NLP (field parsing)  Review data  Keystroke full record  Collector name & number => look for duplicates  Reparse full record => learnable parsers
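The link-or-create step in this workflow can be sketched as below. This is a minimal illustration only: it assumes an in-memory dictionary keyed by barcode stands in for the portal database, and the function name, field names, barcodes, and status values are hypothetical, not the portal's actual API.

```python
def attach_image(records, barcode, image_url):
    """Link an image to the record with this barcode; create an empty record if none exists."""
    record = records.get(barcode)
    if record is None:
        # Record absent => create an empty (skeleton) record that awaits OCR and parsing.
        record = {"barcode": barcode, "images": [], "status": "unprocessed"}
        records[barcode] = record
    # Record exists (or was just created) => link the image to it.
    record["images"].append(image_url)
    return record


# Hypothetical barcodes and file names, for illustration only.
records = {"ASU0001": {"barcode": "ASU0001", "images": [], "status": "reviewed"}}
attach_image(records, "ASU0001", "ASU0001.jpg")  # existing record: image is linked
attach_image(records, "ASU0002", "ASU0002.jpg")  # no record yet: skeleton is created
```

Keying on the barcode mirrors the slide's convention that the image file name is the barcode, so no separate lookup table is needed to match images to records.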

 Tesseract V3  Dual cycle  Automatic  Manual review  Expected hurdles  Handwritten labels  Old fonts  Faded labels  Form labels  Adjustable image variables
¢_].L.|»‘¢.'».f.'._..‘~,(.J fin-x‘*\'a:"511z:1 wf.~\:'i/.onli State University P.’~.r"~2=,_. gg J:.2 " J*J*" ” (=:\‘-“ax "»..'\-12 ‘ “ "‘ ;T~;‘~7i?»-1_1_\f;>sf`;,' ESX Z»ie+‘-». “~'.»te;~:i_.t<» ff`t;~f3":.f.“ » »4 xx,, """‘“”T"’.t;;a¢f~rus ’ V4 J 'if. r°'° M '1?nies ivain.) Sav. neutal Station - " '1 ~»r';;4-\P ` 1. T11./P..,J..-. ELEV. ' `.fJL_\ LATL Q _‘ 1 _ Y’ DATE _,. W5. (> f-, -:‘; i f>i_T ~~. A 1: ». v\.-v »~. 4. a xvala 8/27/73
PLANTS OF NEW r~1ExIco Herbarium of Arizona State University Parmelia ulophyllodes (Vain.) Sav. COUNTY “°”““ Joranada Experimental Station - New Mexico State University "“““' on Juniperus ELEV. ‘ 4400 EEILLEETUR DATE DU T. H. Nash #7914 8/27/73 T. H. N.
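One way the automatic cycle could hand labels over to manual review is a simple quality score on the raw OCR text. The sketch below is an assumption, not part of Tesseract or of the project's pipeline: the word-like-token heuristic, the function names, and the 0.5 threshold are all hypothetical. The two test strings are ASCII tokens taken from the OCR output shown on this slide.

```python
import re

# A token counts as "word-like" if it is three or more letters, optionally
# followed by one punctuation mark. This is a crude, hypothetical heuristic.
WORD = re.compile(r"^[A-Za-z]{3,}[.,;:]?$")


def wordlike_fraction(text):
    """Return the share of whitespace-separated tokens that look like words."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return sum(1 for token in tokens if WORD.match(token)) / len(tokens)


def needs_manual_review(text, threshold=0.5):
    """Flag OCR output whose word-like share falls below the (assumed) threshold."""
    return wordlike_fraction(text) < threshold


noisy = "T11./P..,J..-. ELEV. 1 _ DATE _,. W5. A 1: 4. a xvala"
clean = "Herbarium of Arizona State University Parmelia ulophyllodes (Vain.) Sav."

print(needs_manual_review(noisy))  # 3 of 12 tokens are word-like
print(needs_manual_review(clean))  # 7 of 9 tokens are word-like
```

A score like this only ranks labels for attention; handwritten or faded labels would still need a person to decide whether to re-image, adjust image variables, or keystroke the record.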

 Dual cycle  Automated after import  Manual review  Various algorithms  Salix, Herbis, Apiary, Symbiota  Symbiota: trainable parsing algorithm  Parsing profiles  Recognize content / format  Look-up tables
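Field parsing with a look-up table can be illustrated with a few regular expressions. This is a sketch under stated assumptions, not the trainable Symbiota parser or any of the tools named above: the patterns, the function name, and the two-entry state table are hypothetical, and the sample text is a hand-cleaned version of the label shown on the OCR slide. Output keys use Darwin Core term names.

```python
import re

# Illustrative look-up table mapping label headers to a state name.
STATES = {"NEW MEXICO": "New Mexico", "ARIZONA": "Arizona"}

# Initials + surname, collector number, then a date such as 8/27/73.
COLLECTOR = re.compile(
    r"((?:[A-Z]\.\s*)+[A-Z][a-z]+)\s+#?(\d+)\s+(\d{1,2}/\d{1,2}/\d{2,4})"
)
ELEVATION = re.compile(r"ELEV\.?\s*(\d+)")


def parse_label(text):
    """Extract a few Darwin Core fields from a block of raw label text."""
    fields = {}
    upper = text.upper()
    for key, state in STATES.items():
        if "PLANTS OF " + key in upper:
            fields["stateProvince"] = state
            break
    match = COLLECTOR.search(text)
    if match:
        fields["recordedBy"] = match.group(1)
        fields["recordNumber"] = match.group(2)
        # Kept verbatim: the two-digit year is not interpreted here.
        fields["verbatimEventDate"] = match.group(3)
    match = ELEVATION.search(text)
    if match:
        # Kept verbatim: the label does not state the unit.
        fields["verbatimElevation"] = match.group(1)
    return fields


label = (
    "PLANTS OF NEW MEXICO Herbarium of Arizona State University "
    "Parmelia ulophyllodes (Vain.) Sav. on Juniperus ELEV. 4400 "
    "T. H. Nash #7914 8/27/73"
)
parsed = parse_label(label)
```

Fixed patterns like these break on unfamiliar label layouts, which is presumably why the slide lists parsing profiles and a trainable algorithm alongside look-up tables.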

 Large number of duplicates  Matching collector, number, date  Matching collection event  Adjacent numbers  Matching collection date  Combined algorithms  Duplicate index  Exsiccati index
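The two matching rules listed here, exact collector/number/date matches and adjacent numbers within one collection event, can be sketched as follows. The function names, the name normalization, the catalog numbers, and the ISO dates are assumptions made for illustration; the slide does not describe the actual algorithms or the duplicate index structure.

```python
from collections import defaultdict


def normalize(name):
    """Reduce a collector name to lowercase letters and digits for comparison."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def duplicate_clusters(records):
    """Group catalog numbers that share collector, collector number, and date."""
    groups = defaultdict(list)
    for rec in records:
        key = (normalize(rec["recordedBy"]), rec["recordNumber"], rec["eventDate"])
        groups[key].append(rec["catalogNumber"])
    return [ids for ids in groups.values() if len(ids) > 1]


def same_event_candidates(records, max_gap=1):
    """Pair neighbouring records (same collector and date) with adjacent numbers."""
    by_event = defaultdict(list)
    for rec in records:
        if rec["recordNumber"].isdigit():
            by_event[(normalize(rec["recordedBy"]), rec["eventDate"])].append(rec)
    pairs = []
    for recs in by_event.values():
        recs.sort(key=lambda r: int(r["recordNumber"]))
        # Compare each record with its successor in collector-number order.
        for a, b in zip(recs, recs[1:]):
            gap = int(b["recordNumber"]) - int(a["recordNumber"])
            if 0 < gap <= max_gap:
                pairs.append((a["catalogNumber"], b["catalogNumber"]))
    return pairs


# Hypothetical records; catalog numbers and ISO dates are made up for the example.
records = [
    {"catalogNumber": "A1", "recordedBy": "T. H. Nash", "recordNumber": "7914", "eventDate": "1973-08-27"},
    {"catalogNumber": "B7", "recordedBy": "T.H. Nash", "recordNumber": "7914", "eventDate": "1973-08-27"},
    {"catalogNumber": "A2", "recordedBy": "T. H. Nash", "recordNumber": "7915", "eventDate": "1973-08-27"},
    {"catalogNumber": "C3", "recordedBy": "J. Doe", "recordNumber": "100", "eventDate": "1990-05-01"},
]
clusters = duplicate_clusters(records)
event_pairs = same_event_candidates(records)
```

Exact duplicates let a fully keyed record at one herbarium fill in the skeleton record at another, while adjacent-number matches only suggest a shared locality and date and would still need review.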

 Supervised batch processing  GeoLocate web app using HTML5  Crowdsourcing  Make it fun!

 Weekly extracts of “Review-new”  Output as CSV and XML  Darwin Core  Extended formats  Import into central database: main responsibility of the collection  Specify import functions
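A CSV extract with Darwin Core column names might look like the sketch below. It assumes that "Review-new" is a processing-status value stored on each record; the `processingStatus` key, the function name, the field selection, and the sample records are hypothetical and do not reflect the portal's real export code.

```python
import csv
import io

# A small subset of Darwin Core terms used as column headers.
DWC_FIELDS = ["catalogNumber", "scientificName", "recordedBy", "recordNumber", "eventDate"]


def extract_csv(records, status="Review-new"):
    """Return CSV text for all records whose (assumed) status matches."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=DWC_FIELDS,
        extrasaction="ignore",  # drop keys that are not export columns
        lineterminator="\n",
    )
    writer.writeheader()
    for rec in records:
        if rec.get("processingStatus") == status:
            writer.writerow(rec)
    return buffer.getvalue()


# Hypothetical records for illustration.
records = [
    {"catalogNumber": "A1", "scientificName": "Parmelia ulophyllodes",
     "recordedBy": "T. H. Nash", "recordNumber": "7914",
     "eventDate": "1973-08-27", "processingStatus": "Review-new"},
    {"catalogNumber": "A2", "scientificName": "Parmelia ulophyllodes",
     "recordedBy": "T. H. Nash", "recordNumber": "7915",
     "eventDate": "1973-08-27", "processingStatus": "Reviewed"},
]
csv_text = extract_csv(records)
```

Using standard Darwin Core headers keeps the extract importable by whatever collection database the home institution runs, which matches the slide's point that the import itself is the collection's job.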

 National Portals are Biodiversity Content Management Systems  User Access Management  Record / Observation Data Entry/Editing  QC Messaging  Taxonomy Management  Species Description Entry/Editing  Image Upload  Character and Character State Entry/Editing

 Extensive Existing Outreach Programs  Volunteer Programs at CAS, F, FH, PH, UC, WTU  Local workshops  Field Courses (Forays, Bio-Blitz)  Training in Taxonomy, Collecting, Preserving  Seminars  Internships

 Volunteer program within LBCC  Outreach at Local Amateur Meetings  Online Community Center ▪ Establish Event Information Center ▪ Online Training ▪ Online Question and Answer Service ▪ Online Seminars and Presentations ▪ Newsletter

 Online Community Center (cont.)  Discussion/Exchange Forum  Number of Records Entered Accounting  Competitions ▪ Data Entry (Monthly, Annual Champions) ▪ Photography  Fun Facts ▪ The Most Obscure Label

 Online Data Entry/Editing  Sophisticated User and Workflow Management System in SYMBIOTA  Volunteer data entry  Volunteer data editing  Professional quality control

 Automating OCR, NLP, Geo-referencing  Data Quality  Recruiting/maintaining volunteer community

 Michael Adamo  Bruce Allen  Meredith Blackwell  Bill Buck  Alina Freire-Fierro  John Freudenstein  Alan Fryday  David Giblin  Karen Hughes  Steffi Ickert-Bond  Timothy James  Jennifer S. Kluse  Matt Von Konrat  Ben Legler  Tatyana Livshultz  Robert Lücking  Francois Lutzoni  Bob Magill  Andrew Miller  Brent Mishler  Donald Pfister  Richard Rabeler  Malcolm Sargent  Edward Schilling  Michaela Schmull  Blanka Shaw  Jon Shaw  Carol Shearer  Larry StClair  Barbara Thiers
Funded by the NSF ADBC program
Thomas H. Nash and Edward Gilbert, Marilyn Larsen