Discovering Effective Workflows: How can iDigBio help the biological and paleontological community with workflow development?

Presentation transcript:

Discovering Effective Workflows
How can iDigBio help the biological and paleontological community with workflow development?
Support from NSF grant: Advancing Digitization of Biological Collections Program (#EF )
Deborah Paul, Gil Nelson
Paleocollections Workshop, April 26 – 28, 2012

Overview
– Digitization survey
– Observing collections
– Finding groups of common activities
– Workflow patterns
– Some important principles of effective workflows
– Examples of real workflows
– Digitization maturity

Informal Digitization Survey
iDigBio staff compiled a survey covering
– collection type information
– protocols
– specimen handling logistics
– imaging
– specimen data entry
Community feedback: the survey helped with
– finding gaps in documentation
– writing workflows & protocols
– organizing workflows for sharing & archiving

Observing Collections
iDigBio staff visited
– 28 collections in 10 different museums
– Collections varied in size, kind, staffing
  – entomology
  – invertebrate paleontology
  – invertebrate zoology
  – ornithology
  – botany
  – vertebrate paleontology
  – zoology

Characterizing Workflows
We can divide activities into coherent groups
Task Clusters
– pre-digitization curation
– data capture & processing
– imaging capture
– image processing
– image storage
All groups share concern for
– Data Quality
– Data Integrity

5 task clusters, workflow patterns
(Figure: blue shading indicates specimen handling steps)

5 Task Clusters and Key Elements
– Pre-digitization Curation or “Staging”
– Image Capture
– Data Capture
– Image Processing
– Image/Data Storage
– Geo-referencing
– Personnel
– Written Protocols
– Biodiversity Informatics Manager

5 important principles of effective workflows…
– The Bioinformatics Manager is key
  – protocols, workflows, data entry, imaging, storage, hardware, staffing …
– People do the digitization
  – Staff
– Processes and workflows vary according to
  – Type of collections: the physical specimens
  – Space
  – Labels and label formats
– Workflow (WF) and protocols
  – Must be carefully documented
  – Can be improved iteratively
  – Must be evaluated
– Communication among projects is crucial

Workflow example
– finding students
– interview
– jobs: sorting by taxon, taxonomy verification, grouping by collecting event, data entry, imaging
– training
– re-box
– collecting event sort
– check taxon names
– enter taxon names
– rank sort
– pre-curation
– database entry into KE EMu
– image exemplars
– publish to web immediately
– hire students

Workflow evolution: things people changed
– tweezers
– autonomy to alleviate boredom
– keystroke shortcuts
– OCR to validate scanned barcode entry
– capturing feedback from staff
– tracking flow
  – what happens when there’s a problem?
  – where do problems go?
  – losing information
Institutional level
– Coordinated effort
– Ready and willing to try a new strategy
– Avoiding the NIH (not-invented-here) syndrome
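One of the changes listed above, using OCR to validate scanned barcode entry, can be sketched roughly as follows. This is only an illustration under stated assumptions: the OCR text is taken as an already extracted string (no particular OCR engine is implied), and the record layout, the function names, and the `EX-…` catalog numbers are all hypothetical.

```python
import re


def normalize_code(text: str) -> str:
    """Uppercase the text and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", text.upper())


def barcode_matches_ocr(scanned_barcode: str, ocr_text: str) -> bool:
    """Return True if the scanned barcode value also appears in the OCR text."""
    code = normalize_code(scanned_barcode)
    # An empty barcode can never be confirmed by the label text.
    return bool(code) and code in normalize_code(ocr_text)


def find_mismatches(records):
    """Return barcodes whose OCR text does not confirm the scanned value."""
    return [
        r["barcode"]
        for r in records
        if not barcode_matches_ocr(r["barcode"], r["ocr_text"])
    ]


# Hypothetical records: scanner value plus OCR output from the label image.
records = [
    {"barcode": "EX-000123", "ocr_text": "Example Herbarium\nEX 000123"},
    {"barcode": "EX-000124", "ocr_text": "Example Herbarium\nEX 000129"},
]
mismatches = find_mismatches(records)
```

Records returned by `find_mismatches` would go to a person for review rather than being corrected automatically, which also gives problems a defined place to go.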

Documentation of Workflows
Workflow (WF)
– Documentation of WFs
  – the informal survey is a start
– Improving WFs
  – Talking with others / comparing
– Evaluating WFs
  – WF mockup example for upcoming DROID workshop

Evaluating a Workflow
Workflow (WF)
– Documentation of WFs
– Improving WFs
  – Talking with others / comparing
– Evaluating WFs
  – See WF Mockups for DROID workshop

Pre-Digitization Curation Tasks
Legend:
– must be done before digitization
– could be done at or after digitization
– Non expert / Crowd source?
– a step that could be automated
– a task needing QA / QC
– Can do with existing machine (Kirtas)
– Authority file exists or would be useful
– a physical tool exists
– easy to compute time / costs for this task
– formula exists
Tasks:
– interview staff
– hire staff
– train staff
– decide what to digitize
– pull specimens
– sort specimens (e.g., by taxon, sex, geographic region, collecting event, collector)
– add taxon names to database
– update taxonomic identification on specimens (e.g., vet type specimens)
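A mockup like the one above can be kept as a small data structure, so that tasks can be filtered by criterion and given a rough time estimate. The sketch below is an assumption-laden illustration only: the flag names, the per-specimen minutes, and the simple "minutes per specimen times specimen count" estimate are hypothetical placeholders, not values or formulas from the workshop mockup.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    minutes_per_specimen: float  # hypothetical estimate, not measured data
    flags: set = field(default_factory=set)  # criteria that apply to the task


def tasks_with(tasks, flag):
    """Return the names of tasks marked with the given criterion."""
    return [t.name for t in tasks if flag in t.flags]


def estimated_hours(tasks, specimen_count):
    """Rough total: sum of per-specimen minutes times the number of specimens."""
    total_minutes = sum(t.minutes_per_specimen for t in tasks) * specimen_count
    return total_minutes / 60


# Placeholder assignments for three of the tasks named in the mockup.
tasks = [
    Task("pull specimens", 0.5, {"before_digitization", "non_expert"}),
    Task("sort specimens", 1.0, {"before_digitization", "needs_qa_qc"}),
    Task(
        "add taxon names to database",
        0.25,
        {"before_digitization", "authority_file_useful", "could_be_automated"},
    ),
]

needs_review = tasks_with(tasks, "needs_qa_qc")
hours = estimated_hours(tasks, specimen_count=1200)
```

Keeping the criteria as plain flags means a collection can add or rename columns as its own mockup evolves, without changing the helper functions.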

Digitization Maturity? Six maturity levels for core digitization activities

What’s your Digitization Maturity?
Find out here, on page 67+:
Bryan Kalms. Digitisation: A strategic approach for natural history collections. Canberra, Australia, CSIRO.
Available at content/uploads/2011/10/Digitisation-guide pdf.

What’s your Digitization Maturity? Level 1, 5

Assessing Digitization Workflows…
– Reed Beaman, James Macklin, Michael Donoghue, James Hanken. Overcoming the Digitization Bottleneck in Natural History Collections: A summary report on a workshop held 7 – 9 September 2006 at Harvard University. [Accessed January 2012].
– Íñigo Granzow-de la Cerda and James H. Beach. December. Semi-automated workflows for acquiring specimen data from label images in herbarium collections. Taxon 59 (6).
– Bryan Kalms. Digitisation: A strategic approach for natural history collections. Canberra, Australia, CSIRO, 2012.
– John Tann & Paul Flemons. Report: Data capture of specimen labels using volunteers. Australian Museum.
– Ana Vollmar, James Alexander Macklin, Linda Ford. Natural History Specimen Digitization: Challenges and Concerns. Biodiversity Informatics 7 (1): 93 – 112.
– iDigBio Developing Robust Object to Image to Data (iDigBio DROID) Workshop, May 30 – 31, 2012.

Acknowledgments & Thank You from
– American Museum of Natural History (AMNH)
– Botanical Research Institute of Texas (BRIT)
– Florida Museum of Natural History (FLMNH)
– Florida State University (FSU)
– Harvard Herbarium (HUH)
– Museum of Comparative Zoology (Harvard)
– New York Botanical Garden (NYBG)
– Yale Peabody Museum (YPM)
– Southeast Regional Network of Expertise and Collections (SERNEC)
– Specify Software Project (University of Kansas)
– Symbiota Software Project (Arizona State University)
– Tall Timbers Research Station and Land Conservancy (TTRS)
– Tulane University Museum of Natural History
– University of Kansas Biodiversity Institute Entomology Department
– Valdosta State University (VSU)
and all participants at the iDigBio Paleocollections Workshop