NSF ADBC Digitization TCN-TTD Plants, Herbivores, and Parasitoids A Model System for the study of Tri-Trophic Associations Ten months later… presentation.

Presentation transcript:

NSF ADBC Digitization TCN-TTD
Plants, Herbivores, and Parasitoids: A Model System for the Study of Tri-Trophic Associations
Ten months later…
Presentation by:
Randall Schuh, American Museum of Natural History
Rob Naczi, New York Botanical Garden
Christiane Weirauch, University of California Riverside
Katja Seltmann, American Museum of Natural History

The Tri-Trophic Approach: Capturing Data for the Nearctic Biota
85% of 11,000 Hemiptera from the Nearctic are herbivorous, with high host specificity
Bias in plant groups attacked, e.g., Pinaceae, Poaceae, Asteraceae, Chenopodiaceae, Rosaceae
Some are serious agricultural pests (armored scales, mealybugs, potato leafhoppers, Lygus bugs)
Vectors of viral and bacterial diseases (the green peach aphid is a vector of over 100 plant viruses)
Parasitic Hymenoptera are beneficial as biological control agents

Botanical Institutions: MICH, MO, NYBG, EMC, WIS, MIN, KANU, ISC, COLO, MAINE, MU, TEX, ILL, ILLS
Botanical Data Providers: SEINET, CCH, CPNH
Entomological Collections: AMNH, CDFA, UCRC, CAS, BPBM, MEM, CMNH, INHS, CUIC, CSUC, TAMU, OSAC, NCSU, SEMC, UDCC, EMEC, UMEC, UKIC

Project Management
Steering Committee of 10 PIs + Project Manager
▫ Decision-making on overall project goals, directions, and progress
Full-time Project Manager at AMNH (Katja Seltmann)
▫ Day-to-day project management, technical capability, data analysis, training of entomology partners, vetting and upload of authority files, centralized georeferencing
Full-time Project Coordinator at NYBG (Kim Watson)
▫ Training of botany partners, barcoding of NYBG specimens, and label-data capture for all partner institutions

Entomological Databasing

Streamlined Interface for Rapid Data Entry
Taxon names
Locality data
Collection events
Specimen data
Host names

Database Attributes
▫ Web enabled
▫ Open-source software
▫ Centralized data storage, backup, and management
Database Benefits
▫ Single-product management
▫ Simplified user training
▫ Centralized authority-file management
▫ Centralized georeferencing
▫ Data aggregation shifted to HUB and DiscoverLife.org

Authority Files
Botanical
▫ Tropicos database used across entire project
Entomological
▫ Published catalogs and unpublished lists from specialists
Objectives
▫ Present uniform, up-to-date taxonomy
▫ Reduce decision making by data-entry personnel
▫ Limit entry of new names by data-entry personnel
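The objectives above (uniform taxonomy, fewer decisions at data entry, limited entry of new names) can be illustrated with a small lookup against an authority list. This is a minimal sketch, not the project's actual implementation: the function names, the normalization rule, and the two sample entries are assumptions made for illustration.

```python
def normalize(name: str) -> str:
    """Collapse whitespace and case so trivially different spellings match."""
    return " ".join(name.split()).lower()


def build_authority(names):
    """Map each normalized name to its canonical spelling from the authority file."""
    return {normalize(n): n for n in names}


def resolve(entered: str, authority: dict):
    """Return (name, is_new). Unknown names are flagged for review, not rejected."""
    key = normalize(entered)
    if key in authority:
        return authority[key], False
    return " ".join(entered.split()), True


# Illustrative entries only; a real authority file would be loaded from
# Tropicos (plants) or specialist catalogs (insects).
authority = build_authority(["Lygus lineolaris", "Myzus persicae"])
```

A data-entry form built this way would accept `resolve("  lygus   LINEOLARIS ", authority)` silently and mark `resolve("Lygus novus", authority)` as a new name needing review.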

Data Aggregation and Dissemination leveraging DiscoverLife.org

Approaches to Outreach
AMNH Short Course in Collection Databasing Fundamentals
▫ Train graduate students through participant-support funding
▫ Involve students from multiple graduate programs
▫ Provide fundamentals, including database options, data structures, unique specimen identification, specimen handling, georeferencing, research tools, and data dissemination
Undergraduate Research Projects
▫ REU projects joining project data to student research involvement
Community Outreach

Rob Naczi New York Botanical Garden

Botanical Specimen Imaging

Insect Specimen Imaging
Image representative specimens for each species
Use existing imaging stations at partner institutions
About 30% of Hemiptera are already imaged
Expect to produce about 20,000 new images

Use of OCR for Populating Botanical Records
Workflow
▫ JPEGs of specimen sheets batch-cropped to labels
▫ Labels saved as a new set of JPEGs, then exported to ABBYY FineReader 11 Corporate Edition
▫ Overnight, labels batch-processed through ABBYY
▫ Each OCR output file saved as an individual text file tied to the barcode number
▫ Individual text files merged into an Excel spreadsheet, in which data can be searched, grouped, and parsed
▫ Parsed fields pushed to database
Challenges
▫ Increasing accuracy of parsing
▫ Handwritten labels (now experimenting with outsourcing)
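The merge step in the workflow above (one OCR text file per label, combined into a single searchable table) might look like the following. This is a hedged sketch: it assumes each text file is named after its barcode number and writes CSV rather than a native Excel file; `merge_ocr_text` and the column names are hypothetical.

```python
import csv
from pathlib import Path


def merge_ocr_text(text_dir, out_csv):
    """Write one CSV row per OCR text file: (barcode, raw label text)."""
    rows = []
    for path in sorted(Path(text_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        # Assumption: the file name (without extension) is the barcode number.
        rows.append({"barcode": path.stem, "label_text": " ".join(text.split())})
    with open(out_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["barcode", "label_text"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

The resulting file opens directly in Excel, where the label text can be searched, grouped, and parsed into fields before upload.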

Data Storage Issues
Botany
▫ Botanical images are valuable products of our digitization efforts, but they are also a challenge because of their storage demands
▫ Our concern is with long-term storage (archiving) of uncompressed, original images
▫ We have encouraged the home institutions of our partners to step up, but some are unable or unwilling
▫ Our solution for now is storage on portable drives, but this is a tenuous fix and not reliable enough for truly archival storage
Entomology
▫ No major issues

Christiane Weirauch University of California Riverside

Subcontract Management
Setup: 7 collaborating institutions, 27 subawards
Benefit: long-term data capture across >30 institutions
Issues
1) Delays: administrative and accounting issues
2) Database selection: which one to use?
3) Training: onsite versus remote training?
4) Tracking productivity of subawards not using PBI database
Solutions/suggestions
1) Streamlined administrative and accounting procedures
2) Encourage use of a default database; more discussion
3) Combination of onsite and remote training and monitoring
4) Regular contact with subawards

Unique Specimen Identifiers (USIs)
AMNH matrix-code labels
Setup: the matrix code (read by barcode scanner) and a string of prefix plus 8-digit number (read by human eye) encode the same unique identifier
Benefit: tracking of specimens; connecting images to records
Format: prefix (8 characters, acronym and identifier) followed by the number, e.g., UCRC_ENT XXXXXXXX
Non-standard USIs: accepted in the database
Exceptions: collections that were previously databased without USIs (e.g., Aphidoidea, certain mirid taxa)
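A standard USI of the form described above can be checked and split with a short pattern. This is a sketch under stated assumptions: the pattern (8 capitals or underscores, one space, 8 digits) is inferred only from the single example "UCRC_ENT XXXXXXXX", and `parse_usi` and `format_usi` are hypothetical helper names. Since non-standard USIs are accepted in the database, a failed match signals "non-standard", not "invalid".

```python
import re

# Assumed pattern, inferred from the example "UCRC_ENT XXXXXXXX":
# an 8-character prefix of capitals/underscores, a space, then 8 digits.
USI_PATTERN = re.compile(r"^(?P<prefix>[A-Z_]{8}) (?P<number>\d{8})$")


def parse_usi(text):
    """Return (prefix, number) for a standard USI, or None for non-standard ones."""
    match = USI_PATTERN.match(text.strip())
    if match is None:
        return None
    return match.group("prefix"), int(match.group("number"))


def format_usi(prefix, number):
    """Build the human-readable string printed beside the matrix code."""
    return f"{prefix} {number:08d}"
```

Keeping the prefix and the zero-padded number as separate fields makes it easy to link an image file name or a scanned matrix code back to the same specimen record.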

Collection Staging
Organizing, sorting, and identifying specimens in preparation for databasing
Importance: the highest identification level and accuracy will yield the most useful data for future applications
Priority: well-curated and well-identified collections
TTD: limited budget for staging by experts; very successful for, e.g., Miridae and Membracidae
Issue: routine staging is more time-consuming than anticipated
Possible solution: budget for graduate students or postdocs to help with staging (and training/supervision of databasing crew)

Tri-trophic concept: Hemiptera, plants, parasitoids
Capture of host data
▫ New TTD records: 26% with host records (compared to 24% previously databased); added >800 new hosts
Challenges of integrating parasitoid data
▫ Level of identification of parasitoids (undescribed species; accurate identification requires skilled personnel)
▫ Level of host identification (e.g., “white fly”)
▫ Incorporation of host information from secondary sources (e.g., taxonomic literature)?
On the right track: prioritize specimens with quality host records and integrate secondary host information

Katja Seltmann The American Museum of Natural History

Efficiency of Data Capture: Insects
Total as of October 17, 2012 = 198,409
▫ Includes Illinois, Texas, and Kansas
▫ All 20 subcontracts are digitizing now
▫ 53 contributors for the TTD-TCN project
Numbers from NHCR database (central database at AMNH; 11 subcontracts)
▫ $20,000 in equipment costs
▫ Average time per specimen: 3-3.5 min (range 1.2-6)
▫ Cost per specimen: $0.93 (includes equipment)
▫ Peak in July (more hours digitizing)
▫ 65 collecting events on Christmas Day

Efficiency of Data Capture: Plants
All but three institutions up and running
As of October 9, 2012, have 102,651 images
▫ 3 of 15 institutions not yet begun
4 plant collections report:
▫ $ equipment costs
▫ $0.73 per specimen image
▫ The unmentioned curator volunteerism
  ▫ 4-8 hrs/week depending on institution/taxon
  ▫ ~19 hours a week total

Training Methods: Insects (NHCR Database)
Curators also training (sexing specimens, database)
Online training via Skype
▫ Digitizers clubhouse (building community)
▫ Online manuals
▫ Online videos
▫ Remote training
Using a central database, we can assess the quality of data
▫ Flag when a new name is entered
▫ Flag when more than 10 specimens are entered in one minute by one person
▫ Flag when exact duplicate collecting events or localities are entered (check training)
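Two of the quality flags listed above can be sketched as simple counts over entry logs. This is an illustration only: the record fields (`user`, `entered_at`, `locality`, `date`, `collector`) are assumed, "one minute" is interpreted as a calendar minute rather than a sliding window, and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def flag_fast_entry(records, limit=10):
    """Return (user, minute) pairs where one person entered more than `limit` specimens."""
    per_minute = Counter(
        (r["user"], r["entered_at"].replace(second=0, microsecond=0)) for r in records
    )
    return sorted(key for key, count in per_minute.items() if count > limit)


def flag_duplicate_events(events):
    """Return collecting events (locality, date, collector) that occur more than once."""
    counts = Counter((e["locality"], e["date"], e["collector"]) for e in events)
    return sorted(key for key, count in counts.items() if count > 1)
```

Flags like these do not prove an error; they point a trainer at entries worth a second look, which matches the "check training" note on the slide.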

Training Methods: Plants
▫ Site visits to subcontract institutions
  ▫ Kim Watson, Melissa Tulig
  ▫ Install imaging equipment
  ▫ Personal involvement

Quality Assessment of Transformed Records (NHCR)
Determination; Completeness; Note Language
(A,B,B); (A,A,A); (A,C,B)

Georeferencing: NHCR database
Present total: 130,000 specimen records (Canada, USA, Mexico)

Georeferencing: NHCR database
GEOLocate (North America)
Discover Life validation
Centralized and controlled georeferencing (NYBG, AMNH)
Volunteer georeferencing

Difficult data Issues: specimen relationships

Difficult data Issues: means for curation?

Summary and Predictions
Over 50,000 locality records from NHCR
Will reach 1 million new specimen records for insects (harder to predict for plants at the moment)
Less than $1 a specimen (inclusive)
Arthropod (NHCR) data concerns will become more central as other groups come online

Thanks to: National Science Foundation; co-PIs and collaborators