NSF ADBC Digitization TCN-TTD Plants, Herbivores, and Parasitoids A Model System for the study of Tri-Trophic Associations Ten months later… presentation.


1 NSF ADBC Digitization TCN-TTD: Plants, Herbivores, and Parasitoids: A Model System for the Study of Tri-Trophic Associations. Ten months later…
Presentation by Randall Schuh, American Museum of Natural History; Rob Naczi, New York Botanical Garden; Christiane Weirauch, University of California Riverside; Katja Seltmann, American Museum of Natural History
http://tcn.amnh.org

2 The Tri-Trophic Approach: Capturing Data for the Nearctic Biota
85% of 11,000 Hemiptera from the Nearctic are herbivorous, with high host specificity
Bias in plant groups attacked, e.g., Pinaceae, Poaceae, Asteraceae, Chenopodiaceae, Rosaceae
Some are serious agricultural pests (armored scales, mealybugs, potato leafhoppers, Lygus bugs)
Vectors of viral and bacterial diseases (the green peach aphid is a vector of over 100 plant viruses)
Parasitic Hymenoptera are beneficial as biological control agents

3 Botanical Institutions: MICH, MO, NYBG, EMC, WIS, MIN, KANU, ISC, COLO, MAINE, MU, TEX, ILL, ILLS

4 Botanical Institutions: MICH, MO, NYBG, EMC, WIS, MIN, KANU, ISC, COLO, MAINE, MU, TEX, ILL, ILLS
Botanical Data Providers: SEINET, CCH, CPNH

5 Botanical Institutions: MICH, MO, NYBG, EMC, WIS, MIN, KANU, ISC, COLO, MAINE, MU, TEX, ILL, ILLS
Botanical Data Providers: SEINET, CCH, CPNH
Entomological Collections: AMNH, CDFA, UCRC, CAS, BPBM, MEM, CMNH, INHS, CUIC, CSUC, TAMU, OSAC, NCSU, SEMC, UDCC, EMEC, UMEC, UKIC

6 Project Management
Steering Committee of 10 PIs + Project Manager
▫ Decision-making on overall project goals, directions, and progress
Full-time Project Manager at AMNH (Katja Seltmann)
▫ Day-to-day project management, technical capability, data analysis, training of entomology partners, vetting and upload of authority files, centralized georeferencing
Full-time Project Coordinator at NYBG (Kim Watson)
▫ Training of botany partners, barcoding of NYBG specimens, and label-data capture for all partner institutions

7 Entomological Databasing

8 Streamlined Interface for Rapid Data Entry: taxon names, locality data, collection events, specimen data, host names

9 Database Attributes
▫ Web enabled
▫ Open-source software
▫ Centralized data storage, backup, and management
Database Benefits
▫ Single-product management
▫ Simplified user training
▫ Centralized authority-file management
▫ Centralized georeferencing
▫ Data aggregation shifted to HUB and DiscoverLife.org

10 Authority Files
Botanical
▫ Tropicos database used across entire project
Entomological
▫ Published catalogs and unpublished lists from specialists
Objectives
▫ Present uniform, up-to-date taxonomy
▫ Reduce decision making by data-entry personnel
▫ Limit entry of new names by data-entry personnel

11 Data Aggregation and Dissemination: leveraging DiscoverLife.org

12

13

14 Approaches to Outreach
AMNH Short Course in Collection Databasing Fundamentals
▫ Train graduate students through participant-support funding
▫ Involve students from multiple graduate programs
▫ Provide fundamentals, including database options, data structures, unique specimen identification, specimen handling, georeferencing, research tools, data dissemination
Undergraduate Research Projects
▫ REU projects joining project data to student research involvement
Community Outreach
▫ http://research.amnh.org/pbi/heteropteraspeciespage/

15 Rob Naczi New York Botanical Garden

16 Botanical Specimen Imaging

17 Insect Specimen Imaging
Image representative specimens for each species
Use existing imaging stations at partner institutions
About 30% of Hemiptera are already imaged
Expect to produce about 20,000 new images

18 Use of OCR for Populating Botanical Records
Workflow
▫ JPEGs of specimen sheets batch-cropped to labels
▫ labels saved as a new set of JPEGs, then exported to ABBYY FineReader 11 Corporate Edition
▫ overnight, labels batch-processed through ABBYY
▫ each OCR output file saved as an individual text file tied to its barcode number
▫ individual text files merged into an Excel spreadsheet, in which data can be searched, grouped, and parsed
▫ parsed fields pushed to database
Challenges
▫ increasing accuracy of parsing
▫ hand-written labels (now experimenting with outsourcing)
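The merge step in the workflow above (one OCR text file per barcode, combined into a single searchable sheet) can be sketched as follows. This is only an illustration, not the project's actual tooling: it assumes each OCR output file is named `<barcode>.txt`, and it writes a CSV file (which Excel can open) rather than a native Excel workbook. The function name `merge_ocr_files` and the column names are hypothetical.

```python
import csv
from pathlib import Path


def merge_ocr_files(ocr_dir, out_csv):
    """Merge per-label OCR text files into one CSV keyed by barcode.

    Assumption: each file is named <barcode>.txt (hypothetical naming).
    """
    rows = []
    for path in sorted(Path(ocr_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        # Collapse line breaks and runs of whitespace so each label
        # occupies a single spreadsheet cell.
        rows.append({"barcode": path.stem, "ocr_text": " ".join(text.split())})

    with open(out_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["barcode", "ocr_text"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Keeping the barcode as the first column preserves the tie between each OCR text and its specimen image, so parsed fields can later be matched back to the right record.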

19 Data Storage Issues
Botany
▫ botanical images are valuable products of our digitization efforts, but also a challenge because of their storage demands
▫ our concern is with long-term storage (archiving) of uncompressed, original images
▫ we have encouraged the home institutions of our partners to step up, but some are unable or unwilling
▫ our solution for now is storage on portable drives, but this is a tenuous fix and not reliable enough for truly archival storage
Entomology
▫ no major issues

20 Christiane Weirauch University of California Riverside

21 Subcontract Management
Setup: 7 collaborating institutions, 27 subawards
Benefit: long-term data capture across >30 institutions
Issues
1) Delays: administrative and accounting issues
2) Database selection: which one to use?
3) Training: onsite versus remote training?
4) Tracking productivity of subawards not using PBI database
Solutions/suggestions
1) Streamlined administrative and accounting procedures
2) Encourage use of a default database; more discussion
3) Combination of onsite and remote training and monitoring
4) Regular contact with subawards

22 Unique Specimen Identifiers (USIs)
AMNH Matrix-code labels
Setup: Matrix codes (barcode scanner) and a string of prefix and 8-digit number (human eye) encode the same unique identifier
Benefit: Tracking of specimens; connect images to records
Format: Prefix (8 characters: acronym and identifier) followed by the number, e.g., UCRC_ENT XXXXXXXX
Non-standard USIs: accepted in the database
Exceptions: collections that were previously databased without USIs (e.g., Aphidoidea, certain mirid taxa)
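A minimal sketch of how the standard USI format on this slide (8-character prefix, then an 8-digit number) might be checked and generated. The exact character set allowed in a prefix is an assumption here (uppercase letters and underscores, based only on the `UCRC_ENT` example), and `parse_usi` and `format_usi` are hypothetical helper names. Because the slide says non-standard USIs are accepted, the parser returns `None` instead of raising an error.

```python
import re

# Assumed standard form: 8-character prefix of uppercase letters/underscores,
# a single space, then an 8-digit number (e.g. "UCRC_ENT 00012345").
USI_PATTERN = re.compile(r"(?P<prefix>[A-Z_]{8}) (?P<number>\d{8})")


def parse_usi(usi):
    """Return (prefix, number) for a standard USI, or None if non-standard."""
    match = USI_PATTERN.fullmatch(usi.strip())
    if match is None:
        return None
    return match.group("prefix"), int(match.group("number"))


def format_usi(prefix, number):
    """Build a standard USI string, zero-padding the number to 8 digits."""
    if len(prefix) != 8:
        raise ValueError("prefix must be exactly 8 characters")
    if not 0 <= number <= 99_999_999:
        raise ValueError("number must fit in 8 digits")
    return f"{prefix} {number:08d}"
```

For example, `format_usi("UCRC_ENT", 12345)` yields `UCRC_ENT 00012345`, and `parse_usi` on that string recovers the prefix and the integer.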

23 Collection Staging
Organizing, sorting, and identifying specimens in preparation for databasing
Importance: highest identification level and accuracy will yield most useful data for future applications
Priority: well-curated and well-identified collections
TTD: limited budget for staging by experts; very successful for, e.g., Miridae and Membracidae
Issue: routine staging more time-consuming than anticipated
Possible solution: budget for graduate students or postdocs to help with staging (and training/supervision of databasing crew)

24 Tri-trophic concept: Hemiptera, plants, parasitoids
Capture of host data
▫ New TTD records: 26% with host records (compared to 24% previously databased); added >800 new hosts
Challenges of integrating parasitoid data
▫ Level of identification of parasitoids (undescribed species; accurate identification requires skilled personnel)
▫ Level of host identification (e.g., “white fly”)
▫ Incorporation of host information from secondary sources (e.g., taxonomic literature)?
→ On the right track; prioritize specimens with quality host records & integrate secondary host information

25 Katja Seltmann The American Museum of Natural History

26 Efficiency of Data Capture: Insects
Total as of October 17, 2012 = 198,409
▫ Includes Illinois, Texas, and Kansas
▫ All 20 subcontracts are digitizing now
▫ 53 contributors for the TTD-TCN project
Numbers from NHCR database (central database at AMNH; 11 subcontracts)
▫ $20,000 in equipment costs
▫ Average time per specimen: 3-3.5 min (range 1.2-6)
▫ Cost per specimen: $0.93 (includes equipment)
▫ Peak in July (more hours digitizing)
▫ 65 collecting events on Christmas Day
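The per-specimen times on this slide (3-3.5 minutes on average, range 1.2-6) can be restated as hourly throughput. The sketch below is plain arithmetic on the reported figures; the function name `specimens_per_hour` and the dictionary labels are illustrative only.

```python
def specimens_per_hour(minutes_per_specimen):
    """Convert a per-specimen entry time (minutes) to specimens per hour."""
    if minutes_per_specimen <= 0:
        raise ValueError("minutes_per_specimen must be positive")
    return 60.0 / minutes_per_specimen


# Figures reported on the slide: average 3-3.5 min/specimen, range 1.2-6.
reported_minutes = {
    "average, slower end": 3.5,
    "average, faster end": 3.0,
    "slowest in range": 6.0,
    "fastest in range": 1.2,
}

for label, minutes in reported_minutes.items():
    print(f"{label}: {specimens_per_hour(minutes):.1f} specimens/hour")
```

On these numbers, a typical digitizer enters roughly 17 to 20 specimens per hour, with individual rates spanning about 10 to 50 per hour.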

27 Efficiency of Data Capture: Plants
All but three institutions up and running
As of October 9, 2012: 102,651 images
▫ 3 of 15 institutions not yet begun
4 plant collections report:
▫ $30,482.51 equipment costs
▫ $0.73 per specimen image
▫ The unmentioned curator volunteerism
▪ 4-8 hrs/week depending on institution/taxon
▪ ~19 hours a week total

28 Training Methods: Insects (NHCR Database)
Curators also training (sexing specimens, database)
Online training via Skype
▫ Digitizers clubhouse (building community)
▫ Online manuals
▫ Online videos
▫ Remote training
Using the central database, we can assess quality of data
▫ Flag when a new name is entered
▫ Flag when more than 10 specimens are entered in one minute by one person
▫ Flag exact duplicate collecting events or localities (check training)
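The three quality flags listed above could be computed along the following lines. This is a hedged sketch, not the NHCR database's implementation: the record fields (`user`, `taxon`, `entered_at`), the function names, and the choice to count entries per calendar minute (rather than in a sliding 60-second window) are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime


def new_name_flags(entries, authority_names):
    """Indexes of entries whose taxon name is absent from the authority file."""
    return [i for i, entry in enumerate(entries)
            if entry["taxon"] not in authority_names]


def rapid_entry_flags(entries, limit=10):
    """(user, minute) pairs where one person entered more than `limit` specimens.

    Assumption: entries are bucketed by calendar minute.
    """
    counts = Counter(
        (entry["user"], entry["entered_at"].replace(second=0, microsecond=0))
        for entry in entries
    )
    return sorted(key for key, count in counts.items() if count > limit)


def duplicate_event_flags(events):
    """Collecting events (given as hashable tuples) that appear more than once."""
    counts = Counter(events)
    return [event for event, count in counts.items() if count > 1]
```

A flagged item is a prompt for review rather than proof of an error: a burst of fast entries or a repeated collecting event may simply indicate that a digitizer needs a refresher on the workflow, as the slide's "check training" note suggests.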

29 Training Methods: Plants
▫ Site visits to subcontract institutions
▪ Kim Watson, Melissa Tulig
▪ Install imaging equipment
▪ Personal involvement

30 Quality Assessment of Transformed Records (NHCR) Determination Completeness Note Language (A,B,B) ; (A,A,A) ; (A,C,B)

31 Georeferencing: NHCR database
130,000 specimen records
Present total: 1487, 9134
Canada: 14, 96
USA: 1441, 8564
Mexico: 32, 474

32 Georeferencing: NHCR database
GEOLocate (North America)
Discover Life validation
Centralized and controlled georeferencing (NYBG, AMNH)
Volunteer georeferencing

33 Difficult data Issues: specimen relationships

34 Difficult data Issues: means for curation?

35 Summary and Predictions
over 50,000 locality records from NHCR
will reach 1 million new specimen records for insects (harder to predict for plants at the moment)
less than $1 a specimen (inclusive)
Arthropod (NHCR) data concerns will become more central as other groups come online

36 Thanks to
National Science Foundation
co-PIs and collaborators
http://tcn.amnh.org

