CalBug: Digitizing California’s Terrestrial Arthropods

Peter T Oboyski, Joan Ball, Rosemary Gillespie, Joyce Gross, Traci Grzymala, Gordon Nishida, Kipling Will
Essig Museum of Entomology, University of California at Berkeley, USA

Summary

Databasing of entomology collections has lagged behind that of other disciplines, primarily because of large collection sizes and the highly abbreviated and inconsistent data on very small specimen labels. CalBug is a National Science Foundation-funded collaboration of the eight major entomology collections in California* that intends to capture 1.1 million specimen-level data records from our combined holdings. Data from all institutions will be combined in a single online cache. We will analyze these data using geospatial technology to explore the relationship between changes in distribution and habitat modification.

Developing time-saving methods and technology for getting data from specimen labels into databases is paramount. We have focused on developing and testing methods and workflows to increase the rate of data capture while maximizing data quality. Digital imaging of labels provides an easy-to-view verbatim archive of specimen data and allows remote data entry from image files through manual entry, crowd-sourcing, and automated OCR and data parsing. Specimen handling remains a significant obstacle to efficient data capture from entomological collections because of its cost in time and its risk to specimens. Georeferencing is also a challenge because location data on specimen labels are highly abbreviated and inconsistent. To address these challenges we are exploring strategies that combine computer and human data handling.
Poster panel headings: Label Image Capture; Georeferencing and Mapping

* Collaborators: Bohart Museum – UC Davis, California Academy of Sciences, California State Collection of Arthropods, Entomology Research Museum – UC Riverside, Essig Museum of Entomology – UC Berkeley, LA County Natural History Museum, San Diego Natural History Museum, Santa Barbara Museum of Natural History

Figure 6. Annual average high temperatures under a high emissions scenario of climate change (Source: Cal-Adapt and the Public Interest Energy Research program, California Energy Commission). Records of arthropod collections over the past 100 years, along with projections of future climates, will be used to predict the impact of climate change on arthropod distributions.

Methods

Taxa and localities to database: Priority species were selected to address urgent environmental issues, and target localities were selected to examine changes in biodiversity at sites with long-term sampling, including Natural Area Reserves.

Sort specimens by location and date (optional): A “carry-over” function reduces time spent typing when consecutive specimens have similar data.

Digital imaging: DinoLite® digital microscopes (Figure 1) capture images of label data in JPEG format.

Manual data entry into MySQL database: Label data are interpreted and entered into appropriate database fields (Figure 4).

Error checking: Records are successively sorted by locality and date to identify typographic errors and inconsistencies.

Georeference locality data: Database records are uploaded to BioGeomancer georeferencing software (Figure 5), which suggests coordinates and an error radius for each locality based on standardized protocols.

Upload data to cache (in development): At the completion of the project each institution will upload records to a central cache for inter-institution analyses (Figure 4).
Temporospatial analyses (in development): GIS tools will be used to correlate species distributions with climate and habitat factors and to predict changes in species distributions based on climate change projections (Figure 6).

Poster panel headings: Workflow (legend: optional step; in development); Database

Assessment and Progress

Specimen handling: Handling is a significant time expenditure. It includes retrieving individual specimens, positioning labels for viewing, adding a catalog number label, and returning each specimen to its unit tray.

Digital Imaging: Protocols for entering data directly from specimens into a verbatim field, followed by parsing into interpreted fields, proved slow. Digital imaging of specimen labels provides advantages, including a true verbatim digital archive, the ability to enlarge labels onscreen, and the opportunity for remote data entry and/or Optical Character Recognition (OCR) to automate data extraction. Using a naming convention that includes the specimen catalog number, digital images are automatically linked to database records. Each specimen takes ~2 seconds to photograph, but naming and saving files adds ~7-10 seconds/specimen.

Databasing: Several fields, including higher taxonomy and “higher geography”, are automatically filled from names already in the database. Data are carried over from one specimen to the next (yellow fields in Figure 1). These features, along with pick lists and controlled fields, reduce errors.

Progress: 27,000 Hymenoptera, 8,400 Odonata, and 7,000 Lepidoptera entered into the Essig Database. 4,000 specimens fully georeferenced. 36,000 images taken, with 24,000 awaiting data entry.

Improving image & data acquisition

Minimize imaging time: We are currently developing high-throughput assembly lines to increase the rate of image capture through the spatial arrangement of handling tasks and automated file naming and saving.
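As an illustration of the naming convention described under Digital Imaging, the following is a minimal Python sketch of how a file name that embeds the catalog number can be generated and later mapped back to a database record. The catalog-number format (an uppercase prefix followed by digits, such as `EMEC123456`), the `_view` suffix, and both function names are assumptions made for this example; the poster does not specify the actual convention.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical convention: "<PREFIX><digits>_<view>.jpg", e.g. "EMEC123456_1.jpg".
# The real CalBug naming scheme is not given in the poster.
CATALOG_PATTERN = re.compile(r"^(?P<catalog>[A-Z]+\d+)(?:_(?P<view>\d+))?$")


def image_filename(catalog_number: str, view: int = 1, ext: str = ".jpg") -> str:
    """Build an image file name that embeds the specimen catalog number."""
    return f"{catalog_number}_{view}{ext}"


def catalog_number_from_image(path: str) -> Optional[str]:
    """Recover the catalog number from an image path, or None if the name does not fit."""
    match = CATALOG_PATTERN.match(Path(path).stem)
    return match.group("catalog") if match else None
```

Generating the name from the catalog number at capture time, rather than typing it, is one way the ~7-10 seconds spent naming and saving each file might be reduced.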
Online crowd-sourcing: We are collaborating with the Zooniverse citizen science program to engage thousands of volunteers in label data entry from digital images. Each label is entered several times by different volunteers, and the entries are then compared for consistency (as a proxy for accuracy).

OCR and automated data parsing: We are developing user dictionaries for Optical Character Recognition software to increase percent recognition and accuracy. We are also looking for programmers to create a “smart” parsing program that can assign data elements to appropriate database fields based on context and dictionary terms.

Developing a data cache: Data from each collaborating institution will be added to a combined online cache (see required fields in Figure 4).

Workflow steps (Figure 3):
1. Select taxa for databasing
2. Sort specimens by location & date
3. Tease apart labels to view all text, add catalog # label
4. Take, name, and save digital image of labels
5a. Manually enter data into MySQL database with some error checking
5b. Online crowd-sourcing of manual data entry
5c. Optical Character Recognition & data parsing
6. Error checking
7. Georeference locality
8. Upload data to cache
9. Temporospatial analyses

Collecting Event Data: eventID (DC), country (DC), stateProvince (DC), county (DC), locality (DC), minimumElevationMeters (DC), maximumElevationMeters (DC), decimalLatitude (DC), decimalLongitude (DC), coordinateUncertaintyMeters (DC), geodeticDatum (DC), verbatimCoordinateSystem (DC), georeferenceSources (DC), georeferencedBy (DC), georeferencedDate, georeferenceRemarks (DC), collectionBeginDate (*), collectionEndDate (*), recordedBy (DC) = collectors, samplingProtocol (DC), associatedTaxa (DC), sex (DC), individualCount (DC)

Specimen Data: catalogNumber (DC), institutionCode (DC), kingdom (DC), phylum (DC), class (DC), order (DC), family (DC), genus (DC), specificEpithet (DC), subspecies, taxonIDCertainty, scientificNameAuthorship (DC), identifiedBy (DC), dateIdentified (DC), eventID (DC)

Legend: Bold = required; Normal = recommended; (DC) = Darwin Core field; (*) = Darwin Core recommends one field that accommodates several date options. We prefer “begin” and “end” dates.

Figure 4. Each institution uses its own database system. Records will be collected into a Darwin Core-compliant, flat-file cache with required fields for collecting event data and specimen data as indicated in the above tables from the Essig database.

Labels are often highly abbreviated; unrecognized abbreviations are entered “as is” and bulk updated after data entry is completed.

Figure 1. (upper left) DinoLite® digital microscope and software used to capture images of specimens and labels. (upper right) Essig database data entry screen with specimen image; clicking on the image icon makes the image appear in a separate movable window. Yellow fields are carried over to the next specimen. (lower right) Dragonfly with labels removed for imaging.

Figure 5. Semi-automated programs, such as BioGeomancer, estimate latitude-longitude coordinates with an adjustable error radius based on text descriptions (above example: 15 miles E of Cloverdale, CA).
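To make the Figure 5 example concrete, here is a minimal Python sketch of the basic geometry behind turning an offset description such as “15 miles E of Cloverdale, CA” into coordinates. It is not BioGeomancer's algorithm and it does not estimate the error radius. The one-entry gazetteer, its approximate coordinates, the regular expression, and the function names are all assumptions for illustration only.

```python
import math
import re

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)
KM_PER_MILE = 1.609344
BEARINGS = {"N": 0, "NE": 45, "E": 90, "SE": 135,
            "S": 180, "SW": 225, "W": 270, "NW": 315}

# Hypothetical gazetteer; the coordinates are approximate and illustrative only.
GAZETTEER = {"Cloverdale, CA": (38.80, -123.02)}

OFFSET_PATTERN = re.compile(
    r"^(?P<dist>\d+(?:\.\d+)?)\s*(?P<unit>miles|mi|km)\s+"
    r"(?P<dir>[NSEW]{1,2})\s+of\s+(?P<place>.+)$"
)


def destination(lat, lon, distance_km, bearing_deg):
    """Point reached from (lat, lon) after distance_km along an initial bearing."""
    d = distance_km / EARTH_RADIUS_KM          # angular distance in radians
    theta = math.radians(bearing_deg)
    lat1, lon1 = math.radians(lat), math.radians(lon)
    lat2 = math.asin(math.sin(lat1) * math.cos(d)
                     + math.cos(lat1) * math.sin(d) * math.cos(theta))
    lon2 = lon1 + math.atan2(math.sin(theta) * math.sin(d) * math.cos(lat1),
                             math.cos(d) - math.sin(lat1) * math.sin(lat2))
    # Normalize longitude to the range [-180, 180).
    return math.degrees(lat2), (math.degrees(lon2) + 540) % 360 - 180


def locate(description):
    """Return (lat, lon) for '<distance> <unit> <direction> of <place>', else None."""
    m = OFFSET_PATTERN.match(description.strip())
    if m is None or m["dir"] not in BEARINGS or m["place"] not in GAZETTEER:
        return None
    km = float(m["dist"]) * (1.0 if m["unit"] == "km" else KM_PER_MILE)
    lat, lon = GAZETTEER[m["place"]]
    return destination(lat, lon, km, BEARINGS[m["dir"]])
```

Calling `locate("15 miles E of Cloverdale, CA")` returns a point east of the assumed town coordinates at nearly the same latitude. A real georeferencing tool would also account for the extent of the named place and the imprecision of the distance and direction when suggesting an error radius.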
Queries of georeferenced specimens are mapped “on the fly” using Berkeley Mapper (right example: Libellula luctuosa Burmeister dragonfly specimens near Sacramento, California, in the Essig Database).

Figure 3. General workflow for image capture, databasing, georeferencing, and analysis. See Methods for workflow details.

Poster panel heading: Response to climate change

Photo credits: © Joyce Gross, © PT Oboyski
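The consistency check described under Online crowd-sourcing, where repeated entries for the same label are compared as a proxy for accuracy, could be sketched in Python as a simple majority vote per field. This is an assumption about one possible approach, not the method used by Zooniverse or CalBug; the normalization rule, the 50% threshold, and the function names are hypothetical.

```python
from collections import Counter


def normalize(value: str) -> str:
    """Collapse whitespace and ignore case so trivial differences do not count as disagreement."""
    return " ".join(value.split()).casefold()


def consensus(entries, min_agreement=0.5):
    """Compare volunteer entries for one field of one label.

    Returns (value, agreement): value is the most common normalized entry,
    or None when its share of the non-empty entries does not exceed
    min_agreement; agreement is that share (0.0 if there are no entries).
    """
    cleaned = [normalize(e) for e in entries if e and e.strip()]
    if not cleaned:
        return None, 0.0
    value, count = Counter(cleaned).most_common(1)[0]
    agreement = count / len(cleaned)
    return (value if agreement > min_agreement else None), agreement
```

Fields that come back as `None` would be candidates for review by collection staff, while high-agreement fields could be accepted with less scrutiny.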