This material is based upon work supported by the National Science Foundation under Cooperative Agreement EF-1115210. Any opinions, findings, and conclusions.

Presentation transcript:

Exposing Data from Small Collections: common questions and solutions
Deb – Florida State University
Richard K. Rabeler – University of Michigan
SPNHC Cardiff
Mobilization

This material is based upon work supported by the National Science Foundation under Cooperative Agreement EF-1115210. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

“If you are not getting your data to GBIF, you might as well not exist.”
• What this comment means to us!
• What can we do to “exist”?
• Mobilize data in the 21st century

Main Questions
1. What is mobilization?
2. What do I need to do to get my data ready for mobilization?
3. How do I mobilize my data once it’s ready?

1. What is mobilization?

[Diagram, concept by G. Riccardi. Labels: Data Provider, Manage data, Catalog, Taxonomy, Export, GBIF, BISON, iDigBio, User. Possibilities: species ranges, outlier discovery, new species, gaps in collecting, relationships, predictive niche models, collector maps…]

2. What do I need to do to get my data ready for mobilization?

Mobilization requires standard terms
My data? Your data? Map to a standard!

So what is standardization exactly? What do I need to do?
• Data needs standardization
  • use Darwin Core (dwc)
  • controlled values (e.g. holotype, lectotype, …)
  • date formats, encoding, …
  • taxonomy
• How do I migrate to standards?
  • Consult experts at iDigBio or GBIF or US GBIF node …
  • Make changes to current practices
  • BIS (TDWG)
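As a rough illustration of what "map to a standard" can mean in practice, the sketch below renames local column headers to Darwin Core terms, rewrites dates as ISO 8601, and lower-cases type status values found in a small controlled list. The local header names, the list of type statuses, and the assumed date layouts (including reading 3/4/2013 as month/day/year) are all hypothetical; only the Darwin Core term names (catalogNumber, scientificName, eventDate, typeStatus) come from the standard.

```python
from datetime import datetime

# Hypothetical mapping from local spreadsheet headers to Darwin Core terms.
COLUMN_MAP = {
    "Catalog #": "catalogNumber",
    "Species": "scientificName",
    "Date Collected": "eventDate",
    "Type": "typeStatus",
}

# Illustrative (not exhaustive) controlled values for type status.
TYPE_STATUS = {"holotype", "lectotype", "paratype", "syntype", "neotype"}

# Date layouts assumed to occur in the local data (month/day/year is an assumption).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def to_iso_date(text):
    """Return an ISO 8601 date (YYYY-MM-DD), or the input unchanged if it cannot be parsed."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return text


def standardize(record):
    """Rename columns to Darwin Core terms, then tidy dates and type status."""
    out = {COLUMN_MAP.get(key, key): value for key, value in record.items()}
    if "eventDate" in out:
        out["eventDate"] = to_iso_date(out["eventDate"])
    if "typeStatus" in out:
        candidate = out["typeStatus"].strip().lower()
        if candidate in TYPE_STATUS:
            out["typeStatus"] = candidate
    return out
```

Values that do not parse or do not match the controlled list are left untouched so that a person can review them later.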

What data must I have?
• What is missing from my data?
• Minimum data field content
  • What, where, when, (who)
• Should my data be georeferenced?
  • Yes, enables lots of research
  • Validation
  • Dupes
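One way to find out what is missing is a simple completeness check. The sketch below assumes that "what, where, when, (who)" corresponds to the Darwin Core terms scientificName, locality, eventDate and recordedBy; that choice of terms is an assumption for illustration, not a published minimum standard.

```python
# Assumed Darwin Core terms standing in for "what, where, when" and "(who)".
REQUIRED = ("scientificName", "locality", "eventDate")
RECOMMENDED = ("recordedBy",)


def missing_fields(record, fields=REQUIRED):
    """List the terms that are absent or blank in one record."""
    missing = []
    for name in fields:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing
```

Running `missing_fields(record, RECOMMENDED)` reports the optional "who" separately, so skeletal records can still pass the required check.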

What are my georeferencing options?
• inline, automated, by the crowd
• For example,
  • Find georeferenced duplicates
  • Locality services
• If done outside of the database, via a portal, for example
  • plan for re-integration
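Finding georeferenced duplicates can be sketched as a lookup on collector, collector number and date. The example below assumes that two records sharing the Darwin Core terms recordedBy, recordNumber and eventDate are duplicates of the same gathering, which is a simplification; it returns suggestions for review instead of overwriting anything, so the results can be re-integrated deliberately.

```python
KEY_TERMS = ("recordedBy", "recordNumber", "eventDate")
COORD_TERMS = ("decimalLatitude", "decimalLongitude")


def _text(value):
    """Normalize a cell to a stripped string; None becomes an empty string."""
    return "" if value is None else str(value).strip()


def duplicate_key(record):
    """Case-insensitive key built from collector, collector number and date."""
    return tuple(_text(record.get(term)).lower() for term in KEY_TERMS)


def has_coordinates(record):
    return all(_text(record.get(term)) for term in COORD_TERMS)


def suggest_coordinates(mine, georeferenced):
    """Pair each of my ungeoreferenced records with coordinates from a likely duplicate."""
    index = {}
    for rec in georeferenced:
        key = duplicate_key(rec)
        if "" not in key and has_coordinates(rec):
            index.setdefault(key, rec)  # keep the first match per key
    suggestions = []
    for rec in mine:
        if has_coordinates(rec):
            continue
        match = index.get(duplicate_key(rec))
        if match is not None:
            suggestions.append((rec, match["decimalLatitude"], match["decimalLongitude"]))
    return suggestions
```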

Who is going to enter / validate / georeference the data?
• This is an opportunity! (Monfils, Harris)…
  • Students
  • Volunteers
  • Curatorial Assistants
  • Collection Managers
  • Curators
  • Researchers
  • Citizen Scientists (all of us!)
• to quote Kari, “…it’s a matter of time.”

What about sensitive locality data?
• Don’t share sensitive data
• Aim for due diligence
  • Software can help
• Do manage the time / effort for this
• Consider:
  • Duplicate conundrum
  • Collector numbers
  • Publications, Google
• Think about a public education strategy
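As one possible example of how software can help, the sketch below produces a public copy of a record with locality details blanked for taxa on a local watch list. The taxon name in the list is a made-up placeholder and the wording of the note is an assumption; locality, decimalLatitude, decimalLongitude and informationWithheld are Darwin Core terms.

```python
# Hypothetical watch list; a real one would come from the collection's own policy.
SENSITIVE_TAXA = {"exampleus sensitivus"}

LOCALITY_TERMS = ("locality", "decimalLatitude", "decimalLongitude")


def public_copy(record):
    """Return a copy safe to share; the stored record itself is never modified."""
    out = dict(record)
    name = str(out.get("scientificName") or "").strip().lower()
    if name in SENSITIVE_TAXA:
        for term in LOCALITY_TERMS:
            if term in out:
                out[term] = ""
        # Tell data users that something was removed on purpose.
        out["informationWithheld"] = "Precise locality withheld; available on request."
    return out
```

Blanking fields on export, and not in the database, keeps the full data available to the collection while limiting what is shared.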

What about barcodes? Do I need them? What are my options?
• Barcodes facilitate automation
  • Managing connection between specimens, media and database records
• You don’t have to have them, but …

What do barcodes do?
• simplify:
  • image file naming
  • image processing, validation, and tracking
  • loan queries
  • specimen tracking
  • automated processing / sharing

Which kind of barcodes do I use?
• Many options
  • 1-D, 2-D
  • do put an identifier in the barcode
  • do not put the taxon name in the barcode matrix
  • can be a UUID, can be a Darwin Core triplet
  • in essence they are like a catalog number
• institutioncode:collectioncode:
• QR code (2-D matrix)
• urn:uuid:f47ac10b-58cc-4372-a567-0e02b2c3d47
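The two identifier styles mentioned above can be produced with a few lines of code. This is a minimal sketch: the function names are hypothetical, the codes in the usage note are placeholders, and the triplet layout assumed here is institution code, collection code and catalog number joined by colons.

```python
import uuid


def darwin_core_triplet(institution_code, collection_code, catalog_number):
    """Join the three parts with colons (assumed layout of a Darwin Core triplet)."""
    return ":".join([institution_code, collection_code, str(catalog_number)])


def uuid_urn(value=None):
    """Format a UUID as a urn:uuid: string; generate a random (version 4) one if none is given."""
    return (value if value is not None else uuid.uuid4()).urn
```

For example, `darwin_core_triplet("XYZ", "Herb", 1234)` gives `XYZ:Herb:1234`. Either string can then be encoded in a 1-D or 2-D barcode by whatever label software is in use.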

I've heard of the need for my data (and media) to have "unique identifiers", but I don't know much about them. What are they good for? For my simple data set, who would assign them (and how and to what)?
• Globally unique identifiers for specimens and media are key for citation and feedback
• Best if provider (you!) assigns these
  • UUID: Universal Unique Identifier
  • assign a UUID to every specimen (and media) you have
  • urn:uuid: f47ac10b-58cc-4372-a567-0e02b2c3d47
• Don’t panic! It’s easy.

Do unique identifiers have to be on the physical object?
• No.
• They are stored in the database.
• But when providing data, a dwc:occurrenceID that is a globally unique identifier for the specimen is best and this would be a UUID.
• Back to this in a bit…

Where do I get UUIDs? Do I have to use them?
• It is easy to set up databases to have a UUID and to add a column with these if needed. (Some do this now.)
  • easy to create them, get them from the web
• Other identifiers will work, including the Darwin Core triplet
  • BEST Practice: register with GRBio to ensure your triplet will be unique. (grbio.org)
• All bits need these
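Adding a UUID column to an existing table might look like the sketch below, which works on rows already read into dictionaries (for instance with Python's `csv.DictReader`). The function name is hypothetical; occurrenceID is the Darwin Core term. The important design point is that an identifier is assigned once and then stored, never regenerated on each export.

```python
import uuid


def add_occurrence_ids(rows):
    """Give every row lacking an occurrenceID a new random UUID; keep existing ones."""
    for row in rows:
        if not str(row.get("occurrenceID") or "").strip():
            row["occurrenceID"] = uuid.uuid4().urn
    return rows
```

After running this, the rows should be written back to the database or spreadsheet so the same identifiers are shared every time the data is published.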

How do I choose a database, or collection management software?
• Guidelines exist to help you decide
  • Considerations for Selecting a Collections Management System (Joanna McCaffrey, 2012)
  • Digitisation: A strategic approach for natural history collections. Canberra, Australia, CSIRO (Bryan Kalms, 2012)
  • Initiating a Collection Digitisation Project (Frazier, Wall, Grant 2008)
• Your community

3. How do I mobilize my data once it’s ready?
• So, your data is entered, cleaned up, standardized, georeferenced, and validated. What next?
• Or wait! Does it all have to be done before you mobilize it? No!
  • Trend: Minimal / Skeletal Data Records
  • Result: Need to develop robust strategies for completing / enhancing records

I work at a small collection and have a data set in Excel and want to get it exposed to GBIF. What are my options?
• All roads lead to GBIF
• Excel: not a database

Could I do something similar with an Access or FileMaker Pro database?
• Yes.

I've heard of the IPT, what is it? What can it do for me?
• IPT is the Integrated Publishing Toolkit
• Software to help you make, and enable you to share, a tidy, standardized dataset
• Darwin Core Archive (at its simplest)
  • occurrence data
  • meta.xml
  • eml.xml
• You can install it yourself, your IT staff can set it up, or you can use someone else’s IPT
  • ask them!
• Media data, Genomic data, OCR output, …
• UUIDs are key
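To make the pieces of a Darwin Core Archive less abstract, the sketch below builds a very small one by hand: a tab-delimited occurrence.txt plus a meta.xml describing its columns, zipped together. This is an illustration under assumptions, not what the IPT itself does internally. The function names are hypothetical, eml.xml (the dataset metadata) is left out for brevity, every column is assumed to be a term in the main Darwin Core namespace, and the first column is assumed to hold the record identifier.

```python
import io
import zipfile
from xml.sax.saxutils import escape, quoteattr

DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"


def build_meta_xml(terms, data_file="occurrence.txt"):
    """Describe a tab-delimited core file whose first column is the record id."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">',
        '  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"'
        ' fieldsEnclosedBy="" ignoreHeaderLines="1" rowType=%s>'
        % quoteattr(DWC_TERMS + "Occurrence"),
        "    <files><location>%s</location></files>" % escape(data_file),
        '    <id index="0"/>',
    ]
    for index, term in enumerate(terms):
        lines.append('    <field index="%d" term=%s/>' % (index, quoteattr(DWC_TERMS + term)))
    lines += ["  </core>", "</archive>"]
    return "\n".join(lines) + "\n"


def build_archive(terms, rows):
    """Return the bytes of a zip holding occurrence.txt and meta.xml."""
    # Values are assumed to contain no tabs or line breaks.
    lines = ["\t".join(terms)]
    for row in rows:
        lines.append("\t".join(str(row.get(term, "")) for term in terms))
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("occurrence.txt", "\n".join(lines) + "\n")
        archive.writestr("meta.xml", build_meta_xml(terms))
    return buffer.getvalue()
```

In practice the IPT produces these files, including eml.xml, from a mapped data source, so hand-building an archive is mainly useful for understanding what is inside one.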

Is there a "best place" to put my data?
• Everywhere.
  • Facilitate data discovery, data use, data reuse, data enhancement.
  • Expect enhanced data.
  • Expect feedback about data issues.
    • (errors, typos, formatting, georeference issues, taxonomy issues, ...)
• Ask where your data is going

What about funding?
• libraries (IMLS, …)
• foundations
  • seek to establish a relationship with foundations whose missions, while perhaps different from yours, may overlap to benefit both of you
• collaborations
• your university
  • include students (undergraduates)
  • can bring funding opportunities

What about large collections? Do they have this all figured out?
• Some do, some don’t, …
• Those that do (small and large) can help
  • Expertise sharing
  • Pain points (oops!)
  • Documentation
  • Software?...

More questions?
• Let’s continue the conversation!
• See you Friday at the SPNHC 2014 Special Interest Group Session: Collections Digitization and Opportunities for International Collaboration, 11 AM
• Diolch yn fawr! (Thank you very much!)