Data Management: The Data Repatriation Re-integration Step or …

Presentation transcript:

Data Management: The Data Repatriation Re-integration Step or … Loaded terms, Pronoun Trouble, Achievable Expectations
Introducing the iDigBio Data Management Interest Group
Webinar Kick-off, 7 August 2014
Deb Paul, Greg Riccardi
The focus of this talk is on the data management issues / challenges / expectations surrounding getting enhanced data from the cloud back into one’s local database.
iDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF-1115210). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

[Diagram, concept by G. Riccardi. Focus on the blue dashed arrow.]
Data Provider: Catalog, Manage data, Export
Enhancement steps: Standardization, Outlier Detection, Duplicate Detection, Annotation, Taxonomy, Georeferencing, Encoding
The World / Researcher possibilities: species ranges, outlier discovery, new species, gaps in collecting, relationships, predictive niche models, collector maps…

Why Darwin Core / georeferencing standards?
[Diagram labels: provider data; aggregator representation of provider data]
http://prezi.com/iib3pqk-kyd-/curators-workbench/
Why care about standards? What do they have the potential to accomplish?
- Collection managers are doing what they need to do, for themselves and their collections.
- When we share, we need standards.
- Data becomes useful for others / other purposes.
- A common vocabulary is required.
- Feedback and attribution become possible.
- The collection gets used more, increasing the value of the collection (indirect, subtle).
- Putting identifiers on specimens makes them more useful to others.
- Consistency is important!
Image: http://www.britishmuseum.org/images/rosettawriting384.jpg

Pronoun trouble
Provider: the institution / collection giving data to the aggregator; an individual researcher
Aggregator / Integrator: a source combining and providing access to enhanced data from many providers in one place

Questions: Provider Expectations
- Where is the most up-to-date data?
- Is it rational to expect that the provider’s data is the best / most accurate version of the data?
- Will you, the aggregator / integrator, be modifying my data? If so, how? Why?
- How do I (the provider) get the modified data back into my database (if I want to)? What are the hurdles?
- What if I do not want to?
- What if I cannot get the modified data back into my database?
- How do I keep original data, and consume the enhanced data?
Best data comes from the source (aka the “provider”)
- Is this a reasonable proposition / expectation?
- IF Yes, when might it actually be true?
- IF No, where is the place? Is there one place?
- Won’t different aggregators do different things to / with the data?
- Won’t different aggregators provide different tools?
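
One possible way to approach the question of keeping original data while consuming enhanced data is to compare the two versions field by field and store both values, rather than overwriting. The following is a minimal Python sketch under stated assumptions: records are plain dictionaries keyed by Darwin Core style field names, and the function name `diff_record`, the class `FieldChange`, and all sample values are hypothetical, made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldChange:
    """One field whose value differs between the local and enhanced copy."""
    field: str
    original: Optional[str]   # value in the provider's database (None if absent)
    enhanced: Optional[str]   # value returned by the aggregator (None if absent)


def diff_record(local: dict, enhanced: dict) -> list:
    """List every field where the aggregator's copy differs from the local one.

    Nothing is overwritten: each change keeps both values, so the provider
    can review it and accept or reject it later.
    """
    changes = []
    for field in sorted(set(local) | set(enhanced)):
        before, after = local.get(field), enhanced.get(field)
        if before != after:
            changes.append(FieldChange(field, before, after))
    return changes


# Hypothetical sample records (illustrative values only).
local = {"catalogNumber": "X-000123", "country": "U.S.A.", "eventDate": "08/07/1998"}
enhanced = {
    "catalogNumber": "X-000123",
    "country": "United States",
    "eventDate": "1998-08-07",
    "decimalLatitude": "30.44",
}

for change in diff_record(local, enhanced):
    print(change.field, "|", change.original, "->", change.enhanced)
```

Keeping the original next to the enhanced value leaves the decision with the provider, which matters when the provider wants to remain the authoritative source.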

Questions: Community / Researcher Expectations
- Where is the most up-to-date, best / most accurate data to address (my) research questions?
- Where does the researcher expect they will find the best data?
- Process: discovery – acquisition – refinement
- Will errors I find and enhancements I make be accepted by the provider?
- Are these expectations realistic? Achievable? Or desirable?
- Is a GenBank-like model coming? Is the most up-to-date version of the data in the cloud?
- IF the cloud is the solution (of the near future), what do we do in the meanwhile?
- Can I find all the potential versions (discovery)? Then what?

What updates might the provider expect?
- standardization: country, state, province, dates, type status, synonyms, authorities, taxon names, abbreviations, preparation types, collector names, …
- georeferences
- determination annotations
- image annotations
- general comments
- encoding
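
To make the "standardization" item concrete, here is a small Python sketch of what an aggregator-side cleanup of country names and dates might look like. It is an illustration only, not a description of any aggregator's actual pipeline: the synonym table, the list of accepted date formats (including the assumption that slash dates are month/day/year), and the function names are all hypothetical.

```python
from datetime import datetime

# Hypothetical synonym table; a real one would come from an authority file.
COUNTRY_SYNONYMS = {
    "usa": "United States",
    "u.s.a.": "United States",
    "united states of america": "United States",
}

# Assumed input formats; "%m/%d/%Y" assumes month-first slash dates.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def standardize_country(value: str) -> str:
    """Map known spelling variants to one preferred form; leave others as-is."""
    cleaned = value.strip()
    return COUNTRY_SYNONYMS.get(cleaned.lower(), cleaned)


def standardize_date(value: str):
    """Return an ISO 8601 date string, or None if no assumed format matches.

    None signals that a person needs to look at the value; the function
    does not guess.
    """
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


print(standardize_country(" U.S.A. "))
print(standardize_date("08/07/1998"))
print(standardize_date("summer 1998"))
```

Returning `None` for unparseable dates, instead of a best guess, reflects the point that not every fix can be automated.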

Goals of provider…
- Most accurate / complete data to share
- Does not want to be (cannot be) the source of all the possible information about a given specimen
- They want to curate the data that satisfies their needs
Researcher goals…
- Creation of best dataset possible
- If problems are found and noted, expects the issues found to be fixed

What are the challenges for re-integration?
- Provider database doesn’t have required fields to hold enhanced data and database is not modifiable.
- Provider database doesn’t have required fields to hold enhanced data but database is modifiable.
- Provider database can be modified to hold enhanced data but provider doesn’t know how.
- Provider doesn’t know if their database is modifiable or not.
- Provider has skills but doesn’t have time to make the updates happen.
- Provider doesn’t want the enhanced data (for some reason/s).
- Provider does want to be the authoritative source…
Curation process
- Example: your local “filed as name” is not something you would change.
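
The first two challenges above (missing fields, modifiable or not) can be sketched as a simple pre-import check. This is a hedged Python illustration, assuming the provider can list the field names their local database supports; `plan_reintegration`, its parameters, and the sample values are hypothetical names introduced here, not part of any real collection management system.

```python
def plan_reintegration(local_fields: set, enhanced_record: dict, can_modify_schema: bool):
    """Split an enhanced record into what the local database can hold and what it cannot.

    Returns (storable, missing, action), where `action` is a plain-language
    suggestion for the next step.
    """
    storable = {k: v for k, v in enhanced_record.items() if k in local_fields}
    missing = {k: v for k, v in enhanced_record.items() if k not in local_fields}

    if not missing:
        action = "import"
    elif can_modify_schema:
        action = "add fields, then import"
    else:
        action = "import what fits; keep the rest in a side file"
    return storable, missing, action


# Hypothetical example: the local database has no georeference fields.
local_fields = {"catalogNumber", "country"}
enhanced_record = {
    "catalogNumber": "X-000123",
    "country": "United States",
    "decimalLatitude": "30.44",
}

storable, missing, action = plan_reintegration(local_fields, enhanced_record, can_modify_schema=False)
print("storable:", storable)
print("missing: ", missing)
print("action:  ", action)
```

The check does not address the remaining challenges (skills, time, or not wanting the data); those are organizational rather than technical.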

Cycle
[Diagram labels: Researcher; Provider (who is a Consumer); Aggregator; Analyzers; The Catalog]
Provider – Consumer relationship at several levels

(Some) Reasons why errors are hard to fix
- too “annoyed” to fix broken links
- don’t have the time
- lacking skills (computational and data literacy)
- database modifications needed
- database modifications impossible
- people make mistakes
- can’t automate all fixes
- human who knows what data ought to look like must be involved
- ...

Data Management Interest (DMI) Group… Your input, next steps, thank you!
- Create list of relevant publications
- List issues that make data re-integration a challenge
- Sort the issues
- Topics for future meetings
- Meet – monthly or bi-monthly
- Next meeting October 2014
- Refine scope / goals of the group, subgroups if desired
- Overlap with other groups: TDWG Data Quality Group
- DMI Wiki