Published byΕυγένιος Βενιζέλος Modified over 6 years ago
1
Data Management: The Data Repatriation Re-integration Step or …
Loaded terms, Pronoun Trouble, Achievable Expectations
Introducing the iDigBio Data Management Interest Group
Webinar Kick-off, 7 August 2014
Deb Paul, Greg Riccardi
The focus of this talk is on the data management issues / challenges / expectations surrounding getting enhanced data from the cloud back into one’s local database.
iDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF ). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
2
predictive niche models collector maps… possibilities
Diagram (concept by G. Riccardi)
Data Provider: Catalog, Manage data, Export
Standardization, Outlier Detection, Duplicate Detection, Annotation, Taxonomy, Georeferencing, Encoding
The World
Researcher
Possibilities: species ranges, outlier discovery, new species, gaps in collecting, relationships, predictive niche models, collector maps…
Focus on the blue dashed arrow.
3
Aggregator representation
Provider data
Aggregator representation of provider data
Why Darwin Core / georeferencing standards? Why care about standards? What do they have the potential to accomplish?
Collection Managers are doing what they need to do – for themselves, their collections.
When we share, we need standards.
Data becomes useful for others / other purposes.
A common vocab is required.
Feedback and Attribution become possible.
The collection gets used more, increasing the value of the collection.
indirect, subtle
Putting identifiers on specimens makes them more useful to others.
Consistency is important!
4
Pronoun trouble
Provider: the institution / collection giving data to the aggregator; individual researcher
Aggregator / Integrator: a source, combining and providing access to enhanced data from many providers in one place
5
Questions: Provider Expectations
Where is the most up-to-date data?
Is it rational to expect that the provider’s data is the best / most accurate version of the data?
Will you, the aggregator / integrator, be modifying my data? If so, how? Why?
How do I (the provider) get the modified data back into my database (if I want to)? What are the hurdles?
What if I do not want to?
What if I cannot get the modified data back into my database?
How do I keep original data, and consume the enhanced data?
Best data comes from source (aka the “provider”). Is this a reasonable proposition / expectation?
IF Yes, when might it actually be true?
IF No, where is the place? Is there one place?
Won’t different aggregators do different things to / with the data?
Won’t different aggregators provide different tools?
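One way to make the question "will you be modifying my data, and how?" concrete is to compare the provider's copy of a record with the aggregator's copy, field by field. The following is a minimal sketch, not a tool described in the talk: the function name `diff_records` and the sample values are assumptions for illustration, and the field names borrow Darwin Core terms.

```python
def diff_records(original: dict, enhanced: dict) -> dict:
    """Return {field: (original_value, enhanced_value)} for every field that differs.

    A field missing on one side is reported with None on that side.
    """
    changes = {}
    for field in sorted(set(original) | set(enhanced)):
        before = original.get(field)
        after = enhanced.get(field)
        if before != after:
            changes[field] = (before, after)
    return changes


# Illustrative records only; the values are invented for this sketch.
provider_record = {
    "catalogNumber": "12345",
    "country": "U.S.A.",
    "stateProvince": "Fla.",
}
aggregator_record = {
    "catalogNumber": "12345",
    "country": "United States",
    "stateProvince": "Florida",
    "decimalLatitude": 29.65,  # added by the aggregator, absent locally
}

changes = diff_records(provider_record, aggregator_record)
for field, (before, after) in changes.items():
    print(f"{field}: {before!r} -> {after!r}")
```

A report like this lets the provider review each proposed change before deciding whether to take it back into the local database.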
6
Questions: Community / Researcher Expectations
Where is the most up-to-date, best / most accurate data to address (my) research questions?
Where does the researcher expect they will find the best data?
Process: discovery – acquisition – refinement
Will errors I find and enhancements I make be accepted by the provider?
Are these expectations realistic? achievable? or desirable?
Is a GenBank-like model coming? Most up-to-date version of data is in the cloud?
IF the cloud is the solution (of the near future), what do we do in the meanwhile?
Can I find all the potential versions (discovery)? Then what?
7
What updates might the provider expect?
standardization: country, state, province, dates, type status, synonyms, authorities, taxon names, abbreviations, preparation types, collector names, …
georeferences
determination annotations
image annotations
general comments
encoding
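As a rough illustration of the standardization updates listed above, the sketch below normalizes a country name and a collecting date while keeping the verbatim values. Everything in it is an assumption made for the example: the synonym table, the accepted date formats (month-first for slashed dates), the `verbatim` sub-dictionary, and the function names. A real aggregator would use far larger controlled vocabularies.

```python
from datetime import datetime

# Hypothetical synonym table; a real one would be much larger.
COUNTRY_SYNONYMS = {
    "usa": "United States",
    "u.s.a.": "United States",
    "united states of america": "United States",
}

# Assumed input formats; slashed dates are read as month/day/year.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def standardize_country(verbatim: str) -> str:
    """Map known spellings to one form; unknown values pass through trimmed."""
    text = verbatim.strip()
    return COUNTRY_SYNONYMS.get(text.lower(), text)


def standardize_date(verbatim: str):
    """Return an ISO 8601 date string, or None if no assumed format matches."""
    text = verbatim.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def standardize_record(record: dict) -> dict:
    """Return a copy with standardized values and the originals kept under 'verbatim'."""
    standardized = dict(record)
    standardized["verbatim"] = {
        "country": record["country"],
        "eventDate": record["eventDate"],
    }
    standardized["country"] = standardize_country(record["country"])
    iso_date = standardize_date(record["eventDate"])
    if iso_date is not None:  # leave unparseable dates untouched
        standardized["eventDate"] = iso_date
    return standardized


example = standardize_record(
    {"catalogNumber": "12345", "country": "U.S.A.", "eventDate": "8/7/2014"}
)
print(example)
```

Keeping the verbatim values next to the standardized ones means the provider can always see what was changed and can reject a standardization that does not suit local practice.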
8
Goals of provider… Researcher goals…
Most accurate / complete data to share
Does not want to be (cannot be) the source of all the possible information about a given specimen
They want to curate the data that satisfies their needs
Researcher goals…
Creation of best dataset possible
If problems are found and noted, expects the issues found to be fixed
9
What are the challenges for re-integration?
Provider database doesn’t have required fields to hold enhanced data and database is not modifiable.
Provider database doesn’t have required fields to hold enhanced data but database is modifiable.
Provider database can be modified to hold enhanced data but provider doesn’t know how.
Provider doesn’t know if their database is modifiable or not.
Provider has skills but doesn’t have time to make the updates happen.
Provider doesn’t want the enhanced data (for some reason/s).
Provider does want to be the authoritative source…
Curation process
Example: your local “filed-as name” is not something you would change.
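Several of these challenges can be sketched as a single merge policy. The example below is only an illustration under stated assumptions, not a method from the talk: fields the local database already has are updated with the old value logged, fields it lacks go to a separate "sidecar" store (for the case where the database cannot be modified), and locally curated fields such as a filed-as name are never overwritten. The function name `reintegrate`, the field name `filedAsName`, and the sample values are hypothetical.

```python
def reintegrate(local: dict, enhanced: dict, protected=frozenset()):
    """Merge enhanced values into a copy of the local record.

    Returns (updated, history, sidecar, skipped):
      updated - local record with accepted changes applied
      history - list of (field, old_value, new_value) for each overwrite
      sidecar - enhanced fields the local record has no place for
      skipped - protected fields where the enhanced value differed
    """
    updated = dict(local)
    history, sidecar, skipped = [], {}, []
    for field, value in enhanced.items():
        if field in protected:
            if local.get(field) != value:
                skipped.append(field)  # provider stays authoritative here
            continue
        if field not in local:
            sidecar[field] = value  # no local field to hold it
        elif local[field] != value:
            history.append((field, local[field], value))  # keep the original
            updated[field] = value
    return updated, history, sidecar, skipped


# Illustrative records only.
local_record = {
    "catalogNumber": "12345",
    "filedAsName": "Quercus alba",
    "country": "U.S.A.",
}
enhanced_record = {
    "catalogNumber": "12345",
    "filedAsName": "Quercus alba L.",
    "country": "United States",
    "decimalLatitude": 29.65,
}

updated, history, sidecar, skipped = reintegrate(
    local_record, enhanced_record, protected=frozenset({"filedAsName"})
)
print(updated, history, sidecar, skipped, sep="\n")
```

The sidecar and the history log are design choices for this sketch: they let a provider consume enhanced data without losing the original values and without changing the database schema.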
10
Cycle
Researcher, Provider (who is a Consumer), Aggregator, Analyzers
Provider – Consumer relationship at several levels
The Catalog
11
(Some) Reasons why errors are hard to fix
too “annoyed” to fix broken links
don’t have the time
lacking skills (computational and data literacy)
database modifications needed
database modifications impossible
people make mistakes
can’t automate all fixes
human who knows what data ought to look like must be involved
...
12
Data Management Interest (DMI) Group…
Your input, next steps, thank you!
Create list of relevant publications
List issues that make data re-integration a challenge
Sort the issues
Topics for future meetings
Meet – monthly or bi-monthly
Next meeting October 2014
Refine scope / goals of the group, subgroups if desired
Overlap with other groups
TDWG Data Quality Group
DMI Wiki