Big Data Needs Little CRUD:


1 Big Data Needs Little CRUD:
data co-creation and integration for a comprehensive taxonomic framework
13 February 2018
David F. Mitchell and Thomas Orrell
Integrated Taxonomic Information System, Washington, DC

2 Biodiversity Big Data is Organized by Taxonomy
Taxonomy extends to all formally described species and offers an axis as wide as life itself to organize, integrate, and present biological information.
ITIS and Species 2000 work together as full partners to generate the Catalogue of Life, an expert (taxonomist) knowledge base composed of global species datasets that cover most described species, organized hierarchically.
Create attributes for structuring, organizing, and labeling.

3 Finding the Relevant Biodiversity Objects
To improve recall:
1 – Add synonymy
2 – Handle homonyms
3 – Find and fix the gaps
4 – Be current
What happens when the scientific name linked to an object does not match the name included in the classification? It negatively impacts precision and recall.
What happens when the classification is not acceptable to the searcher?
Poor precision:
- The classification has children the searcher considers irrelevant
- Homonyms
- Incongruent concepts between the searcher and the classification organizing the objects
Poor recall:
- Taxa are excluded from the relevant objects because the searcher's classification doesn't match the one that organizes the objects
- Synonymy of the target taxa is lacking
Without synonymy, objects that should be in the relevant circle are not.
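The effect of synonymy on recall can be sketched in a few lines of Python. This is a minimal illustration, not ITIS code: the synonymy table, the object identifiers, and the function names (`expand`, `search`, `recall`) are all assumptions made up for the example.

```python
from typing import Dict, Set

# Illustrative synonymy table (not ITIS data): accepted name -> synonyms.
SYNONYMS: Dict[str, Set[str]] = {
    "Puma concolor": {"Felis concolor"},
}

# Hypothetical biodiversity objects, each labeled with the name its source used.
OBJECTS: Dict[str, str] = {
    "specimen-1": "Puma concolor",
    "specimen-2": "Felis concolor",
    "specimen-3": "Lynx rufus",
}


def expand(name: str, synonyms: Dict[str, Set[str]]) -> Set[str]:
    """Return the name plus every name linked to it by synonymy."""
    for accepted, syns in synonyms.items():
        if name == accepted or name in syns:
            return {accepted} | syns
    return {name}


def search(name: str, objects: Dict[str, str],
           synonyms: Dict[str, Set[str]]) -> Set[str]:
    """Return ids of objects labeled with the name or any of its synonyms."""
    names = expand(name, synonyms)
    return {oid for oid, label in objects.items() if label in names}


def recall(retrieved: Set[str], relevant: Set[str]) -> float:
    """Fraction of the relevant objects that the search actually found."""
    return len(retrieved & relevant) / len(relevant) if relevant else 1.0


# Both specimens refer to the same taxon, so both are relevant.
relevant = {"specimen-1", "specimen-2"}
without_synonymy = search("Puma concolor", OBJECTS, {})
with_synonymy = search("Puma concolor", OBJECTS, SYNONYMS)

print(recall(without_synonymy, relevant))  # 0.5
print(recall(with_synonymy, relevant))     # 1.0
```

Without the synonymy table, the object labeled with the older combination falls outside the result set even though it belongs in the relevant circle.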

4 Taxonomic Workbench Co-creation Model
We are here to share why and how we are building a new data editing platform for taxonomic data: Taxonomic Workbench (TWB) 6.0, for ITIS taxonomic data development.
We believe a collaborative platform is how to scale to meet the total addressable challenge: a maintainable and regularly updated global taxonomy.
The new version changes how content is created: data stewards and taxonomic experts become active participants and co-create ITIS content by
- making direct collaboration on projects possible
- streamlining our Data Quality (DQ) process
Data Quality process adapted from Björgvinsson, Tryggvi. In Press. The Art of Data Usability, Manning Publications.

5 Create, Read, Update, Delete – What does a co-creation platform need?
Features
1 – Data Integrity & Ease of Use
2 – Efficient Edits
3 – Import, Compare, & Merge
4 – Annotations
5 – Manage Work
6 – Activity Log
Design Principles
1 – Certainty Effect
2 – Decisions Under Risk
3 – User's Decision Point
4 – Disposition Effect, Sticky Behaviors
5 – Familiarity Heuristic, Reciprocity
6 – Availability Heuristic
What will users use?
Key features:
- Data integrity combined with ease of use: keep taxonomic assertions consistent and ensure nomenclatural rules are followed.
- Edit efficiently: bulk updating features, such as associating multiple names with a single publication and bulk addition of names based on a template, make our CRUD paradigm within a project faster.
- Human engagement: features like inviting and signing up users, and logging of activities, solve the problem of bringing users on board and giving them feedback on what they have accomplished in different stages of the DQ lifecycle.
- Data import and ITIS compare: leverage existing work.
- Social ITIS: add annotations against elements in a project to facilitate communication about data quality issues. This is how we meet the challenge of communication between ITIS and data stewards, and among the multiple experts who will be collaborating on a single project.
- Manage work: integrated work management features and project metrics solve the problem of how to prioritize current work and provide feedback to management stakeholders on what has been accomplished.
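As a rough sketch of how CRUD operations, a bulk update, and an activity log might fit together inside a project, consider the following Python. It is an assumption-laden illustration, not the Taxonomic Workbench design: the `Project` and `NameRecord` classes, their method names, and the placeholder names and publication are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NameRecord:
    """A hypothetical scientific-name record inside a project."""
    name: str
    publication: Optional[str] = None


@dataclass
class Project:
    """Hypothetical project: CRUD on name records, with every change logged."""
    records: Dict[int, NameRecord] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)
    _next_id: int = 1

    def create(self, user: str, name: str) -> int:
        rid = self._next_id
        self._next_id += 1
        self.records[rid] = NameRecord(name)
        self.log.append(f"{user} created record {rid}: {name}")
        return rid

    def read(self, rid: int) -> NameRecord:
        return self.records[rid]

    def update_publication(self, user: str, rids: List[int],
                           publication: str) -> None:
        """Bulk update: associate many names with a single publication."""
        missing = [rid for rid in rids if rid not in self.records]
        if missing:
            # Check every id before changing anything, so the edit is all-or-nothing.
            raise KeyError(f"unknown record ids: {missing}")
        for rid in rids:
            self.records[rid].publication = publication
        self.log.append(f"{user} linked records {rids} to {publication}")

    def delete(self, user: str, rid: int) -> None:
        removed = self.records.pop(rid)
        self.log.append(f"{user} deleted record {rid}: {removed.name}")


project = Project()
# "Aus bus" and "Aus cus" are conventional placeholder names, not real taxa.
ids = [project.create("steward", n) for n in ("Aus bus", "Aus cus")]
project.update_publication("expert", ids, "Smith 2001")  # placeholder citation
for entry in project.log:
    print(entry)
```

The log doubles as the feedback channel described above: each contributor can see what they and their collaborators have changed.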

6 Building an Information Infrastructure for Scientific Names
1 – Integrate and cross-link nomenclatural content with CoL and other databases
2 – Broaden names coverage for taxonomic groups being updated
3 – Anchor name records in ITIS to Protonym identifiers
4 – Incorporate stable names and TNU identifiers to differentiate concepts
5 – Expand name content to include original usages/combinations, subsequent combinations (homotypic synonyms), and orthographic variants
6 – Leverage robust nomenclatural elements from ZooBank and lexical reconciliation from the Global Names Index (GNI)
7 – Provide links to literature
Notes: 1 – cross-link; 2 – broaden names coverage. There are ~2M described species, but there are ~24M distinct name strings in GNI. Broaden by adding synonyms, alternative nomenclatural combinations, and alternative spellings.
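Anchoring name records to a protonym identifier can be pictured as grouping many name strings under one key. The Python below is a minimal sketch under stated assumptions: the `NameUsage` structure, the `protonym:N` identifier format, and the sample rows (including the misspelling) are illustrative and are not drawn from ITIS, ZooBank, or GNI.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class NameUsage:
    """A hypothetical name record anchored to a protonym identifier."""
    name_string: str
    kind: str          # e.g. "original combination", "orthographic variant"
    protonym_id: str   # made-up identifier format, for illustration only


# Illustrative rows only.
USAGES: List[NameUsage] = [
    NameUsage("Felis concolor", "original combination", "protonym:1"),
    NameUsage("Puma concolor", "subsequent combination", "protonym:1"),
    NameUsage("Puma concolour", "orthographic variant", "protonym:1"),  # invented misspelling
    NameUsage("Felis rufa", "original combination", "protonym:2"),
    NameUsage("Lynx rufus", "subsequent combination", "protonym:2"),
]

# Index in both directions: name string -> protonym, protonym -> name strings.
protonym_of: Dict[str, str] = {u.name_string: u.protonym_id for u in USAGES}
by_protonym: Dict[str, Set[str]] = defaultdict(set)
for usage in USAGES:
    by_protonym[usage.protonym_id].add(usage.name_string)


def homotypic_names(name_string: str) -> Set[str]:
    """Return every name string that shares a protonym with the given one."""
    return set(by_protonym[protonym_of[name_string]])


print(len(USAGES), "name strings anchor to", len(by_protonym), "protonyms")
print(sorted(homotypic_names("Puma concolour")))
```

This is the same many-to-few relationship as the gap between distinct name strings and described species: broadening coverage adds strings, while the protonym keeps them tied to one nomenclatural anchor.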

7 Catalogue of Life Plus (CoL+)
NSF grant proposal submitted by Pyle, Evenhuis, Mozzherin, Orrell, and Whitton, 'Developing a Common Global Infrastructure for Scientific Names', with the goals:
- Separate nomenclature and taxonomy using different identifiers and authorities for names and taxa
- Ensure a sustainable, robust, and dynamic IT infrastructure for maintaining CoL+
- Establish a clearinghouse for nomenclature and taxonomy to reconcile sources
- Establish partnerships, governance, and roadmap for the infrastructure

8 Big Data needs a little CRUD because
Transaction processing at the level of the scientific names is required for a robust structuring, organizing, and labeling of big biodiversity data.
Trustworthy Sense Making
We are designing the way people will make sense of information and share it. In a very real sense, we are not designing and building protocols, standards, user experiences, APIs, identifiers, and services; we are intervening in an information ecosystem that needs to provide feedback to searches through feedback loops that allow for trustworthy sense making.

