Big Data Needs Little CRUD:

Big Data Needs Little CRUD: data co-creation and integration for a comprehensive taxonomic framework
13 February 2018
David F. Mitchell and Thomas Orrell
Integrated Taxonomic Information System, Washington, DC

Biodiversity Big Data is Organized by Taxonomy

Taxonomy extends to all formally described species and offers an axis as wide as life itself to organize, integrate, and present biological information.

ITIS and Species 2000 work together as full partners to generate the Catalogue of Life, an expert (taxonomist) knowledge base composed of global species datasets that cover most described species, organized hierarchically. Taxonomy supplies the attributes for structuring, organizing, and labeling that information.

Finding the Relevant Biodiversity Objects

To improve recall:
1 – Add synonymy
2 – Handle homonyms
3 – Find and fix the gaps
4 – Be current

What happens when the scientific name linked to an object does not match the name included in the classification? It negatively impacts precision and recall. And what happens when the classification is not acceptable to the searcher?

Poor precision:
- The classification has children considered irrelevant to the searcher
- Homonyms
- Incongruent concepts between the searcher and the classification organizing the objects

Poor recall:
- Taxa are excluded from the relevant objects because the searcher's classification doesn't match the one that organizes the objects
- Synonymy of the target taxa is lacking

Without synonymy, objects that should be in the relevant circle are not.
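The effect of missing synonymy on recall can be made concrete with a small Python sketch. It is illustrative only: the object identifiers, the tiny synonym table, and the function names are assumptions made for this example and are not ITIS or Catalogue of Life data structures. It searches a set of labeled objects by name, once with and once without synonym expansion, and computes precision and recall against the set a searcher would consider relevant.

```python
# Illustrative sketch: all identifiers and tables below are hypothetical.

def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Each object carries the name string its provider happened to use.
OBJECTS = {
    "obj1": "Puma concolor",
    "obj2": "Felis concolor",  # same species, labeled with a synonym
    "obj3": "Puma concolor",
    "obj4": "Lynx rufus",
}

# Accepted name -> known synonyms (a stand-in for a taxonomic backbone).
SYNONYMS = {"Puma concolor": {"Felis concolor"}}

# What the searcher actually wants: every object about the species.
RELEVANT = {"obj1", "obj2", "obj3"}

def search(name, use_synonymy):
    """Return ids of objects whose label matches the name (or its synonyms)."""
    names = {name}
    if use_synonymy:
        names |= SYNONYMS.get(name, set())
    return {oid for oid, label in OBJECTS.items() if label in names}

without = precision_recall(search("Puma concolor", False), RELEVANT)
with_syn = precision_recall(search("Puma concolor", True), RELEVANT)
print("without synonymy:", without)
print("with synonymy:   ", with_syn)
```

In this toy data, precision is unaffected, but the object labeled with the synonym falls outside the retrieved set unless synonymy is applied, which is the "relevant circle" problem described above.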

Taxonomic Workbench Co-creation Model

We are here to share why and how we are building a new data editing platform for taxonomic data: TWB 6.0, for ITIS taxonomic data development. We believe a collaborative platform is how to scale to meet the total addressable challenge: a maintainable and regularly updated global taxonomy. In the new version, data stewards and taxonomic experts become active participants and co-create ITIS content by
- making direct collaboration on projects possible
- streamlining our data quality (DQ) process

Data Quality process adapted from Björgvinsson, Tryggvi. In Press. The Art of Data Usability, Manning Publications, https://www.manning.com/books/the-art-of-data-usability

Create, Read, Update, Delete – What does a co-creation platform need?

Features:
1 – Data Integrity & Ease of Use
2 – Efficient Edits
3 – Import, Compare, & Merge
4 – Annotations
5 – Manage Work
6 – Activity Log

Design Principles:
1 – Certainty Effect
2 – Decisions Under Risk
3 – User's Decision Point
4 – Disposition Effect, Sticky Behaviors
5 – Familiarity Heuristic, Reciprocity
6 – Availability Heuristic

What will users use? The principles of design: design for the Certainty Effect, Decisions Under Risk, a User's Decision Point, the Disposition Effect, the Familiarity Heuristic, the Availability Heuristic, Sticky Behaviors, and Reciprocity.

Key features:
- Data integrity combined with ease of use: keep taxonomic assertions consistent and ensure nomenclatural rules are followed.
- Edit efficiently: bulk updating features, such as associating multiple names with a single publication and bulk addition of names based on a template, make our CRUD paradigm within a project faster.
- Human engagement: features like inviting and signing up users, and logging of activities, solve the problem of bringing users on board and giving them feedback on what they have accomplished in different stages of the DQ lifecycle.
- Data import and ITIS compare: leverage existing work.
- Social ITIS: annotations against elements in a project facilitate communication about data quality issues. This is how we meet the challenge of communication between ITIS and data stewards, and among the multiple experts who will be collaborating on a single project.
- Manage work: integrated work management features and project metrics solve the problem of how to prioritize current work and provide feedback on what has been accomplished to management stakeholders.
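To illustrate how CRUD operations, a bulk edit, and an activity log might fit together inside a project, here is a minimal Python sketch. It is not the TWB 6.0 design: the `Project` and `NameRecord` classes, their fields, and the method names are hypothetical, and the placeholder names "Aus bus" and "Aus cus" are not real taxa. The bulk operation validates every target before changing anything, one simple way to keep a multi-record edit from leaving the data half-updated.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

@dataclass
class NameRecord:
    """Hypothetical name record: a name string plus an optional publication."""
    name: str
    publication: Optional[str] = None

@dataclass
class Project:
    """Hypothetical editing project with CRUD operations and an activity log."""
    names: Dict[int, NameRecord] = field(default_factory=dict)
    log: List[Tuple[str, str, str, int]] = field(default_factory=list)
    next_id: int = 1

    def _record(self, user: str, action: str, name_id: int) -> None:
        # Every change is logged with a UTC timestamp, user, action, and target.
        stamp = datetime.now(timezone.utc).isoformat()
        self.log.append((stamp, user, action, name_id))

    def create(self, user: str, name: str) -> int:
        name_id = self.next_id
        self.next_id += 1
        self.names[name_id] = NameRecord(name)
        self._record(user, "create", name_id)
        return name_id

    def read(self, name_id: int) -> NameRecord:
        return self.names[name_id]  # raises KeyError for unknown ids

    def update(self, user: str, name_id: int, publication: str) -> None:
        self.names[name_id].publication = publication
        self._record(user, "update", name_id)

    def delete(self, user: str, name_id: int) -> None:
        del self.names[name_id]
        self._record(user, "delete", name_id)

    def bulk_set_publication(self, user: str, name_ids, publication: str) -> None:
        """Associate many names with one publication, all or nothing."""
        ids = list(name_ids)
        missing = [i for i in ids if i not in self.names]
        if missing:
            # Validate first so a bad id cannot leave a partial update behind.
            raise KeyError(f"unknown name ids: {missing}")
        for i in ids:
            self.update(user, i, publication)

project = Project()
first = project.create("steward", "Aus bus")
second = project.create("steward", "Aus cus")
project.bulk_set_publication("expert", [first, second], "Smith 1900")
for entry in project.log:
    print(entry[1:])  # user, action, record id
```

The log doubles as the feedback channel mentioned above: it can be filtered by user to show a contributor what they have accomplished, or aggregated into project metrics.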

Building an Information Infrastructure for Scientific Names

1 – Integrate and cross-link nomenclatural content with CoL & other databases
2 – Broaden names coverage for taxonomic groups being updated
3 – Anchor name records in ITIS to Protonym identifiers
4 – Incorporate stable names and TNU identifiers to differentiate concepts
5 – Expand name content to include original usages/combinations, subsequent combinations (homotypic synonyms), and orthographic variants
6 – Leverage robust nomenclatural elements from ZooBank and lexical reconciliation from the Global Names Index (GNI)
7 – Provide links to literature

Notes:
1 – Cross-link.
2 – Broaden names coverage: there are ~2M described species, but ~24M distinct name strings in GNI. Broaden by adding, for example, synonyms, alternative nomenclatural combinations, and alternative spellings.
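One way to picture anchoring name records to protonym identifiers is the Python sketch below. It rests on assumptions: the `NameUsage` record, the `protonym:N` identifier format, and the placeholder names ("Aus bus" and so on) are invented for illustration and do not reflect the ITIS, ZooBank, or GNI schemas. Each name string points at the protonym it derives from, so an original combination, a subsequent combination, and an orthographic variant can all be resolved to one another.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class NameUsage:
    """Hypothetical record linking a name string to its protonym."""
    name_string: str
    protonym_id: str
    kind: str

# Placeholder data: none of these are real taxa or real identifiers.
USAGES = [
    NameUsage("Aus bus", "protonym:1", "original combination"),
    NameUsage("Cus bus", "protonym:1", "subsequent combination"),
    NameUsage("Aus buss", "protonym:1", "orthographic variant"),
    NameUsage("Aus dus", "protonym:2", "original combination"),
]

def build_index(usages):
    """Index name strings by protonym, and protonyms by name string."""
    by_protonym = defaultdict(set)
    by_string = {}
    for usage in usages:
        by_protonym[usage.protonym_id].add(usage.name_string)
        # Simplification: homonyms (one string, several protonyms) are not
        # handled here; a real system would also need authorship and year.
        by_string[usage.name_string] = usage.protonym_id
    return by_protonym, by_string

def linked_names(name_string, usages=USAGES):
    """Return every name string that shares a protonym with the given one."""
    by_protonym, by_string = build_index(usages)
    protonym_id = by_string.get(name_string)
    if protonym_id is None:
        return set()
    return set(by_protonym[protonym_id])

print(sorted(linked_names("Cus bus")))
```

Because the link goes through the protonym rather than through string similarity, adding more name strings broadens coverage without blurring which names belong together.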

Catalogue of Life Plus (CoL+)

NSF grant proposal submitted by Pyle, Evenhuis, Mozzherin, Orrell, and Whitton, 'Developing a Common Global Infrastructure for Scientific Names', with the goals:
- Separate nomenclature and taxonomy using different identifiers and authorities for names and taxa
- Ensure a sustainable, robust, and dynamic IT infrastructure for maintaining CoL+
- Establish a clearinghouse for nomenclature and taxonomy to reconcile sources
- Establish partnerships, governance, and roadmap for the infrastructure

Big Data needs a little CRUD because

Transaction processing at the level of scientific names is required for robust structuring, organizing, and labeling of big biodiversity data.

Trustworthy Sense Making

We are designing the way people will make sense of information and share it. In a very real sense, we are not designing and building protocols, standards, user experiences, APIs, identifiers, and services; we are intervening in an information ecosystem that needs to provide feedback to searches in feedback loops that allow for trustworthy sense making.