DCO-DS: Moving Forward
DCO Synthesis Meeting, Oct 2015
DCO-DS = DCO Data Science
Vision…
“Our vision is to develop, facilitate, and maintain sustained multi-way engagement of carbon scientists in multi-scale local to global networks” [for the transformation of our understanding of carbon in Earth].
● Organization is required so participants can carry out their mission(s)
● Those participants (by definition) may never be in a single organization -> virtual organization
Virtual Organizations as Socio-Technical Systems
● ‘…a geographically distributed organization whose members are bound by a long-term common interest or goal, and who communicate and coordinate their work through information technology’ (Ahuja)
● ‘These members assume well defined roles and status relationships within the context of the virtual group that may be independent of their role and status in the organization employing them’ (Ahuja et al., 1998)
Technology, Communication Patterns, Organizational Structure
Virtual Organization Features:
● Outcomes/values
● Dynamic versus static
● Evolvable/ecosystem-like
● Heterogeneity tolerance
● Attributes of the organization
● Roles/responsibilities
● Scale or scalability
Strategy…
Mapping…
● goal -> use case
● participation -> team(s), vetting, acceptance
● outcomes/value -> goals, metrics, evaluation, incentives, data/information/knowledge projects, responses, decisions
● dynamic -> agile working format, small iterations
● evolution -> rapid development, evaluation and iteration (open)
Methodology…
DCO-DS Evaluation Form as key input to DCO-DS
● Focused on the evaluation of the Deep Carbon virtual Observatory (DCvO)
● Evaluation questions will help determine DCvO's role in
○ Increasing members, activity and awareness of DCO activities
○ Enabling search, access, exchange and use of data & information for DCO scientific and educational needs
○ Needs to further integrate with DCO Members' essential technologies
● Phased roll-out to begin early Oct
○ Wave 1: Executive Committee, Secretariat, Community leads, selected others
○ Wave 2: DCO SSCs, Engagement
○ Waves 3, 4, 5, 6: DCO Communities
Value Philosophy
● Value focuses on organizational outputs (or outcomes) rather than inputs
○ For example: deployed knowledge and skills vs research budgets
● Value relates to benefit of outcomes, rather than outcomes themselves
○ Products and services enabled by knowledge and skills
● Value implies relevant, useful, and usable outcomes
○ Beneficiaries have to understand and appreciate
Credit: B. Rouse (BEVO) 2008
Leveraging existing data resources
● Interface between DCO Data Portal and other data repositories – key part of post-2019 efforts (e.g. Spring 2015 effort with CoDL/MBL)
● Incorporate specific metadata requirements into the DCO Knowledge Store
● Extend DCO Ontology for incorporation of other repository data, and/or utilize existing schema
● Provide data in a variety of formats for use (non-specialists)
● Populate the metadata and data repository for DCO projects that do not already have their own portal
● Work on and develop new boundary activities
DCO-DS Boundary Activities
Moving Forward
● A technology refresh for major platform components for the DCO network, and a “network” succession plan
● Prioritized efforts based on evaluations (Nov-Dec)
● Inputs from DCO synthesis discussions and post-2019 committees/task groups
● Significant efforts on data registration and data legacies
● And continue to work on existing and develop new boundary activities
Questions? Comments?
Patrick West, Peter Fox
The Team:
● Lead: Peter Fox
● Staff: Patrick West, Stephan Zednik and John Erickson
● Post Doc: Marshall Ma
● Graduate Students: Han Wang, Hao Zhong, Ahmed Eleish
DCO Knowledge Graph Analytics
1. Identified key areas of DCO for analysis and visualization, initially:
○ Publications and publication keywords
○ User registrations
○ DCO Member areas of expertise
2. Instance creation statistics: who is creating what, and associated with what communities.
3. What would you like to see?
DCO Knowledge Graph Analytics
[Figure: Publication Subject Area Word Cloud]
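A word cloud of publication subject areas is driven by keyword frequencies. The following is a minimal sketch of that counting step, not the DCO team's implementation: the record layout (`title`, `keywords`), the sample values, and the function name `keyword_frequencies` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample records; real publication metadata would come from
# the knowledge graph and will have a different shape.
publications = [
    {"title": "Paper A", "keywords": ["Carbon Cycle", "diamond", "Mantle"]},
    {"title": "Paper B", "keywords": ["mantle", "carbonate melts"]},
    {"title": "Paper C", "keywords": ["carbon cycle", "mantle ", "serpentinization"]},
]


def keyword_frequencies(records):
    """Count how many publications use each keyword.

    Keywords are lower-cased and stripped so that "Mantle" and "mantle "
    are treated as the same term.
    """
    counts = Counter()
    for record in records:
        # A set ensures a keyword repeated within one publication counts once.
        normalized = {
            kw.strip().lower()
            for kw in record.get("keywords", [])
            if kw.strip()
        }
        counts.update(normalized)
    return counts


frequencies = keyword_frequencies(publications)

# The two most frequent keywords would be drawn largest in a word cloud.
for keyword, n in frequencies.most_common(2):
    print(f"{keyword}: {n}")
```

The resulting keyword-to-count mapping is the usual input to a word cloud renderer, which scales each term by its count.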
Current Work: Thermodynamic Data Rescue
● A large number of geoscience publications contain datasets that are not available outside the publication text
● Extracting, organizing, and reusing these datasets is valuable
● Data Science Team and Extreme Physics and Chemistry community member Mark Ghiorso identified thermodynamic datasets about the enthalpy and entropy of chemicals
Current Work: Geo Sample curation and IGSN
● Have GeoSample as a class in the DCO ontology and collect the core metadata items for sample registration in the DCO data portal
● Interface between the DCO IGSN Allocation Agent and the IGSN registry agent, with two potential functionalities:
○ Assign IGSN to a sample record through the DCO data portal in collaboration with UT-funded activity
○ Use IGSN to import sample records from existing repositories to the DCO data portal, if there is a mature IGSN metadata API
Future Work: Instrument Reporting and Browsing*
● Progress to date:
○ Reporting on DCO-funded Instrument use by Projects and Field Studies
○ Referencing DCO Instrument use within Grant Summary Reports
■ within Instrument grants and related project/field study grants
● Future work: The Instrument Browser
○ Dynamically generated instrument list and instrument summary page
○ A faceted search interface for instruments
○ Instrument discovery based on nature of use, data collected, projects and point of contact
* Outcome from the DCO Data Science day at RPI in 2014!!!
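To illustrate what a faceted search over instruments involves, here is a small sketch under stated assumptions. The records, the facet names (`use`, `project`, `contact`), and the helper functions are hypothetical and do not reflect the Instrument Browser's actual data model or interface.

```python
from collections import Counter

# Hypothetical instrument records; names and facet fields are illustrative only.
instruments = [
    {"name": "Instrument A", "use": "spectroscopy", "project": "Project X", "contact": "Contact 1"},
    {"name": "Instrument B", "use": "spectroscopy", "project": "Project Y", "contact": "Contact 2"},
    {"name": "Instrument C", "use": "high-pressure synthesis", "project": "Project X", "contact": "Contact 1"},
]


def filter_by_facets(records, **selected):
    """Return the records that match every selected facet value."""
    return [
        record
        for record in records
        if all(record.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(records, facet):
    """Count records per value of one facet, as shown beside each filter option."""
    return Counter(record[facet] for record in records if facet in record)


# Narrow by one facet, then show how many results each remaining option would give.
spectroscopy = filter_by_facets(instruments, use="spectroscopy")
remaining_projects = facet_counts(spectroscopy, "project")
print([record["name"] for record in spectroscopy])
print(dict(remaining_projects))
```

Recomputing the counts on the filtered list is what lets a faceted interface show only the options that still lead to results.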
Future Work: Deep Carbon Science Trend Analysis
● Natural Language Processing (NLP) based analysis of Deep Carbon publication corpus
○ Extracts entities and relations from the corpus
○ Constructs a Deep Carbon Knowledge Base consisting of unified entities and relations
○ Provides structured knowledge for downstream applications and analysis
● Includes retrieval of authoritative metadata into DCO Knowledge Graph
● Includes Deep Carbon Science Visualization Dashboard
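The idea of extracting entities and relations and unifying them can be shown with a deliberately simple toy. This sketch swaps in dictionary lookup and sentence-level co-occurrence in place of a real NLP pipeline, so it is not the method planned by the team; the vocabulary, the sample corpus, and all function names are assumptions.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical vocabulary mapping surface forms to one unified entity label.
VOCABULARY = {
    "co2": "carbon dioxide",
    "carbon dioxide": "carbon dioxide",
    "diamond": "diamond",
    "diamonds": "diamond",
    "mantle": "mantle",
    "subduction": "subduction",
}


def extract_entities(sentence):
    """Return the unified entities whose surface forms occur in the sentence."""
    text = sentence.lower()
    found = set()
    for surface, entity in VOCABULARY.items():
        # Word boundaries avoid matching a term inside a longer word.
        if re.search(r"\b" + re.escape(surface) + r"\b", text):
            found.add(entity)
    return found


def cooccurrence_relations(corpus):
    """Count entity pairs that appear together in the same sentence."""
    counts = Counter()
    for document in corpus:
        # Naive sentence split on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", document):
            entities = sorted(extract_entities(sentence))
            counts.update(combinations(entities, 2))
    return counts


# Hypothetical two-document corpus.
corpus = [
    "Diamonds form in the mantle. Subduction carries CO2 into the mantle.",
    "Carbon dioxide is released where subduction occurs.",
]
relations = cooccurrence_relations(corpus)
for (left, right), n in relations.most_common():
    print(f"{left} <-> {right}: {n}")
```

Mapping "CO2" and "Carbon dioxide" to one label is a small instance of entity unification; the pair counts are a crude stand-in for typed relations in a knowledge base.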