Brief: Data Science Progress/ Activities and Renewal Plans DCO Executive Committee. Oct. 8-9, 2015. Rome (IT) DCO-DS = DCO Data Science.

Presentation transcript:


Since March (Intl. Science Mtg.)

DCO Statistics (now):
●Over 5,500 people across 698 organizations
●Over 2,100 publications (548 via DCO^)
●Over 216 projects, field studies (76), equipment (43), ...
●Over 1,995 research topics
●Over 155 datasets (-24 Igor, +)
●DCO Data Types …
●Even more “objects”, reports, ...

DCO Statistics:
●Over 4,700 people across 567 organizations
●Over 1,400 publications*
●Over 210 projects including field studies
●Over 1,600 research topics
●Over 160 datasets
●Over 590 research locations
●Over objects

Aug. 2014

DCO Context: Virtual Observatory and Virtual Organization
Linking the resources!
Deep Carbon Observatory Online
Peter Fox and Janet Kozyra, 2015, eScience and Informatics for international science programs, Progress in Earth and Planetary Science, 2:12, pp. 9. doi: /s

Strategy…

info.deepcarbon.net dx.deepcarbon.net data.deepcarbon.net

DCO Knowledge Graph: Refactoring and Resolving
●The DCO Ontology is an important contribution to the scientific ontology ecosystem; it organizes scientific knowledge and unlocks the “data” about DCO
●Meets specific ontology best practices
●Recognized opportunities for ontology reuse, esp...
○representing datasets using the W3C Data Catalog Vocabulary (DCAT)
○incorporating provenance into DCO using the W3C Provenance Ontology (PROV-O)
●Clarified labels and descriptions of all concepts and relationships
●Added annotations
●Made DCO Ontology browsable and resolvable via content negotiation
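To make the reuse of DCAT and PROV-O concrete, here is a minimal sketch of how a dataset record could combine the two vocabularies as JSON-LD. The vocabulary terms are real W3C terms, but the `describe_dataset` helper, the `example.org` IRIs and the choice of properties are assumptions for illustration; the slides do not show how the DCO Ontology actually models datasets.

```python
import json

# Namespace IRIs for the vocabularies named on the slide.
DCAT = "http://www.w3.org/ns/dcat#"
PROV = "http://www.w3.org/ns/prov#"
DCTERMS = "http://purl.org/dc/terms/"


def describe_dataset(dataset_iri, title, source_iri, download_url):
    """Return a JSON-LD description of a dataset derived from a source.

    Hypothetical helper: the property selection is illustrative only.
    """
    return {
        "@context": {"dcat": DCAT, "prov": PROV, "dcterms": DCTERMS},
        "@id": dataset_iri,
        "@type": ["dcat:Dataset", "prov:Entity"],
        "dcterms:title": title,
        # PROV-O: the dataset was derived from an earlier resource.
        "prov:wasDerivedFrom": {"@id": source_iri},
        # DCAT: a distribution is one concrete, downloadable form.
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {"@id": download_url},
        },
    }


# All IRIs below are placeholders, not DCO identifiers.
record = describe_dataset(
    "https://example.org/dataset/1",
    "Example thermodynamic dataset",
    "https://example.org/publication/1",
    "https://example.org/files/dataset-1.csv",
)
print(json.dumps(record, indent=2))
```

Typing the same node as both `dcat:Dataset` and `prov:Entity` lets catalog tools and provenance tools read one record without a mapping step.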

New Faceted Search Interface
●Implementing for: People, Publications, Projects, Field Studies, Datasets, DataTypes
●Replaces slower, prototype faceted browser
●Using open source platform “Elasticsearch”
●Provides faster text-based searching (based on inverted indices)
●Faster and easier to develop and maintain
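The reason an inverted index speeds up text search can be shown with a small pure-Python sketch. It is not the Elasticsearch API and not the DCO implementation; the records, field names and the `build_inverted_index` and `search` functions are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical records; the field names are illustrative only.
RECORDS = [
    {"id": 1, "type": "Publication", "text": "carbon flux in volcanic systems"},
    {"id": 2, "type": "Dataset", "text": "volcanic gas carbon isotopes"},
    {"id": 3, "type": "Project", "text": "deep life in subsurface sediments"},
]


def build_inverted_index(records):
    """Map each lowercase token to the set of record ids containing it."""
    index = defaultdict(set)
    for record in records:
        for token in record["text"].lower().split():
            index[token].add(record["id"])
    return index


def search(index, records, query, facet_type=None):
    """Return ids matching every query token, plus per-type facet counts."""
    tokens = query.lower().split()
    if not tokens:
        return [], Counter()
    # Intersect posting sets instead of scanning every record's text.
    matches = set.intersection(*(index.get(t, set()) for t in tokens))
    hits = [r for r in records if r["id"] in matches]
    # Facet counts are computed before the facet filter is applied,
    # so the interface can still show the other available types.
    facets = Counter(r["type"] for r in hits)
    if facet_type is not None:
        hits = [r for r in hits if r["type"] == facet_type]
    return [r["id"] for r in hits], facets


INDEX = build_inverted_index(RECORDS)
ids, facets = search(INDEX, RECORDS, "volcanic carbon")
print(ids, dict(facets))
```

Passing `facet_type="Dataset"` narrows the hits to one type while the facet counts still describe the full text match, which is the usual behavior of a faceted browser.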

on.net/vivo/publications

DCO Knowledge Graph Analytics
1. Identified key areas of DCO for analysis and visualization, initially:
○Publications and publication keywords
○User registrations
○DCO Member areas of expertise
2. Implemented simple visualizations using open source visualization libraries
3. Generated dynamically via direct queries to DCO Knowledge Graph
4. What would you like to see?

DCO Knowledge Graph Analytics Publication Subject Area Word Cloud
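A word cloud of this kind needs only a weight per keyword. The sketch below assumes the knowledge graph query has already returned publication rows; the `PUBLICATIONS` sample data and the `keyword_weights` function are hypothetical, not the DCO-DS code.

```python
from collections import Counter

# Hypothetical rows standing in for the result of a knowledge graph query.
PUBLICATIONS = [
    {"title": "Paper A", "keywords": ["Carbon cycle", "diamond"]},
    {"title": "Paper B", "keywords": ["carbon cycle", "Serpentinization"]},
    {"title": "Paper C", "keywords": ["diamond", "carbon cycle"]},
]


def keyword_weights(publications, top_n=10):
    """Count normalized keywords and scale the counts to the range (0, 1]."""
    counts = Counter(
        keyword.strip().lower()
        for publication in publications
        for keyword in publication["keywords"]
    )
    if not counts:
        return []
    largest = max(counts.values())
    return [(word, count / largest) for word, count in counts.most_common(top_n)]


weights = keyword_weights(PUBLICATIONS)
for word, weight in weights:
    print(f"{word}: {weight:.2f}")
```

The relative weights can be passed to any visualization library as font sizes; lowercasing first keeps "Carbon cycle" and "carbon cycle" from being drawn as two separate words.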

DCO-DS Boundary Activities

EPC: Thermodynamic Data Rescue
●A large number of geoscience publications contain datasets that are not available outside the publication text
●Extracting, organizing, and reusing these datasets is valuable
●The Data Science Team and Extreme Physics and Chemistry community member Mark Ghiorso identified thermodynamic datasets about the enthalpy and entropy of chemicals

Thermodynamic Data Rescue
●Method for extracting ‘dark data’ in publications
○Locate and download journal article (PDF document)
○Generate metadata about material, experiment and results
○Tabulate results from document and run OCR over it to generate data
○Generate candidate dataset using OCR software
○Deposit data into data repository; link to original document
(Diagram labels: DCO Knowledge Store; DCO Data Repository; Data Review and Evaluation)

Implementation of the data rescue workflow
●The data rescue work for each paper has a card
●Move the card to the next step when the task of the previous step is done
●Members can communicate within a card and paste links to relevant resources
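The card-per-paper workflow can be modeled as a small state machine. This is a sketch under stated assumptions: the slides do not name the card tool that was used, the `Card` class is hypothetical, and the step names paraphrase the method steps on the preceding slide.

```python
from dataclasses import dataclass, field

# Step names paraphrase the method slide; the ordering is taken from it.
STEPS = (
    "locate article",
    "generate metadata",
    "tabulate results and run OCR",
    "generate candidate dataset",
    "deposit data and link to document",
)


@dataclass
class Card:
    """One card per paper; hypothetical model of the workflow board."""

    paper: str
    step_index: int = 0
    comments: list = field(default_factory=list)

    @property
    def step(self):
        """Name of the step the card currently sits in."""
        return STEPS[self.step_index]

    def comment(self, author, text):
        """Record a note or a pasted link on the card."""
        self.comments.append((author, text))

    def advance(self):
        """Move the card to the next step once the current task is done."""
        if self.step_index >= len(STEPS) - 1:
            raise ValueError(f"{self.paper!r} is already at the final step")
        self.step_index += 1
        return self.step


card = Card("Example paper")
card.comment("member", "PDF: https://example.org/paper.pdf")  # placeholder link
card.advance()
print(card.paper, "->", card.step)
```

Keeping the comments on the card, rather than in a separate channel, mirrors the slide's point that discussion and links stay attached to the paper they concern.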

Thermodynamic Data Rescue: Output
●New datasets available via dataset browser
●Includes citations to the originating publication
●Data files accessible through dataset repository
●Replicable to other Communities, e.g. R&F

DCO-DS Evaluation Form as key input to DCO-DS renewal
●Focused on the evaluation of the Deep Carbon virtual Observatory (DCvO)
●Evaluation questions will help determine DCvO's role in
○Increasing members, activity and awareness of DCO activities
○Enabling search, access, exchange and use of data & information for DCO scientific and educational needs
○Identifying needs for further integration with DCO Members' essential technologies
●Phased roll-out to begin early Oct
○Wave 1: Executive Committee, Secretariat, Community leads, selected others
○Wave 2: DCO SSCs, Engagement
○Waves 3, 4, 5, 6: DCO Communities

Current work (examples)
●DCO Data Legacy preparation (see later on the agenda)
●DCO data registration from all DCO projects – by DCO data curator hosted at LDEO, funded by secretariat
●DCO project reporting
●Deep Time Data Infrastructure (Keck)

Current Work: Geo Sample curation and IGSN
●Have GeoSample as a class in the DCO ontology and collect the core metadata items for sample registration in the DCO data portal
●Interface between the DCO IGSN Allocation Agent and the IGSN registry agent, with two potential functionalities:
○Assign an IGSN to a sample record through the DCO data portal, in collaboration with UT-funded activity
○Use IGSN to import sample records from existing repositories to the DCO data portal, if there is a mature IGSN metadata API

Future Work: Instrument Reporting and Browsing*
●Progress to date:
○Reporting on DCO-funded Instrument use by Projects and Field Studies
○Referencing DCO Instrument use within Grant Summary Reports
■within Instrument grants and related project/field study grants
●Future work: The Instrument Browser
○Dynamically generated instrument list and instrument summary page
○A faceted search interface for instruments
○Instrument discovery based on nature of use, data collected, projects and point of contact
* Outcome from the DCO Data Science day at RPI in 2014!!!

Future Work: Deep Carbon Science Trend Analysis
●Natural Language Processing (NLP) based analysis of Deep Carbon publication corpus
○Extracts entities and relations from the corpus
○Constructs a Deep Carbon Knowledge Base consisting of unified entities and relations
○Provides structured knowledge for downstream applications and analysis
●Includes retrieval of authoritative metadata into DCO Knowledge Graph
●Includes Deep Carbon Science Visualization Dashboard
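As a rough illustration of what a trend analysis produces, the sketch below counts how many documents mention each term per year. It uses plain dictionary matching as a stand-in for real NLP entity extraction, which the slides do not specify; the `CORPUS`, the `VOCABULARY` and the `mentions_by_year` function are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical mini-corpus and vocabulary, for illustration only.
CORPUS = [
    (2012, "Diamond inclusions record deep carbon."),
    (2013, "Serpentinization produces hydrogen and methane."),
    (2014, "Methane and diamond formation in the mantle; methane fluxes."),
]
VOCABULARY = {"diamond", "methane", "serpentinization"}


def mentions_by_year(corpus, vocabulary):
    """Count documents mentioning each vocabulary term, grouped by year."""
    trend = defaultdict(Counter)
    for year, text in corpus:
        # A set of tokens counts each term at most once per document.
        tokens = {token.strip(".,;:()").lower() for token in text.split()}
        for term in tokens & vocabulary:
            trend[term][year] += 1
    return trend


trend = mentions_by_year(CORPUS, VOCABULARY)
for term in sorted(trend):
    print(term, dict(sorted(trend[term].items())))
```

A production pipeline would replace the token matching with trained entity and relation extraction, but the per-term, per-year table is the same shape a visualization dashboard would consume.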

Future Work: Leveraging existing data resources
●Interface between DCO Data Portal and other data repositories – key part of post-2019 efforts (e.g. Spring 2015 effort with CoDL/MBL)
●Incorporate specific metadata requirements into the DCO Knowledge Store
●Extend DCO Ontology for incorporation of other repository data, and/or utilize existing schema
●Provide data in a variety of formats for use (non-specialists)
●Populate the metadata and data repository for DCO projects that do not already have their own portal
●Disseminate template data management plan for new projects

Future Work: Continue Infrastructure Evolution
●Better integration between Community Portal and Data Portal
●Easier data entry for key concepts (project updates, datasets, publications, etc.)
●DevOps enhancements for easier and faster deployment of infrastructure updates
●Improve new user onboarding
●Improve usability of Data Portal
●Create annotations for representation of evolving deep carbon concepts and relationships

Expected work in renewal
●To lead into 2019 – a technology refresh for major platform components for the DCO network, and a “network” succession plan
●Prioritized efforts based on evaluations (Oct-Dec)
●Inputs from DCO synthesis discussions and post-2019 committees/task groups
●Significant efforts on data registration and data legacies
●Complete key boundary activities
●Two years or 3.5?
●Draft in December. Your inputs are essential!