Complexity must become Linear or Decrease
Smart data infrastructure: The sixth generation of mediation for data science
Peter Fox (1)

(1) Rensselaer Polytechnic Institute, th St., Troy, NY, United States (see Acknowledgements)

Glossary:
- RPI: Rensselaer Polytechnic Institute
- TWC: Tetherless World Constellation at Rensselaer Polytechnic Institute
- S2S: S2S (!)
- SeSF: Semantic eScience Framework
- BCO-DMO: Biological and Chemical Oceanography Data Management Office

Acknowledgments:
SeSF Project Team: Eric Rozell, Han Wang, Jin Zheng, Patrick West, Stephan Zednik, Jim Hendler, Deborah McGuinness
BCO-DMO Staff: Cyndy Chandler, Adam Shephard, Bob Groman
Sponsors: National Science Foundation; Tetherless World Constellation

MOTIVATION
- In the emergent "fourth paradigm" (data-driven) science, the scientific method is enhanced by integrating significant data sources into the practice of scientific research.
- To address Big Science, we must understand how data enables researchers to attack not just disciplinary issues, but also the system-level, large-scale, and transdisciplinary global scientific challenges facing society.
- Volume is only one of many dimensions of data to be considered. There is a clear need for improved data infrastructures to mediate data and information exchange, which we contend will need to be powered by semantic technologies.
- One clear need is for computational approaches that let researchers discover appropriate data resources, rapidly integrate data collections from heterogeneous resources or multiple data sets, and intercompare results to generate and validate hypotheses.
- Another trend is toward automated tools that help researchers find and reuse data they do not yet know they need, let alone know how to find. Again, semantic technologies will be required.
- Finally, to turn data analytics from "art to science", technical solutions are needed for cross-dataset validation, reproducibility studies on data-driven results, and the concomitant citation of data products, so that those who curate and share important data resources receive recognition.

Semantic eScience Framework

Five Generations of Mediation: Borgman et al. (2008), CyberLearning Report

Cognitive Computing: Realizing the 6th Generation and the Integration of the Other 5!
Figure: Schematic of a Cognitive Computing Architecture (courtesy Jim Hendler)
- Smart data agents are part of the next generation of computing infrastructure mediating research.
- These agents are a fundamental part of the new cognitive computing platforms being developed.
- Open-world (versus closed-world) reasoning is essential.
- Linked data will be a fundamental enabler.
- Smart applications!

AGUFM14, IN23C-3737 (MS Hall A-C)

Framework and relation to external sources
- Needed: an evolution of cognitive systems in which humans, many humans, are in the loop, bringing generations 1, 2 and 3 together with generations 3, 4, 5 and now 6. All these generations of mediation are in effect as we conduct research!

NOTE: INCREASING COMPLEXITY
- Smart agents: open-world, semantic agents with rules. Notably, these capabilities NEVER made it into the top row of capabilities in the main figure.
- Data agents: agents that can find data for you, perhaps even convert it to the right format, find contextual information, etc.

Illustration by Roy Pea and Jillian C. Wallis, from C. L. Borgman, H. Abelson, L. Dirks, R. Johnson, K. R. Koedinger, M. C. Linn, C. A. Lynch, D. G. Oblinger, R. D. Pea, K. Salen, M. S. Smith, and A. Szalay, "Fostering Learning in the Networked World: The Cyberlearning Opportunity and Challenge. A 21st Century Agenda for the National Science Foundation. Report of the NSF Task Force on Cyberlearning," Office of Cyberinfrastructure and Directorate for Education and Human Resources, National Science Foundation, Washington, D.C.

Mediation Example: S2S Framework and Application
Figure labels: Application Integration; Smart Faceted Browse; Dashboards; Linked Vocabularies (not Brokering); Smart Text Agents; Smart Data Agents; Relationship and Association Rules; Cognitive Collaboration; Linked Vocabulary; S2S Application Ontology.
Figure: While initially developed for CyberLearning, these mediation modes apply more generally (courtesy Jim Hendler).
Use of linked vocabularies for categories of variables, instruments, and other terms enables discovery.
Figure labels: Memory; Reasoning; Decision Making; Watson, Cogito, and Clarion.
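The claim that linked vocabularies for variables, instruments, and other terms enable discovery can be illustrated with a small sketch. It assumes a SKOS-style broader/narrower hierarchy; the vocabulary terms, dataset IDs, and the functions `expand`, `facet_search`, and `facet_counts` are all hypothetical and are not part of S2S or BCO-DMO. The point shown is that selecting a broad term in a faceted browser also matches datasets tagged only with narrower terms.

```python
# Hypothetical SKOS-style vocabulary: each term maps to its narrower terms.
NARROWER = {
    "nutrients": ["nitrogen species", "phosphate", "silicate"],
    "nitrogen species": ["nitrate", "nitrite", "ammonium"],
    "pigments": ["chlorophyll a"],
}

# Hypothetical catalog: dataset ID -> parameter terms the dataset is tagged with.
DATASETS = {
    "ds-001": {"nitrate", "temperature"},
    "ds-002": {"phosphate"},
    "ds-003": {"chlorophyll a"},
    "ds-004": {"ammonium", "salinity"},
}


def expand(term):
    """Return the term plus every transitively narrower term."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles in a badly formed vocabulary
        seen.add(current)
        stack.extend(NARROWER.get(current, []))
    return seen


def facet_search(term):
    """IDs of datasets tagged with the selected term or any narrower term."""
    terms = expand(term)
    return sorted(ds for ds, tags in DATASETS.items() if tags & terms)


def facet_counts(terms):
    """Number of matching datasets per facet value, as a faceted browser shows."""
    return {t: len(facet_search(t)) for t in terms}


if __name__ == "__main__":
    print(facet_search("nutrients"))  # ['ds-001', 'ds-002', 'ds-004']
    print(facet_counts(["nutrients", "pigments", "salinity"]))
    # {'nutrients': 3, 'pigments': 1, 'salinity': 1}
```

None of the datasets is tagged "nutrients" directly; they are found only through the vocabulary. A production system would more likely hold the vocabulary as RDF and express the expansion as a query rather than keeping an in-memory dictionary.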
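The poster stresses that open-world (versus closed-world) reasoning is essential. The following minimal sketch shows the difference, assuming a hypothetical in-memory set of triples. Under the closed-world assumption, a missing fact is answered as false. Under the open-world assumption, a missing fact is unknown unless some constraint lets the negation be derived. Here that constraint is a hypothetical functional `format` property, together with the stated assumption that distinct format names denote distinct formats. OWL itself makes no unique-name assumption, so a real reasoner would need that distinctness asserted explicitly.

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Hypothetical asserted facts as (subject, property, value) triples.
FACTS = {
    ("ds-001", "format", "netCDF"),
    ("ds-002", "format", "CSV"),
}

# Hypothetical constraint: "format" is functional (at most one value per
# subject). We also ASSUME distinct format names denote distinct formats.
FUNCTIONAL = {"format"}


def closed_world(s, p, o):
    """Closed-world assumption: anything not asserted is false."""
    return Answer.TRUE if (s, p, o) in FACTS else Answer.FALSE


def open_world(s, p, o):
    """Open-world assumption: a missing fact is unknown unless a constraint
    lets us derive its negation."""
    if (s, p, o) in FACTS:
        return Answer.TRUE
    if p in FUNCTIONAL and any(fs == s and fp == p for fs, fp, _ in FACTS):
        # s already has a different value for a functional property.
        return Answer.FALSE
    return Answer.UNKNOWN


if __name__ == "__main__":
    # Nothing is recorded about ds-003's format.
    print(closed_world("ds-003", "format", "netCDF"))  # Answer.FALSE
    print(open_world("ds-003", "format", "netCDF"))    # Answer.UNKNOWN
    print(open_world("ds-002", "format", "netCDF"))    # Answer.FALSE
```

For a data agent, the UNKNOWN case matters. A closed-world agent would silently discard ds-003. An open-world agent can instead go looking for the missing metadata.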
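The note on data agents ("ones that can find data for you, and perhaps even convert it to the right format") can be sketched as a search over a registry of format converters. Everything here is hypothetical: the `CONVERTERS` registry, the `CATALOG` of datasets, and the functions `conversion_path` and `usable_datasets`. The agent uses breadth-first search to find the shortest chain of conversions from each dataset's native format to the format the researcher wants. It then returns only the datasets it can deliver in that format.

```python
from collections import deque

# Hypothetical converter registry: source format -> formats it converts to.
CONVERTERS = {
    "netCDF": ["CSV", "HDF5"],
    "HDF5": ["netCDF"],
    "CSV": ["JSON"],
    "FITS": ["HDF5"],
}

# Hypothetical catalog: dataset ID -> native format.
CATALOG = {"ds-001": "netCDF", "ds-002": "FITS", "ds-003": "JSON"}


def conversion_path(src, dst):
    """Shortest chain of formats from src to dst (breadth-first search),
    or None if no chain of registered converters exists."""
    if src == dst:
        return [src]
    prev = {src: None}  # predecessor map; doubles as the visited set
    queue = deque([src])
    while queue:
        fmt = queue.popleft()
        for nxt in CONVERTERS.get(fmt, []):
            if nxt in prev:
                continue
            prev[nxt] = fmt
            if nxt == dst:
                # Walk predecessors back to src, then reverse.
                path = [nxt]
                while prev[path[-1]] is not None:
                    path.append(prev[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


def usable_datasets(wanted):
    """Datasets already in, or convertible to, the wanted format, each paired
    with the conversion chain the agent would apply."""
    out = {}
    for ds, fmt in sorted(CATALOG.items()):
        path = conversion_path(fmt, wanted)
        if path is not None:
            out[ds] = path
    return out


if __name__ == "__main__":
    print(usable_datasets("CSV"))
    # {'ds-001': ['netCDF', 'CSV'], 'ds-002': ['FITS', 'HDF5', 'netCDF', 'CSV']}
```

Breadth-first search minimizes the number of conversion steps. A more realistic agent would also weight each converter by information loss, and would attach contextual metadata (units, provenance) along the way.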