Semantic Visualization Provenance Records

Presentation transcript:

IN51D-1713
Semantic Visualization Provenance Records: Applying Semantics in Dataset Summarization for Solar Data Ingest Pipelines
James Michaelis (michaelis@cs.rpi.edu), Deborah L. McGuinness (dlm@cs.rpi.edu), Stephan Zednik (zednis@rpi.edu), Patrick West (westp@rpi.edu), Peter Arthur Fox (pfox@cs.rpi.edu)
Rensselaer Polytechnic Institute, 110 8th St., Troy, NY, 12180, United States

Opening:
* This work concerns approaches for managing collections of time-series data gathered on the solar corona.
* Analysis of solar data is necessary for space weather modeling and forecasting, which have broad implications for terrestrial activity (e.g., communication grid reliability).
* Time-series visualizations of solar activity, created by the High Altitude Observatory (HAO) [1], enable these analyses.
* From the start of my involvement with the work, two challenges were emphasized:
** Only small sections of the data will typically contain content of interest to scientists.
** Subsets of time-series data may correspond to an event of interest at a particular time (e.g., a solar event).
* Based on these challenges, one goal of this work was to enable scientists to retrieve the data sets corresponding to desired data products, to facilitate further analysis.

Case Study: CoMP
* Our work was based on a set of HAO pipelines, the most recent of which is the CoMP (Coronal Multi-channel Polarimeter) pipeline, designed to measure light polarization from the solar corona.
* CoMP gathers raw data from the Mauna Loa Solar Observatory (MLSO) in Hawaii.
* At MLSO, staff maintain observation records, intended to detail things that could impact data gathering (e.g., instrument or weather events).
* Additionally, MLSO maintains activity logs, intended to detail solar activities (e.g., Active Regions, Coronal Mass Ejections).
* The raw data from MLSO is then sent to HAO in Boulder, where a local data pipeline processes it into visualizations usable by scientists.
* During HAO's processing, quality metrics are applied to the data to enable fitness-for-use assessment.
* Primary metric: GBU (good, bad, ugly), which measures the amount of noise detected in image data.

Project Goals:
* Encode the provenance of individual solar visualizations, to enable comparison of calculation conditions. For example: which flat field file was used to transform this set of data?
* Attach ontology-backed data to the provenance, corresponding to: (i) the quality metrics applied, and (ii) records of the observations used to generate the data, based on RPI's STOM ontology.
* Encode the semantics of observation and activity logs to enable search and cross-referencing with data records.

Encoding Semantics of Individual Visualizations:
* This was a foundational step, building on work conducted during 2010-11 between HAO and RPI.
* For individual visualizations, we established an RDF-based strategy for encoding the steps taken in the local HAO pipeline to transform data from MLSO into usable visualizations.
* The encoding is based on the Open Provenance Model.
* Using this encoding as a foundation, we were then able to attach details about (i) the observation made to get the raw data, and (ii) the GBU quality metric applied.

Usage of Datacube:
* A means of expressing multidimensional data.
* Enables expression of aggregations of data values.
* Presently being applied by RPI in other projects requiring multidimensional data analysis, such as studying trends in research communities based on document statistics (CITE ISWC poster).
* At a high level, a data cube is defined by DAM and OSD.

              TIME1   TIME2   TIME3   TIME4   TIME5
  GBU Value   600     540     780     620     500

Relevant Datacube Processing:
- Constrained retrieval of data points.
- Aggregation (applied based on data cube encodings).

Use Cases:
For a set of images exhibiting a given type of solar phenomenon:
- Return the aggregated GBU result.
- Return a data cube chunk for further exploration.
For a set of images utilizing a given flat field:
For a set of images produced by a given version of demod.pro:
For a given observer log comment:
- Return a range of images around this comment, based on a defined temporal range.

Future Work:
- Deployment of provenance record retrieval as part of the Virtual Solar Terrestrial Observatory.
- Semantic encoding of MLSO event logs, or of data from Lockheed Martin's Heliophysics Events Knowledgebase (http://www.lmsal.com/hek/index.html).
- Expanded use of dimensions in the data cube, to incorporate FITS header data.

Motivations and Challenges:
Analysis of solar data is necessary for space weather modeling and forecasting, which have broad implications for terrestrial activity (e.g., communication grid reliability). Time-series visualizations of solar activity, created by the High Altitude Observatory [1], enable the needed analyses. This work focuses on two challenges:
- Only small sections of the data will typically contain content of interest to scientists.
- Subsets of time-series data may correspond to an event of interest at a particular time (e.g., a solar event).
Based on these challenges, one goal of this work was to enable scientists to retrieve the data sets corresponding to desired data products, to facilitate further analysis.

Data Management Strategies:
- Provenance records for individual visualizations.
- Ontological classification of visualizations, using DQ and STOM.
- Encoding records in RDF Datacube [2] (proposed).

Datacube Basics:
Properties attached to datasets/slices/observations:
- Dimensions: Year, Metric
- Attributes: GBU Metric
- Measures: 146 (the value)

Case Study: Coronal Multi-channel Polarimeter (CoMP):
- Mauna Loa Solar Observatory (MLSO), Hawaii: raw image data captured.
- National Center for Atmospheric Research (NCAR) Data Center, Boulder, CO: follow-up processing on raw data; publishes intensity visualizations.
- Time-stamped Observation Logs, maintained by MLSO staff: comments on weather and instrument conditions.
- Time-stamped Activity Logs, maintained by MLSO staff: comments on solar events (Coronal Mass Ejections, Active Regions).

Datacube Usage:
For HAO visualization records, Datacube can be used in two ways:
- Returning aggregations of statistics for images (e.g., GBU results).
- Returning sets of visualizations (data points) for further exploration, based on constraints (e.g., temporal range).

Use Cases:
- Activity Log Usage: Return images corresponding to a specific solar event record.
- Provenance (utilized data product): For the set of images utilizing a given flat field configuration file.
- Provenance (utilized process): For the set of images produced by version 2.0 of the process "Extract Intensity".
- Observer Log Usage: For a given observer log comment, return visualizations within 2 hours of the comment timestamp.

Get the poster at: http://bit.ly/VaKADB

Sponsors: National Science Foundation

Next Steps:
- Deployment of provenance record retrieval as part of the Virtual Solar Terrestrial Observatory.
- Semantic encoding of MLSO event logs, or of data from Lockheed Martin's Heliophysics Events Knowledgebase [3].
- Expanded use of dimensions in the data cube, to include FITS header data.

Poster: MT15A-08

Glossary:
RPI – Rensselaer Polytechnic Institute
TWC – Tetherless World Constellation at Rensselaer Polytechnic Institute
VSTO – Virtual Solar Terrestrial Observatory
FITS – Flexible Image Transport System

References:
[1] Mauna Loa Solar Observatory (High Altitude Observatory Site): http://mlso.hao.ucar.edu/
[2] RDF Datacube Vocabulary: http://www.w3.org/TR/vocab-data-cube/
[3] Heliophysics Events Knowledgebase: http://www.lmsal.com/hek/index.html

Acknowledgments:
Sapan Shah and Naveen Sridhar from the Tetherless World Constellation at RPI; Joan Burkepile, Steve Tomczyk and Leonard Sitongia at the High Altitude Observatory.
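
Illustrative sketch (not part of the poster): the two Datacube usages described above, constrained retrieval by temporal range and aggregation of GBU results, can be approximated in plain Python. This is only a sketch under stated assumptions. The class and function names (Observation, within_window, aggregate_gbu), the timestamps, and the hourly spacing are hypothetical; only the five GBU values and the 2-hour window come from the text. A real deployment would presumably query RDF Data Cube records rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass(frozen=True)
class Observation:
    """One data cube observation: a time dimension plus a GBU measure."""
    timestamp: datetime  # dimension (hypothetical values below)
    gbu_value: int       # measure


def within_window(observations, center, hours=2):
    """Constrained retrieval: keep observations within `hours` of `center`."""
    window = timedelta(hours=hours)
    return [o for o in observations if abs(o.timestamp - center) <= window]


def aggregate_gbu(observations):
    """Aggregation: summarize the GBU measure over a non-empty selection."""
    values = [o.gbu_value for o in observations]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }


# GBU values from the TIME1..TIME5 example; the start time and the
# one-hour spacing between observations are assumptions.
start = datetime(2012, 1, 1, 18, 0)
gbu_values = [600, 540, 780, 620, 500]
cube = [
    Observation(start + timedelta(hours=i), value)
    for i, value in enumerate(gbu_values)
]

# Hypothetical observer log comment timestamp.
comment_time = datetime(2012, 1, 1, 19, 30)

nearby = within_window(cube, comment_time)  # observer log use case
summary = aggregate_gbu(nearby)             # aggregated GBU result
print(summary)
```

The selection step and the aggregation step are kept separate so that the same selection can either be summarized or handed back as a chunk for further exploration, mirroring the two usages listed under Datacube Usage.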