Published by Whitney Benson. Modified over 6 years ago.
Get the poster at Semantic Visualization Provenance Records:
IN51D-1713
Semantic Visualization Provenance Records: Applying Semantics in Dataset Summarization for Solar Data Ingest Pipelines
James Michaelis, Deborah L. McGuinness, Stephan Zednik, Patrick West, Peter Arthur Fox
Rensselaer Polytechnic Institute, 110 8th St., Troy, NY, United States

Opening:
* This work addresses approaches for managing collections of time-series data gathered on the solar corona.
* Analysis of solar data is necessary for space weather modeling and forecasting, which have broad implications for terrestrial activity (e.g., communication grid reliability).
* Time-series visualizations of solar activity, created by the High Altitude Observatory (HAO) [1], enable these analyses.
* This work focuses on two challenges:
** Only small sections of the data typically contain content of interest to scientists.
** Subsets of time-series data may correspond to an event of interest at a particular time (e.g., a solar event).
* Based on these challenges, one goal of this work was to let scientists retrieve the data sets corresponding to desired data products, to facilitate further analysis.

Case Study: CoMP
* This work was based on a set of HAO pipelines, the most recent being the CoMP pipeline, designed to measure light polarization from the solar corona.
* CoMP gathers raw data from the Mauna Loa Solar Observatory (MLSO) in Hawaii.
* At MLSO, staff maintain observation records that detail factors that could affect data gathering (e.g., instrument or weather events).
* MLSO also maintains activity logs that detail solar activity (e.g., Active Regions, Coronal Mass Ejections).
* The raw data from MLSO is then sent to HAO in Boulder, where a local data pipeline processes it into visualizations usable by scientists. During this processing, quality metrics are applied to the data to enable fitness-for-use assessment.
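The case study describes raw MLSO data being transformed by the HAO pipeline into visualizations, and recording that chain is what provenance records are for. As a minimal sketch (not the project's actual encoding), one pipeline step can be written as RDF triples using OPM-style (Open Provenance Model) `used` and `wasGeneratedBy` relations, then queried in both directions: which inputs (e.g., which flat file) produced a visualization, and which visualizations depended on a given input. The OPMV namespace URI, the `http://example.org/hao/` resources, the run and file names, and the helpers `provenance_triples`, `to_ntriples`, `inputs_of`, and `outputs_using` are all assumptions for illustration.

```python
from typing import List, Tuple

# Assumed namespaces: OPMV (an RDF vocabulary for the Open Provenance Model)
# and a hypothetical example.org namespace for pipeline resources.
OPMV = "http://purl.org/net/opmv/ns#"
EX = "http://example.org/hao/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, str]


def provenance_triples(process: str, inputs: List[str], output: str) -> List[Triple]:
    """Describe one pipeline step: the process used each input artifact,
    and the output artifact was generated by that process."""
    proc = EX + process
    triples: List[Triple] = [(proc, RDF_TYPE, OPMV + "Process")]
    for name in inputs:
        triples.append((EX + name, RDF_TYPE, OPMV + "Artifact"))
        triples.append((proc, OPMV + "used", EX + name))
    triples.append((EX + output, RDF_TYPE, OPMV + "Artifact"))
    triples.append((EX + output, OPMV + "wasGeneratedBy", proc))
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize (subject, predicate, object) IRIs as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def inputs_of(triples: List[Triple], output: str) -> List[str]:
    """Which artifacts did the process that generated `output` use?"""
    procs = {o for s, p, o in triples
             if p == OPMV + "wasGeneratedBy" and s == EX + output}
    return sorted(o for s, p, o in triples if p == OPMV + "used" and s in procs)


def outputs_using(triples: List[Triple], artifact: str) -> List[str]:
    """Which outputs were generated by a process that used `artifact`?"""
    procs = {s for s, p, o in triples
             if p == OPMV + "used" and o == EX + artifact}
    return sorted(s for s, p, o in triples
                  if p == OPMV + "wasGeneratedBy" and o in procs)


if __name__ == "__main__":
    # Hypothetical run and file names, for illustration only.
    records = provenance_triples("extract_intensity_run1",
                                 ["raw_20120101.fts", "flat_a.fts"],
                                 "intensity_20120101.png")
    records += provenance_triples("extract_intensity_run2",
                                  ["raw_20120102.fts", "flat_b.fts"],
                                  "intensity_20120102.png")
    print(outputs_using(records, "flat_a.fts"))
    # ['http://example.org/hao/intensity_20120101.png']
```

Keeping each run as its own process node is what lets the two queries separate visualizations that differ only in which flat file or code version they used.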
Project Goals:
* Encode the provenance of individual solar visualizations, to enable comparison of calculation conditions. For example: which flat file was used to transform this set of data?
* Attach ontology-backed data to the provenance records, covering (i) the quality metrics applied and (ii) records of the observations used to generate the data, based on RPI's STOM ontology.
* Encode the semantics of observation and activity logs to enable search and cross-referencing with data records.

Encoding Semantics of Individual Visualizations:
* This was a foundational step of the work, carried out jointly by HAO and RPI.
* For individual visualizations, we established an RDF-based strategy for encoding the steps taken in the local HAO pipeline to transform data from MLSO into usable visualizations.
* The encoding is based on the Open Provenance Model.
* Using this encoding as a foundation, we were then able to attach details about (i) the observation made to obtain the raw data and (ii) the GBU quality metric applied. GBU (good, bad, ugly) is the primary metric; it measures the amount of noise detected in image data.

Usage of Datacube:
* A means of expressing multidimensional data.
* Enables expression of aggregations of data values.
* Presently being applied by RPI in other projects requiring multidimensional data analysis (e.g., studying trends in research communities based on document statistics (CITE ISWC poster)).
* At a high level, a data cube is defined by DAM and OSD.
[Figure: data cube with a time dimension (TIME1 to TIME5) and GBU value.]

Relevant Datacube Processing:
- Constrained retrieval of data points.
- Aggregation (applied based on data cube encodings).

Use Cases:
For this set of images exhibiting this type of solar phenomenon:
- Return the aggregated GBU result.
- Return a data cube chunk for further exploration.
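The two datacube operations listed under Relevant Datacube Processing (constrained retrieval and aggregation) can be sketched in plain Python for the first use case. This is a minimal in-memory stand-in, not an RDF Data Cube implementation: the `Observation` schema (timestamp and solar-event label as dimensions, GBU category as the measure), the helper names `slice_by`, `aggregate_gbu`, and `window_around`, the "CME" label, the 2-hour default window, and the toy records are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Observation:
    """One data-cube point (hypothetical schema): the image timestamp and
    an optional solar-event label act as dimensions; GBU is the measure."""
    image_id: str
    time: datetime
    event: Optional[str]
    gbu: str  # "good", "bad", or "ugly"


def slice_by(obs: List[Observation], start: datetime, end: datetime,
             event: Optional[str] = None) -> List[Observation]:
    """Constrained retrieval: points with start <= time <= end,
    optionally restricted to a single solar-event label."""
    return [o for o in obs
            if start <= o.time <= end and (event is None or o.event == event)]


def aggregate_gbu(obs: List[Observation]) -> Dict[str, int]:
    """Aggregation: number of points in each GBU category."""
    return dict(Counter(o.gbu for o in obs))


def window_around(obs: List[Observation], t: datetime,
                  hours: float = 2.0) -> List[Observation]:
    """Points within +/- `hours` of a timestamp, e.g. a log comment."""
    delta = timedelta(hours=hours)
    return slice_by(obs, t - delta, t + delta)


if __name__ == "__main__":
    # Toy data for illustration only; not real CoMP records.
    cube = [
        Observation("img1", datetime(2012, 1, 1, 18, 0), "CME", "good"),
        Observation("img2", datetime(2012, 1, 1, 18, 30), "CME", "ugly"),
        Observation("img3", datetime(2012, 1, 1, 19, 0), None, "good"),
        Observation("img4", datetime(2012, 1, 2, 18, 0), "CME", "bad"),
    ]
    chunk = slice_by(cube, datetime(2012, 1, 1), datetime(2012, 1, 1, 23, 59),
                     event="CME")
    print([o.image_id for o in chunk])  # ['img1', 'img2']
    print(aggregate_gbu(chunk))         # {'good': 1, 'ugly': 1}
```

Returning the slice itself corresponds to "a data cube chunk for further exploration", while `aggregate_gbu` on that slice corresponds to "the aggregated GBU result".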
For this set of images utilizing the following flat field:
For this set of images running based on version blah of demod.pro:
For the observer log comment BLAH:
- Return a range of images around this comment, based on a defined temporal range.

Data Management Strategies:
* Provenance records for individual visualizations.
* Ontological classification of visualizations, using DQ and STOM.
* Encoding records in RDF Datacube [2] (proposed).

Datacube Basics:
Properties attached to datasets, slices, and observations:
* Dimensions: Year, Metric
* Attributes: GBU Metric
* Measures: 146 (the value)

Case Study: Coronal Multi-channel Polarimeter (CoMP)
* Mauna Loa Solar Observatory (MLSO), Hawaii: raw image data captured.
* National Center for Atmospheric Research (NCAR) Data Center, Boulder, CO: follow-up processing on raw data, producing intensity visualizations.
* MLSO publishes time-stamped observation logs, maintained by MLSO staff.
* Observation log comments cover weather and instrument conditions.
* MLSO staff also maintain time-stamped activity logs, with comments on solar events (Coronal Mass Ejections, Active Regions).

Datacube Usage:
For HAO visualization records, Datacube can be used in two ways:
- Returning aggregations of statistics for images (e.g., GBU results).
- Returning sets of visualizations (data points) for further exploration, based on constraints (e.g., temporal range).

Use Cases:
- Activity log usage: return images corresponding to a specific solar event record.
- Provenance (utilized data product): for this set of images, utilizing the following flat field configuration file.
- Provenance (utilized process): for this set of images, running based on version 2.0 of process “Extract Intensity”.
- Observer log usage: for the following observer log comment, return visualizations within 2 hours of the comment timestamp.

Sponsors: National Science Foundation

Next Steps:
- Deployment of provenance record retrieval as part of the Virtual Solar Terrestrial Observatory.
- Semantic encoding of MLSO event logs, or of data from Lockheed Martin's Heliophysics Events Knowledge Base [3].
- Expanded use of dimensions in the data cube, to include FITS header data.

Poster: MT15A-08

Glossary:
RPI – Rensselaer Polytechnic Institute
TWC – Tetherless World Constellation at Rensselaer Polytechnic Institute
VSTO – Virtual Solar Terrestrial Observatory
FITS – Flexible Image Transport System

References:
[1] Mauna Loa Solar Observatory (High Altitude Observatory Site):
[2] RDF Datacube Vocabulary:
[3] Heliophysics Event Knowledge Base:

Acknowledgments: Sapan Shah and Naveen Sridhar from the Tetherless World Constellation at RPI; Joan Burkepile, Steve Tomczyk, and Leonard Sitongia at the High Altitude Observatory.