IN51D-1713
Semantic Visualization Provenance Records: Applying Semantics in Dataset Summarization for Solar Data Ingest Pipelines

James Michaelis (michaelis@cs.rpi.edu), Deborah L. McGuinness (dlm@cs.rpi.edu), Stephan Zednik (zednis@rpi.edu), Patrick West (westp@rpi.edu), Peter Arthur Fox (pfox@cs.rpi.edu)
Rensselaer Polytechnic Institute, 110 8th St., Troy, NY, 12180, United States

Opening:
* For this work, we are interested in approaches for managing collections of time-series data gathered on the solar corona.
* Analysis of solar data is necessary for space weather modeling and forecasting, which have broad implications for terrestrial activity (e.g., communication grid reliability).
* Time-series visualizations of solar activity, created by the High Altitude Observatory (HAO) [1], enable these needed analyses.
* From the start of my involvement with the work, two challenges were emphasized:
** Only small sections of the data will typically contain content of interest to scientists.
** Subsets of time-series data may correspond to an event of interest at a particular time (e.g., a solar event).
* Based on these challenges, one goal of this work was to enable scientists to retrieve the data sets corresponding to desired data products, to facilitate further analysis.

Case Study: CoMP
* Our work was based on a set of HAO pipelines, the most recent of which is the CoMP pipeline, designed to measure light polarization from the solar corona.
* CoMP gathers raw data from the Mauna Loa Solar Observatory (MLSO) in Hawaii.
* At MLSO, staff maintain observation records, intended to detail things that could impact data gathering (e.g., instrument or weather events).
* Additionally, MLSO maintains activity logs, intended to detail solar activities (e.g., Active Regions, Coronal Mass Ejections).
* The raw data from MLSO is then sent to HAO in Boulder, where a local data pipeline processes it into visualizations usable by scientists.
* During HAO's processing, quality metrics are applied to the data to enable fitness-for-use assessment.
* Primary metric: GBU (good, bad, ugly), which measures the amount of noise detected in image data.

Project Goals:
* Encode the provenance of individual solar visualizations, to enable comparison of calculation conditions. For example: which flat file was used to transform this set of data?
* Attach ontology-backed data to the provenance, corresponding to (i) the quality metrics applied and (ii) records of the observations used to generate the data, based on RPI's STOM ontology.
* Encode the semantics of observation and activity logs, to enable search and cross-referencing with data records.

Encoding Semantics of Individual Visualizations:
* This was a foundational step, building on work conducted during 2010-11 between HAO and RPI.
* For individual visualizations, we established an RDF-based strategy for encoding the steps taken in the local HAO pipeline to transform data from MLSO into usable visualizations.
* The encoding is based on the Open Provenance Model.
* Using this encoding as a foundation, we were then able to attach details about (i) the observation made to obtain the raw data and (ii) the GBU quality metric applied.

Usage of Datacube:
* A means of expressing multidimensional data.
* Enables expression of aggregations of data values.
* Presently being applied by RPI in other projects requiring multidimensional data analysis (e.g., studying trends in research communities based on document statistics (CITE ISWC poster)).
* At a high level, a data cube is defined by its dimensions, attributes, and measures (DAM) and its observations, slices, and datasets (OSD).

            TIME1   TIME2   TIME3   TIME4   TIME5
GBU Value   600     540     780     620     500

Relevant Datacube Processing:
- Constrained retrieval of data points.
- Aggregation (applied based on data cube encodings).

Use Cases:
For a set of images that exhibited a given type of solar phenomenon:
- Return the aggregated GBU result.
- Return a data cube chunk for further exploration.
For this set of images utilizing the following flat field:
For this set of images running based on version blah of demod.pro:
For the observer log comment BLAH:
- Return a range of images around this comment, based on a defined temporal range.

Data Management Strategies:
- Provenance records for individual visualizations.
- Ontological classification of visualizations, using DQ and STOM.
- Encoding records in RDF Datacube [2] (proposed).

Datacube Basics:
Properties attached to datasets/slices/observations:
- Dimensions: Year, Metric
- Attributes: GBU Metric
- Measures: 146 (the value)

Case Study: Coronal Multi-channel Polarimeter (CoMP):
- Mauna Loa Solar Observatory (MLSO), Hawaii: Raw Image Data Captured.
- National Center for Atmospheric Research (NCAR) Data Center, Boulder, CO: Follow-up Processing on Raw Data; Publishes Intensity Visualizations.
- Time-stamped Observation Logs, maintained by MLSO staff. Comments on: weather and instrument conditions.
- Time-stamped Activity Logs, maintained by MLSO staff. Comments on solar events (Coronal Mass Ejections, Active Regions).

Datacube Usage:
For HAO visualization records, Datacube can be used in two ways:
- Returning aggregations of statistics for images (e.g., GBU results).
- Returning sets of visualizations (data points) for further exploration, based on constraints (e.g., temporal range).

Use Cases:
- Activity Log Usage: Return images corresponding to a specific solar event record.
- Provenance (utilized data product): Return the set of images that used a given flat field configuration file.
- Provenance (utilized process): Return the set of images produced by version 2.0 of the process “Extract Intensity”.
- Observer Log Usage: For a given observer log comment, return visualizations within 2 hours of the comment timestamp.

Get the poster at http://bit.ly/VaKADB

Sponsors: National Science Foundation

Next Steps:
- Deployment of provenance record retrieval as part of the Virtual Solar Terrestrial Observatory.
- Semantic encoding of MLSO event logs, or of data from Lockheed Martin's Heliophysics Events Knowledge Base [3].
- Expanded use of dimensions in the data cube, to include FITS header data.

Poster: MT15A-08

Glossary:
RPI – Rensselaer Polytechnic Institute
TWC – Tetherless World Constellation at Rensselaer Polytechnic Institute
VSTO – Virtual Solar Terrestrial Observatory
FITS – Flexible Image Transport System

References:
[1] Mauna Loa Solar Observatory (High Altitude Observatory Site): http://mlso.hao.ucar.edu/
[2] RDF Datacube Vocabulary: http://www.w3.org/TR/vocab-data-cube/
[3] Heliophysics Event Knowledge Base: http://www.lmsal.com/hek/index.html

Acknowledgments:
Sapan Shah and Naveen Sridhar from the Tetherless World Constellation at RPI.
Joan Burkepile, Steve Tomczyk and Leonard Sitongia at the High Altitude Observatory.
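
Illustration of the two Datacube usages named on the poster (returning aggregated GBU statistics, and returning the visualizations within a temporal range of an observer log comment). This is a minimal sketch in plain Python, not the RDF Datacube encoding or the HAO pipeline itself. The `Observation` class, the `within_window` and `aggregate` helpers, and the timestamps are all hypothetical; only the five GBU values are taken from the TIME1 to TIME5 example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass(frozen=True)
class Observation:
    """One data cube observation (hypothetical in-memory stand-in for RDF)."""
    time: datetime   # dimension
    metric: str      # attribute, e.g. "GBU"
    value: float     # measure


def within_window(cube, center, hours):
    """Constrained retrieval: observations within +/- `hours` of `center`."""
    delta = timedelta(hours=hours)
    return [o for o in cube if abs(o.time - center) <= delta]


def aggregate(cube, metric, func=mean):
    """Aggregation: apply `func` to the measures recorded for one metric."""
    values = [o.value for o in cube if o.metric == metric]
    return func(values) if values else None


# Assumed timestamps, one hour apart; values from the TIME1..TIME5 example.
start = datetime(2012, 1, 1, 18, 0)
cube = [Observation(start + timedelta(hours=i), "GBU", v)
        for i, v in enumerate([600, 540, 780, 620, 500])]

# Assumed observer log comment made at the time of the first image.
comment_time = start
nearby = within_window(cube, comment_time, hours=2)

print([o.value for o in nearby])      # [600, 540, 780]
print(aggregate(cube, "GBU"))         # mean over all five values: 608
print(aggregate(nearby, "GBU", max))  # 780
```

In an RDF Datacube deployment the same two operations would more likely be expressed as queries over the encoded observations. The list comprehension here only stands in for that constraint step.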