Published by Opal Clark. Modified over 6 years ago.
Analytics to Better Interpret and Use Large Amounts of Heterogeneous Data
IN51B-3778
Tiffany Mathews, Walter Baskin, and Pamela Rinsland
NASA Atmospheric Science Data Center (ASDC), Langley Research Center (LaRC), Hampton, VA

Earth Science Data Analytics at the NASA Atmospheric Science Data Center (ASDC)

Data scientists at NASA’s Atmospheric Science Data Center (ASDC) are seasoned software application developers who have worked with the creation, archival, and distribution of large datasets (multiple terabytes and larger). To implement the most efficient processes for cataloging and organizing data access applications, ASDC data scientists must be intimately familiar with the data contained in the datasets with which they are working. Key technologies in the background of ASDC data scientists include large relational database management systems (RDBMSs) and NoSQL databases, web services, service-oriented architectures, structured and unstructured data access, and processing algorithms.

However, as the prices of data storage and processing decrease, sources of data increase, and technologies advance, granting more people access to data in real or near-real time, data scientists are being pressured to accelerate their ability to identify and analyze vast amounts of data. With existing tools this is becoming increasingly challenging. For example, the holdings of the NASA Earth Science Data and Information System (ESDIS) alone grew from just over 4 PB of data in 2009 to nearly 6 PB, and then to roughly 10 PB. With data from at least ten new missions to be added to the ESDIS holdings by 2017, the current volume will continue to grow exponentially and drive the need to analyze more data even faster.

Though there are many highly efficient, off-the-shelf analytics tools available, these tools mainly cater to business data, which is predominantly unstructured.
Conversely, there are very few known off-the-shelf analytics tools that interface well with archived Earth science data, which is predominantly heterogeneous and structured.

Earth Science Data Analytics Types

There are five types of analytics relevant to analyzing Earth science data. Though they are also relevant to business analysis, they apply quite differently. Business analytics ranks these analytics types to show increased value, as numbered in the diagram to the right, with 1 adding the least value and 5 adding the most. Therefore, they are often depicted in a line chart. This diagram was created to show that each data analytics type is equally important in an Earth science data analytics paradigm. In Earth science data analytics, each type of analytics is a critical part of better understanding the effects of human activity on the Earth’s atmosphere and the role it plays in climate change.

Descriptive: What happened?
Diagnostic: Why did it happen?
Discovery: What approach to take to learn from the data?
Prescriptive: What’s the best course of action?
Predictive: What is likely to happen?

Use Case

Deriving Information on Surface conditions from Column and Vertically Resolved Observations Relevant to Air Quality (DISCOVER-AQ) is a joint project between scientists with the Environmental Protection Agency (EPA), the DISCOVER-AQ team, and NASA. This four-year campaign was created to alleviate the challenges of using satellites to monitor air quality for public health and environmental benefit. Space-based instruments struggle to distinguish between air quality high in the atmosphere and air quality near the surface, where people are most affected. DISCOVER-AQ offers scientists targeted airborne and ground-based observations to better use current and future satellites to diagnose ground-level conditions influencing air quality.
This will require …

Analytics Tools

The color of each box below corresponds to the matching analytics type in the chart to the left. Each box includes examples of analytics tools, mainly homegrown, that have been created to work with Earth science data.

Visualization Tools
Tools that enable data users to consolidate data, whether from the same project or from different projects measuring the same parameters, into one visualization that helps people quickly diagnose what has happened and why. Examples include:
UV-CDAT: An open source tool that performs parallel processing, data reduction, and analysis to generate 3-D visualizations. Visualization takes place via ParaView (and, in the future, VisIt), and multiple views are displayed back to users. A consortium maintains the tool. Current members of the consortium include the Department of Energy (DOE), two universities, NASA, and two private companies (Kitware and Tech-X).
ArcGIS: A tool developed by Esri that enables maps, models, and tools to be distributed within and outside of an organization.

Semantic Tools
Tools that offer a more dynamic search to help researchers at any experience level decide the best approach to take (what data to use) to learn from the data. Examples include:
Ontology-Driven Interactive Search Environment for Earth Science (ODISEES)
Semantic Web for Earth and Environmental Terminology (SWEET)

Subsetting Tools
Tools that enable users to refine data orders to include only the data they need for their research, so that they can learn from it more quickly. They generally offer three options (date, time, and geolocation) that can be used individually or in combination with one another. Examples include:
ASDC Search and Subset Web Applications
Simple Subset Wizard (SSW)

Data Modeling Tools
Tools that enable data users to model specific events, based on what has happened in the past under certain circumstances, to better predict future Earth science phenomena.
Grid Analysis and Display System (GrADS): An interactive desktop tool that offers easy access to, manipulation of, and visualization of Earth science data. It offers two data models for handling gridded and station data and supports many data file formats (NetCDF, HDF4, HDF5, etc.).
Multi-Instrument Inter-Calibration (MIIC II): A software framework that provides better access to distributed data for inter-calibration by finding and acquiring matched samples for instruments on separate spacecraft. It supports both low Earth orbit (LEO) and geosynchronous (GEO) satellite instruments.

The Earth science data ingested, archived, and distributed at NASA data centers does not readily lend itself to prescriptive data analytics, as NASA data centers focus on the stewardship and quality of the data. It is likely that the organizations that use NASA data have applied their own prescriptive analytics tools to determine the best course of action to take based on their findings.

Data Conversion Tools
Tools that help get data into a readable format so that users can describe what happened. Examples include:
HDF Tools such as HDUMP, a utility program to read in any gridded MISR data that is written in the HDF-EOS grid format; HDFView for converting images from GIF, JPG, BMP, and PNG to the HDF format and back; HDF to NetCDF; and more.

Conclusion

Though analytics tools exist for Earth science data, very few of those tools are available off-the-shelf. Analytics tools developed in-house often require several modifications before they can be used with data for which they were not specifically designed. For example, subsetters that search for specific parameters are created for specific datasets. The framework, however, can often be applied to other datasets after modifications are made. In the future, we are hopeful that more off-the-shelf tools will be available to analyze valuable Earth science data more quickly and to limit the time needed to modify tools.
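As an illustration of the subsetting idea described under Analytics Tools (three options, date, time, and geolocation, usable individually or in combination), the following is a minimal Python sketch. It is not the ASDC subsetter or the Simple Subset Wizard: the `Granule` record, the `subset` function, the inclusive bounds, the bounding-box convention, and the sample values are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date, datetime, time
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Granule:
    """Hypothetical record for one observation; not an actual ASDC schema."""
    name: str
    observed_at: datetime  # observation timestamp (UTC assumed)
    lat: float             # degrees north
    lon: float             # degrees east, in [-180, 180]


def subset(
    granules: Iterable[Granule],
    date_range: Optional[Tuple[date, date]] = None,
    time_range: Optional[Tuple[time, time]] = None,
    bbox: Optional[Tuple[float, float, float, float]] = None,
) -> List[Granule]:
    """Keep the granules that pass every filter that was supplied.

    A filter left as None is skipped, so the three options can be used
    individually or in any combination. All bounds are inclusive.
    bbox is (south, west, north, east) and must not cross the antimeridian.
    """
    selected = []
    for g in granules:
        if date_range is not None:
            start, end = date_range
            if not (start <= g.observed_at.date() <= end):
                continue
        if time_range is not None:
            earliest, latest = time_range
            if not (earliest <= g.observed_at.time() <= latest):
                continue
        if bbox is not None:
            south, west, north, east = bbox
            if not (south <= g.lat <= north and west <= g.lon <= east):
                continue
        selected.append(g)
    return selected


# Arbitrary sample records, made up for illustration only.
SAMPLE = [
    Granule("a", datetime(2014, 7, 17, 14, 30), 39.7, -104.9),
    Granule("b", datetime(2014, 7, 17, 23, 5), 39.7, -104.9),
    Granule("c", datetime(2014, 8, 2, 15, 0), 37.1, -76.4),
]

# Combine the date and time options; geolocation is left unrestricted.
july_afternoons = subset(
    SAMPLE,
    date_range=(date(2014, 7, 1), date(2014, 7, 31)),
    time_range=(time(12, 0), time(18, 0)),
)
```

Because each filter is an independent optional argument, adapting this sketch to another dataset mostly means changing how records are read into `Granule`, which loosely mirrors the point made in the Conclusion that a subsetter framework can often be reused after modifications.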
References

Types of Analytics
Subsetting
GrADS
ODISEES
HDF Tools
UV-CDAT

Acknowledgements

A special thanks to Steve Kempler of Goddard Space Flight Center (GSFC). Without his keen interest in Earth science data analytics and his boldness in venturing into its newly charted waters, we would not be as far along the path as we are now.