IN51B-3778 Tiffany Mathews, Walter Baskin, and Pamela Rinsland

Presentation transcript:

Analytics to Better Interpret and Use Large Amounts of Heterogeneous Data
IN51B-3778
Tiffany Mathews, Walter Baskin, and Pamela Rinsland
NASA Atmospheric Science Data Center (ASDC), Langley Research Center (LaRC), Hampton, VA
Tiffany.J.Mathews@nasa.gov, Walter.E.Baskin@nasa.gov, Pamela.L.Rinsland@nasa.gov

Earth Science Data Analytics at the NASA Atmospheric Science Data Center (ASDC)

Data scientists at NASA’s Atmospheric Science Data Center (ASDC) are seasoned software application developers who have worked on the creation, archival, and distribution of large datasets (multiple terabytes and larger). To implement the most efficient processes for cataloging and organizing data access applications, ASDC data scientists must be intimately familiar with the data in the datasets with which they work. Key technologies in the background of ASDC data scientists include: large RDBMSs (relational database management systems) and NoSQL databases; web services; service-oriented architectures; structured and unstructured data access; and processing algorithms.

However, as the prices of data storage and processing decrease, sources of data increase, and technologies advance, giving more people access to data in real or near-real time, data scientists are being pressured to accelerate their ability to identify and analyze vast amounts of data. With existing tools, this is becoming increasingly challenging.

For example, the holdings of NASA's Earth Science Data and Information System (ESDIS) alone grew from just over 4 PB of data in 2009 to nearly 6 PB in 2011, and then to roughly 10 PB in 2013. With data from at least ten new missions to be added to the ESDIS holdings by 2017, the current volume will continue to grow exponentially and drive the need to analyze more data even faster.
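To put the ESDIS growth figures in perspective, the short Python sketch below computes the compound annual growth rate implied by the rounded volumes quoted above (4 PB in 2009, 6 PB in 2011, 10 PB in 2013). The names `HOLDINGS_PB`, `annual_growth_rate`, and `growth_between` are illustrative assumptions, and because the poster gives only approximate volumes, the results are rough estimates rather than official metrics.

```python
# Rounded ESDIS holdings in petabytes, taken from the approximate
# figures in the text ("just over 4", "nearly 6", "roughly 10").
HOLDINGS_PB = {2009: 4.0, 2011: 6.0, 2013: 10.0}


def annual_growth_rate(start_volume, end_volume, years):
    """Return the compound annual growth rate as a fraction (0.25 == 25%)."""
    if start_volume <= 0 or years <= 0:
        raise ValueError("start_volume and years must be positive")
    return (end_volume / start_volume) ** (1.0 / years) - 1.0


def growth_between(holdings, start_year, end_year):
    """Compound annual growth rate between two years present in holdings."""
    return annual_growth_rate(
        holdings[start_year], holdings[end_year], end_year - start_year
    )


for first, last in [(2009, 2011), (2011, 2013), (2009, 2013)]:
    rate = growth_between(HOLDINGS_PB, first, last)
    print(f"{first} to {last}: about {rate:.1%} per year")
```

With these rounded inputs, the implied rate is roughly 22% per year for 2009 to 2011 and roughly 29% per year for 2011 to 2013, which is consistent with the accelerating growth described in the text.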
Though there are many highly efficient, off-the-shelf analytics tools available, these tools mainly cater to business data, which is predominantly unstructured. As a result, there are very few known off-the-shelf analytics tools that interface well with archived Earth science data, which is predominantly heterogeneous and structured.

Earth Science Data Analytics Types

There are five types of analytics relevant to analyzing Earth science data. Though they are also relevant to business analysis, they apply quite differently. Business analytics ranks these analytics types by increasing value, as numbered in the diagram to the right, from 1 (adding the least value) to 5 (adding the most). Therefore, they are often depicted in a line chart. This diagram was created to show that each data analytics type is equally important in an Earth science data analytics paradigm. In Earth science data analytics, each type of analytics is a critical part of better understanding the effects of human activity on the Earth’s atmosphere and the role it plays in climate change.

Diagram: the five analytics types and the question each one answers
Descriptive: What happened?
Diagnostic: Why did it happen?
Discovery: What approach to take to learn from the data?
Prescriptive: What’s the best course of action?
Predictive: What is likely to happen?

Use Case

Deriving Information on Surface conditions from Column and Vertically Resolved Observations Relevant to Air Quality (DISCOVER-AQ) is a joint project between scientists with the Environmental Protection Agency (EPA), the DISCOVER-AQ team, and NASA. This four-year campaign was created to alleviate the challenges of using satellites to monitor air quality for public health and environmental benefit. Space-based instruments struggle to distinguish between air quality high in the atmosphere and air quality near the surface, where people are most affected.
DISCOVER-AQ offers scientists targeted airborne and ground-based observations to better use current and future satellites to diagnose ground-level conditions influencing air quality. This will require ...

Analytics Tools

The color of each box below corresponds to the matching analytics type in the chart to the left. Each box includes examples of analytics tools, mainly homegrown, that have been created to work with Earth science data.

Visualization Tools

Tools that enable data users to consolidate data, whether from the same project or from different projects measuring the same parameters, into one visualization that helps people quickly diagnose what has happened and why. Examples include:
UV-CDAT: An open source tool that performs parallel processing, data reduction, and analysis to generate 3-D visualizations. Visualization takes place via ParaView (in the future also via VisIt), and multiple views are displayed back to users. A consortium maintains the tool. Current members of the consortium include the Department of Energy (DOE), two universities, NASA, and two private companies (Kitware and Tech-X).
ArcGIS: A tool developed by Esri that enables maps, models, and tools to be distributed within and outside of an organization.

Semantic Tools

Tools that offer a more dynamic search to help researchers at any experience level decide the best approach to take (what data to use) to learn from the data. Examples include:
Ontology-Driven Interactive Search Environment for Earth Science (ODISEES)
Semantic Web for Earth and Environmental Terminology (SWEET)

Subsetting Tools

Tools that enable users to refine data orders to include only the data that they need for their research, so that they can learn from it more quickly. They generally offer three options (date, time, and geolocation) that can be used individually or in combination with one another.
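As a rough illustration of how the three subsetting options (date, time, and geolocation) can be applied individually or together, the Python sketch below filters a small list of in-memory observations. It is not the implementation of any ASDC subsetter: the `Observation` record, the `subset` function, and the sample values are all hypothetical, timestamps are assumed to be naive UTC, and the bounding box is assumed not to cross the antimeridian.

```python
from dataclasses import dataclass
from datetime import date, datetime, time


@dataclass(frozen=True)
class Observation:
    """Hypothetical record: one measurement with a timestamp and location."""
    timestamp: datetime  # assumed naive UTC
    lat: float           # degrees north
    lon: float           # degrees east
    value: float


def subset(observations, dates=None, times=None, bbox=None):
    """Keep observations that satisfy every filter that is not None.

    dates: (first_date, last_date), inclusive
    times: (earliest_time_of_day, latest_time_of_day), inclusive
    bbox:  (south, west, north, east) in degrees
    """
    selected = []
    for obs in observations:
        if dates is not None and not dates[0] <= obs.timestamp.date() <= dates[1]:
            continue
        if times is not None and not times[0] <= obs.timestamp.time() <= times[1]:
            continue
        if bbox is not None:
            south, west, north, east = bbox
            if not (south <= obs.lat <= north and west <= obs.lon <= east):
                continue
        selected.append(obs)
    return selected


# Made-up sample data, for illustration only.
SAMPLE = [
    Observation(datetime(2013, 7, 1, 14, 30), 37.1, -76.4, 41.0),
    Observation(datetime(2013, 7, 1, 2, 15), 37.1, -76.4, 18.5),
    Observation(datetime(2013, 7, 2, 15, 0), 29.8, -95.4, 55.2),
    Observation(datetime(2014, 1, 10, 14, 0), 37.0, -76.3, 12.3),
]

JULY_2013 = (date(2013, 7, 1), date(2013, 7, 31))
AFTERNOON = (time(12, 0), time(18, 0))
REGION = (36.0, -78.0, 38.0, -75.0)  # hypothetical box (south, west, north, east)

by_date = subset(SAMPLE, dates=JULY_2013)
by_all = subset(SAMPLE, dates=JULY_2013, times=AFTERNOON, bbox=REGION)
print(len(by_date), len(by_all))
```

Treating each option as an optional filter is one simple way to let the options be used individually or in combination; a production subsetter working on archived granules would typically push these constraints into the catalog query or file reader instead of filtering in memory.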
Examples of subsetting tools include:
ASDC Search and Subset Web Applications
Simple Subset Wizard (SSW)

Data Modeling Tools

Tools that enable data users to model specific events, based on what has happened in the past under certain circumstances, to better predict future Earth science phenomena. Examples include:
Grid Analysis and Display System (GrADS): An interactive desktop tool that offers easy access to, manipulation of, and visualization of Earth science data. It offers two data models for handling gridded and station data and supports many data file formats (NetCDF, HDF4, HDF5, etc.).
Multi-Instrument Inter-Calibration (MIIC II): A software framework that provides better access to distributed data for inter-calibration by finding and acquiring matched samples for instruments on separate spacecraft. It supports both low Earth orbit (LEO) and geosynchronous (GEO) satellite instruments.

The Earth science data ingested, archived, and distributed at NASA data centers do not readily lend themselves to prescriptive data analytics, as NASA data centers focus on the stewardship and quality of the data. It is likely that the organizations that use NASA data have applied their own prescriptive analytics tools to determine the best course of action based on their findings.

Data Conversion Tools

Tools that help get data into a readable format in order to describe what happened. Examples include: HDF tools such as HDUMP, a utility program that reads any gridded MISR data written in the HDF-EOS grid format; HDFView, for converting images from GIF, JPG, BMP, and PNG to the HDF format and back; HDF to NetCDF; and more.

Conclusion

Though analytics tools exist for Earth science data, very few of those tools are available off-the-shelf. Analytics tools developed in-house often require several modifications before they can be used with data for which they were not specifically designed. For example, subsetters that search for specific parameters are created for specific datasets.
The framework, however, can often be applied to other datasets after modifications are made. In the future, we are hopeful that more off-the-shelf tools will be available to analyze valuable Earth science data more quickly and to limit the time needed to modify tools.

References

Types of Analytics: http://www.informationbuilders.es/intl/co.uk/presentations/four_types_of_analytics.pdf
Subsetting: https://subset.larc.nasa.gov/
GrADS: http://iges.org/grads/
ODISEES: http://adsabs.harvard.edu/abs/2013AGUFMIN52B..01H
HDF Tools: http://www.hdfgroup.org/products/hdf4_tools/toolsbycat.html
UV-CDAT: http://uvcdat.llnl.gov/mission.html

Acknowledgements

A special thanks to Steve Kempler of Goddard Space Flight Center (GSFC). Without his keen interest in Earth science data analytics and his boldness in venturing into its newly charted waters, we would not be as far along the path as we are now.