Cloud Computing for the NASA Atmospheric Sciences Data Center with Amazon Web Services

Jonathan Gleason and Mike Little
NASA Langley Research Center, Hampton, VA

[Figure: Pilot Interface Project Lifecycle]

The Atmospheric Science Data Center Introduction

The Atmospheric Science Data Center (ASDC) is located at NASA Langley Research Center in the Science Directorate. The Science Directorate’s Climate Science Branch, Atmospheric Composition Branch, and Chemistry and Dynamics Branch work with the ASDC to study changes in the Earth and its atmosphere. Data products translate those findings into meaningful knowledge that inspires action by scientists, educators, decision makers, and the public.

The ASDC archives and distributes datasets from NASA spaceborne and aircraft-based instruments relating to the study of Earth’s Radiation Budget, Clouds, Aerosols, and Tropospheric Chemistry. The ASDC provides compute services for processing production science data products for the Clouds and Earth’s Radiant Energy System (CERES) and Multi-angle Imaging SpectroRadiometer (MISR) science teams.
The ASDC archives include data from the following instruments and missions:
- CERES Instrument Flight Models 1-5 on board the Terra, Aqua and NPP spacecraft
- The MISR instrument on the Terra spacecraft
- The Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP) and Imaging Infrared Radiometer (IIR) instruments on the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observations (CALIPSO) spacecraft
- Data from multiple operational weather satellites as produced by the International Satellite Cloud Climatology Project (ISCCP)
- The Tropospheric Emissions Spectrometer (TES) on the Aura spacecraft
- The Measurements Of Pollution In The Troposphere (MOPITT) instrument on the Terra spacecraft
- Aircraft-based instrument measurements collected during the DISCOVER/AQ, AirMISR, INTEX-A&B and many other field campaigns

The pilot business interface maximizes potential data sharing and minimizes user access constraints by using AWS public cloud resources instead of restricting users to the AWS GovCloud (US) Region.

Testing Goals

Primary services for testing are:
- Amazon Simple Storage Service (S3)
- Amazon Elastic Compute Cloud (EC2)
- Amazon Simple Queue Service (SQS)

Goal 1: Understand how to create and use AWS tags effectively so that incurred costs can be tracked accurately.
Goal 2: Be able to pre-calculate the cost of a given processing effort and map it to the actual costs incurred afterward. Verify that the charges AWS actually bills match the user’s understanding of the advertised rates.
Goal 3: Execute a single CERES PGE use case scenario on the cloud, retrieve the data from the cloud, and identify all costs incurred.
Goal 4: Understand the capabilities and restrictions of AWS and of the LITES business interface implementation, and their applicability to Langley Science Directorate compute scenarios.
Goal 5: Characterize data transfer performance between the Langley campus network and AWS S3 in both directions, and determine the feasibility of a just-in-time data upload approach.
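As an illustration of Goals 2 and 5, the following is a minimal sketch of how a processing effort might be costed in advance and how a transfer time might be estimated. It is not the method used in the pilot. The `Rates`, `Job`, `estimate_cost`, `transfer_hours` and `relative_difference` names are hypothetical, the cost model (instance hours, stored gigabyte-months, gigabytes downloaded) is a simplifying assumption, and all unit prices are placeholders that must be filled in from the currently advertised AWS rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Placeholder unit prices in dollars; these are NOT actual AWS rates."""
    ec2_per_instance_hour: float
    s3_per_gb_month: float
    transfer_out_per_gb: float


@dataclass(frozen=True)
class Job:
    """Hypothetical description of one processing effort."""
    instances: int
    hours_per_instance: float
    stored_gb: float
    months_stored: float
    downloaded_gb: float


def estimate_cost(job: Job, rates: Rates) -> dict[str, float]:
    """Pre-calculate compute, storage and download cost for a job."""
    compute = job.instances * job.hours_per_instance * rates.ec2_per_instance_hour
    storage = job.stored_gb * job.months_stored * rates.s3_per_gb_month
    transfer = job.downloaded_gb * rates.transfer_out_per_gb
    return {
        "compute": compute,
        "storage": storage,
        "transfer": transfer,
        "total": compute + storage + transfer,
    }


def transfer_hours(size_gb: float, throughput_mbit_s: float) -> float:
    """Hours needed to move size_gb at a sustained throughput.

    Uses decimal units: 1 GB = 8000 megabits.
    """
    if throughput_mbit_s <= 0:
        raise ValueError("throughput must be positive")
    return size_gb * 8000 / throughput_mbit_s / 3600


def relative_difference(estimated: float, actual: float) -> float:
    """Signed fraction by which the billed cost differs from the estimate."""
    if estimated == 0:
        raise ValueError("estimated cost must be non-zero")
    return (actual - estimated) / estimated
```

After a run, the `total` from `estimate_cost` can be compared with the billed amount for the resources carrying the project's tag by passing both to `relative_difference`. A measured throughput from the campus network can be passed to `transfer_hours` to judge whether a just-in-time upload would finish within the processing schedule.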
The Atmospheric Science Data Center Quick Facts

- Located within the Science Directorate at NASA Langley Research Center
- Archives and distributes scientific datasets relating to the Earth’s Radiation Budget, Clouds, Aerosols, and Tropospheric Chemistry fields of study
- Data holdings from spaceborne instruments including CERES, MISR, CALIPSO, ISCCP, SAGE III, MOPITT, and TES, and from aircraft field campaigns including DISCOVER/AQ, AirMISR, and INTEX-A&B
- Processes production science data products for the CERES and MISR science teams
- Distributes over 300 unique science data products
- In 2012, 590 Terabytes of data were distributed to over 142,000 customers in 160 countries
- Data holdings exceeded 2.41 Petabytes as of December 2012
- Over 88 million files (1.874 Terabytes) maintained on high-speed disk for quick access

Introduction

NASA science and engineering efforts rely heavily on compute and data handling systems. NASA science data is not restricted to NASA users; it is widely shared across a globally distributed user community that includes scientists, educators, policy decision makers, and the public. NASA science computing is therefore a candidate use case for cloud computing, in which compute resources are outsourced to an external vendor. Amazon Web Services (AWS) is a commercial cloud computing service developed to use excess computing capacity at Amazon. It potentially provides an alternative to costly and possibly underutilized dedicated acquisitions whenever NASA scientists or engineers require additional data processing. AWS aims to provide a simplified avenue for NASA scientists and researchers to share large, complex data sets with external partners and the public. AWS has been used extensively by JPL for a wide range of computing needs and was previously tested on a NASA Agency basis during the Nebula testing program.
The ability of AWS to support the Langley Science Directorate still needs to be evaluated against real-world operational needs across NASA, along with the maturity that such integration would bring. The strengths and weaknesses of this architecture, and its ability to support general science and engineering applications, were demonstrated during the previous testing. The Langley Office of the Chief Information Officer, in partnership with the Atmospheric Science Data Center (ASDC), has established a pilot business interface for using AWS cloud computing resources on an organization-level and project-level pay-per-use model. This poster discusses an effort to evaluate the feasibility of the pilot business interface from a project-level perspective, using a processing scenario from the Clouds and Earth’s Radiant Energy System (CERES) project.

CERES Production Processing

The ASDC provides production processing compute services for the CERES team in order to produce the CERES data products. This processing scenario was selected to test the pilot AWS business interface because it involves multiple interdependent processing streams; in many cases, the output of one stream serves as input to several others. Production processing uses configuration-controlled software versions that undergo relatively few updates compared to interactive science software processing. The pseudo-static nature of this software makes it an ideal scenario for testing and for identifying a process for configuration control of cloud virtual machine images.

CERES is a NASA broadband radiometer with five flight models flying on the Terra and Aqua missions of NASA’s Earth Observing System as well as the Suomi National Polar-orbiting Partnership (NPP) platform.
The CERES science team integrates data collected by the five CERES instruments with data from nine additional space-based instruments, including the Moderate Resolution Imaging Spectroradiometer (MODIS) and the Visible Infrared Imaging Radiometer Suite (VIIRS) instruments, to produce Climate Data Records of the Earth’s radiation budget.

[Figure: First light incoming solar radiation from CERES on Suomi NPP]
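The interdependent processing streams described under CERES Production Processing can be pictured as a dependency graph, where a stream may run only after the streams that produce its inputs have finished. The sketch below shows one way to derive a valid run order using Python's standard `graphlib` module. It is an illustration only, not the ASDC production scheduler: the `execution_batches` function and the stream names in the usage note are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_batches(inputs_of: dict[str, set[str]]) -> list[list[str]]:
    """Group streams into batches that can run in order.

    inputs_of maps each stream name to the set of streams whose output
    it consumes. Every stream in a batch depends only on streams from
    earlier batches, so the members of one batch could run in parallel.
    """
    sorter = TopologicalSorter(inputs_of)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable order
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock dependents
    return batches
```

For example, with the hypothetical mapping `{"stream_b": {"stream_a"}, "stream_c": {"stream_a"}, "stream_d": {"stream_b", "stream_c"}}`, `stream_a` runs first, `stream_b` and `stream_c` can then run side by side, and `stream_d` runs last. Each batch could, for instance, be submitted as a group of work items to a queue before the next batch is released.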