User-Friendly Website for Accessing NOAA/ESRL Chemical Sciences Division (CSD) Tropospheric Airborne Data Archive Ken Aikin - NOAA CSD and CIRES, University of Colorado J. Cozic, photo ESIP Meeting: Copper Mountain, July, 2014
Outline I. Motivation: To make the Tropospheric Chemistry Group data archive accessible to the scientific community. II. Data formats used in the archive III. Features of the website that facilitate ease of access to data IV. Making the data more widely available V. Future development
Tropospheric Chemistry Data Archive The Tropospheric Chemistry Group has conducted intensive field missions using mobile- and ground-based platforms for over 20 years to study air pollution, air quality, and climate change. The data constitutes a unique and rich set of measurements. Instruments on the aircraft make in-situ measurements of a comprehensive suite of parameters, including meteorological parameters, trace chemical species, and aerosol size and composition.
NOAA P-3 Data Coverage
Data Archive Properties The challenge is how to give users convenient access to the data acquired during field missions. Each field mission may have 20 or more flights, and there are 20 or more instruments on the aircraft. Each instrument produces its own data file, sometimes containing many species. Instruments report at different time resolutions, mainly 1 Hz. Important exception: flask samples measure dozens of species with their own start/stop times. Data is submitted to me and incorporated into the data archive. ★ Data is already time-aligned to a “master” time base.
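Because every instrument file shares the master time base, combining instruments for one flight reduces to a lookup by time stamp. The sketch below illustrates that idea only; it is not the archive's actual merge code. The variable names (`CO_ppbv`, `O3_ppbv`), the sample values, and the use of -9999 as the fill value for gaps are assumptions made for the example.

```python
MISSING = -9999.0  # assumption: ICARTT-style missing-data flag used as fill value


def merge_on_master_time(master_times, instruments):
    """Build merged rows keyed on a shared master time base.

    master_times: list of time stamps (seconds from midnight).
    instruments:  dict mapping a variable name to a {time: value} dict.
    Returns (column_names, rows); times an instrument did not report
    are filled with MISSING.
    """
    names = sorted(instruments)
    rows = []
    for t in master_times:
        row = [t] + [instruments[name].get(t, MISSING) for name in names]
        rows.append(row)
    return ["Time"] + names, rows


# Hypothetical two-instrument flight segment; O3 has a one-second gap.
cols, rows = merge_on_master_time(
    [100, 101, 102],
    {
        "CO_ppbv": {100: 120.5, 101: 121.0, 102: 119.8},
        "O3_ppbv": {100: 45.2, 102: 46.0},
    },
)
```

Keying on exact time stamps works only because the data is already aligned; unaligned inputs would first need interpolation or averaging.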
File formats Data in the archive is stored in ICARTT format and in Igor binary format. ICARTT: Adopted by NOAA, NASA, and NCAR. File header includes metadata: data revision, measurement uncertainties, specific issues for a particular flight, and information on the instrument making the measurement. Text format is easy to read and load into analysis software. Flexible format for different types of data. File-format checkers and format documentation are available online at the NOAA and NASA websites. Igor: Extra convenience to users of Igor Pro software. Still contains full metadata in header.
ICARTT File Format Pros: Metadata and the actual data are in the same file, so they can’t get separated; easy data revision tracking; easy format to read; standard format adopted by NOAA, NASA, and NCAR aircraft missions. Cons: Some header parameters are not entered in a consistent, machine-readable format; some PIs don’t fill in all the header parameters; not as widely used as NetCDF; the standard model for one independent variable (time) doesn’t quite fit when an instrument uses start/stop times.
Data Products Available on Website Files that can be downloaded or viewed online: Individual files for each instrument for a flight; all files for one instrument for all flights; merged files containing all measurements for one flight all in one file; merged files averaged to flask-sample start/stop times; KML files viewable in Google Earth; static time-series plots for “quick-look” data; map of flight tracks, model-emissions data, satellite images, point sources of emissions.
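The merged files averaged to flask-sample start/stop times can be pictured as a windowed mean over the 1 Hz data. The following is a minimal sketch under stated assumptions, not the code used to build the archive's merge files: the function name, the half-open window convention (start inclusive, stop exclusive), and the -9999 missing-data flag are all choices made for illustration.

```python
MISSING = -9999.0  # assumption: missing-data flag in the 1 Hz series


def average_to_windows(times, values, windows):
    """Average a 1 Hz series over each (start, stop) sampling window.

    Points with start <= t < stop contribute; missing points are skipped.
    A window with no valid points yields MISSING.
    """
    results = []
    for start, stop in windows:
        valid = [
            v for t, v in zip(times, values)
            if start <= t < stop and v != MISSING
        ]
        results.append(sum(valid) / len(valid) if valid else MISSING)
    return results


# Hypothetical 1 Hz data and three flask windows (the last has no data).
times = [0, 1, 2, 3, 4, 5]
values = [10.0, 12.0, MISSING, 14.0, 20.0, 22.0]
windows = [(0, 4), (4, 6), (10, 12)]
averages = average_to_windows(times, values, windows)
```

Skipping missing points rather than treating the flag as a number matters here: including -9999 in a mean would silently corrupt the averaged value.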
Data ID listed for each instrument for a flight
Highlights of Website User-friendly features of the website that have generated positive feedback: Immediate download of a subset of files (chemical, aerosol, etc., or all files) with one click of an icon (files for downloading are zipped dynamically on the web server); automatic display of the current data revision on the data-download page; a data policy that tries to balance responsibility of users with ease of access; consistent file names and variable names across missions; at-a-glance assessment of data availability for a flight. “Under the hood”: PHP allows automation of web pages.
Users of Our Data Expanding the user base of the data archive: field mission participants; other atmospheric scientists; modelers (CCMI, the Chemistry-Climate Model Initiative). CCMI Goals: Evaluate global CCM simulations using integrated in-situ observations from research aircraft. Improve online access to observational data sets that have been evaluated for quality and formatted for consistency to facilitate model-data intercomparison studies.
Future Development Still need to work on “discoverability”: Submit to more data centers (the data archive is currently in NASA’s GCMD). Implement data DOIs. Offer data in NetCDF format? Develop online data visualization tools? Before beginning work on the tools, we need to determine whether there is an audience for them.
Summary I. The Tropospheric Chemistry Group has a rich data archive composed of files from mobile- and ground-based platforms. Tools are needed to access and serve this data archive to the scientific community. II. We developed the website with features that ease access to the data and that users appreciate. Those features include: time-aligned data; the ability to download subsets of files all at once; automatic display of the current data revision on the data-download page; a data policy that balances responsibility of users with ease of access; consistent file names and variable names across missions.
ICARTT File Example p. 1 41, 1001 Ken Aikin NOAA ESRL Chemical Sciences Division various aircraft sensors SENEX , , 06, 18, 2013, 11, 22 1 AOCTimewave, seconds from midnight 8 1, 1, 1, 1, 1, 1, 1, , -9999, -9999, -9999, -9999, -9999, -9999, Attack, degrees CabinPrs, mb GndSpd, m/s Heading, degrees Pitch, degrees Roll, degrees Slip, degrees TrueAirSpd, m/s 1 SPECIAL COMMENTS: Aborted St. Louis flight. Flaps light came on. Data gaps: none 18 PI_CONTACT_INFO: NOAA ESRL Chemical Sciences Division, 325 Broadway, Boulder, CO PLATFORM: NOAA WP-3D
ICARTT File Example p. 2 LOCATION: The aircraft was stationed in Smyrna, Tennessee. Aircraft location... ASSOCIATED_DATA: See Aircraft Position file INSTRUMENT_INFO: Various instruments run by AOC DATA_INFO: Units: Attack: degrees; CabinPrs: millibar; Ground Speed (GndSpd): m/s; UNCERTAINTY: Attack: ±0.2 deg; CabinPrs: N/A; GndSpd: ±3.4m/s; Heading: ±0.5 deg; ULOD_FLAG: ULOD_VALUE: N/A LLOD_FLAG: LLOD_VALUE: N/A DM_CONTACT_INFO: Data Manager:Ken Aikin; PROJECT_INFO: SENEX 2013 Study; June 3 - July 10, 2013 STIPULATIONS_ON_USE: Use of these data require prior OK from PI. OTHER_COMMENTS: None REVISION: 0 R0: Final Data. AOCTimewave, Attack, CabinPrs, GndSpd, Heading, Pitch, Roll, Slip, TrueAirSpd , 1.9, 995.8, 68.8, 320.8, 0.90, -0.78, 0.7, , 4.2, 996.7, 69.9, 320.8, 3.86, -0.49, 1.8, , 6.4, 997.2, 70.7, 321.2, 6.62, -0.38, -0.5, , 6.2, 997.3, 71.1, 322.2, 7.33, -0.08, -1.7, , 6.1, 998.5, 71.4, 323.2, 7.84, 0.44, -2.9, 71.2
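The example above shows why the text format is easy to load: the first line gives the number of header lines and the file format index (41 and 1001 here), the last header line lists the column names, and the data rows follow as comma-separated values. A minimal reader along those lines might look like the sketch below. It is an illustration, not a full ICARTT parser: the function name is hypothetical, the sample header is abbreviated to four lines and is not a valid ICARTT header, and a single -9999 missing-data flag is assumed for every column, whereas a real file declares the flags per variable in its header.

```python
import io


def read_icartt(stream, missing=-9999.0):
    """Read column names and data rows from an ICARTT-style text stream.

    Assumes line 1 is "<number of header lines>, <format index>" and that
    the last header line holds the comma-separated column names.
    Values equal to `missing` are returned as None.
    """
    lines = stream.read().splitlines()
    n_header = int(lines[0].split(",")[0])
    names = [name.strip() for name in lines[n_header - 1].split(",")]
    records = []
    for line in lines[n_header:]:
        if not line.strip():
            continue  # tolerate blank trailing lines
        values = [float(field) for field in line.split(",")]
        records.append(
            {n: (None if v == missing else v) for n, v in zip(names, values)}
        )
    return names, records


# Abbreviated, hypothetical sample: a real header carries many more lines.
SAMPLE = """4, 1001
(abbreviated header line)
(abbreviated header line)
Time, Attack, CabinPrs
100, 1.9, 995.8
101, -9999, 996.7
"""

names, records = read_icartt(io.StringIO(SAMPLE))
```

Reading the header length from the file itself, rather than hard-coding it, is what lets one reader handle files whose comment sections differ in length from flight to flight.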
Submitting Data to the NOAA Data Centers The Challenges: Non-uniform data (in time and location) may not fit into the NOAA Data Center model. Different studies require different instrumentation. Need a mechanism for revising files. Need to determine whether to submit data for entire archive or by project. Does the data fit with an existing NOAA Data Center? It’s unclear whether data listed with a NASA Data Center would qualify for a data DOI from NOAA. There may be issues with maintenance of links and common look-and-feel of web pages, etc. Data centers greatly increase the visibility of data sets, and registration is required to obtain a data DOI.
Website construction Faceted search page provides access to the data. Slight technical digression: the PHP-driven website allows automation and is low-maintenance. The PCL-zip (PHP-based) library dynamically zips requested data for immediate download.
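The dynamic zipping step can be sketched in a few lines. The website does this in PHP with the PCL-zip library; the Python version below is a stand-in that shows the same idea with the standard `zipfile` module and is not the site's code. The function name, file names, and file contents are hypothetical.

```python
import io
import zipfile


def zip_in_memory(files):
    """Bundle files into a zip archive without writing to disk.

    files: dict mapping an archive member name to its bytes.
    Returns the complete zip archive as bytes, ready to send to a client.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, payload in files.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


# Hypothetical request for two instrument files from one flight.
bundle = zip_in_memory(
    {
        "flight01_instrumentA.ict": b"4, 1001\n...",
        "flight01_instrumentB.ict": b"4, 1001\n...",
    }
)

# Reopen the archive to confirm what the user would receive.
with zipfile.ZipFile(io.BytesIO(bundle)) as check:
    member_names = check.namelist()
```

Building the archive per request means no pre-built zip files need to be stored or kept in sync when a data revision replaces a file.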