A User’s Perspective on Acquisition and Management of CMIP5 Data

Jennifer Miletta Adams, George Mason University / COLA
ESGF2F, December 2014

[Abstract] The complexity, volume, and distributed nature of the CMIP5 data collection have left many users struggling to acquire the CMIP5 data they need. This presentation outlines strategies that were developed to overcome the challenges CMIP5 data users face: authentication, searching for published data that match a list of desired experiments and variables, acquisition of wget scripts, managing wget script execution and the high wget failure rate, retention of critical metadata not present in the data files, version control, local data management, and setting up the data for analysis and visualization using GrADS. All these strategies exist in an automated workflow that is completely independent of any browser interface.

COLA’s CMIP5 Data Collection
Automated record keeping of acquired datasets began in February 2012. Periodic checks on disk volume are highlighted in red.

COLA’s CMIP5 Data Collection
Reconstructed history of CMIP5 acquisition.

Workflow Requirements
- No browser interface
- Script-based
- Flexible
- Automated
- Runs in a UNIX environment

Workflow Elements
1. Create list of desired data: “All available models and ensembles for a subset of experiments, realms, frequencies, and variables”
2. Keep track of what has already been acquired
3. Identify what data are available
4. Get needed data
5. Make data user-friendly
This workflow was designed during a time when new CMIP5 data sets were constantly being published; the available data was always expanding.

Programmatic View of Workflow
    while (1) {
        list(acquired);
        for (desired) {
            search(available);
            for (available) {
                if (!acquired) needed;
            }
            download(needed);
        }
    }
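The pseudocode above can be read as one pass of a loop that is repeated as new data are published. The following is a minimal Python sketch of a single pass, not the scripts used at COLA. The names `run_once`, `list_acquired`, `search`, and `download` are hypothetical placeholders for the record keeping, search, and wget steps described in the later slides.

```python
def run_once(desired, list_acquired, search, download):
    """One pass of the workflow: find what is available but not yet acquired.

    desired        -- iterable of request descriptions (hypothetical format)
    list_acquired  -- callable returning dataset IDs already on local disk
    search         -- callable mapping one request to available dataset IDs
    download       -- callable that fetches a list of dataset IDs
    Returns the list of dataset IDs passed to download during this pass.
    """
    acquired = set(list_acquired())
    fetched = []
    for request in desired:
        # Keep only datasets that are available but not yet acquired.
        needed = [ds for ds in search(request) if ds not in acquired]
        if needed:
            download(needed)
            acquired.update(needed)
            fetched.extend(needed)
    return fetched
```

An outer loop (the `while (1)` on the slide) would call `run_once` on a schedule so that newly published datasets are picked up on a later pass.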

Keep Track of Acquired Data
Workflow Element #2
Local data management structure; 11 keywords are required:
/data/cmip5/Experiment/Realm/Frequency/MIP-Table/Variable/Institute.Model/Ensemble/Version/datafiles.nc
11 keywords uniquely identify each data set (the product keyword, ‘output1/output2’, is not retained). Not all keywords are present in the data file name or in the data file attributes, so this metadata is retained in the directory path. Version control: the version number is not present in a CMIP5 data file, but it is available during the wget script acquisition process, so it must be deliberately retained in the local directory structure.
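As an illustration of keeping metadata in the directory path, here is a small Python sketch that assembles a dataset directory from its keywords. The function name `local_dataset_dir` is hypothetical, the root `/data/cmip5` and the component order are assumptions based on the slide's listing, and the version string in the usage note is invented for the example.

```python
from pathlib import PurePosixPath

# Assumed root of the local collection; adjust to the local layout.
ROOT = PurePosixPath("/data/cmip5")


def local_dataset_dir(experiment, realm, frequency, mip_table, variable,
                      institute, model, ensemble, version, root=ROOT):
    """Return the directory that holds the .nc files of one dataset.

    The version is part of the path because it is not stored in the
    data files themselves.
    """
    return (root / experiment / realm / frequency / mip_table / variable
            / f"{institute}.{model}" / ensemble / version)
```

For example, `local_dataset_dir("piControl", "atmos", "mon", "Amon", "clt", "NCAR", "CCSM4", "r1i1p1", "v20120203")` gives `/data/cmip5/piControl/atmos/mon/Amon/clt/NCAR.CCSM4/r1i1p1/v20120203`. Listing these directories is one way to build the "acquired" list, since every keyword can be read back from the path.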

Discovery of Available Data
Workflow Element #3
Build a Dataset search URL:
http://pcmdi9.llnl.gov/esg-search/search?type=Dataset
  &latest=true
  &replica=false
  &facets=id
  &limit=0
  &project=CMIP5
  &experiment=piControl
  &realm=atmos
  &time_frequency=mon
  &cmor_table=Amon
  &variable=clt&variable=hfls….&variable=vas
Using keywords from the ‘desired’ text file (format not shown here; keywords highlighted in pink on the slide), build a URL to search for datasets that match the experiment description. The results of this search are compared to the list of what has been acquired to determine what needs to be downloaded.
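A search URL of this shape can be assembled with the Python standard library. This is only a sketch: the host and parameter names are copied from the slide, while the function name `build_dataset_search_url` is hypothetical. The code builds the string and does not contact any server.

```python
from urllib.parse import urlencode

# Host and path as shown on the slide.
SEARCH_BASE = "http://pcmdi9.llnl.gov/esg-search/search"


def build_dataset_search_url(experiment, realm, time_frequency, cmor_table,
                             variables, base=SEARCH_BASE):
    """Build a Dataset search URL for one entry of the 'desired' list."""
    params = [
        ("type", "Dataset"),
        ("latest", "true"),
        ("replica", "false"),
        ("facets", "id"),
        ("limit", "0"),
        ("project", "CMIP5"),
        ("experiment", experiment),
        ("realm", realm),
        ("time_frequency", time_frequency),
        ("cmor_table", cmor_table),
    ]
    # The variable keyword is repeated once per requested variable.
    params.extend(("variable", v) for v in variables)
    return base + "?" + urlencode(params)
```

Calling `build_dataset_search_url("piControl", "atmos", "mon", "Amon", ["clt", "hfls"])` reproduces the slide's URL for the first two variables; the rest of the variable list is elided on the slide and is not guessed here.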

Download Needed Data
Workflow Element #4
- Build a file search URL to determine the number of files for each data set
- Build a wget URL to download wget scripts; then give them unique names
- Keep authentication certificates up-to-date
- Monitor execution of wget scripts in a staging area
- Put files in place under the local directory structure
Downloading needed data has its own sub-workflow.
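Monitoring wget script execution largely means coping with failures; the closing slide of this talk summarizes the approach as "just keep trying". Below is a minimal retry sketch under that assumption. The function name `run_until_success`, the attempt limit, and the delay are hypothetical choices, not the settings used in the original workflow.

```python
import subprocess
import time


def run_until_success(cmd, max_attempts=5, delay_seconds=0.0,
                      run=subprocess.run):
    """Run a command until it exits with status 0.

    Returns the number of attempts used, or None if every attempt failed.
    The `run` argument exists so the runner can be replaced in tests.
    """
    for attempt in range(1, max_attempts + 1):
        result = run(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < max_attempts:
            # Pause before retrying; the reason for the failure is not inspected.
            time.sleep(delay_seconds)
    return None
```

A call such as `run_until_success(["bash", "wget-script.sh"], max_attempts=10, delay_seconds=60)` would rerun a downloaded script (the file name here is made up) until it succeeds or the limit is reached. A `None` result is a natural signal for adding the data node to a blacklist.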

Make Data User-Friendly
Workflow Element #5
- Create GrADS descriptor files
  - Aggregate files over time dimension
  - Make use of ensemble dimension when appropriate
  - Identify missing or overlapping time periods
  - Assign non-standard dimensions (e.g. basin averages)
  - Handle 365-day calendars
- Interpolate data on non-rectilinear grids
  - For ocean and sea ice realms
  - ESMF’s RegridWeightGen generates the interpolation weights
  - Rotate vector fields from grid-relative to Earth-relative coordinates before interpolation
After the data files have been collected and placed in position, there is still more work to do to make them easy to use with GrADS.
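One of the steps listed above, identifying missing or overlapping time periods, can be sketched from file names alone. This example assumes monthly files whose names end in `_YYYYMM-YYYYMM.nc`, as in the CMIP5 file naming convention. The function names are hypothetical, and the file names in the usage note are made up.

```python
import re

# Matches the trailing time range of a monthly file name.
_TIME_RANGE = re.compile(r"_(\d{6})-(\d{6})\.nc$")


def _month_index(yyyymm):
    """Convert 'YYYYMM' to a running month count."""
    return int(yyyymm[:4]) * 12 + int(yyyymm[4:]) - 1


def check_time_coverage(filenames):
    """Return a list of ('gap' | 'overlap', earlier_file, later_file)."""
    spans = []
    for name in filenames:
        match = _TIME_RANGE.search(name)
        if match is None:
            raise ValueError(f"no monthly time range in file name: {name}")
        spans.append((_month_index(match.group(1)),
                      _month_index(match.group(2)), name))
    spans.sort()
    problems = []
    if not spans:
        return problems
    covered_end, last_name = spans[0][1], spans[0][2]
    for start, end, name in spans[1:]:
        if start > covered_end + 1:
            problems.append(("gap", last_name, name))
        elif start <= covered_end:
            problems.append(("overlap", last_name, name))
        # Track the file that extends the covered period furthest.
        if end > covered_end:
            covered_end, last_name = end, name
    return problems
```

Given files covering 000101-000512, 000601-001012, 001201-001512, and 001501-002012, the function reports a gap between the second and third files and an overlap between the third and fourth.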

Complications | Solutions
Version number not with data | Retained during wget script acquisition
1000 file limit per wget script | Please minimize file granularity!
User authentication | Automated with MyProxyClient
Errors from wget | Never mind why, just keep trying. Failure is an option.
Some data nodes are friendlier than others | Data node blacklist
Missing or overlapping data | DO NOT hide missing data with a non-linear time axis!
Rotation of grid-relative vectors | Please publish gridspec files!
Data on wacky grids | ESMF’s RegridWeightGen

Special thanks to: Luca Cinquini, Estani Gonzalez, Gavin Bell, Lawson Hanson, and the CMIP5 Helpdesk!