Successful Strategies for Overcoming the Obstacles in Acquisition, Management, and Analysis of CMIP5 Data Jennifer Miletta Adams IGES/COLA AMS 2013.


Workflow Requirements
- No,,,, et al.
- Script-Based
- Flexible
- Automated
- Runs in a UNIX environment

Workflow Elements
1. Create list of desired data: all available models and ensembles for a subset of experiments, realms, frequencies, and variables
2. Keep track of what has already been acquired
3. Identify what is available
4. Get needed data
5. Make data user-friendly

Specification of Desired Data
    piControl/atmos/mon/Amon/ clt hfls hfss hurs pr prsn prw ps psl rlds rlus rlut rlutcs rsdt rsut rsutcs tas uas vas
    piControl/atmos/day/day/ clt hfls hfss huss pr prsn psl rlds rlus rlut tas uas vas
    piControl/ocean/mon/Omon/ msftmyz psu rhopoto so thetao tos uo vo
    piControl/ocean/day/day/ tos
    piControl/land/mon/Lmon/ evspsblsoi evspsblveg lai mrro mrso mrsos tran
    piControl/land/day/day/ mrsos
    piControl/landIce/mon/LImon/ snc snw
    piControl/seaIce/mon/OImon/ sic sit
    historical/atmos/mon/Amon/ clt hfls hfss hurs pr prsn prw ps psl rlds rlus rlut rlutcs rsdt rsut rsutcs tas uas vas
    historical/atmos/day/day/ clt hfls hfss huss pr prsn psl rlds rlus rlut tas ua uas va vas
    historical/atmos/fx/fx/ sftlf
    historical/ocean/mon/Omon/ msftmyz psu tauuo tauvo tos
    historical/ocean/day/day/ tos
    historical/land/mon/Lmon/ cropFrac evspsblsoi evspsblveg lai mrro mrso mrsos tran
    historical/land/day/day/ mrsos
    historical/landIce/mon/LImon/ snc snw
    historical/seaIce/mon/OImon/ sic sit transix transiy

Keep Track of Acquired Data I
Local CMIP5 data files are stored under the following directory structure (10 keywords):
    /cmip5/data/Experiment/Realm/Frequency/MIP-Table/Variable/Institute.Model/Ensemble/Version/datafiles.nc

Keep Track of Acquired Data II
- Use the “find” command to create a master list of all subdirectory names under /cmip5/data
- Updated daily, this master list shows all acquired data residing on local disk
- Users can sort or filter this list to discover whether the data they need has been acquired
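A minimal sketch of how such a master list could be built, assuming the eight directory levels below /cmip5/data described on the previous slide and a `find` that supports `-mindepth`/`-maxdepth` (GNU and BSD do). The function name `make_master_list` is hypothetical, and listing only the Version-level directories is a simplification of "all subdirectory names".

```shell
#!/bin/sh
# Hypothetical helper: print one line per Version-level directory under a data root.
# Depth 8 = Experiment/Realm/Frequency/MIP-Table/Variable/Institute.Model/Ensemble/Version
make_master_list() {
    root=$1
    # Run find from inside the root so the printed paths are relative to it.
    ( cd "$root" && find . -mindepth 8 -maxdepth 8 -type d | sed 's|^\./||' | sort )
}
```

A daily cron job could write the output to a file, for example `make_master_list /cmip5/data > master_list.txt`, and users could then filter it with something like `grep '^piControl/atmos/mon/Amon/clt/' master_list.txt`.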

Discovery of Available Data I
Build a Dataset Search URL:
    &latest=true
    &replica=false
    &facets=id
    &limit=0
    &project=CMIP5
    &experiment=piControl
    &realm=atmos
    &time_frequency=mon
    &cmor_table=Amon
    &variable=clt&variable=hfls….&variable=vas
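The search URL can be assembled from one line of the data specification. The sketch below is illustrative only: the function name `build_dataset_url` is hypothetical, and `ESGF_SEARCH` is a placeholder for the search endpoint, which is not given in this transcript and must be supplied by the user. The query parameters are the ones listed on the slide.

```shell
#!/bin/sh
# Placeholder endpoint (an assumption): set ESGF_SEARCH to the real search service.
ESGF_SEARCH=${ESGF_SEARCH:-"https://esgf-node.example/esg-search/search"}

# build_dataset_url SPEC VAR [VAR...]
#   SPEC is Experiment/Realm/Frequency/MIP-Table/, e.g. piControl/atmos/mon/Amon/
build_dataset_url() {
    spec=$1; shift
    experiment=$(printf '%s' "$spec" | cut -d/ -f1)
    realm=$(printf '%s' "$spec" | cut -d/ -f2)
    frequency=$(printf '%s' "$spec" | cut -d/ -f3)
    table=$(printf '%s' "$spec" | cut -d/ -f4)
    url="${ESGF_SEARCH}?latest=true&replica=false&facets=id&limit=0&project=CMIP5"
    url="${url}&experiment=${experiment}&realm=${realm}"
    url="${url}&time_frequency=${frequency}&cmor_table=${table}"
    # One &variable= parameter per requested variable.
    for v in "$@"; do
        url="${url}&variable=${v}"
    done
    printf '%s\n' "$url"
}
```

For example, `URL=$(build_dataset_url piControl/atmos/mon/Amon/ clt hfls vas)` yields a URL with three `&variable=` parameters.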

Discovery of Available Data II
Capture dataset search results into a text file called “tmp” using wget:
    wget -O tmp "$URL"
Remove debris text from “tmp” to extract relevant information:
    grep 'name=\"cmip5.' tmp > tmp2
    sed s/' tmp3
    sed s/'\">1 '//g tmp3 > tmp4
    sed s/'\">2 '//g tmp4 > result
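The cleanup step can also be done in one pass. This is a sketch under an assumption, not the presenter's exact commands: it assumes each facet entry in the search response looks like `<int name="DATASET_ID|DATA_NODE">COUNT</int>`, and it strips everything before and after the quoted name regardless of the count. The function name `extract_ids` is hypothetical.

```shell
#!/bin/sh
# Reads a search response on stdin and prints one "dataset_id|data_node" per line.
# Assumed input line format: <int name="cmip5....|node">1</int>
extract_ids() {
    grep 'name="cmip5\.' | sed -e 's/.*name="//' -e 's/">.*//'
}
```

With the response saved in a file, `extract_ids < tmp > result` would produce the list; piping wget output straight into the function avoids the intermediate files.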

Discovery of Available Data III
The result text file contains a list of dataset IDs and data nodes for all available data that match my search criteria:
    cmip5.output1.BCC.bcc-csm1-1-m.piControl.mon.atmos.Amon.r1i1p1.v |bcccsm.cma.gov.cn
    cmip5.output1.BCC.bcc-csm1-1.piControl.mon.atmos.Amon.r1i1p1.v1|bcccsm.cma.gov.cn
    cmip5.output1.BNU.BNU-ESM.piControl.mon.atmos.Amon.r1i1p1.v |esg.bnu.edu.cn
    cmip5.output1.CCCma.CanESM2.piControl.mon.atmos.Amon.r1i1p1.v |dapp2p.cccma.ec.gc.ca
    cmip5.output1.CMCC.CMCC-CM.piControl.mon.atmos.Amon.r1i1p1.v |adm07.cmcc.it
    etc.
This list of what is available is compared to the master list of what has been acquired to determine what is needed
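One way to do the comparison is with `sort` and `comm`. The sketch below assumes both lists have already been converted to the same one-entry-per-line form (the conversion between directory paths and dataset IDs is not shown in the presentation); the function name `needed_datasets` is hypothetical.

```shell
#!/bin/sh
# needed_datasets AVAILABLE_FILE ACQUIRED_FILE
#   Prints entries that are in AVAILABLE_FILE but not in ACQUIRED_FILE.
needed_datasets() {
    available_sorted=$(mktemp)
    acquired_sorted=$(mktemp)
    # comm requires sorted input; use the C locale for both steps so the ordering agrees.
    LC_ALL=C sort -u "$1" > "$available_sorted"
    LC_ALL=C sort -u "$2" > "$acquired_sorted"
    # -23 suppresses lines unique to the second file and lines common to both.
    LC_ALL=C comm -23 "$available_sorted" "$acquired_sorted"
    rm -f "$available_sorted" "$acquired_sorted"
}
```

The output is the list of what is needed, ready to drive the download step.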

Get Needed Data
a) Determine number of files for each data set
b) Download wget script, give it a unique name
c) Keep authentication certificates up-to-date
d) Execute wget script
e) Put files in proper place under /cmip5/data/

Determine Number of Files
Build a File Search URL:
    &dataset_id=cmip5.output1.NCAR.CCSM4.rcp85.v1|tds.ucar.edu
    &variable=clt&variable=hfls….&variable=vas
Extract number of files from result:
    wget -q -O - "$URL" | grep numFound
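To use the count in a script, the number itself has to be pulled out of the matching line. The sketch below assumes the response carries the count as an XML attribute of the form `numFound="N"`; the function name `num_found` is hypothetical.

```shell
#!/bin/sh
# Reads a search response on stdin and prints the first numFound value as a bare integer.
num_found() {
    sed -n 's/.*numFound="\([0-9][0-9]*\)".*/\1/p' | head -n 1
}
```

Typical use would be `nfiles=$(wget -q -O - "$URL" | num_found)`, after which `$nfiles` can be compared against the 1000-file limit.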

Download WGET Script I
Build a wget URL:
    &dataset_id=cmip5.output1.NCAR.CCSM4.rcp85.v1|tds.ucar.edu
    &limit=1000
    &variable=clt&variable=hfls….&variable=vas
If number of files > 1000, you need separate URLs:
- Append “&offset=1000” to 1st URL to get 2nd group of files
- Append “&offset=2000” to 1st URL to get 3rd group of files
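The offset rule generalizes to any file count. This is a minimal sketch, assuming the first URL already contains `&limit=1000`; the function name `wget_script_urls` is hypothetical.

```shell
#!/bin/sh
# wget_script_urls BASE_URL NFILES
#   Prints one URL per group of up to 1000 files: the base URL first,
#   then the base URL with &offset=1000, &offset=2000, and so on.
wget_script_urls() {
    base=$1
    nfiles=$2
    limit=1000
    offset=0
    while [ "$offset" -lt "$nfiles" ]; do
        if [ "$offset" -eq 0 ]; then
            printf '%s\n' "$base"
        else
            printf '%s\n' "${base}&offset=${offset}"
        fi
        offset=$((offset + limit))
    done
}
```

A data set with 2500 files produces three URLs, so three wget scripts need to be downloaded and run.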

Download WGET Script II
Build a meaningful name for wget script:
    wget.cmip5.output1.NCAR.CCSM.rcp85.mon.atmos.Amon.r1i1p1.v1.sh
Use wget to download the wget script:
    wget -q -O wgetname.sh "$URL"
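The script name can be derived from an entry in the search results. A small sketch, assuming the entry has the form `dataset_id|data_node` as in the result list; the function name `wget_script_name` is hypothetical.

```shell
#!/bin/sh
# wget_script_name "DATASET_ID|DATA_NODE"
#   Drops the data node and wraps the dataset ID as wget.<id>.sh
wget_script_name() {
    printf 'wget.%s.sh\n' "${1%%|*}"
}
```

When a data set needs several scripts because of the 1000-file limit, the offset could be added to the name to keep each one unique.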

User Access and Authentication
- Register with ESGF and get an OpenID and password
- Enroll in appropriate group (e.g. CMIP5 research)
- Obtain or renew certificates for user authentication
- Use the Python utility “MyProxyClient” to renew certificates automatically:
    #!/bin/bash
    export X509_CERT_DIR=$HOME/.esg/certificates
    export X509_USER_PROXY=$HOME/.esg/credentials.pem
    # The password is supplied on standard input from a file
    /usr/local/bin/myproxyclient \
        logon -s pcmdi9.llnl.gov -o "$X509_USER_PROXY" \
        -p 7512 -T -l jennifer -S < /homes/jma/pass

Execute WGET Script
Run wget script, capture all output in a log file:
    wgetname.sh -v > wget.log 2>&1 &
A wget may fail for any number of reasons:
- Data node down
- Data node throttling number of simultaneous wgets
- File not found
- Checksum failure
- Certificate expired, or authorization failed
- Connection timeout
- Forbidden
If at first you don’t succeed, try, try, try again
Failure is an option
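The "try again" advice can be automated with a small retry wrapper. This is a sketch, not part of the presented workflow: the function name `retry` is hypothetical, and it assumes the wrapped command returns a non-zero exit status on failure and is safe to rerun.

```shell
#!/bin/sh
# retry MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
#   Runs COMMAND until it succeeds or MAX_TRIES attempts have been made.
#   Returns 0 on success, 1 if every attempt failed.
retry() {
    max=$1
    delay=$2
    shift 2
    try=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$try" -ge "$max" ]; then
            return 1
        fi
        try=$((try + 1))
        sleep "$delay"
    done
}
```

For example, `retry 3 600 bash ./wgetname.sh -v >> wget.log 2>&1 &` would attempt the download up to three times, ten minutes apart, appending all output to the log. Failures such as an expired certificate still need attention before retrying helps.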

Make Data User-Friendly
- Create GrADS descriptor files
- Aggregate files over time dimension
- Make use of ensemble dimension when appropriate
- Identify missing or overlapping time periods
- Assign non-standard dimensions (e.g. basin averages or fixed fields)
- Handle 365_day calendars
- Create PDEF files for non-rectilinear grids
- For ocean and sea ice realms
- ESMF’s RegridWeightGen utility generates the interpolation weights
- Vector fields must be rotated from grid-relative to Earth-relative coordinates before interpolation

Get Additional Information About CMIP5: About ESGF: About this presentation: