ARCCSSive Scott Wales & Paola Petrelli ARCCSS CMS

Presentation transcript:

ARCCSSive
Scott Wales & Paola Petrelli, ARCCSS CMS

Aim: Provide improved access to CMIP5 data replicated at NCI

Uses:
- Search downloaded files for files matching constraints
- Compare against the ESGF catalogue for new files

Image: Bruce Miller, CSIRO http://www.scienceimage.csiro.au/image/8199 (CC-BY)

Filesystem Limits

The climate research community maintains a replica of commonly used CMIP5 files at /g/data/ua6.

The area has become disorganised, with several different organisation structures all symlinked to each other:

    authoritative
    `-- IPCC
        |-- CMIP5
        |   |-- CSIRO-BOM
        |   |-- CSIRO-QCCCE
        |   `-- UNSW
        |-- CORDEX
        |   |-- AUS-44
        |   `-- AUS-44i
        |-- GeoMIP
        |   `-- UNSW
        |-- PMIP3_CMIP5
        |   `-- UNSW
        `-- PMIP3_PMIP3
            `-- UNSW

    DRSv2_legacy/CMIP5
    |-- ACCESS1-0
    |   |-- 1pctCO2
    |   |-- abrupt4xCO2
    |   |-- amip
    |   |-- historical
    |   |-- historicalExt
    |   |-- piControl
    |   |-- rcp45
    |   `-- rcp85
    |-- ACCESS1-3

    drstree/CMIP5
    |-- GCM
    |   |-- BCC
    |   |   |-- bcc-csm1-1
    |   |   `-- bcc-csm1-1-m
    |   |-- BNU
    |   |   `-- BNU-ESM
    |   |-- CAU-GEOMAR
    |   |   `-- KCM1-2-2
    |   |-- CCCMA
    |   |   |-- CanAM4
    |   |   |-- CanCM4
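Because the same files can be reached through several symlinked trees, a script that walks the replica may see one dataset more than once. The following is a minimal sketch of one way to collapse such duplicates, by resolving every path to its physical location. It is not part of ARCCSSive: the helper name `group_by_real_path` and the throwaway directory layout are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def group_by_real_path(paths):
    """Map each physical location to the list of paths that resolve to it."""
    groups = {}
    for path in paths:
        real = os.path.realpath(path)  # follows symlinks to the final target
        groups.setdefault(real, []).append(path)
    return groups


# Demonstration on a temporary directory (hypothetical layout, not /g/data/ua6)
with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "authoritative", "tas")
    os.makedirs(target)
    os.makedirs(os.path.join(root, "drstree"))
    link = os.path.join(root, "drstree", "tas")
    os.symlink(target, link)  # a second tree pointing at the same data

    groups = group_by_real_path([target, link])
```

Both input paths end up under a single key, so downstream code can process each physical directory once.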

Basic Use

ARCCSSive is a database cataloguing all the CMIP5 files at NCI, with a Python wrapper to allow use in scripting etc.

- search_replica: find files in /g/data1/ua6
- compare_ESGF: find files on ESGF that haven't been downloaded

The scripts output CSV format files with model and location/version information.

    $ search_replica --experiment historical \
          --model ACCESS1-0 --variable tas
    $ cat search_result.txt
    historical,tas,day,ACCESS1-0,r1i1p1,v20131108,\
    /g/data/ua6/authoritative/IPCC/CMIP5/CSIRO-BOM/ACCESS1-0/historical/day/atmos/day/r1i1p1/v20131108/tas
    historical,tas,day,ACCESS1-0,r3i1p1,v20140402,\
    /g/data/ua6/authoritative/IPCC/CMIP5/CSIRO-BOM/ACCESS1-0/historical/day/atmos/day/r3i1p1/v20140402/tas

    $ compare_ESGF --experiment historical \
          --model ACCESS1-0 --variable tas
    $ cat historical.csv
    model_ensemble/variable,tas_day,tas_Amon,tas_3hr
    ACCESS1-0_r1i1p1, v20120115 | v20130124 | v20130227 | v20130529 | v20131108
    ACCESS1-0_r2i1p1, v20141119 | v20141119 latest new | , v20130726 | v20130726 latest new | ,NP
    ACCESS1-0_r3i1p1, v20140402 | v20140402 latest new | , v20140402 | v20140402 latest new | ,NP
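Since search_replica writes comma-separated records, its output can be read with Python's standard csv module. The sketch below assumes that each record sits on one line (treating the trailing backslash in the listing as slide wrapping) and that the columns are experiment, variable, MIP table, model, ensemble, version and path. Those column names are inferred from the sample output rather than documented in the slides, and `read_results` is a hypothetical helper.

```python
import csv
import io

# Column names are an assumption inferred from the sample output
FIELDS = ["experiment", "variable", "mip", "model", "ensemble", "version", "path"]

# One record copied from the slide, with the wrapped line joined back together
SAMPLE = (
    "historical,tas,day,ACCESS1-0,r1i1p1,v20131108,"
    "/g/data/ua6/authoritative/IPCC/CMIP5/CSIRO-BOM/ACCESS1-0/historical/"
    "day/atmos/day/r1i1p1/v20131108/tas\n"
)


def read_results(handle):
    """Turn each non-empty CSV row into a dict keyed by the assumed field names."""
    return [dict(zip(FIELDS, row)) for row in csv.reader(handle) if row]


records = read_results(io.StringIO(SAMPLE))
for record in records:
    print(record["model"], record["ensemble"], record["version"], record["path"])
```

For a real run, the `io.StringIO(SAMPLE)` argument would be replaced by an open handle on search_result.txt.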

Advanced Use

Advanced users are able to query the database either in Python using sqlalchemy or directly using SQL.

This allows for more complex queries, e.g. what datasets are available that have the variable 'tas' in more than two ensemble members?

We don't expect the majority of researchers to use this feature, but it allows us to create tools for specific use cases.

Directly in SQL:

    sqlite3> SELECT instances.*
             FROM (
                 SELECT model, experiment, mip, realm, count(*) AS count
                 FROM instances
                 WHERE variable = 'tas'
                 GROUP BY model, experiment, mip, realm
             ) AS inst_group
             NATURAL JOIN instances
             WHERE count > 2 AND variable = 'tas'

In Python using sqlalchemy:

    from sqlalchemy import and_, func

    # cmip is assumed to be an open ARCCSSive session and Instance its mapped class
    # Count the 'tas' ensemble members in each model/experiment/mip/realm group
    inst_group = (
        cmip.query(Instance, func.count(Instance.ensemble).label('count'))
        .filter_by(variable='tas')
        .group_by(Instance.experiment, Instance.mip, Instance.model, Instance.realm)
        .subquery()
    )

    # Join the groups back to the instances and take the first five matches
    (
        cmip.query(Instance)
        .join(inst_group, and_(
            Instance.experiment == inst_group.c.experiment,
            Instance.mip == inst_group.c.mip,
            Instance.model == inst_group.c.model,
            Instance.realm == inst_group.c.realm))
        .filter(inst_group.c.count > 2, Instance.variable == 'tas')
        .order_by(Instance.model, Instance.experiment)[0:5]
    )
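To see what the SQL query selects without access to the NCI database, it can be run against a small in-memory SQLite table. This is only a sketch: the `instances` schema below contains just the columns the query mentions plus an ensemble column, and the rows are made-up test data. The real ARCCSSive schema is likely to have more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed, simplified schema: only the columns used by the query
conn.execute(
    "CREATE TABLE instances ("
    "model TEXT, experiment TEXT, mip TEXT, realm TEXT, "
    "variable TEXT, ensemble TEXT)"
)

# Made-up rows: three 'tas' ensemble members for one model, one for another
rows = [
    ("ACCESS1-0", "historical", "day", "atmos", "tas", "r1i1p1"),
    ("ACCESS1-0", "historical", "day", "atmos", "tas", "r2i1p1"),
    ("ACCESS1-0", "historical", "day", "atmos", "tas", "r3i1p1"),
    ("ACCESS1-3", "historical", "day", "atmos", "tas", "r1i1p1"),
    ("ACCESS1-0", "historical", "day", "atmos", "pr", "r1i1p1"),
]
conn.executemany("INSERT INTO instances VALUES (?, ?, ?, ?, ?, ?)", rows)

# The query from the slide: count 'tas' rows per group, then join back
QUERY = """
SELECT instances.*
FROM (
    SELECT model, experiment, mip, realm, count(*) AS count
    FROM instances
    WHERE variable = 'tas'
    GROUP BY model, experiment, mip, realm
) AS inst_group
NATURAL JOIN instances
WHERE count > 2 AND variable = 'tas'
"""
matches = conn.execute(QUERY).fetchall()
for match in matches:
    print(match)
```

With this test data only the three ACCESS1-0 'tas' rows are returned, because `count > 2` keeps groups with three or more rows and the outer `variable = 'tas'` filter drops the 'pr' row that the natural join would otherwise bring in.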

hh5 Conda environments

ARCCSSive is available as part of the CMS team's `analysis` Anaconda environment, on both Raijin and the VDI desktops:

    module use /g/data3/hh5/public/modules
    module load conda/analysis27

or

    module load conda/analysis3

These environments provide a wide range of Python libraries for working with climate and weather datasets, including:

    basemap, beautifulsoup4, cartopy, cdat-lite, cf_units, compliance-checker,
    dask, ecmwf_grib, f90nml, gdal, ipython, iris, jupyter, mule, numpy, pandas,
    pandoc, proj4, pytest, pydap, pyferret, pyspharm, readline, scipy, six,
    sqlalchemy, sqlite, sympy, windspharm, wrf-python, xarray

More Info

Training: https://training.nci.org.au (under Research Data / CMIP)
Documentation: https://arccssive.rtfd.org
Code: https://github.com/coecms/arccssive
DOI: 10.5281/zenodo.844936
Help: climate_help@nci.org.au

Image: CSIRO Marine Research http://www.scienceimage.csiro.au/image/2620 (CC-BY)