Exascale Climate Data ANalysis From the Inside Out Frédéric Laliberté Paul Kushner University of Toronto ExArch WP3.

Slides:



Advertisements
Similar presentations
MAP REDUCE PROGRAMMING Dr G Sudha Sadasivam. Map - reduce sort/merge based distributed processing Best for batch- oriented processing Sort/merge is primitive.
Advertisements

Database Architectures and the Web
Resource Management §A resource can be a logical, such as a shared file, or physical, such as a CPU (a node of the distributed system). One of the functions.
EHarmony in Cloud Subtitle Brian Ko. eHarmony Online subscription-based matchmaking service Available in United States, Canada, Australia and United Kingdom.
1 Using ANN to solve the Navier Stoke Equations Motivation: Solving the complete Navier Stokes equations using direct numerical simulation is computationally.
The Mean Meridional Circulation: A New Potential-Vorticity, Potential- Temperature Perspective Cristiana Stan Colorado State University March 11, 2005.
1 Software & Grid Middleware for Tier 2 Centers Rob Gardner Indiana University DOE/NSF Review of U.S. ATLAS and CMS Computing Projects Brookhaven National.
Chapter 9 Vertical Motion. (1) Divergence in two and three dimensions. The del “or gradient” operator is a mathematical operation performed on something.
Chapter 8 Coordinate Systems.
A Grid Resource Broker Supporting Advance Reservations and Benchmark- Based Resource Selection Erik Elmroth and Johan Tordsson Reporter : S.Y.Chen.
The CrossGrid project Juha Alatalo Timo Koivusalo.
Kashif Jalal CA-240 (072) Web Development Using ASP.NET CA – 240 Kashif Jalal Welcome to week – 2 of…
Development of a Community Hydrologic Information System Jeffery S. Horsburgh Utah State University David G. Tarboton Utah State University.
Understanding Operating Systems 1 Overview Introduction Operating System Components Machine Hardware Types of Operating Systems Brief History of Operating.
©2003 Prentice Hall Business Publishing, Accounting Information Systems, 9/e, Romney/Steinbart 18-1 Accounting Information Systems 9 th Edition Marshall.
The new The new MONARC Simulation Framework Iosif Legrand  California Institute of Technology.
Bandwidth Allocation in a Self-Managing Multimedia File Server Vijay Sundaram and Prashant Shenoy Department of Computer Science University of Massachusetts.
Dynamics Energetics of General Circulation In the previous lecture we discussed the characteristics of the general circulation of the atmosphere In today’s.
CHAPTER 8 APPROXIMATE SOLUTIONS THE INTEGRAL METHOD
Java web development Servlet & Java server pages.
Application Domain The Energy Problem: Growing world demand and diminishing supply –Efficient, large scale (> 1MW) power production is a necessity –Environmentally.
Numerical Grid Computations with the OPeNDAP Back End Server (BES)
The SAM-Grid Fabric Services Gabriele Garzoglio (for the SAM-Grid team) Computing Division Fermilab.
ADLB Update Recent and Current Adventures with the Asynchronous Dynamic Load Balancing Library Rusty Lusk Mathematics and Computer Science Division Argonne.
Research on cloud computing application in the peer-to-peer based video-on-demand systems Speaker : 吳靖緯 MA0G rd International Workshop.
Lecture On Database Analysis and Design By- Jesmin Akhter Lecturer, IIT, Jahangirnagar University.
Development of WRF-CMAQ Interface Processor (WCIP)
Javier Junquera Molecular dynamics in the microcanonical (NVE) ensemble: the Verlet algorithm.
A Global Agriculture Information System Zhong Liu 1,4, W. Teng 2,4, S. Kempler 4, H. Rui 3,4, G. Leptoukh 3 and E. Ocampo 3,4 1 George Mason University,
Budget-based Control for Interactive Services with Partial Execution 1 Yuxiong He, Zihao Ye, Qiang Fu, Sameh Elnikety Microsoft Research.
CERN IT Department CH-1211 Genève 23 Switzerland t Internet Services Job Monitoring for the LHC experiments Irina Sidorova (CERN, JINR) on.
Chapter 3 System Performance and Models. 2 Systems and Models The concept of modeling in the study of the dynamic behavior of simple system is be able.
Invitation to Computer Science 5 th Edition Chapter 6 An Introduction to System Software and Virtual Machine s.
Scientific Workflow Scheduling in Computational Grids Report: Wei-Cheng Lee 8th Grid Computing Conference IEEE 2007 – Planning, Reservation,
ExArch: Climate analytics on distributed exascale data archives Martin Juckes, V. Balaji, B.N. Lawrence, M. Lautenschlager, S. Denvil, G. Aloisio, P. Kushner,
The Client/Server Database Environment Ployphan Sornsuwit KPRU Ref.
Personal Computer - Stand- Alone Database  Database (or files) reside on a PC - on the hard disk.  Applications run on the same PC and directly access.
The european ITM Task Force data structure F. Imbeaux.
Laboratoire LIP6 The Gedeon Project: Data, Metadata and Databases Yves DENNEULIN LIG laboratory, Grenoble ACI MD.
1 MSCS 237 Overview of web technologies (A specific type of distributed systems)
Math/Oceanography Research Problems Juan M. Restrepo Mathematics Department Physics Department University of Arizona.
Modern Era Retrospective-analysis for Research and Applications: Introduction to NASA’s Modern Era Retrospective-analysis for Research and Applications:
Serge Andrianov Theory of Symplectic Formalism for Spin-Orbit Tracking Institute for Nuclear Physics Forschungszentrum Juelich Saint-Petersburg State University,
CCGrid, 2012 Supporting User Defined Subsetting and Aggregation over Parallel NetCDF Datasets Yu Su and Gagan Agrawal Department of Computer Science and.
Regional Climate Model Evaluation System based on satellite and other observations for application to CMIP/AR downscaling Peter Lean 1, Jinwon Kim 1,3,
Chapter 6 CASE Tools Software Engineering Chapter 6-- CASE TOOLS
Cyberinfrastructure to promote Model - Data Integration Robert Cook, Yaxing Wei, and Suresh S. Vannan Oak Ridge National Laboratory Presented at the Model-Data.
FRANEC and BaSTI grid integration Massimo Sponza INAF - Osservatorio Astronomico di Trieste.
Managing Potential Pollutants from Livestock Farms: An Economics Perspective Kelly Zering North Carolina State University.
Product-Generation in ESG: some explorations of the user experience Steve Hankin – March, 2007.
1 Adventures in Web Services for Large Geophysical Datasets Joe Sirott PMEL/NOAA.
SCD Research Data Archives; Availability Through the CDP About 500 distinct datasets, 12 TB Diverse in type, size, and format Serving 900 different investigators.
Download class materials onto your desktop… as usual.
Air pollutants, such as aerosols and various trace gases, are transported on a hemispheric or global scale. The Task Force on Hemispheric Transport of.
IVOA Interop, Beijing, China, May IVOA Data Access Layer Working Group Sessions Doug Tody (NRAO/NVO ) Markus Dolensky (ESO/EuroVO) Data Access Layer.
Atms 4320 / 7320 lab 8 The Divergence Equation and Computing Divergence using large data sets.
What is a Computer An electronic, digital device that stores and processes information. A machine that accepts input, processes it according to specified.
EGEE is a project funded by the European Union under contract IST Generic Applications Requirements Roberto Barbera NA4 Generic Applications.
INTRODUCTION TO GRID & CLOUD COMPUTING U. Jhashuva 1 Asst. Professor Dept. of CSE.
Downscaling and Why Local Predictions are Difficult to Make Preparing Your Coast.
Modelling & Simulation of Semiconductor Devices Lecture 1 & 2 Introduction to Modelling & Simulation.
System Software Laboratory Databases and the Grid by Paul Watson University of Newcastle Grid Computing: Making the Global Infrastructure a Reality June.
Applied Operating System Concepts
Ruslan Fomkin and Tore Risch Uppsala DataBase Laboratory
The Client/Server Database Environment
Simulation use cases for T2 in ALICE
University of Technology
National Center for Atmospheric Research
Huanyuan(Wayne) Sheng
Software Acceleration in Hybrid Systems Xiaoqiao (XQ) Meng IBM T. J
Presentation transcript:

Exascale Climate Data ANalysis From the Inside Out Frédéric Laliberté Paul Kushner University of Toronto ExArch WP3

Exascale Climate Data ANalysis From the Inside Out Frédéric Laliberté Paul Kushner University of Toronto ExArch WP3 A USER PERSPECTIVE

ExArch Work Package 3 Quality Assurance and Climate Science Diagnostics with CMIP5/CORDEX data archive. U of Toronto, DKRZ and UCLA. Develops a library of climate diagnostics parallel to the development of the ExArch query system.

Challenges Diagnostics using high-frequency data require large downloads, using up bandwidth and demanding vast storage capabilities Diagnostics used in intercomparison Studies increasingly require numerically consistent data Data converted from the model’s hybrid grid to pressure grid can lead to numerical errors with some diagnostics (e.g. derivatives) These errors degrade the model’s perceived performance and obfuscates explicit conservation laws For Users For Modeling groups

User PERSPECTIVE

ExArch WP3 - U of T Will try to answer the users requirements by benchmarking a series of advanced climate diagnostic using a simple server-side processing framework. Expect to better monitor the role that can be played by OPeNDAP in conjunction with CDOs. Will prepare advanced climate diagnostics scripts to be ready for to monitor the performance of the tools developed by ExArch associated teams.

Server-Side Processing: An Ideal Case An Ideal Case User requests data using a size-reducing query Server evaluates the transfer time Server evaluates the computational time for each size-reducing operations Server balances the load between bandwidth and cpu time: performs some size-reducing operations remotely User receives the intermediary result and completes the task

Server-Side Processing: The reality The reality User requests data using a size- reducing query Query requests load balancing estimates from the user Server estimates cpu of server-side operations and generates a local queue User receives the intermediary result and completes the task If cpu queue is too long

Case Study: Isentropic coordinates For many diagnostics, like the overturning circulation, it is important to compute zonal and temporal averages along isentropic surfaces:Grid Size / year at N80 Input Output A reduction in size by a factor per year!

Isentropic coordinates Because zonal and temporal averaging are cheap, the server should compute them: (1) Convert the meridional velocity and pressure to isentropic coordinates (2) Compute the isentropic layer mass (3) Compute the zonal and temporal average of the product For the user, this process is more transparent and does not require the handling of vast amounts of data. For the server, bandwidth usage is greatly reduced.

Case Study: Joint Distribution The mass flux joint distribution can be written as: It computes the distribution of equivalent potential temperature on isentropic surfacesGrid Size / year at N80 Input Output A reduction in size by a factor per year!

Joint distribution at 35N for DJF The Joint Distribution can be used to determine Lagrangian trajectories of moist flows Average equiv. pot. temp in directional components

Other Diagnostics Many other diagnostics are based on EOFs (e.g. NAO) and thus require long time-series that are reduced into a few principal components. Tropical diagnostics of intraseasonal variability relies on the analysis of space-time spectra (Wheeler and Kiladis, 1999). Newer diagnostics (Dias 2010, thesis) use average over an ITCZ-following latitudinal region. Both methods require time series over large area and have huge data burdens to produce simplified diagnostics.

Modeling Groups Perspective

Functional Data Structure For theoretical studies of model outputs, it is convenient for the user to consider the data as ‘functions’: The data files specifies how mathematical operators should combine its variables to be consistent with the numerics. Take, for example, the computation of the Moist Static Energy (MSE): Depend on moisture parameterizations and are thus model-dependent The MSE flux divergence is an important quantity since it is constrained by evaporation: This integral will be accurate only if the divergence mimics the conservation law

Conclusion With server-side processing, modeling groups would make the development of advanced diagnostics easier and their computation more timely. Providing derived data computed from a native grid would reduce numerical errors and improve model intercomparison. Modeling groups would make sure that their model performs in the way intended, reducing model bias due to different grid specification. This is particularly important for CORDEX experiments. It is hoped that these advantages are worth the extra programming overhead for the modeling groups.