Distributed Data Assimilation - A case study
Aad J. van der Steen, High Performance Computing Group, Utrecht University

1. The application
2. Parallel implementation
3. Model and experiments
4. Perspectives for distributed implementation

" The application, Ensflow, assimilates ocean flow data into a stochasic ocean flow model. " Many realisations of the model with randomly distributed parameters forming an ensemble are run. " Perodically these runs are integrated with satellite data and an optimal ensemble average is computed. " The sequence of ensemble averages over time describes the development of the ocean's currents best fitting the observations. The application - 1

The application - 2
The region of interest is around the southern tip of Africa. Data from the TOPEX/Nimbus satellite are used for the assimilation. The purpose is to understand the evolution of streams and eddies in this region.

The application - 3
Because of the stochastic nature of the model, many realisations with slightly different parameter values have to be evolved. The observations of the top-layer values are interpolated to a 251x151 grid. The ensemble members are allowed to develop independently for some time, giving the ensemble mean as the best estimate for the model evolution without observations, and are then combined with the measurements d in the analysis:

ψ_a = ψ_f + P H^T b,   b = (H P H^T + R)^{-1} (d - H ψ_f)

with P the ensemble covariance, H the measurement operator, R = matrix of field measurement covariances, and b = matrix of representer coefficients.
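As a concrete illustration, below is a minimal numpy sketch of a standard perturbed-observation ensemble analysis step in Evensen's formulation; that Ensflow uses exactly this variant is an assumption, and the names E, H, d, R are illustrative.

```python
import numpy as np

def enkf_analysis(E, d, H, R, rng):
    """Perturbed-observation EnKF analysis: psi_a = psi_f + P H^T b.

    E : (n, m) ensemble matrix, m members on an n-point state grid
    d : (k,)   observation vector
    H : (k, n) measurement operator
    R : (k, k) matrix of field measurement covariances
    """
    n, m = E.shape
    A = E - E.mean(axis=1, keepdims=True)      # ensemble anomalies
    PHt = A @ (H @ A).T / (m - 1)              # P H^T estimated from the ensemble
    S = H @ PHt + R                            # innovation covariance H P H^T + R
    D = d[:, None] + rng.multivariate_normal(np.zeros(len(d)), R, m).T
    b = np.linalg.solve(S, D - H @ E)          # representer coefficients
    return E + PHt @ b                         # analysed ensemble

rng = np.random.default_rng(1)
E = rng.standard_normal((100, 8))              # 8 members, 100-point state
H = np.eye(5, 100)                             # observe the first 5 points
Ea = enkf_analysis(E, np.zeros(5), H, 0.1 * np.eye(5), rng)
```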

The application - 4
The model performs two computationally intensive tasks:
1. Generation of the ensemble members.
2. The computational-flow part that describes the evolution of the stream function.
Every 240 hourly timesteps an analysis of the ensemble is done to obtain the optimal estimate for the past period.

Parallel implementation - 1
1. Ensemble members are distributed evenly over the processors.
2. Data of the ensemble members are independent and local to the processors.
3. Only in the analysis phase, to determine the globally optimal field, do data have to be exchanged (using MPI; see the sketch below).
4. The optimal global field is distributed and a new cycle is started.
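A minimal mpi4py sketch of points 1-4; the per-member "physics" is a placeholder where Ensflow runs its flow solver. Run with e.g. mpirun -np 4 python sketch.py.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
rng = np.random.default_rng(rank)

N_MEMBERS, GRID = 60, 251 * 151
# 1. Distribute ensemble members evenly: member i lives on process i % size.
my_members = [np.zeros(GRID) for i in range(N_MEMBERS) if i % size == rank]

for cycle in range(2):
    # 2. Members evolve independently; their data stay local to the process.
    my_members = [m + 0.01 * rng.standard_normal(GRID) for m in my_members]

    # 3. Analysis phase: the only data exchange, here a global ensemble mean.
    local_sum = np.sum(my_members, axis=0)
    global_mean = np.empty(GRID)
    comm.Allreduce(local_sum, global_mean, op=MPI.SUM)
    global_mean /= N_MEMBERS
    # 4. Every process now holds the optimal global field; a new cycle starts.
```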

Parallel implementation - 2
The program contains two irreducible scalar parts:
1. Initialisation, linearly dependent on the number of ensemble members n_e, and depending on N, the number of gridpoints: init time t_init = c_1 n_e N.
2. The analysis part, with analysis time t_ana likewise depending on n_e and N. Both constants were measured on the DAS-2 systems (for the 251x151 grid).

Parallel implementation - 3
The time per ensemble member per 24-hour timestep is about 30 s. This amounts to 20 x 60 x 30 = 36,000 s = 10 h single-processor time for the complete 20-day cycle considered, with 60 ensemble members. After the init phase one distribute operation is required, and per analysis step one collect and one distribute operation. At the bandwidth obtained with Myrinet on one cluster, this takes about 0.12 s per transfer, so the total communication time within one run is negligible compared to the computation.
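The budget is easy to verify with a few lines of Python. The member count (60) and per-member cost (30 s per model day) are read off the 20 x 60 x 30 product above; the transfer count is an assumption following the stated pattern (one distribute after init, a collect plus a distribute per analysis, one analysis every 240 hours).

```python
days, members, sec_per_member_day = 20, 60, 30
compute = days * members * sec_per_member_day   # 36,000 s = 10 h serial time

analyses = days * 24 // 240                     # analysis every 240 hours -> 2
transfers = 1 + 2 * analyses                    # init + (collect + distribute)
comm_time = transfers * 0.12                    # ~0.12 s per transfer
print(f"compute {compute} s, communicate {comm_time:.2f} s")  # 36000 s vs 0.60 s
```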

Model and experiments - 1
The timing model has the following form:

T(p) = t_init + t_ana + n_e t_memb / p + t_comm(p)

with t_init and t_ana the scalar initialisation and analysis times, t_memb the single-processor time to evolve one ensemble member, and t_comm(p) the communication time.
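Evaluating the model is straightforward; in the sketch below t_comm = 15 s is the measured single-cluster value quoted later, t_memb = 600 s follows from 20 days x 30 s/day per member, and the remaining constants are illustrative placeholders rather than the fitted DAS-2 values.

```python
def model_time(p, n_e=60, t_init=30.0, t_ana=60.0, t_memb=600.0, t_comm=15.0):
    """Predicted wall-clock time on p processors (constants illustrative)."""
    return t_init + t_ana + n_e * t_memb / p + t_comm

for p in (1, 2, 4, 8, 16, 32):
    print(f"p = {p:2d}: T = {model_time(p):8.1f} s")
```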

Model and experiments - 2
Remarks:
1. There is a mystery with respect to the computation phases: the time measured for p = 1 consistently differs from that measured for p > 1.
2. For p < 6, using 1 CPU per node is somewhat faster; from p = 6 on, 2 CPUs per node are marginally faster due to decreasing competition for memory and faster intra-node communication.

Model and experiments: Simulation results
Shown is a simulation of 180 daily periods; note the blue eddies that form counterclockwise in the Atlantic.

Perspectives for distributed implementation - 1
The timing model keeps the same form. In the single-cluster implementation t_comm is quite small (ca. 15 s) and virtually independent of p. For the distributed version this might not be the case:
1. Presently Globus cannot yet be used in conjunction with Myrinet's MPI, so communication must be done via IP.
2. The geographical distance between the DAS clusters introduces non-negligible latencies.

Perspectives for distributed implementation - 2
As can be seen from the figure, the communication time is still insignificant when the model is distributed over two locations (UU and VU).

Perspectives for distributed implementation -3 is quite erratic, more determined by synchronisation than communication time:

Perspectives for distributed implementation - 4
The results show that this application is excellently suited for distributed processing. Still, both the communication and the analysis phase may be made more efficient:
1. When it is known which process ids are located on which cluster, intra-cluster communication can be done first, after which the assembled messages are exchanged between clusters (see the sketch below).
2. The analysis could be done on the local ensemble members and synchronised less frequently.
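Point 1 maps naturally onto MPI sub-communicators. A minimal mpi4py sketch, where the assignment of 8 consecutive ranks per cluster site is a hypothetical stand-in for the real process placement:

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

cluster = rank // 8                              # hypothetical rank-to-site map
local = comm.Split(color=cluster, key=rank)      # intra-cluster communicator
is_rep = local.Get_rank() == 0                   # one representative per site
inter = comm.Split(color=0 if is_rep else MPI.UNDEFINED, key=rank)

field = np.full(251 * 151, float(rank))          # dummy per-process contribution

cluster_sum = np.empty_like(field)
local.Reduce(field, cluster_sum, op=MPI.SUM, root=0)   # 1. fast intra-cluster step
total = np.empty_like(field)
if is_rep:
    inter.Allreduce(cluster_sum, total, op=MPI.SUM)    # 2. few, large WAN messages
local.Bcast(total, root=0)                             # 3. fan result back out locally
```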

Perspectives for distributed implementation - 5
Using more sites has a notable effect on the communication; again, synchronisation effects are more important than the communication time proper. Execution times and communication times were measured for runs spread over one to four sites: UU; UU+VU; UU+VU+Leiden; UU+VU+Leiden+Delft.

Perspectives for distributed implementation - 6
This case study was a particularly well-suited candidate for distributed processing. Apart from improving this implementation, we will proceed with three other promising projects:
1. Running two coupled oceanographic models within the Cactus framework.
2. Inexact sequence matching of genetic material.
3. Pattern recognition on proteomic micro-arrays.
Acknowledgements: Fons van Hees, for the single-system parallelisation.