Porting the MIT Global Circulation Model on the CellBE Processor


Porting the MIT Global Circulation Model on the CellBE Processor
Marco POLLINI (1), Paolo PALAZZARI (1,2), Vittorio ROSATO (1,2)
(1) Ylichron Srl, Roma
(2) ENEA CRESCO Project, Casaccia Research Centre, Roma
February 11, 2009
Workshop Progetti GRID PON Ricerca, Catania 2009

Outline
The CellBE Processor
The MITgcm (Global Circulation Model)
Parallelization of the 2D Conjugate Gradient routine on the CellBE
(Very) Preliminary performance evaluations

The CRESCO platform

The CellBE processor

[Figure slides; the only recoverable labels are 204.8 GB/s and 25.6 GFlop/s]

MIT Global Circulation Model
The MITgcm (MIT General Circulation Model) is a numerical model designed for the study of the atmosphere, ocean, and climate. Its non-hydrostatic formulation enables it to simulate fluid phenomena over a wide range of scales.

Domain Decomposition
The physical simulation domain is partitioned into 3D tiles. In the example, the domain is split along the x and y dimensions.
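The splitting along x and y can be illustrated with a minimal C sketch. It is not taken from the MITgcm sources: the names `Range`, `Tile`, `block_range`, `make_tile` and `print_tiles` are hypothetical, and the rule for distributing leftover cells (the first blocks get one extra cell) is an assumption made for the example. Each tile is a rectangle in (x, y) that keeps the full vertical extent, which is what makes it a 3D tile.

```c
#include <assert.h>
#include <stdio.h>

/* Extent of one block along a single axis: [start, start + size). */
typedef struct {
    int start;
    int size;
} Range;

/* Split n cells into `parts` contiguous blocks (hypothetical helper).
 * The first (n % parts) blocks get one extra cell, so block sizes
 * differ by at most one. */
Range block_range(int n, int parts, int index)
{
    assert(n >= 0 && parts > 0 && index >= 0 && index < parts);
    int base = n / parts;
    int extra = n % parts;
    Range r;
    r.size = base + (index < extra ? 1 : 0);
    r.start = index * base + (index < extra ? index : extra);
    return r;
}

/* A 3D tile: a rectangle in (x, y) that keeps all nz vertical levels. */
typedef struct {
    Range x;
    Range y;
    int nz;
} Tile;

/* Tile (ix, iy) of a px-by-py decomposition of an nx * ny * nz domain. */
Tile make_tile(int nx, int ny, int nz, int px, int py, int ix, int iy)
{
    Tile t;
    t.x = block_range(nx, px, ix);
    t.y = block_range(ny, py, iy);
    t.nz = nz;
    return t;
}

/* Print the extents of every tile in the decomposition. */
void print_tiles(int nx, int ny, int nz, int px, int py)
{
    for (int iy = 0; iy < py; ++iy) {
        for (int ix = 0; ix < px; ++ix) {
            Tile t = make_tile(nx, ny, nz, px, py, ix, iy);
            printf("tile (%d,%d): x=[%d,%d) y=[%d,%d) nz=%d\n",
                   ix, iy,
                   t.x.start, t.x.start + t.x.size,
                   t.y.start, t.y.start + t.y.size,
                   t.nz);
        }
    }
}
```

Calling `print_tiles(10, 8, 5, 4, 2)` would list eight tiles that together cover the whole 10 x 8 horizontal grid, each with all 5 vertical levels.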

CG2D Routine
Most demanding kernel of the code; solves a linear system of equations Ax = b
Parallelized and offloaded to the 8 SPEs
Rewritten in C
Each tile is split into 8 sub-tiles, each managed by one SPE
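To make the Ax = b step concrete, here is a minimal serial sketch of the conjugate gradient iteration in C. It is an illustration under stated assumptions, not the CG2D routine itself: the operator is a model 5-point stencil with zero values outside the tile, no preconditioner is applied, and the grid size `NX` x `NY` and the names `apply_A`, `dot` and `cg_solve` are hypothetical. In a version offloaded to the SPEs, each SPE would presumably run these loops on its own sub-tile and the dot products would be combined across SPEs.

```c
#include <assert.h>

#define NX 16            /* hypothetical tile width  */
#define NY 16            /* hypothetical tile height */
#define N  (NX * NY)     /* number of unknowns       */

/* y = A x for a model 5-point operator: 4*x[i,j] minus the four
 * neighbours, with values outside the tile taken as zero.
 * This A is symmetric positive definite, as CG requires. */
static void apply_A(const double *x, double *y)
{
    for (int j = 0; j < NY; ++j) {
        for (int i = 0; i < NX; ++i) {
            int k = j * NX + i;
            double v = 4.0 * x[k];
            if (i > 0)      v -= x[k - 1];
            if (i < NX - 1) v -= x[k + 1];
            if (j > 0)      v -= x[k - NX];
            if (j < NY - 1) v -= x[k + NX];
            y[k] = v;
        }
    }
}

/* Dot product over all N unknowns. */
static double dot(const double *a, const double *b)
{
    double s = 0.0;
    for (int k = 0; k < N; ++k)
        s += a[k] * b[k];
    return s;
}

/* Solve A x = b by plain conjugate gradient, starting from x = 0.
 * Stops when the residual norm drops to tol or after max_iter steps.
 * Returns the number of iterations performed.
 * Uses static work arrays, so it is not reentrant. */
int cg_solve(const double *b, double *x, int max_iter, double tol)
{
    static double r[N], p[N], q[N];
    assert(max_iter > 0 && tol > 0.0);

    for (int k = 0; k < N; ++k) {
        x[k] = 0.0;
        r[k] = b[k];     /* residual r = b - A*0 = b */
        p[k] = b[k];     /* first search direction   */
    }
    double rr = dot(r, r);
    int it = 0;

    while (it < max_iter && rr > tol * tol) {
        apply_A(p, q);                      /* q = A p               */
        double alpha = rr / dot(p, q);      /* step length           */
        for (int k = 0; k < N; ++k) {
            x[k] += alpha * p[k];
            r[k] -= alpha * q[k];
        }
        double rr_new = dot(r, r);
        double beta = rr_new / rr;          /* direction update      */
        for (int k = 0; k < N; ++k)
            p[k] = r[k] + beta * p[k];
        rr = rr_new;
        ++it;
    }
    return it;
}
```

Each iteration costs one operator application and two dot products, which is why the routine lends itself to splitting the tile into sub-tiles: the stencil only needs neighbouring cells, and the dot products reduce to partial sums.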

Add the borders within the tile …

Performance evaluation
No specific effort has been devoted so far to optimizing the code; the focus has been on guaranteeing the correctness of the results.
Reported results compare, for a cluster of 8 CellBE processors:
MPI implementation on a cluster of PPE (PowerPC) nodes
MPI implementation on the same PPE cluster + 8 SPE threads per PPE node

Performance evaluation
Execution time (200 iterations) of the CG2D code (>90% of the whole computational effort of the MITgcm code):

Implementation    Execution time
MPI               1.51 s
MPI+SPE           0.443 s

Speed-up S0 = 3.46

Expected performance after code optimization and update to the new Cell architecture (native double precision): ≥ 10 × S0
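The arithmetic behind these figures can be checked with a short C sketch. The function names `speedup`, `amdahl` and `print_estimates` are hypothetical. Two caveats apply: the ratio of the two listed times comes out at about 3.41, slightly below the reported S0 = 3.46, possibly because the listed times are rounded; and the whole-code estimate uses Amdahl's law with the CG2D share fixed at exactly 0.9, which is an assumption (the slide only states that it exceeds 90%).

```c
#include <assert.h>
#include <stdio.h>

/* Speed-up as the ratio of baseline time to accelerated time. */
double speedup(double t_base, double t_fast)
{
    assert(t_base > 0.0 && t_fast > 0.0);
    return t_base / t_fast;
}

/* Amdahl's law: overall speed-up when a fraction f of the run time
 * is accelerated by a factor s and the remainder is unchanged. */
double amdahl(double f, double s)
{
    assert(f >= 0.0 && f <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}

/* Print the estimates discussed in the text.
 * f = 0.9 is an assumed CG2D share of the total run time. */
void print_estimates(void)
{
    const double t_mpi = 1.51;      /* seconds, MPI only (from the slide) */
    const double t_spe = 0.443;     /* seconds, MPI+SPE (from the slide)  */
    const double s0 = 3.46;         /* speed-up reported on the slide     */
    const double f = 0.9;           /* assumed CG2D fraction              */

    printf("ratio of listed times:         %.2f\n", speedup(t_mpi, t_spe));
    printf("whole-code speed-up at S0:     %.2f\n", amdahl(f, s0));
    printf("whole-code speed-up at 10*S0:  %.2f\n", amdahl(f, 10.0 * s0));
}
```

Under these assumptions a CG2D speed-up of S0 would translate into a whole-code speed-up of roughly 2.8, and the hoped-for 10 × S0 into roughly 7.9; a CG2D share above 90% would raise both values.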