Initial experience with OpenCL programming and developing a GPU solver for OpenFOAM
Presented by: Qingfeng Xia, School of MACE, University of Manchester
Date: 2011-05-21

Structure
Part 1. Introduction to OpenCL tools: coding, analyzing, debugging and profiling tools; ViennaCL, a C++ template library for BLAS levels 1, 2 and 3
Part 2. Introduction to OpenFOAM; summary of GPU plugins
Part 3. Profiling results
3.1 Profiling the ViennaCL BLAS library
3.2 Profiling OpenFOAM with GPU plugins
Future work: real-time PIV and DSMC solver

AMD APP kernel analyzer and profiler
Command-line tool: sprofile32 for Linux.
The GUI tool is a plugin for Visual Studio 2008/2010; the Professional edition is required.

AMD APP kernel analyzer

AMD APP Profiler

NVIDIA Visual Profiler (cross-platform)

gDEBugger (cross-platform)
(1) Powerful tool for OpenGL debugging and profiling, now also available for OpenCL.
(2) Cross-GPU: supports NVIDIA and AMD GPUs.
(3) Cross-OS: Windows, Linux and Mac.
Cons: so feature-rich that it is hard to get working quickly.

IDE for C++ development: CodeLite, a cross-platform IDE

ViennaCL: an OpenCL C++ BLAS library
An excellent library covering BLAS levels 1, 2 and 3. It has the same API as Boost uBLAS and can fall back to the CPU.
However, this library could not be linked with OpenFOAM (error: segmentation fault).

Part 2. Introduction to OpenFOAM (CFD)
Installation (hopefully your GPU supports double precision): load/git.php
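Whether a GPU supports double precision can be checked at run time: OpenCL devices advertise it through the cl_khr_fp64 extension (older AMD devices used cl_amd_fp64) in the string returned by clGetDeviceInfo with CL_DEVICE_EXTENSIONS. A minimal sketch of the string check only; supportsDouble is a hypothetical helper, and the OpenCL query itself is left out so the code runs without an OpenCL SDK.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns true if a space-separated OpenCL extension list (as reported for
// CL_DEVICE_EXTENSIONS) advertises double-precision floating point.
bool supportsDouble(const std::string& extensions) {
    std::istringstream tokens(extensions);
    std::string ext;
    while (tokens >> ext) {
        // Compare whole tokens so that a longer extension name cannot match by substring.
        if (ext == "cl_khr_fp64" || ext == "cl_amd_fp64") return true;
    }
    return false;
}
```

In host code the string would come from clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, ...); a Fermi card such as the Tesla C2050 reports cl_khr_fp64.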

2.1 Quick introduction to OpenFOAM
Free computational fluid dynamics (CFD) codes:
(1) OpenFOAM is written in C++ and has no GUI front end.
(2) Code_Saturne (finite volume) is written in Fortran and has a GUI front end.

GPU solvers for OpenFOAM
(1) OF plugin for OpenFOAM (before 2010): free only for single precision.
(2) ofgpu package (GPL, May 2011) from Symscape.com, which ported OpenFOAM from *nix to Windows and developed a GUI (Caedium) for OpenFOAM. No preconditioner is implemented, and no benchmark has been done.

My work: clUtils, clFoam, vclFoam

(1) clUtils
Written as OpenCL programming practice; it also provides utilities for the clFoam solver. It is mainly written in C, so there is no template support; single and double precision are switchable via a macro: #define scalar float
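A sketch of how such a precision macro can be kept consistent between host code and OpenCL kernels, assuming the kernel source also refers to `scalar`: the same type name is passed to clBuildProgram as a `-D scalar=...` build option. USE_DOUBLE, kernelBuildOptions and dotProduct are hypothetical names, not taken from clUtils.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Precision switch in the spirit of "#define scalar float".
// USE_DOUBLE is a hypothetical compile-time flag used only in this sketch.
#ifdef USE_DOUBLE
#define scalar double
#else
#define scalar float
#endif

// Two-level stringification: STRINGIFY(scalar) expands scalar first, giving "float" or "double".
#define STRINGIFY_IMPL(x) #x
#define STRINGIFY(x) STRINGIFY_IMPL(x)

// Options string for clBuildProgram so the kernel sees the same type as the host.
// (For double, the kernel source also needs cl_khr_fp64 enabled.)
std::string kernelBuildOptions() {
    return std::string("-D scalar=") + STRINGIFY(scalar);
}

// Host-side code written once against `scalar`.
scalar dotProduct(const scalar* a, const scalar* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
    scalar sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}
```

Compiling with -DUSE_DOUBLE switches both the host type and the kernel build option together, which avoids the host and device silently disagreeing on element size.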

(2) clFoam (PCG & PBiCG)
(1) Ports the serial CPU code to parallel OpenCL code; all OpenFOAM preconditioners are usable.
(2) My own PCG and BiCGStab solvers, implemented according to textbook algorithms.
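A serial sketch of the textbook PCG iteration, using a diagonal (Jacobi) preconditioner as the simplest stand-in for OpenFOAM's preconditioners and a hypothetical CSR container. It is not the clFoam code; it only shows which pieces (sparse matrix-vector product, dot products, vector updates) would each become an OpenCL kernel.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CSR container for illustration (not an OpenFOAM or clFoam type).
struct CsrMatrix {
    std::size_t n;                    // number of rows (= columns)
    std::vector<std::size_t> rowPtr;  // size n + 1
    std::vector<std::size_t> colIdx;  // size nnz
    std::vector<double> val;          // size nnz
};

// y = A * x; on a GPU each row would typically map to one work-item.
void spmv(const CsrMatrix& A, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == A.n && y.size() == A.n);
    for (std::size_t i = 0; i < A.n; ++i) {
        double sum = 0.0;
        for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k) sum += A.val[k] * x[A.colIdx[k]];
        y[i] = sum;
    }
}

double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Preconditioned CG with M = diag(A). x holds the initial guess on entry.
// Returns the number of iterations performed.
int pcgJacobi(const CsrMatrix& A, const std::vector<double>& b, std::vector<double>& x,
              double tol, int maxIter) {
    const std::size_t n = A.n;
    std::vector<double> invDiag(n, 1.0), r(n), z(n), p(n), Ap(n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k)
            if (A.colIdx[k] == i) invDiag[i] = 1.0 / A.val[k];

    spmv(A, x, Ap);
    for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - Ap[i];      // r0 = b - A x0
    for (std::size_t i = 0; i < n; ++i) z[i] = invDiag[i] * r[i]; // z0 = M^-1 r0
    p = z;
    double rz = dot(r, z);
    const double bNorm = std::sqrt(dot(b, b));

    for (int it = 0; it < maxIter; ++it) {
        if (std::sqrt(dot(r, r)) <= tol * bNorm) return it;       // relative residual test
        spmv(A, p, Ap);
        const double alpha = rz / dot(p, Ap);
        for (std::size_t i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        for (std::size_t i = 0; i < n; ++i) z[i] = invDiag[i] * r[i];
        const double rzNew = dot(r, z);
        const double beta = rzNew / rz;
        rz = rzNew;
        for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
    }
    return maxIter;
}
```

Every loop above is either an element-wise update or a reduction, so the GPU version needs an SpMV kernel, a parallel dot-product reduction and an AXPY-style kernel; the reductions force a synchronization point each iteration.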

(3) vclFoam
Wrappers that call ViennaCL's sparse-matrix solvers. No preconditioner is implemented. It does not work yet (gcc 4.4, OpenFOAM 1.7).
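A wrapper like vclFoam has to hand ViennaCL a matrix in compressed-row (CSR) form, whereas OpenFOAM stores matrices in LDU addressing: one diagonal coefficient per cell, plus one upper and one lower coefficient per internal face, located by the face's owner (lowerAddr) and neighbour (upperAddr) cells. The sketch converts plain arrays with that layout to CSR; LduArrays, Csr and lduToCsr are hypothetical names, not OpenFOAM or ViennaCL types.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for LDU storage (plain vectors, not OpenFOAM's lduMatrix).
struct LduArrays {
    std::vector<double> diag;          // one coefficient per cell
    std::vector<double> upper, lower;  // one coefficient per internal face
    std::vector<int> lowerAddr;        // owner cell of each face
    std::vector<int> upperAddr;        // neighbour cell of each face
};

struct Csr {
    std::vector<int> rowPtr, colIdx;
    std::vector<double> val;
};

// upper[f] sits at (lowerAddr[f], upperAddr[f]); lower[f] sits at (upperAddr[f], lowerAddr[f]).
Csr lduToCsr(const LduArrays& m) {
    const std::size_t nCells = m.diag.size();
    const std::size_t nFaces = m.upper.size();
    assert(m.lower.size() == nFaces && m.lowerAddr.size() == nFaces && m.upperAddr.size() == nFaces);

    // One ordered map per row so the columns come out sorted.
    std::vector<std::map<int, double>> rows(nCells);
    for (std::size_t c = 0; c < nCells; ++c) rows[c][static_cast<int>(c)] = m.diag[c];
    for (std::size_t f = 0; f < nFaces; ++f) {
        rows[m.lowerAddr[f]][m.upperAddr[f]] += m.upper[f];
        rows[m.upperAddr[f]][m.lowerAddr[f]] += m.lower[f];
    }

    Csr out;
    out.rowPtr.push_back(0);
    for (const auto& row : rows) {
        for (const auto& entry : row) {
            out.colIdx.push_back(entry.first);
            out.val.push_back(entry.second);
        }
        out.rowPtr.push_back(static_cast<int>(out.colIdx.size()));
    }
    return out;
}
```

Since the LDU layout does not change between time steps, a real wrapper would probably compute the CSR pattern once and only refresh the values on each solve.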

Part 3. Profiling
(1) Tricks for profiling

Profiling method of ViennaCL
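The slides do not spell out the timing method. A minimal host-side sketch, assuming the usual approach of discarding warm-up runs (in OpenCL libraries the first call often includes just-in-time kernel compilation and buffer setup) and averaging over repeated runs. meanSeconds is a hypothetical helper. Because OpenCL enqueue calls return before the work finishes, the timed callable must block until the device is done, for example by calling clFinish on the command queue.

```cpp
#include <cassert>
#include <chrono>

// Mean wall-clock time per call in seconds, after `warmup` untimed calls.
// `run` must wait for the device to finish; otherwise only the enqueue is timed.
template <typename F>
double meanSeconds(F&& run, int warmup, int repeats) {
    assert(warmup >= 0 && repeats > 0);
    for (int i = 0; i < warmup; ++i) run();  // untimed: compilation, first transfers
    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) run();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count() / repeats;
}
```

steady_clock is used rather than system_clock because it is monotonic, so the interval cannot be distorted by clock adjustments during the run.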

Vector addition via ViennaCL
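Vector addition does one floating-point addition per element but moves three vectors through memory (two reads, one write), so it measures memory bandwidth rather than GFLOPS. A sketch of a CPU reference for checking the GPU result, plus an effective-bandwidth calculation; vectorAdd and effectiveBandwidthGBs are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for z = x + y, used to validate the GPU output.
std::vector<float> vectorAdd(const std::vector<float>& x, const std::vector<float>& y) {
    assert(x.size() == y.size());
    std::vector<float> z(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) z[i] = x[i] + y[i];
    return z;
}

// Effective bandwidth in GB/s: x and y are read and z is written,
// each n * bytesPerElem bytes, in `seconds` of measured time.
double effectiveBandwidthGBs(std::size_t n, std::size_t bytesPerElem, double seconds) {
    assert(seconds > 0.0);
    const double bytes = 3.0 * static_cast<double>(n) * static_cast<double>(bytesPerElem);
    return bytes / seconds / 1e9;
}
```

Comparing this figure with the card's peak memory bandwidth (rather than its peak GFLOPS) shows how close a vector operation is to the hardware limit; the time passed in should exclude host-device copies unless those are meant to be part of the measurement.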

SpeedIT Classic PCG solver
A Japanese study has already profiled it; the results show that PCG on the GPU is 3 times slower than on the CPU.

clFoam profiling

Profiling platform: Redqueen.rcs.manchester.ac.uk
CPU: AMD 4-core, 2.3 GHz, using one core of the cluster node
GPU: Tesla C2050

Conclusion
(1) Looks promising: peak GFLOPS is hundreds of times higher than a single CPU.
(2) But not yet powerful enough to speed up CFD simulations. Domain decomposition is still the most effective approach.
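One common explanation for the gap between peak GFLOPS and solver speed (a back-of-envelope bound, not a measurement from these slides): Krylov solvers such as PCG spend most of their time in sparse matrix-vector products, which are limited by memory bandwidth rather than arithmetic. In double-precision CSR, each nonzero contributes two flops (multiply and add) but needs at least 12 bytes of traffic (an 8-byte value and a 4-byte column index, ignoring reads of the vector). With the Tesla C2050's peak memory bandwidth of 144 GB/s:

```latex
I_{\mathrm{SpMV}} \le \frac{2\ \mathrm{flop}}{8\ \mathrm{B} + 4\ \mathrm{B}} = \frac{1}{6}\ \mathrm{flop/B},
\qquad
P_{\mathrm{SpMV}} \le B_{\mathrm{mem}}\, I_{\mathrm{SpMV}} = 144\ \mathrm{GB/s} \times \frac{1}{6}\ \mathrm{flop/B} = 24\ \mathrm{GFLOP/s}
```

That is under 5% of the card's 515 GFLOP/s double-precision peak, before counting PCIe transfers between host and device, which is consistent with the solver-level results being far below what peak figures suggest.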

Future work
(1) Real-time PIV vector processing at up to 10 Hz. Most of the calculation time is spent on Inter?? between 32x32-pixel spots, which can make the best use of the fast local cache on the GPU.
(2) Direct simulation Monte Carlo (DSMC) method: particle tracking, etc.
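A minimal sketch of the direct cross-correlation commonly used in PIV to find the displacement between two interrogation windows, assuming row-major float images. Production PIV codes often use FFT-based correlation instead; the direct form is shown because every candidate shift can be evaluated independently, which maps naturally onto GPU work-groups holding both windows in local memory. correlationPeak and Displacement are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct Displacement { int dx, dy; };

// Direct cross-correlation of two n x n windows a and b (row-major).
// Searches shifts in [-maxShift, maxShift] and returns the (dx, dy) that maximises
// R(dx, dy) = sum over the overlap of a(x, y) * b(x + dx, y + dy).
Displacement correlationPeak(const std::vector<float>& a, const std::vector<float>& b,
                             int n, int maxShift) {
    assert(a.size() == static_cast<std::size_t>(n) * n && b.size() == a.size());
    Displacement best{0, 0};
    float bestR = std::numeric_limits<float>::lowest();
    for (int dy = -maxShift; dy <= maxShift; ++dy) {
        for (int dx = -maxShift; dx <= maxShift; ++dx) {
            float r = 0.0f;
            for (int y = 0; y < n; ++y) {
                const int yb = y + dy;
                if (yb < 0 || yb >= n) continue;  // outside the overlap
                for (int x = 0; x < n; ++x) {
                    const int xb = x + dx;
                    if (xb < 0 || xb >= n) continue;
                    r += a[y * n + x] * b[yb * n + xb];
                }
            }
            if (r > bestR) { bestR = r; best = {dx, dy}; }
        }
    }
    return best;
}
```

For 32x32 windows and a search range of a few pixels, each window pair costs on the order of a million multiply-adds, and a full image contains many independent window pairs, which is why this step is a good candidate for the GPU.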

Acknowledgements
RCS: Dr Mike Bane, Dr Simon, etc. Tests were run on the NVIDIA GPUs of the Redqueen cluster.