Building scientific software on POWER, what help is available?


1 Building scientific software on POWER, what help is available?
Andrew Edmondson

2 University of Birmingham
Royal charter in 1900 (history back to 1828)
Member of Russell Group
34,835 students (2017/18) – 4th largest in UK
11 staff and alumni are Nobel prize winners
£134.2M research income (2017/18)
£3.5bn economic impact
Campuses in Birmingham and Dubai

3 IBM® POWER9™

4 RSEConUK 2019 Programme Chair

5

6 BEAR AI
“In November 2018 we announced the imminent arrival at the University of the largest IBM POWER9 AI cluster in the UK. We are now inviting pilot users on to the system... We are particularly looking for people who use TensorFlow, PyTorch, or other GPU-accelerated software to contact us. We will help you use the service.”

7 Scientific Software
TensorFlow. Installed version: 1.10.1. Python-based open source machine learning framework from Google.
PyTorch. Installed version: 1.0.1. An open source deep learning platform from Facebook.
Installed version: Very popular HPC molecular dynamics package with GPU acceleration.

8 Scientific Software
Plus:
Amber Autoconf Automake Autotools Bazel BEDTools binutils Biopython Bison BLAST+ Blosc BOLT-LMM Boost Boost.Python bzip2 cairo CALMET CALPOST CALPUFF chiron CMake CUDA cuDNN Cufflinks CUnit cupy cURL DBus Deepbinner DendroPy dnaMD do_x3dna Doxygen Eigen EIGENSOFT ETE expat fast5_fetcher FFmpeg FFTW FLANN flappie flex fontconfig foss fosscuda FreeBayes freeglut FreeSurfer freetype FriBidi FSL future GATK GBOOST gc GCC GCCcore gcccuda gettext Ghostscript GLib GLPK GMP GObject-Introspection gompi gompic gperf Graphviz GROMACS GSL GTS Guile h5py HarfBuzz HDF5 hdf5storage help2man HPL HTSeq hwloc hypothesis icc intltool IPython JasPer Java Keras LAME LAMMPS libdrm libffi libgd libGLU libgpuarray libiconv LiBiNorm libjpeg-turbo libmatheval libpciaccess libpng libreadline libsodium libStatGen LibTIFF libtool libunistring libxml2 libxslt libyaml LLVM LMDB LWP-Protocol-https lxml LZO M4 magma Mako mappy matplotlib mayavi Mesa METIS mne MPFR MUMmer NAG NASM NCCL ncurses netCDF netCDF-Fortran nettle NiBabel Ninja NLopt NSPR NSS numactl numexpr ont-fast5-api Open3D OpenBLAS OpenMPI OpenPGM OptiType ORCA Pango PCRE Perl PGI picopore Pillow Pillow-SIMD pixman pkgconfig pkg-config PLUMED powerveclib protobuf protobuf-python psutil pylmdb Pyomo PyQt5 Pysam PyTables Python PyTorch PyYAML Qhull Qt5 SAMtools ScaLAPACK scikit-learn scikit-multiflow SCOTCH SeqAn snp-sites Sphinx SQLite SWIG Szip tb_nightly Tcl TensorFlow tensorflow-probability Theano Tk Tkinter TopHat torchvision transIndel uMatIC util-linux v8 veclib VTK wheel X11 x264 x265 x3dna XML-Parser xorg-macros xprop XZ Yasm ZeroMQ zlib
So far…

9 Opt 1: PowerAI Conda Channel
Easy to install, IBM provided (=> supported)
conda install powerai=1.6.0
conda install tensorflow-gpu powerai-release=1.6.0

10 Opt 2: EasyBuild
“EasyBuild is a software build and installation framework that allows you to manage (scientific) software on High Performance Computing (HPC) systems in an efficient way.”

11 EasyBuild
EasyBuild allows us to build lots of scientific software easily and reproducibly for a variety of platforms. We have:
EL7: sandybridge, haswell, broadwell, skylake, cascadelake
Ubuntu: haswell
And now EL7: POWER9

12 Aside: EasyBuild and Conda
We’ve also made an EasyBuild recipe to install the PowerAI packages from the Conda channel…

13 Anon. IBM bigwig quote
“You change a couple of flags for the compiler and it just builds on POWER” [2019]
Um. No.
But if it goes wrong, there is help available…

14 EasyBuild
Talked before about EasyBuild expecting Intel…
Working with maintainers to support POWER
E.g. CUDA, Java, NCCL, PyTorch, PGI, LAMMPS, Mesa, TensorFlow, Amber…

15 EasyBuild
CUDA:
Distributed as binary, source file named differently on Intel vs. POWER
Previous version RPM only, but latest has .run file (hooray!)
Java:
Oracle Java not available for POWER
Had to modify EasyBuild to use OpenJDK
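The CUDA naming difference can be handled by choosing the file name from the machine architecture. The following is a minimal sketch, assuming a hypothetical naming scheme in which the POWER installer carries a `_ppc64le` suffix. The function name, the suffix table, and the file-name pattern are illustrative only and are not taken from the slides or from EasyBuild.

```python
import platform

# Hypothetical suffixes: assumes the POWER installer adds "_ppc64le"
# to the name used on Intel (x86_64). Adjust to the real file names.
ARCH_SUFFIX = {"x86_64": "", "ppc64le": "_ppc64le"}


def installer_name(version, arch=None):
    """Return an illustrative installer file name for the given architecture."""
    arch = arch or platform.machine()
    if arch not in ARCH_SUFFIX:
        raise ValueError(f"no installer known for architecture {arch!r}")
    return f"cuda_{version}_linux{ARCH_SUFFIX[arch]}.run"
```

Keeping the architecture lookup in one table means a build recipe does not need a separate copy per platform.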

16 EasyBuild
Latest EasyBuild has fixes for some of those things – some from Birmingham, some from other people.
E.g. TensorFlow now uses pip, and “just works” on POWER

17 Challenges/Problems
1. Simon talked about porting code to POWER
Not going to say any more here
2. When things go wrong…

18 Problem: NumPy/SciPy
“The fundamental package for scientific computing with Python.”
“[NumPy] is a very important library on which almost every data science or machine learning Python packages such as SciPy (Scientific Python), Matplotlib (plotting library), Scikit-learn, etc depends...”

19 Problem: NumPy/SciPy
Building SciPy on POWER9 using EasyBuild… (core dumped)
Investigate…
EasyBuild didn’t parse the test results… and there were failures. We/it just hadn’t noticed them.
No-one else (it seems) had run the SciPy tests on POWER
This affected every NumPy/SciPy version that we tested… including the PowerAI Conda packages.
Shout “Help!”
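A test run whose results nobody reads can be guarded against by parsing the summary line and failing the build on any non-zero count. The following is a minimal sketch, assuming a pytest-style summary such as `3 failed, 120 passed`. The function names and the sample log line are hypothetical, and this is not a description of how EasyBuild implements its own check.

```python
import re

# Matches counts such as "3 failed", "1 error" or "2 errors" in a summary line.
_COUNT = re.compile(r"(\d+) (failed|errors?)\b")


def count_failures(test_output):
    """Sum the failed and errored test counts reported in the output."""
    return sum(int(n) for n, _ in _COUNT.findall(test_output))


def check_tests(test_output):
    """Raise instead of silently continuing when any test failed."""
    failures = count_failures(test_output)
    if failures:
        raise RuntimeError(f"{failures} test(s) failed; refusing to install")


# Hypothetical log excerpt, for illustration only.
sample = "===== 3 failed, 120 passed, 1 error in 42.0s ====="
print(count_failures(sample))
```

A crash such as a core dump may leave no summary line at all, so a stricter check would also treat a missing summary as a failure.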

20 Help: NumPy/SciPy
Send message on EasyBuild Slack… [4 June 2019]
The EasyBuild community were alarmed that NumPy and SciPy tests hadn’t been checked… so they started working on a fix.
Created bug report on SciPy’s GitHub:
Send message on PowerAIUG Slack…
Our IBM contact read it and started talking to IBM dev teams in North America…
“To let everyone know - there is a lot of discussion about the bug you mention above within our Development team's Slack channel. They're currently investigating the problem with a variety of packages” [June 2019]

21 Problem: NumPy/SciPy OpenBLAS
After several s with updates, questions etc…
IBM identified the problem as ppc64le bugs in OpenBLAS
IBM did lots of dev work on and made several patches
These have been incorporated into 0.3.6
We changed to OpenBLAS and NumPy/SciPy seem to be fixed. Hooray!
DO NOT USE un-patched OpenBLAS < on POWER
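The warning above can be turned into a simple version guard. The following is a minimal sketch, assuming 0.3.6 is the first safe release, on the basis that the slide says the patches "have been incorporated into 0.3.6". The threshold and the function names are assumptions. The check cannot tell whether an older build has been patched by hand, so it rejects every older version on ppc64le.

```python
import platform

# Assumption: 0.3.6 is the first release carrying the ppc64le patches,
# as the slide says the fixes "have been incorporated into 0.3.6".
FIRST_PATCHED = (0, 3, 6)


def parse_version(text):
    """Turn a dotted version such as '0.3.5' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def openblas_ok(version, machine=None):
    """Return False for an OpenBLAS older than FIRST_PATCHED on ppc64le."""
    machine = machine or platform.machine()
    if machine != "ppc64le":
        return True  # this guard only covers the POWER bugs
    return parse_version(version) >= FIRST_PATCHED
```

Comparing integer tuples rather than strings keeps a version such as 0.3.10 ordered after 0.3.6.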

22 Fixed...
We now have TensorFlow “foss” and “fosscuda” working on POWER9 via EasyBuild using OpenBLAS
Our new “2019a” stack.

23 AssertionError: Arrays are not equal
A moment of balance…
Alternative title: “The pain OpenBLAS have caused me”
Alternative title 2: “It’s not just POWER”
OpenBLAS bug… AssertionError: Arrays are not equal
Had to rebuild our entire 2018b stack on 5 platforms with OpenBLAS

24 Building scientific software on POWER, what help is available?

25

26 Questions

