Eos Compilers
Fernanda Foertter, HPC User Assistance Specialist


2 Available Compilers
Cray, Intel, GNU, and PGI compilers are available.
Cray compiler wrappers:
– Same command (ftn/cc/CC) regardless of compiler
– Link in MPI/Libsci libraries automatically
– Intel is the default
– Static linking is the default
Swap programming environments with: module swap PrgEnv-cray PrgEnv-intel
Always use the wrapper!

3 Compiler tips
– Some compilers are more aggressive than others at optimization and standards compliance
– It is safe to assume the vendor knows its architecture best (e.g. Intel) and is thus better at optimizing for its own hardware
– No compiler is superior for all codes
– Trying different compilers is a good way of catching bugs
– Test your application against lower optimization levels and any included tests
– Use system-provided libraries

4 Caution

5 Intel-specific tips

Pros:
– Compatibility with Sandy Bridge
– Scalar optimization
– Vectorization
– Intel MKL
– Intel-specific OpenMP extensions
– AVX (speeds up vector and FP code)
– OpenMP 4.0 support

Cons:
– No Cray math libraries available

6 Some Intel Compiler Options

Flag            Description
(none)          Optimizes for speed; with no flags, approximately -O2 level
-fast           Maximizes speed across the entire program. Aggressive. Includes -ipo -O3 -no-prec-div -static -xHost
-fast -no-ipo   Same as -fast but with no IPO
-O0             Disables optimizations
-O1             Optimizes for speed but keeps the binary small
-O2             Speed + vectorization, inlining
-O3             -O2 + aggressive loop transformations
-mkl=cluster    Single-threaded MKL (BLAS/LAPACK/ScaLAPACK/FFTW)
-mkl=parallel   Multi-threaded MKL (no ScaLAPACK)
-xAVX           Default on Eos

7 MKL link line builder Source:

8 Edison Benchmark Results at NERSC: not a lot of difference (lower is better). Source:

9 Edison Benchmark Results at NERSC: not a lot of difference (lower is better). Source:

10 Known Issue: Intel OpenMP Hybrid
The Cray thread affinity settings and Intel's OpenMP runtime environment conflict. The Intel runtime creates an extra helper thread, so two compute threads end up scheduled on the same core and the job takes twice as long as it should.
Source: Introducing the Intel Compiler on Edison, Michael Stuart, NUG Feb 2013

11 Intel OpenMP Hybrid: NERSC's Edison Solution
– OMP_NUM_THREADS <= 8: set KMP_AFFINITY=compact and launch with aprun -cc numa_node
– OMP_NUM_THREADS > 8 and <= 16: set KMP_AFFINITY=scatter and launch with aprun -cc none
– CPU binding should be turned off, which allows user compute threads to spread out over the available resources
– The helper thread will no longer impact performance
Source: Introducing the Intel Compiler on Edison, Michael Stuart, NUG Feb 2013

12 OLCF.ORNL.GOV