MyFirstMic experience
Jan Just Keijser, Nikhef Amsterdam Grid Group
26 November 2013

Presentation transcript:


What have we got?
Supermicro server 'pleedo'
Dual 2.00 GHz
64 GB RAM
Two Xeon Phi (aka 'Intel MICs') model kHz
Each card has
◦60 cores with 4 threads each
◦8 GB GDDR5 RAM
◦PCI Express v2 x GT/s

What can we do with it?
Massively parallel computing:
◦Manycore applications
◦OpenMP and/or MPI jobs
Runs Linux
◦Kewl! 'cat /proc/cpuinfo' returns 240 logical CPUs per card (60 cores × 4 threads)
Can be reached via minicom and ssh
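The 240 figure is the number of hardware threads that Linux reports, not the number of physical cores. A minimal sketch of that check follows, assuming the usual /proc/cpuinfo layout with one "processor : N" entry per logical CPU. The function and constant names are hypothetical and are not taken from the slides.

```python
# Hypothetical helper: count logical CPUs in /proc/cpuinfo-style text.
CORES_PER_CARD = 60    # from the slide
THREADS_PER_CORE = 4   # from the slide


def count_logical_cpus(cpuinfo_text: str) -> int:
    """Count the 'processor : N' entries, one per logical CPU."""
    count = 0
    for line in cpuinfo_text.splitlines():
        key, sep, _ = line.partition(":")
        if sep and key.strip() == "processor":
            count += 1
    return count


if __name__ == "__main__":
    # 60 cores x 4 threads gives the 240 entries seen on each card.
    print("expected per card:", CORES_PER_CARD * THREADS_PER_CORE)
    try:
        with open("/proc/cpuinfo") as fh:
            print("this machine:", count_logical_cpus(fh.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```

Run on the card itself (for example over ssh), the second number should match the first if all threads are online.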

How do we do that?
1. Recompile code using the Intel C/C++ and/or Fortran compiler suite
2. Copy the file to the Xeon Phi (or use NFS to exchange data)
3. Run the code natively on the Xeon Phi
4. Copy the results back (or, again, use NFS)
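The four steps above can be sketched as a dry run that only prints the commands. This is an illustration under assumptions, not the setup used on 'pleedo': the card host name `mic0`, the file names and the function name are all hypothetical. `-mmic` is the Intel compiler option for building a native Xeon Phi binary.

```python
import shlex


def native_run_commands(source, binary, card="mic0", results="results.dat"):
    """Return the four workflow steps as argument lists; nothing is executed."""
    return [
        # 1. Cross-compile on the host for the Xeon Phi.
        ["icc", "-mmic", "-o", binary, source],
        # 2. Copy the binary to the card (an NFS mount would replace this).
        ["scp", binary, f"{card}:"],
        # 3. Run the binary natively on the card.
        ["ssh", card, f"./{binary}"],
        # 4. Copy the results back (again, NFS would replace this).
        ["scp", f"{card}:{results}", "."],
    ]


if __name__ == "__main__":
    # hello.c and hello.mic are placeholder names.
    for argv in native_run_commands("hello.c", "hello.mic"):
        print(shlex.join(argv))
```

Returning argument lists rather than shell strings keeps the sketch easy to inspect and lets a caller pass each entry to `subprocess.run` unchanged.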

Sounds too easy...
It does
◦Recompiling code does not optimize it for the new architecture
◦The Intel compiler and the gcc compiler are almost compatible
◦The Intel compiler runs on the host machine, not on the Xeon Phi, so we're cross-compiling
◦(Some HEP build frameworks do not like this)

OpenSSL speed test
Completely useless but very handy test:
◦openssl speed -evp aes-256-cbc -multi
The test was extended to 30 seconds (default=3) and was run 3 times for each value of the -multi argument
Advantages:
◦Compare results to regular CPUs
◦Scales very well with the number of cores/threads
◦Embarrassingly parallel
◦Low memory usage
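A possible way to organise such a scan is sketched below: build the command line for each process count, then turn the measured throughputs into a parallel efficiency, T(N) / (N × T(1)). The slides do not say how the 30-second duration was set. The `-seconds` option used here exists in newer OpenSSL releases (1.1.1 and later) and is an assumption about the setup, as are the function names.

```python
import shlex
from statistics import mean


def speed_command(n_procs, seconds=30):
    """Build one benchmark command line (assumes an OpenSSL with -seconds)."""
    return ["openssl", "speed", "-evp", "aes-256-cbc",
            "-multi", str(n_procs), "-seconds", str(seconds)]


def parallel_efficiency(throughput_by_procs):
    """Map each process count N to mean T(N) / (N * mean T(1)).

    throughput_by_procs: {N: [throughput of each repeated run]}.
    A value of 1.0 means perfect scaling; N = 1 must be present.
    """
    base = mean(throughput_by_procs[1])
    return {n: mean(runs) / (n * base)
            for n, runs in sorted(throughput_by_procs.items())}


if __name__ == "__main__":
    # Print the commands for a small scan; each would be run 3 times.
    for n in (1, 2, 4):
        print(shlex.join(speed_command(n)))
```

Averaging the three runs per process count before dividing smooths out run-to-run noise, which matters when looking for dips in the scaling curve.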

Results
[Plot not reproduced; x-axis: # cores]

Versus GHz
[Plot not reproduced; x-axis: # cores]
That's only a factor of 15
Even when correcting for optimisation the difference is still a factor of 5

Quantum chemistry code
“Real life” use case
C & Fortran
OpenMP
Optimised for “normal” Xeons
Low memory usage

Results (Courtesy Mark Leiden University)
Do not be fooled by this plot: the E is still a factor of 10 faster and can access all system memory

Issues
Initially the Xeon Phis were very unstable
Core temperature when idle was 91 °C
During testing the temperature went up to 98 °C, after which the card shut down and a hardware reset was necessary
◦Fixed by setting chassis fans to always be on
The Intel compiler suite requires a license. For academic use a free license can be used, valid for 1 year
The 'root' code has a compile target for the Xeon Phi that does not work out of the box

What's next?
Cross-compile 'openjdk'
Get 'root' running
Examine “dips” in performance
Gain more experience in debugging and tuning multi/many core applications
