An evaluation of the Intel Xeon E5 Processor Series
Zurich Launch Event, 8 March 2012
Sverre Jarp, CERN openlab CTO
Technical team: A. Lazzaro, J. Leduc, A. Nowak

[Photo slide: Mont Blanc (4,808 m), Lake Geneva (310 m deep), Geneva (pop. 190'000)]

Intense data pressure creates strong demand for computing
– 250'000 IA computing cores
– Tens of petabytes stored per year
– Raw data: a few petabytes per second
– A rigorous selection process enables us to find that one interesting event in 10 trillion (10^13)

The Worldwide LHC Computing Grid
– Tier-0 (CERN): data recording, reconstruction and distribution
– Tier-1: permanent storage, re-processing, analysis
– Tier-2: simulation, end-user analysis
– > 1 million jobs/day, ~250'000 cores, 173 PB of storage, nearly 160 sites, 10 Gb links

The CERN openlab
A unique research partnership between CERN and industry
– Objective: the advancement of cutting-edge computing solutions to be used by the worldwide LHC community
– Partners support manpower and equipment in dedicated competence centres
– openlab delivers published research and evaluations based on partners' solutions – in a very challenging setting
– Robust hands-on training programme in various computing topics, including international computing schools and a summer student programme
– Past involvement: Enterasys Networks, IBM, Voltaire, F-Secure, Stonesoft, EDS; new contributor: Huawei
– Phase IV has just started

Benchmarking: a complex affair
In modern servers, at least the following elements need to be controlled:
– Hardware: processor generation, socket count, core count, CPU frequency, Turbo Boost, SMT, cache sizes, memory size and type, power configuration
– Software: operating system version, compiler version and flags
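
As a small illustration of recording the software-side variables before a run (our own sketch, not an openlab tool; it assumes a Linux host with /proc/cpuinfo and a GCC-compatible compiler):

```cpp
// Hypothetical sketch: log a few of the benchmark variables listed above so that
// results remain comparable across machines. Compile with: g++ -std=c++11
#include <fstream>
#include <iostream>
#include <string>
#include <thread>

int main() {
    // Compiler identification (predefined macro in gcc and icc).
    std::cout << "Compiler: " << __VERSION__ << '\n';

    // Logical core count seen by the OS (includes SMT siblings when SMT is on).
    std::cout << "Logical cores: " << std::thread::hardware_concurrency() << '\n';

    // First "model name" line from /proc/cpuinfo gives generation and nominal frequency.
    std::ifstream cpuinfo("/proc/cpuinfo");
    std::string line;
    while (std::getline(cpuinfo, line)) {
        if (line.rfind("model name", 0) == 0) {
            std::cout << line << '\n';
            break;
        }
    }
    return 0;
}
```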

Xeon E5 in some detail
– Advanced Vector eXtensions (AVX): 256-bit registers which can hold 4 doubles / 8 floats, plus the new AVX instruction set
– More execution units – two load units, for instance
– Enhanced Hyper-Threading and Turbo Boost technology
– Larger on-die L3 cache
– Integrated PCI Express 3.0 I/O
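
A minimal example (not from the talk) of a 256-bit AVX register holding four doubles; it assumes an AVX-capable CPU and a compiler flag such as -mavx:

```cpp
// One AVX instruction operates on 4 doubles (or 8 floats) at a time.
#include <immintrin.h>
#include <cstdio>

int main() {
    double a[4] = {1.0, 2.0, 3.0, 4.0};
    double b[4] = {10.0, 20.0, 30.0, 40.0};
    double c[4];

    __m256d va = _mm256_loadu_pd(a);    // load 4 doubles into one YMM register
    __m256d vb = _mm256_loadu_pd(b);
    __m256d vc = _mm256_add_pd(va, vb); // one instruction adds all 4 lanes
    _mm256_storeu_pd(c, vc);

    std::printf("%f %f %f %f\n", c[0], c[1], c[2], c[3]);
    return 0;
}
```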

Our Xeon E5 testing
System tested:
– Beta-level white box; dual-socket server
– Xeon, 2.7 GHz, 8 cores, 130 W TDP; C1 stepping; code name "Sandy Bridge EP"
– 32 GB memory (1333 MHz)
Benchmarks used:
– HEPSPEC
– HEPSPEC/W
– MT-Geant4
– MLfit

HEPSPEC
– Throughput test from SPEC CPU2006: all the C++ jobs (INT as well as FP), running as many copies as cores
– Scientific Linux CERN (SLC) 5.7 / gcc 4.1.2 / 64-bit mode / Turbo off / SMT on
– Compared to the 6-core "Westmere-EP" Xeon X5670 (2.93 GHz), frequency-scaled
Using only the "real" cores:
– Speed-up per core: 1.2x
– Core count (8 vs 6 cores): 1.33x
– Total: 1.6x
SMT gain (for both): 1.23x
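
To make explicit how the total figure combines, here is a hedged sketch of the arithmetic (our own reconstruction, not openlab's script): the per-core gain is already normalised to equal clocks, so it multiplies directly with the core-count ratio.

```cpp
// How the slide's total throughput gain is composed.
#include <cstdio>

int main() {
    const double per_core_speedup = 1.20;   // frequency-scaled gain per "real" core
    const double core_ratio = 8.0 / 6.0;    // E5 (8 cores) vs X5670 (6 cores), ~1.33x
    std::printf("total throughput gain ~ %.1fx\n",
                per_core_speedup * core_ratio);   // ~1.6x, matching the slide
    return 0;
}
```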

Energy efficiency
For CERN and most W-LCG sites, energy efficiency is paramount:
– Our centres have a (more or less) fixed electrical power budget
– Ideally, we would like to double the throughput/watt from generation to generation
– This was relatively easy when the core count increased geometrically: 1 → 2 → 4
– Recently, however, it has been increasing arithmetically: 4 (Xeon 5500) → 6 (Xeon 5600) → 8 (Xeon E5-2600)

HEPSPEC/Watt
Great news: a bigger jump in energy efficiency than foreseen!
– Now reaching 1 HEPSPEC/W, which is 1.7x compared to the Xeon X5670
– Xeon E5 options: SLC 5.7, 64-bit mode, SMT on, Turbo on
– Xeon 5600 options: SLC 5.4
[Chart: HEPSPEC/W, Xeon 5600 vs Xeon E5 – bigger is better]
STOP PRESS: With SLC 6 (gcc 4.4.6) we further lower the power consumption by 5% and increase the HEPSPEC results by 3%, i.e. roughly 1.03 / 0.95 ≈ 1.083x in total!

MT-Geant4
Our favourite benchmark for testing weak scaling: a threaded version of CERN's detector simulation program (Geant4)
– Speed-up compared to the previous generation, both with Turbo off and SMT on (L5640, frequency-adjusted): 1.46x
– SMT speed-up: 1.25x
– SLC 5.7, gcc 4.3.3, pinning of threads
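
The slide mentions "pinning of threads"; the following is our own minimal illustration of that technique (not Geant4 code), binding each worker to one logical CPU with the Linux-specific pthread affinity call. Compile with: g++ -std=c++11 -pthread

```cpp
#include <pthread.h>
#include <sched.h>
#include <cstdio>
#include <vector>

static void* worker(void* arg) {
    long cpu = reinterpret_cast<long>(arg);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(static_cast<int>(cpu), &set);
    // Pin the calling thread to a single logical CPU to avoid migration noise.
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    std::printf("worker pinned to CPU %ld\n", cpu);
    return nullptr;
}

int main() {
    const long nthreads = 4;   // in practice: one worker per core under test
    std::vector<pthread_t> threads(nthreads);
    for (long i = 0; i < nthreads; ++i)
        pthread_create(&threads[i], nullptr, worker, reinterpret_cast<void*>(i));
    for (auto& t : threads) pthread_join(t, nullptr);
    return 0;
}
```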

MLfit
Our favourite benchmark for testing strong scaling: a threaded/vectorised data analysis program
– Single core (Turbo off, using SSE): 1.19x
– Single core, moving from SSE to AVX: 1.12x
– All the "real" cores w/SSE (1.33 × 1.19): 1.59x
– All the "real" cores w/AVX (1.59 × 1.12): 1.78x
– SMT speed-up: 1.29x
– SLC 6.2, icc, pinning of threads
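
As a hedged sketch (not the MLfit code itself), this is the kind of loop that gains when the compiler retargets from SSE (128-bit, 2 doubles per iteration of the vector loop) to AVX (256-bit, 4 doubles), e.g. with icc -xAVX or gcc -mavx instead of the SSE targets:

```cpp
#include <cstddef>

// Independent, stride-1 iterations: auto-vectorises cleanly; with AVX each
// vector iteration processes twice as many doubles as with SSE.
void scale_and_add(double* __restrict__ out,
                   const double* __restrict__ a,
                   const double* __restrict__ b,
                   double k, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = k * a[i] + b[i];
}
```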

Conclusion
– The Intel Xeon E5 Processor Series confirms Intel's desire to improve both absolute performance and performance per watt
– CERN and W-LCG will appreciate both – in particular the HEPSPEC/W value, now reaching 1 HEPSPEC/W, which is 1.7x compared to the previous generation (Xeon X5670)
– A full openlab evaluation report will be published at launch time
– The Xeon X5670 report has been available since April