Processors with Hyper-Threading and AliRoot performance Jiří Chudoba FZÚ, Prague.



Motivation
How to choose the optimal hardware?
Contributions are counted in SPECints, but how to measure them?
CPU2000 tests cost 150 USD.
Many results are published, but:
The hardware is often not identical to our machines.
Results depend on the OS, compiler, …

HP ProLiant DL360 G3
2x Xeon 2.8 GHz HT, 512 KB cache, 2x 18.2 GB Ultra320 hot-pluggable drives

SPECint2000 results:
ProLiant DL560, 2.8 GHz Intel Xeon MP (2 MB L3 cache), HT disabled in BIOS: SPECint (SPECint_base )
ProLiant DL360 G3, 3.06 GHz Intel Xeon, 512 KB L2, 1 MB L3, no HT: SPECint2000 1258
Dell PowerEdge 2650, 2.8 GHz Xeon, 512 KB L2 cache, HT disabled: SPECint2000 907
Intel D875PBZ motherboard (2.80C GHz PIV, HT maybe on, the default status): SPECint2000 1204
Intel D875PBZ motherboard (AA-301) (2.8E GHz, 1 MB cache, HT maybe on): SPECint2000 1269


Details on 2 logical processors
Duplication of the architectural state on each processor, while sharing one set of processor execution resources.

10:49pm up 3 days, 4:03, 1 user, load average: 0.00, 0.00,
processes: 30 sleeping, 1 running, 0 zombie, 0 stopped
CPU0 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle
CPU1 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle
CPU2 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle
CPU3 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle
Mem: K av, K used, K free, 0K shrd, K buff
Swap: K av, 0K used, K free K cached
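The four CPU lines above come from a machine with two physical Xeons, each exposing two logical processors. As a minimal sketch of how that grouping can be read on Linux, the following parses text in the format of /proc/cpuinfo and groups logical processors by their "physical id" field. The sample text and its numbering are hypothetical; the actual mapping on the machine in the talk is not known.

```python
from collections import defaultdict

# Hypothetical excerpt in /proc/cpuinfo format: two packages, two logical CPUs each.
SAMPLE = """\
processor   : 0
physical id : 0

processor   : 1
physical id : 0

processor   : 2
physical id : 1

processor   : 3
physical id : 1
"""


def logical_cpus_by_package(cpuinfo_text):
    """Return {physical id: [logical processor numbers]} from cpuinfo-style text."""
    packages = defaultdict(list)
    # Each logical processor is described by one blank-line-separated block.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        if "processor" in fields:
            # Blocks without a "physical id" field are grouped under "unknown".
            package = fields.get("physical id", "unknown")
            packages[package].append(int(fields["processor"]))
    return dict(packages)


if __name__ == "__main__":
    print(logical_cpus_by_package(SAMPLE))
```

On a real system the same function could be given the contents of /proc/cpuinfo; a package that lists more than one logical processor shares its execution resources among them.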

Not Doubled Performance
A CPU that supports Hyper-Threading does not deliver the performance of two physical processors rated at the same speed. The reason is that the two logical processors that make up a hyper-threaded CPU have to share resources, namely the execution engine, the cache, and access to the system bus. Intel promises a 10-30% performance increase...
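One way to put a number on such a gain is to compare job throughput (jobs finished per second of real time) with and without HT. The sketch below does this arithmetic; the function name and the timings in the example are hypothetical and are not measurements from this talk.

```python
def throughput_gain(jobs_ht, time_ht, jobs_noht, time_noht):
    """Relative throughput change of the HT run against the noHT run.

    Times are wall-clock seconds needed to finish the given number of jobs.
    A result of 0.15 means 15% more jobs per second with HT; a negative
    result means HT was slower.
    """
    rate_ht = jobs_ht / time_ht
    rate_noht = jobs_noht / time_noht
    return rate_ht / rate_noht - 1.0


if __name__ == "__main__":
    # Hypothetical numbers: 4 parallel jobs with HT take 1000 s,
    # 2 parallel jobs without HT take 575 s.
    print(f"{throughput_gain(4, 1000.0, 2, 575.0):+.1%}")
    # Hypothetical case where HT loses: the 4 jobs take 1300 s instead.
    print(f"{throughput_gain(4, 1300.0, 2, 575.0):+.1%}")
```

Comparing throughput rather than single-job time matters here, because with HT each job usually runs slower while more jobs run at once.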

Hyper-Threading – not always better
… but it is not always the case:

Other tests
Unix Benchmark Utility v.0.3, author: Sergei Viznyuk
Columns: noHT (gcc 2.96, gcc 3.2), HT (gcc 2.96, gcc 3.2)
CPU test (1.26)

Klaus Schossmaier reported (numbers per CPU):
Opteron 1.4 GHz GHz Xeon 2.4 GHz Itanium 1.0 GHz 66714

Results for AliRoot
CERN RH 7.3.3, kernel , AliRoot v , 1000 tracks HIJINGParam, Real time
HT with scheduling, HT, noHT
s s s 2+2 jobs
± 3 s ± 5 s ± 10 s 4 jobs, parallel
± 2 s ± 48 s ± 1 s 2 jobs, parallel

ftp://ftp.kernel.org/pub/linux/kernel/people/rml/cpu-affinity/

CPU0 states: 100.0% user, 0.0% system, 0.0% nice, 0.0% idle
CPU1 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle
CPU2 states: 100.0% user, 0.0% system, 0.0% nice, 0.0% idle
CPU3 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle

CPU0 states: 100.0% user, 0.0% system, 0.0% nice, 0.0% idle
CPU1 states: 100.0% user, 0.0% system, 0.0% nice, 0.0% idle
CPU2 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle
CPU3 states: 0.0% user, 0.1% system, 0.0% nice, 99.0% idle
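The two listings show the same pair of jobs landing on different logical CPUs, which is what the "with scheduling" runs control. The talk points to the cpu-affinity kernel patches for this; as a rough modern equivalent, and not the tooling used here, a job can be pinned to one logical CPU from Python on Linux with os.sched_setaffinity. The helper name run_pinned is hypothetical.

```python
import os
import subprocess
import sys


def run_pinned(command, cpu):
    """Run command as a child process restricted to one logical CPU.

    Linux only: relies on os.sched_setaffinity, which is applied in the
    child just before the command is executed. Returns the exit code.
    """
    def set_affinity():
        # pid 0 means "the calling process", here the freshly forked child.
        os.sched_setaffinity(0, {cpu})

    return subprocess.run(command, preexec_fn=set_affinity).returncode
```

For example, run_pinned(["aliroot", "-b", "-q", "sim.C"], 0) would keep one job on logical CPU 0; the command line is only an illustration. Which logical CPU numbers belong to the same physical processor depends on the kernel's numbering and has to be checked on the machine itself.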

Conclusions
CPU resource estimates are probably very rough.
HT can add 15% in performance but in some cases in Real Time
Publicly available results of some of our standard CPU tests would help (an update of the Root benchmark tests?).

Root benchmark stress results
Root , gcc 3.2, -O
4 parallel jobs, 9000 events, HT:
parallel jobs, 9000 events, noHT:
parallel jobs, 9000 events, HT: 733

Klaus Schossmaier:
Opteron 1.4 GHz, 1 MB cache, 8 GB RAM
Opteron 1.8 GHz, 1 MB cache, 8 GB RAM
Itanium2 1.0 GHz, 3 MB cache, 2 GB RAM
P4 Xeon 3.06 GHz, 512 KB cache, 2 GB RAM
750 rootmarks, g (-O2)
950 rootmarks, g (-O2)
497 rootmarks, g (-O2)
750 rootmarks, g (-O2)
550 rootmarks, 32-bit binary compiled on P4 with g (-O2)
1020 rootmarks, ecc 7.1 (-O)