1
Benchmarking of CPU models for HEP applications
Commissione Calcolo e Reti, 12 December 2006
2
SPECint2000
Since about 2000, experiments have expressed their computing needs in terms of SPECint2000 (SI2K).
T1 and T2 agreed to provide huge quantities of SI2K with a defined but sliding profile.
Since 2006, SPEC.org has been phasing out CPU2000 and pushing CPU2006.
CCR 12/13 Dec 2006 m.michelotto
3
SPECint_rate
SPEC has a dedicated benchmark to assess performance in a multi-process environment: SPECint_rate.
Multiple SPECint jobs show better scaling than SPECint_rate.
Probably because SPECint_rate jobs run in sync and compete for some common resources.
4
Which should I use?
One can safely take the SPECint of one machine measured on a single core and then multiply by the number of cores.
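The rule of thumb on this slide can be written as a one-line estimate. The sketch below is only an illustration: the function name and the numbers are hypothetical, not taken from the slides.

```python
def machine_specint(single_core_score: float, n_cores: int) -> float:
    """Estimate whole-machine capacity as single-core SPECint times core count."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return single_core_score * n_cores


# Hypothetical example: a 4-core machine whose single-core run scored 1500 SI2K.
estimate = machine_specint(1500.0, 4)
print(f"Estimated capacity: {estimate:.0f} SI2K")
```

This assumes near-linear scaling of independent jobs, which is what the slide reports for multiple SPECint jobs.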
5
CPU trends: AMD
AMD has done a good job with the single-core Opteron 25x.
Same performance as the Intel Xeon with lower clock and power consumption.
Even better with the first integrated dual core: performance almost doubled at the same power consumption.
AMD is now introducing the 22xx “Socket F” with integrated DDR2.
But it is still on 90nm (on the Opteron).
6
CPU trends: Intel
Intel changed their minds: no more deep pipelines, no more GHz race, no HT; yes to dual core and multicore.
The first dual-core attempt (two single cores side by side, 90nm, 110 Watt) reduced the gap with AMD.
The Woodcrest is a real dual core on the same chip: 65nm, 65 Watt, up to 3GHz.
But the FBDIMMs are power hungry. Intel could leave FBDIMM to the MP line.
7
Future Intel
Clovertown is a 4-core made of two dual-core dies side by side.
Not a real 4-core, but from the user's point of view it is functionally a 4-core.
Models: the E5310, E5320 and E5345, clocked at 1.60GHz, 1.86GHz and 2.33GHz respectively, and the 2.66GHz X5355.
The first three CPUs run on a 1066MHz front-side bus, while the X5355 has a 1333MHz FSB.
All four contain 8MB of L2 cache, split between the two core pairs that make up each processor.
A real 4-core will follow in Q3/07 on 45nm with bigger caches.
8
Intel vs AMD 4-core
9
AMD Roadmap, October 2006
[Figure: roadmap timeline with columns 2H06, 1H07, 2H07, 2008, for the 8000/800 Series and the 2000/200 Series. Legend: Available Today; Future Release and 65nm; green font = existing features; red font = new features.]
AMD Opteron, Single & Dual-Core: RDDR, 1MB L2/core, SSE3, HT1, Socket 940
AMD Opteron with RDDR2, Dual-Core: 2x 1MB L2, HT1, RAS, AMD-V, Socket F (1207)
“Barcelona”, Quad-Core: 4x 512KB L2, 2MB L3, RDDR2, HT1, RAS, AMD-V, 65nm, Socket F (1207)
Same socket, package and platforms; option for board enhancements.
Barcelona (Split Plane)
The transition from Dual-Core AMD Opteron processors with DDR2 to Quad-Core will provide a seamless upgrade path using the same socket, package and platforms.
To allow customers maximum flexibility, Split Plane will also be enabled (board support required). Split plane allows for voltage splitting of the cores and the northbridge, resulting in a faster northbridge and yielding performance benefits.
AMD-V is highlighted as new because it has been enhanced with Nested Page Tables.
Budapest (2H07)
SBI-TSI (also known as SBI-Lite) will be offered and provides a thermal management interface for improved thermal management capabilities.
One x16 link of HT1 or HT3. Note: while the CPU will be enabled with HT1 or HT3, we do not anticipate chipsets enabled with HT3 support until Q1 2008 (see also the chipset slides).
Core Enhancements
Primary benefits of split plane:
Allows independent voltage control between the northbridge and the power plane/supply, yielding higher performance of up to 6-15% for commercial workloads.
Better power management: allows the cores to make lower voltage transitions than would otherwise be possible when the northbridge is on the same power plane.
As board assessments are done, this is an excellent opportunity to re-evaluate:
that the memory routing aligns with the latest memory guidelines as specified in AMD technical specifications;
dual HT links for 2P, an opportunity for an additional performance gain of approx. 5% on some workloads.
10
Some SPECint numbers
11
Measuring SPECint?
The SPECint base is measured with a standard set of compiler switches.
But you can choose the operating system (Windows/Linux) and the compiler.
The SPECint peak is completely open.
If you measure with your own usual set of switches, you can get different results.
12
BNL vs SPEC
BNL got results very close to SPEC on AMD but much lower on Intel.
13
CERN
CERN used the gcc compiler with the compiler switches used by most LHC experiments.
Scientific Linux CERN bit (gcc 3.4.5)
SPECint 2000 v1.3, default sources
Optimization flags: -O2 -pthread -fPIC, and a few others for portability
Run one benchmark per core
SPECint value is the sum of the results for each core
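The CERN procedure of running one benchmark per core and summing the results can be sketched as follows. The helper names and the per-core scores are hypothetical; the scaling-efficiency helper is an extra illustration, not something the slide defines.

```python
def summed_specint(per_core_scores):
    """Machine SPECint as the sum of the scores of the concurrent per-core runs."""
    scores = list(per_core_scores)
    if not scores:
        raise ValueError("need at least one per-core score")
    return sum(scores)


def scaling_efficiency(per_core_scores, single_job_score):
    """Fraction of ideal scaling: summed score over (single-job score x cores)."""
    scores = list(per_core_scores)
    return summed_specint(scores) / (single_job_score * len(scores))


# Hypothetical scores from four simultaneous runs on a 4-core machine.
scores = [950.0, 940.0, 930.0, 945.0]
print(summed_specint(scores))
print(round(scaling_efficiency(scores, single_job_score=1000.0), 3))
```

Comparing the summed score with the single-job score times the core count shows how much the concurrent runs slow each other down.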
14
Compiler is important
From CERN gcc -O2 to icc -fast & PGO: 60% improvement in performance.
15
From H. Meinhard @ HEPiX
[Figure: chart covering the processors Coppermine, Prestonia, Nocona, Irwindale, Paxville, Dempsey, Presler, Woodcrest, Troy and Italy, with labels from 0.8 to 3.8 and years 2000, 2002, 2004, 2005, 2006.]
16
Power Consumption
[Figure: power consumption for Prestonia, Nocona, Irwindale, Troy, Italy, Presler and Woodcrest, with labels from 2.0 to 3.8.]
Systematic error ≈ %
17
SPECint per VA
[Figure: SPECint per VA for Prestonia, Nocona, Irwindale, Troy, Italy, Presler and Woodcrest, with labels from 2.0 to 3.8.]
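The metric named in this slide title divides the benchmark score by the apparent power drawn, in volt-amperes. A minimal sketch, assuming the apparent power is obtained as measured voltage times measured current; the function name and the sample values are hypothetical.

```python
def specint_per_va(specint: float, volts: float, amps: float) -> float:
    """SPECint divided by apparent power (volts x amps, in VA)."""
    apparent_power = volts * amps
    if apparent_power <= 0:
        raise ValueError("apparent power must be positive")
    return specint / apparent_power


# Hypothetical measurement: 2000 SPECint while drawing 1.5 A at 230 V.
print(round(specint_per_va(2000.0, 230.0, 1.5), 2))
```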
18
CERN vs SPEC
19
How should I read 'em?
SPECint values measured with gcc and low optimization are about half of the published SPECint values.
This is not a problem IF the experiments' requests already take this into account and IF all CPUs are discounted in the same way.
CERN plans to use this SPECint as the metric for tenders.
But I see differences ranging from 45% to 72%.
It looks like SPECint overestimates the Woodcrest a bit.
Let's see what happens on real code.
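Whether all CPUs are discounted in the same way can be checked by comparing, per CPU, the measured score with the published one. The sketch below is an illustration only: the CPU names and all scores are hypothetical, not the values behind this slide.

```python
def measured_fraction(measured: float, published: float) -> float:
    """Fraction of the published SPECint recovered by the local measurement."""
    if published <= 0:
        raise ValueError("published score must be positive")
    return measured / published


# Hypothetical (measured, published) pairs for two CPUs.
results = {
    "cpu_a": (600.0, 1000.0),
    "cpu_b": (1100.0, 2000.0),
}

fractions = {name: measured_fraction(m, p) for name, (m, p) in results.items()}
spread = max(fractions.values()) - min(fractions.values())

for name, frac in fractions.items():
    print(f"{name}: {frac:.0%} of published")
print(f"spread between CPUs: {spread:.0%}")
```

A small spread means the published numbers can be rescaled by one common factor; a large spread means the ranking of CPUs changes with the measurement conditions.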
20
Tests made in Padova
Intel Woodcrest 3.0 GHz, 3GB, on loan from E4 (thanks to E4 and Intel Italy)
AMD GHz installed at df.unipi.it, thanks to Maurizio Davini from Pisa. Pre-release stepping! Performance may improve.
AMD GHz (2 and 8 GB) in Padova
AMD GHz 4GB in Padova
21
Programs used
32/64 and 64/64: stress test from Rootmarks; 100 SUSY events from PYTHIA
32/64 only: 300 single p events using OSCAR
32/64 only: 100 CMS hcal events DIGIS; 100 CMS hcal events DST
22
Root Stress
From +30% to +40% when running at 64 bit.
If you divide by clock, performance is very close, but Woodcrest is clearly leading.
If you divide by SPECint, this is not true.
Does a Woodcrest SPECint give less “real power” than an AMD one?
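The two normalizations used on this slide, per clock and per SPECint, can rank the same machines differently. The sketch below shows the arithmetic; the machine names and every number are hypothetical and chosen only to reproduce that kind of reversal.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    throughput: float  # work units per second from the application benchmark
    clock_ghz: float   # CPU clock frequency
    specint: float     # SPECint score of the same machine


def per_ghz(r: Result) -> float:
    """Throughput normalized by clock frequency."""
    return r.throughput / r.clock_ghz


def per_specint(r: Result) -> float:
    """Throughput normalized by the SPECint score."""
    return r.throughput / r.specint


# Hypothetical machines.
machine_a = Result("machine_a", throughput=30.0, clock_ghz=3.0, specint=3000.0)
machine_b = Result("machine_b", throughput=19.0, clock_ghz=2.0, specint=1600.0)

for r in (machine_a, machine_b):
    print(f"{r.name}: {per_ghz(r):.2f} per GHz, {per_specint(r):.5f} per SPECint")
```

With these made-up numbers machine_a leads per GHz while machine_b delivers more work per SPECint, which is the kind of disagreement the slide reports.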
23
Pythia
From +16% to +24% when running at 64 bit.
If you divide by clock, performance is very similar, but Woodcrest is still leading.
If you divide by SPEC CPU int, this is not true.
You get many more (50%) Pythia events per AMD SPECint than per Woodcrest SPECint.
24
CMS Oscar + Orca
Runs only at 32/64 (same as 32/32 to within a few percent).
No difference on the same machine going from 2 GB to 8 GB.
If you divide by clock, performance is rather close, with Woodcrest still leading.
If you divide by SPECint, this is not true.
You get many more CMS software events per AMD SPECint than per Woodcrest SPECint.
25
Atlas (A. De Salvo)
Woodcrest 5160 with local disk compared to AMD in Rome running from NFS.
Woodcrest is clearly leading in terms of evt/sec.
26
Atlas (A. De Salvo)
But if you divide by clock, it is very close to AMD and much faster than the old Xeon.
An AMD SPECint produces twice as many events as a Woodcrest SPECint.
Woodcrest and Xeon produce exactly the same number of events per SPECint.
27
To do
Make more tests on the AMD Socket F.
Make the same tests on a dual Clovertown (8-core) machine.
Try CMS software at 64 bit when available.
Look for self-contained Atlas, Alice and LHCb suites like the one I have for CMS.
Build a HEPmark suite (IHEPCCC + HEPiX working group).
Also take Performance/Watt into account.