HEPIX 2006 CPU technology session: some 'random walk' (Bernd Panzer-Steindel, CERN/IT, 3 April 2006)

Slide 2: INTEL and AMD roadmaps
- INTEL has now moved to 65 nm fabrication
  - new micro-architecture based on the mobile processor development, the Merom design (Israel)
  - Woodcrest (Q3) claims +80% performance compared with the 2.8 GHz part, with a 35% power decrease; some of the focus is on SSE improvements (included in the 80%)
- AMD will move to 65 nm fabrication only next year
  - focus on virtualization and security integration
  - needs to catch up in the mobile processor area
  - currently AMD processors are about 25% more power efficient
- INTEL and AMD offer a wide variety of processor types; it is hard to keep track of the new code names

Slide 3: Multi-core developments
- dual-core dual-CPU systems are available right now
- quad-core dual-CPU systems are expected at the beginning of 2007
- 8-core CPU systems are under development, but not expected to come to market before 2009 (need to cope with a change in programming paradigm: multi-threading, parallelism)
- Heterogeneous and dedicated multi-core systems:
  - Cell processor: PowerPC + 8 DSP cores
  - Vega 2 from Azul Systems: 24/48 cores for Java and .Net
  - CSX600 from ClearSpeed (PCI-X, 96 cores, 25 Gflops, 10 W); rumor: AMD is in negotiations with ClearSpeed to use their processor board
- revival of the co-processor!?

Slide 4: Game machines
- Microsoft Xbox 360 (available, ~450 CHF): PowerPC based, 3 cores (3.2 GHz each), 2 hardware threads per core, 512 MB memory, peak performance ~1000 GFLOPS
- Sony Playstation 3 (Nov 2006): Cell processor, PowerPC + 8 DSP cores, 512 MB memory, peak performance ~1800 GFLOPS
- Problems for High Energy Physics:
  - Linux on the Xbox
  - the focus is on floating point calculations and graphics manipulation
  - limited memory, no upgrades possible
- For comparison: INTEL P4 3.0 GHz ~12 GFLOPS, ATI X1800XT graphics card ~120 GFLOPS
- Use the GPU as a co-processor (32-node cluster at Stony Brook): CPU for task parallelism, GPU for data parallelism; a compiler exists and quite some code has already been ported (a conceptual sketch follows below)
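To make the task-parallelism vs. data-parallelism split concrete, here is a minimal, purely conceptual Python sketch; it is not the Stony Brook setup and runs on the CPU only. Independent jobs are farmed out across cores, while the inner element-wise arithmetic is written in vectorized form, the kind of work that would be handed to a GPU co-processor.

```python
# Conceptual illustration only (not the Stony Brook code): a process pool stands
# in for CPU-level task parallelism, and numpy's vectorized arithmetic stands in
# for the data-parallel work one would offload to a GPU co-processor.
from multiprocessing import Pool

import numpy as np

def process_event(seed):
    """One independent task, e.g. one simulated event."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(size=100_000)        # per-event data
    # Data-parallel part: the same arithmetic applied to every element.
    transformed = np.sqrt(samples * samples + 1.0)
    return transformed.sum()

if __name__ == "__main__":
    with Pool(processes=4) as pool:           # task parallelism across CPU cores
        results = pool.map(process_event, range(32))
    print(f"processed {len(results)} events, checksum {sum(results):.1f}")
```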

Slide 5: Market trends
- The combined market share of AMD + INTEL in the desktop PC, notebook PC and server areas is about 98% (21% + 77%)
- On the desktop the relative share is INTEL = 18%, AMD = 82% (this is the inverse of the ratio of their respective total revenues)
- In the notebook area INTEL leads with 63%
- AMD's share of the server market is growing, currently 14%
- The largest growth capacity is in the notebook (mobile) market

Slide 6: [charts: revenues in million $; units]

Slide 7: Revenue per PC is about 200 $ for the chips; ~220 million PCs in 2005 (~65 million of them notebooks).

Slide 8: Costs
- More sophisticated motherboards (sound, disk controller, cooling, graphics, etc.)
- The major point is memory: the experiments need 1 (CMS, LHCb) or 2 (ALICE, ATLAS) GByte of memory per job
  - multi-core needs correspondingly large memory per node, e.g. 4 cores == 10 GByte of memory, taking into account that more jobs are run per node than there are cores in order to increase CPU efficiency (I/O wait-time hiding); a sizing sketch follows below
  - per GByte one has to add 10 W of power to the system
- CPU costs include all the extras per core (security, virtualization, SSE, power-saving features, etc.) which we partly cannot use
- Cost breakdown of a dual-CPU, single-core node: 4 GB memory 32%, 2 CPUs 27%, motherboard 20%, chassis + power 16%, hard disk 5%
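As a back-of-the-envelope check of the memory and power figures above, the following sketch computes per-node memory and the extra power it implies. Only the "GByte per job" and "10 W per GByte" numbers come from the slide; the 1.25 jobs-per-core oversubscription factor is an assumption chosen so that 4 cores come out at the quoted 10 GByte.

```python
# Rough node-sizing sketch based on the figures quoted on the slide.
GB_PER_JOB = 2.0       # ALICE/ATLAS-like job (CMS/LHCb would use 1.0)
WATT_PER_GB = 10.0     # extra power per GByte of memory (from the slide)
JOBS_PER_CORE = 1.25   # oversubscription for I/O wait-time hiding (assumed)

def memory_and_power(cores):
    """Return (memory in GB, extra power in W) for a node with the given core count."""
    jobs = cores * JOBS_PER_CORE
    mem_gb = jobs * GB_PER_JOB
    extra_watts = mem_gb * WATT_PER_GB
    return mem_gb, extra_watts

for cores in (2, 4, 8):
    mem, watts = memory_and_power(cores)
    print(f"{cores} cores: ~{mem:.0f} GB memory, ~{watts:.0f} W extra for memory")
```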

Slide 9: [chart: spot market, Mbit DIMMs, cost trends]

Slide 10: Benchmarks
Problems:
- finding 'your' configuration under spec.org
- 'gauging' the SPECint values against the programs from the experiments
- the specific programs are under steady development
- 'standalone' programs need to be disentangled from their environment
Possible solution (a monitoring sketch follows below):
- run deep, low-level processor monitoring (chip level, execution units) continuously in the background on a large number of nodes
- create 'running' profiles of the running programs
- compare with running the SPECint suite on the same nodes
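A minimal sketch of what such continuous low-level monitoring could look like on one node. It relies on the Linux perf tool, which postdates this 2006 talk (the original monitoring read the CPU performance counters through dedicated tools); the chosen counter events, sampling window and job PID are illustrative assumptions, not the actual CERN/openlab tooling.

```python
# Sketch of periodic background counter sampling on a batch node.
import subprocess

EVENTS = "cycles,instructions,branch-misses,cache-misses"

def sample_counters(pid, seconds=60):
    """Attach to a running job for a fixed window and return perf's counter summary."""
    cmd = ["perf", "stat", "-e", EVENTS, "-p", str(pid),
           "--", "sleep", str(seconds)]
    # perf prints its counter summary on stderr
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stderr

if __name__ == "__main__":
    print(sample_counters(pid=12345, seconds=10))  # 12345: placeholder job PID
```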

Slide 11: Counters also exist on the AMD chips; we have not yet looked into that. (Material from a talk by Ryszard Jurga, openlab collaboration at CERN.)

Slide 12: Performance monitoring

Slide 13: Run on different nodes:
- by CPU architecture
- separated by experiment, even with only one job per node
- with a mixture of experiments

Slide 14: Measurements so far show:
- floating point operations are at the 10-15% level
- branch mis-prediction is at the 0.2% level
- L2 cache misses are < 0.1%
- deduced memory access performance is ~300 MB/s (needs further investigation)
- about 2 cycles are needed per instruction, while the CPUs can execute several instructions per cycle → little 'parallelism' in the code (a small worked example follows below)
Needs more statistics and analysis...
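As a small worked example of the cycles-per-instruction point, the following computes the derived ratios from raw counter values. The counter numbers themselves are invented, chosen only to be consistent with the percentages quoted on the slide.

```python
# Hypothetical counter readings for one job slot (values invented for illustration,
# tuned to reproduce the slide's percentages).
cycles = 4.0e12
instructions = 2.0e12
fp_ops = 0.25e12
branch_misses = 4.0e9
branches = 2.0e12

cpi = cycles / instructions            # cycles per instruction (~2 on the slide)
ipc = instructions / cycles            # instructions per cycle
fp_fraction = fp_ops / instructions
branch_miss_rate = branch_misses / branches

print(f"CPI = {cpi:.1f} (superscalar CPUs can retire several instructions per cycle)")
print(f"IPC = {ipc:.2f}  -> little instruction-level parallelism exploited")
print(f"floating point fraction = {fp_fraction:.0%}, branch miss rate = {branch_miss_rate:.2%}")
```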