New PC Architectures - Processors, Chipsets, Performance, Bandwidth 1. PC - Schematic overview 2. Chipset schema (Intel 860 example) 3. AMD Athlon, XEON-P4.

Slides:



Advertisements
Similar presentations
Philips Research ICS 252 class, February 3, The Trimedia CPU64 VLIW Media Processor Kees Vissers Philips Research Visiting Industrial Fellow
Advertisements

Intel Pentium 4 ENCM Jonathan Bienert Tyson Marchuk.
Premio Desktop and Intel Processor Roadmap for Q3/2003 Premio Desktop and Intel Processor Roadmap for Q4/2003 to Q3/2004 By Calvin Chen Technical Director.
“Understanding Computers” Intro to GIS Fall 2004.
AMD OPTERON ARCHITECTURE Omar Aragon Abdel Salam Sayyad This presentation is missing the references used.
Premio Predator G2 Workstation Training
Intel Core2 GHz Q6700 L2 Cache 8 Mbytes (4MB per pair) L1 Cache: (128 KB Instruction +128KB Data at the core level???) L3 Cache: None? CPU.
Farms/Clusters of the future Large Clusters O(1000), any existing examples ? YesSupercomputing, PC clusters LLNL, Los Alamos, Google No long term experience.
1 Intel vs AMD By Carrie Pipkin: Introduction and History Ramiro Bolanos : Intel and VIA chipsets Dan Hepp: VIA and AMD chipsets, Conclusion.
Evaluation of the 2-way Opteron 1U system Klaus Schossmaier CERN EP-AID Computing Seminar 3 September 2003 Performance test of PCs based on AMD platforms.
Designing Lattice QCD Clusters Supercomputing'04 November 6-12, 2004 Pittsburgh, PA.
1 The System Unit Lecture 2 CSCI 1405 Introduction to Computer Science Fall 2006.
EECC722 - Shaaban #1 lec # 12 Fall Computer System Components SDRAM PC100/PC MHZ bits wide 2-way interleaved ~ 900 MBYTES/SEC.
Advanced Micro Devices - Athlon Buddy Guest Mike Lewitt Bill McCorkle November 28, 2001.
Computers Chapter 4 Inside the Computer © 2005 Prentice-Hall, Inc.Slide 2.
Peter Wegner, DESY CHEP03, 25 March LQCD benchmarks on cluster architectures M. Hasenbusch, D. Pop, P. Wegner (DESY Zeuthen), A. Gellrich, H.Wittig.
Computer Organization and Operating Systems AMD Athlon SMP Presented By: - Vinay Hegde ( ) Aashish Gupta ( )
1. Part 1: Comparative History Generally Intel has been the dominant producer of microprocessor chips AMD has proven to be a fierce competitor Competition.
2-1 Motherboard. 2-2 Section Objectives  Define the purpose of the major components on a motherboard including the CPU, chipset, and expansion slots.
* Definition of -RAM (random access memory) :- -RAM is the place in a computer where the operating system, application programs & data in current use.
Prepared by Careene McCallum-Rodney Hardware specification of a computer system.
 Model: ASUS SABERTOOTH Z77 Intel Series 7 Motherboard – ATX, Socket H2 (LGA115), Intel Z77 Express, 1866MHz DDR3, SATA III (6Gb/s), RAID, 8-CH Audio,
Processor Technology John Gordon, Peter Oliver e-Science Centre, RAL October 2002 All details correct at time of writing 09/10/02.
PC DESY Peter Wegner 1. Motivation, History 2. Myrinet-Communication 4. Cluster Hardware 5. Cluster Software 6. Future …
S3 Computer Literacy Computer Hardware. Overview of Computer Hardware Motherboard CPU RAM Harddisk CD-ROM Floppy Disk Display Card Sound Card LAN Card.
Basic Computer Structure and Knowledge Project Work.
Performance Tradeoffs for Static Allocation of Zero-Copy Buffers Pål Halvorsen, Espen Jorde, Karl-André Skevik, Vera Goebel, and Thomas Plagemann Institute.
© Copyright IBM Corporation 2006 Course materials may not be reproduced in whole or in part without the prior written permission of IBM IBM BladeCenter.
GSBUG Hardware Info SIG May 8, GSBUG Hardware Info SIG Agenda – May 8, :00 – 7:05 Administration 7:05 – 8:15 Featured Topic – System RAM.
Know the Computer Multimedia tools. Computer essentials.
Computers: Information Technology in Perspective By Long and Long Copyright 2002 Prentice Hall, Inc. Computers: Information Technology in Perspective.
DELL PowerEdge 6800 performance for MR study Alexander Molodozhentsev KEK for RCS-MR group meeting November 29, 2005.
1 More on Computer Components Computer switches Binary number system Inside the CPU Cache memory Types of RAM Computer buses Creating faster CPUs NEXT.
Computer Hardware Mr. Richard Orr Technology Teacher Bednarcik Jr. High School.
GRAP 3175 Computer Applications for Drafting Unit II Computer Hardware.
PC Desktop Specs  Intel Core 2 Duo Processor E8400 (3GHz, 6M, 1333MHz FSB)  Windows Vista Home Premium OS  2GB, DDR2 Non-ECC SDRAM, 800MHz (2 DIMMS)
1 Inside the Computer Chapter 6 Copyright Prentice-Hall, Inc
Understanding Computers, Ch.31 Chapter 3 The System Unit: Processing and Memory.
NCR RealPOS 7456 Workstation
Current Computer Architecture Trends CE 140 A1/A2 29 August 2003.
OCIPUG Hardware SIG February 12, OCIPUG Hardware SIG Agenda – February 12, :00 – 7:05 Administration 7:05 – 8:00 Featured Topic – CPUs 8:00.
Company LOGO High Performance Processors Miguel J. González Blanco Miguel A. Padilla Puig Felix Rivera Rivas.
Hardware Trends. Contents Memory Hard Disks Processors Network Accessories Future.
Computer Hardware PC Components. Motherboard components 1.Ports 2.ISA Slot 3.PCI Slots 4.AGP Slot 5.CPU Slot 6.Chipset 7.Power connector 8.Memory sockets.
High Performance Computing Processors Felix Noble Mirayma V. Rodriguez Agnes Velez Electric and Computer Engineer Department August 25, 2004.
History of Microprocessor MPIntroductionData BusAddress Bus
GSBUG Hardware Info SIG April 11, GSBUG Hardware Info SIG Agenda – April 11, :00 – 7:05 Administration 7:05 – 8:15 Featured Topic – System.
Egle Cebelyte. Random Access Memory is simply the storage area where all software is loaded and works from; also called working memory storage.
Computer Architecture Part IV-B: I/O Buses. Chipsets Intelligent bus controller chips found on the motherboard Enable higher speeds on one or more buses.
Copyright © 2007 Heathkit Company, Inc. All Rights Reserved PC Fundamentals Presentation 30 – PC Architecture.
Computers © 2005 Prentice-Hall, Inc.Slide 1. Computers Chapter 4 Inside the Computer © 2005 Prentice-Hall, Inc.Slide 2.
Josh Ruggiero CSE 420 – April 23 rd  MCH – Memory Controller Hub  Bridges connection from CPU to RAM and Video Bus (AGP/PCI-X)  Connects to South.
W-220 Parts is Parts…. September 8, 2003Riad S. Twal2 Hardware Hard Drive RAM CD-Drive Power Supply.
Created by :Gaurav Shrivastava 1.
Sep. 17, 2002BESIII Review Meeting BESIII DAQ System BESIII Review Meeting IHEP · Beijing · China Sep , 2002.
Motherboard A motherboard allows all the parts of your computer to receive power and communicate with one another.
SSE and SSE2 Jeremy Johnson Timothy A. Chagnon All images from Intel® 64 and IA-32 Architectures Software Developer's Manuals.
Lab Activities 1, 2. Some of the Lab Server Specifications CPU: 2 Quad(4) Core Intel Xeon 5400 processors CPU Speed: 2.5 GHz Cache : Each 2 cores share.
NITC - kufranjah knowledge station - islam khatatbeh Motherboards How Motherboards Work IT Essentials: PC Hardware and Software v4.0.
Lecture # 10 Processors Microcomputer Processors.
Chapter 4 Inside the Computer
Computer Hardware.
CS111 Computer Programming
Unit 2 Computer Systems HND in Computing and Systems Development
Computers (Hardware and Software)
LQCD benchmarks on cluster architectures M. Hasenbusch, D. Pop, P
Special Instructions for Graphics and Multi-Media
Comparison of Two Processors
EVGA nForce™ 790i Ultra SLI
Intel vs AMD.
Presentation transcript:

New PC Architectures - Processors, Chipsets, Performance, Bandwidth 1. PC - Schematic overview 2. Chipset schema (Intel 860 example) 3. AMD Athlon, XEON-P4 processor architecture 4. Processor performance SSE(2) instructions Prefetch example 5. Bandwidth considerations 6. Network Interfaces 7. Benchmarks

PC - schematic CPU Cache CPU Cache FrontSide Bus CHIPSET Memory MIMI external I/O PCI(32/33, 64/66), SCSI, EIDE, USB, Audio, LAN, etc. internal I/O AGP, etc

800MB/s 64 bit PCI P64H >1GB/s 64 bit PCI 800MB/s P64H Intel 860 Chipset 133 MB/s I ICH2 MCH PCI Slots, (33 MHz, 32bit) 4 USB ports LAN Connection Interface ATA 100 MB/s (dual IDE Channels) 6 channel audio 10/100 Ethernet Intel® Hub Architecture 266 MB/s 3.2 GB/s XeonProcessor Dual Channel RDRAM* AGP4XGraphics 400MHz System Bus XeonProcessor PCI Slots (66 MHz, 64bit) MRH MRH Up to 4 GB of RDRAM

AMD Athlon

Intel P4

Pentium 4 CPU Core

PC Vector instructions Streaming SIMD extension 1, 2 (SSE1, SSE2), Intel XEON (P4) 144 new instructions, a 4x32-Bit (SSE1) or 2x64-bit (SSE2) SIMD integer arithmetic and 4x32-bit (SSE1) single precision or 2x64-bit (SSE2) double precision SIMD floating point instructions. 3DNow!Professional, AMD Athlon 4 71 new instructions, including SSE1 compatible floating point instructions PowerPC Altivect 162 SIMD instructions for max. 32-bit floating point arithmetic Applications : 3D Games (!), Compression (MPEG, JPEG...), Signal processing,...

SSE/SSE2 registers xmm0 xmm2 xmm1 128-bits xmm3 xmm4 xmm5 xmm6 xmm8

SSE/SSE2 instructions SSE instruction example: Packed Add, 64 bit addpd xmm1,xmm == xmm1 xmm2 xmm

SSE/SSE2 instructions SSE instruction example: Shuffle, 32 bit shuffps xmm1, xmm2, 9Ch ( b) xmm1 xmm Ch xmm1

SSE2 prefetch example (M. Lüscher) Using inlining assembler in GCC: #define _prefetch_su3(addr) \ __asm__ __volatile__ ("prefetcht0 %0 \n\t" \ "prefetcht0 %1" \ : \ "m" (*(((char*)((unsigned int)(addr))))), \ "m" (*(((char*)((unsigned int)(addr)))+128)))... su3 *um;.... um=&gauge_field[iy][0]; _prefetch_su3(um);

SSE(2) support Kernel 2.4x or patches for lower versions CompilerGNU gcc (no SSE optimization) GNU binutils 2.11.x Portland Group compiler, -Mvect=sse Libraries Intel Math Kernel Lib (MKL), JPEG Library, etc. ApplicationsGNU inlining assembler (M. Lüscher) GNU C callable NASM assembler (MILC collaboration) C callable libraries (e.g. from Intel)

Double Data Rate (DDR) SDRAM and Rambus Todays (02.Oct.2001) Prices for 1 GByte DDR SDRAM: ca. 200,- Euro, 1 GByte PC800 Rambus memory: ca. 530,- Euro (both 4 x 256 MB modules) DDR SDRAM (free spec): Utilizing both rising and falling edges of the clock 8 Bytes * 2 * clock rate Clock rate Bandwidth GB/s RAMBUS (license): Direct RDRAM, 16-bit data path, 8-bit control bus, also utilizing both rising and falling edges of the clock, 800 MHz * 2 Byte = 1.6 GB/s Dual channel : 3.2 GB/s (XEON, P4)

Rambus, DDR SDRAM roadmaps Rambus: now DDR SDRAM (2002): 400 MHz * 8Byte * 2 Double Data Rate-2 SDRAM chip (DDR-2 RAM) 6.4 GB/s ?? 4.8 GB/s ??

Bandwidth considerations (see Jef Poskanzer Pentium 4 FSB64bits100MHzQDR3.2 GB/s25.6 Gbps InterfaceWidthFrequencyBytes/SecBits/Sec 2-channel PC800 RDRAM 2x16bits400MHz DDR3.2 GB/s25.6 Gbps PC2100 SDRAM64bits133MHz DDR2.1 GB/s17 Gbps EV6 bus (Athlon/Duron FSB) 64bits100MHz DDR1.6 GB/s12.8 Gbps PC1600 SDRAM64bits100MHz DDR1.6 GB/s12.8 Gbps PC800 RDRAM16bits400MHz DDR1.6 GB/s12.8 Gbps PC150 SDRAM64bits150MHz1.3 GB/s10.2 Gbps 133MHz FSB64bits133MHz1.06 GB/s8.5 Gbps AGP 4x32bits266MHz1.06 GB/s8.5 Gbps 100MHz FSB64bits100MHz800 MB/s6.4 Gbps PC100 SDRAM64bits100MHz800 MB/s6.4 Gbps MemoryBusSpec. I/O

Bandwidth considerations (cont.) PC66 SDRAM64bits66MHz533 MB/s4.3 Gbps InterfaceWidthFrequencyBytes/SecBits/Sec fast/wide PCI64bits66MHz533 MB/s4.3 Gbps AGP 2x32bits133MHz533 MB/s4.3 Gbps Ultra-320 SCSI16bits160MHz320 MB/s2.6 Gbps AGP32bits66MHz266 MB/s2.1 Gbps Ultra-160 SCSI16bits80MHz160 MB/s1.3 Gbps PCI32bits33MHz133 MB/s1.06 Gbps ATA/133 disk8bits66MHz DDR133 MB/s1.06 Gbps gigabit ethernetserial1GHz125 MB/s1 Gbps MemoryBusSpec. I/ONetworkDisk

ATA/100 disk8bits50MHz DDR100 MB/s800 Mbps Ultra-2 Wide SCSI 16bits40MHz80 MB/s640 Mbps OC-12 networkserial622 MHz77.7 MB/s Mbps ATA/66 disk8bits33MHz DDR66 MB/s533 Mbps USB-2 serial480 MHz60 MB/s480 Mbps IEEE 1394serial400MHz50 MB/s400 Mbps Ultra Wide SCSI16bits20MHz40 MB/s320 Mbps ATA/33 disk8bits16.6MHz DDR33 MB/s266 Mbps Fast Wide SCSI16bit10MHz20 MB/s160 Mbps OC-3 networkserial155 MHz19.4 MB/s Mbps 100baseT ethernetserial 100MHz12.5 MB/s100 Mbps T-3 networkserial45MHz5.6 MB/s Mbps USBserial12MHz1.5 MB/s12 Mbps 10baseTethernetserial10MHz1.25 MB/s10 Mbps T-1 networkserial 1.5MHz193 KB/s1.544 Mbps Bandwidth considerations (cont.)

Network Interfaces Max. Sustained 10 GBASE, 10 Gbit Ethernet1.25 GB/s? GSN (&Hippi)800 MB/s? Myrinet500 MB/s MB/s SCI500 MB/s (Ring)ca. 200 MB/s Gbit Ethernet125 MB/s MB/s Fast Ethernet12.5 MB/s MB/s

Benchmarks (Lies, Damned Lies, Benchmarks, Roger Shepherd, Peter Thompson, Inmos Techn. Note 27, Jan. 1988) Peak performance (never achieved !) Number of execution units * clock rate e.g. MFLOPS :Million Floating point Operations Per Second MIPS:Million Integer instructions Per Seconds CERN units : 1 CERN unit = 40 MIPS

Examples of standard (synthetic) Benchmarks CPU Cache CPU Cache FrontSide Bus CHIPSET Memory MIMI external I/O PCI(32/33, 64/66), SCSI, EIDE, USB, Audio, LAN, etc. internal I/O AGP, etc Bonnie, Bonnie++ Netperf Spec CPU D Mark2000 ShereMark Stream Linpack

Stream Benchmark for Rambus memory on P4 Streams benchmark, gcc (Don Holmgren, Fermilab, u.a.) gcc FunctionRate (MB/s)RMS timeMin timeMax time Copy: a(i) = b(i) Scale: a(i) = q*b(i) Add: a(i) = b(i) + c(i) Triad: a(i) = b(i) + q*c(i) Portland Compiler Group build (-Mvect=sse) Function Rate (MB/s) RMS time Min timeMax time Copy: Scale: Add: Triad: (Most of this boost comes from pgcc's use of the SSE prefetch instructions; some benefit comes from moving data via the 128-bit wide SSE registers.)

Application Benchmarks, Stream benchmark 32/64-bit Dirac Kernel, LQCD (Martin Lüscher, CERN, DESY): P4, 1.4 GHz, 256 MB Rambus Time per lattice point: micro sec (1503 Mflops [32 bit arithmetic]) micro sec (814 Mflops [64 bit arithmetic]) Amanda Reco: P4 (1.4 GHz) vs. PIII(800 MHz) % improvement Stream Benchmark: P4(1.4 GHz, PC800 Rambus): 1.6 GB/s PIII (800MHz, PC133 SDRAM) : 400 MB/s PIII(400 MHz, PC133 SDRAM) : 340 MB/s

Application benchmarks - MILC benchmark (LQCD), non optimized