Computer Design – Introduction
MAMAS – Computer Architecture 234267
Dr. Lihu Rappoport
Some of the slides were taken from Avi Mendelson, Randi Katz, Patterson, Gabriel Loh

General Course Information
• Grade
  - 20% exercise (mandatory)
  - 80% final exam
• Textbooks
  - Computer Architecture: A Quantitative Approach, Hennessy & Patterson
• Other course information
  - Course web site: http://webcourse.cs.technion.ac.il/234267
  - Foils will be on the web several days before the class

Class Focus
• CPU
  - Introduction: performance, instruction set (RISC vs. CISC)
  - Pipeline, hazards
  - Branch prediction
  - Out-of-order execution
• Memory Hierarchy
  - Cache
  - Main memory
  - Virtual memory
• Advanced Topics
• PC Architecture
  - Motherboard & chipset, DRAM, I/O, disk, peripherals

Computer System Structure
[Figure: block diagram of a PC. Labeled components: CPU, cache, CPU bus, North Bridge, memory bus, DDRII channel 1, DDRII channel 2, graphics adapter, PCI Express ×16, South Bridge, PCI, PCI Express ×1, LAN adapter, LAN, sound card, speakers, IDE controller, SATA controller, USB controller, I/O controller, DVD drive, hard disk, floppy drive, parallel port, serial port, keyboard, mouse]

Architecture & Microarchitecture
• Architecture
  - The processor features seen by the “user”
  - Instruction set, addressing modes, data width, …
• Micro-architecture
  - The way a processor is implemented
  - Cache size and structure, number of execution units, …
  - Timing is considered uArch (though it is user visible)
• Processors with different uArch can support the same architecture

Compatibility
• Backward compatibility
  - New hardware can run existing software
  - Core2 Duo can run SW written for Pentium 4, Pentium M, Pentium III, Pentium II, Pentium, 486, 386, 286
• Forward compatibility
  - New software can run on existing hardware
  - Example: new software written with SSE2™ runs on an older processor which does not support SSE2™
  - Commonly supports one or two generations behind
• Architecture-independent SW
  - JIT (just-in-time compiler): Java and .NET
  - Binary translation

Performance

Technology Trends and Performance
• Computing capacity: 4× per 3 years
  - If we could keep all the transistors busy all the time
  - Actual: 3.3× per 3 years
• Moore’s Law: performance is doubled every ~18 months
  - Trend is slowing: process scaling declines, power is up
• CPU speed and memory speed grow apart
[Figure labels: 2× in 3 years; 1.1× in 3 years; 2× in 3 years; 4× in 3 years]
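
The growth figures on this slide can be related to each other with a little arithmetic. The sketch below is only an illustration: it assumes smooth exponential growth, and the function names `growth_over` and `doubling_time` are made up for this example. It shows that doubling every 18 months is the same as 4× per 3 years, and estimates which doubling period the "actual" 3.3× per 3 years would correspond to.

```python
import math


def growth_over(period_months, doubling_months):
    """Growth factor over a period, assuming steady exponential growth."""
    return 2.0 ** (period_months / doubling_months)


def doubling_time(growth_factor, period_months):
    """Doubling period (in months) implied by a growth factor over a period."""
    return period_months / math.log2(growth_factor)


# Doubling every 18 months, observed over 3 years (36 months).
ideal = growth_over(36, 18)
print(f"{ideal:.1f}x per 3 years")  # 4.0x per 3 years

# The slide's "actual" figure of 3.3x per 3 years implies a slower doubling,
# roughly 21 months under the same exponential assumption.
actual_doubling = doubling_time(3.3, 36)
print(f"doubling every {actual_doubling:.1f} months")
```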

Moore’s Law
[Graph taken from: http://www.intel.com/technology/mooreslaw/index.htm]

CPI – Cycles Per Instruction
• CPUs work according to a clock signal
  - Clock cycle is measured in nsec ($10^{-9}$ of a second)
  - Clock frequency (= 1 / clock cycle) is measured in GHz ($10^{9}$ cycles/sec)
• Instruction Count (IC)
  - Total number of instructions executed in the program
• CPI – Cycles Per Instruction
  - Average #cycles per instruction (in a given program)
    $\mathrm{CPI} = \dfrac{\#\text{cycles required to execute the program}}{\mathrm{IC}}$
  - IPC (= 1/CPI): instructions per cycle

CPU Time
• CPU Time: time required to execute a program
  $\text{CPU Time} = \mathrm{IC} \times \mathrm{CPI} \times \text{clock cycle}$
• Our goal: minimize CPU Time
  - Minimize clock cycle: more GHz (process, circuit, uArch)
  - Minimize CPI: uArch (e.g., more execution units)
  - Minimize IC: architecture (e.g., SSE™)
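
As a small worked example of the CPI and CPU Time definitions, the sketch below plugs in numbers for a hypothetical program. The instruction count, cycle count, and 2 GHz clock are invented for illustration, and the function names are not from the slides.

```python
def cpi(total_cycles, instruction_count):
    """Average cycles per instruction for one program run."""
    return total_cycles / instruction_count


def cpu_time_seconds(instruction_count, cpi_value, clock_hz):
    """CPU Time = IC x CPI x clock cycle, with clock cycle = 1 / frequency."""
    clock_cycle = 1.0 / clock_hz
    return instruction_count * cpi_value * clock_cycle


# Hypothetical program: 2e9 instructions that take 3e9 cycles on a 2 GHz CPU.
ic = 2_000_000_000
cycles = 3_000_000_000
program_cpi = cpi(cycles, ic)        # 1.5 cycles per instruction
program_ipc = 1.0 / program_cpi      # about 0.67 instructions per cycle
t = cpu_time_seconds(ic, program_cpi, 2e9)

print(f"CPI = {program_cpi:.2f}, IPC = {program_ipc:.2f}, CPU time = {t:.2f} s")
```

Halving any one of the three factors (IC, CPI, or clock cycle) halves the CPU time, which is why the slide lists all three as optimization targets.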

Amdahl’s Law
Suppose enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then:

$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right]$$

$$\text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$

Amdahl’s Law: Example
Floating point instructions improved to run at 2×, but only 10% of executed instructions are FP

$$\text{ExTime}_{new} = \text{ExTime}_{old} \times (0.9 + 0.1 / 2) = 0.95 \times \text{ExTime}_{old}$$

$$\text{Speedup}_{overall} = \frac{1}{0.95} = 1.053$$

Corollary: Make The Common Case Fast
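
The example can be checked numerically. This is a minimal sketch of Amdahl's Law as stated on the slides; the function name `amdahl_speedup` and the input validation are additions for illustration.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction of the task is sped up by a factor."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    new_time = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time


# The slide's example: 10% of the work is FP, and FP becomes 2x faster.
fp_speedup = amdahl_speedup(0.10, 2.0)
print(f"{fp_speedup:.3f}")  # 1.053

# Even an (almost) infinitely fast FP unit cannot beat 1 / 0.9.
fp_limit = amdahl_speedup(0.10, 1e12)
print(f"{fp_limit:.3f}")  # 1.111
```

The second call illustrates the corollary: the untouched 90% bounds the overall gain, so speeding up the common case pays off more.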

Calculating the CPI of a Program
• $IC_i$: #times an instruction of type $i$ is executed in the program
• $IC$: #instructions executed in the program:
  $IC = \sum_i IC_i$
• $F_i$: relative frequency of instructions of type $i$: $F_i = IC_i / IC$
• $CPI_i$: #cycles to execute an instruction of type $i$
  - e.g.: $CPI_{add} = 1$, $CPI_{mul} = 3$
• #cycles required to execute the program:
  $\#\text{cycles} = \sum_i CPI_i \times IC_i$
• CPI:
  $CPI = \dfrac{\#\text{cycles}}{IC} = \dfrac{\sum_i CPI_i \times IC_i}{IC} = \sum_i CPI_i \times F_i$
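
The weighted-average calculation can be sketched as follows. The instruction mix is invented for illustration; only CPI add = 1 and CPI mul = 3 come from the slide, and the load cost of 2 cycles is an assumption.

```python
def program_cpi(counts, cpi_per_type):
    """CPI of a program from per-type instruction counts and per-type CPI."""
    ic = sum(counts.values())                                  # IC = sum of IC_i
    cycles = sum(counts[t] * cpi_per_type[t] for t in counts)  # sum of CPI_i * IC_i
    return cycles / ic


def program_cpi_from_frequencies(counts, cpi_per_type):
    """Same result, computed as the sum of CPI_i * F_i with F_i = IC_i / IC."""
    ic = sum(counts.values())
    return sum(cpi_per_type[t] * (counts[t] / ic) for t in counts)


# Hypothetical mix of 1000 executed instructions.
counts = {"add": 600, "mul": 200, "load": 200}
cpi_per_type = {"add": 1, "mul": 3, "load": 2}  # load = 2 is an assumed value

result = program_cpi(counts, cpi_per_type)
print(f"CPI = {result:.2f}")  # CPI = 1.60
```

Both functions give the same value, which is just the two forms of the CPI expression: total cycles divided by IC, or per-type CPI weighted by relative frequency.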

Comparing Performance
• Peak Performance
  - MIPS, MFLOPS
  - Often not useful: unachievable / unsustainable in practice
• Benchmarks
  - Real applications, or representative parts of real apps
  - Targeted at the specific system usages
• SPEC INT – integer applications
  - Data compression, C compiler, Perl interpreter, database system, chess-playing, text-processing, …
• SPEC FP – floating point applications
  - Mostly important scientific applications
• TPC Benchmarks
  - Measure transaction-processing throughput

Instruction Set Design
• The ISA is what the user / compiler see
• The HW implements the ISA
[Figure labels: software, instruction set, hardware]

ISA Considerations
• Code size
  - Long instructions take more time to fetch
  - Longer instructions require a larger memory
  - Important in small devices, e.g., cell phones
• Number of instructions (IC)
  - Reducing IC reduces execution time, at a given CPI and frequency
• Code “simplicity”
  - Simple HW implementation
  - Higher frequency and lower power
  - Code optimization can better be applied to “simple code”

Architectural Consideration Example
• Displacement address size
  - 1% of addresses > 16 bits
  - 12 to 16 bits of displacement needed
[Figure: distribution of displacement sizes; x-axis: address bits (0 to 15); y-axis: 0% to 30%; series: Int. Avg., FP Avg.]

CISC Processors
• CISC - Complex Instruction Set Computer
  - The idea: a high level machine language
  - Example: x86
• Characteristics
  - Many instruction types, with many addressing modes
  - Some of the instructions are complex
    - Execute complex tasks
    - Require many cycles
  - ALU operations directly on memory
  - Only a few registers, in many cases not orthogonal
  - Variable length instructions
    - Common instructions get short codes, which saves code length

Top 10 x86 Instructions

Rank  Instruction              % of total executed
1     load                     22%
2     conditional branch       20%
3     compare                  16%
4     store                    12%
5     add                      8%
6     and                      6%
7     sub                      5%
8     move register-register   4%
9     call                     1%
10    return                   1%
      Total                    96%

Simple instructions dominate instruction frequency

CISC Drawbacks
• Complex instructions and complex addressing modes
  - complicate the processor
  - slow down the simple, common instructions
  - contradict Make The Common Case Fast
• Compilers don’t use complex instructions / indexing methods
• Variable length instructions are a real pain in the neck
  - Difficult to decode several instructions in parallel
    - As long as an instruction is not decoded, its length is unknown
    - It is unknown where the instruction ends
    - It is unknown where the next instruction starts
  - An instruction may span more than a single cache line
  - An instruction may span more than a single page
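
The decoding problem can be illustrated with a toy model. The encoding below is entirely hypothetical (it is not x86): the low two bits of an instruction's first byte give its length minus one. The point of the sketch is only that, with variable length instructions, each start address depends on decoding the previous instruction, while with fixed length instructions every start address is known up front.

```python
def variable_length_starts(code):
    """Find instruction start offsets one after another.

    Toy rule (an assumption for this sketch): the low 2 bits of the first
    byte encode the instruction length minus one, so lengths are 1 to 4 bytes.
    """
    starts = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        length = (code[pc] & 0x03) + 1  # must decode before the next start is known
        pc += length
    return starts


def fixed_length_starts(code, width=4):
    """With fixed length instructions, all starts are known without decoding."""
    return list(range(0, len(code), width))


def crosses_line(start, length, line_size=64):
    """True if an instruction spans two cache lines (line size is assumed)."""
    return start // line_size != (start + length - 1) // line_size


toy_code = bytes([0x02, 0xAA, 0xBB, 0x00, 0x01, 0xCC, 0x03, 0x01, 0x02, 0x03])
print(variable_length_starts(toy_code))     # [0, 3, 4, 6]
print(fixed_length_starts(bytes(12)))       # [0, 4, 8]
print(crosses_line(start=62, length=4))     # True
```

The `while` loop is inherently sequential, which is the difficulty the slide describes for decoding several instructions in parallel.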

RISC Processors
• RISC - Reduced Instruction Set Computer
  - The idea: simple instructions enable fast hardware
• Characteristics
  - A small instruction set, with only a few instruction formats
  - Simple instructions
    - Execute simple tasks
    - Most of them require a single cycle (with pipeline)
  - A few indexing methods
  - ALU operations on registers only
    - Memory is accessed using Load and Store instructions only
    - Many orthogonal registers
    - Three address machine: Add dst, src1, src2
  - Fixed length instructions
• Examples: MIPS™, Sparc™, Alpha™, Power™

RISC Processors (Cont.)
• Simple architecture
  - Simple micro-architecture
  - Simple, small and fast control logic
  - Simpler to design and validate
  - Room for large on-die caches
  - Shorter time-to-market
• Using a smart compiler
  - Better pipeline usage
  - Better register allocation
• Existing RISC processors are not “pure” RISC
  - e.g., they support division, which takes many cycles

Compilers and ISA
• Ease of compilation
  - Orthogonality: no special registers, few special cases, all operand modes available with any data type or instruction type
  - Regularity: no overloading for the meanings of instruction fields
  - Streamlined: resource needs easily determined
• Register assignment is critical too
  - Easier if lots of registers

CISC Is Dominant
• The x86 architecture, which is a CISC architecture, dominates the processor market
  - A vast amount of existing software
  - Intel, AMD, Microsoft and others benefit from this
  - Intel and AMD put a lot of money into making high performance x86 processors, despite the architectural disadvantage
  - Current x86 processors give the best cost/performance
• CISC processors use arch ideas from the RISC world
  - Starting at Pentium II and K6, x86 processors translate CISC instructions into RISC-like operations internally
  - The inside core looks much like that of a RISC processor

Software Specific Extensions
• Extend arch to accelerate exec of specific apps
• Example: SSE™ – Streaming SIMD Extensions
  - 128-bit packed (vector) / scalar single precision FP (4×32)
  - Introduced on Pentium® III in ’99
  - 8 new 128-bit registers (XMM0 – XMM7)
  - Accelerates graphics, video, scientific calculations, …
• Packed (128 bits): (x0, x1, x2, x3) + (y0, y1, y2, y3) = (x0+y0, x1+y1, x2+y2, x3+y3)
• Scalar (128 bits): (x0, x1, x2, x3) + (y0, y1, y2, y3) = (x0+y0, y1, y2, y3)
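
The difference between the packed and scalar forms can be modeled in a few lines. This is a plain-Python sketch of the lane behavior drawn on the slide, not real SSE code: a 128-bit register is modeled as a list of four floats, the function names are invented, and the choice of which operand supplies the untouched lanes in the scalar case simply follows the slide's figure.

```python
def add_packed(x, y):
    """Packed add: all four 32-bit lanes are added in one operation."""
    return [a + b for a, b in zip(x, y)]


def add_scalar(x, y):
    """Scalar add: only lane 0 is added; lanes 1 to 3 pass through from y,
    as drawn in the slide's figure."""
    return [x[0] + y[0]] + list(y[1:])


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]

print(add_packed(x, y))  # [11.0, 22.0, 33.0, 44.0]
print(add_scalar(x, y))  # [11.0, 20.0, 30.0, 40.0]
```

The packed form performs four additions per instruction, which is how such an extension reduces IC for graphics, video, and scientific code.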