Published by Clyde Eaton. Modified over 9 years ago.
2
Chapter 6: Pipelining & RISCs
Dr. Abraham
3
Techniques for speeding up a computer
– Pipelining
– Parallel processing
4
Serial processing vs. pipelining
– Serial processing: all stages of one process are executed before the first stage of the next process begins. One process finishes completely before the next one starts.
– Pipelining: the stages of repeating processes are overlapped.
5
Pipelining is like a factory assembly line
– It consists of several processing stations, each of which performs a different stage of the process.
– Each station repeats the same step on different elements in the sequence.
– At any given time, processes are at different stages of execution in the pipeline; several processes work on several data items to produce a single result.
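The assembly-line picture can be drawn as a space-time diagram: one row per clock tick, one column per stage. The sketch below assumes every stage takes exactly one tick and a new item enters on every tick; the function names are illustrative only. It also counts ticks, showing that k stages and n items take k + n - 1 ticks pipelined versus k × n ticks serially.

```python
def space_time_diagram(num_stages, num_items):
    """Return one row per clock tick; row[s] is the item in stage s (None if idle).

    Assumes every stage takes one tick and a new item enters stage 0 each tick.
    """
    total_ticks = num_stages + num_items - 1
    rows = []
    for tick in range(total_ticks):
        row = []
        for stage in range(num_stages):
            item = tick - stage  # the item that entered the pipe `stage` ticks ago
            row.append(item if 0 <= item < num_items else None)
        rows.append(row)
    return rows


def serial_ticks(num_stages, num_items):
    """Ticks needed when each item passes through all stages before the next starts."""
    return num_stages * num_items


if __name__ == "__main__":
    k, n = 3, 4
    diagram = space_time_diagram(k, n)
    for tick, row in enumerate(diagram):
        cells = " ".join(f"P{i}" if i is not None else "--" for i in row)
        print(f"tick {tick}: {cells}")
    print("pipelined ticks:", len(diagram))     # k + n - 1 = 6
    print("serial ticks:", serial_ticks(k, n))  # k * n = 12
```

Tick 2 is the first tick at which all three stations are busy at once, each on a different process.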
6
Pipe
– A pipe consists of all the circuits for the individual stages, plus the latches that separate the stages.
– Flow-through time: the time it takes the pipe to produce its first result.
– Clock-cycle time: the time it takes the pipe to produce each subsequent result.
– Unifunction pipe: implements a single function.
– Multifunction pipe: can carry out a variety of functions.
7
Pipelines: two categories
– Arithmetic-unit pipeline: most useful for vector operations
– Instruction-unit pipeline: most useful for simplified instruction sets (RISCs)
8
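Arithmetic-unit pipelining: example
– Multiplicand: 10111011
– Multiplier: 1101001
– Partial products, one per multiplier bit (least significant bit first, each shifted one more place to the left): 10111011, 00000000, 00000000, 10111011, 00000000, 10111011, 10111011
– Product: 100110010110011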
9
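Multiplier terms and the corresponding shifted multiplicands, with running sums:
– 1: 10111011
– 1000: 10111011000
– 1001 (running sum): 11010010011
– 100000: 1011101100000
– 101001 (running sum): 1110111110011
– 1000000: 10111011000000
– 1101001 (product): 100110010110011
– Multiplying by $2^n$ can be accomplished by shifting the multiplicand left by n bits and inserting zeros on the right.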
10
10111011 × 1101001 (page 290)
– $1 \times 2^0 \times 10111011 = 10111011$
– $0 \times 2^1 \times 10111011 = 0$
– $0 \times 2^2 \times 10111011 = 0$
– $1 \times 2^3 \times 10111011 = 10111011000$
– …
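The shift-and-add procedure in the worked example can be sketched in a few lines: for each 1 bit of the multiplier, shift the multiplicand left by that bit's position and add it in. This is a minimal illustration for non-negative integers; `shift_and_add` is a hypothetical helper name, not taken from the text.

```python
def shift_and_add(multiplicand, multiplier):
    """Multiply two non-negative integers by shift-and-add.

    Returns (partial_products, product). Partial product i is the multiplicand
    shifted left by i bits (zeros inserted on the right) if bit i of the
    multiplier is 1, and 0 otherwise.
    """
    partials = []
    shift = 0
    while multiplier >> shift:  # stop once no higher multiplier bits remain
        bit = (multiplier >> shift) & 1
        partials.append((multiplicand << shift) if bit else 0)
        shift += 1
    return partials, sum(partials)


if __name__ == "__main__":
    partials, product = shift_and_add(0b10111011, 0b1101001)
    for i, p in enumerate(partials):
        print(f"bit {i}: {p:b}")
    print(f"product: {product:b}")  # 100110010110011
```

The seven partial products match the slide's list, and their sum is 187 × 105 = 19635.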
11
Decomposition
– The idea behind pipelining is to break a job into smaller, repetitive tasks.
– Each task can be handled by one stage of the pipeline.
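One possible decomposition of the multiplication above, assumed here purely for illustration, gives each multiplier bit its own stage: stage s adds partial product s to a running sum and passes the result through a latch to stage s + 1. In the simulation below, a new pair enters every tick, so the first product appears after `width` ticks (the flow-through time) and one more product leaves on every following tick. It assumes every multiplier fits in `width` bits; the names are hypothetical.

```python
from collections import deque


def stage(s, item):
    """Stage s adds the multiplicand shifted left s bits if bit s of the
    multiplier is 1. An item is (multiplicand, multiplier, running_sum);
    None means the stage is idle this tick."""
    if item is None:
        return None
    a, b, acc = item
    if (b >> s) & 1:
        acc += a << s
    return (a, b, acc)


def pipelined_multiply(pairs, width):
    """Clock (multiplicand, multiplier) pairs through `width` stages, one new
    pair per tick. Returns (tick, product) for each result leaving the pipe."""
    pending = deque((a, b, 0) for a, b in pairs)
    latches = [None] * width  # latches[s] holds stage s's output from the last tick
    results = []
    tick = 0
    while pending or any(x is not None for x in latches):
        tick += 1
        entering = pending.popleft() if pending else None
        # All stages work in the same tick: stage s consumes what stage s-1 latched.
        latches = [stage(0, entering)] + [stage(s, latches[s - 1]) for s in range(1, width)]
        if latches[-1] is not None:
            results.append((tick, latches[-1][2]))
    return results


if __name__ == "__main__":
    for tick, product in pipelined_multiply([(187, 105), (3, 5), (10, 10)], width=7):
        print(f"tick {tick}: {product}")
```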
12
Ways to decompose operations
– Pipeline granularity: the coarseness of the decomposition.
– Pipeline variability: the number of ways the control unit of a multifunction pipeline can configure it for different operations.
13
How fine should the decomposition be?
– The smaller the stages, the more stages there are.
– Flow-through time increases as the number of stages increases, because each additional stage brings its own latch delay.
– The clock rate is the reciprocal of the time it takes the slowest stage to complete its work.
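The granularity trade-off can be put into numbers. The sketch below assumes the clock cycle equals the slowest stage's delay plus one latch delay, and that the flow-through time is one cycle per stage. The 48 ns job and 2 ns latch are made-up values chosen only to show the trend.

```python
def pipe_timing(stage_delays_ns, latch_delay_ns):
    """Return (clock_cycle_ns, clock_rate_mhz, flow_through_ns) for a pipe.

    The slowest stage plus its latch sets the clock cycle; the first result
    needs one clock cycle per stage.
    """
    cycle = max(stage_delays_ns) + latch_delay_ns
    rate_mhz = 1000.0 / cycle  # reciprocal of the cycle time, in MHz
    flow_through = cycle * len(stage_delays_ns)
    return cycle, rate_mhz, flow_through


if __name__ == "__main__":
    # Hypothetical job: 48 ns of logic, 2 ns latches, split ever more finely.
    for k in (1, 2, 4, 8):
        stages = [48 / k] * k
        cycle, rate, ft = pipe_timing(stages, latch_delay_ns=2)
        print(f"{k} stages: cycle {cycle:.0f} ns, {rate:.1f} MHz, flow-through {ft:.0f} ns")
```

Going from 1 to 8 stages raises the clock rate from 20 MHz to 125 MHz while the flow-through time grows from 50 ns to 64 ns. Unbalanced stages hurt: `pipe_timing([10, 30], 2)` gives a 32 ns cycle, set entirely by the slow stage.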
14
See the example on page 293: two floating-point pipelines for reciprocal floating-point multiplication are compared. The highly pipelined unit on the right produces vector results at a higher rate but takes more time to complete a single multiplication. Compilers are often unable to produce code sequences that make use of fast pipelines; most compilers produce scalar code rather than vector code.
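Why the deeper unit loses on a single multiplication but wins on vectors follows from one formula: the n-th result leaves the pipe at flow-through time + (n - 1) × clock-cycle time. The two pipelines below are hypothetical (3 stages at 20 ns versus 8 stages at 10 ns), not the units from page 293.

```python
def time_for_n_results(flow_through_ns, cycle_ns, n):
    """Time until the n-th result leaves the pipe: the first result after the
    flow-through time, then one result per clock cycle."""
    if n <= 0:
        return 0
    return flow_through_ns + (n - 1) * cycle_ns


# Hypothetical units: (flow-through ns, clock-cycle ns)
SHALLOW = (60, 20)  # 3 stages x 20 ns
DEEP = (80, 10)     # 8 stages x 10 ns

if __name__ == "__main__":
    for n in (1, 2, 3, 64):
        s = time_for_n_results(*SHALLOW, n)
        d = time_for_n_results(*DEEP, n)
        print(f"n={n:3d}: shallow {s} ns, deep {d} ns")
```

For scalar code (n = 1) the shallow unit finishes first. The two break even at n = 3, and for a 64-element vector the deep unit is almost twice as fast. This is why the deep unit pays off only if the compiler actually produces vector code.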
15
Instruction-unit pipelining
Executing an instruction has several logical parts:
– determine the address of the instruction
– fetch it from memory (or cache)
– analyze the opcode
– for operate instructions, determine the addresses of the operands and of the result; each address calculation takes several steps, depending on the addressing mode
16
Instruction-unit pipelining
– An instruction may have to wait for data to be written by a previous instruction, in which case pipelined parallelism may not be possible.
– No individual segment should sit idle: the storage unit should supply instructions to the instruction unit at the same rate at which the instruction unit can process them.
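The wait for data written by a previous instruction (a read-after-write dependency) can be modeled as a stall in issue timing. This sketch assumes in-order issue of at most one instruction per cycle, and assumes a result can be read `latency` cycles after its producer issues. `Instr` and `issue_cycles` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str              # register written by the instruction
    srcs: Tuple[str, ...]  # registers read by the instruction


def issue_cycles(program: List[Instr], latency: int) -> List[int]:
    """In-order issue, one instruction per cycle at most. An instruction that
    reads a register waits until the producing instruction's result is
    available `latency` cycles after that producer issued."""
    ready: Dict[str, int] = {}  # register -> first cycle its new value can be read
    cycles: List[int] = []
    earliest = 0
    for ins in program:
        t = max([earliest] + [ready.get(r, 0) for r in ins.srcs])
        cycles.append(t)
        ready[ins.dest] = t + latency
        earliest = t + 1
    return cycles


if __name__ == "__main__":
    dependent = [Instr("r1", ("r2", "r3")), Instr("r4", ("r1", "r5")), Instr("r6", ("r7", "r8"))]
    independent = [Instr("r1", ("r2", "r3")), Instr("r4", ("r6", "r5")), Instr("r7", ("r8", "r9"))]
    print(issue_cycles(dependent, latency=3))    # [0, 3, 4]: cycles 1 and 2 are idle
    print(issue_cycles(independent, latency=3))  # [0, 1, 2]: no stalls
```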
17
Instruction-unit pipelining
– Branching is like taking a break from an assembly line: branch instructions flush the pipeline completely, and after the branch the pipeline must be refilled.
– The time that is lost is called the branch penalty.
– When a STORE instruction writes to memory, either the pipeline must be flushed first, or both the memory and the copy in the pipeline must be updated properly.
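The cost of flushing can be estimated with a simple count. This sketch assumes, hypothetically, that a branch is resolved in the last of k stages. Each taken branch then discards k - 1 partially processed instructions, and the cycles spent refilling them are the branch penalty. Real machines resolve branches earlier or later, so `penalty` can be overridden.

```python
def pipeline_cycles(num_stages, num_instructions, taken_branches, penalty=None):
    """Cycles to run n instructions through a k-stage pipe in which every
    taken branch flushes the pipe.

    By default assumes the branch is resolved in the last stage, so each
    flush costs k - 1 cycles of refill.
    """
    if penalty is None:
        penalty = num_stages - 1
    fill_and_drain = num_stages + num_instructions - 1
    return fill_and_drain + taken_branches * penalty


if __name__ == "__main__":
    k, n = 5, 100
    for branches in (0, 10, 20):
        cycles = pipeline_cycles(k, n, branches)
        print(f"{branches:2d} taken branches: {cycles} cycles, {cycles / n:.2f} cycles/instruction")
```

With these example numbers, ten taken branches in a hundred instructions raise the cost from 104 to 144 cycles.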
18
Scheduling functional units
– The goal is to maximize the rate of instruction issue.
– An instruction issue consists of reserving a functional unit, sending the opcode to it, and reserving the result register.
– Before the control unit can issue an instruction, it must determine that the appropriate functional unit is free and that no data dependencies exist between the current instruction and the instructions already executing. If both instructions require the same result register, one has to wait.
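The issue conditions listed above can be written as a small reservation check. This is a minimal sketch: the unit names ("add", "mul") and the `IssueState` bookkeeping are hypothetical, and sending the opcode to the unit is not modeled.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class IssueState:
    busy_units: Set[str] = field(default_factory=set)        # functional units in use
    reserved_results: Set[str] = field(default_factory=set)  # result registers of executing instructions


def can_issue(state: IssueState, unit: str, dest: str, srcs: Tuple[str, ...]) -> bool:
    """Issue only if the unit is free, no source register is still being
    produced, and no executing instruction has the same result register."""
    if unit in state.busy_units:
        return False
    if any(r in state.reserved_results for r in srcs):
        return False
    return dest not in state.reserved_results


def issue(state: IssueState, unit: str, dest: str, srcs: Tuple[str, ...]) -> bool:
    """Reserve the functional unit and the result register if issue is allowed."""
    if not can_issue(state, unit, dest, srcs):
        return False
    state.busy_units.add(unit)
    state.reserved_results.add(dest)
    return True


def complete(state: IssueState, unit: str, dest: str) -> None:
    """Release both reservations when the instruction finishes."""
    state.busy_units.discard(unit)
    state.reserved_results.discard(dest)


if __name__ == "__main__":
    s = IssueState()
    print(issue(s, "mul", "r1", ("r2", "r3")))  # True
    print(issue(s, "add", "r4", ("r1", "r5")))  # False: r1 is still being produced
    print(issue(s, "add", "r1", ("r7", "r8")))  # False: same result register
```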
19
Pipelined vector processors
Examples: the CRAY-1 supercomputer, the TI ASC, the CDC STAR-100, and, from other manufacturers, the NEC SX, Siemens VPx, and Fujitsu VPx-EX. Features include:
– pipelined instruction decoding
– multiple pipelined functional units that operate concurrently
– asynchronous banks of interleaved memory
– independent instruction and data caches
– numerous buses to transfer data, address, and control signals
20
Vector processing
– Hardware for vector processing is not very expensive. At the simplest level, a vector instruction may be nothing more than a microprogram that executes multiple operations instead of a single operation.
– Compilers can detect sequences of code that can take advantage of vector operations.
– The instruction set should include parallel instructions.
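The main thing a compiler must check before turning a loop into vector instructions is whether one iteration reads a value written by an earlier iteration. The sketch below handles only one hypothetical loop shape, `a[i + w] = a[i + r] + b[i]`. It runs the loop with scalar semantics and with vector semantics, where all operands are loaded before any result is stored. A simplified dependence test, `r >= w`, predicts when the two agree. Real compiler dependence analysis is far more general than this.

```python
def scalar_loop(a, b, w, r, n):
    """Sequential semantics of: for i in 0..n-1: a[i + w] = a[i + r] + b[i]."""
    a = list(a)
    for i in range(n):
        a[i + w] = a[i + r] + b[i]
    return a


def vector_loop(a, b, w, r, n):
    """Vector semantics: one vector load reads every operand before any store."""
    a = list(a)
    operands = [a[i + r] for i in range(n)]  # vector load
    for i in range(n):                       # vector add, then vector store
        a[i + w] = operands[i] + b[i]
    return a


def is_vectorizable(w, r):
    """Simplified test for this one-statement loop: if r < w, iteration i reads
    an element that an earlier iteration wrote (a loop-carried dependence)."""
    return r >= w


if __name__ == "__main__":
    a, b = [1, 2, 3, 4, 5], [10, 20, 30, 40]
    for w, r in [(1, 0), (0, 1), (0, 0)]:
        same = scalar_loop(a, b, w, r, 4) == vector_loop(a, b, w, r, 4)
        print(f"a[i+{w}] = a[i+{r}] + b[i]: vectorizable={is_vectorizable(w, r)}, results agree={same}")
```

The recurrence `a[i+1] = a[i] + b[i]` gives [1, 11, 31, 61, 101] when run sequentially but [1, 11, 22, 33, 44] under vector semantics, so it must stay scalar.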
21
Two ways of viewing the speedup offered by vector processing
– Instruction fetching is greatly reduced, so the buses are available for data.
– There is essentially no overhead for branching.
If an intermediate product is produced at every clock tick, new data should be introduced into that slot immediately. The data access time must therefore be made faster, i.e., memory must process read/write requests at the pipeline rate. Architects use low-order memory interleaving and multiword data paths to meet this high rate of data demand.
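Low-order interleaving uses the low-order address bits to select the bank, so consecutive words fall in different banks. A bank can then recover while the others are being accessed. The simulation below assumes, hypothetically, one request issued per cycle in order, and a bank that stays busy for `bank_busy_cycles` after each access. With 8 banks and an 8-cycle bank time, unit stride sustains the pipeline rate, while a stride of 8 hits the same bank every time.

```python
def bank_of(address, num_banks):
    """Low-order interleaving: the low-order bits of the address select the bank."""
    return address % num_banks


def cycles_to_issue(addresses, num_banks, bank_busy_cycles):
    """Issue at most one request per cycle, in order. A request stalls while
    its bank is still busy with an earlier access. Returns the number of
    cycles until the last request has been issued."""
    bank_free_at = [0] * num_banks
    cycle = 0
    for addr in addresses:
        bank = bank_of(addr, num_banks)
        cycle = max(cycle, bank_free_at[bank])  # wait for the bank if it is busy
        bank_free_at[bank] = cycle + bank_busy_cycles
        cycle += 1
    return cycle


if __name__ == "__main__":
    for stride in (1, 2, 8):
        addrs = [stride * i for i in range(64)]
        print(f"stride {stride}: {cycles_to_issue(addrs, num_banks=8, bank_busy_cycles=8)} cycles for 64 words")
```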
22
Summary: vector processing is faster than scalar processing because of
– reduced memory contention, from fewer instruction accesses
– reduced instruction decoding
– predictable behavior, which is especially important for implicit indexing and memory accesses as well as implicit branching
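The reduction in instruction accesses can be estimated for a simple loop, `c[i] = a[i] + b[i]`. The per-iteration instruction counts below are hypothetical, chosen only to show the order of magnitude. The scalar loop spends about six instructions per element. A strip-mined vector loop spends about seven instructions per strip of up to 64 elements (64 is the CRAY-1's vector register length).

```python
# Hypothetical instruction counts for c[i] = a[i] + b[i] over n elements.
SCALAR_PER_ELEMENT = 6  # load, load, add, store, increment index, compare-and-branch
VECTOR_PER_STRIP = 7    # set length, 2 vector loads, vector add, vector store, bump pointers, branch
MAX_VECTOR_LENGTH = 64  # elements handled by one vector instruction


def scalar_instructions(n):
    return SCALAR_PER_ELEMENT * n


def vector_instructions(n, mvl=MAX_VECTOR_LENGTH):
    strips = (n + mvl - 1) // mvl  # strip-mining: process the array mvl elements at a time
    return VECTOR_PER_STRIP * strips


if __name__ == "__main__":
    for n in (64, 1000):
        print(f"n={n}: scalar {scalar_instructions(n)} instructions, vector {vector_instructions(n)}")
```

Under these assumptions, 1000 elements need 6000 scalar instruction fetches but only 112 vector ones, and the number of branches drops from 1000 to 16.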
23
RISCs
In the 1980s, instruction sets such as the VAX's had become so complicated that a back-to-basics movement started, aiming to reduce instruction sets to something similar to that of the CDC 6600.
24
Characteristics of RISCs
– Instructions are simple and of uniform length.
– One instruction format is used.
– There is little or no overlap in instruction functionality.
– One addressing mode.
– Only Load and Store instructions reference memory.
– All operations are register to register.
– The ISA supports two or three data types.
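A single fixed-length format and a load/store design make decoding and execution very regular. The encoding below is a hypothetical 32-bit format invented for illustration. It is not any real machine's: opcode in bits 31-26, rd in 25-21, rs in 20-16, rt in 15-11, and an unsigned 11-bit immediate. Decoding is therefore just fixed shifts and masks. Only LOAD and STORE touch memory, and ADD works register to register.

```python
# Hypothetical 32-bit format shared by every instruction:
#   bits 31-26 opcode | 25-21 rd | 20-16 rs | 15-11 rt | 10-0 imm (unsigned)
OPCODES = {"ADD": 0, "LOAD": 1, "STORE": 2}
NAMES = {code: name for name, code in OPCODES.items()}


def encode(op, rd=0, rs=0, rt=0, imm=0):
    for reg in (rd, rs, rt):
        if not 0 <= reg < 32:
            raise ValueError("register numbers are 5 bits")
    if not 0 <= imm < 2 ** 11:
        raise ValueError("immediate is 11 bits")
    return (OPCODES[op] << 26) | (rd << 21) | (rs << 16) | (rt << 11) | imm


def decode(word):
    """Every field sits at a fixed position, so decoding is just shifts and masks."""
    return (NAMES[word >> 26], (word >> 21) & 0x1F, (word >> 16) & 0x1F,
            (word >> 11) & 0x1F, word & 0x7FF)


def run(program, regs, mem):
    """Only LOAD and STORE reference memory; ADD is register to register."""
    for word in program:
        op, rd, rs, rt, imm = decode(word)
        if op == "ADD":
            regs[rd] = regs[rs] + regs[rt]
        elif op == "LOAD":
            regs[rd] = mem[regs[rs] + imm]  # rd <- mem[rs + imm]
        elif op == "STORE":
            mem[regs[rs] + imm] = regs[rt]  # mem[rs + imm] <- rt


if __name__ == "__main__":
    regs, mem = [0] * 32, {100: 7, 104: 5}
    regs[1] = 100  # base address
    program = [encode("LOAD", rd=2, rs=1, imm=0), encode("LOAD", rd=3, rs=1, imm=4),
               encode("ADD", rd=4, rs=2, rt=3), encode("STORE", rs=1, rt=4, imm=8)]
    run(program, regs, mem)
    print(mem[108])  # 12
```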
25
Characteristics (contd.)
– Almost all instructions execute in one clock cycle.
– The architecture takes advantage of the strengths of the software.
– The architecture should have many registers.
26
CISC vs. RISC controversy
Pro CISC:
– Richer instruction sets improve the merit of the architecture, since instructions implemented in microcode execute faster than the same operations implemented in software.
– A richer instruction set does not cost more than a simpler one.
– Upward compatibility is easier to implement in the microcode of a CISC.
– A richer instruction set simplifies compiler design.
– Chips with richer instruction sets are more difficult to duplicate, which protects manufacturers.
27
Pro RISC:
– The simpler and more basic the hardware, the cheaper and faster it is. The increased speed makes up for the larger number of instructions per program.
– RISC programs need more instruction bits, so an instruction cache is used.
– It is easier to compile for a RISC than for a CISC.
– Design effort and cost are lower than for a CISC.
– It is easier to introduce parallelism into a RISC.
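The claim that higher speed makes up for more instructions can be checked with the standard CPU-time equation: time = instruction count × cycles per instruction × clock cycle time. The two machines below are hypothetical. The RISC runs 50% more instructions but has a lower CPI and a shorter cycle.

```python
def cpu_time_seconds(instruction_count, cycles_per_instruction, cycle_time_ns):
    """CPU time = instructions x cycles per instruction x seconds per cycle."""
    return instruction_count * cycles_per_instruction * cycle_time_ns * 1e-9


# Hypothetical machines running the same program.
CISC = dict(instruction_count=1_000_000, cycles_per_instruction=4.0, cycle_time_ns=40)
RISC = dict(instruction_count=1_500_000, cycles_per_instruction=1.25, cycle_time_ns=25)

if __name__ == "__main__":
    print(f"CISC: {cpu_time_seconds(**CISC):.4f} s")  # about 0.16 s
    print(f"RISC: {cpu_time_seconds(**RISC):.4f} s")  # about 0.047 s
```

The comparison can go either way in practice. The outcome depends entirely on how much the CPI and cycle time actually improve relative to the growth in instruction count.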
28
RISC implementation techniques
– Pipelining to speed up instruction decoding and execution
– RISCs do not allow program self-modification.
– Harvard architecture: separate instruction and data streams
– Large register sets to reduce CPU-memory traffic
– Some use independent registers for floating-point operations and results.
29
RISC implementation techniques (contd.)
– Separate functional units for instruction processing and instruction execution
– Delayed branches to avoid the branch penalty: the instruction following the branch (in the branch-delay slot) is loaded before the branch takes effect. On some machines, for a conditional branch, the delay-slot instruction is discarded if the test fails.
– Specialized cache memories to decrease the memory-to-CPU delay, including separate instruction and data caches
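To use a branch-delay slot, the compiler must find an earlier instruction that can safely be moved past the branch. The sketch below is a simplified, hypothetical scheduler for a basic block that ends in a branch. It picks the latest instruction whose result is not read, and whose operands and result register are not overwritten, by anything it would move past (including the branch). If none qualifies, it pads the slot with a NOP. Memory dependences are not modeled.

```python
from typing import List, NamedTuple, Tuple


class Op(NamedTuple):
    text: str
    dest: str = ""              # register written ("" if none)
    srcs: Tuple[str, ...] = ()  # registers read


NOP = Op("nop")


def fill_delay_slot(block: List[Op]) -> List[Op]:
    """`block` is a basic block whose last op is a branch. Move the latest
    earlier op that can safely execute after the branch into the delay slot;
    otherwise put a NOP there."""
    *body, branch = block
    for j in range(len(body) - 1, -1, -1):
        cand = body[j]
        later = body[j + 1:] + [branch]  # ops the candidate would move past
        conflict = any(
            (cand.dest and cand.dest in op.srcs)      # a later op reads its result
            or (op.dest and op.dest in cand.srcs)     # a later op overwrites its operand
            or (op.dest and op.dest == cand.dest)     # both write the same register
            for op in later
        )
        if not conflict:
            return body[:j] + body[j + 1:] + [branch, cand]
    return body + [branch, NOP]


if __name__ == "__main__":
    block = [Op("add r1, r2, r3", "r1", ("r2", "r3")),
             Op("sub r4, r5, r6", "r4", ("r5", "r6")),
             Op("beq r4, r0, L", "", ("r4", "r0"))]
    print([op.text for op in fill_delay_slot(block)])
    # ['sub r4, r5, r6', 'beq r4, r0, L', 'add r1, r2, r3']
```

Here `sub` cannot move because the branch tests r4, but `add` is independent of both and fills the slot.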
30
RISC implementation techniques (contd.)
– Optimizing compilers that rearrange the code sequence to take maximum advantage of the CPU
– Overlapping register sets to speed up parameter passing
– Support for string operations by loading and storing multiple registers
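Overlapping register sets let a caller pass parameters without copying them, because the caller's outgoing registers are physically the same registers as the callee's incoming ones. The class below is a hypothetical layout, loosely in the style of SPARC register windows but not its actual numbering. Each window sees ins 0-7, locals 8-15, and outs 16-23, and a call moves to the next window. Spilling windows to memory on overflow is not modeled.

```python
class RegisterWindows:
    """Hypothetical overlapping register windows. Each window sees 24
    registers: ins 0-7, locals 8-15, outs 16-23. Window w's outs are the
    same physical registers as window w+1's ins."""

    def __init__(self, num_windows=8):
        self.num_windows = num_windows
        self.physical = [0] * (16 * num_windows)  # adjacent windows share 8 registers
        self.cwp = 0                              # current window pointer

    def _index(self, reg, window):
        if not 0 <= reg < 24:
            raise ValueError("windowed registers are numbered 0-23")
        return (16 * window + reg) % len(self.physical)

    def read(self, reg):
        return self.physical[self._index(reg, self.cwp)]

    def write(self, reg, value):
        self.physical[self._index(reg, self.cwp)] = value

    def call(self):
        self.cwp = (self.cwp + 1) % self.num_windows  # overflow handling not modeled

    def ret(self):
        self.cwp = (self.cwp - 1) % self.num_windows


if __name__ == "__main__":
    rw = RegisterWindows()
    rw.write(16, 42)  # caller places the first argument in out register 16
    rw.call()
    print(rw.read(0))  # 42: the callee sees it as in register 0, with no copy
```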
31
RISC machines
– MIPS R2000
– Motorola 88000
– Sun SPARC
– IBM RISC System/6000
– Intel i860
– HP Spectrum