Chapter 6: Pipelining & RISCs, Dr. Abraham


Techniques for speeding up a computer
–Pipelining
–Parallel processing

Serial processing vs. pipelining
–Serial processing: all stages of one process execute before the first stage of the next process begins. One process completely finishes before the next one starts.
–Pipelining: the stages of repeating processes are overlapped.

Pipelining is like a factory assembly line
–It consists of several processing stations, each of which performs a different stage of the process.
–Each station repeats the same step on different elements in the sequence.
–At any given time, processes are at different stages of execution in the pipeline.
–Several processes work on several data items to produce a single result.

Pipe
–Consists of all the circuits for the individual stages and the latches that separate the stages
–Flow-through time: the time it takes the pipe to produce its first result
–Clock-cycle time: the time it takes the pipe to produce each subsequent result
–Unifunction pipe: implements a single function
–Multifunction pipe: can carry out a variety of functions
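The two timing definitions above can be tied together with a small calculation. This is only a sketch under simplifying assumptions that are not in the slides: every stage takes the same time, latch overhead is ignored, and the pipe never stalls. The function names are made up for illustration.

```python
def serial_time(n_items, n_stages, stage_time):
    """Time to process n_items one after another, with no overlap."""
    return n_items * n_stages * stage_time


def pipelined_time(n_items, n_stages, stage_time):
    """Flow-through time for the first result, then one result per clock cycle."""
    if n_items == 0:
        return 0
    flow_through = n_stages * stage_time  # time to produce the first result
    clock_cycle = stage_time              # time between subsequent results
    return flow_through + (n_items - 1) * clock_cycle


if __name__ == "__main__":
    # Hypothetical pipe: 5 stages of 10 ns each.
    for n in (1, 10, 1000):
        s = serial_time(n, 5, 10)
        p = pipelined_time(n, 5, 10)
        print(f"{n} items: serial {s} ns, pipelined {p} ns, speedup {s / p:.2f}")
```

For a single item the pipe gives no gain; as the number of items grows, the speedup approaches the number of stages.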

Pipelines: 2 categories
Arithmetic-unit pipeline
–most useful for vector operations
Instruction-unit pipeline
–most useful for simplified instruction sets (RISCs)

Arithmetic-unit pipelining
–Example: multiplicand × multiplier = product

Multiplier, product
–Each partial product can be formed by shifting the multiplicand to the left by n bits and inserting zeros on the right.

Partial products (page 290), one per multiplier bit
–1 × 2^0 × multiplicand
–0 × 2^1 × multiplicand
–0 × 2^2 × multiplicand
–1 × 2^3 × multiplicand
–…
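The shift-and-add scheme on these slides can be sketched in a few lines. The operand values in the usage lines are chosen purely for illustration (the numbers from the original example did not survive in this transcript), and the code assumes non-negative integers.

```python
def partial_products(multiplicand, multiplier):
    """One partial product per multiplier bit, least significant bit first."""
    products = []
    n = 0
    while multiplier >> n:
        bit = (multiplier >> n) & 1
        # bit x 2^n x multiplicand: shift left by n bits, zeros enter on the right
        products.append(bit * (multiplicand << n))
        n += 1
    return products


def shift_add_multiply(multiplicand, multiplier):
    """Sum the partial products; in a pipe, each stage could add one of them."""
    return sum(partial_products(multiplicand, multiplier))


if __name__ == "__main__":
    # Illustrative operands only: 0b1011 (11) times 0b1001 (9).
    print(partial_products(0b1011, 0b1001))
    print(shift_add_multiply(0b1011, 0b1001))
```

Each partial product depends only on one multiplier bit and a shift, which is what makes the job easy to split across pipeline stages.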

Decompose
–The idea behind pipelining is to break the job into smaller, repetitive tasks.
–Each task can be worked on by one stage in the process.

Ways to decompose operations
–Pipeline granularity: the coarseness of the decomposition
–Pipeline variability: the number of ways the control unit of a multifunction pipeline can configure it for different operations

How fine should the decomposition be?
–The smaller each stage, the more stages there are.
–Flow-through time increases as the number of stages increases.
–The clock rate is the reciprocal of the time it takes the slowest stage to complete its work.
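The granularity trade-off can be made concrete with a rough model. The per-stage latch delay is an assumption added here to explain why flow-through time grows with the number of stages; the stage delays in the usage lines are invented numbers.

```python
def pipeline_metrics(stage_delays_ns, latch_delay_ns=0.0):
    """Clock cycle, clock rate, and flow-through time for a list of stage delays.

    Assumes every stage is clocked at the pace of the slowest stage and that
    each stage adds a fixed latch delay (a simplification, not from the slides).
    """
    cycle = max(stage_delays_ns) + latch_delay_ns
    return {
        "clock_cycle_ns": cycle,
        "clock_rate_ghz": 1.0 / cycle,  # reciprocal of the slowest stage time
        "flow_through_ns": cycle * len(stage_delays_ns),
    }


if __name__ == "__main__":
    coarse = pipeline_metrics([40, 40], latch_delay_ns=2)  # 2 large stages
    fine = pipeline_metrics([10] * 8, latch_delay_ns=2)    # 8 small stages
    print("coarse:", coarse)
    print("fine:  ", fine)
```

With these numbers the finer pipe has a shorter clock cycle (12 ns against 42 ns) but a longer flow-through time (96 ns against 84 ns), which is the trade-off the slide describes.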

See the example on page 293
–Two floating-point pipelines for reciprocal floating-point multiplication are compared here.
–The highly pipelined unit on the right produces vector results at a higher rate but takes more time to do a simple multiplication.
–Compilers are often unable to produce code sequences that utilize fast pipelines. Most compilers produce scalar code rather than vector code.

Instruction-unit pipelining
Executing an instruction has several logical parts:
–determine the address of the instruction
–fetch it from memory (or cache)
–analyze the op code
–for operate instructions, determine the addresses of the operands and the result
Each address calculation may take several steps, depending on the addressing mode.

Instruction-unit pipelining
–Instructions may have to wait for data written by previous instructions, in which case pipelined parallelism may not be possible.
–There should be no idle time for any individual segment.
–The storage unit should supply instructions to the instruction unit at the same rate the instruction unit can process them.

Instruction-unit pipelining
–A branch is like taking a break from an assembly line.
–Branching instructions flush the pipeline completely; after branching, the pipeline must be refilled.
–The time that is lost is called the branch penalty.
–When a STORE instruction writes to memory, either the pipeline must be flushed first, or both the memory and the copy in the pipeline must be updated properly.
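The cost of flushing can be estimated with a simple cycle count. This sketch assumes a pipe in which every branch is resolved only in the last stage, so the instruction after a branch cannot enter the pipe until the branch has left it; real designs usually do better, and the function name is invented for illustration.

```python
def total_cycles(is_branch, n_stages):
    """Cycles to run a sequence in which every branch flushes the pipe.

    is_branch: list of booleans, True where the instruction is a branch.
    """
    if not is_branch:
        return 0
    penalty = n_stages - 1  # assumed refill time after a flush
    cycles = n_stages       # flow-through time of the first instruction
    for previous_was_branch in is_branch[:-1]:
        cycles += 1 + (penalty if previous_was_branch else 0)
    return cycles


if __name__ == "__main__":
    straight = [False, False, False, False]
    branchy = [False, True, False, False]
    print(total_cycles(straight, 5))
    print(total_cycles(branchy, 5))
```

In a 5-stage pipe the four straight-line instructions take 8 cycles, while the same count with one branch takes 12; the 4 extra cycles are the branch penalty.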

Scheduling functional units
–The goal is to maximize the rate of instruction issues.
–An instruction issue consists of reserving a functional unit, sending an opcode to it, and reserving the result register.
–Before the control unit can issue an instruction, it must determine that the appropriate functional unit is free and that no data dependencies exist between the current instruction and the executing instructions. If both instructions require the same result register, one has to wait.
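The issue test described above might look roughly like the following. This is a simplified sketch rather than the logic of any particular machine: the `Instr` record, the register names, and the rule of stalling on any pending operand are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    unit: str    # functional unit the instruction needs, e.g. "add"
    dest: str    # result register
    srcs: tuple  # operand registers


def can_issue(instr, busy_units, pending_results):
    """True if the unit is free and no dependency on an executing instruction exists."""
    if instr.unit in busy_units:
        return False  # functional unit already reserved
    if instr.dest in pending_results:
        return False  # same result register as an executing instruction
    if any(src in pending_results for src in instr.srcs):
        return False  # operand not yet written by an earlier instruction
    return True


def issue(instr, busy_units, pending_results):
    """Reserve the functional unit and the result register."""
    busy_units.add(instr.unit)
    pending_results.add(instr.dest)


def complete(instr, busy_units, pending_results):
    """Release the reservations when the instruction finishes."""
    busy_units.discard(instr.unit)
    pending_results.discard(instr.dest)


if __name__ == "__main__":
    busy, pending = set(), set()
    mul = Instr("multiply", "R3", ("R1", "R2"))
    issue(mul, busy, pending)
    print(can_issue(Instr("add", "R3", ("R4", "R5")), busy, pending))  # result register clash
    print(can_issue(Instr("add", "R6", ("R4", "R5")), busy, pending))  # independent
```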

Pipelined vector processors
CRAY-1 supercomputer, along with the TI ASC and STAR-100, and from other vendors the NEC SX, Siemens VPx, and Fujitsu VPx-EX
–pipelined instruction decoding
–multiple pipelined functional units that operate concurrently
–asynchronous banks of interleaved memory
–independent instruction and data caches
–numerous buses to transfer data, address, and control signals

Vector processing
–Hardware for vector processing is not very expensive.
–At the simplest level, vector instructions may be nothing more than microprograms that execute multiple operations instead of a single operation.
–Compilers can detect sequences of code that can take advantage of vector operations.
–The instruction set should include parallel instructions.

Two ways of viewing the speedup offered by vector processing
–Instruction fetching is greatly reduced, so the buses are available for data.
–There is essentially no overhead for branching.
–If an intermediate product is produced at every clock tick, then new data should be introduced in that slot immediately. Therefore data access must be made faster, i.e., memory must process read/write requests at the pipeline rate. Architects use low-order memory interleaving and multiword data paths to meet the high rate of data demand.
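Low-order interleaving means the low-order bits of an address select the memory bank, so consecutive addresses fall in different banks and can be accessed in overlapped fashion. A minimal sketch, assuming word addresses and a hypothetical 8-bank memory:

```python
def bank_and_offset(address, n_banks):
    """Low-order interleaving: low-order part picks the bank, the rest the word in it."""
    return address % n_banks, address // n_banks


def banks_touched(start, length, stride, n_banks):
    """Banks visited by a vector access of the given length and stride."""
    return [bank_and_offset(start + i * stride, n_banks)[0] for i in range(length)]


if __name__ == "__main__":
    print(banks_touched(0, 8, 1, 8))  # unit stride: a different bank each time
    print(banks_touched(0, 4, 8, 8))  # stride equal to the bank count: same bank every time
```

A unit-stride vector keeps all banks busy in turn, while a stride equal to the number of banks hits one bank repeatedly and loses the benefit.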

Summary
Vector processing is faster than scalar processing due to:
–reduced memory contention from fewer instruction accesses
–reduced instruction decoding
–predictable behavior, which is especially important for implicit indexing and memory accesses as well as implicit branching

RISCs
–By the 1980s, instruction sets such as the VAX's had become so complicated that a back-to-basics movement started.
–Its aim was to reduce instruction sets to something similar to that of the CDC 6600.

Characteristics of RISCs
–Instructions are simple and of uniform length
–One instruction format is used
–Little or no overlapping of instruction functionality
–One addressing mode
–Only Load and Store instructions reference memory
–All operations are register to register
–The ISA supports two or three datatypes

Characteristics contd.
–Almost all instructions execute in 1 clock cycle
–The architecture takes advantage of the strengths of the software
–The architecture should have many registers

CISC vs. RISC controversy
Pro CISC
–Richer instruction sets improve the merit of the architecture, since instructions implemented in microcode execute faster than those implemented in software.
–A richer instruction set does not cost more than a simpler one.
–Upward compatibility is easier to implement in the microcode of a CISC.
–Richer instruction sets simplify compiler design.
–Chips with richer instruction sets are more difficult to duplicate, which protects manufacturers.

Pro RISC
–The simpler and more basic the hardware, the cheaper and faster it is. This increased speed makes up for the increased number of instructions in a program.
–RISC programs require a larger number of instruction bits, so an instruction cache is used.
–It is easier to compile for RISC than for CISC.
–Design effort and cost are less than for CISC.
–It is easier to introduce parallelism into RISC.

RISC implementation techniques
–Pipelining to speed up instruction decoding and execution
–RISCs do not allow program self-modification
–Use a Harvard architecture: separate instruction and data streams
–Use large register sets to reduce CPU-memory traffic
–Some use independent registers for floating-point operations and results

RISCs
–Separate functional units for instruction processing and instruction execution
–Delayed branches to avoid the branch penalty: the instruction following the branch is loaded before branching (branch-delay slot); for a conditional branch, if the test fails the instructions are discarded
–Specialized cache memories to decrease the memory-to-CPU delay, with separate instruction and data caches

RISCs
–Optimizing compilers rearrange the code sequence to take maximum advantage of the CPU
–RISCs use overlapping register sets to speed up parameter passing
–String operations are supported by loading and storing multiple registers
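Overlapping register sets can be pictured as windows onto one physical register file, arranged so that a caller's output registers are the same physical registers as the callee's input registers. The layout below (8 in, 8 local, 8 out registers per window, windows numbered upward on each call) is a simplified assumption for illustration and does not reproduce the register numbering of any specific processor.

```python
INS, LOCALS, OUTS = 8, 8, 8
STEP = INS + LOCALS  # a call advances the window by this many physical registers


def physical_register(window, reg, n_windows=8):
    """Map (window, register 0..23) to an index in the shared physical file.

    Registers 0-7 are the window's ins, 8-15 its locals, 16-23 its outs.
    """
    if not 0 <= reg < INS + LOCALS + OUTS:
        raise ValueError("register number out of range")
    return (window * STEP + reg) % (n_windows * STEP)


if __name__ == "__main__":
    caller, callee = 2, 3
    # The caller writes a parameter to its first out register...
    print(physical_register(caller, INS + LOCALS))
    # ...and the callee finds it in its first in register, with no copying.
    print(physical_register(callee, 0))
```

Because both lines print the same physical index, passing a parameter needs no memory traffic, only a change of window.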

RISC machines
–MIPS R2000
–Motorola
–Sun SPARC
–IBM System/6000
–Intel i860
–HP Spectrum