Advanced Topic: Alternative Architectures Chapter 9 Objectives Learn the properties that often distinguish RISC from CISC architectures. Understand how multiprocessor architectures are classified. Appreciate the factors that create complexity in multiprocessor systems. Become familiar with the ways in which some architectures transcend the traditional von Neumann paradigm. 1
Introduction We have so far studied only the simplest models of computer systems; classical single-processor von Neumann systems. This chapter presents a number of different approaches to computer organization and architecture. Some of these approaches are in place in today’s commercial systems. Others may form the basis for the computers of the future. 2 2
Typical Architecture Already Explored ALU carries out the logic operations (comparisons) and arithmetic operations (adding, etc). Each memory location has a unique address. Each location can store a word Control Unit monitors and control the execution of all instructions and the transfer of all information CU extracts the instruction from memory, decodes the instructions, make sure data is in the right place at the right time, tell the ALU which registers to use, services interrupts, turn on the correct circuitry in the ALU for operation execution Dr. Clincy Lecture
ARCHITECTURE APPROACHES Complex Instruction Set Computer (CISC) Large number of instructions, of variable length Instructions have complex layouts A single instruction can be complex and perform multiple operations ISSUE: a small subset of CISC instructions slows the system down EXAMPLE: Intel x86 Architectures Reduced Instruction Set Computer (RISC) Because of CISC’s issue, designers return to less complicated architecture and hardwired a small, but complete instruction set Hardwired instructions are much faster Compiler is responsible for producing efficient code for Instruction Set Architecture (ISA) Simple instructions that execute faster Each instruction only performs one operation All instructions are the SAME size EXAMPLE: Pentium family and MIPS family of CPUs (Microprocessor without Interlocked Pipeline Stages)
ENGLISH Today is your birthday 19 symbols CHINESE 7 symbols RISC vs CISC Analogy ENGLISH Today is your birthday 19 symbols CHINESE 今天是你的生日 7 symbols
RISC Vs CISC Machines The underlying philosophy of RISC machines is that a system is better able to manage program execution when the program consists of only a few different instructions that are the same length and require the same number of clock cycles to decode and execute. RISC systems access memory only with explicit load and store instructions. In CISC systems, many different kinds of instructions access memory, making instruction length variable and fetch-decode-execute time unpredictable. 6 6
RISC Vs CISC Machines The difference between CISC and RISC becomes evident through the basic computer performance equation: RISC systems shorten execution time by reducing the clock cycles per instruction. CISC systems improve performance by reducing the number of instructions per program. 7 7
RISC Vs CISC Machines The simple instruction set of RISC machines enables control units to be hardwired for maximum speed. The more complex and variable length instruction set of CISC machines requires microcode-based control units that interpret instructions as they are fetched from memory. This translation takes time. With fixed-length instructions, RISC lends itself to pipelining and speculative execution (more predictive). 8 8
RISC Vs CISC Machines Consider the program fragments: The total clock cycles for the CISC version might be: (2 movs 1 cycle) + (1 mul 30 cycles) = 32 cycles While the clock cycles for the RISC version is: (3 movs 1 cycle) + (5 adds 1 cycle) + (5 loops 1 cycle) = 13 cycles With RISC clock cycle being shorter, RISC gives us much faster execution speeds. mov ax, 0 mov bx, 10 mov cx, 5 Begin add ax, bx loop Begin mov ax, 10 mov bx, 5 mul bx, ax RISC CISC 9 9
RISC Machines By reducing instruction complexity, simpler chips are needed – as a result, transistors heavily used by CISC can be used in more innovative ways like: pipelines, cache and registers CISC procedure calls and parameter passing involves considerable effort and resources Involves saving a return address Preserving register values Passing parameters by pushing on a stack or using registers Branching to the subroutine Executing the subroutine Once it returns to the calling program after the subroutine execution, parameter value modifications are saved Previous register values are restored Freeing up registers from the above tasks provide RISC machines hundreds of registers to create register environments Because of their load-store ISAs, RISC architectures require a large number of CPU registers. These register provide fast access to data during sequential program execution. They can also be employed to reduce the overhead typically caused by passing parameters to subprograms. Instead of pulling parameters off of a stack, the subprogram is directed to use a subset of registers. 10 10
RISC Registers Instead of pulling parameters off of a stack, the subprogram is directed to use a subset of registers. Each subset of registers is called a register window set By having all of the registers divided in various sets, when a program is executing in a particular environment, only one certain register set is being used or visible If the program changes to a different environment (ie. Procedure is called), then a different set of registers is used or visible This approach is faster and much less complex than “reusing the same registers” Regarding parameter passing using this register window set approach, the windows must overlap. The window set overlapping is accomplished by dividing the register window set into distinct partitions, which include Global Registers: common to all windows Local Registers: local to the current window Input Registers: overlaps with the preceding window’s output registers Output Registers: overlaps with the next window’s input registers 11 11
RISC Registers Typical real RISC architectures include 16 register sets (or windows) of 32 registers each. The CPU is restricted to operating in only one single window at a time. So, for any snap shot of time, only 32 registers are available. This is how registers can be overlapped in a RISC system. When procedure one calls procedure two, any parameters needing to be passed are placed into the output register set of procedure one Once procedure two begins to execute, the output registers of P1 becomes the input registers for P2 The current window pointer (CWP) points to the active register window. 8 Global Registers (R0-R7), 8 Input Registers (R8-R15), 8 Local Registers (R16-R23) and 8 Output Registers (R24-R31) 12 12
RISC Vs CISC RISC CISC Multiple register sets. Three operands per instruction. Parameter passing through register windows. Single-cycle instructions. Hardwired control. Highly pipelined. CISC Single register set. One or two register operands per instruction. Parameter passing through memory. Multiple cycle instructions. Microprogrammed control. Less pipelined. Continued.... 13 13
RISC Vs CISC CISC RISC Many complex instructions. Variable length instructions. Complexity in microcode. Many instructions can access memory. Many addressing modes. RISC Simple instructions, few in number. Fixed length instructions. Complexity in compiler. Only LOAD/STORE instructions access memory. Few addressing modes. 14 14
RISC Vs CISC Machines Today It is becoming increasingly difficult to distinguish RISC architectures from CISC architectures today. Some RISC systems provide more extravagant instruction sets than some CISC systems. Some systems combine both approaches. With the rise of embedded systems and mobile computing, the terms RISC and CISC have lost their significance The RISC Vs CISC debate started when chip-area and processor design complexity were issues – now energy and power are the issues. The two top competitors today are ARM and Intel – Intel focus on performance and ARM focus on efficiency 15 15
Flynn’s Taxonomy Many attempts have been made to come up with a way to categorize computer architectures. Flynn’s Taxonomy has been the most enduring of these, despite having some limitations. Flynn’s Taxonomy takes into consideration the number of processors and the number of data streams that flow into the processor. A machine can have one or many processors that operate on one or many data streams. For Flynn’s Taxonomy, the architecture is driven or influenced by the instructions characteristics – all processor activities are determined by a sequence of program code – program code act on the data. Data driven or Dataflow type architectures sequence of processor events are based on the data characteristics and not the instructions’ characteristics 16 16
Flynn’s Taxonomy 17 17
Flynn’s Taxonomy The four combinations of multiple processors and multiple data paths are described by Flynn as: SISD: Single instruction stream, single data stream. These are classic uniprocessor systems. SIMD: Single instruction stream, multiple data streams. Executes a single instruction using multiple computations at the same time (or in parallel) – makes use of data-level parallelism. All processors execute the same instruction simultaneously. MIMD: Multiple instruction streams, multiple data streams. These are today’s parallel architectures. Multiple processors function independently and asynchronously in executing different instructions on different data MISD: Multiple instruction streams operating on a single data stream. Many processors performing different operations on the same data stream. 18 18
Flynn’s Taxonomy Flynn’s Taxonomy falls short in a number of ways: First, there appears to be no need for MISD machines. Second, parallelism is not homogeneous. This assumption ignores the contribution of specialized processors. Third, it provides no straightforward way to distinguish architectures of the MIMD category. Doesn’t take into consideration how the processors are connected or interface with memory. One idea is to divide these MIMD systems into those that share memory, and those that don’t (distributed memory), as well as whether the interconnections are bus-based or switch-based. 19 19
Flynn’s Taxonomy - MIMD Symmetric multiprocessors (SMP) and massively parallel processors (MPP) are MIMD architectures that differ in how they use memory. SMP systems share the same memory and MPP do not. An easy way to distinguish SMP from MPP is: MPP many processors + distributed memory + communication via network SMP fewer processors + shared memory + communication via memory 20 20
Flynn’s Taxonomy - MIMD Other examples of MIMD architectures are found in distributed computing, where processing takes place collaboratively among networked computers. A network of workstations (NOW) uses otherwise idle systems to solve a problem. A collection of workstations (COW) is a NOW where one workstation coordinates the actions of the others. A dedicated cluster parallel computer (DCPC) is a group of workstations brought together to solve a specific problem. A pile of PCs (POPC) is a cluster of (usually) heterogeneous systems that form a dedicated parallel system. Another name for these approaches is “Cluster Computing” 21 21
Flynn’s Taxonomy – Recent Expansion -SPMD Flynn’s Taxonomy has been expanded to include SPMD (single program, multiple data) architectures. Recall SIMD executes a single instruction using multiple computations at the same time – data parallelism. For SPMD, multiple independent processors execute different tasks of the same program at the same time – task parallelism. Supercomputers use this approach Each SPMD processor has its own data set and program memory. Different nodes can execute different instructions within the same program using instructions similar to: If myNodeNum = 1 do this, else do that With this, different nodes execute different instructions with in the same program Yet another idea missing from Flynn’s is whether the architecture is instruction driven or data driven. 22 22