Instruction Set Architecture Variations


Instruction Set Architecture Variations

Instruction sets can be classified along several lines:
- Addressing modes let instructions access memory in various ways.
- How many operands to name? Data manipulation instructions can have from 0 to 3 operands.
- Where are the operands allowed to be?
  - Only memory addresses
  - Memory addresses or registers
  - Registers only

4/9/2019 Instruction encoding

Addressing mode summary

Number of operands

Our example instruction set had three-address instructions, because each one had up to three operands: two sources and one destination. This provides the most flexibility.

ADD R0, R1, R2    (register transfer: R0 ← R1 + R2)

Here ADD is the operation, R0 is the destination, and R1 and R2 are the sources.

In a two-address instruction, the first operand serves as both the destination and one of the source registers.

ADD R0, R1    (register transfer: R0 ← R0 + R1)

Here R0 is both the destination and the first source, and R1 is the second source.

One-address instructions

Some computers, like the old Apple II, have one-address instructions. The CPU has a special register called an accumulator (ACC), which implicitly serves as the destination and one of the sources.

ADD R0    (register transfer: ACC ← ACC + R0)

Here ADD is the operation and R0 is the only named source.

Here is an example sequence which increments M[R0]:

LD (R0)    ACC ← M[R0]
ADD #1     ACC ← ACC + 1
ST (R0)    M[R0] ← ACC
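The increment sequence above can be sketched as a tiny accumulator-machine simulator. This is a minimal sketch, not anyone's real ISA: the class name, method names, and the address 100 are all made up for illustration.

```python
# A minimal sketch of a one-address accumulator machine.
# ACC is the implicit destination and one implicit source of every operation.
class AccumulatorMachine:
    def __init__(self, memory):
        self.mem = dict(memory)  # address -> value
        self.acc = 0

    def ld(self, addr):    # LD (Rn):  ACC <- M[addr]
        self.acc = self.mem[addr]

    def add_imm(self, k):  # ADD #k:   ACC <- ACC + k
        self.acc += k

    def st(self, addr):    # ST (Rn):  M[addr] <- ACC
        self.mem[addr] = self.acc

# Increment M[100], mirroring the slide's LD / ADD #1 / ST sequence.
m = AccumulatorMachine({100: 41})
m.ld(100)
m.add_imm(1)
m.st(100)
print(m.mem[100])  # -> 42
```

Note that the ADD instruction never names ACC explicitly; only the single memory or register operand appears in the instruction.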

The ultimate: zero addresses

If the destination and sources are all implicit, then you don't have to specify any operands at all! This is possible with processors that use a stack architecture.
- HP calculators and their "reverse Polish notation" use a stack.
- The Java Virtual Machine is also stack-based.

How can you do calculations with a stack?
- Operands are pushed onto a stack. The most recently pushed element is at the "top" of the stack (TOS).
- Operations use the topmost stack elements as their operands. Those values are then replaced with the operation's result.

Stack architecture example

From left to right, here are three stack instructions, and what the stack looks like after each one is executed:

PUSH R1
PUSH R2
ADD

[figure: stack contents, top to bottom, after each instruction]

This sequence of stack operations corresponds to one register transfer instruction: TOS ← R1 + R2.
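The PUSH/PUSH/ADD sequence above can be sketched as a minimal stack interpreter. The `run` helper and the register-file dictionary are assumptions made purely for illustration.

```python
# A minimal sketch of zero-address (stack) evaluation.
# PUSH names a register; ADD takes no operands at all -- it implicitly
# pops the top two stack elements and pushes their sum.
def run(program, regs):
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(regs[args[0]])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return stack

regs = {"R1": 3, "R2": 4}
# PUSH R1 / PUSH R2 / ADD, as in the slide: TOS <- R1 + R2
print(run([("PUSH", "R1"), ("PUSH", "R2"), ("ADD",)], regs))  # -> [7]
```

After ADD executes, the two pushed values are gone and only their sum remains on top of the stack.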

Data movement instructions

Finally, the types of operands allowed in data manipulation instructions are another way of characterizing instruction sets.
- So far, we've assumed that ALU operations can have only register and constant operands.
- Many real instruction sets allow memory-based operands as well.

We'll use the book's example and illustrate how the following operation can be translated into some different assembly languages:

X = (A + B)(C + D)

Assume that A, B, C, D and X are really memory addresses.

Register-to-register architectures

Our programs so far assume a register-to-register, or load/store, architecture, which matches our datapath from last week nicely.
- Operands in data manipulation instructions must be registers.
- Other instructions are needed to move data between memory and the register file.

With a register-to-register, three-address instruction set, we might translate X = (A + B)(C + D) into:

LD R1, A          R1 ← M[A]       // use direct addressing
LD R2, B          R2 ← M[B]
ADD R3, R1, R2    R3 ← R1 + R2    // R3 = M[A] + M[B]
LD R1, C          R1 ← M[C]
LD R2, D          R2 ← M[D]
ADD R1, R1, R2    R1 ← R1 + R2    // R1 = M[C] + M[D]
MUL R1, R1, R3    R1 ← R1 * R3    // R1 has the result
ST X, R1          M[X] ← R1       // store that into M[X]
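As a sanity check, the load/store sequence above can be traced line by line in Python. The memory values 1 through 4 for A through D are made up; only the LD and ST steps touch memory, while all arithmetic happens between registers.

```python
# Trace of the register-to-register (load/store) translation of
# X = (A + B)(C + D), with assumed initial memory contents.
mem = {"A": 1, "B": 2, "C": 3, "D": 4, "X": 0}
reg = {}

reg["R1"] = mem["A"]               # LD R1, A
reg["R2"] = mem["B"]               # LD R2, B
reg["R3"] = reg["R1"] + reg["R2"]  # ADD R3, R1, R2
reg["R1"] = mem["C"]               # LD R1, C
reg["R2"] = mem["D"]               # LD R2, D
reg["R1"] = reg["R1"] + reg["R2"]  # ADD R1, R1, R2
reg["R1"] = reg["R1"] * reg["R3"]  # MUL R1, R1, R3
mem["X"] = reg["R1"]               # ST X, R1
print(mem["X"])  # (1 + 2) * (3 + 4) -> 21
```

Counting accesses: eight instructions, but only five of them (four loads and one store) go to memory.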

Memory-to-memory architectures

In memory-to-memory architectures, all data manipulation instructions use memory addresses as operands. With a memory-to-memory, three-address instruction set, we might translate X = (A + B)(C + D) into simply:

ADD X, A, B    M[X] ← M[A] + M[B]
ADD T, C, D    M[T] ← M[C] + M[D]    // T is temporary storage
MUL X, X, T    M[X] ← M[X] * M[T]

How about with a two-address instruction set?

MOVE X, A    M[X] ← M[A]           // copy M[A] to M[X] first
ADD X, B     M[X] ← M[X] + M[B]    // add M[B]
MOVE T, C    M[T] ← M[C]           // copy M[C] to M[T]
ADD T, D     M[T] ← M[T] + M[D]    // add M[D]
MUL X, T     M[X] ← M[X] * M[T]    // multiply

Register-to-memory architectures

Finally, register-to-memory architectures let the data manipulation instructions access both registers and memory. With two-address instructions, we might do the following:

LD R1, A      R1 ← M[A]        // load M[A] into R1 first
ADD R1, B     R1 ← R1 + M[B]   // add M[B]
LD R2, C      R2 ← M[C]        // load M[C] into R2
ADD R2, D     R2 ← R2 + M[D]   // add M[D]
MUL R1, R2    R1 ← R1 * R2     // multiply
ST X, R1      M[X] ← R1        // store

Size and speed

There are lots of tradeoffs in deciding how many and what kind of operands and addressing modes to support in a processor. These decisions can affect the size of machine language programs.
- Memory addresses are long compared to register file addresses, so instructions with memory-based operands are typically longer than those with register operands.
- Permitting more operands also leads to longer instructions.

There is also an impact on the speed of the program.
- Memory accesses are much slower than register accesses.
- Longer programs require more memory accesses, just for loading the instructions!

Most newer processors use register-to-register designs.
- Reading from registers is faster than reading from RAM.
- Using register operands also leads to shorter instructions.
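A back-of-envelope sketch makes the size argument concrete. All of the field widths here are assumptions chosen for illustration: an 8-bit opcode, 16-bit memory addresses, and 3-bit register fields (enough for 8 registers).

```python
# Hypothetical field widths, in bits.
OPCODE    = 8   # operation code
MEM_ADDR  = 16  # one memory-address operand
REG_FIELD = 3   # one register operand (8 registers)

# Three-address instruction with memory operands, e.g. ADD X, A, B
three_addr_memory = OPCODE + 3 * MEM_ADDR

# Three-address instruction with register operands, e.g. ADD R3, R1, R2
three_addr_register = OPCODE + 3 * REG_FIELD

print(three_addr_memory, three_addr_register)  # -> 56 17
```

Under these assumptions, a memory-operand instruction is more than three times the length of its register-operand counterpart, which is exactly why register operands shrink programs.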

Advanced CPU designs

The last few weeks were a very fast tour of a simple CPU design. Today we'll introduce some advanced CPU designs, which are covered in more detail in CS232 and CS333.
- Multicycle processors support more complex instructions.
- Pipelined CPUs achieve higher clock rates and higher performance.
- CPUs with caches make fewer RAM accesses, which improves performance.

July 30, 2002 ©2000-2002 Howard Huang

Control unit review

Our CPU is a single-cycle machine, since each instruction executes in one clock cycle.
1. An instruction is read from the instruction memory.
2. The instruction decoder generates the matching datapath control signals.
3. Register values are sent to the ALU or the data memory.
4. ALU or RAM outputs are sent back to the register file.
5. The PC is incremented, or reloaded for branches and jumps.

[figure: single-cycle datapath showing the PC, instruction RAM (ADRS/OUT), instruction decoder, control signals DA, AA, BA, MB, FS, MD, WR, MW, and branch control driven by the V, C, N, Z flags]

Limitations of the simple CPU

That's a lot of work to squeeze into one clock cycle!
- The clock cycle time, or the length of each clock cycle, has to be long enough to allow any instruction to complete.
- But the longer the cycle time, the lower the clock rate can be. For example, a 10 ns clock cycle time corresponds to a 100 MHz CPU, while a 1 GHz processor has a cycle time of just 1 ns!

Our basic CPU expects each instruction to execute in just one cycle.
- To support complex instructions, we would have to lengthen the cycle time, thus decreasing the clock rate.
- This also means that any hardware which requires multiple clock cycles, such as serial adders or multipliers, cannot be easily used.
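The cycle time and clock rate are reciprocals of each other, which is where the 10 ns / 100 MHz and 1 ns / 1 GHz pairs above come from. A quick sketch (the helper name is made up):

```python
# Clock rate (in MHz) is the reciprocal of cycle time (in ns):
# 1 ns = 1e-9 s, so rate_hz = 1 / (t_ns * 1e-9), i.e. rate_mhz = 1000 / t_ns.
def clock_rate_mhz(cycle_time_ns):
    return 1e3 / cycle_time_ns

print(clock_rate_mhz(10))  # 10 ns cycle -> 100.0 MHz
print(clock_rate_mhz(1))   # 1 ns cycle  -> 1000.0 MHz, i.e. 1 GHz
```

Halving the cycle time doubles the clock rate, which is why stretching the cycle to fit a slow instruction hurts every instruction.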

Multicycle processors

A multicycle processor can implement complex instructions in hardware.
- Each complex instruction is implemented as a microprogram: a sequence of simpler, single-cycle operations like the ones we've seen already.
- This is like writing a small program or function to implement the complex instruction, except the "program" is stored in hardware.

By breaking longer instructions into a sequence of shorter ones, we can keep cycle times low and clock rates high.

CISC vs. RISC

Complex instruction set computers (CISC), which include powerful but potentially slow instructions, were more popular in the past.
- A lot of programming was done in assembly language, and more powerful instructions made the programmer's job easier.
- But designing the control unit for a CISC processor is hard work.

Reduced instruction set computers (RISC), which support only simpler instructions, have influenced every processor since the mid-80s.
- Simple instructions can execute faster, especially with pipelining.
- People now depend on compilers for generating assembly code, so powerful instruction sets are much less of an issue.

Why cache?

Recall the memory tradeoff we mentioned several weeks ago.
- Static memory is very fast, but also very expensive.
- Dynamic memory is relatively slow, but much cheaper.

[figure: a CPU with lots of static RAM ("Expensive!") versus a CPU with lots of dynamic RAM ("Slow!")]

Introducing caches

Wouldn't it be nice if we could find a balance between fast and cheap memory? We do this by introducing a cache, which is a small amount of fast, expensive memory.
- The cache goes between the processor and the slower, dynamic main memory.
- It keeps a copy of the most frequently used data from the main memory.

Memory access speed increases overall, because we've made the common case faster.
- Reads and writes to the most frequently used addresses will be serviced by the cache.
- We only need to access the slower main memory for less frequently used data.

[figure: CPU connected to a little static RAM (the cache), which connects to lots of dynamic RAM]

The principles of locality

How can we determine exactly what data should be stored in the small amount of cache memory that we have? It's usually difficult or impossible to figure out what data will be "most frequently accessed" before a program actually runs. In practice, most programs exhibit locality, which the cache can take advantage of.
- The principle of temporal locality says that if a program accesses one memory address, there is a good chance it will access that same address again.
- The principle of spatial locality says that if a program accesses one memory address, there is a good chance it will also access other nearby addresses.
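Both principles show up in even the simplest loop. A minimal sketch of the data-address trace of a sequential array scan (the base address 1000 and the array size are arbitrary assumptions): the consecutive data addresses illustrate spatial locality, while the loop body's instructions, fetched from the same few addresses on every iteration, illustrate temporal locality.

```python
# Data-address trace of a simple array scan over an 8-element array
# assumed to start at (made-up) address 1000, one word per element.
BASE = 1000
trace = []
for i in range(8):
    trace.append(BASE + i)  # each access is right next to the previous one

# Spatial locality: every address in the trace is adjacent to its neighbor.
# Temporal locality: the loop's own instructions are re-fetched each pass.
print(trace)  # -> [1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007]
```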

How caches take advantage of temporal locality

The first time the processor reads from an address in main memory, a copy of that data is also stored in the cache.
- The next time that same address is read, we can use the copy of the data in the cache instead of accessing the slower dynamic memory.
- So the first read is a little slower than before, since it goes through both main memory and the cache, but subsequent reads are much faster.

This takes advantage of temporal locality: commonly accessed data is stored in the faster cache memory.

How caches take advantage of spatial locality

When the CPU reads location i from main memory, a copy of that data is placed in the cache. But instead of just copying the contents of location i, we can copy several words into the cache at once, like the four words from locations i through i+3.
- If the CPU does subsequently need to read from location i+1, i+2 or i+3, it can access that data from the cache and not the slower main memory.
- For example, instead of reading just one array element, the cache might actually load four array elements at once.

Again, the initial load incurs some performance penalty, but we're gambling on spatial locality and the chance that the CPU will need the extra data.
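A toy model makes the payoff concrete. This sketch assumes a cache that holds only the single most recent 4-word block (real caches hold many lines; everything here is simplified for illustration) and counts hits and misses for a sequential scan of 16 words.

```python
# Toy model: one cache "line" holding the 4-word block of the last miss.
BLOCK = 4
cached_block = None
hits = misses = 0

for addr in range(16):               # sequential reads, e.g. an array scan
    if cached_block == addr // BLOCK:
        hits += 1                    # word is already in the cached block
    else:
        misses += 1                  # go to main memory...
        cached_block = addr // BLOCK # ...and pull in the whole 4-word block

print(hits, misses)  # -> 12 4
```

Only every fourth read (the first word of each block) misses; the other three words in each block are free rides from spatial locality.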

Summary

- Multicycle processors implement complex instructions by translating them into sequences of simpler, one-cycle "micro-operations."
- With pipelining, the execution of several instructions is overlapped. Individual instructions take longer to execute, but the clock cycle time can be reduced. This can dramatically decrease the overall execution time of programs. But there are many kinds of hazards that must be taken care of.
- Caches take advantage of the principles of locality to improve performance.