Instruction Set Architecture Variations

1 Instruction Set Architecture Variations
Instruction sets can be classified along several lines.
- Addressing modes let instructions access memory in various ways.
- How many operands to name? Data manipulation instructions can have from 0 to 3 operands.
- Where are the operands allowed to be?
  - Only memory addresses
  - Memory addresses or registers
  - Registers only

2 Addressing mode summary
(Table summarizing the addressing modes; shown as a figure on the original slide.)

3 Number of operands

Our example instruction set had three-address instructions, because each one had up to three operands: two sources and one destination. This provides the most flexibility.

  ADD R0, R1, R2    Register transfer: R0 ← R1 + R2
                    (operation, destination, source 1, source 2)

In a two-address instruction, the first operand serves as both the destination and one of the source registers.

  ADD R0, R1        Register transfer: R0 ← R0 + R1
                    (operation, destination and source 1, source 2)
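
As a quick sketch (not on the slide): when a three-address operation’s destination differs from both sources, a two-address instruction set needs an extra copy first. This assumes a MOVE instruction that copies one register to another:

  MOVE R3, R1       R3 ← R1           // copy source 1 into the destination
  ADD  R3, R2       R3 ← R3 + R2      // R3 = R1 + R2, and R1 is preserved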

4 One-address instructions
Some computers, like the old Apple II, have one-address instructions. The CPU has a special register called an accumulator (ACC), which implicitly serves as the destination and one of the sources. Here is an example sequence which increments M[R0]:

  LD (R0)     ACC ← M[R0]
  ADD #1      ACC ← ACC + 1
  ST (R0)     M[R0] ← ACC

Each instruction names just one operand, the source:

  ADD R0      Register transfer: ACC ← ACC + R0
              (operation, source)
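
For comparison with the translations on the later slides, here is one possible one-address version of the expression X = (A + B)(C + D) that slides 7-10 use as a running example. It is a sketch, not from the slide: it assumes LD, ST, ADD and MUL can take direct memory operands, and that T is a temporary memory location:

  LD A        ACC ← M[A]
  ADD B       ACC ← ACC + M[B]
  ST T        M[T] ← ACC          // save A + B in an assumed temporary T
  LD C        ACC ← M[C]
  ADD D       ACC ← ACC + M[D]
  MUL T       ACC ← ACC * M[T]    // (C + D) * (A + B)
  ST X        M[X] ← ACC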

5 The ultimate: zero addresses
If the destination and sources are all implicit, then you don’t have to specify any operands at all! This is possible with processors that use a stack architecture.
- HP calculators and their “reverse Polish notation” use a stack.
- The Java Virtual Machine is also stack-based.
How can you do calculations with a stack?
- Operands are pushed onto a stack. The most recently pushed element is at the “top” of the stack (TOS).
- Operations use the topmost stack elements as their operands. Those values are then replaced with the operation’s result.

6 Stack architecture example
From left to right, the original slide shows three stack instructions and what the stack looks like after each one is executed:

  PUSH R1
  PUSH R2
  ADD

This sequence of stack operations corresponds to one register transfer instruction: TOS ← R1 + R2
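
Carrying the idea further (a sketch, not on the slides): if PUSH and POP can also take direct memory operands, the expression X = (A + B)(C + D) used as a running example on slides 7-10 needs no explicit temporaries, because intermediate results stay on the stack:

  PUSH A      // push M[A]
  PUSH B      // push M[B]
  ADD         // top of stack is now A + B
  PUSH C      // push M[C]
  PUSH D      // push M[D]
  ADD         // top two entries are now C + D and A + B
  MUL         // top of stack is now (A + B)(C + D)
  POP X       // M[X] ← result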

7 Data movement instructions
Finally, the types of operands allowed in data manipulation instructions are another way of characterizing instruction sets.
- So far, we’ve assumed that ALU operations can have only register and constant operands.
- Many real instruction sets allow memory-based operands as well.
We’ll use the book’s example and illustrate how the following operation can be translated into some different assembly languages:

  X = (A + B)(C + D)

Assume that A, B, C, D and X are really memory addresses.

8 Register-to-register architectures
Our programs so far assume a register-to-register, or load/store, architecture, which matches our datapath from last week nicely.
- Operands in data manipulation instructions must be registers.
- Other instructions are needed to move data between memory and the register file.
With a register-to-register, three-address instruction set, we might translate X = (A + B)(C + D) into:

  LD  R1, A         R1 ← M[A]        // Use direct addressing
  LD  R2, B         R2 ← M[B]
  ADD R3, R1, R2    R3 ← R1 + R2     // R3 = M[A] + M[B]
  LD  R1, C         R1 ← M[C]
  LD  R2, D         R2 ← M[D]
  ADD R1, R1, R2    R1 ← R1 + R2     // R1 = M[C] + M[D]
  MUL R1, R1, R3    R1 ← R1 * R3     // R1 has the result
  ST  X, R1         M[X] ← R1        // Store that into M[X]

9 Memory-to-memory architectures
In memory-to-memory architectures, all data manipulation instructions use memory addresses as operands. With a memory-to-memory, three-address instruction set, we might translate X = (A + B)(C + D) into simply:

  ADD X, A, B       M[X] ← M[A] + M[B]
  ADD T, C, D       M[T] ← M[C] + M[D]    // T is temporary storage
  MUL X, X, T       M[X] ← M[X] * M[T]

How about with a two-address instruction set?

  MOVE X, A         M[X] ← M[A]           // Copy M[A] to M[X] first
  ADD  X, B         M[X] ← M[X] + M[B]    // Add M[B]
  MOVE T, C         M[T] ← M[C]           // Copy M[C] to M[T]
  ADD  T, D         M[T] ← M[T] + M[D]    // Add M[D]
  MUL  X, T         M[X] ← M[X] * M[T]    // Multiply

10 Register-to-memory architectures
Finally, register-to-memory architectures let the data manipulation instructions access both registers and memory. With two-address instructions, we might do the following:

  LD  R1, A     R1 ← M[A]         // Load M[A] into R1 first
  ADD R1, B     R1 ← R1 + M[B]    // Add M[B]
  LD  R2, C     R2 ← M[C]         // Load M[C] into R2
  ADD R2, D     R2 ← R2 + M[D]    // Add M[D]
  MUL R1, R2    R1 ← R1 * R2      // Multiply
  ST  X, R1     M[X] ← R1         // Store

11 Size and speed

There are lots of tradeoffs in deciding how many and what kind of operands and addressing modes to support in a processor. These decisions can affect the size of machine language programs.
- Memory addresses are long compared to register file addresses, so instructions with memory-based operands are typically longer than those with register operands.
- Permitting more operands also leads to longer instructions.
There is also an impact on the speed of the program.
- Memory accesses are much slower than register accesses.
- Longer programs require more memory accesses, just for loading the instructions!
Most newer processors use register-to-register designs.
- Reading from registers is faster than reading from RAM.
- Using register operands also leads to shorter instructions.
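
To make the size difference concrete, here is a quick illustrative calculation (the widths are assumptions, not from the slide). Suppose the machine has 32 registers, so a register operand takes 5 bits, while a memory address takes 16 bits. The operand fields of a three-address instruction then need:

  three register operands:  3 × 5 bits  = 15 bits
  three memory operands:    3 × 16 bits = 48 bits

so each memory-to-memory instruction is over three times wider, before even counting the opcode.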

12 Advanced CPU designs

The last few weeks were a very fast tour of a simple CPU design. Today we’ll introduce some advanced CPU designs, which are covered in more detail in CS232 and CS333.
- Multicycle processors support more complex instructions.
- Pipelined CPUs achieve higher clock rates and higher performance.
- CPUs with caches make fewer RAM accesses, which improves performance.

13 Control unit review

Our CPU is a single-cycle machine, since each instruction executes in one clock cycle.
1. An instruction is read from the instruction memory.
2. The instruction decoder generates the matching datapath control signals.
3. Register values are sent to the ALU or the data memory.
4. ALU or RAM outputs are sent back to the register file.
5. The PC is incremented, or reloaded for branches and jumps.

(Diagram: the PC addresses the instruction RAM, whose output feeds the instruction decoder; the decoder produces the control signals DA, AA, BA, MB, FS, MD, WR and MW, and the branch control unit uses the status flags V, C, N and Z.)

14 Limitations of the simple CPU
That’s a lot of work to squeeze into one clock cycle! The clock cycle time, or the length of each clock cycle, has to be long enough to allow any instruction to complete.
- But the longer the cycle time, the lower the clock rate can be.
- For example, a 10 ns clock cycle time corresponds to a 100 MHz CPU, while a 1 GHz processor has a cycle time of just 1 ns!
Our basic CPU expects each instruction to execute in just one cycle.
- To support complex instructions, we would have to lengthen the cycle time, thus decreasing the clock rate.
- This also means that any hardware which requires multiple clock cycles, such as serial adders or multipliers, cannot be easily used.
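
The conversion behind the slide’s numbers is just clock rate = 1 / cycle time:

  1 / 10 ns  = 1 / (10 × 10⁻⁹ s) = 10⁸ Hz = 100 MHz
  1 / 1 GHz  = 1 / (10⁹ Hz)      = 10⁻⁹ s = 1 ns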

15 Multicycle processors
A multicycle processor can implement complex instructions in hardware.
- Each complex instruction is implemented as a microprogram: a sequence of simpler, single-cycle operations like the ones we’ve seen already.
- This is like writing a small program or function to implement the complex instruction, except the “program” is stored in hardware.
By breaking longer instructions into a sequence of shorter ones, we can keep cycle times low and clock rates high.
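
As a sketch of the idea (the exact micro-operations are an assumption, not from the slide), the memory-to-memory ADD X, A, B from slide 9 might expand into single-cycle register-transfer steps like these, using hidden temporary registers T1 and T2:

  T1 ← M[A]         // micro-op 1: fetch the first operand
  T2 ← M[B]         // micro-op 2: fetch the second operand
  T1 ← T1 + T2      // micro-op 3: one ALU operation
  M[X] ← T1         // micro-op 4: store the result

Each step fits in one short clock cycle, so the complex instruction takes four fast cycles instead of forcing one very long one.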

16 CISC vs. RISC

Complex instruction set computers (CISC), which include powerful but potentially slow instructions, were more popular in the past.
- A lot of programming was done in assembly language, and more powerful instructions made the programmer’s job easier.
- But designing the control unit for a CISC processor is hard work.
Reduced instruction set computers (RISC), which support only simpler instructions, have influenced every processor since the mid-80s.
- Simple instructions can execute faster, especially with pipelining.
- People now depend on compilers for generating assembly code, so powerful instruction sets are much less of an issue.

17 Why cache?

Recall the memory tradeoff we mentioned several weeks ago.
- Static memory is very fast, but also very expensive.
- Dynamic memory is relatively slow, but much cheaper.

(Diagram: a CPU paired with lots of static RAM, labeled “Expensive!”, next to a CPU paired with lots of dynamic RAM, labeled “Slow!”)

18 Introducing caches

Wouldn’t it be nice if we could find a balance between fast and cheap memory? We do this by introducing a cache, which is a small amount of fast, expensive memory.
- The cache goes between the processor and the slower, dynamic main memory.
- It keeps a copy of the most frequently used data from the main memory.
Memory access speed increases overall, because we’ve made the common case faster.
- Reads and writes to the most frequently used addresses will be serviced by the cache.
- We only need to access the slower main memory for less frequently used data.

(Diagram: the CPU talks to a little static RAM, the cache, which in turn talks to lots of dynamic RAM.)
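
A standard back-of-the-envelope calculation shows why this helps (the numbers here are illustrative assumptions, not from the slide). The average memory access time is roughly:

  average time = hit time + miss rate × miss penalty

With an assumed 1 ns cache, a 50 ns main memory, and a 95% hit rate:

  average time = 1 ns + 0.05 × 50 ns = 3.5 ns

which is far closer to the cache’s speed than to main memory’s.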

19 The principles of locality
How can we determine exactly what data should be stored in the small amount of cache memory that we have?
- It’s usually difficult or impossible to figure out what data will be “most frequently accessed” before a program actually runs.
- In practice, most programs exhibit locality, which the cache can take advantage of.
The principle of temporal locality says that if a program accesses one memory address, there is a good chance it will access that same address again.
The principle of spatial locality says that if a program accesses one memory address, there is a good chance it will also access other nearby addresses.
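
As a small illustration (the loop itself is an assumption, written in the style of the earlier examples), a loop that sums an array exhibits both kinds of locality: the same few loop instructions are fetched over and over (temporal locality), while the data loads walk through consecutive addresses (spatial locality). Assume R0 holds the array’s start address and R1 the end address:

  LOOP: LD  R2, (R0)      R2 ← M[R0]      // consecutive addresses: spatial locality
        ADD R3, R3, R2    R3 ← R3 + R2    // running sum
        ADD R0, R0, #1    R0 ← R0 + 1     // advance to the next element
        BNE R0, R1, LOOP                  // the same loop instructions are
                                          // re-fetched each pass: temporal locality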

20 How caches take advantage of temporal locality
The first time the processor reads from an address in main memory, a copy of that data is also stored in the cache.
- The next time that same address is read, we can use the copy of the data in the cache instead of accessing the slower dynamic memory.
- So the first read is a little slower than before, since it goes through both main memory and the cache, but subsequent reads are much faster.
This takes advantage of temporal locality: commonly accessed data is stored in the faster cache memory.

(Diagram: CPU, cache, and main memory, as on slide 18.)

21 How caches take advantage of spatial locality
When the CPU reads location i from main memory, a copy of that data is placed in the cache. But instead of just copying the contents of location i, we can copy several words into the cache at once, such as the four words from locations i through i + 3.
- If the CPU does subsequently need to read from location i + 1, i + 2 or i + 3, it can access that data from the cache and not the slower main memory.
- For example, instead of reading just one array element, the cache might actually load four array elements at once.
Again, the initial load incurs some performance penalty, but we’re gambling on spatial locality and the chance that the CPU will need the extra data.

(Diagram: CPU, cache, and main memory, as before.)
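
To see the payoff (an illustrative calculation, not on the slide): if a program reads an array strictly in order, one word per element, and the cache fetches four-word blocks, only every fourth access misses:

  miss rate = 1 miss per 4 accesses = 25%

With one-word blocks, every new address would miss, so the wider block cuts the sequential miss rate by a factor of four.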

22 Summary

- Multicycle processors implement complex instructions by translating them into sequences of simpler, one-cycle “micro-operations.”
- With pipelining, the execution of several instructions is overlapped. Individual instructions take longer to execute, but the clock cycle time can be reduced. This can dramatically decrease the overall execution time of programs. But there are many kinds of hazards that must be taken care of.
- Caches take advantage of the principles of locality to improve performance.

