The Alpha 21264 Thomas Daniels Other Dude Matt Ziegler.

The Alpha 21264 Thomas Daniels Other Dude Matt Ziegler

Outline Overview Instruction Set Architecture Instruction Stream Data Stream

System Overview RISC instruction Set RISC instruction Set 64-bit processor 64-bit processor 15 million transistors 15 million transistors 1 st Alpha w/ out-of-order execution 1 st Alpha w/ out-of-order execution Speculative execution Speculative execution

Specs 6/7-stage pipeline 6/7-stage pipeline 4-way integer issue 4-way integer issue 2-way floating point issue 2-way floating point issue Peak of 6 instructions per cycle Peak of 6 instructions per cycle Sustainable of 4 instructions per cycle Sustainable of 4 instructions per cycle Can have up to 80 instructions in process at one time Can have up to 80 instructions in process at one time 4-way out-of-order issue 4-way out-of-order issue

Specs 4 integer execution units 4 integer execution units 2 general purpose units 2 general purpose units 2 address ALUs 2 address ALUs 32 int & 31 fp registers (64-bits wide) 32 int & 31 fp registers (64-bits wide) 48 int & 40 fp reorder registers 48 int & 40 fp reorder registers 80 additional int registers (copies other 80) 80 additional int registers (copies other 80)

RISC vs. CISC P3 and Athlon are CISC processors P3 and Athlon are CISC processors 21264 is a RISC processor 21264 is a RISC processor Today’s difference is that RISC ISA generally do not use operands from memory. Today’s difference is that RISC ISA generally do not use operands from memory.

Differences from 21164 Out of order issue Out of order issue Smaller pipeline Smaller pipeline Increased memory bandwidth Increased memory bandwidth Memory references can be accessed in parallel to caches Memory references can be accessed in parallel to caches One pipeline for both floating point and integer operations One pipeline for both floating point and integer operations 4x the bandwidth 4x the bandwidth

Previous instruction types Branch Branch Floating point Floating point Memory Memory Memory/Function code Memory/Function code Memory/branch Memory/branch Operate Operate PALcode PALcode

New instructions Floating Point Floating Point Cache prefetching Cache prefetching MVI (motion video instructions) MVI (motion video instructions)

Floating point instructions To Better calculate square root To Better calculate square root Single precision (SQRTS) Single precision (SQRTS) Double precision (SQRTT) Double precision (SQRTT) Move data between floating point and integer register files Move data between floating point and integer register files

Prefetch instructions Allows the compiler to exploit the higher bandwidth Allows the compiler to exploit the higher bandwidth Five instructions introduced Five instructions introduced

Prefetch instructions 21264 Cache Prefetch and Management Instructions Normal Prefetch The 21264 fetches the (64-byte) block into the (level one data and level two) cache. Prefetch with Modify Intent The same as the normal prefetch except that the block is loaded into the cache in dirty state so that subsequent stores can immediately update the block. Prefetch and Evict Next The same as the normal prefetch except that the block will be evicted from the (level one) data cache as soon as there is another block loaded at the same cache index. Write Hint 64The 21264 obtains write access to the 64-byte block without reading the old contents of the block. The application typically intends to over-write the entire contents of the block. EvictThe cache block is evicted from the caches.

MVI Or Motion Video Instructions Or Motion Video Instructions Set of Alpha processor instructions that are categorized as SIMD Set of Alpha processor instructions that are categorized as SIMD Intended to implement high quality software video encoding (MPEG-1, MPEG-2, H.261 and H.263 Intended to implement high quality software video encoding (MPEG-1, MPEG-2, H.261 and H.263

MVI (unpack) UNPKBW UNPKBW Unpack bytes to words Unpack bytes to words UNPKBL UNPKBL Unpack bytes to long words Unpack bytes to long words

MVI (pack) PACKWB PACKWB Truncates the four component words of the input register and writes them to the low four bytes of the output register. Truncates the four component words of the input register and writes them to the low four bytes of the output register. PACKLB PACKLB Truncates the 2 component long words of the input register to byte values and writes them to the low two bytes of the output register. Truncates the 2 component long words of the input register to byte values and writes them to the low two bytes of the output register.

MVI (Byte & Word Min. and Max.) MINUB8 MINUB8 MINUW4 MINUW4 MINSB8 MINSB8 MINSW4 MINSW4 MAXUB8 MAXUB8 MAXUW4 MAXUW4 MAXSB8 MAXSB8 MAXSW4 MAXSW4 Take the form MINxxx Ra, Rb, Rc

MVI (PERR) Replaced nine instructions for motion estimation. Replaced nine instructions for motion estimation. It takes the 8 bytes packed into 2 quadword registers and computes the absolute difference between them, then adds the eight intermediate results and right aligns the result in the destination register. It takes the 8 bytes packed into 2 quadword registers and computes the absolute difference between them, then adds the eight intermediate results and right aligns the result in the destination register. RESULT: Motion estimation calculations on 8 pixels in a single clock tick. RESULT: Motion estimation calculations on 8 pixels in a single clock tick.

The Alpha 21264 Thomas Daniels Other Dude Matt Ziegler.

Similar presentations

Presentation on theme: "The Alpha 21264 Thomas Daniels Other Dude Matt Ziegler."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

The Alpha 21264 Thomas Daniels Other Dude Matt Ziegler.

Similar presentations

Presentation on theme: "The Alpha 21264 Thomas Daniels Other Dude Matt Ziegler."— Presentation transcript:

Similar presentations

About project

Feedback