Alpha AXP Architecture
Dr. Richard L. Sites, Digital Technical Journal, Volume 4, Number 4, Special Issue 1992
Oliver Hampton, Friday, January 31, 2003
DEC Alpha AXP Learning Objectives
- Multiple instruction issue and superscalar execution
- Alpha multiprocessor implementation
- Register representation of 32-bit and 64-bit data types, and memory load/store
- Alpha instruction set
Dr. Richard L. Sites
Employment
- IBM
- Hewlett-Packard
- Burroughs
- Digital Equipment Corporation (1980): significant contributor to the Alpha AXP architecture
Education
- B.S. in Mathematics from MIT
- Ph.D. in Computer Science from Stanford University
- Post-doctoral work in computer architecture at the University of North Carolina
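DEC Alpha AXP Overview
- Designed for speed
- 64-bit load/store RISC architecture
- Two sets of 64-bit registers
  - 32 integer registers, R0 to R31 (R31 always reads as zero; writes to it are discarded)
  - 32 floating-point registers, F0 to F31 (F31 always reads as zero)
- All instructions are a fixed 32 bits long
- Only load and store instructions access memory: memory operations are either reads or writes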
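The R31 convention can be modeled in a few lines. This is a minimal C sketch, not DEC code: the names `IntRegs`, `reg_read`, and `reg_write` are hypothetical, and F31 would follow the same pattern with a floating-point register array.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative integer register file: 32 registers of 64 bits each. */
typedef struct {
    uint64_t r[32];
} IntRegs;

/* Reading R31 always yields zero, whatever is stored in the array slot. */
static uint64_t reg_read(const IntRegs *rf, unsigned n) {
    assert(n < 32);
    return n == 31 ? 0 : rf->r[n];
}

/* Writing R31 has no effect; every other register is updated normally. */
static void reg_write(IntRegs *rf, unsigned n, uint64_t value) {
    assert(n < 32);
    if (n != 31) {
        rf->r[n] = value;
    }
}
```

Hard-wiring a zero register lets one instruction form cover several idioms: a move is an add with R31, and a discarded result is a write to R31.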
DEC Alpha AXP Design Goals
- High performance
  - The Guinness Book of Records (October 1992) listed the Alpha as the world’s fastest single-chip microprocessor
- Longevity
  - Over the twenty-five years before Alpha, computers became about 1000 times faster
  - Goal: over the twenty-five years after its introduction, Alpha should also become 1000 times faster, through
    - Clock rates about 10 times faster
    - Multiple instruction issue (superscalar): about 10 new instructions every clock cycle
    - Multiprocessor systems: about 10 processors sharing memory
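The 1000× longevity target is the product of the three roughly tenfold factors listed above:

```latex
\underbrace{10}_{\text{clock rate}} \times \underbrace{10}_{\text{instructions issued per clock}} \times \underbrace{10}_{\text{processors}} = 1000
```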
Design Goals (continued)
- Capability to run both VMS and UNIX operating systems
  - The first Alpha DECchip ran OpenVMS AXP, DEC OSF/1 AXP, and Windows NT
- PALcode: the interface handler between hardware and the operating system
  - Sets the state of the machine before the first instruction executes
  - Mediates access to hardware resources
- Easy migration from the VAX and MIPS architectures
Superscalar & Multiple Instruction Issue
- Envisioned as parallel pipelines
- Multiple instruction issue (MII) definition: “starting more than one instruction at once”
- To make MII practical, Alpha eliminated
  - Condition codes, so that instructions issued together do not compete for a single status register
  - Branch delay slots
  - Suppressed/skipped instructions, which cause problems when several instructions are issued in tandem
  - Precise arithmetic exceptions (overflow and underflow); TRAPB may be used where software needs such exceptions reported
Multiprocessing
- Atomic update of shared memory (mutual exclusion) requires an instruction sequence:
  load-locked → in-register modify → store-conditional → test
  - If no interrupt, exception, or interfering write occurred since the load-locked, the store-conditional stores the modified result and the test reports success; otherwise the sequence is repeated
- No strict read/write ordering as seen by other processors
  - By contrast, VAX avoids pipelined writes in order to preserve strict write ordering and prevent out-of-order writes
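The load-locked / modify / store-conditional / test loop can be approximated in portable C11 with a compare-and-swap retry loop. This is an analogy, not a transcription of Alpha code: `atomic_add_retry` is a hypothetical helper, and a compare-and-swap succeeds whenever the value is unchanged, whereas Alpha's store-conditional fails on any interfering write. Compilers targeting Alpha typically implement `atomic_compare_exchange_weak` with LDQ_L/STQ_C, and the default sequentially consistent ordering supplies the barriers that Alpha's relaxed memory ordering otherwise omits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: atomically add delta to *counter and return the new
   value, retrying until the update lands without interference. */
static int64_t atomic_add_retry(_Atomic int64_t *counter, int64_t delta) {
    assert(counter != NULL);
    /* Read the current value (the role played by load-locked). */
    int64_t old = atomic_load(counter);
    int64_t updated;
    do {
        /* In-register modify. */
        updated = old + delta;
        /* Store-conditional + test: succeeds only if *counter still equals
           old; on failure, old is refreshed with the current value and the
           loop repeats. */
    } while (!atomic_compare_exchange_weak(counter, &old, updated));
    return updated;
}
```

The `_weak` variant is used because, like a store-conditional, it is allowed to fail spuriously; the surrounding loop absorbs those failures.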
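Alpha Register Data Representation
- Data types (32-bit and 64-bit): integer, IEEE floating point, VAX floating point
  - 64-bit: quadword integer, IEEE T_floating, VAX G_floating
  - 32-bit: longword integer, IEEE S_floating, VAX F_floating
[Figure: register layouts of the 64-bit and 32-bit data types]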
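Because Alpha registers are always 64 bits wide, a 32-bit longword has to be held in some 64-bit form. Alpha keeps longwords sign-extended (bit 31 copied into bits 63 to 32), and longword operations such as ADDL produce results in that form. A minimal C sketch of the convention; the function names are hypothetical, and overflow trapping (ADDL/V) is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Canonical longword: copy bit 31 into bits 63..32. */
static uint64_t canonical_longword(uint32_t low32) {
    return (low32 & 0x80000000u) ? ((uint64_t)low32 | 0xFFFFFFFF00000000ULL)
                                 : (uint64_t)low32;
}

/* ADDL-style addition: add the low 32 bits of both operands (wrapping on
   overflow), then sign-extend the 32-bit sum back to 64 bits. */
static uint64_t addl(uint64_t ra, uint64_t rb) {
    uint32_t sum = (uint32_t)ra + (uint32_t)rb;
    return canonical_longword(sum);
}
```

Keeping one canonical form means quadword compares and branches work unchanged on longword values.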
Alpha Memory Load/Store
- No instructions operate directly on memory; all data manipulation is done between 64-bit registers
- Memory access
  (1) Reads = load instructions
  (2) Writes = store instructions
[Figure: a 32-bit load and a 32-bit store between memory and a 64-bit register]
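The load/store split can be sketched against a toy byte array standing in for memory. Assumptions: the 64-byte `mem` array and all function names are hypothetical, addresses are checked for natural alignment with `assert` (real hardware raises an alignment fault instead), and memory is little-endian. LDL sign-extends the loaded longword into the 64-bit register, while STL writes only the low 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Toy byte-addressed memory (hypothetical size). */
enum { MEM_SIZE = 64 };
static uint8_t mem[MEM_SIZE];

/* Assemble n bytes little-endian: the byte at addr is least significant. */
static uint64_t read_le(uint64_t addr, unsigned n) {
    assert(addr + n <= MEM_SIZE && addr % n == 0);
    uint64_t v = 0;
    for (unsigned i = 0; i < n; i++)
        v |= (uint64_t)mem[addr + i] << (8 * i);
    return v;
}

static void write_le(uint64_t addr, unsigned n, uint64_t v) {
    assert(addr + n <= MEM_SIZE && addr % n == 0);
    for (unsigned i = 0; i < n; i++)
        mem[addr + i] = (uint8_t)(v >> (8 * i));
}

/* LDQ/STQ move all 64 bits. */
static uint64_t ldq(uint64_t addr) { return read_le(addr, 8); }
static void stq(uint64_t addr, uint64_t v) { write_le(addr, 8, v); }

/* LDL sign-extends the longword into the register; STL stores the low 32 bits. */
static uint64_t ldl(uint64_t addr) {
    uint64_t w = read_le(addr, 4);
    return (w & 0x80000000u) ? (w | 0xFFFFFFFF00000000ULL) : w;
}
static void stl(uint64_t addr, uint64_t v) { write_le(addr, 4, v & 0xFFFFFFFFu); }
```

For example, storing the register value 0xFFFFFFFFFFFFFFFE with STL writes the bytes FE FF FF FF, and loading them back with LDL restores the full sign-extended value.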
Alpha Memory (continued)
- Byte order
  - Little-endian: byte zero is the low (least significant) byte of an integer
  - Big-endian: byte zero is the high (most significant) byte of an integer
- Virtual addressing
  - Full 64-bit virtual addresses (the first DECchip implemented only 43 bits)
- Paging
  - The DECchip used 8 KB pages
  - Page size is expandable to 64 KB
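With 8 KB pages, the low 13 bits of a virtual address (2^13 = 8192) select the byte within the page and the remaining high bits select the virtual page; a 64 KB page would use 16 offset bits. A minimal C sketch; the function names and the `page_shift` parameter are illustrative assumptions, and translation beyond this split is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Virtual page number: the bits above the in-page offset. */
static uint64_t virtual_page_number(uint64_t va, unsigned page_shift) {
    assert(page_shift >= 13 && page_shift <= 16); /* 8 KB up to 64 KB pages */
    return va >> page_shift;
}

/* Byte offset within the page: the low page_shift bits. */
static uint64_t page_offset(uint64_t va, unsigned page_shift) {
    assert(page_shift >= 13 && page_shift <= 16);
    return va & ((UINT64_C(1) << page_shift) - 1);
}
```

For the address 0x12345, 8 KB pages give page 0x9 and offset 0x345, while 64 KB pages give page 0x1 and offset 0x2345.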
Alpha Instructions
- Four types: operate, memory, branch, and CALL_PAL (TRAPB & PALcode group)
- 6-bit opcode
- Zero to three 5-bit register fields (RA, RB, RC)
  - RA: general purpose; read or written, depending on the instruction
  - RB: only read, never written
  - RC: destination, never read
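The fields above sit at fixed bit positions in the 32-bit instruction word. The sketch below decodes the register-operate format, assuming the layout given in the Alpha architecture reference: opcode in bits 31..26, RA in 25..21, RB in 20..16 (or an 8-bit literal in 20..13 when bit 12 is set), a 7-bit function code in 11..5, and RC in 4..0. The struct and helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned opcode;     /* bits 31..26 */
    unsigned ra;         /* bits 25..21 */
    unsigned rb;         /* bits 20..16, meaningful when is_literal == 0 */
    unsigned literal;    /* bits 20..13, meaningful when is_literal == 1 */
    unsigned is_literal; /* bit 12 */
    unsigned function;   /* bits 11..5 */
    unsigned rc;         /* bits 4..0 */
} OperateFields;

/* Extract bits hi..lo (inclusive) of an instruction word. */
static unsigned bits(uint32_t word, unsigned hi, unsigned lo) {
    assert(hi < 32 && hi >= lo && hi - lo < 31);
    return (unsigned)((word >> lo) & ((1u << (hi - lo + 1)) - 1u));
}

static OperateFields decode_operate(uint32_t w) {
    OperateFields f;
    f.opcode = bits(w, 31, 26);
    f.ra = bits(w, 25, 21);
    f.is_literal = bits(w, 12, 12);
    f.rb = f.is_literal ? 0 : bits(w, 20, 16);
    f.literal = f.is_literal ? bits(w, 20, 13) : 0;
    f.function = bits(w, 11, 5);
    f.rc = bits(w, 4, 0);
    return f;
}
```

Under that layout, and assuming ADDQ is opcode 0x10 with function 0x20, the word 0x40DF0407 decodes as ADDQ R6, R31, R7.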
Operate Instructions
- All operate instructions are three-operand and register-to-register: RC ← RA operate RB
- Integer operations may substitute an 8-bit unsigned literal for RB
- Integer: add, subtract, multiply, compare
- Floating-point: add, subtract, multiply, compare, convert
- Logical: and, or, xor, and-not, or-not, xor-not
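The RC ← RA operate RB rule, including the 8-bit literal substitution and the R31 convention, can be sketched as a tiny interpreter. The `Op` enum values are hypothetical labels rather than Alpha function codes, only a handful of integer and logical operations are modeled, and arithmetic wraps modulo 2^64 as the non-trapping forms do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical operation labels (not Alpha encodings). */
typedef enum { OP_ADDQ, OP_SUBQ, OP_MULQ, OP_CMPEQ, OP_AND, OP_BIS, OP_XOR, OP_BIC } Op;

typedef struct { uint64_t r[32]; } Cpu;

/* R31 always reads as zero. */
static uint64_t rd(const Cpu *c, unsigned n) {
    assert(n < 32);
    return n == 31 ? 0 : c->r[n];
}

/* RC <- RA op RB, where RB may be replaced by a zero-extended 8-bit literal. */
static void operate(Cpu *c, Op op, unsigned ra, int use_literal,
                    unsigned rb_or_lit, unsigned rc) {
    uint64_t a = rd(c, ra);
    uint64_t b;
    if (use_literal) {
        assert(rb_or_lit <= 0xFF);
        b = rb_or_lit;
    } else {
        b = rd(c, rb_or_lit);
    }
    uint64_t result = 0;
    switch (op) {
    case OP_ADDQ:  result = a + b; break;
    case OP_SUBQ:  result = a - b; break;
    case OP_MULQ:  result = a * b; break;          /* low 64 bits of the product */
    case OP_CMPEQ: result = (a == b) ? 1 : 0; break;
    case OP_AND:   result = a & b; break;
    case OP_BIS:   result = a | b; break;          /* BIS is Alpha's OR */
    case OP_XOR:   result = a ^ b; break;
    case OP_BIC:   result = a & ~b; break;         /* and-not */
    }
    assert(rc < 32);
    if (rc != 31) c->r[rc] = result;               /* writes to R31 are discarded */
}
```

For example, setting R6 to 3 and running ADDQ R6, R31, R7 leaves 3 in R7, and CMPEQ R31, 3, R0 leaves 0 in R0.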
Operate Instruction Examples
- Add Quadword (integer arithmetic): ADDQ R6, R31, R7
  - R6 contains the 64-bit representation of 3 (decimal)
  - R31 always equals zero
  - R7 receives the 64-bit result of 3 + 0 = 3
- Compare Equal (integer compare): CMPEQ R31, 3, R0
  - The literal 3 takes the place of RB
  - R0 receives the result of 0 == 3, which is 0 (false)
Memory Instructions
- Load & store
- RA: register to be loaded or stored
  - If the memory address is unaligned, byte-manipulation instructions are required
- RB: base register
- 16-bit displacement
  - The contents of RB are added to the 16-bit displacement, sign-extended to 64 bits, to form a virtual address; that address maps to the physical address that RA is stored to or loaded from
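The address calculation for memory-format instructions is a sign-extend followed by a 64-bit add. A minimal C sketch; `sext16` and `effective_address` are hypothetical names, and translation from the virtual to the physical address is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 16-bit displacement to 64 bits. */
static uint64_t sext16(uint16_t disp) {
    return (disp & 0x8000u) ? ((uint64_t)disp | 0xFFFFFFFFFFFF0000ULL)
                            : (uint64_t)disp;
}

/* Virtual address = contents of RB + SEXT(displacement), modulo 2^64. */
static uint64_t effective_address(uint64_t rb_value, uint16_t disp) {
    return rb_value + sext16(disp);
}
```

A displacement of 0xFFF0 is -16, so with RB = 0x1000 the access lands at 0x0FF0; the 16-bit field therefore reaches 32 KB on either side of the base register.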
Memory Instruction Example
- Explicit load of an unaligned quadword (little-endian)
  - LDQ_U: Load Unaligned Quadword
  - EXTQL: Extract Quadword Low
  - EXTQH: Extract Quadword High
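These three instructions are normally combined into a five-instruction sequence: two LDQ_U loads fetch the aligned quadwords containing the first and last bytes, EXTQL and EXTQH shift each part into position, and BIS (Alpha's OR) merges them. The C sketch below models that sequence on a toy little-endian memory; the array, its size, and the function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy little-endian memory (hypothetical size). */
enum { MEM_SIZE = 32 };
static uint8_t mem[MEM_SIZE];

/* LDQ_U: load the aligned quadword containing addr (low 3 bits ignored). */
static uint64_t ldq_u(uint64_t addr) {
    uint64_t base = addr & ~(uint64_t)7;
    assert(base + 8 <= MEM_SIZE);
    uint64_t v = 0;
    for (unsigned i = 0; i < 8; i++)
        v |= (uint64_t)mem[base + i] << (8 * i);
    return v;
}

/* EXTQL: shift right so the byte at addr becomes the low byte. */
static uint64_t extql(uint64_t ra, uint64_t addr) {
    return ra >> (8 * (addr & 7));
}

/* EXTQH: shift left by 64 - 8*(addr & 7); yields 0 when addr is aligned. */
static uint64_t extqh(uint64_t ra, uint64_t addr) {
    unsigned off = (unsigned)(addr & 7);
    return off == 0 ? 0 : ra << (64 - 8 * off);
}

/* Models: LDQ_U R1,0(R11); LDQ_U R2,7(R11); EXTQL R1,R11,R1;
           EXTQH R2,R11,R2; BIS R2,R1,R1 */
static uint64_t load_unaligned_quad(uint64_t addr) {
    uint64_t lo = ldq_u(addr);
    uint64_t hi = ldq_u(addr + 7);
    return extql(lo, addr) | extqh(hi, addr);
}
```

When the address happens to be aligned, both LDQ_U loads fetch the same quadword and EXTQH contributes nothing, so the same sequence is correct for aligned and unaligned data.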
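Branch Instructions
- RA is tested by conditional branches to determine true/false
- The 21-bit displacement is shifted left by two and sign-extended to 64 bits so that it can be added to the program counter (PC); the PC used is the updated PC, i.e. the address of the instruction following the branch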
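On Alpha the branch format carries a 21-bit displacement counted in instructions, and the target is computed from the updated PC (the address of the instruction after the branch). A minimal C sketch, with `branch_target` as a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* target = (PC + 4) + 4 * SEXT(disp21) */
static uint64_t branch_target(uint64_t pc, uint32_t disp21) {
    assert(disp21 < (1u << 21));
    /* Sign-extend from bit 20 to 64 bits. */
    uint64_t disp = (disp21 & (1u << 20)) ? ((uint64_t)disp21 | ~(uint64_t)0x1FFFFF)
                                          : (uint64_t)disp21;
    /* Shift left by two: the displacement counts 4-byte instructions. */
    return (pc + 4) + (disp << 2);
}
```

A displacement of -1 (0x1FFFFF) branches back to the branch itself, and scaling by four gives a reach of about one million instructions in each direction.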
Alpha AXP Comes Full Circle
- Compaq purchased DEC and Tandem
- Compaq server groups supported Alpha, MIPS, and Pentium Xeon
- June 2001: Compaq announced the end of Alpha
  - Alpha processor development cancelled after 2003
  - Alpha-based system development cancelled after 2004
  - Alpha software teams at Compaq slated to target Intel’s Itanium
Alpha AXP Questions
- Is it possible to design longevity into a processor?
- What instruction code feature does Alpha use to run multiple operating systems?
- What Alpha instruction sequence implements atomic updates of shared memory?
  - Load-locked → in-register modify → store-conditional → test
- State one of the features Alpha eliminated to support multiple instruction issue.
  - (1) Condition codes, (2) branch delay slots, (3) suppressed/skipped instructions, (4) precise arithmetic exceptions
References
- Sites, R.L., “Alpha AXP Architecture”, Digital Technical Journal, Vol. 4, No. 4, Special Issue 1992.
- Meng, X., “The DEC Alpha AXP – A Case Study”, Notes/master/node93.html
- Rusling, D.A., “The Alpha AXP Processor”, node140.html
- Leibson, S., “So Long Alpha”, dit15_24.html