Sony Emotion Engine architectural overview
2002.5.20
Kim L. Vu

Acronyms
- ALU: Arithmetic Logic Unit
- COP: Coprocessor
- DMAC: Direct Memory Access Controller
- DSP: Digital Signal Processing
- EE: Emotion Engine
- EFU: Elementary Functional Unit
- GIF: Graphics Interface
- IPU: Image Processing Unit
- MAC: Multiply-Accumulate
- RDRAM: Rambus Dynamic RAM
- SPRAM: Scratch-Pad RAM
- VPU: Vector Processing Unit

Overview (PS2 architecture)

Emotion Engine
- VU0: thought simulation, AI, physics calculations
  - SIMD, VLIW architecture
- VU1: fixed geometry calculations
- CPU + FPU: program control
- IPU: real-time image data decompression

Emotion Engine Features
- 300 MHz MIPS III CPU
  - Two-issue superscalar, 128-bit multimedia extensions
  - 16k, 2-way instruction cache
  - 8k, 2-way data cache
  - 16k “scratch pad” RAM
- Vector Units
  - Both have 4 FMACs + 1 FDIV
  - EFU (Elementary Functional Unit) in VU1: 1 FMAC + 1 FDIV
- 128-bit data bus
- IPU: MPEG2 decoder unit
- 10-channel DMA controller

CPU Core Features
- MIPS III instruction set architecture
- 6-stage pipeline
  - PC Select | Fetch | Register | Exec | Cache Access | Write Back
- Two 64-bit integer ALUs
  - ALUs can be combined in “lock step” to execute 128-bit SIMD operations
- Load/Store Unit
- Branch Execution Unit
  - 64-entry two-branch prediction mechanism
- 32 128-bit registers

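The lock-step pairing of the two 64-bit ALUs can be pictured with a small model. The following is a minimal sketch in Python, not a description of the actual hardware: the function names, the choice of a packed add, and the 32-bit default lane width are all assumptions made for illustration. Each 64-bit "ALU" adds its half of the operands lane by lane, and the two halves are joined into one 128-bit result.

```python
MASK64 = (1 << 64) - 1


def pack_lanes(values, lane_bits):
    """Pack a list of unsigned lane values into one integer, lane 0 lowest."""
    word = 0
    for index, value in enumerate(values):
        word |= (value & ((1 << lane_bits) - 1)) << (index * lane_bits)
    return word


def unpack_lanes(word, lane_bits, count):
    """Split an integer back into `count` unsigned lanes, lane 0 lowest."""
    lane_mask = (1 << lane_bits) - 1
    return [(word >> (i * lane_bits)) & lane_mask for i in range(count)]


def alu64_packed_add(a, b, lane_bits):
    """Model of one 64-bit ALU: add lane by lane, dropping carries between lanes."""
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, 64, lane_bits):
        lane_sum = ((a >> shift) & lane_mask) + ((b >> shift) & lane_mask)
        result |= (lane_sum & lane_mask) << shift
    return result


def simd128_packed_add(a, b, lane_bits=32):
    """Two 64-bit ALUs in lock step: each one handles half of the 128-bit operands."""
    low = alu64_packed_add(a & MASK64, b & MASK64, lane_bits)
    high = alu64_packed_add((a >> 64) & MASK64, (b >> 64) & MASK64, lane_bits)
    return (high << 64) | low
```

Because no carry crosses the 64-bit boundary in a packed operation, the two halves never need to exchange data, which is what makes the pairing possible in this model.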
Vector unit performance
- Microarchitecturally identical
  - 4 FMACs
  - 1 FDIV
  - 1 Load/Store Unit
  - 1 ALU
  - 1 random number generator
- 2-issue VLIW (64-bit bundle)
- Two operating modes: VLIW and coprocessor mode
- Throughput
  - FMAC operation: 1 cycle
  - FDIV operation: 7 cycles
  - 4x4 matrix * vector: 4 cycles
  - 4x4 matrix * matrix: 16 cycles

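The matrix throughput figures follow from having four FMACs that each complete one multiply-accumulate per cycle. The sketch below, in Python, shows one schedule consistent with those counts. It is an assumption for illustration, not the documented hardware schedule: each cycle multiplies one matrix column by one element of the vector and accumulates into four running sums, and pipeline latency is ignored. The function names are hypothetical.

```python
def fmac4(acc, column, scalar):
    """One cycle: four FMACs each compute acc[i] + column[i] * scalar."""
    return [acc[i] + column[i] * scalar for i in range(4)]


def mat_vec(m, v):
    """Return (m * v, cycles) for a 4x4 row-major matrix m and a 4-vector v."""
    acc = [0.0, 0.0, 0.0, 0.0]
    cycles = 0
    for k in range(4):
        column = [m[i][k] for i in range(4)]
        acc = fmac4(acc, column, v[k])  # one cycle per vector element
        cycles += 1
    return acc, cycles


def mat_mat(a, b):
    """Return (a * b, cycles), computing one column of the product per mat_vec."""
    columns = []
    cycles = 0
    for j in range(4):
        b_column = [b[i][j] for i in range(4)]
        column, used = mat_vec(a, b_column)
        columns.append(column)
        cycles += used
    product = [[columns[j][i] for j in range(4)] for i in range(4)]
    return product, cycles
```

Under this schedule a matrix-vector product takes 4 cycles and a matrix-matrix product takes 4 x 4 = 16 cycles, matching the figures on the slide.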
VU Features
Features          VU0                     VU1
Job               Flexible calculations   Fixed 3D calculations
VLIW mode         Yes                     Yes
Coprocessor mode  Yes                     No
VPU components    4k instruction RAM      16k instruction RAM
                  4k data RAM             16k data RAM
                  VIF                     GIF
                                          EFU
Performance       4 FMACs (2.4 Gflops), 1 FDIV (0.04 Gflops), 1 EFU (0.64 Gflops)

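The peak figures in the Performance row can be reproduced from the 300 MHz clock and the throughput numbers. The sketch below makes two assumptions that the slides do not state outright: an FMAC counts as two floating-point operations per cycle (one multiply plus one add), and an FDIV counts as one operation every 7 cycles. The function names are illustrative.

```python
CLOCK_HZ = 300e6  # 300 MHz Emotion Engine clock


def fmac_gflops(units):
    """Peak Gflops of `units` FMACs, assuming 2 flops per FMAC per cycle."""
    return units * 2 * CLOCK_HZ / 1e9


def fdiv_gflops(units, cycles_per_op=7):
    """Peak Gflops of `units` FDIVs, assuming 1 flop every `cycles_per_op` cycles."""
    return units * CLOCK_HZ / cycles_per_op / 1e9


def efu_gflops():
    """Peak Gflops of the EFU, modelled as 1 FMAC + 1 FDIV."""
    return fmac_gflops(1) + fdiv_gflops(1)
```

With these assumptions, four FMACs give 4 x 2 x 0.3 = 2.4 Gflops, one FDIV gives roughly 0.043 Gflops, and the EFU gives roughly 0.643 Gflops, which round to the 2.4, 0.04, and 0.64 listed above.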
VPU0 Design Strategy
- 2 modes: VLIW and coprocessor
- Runs mainly in coprocessor mode
  - Lower opcode always NOP
  - Controlled by CPU
  - Executes 32-bit MIPS coprocessor instructions
  - Processes 4 parallel FP instructions

VPU1 Design Strategy
- Can only run in VLIW mode
  - Executes 64-bit VLIW bundle
- Accessed by 3D display list
  - 3D display list contains both instructions and data in the same structure

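A display list that carries both instructions and data in one structure could be modelled roughly as follows. This is a hypothetical sketch only: the class names, field names, and loader function are invented for illustration and do not reflect the real packet format, and the RAM sizes assume that "16k" on the slides means 16 KB.

```python
from dataclasses import dataclass, field

VU1_INSTRUCTION_RAM_BYTES = 16 * 1024  # assumption: "16k" means 16 KB
VU1_DATA_RAM_BYTES = 16 * 1024         # assumption: "16k" means 16 KB
BUNDLE_BYTES = 8                       # one 64-bit VLIW bundle
QUADWORD_BYTES = 16                    # one 128-bit vector of four 32-bit floats


@dataclass
class DisplayListPacket:
    """Hypothetical packet: VLIW bundles and the vectors they operate on."""
    bundles: list = field(default_factory=list)
    vectors: list = field(default_factory=list)


@dataclass
class Vu1Memory:
    """Hypothetical model of the VU1 instruction RAM and data RAM."""
    instruction_ram: list = field(default_factory=list)
    data_ram: list = field(default_factory=list)


def load_packet(memory, packet):
    """Copy a packet's code and data into the two RAMs, checking capacity first."""
    code_bytes = (len(memory.instruction_ram) + len(packet.bundles)) * BUNDLE_BYTES
    data_bytes = (len(memory.data_ram) + len(packet.vectors)) * QUADWORD_BYTES
    if code_bytes > VU1_INSTRUCTION_RAM_BYTES:
        raise MemoryError("packet code does not fit in instruction RAM")
    if data_bytes > VU1_DATA_RAM_BYTES:
        raise MemoryError("packet data does not fit in data RAM")
    memory.instruction_ram.extend(packet.bundles)
    memory.data_ram.extend(packet.vectors)
    return memory
```

Keeping code and data in one packet means a single transfer can deliver everything the unit needs for a batch of geometry, which is the point of the shared structure in this model.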
VU Instruction Formats
Instruction bundle has 2 parts: “upper” (SIMD) + “lower”
“Lower” execution unit         “Upper” execution unit
FP div/sqrt/reciprocal sqrt    4 parallel FP add/sub
Load/store                     4 parallel FP mul
EFU (1 FMAC + 1 FDIV)          4 parallel FP madd/msub
Jump/branch
Random number generator

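The two-part bundle can be sketched as a pair of 32-bit words packed into one 64-bit value. In the Python sketch below, placing the "upper" instruction in the high 32 bits is an assumption made for illustration, and the function names are hypothetical. The instruction words used in any call are arbitrary numbers, not real opcodes.

```python
MASK32 = 0xFFFFFFFF


def pack_bundle(upper, lower):
    """Combine a 32-bit upper and a 32-bit lower instruction into a 64-bit bundle.

    Assumption: the upper instruction occupies the high 32 bits.
    """
    if not 0 <= upper <= MASK32 or not 0 <= lower <= MASK32:
        raise ValueError("each instruction must fit in 32 bits")
    return (upper << 32) | lower


def unpack_bundle(bundle):
    """Split a 64-bit bundle back into its (upper, lower) instruction words."""
    return (bundle >> 32) & MASK32, bundle & MASK32
```

Each half of the bundle is dispatched to its own set of execution units, so one bundle can pair a 4-wide FP operation with, for example, a load/store or a branch.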
EE Teams
Team 1
- Handles physics, program control, AI and behavior calculations
- Members work closely together
  - Ease of communication through dedicated 128-bit buses from CPU to FPU and VU0
  - SPRAM acts as the shared workspace of the CPU and VU0

Team 2
- Handles simple geometric calculations
- Members act as equal partners
  - Dedicated 128-bit bus from VPU1 to GIF

Team Interoperation
- Two modes: serial connection and parallel connection
- Serial connection
  - VPU0 acts as VPU1’s coprocessor
  - SPRAM is used to transfer data to VPU1
  - VPU1 renders final image
- Parallel connection
  - GIF monitors the status of the graphics synthesizer
  - Both teams independently and asynchronously send display lists

References
- Atsushi Kunimatsu, et al., “Vector Unit Architecture for Emotion Synthesis,” IEEE Micro, Vol. 20, No. 2, March/April 2000, pp. 40-47
- K. Kutaragi, et al., “A Microprocessor with 128b CPU, 10 Floating-Point MACs, 4 Floating-Point Dividers, and MPEG2 Decoder,” ISSCC (Int’l Solid-State Circuits Conf.) Digest Tech. Papers, IEEE Press, Piscataway, New Jersey, Feb. 1999, pp. 256-257
- F. Michael Raam, et al., “A High-Bandwidth Superscalar Microprocessor for Multimedia Applications,” ISSCC Digest Tech. Papers, IEEE Press, Feb. 1999, pp. 258-259
- Jon Stokes, “Sound and Vision: A Technical Overview of the Emotion Engine,” Ars Technica, http://arstechnica.com/reviews/1q00/playstation2/ee-1.html
- Jon Stokes, “The Playstation2 vs. the PC: A System-level Comparison of Two 3D Platforms,” Ars Technica, http://arstechnica.com/cpu/2q00/ps2/ps2vspc-1.html