PlayStation2 as a General Purpose Computer (The Emotion Engine vs. general PC architectures)

Can the PlayStation2 compete with the PC as a general purpose computer?
- What is the difference between the general PC architecture and the PlayStation2 architecture?
- How do these differences affect the performance of the PlayStation2 on general applications such as word processing and running e-mail clients?

SISD vs. SIMD
- SISD: Single Instruction stream, Single Data stream
  - Intel and AMD processors
- SIMD: Single Instruction stream, Multiple Data streams
  - PlayStation2
  - Motorola's MPC7400 (the G4)
  - Sun's MAJC

SISD
- Takes advantage of instruction-level parallelism
- Executes multiple instructions at once on the same data stream
- Good performance depends on good cache performance
- Very high clock speed (execute as many instructions as you can as fast as possible)

SIMD
- Takes advantage of data parallelism
- Executes the same instruction on large amounts of uniform data all at once
- Good performance depends on efficiently packing data into a uniform format
- Slower clock speed
- Very high throughput
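The contrast between the two styles can be sketched in a few lines of Python. This is only a model: Python has no SIMD instructions, and the names `sisd_add`, `simd_add` and the 4-lane width `LANES` are assumptions chosen for illustration. The sketch counts how many "instructions" each style issues to add two arrays.

```python
LANES = 4  # assumed vector width: four values per packed operand


def sisd_add(a, b):
    """Issue one add per element; return (result, instructions issued)."""
    out, issued = [], 0
    for x, y in zip(a, b):
        out.append(x + y)
        issued += 1
    return out, issued


def simd_add(a, b):
    """Issue one add per packed group of LANES elements."""
    if len(a) != len(b) or len(a) % LANES != 0:
        raise ValueError("data must be packed into whole groups of LANES elements")
    out, issued = [], 0
    for i in range(0, len(a), LANES):
        # In hardware, the LANES additions below would happen in a single step.
        out.extend(x + y for x, y in zip(a[i:i + LANES], b[i:i + LANES]))
        issued += 1
    return out, issued


a = [float(i) for i in range(16)]
b = [1.0] * 16
scalar_result, scalar_issued = sisd_add(a, b)
vector_result, vector_issued = simd_add(a, b)
```

Both versions produce the same result, but the packed version issues a quarter as many operations, which is why throughput can stay high at a lower clock speed. The `ValueError` check mirrors the packing requirement noted in the slide: the data has to be arranged in a uniform format first.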

SISD/SIMD

SIMD on the PlayStation2
- The heart of the PlayStation2 is the Emotion Engine
- Its main function is to calculate display lists and send them on to the Graphics Synthesizer, which renders these lists into three-dimensional objects

SIMD on the PlayStation2 (continued)
- Calculating display lists basically involves vector calculations
  - the kind of task a SIMD architecture is perfect for
- It requires a relatively small set of instructions operating on massive amounts of uniform data
- The most common operation is a tight loop iterating through sets of matrices
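As a rough illustration of the kind of tight loop described above, the following Python sketch applies one 4x4 matrix to a list of 4-component vertices. The function names and the sample translation matrix are assumptions made for this example; they are not taken from the Emotion Engine's actual programming interface.

```python
def transform(matrix, vertex):
    """Multiply a 4x4 matrix (a list of four rows) by a 4-component vertex."""
    return [sum(row[k] * vertex[k] for k in range(4)) for row in matrix]


def transform_all(matrix, vertices):
    """The tight loop: the same small operation applied to every vertex."""
    return [transform(matrix, v) for v in vertices]


# Hypothetical example data: translate every vertex by (2, 3, 4).
translate = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, 3.0],
    [0.0, 0.0, 1.0, 4.0],
    [0.0, 0.0, 0.0, 1.0],
]
vertices = [
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [-2.0, -3.0, -4.0, 1.0],
]
moved = transform_all(translate, vertices)
```

Each row-times-vertex product is four multiplies and a sum over uniform data, with no branching and no dependence between vertices, which is the shape of work a vector unit handles well.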

Differences in cache implementation
- SISD data caches tend to be large
  - Huge performance gains are achieved by reading in a big chunk of data and executing as many instructions as you can on it
- This approach is terrible for SIMD architecture
  - Data is not referenced repeatedly
  - Vector calculations are performed and then the next bit of data is read
  - Nothing is gained by storing old data in cache memory
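The point about reuse can be checked with a toy cache model. The sketch below is a simplification under stated assumptions: a fully associative LRU cache that holds one item per entry, so it measures only reuse over time and ignores the benefit of fetching neighbouring data in the same cache line. The function name `hit_rate` and the workload sizes are hypothetical.

```python
from collections import OrderedDict


def hit_rate(addresses, capacity):
    """Fraction of accesses that hit in a fully associative LRU cache."""
    cache = OrderedDict()
    hits = 0
    for addr in addresses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(addresses)


# Reuse-heavy workload: a 64-item working set read 100 times over.
reuse = [i % 64 for i in range(6400)]
# Streaming workload: 6400 items, each touched exactly once.
streaming = list(range(6400))

reuse_rate = hit_rate(reuse, capacity=256)
streaming_rate = hit_rate(streaming, capacity=256)
```

With these inputs the reuse-heavy workload hits on all but its first 64 accesses, while the streaming workload never hits, no matter how large `capacity` is made. That is the sense in which storing old data gains nothing for stream-style vector processing.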

Cache specs for the Pentium 4 (SISD architecture)
- 12K µop 8-way set associative execution trace cache
- 8K 4-way set associative data cache
- 256K or 512K 8-way set associative Level 2 cache
- The execution trace cache takes the place of a conventional L1 instruction cache, and its capacity is specified in µops rather than bytes (8-12K would be a reasonable assumption for its equivalent size)

Cache specs for Emotion Engine (SIMD architecture)
- 16K 2-way set associative instruction cache
- 8K 2-way set associative data cache
- Two Vector Units (VU0 and VU1) each have a 16K instruction cache and 16K data cache
- 16K SPRAM (Scratch Pad RAM: high speed memory shared by the processor and VU0)

Cache Specs
- The PlayStation2's total cache size is smaller than the Pentium 4's by a factor of about 3 or 5, depending on the size of the Pentium 4's L2 cache
- Also, the caches are divided up into much smaller units on the Emotion Engine
- The big difference is the lack of an L2 cache in the Emotion Engine
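The "factor of about 3 or 5" can be reproduced from the figures on the two preceding slides. The short Python sketch below is one way to do the arithmetic. It rests on an assumption that is not in the slides: the 12K µop trace cache is counted as roughly 12 KB, even though µops are not bytes. The variable names are illustrative.

```python
# All sizes in KB, taken from the cache spec slides.
emotion_engine = {
    "instruction cache": 16,
    "data cache": 8,
    "VU0 instruction + data": 16 + 16,
    "VU1 instruction + data": 16 + 16,
    "scratch pad RAM": 16,
}


def pentium4_total(l2_kb):
    """Trace cache (assumed ~12 KB) + 8 KB data cache + L2 cache."""
    return 12 + 8 + l2_kb


ee_total = sum(emotion_engine.values())
ratio_small_l2 = pentium4_total(256) / ee_total
ratio_large_l2 = pentium4_total(512) / ee_total
```

Under these assumptions the Emotion Engine totals 104 KB, and the Pentium 4 comes out roughly 2.7 times larger with a 256K L2 cache and roughly 5.1 times larger with a 512K L2 cache.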

Bandwidth in the Emotion Engine
- Designed with massive bandwidth to maximize throughput
- Memory bus bandwidth: 3.2 GB/s
- 16-bit bus connects two 128 Mb (16 MB) RDRAM memory banks to the 10-channel Direct Memory Access Controller (DMAC)
- DMAC allows up to 10 simultaneous data transfers on 128-bit and 64-bit buses
- Much higher throughput is achieved because the system can service more requests simultaneously
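The 3.2 GB/s figure is consistent with a simple peak-bandwidth calculation, sketched below in Python. Two inputs are assumptions rather than figures from the slides: that each of the two RDRAM banks sits on its own 16-bit channel, and that each channel performs 800 million transfers per second. The function name is hypothetical.

```python
def peak_bandwidth_gb_per_s(channels, bus_width_bits, transfers_per_second):
    """Peak bandwidth = channels x bytes per transfer x transfers per second."""
    bytes_per_transfer = bus_width_bits // 8
    return channels * bytes_per_transfer * transfers_per_second / 1e9


# Assumed configuration: two 16-bit channels at 800 million transfers/s each.
rdram_bandwidth = peak_bandwidth_gb_per_s(
    channels=2, bus_width_bits=16, transfers_per_second=800e6
)
```

With those inputs the result is 3.2 GB/s. This is a theoretical peak; sustained bandwidth depends on how well the DMAC keeps the channels busy.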

General Purpose SISD Architecture

Emotion Engine Architecture

Performance of the PlayStation2
- In multimedia applications
  - Outperforms PCs by far on tasks such as
    - MP3 encoding/decoding
    - MPEG encoding/decoding
    - graphics applications
- In applications that have very little data parallelism (like word processing, e-mail, or internet browsing)
  - Degenerates to a machine with a very low clock rate and a terrible cache implementation
  - Cannot possibly compete with modern PCs

Can the PlayStation2 compete with general purpose CPUs?
- Not currently
  - The lack of an L2 cache makes it difficult to compete with SISD architectures on workloads with high data reuse
- Even if we focus entirely on multimedia applications
  - Code would have to be rewritten and recompiled to take advantage of the Emotion Engine's higher bandwidth and vector processors
  - Not enough memory
    - Only supports a total of 32 MB
  - Not enough permanent storage
    - Max storage capacity is 16 MB (two 8 MB memory cards)

Some Necessary Improvements
Several improvements are necessary if the PlayStation2 is to compete with general purpose PCs in the future. For example:
- Memory hierarchy needs to be redesigned to accommodate SISD workloads
  - A level 2 cache and an execution trace cache would substantially improve performance
- A more powerful core CPU is necessary
  - Wider issue
  - Improved branch predictor
- Programmers need to learn how to fully utilize the strengths of the Emotion Engine architecture

In the Future
- The PlayStation2 will face tougher competition from PC architectures, like the G4, that are incorporating SIMD architectures into their design more aggressively
- It will be interesting to see how these new architectures compete with the PlayStation2 as 3D gaming systems
