Xbox 360 Architecture Presenter: Ataç Deniz Oral Date: 30/11/06.


Overview
● The Xbox
● What kind of computation?
● Architectural details
● Decisions / Trade-offs
● Conclusion
● Discussion

The Xbox
Photos taken from:

Computation
● Decompression kernel
● Game world geometry
● Data streaming
Also:
● AI software
● Audio synthesis

Why not use a PC?
● Dot product implementation
● Support for D3D formats
Picture taken from: C17.S8/HC17.S8T4.pdf

IBM PowerPC core
● 4 KB two-way set-associative BHT
● SIMD vector unit
● Floating-point unit
● Fixed-point unit
● Load/store unit
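
The BHT in the core diagram is a branch history table. As a rough illustration of how a two-way set-associative table of this kind can be organized, here is a minimal Python sketch. The number of sets, the tag scheme, the LRU replacement, and the 2-bit saturating counters are all assumptions chosen for illustration; the slides do not describe the actual predictor design.

```python
class BranchHistoryTable:
    """Toy two-way set-associative BHT with 2-bit saturating counters.

    Sizes and policies are illustrative assumptions, not Xbox 360 details.
    """

    WAYS = 2

    def __init__(self, num_sets=1024):
        self.num_sets = num_sets
        # Each set holds up to two [tag, counter] entries, least recently used first.
        self.sets = [[] for _ in range(num_sets)]

    def _locate(self, pc):
        word = pc >> 2                      # assume 4-byte instructions
        return self.sets[word % self.num_sets], word // self.num_sets

    def predict(self, pc):
        ways, tag = self._locate(pc)
        for entry_tag, counter in ways:
            if entry_tag == tag:
                return counter >= 2         # counter 2 or 3 means "taken"
        return False                        # no history: predict not taken

    def update(self, pc, taken):
        ways, tag = self._locate(pc)
        for i, (entry_tag, counter) in enumerate(ways):
            if entry_tag == tag:
                counter = min(3, counter + 1) if taken else max(0, counter - 1)
                ways.pop(i)
                ways.append([tag, counter])  # move to most recently used position
                return
        if len(ways) == self.WAYS:
            ways.pop(0)                      # evict the least recently used way
        ways.append([tag, 2 if taken else 1])  # start in a weak state
```

With two ways per set, two branches that map to the same set can keep their history at the same time; a third conflicting branch evicts the least recently used one.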

The Cache

Decisions / Trade-offs
● Why multiple cores? (CMP versus SMP)
  ● Cost-effective
  ● Enables a shared L2 implementation, which reduces communication latency

Decisions / Trade-offs (cont.)
● Shared L2 cache
  ● Adapts to varying workloads
  ● e.g. scene management vs. audio processing
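
To make the "adapts to varying workloads" point concrete, here is a minimal Python sketch comparing one shared cache against two private caches of the same total size. Everything in it is an assumption for illustration: a fully associative LRU cache, an 8-line capacity, and two made-up workloads labelled "scene" (6-line working set) and "audio" (2-line working set).

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache that only counts misses (toy model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.misses = 0

    def access(self, address):
        if address in self.lines:
            self.lines.move_to_end(address)   # refresh LRU position on a hit
            return
        self.misses += 1
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)    # evict the least recently used line
        self.lines[address] = True


def run(scene_cache, audio_cache, rounds=100):
    """Interleave a large and a small working set, round after round."""
    for _ in range(rounds):
        for line in range(6):                 # hypothetical scene-management data
            scene_cache.access(("scene", line))
        for line in range(2):                 # hypothetical audio data
            audio_cache.access(("audio", line))


def private_misses():
    scene, audio = LRUCache(4), LRUCache(4)   # fixed 4 + 4 split
    run(scene, audio)
    return scene.misses + audio.misses


def shared_misses():
    shared = LRUCache(8)                      # same total capacity, one pool
    run(shared, shared)
    return shared.misses


print(private_misses(), shared_misses())
```

In this toy setup the fixed split leaves the large working set thrashing its 4-line half while the small one wastes space, whereas the shared pool holds both after the first round.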

Decisions / Trade-offs (cont.)
● In-order instruction issue cores
  ● Simplified logic
  ● Reduced die area
  ● Reduced cost and power consumption
● Out-of-order issue requires
  ● Additional pipeline stages to meet clock period timing
  ● Rename registers and completion queues
● In-order instruction execution
  ● Claimed to be justified by two SMT (simultaneous multithreading) hardware threads per core
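
The claim that two hardware threads compensate for in-order issue can be illustrated with a toy cycle count. The Python sketch below assumes a single issue slot per cycle and a fixed thread priority, which is closer to fine-grained multithreading than to a full SMT pipeline; the instruction streams and stall lengths are invented for illustration.

```python
def run_cycles(threads):
    """Return the cycle count for an in-order core with one issue slot.

    threads: list of instruction streams; each entry is the number of
    stall cycles (e.g. a cache miss) that follow that instruction.
    """
    next_instr = [0] * len(threads)
    ready_at = [0] * len(threads)     # cycle at which each thread may issue again
    cycle = 0
    while any(next_instr[t] < len(threads[t]) for t in range(len(threads))):
        for t in range(len(threads)):
            if next_instr[t] < len(threads[t]) and ready_at[t] <= cycle:
                # In-order: the thread blocks until this instruction's stall ends.
                ready_at[t] = cycle + 1 + threads[t][next_instr[t]]
                next_instr[t] += 1
                break                 # only one instruction issues per cycle
        cycle += 1
    return max(ready_at)


stream = [0, 3, 0, 3]                 # every second instruction stalls 3 cycles
print(run_cycles([stream]))           # one thread alone
print(run_cycles([stream, stream]))   # two hardware threads sharing the core
```

Under these assumptions one thread takes 10 cycles, so two threads run back to back would take 20, while interleaving them finishes in 12 because one thread issues during the other's stalls.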

Computation
● Decompression kernel
● Game world geometry
● Data streaming

CPU Data Streaming: Write Streaming
● Enable data streaming
● But do not thrash the private cache or the shared cache
  ● Write-through L1 caches

CPU Data Streaming: Write Streaming
● Enable data streaming
● But do not thrash the private cache or the shared cache
  ● Write-through L1 caches
  ● Uncached write-gathering buffers in the shared L2 for each core (for later dumping to the FSB)

The Cache

CPU Data Streaming: Write Streaming
● Enable data streaming
● But do not thrash the private cache or the shared cache
  ● Write-through L1 caches
  ● Uncached write-gathering buffers in the shared L2 for each core (for later dumping to the FSB)
  ● Cacheable write-gathering buffers (for data transformation workloads)
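
The idea behind an uncached write-gathering buffer can be sketched as follows. This is a minimal Python model under stated assumptions: the 64-byte line size, the single-line buffer, and the flush-on-full or flush-on-new-line policy are illustrative choices, and `WriteGatherBuffer` is a hypothetical name rather than anything from the Xbox 360 documentation.

```python
LINE_SIZE = 64  # bytes gathered per burst; an assumed value for illustration


class WriteGatherBuffer:
    """Collects individual stores to one line and emits them as a single burst."""

    def __init__(self, line_size=LINE_SIZE):
        self.line_size = line_size
        self.line_base = None     # base address of the line being gathered
        self.pending = {}         # byte offset within the line -> value
        self.bursts = []          # (base address, gathered bytes) sent to the bus

    def store(self, address, value):
        base = address - (address % self.line_size)
        if self.line_base is not None and base != self.line_base:
            self.flush()          # a store to a different line drains the buffer
        self.line_base = base
        self.pending[address - base] = value & 0xFF
        if len(self.pending) == self.line_size:
            self.flush()          # a complete line goes out as one burst

    def flush(self):
        if self.pending:
            self.bursts.append((self.line_base, dict(self.pending)))
        self.line_base = None
        self.pending = {}


buf = WriteGatherBuffer(line_size=8)
for addr in range(16):
    buf.store(addr, addr)
print(len(buf.bursts))            # 16 byte stores leave as 2 bus bursts
```

Because the stores never allocate cache lines in this model, the streamed output cannot evict data that other code still needs.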

The Cache

CPU Data Streaming: Read Streaming
● Custom prefetch instruction
  ● Separates read streaming from write streaming
  ● L2 cache is not thrashed
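
Why keeping streamed reads out of the L2 matters can be shown with a small LRU simulation. The Python sketch below assumes a fully associative 8-line L2, 4 "hot" lines that game code revisits every 16 streamed lines, and a flag that models the prefetch path bypassing the L2; these parameters are invented for illustration and do not describe the real instruction.

```python
from collections import OrderedDict


def hot_misses(stream_bypasses_l2, l2_lines=8, hot_lines=4, stream_length=64):
    """Count L2 misses on hot data while a read stream is being consumed."""
    l2 = OrderedDict()
    misses = 0

    def touch(line):
        nonlocal misses
        if line in l2:
            l2.move_to_end(line)          # hit: refresh LRU position
            return
        if line[0] == "hot":
            misses += 1                   # only hot-data misses are counted
        if len(l2) >= l2_lines:
            l2.popitem(last=False)        # evict the least recently used line
        l2[line] = True

    for i in range(hot_lines):            # warm the cache with the hot data
        touch(("hot", i))
    misses = 0

    for s in range(stream_length):
        if not stream_bypasses_l2:
            touch(("stream", s))          # streamed line allocates in the L2
        if s % 16 == 15:                  # code periodically returns to hot data
            for i in range(hot_lines):
                touch(("hot", i))
    return misses


print(hot_misses(stream_bypasses_l2=False))  # stream evicts the hot lines
print(hot_misses(stream_bypasses_l2=True))   # hot lines stay resident
```

In this model the streamed data flushes the hot lines every time it passes through the L2, while the bypassing path leaves them untouched.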

Conclusion
Picture taken from: ibm.com/developerworks/library/pa-fpfxbox/

Discussion
The End. Any questions?

References
● Application Customized CPU Design, http://www-128.ibm.com/developerworks/power/library/pa-fpfxbox/index.html
● J. Andrews, N. Baker, “Xbox 360 Architecture”, IEEE Micro, vol. 26, no. 2, pp ,
● PowerPC – Wikipedia, the free encyclopedia,
● Xbox 360 Architecture,