Xbox 360 Architecture Presenter: Ataç Deniz Oral Date: 30/11/06
Overview The Xbox What kind of computation? Architectural details Decisions / Trade-offs Conclusion Discussion
Photos taken from: The Xbox
Computation Decompression kernel Game World Geometry Data streaming also AI software Audio synthesis
Picture taken from: C17.S8/HC17.S8T4.pdf Why not use a PC? ●Dot product implementation ●Support for D3D formats
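As a rough illustration of the "Dot product implementation" bullet, the sketch below shows what a four-component dot product computes for geometry work. It is plain Python for readability only. The name `dot4` is hypothetical and does not model the actual vector-unit instruction or its encoding.

```python
def dot4(a, b):
    """Four-component dot product, as used on (x, y, z, w) vectors.

    Hypothetical helper for illustration: dedicated hardware would compute
    this in one vector operation instead of four multiplies and three adds.
    """
    if len(a) != 4 or len(b) != 4:
        raise ValueError("dot4 expects two 4-component vectors")
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]


# Example: project a point onto a plane normal stored as (nx, ny, nz, d).
point = (1.0, 2.0, 3.0, 1.0)
plane = (0.0, 1.0, 0.0, -2.0)
signed_distance = dot4(point, plane)
```

A scalar core spends several dependent instructions on this pattern, which is one reason a native dot product matters for transform and lighting code.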
IBM PowerPC core 4 KB two-way set-associative BHT SIMD Vector unit Floating Point Unit Fixed Point Unit Load/Store Unit
The Cache
Decisions / Trade Offs Why multiple cores? (CMP versus SMP) Cost-effective! Enables shared L2 implementation (therefore reduces communication latency)
Decisions / Trade Offs (cont.) Shared L2 Cache To adapt to varying workloads i.e. Scene management vs. audio processing
Decisions / Trade Offs (cont.) In-order instruction issue cores Simplifies logic Reduced die area Reduced cost and power consumption Out-of-order issue requires Additional pipeline stages to meet clock period timing Rename registers and completion queues In-order instruction execution Claimed to be justified by two SMT (Simultaneous MultiThreading) hardware threads per core
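The claim that two hardware threads can justify an in-order core can be sketched with a toy model. It assumes one issue slot per cycle, round-robin thread selection, and invented stall counts. None of these figures come from the real pipeline, and `run` is a hypothetical name.

```python
def run(threads):
    """Cycles until every thread has issued and completed all instructions.

    Each thread is a list of stall counts: after issuing an instruction,
    the thread cannot issue again for that many extra cycles (for example
    while waiting on a cache miss). The core issues in order, at most one
    instruction per cycle, picking ready threads round-robin.
    """
    pcs = [0] * len(threads)        # next instruction index per thread
    ready_at = [0] * len(threads)   # first cycle each thread may issue again
    cycle = 0
    last = 0                        # thread that issued most recently
    while any(pc < len(t) for pc, t in zip(pcs, threads)):
        for k in range(len(threads)):
            i = (last + 1 + k) % len(threads)
            if pcs[i] < len(threads[i]) and ready_at[i] <= cycle:
                stall = threads[i][pcs[i]]
                pcs[i] += 1
                ready_at[i] = cycle + 1 + stall
                last = i
                break               # only one issue slot per cycle
        cycle += 1
    return max(ready_at)


# Assumed workload: every second instruction stalls for 3 cycles.
work = [0, 3, 0, 3]
one_after_another = 2 * run([work])   # two threads run back to back
interleaved = run([work, work])       # two hardware threads share the core
```

In this model the second thread fills issue slots the first would leave idle while stalled, so the interleaved run finishes sooner than running the two threads one after the other, without any out-of-order logic.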
Computation Decompression kernel Game World Geometry Data streaming
CPU Data Streaming Write Streaming Enable data streaming But do not thrash private cache or shared cache Write-through L1 caches
CPU Data Streaming Write Streaming Enable data streaming But do not thrash private cache or shared cache Write-through L1 caches Uncached write gathering buffers in shared L2 for each core (for later dumping to FSB)
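The idea behind a write gathering buffer can be sketched as follows: individual stores to the same line are collected and sent to the bus as one burst. This is a minimal model under assumptions. The 128-byte line size, the class name `WriteGatherBuffer`, and the flush policy are all chosen for illustration and are not taken from the slides.

```python
LINE_BYTES = 128  # assumed line size for this sketch


class WriteGatherBuffer:
    """Collects byte stores to one line and emits them as a single burst."""

    def __init__(self, line_bytes=LINE_BYTES):
        self.line_bytes = line_bytes
        self.base = None     # line-aligned address currently being gathered
        self.pending = {}    # offset within the line -> byte value
        self.bursts = []     # (base address, gathered bytes) sent to the bus

    def store(self, addr, value):
        base = addr - addr % self.line_bytes
        if self.base is not None and base != self.base:
            self.flush()     # a store to another line pushes out the old one
        self.base = base
        self.pending[addr - base] = value
        if len(self.pending) == self.line_bytes:
            self.flush()     # line is complete, send it as one burst

    def flush(self):
        if self.pending:
            self.bursts.append((self.base, dict(self.pending)))
        self.base = None
        self.pending = {}


# 256 sequential byte stores leave the buffer as 2 bursts, not 256 writes.
buf = WriteGatherBuffer()
for addr in range(256):
    buf.store(addr, addr & 0xFF)
```

Because the gathered data goes to the bus without allocating cache lines, streamed output of this kind does not compete with the working set for cache space.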
The Cache
CPU Data Streaming Write Streaming Enable data streaming But do not thrash private cache or shared cache Write-through L1 caches Uncached write gathering buffers in shared L2 for each core (for later dumping to FSB) Cacheable write gathering buffers (for data transformation workloads)
The Cache
CPU Data Streaming Read Streaming Custom prefetch instruction separates read streaming from write streaming L2 cache is not thrashed
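Why keeping streamed data out of a cache matters can be shown with a small LRU simulation. The cache size, the addresses, and the access pattern below are assumptions made for the sketch. It does not model the real L2 or the actual prefetch instruction.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny fully associative cache with least-recently-used eviction."""

    def __init__(self, lines):
        self.lines = lines
        self.data = OrderedDict()

    def access(self, addr):
        hit = addr in self.data
        if hit:
            self.data.move_to_end(addr)         # mark as most recently used
        else:
            self.data[addr] = True
            if len(self.data) > self.lines:
                self.data.popitem(last=False)   # evict least recently used
        return hit


def working_set_hits(stream_bypasses_cache):
    """Count hits on a small reused working set mixed with streamed data."""
    cache = LRUCache(lines=8)
    working_set = list(range(4))    # data the game code keeps reusing
    next_stream = 1000              # streamed addresses, each touched once
    hits = 0
    for _ in range(10):
        for addr in working_set:
            hits += cache.access(addr)
        for _ in range(8):
            if not stream_bypasses_cache:
                cache.access(next_stream)   # stream allocates in the cache
            next_stream += 1
    return hits


hits_with_bypass = working_set_hits(stream_bypasses_cache=True)
hits_without_bypass = working_set_hits(stream_bypasses_cache=False)
```

In this model the streamed lines evict the whole working set between rounds when they are allowed into the cache, so every reuse misses. With the bypass, only the first round misses.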
Picture taken from: ibm.com/developerworks/library/pa-fpfxbox/ Conclusion
Discussion The End Any Questions?
References Application Customized CPU Design, http://www-128.ibm.com/developerworks/power/library/pa-fpfxbox/index.html J. Andrews, N. Baker, “Xbox 360 Architecture”, IEEE Micro, vol. 26, no. 2, pp , PowerPC – Wikipedia, the free encyclopedia, Xbox 360 Architecture,