ATI Stream Computing ATI Radeon™ HD 3800/4800 Series GPU Hardware Overview Micah Villmow May 30, 2008.


1 ATI Stream Computing ATI Radeon™ HD 3800/4800 Series GPU Hardware Overview Micah Villmow May 30, 2008

2 Outline
- ATI Radeon™ HD 3800 Series GPU – What changed
- ATI Radeon™ HD 3400/3600 Series and X2 GPU variants
- ATI Radeon™ HD 4800 – A new architecture?
- Compute Shader – A new paradigm
ATI Stream Computing Update | Confidential | ATI Stream Computing – ATI Radeon™ HD 3800/4800 Series GPU Hardware Overview

3 ATI Radeon™ HD 3800 Series GPU – What Changed
- Double Precision
- Memory Controller Modifications
- Tex Modifications
- Linear Memory
- Global Buffer support
- Limited Render backends

4 ALU Hardware – Double Precision
- Combine thin pipes together to produce double
- Combines two F32 components for F64
  - MSB is in y/w component
  - LSB is in x/z component
- Two pipe instructions:
  - DADD
  - Double-Float Conversion Ops
  - DLDEXP
  - DFRAC
  - Double Comparison Ops
- Four pipe instructions:
  - DFREXP
  - DMUL
  - DMAD
IL:
  dmad r10.xy__, r0.xy, r5.xy, r10.xy
ISA:
  21 x: MULADD_64 T0.x, R5.y, R1.y, T0.y
     y: MULADD_64 T0.y, R5.y, R1.y, T0.y
     z: MULADD_64 ____, R5.y, R1.y, T0.y
     w: MULADD_64 ____, R5.x, R1.x, T0.x
     t: MULADD R4.y, R5.z, R3.z, T0.z
IL:
  dadd r10.xy__, r0.xy, r5.xy
  dadd r10.__zw, r0.zw, r5.zw
ISA:
  20 x: ADD_64 T3.x, R3.y, R1.y
     y: ADD_64 T3.y, R3.x, R1.x
     z: ADD_64 T3.z, R3.w, R1.w VEC_120
     w: ADD_64 T3.w, R3.z, R1.z
     t: MULADD T0.w, R4.y, R1.x, T0.w
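The component layout on this slide (least significant half in x/z, most significant half in y/w) can be illustrated from the host side. The Python sketch below splits a double into its two 32-bit words and packs two doubles into one four-component register image. The helper names are hypothetical, and the sketch assumes the F64 values use the standard IEEE 754 binary64 encoding; it does not model the ALU pipes.

```python
import struct


def split_f64(value):
    """Return (lsb_word, msb_word) of the binary64 encoding of value."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return bits & 0xFFFFFFFF, bits >> 32


def join_f64(lsb_word, msb_word):
    """Rebuild a double from its two 32-bit halves."""
    bits = (msb_word << 32) | lsb_word
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def pack_register(first, second):
    """Place two doubles in one register: (x, y) holds first, (z, w) holds second.

    Assumed layout, following the slide: LSB word in x/z, MSB word in y/w.
    """
    x, y = split_f64(first)
    z, w = split_f64(second)
    return {"x": x, "y": y, "z": z, "w": w}
```

Under this layout a register such as r0 carries one double in r0.xy and a second in r0.zw, which matches the two `dadd` instructions in the IL example operating on the `.xy` and `.zw` halves separately.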

5 Memory Hardware – Memory Controller
- Die-shrink from 80 nm to 55 nm
- 512-bit ring bus (256-bit read, 256-bit write)
- 72 GB/s peak bandwidth
- 32-bit memory channels
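The slide quotes a peak bandwidth but not the external interface width or data rate. As a rough consistency check, the Python sketch below applies the usual formula (bus width in bytes times transfers per second). The 256-bit interface and 2.25 GT/s data rate are assumptions made for illustration and do not come from the slide; the function names are hypothetical.

```python
def peak_bandwidth_gb_s(bus_width_bits, data_rate_gt_s):
    """Peak bandwidth in GB/s: bytes per transfer times giga-transfers per second."""
    return bus_width_bits / 8 * data_rate_gt_s


def channel_count(bus_width_bits, channel_width_bits=32):
    """Number of memory channels of the given width that make up the bus."""
    return bus_width_bits // channel_width_bits


# Assumed values (not from the slide): 256-bit interface at 2.25 GT/s.
ASSUMED_BUS_WIDTH_BITS = 256
ASSUMED_DATA_RATE_GT_S = 2.25
```

With these assumed inputs, `peak_bandwidth_gb_s(256, 2.25)` evaluates to 72.0, in line with the quoted peak, and the interface would break down into eight 32-bit channels.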

6 Memory Hardware – Texture Unit
- Four 32KB four-way associative L1 caches
- L1 cache size is 4x8KB per SIMD engine
- Data is split across all four 8KB L1 caches
- L1 cacheline is 128 bytes or 2 quads of data
- 256KB unified cache over all SIMDs

7 Memory Hardware – Linear Layout
Tiled Layout
[Figure: tiled surface with Pitch, Width and Height labeled]
- Possible wasted space between width and pitch
- Euclidean coordinates for addressing
- Macro-micro tiling format is non-linear
- Outputs through color buffer backend
Linear Layout
[Figure: linear surface with Pitch and Height labeled]
- Addressable space is pitch * height
- No wasted space in allocated texture
- Linear macro tiling format
- Outputs through SMX
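The difference between the two layouts comes down to simple arithmetic on pitch, width and height. The Python sketch below is a minimal illustration, assuming sizes are counted in elements and that the linear layout is addressed row by row; the function names are hypothetical and the sketch says nothing about the actual macro-micro tiling pattern.

```python
def tiled_wasted_elements(width, pitch, height):
    """Elements allocated but unused when each row is padded from width to pitch."""
    if pitch < width:
        raise ValueError("pitch must be at least as large as width")
    return (pitch - width) * height


def linear_addressable_elements(pitch, height):
    """In the linear layout the whole pitch * height allocation is addressable."""
    return pitch * height


def linear_offset(x, y, pitch):
    """Element offset of (x, y), assuming row-major order with the given pitch."""
    return y * pitch + x
```

For example, a 100-element-wide surface padded to a pitch of 128 over 64 rows leaves 28 * 64 = 1792 elements unused in the tiled case, while the linear case exposes all 128 * 64 = 8192 elements.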

8 Memory Hardware – RB Changes
[Figure: Memory Controller and DPP Array arrangement on the ATI Radeon™ HD 2900 Series GPU compared with the ATI Radeon™ HD 3800 Series GPU]

9 ATI Radeon™ HD 4800 Series GPUs - Improvements
- 2.5x more floating point compute power than ATI Radeon™ HD 3800 Series GPUs
- Includes all the features added to ATI Radeon™ HD 3800 Series GPUs
- Higher bandwidths with GDDR5 memory
- 115 GB/s memory bandwidth
- 1.2 Teraflops peak ALU performance
- New compute shader paradigm
- Inter- and intra-thread sharing

10 ATI Radeon™ HD 4800 Series GPUs – Architecture Features
ALU Improvements
- 10 SIMD engines
- 16 TPs per SIMD
- 5 streaming cores per TP
- 800 total streaming cores
- Shared global registers
TEX Improvements
- 4 TEX units per SIMD
- 40 total TEX units
- Local data share
- Global data share
MEM Improvements
- 8KB L1 cache per SIMD
- 480 GB/s L1 BW
- 4 32KB L2 caches
- 384 GB/s L2->L1 BW
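The totals on this slide follow from the per-SIMD figures. The Python sketch below recomputes them and estimates peak throughput, assuming each streaming core retires one multiply-add (counted as 2 floating-point operations) per clock and assuming a 750 MHz engine clock. Neither assumption appears on the slides, and the constant and function names are hypothetical.

```python
SIMD_ENGINES = 10
TPS_PER_SIMD = 16
CORES_PER_TP = 5
TEX_UNITS_PER_SIMD = 4

# Assumptions, not from the slides:
ASSUMED_CLOCK_HZ = 750_000_000   # engine clock
FLOPS_PER_CORE_PER_CLOCK = 2     # one multiply-add per clock


def total_streaming_cores():
    return SIMD_ENGINES * TPS_PER_SIMD * CORES_PER_TP


def total_tex_units():
    return SIMD_ENGINES * TEX_UNITS_PER_SIMD


def peak_teraflops(clock_hz=ASSUMED_CLOCK_HZ):
    flops = total_streaming_cores() * FLOPS_PER_CORE_PER_CLOCK * clock_hz
    return flops / 1e12
```

Under these assumptions the product comes to 800 cores, 40 TEX units and 1.2 teraflops, which agrees with the figures quoted for the HD 4800 series in this deck.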

11 ATI Radeon™ HD 4800 Series GPUs - Hardware Layout
Optimized for distributed memory layout and GDDR5
Various Sections:
- ALU – Red
- TEX – Brown
- MEM – Orange
- RAM – Green
- PCIE – Blue
- Display – Yellow

12 ATI Radeon™ HD 4800 Series GPUs - ALU Units
- Same ALUs as ATI Radeon™ HD 3800 Series GPUs, just more of them
- Integer shifts on all streaming cores
- Improved double and integer performance
- 16KB on-chip local data share with a write-private, read-anywhere memory model
- Global R/W registers per SIMD
- 32KB on-chip global data share

13 ATI Radeon™ HD 4800 Series GPUs – Memory Hardware – TEX Units

14 ATI Radeon™ HD 4800 Series GPU – Memory Hardware – Memory Controller

15 ATI Radeon™ HD 4800 Series GPU – Memory Hardware – Render Backends
- 4 Render backends
- 256-bit memory lines
- Write combining cache
- Global buffer via DB instead of SMX
- Scratch buffer bandwidth doubled
- Scatter bandwidth in line with color writes

16 Compute Shader – A New Paradigm
- A general approach to the compute paradigm
- Disconnect the output domain from the problem domain
- Gives more control to the shader writer
- Read anywhere, write anywhere
- The new terminology – threads and groups
- Data sharing – shared registers and local data share
- Linear memory format

17 Compute Shader – A New Paradigm (cont’d)
- Removes graphics-centric terminology and ideas
- An array of parallel processing elements
- Removes graphics pipeline from the picture (no ES, PS, GS, VS etc.)
- Inputs and outputs are disconnected from the output domain
- Domain is now specified by the number of threads to run in a 2D fashion.

18 Compute Shader - Terminology
- Thread – A single invocation of the kernel
- Group – A set number of threads that can share data and run together on a single SIMD. Multiple groups can run on a single SIMD if registers allow
- Shared Registers – Registers that are global to a SIMD
- Local Data Share – 16KB on-chip memory per SIMD shared between threads in a group
- Wavefront – Group of 64 threads run concurrently on a SIMD
- Fence – Synchronization mechanism for threads within a group
  - _threads – Generic barrier that synchronizes all threads to a point
  - _memory – Synchronize threads on global memory accesses
  - _sr – Synchronize on Shared Register access
  - _lds – Synchronize on local data share
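The relationship between threads, groups and wavefronts can be made concrete with a little arithmetic. The Python sketch below is an illustration only: it takes the 64-thread wavefront size from the slide and assumes a group is split into whole wavefronts, with a partially filled last wavefront when the group size is not a multiple of 64. The function names are hypothetical.

```python
WAVEFRONT_SIZE = 64  # threads per wavefront, from the slide


def wavefronts_per_group(threads_per_group):
    """Wavefronts needed to cover one group (ceiling division)."""
    if threads_per_group <= 0:
        raise ValueError("a group needs at least one thread")
    return (threads_per_group + WAVEFRONT_SIZE - 1) // WAVEFRONT_SIZE


def groups_in_domain(total_threads, threads_per_group):
    """Groups needed to cover a domain of total_threads threads."""
    return (total_threads + threads_per_group - 1) // threads_per_group
```

A group of 1024 threads, for instance, would occupy 16 wavefronts on its SIMD, and a domain of 4096 threads would need 4 such groups.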

19 Data Sharing and Synchronization
- SR – Globally shared registers
  - Sharing between all wavefronts in a SIMD
  - Column sharing on the SIMD
  - Persistent registers
  - Atomicity guaranteed in same instruction
- LDS – Local Data Share
  - Write local, read global system
  - Share between all threads in a group
  - Synchronization required
- New Indexing Values – No more vPos/vWinCoord
  - vTid – ID of thread within a group
  - vaTid – ID of thread within a domain
  - vTgroupid – ID of group within a domain
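The three indexing values are related to one another. The Python sketch below shows one plausible relationship, assuming a one-dimensional domain in which groups are numbered consecutively and every group has the same size. The slides do not state this mapping, so treat it as an assumption; the function name is hypothetical.

```python
def thread_ids(va_tid, threads_per_group):
    """Derive the group ID and in-group thread ID from a domain-wide thread ID.

    Assumes a flat 1D domain: vaTid = vTgroupid * threads_per_group + vTid.
    """
    group_id, tid = divmod(va_tid, threads_per_group)
    return {"vTid": tid, "vaTid": va_tid, "vTgroupid": group_id}
```

With 64 threads per group, the thread with domain-wide ID 130 would be thread 2 of group 2 under this assumed mapping.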

20 Shared Registers
[Figure: SIMD 0 through SIMD N, each showing Wavefront 0 through Wavefront 7 attached to that SIMD's Shared Registers]
- Data is shared between columns of a wavefront per SIMD
- Accesses in the same ALU clause are atomic, indexing is not allowed
- Shared registers are carved out of the register pool
- Same as accessing normal registers

21 IL SR Usage
  il_cs_2_0
  dcl_cb cb0[1]
  dcl_shared_temp sr1
  add sr0, sr0, r0.1111
  mov g[vaTid0.x], sr0
  ret
  end
Atomic Read-Modify-Write Uses:
- Reductions
  - Max
  - Min
  - Sum
  - Average
- Order Agnostic Data Updates
  - Histogram
  - Global Counters
  - Semaphores
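The IL listing on this slide can be read as a per-column counter: each wavefront adds a constant to `sr0` and then stores the result to the global buffer. The Python sketch below is a conceptual model of that reading, not a description of the hardware. It assumes one shared value per wavefront column, assumes the `.1111` swizzle supplies the constant 1, and assumes wavefronts on a SIMD execute the read-modify-write one after another; the function name is hypothetical.

```python
WAVEFRONT_SIZE = 64  # columns (threads) per wavefront


def run_counter_kernel(num_wavefronts):
    """Model 'add sr0, sr0, 1' followed by a store, for each wavefront in turn.

    Returns one list per wavefront holding the values its threads would store.
    """
    sr0 = [0] * WAVEFRONT_SIZE  # one shared value per column on the SIMD
    stored = []
    for _ in range(num_wavefronts):
        # Read-modify-write, modelled as indivisible per wavefront.
        for column in range(WAVEFRONT_SIZE):
            sr0[column] += 1
        # mov g[vaTid], sr0: every thread stores its column's current value.
        stored.append(list(sr0))
    return stored
```

In this model the last wavefront to run observes a count equal to the number of wavefronts that touched the register, which is the behavior a sum reduction or global counter relies on.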

22 Local Data Share
- 16KB of memory per SIMD, 4 banks of 1k Dwords, max 128 per thread.
- Write address is based on thread ID, and offsets are static
- Reads are done by thread ID + offset.
- Dispatches one write command every cycle
- Dispatches read over four cycles with waterfall
- 40-44 cycle latency that needs to be hidden by ALU
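The capacity figures on this slide are consistent with each other, as a short calculation shows. The Python sketch below recomputes the LDS size from the bank layout and checks whether a given group fits, assuming a Dword is 4 bytes and that a group's footprint is simply threads times Dwords per thread. The function names are hypothetical.

```python
BANKS = 4
DWORDS_PER_BANK = 1024
BYTES_PER_DWORD = 4


def lds_bytes():
    """Total LDS capacity per SIMD in bytes."""
    return BANKS * DWORDS_PER_BANK * BYTES_PER_DWORD


def group_footprint_bytes(threads_per_group, dwords_per_thread):
    """LDS bytes used by one group, assuming a fixed allocation per thread."""
    return threads_per_group * dwords_per_thread * BYTES_PER_DWORD


def group_fits(threads_per_group, dwords_per_thread):
    return group_footprint_bytes(threads_per_group, dwords_per_thread) <= lds_bytes()
```

Four banks of 1024 Dwords give 16384 bytes, the 16KB quoted on the slide. A group of 1024 threads using 4 Dwords each fills that capacity exactly, while 1024 threads at 8 Dwords each would not fit under this simple model.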

23 LDS
[Figure: on each SIMD (SIMD 0 through SIMD N), Group 1 and Group 2 each contain Wavefront 0 through Wavefront 3 and are connected to that SIMD's LDS Memory. The write path is labeled "Write self only" and the read path is labeled "Read Any".]

24 IL LDS Usage
  il_cs_2_0
  dcl_cb cb0[1]
  dcl_num_thread_per_group 1024
  dcl_lds_size_per_thread 4
  dcl_lds_sharing_mode _wavefrontRel
  dcl_literal l0, 0x0, 0x04, 0x8, 0x1
  mov r0, cb0[0].xxxx
  lds_write_vec mem, vTid0.x
  iadd r0, r0, vTid0.x0xx
  lds_read_vec_sharingMode(abs) r2, r0.x0
  mov g[vaTid0.x], r2
  ret
  end
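The write-private, read-anywhere model behind this listing can be mimicked in a few lines of host code. The Python sketch below is a loose conceptual model, not a translation of the IL: each thread writes only into its own slice of the LDS, and after a synchronization point any thread reads from another thread's slice. The class and function names are hypothetical, and the rotation pattern is chosen only for illustration.

```python
class LocalDataShare:
    """Toy LDS: each thread owns a fixed slice and may read any slice."""

    def __init__(self, num_threads, dwords_per_thread):
        self.dwords_per_thread = dwords_per_thread
        self.mem = [0] * (num_threads * dwords_per_thread)

    def _address(self, tid, offset):
        if not 0 <= offset < self.dwords_per_thread:
            raise IndexError("offset outside the thread's slice")
        return tid * self.dwords_per_thread + offset

    def write(self, own_tid, offset, value):
        # Writes are addressed by the writer's own thread ID only.
        self.mem[self._address(own_tid, offset)] = value

    def read(self, any_tid, offset):
        # Reads may target any thread's slice.
        return self.mem[self._address(any_tid, offset)]


def rotate_via_lds(values, shift):
    """Each thread publishes its value, then reads the value of thread (tid + shift)."""
    n = len(values)
    lds = LocalDataShare(n, 1)
    for tid, value in enumerate(values):
        lds.write(tid, 0, value)
    # A fence on the LDS would be required here before any thread reads.
    return [lds.read((tid + shift) % n, 0) for tid in range(n)]
```

Calling `rotate_via_lds([10, 20, 30, 40], 1)` returns `[20, 30, 40, 10]`: every thread ends up with its neighbor's value without ever writing outside its own slice.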

25 Disclaimer & Attribution
DISCLAIMER
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2010 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, ATI, the ATI Logo, Radeon, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other names are for informational purposes only and may be trademarks of their respective owners.

