ATI Stream Computing ATI Intermediate Language (IL) Micah Villmow May 30, 2008.


1 | ATI Stream Computing: ATI Intermediate Language (IL), Micah Villmow, May 30, 2008

2 | ATI Stream Computing Update | Confidential | ATI Stream Computing – ATI Intermediate Language (IL)
ATI IL - What is it?
- Device-agnostic, forward-compatible language
- Called Intermediate Language
- Portable ISA
- Can write for lowest common denominator
- First level to expose new ATI CAL features
- Allows finely-detailed optimizations
- Based on Microsoft® DirectX® 9.0 Shader Language

3 | Outline
- Pipeline - A quick recap
- Instructions
  - Setup and teardown
  - ALU
  - Texture units
  - Memory access
  - Functions
  - Flow Control
- Examples
- Future additions

4 | Pixel Pipeline
- IL instructions modify the state of the various stages of the pipeline
- Declarations instruct the setup engine how to set up the graphics card correctly
- ALU instructions instruct the stream processing units what to calculate
- TEX instructions instruct the texture units what data to fetch
- Global buffer accesses instruct the shader export path to get correct data
- Color buffer instructions send data through the render backends

5 | Compute Pipeline
- ATI Radeon™ HD 4800 Series GPUs introduce the compute shader
- Pipeline now includes LDS (Local Data Share), GDS (Global Data Share), and SR (shared registers)
- Dedicated L1 per SIMD on ATI Radeon™ HD 4800 Series GPUs

6 | ATI IL Instruction Syntax
- The language used to write CAL shaders
- A portable intermediate language for AMD GPUs
- Resembles DirectX® assembly
- An ATI IL kernel follows the basic pattern of:
  1. Set up state
  2. Read texture data
  3. Compute results
  4. Write results

7 | ATI IL Instructions - Setup
Shader Type:
  - il_ps_2_0 - IL Pixel Shader version 2.0
  - il_cs_2_0 - IL Compute Shader version 2.0
Inputs:
  - dcl_input_position_interp(linear_noperspective)_centered vWinCoord0.xy__ - Interpolated X/Y float coordinates
  - dcl_input vObjIndex0 - Auto-indexed integer value
Outputs:
  - dcl_output_generic oN - Declare that color output buffer number N will be used; max N is 8 on R6XX-based cards and 16 on R7XX
Constants:
  - dcl_cb cbN[X] - Declare that constant buffer N of size X will be used; N is between 0 and 14, max X is 4096
Literals:
  - dcl_literal lN,,,, - Declare that literal number N will be used with four values
Resources:
  - dcl_resource_id(N)_type([1d|2d],[unnorm|norm])_fmtx(TYPE)_fmty(TYPE)_fmtz(TYPE)_fmtw(TYPE)
Scratch Buffer:
  - dcl_indexed_temp_array xN[X] - Declare that scratch buffer N of size X will be used; max size is 4096
Compute Shader:
  - dcl_num_thread_per_group N - Declare that N threads will be working together in one group
Local Data Share:
  - dcl_lds_size_per_thread N - Declare that each thread will use N dwords of LDS; N must be a multiple of 4 and <= 64
  - dcl_lds_sharing_mode _wavefront[Rel|Abs] - Declare that the sharing mode of LDS uses relative or absolute addressing
Global Shared Registers:
  - dcl_shared_temp srN - Declare that the kernel will use N shared registers
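The limits quoted on the setup slide can be collected into a small pre-flight check. The following is a minimal Python sketch, not part of CAL or any AMD tool: the function name `check_declarations` and its parameters are hypothetical, and it encodes only the limits stated above (constant buffer index 0 to 14, constant buffer size at most 4096, LDS size per thread a multiple of 4 and at most 64).

```python
def check_declarations(cb_index, cb_size, lds_dwords_per_thread):
    """Return a list of violations of the limits quoted on the setup slide.

    Hypothetical helper for illustration; it does not talk to any driver.
    """
    problems = []
    # dcl_cb cbN[X]: N is between 0 and 14, max X is 4096
    if not 0 <= cb_index <= 14:
        problems.append("constant buffer index must be in 0..14")
    if cb_size > 4096:
        problems.append("constant buffer size must be <= 4096")
    # dcl_lds_size_per_thread N: multiple of 4 and <= 64
    if lds_dwords_per_thread % 4 != 0 or lds_dwords_per_thread > 64:
        problems.append("LDS size per thread must be a multiple of 4 and <= 64")
    return problems


def dcl_cb(cb_index, cb_size):
    """Format a constant buffer declaration in the form shown on the slide."""
    return f"dcl_cb cb{cb_index}[{cb_size}]"


if __name__ == "__main__":
    print(dcl_cb(0, 2))                      # same form as the later examples
    print(check_declarations(0, 2, 4))       # values taken from the later examples
    print(check_declarations(15, 5000, 6))   # every limit violated
```

A check like this is only a convenience for catching declaration mistakes before a kernel is handed to the compiler; the compiler remains the authority on what is accepted.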

8 | ATI IL Instructions - Registers
- vObjIndex0.x - Integer register that stores the index of the thread within the domain
- vWinCoord0.xy - Floating point register that stores the Euclidean coordinates of the thread within the domain
- r# - General purpose registers that are 128 bits wide
- x#[idx] - Scratch buffer register to read/write at offset idx
- l# - Literal register that is 128 bits wide
- cb#[idx] - Constant buffer access to read from offset idx
- g[idx] - Global buffer read/write register
- o# - Output buffer register

9 | ATI IL Instructions - Syntax/Opcodes
Instruction syntax:
    [_ ][_ ]          <= opcode with specifiers
    [ [_ ][. ]]       <= dst with modifier/mask
    [, [_ ][. ]]      <= src with modifier/mask
Sample Opcodes:
  ALU:
    mad r0, r1, r2, r3           // r0 = r1 * r2 + r3
    dmul r0.xy, r1.xy, r2.xy     // r0.xy = r1.xy * r2.xy, computed with doubles
  TEX:
    sample_resource(0)_sampler(0) r0, vWinCoord0.xy00
    sample_l_resource(0)_sampler(0) r0, vWinCoord0.xy00, r0.1000   // sample instruction required in loops
  MEM:
    lds_read_vec r0, vTid0.x0              // read from the current thread's LDS space at offset 0
    lds_write_vec mem.xy__, vaTid0.xxxx    // write the absolute thread id
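The `mad` comment above (r0 = r1 * r2 + r3) applies to each of the four components of a 128-bit register independently. A minimal Python sketch of that component-wise behavior follows; registers are modeled as 4-tuples of Python floats, which is an assumption for illustration and does not reproduce the GPU's rounding behavior.

```python
def mad(r1, r2, r3):
    """Component-wise multiply-add on 4-component registers: r1 * r2 + r3."""
    return tuple(a * b + c for a, b, c in zip(r1, r2, r3))


if __name__ == "__main__":
    r1 = (1.0, 2.0, 3.0, 4.0)
    r2 = (2.0, 2.0, 2.0, 2.0)
    r3 = (1.0, 1.0, 1.0, 1.0)
    print(mad(r1, r2, r3))  # (3.0, 5.0, 7.0, 9.0)
```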

10 | ATI IL Instructions - Write Masks/Read Swizzles
Write Masks:
  - Each destination can have a write mask
  - There are four possible options for each component:
    - Component - The original component position, which means write the result
    - ‘_’ - Do not write the result of this component to the register
    - ‘0’ - Write the value 0.0f to the destination component
    - ‘1’ - Write the value 1.0f to the destination component
  - Example: “mov r0.x10w, vWinCoord0.xy” copies the x element over, writes 1.0f to y and 0.0f to z, and places the y element in the w component of r0
Read Swizzles:
  - Each source register can have a read swizzle
  - The read swizzle reorders the way in which data is read
  - A short read swizzle is extended by repeating its last component to fill the vector, i.e. r0.xy is equivalent to r0.xyyy
  - Each component can be one of 6 options:
    - Component - Each position in the 4-vector can have a component specified, i.e. {xyzw}, and there is no restriction on ordering
    - ‘1’ - Use the value 1.0f as the source component
    - ‘0’ - Use the value 0.0f as the source component
  - Example: “mov r0, r0.wzyx” reverses the data in a register
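The write mask and read swizzle rules can be modeled in a few lines of Python. This is an illustrative sketch of the rules as stated on the slide, not an IL interpreter: registers are modeled as 4-tuples of floats, and the helper names `read_swizzle` and `write_masked` are hypothetical.

```python
COMPONENTS = "xyzw"


def read_swizzle(reg, swizzle="xyzw"):
    """Apply a read swizzle; a short swizzle repeats its last selector."""
    swizzle = swizzle + swizzle[-1] * (4 - len(swizzle))
    out = []
    for sel in swizzle:
        if sel == "0":
            out.append(0.0)          # constant 0.0f as source component
        elif sel == "1":
            out.append(1.0)          # constant 1.0f as source component
        else:
            out.append(reg[COMPONENTS.index(sel)])
    return tuple(out)


def write_masked(dst, mask, value):
    """Write `value` into `dst` under a four-character write mask."""
    result = list(dst)
    for i, m in enumerate(mask):
        if m == "_":
            continue                 # leave this component untouched
        elif m == "0":
            result[i] = 0.0
        elif m == "1":
            result[i] = 1.0
        else:
            result[i] = value[i]     # component letter in its own position
    return tuple(result)


if __name__ == "__main__":
    v_win_coord = (5.0, 7.0, 0.0, 0.0)
    r0 = (9.0, 9.0, 9.0, 9.0)
    # mov r0.x10w, vWinCoord0.xy
    src = read_swizzle(v_win_coord, "xy")       # extended to xyyy
    print(write_masked(r0, "x10w", src))        # (5.0, 1.0, 0.0, 7.0)
    # mov r0, r0.wzyx
    print(read_swizzle((1.0, 2.0, 3.0, 4.0), "wzyx"))  # (4.0, 3.0, 2.0, 1.0)
```

The first call reproduces the slide's write mask example: because `.xy` is extended to `.xyyy`, the w component of the destination receives the y element of the source.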

11 | ATI IL Instructions - Functions
Functions are possible in IL, subject to a few constraints:
  1. Must begin with “func ”
  2. Must end with “endfunc”
  3. Must use “ret” before “endfunc”
  4. Only use “ret_dyn” for early return
  5. Must be placed after the main function
The main function must use “endmain” if functions are in use.
To call a function, use “call ” or the conditional versions.

12 | ATI IL Instructions - Example
    il_ps_2_0
    dcl_literal l0, 0x40800000, 0x3f800000, 0x40000000, 0x40400000
    dcl_cb cb0[2]
    dcl_input_position_interp(linear_noperspective)_centered vWinCoord0.xy__
    dcl_output_generic o0
    dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
    dcl_resource_id(1)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
    mul r0, vWinCoord0.xy00, l0.z100
    add r17, r0.xyxy, l0.00y0
    mov r33, r0.xyxy
    div_zeroop(infinity) r33, r0.1111, r33
    sample_resource(1)_sampler(1) r101, r17.xy00
    sample_resource(1)_sampler(1) r102, r17.zw00
    mul r35, r35, r33
    add r101, r101, r35_neg(xyzw)
    mad r35, r101, r42.xxxx, r35
    mul r36, r36, r33
    add r102, r102, r36_neg(xyzw)
    mad r36, r102, r42.xxxx, r36
    mad r19.x, r0.y, cb0[0].z, r0.x
    ftoi r21.x, r19.x
    mov o0, r35
    mov g[r21.x], r36
    ret
    end

    ;PS; -------- Disassembly --------------------
    00 ALU: ADDR(32) CNT(7) KCACHE0(CB0:0-15)
          0  x: MOV*2       R0.x, R0.x
             y: MOV         R0.y, R0.y
             z: MULADD      R0.z, R0.x, (0x40000000, 2.0f).x, 1.0f
          1  z: MULADD      R4.z, PV0.y, KC0[0].z, PV0.x
             w: MOV         R0.w, PV0.y
             t: RCP_e       R1.z, PV0.x
    01 TEX: ADDR(64) CNT(2) VALID_PIX
          2  SAMPLE R2, R0.xyxx, t1, s1 UNNORM(XYZW)
          3  SAMPLE R3, R0.zwzz, t1, s1 UNNORM(XYZW)
    02 ALU: ADDR(39) CNT(22)
          4  x: MUL         T2.x, 0.0f, R1.z
             t: RCP_e       ____, R0.y
          5  x: ADD         T0.x, R2.z, -PV4.x
             y: ADD         T0.y, R3.x, -PV4.x
             z: ADD         ____, R2.x, -PV4.x VEC_120
             w: MUL         T0.w, 0.0f, PS4
             t: ADD         T1.x, R3.z, -PV4.x
          6  x: ADD         ____, R3.y, -PV5.w
             y: ADD         ____, R2.y, -PV5.w VEC_120
             z: ADD         T0.z, R3.w, -PV5.w
             w: ADD         ____, R2.w, -PV5.w VEC_120
             t: MULADD      R0.x, PV5.z, 0.0f, T2.x VEC_021
          7  x: MULADD      R2.x, T0.y, 0.0f, T2.x
             y: MULADD      R0.y, PV6.y, 0.0f, T0.w
             z: MULADD      R0.z, T0.x, 0.0f, T2.x
             w: MULADD      R0.w, PV6.w, 0.0f, T0.w
             t: MULADD      R2.y, PV6.x, 0.0f, T0.w VEC_021
          8  z: MULADD      R2.z, T1.x, 0.0f, T2.x
             w: MULADD      R2.w, T0.z, 0.0f, T0.w
             t: F_TO_I      ____, R4.z
          9  t: MULLO_INT   R3.x, PS8, (0x00000004, 5.605193857e-45f).x
    03 MEM_GLOBAL_WRITE_IND: DWORD_PTR[0+R3.x], R2, ELEM_SIZE(3)
    04 EXP_DONE: PIX0, R0
    END_OF_PROGRAM
    GprPoolSize = 122
    CodeLen = 544;Bytes
    SQ_PGM_END_CF = 5; words(64 bit)
    SQ_PGM_END_ALU = 61; words(64 bit)
    SQ_PGM_END_FETCH = 68; words(64 bit)
    ;SQ_PGM_RESOURCES = 0x00000005
    SQ_PGM_RESOURCES:NUM_GPRS = 5
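The hexadecimal values in `dcl_literal l0` and in the disassembly are raw 32-bit patterns. Assuming they are read as IEEE-754 single-precision floats, the following Python sketch decodes them using only the standard `struct` module; the helper name `bits_to_float` is hypothetical.

```python
import struct


def bits_to_float(bits):
    """Reinterpret a 32-bit unsigned integer as an IEEE-754 single float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Bit patterns from "dcl_literal l0, 0x40800000, 0x3f800000, 0x40000000, 0x40400000"
literal_l0 = [bits_to_float(b) for b in (0x40800000, 0x3F800000, 0x40000000, 0x40400000)]

if __name__ == "__main__":
    print(literal_l0)            # [4.0, 1.0, 2.0, 3.0]
    # The disassembly prints the integer operand 4 of MULLO_INT together with
    # the float that shares its bit pattern, a tiny denormal value.
    print(bits_to_float(0x00000004))
```

Read this way, `l0.z` is 2.0, which matches the `(0x40000000, 2.0f)` operand in the first ALU clause. The pair `(0x00000004, 5.605193857e-45f)` appears to be the integer 4 shown alongside its float reinterpretation rather than a meaningful float constant.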

13 | ATI IL Instructions - Flow Control
Flow control is based on the result of comparison instructions:
  - 4 signed integer comparison instructions + negation
  - 2 unsigned integer comparison instructions
  - 4 floating point comparison instructions
  - 4 double comparison instructions
Flow control consists of:
  - if-else-endif or if-endif
  - call-return in static and conditional versions
  - switch-case1…n-endswitch
  - whileloop-continue/break-endloop

14 | Example Slide 1 - Output IL & Input IL
Output IL:
    il_ps_2_0
    dcl_output_generic o0
    dcl_literal l0, 1.0, 0.5, 0.5, 0.5
    mov o0, l0
    end
Input IL:
    il_ps_2_0
    dcl_input_position_interp(linear_noperspective)_centered_center vWinCoord0.xy__
    dcl_output_generic o0
    dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
    sample_resource(0)_sampler(0) o0, vWinCoord0.xy
    end

15 | Example Slide 2 - Bursting
    il_cs_2_0
    dcl_cb cb0[1]
    dcl_num_thread_per_group 64
    itof r0.z, vaTid0.x
    div r0.y, r0.z, cb0[0].x
    mod r0.x, r0.z, cb0[0].x
    flr r0, r0
    mul r0.x, r0.x, cb0[0].z
    dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
    imul r0.w, vaTid0.x, cb0[0].w
    sample_resource(0)_sampler(0) r1, r0.xy
    add r0.x, r0.x, r0.1
    sample_resource(0)_sampler(0) r2, r0.xy
    add r0.x, r0.x, r0.1
    sample_resource(0)_sampler(0) r3, r0.xy
    add r0.x, r0.x, r0.1
    sample_resource(0)_sampler(0) r4, r0.xy
    add r0.x, r0.x, r0.1
    mov g[r0.w + 0], r1
    mov g[r0.w + 1], r2
    mov g[r0.w + 2], r3
    mov g[r0.w + 3], r4
    end
Command line:
    export_burst_perf.exe -w 2048 -h 2048 -t -e -r -2
Results:
    Burst 1 Perf: 88.73GB/s
    Burst 2 Perf: 104.98GB/s
    Burst 3 Perf: 111.39GB/s
    Burst 4 Perf: 114.49GB/s
    Peak: 115.2GB/s
Export Instruction:
    03 MEM_GLOBAL_WRITE_IND: DWORD_PTR[0+R0.x], R7, ELEM_SIZE(3) BRSTCNT(0)
    03 MEM_GLOBAL_WRITE_IND: DWORD_PTR[0+R0.x], R7, ELEM_SIZE(3) BRSTCNT(1)
    03 MEM_GLOBAL_WRITE_IND: DWORD_PTR[0+R0.x], R7, ELEM_SIZE(3) BRSTCNT(2)
    03 MEM_GLOBAL_WRITE_IND: DWORD_PTR[0+R0.x], R7, ELEM_SIZE(3) BRSTCNT(3)
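Two pieces of arithmetic on this slide can be made concrete: how each measured burst rate compares with the quoted 115.2GB/s peak, and how the kernel's `itof`/`div`/`mod`/`flr` sequence turns a linear thread id into 2D coordinates. The Python sketch below uses only the figures printed on the slide. Treating `cb0[0].x` as the domain width is an assumption, since the slide does not say what the constant buffer holds, and the function names are hypothetical.

```python
import math

PEAK_GBPS = 115.2  # "115.2GB/s Peak" from the slide
MEASURED_GBPS = {1: 88.73, 2: 104.98, 3: 111.39, 4: 114.49}  # burst size -> GB/s


def fraction_of_peak(gbps, peak=PEAK_GBPS):
    """Fraction of the quoted peak export bandwidth that was achieved."""
    return gbps / peak


def thread_to_coords(tid, width):
    """Mirror itof / div / mod / flr: linear thread id -> (x, y).

    `width` stands in for cb0[0].x, which is assumed to be the domain width.
    """
    t = float(tid)                       # itof r0.z, vaTid0.x
    y = math.floor(t / width)            # div + flr
    x = math.floor(math.fmod(t, width))  # mod + flr
    return (x, y)


if __name__ == "__main__":
    for burst, gbps in MEASURED_GBPS.items():
        print(f"burst {burst}: {fraction_of_peak(gbps):.1%} of peak")
    print(thread_to_coords(5, 4))  # thread 5 in a 4-wide domain
```

On these figures, a burst of 1 reaches roughly 77% of the quoted peak, while a burst of 4 comes within about 1% of it.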

16 | Example 3 - Scratch Buffer
    il_ps_2_0
    dcl_input_position_interp(linear_noperspective)_centered_center vWinCoord0.xy__
    dcl_output_generic o0
    dcl_indexed_temp_array x0[2]
    dcl_cb cb0[1]
    mov r6, r6.0000
    flr r5, vWinCoord0.xy
    ftoi r0.x, cb0[0].y
    ftoi r2.x, cb0[0].z
    mad r3, r5.y, cb0[0].x, r5.x
    mad r4, r5.y, cb0[0].x, r5.x
    mov x0[r0.x], r3
    mov x0[r2.x], r4
    add r0.x, r0.x, cb0[0].y
    add r2.x, r2.x, cb0[0].y
    add r5, x0[r0.x], x0[r2.x]
    add r6, r5, r6
    mov o0, r6
    end

17 | Example 4 - LDS & Shared Registers
LDS:
    il_cs_2_0
    dcl_cb cb0[1]
    dcl_num_thread_per_group 64
    dcl_lds_size_per_thread 4
    dcl_lds_sharing_mode _wavefrontRel
    dcl_literal l0, 64, 64, 64, 4
    iadd r0, vTid0.x0, cb0[0].x0
    mov r2, r2.0000
    iadd r0.x, r0.x, cb0[0].y
    iadd r0.y, r0.y, l0.w
    and r0.x, r0.x, l0.x
    lds_read_vec r1, r0.xy
    fence_lds_threads
    add r2, r2, r1
    lds_write_vec mem, r2
    end
Shared Registers:
    il_cs_2_0
    dcl_cb cb0[1]
    dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
    dcl_num_thread_per_group 64
    dcl_shared_temp sr1
    dcl_lds_size_per_thread 4
    dcl_literal l0, 0, 0, 0, 0
    dcl_literal l1, 0, 1, 41, 0x000000FF
    mov r0, r0.0000
    if_logicalz vTgroupid0.x
        mov sr0.x, vaTid0.x
        mov r0.x, sr0.x
    else
        ieq r1.w, vTgroupid0.x, l1.y
        cmov_logical r0.x, r1.w, sr0.x, l1.z
    endif
    mov r0.z, vTgroupid0.x
    mov g[vaTid0.x], r0
    ret
    endmain
    end

18 | Disclaimer & Attribution
DISCLAIMER
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2010 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, ATI, the ATI Logo, Radeon, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Microsoft and DirectX are trademarks of Microsoft Corporation in the United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners.

