
University of Michigan Electrical Engineering and Computer Science

Liquid SIMD: Abstracting SIMD Hardware Using Lightweight Dynamic Mapping

Nathan Clark, Amir Hormati, Scott Mahlke, Sami Yehia*, Krisztián Flautner*
University of Michigan, *ARM Ltd.

Computational Efficiency
  Low power envelope
  More useful work per transistor
  Hardware accelerators
  [Figure: Niagara II encryption engine]
  Source: AMD Analyst Day 12/14/06

How Are Accelerators Used?
  Control is statically placed in the binary
  [Figure: a program binary directly controlling both the CPU and the accelerator]

Problem With Static Control
  Not forward/backward compatible
  [Figure: the same program binary facing a different CPU/accelerator pair]

Solution: Virtualization
  Statically identify accelerated computation
  Abstract accelerator features
  Dynamically retarget binary
  [Figure: an engineer/compiler produces one program; a translator (Trans.) retargets it to each processor/accelerator pair]

Liquid SIMD
  Virtualize SIMD accelerators
  Why virtualize SIMD?
    – Intel MMX to SSE2
    – ARMv6 to Neon
    – Wide vectors useful [Lin 06]

SIMD Accelerator Assumptions
  Same instruction stream
  Separate pipeline – memory interface
  [Figure: pipeline of Fetch, Decode, Scalar Exec and SIMD Exec side by side, Retire]

How to Virtualize
  Use scalar ISA to represent SIMD operations
    – Compatibility, low overhead
  Key: easy to translate
  [Figure: program diagram with a branch]

Virtualization Architecture
  [Figure: Fetch, Decode, Execute, Retire pipeline alongside the accelerator (Accel.), a translator (Trans.), and a microcode (uCode) cache]
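
The slide places a translator and a uCode cache next to the pipeline. As a rough software model, assuming (hypothetically) that translated loops are cached by their starting PC in a small direct-mapped table, a lookup either hits and lets the SIMD microcode run, or misses and leaves the scalar loop to run natively while it is translated. UcodeCache, ucode_lookup, ucode_insert, and the 16-entry size are illustrative names and parameters, not details given in the talk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical organization: 16 direct-mapped entries. The talk does not
   describe the uCode cache layout; these choices are illustrative only. */
#define UCODE_ENTRIES 16

typedef struct {
    bool     valid;
    uint32_t tag;       /* start PC of the scalar-encoded loop */
    uint32_t ucode_id;  /* stand-in for the translated SIMD microcode */
} UcodeEntry;

typedef struct {
    UcodeEntry entries[UCODE_ENTRIES];
    unsigned   hits, misses;
} UcodeCache;

static void ucode_init(UcodeCache *c) { memset(c, 0, sizeof *c); }

/* Index by PC, ignoring the low 2 bits of 4-byte ARM instructions. */
static unsigned ucode_index(uint32_t pc) { return (pc >> 2) % UCODE_ENTRIES; }

/* Hit: the SIMD microcode can run. Miss: the scalar loop runs natively. */
static bool ucode_lookup(UcodeCache *c, uint32_t pc, uint32_t *ucode_id) {
    const UcodeEntry *e = &c->entries[ucode_index(pc)];
    if (e->valid && e->tag == pc) {
        c->hits++;
        *ucode_id = e->ucode_id;
        return true;
    }
    c->misses++;
    return false;
}

/* Filled in once the translator finishes; overwrites any conflicting entry. */
static void ucode_insert(UcodeCache *c, uint32_t pc, uint32_t ucode_id) {
    UcodeEntry *e = &c->entries[ucode_index(pc)];
    e->valid = true;
    e->tag = pc;
    e->ucode_id = ucode_id;
}
```

A direct-mapped table keeps each lookup to a single tag compare; the price is that two loops whose PCs share an index evict each other.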

1. Data Parallel Operations
    for(i = 0; i < 8; i++) {
      r1 = A[i];
      r2 = B[i];
      r3 = r1 + r2;
      r4 = r3 & constant;
      C[i] = r4;
    }
  [Figure: one dataflow graph per iteration: loads of A and B feed +, then &, storing to C]
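
To make the scalar-to-SIMD correspondence concrete, here is a C sketch, assuming 4 lanes of 32-bit unsigned integers, that runs the loop above once in its scalar form and once in the form a translator would emit, with the induction variable stepping by the vector width. The inner lane loops only model what a single vector instruction would do; the function names and WIDTH are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define N 8      /* trip count from the slide */
#define WIDTH 4  /* assumed SIMD width: 4 x 32-bit lanes */

/* Scalar form, as it appears in the binary: one element per iteration. */
static void add_and_scalar(const uint32_t *A, const uint32_t *B, uint32_t *C,
                           int n, uint32_t constant) {
    for (int i = 0; i < n; i++) {
        uint32_t r1 = A[i];
        uint32_t r2 = B[i];
        uint32_t r3 = r1 + r2;
        uint32_t r4 = r3 & constant;
        C[i] = r4;
    }
}

/* Translated form: i steps by WIDTH and each statement acts on WIDTH lanes.
   The inner lane loops model what one vector instruction would do. */
static void add_and_simd(const uint32_t *A, const uint32_t *B, uint32_t *C,
                         int n, uint32_t constant) {
    assert(n % WIDTH == 0);  /* no remainder loop in this sketch */
    for (int i = 0; i < n; i += WIDTH) {
        uint32_t v1[WIDTH], v2[WIDTH], v3[WIDTH], v4[WIDTH];
        for (int l = 0; l < WIDTH; l++) v1[l] = A[i + l];          /* v1 = A[i] */
        for (int l = 0; l < WIDTH; l++) v2[l] = B[i + l];          /* v2 = B[i] */
        for (int l = 0; l < WIDTH; l++) v3[l] = v1[l] + v2[l];     /* v3 = v1 + v2 */
        for (int l = 0; l < WIDTH; l++) v4[l] = v3[l] & constant;  /* v4 = v3 & constant */
        for (int l = 0; l < WIDTH; l++) C[i + l] = v4[l];          /* C[i] = v4 */
    }
}
```

Because every iteration of the scalar loop is independent, grouping iterations into lanes cannot change the result; that independence is what makes this case a direct one-to-one mapping.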

1a. What If There’s No Scalar Equivalent?
    for(i = 0; i < 8; i++) {
      r1 = A[i];
      r2 = B[i];
      r3 = r1 + r2;
      cmp r3, #FF;
      r3 = movgt #FF;
      ...
    }
  [Figure: the idiom corresponds to a single SADD (saturating add) of A and B]
  Idioms can always be constructed
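
The compare-and-move idiom above computes an unsigned 8-bit saturating add, the operation the SADD node stands for. A small C check, assuming 8-bit unsigned elements, compares the idiom against an independently written saturating add over all 65,536 input pairs; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar idiom from the slide: add in a wide register, then clamp with a
   compare and conditional move (cmp r3, #FF; movgt r3, #FF). */
static uint8_t sadd_idiom(uint8_t a, uint8_t b) {
    uint32_t r3 = (uint32_t)a + b;  /* at most 510, no overflow in 32 bits */
    if (r3 > 0xFF)                  /* cmp r3, #FF */
        r3 = 0xFF;                  /* movgt r3, #FF */
    return (uint8_t)r3;
}

/* Reference saturating add written differently: unsigned 8-bit addition
   wraps exactly when the truncated sum is smaller than an operand. */
static uint8_t sadd_reference(uint8_t a, uint8_t b) {
    uint8_t sum = (uint8_t)(a + b);
    return sum < a ? 0xFF : sum;
}

/* Returns 1 if the idiom matches the reference for every pair of bytes. */
static int idiom_matches_all(void) {
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++)
            if (sadd_idiom((uint8_t)a, (uint8_t)b) !=
                sadd_reference((uint8_t)a, (uint8_t)b))
                return 0;
    return 1;
}
```

A translator that recognizes this fixed instruction pattern can replace the whole sequence with one saturating vector add.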

2. Scalarizing Permutations
    for(i = 0; i < 8; i++) {
      …
      r1 = r2 + r3;
      tmp[i] = r1;
    }
    for(i = 0; i < 8; i++) {
      r1 = offset[i];
      r2 = tmp[r1 + i];
      r3 = r2 & const;
      …
    }
    offset = {4, 4, 4, 4, -4, -4, -4, -4}
  [Figure: dataflow graphs in which the results of + are permuted before feeding &]
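
The offset table encodes a fixed permutation: element i reads tmp[i + 4] for i < 4 and tmp[i - 4] otherwise, so the two 4-element halves swap. The C sketch below, assuming 32-bit unsigned elements and hypothetical inputs x and y in place of the elided r2 and r3, checks that the scalarized two-loop form matches a direct add, half-swap, and AND.

```c
#include <assert.h>
#include <stdint.h>

#define N 8

/* Offset table from the slide. */
static const int offset[N] = {4, 4, 4, 4, -4, -4, -4, -4};

/* Scalarized permutation: the first loop materializes the intermediate
   vector in memory, the second reads it back through the offset table.
   x and y are hypothetical inputs standing in for the elided operands. */
static void permute_scalar(const uint32_t *x, const uint32_t *y,
                           uint32_t *out, uint32_t mask) {
    uint32_t tmp[N];
    for (int i = 0; i < N; i++)
        tmp[i] = x[i] + y[i];        /* r1 = r2 + r3; tmp[i] = r1 */
    for (int i = 0; i < N; i++) {
        int r1 = offset[i];
        uint32_t r2 = tmp[r1 + i];   /* permuted load */
        out[i] = r2 & mask;          /* r3 = r2 & const */
    }
}

/* Direct form a SIMD unit could execute: add, swap halves, AND. */
static void permute_direct(const uint32_t *x, const uint32_t *y,
                           uint32_t *out, uint32_t mask) {
    uint32_t v[N];
    for (int i = 0; i < N; i++) v[i] = x[i] + y[i];
    for (int i = 0; i < N; i++) out[i] = v[(i + N / 2) % N] & mask;
}
```

Routing the intermediate values through memory is what lets an ordinary scalar ISA express the permutation; the constant offset table is the part a translator would pattern-match.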

3. Scalarizing Reductions
    for(i = 0; i < 8; i++) {
      …
      r1 = A[i];
      r2 = r2 + r1;
      …
    }
  [Figure: + reduction]
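
A reduction carries a dependence through r2, so a SIMD version typically keeps one partial sum per lane and combines them after the loop. The C sketch below, assuming 4 lanes of 32-bit unsigned integers, contrasts the two evaluation orders; it shows a standard vectorization technique, not necessarily the exact code the Liquid SIMD translator emits.

```c
#include <assert.h>
#include <stdint.h>

#define WIDTH 4  /* assumed number of SIMD lanes */

/* Scalar reduction as written in the binary: r2 = r2 + A[i]. */
static uint32_t sum_scalar(const uint32_t *A, int n) {
    uint32_t r2 = 0;
    for (int i = 0; i < n; i++) r2 += A[i];
    return r2;
}

/* SIMD-style execution of the same reduction: WIDTH partial sums live in a
   vector accumulator and are combined once after the loop. This reassociates
   the additions, which is exact for unsigned integers (mod 2^32) but can
   change the result for floating point. */
static uint32_t sum_simd(const uint32_t *A, int n) {
    assert(n % WIDTH == 0);  /* no remainder loop in this sketch */
    uint32_t acc[WIDTH] = {0};
    for (int i = 0; i < n; i += WIDTH)
        for (int l = 0; l < WIDTH; l++) acc[l] += A[i + l];
    uint32_t total = 0;
    for (int l = 0; l < WIDTH; l++) total += acc[l];  /* final horizontal add */
    return total;
}
```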

Applied to ARM Neon
  All instructions supported except:
    – VTBL: indirect indexing (v1 = vtbl v2, v3)
    – Interleaved memory accesses
  Not needed in evaluated benchmarks
  [Figure: VTBL example in which index vector v3 (1, 0, 1, 3) selects elements of v2 to form v1; interleaved accesses to memory]
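
For reference, a scalar C model of a one-register Neon VTBL, assuming 8 byte lanes: each lane of the index vector v3 picks a byte from the table v2, and out-of-range indices yield 0. Because the selection pattern comes from runtime data in v3 rather than from a constant offset table, it presumably cannot be mapped back to a fixed shuffle the way the permutation idiom can. vtbl_model is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8  /* one 64-bit D register holds 8 bytes */

/* Scalar model of v1 = vtbl v2, v3 with a single-register table: each lane
   of v3 selects a byte of v2; indices past the end of the table give 0. */
static void vtbl_model(const uint8_t v2[LANES], const uint8_t v3[LANES],
                       uint8_t v1[LANES]) {
    for (int i = 0; i < LANES; i++)
        v1[i] = v3[i] < LANES ? v2[v3[i]] : 0;
}
```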

Translation to SIMD
  Update induction variable
  Use inverse of defined translation rules
  Scalar input:
    for(i = 0; i < 8; i++) {
      r1 = A[i];
      r2 = B[i];
      r3 = r1 + r2;
      r4 = offset[i];
      C[i + r4] = r3;
    }
  Step by step: i++ becomes i += 4, scalar registers become vectors, and the offset load plus indexed store becomes a shuffle followed by a plain store.
  Translated SIMD loop (4-wide):
    for(i = 0; i < 8; i += 4) {
      v1 = A[i];
      v2 = B[i];
      v3 = v1 + v2;
      v3 = shuffle v3;
      C[i] = v3;
    }
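
The inverse-rule step for the permutation case can be sketched as a check on the constant offset table: rewriting C[i + offset[i]] = r3 as "v3 = shuffle v3; C[i] = v3" is possible only if every offset keeps its element inside its own vector, no two lanes collide, and every vector block uses the same pattern. offsets_to_shuffle and its mask convention (mask[d] = source lane for destination lane d) are hypothetical, not the translator's actual interface.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_WIDTH 16

/* Hypothetical translator step. On success, mask[d] holds the source lane
   whose value the store C[i + offset[i]] = r3 places in destination lane d. */
static bool offsets_to_shuffle(const int *offset, int n, int width, int mask[]) {
    if (width <= 0 || width > MAX_WIDTH || n % width != 0) return false;
    bool seen[MAX_WIDTH];
    for (int base = 0; base < n; base += width) {
        for (int d = 0; d < width; d++) seen[d] = false;
        for (int l = 0; l < width; l++) {
            int dest = l + offset[base + l];               /* destination lane */
            if (dest < 0 || dest >= width) return false;   /* crosses a vector */
            if (seen[dest]) return false;                  /* two lanes collide */
            seen[dest] = true;
            if (base == 0) mask[dest] = l;                 /* pattern from first block */
            else if (mask[dest] != l) return false;        /* every block must match */
        }
    }
    return true;
}
```

For example, offsets {1, -1, 1, -1, ...} with 4 lanes yield the mask {1, 0, 3, 2} (swap adjacent pairs), while the {4, 4, 4, 4, -4, -4, -4, -4} table from the permutation slide is rejected at 4 lanes, because every element crosses into the other vector, but accepted at 8 lanes.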

Translator Design
  Translator goals: efficiency, speed, flexibility
  [Figure: an engineer/compiler produces one program; the translator (Trans.) retargets it to each processor/accelerator pair]

Evaluation
  Trimaran compiler (ARM target)
  Hand-SIMDized loops
  SimpleScalar model of an ARM926 with Neon SIMD
  Translator in VHDL, 130 nm standard cells

Liquid SIMD Issues
  Code bloat
    – <1% overhead beyond baseline
  Register pressure
    – Not a problem
  Translator cost
    – 0.2 mm² + 2KB cache
  Translation overhead

Translation Overhead
  [Figure: translation overhead for SPECfp, MediaBench, and Kernels]

Summary
  Accelerators are becoming more common and are evolving
    – Costly binary migration
  SIMD virtualization using the scalar ISA
    – One binary: forward/backward compatibility
    – Negligible overhead

Questions?
