
Exploiting On-Chip Bandwidth


Slide 1: Exploiting On-Chip Bandwidth
The vector ISA + compiler technology uses high bandwidth to mask latency.
Compiled matrix-vector multiplication: 2 flops/element
– Easy compilation problem; stresses memory bandwidth
– Compare to 304 Mflops (64-bit) for Power3 (hand-coded)
– Performance normally scales with the number of lanes
– Needs more memory banks than the default DRAM macro
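
The 2 flops/element figure and the resulting bandwidth pressure can be checked with a small scalar model. This is a sketch in Python rather than the C a vectorizing compiler would actually consume; the names `mvm` and `flops_per_byte` and the `bytes_per_element` parameter are illustrative assumptions, not taken from the slides. Only the matrix stream is counted, since it dominates memory traffic.

```python
def mvm(a, x):
    """Return y = A x for a row-major list-of-lists matrix A.

    Every matrix element is used for exactly one multiply and one add,
    which is where the 2 flops/element figure comes from.
    """
    y = []
    for row in a:
        acc = 0.0
        # This inner dot product is the loop a vectorizing compiler would
        # turn into vector loads plus vector multiply/add operations.
        for a_ij, x_j in zip(row, x):
            acc += a_ij * x_j
        y.append(acc)
    return y


def flops_per_byte(rows, cols, bytes_per_element=8):
    """Arithmetic intensity counting only the matrix stream.

    Each of the rows*cols matrix elements is read once (8 bytes for 64-bit
    data), while x and y contribute only O(rows + cols) traffic.
    """
    flops = 2 * rows * cols
    matrix_bytes = bytes_per_element * rows * cols
    return flops / matrix_bytes
```

With 64-bit elements, `flops_per_byte(1000, 1000)` returns 0.25, so a memory system that streams B bytes/s can sustain at most about 0.25·B flop/s on this kernel. Adding lanes presumably helps only while the memory system keeps up, which fits the slide's pairing of lane scaling with the need for more memory banks.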

Slide 2: Compiling Media Kernels on IRAM
The compiler generates code for narrow data widths, e.g., 16-bit integers.
The compilation model is simple and more scalable (across generations) than MMX, VIS, etc.
– Strided and indexed loads/stores are simpler than pack/unpack
– The maximum vector length is longer than the datapath width (256 bits); all lane scalings run from a single executable
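
A Python model of a 16-bit media kernel illustrates both bullets. Interleaved stereo audio is an assumed example workload, and `scale_channel`, `saturate16` and the maximum-vector-length parameter `mvl` are hypothetical names. A stride-2 access pulls one channel out directly, where a packed-SIMD ISA would need pack/unpack shuffles. The strip-mined outer loop never refers to the number of lanes. For scale, a 256-bit datapath holds 256 / 16 = 16 elements of 16-bit data at a time.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def saturate16(v):
    """Clamp an integer to the signed 16-bit range, like a saturating vector op."""
    return max(INT16_MIN, min(INT16_MAX, v))


def scale_channel(samples, channel, gain_num, gain_shift, mvl=64):
    """Scale one channel of interleaved stereo int16 samples.

    samples  : interleaved [L0, R0, L1, R1, ...] 16-bit values
    channel  : 0 for left, 1 for right (start offset of a stride-2 access)
    gain     : fixed-point factor gain_num / 2**gain_shift
    mvl      : hypothetical maximum vector length, in elements

    The outer loop strip-mines the work into pieces of at most mvl elements.
    Nothing depends on how many lanes execute each piece, so one executable
    serves every lane configuration.
    """
    n = len(samples) // 2
    out = []
    for start in range(0, n, mvl):
        vl = min(mvl, n - start)       # vector length for this strip
        base = 2 * start + channel
        # Stride-2 load: one element per frame, no pack/unpack shuffles.
        strip = samples[base:base + 2 * vl:2]
        out.extend(saturate16((s * gain_num) >> gain_shift) for s in strip)
    return out
```

Calling `scale_channel(s, 0, 3, 1, mvl=2)` on `s = [100, -5, 30000, 7, -30000, 0]` applies a gain of 1.5 to the left channel and saturates the two large samples to the 16-bit limits. Changing `mvl` changes only how the work is chunked, not the result.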

Slide 3: Protein Folding on IRAM?
Vectorization of the basic algorithms is well known, e.g.:
– Spectral methods (large FFTs); probably hand-code the inner FFT
– The naïve O(n²) algorithm for forces vectorizes over atoms
» Hierarchical methods (fast multipole) also vectorize over the inner loop (e.g., mvm) or by packing a set of interaction evaluations
– Monte Carlo methods vectorize
The difficulty comes from handling irregularities in the hardware:
– Unpredictable network delays, processor failures, …
– This leads to an event-driven model: compute on the next pair of atoms when the 2nd one arrives
IRAM benefits from larger units of work:
– E.g., compute a set of interactions when the next chunk of k atoms arrives; vectorization/parallelism within a chunk
– Larger messages can also amortize message overhead
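
The chunked, event-driven scheme can be sketched as follows, assuming a toy 1-D repulsive inverse-square force as a stand-in for a real force field; `pair_force`, `accumulate_chunk` and `forces_event_driven` are hypothetical names. Passing every atom as a single chunk reproduces the naïve O(n²) loop. Passing smaller chunks models atoms arriving k at a time, with the loop over local atoms as the one that vectorizes within each chunk.

```python
import math


def pair_force(xi, xj):
    """Toy 1-D repulsive inverse-square force on an atom at xi from one at xj.

    A placeholder for a real force field; only the loop structure matters here.
    """
    d = xi - xj
    return math.copysign(1.0 / (d * d), d)


def accumulate_chunk(local, forces, chunk):
    """Add the forces exerted by one arriving chunk of atoms onto the local atoms.

    For each remote atom, the updates over local atoms are independent,
    so this inner loop is the one a vector unit would execute in parallel.
    """
    for xj in chunk:
        for i, xi in enumerate(local):
            # Skip self-interaction; assumes distinct atoms never share a position.
            if xi != xj:
                forces[i] += pair_force(xi, xj)


def forces_event_driven(local, remote_chunks):
    """Process remote atoms chunk by chunk, in whatever order they arrive."""
    forces = [0.0] * len(local)
    for chunk in remote_chunks:  # each iteration models one arrival event
        accumulate_chunk(local, forces, chunk)
    return forces
```

For positions `[0.0, 1.0, 3.0]`, `forces_event_driven(pos, [pos])` (the naïve all-pairs case) and `forces_event_driven(pos, [[3.0], [0.0, 1.0]])` (out-of-order arrival) agree up to floating-point rounding. Larger chunks give the inner loop more work per arrival, which is the slide's argument for larger units of work.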

