15-745 Spring 2006 Wavescalar S. Swanson, et al. Computer Science and Engineering University of Washington Presented by Brett Meyer.

1 15-745 Spring 2006 Wavescalar S. Swanson, et al. Computer Science and Engineering University of Washington Presented by Brett Meyer

2 ILP in Modern Architecture
Lots of available ILP in software
 – Execute in parallel for greater performance
Superscalar processors can’t tap it
 – Serialized by PC
Superscalar doesn’t scale
Data-flow approaches can cheaply leverage existing parallelism

3
Introduction
WaveCache and Wavescalar ISA
Evaluation and Results
Does WaveCache make sense?
Compiler challenges

4 Wavescalar: Basics
ALU-in-cache data-flow architecture
 – No centralized, broadcast-based resources
Compile data-flow binaries

5 Wavescalar: Waves
Instructions → architecture
Programs broken into waves
 – Block with single entry
Use wave number to tag data
 – Disambiguates data from multiple iterations
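
To make the wave-number tagging concrete, here is a minimal Python sketch of tag matching. It is an illustration under assumptions, not the WaveCache hardware design: the `MatchingTable` class, its `receive` method, and the `"add"` instruction id are hypothetical names. Operands are keyed by (wave number, instruction), so values from different loop iterations never pair with each other even when they arrive interleaved.

```python
from collections import defaultdict


class MatchingTable:
    """Hypothetical model: pairs operands that carry the same (wave, instruction) tag."""

    def __init__(self, arity=2):
        self.arity = arity
        # (wave number, instruction id) -> {operand slot: value}
        self.pending = defaultdict(dict)

    def receive(self, wave, inst, slot, value):
        """Store one operand; return all operands once every slot for this tag is filled."""
        key = (wave, inst)
        self.pending[key][slot] = value
        if len(self.pending[key]) < self.arity:
            return None  # still waiting for the other operand of this wave
        operands = self.pending.pop(key)
        return [operands[i] for i in range(self.arity)]


table = MatchingTable()
fired = []
# Operands of one "add" instruction from two iterations (waves 0 and 1), interleaved.
for wave, slot, value in [(0, 0, 10), (1, 0, 20), (1, 1, 2), (0, 1, 1)]:
    ready = table.receive(wave, "add", slot, value)
    if ready is not None:
        fired.append((wave, sum(ready)))
```

In this run wave 1 fires before wave 0 because its operands complete first; the tag keeps the value 20 from ever being added to the value 1.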

6 Wavescalar: Memory
Relaxed program order
 – Follow control-flow
 – Obey dependencies
Distributed store buffers
Hardware coherence

7 Evaluation
WaveCache
 – 4 MB of on-chip instructions + data, 2K ALUs
WaveCache vs. superscalar
 – 16-wide OOO, 1K registers, 1K window
WaveCache vs. TRIPS
 – 4 16-wide in-order cores, 2 MB on-chip cache
Key assumption: perfect memory
Fair comparisons?
Is it reasonable to assume perfect memory?

8 Results
WaveCache outperforms superscalar
Similar performance to TRIPS

9 Memory is the problem, not ILP
Data-flow exposes greater ILP
Memory not fast enough for low-ILP CPUs
 – Processor-memory performance gap
What does perfect memory hide?
 – Does superscalar perform better?
Did not model hardware coherence
WaveCache needs MORE bandwidth than a superscalar

10 Is WaveScalar Scalable?
Sub-linear performance improvement
 – More clusters further away from memory
SPEC, MediaBench fit easily in memory
What happens to performance when the working set doesn’t fit in WaveCache?

11 Compiler Challenges
Wave identification
 – Can waves be optimized for performance?
Handling path explosion
 – 1 BR/5 inst → 1050 loaded for 100 executed?
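
The slide does not show how the figure of 1050 loaded instructions is derived. One reading that happens to reproduce it, offered here as an assumption rather than the presenter's stated method, is that 100 executed instructions at one branch per 5 instructions make a path of 20 blocks, that level k of the control-flow graph holds k distinct blocks because paths reconverge, and that every block on every level is loaded. The constant and variable names below are illustrative.

```python
INSTS_PER_BLOCK = 5   # from the slide: one branch every 5 instructions
EXECUTED = 100        # from the slide: instructions actually executed

blocks_on_path = EXECUTED // INSTS_PER_BLOCK  # 20 blocks are executed

# Assumption: level k holds k distinct blocks (paths reconverge),
# and all of them are loaded even though only one per level executes.
loaded_blocks = sum(range(1, blocks_on_path + 1))  # 1 + 2 + ... + 20
loaded_insts = loaded_blocks * INSTS_PER_BLOCK

# For contrast (also an assumption): with no reconvergence, level k holds
# 2**(k - 1) blocks, so the loaded count grows exponentially instead.
tree_insts = (2 ** blocks_on_path - 1) * INSTS_PER_BLOCK
```

Under the reconverging model the ratio is about 10 loaded instructions per executed one; under the pure-tree model it is far worse, which is the sense in which path explosion is a compiler problem.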

12 Compiler Challenges
Semi-static instruction placement
 – Fetch partial/complete waves
 – Loads/stores close to memory
 – Clustering neighboring instructions
 – Reduce coherence traffic
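
As a rough illustration of what clustering neighboring instructions could mean, the sketch below greedily assigns each instruction of a data-flow graph to the cluster that already holds most of its producers, subject to a per-cluster capacity. This is a toy heuristic under assumptions, not the placement algorithm used by WaveScalar; `place`, `cross_cluster_edges`, and the example graph are all hypothetical.

```python
def place(edges, order, capacity):
    """Greedy placement sketch: put each instruction near its already-placed producers."""
    producers = {}
    for src, dst in edges:
        producers.setdefault(dst, []).append(src)

    placement = {}  # instruction -> cluster index
    load = []       # load[c] = number of instructions in cluster c
    for inst in order:
        # Count placed producers per cluster, ignoring clusters that are full.
        votes = {}
        for p in producers.get(inst, []):
            c = placement.get(p)
            if c is not None and load[c] < capacity:
                votes[c] = votes.get(c, 0) + 1
        if votes:
            # Most producers wins; ties go to the lowest cluster index.
            best = max(votes, key=lambda c: (votes[c], -c))
        else:
            # No usable producer cluster: first cluster with room, else a new one.
            best = next((c for c, n in enumerate(load) if n < capacity), None)
            if best is None:
                load.append(0)
                best = len(load) - 1
        placement[inst] = best
        load[best] += 1
    return placement


def cross_cluster_edges(edges, placement):
    """Number of operand edges that must travel between clusters."""
    return sum(placement[s] != placement[d] for s, d in edges)


# Hypothetical five-instruction graph: a and b feed c; c and d feed e.
edges = [("a", "c"), ("b", "c"), ("c", "e"), ("d", "e")]
placement = place(edges, ["a", "b", "c", "d", "e"], capacity=3)
crossings = cross_cluster_edges(edges, placement)
```

Counting cross-cluster edges gives one crude proxy for the communication cost that a placement tries to reduce; a real placer would also have to weigh proximity of loads and stores to memory and coherence traffic, as the slide lists.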

