Rigel: An Architecture and Scalable Programming Interface for a 1000-core Accelerator. Paper presentation by Yifeng (Felix) Zeng, University of Missouri.


1 Rigel: An Architecture and Scalable Programming Interface for a 1000-core Accelerator
Paper Presentation
Yifeng (Felix) Zeng, University of Missouri

2 Outline
• Context & Introduction
• Rigel Design Goals
• Rigel Architecture
• Design Elements
• Estimates for the Rigel Design
• Conclusion

3 Context & Introduction
• Accelerator (e.g., a GPU): a hardware entity designed to provide advantages for a specific class of applications, such as higher performance, lower power, or lower unit cost, compared to a general-purpose CPU.
• Accelerator: maximize throughput (operations/sec)
• CPU: minimize latency (sec/operation)

4 Context & Introduction
Challenges:
• Inflexible programming models
• Lack of a conventional memory model
• Hard to scale irregular parallel apps
Challenges lead to:
• Operations / area ($)
• Operations / Watt (power)
• Operations / programmer effort

5 Rigel Design Goals
What: Future programming models
• Apps and models may not exist yet
• Flexible design: easier to retarget
How: Focus on scalability, programmer effort
• Raised hardware/software interface
• Focusing design effort: five elements

6 Rigel Architecture
• Area-optimized
• Dual-issue
• In-order
• RISC-like ISA (instruction set architecture)
• Single-precision
• Floating-point
• Registers

7 Rigel Architecture

8 Rigel chip in 45 nm technology: 320 mm², 1024 cores, 1.2 GHz clock frequency, peak throughput of 2.4 TFLOPS
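
The peak-throughput figure can be checked with a short calculation. The sketch below assumes each core completes one multiply-add per cycle, counted as 2 floating-point operations; that per-core rate is an assumption for illustration and is not stated on the slide.

```python
# Back-of-the-envelope check of the peak-throughput figure on this slide.
CORES = 1024                  # from the slide
CLOCK_HZ = 1.2e9              # 1.2 GHz, from the slide
FLOPS_PER_CORE_PER_CYCLE = 2  # ASSUMPTION: one multiply-add per cycle = 2 FLOPs


def peak_tflops(cores, clock_hz, flops_per_cycle):
    """Peak throughput in TFLOPS = cores * clock rate * FLOPs per core per cycle."""
    return cores * clock_hz * flops_per_cycle / 1e12


if __name__ == "__main__":
    tflops = peak_tflops(CORES, CLOCK_HZ, FLOPS_PER_CORE_PER_CYCLE)
    print(f"{tflops:.2f} TFLOPS")
```

Under that assumption the product comes to roughly 2.46 TFLOPS, in line with the 2.4 TFLOPS quoted on the slide.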

9 Design Elements
1. Execution Model: ISA, SIMD vs. MIMD, VLIW vs. OoOE, MT
2. Memory Model: Caches vs. scratchpad, ordering, coherence
3. Work Distribution: Scheduling, spectrum of SW/HW choices
4. Synchronization: Scalability, influence on prog. model
5. Locality Management

10 Element 1: Execution Model
• Tradeoff 1: MIMD vs. SIMD
  - Irregular data parallelism
  - Task parallelism
• Tradeoff 2: Latency vs. Throughput
  - Simple in-order cores
• Tradeoff 3: Full RISC ISA vs. Specialized Cores

11 Element 2: Memory Model
• Tradeoff 1: Single vs. Multiple address space
• Tradeoff 2: Hardware caches vs. scratchpads
  - Hardware exploits locality
  - Software manages global sharing
• Tradeoff 3: Hierarchical vs. Distributed
  - Cluster cache/global cache hierarchy
  - ISA provides local/global memory operations
  - Non-uniformity: Programmer effort

12 Element 3: Work Distribution
• Tradeoff (Spectrum): HW vs. SW Implementation
  - Software task management: Hierarchical queues
  - Flexible policies + little specialized hardware
• Rigel Task Model
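
As a rough illustration of software task management with hierarchical queues, the sketch below models one global queue feeding per-cluster local queues. The class name, the `batch_size` refill policy, and the access counter are hypothetical and are not taken from the Rigel runtime; the code is a single-threaded toy, not the actual Rigel Task Model.

```python
from collections import deque


class HierarchicalTaskQueue:
    """Toy two-level task queue: one global queue plus one local queue per cluster."""

    def __init__(self, num_clusters, batch_size):
        self.global_queue = deque()
        self.local_queues = [deque() for _ in range(num_clusters)]
        self.batch_size = batch_size  # tasks moved per refill (hypothetical policy knob)
        self.global_accesses = 0      # counts refill trips to the shared global queue

    def enqueue(self, task):
        """Add a task to the global queue."""
        self.global_queue.append(task)

    def dequeue(self, cluster_id):
        """Return the next task for a core in `cluster_id`, or None if no work is left."""
        local = self.local_queues[cluster_id]
        if not local and self.global_queue:
            # Local queue is empty: pull a whole batch from the global queue in one trip.
            self.global_accesses += 1
            for _ in range(min(self.batch_size, len(self.global_queue))):
                local.append(self.global_queue.popleft())
        return local.popleft() if local else None


if __name__ == "__main__":
    q = HierarchicalTaskQueue(num_clusters=4, batch_size=8)
    for t in range(64):
        q.enqueue(t)
    done = []
    active = True
    while active:  # round-robin over clusters until every dequeue returns None
        active = False
        for cluster in range(4):
            task = q.dequeue(cluster)
            if task is not None:
                done.append(task)
                active = True
    print(len(done), "tasks run,", q.global_accesses, "global queue accesses")
```

Batching refills is one way a software scheme can keep most dequeues local to a cluster; changing the policy means changing a few lines of code rather than hardware.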

13 Rigel Task Model

14 Rigel Task Model Evaluation

15 Element 4: Synchronization
• Coherence mechanisms:
  1. Control synchronization
  2. Data sharing
• Broadcast update
  - Use cases: flags and barriers
  - Reduce contention from polling
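
To make the polling-contention point concrete, the sketch below compares two simplified traffic counts for a flag that many clusters wait on. Both cost formulas, the function names, and the 128-cluster figure are assumptions for illustration only; they are not measurements or a description of the actual Rigel protocol.

```python
def polling_global_reads(num_clusters, polls_before_set):
    """Global-cache reads if every waiting cluster re-reads the flag from the global cache.

    ASSUMPTION: each poll goes to the global cache, and one extra read observes the set flag.
    """
    return num_clusters * (polls_before_set + 1)


def broadcast_update_messages(num_clusters):
    """Messages if waiters spin on a cluster-local copy and the writer broadcasts one update.

    ASSUMPTION: one write to the global cache plus one update pushed to each cluster cache.
    """
    return 1 + num_clusters


if __name__ == "__main__":
    clusters = 128  # hypothetical cluster count
    polls = 10      # hypothetical number of polls before the flag is set
    print("polling:  ", polling_global_reads(clusters, polls), "global reads")
    print("broadcast:", broadcast_update_messages(clusters), "messages")
```

In this toy model the polling cost grows with how long the waiters spin, while the broadcast cost depends only on the number of clusters.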

16 Area estimates for the Rigel Design

17 Conclusions
• Although Rigel is not yet a physical chip, the idea is novel and feasible.
• The Rigel design strikes a balance between performance and programmability.
• Future work: Element 5, Locality Management

18 References
• https://rigel.crhc.illinois.edu/
• http://users.crhc.illinois.edu/sjp/Sanjay_J._Patels_Homepage/Rigel.html
• Daniel R. Johnson et al., "Rigel: A Scalable Architecture for 1000+ Core Accelerators," SAAHPC'09.
• John H. Kelm et al. (UIUC), PowerPoint presented at the 36th Annual International Symposium on Computer Architecture, June 22, 2009.

