Please do not distribute

Name: Please do not distribute
Uploaded: 2018-02-03T18:08:12+00:00
Duration: PTM22S33
Channel: Alayna Siers
Description: Please do not distribute

Please do not distribute
4/10/2017 A Pre-RTL, Power-Performance Accelerator Simulator Enabling Large Design Space Exploration of Customized Architectures Yakun Sophia Shao, Brandon Reagen, Gu-Yeon Wei, David Brooks Harvard University GYW

Beyond Homogeneous Parallelism
General-Purpose Cores (CPU) Programmable Accelerators (DSP, GPU) Application-Specific Accelerator (ASIP, ASIC) Energy Efficiency Flexibility Programmability Design Cost

4/10/2017 Today’s SoC OMAP 4 SoC GYW

4/10/2017 Today’s SoC DMA ARM Cores GPU DSP SD USB Audio Video Face Imaging System Bus Secondary Bus Tertiary OMAP 4 SoC GYW

4/10/2017 Today’s SoC Apple A7 Harvard VLSI-ARCH Group SoC Tapeout GYW

4/10/2017 Today’s SoC GPU/DSP CPU Buses Mem Inter- face Acc GYW

Future Accelerator-Centric Architectures
Please do not distribute 4/10/2017 Future Accelerator-Centric Architectures GPU/DSP Big Cores Shared Resources Memory Interface Sea of Fine-Grained Accelerators Small Cores How to decompose an application to accelerators? How to rapidly design lots of accelerators? How to design and manage the shared resources? Flexibility Design Cost Programmability GYW

4/10/2017 Aladdin: A pre-RTL, Power-Performance Accelerator Simulator Shared Memory/Interconnect Models Unmodified C-Code Accelerator Design Parameters (e.g., # FU, mem. BW) Private L1/ Scratchpad Aladdin Accelerator Specific Datapath Power/Area Performance “Accelerator Simulator” Design Accelerator-Rich SoC Fabrics and Memory Systems “Design Assistant” Understand Algorithmic-HW Design Space before RTL Flexibility Programmability Design Cost GYW

Future Accelerator-Centric Architecture
Please do not distribute 4/10/2017 Future Accelerator-Centric Architecture GPU/DSP Big Cores Shared Resources Memory Interface Sea of Fine-Grained Accelerators Small Cores GYW

Future Accelerator-Centric Architecture
Please do not distribute 4/10/2017 Future Accelerator-Centric Architecture GPU/DSP Big Cores Shared Resources Memory Interface Sea of Fine-Grained Accelerators Small Cores Aladdin can rapidly evaluate large design space of accelerator-centric architectures. GYW

4/10/2017 Aladdin Overview Optimization Phase Realization Phase Optimistic IR Initial DDDG Idealistic C Code Dynamic Data Dependence Graph (DDDG) Program Constrained DDDG Resource Power/Area Models Performance Activity Acc Design Parameters Power/Area GYW

4/10/2017 Aladdin Overview Optimization Phase Optimistic IR Initial DDDG Idealistic DDDG C Code Performance Activity Program Constrained DDDG Resource Constrained DDDG Acc Design Parameters Power/Area Models Power/Area Realization Phase GYW

4/10/2017 From C to Design Space C Code: for(i=0; i<N; ++i) c[i] = a[i] + b[i]; GYW

From C to Design Space IR Dynamic Trace
Please do not distribute 4/10/2017 From C to Design Space IR Dynamic Trace 0. r0=0 //i = 0 r4=load (r0 + r1) //load a[i] r5=load (r0 + r2) //load b[i] r6=r4 + r5 store(r0 + r3, r6) //store c[i] r0=r //++i r4=load(r0 + r1) //load a[i] r5=load(r0 + r2) //load b[i] r0 = r //++i … C Code: for(i=0; i<N; ++i) c[i] = a[i] + b[i]; GYW

From C to Design Space Initial DDDG
Please do not distribute 4/10/2017 From C to Design Space Initial DDDG IR Trace: 0. r0=0 //i = 0 1. r4=load (r0 + r1) //load a[i] 2. r5=load (r0 + r2) //load b[i] 3. r6=r4 + r5 4. store(r0 + r3, r6) //store c[i] 5. r0=r //++i 6. r4=load(r0 + r1) //load a[i] 7. r5=load(r0 + r2) //load b[i] 8. r6=r4 + r5 9. store(r0 + r3, r6) //store c[i] 10.r0 = r //++i … 0. i=0 1. ld a 2. ld b 3. + 4. st c 5. i++ 6. ld a 7. ld b 8. + 9. st c C Code: for(i=0; i<N; ++i) c[i] = a[i] + b[i]; 10. i++ 11. ld a 12. ld b 13. + 14. st c GYW

From C to Design Space Idealistic DDDG
Please do not distribute 4/10/2017 From C to Design Space Idealistic DDDG IR Trace: 0. r0=0 //i = 0 1. r4=load (r0 + r1) //load a[i] 2. r5=load (r0 + r2) //load b[i] 3. r6=r4 + r5 4. store(r0 + r3, r6) //store c[i] 5. r0=r //++i 6. r4=load(r0 + r1) //load a[i] 7. r5=load(r0 + r2) //load b[i] 8. r6=r4 + r5 9. store(r0 + r3, r6) //store c[i] 10.r0 = r //++i … 0. i=0 0. i=0 5. i++ 10. i++ 11. ld a 12. ld b 13. + 14. st c 6. ld a 7. ld b 8. + 9. st c 1. ld a 2. ld b 3. + 4. st c 5. i++ 1. ld a 2. ld b C Code: for(i=0; i<N; ++i) c[i] = a[i] + b[i]; 10. i++ 6. ld a 7. ld b 3. + 11. ld a 12. ld b 8. + 4. st c 13. + 9. st c 14. st c GYW

From C to Design Space Optimization Phase: C->IR->DDDG
Please do not distribute 4/10/2017 From C to Design Space Optimization Phase: C->IR->DDDG Include application-specific customization strategies. Node-Level: Bit-width Analysis Strength Reduction Tree-height Reduction Loop-Level: Remove dependences between loop index variables Memory Optimization: Memory-to-Register Conversion Store-Load Forwarding Store Buffer Extensible e.g. Model CAM accelerator by matching nodes in DDDG GYW

From C to Design Space One Design
Please do not distribute 4/10/2017 From C to Design Space One Design MEM + Resource Activity Idealistic DDDG Cycle 0. i=0 5.i++ 6. ld a 7. ld b 8. + 9. st c 1. ld a 2. ld b 3. + 4. st c 0. i=0 5.i++ 10. i++ 15. i++ 1. ld a 2. ld b 6. ld a 7. ld b 11. ld a 12. ld b 16. ld a 17. ld b 3. + 8. + 13. + 18. + 4. st c 9. st c 14. st c 19. st c Acc Design Parameters: Memory BW <= 2 1 Adder GYW

From C to Design Space Another Design
Please do not distribute 4/10/2017 From C to Design Space Another Design MEM + Resource Activity Idealistic DDDG Cycle 0. i=0 5.i++ 10. i++ 11. ld a 12. ld b 13. + 14. st c 7. ld b 8. + 9. st c 1. ld a 2. ld b 3. + 4. st c 15. i++ 16. ld a 17. ld b 18. + 19. st c 6. ld a 0. i=0 5.i++ 10. i++ 15. i++ 1. ld a 2. ld b 6. ld a 7. ld b 11. ld a 12. ld b 16. ld a 17. ld b 3. + 8. + 13. + 18. + 4. st c 9. st c 14. st c 19. st c Acc Design Parameters: Memory BW <= 4 2 Adders GYW

From C to Design Space Realization Phase: DDDG->Estimates
Please do not distribute 4/10/2017 From C to Design Space Realization Phase: DDDG->Estimates Constrain the DDDG with program and user-defined resource constraints Program Constraints Control Dependence Memory Ambiguation Resource Constraints Loop-level Parallelism Loop Pipelining Memory Ports # of FUs (e.g., adders, multipliers) GYW

From C to Design Space Power-Performance per Design
Please do not distribute 4/10/2017 From C to Design Space Power-Performance per Design Acc Design Parameters: Memory BW <= 4 2 Adders Power Acc Design Parameters: Memory BW <= 2 1 Adder Cycle GYW

From C to Design Space Design Space of an Algorithm
Please do not distribute 4/10/2017 From C to Design Space Design Space of an Algorithm Power Cycle GYW

4/10/2017 Aladdin Validation Aladdin C Code Power/Area Performance ModelSim Design Compiler Verilog Activity GYW

4/10/2017 Aladdin Validation Aladdin C Code Power/Area Performance RTL Designer Design Compiler Verilog Activity HLS C Tuning Vivado HLS ModelSim GYW

4/10/2017 Aladdin Validation GYW

Aladdin enables rapid design space exploration for accelerators.
Please do not distribute 4/10/2017 Aladdin enables rapid design space exploration for accelerators. 7 mins Aladdin C Code Power/Area Performance 52 hours RTL Designer Design Compiler Verilog Activity HLS C Tuning Vivado HLS ModelSim GYW

4/10/2017 Aladdin enables pre-RTL simulation of accelerators with the rest of the SoC. GPGPU-Sim GPU MARSx86 ... XIOSim… Big Cores Small Cores DRAMSim2 Memory Interface Shared Resources Cacti/Orion2 Sea of Fine-Grained Accelerators GYW

Modeling Accelerators in a SoC-like Environment
Please do not distribute 4/10/2017 Modeling Accelerators in a SoC-like Environment Acc Core Cache Memory Core Acc Core Cache Memory GYW

Aladdin: A pre-RTL, Power-Performance Accelerator Simulator
Please do not distribute 4/10/2017 Aladdin: A pre-RTL, Power-Performance Accelerator Simulator Architectures with 1000s of accelerators will be radically different; New design tools are needed. Aladdin enables rapid design space exploration of future accelerator-centric platforms. You can find Aladdin at GYW

Please do not distribute

Similar presentations

Presentation on theme: "Please do not distribute"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Please do not distribute

Similar presentations

Presentation on theme: "Please do not distribute"— Presentation transcript:

Similar presentations

About project

Feedback