
1 Tutorial Outline
Please do not distribute, 4/17/2017, GYW
Time  Topic
9:00 am – 9:30 am  Introduction
9:30 am – 10:10 am  Standalone Accelerator Simulation: Aladdin
10:10 am – 10:30 am  Standalone Accelerator Generation: High-Level Synthesis
10:30 am – 11:00 am  HLS-Based Accelerator-Rich Architecture Simulation: PARADE
11:00 am – 11:30 am  Break
11:30 am – 12:00 pm  Pre-RTL SoC Simulation: gem5-Aladdin
12:00 pm – 12:30 pm  FPGA Prototyping: ARACompiler
12:30 pm – 2:00 pm  Lunch
2:00 pm – 3:00 pm  Panel on Accelerator Research
3:00 pm – 3:30 pm  Accelerator Benchmarks and Workload Characterization
3:30 pm – 4:00 pm
4:00 pm – 5:00 pm  Hands-on Exercise
Amortize optimization phase

2 Integration for Heterogeneous SoC Modeling
Yakun Sophia Shao, Sam Xi, Gu-Yeon Wei, David Brooks
Harvard University

3 Accelerator-CPU Integration: Today’s Conventional SoCs
Easy to integrate lots of IP, simple accelerator design
Hard to program and share data
Figure labels: Core, L2 $, L3 $, DMA, On-Chip System Bus, Acc #1, Scratchpad, Acc #n

4 Accelerator Integration Trend
Users design application-specific hardware accelerators.
System vendors provide a Host Service Layer with virtual memory and cache coherence support.
Intel QuickAssist QPI-Based FPGA Accelerator Platform (QAP)
IBM POWER8’s Coherent Accelerator Processor Interface (CAPI)
Figure labels: Main CPU/SoC, FPGA or user-defined ASIC, Core, Core, Accelerator, L2 $, L2 $, Acc Agent, Host Service Layer, L3 $

5 IBM CAPI: Two part solution
Example of state-of-the-art: IBM POWER8’s Coherent Accelerator Processor Interface (CAPI)
Virtual Addressing & Data Caching
Easier, Natural Programming Model

6 IBM CAPI: Two part solution
Coherent Accelerator Processor Proxy (CAPP)
  Snoops PowerBus on behalf of accelerator
Power Service Layer (PSL)
  Performs address translations, page table walker support
  Provides cache and interface logic
Figure labels: Accelerator, Core, Core, PCIe, L2 $, L2 $, PSL, CAPP, L3 $, On-Chip Coherent PowerBus, Memory, Cache, TLB
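The address-translation role described for the PSL can be illustrated with a toy model. The sketch below is not the PSL design; it assumes 4 KiB pages, a four-entry direct-mapped TLB, and a flat single-level page table with an arbitrary mapping, and every name in it (ToyMmu, mmu_init, mmu_translate) is hypothetical. It only shows the general pattern: look up the TLB first, and on a miss walk the page table and refill the entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT  12u                       /* assumed 4 KiB pages */
#define PAGE_MASK   ((1u << PAGE_SHIFT) - 1u)
#define TLB_ENTRIES 4u                        /* tiny direct-mapped TLB */
#define NUM_PAGES   16u                       /* size of the toy page table */

typedef struct {
    bool     valid;
    uint32_t vpn;   /* virtual page number */
    uint32_t pfn;   /* physical frame number */
} TlbEntry;

typedef struct {
    TlbEntry tlb[TLB_ENTRIES];
    uint32_t page_table[NUM_PAGES];           /* flat table: vpn -> pfn */
    unsigned hits, misses;
} ToyMmu;

/* Invalidate the TLB and install an arbitrary mapping (vpn -> 0x100 + vpn). */
void mmu_init(ToyMmu *m) {
    for (size_t i = 0; i < TLB_ENTRIES; i++) m->tlb[i].valid = false;
    for (uint32_t v = 0; v < NUM_PAGES; v++) m->page_table[v] = 0x100u + v;
    m->hits = 0;
    m->misses = 0;
}

/* Translate a virtual address: TLB lookup first, page-table walk on a miss. */
uint32_t mmu_translate(ToyMmu *m, uint32_t vaddr) {
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    uint32_t offset = vaddr & PAGE_MASK;
    assert(vpn < NUM_PAGES);                  /* toy model has no page faults */

    TlbEntry *e = &m->tlb[vpn % TLB_ENTRIES];
    if (e->valid && e->vpn == vpn) {
        m->hits++;
    } else {
        m->misses++;                          /* "walk" the flat table, refill */
        e->valid = true;
        e->vpn = vpn;
        e->pfn = m->page_table[vpn];
    }
    return (e->pfn << PAGE_SHIFT) | offset;
}
```

Two accesses to the same page cost one miss and one hit; an access to a different page that maps to the same TLB slot evicts the earlier entry, so the hit and miss counters give a rough feel for how TLB size affects accelerator memory behavior.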

7 But… accelerators are not one size fits all
Problem: PSL layer consumes ~20-30% of FPGA resources… for one accelerator
Applications have drastically different requirements.
Memory design customization is often more important than datapath customization.

8 gem5-Aladdin Integration
Figure labels: CPU, Acc Datapath, Cache, Scratchpad, TLB, DMA Engine, Cache, LLC, DRAM

9 Code example: Sift
    // Kernel to be accelerated.
    void imsmooth(F2D* array, float sigma, F2D* product);

    void sift() {
        …
        imsmooth(I, temp, gss[0]);
        // Map each named kernel array to a buffer in the program.
        mapArrayToAccelerator(imsmooth, "array", (void *)I, sizeof(I));
        mapArrayToAccelerator(imsmooth, "product", (void *)product, sizeof(product));
        // Invoke the accelerator and wait until it finishes.
        invokeAcceleratorAndBlock(imsmooth);
    }
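The map-then-invoke pattern on this slide can be tried out in plain C without a simulator. The sketch below is only an illustration under stated assumptions: toy_map_array and toy_invoke_and_block are hypothetical stand-ins for the slide's mapArrayToAccelerator and invokeAcceleratorAndBlock, not the gem5-Aladdin API, and the "accelerator" is an ordinary C function (a 3-point box smoothing, chosen arbitrarily) that finds its arrays by name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MAPPINGS 4

/* One named array made visible to the toy accelerator. */
typedef struct {
    const char *name;
    void       *base;
    size_t      bytes;
} ToyMapping;

typedef struct ToyAccel ToyAccel;
struct ToyAccel {
    ToyMapping maps[MAX_MAPPINGS];
    size_t     num_maps;
    void     (*kernel)(ToyAccel *self);  /* software stand-in for the datapath */
    int        done;
};

/* Hypothetical counterpart of the slide's mapArrayToAccelerator(). */
int toy_map_array(ToyAccel *acc, const char *name, void *base, size_t bytes) {
    if (acc->num_maps == MAX_MAPPINGS) return -1;   /* mapping table is full */
    acc->maps[acc->num_maps++] = (ToyMapping){ name, base, bytes };
    return 0;
}

/* Find a mapping by array name; returns NULL if it was never mapped. */
ToyMapping *toy_lookup(ToyAccel *acc, const char *name) {
    for (size_t i = 0; i < acc->num_maps; i++)
        if (strcmp(acc->maps[i].name, name) == 0) return &acc->maps[i];
    return NULL;
}

/* Hypothetical counterpart of invokeAcceleratorAndBlock(): returns only
 * after the kernel has run to completion. */
void toy_invoke_and_block(ToyAccel *acc) {
    acc->done = 0;
    acc->kernel(acc);
    acc->done = 1;
}

/* Toy kernel: 3-point box smoothing with clamped edges. */
void toy_smooth(ToyAccel *acc) {
    ToyMapping *in  = toy_lookup(acc, "array");
    ToyMapping *out = toy_lookup(acc, "product");
    assert(in && out && in->bytes == out->bytes);

    size_t n = in->bytes / sizeof(float);
    const float *a = in->base;
    float *p = out->base;
    for (size_t i = 0; i < n; i++) {
        float left  = a[i == 0 ? 0 : i - 1];
        float right = a[i + 1 == n ? i : i + 1];
        p[i] = (left + a[i] + right) / 3.0f;
    }
}
```

A caller would fill a ToyAccel with kernel set to toy_smooth, map an input buffer as "array" and an output buffer as "product", and then call toy_invoke_and_block. In this sketch the sizes passed to toy_map_array are the byte counts of the whole buffers, since the kernel derives the element count from them.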

10 Code example: Sift
    void imsmooth(F2D* array, float sigma, F2D* product);

    void sift() {
        …
        // imsmooth(I, temp, gss[0]);
        mapArrayToAccelerator(imsmooth, "array", (void *)I, sizeof(I));
        mapArrayToAccelerator(imsmooth, "product", (void *)product, sizeof(product));
        invokeAccelerator(imsmooth);
    }
Callout on the slide: Start Aladdin Simulation

11 Simulating Accelerator with Memory System using Aladdin
Figure labels: Cache, Memory

12 Figure labels: Acc, Cache, Memory; CPU, Cache, Memory

13 Modeling Accelerators in an SoC-like Environment
Figure labels: Acc, Core, Core, Cache, Memory

14 Modeling Accelerators in an SoC-like Environment
Figure labels: Core, Cache, Memory

15 Accelerator Research Infrastructure
Figure labels: Standalone, System Integration, Modeling, Aladdin, gem5-Aladdin, High-Level Synthesis, PARADE, RTL, Prototyping, FPGA

16 Tutorial References
Y.S. Shao and D. Brooks, “ISA-Independent Workload Characterization and its Implications for Specialized Architectures,” ISPASS’13.
B. Reagen, Y.S. Shao, G.-Y. Wei, D. Brooks, “Quantifying Acceleration: Power/Performance Trade-Offs of Application Kernels in Hardware,” ISLPED’13.
Y.S. Shao, B. Reagen, G.-Y. Wei, D. Brooks, “Aladdin: A Pre-RTL, Power-Performance Accelerator Simulator Enabling Large Design Space Exploration of Customized Architectures,” ISCA’14.
B. Reagen, B. Adolf, Y.S. Shao, G.-Y. Wei, D. Brooks, “MachSuite: Benchmarks for Accelerator Design and Customized Architectures,” IISWC’14.

