Please do not distribute


5/27/2018

Tutorial Outline

Time                  Topic                                                              Speaker
8:30 am – 9:00 am     Accelerator Research Infrastructure Overview                       Sophia Shao
9:00 am – 9:30 am     Aladdin: Accelerator Pre-RTL Modeling
9:30 am – 10:00 am    Rapid Hardware Specialization with HLS: Glass Half Full            Prof. Zhiru Zhang
10:00 am – 10:30 am   PARADE: HLS-Based Accelerator-Rich Architecture Simulation         Zhenman Fang
10:30 am – 11:00 am   Break
11:00 am – 11:30 am   gem5-Aladdin: Accelerator System Co-Design                         Sam Xi
11:30 am – 12:00 pm   ARAPrototyper: FPGA Prototyping
12:00 pm – 1:30 pm    Lunch
1:30 pm – 2:00 pm     Virtual Machine Setup                                              Sophia Shao & Sam Xi
2:00 pm – 2:30 pm     Hands-on: Accelerator Design Space Exploration using Aladdin
2:30 pm – 3:00 pm     Hands-on: SoC Design Space Exploration using gem5-Aladdin

Aladdin Hands-on Exercise

Goal: Run a power-performance design space exploration for triad in MachSuite.

Tasks:
- Build LLVM-Tracer and Aladdin, and verify the build with the Aladdin unit tests.
- Walk through the design space exploration steps using triad as an example:
  - Generate the LLVM IR trace
  - Prepare a hardware configuration file
  - Run Aladdin
  - Explore the parameter space: unrolling, memory bandwidth, clock frequency
- Repeat the above steps for MachSuite/stencil2d

Task 1: Build LLVM-Tracer and Aladdin Make sure LLVM-Tracer and Aladdin are built successfully in your virtual machine.

Task 2: Design Space Exploration for triad

    void triad(int *a, int *b, int *c, int s) {
      int i;
      triad_loop: for (i = 0; i < NUM; i++) {
        c[i] = a[i] + s * b[i];
      }
    }

The slide highlights the two things to parameterize: the arrays (a, b, c) and the loop (triad_loop).

Array Parameters

Per array: read port, write port, partition/bank.

    partition,cyclic,a,8192,4,1
    // partition type   : cyclic
    // array name       : a
    // array size       : 8192 Bytes
    // element size     : 4 Bytes (int)
    // partition factor : 1 (1 partition)

    partition,cyclic,a,8192,4,2
    // partition type   : cyclic
    // array name       : a
    // array size       : 8192 Bytes
    // element size     : 4 Bytes (int)
    // partition factor : 2 (2 partitions)

    partition,block,a,8192,4,2
    // partition type   : block
    // array name       : a
    // array size       : 8192 Bytes
    // element size     : 4 Bytes (int)
    // partition factor : 2 (2 partitions)

[Figure: bank layouts, illustrated with the elements a[0] to a[3]. Cyclic partitioning interleaves consecutive elements across the banks (a[0], a[2] in one bank; a[1], a[3] in the other). Block partitioning gives each bank one contiguous chunk (a[0], a[1] in one bank; a[2], a[3] in the other).]
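The two partition types can be made concrete with a small index-mapping sketch. This is only an illustration of the usual meaning of cyclic and block partitioning, not Aladdin's own code; the names bank_slot, cyclic_slot, block_slot, and element_count are hypothetical.

```c
/* Where one array element lands after partitioning.
 * All names in this sketch are hypothetical, for illustration only. */
typedef struct {
    int bank;    /* which partition holds the element */
    int offset;  /* position of the element inside that partition */
} bank_slot;

/* Cyclic: consecutive elements rotate through the banks. */
bank_slot cyclic_slot(int index, int factor) {
    bank_slot s = { index % factor, index / factor };
    return s;
}

/* Block: each bank holds one contiguous chunk of ceil(n / factor) elements. */
bank_slot block_slot(int index, int n, int factor) {
    int chunk = (n + factor - 1) / factor;
    bank_slot s = { index / chunk, index % chunk };
    return s;
}

/* Element count implied by a partition line: array bytes / element bytes. */
int element_count(int array_bytes, int element_bytes) {
    return array_bytes / element_bytes;
}
```

For the 8192-byte array of 4-byte ints, element_count gives 2048 elements. With a factor of 2, cyclic_slot places a[2] in bank 0 and a[3] in bank 1, while block_slot keeps a[0] through a[1023] in bank 0 and puts a[1024] at the start of bank 1.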

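Loop Parameters

    unrolling,triad,triad_loop,1
    // unrolling a loop
    // function name    : triad
    // loop label       : triad_loop
    // unrolling factor : 1

[Figure: datapath for one iteration. b and s feed a multiplier (X); its result and a feed an adder (+), which produces c.]

    unrolling,triad,triad_loop,2
    // unrolling a loop
    // function name    : triad
    // loop label       : triad_loop
    // unrolling factor : 2

[Figure: with unrolling factor 2, the multiplier and adder are duplicated so that two iterations are computed side by side.]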
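As a rough software analogy for what an unrolling factor of 2 means, the loop body can be duplicated by hand. In the exercise the factor is set in the config file and triad.c is left unchanged, so the function below is only an illustration. NUM is not defined on the slides; 2048 is an assumption taken from the 8192-byte arrays of 4-byte ints. triad_unrolled2 and unrolled_matches_rolled are hypothetical names.

```c
/* Assumed: NUM is not defined on the slides; 2048 = 8192 bytes / 4-byte ints. */
#define NUM 2048

/* Rolled kernel, as on the slide (unrolling factor 1). */
void triad(int *a, int *b, int *c, int s) {
    int i;
triad_loop:
    for (i = 0; i < NUM; i++) {
        c[i] = a[i] + s * b[i];
    }
}

/* Hand-unrolled by 2: each trip of the loop does the work of two
 * iterations, which is what lets the hardware use two multipliers
 * and two adders in parallel. */
void triad_unrolled2(int *a, int *b, int *c, int s) {
    int i;
    for (i = 0; i + 1 < NUM; i += 2) {
        c[i]     = a[i]     + s * b[i];
        c[i + 1] = a[i + 1] + s * b[i + 1];
    }
    for (; i < NUM; i++) {  /* leftover iteration when NUM is odd */
        c[i] = a[i] + s * b[i];
    }
}

/* Hypothetical check: returns 1 when both versions produce the same output. */
int unrolled_matches_rolled(int s) {
    static int a[NUM], b[NUM], c1[NUM], c2[NUM];
    for (int i = 0; i < NUM; i++) {
        a[i] = i;
        b[i] = NUM - i;
    }
    triad(a, b, c1, s);
    triad_unrolled2(a, b, c2, s);
    for (int i = 0; i < NUM; i++) {
        if (c1[i] != c2[i]) return 0;
    }
    return 1;
}
```

The two statements in the unrolled body are independent of each other, but each one reads a and b and writes c, so the extra parallelism only pays off if the arrays can supply two elements per cycle. That is why unrolling and partitioning are explored together.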

Task 2.1 Generate triad trace

    vagrant@genie:~$ cd gem5-aladdin/src/aladdin/SHOC/triad/
    vagrant@genie:~/gem5-aladdin/src/aladdin/SHOC/triad$ vi triad.c
    vagrant@genie:~/gem5-aladdin/src/aladdin/SHOC/triad$ make run-trace
    vagrant@genie:~/gem5-aladdin/src/aladdin/SHOC/triad$ vi dynamic_trace.gz

Task 2.2 Set up a design config

Create a working directory and open a new config file:

    vagrant@genie:~/gem5-aladdin/src/aladdin/SHOC/triad$ mkdir example
    vagrant@genie:~/gem5-aladdin/src/aladdin/SHOC/triad$ cd example
    vagrant@genie:~/gem5-aladdin/src/aladdin/SHOC/triad/example$ vi triad.cfg

Contents of triad.cfg:

    cycle_time,6
    pipelining,1
    partition,cyclic,a,8192,4,1
    partition,cyclic,b,8192,4,1
    partition,cyclic,c,8192,4,1
    unrolling,triad,triad_loop,1

Copy the run script, create the output directory, and run Aladdin:

    vagrant@genie:~/gem5-aladdin/src/aladdin/SHOC/triad/example$ cp ../run.sh .
    vagrant@genie:~/gem5-aladdin/src/aladdin/SHOC/triad/example$ mkdir outputs
    vagrant@genie:~/gem5-aladdin/src/aladdin/SHOC/triad/example$ bash run.sh
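When sweeping the design space, only three numbers in this config change: the cycle time, the partition factor, and the unrolling factor. A minimal sketch of generating the same six lines from those parameters follows. format_triad_cfg is a hypothetical helper, not part of Aladdin, and applying one partition factor to all three arrays is an assumption made for simplicity.

```c
#include <stdio.h>

/* Hypothetical helper: writes a triad config with the same six lines
 * shown on the slide into buf. Returns what snprintf returns, i.e. the
 * length of the full text (excluding the terminating NUL).
 * Assumption: the same partition factor is used for a, b, and c. */
int format_triad_cfg(char *buf, size_t size, int cycle_time_ns,
                     int partition_factor, int unroll_factor) {
    return snprintf(buf, size,
                    "cycle_time,%d\n"
                    "pipelining,1\n"
                    "partition,cyclic,a,8192,4,%d\n"
                    "partition,cyclic,b,8192,4,%d\n"
                    "partition,cyclic,c,8192,4,%d\n"
                    "unrolling,triad,triad_loop,%d\n",
                    cycle_time_ns,
                    partition_factor, partition_factor, partition_factor,
                    unroll_factor);
}
```

Calling format_triad_cfg(buf, sizeof buf, 6, 1, 1) reproduces the baseline triad.cfg above; the resulting text can then be written to a file with fputs.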

Task 2.3 Design Space Exploration

Table columns: Unrolling | Partition | Clock Period (ns) | Cycles | Power (mW)

Table values, in slide order: 1, 6, 2052, 4.47, 4, 4.43, 516, 10.29, 517, 68.91
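Cycles and power are easier to compare across design points once they are converted to execution time and energy. The sketch below uses only unit arithmetic (cycles times clock period, and power times time). The function names are hypothetical, and pairing 2052 cycles with the 6 ns clock period is an assumption based on the baseline config.

```c
/* Execution time in nanoseconds: cycle count times clock period. */
double exec_time_ns(long cycles, double clock_period_ns) {
    return (double)cycles * clock_period_ns;
}

/* Energy in picojoules: 1 mW for 1 ns is 1e-3 W * 1e-9 s = 1e-12 J = 1 pJ. */
double energy_pj(double power_mw, double time_ns) {
    return power_mw * time_ns;
}
```

For example, exec_time_ns(2052, 6.0) gives 12312 ns, about 12.3 microseconds. A design point that cuts cycles but raises power is a net energy win only if energy_pj drops as well.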