Accelerator Compiler for the VENICE Vector Processor
Zhiduo Liu, Aaron Severance, Satnam Singh, Guy Lemieux

This is the VENICE Vector Processor: complicated.

Programming VENICE directly to add 42 to every element of an array looks like this:

```c
#include "vector.h"

int main() {
    int A[] = {1, 2, 3, 4, 5, 6, 7, 8};
    const int data_len = sizeof(A);                 // size in bytes
    int *va = (int *) vector_malloc(data_len);      // allocate scratchpad memory
    vector_dma_to_vector(va, A, data_len);          // copy input to the scratchpad
    vector_wait_for_dma();
    vector_set_vl(data_len / sizeof(int));          // vector length in elements
    vector(SVW, VADD, va, 42, va);                  // add scalar 42 to each element
    vector_instr_sync();                            // wait for operation to complete
    vector_dma_to_host(A, va, data_len);            // copy the result back to the host
    vector_wait_for_dma();
    vector_free();                                  // deallocate scratchpad malloc
    return 0;
}
```

With Accelerator, you can program it like this:

```cpp
#include "Accelerator.h"
#include "VectorTarget.h"

using namespace ParallelArrays;
using namespace MicrosoftTargets;

int main() {
    Target *tgtVector = CreateVectorTarget();
    int A[] = {1, 2, 3, 4, 5, 6, 7, 8};
    IPA b = IPA(A, sizeof(A) / sizeof(int));
    IPA c = b + 42;
    tgtVector->ToArray(c, A, sizeof(A) / sizeof(int));
    tgtVector->Delete();
}
```

Or retarget the same program by changing only the line that creates the target:

```cpp
Target *tgtVector = CreateMulticoreTarget();   // multicore CPU target
// or
Target *tgtVector = CreateDX9Target();         // DirectX 9 GPU target
```
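To make the native program's steps concrete, here is a minimal host-only C++ model of the same sequence. It assumes that `vector(SVW, VADD, va, 42, va)` adds the scalar 42 to each of the VL elements of `va`. `ScratchpadModel` and `add42_via_scratchpad` are hypothetical names. They do not call the real VENICE library and say nothing about its timing or DMA behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical model of the scratchpad workflow; not the real VENICE API.
struct ScratchpadModel {
    std::vector<int> mem;   // stands in for the on-chip scratchpad
    std::size_t vl = 0;     // current vector length, in elements

    // Plays the role of vector_malloc: reserve room for n ints.
    int* alloc(std::size_t n) { mem.assign(n, 0); return mem.data(); }

    // Plays the role of vector_set_vl: later operations touch vl elements.
    void set_vl(std::size_t n) { assert(n <= mem.size()); vl = n; }

    // Plays the role of vector(SVW, VADD, dst, scalar, src):
    // dst[i] = scalar + src[i] for every i < vl.
    void add_scalar(int* dst, int scalar, const int* src) {
        for (std::size_t i = 0; i < vl; ++i) dst[i] = scalar + src[i];
    }
};

// Same sequence as the native program: copy in, add 42, copy out.
void add42_via_scratchpad(int* host, std::size_t n) {
    assert(n > 0);
    ScratchpadModel sp;
    int* va = sp.alloc(n);
    std::memcpy(va, host, n * sizeof(int));   // models DMA to the vector unit
    sp.set_vl(n);                             // VL counts elements, not bytes
    sp.add_scalar(va, 42, va);                // in-place scalar-vector add
    std::memcpy(host, va, n * sizeof(int));   // models DMA back to the host
}
```

Calling `add42_via_scratchpad(A, 8)` on the array above leaves `A` holding 43 through 50. These are the same values the Accelerator version writes back with `ToArray`.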

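Assembly programming: write assembly, compile with gcc, download to the board, get the result. If it doesn't compile or the result is incorrect, the whole loop, including the download to the board, repeats.

Accelerator programming: write in Accelerator, compile with Microsoft Visual Studio, get the result on the desktop. If it doesn't compile or the result is incorrect, fix it there. Only then compile with gcc, download to the board, and get the result.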
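One way to use the desktop run in this loop is as a golden reference for the output read back from the board. The sketch below assumes both results are available as `std::vector<int>`. `reference_add` and `first_mismatch` are hypothetical helpers, not part of Accelerator.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical golden result, standing in for running the program on a
// desktop target before moving to the board.
std::vector<int> reference_add(const std::vector<int>& in, int scalar) {
    std::vector<int> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = in[i] + scalar;
    return out;
}

// Index of the first element where the board output differs from the
// golden output, or std::nullopt if they match. Sizes must agree.
std::optional<std::size_t> first_mismatch(const std::vector<int>& golden,
                                          const std::vector<int>& board) {
    assert(golden.size() == board.size());
    for (std::size_t i = 0; i < golden.size(); ++i)
        if (golden[i] != board[i]) return i;
    return std::nullopt;
}
```

Reporting the first differing index, rather than only pass or fail, points directly at the element to inspect. This helps when each board iteration is slow.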

Assembly programming:
1. Hard to program
2. Long debug cycle
3. Not portable
4. Manual: not always optimal or correct (WYSIWYG)

Accelerator programming:
1. Easy to program
2. Easy to debug
3. Can also target other devices
4. Automated compiler optimizations
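The portability point can be sketched as a program written once against an abstract target, with the backend chosen at run time. This follows the spirit of swapping `CreateVectorTarget()` for `CreateMulticoreTarget()`. `Target`, `SequentialTarget`, `StripMinedTarget` and `run` are hypothetical names, not Accelerator's real class hierarchy. The strip size stands in for an assumed maximum vector length.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical backend interface; the application code only sees this.
struct Target {
    virtual ~Target() = default;
    virtual std::vector<int> add_scalar(const std::vector<int>& in, int s) = 0;
};

// Reference backend: a plain element-by-element loop.
struct SequentialTarget : Target {
    std::vector<int> add_scalar(const std::vector<int>& in, int s) override {
        std::vector<int> out(in.size());
        for (std::size_t i = 0; i < in.size(); ++i) out[i] = in[i] + s;
        return out;
    }
};

// Vector-style backend: processes at most max_vl elements per step,
// as a vector unit with a bounded vector length would (an assumption).
struct StripMinedTarget : Target {
    std::size_t max_vl;
    explicit StripMinedTarget(std::size_t vl) : max_vl(vl) { assert(vl > 0); }
    std::vector<int> add_scalar(const std::vector<int>& in, int s) override {
        std::vector<int> out(in.size());
        for (std::size_t base = 0; base < in.size(); base += max_vl) {
            std::size_t vl = std::min(max_vl, in.size() - base);  // "set VL"
            for (std::size_t i = 0; i < vl; ++i) out[base + i] = in[base + i] + s;
        }
        return out;
    }
};

// The application: identical source regardless of the chosen backend.
std::vector<int> run(Target& tgt, const std::vector<int>& a) {
    return tgt.add_scalar(a, 42);
}
```

Both backends produce the same result for the same input. Only the code that creates the target changes when moving between devices.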

LIR compiler flow, starting from the Accelerator IR:
1. Convert to LIR
2. Add intermediates
3. Combine operations
4. Evaluation ordering & reference counting
5. Buffer counting
6. Calculate buffer size
7. Assign buffers to inputs
8. Decide whether double buffering is needed
9. Allocate & initialize memory
10. Transfer data to scratchpad
11. Set VL
12. Write vector instructions
13. Transfer result to host
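The evaluation-ordering, reference-counting and buffer-counting steps can be illustrated on a small expression DAG. This is a minimal sketch under assumed rules:
- post-order evaluation;
- one buffer per result;
- an operand's buffer is freed after its last consumer runs;
- no in-place reuse.

It is not the compiler's actual implementation. `Node`, `evaluation_order`, `reference_counts` and `count_buffers` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical DAG node: inputs have no operands; inner nodes are
// element-wise vector operations on earlier nodes (by index).
struct Node {
    std::vector<int> operands;
};

// Post-order from the root, visiting each shared node only once.
std::vector<int> evaluation_order(const std::vector<Node>& dag, int root) {
    assert(root >= 0 && static_cast<std::size_t>(root) < dag.size());
    std::vector<int> order;
    std::vector<bool> seen(dag.size(), false);
    std::function<void(int)> visit = [&](int n) {
        if (seen[n]) return;
        seen[n] = true;
        for (int op : dag[n].operands) visit(op);
        order.push_back(n);
    };
    visit(root);
    return order;
}

// How many operations consume each node's result.
std::vector<int> reference_counts(const std::vector<Node>& dag) {
    std::vector<int> rc(dag.size(), 0);
    for (const Node& n : dag)
        for (int op : n.operands) ++rc[op];
    return rc;
}

// Walk the evaluation order, allocating a buffer for each result and
// releasing an operand's buffer when its count reaches zero.
// Returns the peak number of simultaneously live buffers.
int count_buffers(const std::vector<Node>& dag, int root) {
    std::vector<int> rc = reference_counts(dag);
    int live = 0, peak = 0;
    for (int n : evaluation_order(dag, root)) {
        ++live;                                   // buffer for n's result
        peak = std::max(peak, live);
        for (int op : dag[n].operands)
            if (--rc[op] == 0) --live;            // last use: free operand
    }
    return peak;
}
```

Take `(a + b) * c` with nodes a, b, c, a+b, and the product (indices 0 to 4). The order is a, b, a+b, c, product, and the peak is three buffers: a, b and a+b are live together before a and b are released. A peak like this is what a later "calculate buffer size" step would divide the scratchpad among.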

Figure: benchmark runtime (seconds), Intel Xeon W3690 CPU (3.47 GHz) vs. VENICE (V64, 100 MHz).
Benchmarks: fir, 2Dfir, life, imgblend, median, motest.
Speedup labels on the chart: 1.0x, 1.5x, 2.3x, 0.4x, 3.2x, 1.1x, 369x.
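The speedups compare wall-clock runtimes. Because the two machines run at very different clock rates, it can also help to convert a speedup into a per-cycle ratio.

The sketch below uses the standard definitions: speedup is Xeon time divided by VENICE time, and the per-cycle figure scales that by 3470 MHz / 100 MHz. `speedup` and `per_cycle_speedup` are illustrative names. The underlying runtimes from the chart are not recoverable from the transcript.

```cpp
#include <cassert>

// Wall-clock speedup of VENICE over the Xeon for one benchmark.
// Values below 1.0 mean the Xeon finished first.
double speedup(double xeon_seconds, double venice_seconds) {
    assert(xeon_seconds > 0.0 && venice_seconds > 0.0);
    return xeon_seconds / venice_seconds;
}

// Cycle-count ratio: the Xeon runs 3470 / 100 = 34.7 times as many cycles
// per second, so a wall-clock speedup s means s * 34.7 fewer VENICE cycles.
double per_cycle_speedup(double wall_clock_speedup) {
    const double xeon_mhz = 3470.0;
    const double venice_mhz = 100.0;
    return wall_clock_speedup * (xeon_mhz / venice_mhz);
}
```

For example, a benchmark at 1.0x wall-clock parity corresponds to VENICE needing about 34.7 times fewer clock cycles than the Xeon.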

Thank you!