Vector accelerator array in constrained memory bandwidth

Slides:

Advertisements

Similar presentations

IIAA GPMAD A beam dynamics code using Graphics Processing Units GPMAD (GPU Processed Methodical Accelerator Design) utilises Graphics Processing Units.

Advertisements

Implementation Approaches with FPGAs Compile-time reconfiguration (CTR) CTR is a static implementation strategy where each application consists of one.

Digital RF Stabilization System Based on MicroTCA Technology - Libera LLRF Robert Černe May 2010, RT10, Lisboa

Laboratory for SoC design TEMPUS meeting Niš,

K-means clustering –An unsupervised and iterative clustering algorithm –Clusters N observations into K clusters –Observations assigned to cluster with.

Hardware-level model elaboration Igal Yaroslavski, M.Sc. Senior Team Leader - MATLAB & Simulink Application Engineering Signal.

FPGA Design Flow Design Circuit Simulation Implementation Programming.

Computes the partial dot products for only the diagonal and upper triangle of the input matrix. The vector computed by this architecture is added to the.

HERMES tracking for OLYMPUS. Part #1. Detector survey. A.Kiselev OLYMPUS Collaboration Meeting DESY, Hamburg,

MEMOCODE 2007 HW/SW Co-design Contest Documentation of the submission by Eric Simpson Pengyuan Yu Sumit Ahuja Sandeep Shukla Patrick Schaumont Electrical.

Computer Graphics Hardware Acceleration for Embedded Level Systems Brian Murray

Cross Product, Torque, and

Technion – Israel Institute of Technology Department of Electrical Engineering High Speed Digital Systems Lab Written by: Haim Natan Benny Pano Supervisor:

HPEC_GPU_DECODE-1 ADC 8/6/2015 MIT Lincoln Laboratory GPU Accelerated Decoding of High Performance Error Correcting Codes Andrew D. Copeland, Nicholas.

Field Programmable Gate Array (FPGA) Layout An FPGA consists of a large array of Configurable Logic Blocks (CLBs) - typically 1,000 to 8,000 CLBs per chip.

© 2011 Xilinx, Inc. All Rights Reserved Intro to System Generator This material exempt per Department of Commerce license exception TSU.

GPU-accelerated Evaluation Platform for High Fidelity Networking Modeling 11 December 2007 Alex Donkers Joost Schutte.

03/12/20101 Analysis of FPGA based Kalman Filter Architectures Arvind Sudarsanam Dissertation Defense 12 March 2010.

Spherical Code Construction Vojin Šenk Ivan Stanojević Mladen Kovačević University of Novi.

Trigger design engineering tools. Data flow analysis Data flow analysis through the entire Trigger Processor allow us to refine the optimal architecture.

Matrix Multiplication on FPGA Final presentation One semester – winter 2014/15 By : Dana Abergel and Alex Fonariov Supervisor : Mony Orbach High Speed.

VHDL Project Specification Naser Mohammadzadeh. Schedule  due date: Tir 18 th 2.

Distributed computing using Projective Geometry: Decoding of Error correcting codes Nachiket Gajare, Hrishikesh Sharma and Prof. Sachin Patkar IIT Bombay.

- 1 - EE898_HW/SW Partitioning Hardware/software partitioning  Functionality to be implemented in software or in hardware? No need to consider special.

An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines By: David Chui Supervisor: Professor P. Chow.

Sub-Nyquist Sampling Algorithm Implementation on Flex Rio

FPGA Based Smoke Simulator Jonathan Chang Yun Fei Tianming Miao Guanduo Li.

Wang Chen, Dr. Miriam Leeser, Dr. Carey Rappaport Goal Speedup 3D Finite-Difference Time-Domain.

1 Graphics CSCI 343, Fall 2015 Lecture 10 Coordinate Transformations.

Virtual Application Profiler (VAPP) Problem – Increasing hardware complexity – Programmers need to understand interactions between architecture and their.

GPS Computer Program Performed by: Moti Peretz Neta Galil Supervised by: Mony Orbach Spring 2009 Part A Presentation High Speed Digital Systems Lab Electrical.

Multi-objective Topology Synthesis and FPGA Prototyping Framework of Application Specific Network-on-Chip m Akram Ben Ahmed Xinyu LI, Omar Hammami.

Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Annual project אביב תשס " ט.

Co-processors for speeding up drug design algorithms Advait Jain Priyanka Jindal Pulkit Gambhir Under the guidance of: Prof. M Balakrishnan Prof. Kolin.

TRIUMF HLA Development High Level Applications Perform tasks of accelerator and beam control at control- room level, directly interfacing with operators.

Co-processors for speeding up drug design algorithms Advait Jain Priyanka Jindal Pulkit Gambhir.

Hierarchical Systolic Array Design for Full-Search Block Matching Motion Estimation Noam Gur Arie,August 2005.

FPGA BASED REAL TIME VIDEO PROCESSING Characterization presentation Presented by: Roman Kofman Sergey Kleyman Supervisor: Mike Sumszyk.

1 An FPGA Implementation of the Two-Dimensional Finite-Difference Time-Domain (FDTD) Algorithm Wang Chen Panos Kosmas Miriam Leeser Carey Rappaport Northeastern.

Encryption / Decryption on FPGA Midterm Presentation Written by: Daniel Farcovich ID Saar Vigodskey ID Advisor: Mony Orbach Summer.

Use of FPGA for dataflow Filippo Costa ALICE O2 CERN

Backprojection Project Update January 2002

Fluid Animation CSE 3541 By: Matt Boggus.

Character Animation Forward and Inverse Kinematics

Cellular Automata Project:

Coordinate Change.

USB The topics covered, in order, are USB background

School of Engineering University of Guelph

Hardware Implementation of CTIS Reconstruction Algorithms

Hardware accelerator for Efficient error-correcting codes

Introduction to Programmable Logic

From: A New Software Approach for the Simulation of Multibody Dynamics

JIVE UniBoard Correlator (JUC) Firmware

FPGAs in AWS and First Use Cases, Kees Vissers

Introduction to High-level Synthesis

A Cloud System for Machine Learning Exploiting a Parallel Array DBMS

Figure 1 PC Emulation System Display Memory [Embedded SOC Software]

Emu: Rapid FPGA Prototyping of Network Services in C#

Matlab as a Development Environment for FPGA Design

High Level Synthesis Overview

Data Orgnization Frequently accessed data on the same storage device?

Co-processors for speeding up drug design algorithms

Beam Instrumentation Meeting – Dan Peterson

HIGH LEVEL SYNTHESIS.

Final Project presentation

EE 4xx: Computer Architecture and Performance Programming

1CECA, Peking University, China

Welcome to the FPGA Tools Course Agenda

NetFPGA - an open network development platform

Presentation transcript:

Vector accelerator array in constrained memory bandwidth Project Proposal Course Code : EDAN85 Course Teacher : Flavius Gruian Student : Arun Jeevaraj

Target Application Application parameters L distance Particle Beam simulator Linear Model with high density particle space: 10^6 particles Particles represented by space vector (dimension 6), x, y, z coordinates and momentum parameters 10^6 matrix multiplication per frame. Drift Space x1 x0 Drift space Application parameters Frame size : 45 MBytes for double precision Iterative Nature : Test for 1000 frames Drift space Drift space Drift space Drift space Frame Frame Frame Frame

Target Hardware Design. Target Design Design Requirements DDR MEMORY Maximise throughput Keep accuracy fidelity FPGA Vector Accelerator Array RAM Vector Accelerator Array RAM Hardware Design Use Memory Hierarchy Use Fixed Point Vector accelerator Array RAM

Design Flow Matlab/Python Reference Simulation Model Software Simulation Model for reference data (Matlab or python.) HLS/SystemC behavioral Models. FPGA Implementation and verification. Vivado HLS Behaviour Simulation Model Hardware Implementation Optional : Use ethernet I/O with PC for testing and verification.