Frank Vahid, UCR 1 Building Fake Body Parts: Digital Mockups Frank Vahid Univ. of California, Riverside Support provided by NSF, SRC, Dept. of Educ. Also.




Presentation transcript:

Frank Vahid, UCR 1 Building Fake Body Parts: Digital Mockups. Frank Vahid, Univ. of California, Riverside. Support provided by NSF, SRC, Dept. of Educ. Also CareFusion, Xilinx, METI. With: Chen Huang (UC Riverside, now Amazon); Bailey Miller (UC Riverside, intern at SpaceX); Prof. Tony Givargis (UC Irvine); Ting-Shuo Chou (UC Irvine); others...

Frank Vahid, UCR 2 Bailey Miller, UCR 2

Frank Vahid, UCR 3 Models of the physical world that run in real time. Used to test cyber-physical systems. Figure labels: physical mockup; transducer models; environment model; digital mockup.

Frank Vahid, UCR 4 Issue: Real-time achieved via inaccuracy. Weibel lung complexity: 4 gen: 32 ODEs; 6 gen: 128 ODEs; 8 gen: 512 ODEs; 10 gen: 2048 ODEs. "2-3 minutes to simulate one breath accurately." Figure labels: V[1],R[1]; V[2],R[2]; V[7],R[7].
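
The four data points on this slide (32, 128, 512, 2048 ODEs for 4, 6, 8, 10 generations) all fit the pattern 2^(generations + 1). The sketch below encodes that observed fit. The helper name `weibel_ode_count` is hypothetical, and the formula is an assumption drawn from the slide's numbers, not something the talk states.

```c
#include <stdint.h>

/* Hypothetical helper: number of ODEs in a Weibel lung model with the given
 * number of airway generations. ASSUMPTION: count = 2^(generations + 1),
 * which matches the slide's four data points but is not stated in the talk.
 * Only valid for generations < 63 (the shift must stay inside 64 bits). */
uint64_t weibel_ode_count(unsigned generations)
{
    return (uint64_t)1 << (generations + 1);
}
```

If the fit holds, each extra generation doubles the number of equations, which is why accurate models fall behind real time so quickly on a sequential processor.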

Frank Vahid, UCR 5 PC & GPU. [Chart: performance (ms) of PC(1), PC(4), and GPU on the Weibel, Neuron, Weibel + gas, Hemodynamic, and Weibel + hemo models.] Speedup vs real-time: PC(1): 0.8x; PC(4): 3.1x; GPU: 1.6x. Parallel computations + neighbor communication → seem like a great match for FPGAs.

Frank Vahid, UCR 6 FPGAs: Sw circuits (parallel). C code for FIR filter: for (i=0; i < 128; i++) y += c[i] * x[i]; On a processor: 1000's of instructions, several thousand cycles. Circuit for FIR filter on an FPGA: ~7 cycles (though slower clock). Speedup > 10x-100x. [Figure: circuit drawn with a row of * multiplier symbols.]
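
As a minimal runnable sketch of the slide's sequential loop, the function below computes one FIR output sample as a sum of products. The name `fir_output` and the integer sample type are assumptions for illustration. The slide fixes the tap count at 128, while the sketch takes it as a parameter.

```c
#include <stddef.h>

/* One FIR output sample: y = sum over i of c[i] * x[i].
 * On a processor this loop runs one multiply-accumulate per iteration. */
long fir_output(const int *c, const int *x, size_t taps)
{
    long y = 0;
    for (size_t i = 0; i < taps; i++) {
        y += (long)c[i] * x[i];   /* widen before multiplying to limit overflow */
    }
    return y;
}
```

The multiplications do not depend on one another, so a circuit can perform all of them at once and then sum the products. One plausible reading of the slide's "~7 cycles" is a balanced adder tree, since log2(128) = 7, though the slide does not say so.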

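Frank Vahid, UCR 7 FPGAs "101" (A Quick Intro). [Figure labels: 4x2 memory with address inputs a1, a0 and data outputs d1, d0, used as a LUT with inputs a, b and outputs F, G; 2x2 switch matrix (SM) with signals w, x, y, z; FPGA built from LUTs and SMs with inputs a, b, c and outputs D, E.]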

Frank Vahid, UCR 8 HLS / FPGAs. HLS (high-level synthesis): compiler that converts a program to circuits. [Chart: performance (ms) of PC(1), PC(4), GPU, and HLS on the Weibel, Neuron, Weibel + gas, Hemodynamic, and Weibel + hemo models.] Speedup vs real-time: PC(1): 0.8x; PC(4): 3.1x; GPU: 1.6x; HLS/FPGA: 3.2x.

Frank Vahid, UCR 9 Network of synchronized PEs on FPGAs. General Processing Element (PE): iterative ODE solver (Euler/RK4), 0.1 ms / 0.01 ms timestep. 1 PE: 300 MHz. Figure labels: FPGA; digital mockup; PE.
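
To make "iterative ODE solver (Euler)" concrete, here is a minimal software sketch of forward Euler integration for a single equation. It is not the PE's hardware design. The names `ode_rhs`, `euler_integrate`, and `decay` are hypothetical, and the test equation dy/dt = -y is an assumption chosen only because its behavior is easy to check.

```c
#include <stddef.h>

/* Right-hand side of dy/dt = f(t, y), supplied by the caller. */
typedef double (*ode_rhs)(double t, double y);

/* Advance y from time t0 over `steps` forward-Euler steps of size h. */
double euler_integrate(ode_rhs f, double t0, double y0, double h, size_t steps)
{
    double t = t0;
    double y = y0;
    for (size_t i = 0; i < steps; i++) {
        y += h * f(t, y);   /* y_{n+1} = y_n + h * f(t_n, y_n) */
        t += h;
    }
    return y;
}

/* Illustrative right-hand side (an assumption, not from the talk): dy/dt = -y. */
double decay(double t, double y)
{
    (void)t;   /* this equation does not depend on t */
    return -y;
}
```

For example, `euler_integrate(decay, 0.0, 1.0, 0.01, 100)` advances one time unit in 100 steps. At the slide's 0.01 ms timestep, each millisecond of simulated time likewise costs 100 solver iterations per equation.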

Frank Vahid, UCR 10 Synthesis tool. Phase 1: Maps ODEs to virtual PEs using simulated annealing. Phase 2: Converts virtual PEs to physical circuits using FPGA place-route. Figure labels: 10K iterations; 150K iterations.

Frank Vahid, UCR 11 General PEs. [Chart: performance (ms) of PC(1), PC(4), GPU, HLS, and general PEs on the Weibel, Neuron, Weibel + gas, Hemodynamic, and Weibel + hemo models.] Speedup vs real-time: PC(1): 0.8x; PC(4): 3.1x; GPU: 1.6x; HLS: 3.2x; General PEs: 4.9x.

Frank Vahid, UCR 12 Problem: More PEs → lower frequency. Figure labels: inter-PE critical path; internal PE critical path (DSP, INST MEM, DATA MEM). [Chart: 11-gen Weibel model, Virtex6 240T FPGA, general PEs; real ODEs/sec vs. ODEs/sec lost due to frequency drop.]

Frank Vahid, UCR 13 Use model structure to improve. Graph embedding: map guest graph to host graph, minimizing max wire length. Guest: virtual PEs. Host: physical PEs. Avoid using FPGA placement (Phase 2).
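
The objective named on this slide, the maximum wire length of a guest-to-host mapping, can be sketched as below. The types `grid_pos` and `edge` and the function `max_wire_length` are hypothetical, and Manhattan distance on a 2D grid is an assumed metric, since the slide does not name one for embedding.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct { int row; int col; } grid_pos;   /* host: physical PE location */
typedef struct { size_t a; size_t b; } edge;     /* guest: link between two virtual PEs */

/* Longest wire over all guest edges when virtual PE i sits at place[i].
 * ASSUMPTION: wire length is Manhattan distance between grid positions. */
int max_wire_length(const edge *edges, size_t n_edges, const grid_pos *place)
{
    int worst = 0;
    for (size_t i = 0; i < n_edges; i++) {
        const grid_pos p = place[edges[i].a];
        const grid_pos q = place[edges[i].b];
        int len = abs(p.row - q.row) + abs(p.col - q.col);
        if (len > worst) {
            worst = len;
        }
    }
    return worst;
}
```

For a three-node guest graph where node 0 connects to nodes 1 and 2, putting node 0 in the middle of a row gives a maximum wire length of 1, while putting it at one end gives 2. An embedding algorithm aims to find the first kind of placement directly from the graph's structure.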

Frank Vahid, UCR 14 Phase 2 – Map virtual PEs to physical PEs. Embedding algorithms (guest to host): H-tree embedding; linear embedding; direct map embedding. [1] Zienicke, P. Embeddings of Treelike Graphs into 2-Dimensional Meshes. (WG '90). [2] Aleliunas, R., and Rosenberg, A.L. On Embedding Rectangular Grids in Square Grids. (Computers '82). [3] Berman, F., and Snyder, L. On mapping parallel algorithms into parallel architectures. (PDC, '87).

Frank Vahid, UCR 15 2D grid of physical PEs. Bypass FPGA placement. (Phase 1: May require "graph folding" first to reduce #PEs.) [Figure: equation pairs EqP1/EqV1 through EqP7/EqV7 assigned to cells of the PE grid on the FPGA.]

Frank Vahid, UCR 16 Compare/backup: Simulated annealing. Cost function: C = w1*sum + w2*max + w3*gaps, where sum = sum of wire distances, max = max wire length (Euclidean dist.), gaps = wires across architectural features. Neighbor function: swap PEs based on distance to neighbors. Figure labels: FPGA; P1; P2.
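
The cost function on this slide translates directly into code. In the sketch below the wire lengths are passed in precomputed, and `gaps` is the count of wires that cross architectural features. The function name `placement_cost` and the parameter layout are assumptions, and the slide gives no values for the weights w1, w2, w3.

```c
#include <stddef.h>

/* Placement cost C = w1*sum + w2*max + w3*gaps.
 *   sum  = total of all wire lengths
 *   max  = longest single wire
 *   gaps = number of wires crossing architectural features
 * The caller computes the wire lengths (Euclidean distance on the slide). */
double placement_cost(const double *wire_len, size_t n_wires, size_t gaps,
                      double w1, double w2, double w3)
{
    double sum = 0.0;
    double max = 0.0;
    for (size_t i = 0; i < n_wires; i++) {
        sum += wire_len[i];
        if (wire_len[i] > max) {
            max = wire_len[i];
        }
    }
    return w1 * sum + w2 * max + w3 * (double)gaps;
}
```

With wire lengths {1, 2, 3}, two gap crossings, and illustrative weights 1, 10, 100, the cost is 6 + 30 + 200 = 236. A simulated-annealing placer would evaluate this after each PE swap and keep or reject the swap based on the change.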

Frank Vahid, UCR 17 Results. [Figure panels: no placement strategy; simulated annealing placement; embedding placement. 4 generations shown; 5 generations shown.]

Frank Vahid, UCR 18 Results. 2D Neuron model, 256 PEs, Xilinx Virtex6. [Table: Strategy (None, SA, Embed) vs. # LUTs, # BRAM, # DSP, equivalent LUTs; annotations: "Not routable", "No impact on size".] [Table: Strategy (None, SA, Embed) vs. total power (mW), dynamic power (mW), static power (mW); annotation: "% more power".]

Frank Vahid, UCR 19 Graph emb (Gen PEs). Miller, B., F. Vahid, and T. Givargis. Embedding-Based Placement of Processing Element Networks on FPGAs for Physical Model Simulation. ACM Int. Symp. on FPGAs. Speedup vs real-time (avg): PC(1): 0.8x; PC(4): 3.1x; GPU: 1.6x; HLS: 3.2x; General PE: 4.9x; Grph emb (GPE): 11.2x.

Frank Vahid, UCR 20 Custom Processing Element: custom datapath to solve a specific type of equation. Example equations: $V' = F_1 - F_2$ and $F' = P_1 - P_2 - (F \cdot C_R) \cdot C_L$. Custom PE for each ODE type. Modified synthesis tool to create custom PEs for the given ODEs first, then synthesize ODEs to PEs. Figure labels: MUL; SUB; Const ROM; Data RAM; Controller; Address; Input_sel; Inputs; Output; We; FPGA; digital mockup; interface.
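
As a software sketch of what a custom datapath for this slide's two equations computes, the function below applies one forward-Euler step to V' = F1 - F2 and F' = P1 - P2 - (F*C_R)*C_L. The struct `pe_state`, the function `custom_pe_step`, and the choice of Euler rather than RK4 are assumptions for illustration. The equation for F' is taken as written in the transcript.

```c
/* State held by one hypothetical custom PE: the two integrated variables. */
typedef struct {
    double v;   /* state variable V */
    double f;   /* state variable F */
} pe_state;

/* One forward-Euler step of size h for
 *   V' = F1 - F2
 *   F' = P1 - P2 - (F * C_R) * C_L
 * f1, f2, p1, p2 are neighbor inputs; c_r and c_l are constants. */
pe_state custom_pe_step(pe_state s, double f1, double f2, double p1, double p2,
                        double c_r, double c_l, double h)
{
    double dv = f1 - f2;
    double df = p1 - p2 - (s.f * c_r) * c_l;
    pe_state next = { s.v + h * dv, s.f + h * df };
    return next;
}
```

Each step needs only a few subtractions and multiplications, which is consistent with the SUB and MUL blocks labeled in the slide's datapath figure. A general PE would instead fetch and decode instructions to do the same work.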

Frank Vahid, UCR 21 Custom PEs. [Chart: performance (ms) of PC(1), PC(4), GPU, HLS, general PEs, and custom PEs on the Weibel, Neuron, Weibel + gas, Hemodynamic, and Weibel + hemo models.] Huang, Vahid, Givargis. Synthesis of networks of custom processing elements for real-time physical system emulation. Transactions on Design Automation of Electronic Systems (TODAES), 2013 (to appear). Speedup vs real-time (avg): PC(1): 0.8x; PC(4): 3.1x; GPU: 1.6x; HLS: 3.2x; General PE: 4.9x; Grph emb (GPE): 11.2x; Custom PE: 6.1x.

Frank Vahid, UCR 22 Networks of Heterogeneous PEs. Huang, Miller, Vahid, Givargis. Synthesis of Heterogeneous Processing Elements for Physical System Emulation. CODES+ISSS 2012, Oct. General PE: slow, flexible (can solve any type of ODE). Custom PE: fast, inflexible (only solves one type of ODE). Multi-type PE: combines multiple types of ODEs into a single custom PE. Huge solution space: How to choose types of PEs? How many PEs to allocate? How to bind ODEs to PEs? Figure labels: FPGA; digital mockup; interface.

Frank Vahid, UCR 23 Automatic allocation and binding (simulated annealing). [Flowchart labels: initial random allocation; PE allocator; ODE-to-PE mapper; cycles of each PE; better solution? (Y/N); new PE allocation; best solution.]

Frank Vahid, UCR 24 Heterogeneous PEs. [Chart: performance (ms) of PC(1), PC(4), GPU, HLS, general PEs, custom PEs, and heterogeneous PEs on the Weibel, Neuron, Weibel + gas, Hemodynamic, and Weibel + hemo models.] C. Huang, B. Miller, F. Vahid, T. Givargis. Synthesis of Custom Networks of Heterogeneous Processing Elements for Complex Physical System Emulation. IEEE/ACM Conf. on Hardware/Software Codesign and System Synthesis (CODES/ISSS, part of ESWEEK), Finland, Oct. Speedup vs real-time (avg): PC(1): 0.8x; PC(4): 3.1x; GPU: 1.6x; HLS: 3.2x; General PE: 4.9x; Grph emb (GPE): 11.2x; Custom PE: 6.1x; Heterog PE: 34.5x.

Frank Vahid, UCR 25 Network of general/custom/heterogeneous PEs vs. HLS (regularity extraction). Heterogeneous PE (speed, size) relative to each alternative: (10x, 1.1x) vs. HLS; (7x, 0.85x) vs. general PE; (6x, 1.35x) vs. custom PE. Performance (ms): time to emulate 1000 ms, using Euler with a 0.01 ms step. Size: equivalent LUTs.

Frank Vahid, UCR 26 Speedup / dollar. CPU (I Intel X58 board): $480. GPU (GTX460 + I H55 board): $380. FPGA (Xilinx Virtex6 240T-2 board): $1800. Heterogeneous PEs: 3x better than PC(4); 4.5x better than GPU. FPGA: easier to build custom interfaces.
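
The slide's "3x" and "4.5x" figures can be reproduced with simple arithmetic, assuming they combine the board prices above with the average speedups reported elsewhere in the talk (heterogeneous PEs 34.5x, PC(4) 3.1x, GPU 1.6x). That pairing is an assumption, and the function names below are hypothetical.

```c
/* Speedup obtained per dollar of board cost. */
double speedup_per_dollar(double speedup, double board_cost_usd)
{
    return speedup / board_cost_usd;
}

/* How many times better platform A is than platform B in speedup per dollar. */
double cost_effectiveness_ratio(double speedup_a, double cost_a,
                                double speedup_b, double cost_b)
{
    return speedup_per_dollar(speedup_a, cost_a) /
           speedup_per_dollar(speedup_b, cost_b);
}
```

Under that assumption, `cost_effectiveness_ratio(34.5, 1800.0, 3.1, 480.0)` is about 2.97 and `cost_effectiveness_ratio(34.5, 1800.0, 1.6, 380.0)` is about 4.55. Both agree with the rounded values on the slide.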

Frank Vahid, UCR 27 Other projects. Assistive monitoring: ..\Desktop\Fall montage.mp4; ..\Desktop\Frank_pullChair_013113_cam3.video.wmv. Web-based learning: "Textbook is dead"; multi-univ synergy; pcpp.zyante.com (C++). Embedded systems educ.: new prog. model, virtual lab, programmingembeddedsystems.com; also riosscheduler.org. Drunk driving (DUI): ..\Desktop\dui.MOV; duicam.org; drunken-driving-app/

Frank Vahid, UCR 28 Summary. Speedup vs real-time (avg): PC(1): 0.8x; PC(4): 3.1x; GPU: 1.6x; HLS: 3.2x; General PE: 4.9x; Grph emb (GPE): 11.2x; Custom PE: 6.1x; Heterog PE: 34.5x (Grph emb + HPE: 48.5x). FPGAs: fastest cost-effective execution of physical models. Future: manycore device; beyond testing CPS, implement end-products.

Frank Vahid, UCR 29 Questions?