Download presentation
Presentation is loading. Please wait.
Published byWilfrid Gordon Modified over 9 years ago
1
Frank Vahid, UCR 1 Building Fake Body Parts: Digital Mockups Frank Vahid Univ. of California, Riverside Support provided by NSF, SRC, Dept. of Educ. Also CareFusion, Xilinx, METI Chen Huang (UC Riverside, now Amazon) Bailey Miller (UC Riverside, intern at SpaceX) Prof. Tony Givargis (UC Irvine) Ting-Shuo Chou (UC Irvine) Others...
2
Frank Vahid, UCR 2Bailey Miller, UCR 2
3
Frank Vahid, UCR 3 Models of physical world that run in real-time http://www.nhlbi.nih.gov/ Test cyber-physical systems Physical mockup Transducer models Environment model Digital mockup
4
Frank Vahid, UCR 4 Issue: Real-time achieved via inaccuracy Frank Vahid, UCR 4 Weibel lung complexity 4 gen: 32 ODEs 6 gen: 128 ODEs 8 gen: 512 ODEs 10 gen: 2048 ODEs “2-3 minutes to simulate one breath accurately” V[1],R[1] V[2],R[2] V[7],R[7]
5
Frank Vahid, UCR 5 0 100 200 300 400 500 600 700 800 900 1000 Weibel Neuron Weibel + gas Hemodynamic Weibel + hemo Performance (ms) PC(1) PC(4) GPU Speedup vs real-time PC(1): 0.8x PC(4): 3.1x GPU: 1.6x PC & GPU 1430 1490 1522 1184 Parallel computations + Neighbor communication Seem like great match for FPGAs
6
Frank Vahid, UCR 6 for (i=0; i < 128; i++) y[i] += c[i] * x[i].. FPGAs: Sw circuits (parallel) for (i=0; i < 128; i++) y += c[i] * x[i].. ************ ++++++ + ++ ++ + C Code for FIR Filter Processor 1000’s of instructions –Several thousand cycles Circuit for FIR Filter Processor FPGA ~ 7 cycles (though slower clock) Speedup > 10x-100x
7
Frank Vahid, UCR 7 2x2 switch matrix y z 0 1 0 1 w x FPGAs “101” (A Quick Intro) ab a1a0a1a0 4x2 Memory abab d 1 d 0 F G 00 01 10 11 LUT FG 10 1 1 1 0 1 0 1 0 00 01 10 11 ab 0 0 1 1 1 0 1 0 1 0 SM ab c D E FPGA abc D E 1 1 0 1 0 1 0 0110 1100 00001111
8
Frank Vahid, UCR 8 0 100 200 300 400 500 600 700 800 900 1000 Weibel Neuron Weibel + gas Hemodynamic Weibel + hemo Performance (ms) PC(1) PC(4) GPU HLS / FPGAs Speedup vs real-time PC(1): 0.8x PC(4): 3.1x GPU: 1.6x HLS/FPGA: 3.2x HLS 1430 1490 1522 1184 High-level synthesis: Compiler that converts program to circuits
9
Frank Vahid, UCR 9 Network of synchronized PEs on FPGAs FPGA Digital mockup General Processing Element Iterative ODE solver (Euler/RK4) 0.1 ms / 0.01 ms timestep PE 1 PE: 300 MHz
10
Frank Vahid, UCR 10 Synthesis tool 10K iterations 150K iterations Convert virtual PEs to physical circuits using FPGA place-route 1 2 Phase Maps ODEs to virtual PEs using simulated annealing
11
Frank Vahid, UCR 11 0 100 200 300 400 500 600 700 800 900 1000 Weibel Neuron Weibel + gas Hemodynamic weibel + hemo Performance (ms) PC(1) PC(4) GPU HLS General PEs Speedup vs real-time PC(1):0.8x PC(4):3.1x GPU:1.6x HLS:3.2x General PEs:4.9x General PEs 1430 1490 1522 1184
12
Frank Vahid, UCR 12 Problem: More PEs Lower frequency FPGA Inter-PE critical path FPGA DSP INST MEM DATA MEM Internal PE critical path 11-gen Weibel model, Virtex6 240T FPGA, general PEs Real ODEs/sec Lost ODEs/sec due to freq drop
13
Frank Vahid, UCR 13 FPGA Use model structure to improve Graph embedding: Map guest graph to host graph, minim. max wire length Guest Host Virtual PEs Physical PEs Avoid using FPGA placement (Phase 2)
14
Frank Vahid, UCR 14 FPGA 12 3 … Phase 2 – Map virtual PEs to physical PEs Embedding algorithm H-tree embedding Linear embedding Direct map embedding Guest Host [1] Zienicke, P. 1990. Embeddings of Treelike Graphs into 2-Dimensional Meshes. (WG '90). [2] Aleliunas, R., and Rosenberg, A.L. 1982. On Embedding Rectangular Grids in Square Grids. (Computers ‘82). [3] Berman, F., and Snyder, L. 1987. On mapping parallel algorithms into parallel architectures, (PDC, ‘87).
15
Frank Vahid, UCR 15 2D grid of physical PEs EqP1 EqV1 EqP2 EqV2 EqP3 EqV3 EqP4 EqV4 EqP7 EqV7 EqP5 EqV5 EqP6 EqV6 FPGA Bypass FPGA placement EqP1 EqV1 EqP4 EqV4 EqP2 EqV2 EqP5 EqV5 EqP2 EqV2 EqP6 EqV6 EqP7 EqV7 (Phase 1: May require "graph folding" first to reduce #PEs)
16
Frank Vahid, UCR 16 FPGA Compare/backup: Simulated annealing Cost function: C = w1*sum + w2*max + w3*gaps Sum = sum of wire distances Max = max wire length (Euclidean dist.) Gaps = wires across architectural features FPGA P1 P2 Neighbor function: Swap PEs based on distance to neighbors
17
Frank Vahid, UCR 17 Results No placement strategy Simulated annealing placement Embedding placement 4 generations shown5 generations shown
18
Frank Vahid, UCR 18 Results Not routable Strategy# LUTS# BRAM#DSPEquivalent LUTs None58362512256306682 SA58567512256306887 Embed58569512256306889 No impact on size 2D Neuron model - 256PE – Xilinx Virtex6 StrategyTotal power (mW) Dynamic power (mW) Static power (mW) None1552587446481 SA16604100136590 Embed19859129996859 20% more power
19
Frank Vahid, UCR 19 Miller, B., F. Vahid, and T. Givargis. Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation. ACM Int. Symp. on FPGAs, 2013. Graph emb (Gen PEs) Speedup vs real-time (avg) PC(1): 0.8x PC(4): 3.1x GPU: 1.6x HLS: 3.2x General PE: 4.9x Grph emb(GPE): 11.2x
20
Frank Vahid, UCR 20 Custom Processing Element Custom datapath to solve specific type of equation MUL Const ROM Address Input_sel Address Inputs Output SUB Controller We Data RAM Controller PE SUBMUL FPGA Digital mockup Interface V’ = F 1 – F 2 F’ = P 1 -P 2 -(F*C R )*C L Custom PE for each ODE type Modified synthesis tool to create custom PEs for given ODEs first, then synthesis ODEs to PEs
21
Frank Vahid, UCR 21 0 100 200 300 400 500 600 700 800 900 1000 Weibel Neuron Weibel + gas Hemodynamic weibel + hemo Performance (ms) PC(1) PC(4) GPU HLS General PEs Custom PEs Huang, Vahid, Givargis. Synthesis of networks of custom processing elements for real-time physical system emulation. Transactions on Design Automation of Electronic Systems (TODAES), 2013 (to appear). Custom PEs 1430 1490 1522 1184 Speedup vs real-time (avg) PC(1): 0.8x PC(4): 3.1x GPU: 1.6x HLS: 3.2x General PE: 4.9x Grph emb(GPE): 11.2x Custom PE: 6.1x
22
Frank Vahid, UCR 22 Networks of Heterogeneous PEs Huang, Miller, Vahid, Givargis. Synthesis of Heterogeneous Processing Elements for Physical System Emulation. CODES+ISSS 2012, Oct, 2012. General PE: –Slow, flexible (can solve any types of ODEs) Custom PE: –Fast, inflexible (only solves one type of ODEs) Multi-Type PE –Combined multiple types of ODEs into single custom PE FPGA Digital mockup Interface Huge solution space: How to choose types of PEs? How many PEs to allocate? How to bind ODEs to PEs?
23
Frank Vahid, UCR 23 Automatic allocation and binding Initial random allocation PE allocator ODE-to-PE mapper New PE allocation Cycles of each PE Better solution Best solution N Y Simulated annealing
24
Frank Vahid, UCR 24 0 100 200 300 400 500 600 700 800 900 1000 Weibel Neuron Weibel + gas Hemodynamic weibel + hemo Performance (ms) PC(1) PC(4) GPU HLS General PEs Custom PEs Heterogeneous PEs C. Huang, B. Miller, F. Vahid, T. Givargis. Synthesis of Custom Networks of Heterogeneous Processing Elements for Complex Physical System Emulation. IEEE/ACM Conf on Hardware/Software Codesign and System Synthesis (CODES/ISSS, part of ESWEEK), Finland, Oct 2012. 1430 1490 1522 1184 Speedup vs real-time (avg) PC(1): 0.8x PC(4): 3.1x GPU: 1.6x HLS: 3.2x General PE: 4.9x Grph emb(GPE): 11.2x Custom PE: 6.1x Heterog PE: 34.5x
25
Frank Vahid, UCR 25 Network of general/custom/heterogeneous PEs VS HLS (regularity extraction) Heterogeneous PE: (10x, 1.1x) HLS (7x, 0.85x) general PE (6x, 1.35x) custom PE (Speed, Size) Performance (ms): time to emulate 1000 ms, using Euler with 0.01 ms step. Size (equivalent LUTs)
26
Frank Vahid, UCR 26 Speedup / dollar CPU (I7-950 + Intel X58 board): $480 GPU(GTX460 + I3-540 + H55 board): $380 FPGA (Xilinx Virtex6 240T-2 board): $1800 Heterogeneous PEs: 3X better than PC(4) 4.5x better than GPU FPGA: Easier to build custom interfaces
27
Frank Vahid, UCR 27 Other projects Assistive monitoring -www.cs.ucr.edu/~vahid/assistivemonitoring/www.cs.ucr.edu/~vahid/assistivemonitoring/ -http://www.youtube.com/watch?feature=player_embedded&v=Sf8tU-78lXshttp://www.youtube.com/watch?feature=player_embedded&v=Sf8tU-78lXs –..\Desktop\Fall montage.mp4..\Desktop\Frank_pullChair_013113_cam3.video.wmv..\Desktop\Fall montage.mp4..\Desktop\Frank_pullChair_013113_cam3.video.wmv Web-based learning –"Textbook is dead" –Multi-univ synergy –pcpp.zyante.com (C++)pcpp.zyante.com Embedded systems educ. –New prog. model, virtual lab, programmingembeddedsystems.com programmingembeddedsystems.com –Also riosscheduler.orgriosscheduler.org Drunk driving (DUI) –..\Desktop\dui.MOV..\Desktop\dui.MOV –duicam.orgduicam.org http://www.utsandiego.com/news/2013/feb/11/ucr- drunken-driving-app/http://www.utsandiego.com/news/2013/feb/11/ucr- drunken-driving-app/
28
Frank Vahid, UCR 28 Summary Frank Vahid, UCR 28 http://www.meti.com/ Speedup vs real-time (avg) PC(1): 0.8x PC(4): 3.1x GPU: 1.6x HLS: 3.2x General PE: 4.9x Grph emb(GPE): 11.2x Custom PE: 6.1x Heterog PE: 34.5x (Grph emb+HPE: 48.5x) FPGAs: Fastest cost- effective execution of physical models –http://www.youtube.com/watch?v=ThUKVhqoA3Qhttp://www.youtube.com/watch?v=ThUKVhqoA3Q Future –Manycore device –Beyond testing CPS Implement end-products
29
Frank Vahid, UCR 29 Questions? Frank Vahid, UCR 29
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.