Machine Learning at the Edge

Presentation transcript:

Machine Learning at the Edge
High velocity data inferencing
Audrey Corbeil Therrien, Omar Quijano, Averell Gatton, Ryan Coffee

High velocity data - Timetool
10-100 µs latency
Schematic of timetool to laser sync
Image of timetool waveform?

High velocity data - CookieBox
Schematic of CookieBox
Downstream veto + time binning
Image of CookieBox single pulse, double pulse
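The downstream veto and time binning named on this slide could be sketched as a small routing step that runs after inference. This is only an illustration under stated assumptions: the `Inference` record, the class labels `"single"` and `"double"`, the `delay_fs` field, and the 1 fs bin width are all hypothetical and do not come from the slides.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class Inference:
    """Hypothetical per-shot model output (names are assumptions)."""
    label: str       # e.g. "single" or "double" pulse
    delay_fs: float  # inferred timing value used for binning


def route_shot(
    inf: Inference,
    veto_labels: FrozenSet[str] = frozenset({"double"}),
    bin_width_fs: float = 1.0,
) -> Optional[int]:
    """Return a time-bin index for a kept shot, or None if the shot is vetoed."""
    if inf.label in veto_labels:
        return None  # veto: drop the shot before it reaches storage
    # floor() keeps binning consistent for negative delays as well
    return math.floor(inf.delay_fs / bin_width_fs)
```

Returning `None` for a veto keeps the decision and the bin index in one value, which is convenient for a sketch; a real pipeline would more likely carry an explicit flag alongside the shot.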

Workflow
Diagram of the relationship between FPGA, GPU, and CPU and their roles
FPGA: fast inference (ultimately ASIC?)
GPU: training the model
CPU: physics simulations providing ground truth; understanding the physics of corner cases
Send corner cases back for training
Smart initializing: model library
Confidence metric: selects corner cases for further training; recommends reloading a new model to scientists when off track
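One way the confidence metric in this workflow might behave is sketched below. The slides do not define the metric, so everything here is an assumption: confidence is taken as the largest class probability, shots below a threshold are collected as corner cases for further training, and a model reload is recommended once a rolling window contains too many low-confidence shots. The class name `ConfidenceMonitor` and all parameter values are hypothetical.

```python
from collections import deque
from typing import Deque, List, Sequence


class ConfidenceMonitor:
    """Hypothetical confidence tracker; not the authors' actual metric."""

    def __init__(self, threshold: float = 0.8, window: int = 100,
                 reload_fraction: float = 0.5) -> None:
        self.threshold = threshold
        self.reload_fraction = reload_fraction
        self._recent: Deque[bool] = deque(maxlen=window)  # True = low confidence
        self.corner_cases: List[int] = []  # shot ids to send back for training

    def observe(self, shot_id: int, probabilities: Sequence[float]) -> float:
        """Record one inference result and return its confidence."""
        confidence = max(probabilities)  # assumed metric: top class probability
        low = confidence < self.threshold
        self._recent.append(low)
        if low:
            self.corner_cases.append(shot_id)
        return confidence

    def should_reload(self) -> bool:
        """Recommend a new model once the full window is mostly low confidence."""
        if len(self._recent) < self._recent.maxlen:
            return False  # not enough history yet
        return sum(self._recent) / len(self._recent) >= self.reload_fraction
```

In this sketch the monitor only recommends a reload; the decision stays with the scientists, as the slide describes.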

Hardware
CPU Simulation
  Dual CPU: Intel Xeon Gold 6148
  20 physical / 40 logical cores
  2.4 GHz at TDP 150 W, 3.7 GHz TB
  RAM: 764 GB DDR4
GPU Simulation
  NVIDIA Tesla P4 (Inferencing Accelerator): 5.5 TF, 15x IL, 60X EF
  NVIDIA Tesla P40 (GPU Accelerator): 3,840 cores, 24 GB GDDR5
  NVIDIA Tesla V100: 640 Tensor Cores, 5,120 cores, 32 GB
FPGA
  KCU 1500 Hardware Accelerator
2TB NVMe

Software

Timeline
Present: CPU Simulation
In Progress: GPU Simulation (CUDA 10, cuDNN 7.4, NCCL 2.4, TensorRT 5.0)
In Progress: FPGA (JDK 1.8, Scala 2.12, Spatial, sbt)
Milestone: Pod Integration, Fast Communication, NVMe Cache, Visualization
Figure, left: spatially multiplexed 2D timetool signal (January 2018). Right: result of the convolution with the simulated waveform.
50% RAM / 80 CPU
Data visualization for interactive mode
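The convolution with the simulated waveform mentioned in the figure caption can be illustrated with a minimal 1D sketch. Several things here are assumptions: the real signal is a spatially multiplexed 2D image rather than a 1D trace, the step-shaped template stands in for the simulated waveform, and the code uses a sliding dot product (cross-correlation, which equals convolution with the reversed template) instead of whatever the authors run on the CPU. The function names are hypothetical.

```python
from typing import List, Sequence


def cross_correlate(signal: Sequence[float], template: Sequence[float]) -> List[float]:
    """Sliding dot product of the template over the signal ("valid" positions only)."""
    n = len(template)
    return [
        sum(signal[i + j] * template[j] for j in range(n))
        for i in range(len(signal) - n + 1)
    ]


def find_edge(signal: Sequence[float], template: Sequence[float]) -> int:
    """Return the sample index where the template matches best, at its centre."""
    scores = cross_correlate(signal, template)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best + len(template) // 2


# Toy trace: flat, then a step at sample 20 (stand-in for a timetool edge).
trace = [0.0] * 20 + [1.0] * 20
# Zero-mean step template (stand-in for the simulated waveform).
step_template = [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]
edge_index = find_edge(trace, step_template)
```

A zero-mean template makes the score insensitive to a constant baseline, which is why the flat regions of the toy trace score zero and only the step stands out.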

Conclusion
Detectors provide data, users need information
ML can convert data into information with low latency and act on the information to:
  veto bad shots
  bin specific cases
  protect detectors
We are building the software and hardware architecture for this objective
Technology extendable to other detectors
Poster presentation - Using FPGA for fast inferencing