
Standards for Machine Learning, Inferencing and Vision Acceleration


1 Standards for Machine Learning, Inferencing and Vision Acceleration
Neil Trevett | Khronos President | NVIDIA VP Developer Ecosystems

2 Welcome to Khronos BOF Sessions!
- 9:30AM-12:30PM - 3D Graphics and Virtual Reality with Vulkan and OpenXR: Neil Trevett, NVIDIA - Introduction to Vulkan and OpenXR; Hai Nguyen, Google - Vulkan: Getting Started, Best Practices and using HLSL; Ashley Smith, AMD - Vulkan Memory and Heap Types - and When to Use Which!; Nuno Subtil, NVIDIA - Using Subgroups to Bring More Parallelism to your Application; Teemu Virolainen, Basemark - Vulkan in the Rocksolid Engine
- 2:30PM-4:30PM - WebGL: Latest Techniques: Xavier Ho, CSIRO's Data61, and Juan Miguel de Joya, UN ITU - Welcome!; Neil Trevett, NVIDIA - A Brief History of WebGL; Xavier Ho and Juan Miguel de Joya - Optimizing your WebGL; Xavier Ho and Juan Miguel de Joya - Survey of WebGL Applications; Mike Alger and Germain Ruffle, Google - User Experiences for Digital/Augmented Spaces
- 4:30PM-5:30PM - glTF: Efficient 3D Models: Neil Trevett, NVIDIA - glTF Overview and Latest Updates; Mark Callow, glTF Working Group - Universal Compressed Texture Transmission Format Update
- 5:30PM-5:35PM - Japanese Greeting and Invitation to Khronos!: Hitoshi Kasai, Khronos - Learn how your company can benefit from joining Khronos!
- 5:35PM-6PM - Standards for machine learning, inferencing and vision acceleration: NNEF, OpenVX and OpenCL: Neil Trevett, NVIDIA - Overview of NNEF, OpenVX and OpenCL and how they relate to each other to deliver effective acceleration for inferencing, machine learning and vision processing

All slides will be posted at

3 Khronos Mission
Khronos is an International Industry Consortium creating royalty-free, open standards to enable software to access hardware acceleration for 3D Graphics, Virtual and Augmented Reality, Parallel Computing, Neural Networks and Vision Processing

4 Khronos Standards for AR and VR
Khronos standards span the AR/VR pipeline:
- Download of 3D object and scene data
- Vision and sensor processing - including neural network inferencing for machine learning
- High-performance, low-latency 3D graphics
- Portable interaction with VR/AR sensor, haptic and display devices

5 Machine Learning Acceleration
Training on desktop and cloud: training data sets feed neural net training frameworks, accelerated on desktop and cloud GPUs/TPUs, to produce trained networks.
Deployment on embedded devices: trained networks are optimized and ingested by vision and inferencing runtimes, which process live data for vision and inferencing applications on diverse inferencing acceleration hardware.

6 Machine Learning Training
Neural net training frameworks handle authoring and interchange. GPUs have well-established APIs and libraries for compute acceleration on desktop and cloud hardware - cuDNN, MIOpen and clDNN libraries, and TPUs.

7 Machine Learning Acceleration
(Recap of the machine learning acceleration pipeline from slide 5: training on desktop and cloud produces trained networks, which are optimized and deployed to vision and inferencing runtimes on embedded devices.)

8 NNEF - Neural Network Exchange Format
Before NNEF: every tool needs an exporter to every accelerator - each NN authoring framework (1, 2, 3) needs a dedicated path to each inference engine (1, 2, 3).
With NNEF: each authoring framework exports once to NNEF, each inference engine imports once from NNEF, and optimization and processing tools can operate on the common format in between.
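The converter-count argument above can be sketched in a few lines: without a shared interchange format, N frameworks and M engines need N×M pairwise exporters; with one, only N+M converters are needed. This is a purely illustrative sketch, not NNEF tooling.

```python
# Illustrative sketch of the N x M vs N + M converter argument behind NNEF.

def converters_without_interchange(frameworks, engines):
    """Every framework needs a dedicated exporter for every engine: N x M."""
    return [(f, e) for f in frameworks for e in engines]

def converters_with_interchange(frameworks, engines):
    """Each framework exports once, each engine imports once: N + M."""
    return [(f, "NNEF") for f in frameworks] + [("NNEF", e) for e in engines]

frameworks = ["Framework 1", "Framework 2", "Framework 3"]
engines = ["Engine 1", "Engine 2", "Engine 3"]

print(len(converters_without_interchange(frameworks, engines)))  # 9 pairwise exporters
print(len(converters_with_interchange(frameworks, engines)))     # 6 converters total
```

The gap widens as the ecosystem grows: ten frameworks and ten engines need 100 pairwise exporters but only 20 converters through a common format.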

9 NNEF 1.0
NNEF V1.0 was released in August 2018, after positive industry feedback on the provisional specification released in December 2017 - and is being presented here at SIGGRAPH Asia!
- TensorFlow and Caffe import/export: live. Uses the Python script that builds the TensorFlow graphs to capture higher-level functions and export compound operations.
- TensorFlow and Caffe2 import/export: imminent. Easy to use, as conversion needs just trained protobuf files.
- NNEF open source projects are hosted on the Khronos NNEF GitHub repository under the Apache 2.0 license: syntax parser/validator, OpenVX ingestion and execution, Google NNAPI converter.
- Tools roadmap: multiple converters in a unified tool, published as a Python package.

10 NNEF and ONNX - Comparing Neural Network Exchange Industry Initiatives
Same industry dynamics as LLVM and SPIR-V: Khronos tried to use LLVM as a hardware IR, but LLVM evolves without needing to preserve backwards compatibility - fine for software compilers, very difficult to manage for hardware toolchains and roadmaps. So Khronos created the hardware-oriented SPIR-V with bidirectional translation to LLVM.
- ONNX: initiated by Facebook; training interchange defined by an open source project; software stack flexibility; will HAVE to move fast to track authoring framework interchange.
- NNEF: embedded inferencing import; a defined specification with multi-company governance; stability for hardware deployment; adopts a rigorous approach to design life cycles - especially needed for safety-critical or mission-critical applications in automotive, industrial and infrastructure markets.
ONNX and NNEF are complementary - NNEF provides a stable bridge from training into edge inferencing engines. A bidirectional translator is being initiated in open source.

11 Machine Learning Acceleration
(Recap of the machine learning acceleration pipeline from slide 5: training on desktop and cloud produces trained networks, which are optimized and deployed to vision and inferencing runtimes on embedded devices.)

12 Platform Inferencing Stacks
Platform inferencing stacks follow three consistent steps:
1. Import a trained NN model file
2. Build an optimized version of the graph
3. Run the graph on an accelerated runtime using the underlying low-level API
Examples: Apple macOS and iOS - Core ML; Microsoft Windows - Windows Machine Learning (WinML); Google Android - Neural Networks API (NNAPI).
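The three-step pattern shared by these stacks can be sketched as follows. The classes and layer names here are mock stand-ins for illustration, not any vendor's actual API - each real stack exposes its own import, compile and execute calls.

```python
# Conceptual sketch of the import -> optimize -> run pattern shared by
# platform inferencing stacks. All names are illustrative stand-ins.

class OptimizedGraph:
    def __init__(self, layers):
        # Step 2 in miniature: group adjacent layers into fused executable stages.
        self.stages = [tuple(layers[i:i + 2]) for i in range(0, len(layers), 2)]

    def run(self, x):
        # Step 3: execute the optimized stages on the (mock) accelerated runtime.
        for stage in self.stages:
            x = x + len(stage)  # placeholder for real layer math
        return x

def import_model(model_file):
    # Step 1: import a trained NN model file (here, just a list of layer names).
    return model_file["layers"]

model_file = {"layers": ["conv", "relu", "pool", "fc"]}
graph = OptimizedGraph(import_model(model_file))
print(graph.run(0))  # 4: two fused stages of two layers each
```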

13 NNVM - Open Compiler for AI Inferencing
NNVM, from the Paul G. Allen School of Computer Science & Engineering, University of Washington:
1. Import a trained network description
2. Apply graph-level optimizations
3. Decompose to primitive instructions and emit programs for accelerated runtimes - LLVM IR for CPUs, SPIR-V IR for parallel accelerators (backend in development)
The Facebook Glow compiler (Graph Lowering Optimizations) takes a similar approach.

14 OpenVX - Portable, Efficient Vision Processing!
There is a wide range of vision hardware architectures, spanning roughly X1 to X100 in power efficiency versus computation flexibility: multi-core CPUs, GPU compute, vision DSPs, and dedicated hardware. OpenVX provides a high-level graph-based abstraction - a vision processing graph of vision nodes - which enables graph-level optimizations and can be implemented on almost any hardware or processor. Implementations are shipping.
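Why does a graph-level abstraction enable optimization? Because the runtime sees the whole pipeline before executing it, so it can, for example, fuse adjacent nodes into a single kernel and skip writing intermediate images back to memory. A minimal sketch of such a fusion pass, with illustrative node names rather than real OpenVX kernels:

```python
# Minimal sketch of a graph-level fusion pass: merge adjacent fusable nodes
# into one kernel so no intermediate buffer is needed. Node names illustrative.

FUSABLE = {("convolve", "relu"), ("scale", "offset")}

def fuse(graph):
    """Greedily merge adjacent fusable node pairs into single kernels."""
    out, i = [], 0
    while i < len(graph):
        if i + 1 < len(graph) and (graph[i], graph[i + 1]) in FUSABLE:
            out.append(graph[i] + "+" + graph[i + 1])  # one fused kernel
            i += 2
        else:
            out.append(graph[i])
            i += 1
    return out

pipeline = ["scale", "offset", "convolve", "relu", "threshold"]
print(fuse(pipeline))  # ['scale+offset', 'convolve+relu', 'threshold']
```

A node-at-a-time API (like classic OpenCV calls) cannot make this transformation, because each operation is dispatched before the next one is known.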

15 Extending OpenVX for Inferencing
Neural Network extension: OpenVX nodes represent common NN layers, connected by 1D-4D tensors with INT16, INT7.8, INT8 and U8 tensor operations - vxActivationLayer, vxConvolutionLayer, vxDeconvolutionLayer, vxFullyConnectedLayer, vxNormalizationLayer, vxPoolingLayer, vxSoftmaxLayer, vxROIPoolingLayer.
NNEF Translator: ingests an NNEF file and builds an OpenVX node graph (open source project in progress). An OpenVX graph can then mix CNN nodes with traditional vision nodes, native camera control and downstream application processing.
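At its core, such a translator walks the imported operation list and emits the corresponding NN-extension node for each layer. A sketch of that mapping step - the vx* node names are the ones listed above, but the NNEF operation names and the table itself are illustrative assumptions, not the real translator:

```python
# Sketch of the core of an NNEF-to-OpenVX translation step. The NNEF op
# names and this mapping table are illustrative assumptions.

NNEF_TO_OPENVX = {
    "conv":     "vxConvolutionLayer",
    "deconv":   "vxDeconvolutionLayer",
    "relu":     "vxActivationLayer",
    "max_pool": "vxPoolingLayer",
    "linear":   "vxFullyConnectedLayer",
    "softmax":  "vxSoftmaxLayer",
}

def translate(nnef_ops):
    """Map each NNEF operation to an OpenVX node, failing loudly on gaps."""
    missing = [op for op in nnef_ops if op not in NNEF_TO_OPENVX]
    if missing:
        raise ValueError(f"no OpenVX node for: {missing}")
    return [NNEF_TO_OPENVX[op] for op in nnef_ops]

print(translate(["conv", "relu", "max_pool", "linear", "softmax"]))
```

A real translator must also wire up the tensors between nodes and carry layer parameters (strides, padding, weights) into each node's attributes.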

16 Extending OpenVX for Custom Nodes
OpenVX/OpenCL Interop provisional extension: enables custom OpenCL acceleration to be invoked from OpenVX user kernels. Memory objects can be mapped or copied - the runtime maps or copies OpenVX data objects into cl_mem buffers, and copies or exports cl_mem buffers back into OpenVX data objects. Application user kernels can access the OpenCL command queue and cl_mem objects to asynchronously schedule OpenCL kernel execution, with fully asynchronous host-device operations during data exchange.
Kernel/Graph Import provisional extension: defines a container for executable or IR code, enabling arbitrary code to be inserted as an OpenVX node in a graph.

17 NNEF and OpenVX for Inferencing
Many inferencing stacks end up using OpenCL for hardware acceleration. Ingestion uses the NN extension and Kernel Import; compilation translates the graph to IR/binary or executable code for proprietary runtimes and acceleration APIs; execution runs on the accelerated OpenVX runtime - with vision nodes and user nodes to mix inferencing with vision and other custom processing.

18 OpenCL – Unique Heterogeneous Runtime
OpenCL is the only industry standard for low-level heterogeneous compute, with portable control over memory and parallel task execution - "the closest you can be to diverse accelerator hardware and still be portable". Applications and inferencing runtimes draw on a growing number of optimized OpenCL vision and inferencing libraries:
- Vision: OpenCV, Halide, VisionCpp
- Machine learning: Xiaomi MACE, Arm Compute Library
- Linear algebra: clDNN, CLBlast, ViennaCL
In contrast to the fragmented GPU API landscape, OpenCL targets heterogeneous compute resources: CPUs, GPUs, FPGAs, DSPs and custom hardware.

19 OpenCL - Low-level Parallel Programming
Low-level programming of heterogeneous parallel compute resources - one code tree can be executed on CPUs, GPUs, DSPs and FPGAs:
- OpenCL C or C++ language to write kernel programs that execute on any compute device
- Platform layer API to query, select and initialize compute devices
- Runtime API to build and execute kernel programs across multiple devices
The programmer gets to control: what programs execute on what device; where data is stored in the various speed and size memories in the system; and when programs are run, and what operations are dependent on earlier operations. Kernel code is compiled for each device, and the runtime API loads and executes kernels across devices from the host.
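The division of control described above - choose a device, build a kernel for it, then enqueue work through a command queue - can be sketched with mock objects. This is a conceptual model of the OpenCL execution flow only; the classes below are stand-ins, not pyopencl or the real C API.

```python
# Conceptual sketch of the OpenCL host-side flow: pick a device, create a
# queue on it, and enqueue a kernel over a data range. Mock classes only.

class Device:
    def __init__(self, name):
        self.name = name  # in real OpenCL: found via the platform layer API

class Kernel:
    def __init__(self, fn):
        self.fn = fn      # in real OpenCL: compiled from OpenCL C source

class CommandQueue:
    """In-order queue: each enqueued command runs after the previous one."""
    def __init__(self, device):
        self.device = device

    def enqueue(self, kernel, data):
        # Stand-in for enqueueing an NDRange: apply the kernel per work-item.
        return [kernel.fn(x) for x in data]

# The programmer decides what runs where by choosing the queue's device.
gpu = Device("gpu0")
queue = CommandQueue(gpu)
square = Kernel(lambda x: x * x)
print(queue.enqueue(square, [1, 2, 3, 4]))  # [1, 4, 9, 16]
```

In the real API each of these steps is an explicit call (clGetDeviceIDs, clBuildProgram, clEnqueueNDRangeKernel), and dependencies between commands are expressed with events.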

20 OpenCL Industry Adoption
Hundreds of applications use OpenCL acceleration - rendering, visualization, video editing, simulation, image processing. Over 6,500 GitHub repositories use OpenCL (tools, applications, libraries, languages), up from 4,310 one year ago. Hardware implementors, sample applications and more are listed in the Khronos Resource Hub.

21 OpenCL as Language/Library Backend
Hundreds of languages, frameworks and projects use OpenCL to access vendor-optimized, heterogeneous compute runtimes, including:
- A C++-based neural network framework
- A language for image processing and computational photography
- An accelerated computing library in open source
- Single-source C++ programming for OpenCL
- Java language extensions for parallelism
- A vision processing open source project
- Compiler directives for Fortran, C and C++
- An open source software library for machine learning (beta)

22 OpenCL Conformant Implementations
(Conformance timeline chart: vendor logos with the date of each vendor's first conformant submission for each spec generation, across desktop, mobile, embedded and FPGA implementations, from May 2009 onward.) Specification releases: OpenCL 1.0 - Dec 2008, OpenCL 1.1 - Jun 2010, OpenCL 1.2 - Nov 2011, OpenCL 2.0 - Nov 2013, OpenCL 2.1 - Nov 2015.

23 OpenCL Ecosystem Roadmap
OpenCL has an active three-track roadmap:
- Bringing heterogeneous compute to standard ISO C++: Khronos is hosting the C++17 Parallel STL, with a C++20 Parallel STL with Ranges proposal; SYCL 1.2 (2015) and SYCL 1.2.1 (2017) provide C++11 single-source programming.
- Processor deployment flexibility - parallel computation across diverse processor architectures: OpenCL 1.2 with the OpenCL C kernel language (2011), OpenCL 2.1 with SPIR-V in core (2015), OpenCL 2.2 with the C++ kernel language (2017).
- Kernel deployment flexibility: execute OpenCL C kernels on Vulkan GPU runtimes.

24 Bringing OpenCL Compute to Vulkan
The experimental Clspv compiler from Google, Adobe and Codeplay compiles OpenCL C to Vulkan's SPIR-V execution environment. It has been tested on over 200K lines of Adobe OpenCL C production code, and is open source - it tracks top-of-tree LLVM and clang, not a fork. This increases deployment options for OpenCL developers, e.g. Vulkan is a supported API on Android. The prototype open source project pairs the Clspv compiler (for OpenCL C source) with a runtime API translator (for OpenCL host code) on top of the Vulkan runtime.

25 Refining clspv with Diverse Workloads
Compile diverse OpenCL C kernel workloads and ask: is the mapping to Vulkan SPIR-V efficient? Where it is not, update the compiler to achieve an efficient mapping, or propose updates to the Vulkan programming model. Test kernel repositories let developers upload kernels for long-term performance/regression testing (Apache 2.0). Interesting domains to explore include existing OpenCL compute libraries - including vision and inferencing. Vulkan is already expanding its compute model, e.g. 16-bit storage, variable pointers and subgroups, with compact memory types and operations coming.

26 OpenCL Next - Feature Set Flexibility
Khronos is defining OpenCL features that become optional for enhanced deployment flexibility - API and language features, e.g. floating point precisions. OpenCL 2.2 functionality becomes a set of queryable, optional features. Feature sets avoid fragmentation: they are defined to suit specific markets - e.g. the Khronos-defined OpenCL 1.2 and 2.2 full-profile feature sets, or an industry-defined feature set for embedded vision and inferencing - and implementations are conformant if they fully support a feature set's functionality.
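The conformance rule above reduces to a subset check: an implementation is conformant against a feature set if and only if it supports every feature the set requires. A sketch, with made-up feature and set names for illustration:

```python
# Sketch of the queryable-feature-set idea: conformance against a feature set
# is a subset check. Feature and set names are made up for illustration.

FEATURE_SETS = {
    "embedded_vision_inferencing": {"images", "fp16", "int8_dot"},
    "full_profile": {"images", "fp64", "svm", "device_enqueue"},
}

def conformant(supported_features, feature_set):
    """Conformant iff the implementation fully supports the set's features."""
    return FEATURE_SETS[feature_set] <= set(supported_features)

# A small edge DSP can be fully conformant to a market-specific feature set
# without implementing everything a desktop full profile demands.
edge_dsp = ["images", "fp16", "int8_dot"]
print(conformant(edge_dsp, "embedded_vision_inferencing"))  # True
print(conformant(edge_dsp, "full_profile"))                 # False
```

This is how optional features avoid fragmentation: applications query for a named set rather than probing individual features one by one.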

27 Thank You!
Khronos is creating cutting-edge, royalty-free open standards for 3D, VR/AR, compute, vision and machine learning. Please join Khronos! We welcome members from Japan and Asia: influence the direction of industry standards, get early access to draft specifications, and network with industry-leading companies. More information: @neilt3d
Khronos Group Japan Local Member Meeting - open to current and prospective members! Thursday, 6 December 2018, 5:00 PM - 8:00 PM JST, Yurakucho Cafe & Dining B1F, Tokyo International Forum, Chiyoda-ku, Tōkyō-to. Please register for free!

