Inference with OpenVX
Rajy Rawther, Kiriti Nagesh Gowda
3:00 – 5:30 PM
Introduction to MIVisionX Toolkit
- The MIVisionX toolkit is a comprehensive set of computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit
- AMD OpenVX is delivered as open source with MIVisionX
- Primarily targeted at applications requiring a combination of machine learning inference, computer vision, and image/video processing
- Includes a model compiler for converting and optimizing pre-trained models from existing formats such as Caffe, NNEF, and ONNX to an OpenVX backend
- After compilation, MIVisionX generates an optimized library, specific to a backend, to run inference and the vision pre- and post-processing modules
- Lightweight, dedicated APIs optimized for AMD hardware are better suited to inference deployment than heavyweight frameworks
Introduction to MIVisionX Toolkit
- All MIVisionX software is available under ROCm, AMD's platform for GPU computing and machine learning
- This tutorial shows how to use the MIVisionX toolkit to run sample neural network applications for image classification, object detection, and segmentation
- Includes hands-on examples that let participants develop real-world applications combining computer vision, inference, and visualization
- Demonstrates how the OpenVX framework with the neural network extension is used to deploy sample applications
Inference with OpenVX
1. Convert pre-trained models in Caffe/NNEF/ONNX to an OpenVX graph
2. Optimize the NN model in OpenVX:
   - Layer fusion
   - Memory optimization
   - Quantization
   - Tensor fusion
   - Dynamic batch-size update
3. Add pre- & post-processing nodes
4. Run optimized inference on target hardware
Prerequisites
- Ubuntu 16.04/18.04 or CentOS 7.5/7.6
- ROCm-supported hardware: an AMD Radeon GPU or APU is required
- ROCm 2.5 or above
- Build & install MIVisionX (the model compiler is installed at /opt/rocm/mivisionx)
- Tutorial: we will be using MIVisionX Dockers (an example pull/run is sketched below)
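For reference, pulling and launching a MIVisionX Docker typically looks like the following; the image name and device flags follow the MIVisionX README conventions of the time, but the exact tags and flags are an assumption and may have changed:

sudo docker pull mivisionx/ubuntu-16.04:latest
sudo docker run -it --device=/dev/kfd --device=/dev/dri --cap-add=SYS_RAWIO --device=/dev/mem --group-add video --network host mivisionx/ubuntu-16.04:latest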
Hardware
AMD Ryzen™ Embedded V1807B with Radeon™ RX Vega 11 Graphics
- # of CPU cores: 4
- # of threads: 8
- CPU max freq.: 3.8 GHz
- CPU base freq.: 3.35 GHz
- Integrated GPU options: AMD Radeon™ Vega 3 (V1202B), Vega 8 (V1605B / V1756B), Vega 11 (V1807B)
AMD ML Software Stack with ROCm
(stack diagram)
- Machine learning apps: MIVisionX apps, frameworks, exchange formats
- MIVisionX middleware and libraries: MIOpen, BLAS/FFT/RNG, RCCL, Eigen
- ROCm platform (fully open source): OpenMP, HIP, OpenCL™, Python
- Devices: GPU, CPU, APU, future accelerators
Highlights: data platform tools; latest machine learning frameworks; Docker and Kubernetes support; optimized math & communication libraries; up-streamed for Linux kernel distributions; open-source foundation for machine learning.
Tutorial Example 1: Image classification
Using a pre-trained ONNX model
Step 1: Convert pre-trained models in Caffe/NNEF/ONNX to an OpenVX graph
Sub-step: Convert the pre-trained model to AMD NNIR
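For reference, the model compiler scripts installed under /opt/rocm/mivisionx/model_compiler/python perform this conversion. The script names follow the MIVisionX model compiler documentation; the model file and folder names below are illustrative:

python onnx_to_nnir.py squeezenet.onnx nnir_model
python caffe_to_nnir.py vgg16.caffemodel nnir_model --input-dims 1,3,224,224
python nnef_to_nnir.py nnef_model_dir nnir_model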
Tutorial Example 1: Image classification
Using a pre-trained ONNX model
Step 1: Convert pre-trained models in Caffe/NNEF/ONNX to an OpenVX graph
Sub-step: Apply optimizations (example commands below)
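The optimizations listed earlier (fusion, quantization, batch-size update) are applied at the NNIR level with nnir_update.py. The flags below follow the MIVisionX model compiler options, but verify them against your installed version; folder names are illustrative:

python nnir_update.py --batch-size 16 nnir_model nnir_model_b16
python nnir_update.py --fuse-ops 1 nnir_model_b16 nnir_model_fused
python nnir_update.py --convert-fp16 1 nnir_model_fused nnir_model_fp16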
Tutorial Example 1: Image classification
Using a pre-trained ONNX model
Step 1: Convert pre-trained models in Caffe/NNEF/ONNX to an OpenVX graph
Sub-step: Convert the AMD NNIR model to OpenVX C code
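Per the MIVisionX model compiler, nnir_to_openvx.py emits the OpenVX C sources (an annmodule library, a sample anntest harness, and a CMakeLists.txt) plus weights.bin into the output folder; the folder names here are illustrative:

python nnir_to_openvx.py nnir_model_fp16 openvx_model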
Tutorial Example 1: Image classification
Using a pre-trained ONNX model
Step 2: Add pre- & post-processing nodes
Tutorial Example 1: Image classification
Using a pre-trained ONNX model
Step 3: Run optimized inference on the target hardware (a sample build-and-run sequence is sketched below)
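Assuming Step 1 generated the CMake project described above, building and running the sample harness looks roughly like this; anntest consumes raw tensor files, and the exact arguments should be checked against the generated anntest.cpp (file names here are illustrative):

cd openvx_model
mkdir build && cd build
cmake .. && make
./anntest ../weights.bin input.f32 output.f32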
Tutorial Example 2: Object Detection
Using a pre-trained Caffe model
Tutorial Example 3: Image Classification
Using a pre-trained NNEF model
MIVisionX Inference Deployment API
- Enables a pre-trained CNN model to be deployed on AMD's ROCm stack with MIVisionX
- Architected to support different backends, including OpenCL-ROCm, OpenCL-HIP, WinML, etc.; all options use OpenVX to execute the inference engine
- Two-stage process:
  - mv_compile compiles the model (Caffe, ONNX, NNEF) for a specific backend, with the option to run the model optimizer for fused operations, quantization, etc.
  - mv_compile generates the deployment library (libmvdeploy.so), binary weights for the model, and the .cpp and .h files required to run the inference engine; it also generates a post-processing module for argmax or bounding-box object detection
- Users can use the deployment package to develop real-world applications for both inference and vision
MIVisionX Inference Deployment Block Diagram
Pre-trained and optimized NN model (Caffe, ONNX, NNEF)
  -> mv_compile: convert to NNIR -> optimize the NNIR model (fuse kernels, memory optimization, quantize, etc.) -> NNIR2clib compiler
  -> deployment package: library, weights & sample
[Application] Use the deployment package and add pre/post-processing modules to build a real-world application.
MIVisionX Deployment Workflow
Compile the model for a specific backend using mv_compile. mv_compile options:
--model <model_name>: name of the trained model with full path [required]
--install_folder <install_folder>: the location for the compiled model [required]
--input_dims <n,c,h,w>: input dimensions (batch size, channels, height, width) in NCHW format [required]
--backend <backend>: name of the backend for compilation [optional; default: OpenVX_Rocm_OpenCL]
--fuse_cba <0/1>: enable or disable convolution-bias-activation fuse mode [optional; default: 0]
--quant_mode <fp32/fp16>: if fp16, the model and weights are converted to fp16 [optional; default: fp32]
Deploy the compiled model with optional pre- and/or post-processing functions; mvdeploy_api.h has all the API function definitions for deploying the model.
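Putting the options together, a typical invocation for a 416x416 YoloV2 Caffe model might look like the following; the model file, install folder, and dimensions are illustrative:

mv_compile --model yolov2.caffemodel --install_folder mv_install --input_dims 1,3,416,416 --backend OpenVX_Rocm_OpenCL --fuse_cba 1 --quant_mode fp32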
MIVisionX Deployment API
API function and description:
mvInitializeDeployment(const char* install_folder) - initializes the deployment engine with the path of the install folder
QueryInference(int *num_inputs, int *num_outputs, ...) - returns the model configuration, such as input and output dimensions
SetPreProcessCallback(mv_add_preprocess_callback_f preproc_f, ...) - callback to add pre-processing nodes to the same OpenVX graph as the inference
mvCreateInferenceSession(mivid_session *inf_session, const char *install_folder) - creates an inference deployment session and returns a handle
mvSetInputDataFromMemory/File(mivid_session inf_session, int input_num, void *input_data, ...) - sets the input for the inference or copies it to the input tensor
mvRunInference(mivid_session inf_session, ...) - executes the inference for a single frame for the requested number of iterations
mvGetOutput(mivid_session inf_session, void *out_mem, ...) - copies the output to user-specified memory
mvShutdown(mivid_session inf_session) - releases the inference session and frees all resources
Tutorial Example 5: Object detection from video using OpenVX video decoder preprocessing node.
- Shows how to add pre-processing nodes, such as decoding with the amd_video_decoder node, alongside inference for a realistic object-detection application
- Shows the MIVisionX utility mv_compile, which compiles an inference deployment package from a pre-trained model for a specific backend; the compiled package consists of a library, header files, application-specific source files, and the binary weights of the model
- Shows how to add a pre-processing node to the inference graph using the MIVisionX deployment API
- Demonstrates how to build an application that performs object detection with YoloV2 on one or more video streams
- Shows the advantage of OpenVX: avoiding extra data copies yields a significant performance increase
Tutorial Example 5: Flow chart
yolov2 pre-trained model -> mv_compile utility -> libmvdeploy.so, weights.bin, mvdeploy_api.cpp, sample app
mvobjdetect example call sequence:
1. mvInitializeDeployment(): initializes deployment from the library and loads weights for the constant tensors
2. preprocess_addnodes_cb(): adds the vx_amdMediaDecoder node and the vx_convertImageToTensor() node
3. mvCreateInference(): creates the vxContext and vxGraph and calls vxVerifyGraph()
4. mvRunInference(): called in a loop for all frames in the video; vxProcessGraph() executes the pre-processing nodes and CNN nodes
5. mvShutdown(): releases the vxGraph and all OpenVX memory resources
Tutorial Example 5: Use of mv_compile.exe
Step 1: mv_compile.exe --model <model> --install_folder <install_folder> --input_dims n,c,h,w --fuse_cba <0/1>
- Generates the library libmvdeploy.so, mvdeploy.h, weights.bin, mvdeploy_api.cpp, and mvtestdeploy.cpp in the install folder
- Generates the mv_extras library for the most common post-processing functions, such as argmax and bounding-box detection
Step 2: Compile and run the mvobjdetect example using the precompiled deployment package (see the sketch below)
- Copy the additional source files and CMakeLists.txt from the mvobjdetect folder to the install folder
- Build the mvobjdetect example
- Run: mvobjdetect <input_file.mp4> --install_folder . --frames <num> --bb <cls,th1,th2> --v
- Shows the usage of OpenCV for visualizing the results
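Assuming the CMake-based layout described in Step 2, the build-and-run sequence might look like the following; all paths, the frame count, and the class/threshold values for --bb are illustrative:

cd mv_install
cp <mvobjdetect_folder>/*.cpp <mvobjdetect_folder>/CMakeLists.txt .
mkdir build && cd build && cmake .. && make
./mvobjdetect ../video.mp4 --install_folder .. --frames 500 --bb 20,0.2,0.4 --v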
mvobjdetect using 4 video streams
This example decodes 4 video streams simultaneously with the amd_media_decoder OpenVX node, runs inference on all 4 streams, and visualizes the results using OpenCV.
Resources
- Tutorial part 1:
- Tutorial part 2:
- MIVisionX:
Disclaimers and attributions
The information contained herein is for informational purposes only and is subject to change without notice. Timelines, roadmaps, and/or product release dates shown in these slides are plans only and subject to change. "Polaris", "Vega", "Radeon Vega", "Navi", "Zen" and "Naples" are codenames for AMD architectures, and are not product names.

While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD's products are as set forth in a signed agreement between the parties or in AMD's Standard Terms and Conditions of Sale. GD-18

©2019 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, Ryzen, Threadripper, EPYC, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.