The Pitfalls and Guidelines for Mobile ML Benchmarking

The Pitfalls and Guidelines for Mobile ML Benchmarking
In this talk, I'm going to discuss how we use benchmark data to drive the design of neural network models and software implementations, as well as to influence hardware vendors' implementation decisions. A lot of this process is very research oriented, and we welcome the research community to participate in the flow and advance the state of the art.
Fei Sun, Software Engineer, AI Platform, Facebook

Mobile ML Deployment at Facebook

ML Deployment at Facebook
Machine learning workloads used at Facebook: neural style transfer, person segmentation, object detection, classification, etc. Need refinement.

Neural Style Transfer
Neural style transfer is an effect that applies the style of a style image to the content of a content image. With our engineers' work, we managed to render neural style transfer on real-time video.

Person Segmentation
The mermaid effect in Instagram uses person segmentation to replace the background with an artificially rendered underwater background.

Object Detection
Object detection is a building block for augmented reality. In the image on the left, the cup with BFF on it is not a real cup on the table but a phone-rendered cup. In the image on the right, the flowers on the plant are also not real. We can only add these effects after we detect the objects.

Pitfalls of Mobile ML Benchmarking

Hardware Variations
CPU, GPU, DSP, NPU. If server-side hardware variation is in the single digits or low tens, the hardware variation on mobile is ten to a hundred times greater. How good is it if the hardware can only run the models in the benchmark suites efficiently? All of these units may be in one SoC; how do we specify what is used and what is not?

Software Variations
ML frameworks (e.g. PyTorch/Caffe2, TFLite), kernel libraries (e.g. NNPACK, Eigen), compute libraries (e.g. OpenCL, OpenGL ES), proprietary libraries, different build flags. This space is far more fragmented than its server counterpart, and mobile software support is far less mature than on the server; many implementations have bugs. How good is it if the software can only run the models in the benchmark suite bug-free?

OEM Influence
Some performance-related knobs are not tunable in user space; they require a rooted or specially built phone, but OEM/chipset vendors can change them for different scenarios. Some algorithms (e.g. scheduling) are hard-coded by OEM/chipset vendors, yet they influence performance a lot.

Run-to-Run Variation
Run-to-run variation on mobile is very pronounced: there are differences between iterations of the same run and differences between separate runs. How do we correlate the metrics collected from benchmark runs with production runs? How do we specify the benchmark conditions?
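To make the variation concrete, here is a minimal sketch (in Python, with made-up latency numbers) of how one might summarize iteration-to-iteration and run-to-run spread; it is an illustration, not the metric definition used at Facebook:

```python
import statistics

def summarize(latencies_ms):
    """Summarize one benchmark run given its per-iteration latencies (ms)."""
    ordered = sorted(latencies_ms)
    p90 = ordered[int(0.9 * (len(ordered) - 1))]
    return {
        "median_ms": statistics.median(ordered),
        "p90_ms": p90,
        # Coefficient of variation: spread of iterations within a single run.
        "cv": statistics.stdev(ordered) / statistics.mean(ordered),
    }

# Hypothetical numbers: three runs of the same model on the same phone.
runs = [
    [41.2, 40.8, 55.3, 41.0, 42.1],
    [44.9, 45.2, 44.7, 60.1, 45.0],
    [40.5, 41.1, 40.9, 41.3, 41.0],
]

per_run = [summarize(r) for r in runs]
medians = [s["median_ms"] for s in per_run]
print("within-run stats:", per_run)
# The spread of per-run medians hints at run-to-run (not just
# iteration-to-iteration) variation.
print("across-run median spread: %.1f ms" % (max(medians) - min(medians)))
```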

How to Avoid Benchmark Engineering
Is it useful if the hardware only runs the models in the benchmark suite efficiently? If the software is only bug-free when running the models in the benchmark suite? If a different binary is used to benchmark each model? If the code is bloated because the binary includes different algorithms for different models? I don't have solutions here, but I'd like to ask a few questions.

Guidelines of Mobile ML Benchmarking

Key Components of Mobile ML Benchmarking
Relevant mobile ML models; representative mobile ML workloads; a unified and standardized methodology for evaluating the metrics in different scenarios. I want to emphasize the third point: due to the large variation in hardware platforms and in software frameworks, kernels, and libraries, it is very important to design a methodology that is resistant to benchmark engineering. The last item also matters once we have such a methodology: we can use the same methodology to evaluate our proprietary models and compare the results with the published benchmark results.

Make Mobile ML Repeatable
There are too many variables and pitfalls. Judge benchmarking results through another person: easily reproduce the result, easily separate out the steps, easily visualize results from different benchmark runs. It is difficult to give guidelines on ML benchmarking because the conditions vary so much. The best judge is another person; as long as another person is okay with the benchmarking methodology, it is usually fine.

Facebook AI Performance Evaluation Platform

Challenges of ML Benchmarking
What ML models matter? How to enhance ML model performance? How to evaluate system performance? How to guide future HW design? Neural network algorithms are computation intensive: if you launch a camera effect on your phone, you may feel the phone get hot after a while because it drains so much power. The Facebook app is deployed to billions of phones with different CPU micro-architectures, GPUs, DSPs, and accelerators, yet we have only one app that covers the entire spectrum of device capabilities. We want to help all hardware vendors improve their efficiency running neural network workloads; as the hardware becomes more efficient, the Facebook app uses less power. Specifically, we'd like the hardware vendors to optimize the important models. Which neural network models matter? We want to help them improve neural network model performance on their hardware and help them iterate faster. We'd also like to assist them in evaluating against the market in a fair, honest, and representative way. Lastly, we'd like to help them with future hardware designs.

Goals of the AI Performance Evaluation Platform
Provide a model zoo of important models. Normalize the benchmarking metrics and conditions. Automate the benchmarking process. Honest measurement of performance.

Backend- and Framework-Agnostic Benchmarking Platform
The same code base is deployed to different backends; a vendor needs to plug its backend into the platform. The build/run environment is unique to each backend, and the benchmark metrics are displayed in different formats. Good-faith approach. Easy to reproduce the results.
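To make the plug-in idea concrete, here is a minimal sketch of what such a backend seam could look like; the class and method names are hypothetical and are not FAI-PEP's actual API:

```python
from abc import ABC, abstractmethod

class BackendDriver(ABC):
    """Hypothetical interface a vendor would implement for its backend."""

    @abstractmethod
    def build(self, model_path: str) -> str:
        """Prepare the model for this backend; return a runnable artifact."""

    @abstractmethod
    def run(self, artifact: str, inputs: dict) -> dict:
        """Execute one benchmark and return raw metrics (e.g. latency in ms)."""

class CPUReferenceDriver(BackendDriver):
    """Toy stand-in for a real vendor backend, here only to show the flow."""

    def build(self, model_path: str) -> str:
        return model_path  # nothing to compile in this toy example

    def run(self, artifact: str, inputs: dict) -> dict:
        return {"latency_ms": 42.0, "backend": "cpu-reference"}

def benchmark(driver: BackendDriver, model_path: str, inputs: dict) -> dict:
    """The harness side stays the same regardless of which driver is plugged in."""
    artifact = driver.build(model_path)
    return driver.run(artifact, inputs)

print(benchmark(CPUReferenceDriver(), "squeezenet.onnx", {"data": "image tensor"}))
```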

Benchmarking Flow
To help the vendors, we propose a benchmarking procedure that comprises three steps. First, there is a centralized model zoo containing representative models that we care about. The models can be in the ONNX format (Open Neural Network Exchange), but any model format works; the key is that the models are comparable, so we have standardized inputs to the benchmark. Second, because there are many different kinds of hardware platforms, such as GPU, CPU, phone, or embedded, and each has its own optimal build and run environment, the benchmarking is done in a distributed manner. One benchmark harness is deployed to all platforms so that benchmarking on each platform follows the same procedure and collects the same metrics, while part of the harness is customized for each backend platform to make the best of it. Finally, the collected metrics are sent to a centralized place for analysis, where people can filter, plot, and compare the data with ease.
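As a rough illustration of the "standardized inputs" idea, a model zoo entry could be described declaratively, so that every backend runs the same model with the same inputs and reports the same metrics. The fields below are hypothetical and are not FAI-PEP's actual specification format:

```python
# Hypothetical benchmark specification for one model zoo entry.
benchmark_spec = {
    "model": {
        "name": "squeezenet",
        "format": "onnx",                 # could be any agreed-upon format
        "location": "model_zoo/squeezenet.onnx",
    },
    "inputs": {
        "data": {"shape": [1, 3, 224, 224], "dtype": "float32"},
    },
    "iterations": 50,                     # same iteration count on every platform
    "metrics": ["latency_ms", "accuracy", "energy_mj"],
}

# Each platform runs this spec in its own build/run environment, then sends the
# collected metrics to a central place for filtering, plotting, and comparison.
```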

Benchmark Harness Diagram
[Diagram: a repository of models and metadata feeds the benchmark harness (benchmark preprocessing, repo driver, regression detection, benchmark driver); data preprocessing and postprocessing pipelines send benchmark inputs and results to reporters that write to files, an online store, or the screen; framework drivers (Caffe2 driver, custom framework driver) connect through a standard API and data converters to backend drivers and backends. The harness provides the core components; the vendor provides the backend-specific pieces.]

Current Features
Supports Android, iOS, Windows, and Linux devices. Supports frameworks such as PyTorch/Caffe2 and TFLite. Supports latency, accuracy, FLOPs, power, and energy metrics. Saves results locally or remotely, or displays them on screen. Works with git and hg version control. Caches models and binaries locally. Continuously runs benchmarks on new commits. A/B testing for regression detection.
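As a sketch of the A/B idea behind regression detection (the threshold and structure here are illustrative, not the actual FAI-PEP logic): benchmark the same model on two commits and flag a regression only when the slowdown exceeds the noise expected from run-to-run variation.

```python
import statistics

def is_regression(baseline_ms, candidate_ms, threshold=0.05):
    """Flag a regression when the candidate commit is >5% slower than baseline.

    baseline_ms / candidate_ms: per-iteration latencies (ms) collected by running
    the same benchmark on the old and the new commit.
    """
    base = statistics.median(baseline_ms)
    cand = statistics.median(candidate_ms)
    return (cand - base) / base > threshold

# Hypothetical measurements for two commits of the same model and backend.
commit_a = [41.0, 40.8, 41.2, 41.1, 40.9]
commit_b = [44.6, 44.9, 44.3, 45.1, 44.8]
print("regression detected:", is_regression(commit_a, commit_b))
```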

Open Source
Facebook AI Performance Evaluation Platform: https://github.com/facebook/FAI-PEP. We encourage the community to contribute.

Questions