Huawei CBG AI Challenges
Parkhomenko Denis, Bankevich Sergey, Korikov Kirill, Bakarov Amir
Agenda:
- Computer Vision Challenges
- Speech & Language Challenges: NLU, ASR

Computer Vision Challenges
How to make a neural net lighter?
State-of-the-art neural nets are very complex in terms of:
- computation
- size
How can we fit them into such small chips?
[Figure: ImageNet-1K validation set accuracy]
How to make a neural net lighter?
- Tensor decomposition: R&D in new methods of matrix and tensor decomposition; optimal parameter search
- Filter quantization, dictionary-based convolutions
- Target platform optimization: deep knowledge of CPU/TPU architecture; vectorization, intrinsics, code optimization (a good fit if you are good at low-level programming)

Formally: consider functional spaces $C_1(X, Y)$ and $C_2(X, Y)$. For a given model $f_1(x, \theta) \in C_1$ with parameters $\theta = (\theta_1, \ldots, \theta_N)$, find $f_2 \in C_2$ such that
$$f_2 = \arg\min_{f_2 \in C_2} \, \mathbb{E}_{x \sim X} \left\| f_1(x, \theta) - f_2(x) \right\|.$$
A toy decomposition sketch follows below.
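As a toy illustration of the decomposition direction (not the production method), this sketch compresses a dense layer's weight matrix with a truncated SVD; the layer sizes, the rank, and the synthetic weights are all illustrative assumptions.

```python
# Toy sketch: low-rank compression of a dense layer via truncated SVD.
# Real trained weights are often approximately low-rank; we fabricate
# such a matrix here. All sizes and the rank r are assumptions.
import numpy as np

rng = np.random.default_rng(0)
# Fabricate an approximately rank-32 weight matrix (512 x 1024).
W = rng.standard_normal((512, 32)) @ rng.standard_normal((32, 1024))
W += 0.01 * rng.standard_normal(W.shape)

U, s, Vt = np.linalg.svd(W, full_matrices=False)
r = 32
A = U[:, :r] * s[:r]      # (512, r)
B = Vt[:r, :]             # (r, 1024)
# Parameter count drops from 512*1024 to r*(512+1024).

x = rng.standard_normal(1024)
y_full = W @ x            # original layer: one big matmul
y_low = A @ (B @ x)       # compressed layer: two cheaper matmuls
print(np.linalg.norm(y_full - y_low) / np.linalg.norm(y_full))  # small relative error
```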
Optical character recognition
- Task 1: Text detection in an image
- Task 2: Text recognition in a cropped image
- Task 3: End-to-end detection + recognition
- Task 4: Inpainting (humans are still better at it) [https://towardsdatascience.com/image-inpainting-humans-vs-ai-48fc4bca7ecc]
Speech & Language Challenges: NLU
Dialogue System
Intent Detection
[Figure: Amazon Alexa Skill Builder interface]
Intent Detection
[Figure: intent classification pipeline: corpus → word-level vectors → neural network → sentence vector → softmax → intent classification]
- Word-level features: word vectors, entity information, syntax-parsing features
- Whole-sentence features: sentence-level vectors, rule-based heuristics, ...
- External information: dictionaries
- Network selection: RNN, CNN, Attention / Transformer, ... (a minimal classifier sketch follows below)
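A minimal sketch of the pipeline in the figure above, assuming PyTorch: token ids → word vectors → pooled sentence vector → softmax intent scores. The RNN/CNN/Transformer choice is abstracted into a mean-pooling baseline; vocabulary size, dimensions, and the random batch are assumptions.

```python
import torch
import torch.nn as nn

class IntentClassifier(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, num_intents=5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.fc = nn.Linear(embed_dim, num_intents)

    def forward(self, token_ids):                 # (batch, seq_len)
        word_vectors = self.embedding(token_ids)  # word-level vectors
        sentence_vec = word_vectors.mean(dim=1)   # sentence-level vector
        return self.fc(sentence_vec)              # intent logits

model = IntentClassifier()
batch = torch.randint(0, 1000, (8, 12))           # 8 toy utterances, 12 tokens each
logits = model(batch)                             # softmax is applied inside the loss
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 5, (8,)))
loss.backward()
```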
NLU Challenges
- Deep learning models need a lot of labeled data.
- For our own skills we can use assessors to generate and classify a corpus.
- For third-party skills we can rely only on the provided corpus (usually tens of samples).
- Is it possible to build a good classifier using such a small amount of data?

Example: 10 samples of labeled data plus 100 samples of unlabeled data; train the model on the 100 samples and transfer labels (a semi-supervised sketch follows below).
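A hedged sketch of the few-labels idea above using scikit-learn's LabelSpreading: 10 labeled plus 100 unlabeled points, where label -1 marks "unlabeled". The synthetic data and hyperparameters are purely illustrative, not the production approach.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import LabelSpreading

# 110 synthetic points standing in for utterance feature vectors.
X, y = make_classification(n_samples=110, n_features=20, n_informative=5,
                           n_classes=2, random_state=0)
labels = np.full(110, -1)          # -1 = unlabeled
labels[:10] = y[:10]               # only 10 gold labels

# Propagate labels from the 10 labeled points over a kNN graph.
model = LabelSpreading(kernel='knn', n_neighbors=7)
model.fit(X, labels)
print((model.transduction_[10:] == y[10:]).mean())  # accuracy on the unlabeled part
```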
More challenges
- Word-sense disambiguation
- Cross-lingual transfer
- Integration of knowledge graphs into supervised models
- Anaphora and coreference resolution
- Chit-chat support
- Personalization of conversational agents
- …
Speech & Language Challenges: ASR
ASR task
Convert audio input to text output.
Applications:
- Voice assistants (phone, home, car)
- Recording / voice input transcription
- Movie captions
ASR pipeline
- Feature extraction
- Acoustic model → morphemes/letters
- Language model + decoder → text (see the greedy-decoding sketch below)
- Postprocessing
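As a toy view of the decoding step, this sketch applies greedy (best-path) CTC decoding to per-frame letter scores from a hypothetical acoustic model; the alphabet and the random scores are illustrative assumptions.

```python
import numpy as np

ALPHABET = ['<blank>', 'a', 'b', 'c']  # index 0 is the CTC blank

def greedy_ctc_decode(frame_scores):
    """frame_scores: (time, vocab) per-frame scores -> decoded string."""
    best = frame_scores.argmax(axis=1)    # best symbol per frame
    out, prev = [], 0
    for idx in best:
        if idx != prev and idx != 0:      # collapse repeats, drop blanks
            out.append(ALPHABET[idx])
        prev = idx
    return ''.join(out)

rng = np.random.default_rng(0)
frames = rng.standard_normal((10, len(ALPHABET)))  # fake acoustic-model output
print(greedy_ctc_decode(frames))
```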
ASR components
- Support specific input conditions: language, accent; close/far field; noise, multiple people speaking, low volume/quality; different hardware
- Provide specific output properties: normalization, domains
- Related tasks: voice activity detection (a toy VAD sketch follows below), trigger phrase, direct classification
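A toy energy-based voice activity detector, sketching the "related tasks" item above; the frame size and threshold are assumptions, not production values.

```python
import numpy as np

def energy_vad(signal, frame_len=400, threshold=0.02):
    """Return one boolean per frame: True where speech energy is detected."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))   # per-frame RMS energy
    return rms > threshold

rng = np.random.default_rng(1)
silence = 0.001 * rng.standard_normal(4000)                       # low-energy noise
speech = 0.1 * np.sin(2 * np.pi * 220 * np.arange(4000) / 16000)  # stand-in for speech
print(energy_vad(np.concatenate([silence, speech])))
```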
ASR challenges
- Speaker diarization, the cocktail-party problem, denoising
- Flexible language models
- Handling a variety of accents
- ASR on device
- Text normalization (a toy sketch follows below)
- Optimization for production: C/C++, low-level code
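A toy sketch of the text-normalization challenge: rewriting spoken number words in raw ASR output as digits. Coverage is deliberately tiny and illustrative, not a production normalizer.

```python
UNITS = {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5,
         'six': 6, 'seven': 7, 'eight': 8, 'nine': 9}
TENS = {'twenty': 20, 'thirty': 30, 'forty': 40, 'fifty': 50,
        'sixty': 60, 'seventy': 70, 'eighty': 80, 'ninety': 90}

def normalize_numbers(text):
    tokens, out, i = text.split(), [], 0
    while i < len(tokens):
        t = tokens[i]
        if t in TENS and i + 1 < len(tokens) and tokens[i + 1] in UNITS:
            out.append(str(TENS[t] + UNITS[tokens[i + 1]]))  # "twenty five" -> "25"
            i += 2
        elif t in TENS or t in UNITS:
            out.append(str(TENS.get(t) or UNITS[t]))
            i += 1
        else:
            out.append(t)
            i += 1
    return ' '.join(out)

print(normalize_numbers('call me back in twenty five minutes'))
# -> call me back in 25 minutes
```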