Natural Language Processing (NLP) Systems
Joseph E. Gonzalez, Co-director of the RISE Lab
jegonzal@cs.berkeley.edu
Applications?
Speech recognition: predict text from sound
Machine translation: predict text from text
Text to speech: predict sound from text
Sentiment analysis: predict sentiment from text
Automated question answering: predict text from (text, images, datasets)
Information retrieval: ranking/clustering (text, images, and data) according to a query
What makes language challenging?
Representation (inputs): images are easy to represent as tensors; word sequences are more complex
Often structured prediction (outputs), for example predicting complete sentences
Requires semantic knowledge and logical reasoning: “One morning I shot an elephant in my pajamas. How he got into my pajamas I’ll never know.”
Substantial variation across languages: each language historically required different methods
Two NLP Systems Papers: Speech Recognition and Machine Translation
Describe production solutions
Present a full systems approach spanning training to inference
Present the interaction between systems and model design
Context
Deep Speech (2014) demonstrated the viability of an end-to-end deep learning approach to speech recognition
Previously: HMM-based pipelines with (deep) acoustic models
Classic acoustic ambiguity example: “recognize speech” vs. “wreck a nice beach”
DeepSpeech2:
Improves the model architecture (8x larger)
Elaborates on the training and deployment process
Provides a more complete description and analysis of the system
Big Ideas
Demonstrates superhuman performance from an end-to-end deep learning approach
Input spectrograms, output letters (graphemes, bi-graphemes)
Presents a single model and system for both English and Mandarin speech recognition (previous work required separate approaches)
Well presented/evaluated ideas: data collection and preparation; system training and inference performance; ablation studies for many of the design decisions
Model Innovations
Predict letters (or pairs of letters) directly from spectrograms, an emerging trend at this point
Deeper models enabled by:
Heavy use of batch normalization
SortaGrad curriculum learning: training on shorter (easier) utterances first improves/stabilizes training (sketch below)
2D convolution on spectrogram inputs, with striding to reduce runtime
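For concreteness, a minimal sketch of the SortaGrad idea, assuming a list of examples that each carry an audio duration; the function and field names are illustrative, not from the paper's (unreleased) code.

```python
import random

def sortagrad_batches(examples, batch_size, epoch):
    """Illustrative SortaGrad ordering: in the first epoch, present minibatches
    in order of increasing utterance length; afterwards, shuffle as usual.
    `examples` is assumed to be a list of dicts with a 'duration' key (seconds)."""
    if epoch == 0:
        ordered = sorted(examples, key=lambda ex: ex["duration"])
    else:
        ordered = list(examples)
        random.shuffle(ordered)
    for i in range(0, len(ordered), batch_size):
        yield ordered[i:i + batch_size]
```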
Simple 5-gram language model
Prediction requires a beam search to maximize a combined score: the acoustic (CTC) log-probability plus a weighted language-model log-probability and a word-insertion bonus (scoring sketch below)
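A hedged sketch of the kind of score a beam-search decoder ranks hypotheses by, following the weighted objective described in the Deep Speech papers; the function name and the default weights are placeholders, tuned on a dev set in practice.

```python
def hypothesis_score(log_p_ctc, log_p_lm, word_count, alpha=2.0, beta=1.5):
    """Rank a beam-search hypothesis: CTC (acoustic) log-probability plus a
    weighted n-gram language-model log-probability and a word-insertion bonus.
    alpha and beta here are illustrative values, not the paper's."""
    return log_p_ctc + alpha * log_p_lm + beta * word_count
```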
Model and System Co-Design
Minimize bidirectional layers to enable faster inference
Introduce row convolutions as a lightweight alternative to a bidirectional RNN: a small learned window over future activations, placed at the top of the network (sketch below)
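A rough NumPy illustration of a row convolution under the description above: each timestep mixes a small window of future hidden states with per-feature weights, so a unidirectional network gets limited lookahead without a full backward pass. Shapes and names are illustrative.

```python
import numpy as np

def row_convolution(h, W):
    """h: (T, d) hidden states from a unidirectional RNN.
    W: (tau, d) per-feature weights over a window of future timesteps.
    Returns (T, d) outputs where step t depends on h[t .. t+tau-1]."""
    T, d = h.shape
    tau = W.shape[0]
    # Zero-pad the future so the last steps still see a full window.
    h_pad = np.vstack([h, np.zeros((tau - 1, d))])
    out = np.zeros_like(h)
    for t in range(T):
        out[t] = (W * h_pad[t:t + tau]).sum(axis=0)
    return out
```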
Variations in Approach for Mandarin
Key observation: minimal changes required
Did not need to explicitly model language-specific pronunciation
Output requires extension to 6000 characters + the Roman alphabet
Shorter beam searches
Data Preparation
Used models to align transcriptions with audio
Used models to identify bad transcriptions
Data augmentation: adding synthetic noise (sketch below)
Constructed a human baseline
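A minimal sketch of additive-noise augmentation, assuming the clean and noise clips are float waveforms of the same length; the target-SNR knob and the function name are illustrative, not taken from the paper.

```python
import numpy as np

def add_noise(clean, noise, target_snr_db):
    """Mix a noise clip into a clean waveform at a chosen signal-to-noise ratio.
    In practice the noise clip would be randomly cropped or looped to length."""
    signal_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    # Scale the noise so 10*log10(signal_power / scaled_noise_power) == target_snr_db.
    scale = np.sqrt(signal_power / (noise_power * 10 ** (target_snr_db / 10.0)))
    return clean + scale * noise
```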
Systems Innovations in Training
Synchronous distributed SGD for better reproducibility (sketch of the pattern below)
Custom GPU implementation of the CTC loss (a combinatorial evaluation)
Optimized all-reduce with GPUDirect communication, giving substantial performance gains
Custom memory allocators and a fallback path for long utterances
About 45% of peak theoretical performance per node
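The paper's training system uses a custom all-reduce with GPUDirect; purely as an illustration of the synchronous data-parallel pattern, here is a small MPI sketch (mpi4py), with the model and gradient computation elided.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
world_size = comm.Get_size()

def sync_sgd_step(params, local_grads, lr):
    """Average gradients across workers with an all-reduce, then apply the same
    update on every replica, so all workers stay in lockstep (better
    reproducibility than asynchronous updates)."""
    averaged = np.empty_like(local_grads)
    comm.Allreduce(local_grads, averaged, op=MPI.SUM)
    averaged /= world_size
    return params - lr * averaged
```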
Innovations during Inference
Batch Dispatch: combines queries when possible to improve throughput (sketch below)
Low-precision (16-bit) floating point arithmetic
Optimized small-batch GPU GEMM kernels
Reduced beam search to the most likely characters, speeding up Mandarin decoding by 150x
P98 latency of 67 milliseconds with 10 streams per server. Is this good?
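A toy sketch of the Batch Dispatch idea: gather whatever requests arrive within a short window, then evaluate them as a single batch, trading a little latency for much better GPU throughput. The request format (a dict with 'input' and 'callback') and the timing constants are hypothetical, not from the paper.

```python
import queue
import time

request_queue = queue.Queue()

def batch_dispatch_loop(run_model, max_batch=10, max_wait_ms=5):
    """Block for the first request, wait a few milliseconds for more to arrive,
    then run one batched forward pass and return each result to its caller."""
    while True:
        batch = [request_queue.get()]           # block until the first stream arrives
        deadline = time.time() + max_wait_ms / 1000.0
        while len(batch) < max_batch:
            remaining = deadline - time.time()
            if remaining <= 0:
                break
            try:
                batch.append(request_queue.get(timeout=remaining))
            except queue.Empty:
                break
        outputs = run_model([req["input"] for req in batch])
        for req, out in zip(batch, outputs):
            req["callback"](out)
```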
Metrics of Success
Word error rate (WER): similar to an edit distance; measures the frequency of substitutions, deletions, and insertions relative to the reference transcript (implementation sketch below)
https://awni.github.io/speech-recognition/
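Since WER is just a word-level edit distance normalized by the reference length, a small self-contained implementation makes the metric concrete:

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: Levenshtein distance over words (substitutions,
    deletions, insertions) divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Example: one substitution against a four-word reference gives WER = 0.25.
print(word_error_rate("wreck a nice beach", "recognize a nice beach"))
```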
Big Results
Outperforms humans on several benchmarks
Current state of the art on LibriSpeech test-clean is 3.43 (WER, %)
Limitations and Future Impact
Provided substantial evidence for the end-to-end approach
May be in use at Baidu?
Batching techniques appear in other inference systems
Limitations:
Not clear how to personalize
Substantial computational costs at prediction time
Difficult to reproduce results (no code or data released)