Natural Language Processing (NLP) Systems Joseph E. Gonzalez

Natural Language Processing (NLP) Systems
Joseph E. Gonzalez, Co-director of the RISE Lab
jegonzal@cs.berkeley.edu

Applications?
Speech recognition: predict text from sound
Machine translation: predict text from text
Text to speech: predict sound from text
Sentiment analysis: predict sentiment from text
Automated question answering: predict text from (text, images, datasets)
Information retrieval: ranking/clustering (text, images, and data) according to a query

What makes language challenging?
Representation (inputs): images are easy to represent as tensors; word sequences are more complex
Often structured prediction (outputs), for example predicting complete sentences
Requires semantic knowledge and logical reasoning: "One morning I shot an elephant in my pajamas. How he got into my pajamas I'll never know."
Substantial variation across languages: each language historically required different methods

Two NLP Systems Papers: Speech Recognition and Machine Translation
Describe production solutions
Present a full systems approach spanning training to inference
Present the interaction between systems and model design

Context (2014-2015)
Demonstrated the viability of an end-to-end deep learning approach to speech recognition
Previously: HMMs with (deep) acoustic models
("Wreck a nice beach" vs. "recognize speech": the classic speech-recognition ambiguity)

DeepSpeech2
Improves the model architecture (8x larger)
Elaborates on the training and deployment process
Provides a more complete description and analysis of the system

Big Ideas
Demonstrates superhuman performance from an end-to-end deep learning approach
Input spectrograms, output letters (graphemes, bi-graphemes)
Presents a single model and system for both English and Mandarin speech recognition; previous work required separate approaches
Well presented/evaluated ideas: data collection and preparation, system training and inference performance, ablation studies for many of the design decisions

Model Innovations
Predict letters (pairs of letters) directly from spectrograms, an emerging trend at this point
Deeper models enabled by heavy use of batch normalization
SortaGrad curriculum learning: train on shorter (easier) inputs first, which improves/stabilizes training
2D convolution on spectrogram inputs, with striding to reduce runtime
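The spectrogram inputs mentioned above are just a framed, windowed FFT of the raw audio. Here is a minimal NumPy sketch; the 20 ms window and 10 ms hop at 16 kHz are assumed example values, not the paper's exact configuration:

```python
import numpy as np

def log_spectrogram(audio, frame_len=320, hop=160):
    """Slice audio into overlapping frames and take the log-magnitude FFT.

    frame_len=320 and hop=160 correspond to 20 ms windows with a 10 ms
    stride at 16 kHz (illustrative values).
    """
    n_frames = 1 + (len(audio) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([
        audio[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Magnitude spectrum of each frame; +1e-10 avoids log(0).
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(mag + 1e-10)  # shape: (n_frames, frame_len // 2 + 1)
```

The network's 2D convolutions then stride over both the time and frequency axes of this array.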

Language Model
Simple 5-gram language model
Prediction requires beam search to maximize the combined network and language-model score
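The decoding step can be sketched as a toy beam search that keeps the best-scoring prefixes under a combined acoustic and language-model score. This is illustrative only: it omits CTC blank/merge handling, and `lm_score` is a stand-in for the paper's 5-gram model:

```python
import math

def beam_search(probs, lm_score, beam_width=8, alpha=0.5):
    """Toy beam search: at each frame keep the beam_width prefixes with
    the best combined acoustic + weighted language-model score.

    probs: list of dicts mapping character -> P(char | frame).
    lm_score: callable returning a log-probability for a prefix.
    """
    beams = {"": 0.0}  # prefix -> acoustic log-probability
    for frame in probs:
        candidates = {}
        for prefix, logp in beams.items():
            for ch, p in frame.items():
                new = prefix + ch
                cand = logp + math.log(p)
                if cand > candidates.get(new, float("-inf")):
                    candidates[new] = cand
        # Rank by acoustic score plus weighted LM score, keep the top beam.
        beams = dict(sorted(
            candidates.items(),
            key=lambda kv: kv[1] + alpha * lm_score(kv[0]),
            reverse=True)[:beam_width])
    return max(beams, key=lambda k: beams[k] + alpha * lm_score(k))
```

Even with equal acoustic scores, the language-model term steers the search toward likelier text, which is exactly why the 5-gram model helps.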

Model and System Co-Design
Minimize bidirectional layers to enable faster inference
Introduce row convolutions, placed at the top of the network, as a cheaper substitute for a bidirectional RNN
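The row-convolution idea is that the output at time t mixes the current frame with a small, fixed number of future frames, so a unidirectional network gets limited lookahead without waiting for the whole utterance. A minimal NumPy sketch (shapes and weight layout are illustrative, not the paper's exact formulation):

```python
import numpy as np

def row_convolution(x, W):
    """Lookahead layer: output at time t combines the current frame with
    tau future frames via per-feature weights.

    x: (T, d) activations; W: (tau + 1, d) weights, one row per offset.
    """
    T, d = x.shape
    tau = W.shape[0] - 1
    # Zero-pad the end so the last timesteps still "see" tau future frames.
    x_pad = np.vstack([x, np.zeros((tau, d))])
    return np.stack([
        sum(W[j] * x_pad[t + j] for j in range(tau + 1))
        for t in range(T)
    ])
```

Because the lookahead is a small constant, streaming inference only has to buffer tau frames instead of the full utterance a bidirectional RNN would require.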

Variations in Approach for Mandarin
Key observation: minimal changes required
Did not need to explicitly model language-specific pronunciation
Output layer extended to 6000 characters plus the Roman alphabet
Shorter beam searches

Data Preparation
Used models to align transcriptions with audio
Used models to identify bad transcriptions
Data augmentation: adding synthetic noise
Constructed a human baseline
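One simple form of the synthetic-noise augmentation is mixing a noise track into clean speech at a chosen signal-to-noise ratio. A hedged NumPy sketch (the SNR-based mixing rule is a standard recipe, not necessarily the paper's exact procedure):

```python
import numpy as np

def add_noise(speech, noise, snr_db):
    """Mix noise into clean speech at a target SNR in dB."""
    noise = np.resize(noise, speech.shape)  # loop/trim noise to match length
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale noise so 10*log10(speech_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise
```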

Systems Innovations in Training
Synchronous distributed SGD for better reproducibility
Custom GPU implementation of the CTC loss (a combinatorial evaluation over alignments)
Optimized all-reduce with GPUDirect communication, yielding substantial performance gains
Custom memory allocators and fallback for long utterances
Achieves 45% of peak theoretical performance per node
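The reproducibility claim for synchronous SGD follows from the update rule itself: every worker contributes its gradient to one all-reduce, and all replicas apply the identical averaged update. A minimal simulation of one step (the all-reduce is just a mean here; real systems overlap it with communication):

```python
import numpy as np

def synchronous_sgd_step(params, worker_grads, lr=0.1):
    """One synchronous data-parallel SGD step: all-reduce (average) the
    per-worker gradients, then apply the same update everywhere, so
    every replica ends the step with identical parameters."""
    avg_grad = np.mean(worker_grads, axis=0)  # the all-reduce
    return params - lr * avg_grad
```

Asynchronous schemes trade this determinism away for throughput, which is why the paper prefers the synchronous variant.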

Innovations during Inference
Batch Dispatch: combines queries when possible to improve throughput
Low-precision (16-bit) floating-point arithmetic
Optimized small-batch GPU GEMM kernels
Reduced beam search to the most likely characters, speeding up Mandarin by 150x
P98 latency of 67 milliseconds with 10 streams per server. Is this good?
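The Batch Dispatch idea can be sketched as a queue that accumulates pending requests and runs them through the model as one batch, trading a little latency for much better GPU throughput. This is an illustrative toy, not Baidu's implementation:

```python
from collections import deque

class BatchDispatcher:
    """Queue incoming requests and process up to max_batch of them in a
    single forward pass."""

    def __init__(self, model_fn, max_batch=4):
        self.model_fn = model_fn   # processes a list of requests at once
        self.max_batch = max_batch
        self.queue = deque()

    def submit(self, request):
        self.queue.append(request)

    def dispatch(self):
        """Pull up to max_batch queued requests and run them as one batch."""
        batch = [self.queue.popleft()
                 for _ in range(min(self.max_batch, len(self.queue)))]
        return self.model_fn(batch) if batch else []
```

This is the batching pattern the slide's later "Limitations and Future Impact" point refers to reappearing in other inference systems.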

Metrics of Success
Word error rate (WER): similar to edit distance; measures the frequency of word substitutions, deletions, and insertions relative to a reference transcript
https://awni.github.io/speech-recognition/
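Since WER is just a word-level Levenshtein distance normalized by the reference length, it fits in a few lines:

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) divided
    by the number of reference words, via dynamic programming."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j]: edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, one substituted word in a three-word reference gives a WER of 1/3.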

Big Results
Outperforms humans on several benchmarks
Current state of the art on LibriSpeech test-clean is 3.43 WER

Limitations and Future Impact
Provided substantial evidence for the end-to-end approach; may be in use at Baidu?
Batching techniques appear in other inference systems
Limitations:
Not clear how to personalize
Substantial computational costs at prediction time
Difficult to reproduce results (no code or data released)