Gist of Achieving Human Parity in Conversational Speech Recognition

Presentation transcript:

Gist of Achieving Human Parity in Conversational Speech Recognition

Suggested Major Project Topic Areas
- Image recognition: category of image, handwriting recognition, object identification, image to text
- Reinforcement learning
- Unsupervised learning: generative adversarial networks, autoencoders, semi-supervised learning, transfer learning, other
- Speech: speech recognition, speech synthesis, other synthesis (music, painting, text)
- Natural language: machine translation, word embeddings, summarization, text understanding, information retrieval, other
- LSTM and gated network applications
- Health and medicine
- Preference prediction
- Sequence prediction: time series, stocks
- Very deep networks: highway networks, residual learning
- Interpretability, human-supplied knowledge, and control
Each team should post their tentative topic choice and begin posting gists. If you are undecided, post gists on multiple topics. Each person should post at least 4 gists. You can post as many as you want. There will be a running competition for the best gists. You may rate and comment on any posted gist.

Gist Format Quick gist of the paper: What is the significant result? How major is it? What is the premise? What is the main prior work? What are the new methodologies? What techniques are assumed known?

They are proud of this work. Most of the abstract is spent bragging about the result.

Repeats the claim in the title.

Announces that what follows is important.

Things to look for in the paper:
- Convolutional and LSTM networks (the existing techniques)
- Novel spatial smoothing (what's new: the novel technique)
- Lattice-free MMI acoustic training (something else that may be non-standard)
- Why did they emphasize "systematic" use?
- The results (compare to human performance)

Modest about Their Method Best practices: You should do this, too!

This is explaining the emphasis on “systematic” use. While they are proud of their results, they are being modest about how they achieved them. They are attributing the results mainly to careful engineering rather than to the novel techniques that they have added to the old stand-bys. Yes, every course in deep learning should cover CNNs and RNNs.

FYI: More complete history, not necessary for gist

What’s new?

CNNs Notice that they only reference, but do not describe the prior work. Can you tell which of these references have Microsoft authors? You will need to read these references to understand the techniques used in this paper, and this is just to understand the CNNs. This gist should list the full title of each cited reference as required prior-work reading.
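For anyone who has not built a CNN acoustic model before, here is a minimal sketch of the general idea. It assumes PyTorch and 40x40 windows of log-mel filterbank features, neither of which is stated here (the paper itself runs on Microsoft's CNTK), and the senone count is a placeholder; the real VGG/ResNet/LACE variants in the paper are far deeper.

    import torch
    import torch.nn as nn

    class TinyCnnAcousticModel(nn.Module):
        """Toy CNN mapping a (batch, 1, 40, 40) window of filterbank features
        to senone scores. Illustration only; not the paper's architecture."""
        def __init__(self, num_senones=9000):      # senone count is a placeholder
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),                    # halve time and frequency
            )
            self.classifier = nn.Linear(32 * 20 * 20, num_senones)

        def forward(self, x):                       # x: (batch, 1, 40, 40)
            h = self.conv(x)
            return self.classifier(h.flatten(start_dim=1))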

Final CNN Variant: LACE More prior-work references. Also, ResNet is an implicit prior-work reference.

At least the LACE architecture itself is shown in detail.
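Since ResNet is assumed prior work, it helps to have the residual idea in mind. Below is a bare-bones residual block (a PyTorch sketch; LACE adds layer-wise context expansion and attention weights on top of such jump connections, which this does not show).

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """Plain ResNet-style block: output = ReLU(x + F(x))."""
        def __init__(self, channels):
            super().__init__()
            self.f = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.BatchNorm2d(channels), nn.ReLU(),
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.BatchNorm2d(channels),
            )

        def forward(self, x):
            return torch.relu(x + self.f(x))        # the jump (skip) connection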

LSTMs Although LSTMs are only “a close second”, they are used in combination with the convolutional networks, so you need to know how to implement both. More prior-work references.
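As a reminder of what "implementing both" involves, a minimal bidirectional LSTM acoustic model can be sketched as follows (again a PyTorch illustration with placeholder sizes, not the paper's configuration).

    import torch.nn as nn

    class TinyBlstmAcousticModel(nn.Module):
        """Toy bidirectional LSTM mapping a feature sequence to per-frame senone scores."""
        def __init__(self, feat_dim=40, hidden=512, num_senones=9000):
            super().__init__()
            self.blstm = nn.LSTM(feat_dim, hidden, num_layers=3,
                                 bidirectional=True, batch_first=True)
            self.out = nn.Linear(2 * hidden, num_senones)

        def forward(self, feats):        # feats: (batch, time, feat_dim)
            h, _ = self.blstm(feats)
            return self.out(h)           # (batch, time, num_senones)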

Spatial Smoothing The part that is new. There are no prior-work references, but there is quite a bit of jargon.
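To cut through the jargon, one way to picture a "spatial smoothing" regularizer is: arrange a layer's units on a 2-D grid and penalize large differences between neighboring activations, adding the penalty as a small extra term to the training loss. The sketch below is only an illustration of that general idea, not the paper's exact formulation; read the section for the real definition.

    def spatial_smoothness_penalty(activations, grid=(32, 32)):
        """Illustrative regularizer: reshape a layer's activations (a torch.Tensor)
        onto a 2-D grid and penalize squared differences between adjacent units."""
        a = activations.view(-1, *grid)                   # (batch, H, W)
        dh = (a[:, 1:, :] - a[:, :-1, :]).pow(2).mean()   # vertical neighbors
        dw = (a[:, :, 1:] - a[:, :, :-1]).pow(2).mean()   # horizontal neighbors
        return dh + dw   # scale by a small weight and add to the training loss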

Speaker Adaptive Modeling The abstract didn’t even mention speaker-adaptive modeling, but you’ll need to know how to implement it. More prior-work references. Do you understand why CNN models are treated differently from the LSTM models with regard to the appended i-vector?
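For the LSTM models, "appending an i-vector" can be as simple as concatenating a fixed per-speaker vector onto every frame of input features, as in the sketch below (PyTorch, placeholder shapes). A CNN that convolves over the time-frequency plane cannot treat the i-vector as just more frequency bins, which is one way to think about why the paper handles the CNN models differently.

    import torch

    def append_ivector(feats, ivector):
        """feats:   (batch, time, feat_dim) acoustic features
        ivector: (batch, ivec_dim) per-speaker embedding
        Returns features with the i-vector repeated onto every frame."""
        t = feats.size(1)
        iv = ivector.unsqueeze(1).expand(-1, t, -1)   # copy across time
        return torch.cat([feats, iv], dim=-1)         # (batch, time, feat_dim + ivec_dim)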

Lattice-Free Sequence Training This is only the initial paragraph; several more paragraphs describe "our implementation". However, this clip shows the prior-work references. This section is a mix of a new implementation and prior work; they did not call it "novel", as they did for the spatial smoothing.
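For orientation, the criterion maximized in MMI sequence training is, in its generic form (this is the textbook objective, not a claim about the paper's exact implementation; "lattice-free" refers to computing the denominator over a full phone-level graph rather than per-utterance word lattices):

    \mathcal{F}_{\mathrm{MMI}} = \sum_{u} \log
      \frac{p(\mathbf{X}_u \mid \mathbf{W}_u)^{\kappa}\, P(\mathbf{W}_u)}
           {\sum_{\mathbf{W}} p(\mathbf{X}_u \mid \mathbf{W})^{\kappa}\, P(\mathbf{W})}

where X_u is the acoustics of utterance u, W_u its reference transcript, P(W) the language model, and kappa an acoustic scaling factor.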

LM Rescoring and System Combination This is how they combine the acoustic analysis with the language models. In addition to the explicit references, you will need to look at prior references for RNN LMs and LSTM LMs, but we will not look at the details in the next two sections.
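The combination step can be pictured as N-best rescoring: replace or interpolate the first-pass n-gram LM score with the neural LM scores and re-rank the hypotheses. The sketch below is a generic illustration with made-up weights, not the paper's recipe.

    def rescore_nbest(nbest, rnn_lm, lstm_lm, ngram_weight=0.3, lm_scale=12.0):
        """nbest: list of (words, acoustic_score, ngram_lm_score) for one utterance.
        rnn_lm / lstm_lm: callables returning a log-probability for a word sequence.
        All weights here are placeholders, not values from the paper."""
        best, best_score = None, float("-inf")
        for words, am_score, ngram_score in nbest:
            neural = 0.5 * rnn_lm(words) + 0.5 * lstm_lm(words)        # assumed equal mix
            lm = ngram_weight * ngram_score + (1 - ngram_weight) * neural
            total = am_score + lm_scale * lm
            if total > best_score:
                best, best_score = words, total
        return best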

Results There are many tables of results, showing tests of individual components and various combinations, as well as comparisons with prior work. This one table is sufficiently representative for the gist: it shows results for various versions of the Microsoft system and compares the final system with human performance.

Summary of Gist A "break-through" paper announcing the first result exceeding human performance on a well-known, heavily researched benchmark. The result was achieved mainly by "careful engineering and optimization" based on accumulated prior art. If you have already implemented all of the prior art, there are relatively few new things that you need to implement, although even then there will be a lot of tuning and optimizing. If you are starting from scratch, there is a very large amount of prior work that you will need to implement as a prerequisite. This is an important paper on a major piece of work. If you want to understand state-of-the-art speech recognition very thoroughly, reading this paper and its references would be a good start. Many sections of the paper were skipped in this summary, such as the descriptions of the testing of human performance and of the implementation on Microsoft's CNTK framework. Conclusion: worth reading if you want to be up-to-date in speech recognition, but extremely ambitious as a student team project.