Midterm Review Spoken Language Processing Prof. Andrew Rosenberg.

Lecture 1 – Overview
Applications
– speech recognition
– speech synthesis
– other applications: indexing, language id, etc.
Information in speech
– words
– speaker identity
– speaker state
– discourse acts

Lecture 2 – From Sounds to Language
Differences between orthography and sounds
Phonetic symbol sets
– e.g. IPA, ARPAbet
Vocal organs
– articulators
Classes of sounds
Coarticulation

Lecture 3 – Spoken Dialog Systems
Maxims of Conversational Implicature
Dialog System Architecture
– Speech Recognition
– Dialog Management
– Response Generation
– Speech Synthesis
Dialog Strategies

Lecture 4 – Acoustics of Speech
Phone Recognition
Prosody
Speech Waveforms
Analog to Digital Conversion
Nyquist Rate
Pitch Doubling and Halving
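
As a refresher on the Nyquist rate item above, here is a minimal sketch. It assumes an ideal sampler with no anti-aliasing filter; the function names `nyquist_rate` and `aliased_frequency` are made up for this review and do not come from the lecture.

```python
def nyquist_rate(max_frequency_hz):
    """Minimum sampling rate needed to represent content up to max_frequency_hz."""
    return 2 * max_frequency_hz


def aliased_frequency(frequency_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after ideal sampling.

    A tone above sample_rate_hz / 2 folds back to its distance from the
    nearest integer multiple of the sampling rate.
    """
    nearest_multiple = round(frequency_hz / sample_rate_hz) * sample_rate_hz
    return abs(frequency_hz - nearest_multiple)


# A 5 kHz tone sampled at 8 kHz is indistinguishable from a 3 kHz tone.
print(aliased_frequency(5000, 8000))
```

Tones at or below half the sampling rate come back unchanged, which is why telephone-bandwidth speech (content up to roughly 4 kHz) is commonly sampled at 8 kHz.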

Lecture 5 – Speech Recognition Overview
History of Speech Recognition
– Rule based recognition
– Dynamic Time Warping
– Statistical Modeling
What qualities make speech recognition difficult?
Noisy Channel Model
Training and Test Corpora
Word Error Rate
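
Word Error Rate is usually computed as the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The sketch below assumes whitespace tokenization and unit cost for every error type; `word_error_rate` is an illustrative name, not a function from the course materials.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count against the hypothesis, WER can exceed 1.0 when the recognizer outputs many extra words.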

Lecture 6 – Fast Fourier Transform
Multiplying Polynomials
Divide-and-Conquer for multiplying polynomials
Relationship between multiplying polynomials and cosine transform
Complex roots of unity
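
The divide-and-conquer idea can be sketched as follows: evaluate both polynomials at the complex roots of unity with a recursive FFT, multiply the values pointwise, and interpolate back with the inverse transform. This is a teaching sketch, not the lecture's code. It assumes non-empty lists of integer coefficients in ascending order of power, and the names `fft` and `multiply_polynomials` are chosen here for illustration.

```python
import cmath


def fft(coeffs, invert=False):
    """Recursive radix-2 FFT; len(coeffs) must be a power of two."""
    n = len(coeffs)
    if n == 1:
        return list(coeffs)
    # Split into even- and odd-indexed coefficients and recurse.
    even = fft(coeffs[0::2], invert)
    odd = fft(coeffs[1::2], invert)
    sign = -1 if invert else 1
    out = [0j] * n
    for k in range(n // 2):
        # w is the k-th power of the principal n-th root of unity.
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def multiply_polynomials(a, b):
    """Multiply two integer-coefficient polynomials in O(n log n)."""
    result_len = len(a) + len(b) - 1
    n = 1
    while n < result_len:
        n *= 2
    # Zero-pad to a power of two, then evaluate at the n-th roots of unity.
    fa = fft([complex(x) for x in a] + [0j] * (n - len(a)))
    fb = fft([complex(x) for x in b] + [0j] * (n - len(b)))
    product = fft([x * y for x, y in zip(fa, fb)], invert=True)
    # The inverse transform needs a 1/n scale; rounding removes float noise.
    return [round((value / n).real) for value in product[:result_len]]


# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
print(multiply_polynomials([1, 2], [3, 4]))
```

The rounding step is only valid under the integer-coefficient assumption; for real-valued coefficients the `.real` part would be returned without rounding.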

Lecture 7 – MFCC
What is the MFCC used for?
Overlapping Windows
Mel Frequency Spectrogram
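
Two pieces of the MFCC front end are easy to sketch: slicing a signal into overlapping windows, and mapping Hz to the mel scale. The mel formula used here (2595 · log10(1 + f/700)) is one common variant and may not be the one used in the lecture; the frame length and hop in the usage line are arbitrary, and both function names are hypothetical.

```python
import math


def hz_to_mel(frequency_hz):
    """Convert Hz to mel using one widely used formula (an assumption here)."""
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)


def frame_signal(samples, frame_length, hop_length):
    """Split samples into overlapping frames; trailing partial frames are dropped."""
    last_start = len(samples) - frame_length
    return [
        samples[start:start + frame_length]
        for start in range(0, last_start + 1, hop_length)
    ]


# Frames of 4 samples with a hop of 2 overlap by half a frame.
print(frame_signal(list(range(10)), frame_length=4, hop_length=2))
```

A full MFCC pipeline would go on to apply a window function, take the magnitude spectrum of each frame, pool it through mel-spaced filters, take logs, and apply a cosine transform; those steps are omitted here.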

Lecture 8 – Statistical Modeling
Probabilities
– Bayes Rule
– Bayesians vs. Frequentists
Maximum Likelihood Estimation
Multinomial Distribution
– Bernoulli Distribution
Gaussian Distribution
– Multidimensional Gaussian
Difference between Classification, Clustering, Regression
Black Swans and the Long Tail
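
For a one-dimensional Gaussian, the maximum likelihood estimates are the sample mean and the sample variance with a 1/N (not 1/(N-1)) denominator. A minimal sketch, with `fit_gaussian_mle` as an illustrative name:

```python
def fit_gaussian_mle(samples):
    """Return the maximum likelihood (mean, variance) of a 1-D Gaussian."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    mean = sum(samples) / n
    # The MLE divides by n, which makes it a biased estimate of the variance.
    variance = sum((x - mean) ** 2 for x in samples) / n
    return mean, variance


print(fit_gaussian_mle([2, 4, 4, 4, 5, 5, 7, 9]))
```

With a single sample the estimated variance is exactly zero, which is the same degenerate case that shows up as a singularity when fitting mixture models.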

Lecture 9 – Acoustic Modeling
What does an Acoustic Model do?
Gaussian Mixture Model
Potential Problems
– Inconsistent Numbers of Gaussians
– Singularities
Training Acoustic Models
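
A Gaussian Mixture Model scores an observation as a weighted sum of Gaussian densities. The sketch below is one-dimensional for brevity. The `var_floor` parameter is a hypothetical name for a variance floor, a commonly used guard against the singularity that arises when a component collapses onto a single point; the lecture may have discussed a different remedy.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a 1-D Gaussian with the given mean and variance."""
    exponent = -((x - mean) ** 2) / (2.0 * variance)
    return math.exp(exponent) / math.sqrt(2.0 * math.pi * variance)


def gmm_density(x, weights, means, variances, var_floor=1e-6):
    """Weighted sum of component densities; weights are assumed to sum to 1."""
    return sum(
        # Flooring the variance keeps a collapsed component finite.
        weight * gaussian_pdf(x, mean, max(variance, var_floor))
        for weight, mean, variance in zip(weights, means, variances)
    )


# Two equally weighted components centered at -1 and +1.
print(gmm_density(0.0, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]))
```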

Lecture 10 – Hidden Markov Model
The Markov Assumption
Difference between states and observations
Finite State Automata
Decoding using Viterbi
Forced Alignment
Flat Start
Silence
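
Viterbi decoding finds the single most likely hidden state sequence for a sequence of observations. The sketch below assumes a discrete-observation HMM given as nested dictionaries and a non-empty observation sequence. It multiplies raw probabilities for readability; a real recognizer would work in log space to avoid underflow. The toy model in the usage lines is invented for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best state path, its joint probability) for a discrete HMM."""
    # best[t][s]: probability of the best path that ends in state s at time t.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path

    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev_state = max(states, key=lambda p: best[-1][p] * trans_p[p][s])
            scores[s] = best[-1][prev_state] * trans_p[prev_state][s] * emit_p[s][obs]
            pointers[s] = prev_state
        best.append(scores)
        back.append(pointers)

    # Trace back from the most probable final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}
print(viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p))
```

Forced alignment uses the same dynamic program, but the state sequence is constrained to the known transcript so that only the timing of each state is decoded.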

Lecture 11 – Pronunciation Modeling
Dictionary
Finite State Automata
Use in speech recognition
Using morphology for pronunciation modeling
Grapheme to Phoneme Conversion
– Letter to Sound rules
Machine Learning for G-to-P

Lecture 12 – Language Modeling
Using a Context Free Grammar to define a set of recognized sequences of words
– Terminals, non-terminals, start symbol
N-Gram models
– Mathematical underpinnings
– Theoretical background
How a “word” is defined
Learning n-gram statistics
Terminology
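
Learning n-gram statistics by maximum likelihood amounts to counting: P(w | prev) = count(prev, w) / count(prev). The bigram sketch below assumes whitespace tokenization, `<s>` and `</s>` sentence-boundary markers, and no smoothing, so unseen bigrams get probability zero. The function name and the tiny corpus are illustrative only.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Return a function prob(word, prev) with unsmoothed MLE bigram estimates."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, curr)] += 1

    def prob(word, prev):
        if context_counts[prev] == 0:
            return 0.0  # unseen context; a smoothed model would back off here
        return bigram_counts[(prev, word)] / context_counts[prev]

    return prob


prob = train_bigram_model([
    "i am sam",
    "sam i am",
    "i do not like green eggs and ham",
])
print(prob("am", "i"))
```

How a "word" is defined matters here: changing the tokenizer (case folding, punctuation, contractions) changes every count and therefore every probability.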

Next Class
Midterm Exam
Reading: J&M Chapter 4
