Natural Language Understanding


Raivydas Simenas

Overview
- History
- Speech recognition
- Natural language understanding: statistical methods to resolve ambiguities
- Current situation

History
- Roots in teaching the deaf to speak using "visible speech"
- 1874: Alexander Bell's invention of the harmonic telegraph
  - Different frequency harmonics from an electrical signal could be separated
  - Multiple messages could be sent over the same wire at the same time
- 1940s: separating the speech signal into different frequency components using the spectrogram
- 1950s: the beginning of computer use for automatic speech recognition

The Nature of Speech
- Phoneme: a basic sound, e.g. a vowel
- The human vocal apparatus is complex enough to produce about 18 phonemes per second
- Speech can be viewed as a sound wave
- Identifying sounds: analyzing the sound wave into its frequency components

The Spectrogram I
- A visual representation of speech that contains all the salient information
- Plots the amount of energy at different frequencies against time
- Discontinuous speech (pausing after each word) is easier to recognize on the spectrogram
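As a rough illustration of how such a plot is computed, here is a minimal sketch using NumPy, SciPy, and Matplotlib (assumed to be available); the file name speech.wav and the 16 kHz window settings are hypothetical.

```python
# Minimal sketch: compute and display a spectrogram of a speech waveform.
# Assumes a mono WAV file at the hypothetical path "speech.wav".
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram
import matplotlib.pyplot as plt

fs, x = wavfile.read("speech.wav")       # sample rate and samples
x = x.astype(np.float64)

# Short-time Fourier analysis: energy per frequency bin per time frame
# (roughly 25 ms windows with a 10 ms hop at 16 kHz).
f, t, Sxx = spectrogram(x, fs=fs, nperseg=400, noverlap=240)

plt.pcolormesh(t, f, 10 * np.log10(Sxx + 1e-12))  # log energy in dB
plt.xlabel("Time [s]")
plt.ylabel("Frequency [Hz]")
plt.title("Spectrogram")
plt.show()
```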

The Spectrogram II
- The same word uttered twice (especially by different speakers; the speaker-independence problem) can look radically different on a spectrogram
- The need to recognize invariant features in a spectrogram
- Formants: resonant frequencies sustained for a short time period when pronouncing a vowel
- Normalization: distinguishing between relevant and irrelevant information
- Nonlinear time compression: compensating for the changing speed of speech
- Matching a spoken word to a template
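Nonlinear time compression and template matching are classically done with dynamic time warping (DTW). The sketch below is a minimal, unoptimized DTW distance, assuming words are represented as sequences of feature vectors; the templates and feature dimensionality are made up for illustration.

```python
# Minimal sketch of nonlinear time alignment via dynamic time warping (DTW):
# align an utterance's frames to a stored template despite differences in speaking rate.
import numpy as np

def dtw_distance(template, utterance):
    """Cumulative distance of the best nonlinear alignment of the two frame sequences."""
    n, m = len(template), len(utterance)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(template[i - 1] - utterance[j - 1])  # local frame distance
            # Best of: stretch the template, stretch the input, or advance both.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# Usage: recognize the word whose template warps onto the utterance most cheaply.
templates = {"yes": np.random.rand(40, 12), "no": np.random.rand(30, 12)}  # hypothetical
utterance = np.random.rand(35, 12)
best = min(templates, key=lambda w: dtw_distance(templates[w], utterance))
print("Recognized word:", best)
```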

Robust Speech Recognition
- Need to maintain accuracy when the quality of the input speech is degraded or when speech characteristics differ due to a change in environment or speakers
- Dynamic parameter adaptation: alter either the input signal or the internally stored representations
- Optimal parameter estimation: based on a statistical model characterizing the differences between training and test sets
- Empirical feature comparison: based on comparing high-quality speech with the same speech recorded under degraded conditions

Stochastic Methods in Speech Recognition
- Generating the sequence of word hypotheses for an acoustic signal is most often done statistically
- The process:
  - The acoustic signal is represented as a sequence of feature vectors
  - Such sequences are used to build acoustic word models, which give the probability that a particular sequence of vectors represents a word
  - Acoustic word models are built on Markov chains
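To make the idea of an acoustic word model built on a Markov chain concrete, here is a minimal sketch that scores a sequence of (vector-quantized) acoustic symbols under a toy word HMM with the forward algorithm; all parameter values are hypothetical, not trained.

```python
# Minimal sketch: how likely is a sequence of acoustic symbols under a word's hidden Markov model?
import numpy as np

def forward_likelihood(obs, start, trans, emit):
    """Forward algorithm: P(observation sequence | word model)."""
    alpha = start * emit[:, obs[0]]            # initialize with the first symbol
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]   # propagate one time step
    return alpha.sum()

# Toy 3-state left-to-right word model over 4 acoustic symbols (hypothetical values).
start = np.array([1.0, 0.0, 0.0])
trans = np.array([[0.6, 0.4, 0.0],
                  [0.0, 0.7, 0.3],
                  [0.0, 0.0, 1.0]])
emit  = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.1, 0.7, 0.1, 0.1],
                  [0.1, 0.1, 0.4, 0.4]])

print(forward_likelihood([0, 0, 1, 2, 3], start, trans, emit))
```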

Representing Sentences
- Syntactic form: indicates how the words are related to each other in a sentence
- Logical form: identifies the semantic relationships between words based solely on knowledge of the language (independently of the situation)
- Final meaning representation: maps the information from the syntactic and logical forms into the knowledge representation
- The system uses the knowledge representation to represent and reason about its application domain

Parsing a Sentence
- Parsing: determining the structure of a sentence according to the grammar
- Tree representation of a sentence
- Transition network grammars:
  - Start at the initial node
  - An arc can be traversed only if the next word belongs to the category labeling that arc
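A minimal sketch of traversing a transition network grammar; the network, state names, and lexicon below are hypothetical toy examples rather than anything from the slides.

```python
# Minimal sketch of a transition network grammar:
# an arc may be taken only when the next word belongs to the arc's category.

# Network: state -> list of (category, next_state); "END" is the accepting state.
network = {
    "S0": [("ART", "S1"), ("N", "S2")],
    "S1": [("N", "S2")],
    "S2": [("V", "S3")],
    "S3": [("N", "END"), ("ART", "S4")],
    "S4": [("N", "END")],
}
lexicon = {"the": "ART", "dog": "N", "chased": "V", "cat": "N"}

def accepts(words, state="S0"):
    """True if some path through the network consumes all the words."""
    if not words:
        return state == "END"
    for category, next_state in network.get(state, []):
        if lexicon.get(words[0]) == category and accepts(words[1:], next_state):
            return True
    return False

print(accepts(["the", "dog", "chased", "the", "cat"]))   # True for this toy grammar
```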

Stochastic Methods for Ambiguity Resolution I
- Some sentences can be parsed in many different ways, e.g. "time flies like an arrow"
- The most popular method for resolving such ambiguity is based on statistics
- Some facts from probability theory:
  - The concept of a random variable, e.g. the lexical category of "flies"
  - A probability function assigns a probability to every possible value of the random variable, e.g. 0.3 for "flies" being a noun and 0.7 for its being a verb
  - Conditional probability functions (Pr(A|B)), e.g. the probability that a verb occurs given that a noun has just occurred
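A small worked example of these notions, estimated by counting over a hypothetical tiny tagged corpus: the probability function for the category of "flies", and one conditional probability over adjacent tags.

```python
# Minimal sketch: probability function and conditional probability from a toy tagged corpus.
from collections import Counter

tagged = [("time", "N"), ("flies", "V"), ("like", "P"), ("an", "ART"), ("arrow", "N"),
          ("fruit", "N"), ("flies", "N"), ("like", "V"), ("a", "ART"), ("banana", "N")]

# Probability function for the random variable "lexical category of 'flies'".
flies_tags = Counter(tag for word, tag in tagged if word == "flies")
total = sum(flies_tags.values())
print({tag: count / total for tag, count in flies_tags.items()})   # {'V': 0.5, 'N': 0.5}

# Conditional probability Pr(next tag = V | current tag = N).
tag_seq = [tag for _, tag in tagged]
pairs = list(zip(tag_seq, tag_seq[1:]))
after_n = [b for a, b in pairs if a == "N"]
print(after_n.count("V") / len(after_n))                           # 0.5 on this toy corpus
```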

Stochastic Methods for Ambiguity Resolution II
- Probabilities are used to predict future events given some data about the past
- Maximum likelihood estimator (MLE):
  - Pr(X in the future) = (number of past occurrences of X) / (total number of past events)
  - Works well only if X occurred often; not very useful for low-frequency events
- Expected likelihood estimator (ELE):
  - Pr(X in the future) = f(count of X) / sum over all events Y of f(count of Y)
  - e.g. with f(c) = c + 0.5 and counts 0.4 for X and 0.6 for Y, ELE(X) = (0.4 + 0.5) / ((0.4 + 0.5) + (0.6 + 0.5)) = 0.45
  - MLE is the special case of ELE with f(c) = c
- Given a large amount of text, MLE or ELE can be used to determine the lexical category of an ambiguous word, e.g. the word "flies"
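A minimal sketch comparing the two estimators, reusing the slide's illustrative values 0.4 and 0.6 as counts and the add-0.5 smoothing function given above; the "rare" counts at the end are hypothetical, added only to show why ELE helps for low-frequency events.

```python
# Minimal sketch: MLE versus ELE (add-0.5 smoothing) estimates from tag counts.
def mle(counts):
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def ele(counts, f=lambda c: c + 0.5):
    smoothed = {k: f(v) for k, v in counts.items()}
    total = sum(smoothed.values())
    return {k: v / total for k, v in smoothed.items()}

counts = {"N": 0.4, "V": 0.6}       # the slide's illustrative values
print(mle(counts))                  # {'N': 0.4, 'V': 0.6}
print(ele(counts))                  # {'N': 0.45, 'V': 0.55}

# ELE matters for unseen events: MLE would assign them probability 0.
rare = {"N": 0, "V": 3}
print(mle(rare), ele(rare))         # MLE: N -> 0.0; ELE: N -> 0.125
```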

Stochastic Methods for Ambiguity Resolution III
- Always choosing the interpretation that occurs most frequently in the training set obtains, on average, about a 90% success rate (not good enough)
- Some of the local context should be used to determine the lexical category of a word
- Ideally, for a sequence of words w1, w2, ..., wn we want the lexical category sequence c1, c2, ..., cn that maximizes the probability of the right interpretation (written out below)
- In practice, such probabilities are approximated
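One standard way to write this objective (this formulation is not spelled out on the slide; the second equality is Bayes' rule, dropping the constant Pr(w1, ..., wn)):

```latex
\hat{c}_1,\ldots,\hat{c}_n
  = \arg\max_{c_1,\ldots,c_n} \Pr(c_1,\ldots,c_n \mid w_1,\ldots,w_n)
  = \arg\max_{c_1,\ldots,c_n} \Pr(w_1,\ldots,w_n \mid c_1,\ldots,c_n)\,\Pr(c_1,\ldots,c_n)
```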

Stochastic Methods for Ambiguity Resolution IV
- n-gram models: look at the probability of a lexical category Ci given the categories that precede it, Ci-1, Ci-2, ..., Ci-n+1
- The probability of c1, c2, ..., ck occurring is approximated by the product of the n-gram probabilities at each position, e.g. the probability of the sequence ART, N, V might be 0.71 * 1 * 0.43 = 0.3053
- In practice, bigram and trigram models are used most often
- Models built on this idea are called Hidden Markov Models
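A minimal sketch of scoring a category sequence with a bigram model; the probability table reuses the slide's example numbers plus a hypothetical sentence-start marker, and a real model would estimate these values from a tagged corpus.

```python
# Minimal sketch: bigram approximation of the probability of a category sequence.
bigram = {
    ("<s>", "ART"): 0.71,   # probability that a sentence starts with an article
    ("ART", "N"): 1.0,      # in this toy table, an article is always followed by a noun
    ("N", "V"): 0.43,
}

def sequence_probability(categories):
    prob = 1.0
    previous = "<s>"                          # sentence-start marker
    for c in categories:
        prob *= bigram.get((previous, c), 0.0)
        previous = c
    return prob

print(sequence_probability(["ART", "N", "V"]))   # 0.71 * 1.0 * 0.43 = 0.3053
```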

Stochastic Methods for Ambiguity Resolution V
- To determine the most likely interpretation of a given sequence of n words, we want to maximize Pr(c1, c2, ..., cn | w1, w2, ..., wn)
- The Viterbi algorithm:
  - Given k lexical categories, the total number of category sequences to consider for n words is k^n
  - The Viterbi algorithm reduces the work to the order of const * n * k^2
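A minimal sketch of the Viterbi algorithm for a toy HMM tagger; all probabilities and the two-tag tag set are hypothetical, but the dynamic program is the standard recurrence whose work grows as n * k^2 rather than k^n.

```python
# Minimal sketch of the Viterbi algorithm for a toy HMM tagger.
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence; work is on the order of n * k^2."""
    best = {t: start_p[t] * emit_p[t].get(words[0], 0.0) for t in tags}
    backpointers = []
    for w in words[1:]:
        new_best, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[p] * trans_p[p][t])  # best predecessor
            new_best[t] = best[prev] * trans_p[prev][t] * emit_p[t].get(w, 0.0)
            pointers[t] = prev
        best = new_best
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return list(reversed(path))

# Hypothetical toy parameters; a real tagger estimates these from a tagged corpus.
tags = ["N", "V"]
start_p = {"N": 0.8, "V": 0.2}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {"N": {"time": 0.6, "flies": 0.3}, "V": {"flies": 0.5, "like": 0.4}}
print(viterbi(["time", "flies"], tags, start_p, trans_p, emit_p))   # ['N', 'V']
```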

Logical Form
- Although interpreting a sentence often requires knowledge of the context, some interpretation can be done independently of it: basic semantic properties of a word, its different senses, etc.
- Ontology:
  - Each word has one or more senses in which it can be used, e.g. "go" has about 40 senses
  - The different senses of all the words of a natural language are organized into classes of objects, such as events, actions, etc.
  - The set of such classes is called an ontology
- The logical form of an utterance can be viewed as a function that maps the current discourse situation into a new one resulting from the occurrence of the utterance
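As a rough illustration of organizing word senses under ontology classes, here is a small sketch; the senses, glosses, and class names are hypothetical toy entries, not a real lexicon or ontology.

```python
# Minimal sketch: word senses grouped under ontology classes.
from dataclasses import dataclass

@dataclass
class Sense:
    word: str
    gloss: str
    onto_class: str          # the ontology class this sense belongs to

lexicon = {
    "go": [
        Sense("go", "move from one place to another", "MotionEvent"),
        Sense("go", "function or operate", "StateChangeEvent"),
    ],
    "flies": [
        Sense("flies", "small winged insects", "PhysicalObject"),
        Sense("flies", "travels through the air", "MotionEvent"),
    ],
}

# Group all senses by ontology class.
ontology = {}
for senses in lexicon.values():
    for s in senses:
        ontology.setdefault(s.onto_class, []).append(s)

print([s.gloss for s in ontology["MotionEvent"]])
```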

Current Situation
- Inexpensive software for speech recognition is available
- The remaining issues: large vocabularies, continuous speech, and speaker independence
- Automated speech recognition works for restricted domains
- The speed of serial processing in a computer vs. the number of parallel processes in the human brain

References
- Ronald A. Cole (ed.). Survey of the State of the Art in Human Language Technology, 1996.
- James Allen. Natural Language Understanding, 1995.
- Raymond Kurzweil. "When Will HAL Understand What We Are Saying? Computer Speech Recognition and Understanding." In HAL's Legacy, 1996.