Speech and Natural Language Processing Christel Kemke Department of Computer Science University of Manitoba Presentation for 74.402 Human-Computer Interaction.


Speech and Natural Language Processing Christel Kemke Department of Computer Science University of Manitoba Presentation for Human-Computer Interaction II March 2004

Evolution of Human Language communication for "work" social interaction basis of cognition and thinking (Sapir & Whorf)

Communication "Communication is the intentional exchange of information brought about by the production and perception of signs drawn from a shared system of conventional signs." [Russell & Norvig, p.651]

Natural Language - General Natural Language is characterized by  a common or shared set of signs alphabet; lexicon  a systematic procedure to produce combinations of signs syntax  a shared meaning of signs and combinations of signs (constructive) semantics

Speech and Natural Language Processing Communication Natural Language Syntax Semantics Pragmatics Speech Presentation for Human-Computer Interaction II

Speech and Natural Language  Speech Recognition  acoustic signal as input  conversion into phonemes and written words  Natural Language Processing  written text as input; sentences (or 'utterances')  syntactic analysis: parsing; grammar  semantic analysis: "meaning", semantic representation  pragmatics;  dialogue; discourse  Spoken Language Processing  transcribed utterances  Phenomena of spontaneous speech

Speech Recognition – Signal Processing / Analysis Acoustic / sound wave → Filtering, FFT; Spectral Analysis → Frequency Spectrum → Phoneme Recognition (HMM, Neural Networks; Features: Phonemes; Context) → Phonemes → Grammar or Statistics → Phoneme Sequences / Words → Grammar or Statistics for likely word sequences → Word Sequence / Sentence

Areas in Natural Language Processing  Morphology (word stem + ending)  Syntax, Grammar & Parsing (syntactic description & analysis)  Semantics & Pragmatics (meaning; constructive; context-dependent; references; ambiguity)  Pragmatic Theory of Language; Intentions; Metaphor (Communication as Action)  Discourse / Dialogue / Text  Spoken Language Understanding  Language Learning

NLP Syntax Analysis - Processes (with Linguistic Background Knowledge) Morphological Analyzer + Lexicon → Part-of-Speech (POS) Tagging: the – determiner (Det) → Parser with Grammar Rules (NP → Det Noun): NP recognized; parse tree NP → Det Noun

NLP - Syntactic Analysis Morphological Analyzer: eats → eat + s (3rd sing) → Lexicon / POS Tagging: eat – verb (Verb) → Parser with Grammar Rules (VP → Verb Noun): VP recognized; parse tree VP → Verb Noun

Morphology A morphological analyzer determines (at least)  the stem + ending of a word, and usually delivers related information, like  the word class,  the number,  the person and  the case of the word. The morphology can be part of the lexicon or implemented as a separate component, for example as a rule-based system. eats → eat + s verb, singular, 3rd pers dog → dog noun, singular
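The stem + ending split above can be sketched as a small rule-based analyzer. This is a hypothetical illustration: the tiny lexicon and the single "-s" rule below are invented for the example, not part of any real morphology component.

```python
# Toy lexicon of stems and their word classes (invented for this sketch).
LEXICON = {"eat": "verb", "dog": "noun", "bone": "noun"}

def analyze(word):
    """Return (stem, word_class, features) or None for unknown words."""
    if word in LEXICON:                          # base form, no ending
        return word, LEXICON[word], {}
    if word.endswith("s") and word[:-1] in LEXICON:
        stem = word[:-1]
        cls = LEXICON[stem]
        if cls == "verb":                        # eats -> eat + s
            return stem, cls, {"person": "3rd", "number": "singular"}
        return stem, cls, {"number": "plural"}   # dogs -> dog + s
    return None
```

A real analyzer would handle irregular forms (eat/ate/eaten) and more endings; the point here is only the stem + ending decomposition.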

Lexicon The Lexicon contains information on words, such as  inflected forms (e.g. goes, eats) or  word-stems (e.g. go, eat). The Lexicon usually assigns a syntactic category,  the word class or Part-of-Speech category. Sometimes also  further syntactic information (see Morphology);  semantic information (e.g. agent);  syntactic-semantic information (e.g. verb complements like: 'give' requires a direct object).

Lexicon Example contents: eats → verb; singular, 3rd person (-s); can have direct object (verb subcategorization) dog → dog, noun, singular; animal (semantic annotation)

POS (Part-of-Speech) Tagging POS Tagging determines the word class or ‘part-of-speech’ category (basic syntactic categories) of single words or word-stems. the – det (determiner) dog – noun eats – verb (3rd person; singular) the – det bone – noun
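A lexicon-lookup tagger along these lines can be sketched as follows; the toy tag dictionary is invented for the example.

```python
# Toy POS lexicon (invented for this sketch).
TAGS = {"the": "det", "dog": "noun", "eats": "verb", "bone": "noun"}

def pos_tag(sentence):
    """Tag each word by lexicon lookup; unknown words get 'unknown'."""
    return [(w, TAGS.get(w, "unknown")) for w in sentence.lower().split()]
```

Plain lookup cannot disambiguate words with several possible tags ("book" as noun or verb); real taggers use the surrounding context for that.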

Open Word Class: Nouns Nouns denote objects, concepts, … Proper Nouns Names for specific individual objects, entities e.g. the Eiffel Tower, Dr. Kemke Common Nouns Names for categories or classes or abstracts e.g. fruit, banana, table, freedom, sleep,... Count Nouns enumerable entities, e.g. two bananas Mass Nouns not countable items, e.g. water, salt, freedom

Open Word Class: Verbs Verbs denote actions, processes, states e.g. smoke, dream, rest, run several morphological forms: non-3rd person – eat; 3rd person – eats; progressive / present participle / gerundive – eating; past participle – eaten Auxiliaries, e.g. be, as sub-class of verbs

Open Word Class: Adjectives Adjectives denote qualities or properties of objects, e.g. heavy, blue, content most languages have concepts for colour – white, green,... age – young, old,... value – good, bad,... not all languages have adjectives as a separate class

Open Word Class: Adverbs Adverbs denote modifications of actions (verbs), qualities (adjectives) e.g. walk slowly, heavily drunk Directional or Locational Adverbs Specify direction or location e.g. go home, stay here Degree Adverbs Specify extent of process, action, property e.g. extremely slow, very modest

Open Word Class: Adverbs 2 Manner Adverbs Specify manner of action or process e.g. walk slowly, run fast Temporal Adverbs Specify time of event or action e.g. yesterday, Monday

Closed Word Classes prepositions: on, under, over, at, from, to, with,... determiners: a, an, the,... pronouns: he, she, it, his, her, who, I,... conjunctions: and, or, as, if, when,... auxiliary verbs: can, may, should, are particles: up, down, on, off, in, out numerals: one, two, three,..., first, second,...

Language and Grammar Natural Language described as Formal Language L using a Formal Grammar G: start-symbol S ≡ sentence non-terminals NT ≡ syntactic constituents terminals T ≡ lexical entries/ words production rules P ≡ grammar rules Generate sentences or recognize sentences (Parsing) of the language L through the application of grammar rules.

Grammar Here, POS Tags are included in the grammar rules. det → the noun → dog | bone verb → eat NP → det noun (NP – noun phrase) VP → verb (VP – verb phrase) VP → verb NP S → NP VP (S – sentence) Most often we deal with Context-free Grammars, with a distinguished Start-symbol S (sentence).

Parsing  derive the syntactic structure of a sentence based on a language model (grammar)  construct a parse tree, i.e. the derivation of the sentence based on the grammar (rewrite system)

Parsing (here: bottom-up) determine the syntactic structure of the sentence the → det dog → noun det noun → NP eats → verb the → det bone → noun det noun → NP verb NP → VP NP VP → S
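The reduction sequence above is what a simple shift-reduce parser performs. A minimal sketch, with a toy grammar and tag table invented to match the slide's example:

```python
# Reduction rules (RHS, LHS) and a toy tag table for the example sentence.
RULES = [
    (("det", "noun"), "NP"),
    (("verb", "NP"), "VP"),
    (("NP", "VP"), "S"),
]
TAGS = {"the": "det", "dog": "noun", "eats": "verb", "bone": "noun"}

def shift_reduce(words):
    """Shift each word's POS tag, then reduce greedily while a rule matches."""
    stack = []
    for word in words:
        stack.append(TAGS[word])                 # shift
        reduced = True
        while reduced:
            reduced = False
            for rhs, lhs in RULES:
                if tuple(stack[-len(rhs):]) == rhs:
                    stack[-len(rhs):] = [lhs]    # reduce top of stack to LHS
                    reduced = True
                    break
    return stack
```

Greedy reduction happens to succeed on this toy grammar; in general a bottom-up parser must also explore reductions it could postpone, which is exactly the inefficiency discussed on the later parsing-problems slide.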

Sample Grammar Grammar (S, NT, T, P) - NT Non-Terminals; T Terminals; P Productions Sentence Symbol S ∈ NT Word-Classes / Part-of-Speech ∈ NT syntactic Constituents ∈ NT terminal words ∈ T Grammar Rules P ⊆ NT × (NT ∪ T)* S → NP VP | Aux NP VP NP → Det Nominal | Proper-Noun Nominal → Noun | Nominal PP VP → Verb | Verb NP | Verb PP | Verb NP PP PP → Prep NP Det → that | this | a Noun → book | flight | meal | money Proper-Noun → Houston | American Airlines | TWA Verb → book | include | prefer Prep → from | to | on Aux → do | does

Sample Parse Tree Parse "Does this flight include a meal?"
S
 Aux: does
 NP
  Det: this
  Nominal
   Noun: flight
 VP
  Verb: include
  NP
   Det: a
   Nominal
    Noun: meal

Bottom-up vs. Top-Down Parsing Bottom-up parsing – from word-nodes to sentence-symbol Top-down parsing – from sentence-symbol to words (both illustrated on the parse tree for "does this flight include a meal")
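A naive top-down (recursive-descent) recognizer for the sample grammar can be sketched as follows. The left-recursive rule Nominal → Nominal PP is deliberately omitted, since naive descent loops on it; chart parsers handle it, as the later slides discuss. Function names are invented for the sketch.

```python
# Sample grammar, minus the left-recursive rule Nominal -> Nominal PP.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"]],
    "NP": [["Det", "Nominal"], ["Proper-Noun"]],
    "Nominal": [["Noun"]],
    "VP": [["Verb"], ["Verb", "NP"], ["Verb", "PP"], ["Verb", "NP", "PP"]],
    "PP": [["Prep", "NP"]],
}
LEXICON = {
    "Det": {"that", "this", "a"},
    "Noun": {"book", "flight", "meal", "money"},
    "Verb": {"book", "include", "prefer"},
    "Prep": {"from", "to", "on"},
    "Aux": {"do", "does"},
    "Proper-Noun": {"houston"},
}

def derive(symbol, words, i):
    """Yield every position reachable after deriving `symbol` from words[i:]."""
    if symbol in LEXICON:                        # pre-terminal: match one word
        if i < len(words) and words[i] in LEXICON[symbol]:
            yield i + 1
        return
    for rhs in GRAMMAR[symbol]:                  # try each expansion in turn
        positions = [i]
        for part in rhs:
            positions = [k for j in positions for k in derive(part, words, j)]
        yield from positions

def accepts(sentence):
    words = sentence.lower().split()
    return len(words) in derive("S", words, 0)
```

Because the recognizer backtracks over every expansion, it re-derives the same constituents many times; that redundancy is what the chart in the Earley algorithm avoids.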

Problems - Ambiguity / Binding Ambiguity “One morning, I shot an elephant in my pajamas. How he got into my pajamas, I don’t know.” Groucho Marx syntactic or structural ambiguity – several parse trees for the same sentence semantic or lexical ambiguity – several word meanings bank (where you get money) and (river) bank even different word categories possible “He books the flight.” vs. “The books are here.“ “Fruit flies from the balcony” vs. “Fruit flies are on the balcony.”

Lexical Ambiguity Several word senses or word categories e.g. chase – noun or verb e.g. plant - ????

Syntactic Ambiguity Several parse trees e.g. “The dog eats the bone in the park.” e.g. “The dog eats the bone in the package.” Who/what is in the park and who/what is in the package? Syntactically speaking: How do I bind the Prepositional Phrase "in the... " ?

Problems in Parsing Problems with left-recursive rules like NP → NP PP: don’t know how many times recursion is needed. Pure Bottom-up or Top-down Parsing is inefficient because it generates and explores too many structures which in the end turn out to be invalid. Combine top-down and bottom-up approach: Start with sentence; use rules top-down (look-ahead); read input; try to find shortest path from input to highest unparsed constituent (from left to right). → Chart-Parsing / Earley-Parser

Chart-Parsing / Earley Algorithm Essence:  Integrate top-down and bottom-up parsing.  Keep recognized sub-structures (sub-trees) for shared use during parsing. Top-down Prediction: Start with S-symbol. Generate all applicable rules for S. Go further down with left-most constituent in rules and add rules for these constituents until you encounter a left-most node on the RHS which is a word category (POS). Bottom-up Completion: Read input word and compare. If word matches, mark as recognized and continue the recognition bottom-up, trying to complete active rules.

Earley Algorithm - Functions predictor generates new rules for a partly recognized RHS with a constituent right of the dot • (top-down generation); the dot indicates how far a rule has been recognized scanner if a word category (POS) is found right of the dot •, the Scanner reads the next input word and adds a rule for it to the chart (bottom-up mode) completer if a rule is completely recognized (the dot • is at the far right), the recognition state of earlier rules in the chart advances: the dot • is moved over the recognized constituent (bottom-up recognition)
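The three functions can be sketched as a compact Earley recognizer over the sample flight grammar (simplified lexicon, one tag per word; this is an illustrative sketch, not the course's reference implementation). Note that the left-recursive Nominal → Nominal PP rule is unproblematic here.

```python
from collections import namedtuple

# Dotted rule: lhs -> rhs with `dot` symbols recognized, starting at `start`.
State = namedtuple("State", "lhs rhs dot start")

GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP")],
    "NP": [("Det", "Nominal")],
    "Nominal": [("Noun",), ("Nominal", "PP")],   # left recursion is fine here
    "VP": [("Verb",), ("Verb", "NP")],
    "PP": [("Prep", "NP")],
}
LEXICON = {"does": "Aux", "this": "Det", "a": "Det",
           "flight": "Noun", "meal": "Noun", "include": "Verb"}

def earley(words):
    chart = [[] for _ in range(len(words) + 1)]

    def add(state, k):
        if state not in chart[k]:
            chart[k].append(state)

    add(State("GAMMA", ("S",), 0, 0), 0)          # dummy start rule
    for k in range(len(words) + 1):
        for state in chart[k]:                    # chart[k] grows as we iterate
            if state.dot < len(state.rhs):
                nxt = state.rhs[state.dot]
                if nxt in GRAMMAR:                # predictor
                    for rhs in GRAMMAR[nxt]:
                        add(State(nxt, rhs, 0, k), k)
                elif k < len(words) and LEXICON.get(words[k]) == nxt:
                    # scanner: the next word's POS matches the category
                    add(state._replace(dot=state.dot + 1), k + 1)
            else:                                 # completer
                for prev in chart[state.start]:
                    if (prev.dot < len(prev.rhs)
                            and prev.rhs[prev.dot] == state.lhs):
                        add(prev._replace(dot=prev.dot + 1), k)
    return any(s.lhs == "GAMMA" and s.dot == 1 for s in chart[len(words)])
```

Because every recognized sub-structure is stored once in the chart and shared by all rules that need it, the exponential re-derivation of pure top-down or bottom-up parsing is avoided.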

Chart (after parsing "Book this flight") completed edges: S → VP • VP → V NP • NP → Det Nom • Nom → Noun • with V: Book, Det: this, Noun: flight

Semantics

Semantic Representation Representation of the meaning of a sentence. Generate a logic-based representation or a frame-based representation based on the syntactic structure, lexical entries, and particularly the head-verb (determines how to arrange parts of the sentence in the semantic representation).

Semantic Representation Verb-centered Representation Verb (action, head) is regarded as center of verbal expression and determines the case frame with possible case roles; other parts of the sentence are described in relation to the action as fillers of case slots. (cf. also Schank’s CD Theory) Typing of case roles possible (e.g. 'agent' refers to a specific sort or concept)

General Frame for eat Agent: animate Action: eat Patiens: food Manner: {e.g. fast} Location: {e.g. in the yard} Time: {e.g. at noon}

Example-Frame with Fillers Agent: the dog Action: eat Patiens: the bone / the bone in the package Location: in the park

General Frame for drive – Frame with fillers
Agent: animate – Agent: she
Action: drive – Action: drives
Patiens: vehicle – Patiens: the convertible
Manner: {the way it is done} – Manner: fast
Location: Location-spec – Location: [in the] Rocky Mountains
Source: Location-spec – Source: [from] home
Destination: Location-spec – Destination: [to the] ASIC conference
Time: Time-spec – Time: [in the] summer holidays

Representation in Logic Action: eat Agent: the dog Patiens: the bone / the bone in the package Location: in the park predicate constants eat (dog-1, bone-1, park-1)

Representation in Logic variables
eat (dog-1, bone-1, park-1)
eat (x, y, z) – general semantic frame
animate-being (x), food (y), location (z) – lexical semantic frame
NP-1 (x), NP-2 (y), PP (z) – syntactic frame
eat (NP-1, NP-2, PP)
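The mapping from a filled case frame to the flat logical form can be sketched as follows; the role list for eat follows the frame slides (keeping the deck's "patiens" spelling), while the data structure and function name are invented for the illustration.

```python
# Each verb's case frame fixes the order of arguments in the predicate.
CASE_FRAMES = {"eat": ["agent", "patiens", "location"]}

def to_logic(verb, role_fillers):
    """Order the case-role fillers by the verb's frame and emit predicate form."""
    args = [role_fillers.get(role, "_") for role in CASE_FRAMES[verb]]
    return "%s(%s)" % (verb, ", ".join(args))
```

For example, the frame with fillers dog-1 / bone-1 / park-1 yields the predicate-constant form shown on the slide.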

Pragmatics

Pragmatics includes context-related aspects of NL expressions (utterances). These are in particular anaphoric references, elliptic expressions, deictic expressions, … anaphoric references – refer to items mentioned before deictic expressions – simulate pointing gestures elliptic expressions – incomplete expression; relate to item mentioned before

Pragmatics “I put the box on the top shelf.” “The candy-box?” – elliptic expression “I know that. But I can’t find it there.” – “that”: anaphoric reference; “there”: deictic expression

Intentions One philosophical assumption is that natural language is used to achieve things or situations: “Do things with words.” The meaning of an utterance is essentially determined by the intention of the speaker.

Intentionality - Examples
What was said: “There is a terrible draft here.” – What was meant: "Can you please close the window."
What was said: “How does it look here?” – What was meant: "I am really mad; clean up your room."
What was said: "Will this ever end?" – What was meant: "I would prefer to be with my friends than to sit in class now."

Metaphors The meaning of a sentence or expression is not directly inferable from the sentence structure and the word meanings. Metaphors transfer concepts and relations from one area of discourse into another area, for example, seeing time as a line (in space) or seeing friendship or life as a journey.

Metaphors - Examples “This car eats a lot of gas.” “She devoured the book.” “He was tied up with his clients.” “Marriage is like a journey.” “Their marriage was a one-way road into hell.” (see also George Lakoff, Women, Fire and Dangerous Things)

Dialogue and Discourse

Discourse / Dialogue Structure Grammar for various sentence types (speech acts): dialogue, discourse, story grammar Distinguish questions, commands, and statements:  Where is the remote-control?  Bring the remote-control!  The remote-control is on the brown table. Dialogue Grammars describe possible sequences of Speech Acts in communication, e.g. that a question is followed by an answer/statement.
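Such a dialogue grammar can be approximated by a table of allowed speech-act transitions; the table below is a made-up example of the "question is followed by an answer/statement" kind of constraint.

```python
# Allowed follow-up speech acts for each speech act (invented example).
DIALOGUE = {
    "question": {"statement"},                    # a question expects an answer
    "command": {"statement", "acknowledgement"},
    "statement": {"statement", "question", "command"},
    "acknowledgement": {"statement", "question", "command"},
}

def well_formed(acts):
    """Check that each adjacent pair of speech acts is a licensed transition."""
    return all(b in DIALOGUE.get(a, set()) for a, b in zip(acts, acts[1:]))
```

A full dialogue grammar would use rules (or a finite-state machine) over richer act types; the table only captures adjacency pairs.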

Speech

Speech Production & Reception Sound and Hearing change in air pressure  sound wave reception through inner ear membrane / microphone break-up into frequency components: receptors in cochlea / mathematical frequency analysis (e.g. Fast-Fourier Transform FFT)  Frequency Spectrum perception/recognition of phonemes and subsequently words (e.g. Neural Networks, Hidden-Markov Models)

Speech Recognition Phases Speech Recognition acoustic signal as input signal analysis - spectrogram feature extraction phoneme recognition word recognition conversion into written words

Speech Signal Speech Signal composed of  harmonic signal (sinus waves) with different frequencies and amplitudes  frequency - waves/second  like pitch  amplitude - height of wave  like loudness  non-harmonic signal (noise, not sinus wave)

Video of glottis and speech signal in lingWAVES

Speech Signal Analysis Analog-Digital Conversion of Acoustic Signal Sampling in Time Frames (“windows”)  frequency = 0-crossings per time frame  e.g. 2 crossings/second is 1 Hz (1 wave)  e.g. a 10 kHz component needs sampling rate 20 kHz  measure amplitudes of signal in time frame  digitized wave form  separate different frequency components  FFT (Fast Fourier Transform)  spectrogram  other frequency-based representations  LPC (Linear Predictive Coding)  Cepstrum
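The "0-crossings per time frame" idea can be tried directly on a synthetic signal; the sine wave below stands in for a sampled waveform (pure-tone assumption; real speech needs spectral analysis instead).

```python
import math

def zero_crossings(samples):
    """Count sign changes between adjacent samples."""
    return sum(1 for a, b in zip(samples, samples[1:])
               if (a < 0 <= b) or (b < 0 <= a))

def estimate_frequency(samples, sample_rate):
    """A pure tone crosses zero twice per cycle, so f ≈ crossings / (2·seconds)."""
    seconds = len(samples) / sample_rate
    return zero_crossings(samples) / (2 * seconds)

# Synthetic 100 Hz tone, one second at an 8 kHz sampling rate.
rate = 8000
tone = [math.sin(2 * math.pi * 100 * t / rate) for t in range(rate)]
```

On the tone above the estimate comes out close to 100 Hz; for mixed-frequency signals the zero-crossing rate only gives a rough noisiness measure, which is why the slide moves on to FFT, LPC, and cepstral representations.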

Waveform (fig. 7.20): "She just had a baby." – amplitude/pressure over time

Waveform for Vowel ae (fig. 7.21) – amplitude/pressure over time

Waveform and Spectrogram (figs. 7.20, 7.23)

Waveform and LPC Spectrum for Vowel ae (figs. 7.21, 7.22) – formants appear as energy peaks over frequency

Phoneme Recognition Recognition Process based on features extracted from spectral analysis phonological rules statistical properties of language/ pronunciation Recognition Methods Hidden Markov Models Neural Networks Pattern Classification in general

Speech Signal Characteristics From Signal Representation derive, e.g.  formants - dark stripes in spectrum strong frequency components; characterize particular vowels; gender of speaker  pitch – fundamental frequency baseline for higher frequency harmonics like formants; gender characteristic  change in frequency distribution characteristic for e.g. plosives (form of articulation)

Pronunciation Networks / Word Models as Probabilistic FAs

Word Recognition with Probabilistic FA / Markov Chain

Viterbi-Algorithm - Overview (cf. Jurafsky Ch. 5) The Viterbi Algorithm finds an optimal sequence of states in continuous Speech Recognition, given an observation sequence of phones and a probabilistic (weighted) FA (state graph). The algorithm returns the path through the automaton which has maximum probability and accepts the observation sequence. a[s, s'] is the transition probability (in the phonetic word model) from current state s to next state s', and b[s', o_t] is the observation likelihood of s' given o_t. b[s', o_t] is 1 if the observation symbol matches the state, and 0 otherwise.
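A compact sketch of this formulation follows. The phone states and probabilities are an invented toy word model (roughly the word "need" as n → iy → d); trans_p plays the role of a[s, s'] and obs_likelihood the role of b[s', o_t].

```python
def viterbi(observations, states, start_p, trans_p, obs_likelihood):
    """Best state path through a weighted FA; trans_p[s][s2] is a[s,s2] and
    obs_likelihood(s, o) plays the role of b[s, o] from the slide."""
    best = {s: start_p.get(s, 0.0) * obs_likelihood(s, observations[0])
            for s in states}
    paths = {s: [s] for s in states}
    for o in observations[1:]:
        new_best, new_paths = {}, {}
        for s2 in states:
            # best predecessor for s2, then fold in the observation likelihood
            prob, prev = max((best[s] * trans_p.get(s, {}).get(s2, 0.0), s)
                             for s in states)
            new_best[s2] = prob * obs_likelihood(s2, o)
            new_paths[s2] = paths[prev] + [s2]
        best, paths = new_best, new_paths
    final = max(states, key=lambda s: best[s])
    return paths[final], best[final]

# Invented toy word model with phone states n -> iy -> d (self-loops allowed).
states = ["n", "iy", "d"]
start_p = {"n": 1.0}
trans_p = {"n": {"n": 0.5, "iy": 0.5}, "iy": {"iy": 0.5, "d": 0.5}, "d": {"d": 1.0}}
match = lambda s, o: 1.0 if s == o else 0.0   # b is 1 on a symbol match, else 0
path, prob = viterbi(["n", "iy", "iy", "d"], states, start_p, trans_p, match)
```

With the 0/1 observation likelihood from the slide, the returned path simply follows the observed phones, and the probability is the product of the transition weights along it.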

Speech Recognizer Architecture

Speech Processing Types and Characteristics  Speech Recognition vs. Speaker Identification (Voice Recognition)  speaker-dependent vs. speaker-independent  training  unlimited vs. large vs. small vocabulary  single word vs. continuous speech

Speech and NLP

References Jurafsky, D. & Martin, J. H., Speech and Language Processing, Prentice-Hall, 2000 Huang, X., Acero, A. & Hon, H.: Spoken Language Processing. A Guide to Theory, Algorithms, and System Development. Prentice-Hall, NJ, 2001 Kemke, C., Natural Language and Speech Processing - Course Notes, 2nd Term 2004, Dept. of Computer Science, U. of Manitoba Kemke, C., Artificial Intelligence - Course Notes, 1st Term 2004, Dept. of Computer Science, U. of Manitoba

Figures Figures taken from: Jurafsky, D. & Martin, J. H., Speech and Language Processing, Prentice-Hall, 2000, Chapters 5 and 7, and from lingWAVES.