Prosody and NLP
Seminar by Nikhil (06005004), Adith (06005005), Prachur (06D05011)
We have a presentation this Friday?


Abstract
Speech Processing and Natural Language Processing share a common object of study: language. Over time, however, they have come to have little in common in their theoretical models and methods of analysis. NLP takes written text as the starting point for its analysis, but a lot of valuable information is lost when speech is encoded merely as text. It is commonly accepted that intonational features of spoken language can greatly aid NLP tasks (such as adjective scope resolution). We explore the foundations of the study of prosody and look at some approaches that use prosodic cues to aid NLP.

Motivation
Language is driven by speech, not by text. NLP currently takes written text as the starting point for its analysis (primarily because such data is abundant). A lot of information is lost when spoken features are ignored and only the text is considered.

A Way Out?
Utilize spoken features for NLP tasks. NLP needs all the help it can get.
– Working at the pragmatics or discourse level is extremely difficult
– Prosodic cues carry useful pragmatic information

What is Prosody, exactly?
The term comes from poetry, where prosody refers to the study of poetic meters (rhythms) [1]. Written text treats words as the basic building blocks of language. Spoken language treats syllables as the basic building blocks.

What is Prosody, exactly? (contd.)
Wikipedia has this to say: prosody is the rhythm, stress, and intonation of connected speech (as opposed to smaller elements like syllables or words). Prosody may reflect:
– features of the speaker
– the emotional state of the speaker
– features of the utterance
– irony or sarcasm
– emphasis, contrast, and focus

Intonation?
Conveys paralinguistic information, emphasis and contrast. Intonation on a particular word can differentiate between sentence moods.
– You are finISHED (interrogative)
– You are FINIshed (imperative)
Image courtesy of Google Image Search

And Stress!
Stress is applied to content words in spoken utterances.
– cOntent (noun): "I really liked their presentation's content."
– contEnt (adjective): "I have done my best. I am content."
Stress on a pair of words distinguishes the syntactic role played by each word in the pair.
– tight rope: a rope that is held taut.
– tight-rope: a circus act uses this contraption :) [2]

Courtesy: Tom the Dancing Bug

Prosodic cues
Prosodic functions important for linguistics are:
– Marking of boundaries (syntactic, semantic or dialogue units)
– Relative duration of phonetic segments
– At the syllable level: energy, intensity, duration and intonation of the syllable
We shall see two approaches that use these features in tasks central to NLP.

Prosody-Augmented Syntax Grammars [3]
Cue used: relative duration of phonetic segments
Aim: to improve the parsing of ambiguous sentences
Method: augmenting the syntax grammar with a few non-terminals and rules

The concept of "word break indices" is used to show the degree of prosodic decoupling between neighboring words. E.g.:
– Andrea 1 moved 1 the 0 bottle 3 under 0 the 0 bridge.
– Andrea 1 moved 3 the 0 bottle 1 under 0 the 0 bridge.
Break indices were generated by analysing the codas that have a pause. The coda is the final consonant or consonant cluster of a syllable, e.g. the "p" in cup or the "lk" in milk.
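The annotated sentences above can be read mechanically. The following is a minimal sketch, assuming the "word index word index ... word" layout shown on the slide; the function names are hypothetical and are not taken from the cited paper. It splits a sentence into words and break indices and cuts it at the strongest break.

```python
def parse_break_indices(annotated):
    """Split 'w1 i1 w2 i2 ... wn' into a word list and a break-index list."""
    tokens = annotated.rstrip(".").split()
    words = tokens[0::2]                       # even positions hold words
    indices = [int(t) for t in tokens[1::2]]   # odd positions hold indices
    if len(indices) != len(words) - 1:
        raise ValueError("expected one break index between each pair of words")
    return words, indices


def strongest_break(annotated):
    """Cut the sentence at the largest break index (first one on ties)."""
    words, indices = parse_break_indices(annotated)
    pos = max(range(len(indices)), key=indices.__getitem__)
    return words[:pos + 1], words[pos + 1:]


left, right = strongest_break("Andrea 1 moved 1 the 0 bottle 3 under 0 the 0 bridge.")
print(left, right)
```

For the first example the cut falls after "bottle", and for the second after "moved", which is where the two readings of the sentence differ.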

Grammar Modification
Original grammar rules like S -> NP VP are changed to S -> NP link1 VP. "Link" non-terminals are used for the word-break indices.
– For rules like NP -> ε we allow rules of the form link -> ε.
– To prevent spurious parses due to the introduction of empty links, we need some constraints, which can be easily incorporated.
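The rewriting of S -> NP VP into a rule with a link non-terminal can be sketched as a simple transformation over a rule list. This is an illustration only: the rule representation, the function name `add_links`, and the use of a single unindexed `link` symbol are assumptions, and the constraints that rule out spurious empty links are not modelled.

```python
def add_links(rules, link="link"):
    """Insert a link non-terminal between adjacent right-hand-side symbols.

    Each rule is a (lhs, rhs) pair, where rhs is a tuple of symbols.
    """
    augmented = []
    for lhs, rhs in rules:
        new_rhs = []
        for i, symbol in enumerate(rhs):
            if i > 0:
                new_rhs.append(link)   # one link between each pair of symbols
            new_rhs.append(symbol)
        augmented.append((lhs, tuple(new_rhs)))
    return augmented


# A toy grammar, made up for this example.
grammar = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("VP", ("V", "NP", "PP")),
]

for lhs, rhs in add_links(grammar):
    print(lhs, "->", " ".join(rhs))
```

Rules with a single right-hand-side symbol are left unchanged, since there is no word boundary between constituents to annotate.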

Results
Incorporating prosody reduced the number of parses found by about 25%. Parse times increased by about 37%. The results suggest that extremely common cases of syntactic ambiguity can be resolved with prosodic information, and that grammars can be modified to take advantage of prosodic information for improved parsing.

Using Prosodic Features in Language Models [4]
The outlined approach uses syllable-based prosodic cues, namely:
– Duration of the syllable
– Average energy (intensity)
– The average of the F0 (fundamental frequency) contour over the syllable
– The slope of the F0 contour (visualised as intonation: rising or falling/flat)
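The four cues listed above can be computed from frame-level measurements of one syllable. The sketch below assumes that each syllable comes with per-frame timestamps, F0 values and energy values, that every frame is voiced, and that the F0 slope is taken as a least-squares line fit; none of these details are stated in the slides, and the function name is hypothetical.

```python
def syllable_features(times, f0, energy):
    """Return duration, mean energy, mean F0 and F0 slope for one syllable.

    times  -- frame timestamps in seconds, strictly increasing
    f0     -- fundamental frequency per frame, in Hz
    energy -- energy per frame, in arbitrary units
    """
    n = len(times)
    if n < 2 or len(f0) != n or len(energy) != n:
        raise ValueError("need at least two frames and equal-length inputs")

    mean_t = sum(times) / n
    mean_f0 = sum(f0) / n

    # Least-squares slope of F0 against time (Hz per second).
    num = sum((t - mean_t) * (f - mean_f0) for t, f in zip(times, f0))
    den = sum((t - mean_t) ** 2 for t in times)
    if den == 0:
        raise ValueError("timestamps must not all be equal")

    return {
        "duration": times[-1] - times[0],
        "energy": sum(energy) / n,
        "f0_mean": mean_f0,
        "f0_slope": num / den,
    }


features = syllable_features([0.0, 0.1, 0.2], [100.0, 110.0, 120.0], [1.0, 2.0, 3.0])
print(features)
```

A positive slope corresponds to rising intonation and a negative or near-zero slope to falling or flat intonation.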

Recognition of Prosodic Features

Prosody in Language Model
We want to measure $P(w_n \mid w_{n-1}, w_{n-2}, \ldots, F)$, where F denotes the prosodic features.
Naively modelled by linear interpolation:
$$P(w_n \mid w_{n-1}, w_{n-2}, \ldots, F) = \alpha P(w_n \mid w_{n-1}, w_{n-2}, \ldots) + (1 - \alpha) P(w_n \mid F)$$
– Assumption: prosodic features are independent of previous words (not true!!)
We want something better.
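The linear interpolation above is a weighted average of two distributions over the next word. As a minimal sketch, with a hypothetical function name and made-up probabilities chosen only for illustration:

```python
def interpolate(p_ngram, p_prosody, alpha):
    """Mix P(w | history) and P(w | F) with weight alpha on the n-gram model.

    Both arguments map each candidate word to a probability.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    vocab = set(p_ngram) | set(p_prosody)
    return {
        w: alpha * p_ngram.get(w, 0.0) + (1.0 - alpha) * p_prosody.get(w, 0.0)
        for w in vocab
    }


# Toy distributions over a two-word vocabulary (illustrative values only).
p_history = {"the": 0.6, "a": 0.4}
p_features = {"the": 0.2, "a": 0.8}
mixed = interpolate(p_history, p_features, alpha=0.75)
print(mixed)
```

If both inputs sum to one over the same vocabulary, so does the mixture, which is why the naive model is at least a valid distribution even though its independence assumption does not hold.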

Factored Language Model
Instead of a word W we deal with a set of word factors F = {f1, f2, ..., fk}. (Factors may include the word itself.)

Here, F is chosen as {W, prosodic features}. The four prosodic features are encoded as binary numbers (s0 to s15). These numbers are assigned to each syllable of the word. For example, the prosodic representation of the word "Actually" can be either 's10s12s6' or 's10s15s6'.
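The slides do not say how the four features map to the symbols s0 to s15. One possible reading, assumed here purely for illustration, is that each feature is reduced to a single bit by comparing it with a threshold, so that the four bits form a number from 0 to 15. The feature order, the thresholds and the function names below are all hypothetical.

```python
# Assumed bit order, most significant first; not specified in the source.
FEATURES = ("duration", "energy", "f0_mean", "f0_slope")


def prosody_symbol(values, thresholds):
    """Encode one syllable as 's0'..'s15' using one bit per feature."""
    code = 0
    for name in FEATURES:
        bit = 1 if values[name] > thresholds[name] else 0
        code = (code << 1) | bit
    return f"s{code}"


def word_representation(syllables, thresholds):
    """Concatenate the symbols of all syllables in a word."""
    return "".join(prosody_symbol(s, thresholds) for s in syllables)


# Toy values: positive means "above threshold" with thresholds at zero.
zero = {name: 0.0 for name in FEATURES}
syllables = [
    {"duration": 1, "energy": -1, "f0_mean": 1, "f0_slope": -1},   # bits 1010
    {"duration": 1, "energy": 1, "f0_mean": -1, "f0_slope": -1},   # bits 1100
    {"duration": -1, "energy": 1, "f0_mean": 1, "f0_slope": -1},   # bits 0110
]
print(word_representation(syllables, zero))
```

Under this assumed encoding a three-syllable word yields a string of three symbols, in the same shape as the representations quoted for "Actually".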

Conclusion
Prosodic cues can play an important role as a heuristic for many NLP tasks. The traffic is not all one way, though: POS tagging (since it is relatively accurate) is used to aid speech synthesis tasks, which conventionally used only prosodic cues [5].

Future Work
Handling prosodic information is a first step towards the integration of Speech Processing and NLP.
Courtesy: ZITS

References
1. Wikipedia
2. Fromkin, Rodman and Hyams, An Introduction to Language, 7th Ed., Thomson and Wadsworth
3. John Bear and Patti Price (1990), "Prosody, Syntax and Parsing", in Proceedings of the 28th Annual Meeting of the ACL
4. Songfang Huang and Steve Renals (2007), "Using Prosodic Features in Language Models for Meetings", IRTG annual meeting
5. rts.pdf