Focus Contrast in Web Harvested Data. Mats Rooth, Linguistics and CIS, Cornell University. Based on joint research with Jonathan Howell.



Radio sites: Hundreds use Everyzing/Ramp technology. Full ASR transcripts are often available. Time offsets are sometimes available. Either the URL of the audio or an RSS feed is almost always available. Not enough hits for one target on a single site. A lot of repetitions of the same audio. Seemingly less “spontaneous” speech than on Everyzing.

YouTube: Searchable closed captions, some obtained with ASR and some provided by the video author. Time offset is available on the hit page and in the URL. The YouTube player can seek to a time. A transcript of the snippet is available; the full transcript is not. Not enough data now. Can hope that a lot of indexed spontaneous speech will become available.

Reuters Insider: Searchable audio based on Everyzing/Ramp. Full transcripts available. Player seeks to timestamp.

Goals: Assemble large, focused datasets of examples where intonation varies in a way that correlates with syntax, semantics, or pragmatics. Study the correlation between lexical/grammatical/pragmatic context and acoustic realization.

he stayed longer than I did -er [[ he he stayed x long] 2 than [ I F stayed x long ]~2] [ y stayed x-long ] antecedent clause [ speaker stayed x-long ] scope of focus

… I should have liked that song a lot more than I did. [more x[[should w[ I like that song x well in w]] than [I like that song x well in w 0 ]]]

I understand even less than I did before even less [[ I prs understand x much] 2 than [I understood x much before F ] ]~2]

Alternative semantics for focus: -er [[ he he stayed x long] 2 than [ I F stayed x long ]~2] [ y stayed x-long ] antecedent clause [ speaker stayed x-long ] scope of focus. The semantics of focus is the set of alternative propositions of the form ‘y stayed x long’. Licensing condition for focus: the proposition contributed by the antecedent is an element of the alternative set that is distinct from the proposition contributed by the scope.
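The licensing condition can be illustrated with a toy model. This is a minimal sketch, assuming propositions are reduced to (subject, predicate) pairs and that focus on the subject varies it over a small hypothetical domain of individuals; the names `Prop`, `alternatives`, and `focus_licensed` are illustrative and do not come from the slides.

```python
from typing import NamedTuple


class Prop(NamedTuple):
    """Toy proposition: a subject paired with a predicate string."""
    subject: str
    predicate: str


def alternatives(scope: Prop, domain: list[str]) -> set[Prop]:
    """Alternative set: replace the focused subject with each individual in the domain."""
    return {Prop(y, scope.predicate) for y in domain}


def focus_licensed(antecedent: Prop, scope: Prop, domain: list[str]) -> bool:
    """Licensed if the antecedent is in the alternative set and differs from the scope."""
    return antecedent in alternatives(scope, domain) and antecedent != scope


# Hypothetical domain of individuals (an assumption for illustration).
domain = ["he", "speaker", "she"]
scope = Prop("speaker", "stayed x long")

print(focus_licensed(Prop("he", "stayed x long"), scope, domain))       # True
print(focus_licensed(Prop("speaker", "stayed x long"), scope, domain))  # False
print(focus_licensed(Prop("he", "left early"), scope, domain))          # False
```

The second call fails because the antecedent is not distinct from the scope, and the third because it is not of the form ‘y stayed x long’.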

Givenness/Entailment semantics for focus: [ y stayed x-long ] antecedent clause [ speaker stayed x-long ] scope of focus. Licensing condition for focus: the antecedent entails the union of the alternative set (focus existential closure). Example: if he stayed d long, then someone stayed d long.

Alternative semantics and givenness semantics are predictive theories of focus licensing, if the antecedent is stipulated. Almost always, the antecedent for focus in the than-clause is the main clause. With that hedge, grammar makes a prediction about where focus should go. Try to correlate this with the acoustic signal.

Focus in comparative clauses: There is a coherent semantic theory about where focus should go. Possibilities are constrained, because the main clause is usually the antecedent for focus interpretation in the comparative clause. On a theoretical basis, we often think we know the correct grammatical analysis of comparative sentences people use, including the features that determine focus. Comparatives are a nice model system for studying contextual conditioning and phonetic realization of contrastive intonation.

Automatic harvest procedure Replicates how a user would interact with website.

curl              retrieve information designated by URL
cutmp3            cut audio file given offsets
awk               process html
awk, bash, make   control
Time for one run retrieving 1000 hits is less than a day.
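One step this pipeline implies is turning a hit's time offset into a cut window for the audio cutter. The following is a minimal sketch, assuming a fixed amount of padding before and after the hit. The padding values, the function names, and the minutes:seconds string format are illustrative assumptions; the slides do not state them, and the format is not claimed to match what cutmp3 expects.

```python
def snippet_window(hit_offset_s: float,
                   before_s: float = 3.0,
                   after_s: float = 5.0) -> tuple[float, float]:
    """Return (start, end) in seconds around a hit, clamping the start at zero."""
    start = max(0.0, hit_offset_s - before_s)
    end = hit_offset_s + after_s
    return start, end


def as_min_sec(seconds: float) -> str:
    """Format seconds as M:SS.ss (an assumed format, for illustration only)."""
    seconds = round(seconds, 2)  # round first so 59.999 becomes 1:00.00, not 0:60.00
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes)}:{secs:05.2f}"


# Hypothetical hit reported 217 seconds into an audio file.
start, end = snippet_window(217.0)
print(as_min_sec(start), as_min_sec(end))
```

Clamping matters for hits near the beginning of a file, where subtracting the padding would otherwise give a negative start time.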

116 a1135.g.akamai.net
110 hosted-media.podzinger.com
76 media.weei.podzinger.com
58 feeds.wnyc.org
54 media.libsyn.com
51 podcastdownload.npr.org
50 feeds.feedburner.com
39 library.kraftsportsgroup.com
media.wrko.podzinger.com

Jonathan Howell

Classification experiment: He stayed longer than I F did. (s class; antecedent: He stayed x long.) I should have liked that song a lot more than I did F. (ns class; antecedent: I should have liked that song x much.) I understand even less than I did before F. (ns class; antecedent: I understand even x little.)

SVM classifier in the R statistical environment (e1071 package). 308 acoustic parameters extracted with Praat. 91 tokens in a cross-validated design. (Several hundred more tokens with similar results.)
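The shape of a cross-validated classification experiment can be sketched in a few lines. The experiment reported here used an SVM from the R e1071 package; the sketch below swaps in a nearest-centroid classifier and leave-one-out cross-validation so that it runs with the Python standard library alone. The type of cross-validation is an assumption (the slides only say "cross-validated"), and the six toy tokens are made-up numbers, not measurements from the dataset.

```python
from statistics import mean


def nearest_centroid_predict(train, x):
    """Predict the label whose class centroid (mean feature vector) is closest to x."""
    centroids = {}
    for label in {lab for _, lab in train}:
        rows = [feats for feats, lab in train if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(x, center))

    return min(centroids, key=lambda lab: sqdist(centroids[lab]))


def leave_one_out_accuracy(data):
    """Hold out each token in turn, train on the rest, and report the fraction correct."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        correct += nearest_centroid_predict(train, x) == y
    return correct / len(data)


# Made-up toy tokens: [duration of "I" in seconds, scaled formant distance], class label.
toy = [
    ([0.18, 0.90], "s"), ([0.20, 1.00], "s"), ([0.17, 0.95], "s"),
    ([0.08, 0.50], "ns"), ([0.09, 0.60], "ns"), ([0.07, 0.55], "ns"),
]
print(leave_one_out_accuracy(toy))
```

With real data the features would need scaling before any distance-based or SVM classifier is applied, since durations and formant values live on very different scales.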

1. all parameters; 3. duration of “I” only; 4. duration of “I”, duration of “d” closure, formant difference 40% into “I”

Jonathan Howell

Method suggested by comparatives experiment: Find common grammatical or lexical contexts that trigger representations with different prosodic realization, according to relatively well-understood and well-supported theory. Correlate the semantic-grammatical categories directly with the speech signal using machine learning. Don’t worry about phonemic/morphemic categories like the accent types H* and L+H*, and don’t assume they can be annotated on the basis of pitch contour.

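Féry and Ishihara (2009) Journal of Linguistics 45.3. SOF (second occurrence focus), prenuclear: Die meisten unserer Kollegen waren beim Betriebsausflug lässig angezogen. (Most of our colleagues were casually dressed at the company outing.) Nur Peter hat eine Krawatte getragen. (Only Peter wore a tie.) Nur Peter hat sogar einen Anzug getragen. (Only Peter even wore a suit.)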

He’s gotta pick someone who is younger than he is, and is definitely more conservative than he is. [-er [ t is d young than he is d young]] 2 and more [[ t is is d conservative F ] 3 than [ he F is d conservative ] ~3 ] ~2

+ Generic corpus of focused pronouns. The SVM classifier is good at detecting focused pronouns using local features on the pronoun: duration of the vowel of “I” [ai], and distance between F1 and F2 halfway into the vowel of “I” [ai].

Inherently contrastive phrases: “in MY opinion” admits that other things might be true in other people’s opinions. “NEXT Friday”, at the end of a weekly Friday radio program. “on the TENOR saxophone”, in a jazz program where there is frequent mention also of the alto saxophone.

1162 of> my life
1110 in> my life
681 in> my mind
377 in> my opinion
276 in> my view
231 in> my heart
217 of> my career
199 in> my career
183 in> my head
146 with> my life
146 with> my family
141 on> my way
140 of> my mind
139 on> my part
134 in> my lifetime
125 in> my office
115 of> my family
108 with> my wife
106 on> my face
106 in> my house
99 on> my mind
96 over> my head
96 in> my family
91 for> my family
90 in> my face
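Frequency lists of this kind can be produced by counting the words on either side of "my" in transcript text. This is a minimal sketch, assuming a simple lowercase tokenizer and no part-of-speech filtering, so it counts every left neighbour and not only prepositions. The function name `my_frames` and the sample text are made up for illustration and are not the tooling or data behind the counts above.

```python
import re
from collections import Counter


def my_frames(text: str) -> Counter:
    """Count (previous word, next word) pairs around each occurrence of 'my'."""
    tokens = re.findall(r"[a-z']+", text.lower())
    frames = Counter()
    for prev, word, nxt in zip(tokens, tokens[1:], tokens[2:]):
        if word == "my":
            frames[(prev, nxt)] += 1
    return frames


# Made-up sample text standing in for harvested transcripts.
sample = "In my opinion it works. Well, in my opinion it does not. It was on my mind."
counts = my_frames(sample)
for (prev, nxt), n in counts.most_common():
    print(n, f"{prev}> my {nxt}")
```

The most frequent frames are the natural search targets, since each one has to return enough hits to support a classification experiment.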

+ Does general SVM pronoun focus classifier work on SOF tokens? + How common is SOF?

[you made a very small amount more than I did] 2 [now F I make much F more than you F do] ~2 2 is of the form required form of antecedent: at t speaker makes d-much more than hearer makes actual: at t hearer makes d-much more than speaker makes

Two SOF tokens: You made a very small amount more than I did. Now I make much F more than you F do.

Is there a correlation between the string context and prosody type? + Learn information-theoretically: two distributions of acoustic pronoun realizations, and two distributions of trigram contexts that condition them.

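$$P(\text{in}\; a\; \text{opinion}) \overset{\text{def}}{=} P(\text{type 1})\, P(\langle \text{in}, \text{opinion} \rangle \mid \text{type 1})\, P(a \mid \text{type 1}) + P(\text{type 2})\, P(\langle \text{in}, \text{opinion} \rangle \mid \text{type 2})\, P(a \mid \text{type 2})$$
where $a$ is the acoustic realization of the pronoun.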
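The two-type mixture can be evaluated directly once the component distributions are given. Writing a for the acoustic realization of the pronoun, the sketch below computes the joint probability of a context and a realization, and the posterior over prosody types. It assumes the acoustic realization has been discretized into labels such as "long" and "short", which is a simplification for illustration; all probabilities are made-up numbers, and the function names are hypothetical.

```python
def type_terms(context, acoustic, prior, p_context, p_acoustic):
    """Per-type products P(type) * P(context | type) * P(acoustic | type)."""
    return {
        t: prior[t] * p_context[t].get(context, 0.0) * p_acoustic[t].get(acoustic, 0.0)
        for t in prior
    }


def joint(context, acoustic, prior, p_context, p_acoustic):
    """Mixture probability of the context together with the acoustic realization."""
    return sum(type_terms(context, acoustic, prior, p_context, p_acoustic).values())


def posterior(context, acoustic, prior, p_context, p_acoustic):
    """Posterior probability of each prosody type given context and realization."""
    terms = type_terms(context, acoustic, prior, p_context, p_acoustic)
    total = sum(terms.values())
    return {t: v / total for t, v in terms.items()}


# Made-up parameters for illustration only.
prior = {"type 1": 0.3, "type 2": 0.7}
p_context = {"type 1": {("in", "opinion"): 0.5}, "type 2": {("in", "opinion"): 0.1}}
p_acoustic = {"type 1": {"long": 0.8, "short": 0.2}, "type 2": {"long": 0.2, "short": 0.8}}

args = (("in", "opinion"), "long", prior, p_context, p_acoustic)
print(joint(*args))
print(posterior(*args))
```

In a learning setting the type labels are hidden, so parameters like these would be estimated from unlabeled tokens, for example with expectation-maximization, using the posterior as the soft assignment.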

What don’t we know about focus realization? Accent type. Claim that English focal accents divide into Topic (T), contrastive theme, L+H*; and Focus (F), H*. Examples: What about Anna? Who did she come with? Anna T came with Manny F. What about Manny? Who came with him? Anna F came with Manny T.

Attempt to make do pragmatically without a T/F distinction in alternative semantics: Michael Wagner (2008). A Compositional Theory of Contrastive Topics. NELS 28. There is controversy over whether there is a categorical phonetic distinction among H*, L*+H, L+H*.

He’s gotta pick someone who is younger than he is, and is definitely more conservative than he is. [-er [[t is d young F ] 5 than [he F is d young] ~5 ]] 2 ~4 and more [[ t is is d conservative F ] 3 than [ he F is d conservative ] ~3 ] 4 ~2

A. Nenkova, J. Brenier, A. Kothari, S. Calhoun, L. Whitton, D. Beaver, D. Jurafsky. To memorize or predict: prominence labeling in conversational speech. Sasha Calhoun. Information Structure and the Prosodic Structure of English: a Probabilistic Relationship. PhD thesis, University of Edinburgh, 2006. Markup and prediction of accented words in the Switchboard corpus. Try to do this for pronouns only.

What don’t we know about focus realization? Non-anaphoric focus. Féry and Samek-Lodovici (2007) Language 82.1: [(An AMERICANf farmer) (with a purple CHEVROLET) (was talking to a CANADIANf farmer) (with a purple Chevrolet)]f

Distribution of datasets: Audio snippets can probably be distributed under fair use. /Prosody+Datasets

A lot of naturalistic data bearing on theories of prosody can be found using search engines that index audio using ASR. Machine learning classification is a good methodology for prosody, because one can work with semantic-pragmatic categories that figure in formal theories. For focus, try to build classifiers, not just find statistically significant correlations with acoustic parameters. Classifiers such as SVM can combine information from a lot of features.