2
A Corpus Search Methodology for Focus Realization Jonathan Howell and Mats Rooth Linguistics and CIS Cornell University
3
Goals: Study the phonetic realization of focus in cases where formal-semantic theories make clear predictions. Use natural data from podcasts, radio, etc. Find data using a speech search engine based on speech recognition (Everyzing). Automate all of the workflow. Today: preliminary data from a pilot.
4
he stayed longer than I did -er [[ he he stayed x long] 2 than [ I F stayed x long ]~2] [ y stayed x-long ] antecedent clause [ speaker stayed x-long ] scope of focus
5
… I should have liked that song a lot more than I did. [more x[[should w[ I like that song x well in w]] than [I like that song x well in w 0 ]]]
6
I understand even less than I did before even less [[ I prs understand x much] 2 than [I understood x much before F ] ]~2]
7
Focus in comparative clauses: There is a coherent syntactic-semantic theory about where focus should go. Possibilities are constrained, because the main clause is usually the antecedent for focus interpretation in the comparative clause. On a theoretical basis, we often think we know the correct grammatical analysis of sentences people use.
15
Result: Hundreds of elements of a minimal pair varying the position of focus. Speech files for short and 10-second intervals spanning “than I did”. The Everyzing HTML contains time offsets for the beginnings of words. These are converted by a program into a Praat representation. Alignments are not good enough to use without correction.
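The conversion step could look roughly like the sketch below. It is an illustration under stated assumptions, not the program used in the pilot: the function name `word_onsets_to_textgrid` and the input shape (a list of word and onset-time pairs already extracted from the HTML) are hypothetical. Because only word beginnings are available, each word is assumed to end where the next one starts, and the last word is assumed to run to the end of the snippet. Onsets are assumed to be strictly increasing and smaller than the snippet duration. The output follows Praat's long TextGrid text format with a single interval tier.

```python
def word_onsets_to_textgrid(onsets, total_duration, tier_name="words"):
    """Build a one-tier Praat TextGrid (long text format) from word onsets.

    onsets: list of (word, start_seconds), sorted by start time (assumed input shape).
    total_duration: length of the sound snippet in seconds.
    """
    intervals = []
    # Praat interval tiers must cover the whole time range, so pad the start.
    if onsets and onsets[0][1] > 0:
        intervals.append((0.0, onsets[0][1], ""))
    for i, (word, start) in enumerate(onsets):
        # Assumption: a word ends where the next one begins.
        end = onsets[i + 1][1] if i + 1 < len(onsets) else total_duration
        intervals.append((start, end, word))
    if not intervals:
        intervals.append((0.0, total_duration, ""))

    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {total_duration}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        "        xmin = 0",
        f"        xmax = {total_duration}",
        f"        intervals: size = {len(intervals)}",
    ]
    for n, (xmin, xmax, text) in enumerate(intervals, start=1):
        escaped = text.replace('"', '""')  # Praat doubles quotes inside strings
        lines += [
            f"        intervals [{n}]:",
            f"            xmin = {xmin}",
            f"            xmax = {xmax}",
            f'            text = "{escaped}"',
        ]
    return "\n".join(lines) + "\n"


# Made-up onsets for illustration only; real values would come from the search result.
textgrid = word_onsets_to_textgrid([("than", 4.2), ("I", 4.5), ("did", 4.6)], 10.0)
```

The boundaries produced this way inherit the recognizer's timing errors, and the final word is stretched to the end of the snippet, so the resulting tier is a starting point for hand correction in Praat rather than a usable alignment.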
16
Classification Listen to sound snippet to determine if there is an actual token of “than I did”. True in 56% of cases in a sample of 179 tokens.
17
Classify correct tokens into three grammatical-semantic classes. s Comparing than- and main clauses, reference varies in the position of “I”. This licenses focus on the subject “I”. [ he looked younger than I did. ] 21/40 tokens
18
d Comparing than- and main clauses, reference is constant in the position of “I”, but varies in the possible-world or temporal index of did, and not in any following position. Depending on details of the representation of modality and time, this could license a focus on “did”. 5/40 tokens
19
f Comparing than- and main clauses, reference in the position of “I” is constant, but varies in some position following “did”, often a temporal phrase. [ I actually look younger now than I did 5 years ago. ] 13/40 tokens
22
Mark vowel intervals in “I” and “did” by hand. Pitch in the vowel region and duration of the vowel region both contribute positively to the area under the pitch curve (the definite integral of pitch). Number of glottal pulses in the vowel region.
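The area measure can be approximated numerically from a sampled pitch track. The sketch below uses the trapezoidal rule, which is one common choice and not necessarily the method used in the pilot. The function name `pitch_area` and the input shape (parallel lists of times in seconds and F0 values in Hz, restricted to the hand-marked vowel interval and containing only voiced frames) are assumptions made for illustration.

```python
def pitch_area(times, f0):
    """Approximate the definite integral of pitch over a vowel interval.

    times: sample times in seconds, strictly increasing.
    f0: pitch values in Hz at those times (voiced frames only, by assumption).
    Returns the area in Hz * s, using the trapezoidal rule.
    """
    if len(times) != len(f0):
        raise ValueError("times and f0 must have the same length")
    if len(times) < 2:
        raise ValueError("need at least two samples to integrate")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        # Average of the two neighbouring pitch values times the step width.
        area += 0.5 * (f0[i] + f0[i - 1]) * width
    return area


# Illustrative values only, not measurements from the corpus.
flat = pitch_area([0.0, 0.05, 0.10], [200.0, 200.0, 200.0])
raised = pitch_area([0.0, 0.05, 0.10], [200.0, 220.0, 200.0])
longer = pitch_area([0.0, 0.05, 0.10, 0.15], [200.0, 200.0, 200.0, 200.0])
```

With these made-up inputs, both the vowel with the pitch peak and the longer vowel yield a larger area than the flat 0.1-second vowel, which is the sense in which pitch and duration each contribute positively to the measure.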
26
NLP vs. Acoustic Phonetics: Classification based on the signal. NLP classifier based on the correct sentence (or speech recognition output), using parsing and machine learning on text features.
27
Multiple focus. Issues: marking of multiple foci with different scopes, and prominence of focus relative to accents not marking focus. You made a very small amount more than I did. Now I make much F more than you F do.