Focus Contrast in Web Harvested Data Mats Rooth Linguistics and CIS Cornell University based on joint research with Jonathan Howell.

2 Focus Contrast in Web Harvested Data Mats Rooth Linguistics and CIS Cornell University based on joint research with Jonathan Howell

10 Radio sites: Hundreds use Everyzing/Ramp technology. Full ASR transcripts often available. Time offset sometimes available. Either URL of audio or RSS feed almost always available. Not enough hits for one target on a single site. A lot of repetitions of the same audio. Seemingly less “spontaneous” speech than on Everyzing.

11 YouTube: Searchable closed captions, some obtained with ASR and some provided by the video author. Time offset available on hit page and in URL. YouTube player can seek to a time. Transcript of snippet available. Full transcript not available. Not enough data now. Can hope that a lot of indexed spontaneous speech will become available.

12 Reuters Insider: Searchable audio based on Everyzing/Ramp. Full transcripts available. Player seeks to timestamp.

14 Goals: Assemble large, focused datasets of examples where intonation varies in a way that correlates with syntax, semantics, or pragmatics. Study the correlation between lexical/grammatical/pragmatic context and acoustic realization.

15 he stayed longer than I did -er [[ he he stayed x long] 2 than [ I F stayed x long ]~2] [ y stayed x-long ] antecedent clause [ speaker stayed x-long ] scope of focus

16 … I should have liked that song a lot more than I did. [more x[[should w[ I like that song x well in w]] than [I like that song x well in w 0 ]]]

17 I understand even less than I did before even less [[ I prs understand x much] 2 than [I understood x much before F ] ]~2]

18 Alternative semantics for focus. -er [[ he he stayed x long] 2 than [ I F stayed x long ]~2] [ y stayed x-long ] antecedent clause [ speaker stayed x-long ] scope of focus. Semantics of focus: the set of alternative propositions of the form ‘y stayed x long’. Licensing condition for focus: the proposition contributed by the antecedent is an element of the alternative set that is distinct from the proposition contributed by the scope.
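
The licensing condition on this slide can be illustrated with a small sketch. It is only a toy model under simplifying assumptions: a proposition is a (subject, predicate) pair, focus is on the subject, the domain of individuals is a hypothetical three-member set, and the names `alternatives` and `licensed` are invented for this illustration.

```python
# Toy model of alternative semantics for focus (illustrative assumptions only).
# A proposition is a (subject, predicate) pair; the focused position is the subject.

DOMAIN = {"he", "speaker", "she"}  # hypothetical set of individuals


def alternatives(scope, domain=DOMAIN):
    """Alternative set: substitute every individual for the focused subject."""
    _, predicate = scope
    return {(y, predicate) for y in domain}


def licensed(antecedent, scope, domain=DOMAIN):
    """Focus is licensed if the antecedent is in the alternative set
    and is distinct from the proposition contributed by the scope."""
    return antecedent in alternatives(scope, domain) and antecedent != scope


scope = ("speaker", "stayed x long")    # than-clause: I (focused) stayed x long
antecedent = ("he", "stayed x long")    # main clause: he stayed x long

print(licensed(antecedent, scope))  # True
print(licensed(scope, scope))       # False: not distinct from the scope
```

An antecedent with a different predicate, such as `("he", "left")`, falls outside the alternative set and is rejected as well.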

19 Givenness/Entailment semantics for focus. [ y stayed x-long ] antecedent clause [ speaker stayed x-long ] scope of focus. Licensing condition for focus: the antecedent entails the union of the alternative set (focus existential closure). Example: if he stayed d long, then someone stayed d long.
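
The entailment in the example can be written out as follows. The predicate notation stayed(y, d), read as "y stayed d long", is an assumption introduced here for illustration and is not the slide's own notation.

```latex
\[
\mathrm{stayed}(\mathrm{he}, d) \;\models\; \exists y\, \mathrm{stayed}(y, d)
\]
```

The right-hand side is the focus existential closure of the scope, obtained by replacing the focused subject with an existentially bound variable.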

20 Alternative semantics and givenness semantics are predictive theories of focus licensing, if the antecedent is stipulated. Almost always, the antecedent for focus in the than-clause is the main clause. With that hedge, grammar makes a prediction about where focus should go. Try to correlate this with acoustic signal.

21 Focus in comparative clauses: Coherent semantic theory about where focus should go. Possibilities are constrained, because the main clause is usually the antecedent for focus interpretation in the comparative clause. On a theoretical basis, we often think we know the correct grammatical analysis of the comparative sentences people use, including the features that determine focus. Nice model system for studying contextual conditioning and phonetic realization of contrastive intonation.

26 Automatic harvest procedure: Replicates how a user would interact with the website.

29 curl: retrieve information designated by URL. cutmp3: cut audio file given offsets. awk: process HTML. awk, bash, make: control. Time for one run retrieving 1000 hits is less than a day.
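
Cutting an audio file "given offsets" presupposes a clip window around each hit. The sketch below shows one way such a window could be computed from a hit's time offset. It does not reproduce the harvest scripts: the function names and the padding values are hypothetical, and the actual cutting in the procedure above is done by cutmp3.

```python
# Sketch: compute a clip window around a search hit's time offset.
# Padding values are hypothetical, not the values used in the harvest.

def clip_window(hit_offset_s, pad_before_s=5.0, pad_after_s=10.0, audio_length_s=None):
    """Return (start, end) in seconds, clamped to the bounds of the audio."""
    start = max(0.0, hit_offset_s - pad_before_s)
    end = hit_offset_s + pad_after_s
    if audio_length_s is not None:
        end = min(end, audio_length_s)
    return start, end


def as_timestamp(seconds):
    """Format seconds as M:SS.ss, a common form for audio-cutting tools."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes)}:{secs:05.2f}"


start, end = clip_window(97.5, audio_length_s=600.0)
print(as_timestamp(start), as_timestamp(end))  # 1:32.50 1:47.50
```

Clamping matters for hits near the beginning or end of a file, where a fixed padding would otherwise produce a negative start or an end past the audio.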

30 116 a1135.g.akamai.net; 110 hosted-media.podzinger.com; 76 media.weei.podzinger.com; 58 feeds.wnyc.org; 54 media.libsyn.com; 51 podcastdownload.npr.org; 50 feeds.feedburner.com; 39 library.kraftsportsgroup.com; 33 www.whiterosesociety.org; 24 www.kpbs.org; 21 www.podtrac.com; 21 media.wrko.podzinger.com
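
A tally of hits per host like the one on this slide could be produced along the following lines. This is a minimal sketch: the host names come from the slide, but the URL paths and the function name `count_hosts` are made up for illustration.

```python
# Sketch: count harvested audio URLs by host name.
from collections import Counter
from urllib.parse import urlparse


def count_hosts(urls):
    """Tally URLs by their network location (host name)."""
    return Counter(urlparse(u).netloc for u in urls)


# Hypothetical URLs; only the host names are taken from the slide.
urls = [
    "http://feeds.wnyc.org/example/a.mp3",
    "http://feeds.wnyc.org/example/b.mp3",
    "http://media.libsyn.com/example/c.mp3",
]

for host, n in count_hosts(urls).most_common():
    print(n, host)
```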

31 Jonathan Howell

33 Classification experiment. He stayed longer than I F did. s class; antecedent: He stayed x long. I should have liked that song a lot more than I did F. ns class; antecedent: I should have liked that song x much. I understand even less than I did before F. ns class; antecedent: I understand even x little.

35 SVM classifier in R statistical environment (e1071 package). 308 acoustic parameters extracted with Praat. 91 tokens in cross-validated design. (Several hundred more tokens with similar results.)

36 1. all parameters; 3. duration of “I” only; 4. duration of “I”, duration of “d” closure, formant difference 40% into “I”
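
To make the "duration of “I” only" condition concrete, here is a minimal sketch of a cross-validated classifier over a single duration feature. It is not the SVM from the experiment: a simple midpoint-threshold classifier is swapped in so that the example needs no external packages, and the duration values are invented purely for illustration. The labels `s` and `ns` follow the classification slide.

```python
# Sketch: leave-one-out cross-validation with one feature (pronoun duration).
# A midpoint-threshold classifier stands in for the SVM; the data are invented.

def fit_threshold(durations, labels):
    """Place the threshold midway between the two class means."""
    s_vals = [d for d, y in zip(durations, labels) if y == "s"]
    ns_vals = [d for d, y in zip(durations, labels) if y == "ns"]
    mean_s = sum(s_vals) / len(s_vals)
    mean_ns = sum(ns_vals) / len(ns_vals)
    return (mean_s + mean_ns) / 2, mean_s > mean_ns


def predict(duration, threshold, s_is_longer):
    """Assign the class whose mean lies on the same side of the threshold."""
    return "s" if (duration > threshold) == s_is_longer else "ns"


def leave_one_out_accuracy(durations, labels):
    """Train on all tokens but one, test on the held-out token, and average."""
    correct = 0
    for i in range(len(durations)):
        train_d = durations[:i] + durations[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        threshold, s_is_longer = fit_threshold(train_d, train_y)
        correct += predict(durations[i], threshold, s_is_longer) == labels[i]
    return correct / len(durations)


# Invented durations of "I" in seconds, not measurements from the study.
durations = [0.21, 0.19, 0.24, 0.22, 0.08, 0.10, 0.07, 0.11]
labels = ["s", "s", "s", "s", "ns", "ns", "ns", "ns"]
print(leave_one_out_accuracy(durations, labels))
```

Holding out each token in turn keeps the test token out of the threshold estimate, which is the point of a cross-validated design.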

37 Jonathan Howell

39 Method suggested by comparatives experiment: Find common grammatical or lexical contexts that trigger representations with different prosodic realization, according to relatively well-understood and well-supported theory. Correlate the semantic-grammatical categories directly with the speech signal using machine learning. Don’t worry about phonemic/morphemic categories like the accent types H* and L+H*, or assume they can be annotated on the basis of pitch contour.

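41 Féry and Ishihara (2009) Journal of Linguistics 45.3. SOF: Prenuclear. Die meisten unserer Kollegen waren beim Betriebsausflug lässig angezogen. (Most of our colleagues were casually dressed at the company outing.) Nur Peter hat eine Krawatte getragen. (Only Peter wore a tie.) Nur Peter hat sogar einen Anzug getragen. (Only Peter even wore a suit.)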

42 He’s gotta pick someone who is younger than he is, and is definitely more conservative than he is. [-er [ t is d young than he is d young]] 2 and more [[ t is is d conservative F ] 3 than [ he F is d conservative ] ~3 ] ~2

43 + Generic corpus of focused pronouns. The SVM classifier is good at detecting focused pronouns using local features on the pronoun: duration of the vowel “I” [ai]; distance between f1 and f2 halfway into the vowel “I” [ai].

45 Inherently contrastive phrases: in MY opinion... admits that other things might be true in other people’s opinions. NEXT Friday... at the end of a weekly Friday radio program. on the TENOR saxophone... in a jazz program where there is frequent mention also of the alto saxophone.

46 1162 of> my life; 1110 in> my life; 681 in> my mind; 377 in> my opinion; 276 in> my view; 231 in> my heart; 217 of> my career; 199 in> my career; 183 in> my head; 146 with> my life; 146 with> my family; 141 on> my way; 140 of> my mind; 139 on> my part; 134 in> my lifetime; 125 in> my office; 115 of> my family; 108 with> my wife; 106 on> my face; 106 in> my house; 99 on> my mind; 96 over> my head; 96 in> my family; 91 for> my family; 90 in> my face
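
Counts of this kind, a word before and a word after "my", could be gathered from transcript text roughly as follows. This is a sketch under simplifying assumptions: the tokenizer is a crude regular expression, sentence boundaries are ignored, and the sample text and the function name `my_contexts` are invented for illustration.

```python
# Sketch: count (previous word, next word) contexts around the pronoun "my".
import re
from collections import Counter


def my_contexts(text):
    """Count trigram contexts of the form: previous word, 'my', next word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for prev, word, nxt in zip(tokens, tokens[1:], tokens[2:]):
        if word == "my":
            counts[(prev, nxt)] += 1
    return counts


# Invented sample text, not harvested data.
sample = "In my opinion it was the best day of my life. In my opinion, anyway."
for (prev, nxt), n in my_contexts(sample).most_common():
    print(n, prev, "my", nxt)
```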

48 + Does general SVM pronoun focus classifier work on SOF tokens? + How common is SOF?

49 [you made a very small amount more than I did] 2 [now F I make much F more than you F do] ~2 2 is of the form required form of antecedent: at t speaker makes d-much more than hearer makes actual: at t hearer makes d-much more than speaker makes

50 two SOF tokens You made a very small amount more than I did. Now I make much F more than you F do.

51 Is there a correlation between the string context and prosody type? + Learn information-theoretically: two distributions of acoustic pronoun realizations, and two distributions of trigram contexts that condition them.

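52 $$P(\langle \text{in}, \text{opinion} \rangle, a) \overset{\text{def}}{=} P(\text{type 1})\, P(\langle \text{in}, \text{opinion} \rangle \mid \text{type 1})\, P(a \mid \text{type 1}) + P(\text{type 2})\, P(\langle \text{in}, \text{opinion} \rangle \mid \text{type 2})\, P(a \mid \text{type 2})$$ where $a$ is the acoustic realization of the pronoun between “in” and “opinion”.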
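
The two-type mixture on this slide can be made concrete with a small numerical sketch. Everything specific in it is an assumption for illustration: the acoustic realization of the pronoun is reduced to a discrete label ("long" or "short"), all probabilities are invented, and the function names are hypothetical.

```python
# Sketch: two-component mixture over (trigram context, acoustic realization).
# All probabilities below are invented for illustration.

def joint_probability(context, acoustic, prior, p_context, p_acoustic):
    """Sum over types of P(type) * P(context | type) * P(acoustic | type)."""
    return sum(
        prior[t] * p_context[t].get(context, 0.0) * p_acoustic[t].get(acoustic, 0.0)
        for t in prior
    )


def posterior_over_types(context, acoustic, prior, p_context, p_acoustic):
    """P(type | context, acoustic) by Bayes' rule; assumes a nonzero joint."""
    total = joint_probability(context, acoustic, prior, p_context, p_acoustic)
    return {
        t: prior[t] * p_context[t].get(context, 0.0) * p_acoustic[t].get(acoustic, 0.0) / total
        for t in prior
    }


prior = {"type 1": 0.5, "type 2": 0.5}
p_context = {
    "type 1": {("in", "opinion"): 0.25},
    "type 2": {("in", "opinion"): 0.05},
}
p_acoustic = {
    "type 1": {"long": 0.8, "short": 0.2},
    "type 2": {"long": 0.2, "short": 0.8},
}

context = ("in", "opinion")
print(joint_probability(context, "long", prior, p_context, p_acoustic))
print(posterior_over_types(context, "long", prior, p_context, p_acoustic))
```

In an actual learning setup the priors and the two pairs of distributions would be estimated from data, for example with expectation-maximization, rather than set by hand.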

53 What don’t we know about Focus realization? Accent type: Claim that English focal accents divide into Topic (T), contrastive theme, L+H*, and Focus (F), H*. What about Anna? Who did she come with? Anna T came with Manny F. What about Manny? Who came with him? Anna F came with Manny T.

54 Attempt to make do pragmatically without a T/F distinction in alternative semantics: Michael Wagner (2008). A Compositional Theory of Contrastive Topics. NELS 38. Controversy whether there is a categorical phonetic distinction among H*, L*+H, L+H*.

55 He’s gotta pick someone who is younger than he is, and is definitely more conservative than he is. [-er [[t is d young F ] 5 than [he F is d young] ~5 ]] 2 ~4 and more [[ t is is d conservative F ] 3 than [ he F is d conservative ] ~3 ] 4 ~2

56 A. Nenkova, J. Brenier, A. Kothari, S. Calhoun, L. Whitton, D. Beaver, D. Jurafsky. To memorize or predict: prominence labeling in conversational speech. Sasha Calhoun. Information Structure and the Prosodic Structure of English: a Probabilistic Relationship. PhD thesis, University of Edinburgh, 2006. Markup and prediction of accented words in Switchboard corpus. Try to do this for pronouns only.

61 What don’t we know about Focus realization? Non-anaphoric focus. Féry and Samek-Lodovici (2006) Language 82.1. [(An AMERICANf farmer) (with a purple CHEVROLET) (was talking to a CANADIANf farmer) (with a purple Chevrolet)]f

68 Distribution of datasets: Audio snippets can probably be distributed under fair use. http://confluence.cornell.edu/display/prosody/Prosody+Datasets

69 A lot of naturalistic data bearing on theories of prosody can be found using search engines that index audio using ASR. Machine learning classification is a good methodology for prosody, because one can work with semantic-pragmatic categories that figure in formal theories. For focus, try to build classifiers, not just find statistically significant correlations with acoustic parameters. Classifiers such as SVM can combine information from a lot of features.

