Published by Άρχέλαος Δασκαλόπουλος. Modified over 6 years ago.
1
Recognizing Structure: Sentence, Speaker, and Topic Segmentation
Julia Hirschberg CS 4706 11/27/2018
2
Today
- Recognizing structural information in speech
  - Learning from generation
  - Learning from text segmentation
  - Types of structural information
  - Segmentation in spoken corpora
4
Recall: Discourse Structure for Speech Generation
- Theoretical accounts (e.g. Grosz & Sidner ’86)
- Empirical studies
  - Text vs. speech
- How can they help in recognition?
  - Features to test
    - Acoustic/prosodic features
    - Lexical features
6
Indicators of Structure in Text
- Cue phrases: now, well, first
- Pronominal reference
- Orthography and formatting (in text)
- Lexical information (Hearst ’94, Reynar ’98, Beeferman et al. ’99):
  - Domain dependent
  - Domain independent
7
Methods of Text Segmentation
- Lexical cohesion methods vs. multiple-source methods
- Vocabulary similarity indicates topic cohesion
  - Intuition from Halliday & Hasan ’76
- Features:
  - Stem repetition
  - Entity repetition
  - Word frequency
  - Context vectors
  - Semantic similarity
  - Word distance
- Methods:
  - Sliding window
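The sliding-window idea can be sketched roughly as follows. This is a minimal illustration, not Hearst's or any other published algorithm: it scores each gap between sentences by the vocabulary overlap (Jaccard) of the sentences just before and just after it, and proposes a boundary where the overlap is low. The function names, the `window` size, and the `threshold` value are all assumptions made for the example, and plain lowercased words stand in for real stems.

```python
import re


def words(sentence):
    """Lowercased word set; a crude stand-in for stemming."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def gap_scores(sentences, window=2):
    """Vocabulary overlap between the `window` sentences on each side of every gap."""
    vocab = [words(s) for s in sentences]
    scores = []
    for gap in range(1, len(vocab)):
        left = set().union(*vocab[max(0, gap - window):gap])
        right = set().union(*vocab[gap:gap + window])
        union = left | right
        scores.append(len(left & right) / len(union) if union else 0.0)
    return scores


def propose_boundaries(sentences, window=2, threshold=0.1):
    """Indices i such that a boundary is proposed just before sentence i."""
    scores = gap_scores(sentences, window)
    return [i + 1 for i, score in enumerate(scores) if score < threshold]
```

On a toy text whose first two sentences are about a cat and whose last two are about stock prices, the only gap with no shared vocabulary is the one between the second and third sentences, so that is where a boundary would be proposed.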
8
Combine lexical cohesion with other cues
- Lexical chains
- Clustering
- Combine lexical cohesion with other cues
  - Features
    - Cue phrases
    - Reference (e.g. pronouns)
    - Syntactic features
  - Methods
    - Machine learning from labeled corpora
9
Choi 2000: Text Segmentation
- Implements leading methods and compares a new algorithm to them on a corpus of 700 concatenated documents
- Comparison algorithms:
  - Baselines:
    - No boundaries
    - All boundaries
    - Regular partition
    - Random # of random partitions
    - Actual # of random partitions
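The baselines listed above are simple enough to sketch directly. The following is one possible reading of them, not Choi's code: a segmentation is represented as a sorted list of sentence indices at which a new segment starts, and all function and parameter names are hypothetical.

```python
import random


def no_boundaries(n_sentences):
    """Treat the whole text as one segment."""
    return []


def all_boundaries(n_sentences):
    """Put a boundary before every sentence except the first."""
    return list(range(1, n_sentences))


def regular_partition(n_sentences, n_segments):
    """Split into n_segments parts of roughly equal length."""
    return [round(k * n_sentences / n_segments) for k in range(1, n_segments)]


def random_partition(n_sentences, n_boundaries=None, rng=random):
    """Random boundary positions.

    If n_boundaries is None the number of boundaries is random as well;
    passing the true count gives the 'actual # of random partitions' baseline.
    """
    gaps = list(range(1, n_sentences))
    if n_boundaries is None:
        n_boundaries = rng.randint(0, len(gaps))
    return sorted(rng.sample(gaps, n_boundaries))
```

Passing a seeded `random.Random` instance as `rng` makes the random baselines reproducible.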
10
- TextTiling algorithm (Hearst ’94)
- DotPlot algorithms (Reynar ’98)
- Segmenter (Kan et al. ’98)
- Choi ’00 proposal
  - Cosine similarity measure
    - Same: 1; no overlap: 0
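As a small illustration of the cosine measure, the sketch below builds word-count vectors for sentences and compares every pair. It assumes raw lowercased word counts with no stemming or stop-word removal, which is a simplification; the function names are made up for the example.

```python
import math
import re
from collections import Counter


def word_counts(sentence):
    """Bag-of-words vector for one sentence."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(sentences):
    """Pairwise cosine similarity between all sentences."""
    vectors = [word_counts(s) for s in sentences]
    return [[cosine(u, v) for v in vectors] for u in vectors]
```

Two sentences with identical word counts score 1 (up to floating-point rounding), and two sentences with no words in common score exactly 0, matching the two end points noted on the slide.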
11
Choi’s algorithm has best performance (9-12% error)
- Similarity matrix → rank matrix
  - Minimize effect of outliers
  - How likely is this sentence to be a boundary, compared to other sentences?
- Divisive clustering based on
  D(n) = (sum of rank values s_{i,j} of segment n) / (inside area of segment n, (j - i + 1)^2),
  for i, j the sentences at the beginning and end of segment n
- Keep dividing the corpus until δD(n) = D(n) - D(n-1) shows little change
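The two steps on this slide can be sketched as follows. This is a simplified illustration under stated assumptions, not Choi's implementation: the mask size of 3 is arbitrary, the inside density is taken as total rank inside the segments' diagonal squares over their total area with each area equal to (j - i + 1)^2, and the stopping test on δD(n) is left out. All names are hypothetical.

```python
def rank_matrix(sim, mask=3):
    """Replace each similarity value by the fraction of its neighbours
    (inside a mask x mask window clipped at the edges) that are strictly lower."""
    n = len(sim)
    half = mask // 2
    ranks = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            lower = total = 0
            for a in range(max(0, i - half), min(n, i + half + 1)):
                for b in range(max(0, j - half), min(n, j + half + 1)):
                    if (a, b) != (i, j):
                        total += 1
                        lower += sim[a][b] < sim[i][j]
            ranks[i][j] = lower / total if total else 0.0
    return ranks


def inside_density(ranks, boundaries):
    """Sum of rank values inside the segments divided by their total area.

    `boundaries` holds the index of the first sentence of each segment
    after the first one.
    """
    edges = [0] + sorted(boundaries) + [len(ranks)]
    total_rank = total_area = 0.0
    for start, end in zip(edges, edges[1:]):
        total_rank += sum(ranks[i][j]
                          for i in range(start, end)
                          for j in range(start, end))
        total_area += (end - start) ** 2  # (j - i + 1)^2 with j = end - 1
    return total_rank / total_area


def best_split(ranks, boundaries):
    """One divisive step: the new boundary that maximises inside density."""
    candidates = [b for b in range(1, len(ranks)) if b not in boundaries]
    return max(candidates, key=lambda b: inside_density(ranks, boundaries + [b]))
```

Calling `best_split` repeatedly and tracking the change in density between steps would give the divisive clustering loop; `best_split` raises `ValueError` once every gap is already a boundary.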
12
Utiyama & Isahara ’02: What if we have no labeled data for our domain?
14
Types of Discourse Structure in Spoken Corpora
- Domain independent
  - Sentence/utterance boundaries
  - Speaker turn segmentation
  - Topic segmentation
- Domain dependent
  - Broadcast news
  - Meetings
  - Telephone conversations
16
Spoken Cues to Discourse Structure
- Pitch range
  - Lehiste ’75, Brown et al. ’83, Silverman ’86, Avesani & Vayra ’88, Ayers ’92, Swerts et al. ’92, Grosz & Hirschberg ’92, Swerts & Ostendorf ’95, Hirschberg & Nakatani ’96
- Preceding pause
  - Lehiste ’79, Chafe ’80, Brown et al. ’83, Silverman ’86, Woodbury ’87, Avesani & Vayra ’88, Grosz & Hirschberg ’92, Passonneau & Litman ’93, Hirschberg & Nakatani ’96
17
- Rate
  - Butterworth ’75, Lehiste ’80, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96
- Amplitude
  - Brown et al. ’83, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96
- Contour
  - Brown et al. ’83, Woodbury ’87, Swerts et al. ’92
- Add Audix tree??
18
Finding Sentence and Topic Boundaries
- Statistical, machine learning approaches with large segmented corpora
- Features:
  - Lexical cues
    - Domain dependent
    - Sensitive to ASR performance
  - Acoustic/prosodic cues
    - Domain independent
    - Sensitive to speaker identity
19
Shriberg et al ’00: Prosodic Cues
- Prosodic cues perform as well as or better than text-based cues at sentence and topic segmentation, and generalize better?
- Goal: identify sentence and topic boundaries at ASR-defined word boundaries
  - CART decision trees provided boundary predictions
  - HMM combined these with lexical boundary predictions from the LM
20
Features
For each potential boundary location:
- Pause at boundary (raw and normalized by speaker)
- Pause at word before boundary (is this a new ‘turn’ or part of a continuous speech segment?)
- Phone and rhyme duration (normalized by inherent duration) (phrase-final lengthening?)
- F0 (smoothed and stylized): reset, range (topline, baseline), slope and continuity
Notes:
- Raw pause worked better than normalized
- Speaker id/change apparently hand-marked
- F0 reset is measured before and after the potential boundary; range is defined over the preceding word and compared to speaker-specific parameters
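The pause-based features are the easiest to illustrate. The sketch below assumes a hypothetical input format, a time-ordered list of `(token, start_sec, end_sec, speaker)` tuples such as a forced alignment might provide, and computes a few pause and turn features for each potential boundary between consecutive words. The normalization (dividing by the speaker's mean pause) is one simple choice made for the example and is not necessarily what Shriberg et al. used; duration and F0 features are omitted because they need phone alignments and a pitch tracker.

```python
from collections import defaultdict


def pause_features(words):
    """Pause and turn features for each boundary between consecutive words.

    `words` is a time-ordered list of (token, start_sec, end_sec, speaker).
    """
    # Raw pause: silence between the end of one word and the start of the next.
    raw = [max(0.0, nxt[1] - cur[2]) for cur, nxt in zip(words, words[1:])]

    # Mean pause per speaker, used for a simple speaker normalization.
    by_speaker = defaultdict(list)
    for (_, _, _, speaker), pause in zip(words, raw):
        by_speaker[speaker].append(pause)
    mean = {spk: sum(p) / len(p) for spk, p in by_speaker.items()}

    features = []
    for i, pause in enumerate(raw):
        speaker = words[i][3]
        features.append({
            "pause": pause,
            "pause_norm": pause / mean[speaker] if mean[speaker] else 0.0,
            "prev_pause": raw[i - 1] if i > 0 else 0.0,
            "turn_change": words[i + 1][3] != speaker,
        })
    return features
```

Each returned dictionary describes the boundary after word `i`, so the list has one entry fewer than the number of words; such feature vectors are the kind of input a decision tree classifier could be trained on.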
21
Trained/tested on Switchboard and Broadcast News
- Voice quality (halving/doubling estimates as correlates of creak or glottalization)
- Speaker change, time from start of turn, # turns in conversation, and gender
22
Sentence segmentation results
- Prosodic features
  - Better than LM for BN
  - Worse (on transcription) and same for ASR transcript on SB
  - All better than chance
- Useful features for BN
  - Pause at boundary, turn/no turn, F0 difference across boundary, rhyme duration
- Useful features for SB
  - Phone/rhyme duration before boundary, pause at boundary, turn/no turn, pause at preceding word boundary, time in turn

Error by model, on ASR output (ASR) and on transcriptions (Trans):

BN sentence boundaries
Model           ASR     Trans
Chance (nonb)   13.3    6.2
Comb HMM        11.7    3.3
LM only         11.8    4.1
Pros only       10.9    3.6

SB sentence boundaries
Model           ASR     Trans
Chance (nonb)   25.8    11.0
Comb HMM        22.5    4.0
LM only         22.8    4.3
Pros only       22.9    6.7

BN topic
Model           ASR     Trans
Chance (nonb)   .3
Comb HMM        .1438   .1377
LM only         .1897   .1895
Pros only       .1731   .1657
23
Topic segmentation results (BN only):
- Useful features
  - Pause at boundary, F0 range, turn/no turn, gender, time in turn
- Prosody alone better than LM
- Combined model improves significantly
24
Next Class
- Identifying Speech Acts
- Reading:
  - This chapter of J&M is a beta version
  - Please keep a diary for:
    - Any typos
    - Any passages you think are hard to follow
    - Any suggestions
- HW 3a due by class (2:40pm)