Varying Input Segmentation for Story Boundary Detection Julia Hirschberg GALE PI Meeting March 23, 2007
Introduction
  Broadcast News shows are semantically heterogeneous; however, a number of GALE (NLP) systems expect homogeneous material
    – Anaphora Resolution, Information Extraction, Information Retrieval, Multi-Document Summarization
  Story Segmentation divides a show into homogeneous regions, each addressing a single topic.
Story Segmentation Technique
  Extract features over the unit of analysis as determined by the input segmentation
  Use a decision tree (C4.5) to predict which segment boundaries are also story boundaries
  Evaluate using WinDiff
    – A variant of Beeferman’s Pk that counts misses and false positives equally
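The evaluation metric can be sketched as follows, assuming "WinDiff" refers to the WindowDiff metric. The boundary encoding (one 0/1 flag per candidate position) and the function name `window_diff` are illustrative assumptions, not taken from the slides.

```python
def window_diff(reference, hypothesis, k):
    """WindowDiff-style error between two boundary sequences.

    reference, hypothesis: equal-length lists of 0/1 flags, one per
    candidate position (1 = a story boundary follows this position).
    k: window size in positions (assumed encoding, see lead-in).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("sequences must have the same length")
    n = len(reference)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the sequence length")

    windows = n - k + 1
    errors = 0
    for i in range(windows):
        # Count boundaries inside the window in each segmentation.
        ref_count = sum(reference[i:i + k])
        hyp_count = sum(hypothesis[i:i + k])
        # Any disagreement in the count is one error, so a miss and a
        # false positive are penalized equally.
        if ref_count != hyp_count:
            errors += 1
    return errors / windows


reference = [0, 0, 1, 0, 0, 0]
print(window_diff(reference, reference, 2))           # 0.0
print(window_diff(reference, [0, 0, 0, 0, 0, 0], 2))  # 0.4 (missed boundary)
print(window_diff(reference, [0, 0, 0, 1, 0, 0], 2))  # 0.4 (boundary off by one)
```

Lower scores are better; 0 means the two segmentations agree in every window.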
Input Segmentations
  ASR Word boundaries
    – Atomic level segmentation: Assume a story boundary may occur at any word boundary.
  Hypothesized Sentence Boundaries
    – Delivered hypotheses, 0.3 and 0.1 confidence thresholds
  Pause-based Acoustic Chunking
    – 250ms, 500ms thresholds based on gap between ASR word boundaries
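Pause-based chunking can be illustrated with a minimal sketch. The input format (token, start time, end time in seconds), the function name `pause_chunks`, and the choice of ">=" at the threshold are assumptions made for this example.

```python
def pause_chunks(words, threshold_s=0.25):
    """Split ASR words into chunks wherever the silence gap is long enough.

    words: list of (token, start_s, end_s) tuples sorted by start time
           (hypothetical format for this sketch).
    threshold_s: minimum gap between consecutive words that starts a
                 new chunk, e.g. 0.25 or 0.5 for the 250ms and 500ms settings.
    """
    chunks = []
    current = []
    prev_end = None
    for token, start, end in words:
        # A gap at or above the threshold closes the current chunk.
        if prev_end is not None and start - prev_end >= threshold_s:
            chunks.append(current)
            current = []
        current.append(token)
        prev_end = end
    if current:
        chunks.append(current)
    return chunks


words = [("a", 0.0, 0.2), ("b", 0.25, 0.5), ("c", 0.9, 1.1),
         ("d", 1.2, 1.4), ("e", 2.0, 2.3)]
print(pause_chunks(words, 0.25))  # [['a', 'b'], ['c', 'd'], ['e']]
print(pause_chunks(words, 0.5))   # [['a', 'b', 'c', 'd'], ['e']]
```

A longer threshold yields fewer, longer chunks, so fewer candidate positions are available for story boundaries.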
Acoustic Segmentations
  Intonational Phrase Segmentation
    – Decision Tree model built using manual IP annotation of a single English BN show.
    – Acoustic feature vector
    – NB: This model is applied to Arabic and Mandarin shows
      Likely poor IP detection on non-English, but possibly a useful input segmentation regardless
AcousticTiling
  Operates as TextTiling does, though the sliding window is now composed of vectors of acoustic features extracted from either side of a sliding boundary.
  Local minima define segmentation boundaries
  Features:
    – Mean F0
    – Mean Intensity
    – Mean Speaking Rate (ASR vowels / sec)
    – Mean Vowel Length (speaker and vowel id normalized)
    – Pause Length
  The comparison here is between the current and the surrounding boundaries, not between the previous and following windows.
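The slide does not specify the similarity measure or window width, so the following is only a rough sketch of one possible reading: each candidate boundary has an acoustic feature vector, it is compared by cosine similarity to the mean vector of its neighbouring boundaries, and local minima of that score are taken as segment boundaries. The cosine measure, the `width` parameter, and all function names are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def boundary_scores(vectors, width=2):
    """Score each candidate boundary against its surrounding boundaries.

    vectors: one feature vector per candidate boundary (at least two),
             assumed to be normalized so that features share a scale.
    width: number of neighbours considered on each side (assumed).
    """
    scores = []
    for i, vec in enumerate(vectors):
        neighbours = vectors[max(0, i - width):i] + vectors[i + 1:i + 1 + width]
        mean = [sum(col) / len(neighbours) for col in zip(*neighbours)]
        scores.append(cosine(vec, mean))
    return scores


def local_minima(scores):
    """Indices whose score is strictly lower than both neighbours."""
    return [i for i in range(1, len(scores) - 1)
            if scores[i] < scores[i - 1] and scores[i] < scores[i + 1]]


vectors = [[1, 0], [1, 0], [0, 1], [1, 0], [1, 0]]
print(local_minima(boundary_scores(vectors, width=1)))  # [2]
```

In practice the features listed on the slide (F0, intensity, speaking rate, vowel length, pause length) are on different scales, so some normalization would be needed before a comparison like this is meaningful.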
Input Segmentation Statistics
  Input Segmentation   Story Boundary Dist.   Exact Coverage   Mean placement error (wds)
  Word                 0.48%                  100%             0
  Hyp SUs              8.3%                   68.3%            3.6
  SU thresh 0.3        6.4%                   74.4%            1.8
  SU thresh 0.1        4.3%                   82.9%            [missing]
  250ms pause          5.1%                   83.5%            [missing]
  500ms pause          12.2%                  71.8%            12.7
  Hyp IPs              2.6%                   62.0%            1.1
  AcousticTiling       2.5%                   19.1%            199.2
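The slide does not define its three columns, so the sketch below rests on assumed definitions: each story boundary is aligned to its nearest candidate segment boundary; "Story Boundary Dist." is the share of candidate boundaries that receive a story boundary, "Exact Coverage" is the share of story boundaries that coincide exactly with a candidate, and "Mean placement error" is the average distance in words to the nearest candidate. The function name and word-index representation are hypothetical.

```python
from bisect import bisect_left


def segmentation_stats(candidates, story_boundaries):
    """Compare candidate segment boundaries with true story boundaries.

    Both arguments are non-empty lists of word indices
    (assumed representation for this sketch).
    """
    cands = sorted(set(candidates))
    errors = []
    aligned = set()
    for b in story_boundaries:
        # Only the candidates just before and just after b can be nearest.
        i = bisect_left(cands, b)
        neighbours = cands[max(i - 1, 0):i + 1]
        nearest = min(neighbours, key=lambda c: abs(c - b))
        errors.append(abs(nearest - b))
        aligned.add(nearest)
    return {
        "story_boundary_rate": len(aligned) / len(cands),
        "exact_coverage": sum(e == 0 for e in errors) / len(errors),
        "mean_placement_error": sum(errors) / len(errors),
    }


print(segmentation_stats([10, 20, 30, 40], [20, 33]))
# {'story_boundary_rate': 0.5, 'exact_coverage': 0.5, 'mean_placement_error': 1.5}
```

Under these definitions a word-level segmentation gives 100% exact coverage and zero placement error by construction, which matches the Word row of the table.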
Story Boundary Detection Results (WinDiff, k=100wds)
  Columns: Arabic, English, Mandarin
  Rows: Word, Hyp SUs, SU thresh (two settings), ms pause (two settings), Hyp IPs, AcousticTiling
  [table values not recoverable]
Conclusions
  Input segmentation has a significant impact on story boundary detection.
  A low SU boundary threshold or a short (250ms) pause-based segmentation is the most reliable input segmentation for story segmentation across languages.
    – Using no segmentation at all performs competitively in English and Arabic
  AcousticTiling is not especially well-suited to the task of story segmentation
    – However, it does not produce the worst results for English and Arabic. Tonal interaction in Mandarin?
Thank you
Lexical Features
  TextTiling coefficients
  LCSeg coefficients
  Keywords immediately preceding or following the current boundary
Speaker Features
  Is this segment boundary a speaker boundary?
  Is the previous segment the last spoken by a speaker in this show? The first?
  What percentage of the show’s material was spoken by the speaker of the segment immediately preceding this boundary?
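These speaker features could be computed roughly as follows. The segment representation (speaker label, word count), measuring "material" in words rather than time, and the name `speaker_features` are assumptions for illustration.

```python
def speaker_features(segments, i):
    """Speaker features for the boundary between segments[i] and segments[i + 1].

    segments: list of (speaker_id, n_words) tuples in show order
              (hypothetical format); i must be a valid index with a
              following segment.
    """
    speaker = segments[i][0]
    speakers = [s for s, _ in segments]
    total_words = sum(n for _, n in segments)
    speaker_words = sum(n for s, n in segments if s == speaker)
    return {
        # Does the speaker change across this boundary?
        "speaker_boundary": segments[i + 1][0] != speaker,
        # Is the preceding segment this speaker's first / last in the show?
        "prev_is_speakers_first": speaker not in speakers[:i],
        "prev_is_speakers_last": speaker not in speakers[i + 1:],
        # Share of the show's words spoken by the preceding speaker.
        "prev_speaker_share": speaker_words / total_words,
    }


segments = [("A", 10), ("A", 20), ("B", 30), ("A", 40)]
print(speaker_features(segments, 1))
# {'speaker_boundary': True, 'prev_is_speakers_first': False,
#  'prev_is_speakers_last': False, 'prev_speaker_share': 0.7}
```

A speaker who talks for most of the show is likely an anchor, which is one reason the share feature may help separate anchor hand-offs from reporter turns.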
Acoustic Features
  Min, max, mean, median, stdev, mean slope of raw and speaker normalized F0 and raw intensity (and delta between previous and following segment)
  Speaking Rate (and delta)
    – Frame based (voiced/unvoiced)
    – Raw and spkr norm Vowels per sec
  Length (and change in length) of the segment final vowel and rhyme
    – Raw and spkr normalized
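The summary statistics and deltas can be sketched as below. The per-frame contour input, the use of mean first difference as "mean slope", z-scoring as the speaker normalization, and all function names are assumptions rather than the method used in the slides.

```python
import statistics


def contour_stats(values):
    """Summary statistics over a non-empty per-frame contour (e.g. F0 or intensity)."""
    slopes = [b - a for a, b in zip(values, values[1:])]
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        # Mean frame-to-frame change, used here as a simple slope estimate.
        "mean_slope": statistics.fmean(slopes) if slopes else 0.0,
    }


def delta_features(prev_values, next_values):
    """Change in each statistic from the previous to the following segment."""
    prev_stats = contour_stats(prev_values)
    next_stats = contour_stats(next_values)
    return {name: next_stats[name] - prev_stats[name] for name in prev_stats}


def speaker_normalize(values, speaker_mean, speaker_stdev):
    """Z-score a contour with a speaker's own mean and standard deviation."""
    return [(v - speaker_mean) / speaker_stdev for v in values]


print(contour_stats([100.0, 110.0, 120.0])["mean_slope"])                       # 10.0
print(delta_features([100.0, 110.0, 120.0], [200.0, 210.0, 220.0])["mean"])     # 100.0
```

The same helpers would apply to raw intensity; only F0 is speaker normalized in the feature list above.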