Varying Input Segmentation for Story Boundary Detection Julia Hirschberg GALE PI Meeting March 23, 2007.

Presentation transcript:


Introduction
Broadcast News shows are semantically heterogeneous; however, a number of GALE (NLP) systems expect homogeneous material
– Anaphora Resolution, Information Extraction, Information Retrieval, Multi-Document Summarization
Story Segmentation divides a show into homogeneous regions, each addressing a single topic.

Story Segmentation Technique
Extract features over the unit of analysis as determined by the input segmentation
Use a decision tree (C4.5) to predict which segment boundaries are also story boundaries
Evaluate using WinDiff
– A variant of Beeferman’s Pk that counts misses and false positives equally
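The evaluation step can be illustrated with a minimal sketch of a WinDiff-style score. The slide gives only the metric's name and window size, so the boundary encoding (a 0/1 flag after each word) and the normalization by n - k windows are assumptions made for illustration, not the authors' scoring code.

```python
def window_diff(reference, hypothesis, k):
    """WinDiff-style error rate between two boundary sequences.

    reference, hypothesis: equal-length sequences of 0/1 flags, where a 1 at
    index i means "a boundary follows unit i" (assumed encoding).
    k: window size in units (the slides use k = 100 words).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("sequences must have the same length")
    n = len(reference)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(reference)")

    errors = 0
    for i in range(n - k):
        # Count boundaries inside the window in each segmentation.
        ref_count = sum(reference[i:i + k])
        hyp_count = sum(hypothesis[i:i + k])
        # Any disagreement in the counts is one error, so a miss and a
        # false positive are penalized equally.
        if ref_count != hyp_count:
            errors += 1
    return errors / (n - k)
```

A score of 0 means the two segmentations agree in every window; higher is worse.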

Input Segmentations
ASR Word boundaries
– Atomic level segmentation: Assume a story boundary may occur at any word boundary.
Hypothesized Sentence Boundaries
– Delivered hypotheses, 0.3 and 0.1 confidence thresholds
Pause-based Acoustic Chunking
– 250 ms, 500 ms thresholds based on the gap between ASR word boundaries
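Pause-based chunking as described above can be sketched as follows. The word tuple layout (token, start time, end time in seconds) is a hypothetical ASR output format chosen for illustration; only the thresholds come from the slide.

```python
from typing import List, Tuple

# Hypothetical ASR word format: (token, start_sec, end_sec).
Word = Tuple[str, float, float]


def pause_chunks(words: List[Word], threshold_sec: float = 0.25) -> List[List[Word]]:
    """Split a word sequence wherever the silent gap between consecutive
    ASR words is at least threshold_sec (0.25 or 0.5 in the slides)."""
    chunks: List[List[Word]] = []
    current: List[Word] = []
    for word in words:
        # Gap = start of this word minus end of the previous word.
        if current and word[1] - current[-1][2] >= threshold_sec:
            chunks.append(current)
            current = []
        current.append(word)
    if current:
        chunks.append(current)
    return chunks
```

A shorter threshold yields more, smaller chunks and therefore more candidate story boundaries.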

Acoustic Segmentations
Intonational Phrase Segmentation
– Decision Tree model built using manual IP annotation of a single English BN show. Acoustic feature vector
– NB: This model is applied to Arabic and Mandarin shows
Likely poor IP detection on non-English, but possibly a useful input segmentation regardless

AcousticTiling
Operates as TextTiling does, though the sliding window is now composed of vectors of acoustic features extracted from either side of a sliding boundary.
Local minima define segmentation boundaries
Features:
– Mean F0
– Mean Intensity
– Mean Speaking Rate (ASR vowels / sec)
– Mean Vowel Length (speaker and vowel id normalized)
– Pause Length
The comparison here is between the current and the surrounding boundaries, not between the previous and following windows.
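A rough sketch of the AcousticTiling idea follows. The slide does not name the similarity measure or the window size, so cosine similarity over window-averaged feature vectors and a window of two units are assumptions here, as are all function names.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(rows):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def acoustic_tiling(features, window=2):
    """Return indices b such that a boundary is hypothesized before unit b.

    features: one acoustic feature vector per unit (e.g. mean F0, intensity,
    speaking rate, vowel length, pause length), assumed already normalized.
    """
    # Score every candidate boundary by the similarity of its two sides.
    positions = list(range(window, len(features) - window + 1))
    scores = [
        cosine(mean_vector(features[b - window:b]), mean_vector(features[b:b + window]))
        for b in positions
    ]
    # A boundary is placed where the score is lower than at both
    # neighbouring candidate boundaries (a local minimum).
    return [
        positions[i]
        for i in range(1, len(scores) - 1)
        if scores[i] < scores[i - 1] and scores[i] < scores[i + 1]
    ]
```

In this sketch the local-minimum test compares each candidate boundary's score with the scores of the adjacent candidates, which is one reading of the slide's closing remark.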

Input Segmentation Statistics

Input Segmentation   Story Boundary Dist.   Exact Coverage   Mean placement error (wds)
Word                 0.48%                  100%             0
Hyp SUs              8.3%                   68.3%            3.6
SU thresh .3         6.4%                   74.4%            1.8
SU thresh .1         4.3%                   82.9%
250 ms pause         5.1%                   83.5%
500 ms pause         12.2%                  71.8%            12.7
Hyp IPs              2.6%                   62.0%            1.1
AcousticTiling       2.5%                   19.1%            199.2
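The slide does not define its column headings. The sketch below assumes that "exact coverage" is the share of reference story boundaries that coincide with some input segment boundary, and that "mean placement error" is the average distance in words from each story boundary to the nearest segment boundary. Both definitions, and the function name, are assumptions for illustration.

```python
from bisect import bisect_left


def coverage_stats(segment_boundaries, story_boundaries):
    """Compare candidate segment boundaries with reference story boundaries.

    Both arguments are non-empty lists of word offsets.
    Returns (exact_coverage, mean_placement_error_in_words).
    """
    seg = sorted(set(segment_boundaries))
    exact = 0
    total_error = 0
    for story in story_boundaries:
        pos = bisect_left(seg, story)
        # The nearest candidate is the neighbour just below or at/above.
        neighbours = seg[max(pos - 1, 0):pos + 1]
        error = min(abs(story - candidate) for candidate in neighbours)
        exact += error == 0
        total_error += error
    n = len(story_boundaries)
    return exact / n, total_error / n
```

Under these definitions, word-level segmentation trivially gives full coverage and zero placement error, which matches the first row of the table.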

Story Boundary Detection Results (WinDiff, k=100 wds)
Columns: Ara, Eng, Man
Rows (input segmentations): Word; Hyp SUs; SU thresh .3; SU thresh .1; 250 ms pause; 500 ms pause; Hyp IPs; AcousticTiling

Conclusions
Input segmentation has a significant impact on story boundary detection.
We find a low SU boundary threshold or a short (250 ms) pause-based segmentation to be the most reliable input segmentations for story segmentation across languages.
– Using no segmentation at all performs competitively in English and Arabic
AcousticTiling is not especially well-suited to the task of story segmentation
– However, it does not produce the worst results for English and Arabic. Tonal interaction in Mandarin?

Thank you

Lexical Features
TextTiling coefficients
LCSeg coefficients
Keywords immediately preceding or following the current boundary

Speaker Features
Is this segment boundary a speaker boundary?
Is the previous segment the last spoken by its speaker in this show? The first?
What percentage of the show’s material was spoken by the speaker of the segment immediately preceding this boundary?
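The speaker questions above could be computed along these lines. This is a sketch that assumes each segment already carries a speaker label (for example from a diarization step) and a length in words or seconds; the function and key names are hypothetical.

```python
def speaker_features(segment_speakers, segment_lengths, boundary):
    """Speaker features for the boundary between segment boundary-1 and
    segment boundary (so 1 <= boundary < number of segments).

    segment_speakers: speaker label per segment (assumed to be available).
    segment_lengths: amount of material per segment, in any consistent unit.
    """
    prev_index = boundary - 1
    prev_speaker = segment_speakers[prev_index]
    # All segments in the show spoken by the preceding segment's speaker.
    own_segments = [i for i, s in enumerate(segment_speakers) if s == prev_speaker]
    spoken = sum(segment_lengths[i] for i in own_segments)
    return {
        "is_speaker_boundary": prev_speaker != segment_speakers[boundary],
        "prev_is_speakers_last": prev_index == own_segments[-1],
        "prev_is_speakers_first": prev_index == own_segments[0],
        "prev_speaker_share": spoken / sum(segment_lengths),
    }
```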

Acoustic Features
Min, max, mean, median, stdev, mean slope of raw and speaker normalized F0 and raw intensity (and delta between previous and following segment)
Speaking Rate (and delta)
– Frame based (voiced/unvoiced)
– Raw and spkr norm vowels per sec
Length (and change in length) of the segment final vowel and rhyme
– Raw and spkr normalized
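The summary statistics and deltas listed above can be sketched as follows. The slide does not say how speaker normalization or slope is computed, so z-scoring against per-speaker statistics and the mean frame-to-frame difference are assumptions made here, along with all names.

```python
import statistics


def contour_summary(values):
    """Summary statistics over a non-empty contour (e.g. F0 or intensity
    frames within one segment)."""
    # Slope is approximated by the difference between consecutive frames.
    slopes = [b - a for a, b in zip(values, values[1:])]
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdev": statistics.pstdev(values),
        "mean_slope": statistics.fmean(slopes) if slopes else 0.0,
    }


def speaker_normalize(values, speaker_mean, speaker_stdev):
    """Z-score a contour against its speaker's statistics (assumed scheme)."""
    return [(v - speaker_mean) / speaker_stdev for v in values]


def boundary_deltas(previous_segment, following_segment):
    """Change in each summary statistic across a candidate boundary."""
    before = contour_summary(previous_segment)
    after = contour_summary(following_segment)
    return {name: after[name] - before[name] for name in before}
```

The same functions apply to raw and normalized contours, so each feature family is obtained by calling them on the corresponding input.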