Structural Metadata Annotation of Speech Corpora: Comparing Broadcast News and Broadcast Conversations. Jáchym Kolář, Jan Švec, University of West Bohemia.

Presentation transcript:

Structural Metadata Annotation of Speech Corpora: Comparing Broadcast News and Broadcast Conversations
Jáchym Kolář, Jan Švec
University of West Bohemia in Pilsen, Czech Republic

Slide 2: Talk Overview
- Structural metadata annotation
- Speech data
- Statistics about fillers
- Statistics about edit disfluencies
- Statistics about sentence-like units
- Summary

Slide 3: Structural Metadata Extraction
- Metadata Extraction (MDE) research started as part of the DARPA EARS program
- Metadata annotation scheme for MDE introduced by LDC (originally for English → we have extended it to Czech)
- Ultimate goal of MDE: automatic conversion of raw speech recognition output to forms more useful to humans and downstream automatic processes

Slide 4: MDE Annotation Subtasks
- Boundaries of syntactic/semantic units (SUs)
  - Statements, Interrogatives, Incompletes
  - Coordination breaks, Clausal breaks
- Non-content words (fillers)
  - Filled pauses (FPs)
  - Discourse markers (DMs)
- Speech disfluencies (edits)
  - Deletable regions (DelRegs), Interruption points, Explicit editing terms, Corrections

Slide 5: MDE Annotation Example
Raw transcript:
but I you know really pre- uh prefer this form of of um presentation she Sheila told me on Tuesday no on Wednesday she didn’t so let’s move on because we don’t have uh don’t have time well do you like this this example
Annotated transcript:
but I you know really [pre-]* uh prefer this form [of]* of um presentation/. [she]* Sheila told me [on Tuesday]* no on Wednesday/, she didn’t/. so let’s move on/, because we [don’t have]* uh don’t have time/. well do you like [this]* this example/?
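The bracket-and-slash notation in the annotated example can be read mechanically. The sketch below is only an illustration, not the authors' tooling. It assumes that `[...]*` marks a deletable region followed by an interruption point, that `/.` and `/?` close a statement and an interrogative, and that `/,` marks an SU-internal break. These readings are inferred from the example itself, and the function names are hypothetical. Fillers and editing terms are not marked in this plain-text rendering, so they are left in place.

```python
import re

# Annotated example copied from the slide.
ANNOTATED = (
    "but I you know really [pre-]* uh prefer this form [of]* of um presentation/. "
    "[she]* Sheila told me [on Tuesday]* no on Wednesday/, she didn’t/. "
    "so let’s move on/, because we [don’t have]* uh don’t have time/. "
    "well do you like [this]* this example/?"
)

# Assumed notation: "[...]*" = deletable region + interruption point.
DELREG = re.compile(r"\[([^\]]*)\]\*")
# Assumed notation: "/." "/?" "/," = SU boundary or SU-internal break symbols.
BOUNDARY = re.compile(r"/([.,?])")


def deletable_regions(text):
    """Return the word strings inside every deletable region."""
    return DELREG.findall(text)


def split_units(text):
    """Drop deletable regions, then split at boundary symbols.

    Returns a list of (unit_text, symbol) pairs.
    """
    cleaned = DELREG.sub("", text)
    # re.split with a capturing group yields: text, symbol, text, symbol, ..., tail
    parts = BOUNDARY.split(cleaned)
    units = []
    for chunk, symbol in zip(parts[0::2], parts[1::2]):
        units.append((" ".join(chunk.split()), symbol))
    return units


if __name__ == "__main__":
    print(deletable_regions(ANNOTATED))
    for unit, symbol in split_units(ANNOTATED):
        print(symbol, unit)
```

On the slide's example this yields six deletable regions and six boundary-delimited units, the last being the interrogative "well do you like this example".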

Slide 6: Goal of This Paper
- Analyse and compare two Czech MDE corpora from different domains in terms of metadata statistics
- Compare Czech Broadcast News (BN) vs. Broadcast Conversations (BC)
- Also compare Czech and English MDE corpora: English Broadcast News and Conversational Telephone Speech (CTS)

Slide 7: Czech Broadcast News Data
- News from 3 TV channels and 4 radio stations
- Both public and commercial broadcast companies
- Differing in presentation style
- 26 hours of transcribed speech
- ~300 speakers
- Speech recordings and verbatim transcripts publicly available from LDC

Slide 8: Broadcast Conversation Data
- 52 recordings of a Czech radio talk show, Radioforum
- 24 hours of transcribed speech
- ~100 speakers
- 1-3 guests spontaneously answer questions asked by 1-2 interviewers
- Mostly political debates
- Currently being extended by an additional 20 recordings (~10 hours)

Slide 9: Statistics about Fillers
- Filled pauses more frequent in Czech Broadcast Conversations (3.8% of words) than in News (0.5%)
  - English MDE: CTS 2.2%, BN 1.4%
- Discourse markers also more frequent in Czech Conversations (1.6%) than in News (0.1%)
  - English MDE: CTS 4.4%, BN 0.5%
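To make the size of these contrasts easier to see, the percentages on this slide can be turned into conversational-to-news ratios. This is plain arithmetic on the reported figures. The dictionary layout and the `ratio` helper are illustrative names, not part of the original work.

```python
# Filler rates as percentages of words, copied from the slide.
RATES = {
    "filled pauses": {
        "Czech BC": 3.8, "Czech BN": 0.5,
        "English CTS": 2.2, "English BN": 1.4,
    },
    "discourse markers": {
        "Czech BC": 1.6, "Czech BN": 0.1,
        "English CTS": 4.4, "English BN": 0.5,
    },
}


def ratio(kind, conversational, news):
    """How many times more frequent a filler type is in the conversational corpus."""
    return RATES[kind][conversational] / RATES[kind][news]


if __name__ == "__main__":
    for kind in RATES:
        czech = ratio(kind, "Czech BC", "Czech BN")
        english = ratio(kind, "English CTS", "English BN")
        print(f"{kind}: Czech BC/BN = {czech:.1f}x, English CTS/BN = {english:.1f}x")
```

The ratios only restate the slide's figures. They say nothing about statistical significance, and the English comparison pairs telephone speech with news rather than broadcast conversations with news.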

Slide 10: Statistics about Edit Disfluencies
- Deletable regions: 2.8% of words in Conversations and 0.2% in News
  - English MDE: 5.4% in CTS and 1.5% in BN
- Percentage of disfluencies having a correction is larger in News (94.6%) than in Conversations (83.8%)
- Explicit editing terms are rare in both corpora: they occur at just 4% of disfluencies

Slide 11: POS Analysis of Edit Disfluencies
- Tagged the Czech corpora with an automatic POS tagger
- Czech uses structured tags with 15 positions; we used only the first position, which distinguishes 10 basic POS
- Computed and compared three POS distributions:
  1) Whole corpus
  2) Deletable regions only
  3) Corrections only
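The three distributions can be sketched as follows. This is a minimal illustration on toy data, not the authors' pipeline. The `(word, tag, region)` token layout, the region labels, the `make_tag` helper, and the single-letter codes are all assumptions made for the example. Only the idea of keeping the first of 15 tag positions comes from the slide.

```python
from collections import Counter


def make_tag(pos):
    """Build a placeholder 15-position tag; only the first position carries information here."""
    return pos + "-" * 14


# Toy tokens (word, 15-position tag, region label), loosely following the
# "[she]* Sheila told me [on Tuesday]* no on Wednesday" example.
# Region labels are hypothetical: "delreg", "correction", or "other".
TOKENS = [
    ("she", make_tag("P"), "delreg"),
    ("Sheila", make_tag("N"), "correction"),
    ("told", make_tag("V"), "other"),
    ("me", make_tag("P"), "other"),
    ("on", make_tag("R"), "delreg"),
    ("Tuesday", make_tag("N"), "delreg"),
    ("no", make_tag("T"), "other"),
    ("on", make_tag("R"), "correction"),
    ("Wednesday", make_tag("N"), "correction"),
]


def pos_distribution(tokens, region=None):
    """Relative frequency of the first tag position.

    With region=None the whole corpus is used; otherwise only tokens
    carrying that region label are counted.
    """
    counts = Counter(
        tag[0] for _, tag, label in tokens if region is None or label == region
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pos: n / total for pos, n in counts.items()}


if __name__ == "__main__":
    print("whole corpus:", pos_distribution(TOKENS))
    print("deletable regions:", pos_distribution(TOKENS, "delreg"))
    print("corrections:", pos_distribution(TOKENS, "correction"))
```

Comparing the second and third dictionaries against the first is the kind of contrast the slide describes. How the original study delimited correction regions is not stated in the transcript.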

Slide 12: POS Analysis of Edit Disfluencies (continued; slide body not captured in this transcript)

Slide 13: Statistics about SUs
- Average SU length: Conversations (14.5 words) show longer SUs than News (13.0)
- English BN (12.5) is similar to Czech, but CTS shows much shorter SUs (7.0) than Broadcast Conversations
- SU-internal breaks (clausal and coordination) more frequent in Conversations than in News (49% vs. 31% of all SU symbols)
  → Complex and compound sentences more common in spontaneous conversations than in prearranged news
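Both quantities on this slide, average SU length and the share of SU-internal breaks among all SU symbols, can be computed from a token stream in which boundary symbols appear as separate tokens. The sketch below assumes `/.` and `/?` end an SU and `/,` is an SU-internal break, as in the annotation example. Whether the original counts included fillers or deletable regions is not stated, so the word-counting rule here is an assumption.

```python
SU_END = {"/.", "/?"}      # assumed SU-final symbols
SU_INTERNAL = {"/,"}       # assumed SU-internal break symbol


def su_stats(tokens):
    """Return (average SU length in words, share of internal breaks among all SU symbols)."""
    lengths = []
    current = 0
    internal = 0
    final = 0
    for tok in tokens:
        if tok in SU_END:
            lengths.append(current)   # close the current SU
            current = 0
            final += 1
        elif tok in SU_INTERNAL:
            internal += 1             # break inside the SU; the SU continues
        else:
            current += 1              # ordinary word
    avg_len = sum(lengths) / len(lengths) if lengths else 0.0
    symbols = internal + final
    internal_share = internal / symbols if symbols else 0.0
    return avg_len, internal_share


if __name__ == "__main__":
    # Part of the slide's annotation example with deletable regions and
    # filled pauses removed, and boundary symbols split off as tokens.
    text = (
        "Sheila told me no on Wednesday /, she didn’t /. "
        "so let’s move on /, because we don’t have time /. "
        "well do you like this example /?"
    )
    avg_len, internal_share = su_stats(text.split())
    print(f"average SU length: {avg_len:.2f} words")
    print(f"internal breaks: {internal_share:.0%} of SU symbols")
```

A higher internal-break share means more SUs consist of several clauses, which is how the slide relates the 49% vs. 31% figures to complex and compound sentences.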

Slide 14: Summary
- Broadcast Conversations contain significantly more fillers and disfluencies than News
- Conversations also show longer SUs and contain a higher number of complex sentences than News
- Deletable regions and corrections in both corpora show different POS distributions in comparison with the general POS distributions
- We plan to make Czech MDE corpora publicly available
