Searching and Summarizing Speech

Searching and Summarizing Speech. Julia Hirschberg, CS 6998, 11/28/2018

Today: Speech browsing and search; Speech summarization, 2 views (Hori et al; Barzilay et al); Speech data mining

Searching Audio Data: Today, large amounts of audio data are available on the web, in company archives, and in our homes. But what can we do with it? We have tools supporting random access to text, but for audio we are limited to serial search. Goal: tools to search audio as easily as text.

Why? Searching online news and archives; searching a/v archives and movies; searching trial recordings and legislative sessions; browsing meetings, customer care exchanges, and focus groups; telephone calls and voicemail.

Audio Browsing/Retrieval for Voicemail: Motivated by interviews, surveys, and usage logs of heavy users, who report that it is hard to scan new messages to find those that need quick attention, hard to find a particular message in the archive, and hard to locate the information you want within any message. How could we help? Increasing amounts of audio data are available in corporate, public, and private collections, but they are useless without tools for searching and browsing. SCAN and SCANMail prototypes: tools for searching and browsing speech data in the Broadcast News and voicemail domains. Future applications: customer care, conference call, and meeting browsing.

SCANMail Architecture [Figure: architecture diagram; labeled components: Caller, SCANMail, Subscriber]

Corpus Collection: Recordings collected from 138 AT&T Labs employees' mailboxes; 100 hours, 10K messages, 2500 speakers; gender balanced; 12% non-native speakers; mean message duration 36.4 secs, median 30.0 secs. Hand-transcribed and annotated with caller ID, gender, age, and entity demarcation (names, dates, telephone numbers).

Transcription and Bracketing: [ Greeting: hi R ] [ CallerID: it's me ] give me a call [ um ] right away cos there's [ .hn ] I guess there's some [ .hn ] change [ Date: tomorrow ] with the nursery school and they [ um ] [ .hn ] anyway they had this idea [ cos ] since I think J's the only one staying [ Date: tomorrow ] for play club so they wanted to they suggested that [ .hn ] well J2 actually offered to take J home with her and then would she would meet you back at the synagogue at [ Time: five thirty ] to pick her up [ .hn ] [ uh ] so I don't know how you feel about that otherwise M_ and one other teacher would stay and take care of her till [ Date: five thirty tomorrow ] but if you [ .hn ] I wanted to know how you feel before I tell her one way or the other so call me [ .hn ] right away cos I have to get back to her in about an hour so [ .hn ] okay [ Closing: bye [ .nhn ] [ .onhk ]
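The bracketing convention lends itself to simple automatic processing. For example, it can be used to pull out the labeled entities (greetings, caller IDs, dates, times) or to recover a plain word string for indexing. The following is a minimal sketch built on two assumptions: each label is a single word followed by a colon, and labeled spans never nest. The regexes and the function names `extract_entities` and `strip_annotations` are hypothetical and are not part of SCANMail.

```python
import re
from typing import List, Tuple

# Labeled spans such as "[ Date: tomorrow ]" or "[ Time: five thirty ]".
# Assumption: a label is one word followed by a colon, and spans do not nest.
LABELED = re.compile(r"\[\s*(\w+):\s*([^\[\]]*?)\s*\]")

# Unlabeled bracketed tokens such as "[ um ]" or "[ .hn ]" (fillers, noise marks).
UNLABELED = re.compile(r"\[\s*([^\[\]:]+?)\s*\]")


def extract_entities(transcript: str) -> List[Tuple[str, str]]:
    """Return (label, text) pairs for every labeled span, in order of appearance."""
    return LABELED.findall(transcript)


def strip_annotations(transcript: str) -> str:
    """Keep the words inside labeled spans, drop unlabeled tokens, normalize spaces."""
    text = LABELED.sub(lambda m: m.group(2), transcript)
    text = UNLABELED.sub(" ", text)
    return " ".join(text.split())


if __name__ == "__main__":
    msg = "[ Greeting: hi R ] [ CallerID: it's me ] give me a call [ um ] right away"
    print(extract_entities(msg))
    print(strip_annotations(msg))
```

Unbalanced brackets, such as the closing tag of the example message, are left untouched rather than guessed at.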

SCANMail Demo: http://www.fancentral.org/~isenhour/scanmail/demo.html (Audix extension: 8380; Audix password: (null))

Information Extraction from Speech (Jansche & Abney ’02)

Speech Summarization: Extraction Techniques (Hori et al ’02; Inoue et al ’04)

Domain-Specific Summarization (Barzilay et al ’00): Motivation: lab experiments show that techniques which improve speech search do little to facilitate speech summarization. Domain: Broadcast News. Idea: newscast programs are predictable, so knowing what type of speaker (anchor, reporter, interviewee) is talking provides structural clues that can “outline” the newscast.

SCAN: Spoken Content-based Audio Navigator. Data: TREC SDR corpus of Broadcast News. Segment speech ‘documents’ acoustically into audio ‘paratones’, using a segmentation module trained on hand-labeled discourse structure annotation from another domain. Classify recording conditions, e.g. music, telephone bandwidth, wide-band. Run ASR with the appropriate acoustic models (~70% word accuracy). Index the (errorful) transcripts using the SMART IR system.
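Indexing errorful transcripts with a vector-space engine such as SMART amounts to weighting recognizer output terms and ranking documents by similarity to the query. The minimal sketch below is not SMART itself. It assumes pre-tokenized transcripts and uses log term frequency times inverse document frequency with cosine normalization, roughly in the spirit of one common SMART weighting scheme. All function names and the toy stories are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def log_tf_idf(tokens: List[str], idf: Dict[str, float]) -> Vector:
    """Weight each term by (1 + log tf) * idf; unseen terms get weight 0."""
    return {t: (1.0 + math.log(c)) * idf.get(t, 0.0) for t, c in Counter(tokens).items()}


def build_index(docs: Dict[str, List[str]]) -> Tuple[Dict[str, Vector], Dict[str, float]]:
    """Compute idf over the collection and one weighted vector per transcript."""
    n = len(docs)
    df = Counter(t for tokens in docs.values() for t in set(tokens))
    idf = {t: math.log(n / count) for t, count in df.items()}
    return {doc_id: log_tf_idf(tokens, idf) for doc_id, tokens in docs.items()}, idf


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, so long transcripts are not favored just for length."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def search(query: List[str], index: Dict[str, Vector], idf: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank transcripts by cosine similarity to the query, best first."""
    q = log_tf_idf(query, idf)
    return sorted(((d, cosine(q, v)) for d, v in index.items()), key=lambda p: p[1], reverse=True)


if __name__ == "__main__":
    # Toy ASR output; in story3 the recognizer has turned "senate" into "sin it".
    docs = {
        "story1": "the senate passed the budget bill".split(),
        "story2": "weather forecast calls for rain tomorrow".split(),
        "story3": "budget talks stall in the sin it".split(),
    }
    index, idf = build_index(docs)
    for doc_id, score in search(["budget", "senate"], index, idf):
        print(doc_id, round(score, 3))
```

Because the index sees only recognizer output, the misrecognized "senate" in story3 cannot match the query term. That story therefore ranks well below story1 even if it is about the same topic, which illustrates the cost of searching errorful transcripts.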

The transcript is prosodically formatted, and an overview provides abstract structure. The result is a WYSIAWYH (“What you see is almost what you hear”) GUI.

[Figure: SCAN system diagram; labeled components: Broadcast News corpus, Acoustic Condition Classification, Paratone Detector, Recognition, SCAN db, Information Retrieval, GUI]

[Figure: SCAN interface with Search, Overview, and Transcript panels]

Patterns in Newscasts: Anchors present headlines and introduce stories, and they are the most frequent speakers. Anchor and reporter turns alternate. Reporters and guests take turns during stories.

Data: 35 broadcasts of “All Things Considered”; human and ASR transcripts (without commercials but with turn boundaries). Features to predict speaker role. Lexical: n-grams (n = 1 to 5), explicit introductions (current and prior segment). Contextual: labels and features of prior turns. Durational: turn length (absolute and relative to the previous turn).
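The three feature families can be assembled per turn into a sparse feature dictionary that a boosting or maximum-entropy learner can consume. The following is a minimal sketch under stated assumptions. The cue-phrase list standing in for "explicit introductions" is hypothetical, since the slide does not give the actual cues. The contextual features are reduced to the prior turn's label plus the prior turn's introduction cue. `Turn` and the function names are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical cue phrases standing in for "explicit introductions";
# the cues actually used in the study are not listed on the slide.
INTRO_CUES = ("this is", "joins us", "reports")


@dataclass
class Turn:
    words: List[str]   # lowercased tokens of one speaker turn
    duration: float    # turn length in seconds


def ngram_features(words: List[str], n_max: int = 5) -> Dict[str, float]:
    """Binary indicators for every n-gram with 1 <= n <= n_max."""
    feats: Dict[str, float] = {}
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            feats["ng=" + " ".join(words[i:i + n])] = 1.0
    return feats


def has_intro(turn: Turn) -> float:
    """1.0 if any cue phrase occurs as whole words in the turn, else 0.0."""
    text = " " + " ".join(turn.words) + " "
    return float(any(" " + cue + " " in text for cue in INTRO_CUES))


def turn_features(turn: Turn, prev: Optional[Turn], prev_label: Optional[str]) -> Dict[str, float]:
    feats = ngram_features(turn.words)                       # lexical: n-grams
    feats["intro_current"] = has_intro(turn)                 # lexical: cue in this turn
    feats["abs_length"] = turn.duration                      # durational: absolute
    feats["prev_label=" + (prev_label or "NONE")] = 1.0      # contextual: prior label
    if prev is not None:
        feats["intro_prior"] = has_intro(prev)               # lexical: cue in prior turn
        if prev.duration > 0:
            feats["rel_length"] = turn.duration / prev.duration  # durational: relative
    return feats
```

Keeping features in a string-keyed dictionary mirrors the sparse, mostly binary representation that rule-based boosting learners work with. Numeric features such as `abs_length` would typically be thresholded by the learner.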

Methods and Results: Boosting and maximum entropy learn simple weighted rules to predict speaker role. Baseline (always guess anchor): 35.4%. Accuracy on human transcripts: BoosTexter 79%, MaxEnt 80.5%. Accuracy on ASR transcripts: BoosTexter 72.8%, MaxEnt 77%.
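To make "simple weighted rules" concrete, here is a minimal sketch of binary AdaBoost over one-feature decision stumps. It separates anchor (+1) from non-anchor (-1) turns using binary indicator features such as n-gram presence. This is a simplification for illustration only. BoosTexter handles multiple classes and differs in detail, and the function names and toy data are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Example = Tuple[Dict[str, float], int]   # (binary features, label: +1 anchor, -1 other)
Rule = Tuple[str, int, float]            # (feature, polarity, weight alpha)


def stump(x: Dict[str, float], feature: str, polarity: int) -> int:
    """Vote `polarity` if the feature is present, otherwise the opposite."""
    return polarity if x.get(feature, 0.0) else -polarity


def train_adaboost(data: List[Example], rounds: int = 10) -> List[Rule]:
    weights = [1.0 / len(data)] * len(data)
    features = sorted({f for x, _ in data for f in x})
    rules: List[Rule] = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted training error.
        best_err, best_f, best_p = float("inf"), "", 1
        for f in features:
            for p in (1, -1):
                err = sum(w for w, (x, y) in zip(weights, data) if stump(x, f, p) != y)
                if err < best_err:
                    best_err, best_f, best_p = err, f, p
        if best_err >= 0.5:
            break  # no stump beats chance on the reweighted data
        err = max(best_err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        rules.append((best_f, best_p, alpha))
        # Up-weight misclassified examples, down-weight correct ones, renormalize.
        weights = [w * math.exp(-alpha * y * stump(x, best_f, best_p))
                   for w, (x, y) in zip(weights, data)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return rules


def predict(rules: List[Rule], x: Dict[str, float]) -> int:
    """Sign of the alpha-weighted vote of all learned rules."""
    score = sum(alpha * stump(x, f, p) for f, p, alpha in rules)
    return 1 if score >= 0 else -1
```

Each learned rule is human-readable: "if feature f is present, vote for role r with weight alpha". That readability is what makes boosted stumps attractive for inspecting which cues identify anchors.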

Speech Data Mining: How does it differ from text data mining? (Maskey et al ’04)