1
Searching and Summarizing Speech
Julia Hirschberg CS 6998 11/28/2018
2
Today
Speech browsing and search
Speech summarization: 2 views
  Hori et al
  Barzilay et al
Speech data mining
3
Searching Audio Data
Today, large amounts of audio data available: on the web, in company archives, in our homes
But what can we do with it?
We have tools supporting random access to text – but for audio we’re limited to serial search
Goal: tools to search audio as easily as text
4
Why?
Searching online news and archives
Searching a/v archives, movies
Searching trial recordings and legislative sessions
Browsing meetings, customer care exchanges, focus groups
Telephone calls and voicemail
5
Audio Browsing/Retrieval for Voicemail
Motivated by interviews, surveys and usage logs of heavy users:
  Hard to scan new msgs to find those you need to deal with quickly
  Hard to find msg you want in archive
  Hard to locate information you want in any msg
How could we help?
Increasing amounts of audio data available in corporate, public and private collections – but useless without tools for searching, browsing
SCAN and SCANMail prototypes: tools for searching and browsing speech data in Broadcast News and voicemail domains
Future applications: customer care, conference call and meeting browsing
6
SCANMail Architecture
[Figure: SCANMail architecture diagram; recoverable labels: Caller, SCANMail, Subscriber]
7
Corpus Collection
Recordings collected from 138 AT&T Labs employees’ mailboxes
100 hours; 10K msgs; 2500 speakers
Gender balanced; 12% non-native speakers
Mean message duration 36.4 secs, median 30.0 secs
Hand-transcribed and annotated with caller id, gender, age, entity demarcation (names, dates, telnos)
8
Transcription and Bracketing
[ Greeting: hi R ] [ CallerID: it's me ] give me a call [ um ] right away cos there's [ .hn ] I guess there's some [ .hn ] change [ Date: tomorrow ] with the nursery school and they [ um ] [ .hn ] anyway they had this idea [ cos ] since I think J's the only one staying [ Date: tomorrow ] for play club so they wanted to they suggested that [ .hn ] well J2 actually offered to take J home with her and then would she
9
would meet you back at the synagogue at [ Time: five thirty ] to pick her up [ .hn ] [ uh ] so I don't know how you feel about that otherwise M_ and one other teacher would stay and take care of her till [ Date: five thirty tomorrow ] but if you [ .hn ] I wanted to know how you feel before I tell her one way or the other so call me [ .hn ] right away cos I have to get back to her in about an hour so [ .hn ] okay [ Closing: bye ] [ .nhn ] [ .onhk ]
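The bracketing scheme shown on these two slides lends itself to simple pattern matching. The following is a minimal sketch, assuming that labeled spans always take the form `[ Label: text ]` and that unlabeled brackets such as `[ um ]` or `[ .hn ]` mark fillers and non-speech sounds. The function name `extract_entities` is hypothetical and is not part of SCANMail.

```python
import re

# Labeled spans look like "[ Date: tomorrow ]". Fillers such as "[ um ]" or
# "[ .hn ]" carry no "Label:" prefix, so this pattern does not match them.
LABELED_SPAN = re.compile(r"\[\s*(\w+):\s*([^\[\]]+?)\s*\]")

def extract_entities(transcript):
    """Return (label, text) pairs for every labeled bracket, in order."""
    return [(m.group(1), m.group(2)) for m in LABELED_SPAN.finditer(transcript)]

# Excerpt copied from the slide's hand-annotated message.
sample = (
    "[ Greeting: hi R ] [ CallerID: it's me ] give me a call [ um ] right away "
    "cos there's [ .hn ] I guess there's some [ .hn ] change [ Date: tomorrow ] "
    "with the nursery school"
)

for label, text in extract_entities(sample):
    print(f"{label}: {text}")
```

A list of pairs keeps repeated labels (the message contains several Date spans) and preserves their order in the message.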
10
SCANMail Demo
Audix extension: 8380
Audix password: (null)
11
Information Extraction from Speech
Jansche & Abney ’02
12
Speech Summarization: Extraction Techniques
Hori et al ’02
Inoue et al ’04
13
Domain Specific Summarization (Barzilay et al ‘00)
Motivation: lab experiments show little facilitation of speech summarization by techniques that do improve search
Domain: Broadcast News
Idea: knowing what type of speaker (anchor, reporter, interviewee) is speaking provides structural clues that can “outline” the newscast since programs are predictable
14
SCAN: Spoken Content-based Audio Navigator
TREC SDR corpus of Broadcast News
Segment speech “documents” into audio “paratones” acoustically
  Segmentation module trained on hand-labeled discourse structure annotation in another domain
Classify recording conditions, e.g. music, telephone bandwidth, wide-band
Run ASR with appropriate acoustic models (~70% word accuracy)
Index (errorful) transcripts using SMART IR
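The final step above, indexing errorful transcripts, can be illustrated with a toy inverted index. This is a minimal sketch and not the SMART system: it assumes whitespace tokenization and a smoothed tf-idf score, and the segment ids, transcript texts, and function names (`build_index`, `search`) are made up for illustration.

```python
import math
from collections import Counter, defaultdict

def build_index(docs):
    """Map each term to {doc_id: term frequency}; also return the document count."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index, len(docs)

def search(query, index, n_docs):
    """Rank documents by summed tf * log(1 + N/df) over the query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term never recognized in any transcript
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical ASR output; "whether" stands in for a misrecognized "weather".
docs = {
    "seg1": "the senate voted on the budget today",
    "seg2": "whether forecast rain for the weekend",
    "seg3": "budget talks continue in the senate budget committee",
}
index, n_docs = build_index(docs)
print(search("senate budget", index, n_docs))
print(search("weather", index, n_docs))
```

The second query returns nothing because the recognizer's error removed the only occurrence of the query term. Retrieval over ASR output inherits this kind of failure.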
15
Results in WYSIAWYH (“What you see is almost what you hear”) GUI
Transcript prosodically formatted
Overview provides abstract structure
16
[Figure: SCAN system diagram; recoverable labels: Broadcast News corpus, Paratone Detector, Acoustic Condition Classification, Recognition, SCAN db, Information Retrieval, GUI]
17
[Figure; recoverable labels: Search, Overview, Transcript]
18
Patterns in Newscasts
Anchors present headlines and introduce stories
Most frequent speakers
Anchor/reporter turn alternation
Reporter/guest turn-taking during stories
19
Data
35 broadcasts of “All Things Considered”
Human and ASR transcripts (without commercials but with turn boundaries)
Features to predict speaker role
  Lexical: ngrams 1-5, explicit introductions (current and prior segment)
  Contextual: labels and features of prior turns
  Durational: turn length (absolute and relative to previous)
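The three feature families listed above might be computed per turn roughly as follows. This is a sketch under stated assumptions and not the authors' feature extractor: turn length is measured in whitespace-separated tokens (the slide does not say whether length is in words or seconds), explicit-introduction features are omitted, and the function names and feature keys are hypothetical.

```python
def ngrams(tokens, max_n=5):
    """All word n-grams of length 1..max_n, shortest first."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]

def turn_features(turn_text, prev_turn_text=None, prev_label=None):
    """Build a sparse feature dict for one speaker turn."""
    tokens = turn_text.lower().split()
    # Lexical: binary indicator for each n-gram (n = 1..5).
    feats = {f"ngram={g}": 1 for g in ngrams(tokens)}
    # Durational: absolute length, here approximated by token count.
    feats["length"] = len(tokens)
    if prev_turn_text is not None:
        prev_len = len(prev_turn_text.split())
        # Durational: length relative to the previous turn.
        feats["rel_length"] = len(tokens) / prev_len if prev_len else 0.0
    if prev_label is not None:
        # Contextual: role label of the prior turn.
        feats[f"prev_label={prev_label}"] = 1
    return feats

feats = turn_features("this is the news", prev_turn_text="thank you", prev_label="guest")
print(feats["length"], feats["rel_length"])
```

A dict of named features is a convenient input for rule-based learners and maximum entropy models alike, since both operate on sparse indicator features.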
20
Methods and Results
Boosting and maximum entropy --> simple weighted rules to predict speaker role
Baseline: guess anchor (35.4%)
Result on human transcripts: BoostTexter 79%, MaxEnt 80.5%
Result on ASR transcripts: BoostTexter 72.8%, MaxEnt 77%
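The "guess anchor" baseline is a majority-class baseline: always predict the most frequent role and report how often that is right. A minimal sketch follows; the label list is a toy example and not the corpus distribution, and `majority_baseline` is a hypothetical helper name.

```python
from collections import Counter

def majority_baseline(labels):
    """Return the most frequent label and the accuracy of always guessing it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)

# Toy role labels for ten turns (illustrative only).
toy_labels = ["anchor"] * 4 + ["reporter"] * 3 + ["guest"] * 3
print(majority_baseline(toy_labels))
```

Any learned classifier has to beat this figure to show that its features carry information about speaker role.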
21
Speech Data Mining
How does it differ from text data mining?
Maskey et al ’04