1
A Prototype Personal Dictation System
Adam Janin
janin@icsi.berkeley.edu
2
Final Goal – A Portable Meeting Recorder
- Record impromptu meetings in a natural environment.
- Detect multiple speakers.
- Allow correction and annotation.
- Support indexing and searching.
- Self-contained (using IRAM).
3
Intermediate Goal – A Personal Dictation System
- Record a single user dictating text.
- Allow correction and editing.
- Hosted system:
  - ASR runs on workstation.
  - GUI runs on Pilot.
  - Communicate via wired network.
  - Close-talking mic.
  - Limited domain (Broadcast News).
4
Asides...
- Why not Wizard of Oz?
  - Structure of correction mechanism is recognizer specific.
  - Develop infrastructure.
  - Produce a working demo.
- Informal user study, mostly with speech researchers.
5
Architecture
- Palm Pilot
  - Correct transcripts
  - Edit transcripts
  - Create new text
- Sun Workstation
  - Audio frontend
  - Speech recognizer
  - Correction server
6
Correcting and Editing
- Correcting – informing the recognizer that it has made an error.
  - If recognizer has a good idea of alternatives, it may be faster to correct than to edit.
  - Recognizer can adapt to user and vocabulary.
- Editing – changing the output.
  - “That’s not what I meant to say”.
  - Text vs. speech input.
7
Correction Methods: Background
- Lattice contains recognizer’s best guesses.
- More compact than N-best lists.
- Contains word order and timing.
Example hypotheses:
  1) the records …
  2) a rack...
  3) the wreck or …
  4) a record...
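To make the compactness claim concrete, here is a minimal sketch of a word lattice as a list of timed, scored edges, plus a brute-force N-best expansion. The `Edge` type, the node numbering, the times, and the log-probabilities are all invented for illustration; they are not taken from the recognizer described in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One word hypothesis between two lattice nodes (hypothetical layout)."""
    src: int
    dst: int
    word: str
    start: float    # seconds, made-up values
    end: float
    logprob: float  # made-up scores; higher is more likely


# Toy lattice for the slide's example. Node 0 is the start, node 2 the end.
LATTICE = [
    Edge(0, 1, "the", 0.0, 0.2, -0.4),
    Edge(0, 1, "a", 0.0, 0.2, -1.2),
    Edge(1, 2, "records", 0.2, 0.8, -0.5),
    Edge(1, 2, "rack", 0.2, 0.8, -1.2),
    Edge(1, 2, "record", 0.2, 0.8, -1.6),
    Edge(1, 3, "wreck", 0.2, 0.6, -0.9),
    Edge(3, 2, "or", 0.6, 0.8, -0.8),
]


def n_best(edges, start_node, end_node, n):
    """Expand an acyclic lattice into its n highest-scoring word sequences."""
    outgoing = {}
    for e in edges:
        outgoing.setdefault(e.src, []).append(e)

    results = []
    stack = [(start_node, [], 0.0)]
    while stack:
        node, words, score = stack.pop()
        if node == end_node:
            results.append((score, words))
            continue
        for e in outgoing.get(node, []):
            stack.append((e.dst, words + [e.word], score + e.logprob))

    results.sort(key=lambda r: r[0], reverse=True)
    return results[:n]


for score, words in n_best(LATTICE, 0, 2, 4):
    print(f"{score:6.2f}  {' '.join(words)}")
```

In this toy case seven edges encode eight complete word sequences, and each edge keeps its own start and end time, which a flat N-best list of strings would not.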
8
Correction Methods: Selecting Hypotheses
- User corrects “records”.
  1) the records …
  2) a rack...
  3) the wreck or …
  4) a record...
- System picks all words that overlap in time.
- Presents in order from most likely to least.
- Note: full overlap is probably not optimal.
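The selection step above can be sketched roughly as follows. This is an illustration under assumptions, not the system's actual code: the `WordHyp` type, the any-overlap test, and every time and score below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordHyp:
    """A word hypothesis with timing and score (hypothetical layout)."""
    word: str
    start: float    # seconds, made-up values
    end: float
    logprob: float  # made-up scores; higher is more likely


def overlaps(a, b):
    """True if the two hypotheses share any stretch of time."""
    return a.start < b.end and b.start < a.end


def alternatives(selected, hyps):
    """Words overlapping the selected word in time, most likely first."""
    best = {}
    for h in hyps:
        if h.word == selected.word or not overlaps(selected, h):
            continue
        # Keep the best score if the same word appears more than once.
        if h.word not in best or h.logprob > best[h.word]:
            best[h.word] = h.logprob
    return sorted(best, key=best.get, reverse=True)


HYPS = [
    WordHyp("the", 0.0, 0.2, -0.4),
    WordHyp("a", 0.0, 0.2, -1.2),
    WordHyp("records", 0.2, 0.8, -0.5),
    WordHyp("rack", 0.2, 0.8, -1.2),
    WordHyp("record", 0.2, 0.8, -1.6),
    WordHyp("wreck", 0.2, 0.6, -0.9),
    WordHyp("or", 0.6, 0.8, -0.8),
]

selected = HYPS[2]  # the user taps "records"
print(alternatives(selected, HYPS))
```

With these made-up scores, the any-overlap test also offers short words such as "or" that cover only part of the selected span, so the candidate list can contain entries that are not sensible one-for-one replacements.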
9
Correction Methods: Rescoring
- User corrects “records” to “record”.
  1) the records …
  2) a rack...
  3) the wreck or …
  4) a record...
  Unexpected changes!
- Select only paths with “record”.
- Rescore lattice.
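One way to picture constrained rescoring, and why it can produce unexpected changes, is the toy sketch below. It enumerates every path through a small acyclic lattice, keeps only those containing the corrected word in the corrected time span, and picks the best survivor. The `Edge` type, the brute-force search, and all scores are assumptions made for illustration; a real recognizer would rescore far more efficiently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One word hypothesis between two lattice nodes (hypothetical layout)."""
    src: int
    dst: int
    word: str
    start: float    # seconds, made-up values
    end: float
    logprob: float  # made-up scores; higher is more likely


# Scores depend on context: "record" is unlikely after "the", likely after "a".
LATTICE = [
    Edge(0, 1, "the", 0.0, 0.2, -0.4),
    Edge(0, 2, "a", 0.0, 0.2, -1.2),
    Edge(1, 3, "records", 0.2, 0.8, -0.5),
    Edge(1, 3, "record", 0.2, 0.8, -2.5),
    Edge(2, 3, "record", 0.2, 0.8, -0.6),
    Edge(2, 3, "rack", 0.2, 0.8, -1.0),
]


def all_paths(edges, node, end_node):
    """Yield every edge sequence from node to end_node (acyclic lattice)."""
    if node == end_node:
        yield []
        return
    for e in edges:
        if e.src == node:
            for rest in all_paths(edges, e.dst, end_node):
                yield [e] + rest


def best_path(edges, start_node, end_node, required=None, span=None):
    """Best-scoring path; optionally require a word inside a time span."""
    candidates = all_paths(edges, start_node, end_node)
    if required is not None:
        lo, hi = span
        candidates = (
            p for p in candidates
            if any(e.word == required and e.start < hi and lo < e.end for e in p)
        )
    best = max(candidates, key=lambda p: sum(e.logprob for e in p), default=None)
    return None if best is None else [e.word for e in best]


before = best_path(LATTICE, 0, 3)
after = best_path(LATTICE, 0, 3, required="record", span=(0.2, 0.8))
print(before, "->", after)
```

In this toy lattice the user only changed "records" to "record", yet the best surviving path also swaps "the" for "a", because the constraint changes which complete path scores highest.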
10
Editing
- Allows user to add or edit text arbitrarily.
- Must synchronize with correction server.
- Edit vs. Correct is currently implemented modally with push buttons on-screen.
- Gestural interface for correcting and editing would be preferable.
11
Details...
- Correction allows for words not in lattice.
- Tap to correct worked better than press-and-hold.
- System updates text when user pauses.
- Doesn’t handle punctuation, paragraphs, etc.
- Correction is fast, but dictation is slow.
12
Future Work
- “Real” user studies.
- Experiment more with correction mechanisms.
- Implement editing synchronization.
- Implement gestures.
- Move to wireless network and mic.
- Add punctuation, paragraphs, etc.