Published by Dominique Prime. Modified over 10 years ago.
1
Developing Spoken Dialogue Systems in the Communicator / RavenClaw Framework
Sphinx Lunch Talk, Carnegie Mellon University, October 2004
Presented by: Dan Bohus
Special appearances: Antoine Raux, Jahanzeb Sherwani, Thomas Harris
2
Examples
- RoomLine: conference room reservations within SCS; the system can access the schedules of 13 conference rooms in Wean Hall and NSH
- Let’s Go! Bus Information System: bus schedule information for Port Authority buses in Oakland and Squirrel Hill [Let’s Go! Project]
- Sublime: personalized information management system
- TeamTalk: an investigation into human and multi-robot spoken language communication in unstructured environments
6
More Systems
- LARRI: multimodal system that assists F/A-18 aircraft maintenance personnel throughout the execution of procedural tasks [Symphony]
- Madeleine: text-based prototype for a medical diagnosis system [MITRE workshop]
- Eureka: dialogue interface to the Vivisimo web search engine
7
The Communicator / RavenClaw Spoken Dialogue Systems Framework
- Examples
- Overall Architecture
- System Development
- Components & Resources
- Miscellaneous
- Current Research
examples : architecture : development : components : miscellaneous : research
8
Overall Architecture
Classical pipeline architecture:
- Recognition: SPHINX
- Language Understanding: PHOENIX / HELIOS
- Dialog Management: RAVENCLAW
- Back-end: (various)
- Language Generation: ROSETTA
- Synthesis: THETA
9
Galaxy HUB
Components connected through the HUB:
- Recognition: SPHINX
- Language Understanding: PHOENIX / HELIOS
- Dialog Management: RAVENCLAW
- Back-end: (various)
- Language Generation: ROSETTA
- Synthesis: THETA
Galaxy:
- Generic, centralized, message-passing communication architecture
- Developed at MIT, used in the Communicator program
- Competitor: OAA
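The centralized, message-passing idea can be illustrated with a toy router. This is only a sketch under stated assumptions: the `Hub` class, its method names, and the two stand-in components are hypothetical and do not reflect the actual Galaxy API.

```python
from typing import Callable, Dict, List, Tuple

Frame = Dict[str, str]              # a message: flat key/value pairs
Handler = Callable[[Frame], Frame]  # a component takes a frame, returns a frame


class Hub:
    """Toy central router: components never call each other directly."""

    def __init__(self) -> None:
        self.servers: Dict[str, Handler] = {}
        self.log: List[Tuple[str, Frame]] = []

    def register(self, name: str, handler: Handler) -> None:
        self.servers[name] = handler

    def send(self, name: str, frame: Frame) -> Frame:
        # Every message passes through the hub, so it can be logged centrally.
        self.log.append((name, dict(frame)))
        return self.servers[name](frame)

    def run_pipeline(self, route: List[str], frame: Frame) -> Frame:
        for name in route:
            frame = self.send(name, frame)
        return frame


def recognizer(frame: Frame) -> Frame:
    # Stand-in for decoding: pretend the audio is already text.
    return {**frame, "hyp": frame["audio"].upper()}


def parser(frame: Frame) -> Frame:
    slot = "[Yes]" if "YES" in frame["hyp"] else "[Unknown]"
    return {**frame, "parse": slot}


hub = Hub()
hub.register("recognition", recognizer)
hub.register("understanding", parser)
result = hub.run_pipeline(["recognition", "understanding"], {"audio": "yes please"})
```

Because all traffic goes through one place, swapping a component only requires registering a different handler under the same name.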
10
Getting Even Closer
Components connected through the HUB:
- Recognition: SPHINX
- Language Understanding: PHOENIX / HELIOS
- Dialog Management: RAVENCLAW
- Back-end: (perl)
- Language Generation: ROSETTA
- Synthesis: THETA
11
Getting Even Closer
[Diagram: the full system, with all components connected through the HUB]
- Process Monitor
- Recognition Server: SPHINX, multiple parallel decoders
- Language Understanding: parsing (PHOENIX), confidence (HELIOS)
- Dialog Management: RAVENCLAW
- Back-end: back-end Galaxy stub, actual Perl back-end
- DateTime, other domain agents
- Language Generation: language generation Galaxy stub, ROSETTA (Perl)
- Synthesis: THETA
- Text I/O: TTYServer
- Inputs from other modalities
13
Building a Spoken Dialogue System
Resources to provide for each component:
- Recognition (SPHINX): language, acoustic, and lexical models
- Language Understanding (PHOENIX/HELIOS): grammar
- Dialog Management (RAVENCLAW): RavenClaw dialog task specification
- Back-end: Perl back-end
- Language Generation (ROSETTA): templates
- Synthesis (THETA): (limited-domain) voice
14
So How Long Will It Take?
[Diagram: the same components and resources as on the previous slide]
- MITRE Workshop on Dialogue Management (Fall 2003)
- Task: develop a text-based SDS for medical diagnosis (back-end provided)
- Result: Madeleine (22 hours)
15
Okay, How Long Will It Really Take?
To get a system running with reasonable performance [poll among 3 RavenClaw developers]:
- 1 month to get a working system up and running
- 1 month to fine-tune performance
- Further iterative improvements continue as more data accumulates
17
Components & Resources
- Recognition (SPHINX): language and acoustic models
- Language Understanding (PHOENIX/HELIOS): grammar
- Dialog Management (RAVENCLAW): RavenClaw dialog task specification
- Back-end: Perl back-end
- Language Generation (ROSETTA): templates
- Synthesis (THETA): limited-domain voice
19
SPHINX II
- Semi-continuous acoustic models
  - Off-the-shelf 8 kHz, 11.025 kHz, and 16 kHz models
  - Scripts for building your own
  - PLSA adapted models perform better
- Language models
  - 2-gram & 3-gram models
    - CMU-Cambridge SLM Toolkit
    - Generate from the Phoenix grammar
  - Finite state grammar
  - Sphinx supports state-specific LMs
- Dictionary (lexical models)
  - CMU Dictionary
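To make the 2-gram idea concrete, here is a minimal sketch that estimates unsmoothed bigram probabilities from a handful of sentences. The corpus is hypothetical (it merely resembles what a room-reservation grammar might generate), and a real toolkit would add discounting and back-off, which this sketch omits.

```python
from collections import Counter
from typing import Dict, List, Tuple


def train_bigram(sentences: List[str]) -> Dict[Tuple[str, str], float]:
    """Maximum-likelihood estimate of P(word | previous word)."""
    history: Counter = Counter()
    bigrams: Counter = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        history.update(words[:-1])                  # every word that has a successor
        bigrams.update(zip(words[:-1], words[1:]))  # adjacent word pairs
    return {(h, w): count / history[h] for (h, w), count in bigrams.items()}


# Hypothetical in-domain sentences.
corpus = ["i need a large room", "i need a small room", "i want a room"]
model = train_bigram(corpus)
```

With this corpus, `model[("i", "need")]` is 2/3 because "need" follows "i" in two of the three sentences.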
20
Sphinx II - continued
- Multiple parallel decoders [e.g., male + female]
  - Multiple hypotheses forwarded, selection done later
- Typical WER: 15-30%
  - With pronounced differences between native and non-native speakers
  - Lowered by retuning acoustic and language models to the domain
- Migration to SPHINX 3.x in the near future
  - Expected: big improvement in WER
  - Concern: real-time performance
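For readers unfamiliar with the metric, word error rate is the word-level edit distance between the reference transcript and the recognizer hypothesis, divided by the number of reference words. The function below is a generic sketch of that standard definition, not the scoring tool used by the project.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitute = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            delete = prev[j] + 1
            insert = cur[j - 1] + 1
            cur[j] = min(substitute, delete, insert)
        prev = cur
    return prev[len(hyp)] / len(ref)


wer = word_error_rate("do you have something a bit larger",
                      "do you have something bit large")
```

In this example one word is dropped and one is substituted, so the rate is 2 errors over 7 reference words.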
22
Phoenix Parser / Grammar
- Phoenix: robust parser
- CFG grammar
  - Manually generated, domain-specific grammar rules
  - Reusable, generic sub-grammars: [Yes], [No], [Number], [DateTime], [Help], [Repeat], [Suspend], etc.

Grammar fragment:

    [room_size_spec]
        ([rss_large])
        ([rss_small])
        ([rss_larger])
        ([rss_smaller])
        ([rss_smallest])
        ([rss_largest])
    ;
    [rss_large]
        (large)
        (big)
        (huge)
    ;
    [rss_larger]
        (*the larger)
        (*the bigger)
        (too small)
    ;
    [rss_largest]
        (*the largest)
        (*the biggest)
    ;
    [rss_small]
        (small)
        (little)
    ;

Example input: DO YOU HAVE SOMETHING A BIT LARGER?

    [NeedRoom] ( [_i_want] (DO YOU HAVE SOMETHING) )
    [RoomSizeSpec] ( [room_size_spec] ( [rss_larger] (LARGER)))

Parses all incoming hypotheses and passes all parses along…
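The robustness of this style of parsing comes from spotting known phrases anywhere in the utterance and ignoring the rest. The toy phrase spotter below illustrates that idea using four of the rules from the slide. It is not the Phoenix implementation, and it assumes that a leading `*` marks an optional word.

```python
from typing import Dict, List, Optional, Tuple

# Rules copied from the grammar fragment; '*' is read as "optional word".
RULES: Dict[str, List[str]] = {
    "rss_large": ["large", "big", "huge"],
    "rss_larger": ["*the larger", "*the bigger", "too small"],
    "rss_largest": ["*the largest", "*the biggest"],
    "rss_small": ["small", "little"],
}


def variants(pattern: str) -> List[List[str]]:
    """Expand optional tokens into every word sequence the pattern allows."""
    out: List[List[str]] = [[]]
    for token in pattern.split():
        if token.startswith("*"):
            out = out + [seq + [token[1:]] for seq in out]
        else:
            out = [seq + [token] for seq in out]
    return out


def spot(utterance: str) -> Optional[Tuple[str, str]]:
    """Return (slot, matched words) for the longest phrase found, else None."""
    words = utterance.lower().replace("?", "").split()
    best: Optional[Tuple[str, str]] = None
    for slot, patterns in RULES.items():
        for pattern in patterns:
            for seq in variants(pattern):
                n = len(seq)
                for i in range(len(words) - n + 1):
                    longer = best is None or n > len(best[1].split())
                    if words[i:i + n] == seq and longer:
                        best = (slot, " ".join(seq))
    return best


match = spot("DO YOU HAVE SOMETHING A BIT LARGER?")
```

Words outside the matched phrase ("a bit") are simply skipped, which is what lets the parse survive disfluencies and recognition noise.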
23
Helios / Confidence Annotation
- Builds accurate confidence scores using features from 3 sources of knowledge:
  - Speech recognition
  - Language understanding
  - Dialogue management
- Selects the hypothesis with the maximum confidence score
- Research in progress on hypothesis selection and transferability across domains
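A minimal sketch of the two steps, scoring and selection, might look as follows. The slide does not say which model produces the score, so the logistic function, the feature names, and the weights here are all assumptions chosen for illustration.

```python
import math
from typing import Dict, List

# Hypothetical features, one per knowledge source, with made-up weights.
WEIGHTS: Dict[str, float] = {
    "acoustic_score": 2.0,        # speech recognition
    "parse_coverage": 1.5,        # language understanding
    "matches_expectation": 1.0,   # dialogue management
}
BIAS = -2.0


def confidence(features: Dict[str, float]) -> float:
    """Map a feature vector to a score in (0, 1) with a logistic function."""
    z = BIAS + sum(weight * features[name] for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def select(hypotheses: List[dict]) -> dict:
    """Pick the hypothesis with the maximum confidence score."""
    return max(hypotheses, key=lambda h: confidence(h["features"]))


hypotheses = [
    {"text": "not so good",
     "features": {"acoustic_score": 0.8, "parse_coverage": 1.0, "matches_expectation": 1.0}},
    {"text": "not so hood",
     "features": {"acoustic_score": 0.9, "parse_coverage": 0.3, "matches_expectation": 0.0}},
]
best = select(hypotheses)
```

The second hypothesis has the better acoustic score, but the first one wins because it parses fully and matches what the dialogue manager expects.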
25
RavenClaw Architecture
- Dialog Task (Specification)
  - Captures all domain-specific dialog (task) logic using a hierarchical description
  - The authoring effort is focused entirely here
- Domain-independent Dialog Engine
  - Manages the dialog by executing the dialog task specification
  - Provides a large number of domain-independent conversational strategies
27
RavenClaw: Dialogue Task Specification
Tree of dialog agents (Madeleine example):

    Madeleine
        I:Welcome
        E:LoadSymptoms
        GeneralFeel
            R:HowAreYou?
            I:Glad
            I:Sorry
        Diagnose
            Fever
                R:AskFever
                E:MeasureTemp
                I:InformFever
            Travel

Concepts: general_feeling, have_fever, diagnostic

- Terminals: Inform, Request, Expect, Execute
- Non-terminals / dialog agencies: plan the execution of child nodes
- Basically a hierarchical task execution network; each agent has:
  - Preconditions & effects
  - Success & failure criteria
  - Trigger (focus) criteria
28
Sample DTS Code

    // /Madeleine/GeneralFeel
    DEFINE_AGENCY(CGeneralFeel,
      DEFINE_CONCEPTS(
        STRING_USER_CONCEPT(general_feeling, none))
      DEFINE_SUBAGENTS(
        SUBAGENT(HowAreYou, CHowAreYou)
        SUBAGENT(Glad, CGlad)
        SUBAGENT(Sorry, CSorry))
      SUCCEEDS_WHEN(COMPLETED(Glad) || COMPLETED(Sorry)))

    // /Madeleine/GeneralFeel/HowAreYou
    DEFINE_REQUEST_AGENT(CHowAreYou,
      REQUEST_CONCEPT(general_feeling)
      GRAMMAR_MAPPING("![Yes]>good, ![FeelingGood]>good, "
                      "![FeelingSoSo]>soso, ![FeelingBad]>bad"))

    // /Madeleine/GeneralFeel/Glad
    DEFINE_INFORM_AGENT(CGlad,
      PRECONDITION(C("general_feeling") == CString("good"))
      PROMPT("inform glad_youre_good")
      ON_COMPLETION(FINISH(/Madeleine)))

    // /Madeleine/GeneralFeel/Sorry
    DEFINE_INFORM_AGENT(CSorry,
      PRECONDITION(C("general_feeling") != CString("good"))
      PROMPT("inform sorry_youre_bad"))

[Diagram: the GeneralFeel agency with sub-agents R:HowAreYou?, I:Glad, I:Sorry and the concept general_feeling]
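The behavior that this specification describes can be approximated in a few lines. The sketch below is not RavenClaw code: the `Agent` class and `run` function are hypothetical, the prompt wording is invented, and the stack, success criteria, and completion handlers of the real engine are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Concepts = Dict[str, str]


@dataclass
class Agent:
    name: str
    kind: str = "agency"          # "agency", "inform", or "request"
    prompt: str = ""
    concept: str = ""             # concept that a request agent fills
    precondition: Callable[[Concepts], bool] = lambda concepts: True
    children: List["Agent"] = field(default_factory=list)


def run(agent: Agent, concepts: Concepts, answers: Dict[str, str],
        out: List[str]) -> None:
    """Depth-first execution that skips agents whose precondition fails."""
    if not agent.precondition(concepts):
        return
    if agent.kind == "inform":
        out.append(agent.prompt)
    elif agent.kind == "request":
        out.append(agent.prompt)
        concepts[agent.concept] = answers[agent.concept]  # simulated user reply
    for child in agent.children:
        run(child, concepts, answers, out)


general_feel = Agent("GeneralFeel", children=[
    Agent("HowAreYou", "request", "How are you feeling today?", "general_feeling"),
    Agent("Glad", "inform", "Glad to hear that.",
          precondition=lambda c: c.get("general_feeling") == "good"),
    Agent("Sorry", "inform", "Sorry to hear that.",
          precondition=lambda c: c.get("general_feeling") != "good"),
])

transcript: List[str] = []
run(general_feel, {}, {"general_feeling": "soso"}, transcript)
```

With the simulated answer "soso", only the Sorry agent passes its precondition, mirroring the `PRECONDITION` clauses in the specification.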
29
RavenClaw Execution
[Diagram: the Dialog Stack and the Expectation Agenda, shown next to the Madeleine dialog task tree]
- Task tree: Madeleine > I:Welcome, E:LoadSymptoms, GeneralFeel (R:HowAreYou?, I:Glad, I:Sorry), Diagnose (Fever (R:AskFever, E:MeasureTemp, I:InformFever), Travel)
- Concepts: general_feeling, chart, have_fever, diagnostic
31
RavenClaw Execution
- Dialog Stack: Madeleine, Welcome
- Concepts: general_feeling, chart, have_fever, diagnostic
32
RavenClaw Execution
- Dialog Stack: Madeleine
- System: Hi, this is Madeleine, the automated…
- Concepts: general_feeling, chart, have_fever, diagnostic
33
RavenClaw Execution
- Dialog Stack: Madeleine, LoadSymptoms
- System (so far): Hi, this is Madeleine, the automated…
- Task tree now also contains: R:Headache R:
- Concepts: general_feeling, chart, have_fever, diagnostic, headache
34
RavenClaw Execution
- Dialog Stack: Madeleine
- System (so far): Hi, this is Madeleine, the automated…
- Concepts: general_feeling, chart, have_fever, diagnostic, headache
35
RavenClaw Execution
- Dialog Stack: Madeleine, GeneralFeel
- System (so far): Hi, this is Madeleine, the automated…
- Concepts: general_feeling, chart, have_fever, diagnostic, headache
36
RavenClaw Execution / Input Pass
- Dialog Stack: Madeleine, GeneralFeel, HowAreYou
- System: Hi, this is Madeleine, the automated…
- System: How are you feeling today?
- Expectation Agenda: general_feeling: [good], [bad], [soso]
- User: Not so good, I think I have a fever
- Parse: [soso](not so good) [fever](I think I have a fever)
- Concepts: general_feeling, chart, have_fever, diagnostic, headache
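The input pass can be pictured as matching each parsed slot against an agenda ordered by the dialog stack. The sketch below is a simplification under stated assumptions: the two agenda levels, the `input_pass` function, and the mapping from `[fever]` to `have_fever` are illustrative guesses rather than the engine's actual data structures.

```python
from typing import Dict, List, Tuple

# Agenda levels follow the dialog stack, top of the stack first.
# Each level maps a grammar slot to the (concept, value) it would bind.
agenda: List[Tuple[str, Dict[str, Tuple[str, str]]]] = [
    ("HowAreYou", {"[good]": ("general_feeling", "good"),
                   "[soso]": ("general_feeling", "soso"),
                   "[bad]": ("general_feeling", "bad")}),
    ("Madeleine", {"[fever]": ("have_fever", "true")}),  # assumed mapping
]


def input_pass(parse: Dict[str, str]) -> Dict[str, str]:
    """Bind each parsed slot using the first agenda level that expects it."""
    concepts: Dict[str, str] = {}
    for slot in parse:
        for _agent, expectations in agenda:
            if slot in expectations:
                concept, value = expectations[slot]
                concepts[concept] = value
                break
    return concepts


parse = {"[soso]": "not so good", "[fever]": "I think I have a fever"}
bound = input_pass(parse)
```

Searching the levels from the top of the stack downward means the agent currently in focus gets first claim on a slot, while deeper levels can still pick up information the user volunteers.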
37
RavenClaw Execution
- Dialog Stack: Madeleine, GeneralFeel
- System: Hi, this is Madeleine, the automated…
- System: How are you feeling today?
- User: Not so good, I think I have a fever
- Parse: [soso](not so good) [fever](I think I have a fever)
- Concepts: general_feeling, chart, have_fever, diagnostic, headache
38
RavenClaw Execution
- Dialog Stack: Madeleine, GeneralFeel
- System: Hi, this is Madeleine, the automated…
- System: How are you feeling today?
- User: Not so good, I think I have a fever
- Parse: [soso](not so good) [fever](I think I have a fever)
- Sorry agent executes
- System: Oh, I’m sorry to hear that…
- System: Let me take your temperature…
- Concepts: general_feeling, chart, have_fever, diagnostic, headache
39
RavenClaw – Other features
- The Dialogue Engine transparently provides a set of conversational skills
  - Universal dialogue mechanisms: Repeat, Suspend / Resume, Quit
  - Help: Help!, Where are we?, What can I say?
  - Error handling:
    - Explicit and implicit confirmations
    - Strategies for recovering from non-understandings
- Dynamic dialogue task generation
- Dynamic dialogue control policy
41
Backend & Domain Agents
Various problem-specific solutions:
- RoomLine: connects to a static Perl database or to the CMU CorporateTime server
- Let’s Go! Bus Information System: connects to a Postgres database
- Sublime: connects to a MySQL database; also functions as a web server; DTW search domain agent
Basically, build your own; we provide a stub for interfacing with the Galaxy Hub
43
Rosetta Language Generation
- Template- and stochastic-based language generation
- Input: (act, object, {slot=value})
- Output: text (tagged with concepts)

    # welcome to the system
    "welcome" => "Welcome to RoomLine, the automated conference room ".
                 "reservation system.",

    # greet user
    "greet_user" => ("Hi,.", "Hi,, good to hear from you again."),

    # inform the user that the system has misunderstood the times (order)
    "wrong_time_order" => sub {
        my %args = @_;
        my $time_interval_as_string =
            get_wrong_time_interval_as_string(\%args, "room_query.date_time.time");
        my $answer = "I'm sorry, I must have misunderstood the ".
                     "time you needed the room. ";
        $answer .= "I heard $time_interval_as_string. ";
        return ["$answer So, let's see... ",
                "$answer So, let's try this again... ",
                "$answer So, let's try this once more... "];
    },
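The three kinds of template entry shown above (a fixed string, a list of alternatives, and a function that builds alternatives from the slots) can be mimicked in a short sketch. This is not Rosetta itself: the `generate` function and the `user_name` and `time` slot names are hypothetical, and choosing among alternatives at random is an assumption.

```python
import random
from typing import Callable, Dict, List, Union

Slots = Dict[str, str]
Template = Union[str, List[str], Callable[[Slots], List[str]]]


def wrong_time_order(slots: Slots) -> List[str]:
    answer = ("I'm sorry, I must have misunderstood the time you needed "
              f"the room. I heard {slots['time']}.")
    return [f"{answer} So, let's see...",
            f"{answer} So, let's try this again...",
            f"{answer} So, let's try this once more..."]


TEMPLATES: Dict[str, Template] = {
    "welcome": "Welcome to RoomLine, the automated conference room "
               "reservation system.",
    "greet_user": ["Hi, {user_name}.",
                   "Hi, {user_name}, good to hear from you again."],
    "wrong_time_order": wrong_time_order,
}


def generate(key: str, slots: Slots, rng: random.Random) -> str:
    """Look up the template, pick one alternative, and fill in the slots."""
    template = TEMPLATES[key]
    if callable(template):
        candidates = template(slots)
    elif isinstance(template, str):
        candidates = [template]
    else:
        candidates = template
    return rng.choice(candidates).format(**slots)


rng = random.Random(0)
greeting = generate("greet_user", {"user_name": "Dan"}, rng)
```

Keeping several alternatives per prompt lets the system vary its wording when it has to repeat itself, as in the three endings of `wrong_time_order`.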
45
Synthesis
- Cepstral Theta synthesis
  - Open-domain unit-selection synthesis
  - SSML tags
  - [Currently working on barge-in location]
- Festival synthesis
  - Diphone synthesis; open-domain and limited-domain unit-selection synthesis
  - SABLE tags
  - Server running separately on a Linux box
47
Miscellaneous – Documentation
- Transmitted largely by oral tradition :)
- A bit of documentation is available
  - Research papers, slides
  - Wiki: http://hap.speech.cs.cmu.edu/commwiki (mostly for developers: postings of updates and recent developments; hopefully more introductory materials soon)
  - More in the works
  - Tutorials: 2 available, but a bit outdated
48
Miscellaneous – Portability
- Current systems work on PC Windows platforms
  - Galaxy has a Linux version
  - Components are written in C, C++ (Visual Studio 6.0, Visual Studio .NET), and Perl
- How about using different input / output components?
  - Modify the RavenClaw DMInterface class
  - Has been done for the Gemini parser / language generator
49
Miscellaneous – Research Platform
- The Communicator / RavenClaw framework is a research platform!
  - Constantly evolving
  - Modular
  - Easy to change, develop, and test new technologies
- Research on a variety of topics in a real-world, full-blown system: recognition, language understanding, dialogue management, language generation, synthesis
- Your work can be evaluated / reused easily across multiple existing systems
50
Miscellaneous – Download
- www.cs.cmu.edu/~dbohus/RavenClaw
- Download a version of RoomLine
- An installation script can seed your own project from this RoomLine version
51
Miscellaneous – RavenClaw Team
RavenClaw Team:
- Dan Bohus (dbohus@cs)
- Antoine Raux (antoine@cs)
- Jahanzeb Sherwani (jsherwan@cs)
- Thomas Harris (tkharris@cs)
- Satanjeev Banerjee (satanjeev@cs)
- Brian Langner (blangner@cs)
More users / developers / documentation writers are always welcome!!
Dialogs on Dialogs Reading Group: www.cs.cmu.edu/~dod
53
Error awareness and recovery
- Problem: lack of robustness when faced with understanding errors
- Solution: build mechanisms for acting robustly at the dialogue management level
  - Error awareness: building better confidence annotators, hypothesis selection; transfer across domains
  - Error recovery strategies: recovery from non-understandings
  - Error handling decision process: scalable, adaptable, task-independent architecture for making error handling decisions
54
Let’s Go! Research
- Speech recognition: acoustic adaptation on non-native speech
  - WER: 50% → 30%
- Speech synthesis: flexible and natural F0 modeling (F0 unit selection)
  - Emphasis on erroneous/uncertain words for utterance confirmation
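Reading the two WER figures as the error rate before and after acoustic adaptation (an interpretation, since the slide gives only the two numbers), the relative reduction works out as:

```latex
\[
\text{relative WER reduction} = \frac{50\% - 30\%}{50\%} = 40\%
\]
```

That is a drop of 20 percentage points in absolute terms.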
55
Sublime
- Interface for personalized information management
- Narrow functionality in unrestricted domains
- Currently: handles information without understanding it
- Eventually: learn relationships and a shallow ontology
56
That’s all, folks! THANK YOU!