Developing Spoken Dialogue Systems in the Communicator / RavenClaw Framework Sphinx Lunch Talk Carnegie Mellon University, October 2004 Presented by:Dan.

1 Developing Spoken Dialogue Systems in the Communicator / RavenClaw Framework
Sphinx Lunch Talk, Carnegie Mellon University, October 2004
Presented by: Dan Bohus
Special appearances: Antoine Raux, Jahanzeb Sherwani, Thomas Harris

2 Examples
- RoomLine: conference room reservations within SCS; the system can access the schedules of 13 conference rooms in Wean Hall and NSH
- Let’s Go! Bus Information System: bus schedule information system for Port Authority buses in Oakland and Squirrel Hill [Let’s Go! Project]
- Sublime: personalized information management system
- TeamTalk: an investigation into human and multi-robot spoken language communication in unstructured environments

6 More Systems
- LARRI: multimodal system that assists F/A-18 aircraft maintenance personnel throughout the execution of procedural tasks [Symphony]
- Madeleine: text-based prototype for a medical diagnosis system [MITRE workshop]
- Eureka: dialogue interface to the Vivisimo web search engine

7 The Communicator / RavenClaw Spoken Dialogue Systems Framework
- Examples
- Overall Architecture
- System Development
- Components & Resources
- Miscellaneous
- Current Research
examples : architecture : development : components : miscellaneous : research

8 Overall Architecture
- Classical pipeline architecture
[Figure: Recognition (SPHINX), Lang. Understand. (PHOENIX/HELIOS), Dialog Manag. (RAVENCLAW), Back-end (various), Lang. Generation (ROSETTA), Synthesis (THETA)]

9 Galaxy HUB
[Figure: the components arranged around a central HUB: Recognition (SPHINX), Lang. Understand. (PHOENIX/HELIOS), Dialog Manag. (RAVENCLAW), Back-end (various), Lang. Generation (ROSETTA), Synthesis (THETA)]
Galaxy
- Generic centralized, message-passing communication architecture
- Developed at MIT, used in the Communicator program
- Competitor: OAA
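
To make the hub-and-spoke idea concrete, here is a minimal Python sketch of centralized message passing. It is not the Galaxy API: the `Hub` class, the `register` and `send` names, and the three toy components are hypothetical and only illustrate that components never call each other directly, they hand frames back to the hub.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical illustration of a centralized hub; NOT the Galaxy API.
# A handler takes a frame and returns (next component name, updated frame).
Handler = Callable[[dict], Tuple[str, dict]]


class Hub:
    def __init__(self) -> None:
        self.handlers: Dict[str, Handler] = {}
        self.trace: List[str] = []  # order in which components were visited

    def register(self, name: str, handler: Handler) -> None:
        self.handlers[name] = handler

    def send(self, target: str, frame: dict) -> dict:
        # Keep routing until a component names no next target ("").
        while target:
            self.trace.append(target)
            target, frame = self.handlers[target](frame)
        return frame


def parser(frame: dict) -> Tuple[str, dict]:
    slots = {"size": "larger"} if "larger" in frame["text"].lower() else {}
    return "dialog_manager", {**frame, "slots": slots}


def dialog_manager(frame: dict) -> Tuple[str, dict]:
    return "generator", {**frame, "act": "inform", "object": "room_list"}


def generator(frame: dict) -> Tuple[str, dict]:
    return "", {**frame, "prompt": f"{frame['act']} {frame['object']}"}


hub = Hub()
hub.register("parser", parser)
hub.register("dialog_manager", dialog_manager)
hub.register("generator", generator)
result = hub.send("parser", {"text": "DO YOU HAVE SOMETHING A BIT LARGER"})
```

Because every hop goes through `Hub.send`, swapping one component for another only means registering a different handler under the same name.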

10 Getting Even Closer
[Figure: the same components around the HUB, with the back-end now written in Perl: Recognition (SPHINX), Lang. Understand. (PHOENIX/HELIOS), Dialog Manag. (RAVENCLAW), Back-end (perl), Language Gen. (ROSETTA), Synthesis (THETA)]

11 Getting Even Closer
[Figure: detailed view of the modules around the HUB]
- PROCESS MONITOR
- Recognition Server: SPHINX; multiple, parallel decoders
- Lang. Understand. (PHOENIX/HELIOS): parsing by PHOENIX, confidence by HELIOS
- Dialog Manag.: RAVENCLAW
- Back-end: Galaxy stub in front of the actual Perl back-end
- Lang. Generation: Galaxy stub in front of ROSETTA (Perl)
- Synthesis: THETA
- Text I/O: TTYServer
- DateTime and other domain agents
- Inputs from other modalities

13 Building a Spoken Dialogue System
[Figure: the pipeline components, each with the resource a developer provides for it]
- Recognition (SPHINX): language, acoustic, and lexical models
- Lang. Understand. (PHOENIX/HELIOS): grammar
- Dialog Manag. (RAVENCLAW): RavenClaw Dialog Task Specification
- Back-end (perl): back-end (perl)
- Lang. Generation (ROSETTA): templates
- Synthesis (THETA): (limited domain) voice

14 So How Long Will It Take?
[Figure: the same components and per-component resources as on the previous slide]
- MITRE Workshop on Dialogue Management (Fall 2003)
- Develop a text-based SDS for medical diagnosis (back-end provided)
- Madeleine (22 hours)

15 Okay, How Long Will It Really Take?
- To get a system running with reasonable performance [poll among 3 RavenClaw developers]:
  - 1 month to get a working system up and running
  - 1 month to fine-tune performance
- Further iterative improvements continue as more data accumulates

17 Components & Resources
[Figure: the pipeline components and their resources: Recognition (SPHINX): language and acoustic models; Lang. Understand. (PHOENIX/HELIOS): grammar; Dialog Manag. (RAVENCLAW): RavenClaw Dialog Task Specification; Back-end (perl); Lang. Generation (ROSETTA): templates; Synthesis (THETA): limited domain voice]

19 SPHINX II
- Semi-continuous acoustic models
  - Off-the-shelf 8kHz, 11.025kHz, 16kHz models
  - Scripts for building your own; PLSA adapted models perform better
- Language models
  - 2-gram & 3-gram models: CMU-Cambridge SLM Toolkit; generate from the Phoenix grammar
  - Finite state grammar
  - Sphinx supports state-specific LMs
- Dictionary (lexical models)
  - CMU Dictionary

20 Sphinx II - continued
- Multiple parallel decoders [e.g., male + female]
  - Multiple hypotheses forwarded, selection done later
- Typical WER: 15-30%
  - With pronounced differences between native and non-native speakers
  - Lowered by retuning acoustic and language models to the domain
- Migration to SPHINX 3.x in the near future
  - Expected: big improvement in WER
  - Concern: real-time performance
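
For readers new to the metric, WER (word error rate) is the number of word substitutions, deletions and insertions needed to turn the reference transcript into the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition; the function name and the example sentences are illustrative and are not taken from any Sphinx tool.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("do you have something a bit larger",
                      "do you have something bigger")
```

Here the hypothesis drops two words and substitutes one, so three errors are counted against seven reference words.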

22 Phoenix Parser / Grammar
- Phoenix: robust parser
- CFG grammar
  - Manually generated domain-specific grammar rules
  - Reusable, generic sub-grammars: [Yes], [No], [Number], [DateTime], [Help], [Repeat], [Suspend], etc.
- Parses all incoming hypotheses and passes all parses along…

Grammar example:

    [room_size_spec]
        ([rss_large])
        ([rss_small])
        ([rss_larger])
        ([rss_smaller])
        ([rss_smallest])
        ([rss_largest])
    ;
    [rss_large]
        (large)
        (big)
        (huge)
    ;
    [rss_larger]
        (*the larger)
        (*the bigger)
        (too small)
    ;
    [rss_largest]
        (*the largest)
        (*the biggest)
    ;
    [rss_small]
        (small)
        (little)
    ;

Parse example:

    DO YOU HAVE SOMETHING A BIT LARGER?
    [NeedRoom] ( [_i_want] (DO YOU HAVE SOMETHING) )
    [RoomSizeSpec] ( [room_size_spec] ( [rss_larger] (LARGER)))
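
The grammar fragment above can be mimicked with a few lines of Python to show how an optional word (`*the`) and word skipping work. This is a toy sketch, not Phoenix itself: the `GRAMMAR` table copies only the nets shown on the slide, `expand` and `find_net` are hypothetical helpers, and returning the first matching net is a simplification of what a real parser does.

```python
from typing import Dict, List, Optional

# Room-size nets copied from the slide; a leading "*" marks an optional word.
GRAMMAR: Dict[str, List[str]] = {
    "rss_large": ["large", "big", "huge"],
    "rss_larger": ["*the larger", "*the bigger", "too small"],
    "rss_largest": ["*the largest", "*the biggest"],
    "rss_small": ["small", "little"],
}


def expand(pattern: str) -> List[List[str]]:
    """Return every word sequence a pattern accepts."""
    variants: List[List[str]] = [[]]
    for token in pattern.split():
        if token.startswith("*"):
            # Optional word: keep the variants without it and add variants with it.
            variants = variants + [v + [token[1:]] for v in variants]
        else:
            variants = [v + [token] for v in variants]
    return variants


def find_net(utterance: str) -> Optional[str]:
    """Return the first net one of whose phrases occurs in the utterance."""
    words = utterance.lower().split()
    for net, patterns in GRAMMAR.items():
        for pattern in patterns:
            for phrase in expand(pattern):
                n = len(phrase)
                if phrase and any(words[i:i + n] == phrase
                                  for i in range(len(words) - n + 1)):
                    return net
    return None


net = find_net("DO YOU HAVE SOMETHING A BIT LARGER")
```

Words outside the matched phrase ("a bit") are simply skipped, which is the behaviour the slide's parse example shows for the robust parser.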

23 Helios / Confidence Annotation
- Builds accurate confidence scores using features from 3 sources of knowledge:
  - Speech recognition
  - Language understanding
  - Dialogue management
- Selects the hypothesis with the maximum confidence score
- Research in progress on hypothesis selection and transferability across domains
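
The selection step can be sketched as follows. Everything specific here is an assumption made for illustration: the three feature names, the weights, and the use of a logistic function to combine them are not taken from Helios; the slide only states that features from three knowledge sources feed a confidence score and that the highest-scoring hypothesis wins.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    text: str
    acoustic_score: float       # speech recognition feature (hypothetical)
    parse_coverage: float       # language understanding feature (hypothetical)
    matches_expectation: float  # dialogue management feature (hypothetical)


# Made-up weights; a real annotator would learn them from labelled data.
WEIGHTS = (1.5, 2.0, 1.0)
BIAS = -2.0


def confidence(h: Hypothesis) -> float:
    """Combine the three features into a score between 0 and 1."""
    z = (BIAS
         + WEIGHTS[0] * h.acoustic_score
         + WEIGHTS[1] * h.parse_coverage
         + WEIGHTS[2] * h.matches_expectation)
    return 1.0 / (1.0 + math.exp(-z))


def select(hypotheses: List[Hypothesis]) -> Hypothesis:
    """Pick the hypothesis with the maximum confidence score."""
    return max(hypotheses, key=confidence)


candidates = [
    Hypothesis("not so good i think i have a fever", 0.6, 0.9, 1.0),
    Hypothesis("not so good i think i have a beaver", 0.7, 0.4, 0.0),
]
best = select(candidates)
```

Note that the second candidate has the better acoustic score but loses because the parser and the dialogue context support the first one.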

25 RavenClaw Architecture
- Dialog Task (Specification)
  - Captures all domain-specific dialog (task) logic using a hierarchical description
  - The authoring effort is focused entirely here
- Domain-independent Dialog Engine
  - Manages dialog by executing the dialog task specification
  - Provides a large number of domain-independent conversational strategies

27 RavenClaw: Dialogue Task Specification
[Figure: the Madeleine dialog task tree. Nodes: I:Welcome, E:LoadSymptoms, GeneralFeel (R:HowAreYou?, I:Glad, I:Sorry), Diagnose, Fever, Travel, R:AskFever, E:MeasureTemp, I:InformFever. Concepts: general_feeling, have_fever, diagnostic.]
- Tree of dialog agents
  - Terminals: Inform, Request, Expect, Execute
  - Non-terminals / dialog agencies: plan the execution of their child nodes
- Basically a Hierarchical Task Execution Network; each agent has:
  - Preconditions & effects
  - Success & failure criteria
  - Trigger (focus) criteria
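
A tree of agents with preconditions can be sketched in a few lines of Python. This is an illustration under stated assumptions, not RavenClaw code: the `Agent` class, its fields, and the left-to-right `next_child` policy are hypothetical, and only the GeneralFeel subtree from the figure is modelled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Concepts = Dict[str, str]


@dataclass
class Agent:
    name: str
    kind: str = "agency"  # "agency", "inform", "request", "expect" or "execute"
    precondition: Callable[[Concepts], bool] = lambda concepts: True
    children: List["Agent"] = field(default_factory=list)
    completed: bool = False

    def next_child(self, concepts: Concepts) -> Optional["Agent"]:
        """Assumed policy: first child, left to right, that is not yet
        completed and whose precondition holds."""
        for child in self.children:
            if not child.completed and child.precondition(concepts):
                return child
        return None


how_are_you = Agent("HowAreYou", "request")
general_feel = Agent("GeneralFeel", children=[
    how_are_you,
    Agent("Glad", "inform",
          precondition=lambda c: c.get("general_feeling") == "good"),
    Agent("Sorry", "inform",
          precondition=lambda c: c.get("general_feeling") != "good"),
])

first = general_feel.next_child({})
how_are_you.completed = True  # the request has been answered
after_soso = general_feel.next_child({"general_feeling": "soso"})
after_good = general_feel.next_child({"general_feeling": "good"})
```

The point of the sketch is that the agency does not hard-code a sequence: which child runs next falls out of completion state and preconditions over the concepts.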

28 Sample DTS Code
[Figure: the GeneralFeel subtree (R:HowAreYou?, I:Glad, I:Sorry) with its concept general_feeling]

    // /Madeleine/GeneralFeel
    DEFINE_AGENCY(CGeneralFeel,
      DEFINE_CONCEPTS(
        STRING_USER_CONCEPT(general_feeling, none))
      DEFINE_SUBAGENTS(
        SUBAGENT(HowAreYou, CHowAreYou)
        SUBAGENT(Glad, CGlad)
        SUBAGENT(Sorry, CSorry))
      SUCCEEDS_WHEN(COMPLETED(Glad) || COMPLETED(Sorry)))

    // /Madeleine/GeneralFeel/HowAreYou
    DEFINE_REQUEST_AGENT(CHowAreYou,
      REQUEST_CONCEPT(general_feeling)
      GRAMMAR_MAPPING("![Yes]>good, ![FeelingGood]>good, "
                      "![FeelingSoSo]>soso, ![FeelingBad]>bad"))

    // /Madeleine/GeneralFeel/Glad
    DEFINE_INFORM_AGENT(CGlad,
      PRECONDITION(C("general_feeling") == CString("good"))
      PROMPT("inform glad_youre_good")
      ON_COMPLETION(FINISH(/Madeleine)))

    // /Madeleine/GeneralFeel/Sorry
    DEFINE_INFORM_AGENT(CSorry,
      PRECONDITION(C("general_feeling") != CString("good"))
      PROMPT("inform sorry_youre_bad"))

29 RavenClaw Execution
[Figure, repeated on slides 29-38: the Madeleine dialog task tree next to the Dialog Stack and the Expectation Agenda; concepts shown: general_feeling, chart, have_fever, diagnostic. The slides step through the execution; only what changes is listed for each slide below.]

31 RavenClaw Execution
Dialog Stack: Madeleine, Welcome

32 RavenClaw Execution
Dialog Stack: Madeleine
System: Hi, this is Madeleine, the automated…

33 RavenClaw Execution
Dialog Stack: Madeleine, LoadSymptoms
Tree: new nodes R:Headache, R: ; new concept: headache

34 RavenClaw Execution
Dialog Stack: Madeleine

35 RavenClaw Execution
Dialog Stack: Madeleine, GeneralFeel

36 RavenClaw Execution / Input Pass
Dialog Stack: Madeleine, GeneralFeel, HowAreYou
System: How are you feeling today?
Expectation Agenda: general_feeling: [good], [bad], [soso] (levels shown: HowAreYou, GeneralFeel)
User: Not so good, I think I have a fever
Parse: [soso](not so good) [fever](I think I have a fever)

37 RavenClaw Execution
Dialog Stack: Madeleine, GeneralFeel
System: How are you feeling today?
User: Not so good, I think I have a fever
Parse: [soso](not so good) [fever](I think I have a fever)

38 RavenClaw Execution
Dialog Stack: Madeleine, GeneralFeel; agent: Sorry
System: Oh, I’m sorry to hear that… Let me take your temperature…
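
The input pass in the walkthrough above can be approximated as: build one agenda level per agent on the stack, most focused first, then bind each parsed slot to the concept of the first level that expects it. The sketch below is hypothetical throughout; in particular the `EXPECTATIONS` table and the placement of the `[fever]` expectation on the root agent are assumptions, since the slides only show the general_feeling expectations.

```python
from typing import Dict, List, Tuple

# agent -> {grammar slot: (concept to update, value)}; contents are assumed.
EXPECTATIONS: Dict[str, Dict[str, Tuple[str, str]]] = {
    "HowAreYou": {"good": ("general_feeling", "good"),
                  "soso": ("general_feeling", "soso"),
                  "bad": ("general_feeling", "bad")},
    "Madeleine": {"fever": ("have_fever", "true")},  # assumed placement
}


def build_agenda(stack: List[str]) -> List[Dict[str, Tuple[str, str]]]:
    """One agenda level per agent, from the top of the stack (focus) down."""
    return [EXPECTATIONS.get(agent, {}) for agent in reversed(stack)]


def input_pass(stack: List[str], parsed_slots: List[str],
               concepts: Dict[str, str]) -> Dict[str, str]:
    """Bind each parsed slot using the most focused level that expects it."""
    agenda = build_agenda(stack)
    for slot in parsed_slots:
        for level in agenda:
            if slot in level:
                concept, value = level[slot]
                concepts[concept] = value
                break  # lower levels do not see a slot that was consumed
    return concepts


stack = ["Madeleine", "GeneralFeel", "HowAreYou"]  # bottom to top
concepts = input_pass(stack, ["soso", "fever"], {})
```

With this layering, an answer to the current question and extra information aimed at an agent lower on the stack can both be picked up from the same utterance.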

39 RavenClaw – Other features
- Dialogue Engine transparently provides a set of conversational skills
  - Universal dialogue mechanisms: Repeat, Suspend / Resume, Quit
  - Help: Help!, Where are we?, What can I say?
  - Error handling: explicit and implicit confirmations; strategies for recovering from non-understandings
- Dynamic dialogue task generation
- Dynamic dialogue control policy

41 Backend & Domain Agents
- Various problem-specific solutions
  - RoomLine: connects to a static Perl database or to the CMU CorporateTime server
  - Let’s Go! Bus Information System: connects to a PostgreSQL database
  - Sublime: connects to a MySQL database; also functions as a web server; DTW search domain agent
- Basically, build your own; we provide a stub for interfacing with the Galaxy Hub

43 Rosetta Language Generation
- Template- and stochastic-based language generation
- Input: (act, object, {slot=value})
- Output: text (tagged with concepts)

    # welcome to the system
    "welcome" => "Welcome to RoomLine, the automated conference room " .
                 "reservation system.",

    # greet user
    "greet_user" => ("Hi,.",
                     "Hi,, good to hear from you again."),

    # inform the user that the system has misunderstood the times (order)
    "wrong_time_order" => sub {
        my %args = @_;
        my $time_interval_as_string =
            get_wrong_time_interval_as_string(\%args,
                                              "room_query.date_time.time");
        my $answer = "I'm sorry, I must have misunderstood the " .
                     "time you needed the room. ";
        $answer .= "I heard $time_interval_as_string. ";
        return ["$answer So, let's see... ",
                "$answer So, let's try this again... ",
                "$answer So, let's try this once more... "];
    },

45 Synthesis
- Cepstral Theta synthesis
  - Open-domain unit-selection synthesis
  - SSML tags
  - [Currently working on barge-in location]
- Festival synthesis
  - Diphone synthesis; open-domain and limited-domain unit-selection synthesis
  - SABLE tags
  - Server running separately on a Linux box

47 Miscellaneous – Documentation
- Transmitted largely by oral tradition :)
- A bit of documentation available
  - Research papers, slides
  - Wiki: http://hap.speech.cs.cmu.edu/commwiki (mostly for developers: postings of updates and recent developments; hopefully more introductory materials soon)
- More in the works
  - Tutorials: 2 available, but a bit outdated

48 Miscellaneous – Portability
- Current systems work on PC Windows platforms
  - Galaxy has a Linux version
  - Components are written in C, C++ (Visual Studio 6.0, Visual Studio .NET), and Perl
- How about using different input / output components?
  - Modify the RavenClaw DMInterface class
  - Has been done for the Gemini parser / language generator

49 Miscellaneous – Research Platform
- The Communicator / RavenClaw framework is a research platform!
  - Constantly evolving
  - Modular: easy to change, develop and test new technologies
- Research on a variety of topics in a real-world, full-blown system: recognition, language understanding, dialogue management, language generation, synthesis
- Your work can be evaluated / reused easily across multiple existing systems

50 Miscellaneous - Download
- www.cs.cmu.edu/~dbohus/RavenClaw
- Download a version of RoomLine
- An installation script can seed your own project from this RoomLine version

51 Miscellaneous – RavenClaw Team
- RavenClaw Team
  - Dan Bohus (dbohus@cs)
  - Antoine Raux (antoine@cs)
  - Jahanzeb Sherwani (jsherwan@cs)
  - Thomas Harris (tkharris@cs)
  - Satanjeev Banerjee (satanjeev@cs)
  - Brian Langner (blangner@cs)
- More users / developers / documentation writers are always welcome!
- Dialogs on Dialogs Reading Group
  - www.cs.cmu.edu/~dod

53 Error awareness and recovery
- Problem: lack of robustness when faced with understanding errors
- Solution: build mechanisms for acting robustly at the dialogue management level
  - Error awareness: building better confidence annotators, hypothesis selection; transference across domains
  - Error recovery strategies: recovery from non-understandings
  - Error handling decision process: scalable, adaptable, task-independent architecture for making error handling decisions

54 Let’s Go! Research
- Speech recognition: acoustic adaptation on non-native speech
  - WER: 50% → 30%
- Speech synthesis: flexible and natural F0 modeling (F0 unit selection)
  - Emphasis on erroneous/uncertain words for utterance confirmation

55 Sublime
- Interface for personalized information management
- Narrow functionality in unrestricted domains
- Currently, handles information without understanding it
- Eventually, will learn relationships and a shallow ontology

56 That’s all, folks! THANK YOU!

