Sphinx Lunch Talk Carnegie Mellon University, October 2004


1 Developing Spoken Dialogue Systems in the Communicator / RavenClaw Framework
Sphinx Lunch Talk, Carnegie Mellon University, October 2004
Presented by: Dan Bohus
Special appearances: Antoine Raux, Jahanzeb Sherwani, Thomas Harris

2 Examples
RoomLine: conference room reservations within SCS; the system can access schedules of 13 conference rooms in Wean Hall and NSH
Let’s Go! Bus Information System: bus schedule information system for Port Authority buses in Oakland and Squirrel Hill [Let’s Go! Project]
Sublime: personalized information management system
TeamTalk: an investigation into human and multi-robot spoken language communication in unstructured environments

6 More Systems
LARRI: multimodal system that assists F/A-18 aircraft maintenance personnel throughout the execution of procedural tasks [Symphony]
Madeleine: text-based prototype for medical diagnosis system [MITRE workshop]
Eureka: dialogue interface to the Vivisimo web search engine

7 The Communicator / RavenClaw Spoken Dialogue Systems Framework
Examples
Overall Architecture
System Development
Components & Resources
Miscellaneous
Current Research
examples : architecture : development : components : miscellaneous : research

8 Overall Architecture
Classical pipeline architecture:
Recognition: SPHINX
Lang. Understand.: PHOENIX/HELIOS
Dialog Manag.: RAVENCLAW
Back-end: (various)
Lang. Generation: ROSETTA
Synthesis: THETA

9 Galaxy HUB
Generic centralized, message-passing communication architecture
Developed at MIT, used in the Communicator program
Competitor: OAA
Components connected through the Galaxy HUB: Recognition (SPHINX), Lang. Understand. (PHOENIX/HELIOS), Dialog Manag. (RAVENCLAW), Back-end (various), Lang. Generation (ROSETTA), Synthesis (THETA)
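
The centralized message-passing idea on this slide can be sketched in a few lines of Python. This is only a toy illustration, not the Galaxy HUB API: the `Hub` class, the `make_stage` helper, the component names, and the message fields are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]
Handler = Callable[[Message], List[Message]]


class Hub:
    """Toy centralized router: components never talk to each other directly."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}
        self.log: List[str] = []

    def register(self, name: str, handler: Handler) -> None:
        self._handlers[name] = handler

    def send(self, message: Message) -> None:
        # Deliver in FIFO order; a handler may return follow-up messages.
        queue = [message]
        while queue:
            msg = queue.pop(0)
            self.log.append(f"{msg['from']}->{msg['to']}")
            queue.extend(self._handlers[msg["to"]](msg))


def make_stage(name: str, next_name: Optional[str],
               transform: Callable[[str], str]) -> Handler:
    """Build a hypothetical component that transforms a body and forwards it."""
    def handler(msg: Message) -> List[Message]:
        body = transform(msg["body"])
        if next_name is None:
            return []  # last stage: nothing to forward
        return [{"from": name, "to": next_name, "body": body}]
    return handler


hub = Hub()
hub.register("parser", make_stage("parser", "dialog_manager", str.upper))
hub.register("dialog_manager",
             make_stage("dialog_manager", "generator", lambda s: f"ack:{s}"))
hub.register("generator", make_stage("generator", None, lambda s: s))

hub.send({"from": "recognizer", "to": "parser", "body": "i need a room"})
print(hub.log)
```

Routing everything through one object is what makes it cheap to swap a component: only the registration changes, not the other components.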

10 Getting Even Closer
Components connected through the HUB: Recognition (SPHINX), Lang. Understand. (PHOENIX/HELIOS), Dialog Manag. (RAVENCLAW), Back-end (perl), Language Gen. (ROSETTA), Synthesis (THETA)

11 Getting Even Closer
Components around the HUB:
PROCESS MONITOR
Recognition Server: multiple, parallel SPHINX decoders
Inputs from other modalities
Text I/O: TTYServer
Lang. Understand.: parsing (PHOENIX), confidence (HELIOS)
Dialog Manag.: RAVENCLAW
Back-end: Galaxy stub plus the actual Perl back-end
DateTime and other domain agents
Lang. Generation: Galaxy stub plus ROSETTA (Perl)
Synthesis: THETA

13 Building a Spoken Dialogue System
Resources to provide for each component:
Recognition (SPHINX): language, acoustic, lexical models
Lang. Understand. (PHOENIX/HELIOS): grammar
Dialog Manag. (RAVENCLAW): RavenClaw dialog task specification
Back-end (perl): back-end (perl)
Lang. Generation (ROSETTA): templates
Synthesis (THETA): (limited domain) voice

14 So How Long Will It Take?
MITRE Workshop on Dialogue Management (Fall 2003)
Develop a text-based SDS for medical diagnosis (provided backend)
Madeleine (22 hours)

15 Okay, How Long Will It Really Take?
To get a system running with reasonable performance [poll amongst 3 RavenClaw developers]:
1 month to get a working system up and running
1 month to fine-tune performance
Further iterative improvements continue as more data accumulates

17 Components & Resources
Recognition (SPHINX): language, acoustic models
Lang. Understand. (PHOENIX/HELIOS): grammar
Dialog Manag. (RAVENCLAW): RavenClaw dialog task specification
Back-end (perl): back-end (perl)
Lang. Generation (ROSETTA): templates
Synthesis (THETA): limited domain voice

19 SPHINX II
Semi-continuous acoustic models: off-the-shelf 8kHz, kHz, 16kHz models; scripts for building your own; PLSA adapted models perform better
Language models: 2-gram & 3-gram models; CMU-Cambridge SLM Toolkit; generate from Phoenix grammar; finite state grammar; Sphinx supports state-specific LMs
Dictionary (lexical models): CMU Dictionary
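
As a rough illustration of what a 2-gram language model stores, the sketch below estimates unsmoothed maximum-likelihood bigram probabilities from a tiny corpus. It is not the CMU-Cambridge SLM Toolkit, which applies discounting and back-off; the function name `train_bigram` and the three-sentence corpus are made up for the example.

```python
from collections import Counter
from typing import Dict, List, Tuple


def train_bigram(sentences: List[str]) -> Dict[Tuple[str, str], float]:
    """Maximum-likelihood estimates of P(w2 | w1), with no smoothing."""
    history_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    for sentence in sentences:
        # Sentence boundary markers let the model score first and last words.
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        history_counts.update(words[:-1])
        bigram_counts.update(zip(words, words[1:]))
    return {pair: count / history_counts[pair[0]]
            for pair, count in bigram_counts.items()}


corpus = ["i need a room", "i need a large room", "i want a room"]
model = train_bigram(corpus)
print(model[("i", "need")])  # "need" follows "i" in 2 of the 3 sentences
```

A domain corpus for such a model can come from transcribed calls or, as the slide notes, from sentences generated out of the Phoenix grammar.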

20 Sphinx II - continued
Multiple parallel decoders [e.g., male + female]: multiple hypotheses forwarded, selection done later
Typical WER: 15-30%, with pronounced differences between native and non-native speakers; lowered by retuning acoustic and language models to the domain
Migration to SPHINX 3.x in the near future. Expected: big improvement in WER. Concern: real-time performance
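
For readers unfamiliar with the WER figures quoted above: word error rate is the word-level edit distance between the reference transcript and the recognizer hypothesis, divided by the number of reference words. A minimal sketch follows; the function name and the example sentences are illustrative, not taken from any Sphinx scoring tool.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference prefix processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("do you have something a bit larger",
                      "do you have something bigger")
print(f"{wer:.2f}")
```

Here two reference words are deleted and one is substituted, so three errors over seven reference words.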

22 Phoenix Parser / Grammar
Phoenix: robust parser, CFG grammar
Manually-generated domain-specific grammar rules
Reusable, generic sub-grammars: [Yes], [No], [Number], [DateTime], [Help], [Repeat], [Suspend], etc…
Grammar example:
    [room_size_spec]
        ([rss_large])
        ([rss_small])
        ([rss_larger])
        ([rss_smaller])
        ([rss_smallest])
        ([rss_largest])
    ;
    [rss_large]
        (large)
        (big)
        (huge)
    [rss_larger]
        (*the larger)
        (*the bigger)
        (too small)
    [rss_largest]
        (*the largest)
        (*the biggest)
    [rss_small]
        (small)
        (little)
Parse example:
    Input: DO YOU HAVE SOMETHING A BIT LARGER?
    [NeedRoom] ( [_i_want] (DO YOU HAVE SOMETHING) )
    [RoomSizeSpec] ( [room_size_spec] ( [rss_larger] (LARGER)))
Parses all incoming hypotheses and passes all parses along…
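
The robustness of this style of parsing comes from matching the phrases the grammar knows and skipping everything else. The sketch below imitates that behaviour with flat phrase lists copied from the slide. It is a simplification, not the Phoenix implementation, and it assumes that a leading `*` marks an optional word; `GRAMMAR`, `expand`, and `parse` are names invented for the example.

```python
from typing import Dict, List, Tuple

# Phrase lists copied from the slide; "*" is assumed to mark an optional word.
GRAMMAR: Dict[str, List[str]] = {
    "rss_large": ["large", "big", "huge"],
    "rss_larger": ["*the larger", "*the bigger", "too small"],
    "rss_largest": ["*the largest", "*the biggest"],
    "rss_small": ["small", "little"],
}


def expand(pattern: str) -> List[List[str]]:
    """Expand optional words into every concrete word sequence."""
    variants: List[List[str]] = [[]]
    for token in pattern.split():
        if token.startswith("*"):
            variants = variants + [v + [token[1:]] for v in variants]
        else:
            variants = [v + [token] for v in variants]
    return [v for v in variants if v]


def parse(utterance: str) -> List[Tuple[str, str]]:
    """Return (net, matched words) pairs, skipping words no net covers."""
    words = utterance.lower().replace("?", "").split()
    phrases = [(net, seq) for net, patterns in GRAMMAR.items()
               for pattern in patterns for seq in expand(pattern)]
    # Try longer phrases first so "too small" wins over "small".
    phrases.sort(key=lambda item: len(item[1]), reverse=True)
    matches: List[Tuple[str, str]] = []
    i = 0
    while i < len(words):
        for net, seq in phrases:
            if words[i:i + len(seq)] == seq:
                matches.append((net, " ".join(seq)))
                i += len(seq)
                break
        else:
            i += 1  # uncovered word: skip it instead of failing
    return matches


print(parse("DO YOU HAVE SOMETHING A BIT LARGER?"))
```

Skipping uncovered words is what lets a parse survive fillers and recognition errors, at the cost of sometimes ignoring words that mattered.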

23 Helios / Confidence Annotation
Builds accurate confidence scores using features from 3 sources of knowledge: speech recognition, language understanding, dialogue management
Selects the hypothesis with the maximum confidence score
Research in progress on hypothesis selection and transferability across domains
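
One way to picture confidence annotation followed by hypothesis selection is a model that maps features from the three knowledge sources to a score in (0, 1) and then keeps the best-scoring hypothesis. The sketch below assumes a logistic model; the slide does not say which model Helios uses, and the feature names and weights here are hypothetical values chosen only for illustration.

```python
import math
from typing import Dict, List

# Hypothetical weights: one feature per knowledge source, plus a bias term.
WEIGHTS: Dict[str, float] = {
    "acoustic_score": 2.0,       # speech recognition
    "parse_coverage": 1.5,       # language understanding
    "matches_expectation": 1.0,  # dialogue management
}
BIAS = -1.0


def confidence(features: Dict[str, float]) -> float:
    """Logistic score in (0, 1) computed from the weighted features."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def select_hypothesis(hypotheses: List[dict]) -> dict:
    """Keep the hypothesis with the maximum confidence score."""
    return max(hypotheses, key=lambda h: confidence(h["features"]))


hypotheses = [
    {"text": "not so good i think i have a fever",
     "features": {"acoustic_score": 0.9, "parse_coverage": 0.5,
                  "matches_expectation": 1.0}},
    {"text": "not so good i think i have a beaver",
     "features": {"acoustic_score": 0.95, "parse_coverage": 0.2,
                  "matches_expectation": 0.0}},
]
best = select_hypothesis(hypotheses)
print(best["text"])
```

The second hypothesis has the better acoustic score but loses because the parser and the dialogue context both disagree with it, which is the point of combining the three sources.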

25 RavenClaw Architecture
Dialog Task (Specification): captures all domain-specific dialog (task) logic using a hierarchical description; the authoring effort is focused entirely here
Domain-independent Dialog Engine: manages the dialog by executing the dialog task specification; provides a large number of domain-independent conversational strategies

27 RavenClaw: Dialogue Task Specification
Tree of dialog agents:
    Madeleine
        I:Welcome
        E:LoadSymptoms
        GeneralFeel
            R:HowAreYou?
            I:Glad
            I:Sorry
        Diagnose
            Fever
                R:AskFever
                E:MeasureTemp
                I:InformFever
            Travel
Concept labels on the diagram: diagnostic, general_feeling, have_fever
Terminals: Inform, Request, Expect, Execute
Non-terminals / Dialog agency: plans execution of child nodes
Basically a Hierarchical Task Execution Network; each agent has: preconditions & effects; success & failure criteria; trigger (focus) criteria

28 Sample DTS Code
Subtree shown: GeneralFeel (concept general_feeling) with children R:HowAreYou?, I:Glad, I:Sorry
    // /Madeleine/GeneralFeel
    DEFINE_AGENCY(CGeneralFeel,
        DEFINE_CONCEPTS(
            STRING_USER_CONCEPT(general_feeling, none))
        DEFINE_SUBAGENTS(
            SUBAGENT(HowAreYou, CHowAreYou)
            SUBAGENT(Glad, CGlad)
            SUBAGENT(Sorry, CSorry))
        SUCCEEDS_WHEN(COMPLETED(Glad) || COMPLETED(Sorry)))
    // /Madeleine/GeneralFeel/HowAreYou
    DEFINE_REQUEST_AGENT(CHowAreYou,
        REQUEST_CONCEPT(general_feeling)
        GRAMMAR_MAPPING("![Yes]>good, ![FeelingGood]>good, "
                        "![FeelingSoSo]>soso, ![FeelingBad]>bad"))
    // /Madeleine/GeneralFeel/Glad
    DEFINE_INFORM_AGENT(CGlad,
        PRECONDITION(C("general_feeling") == CString("good"))
        PROMPT("inform glad_youre_good")
        ON_COMPLETION(FINISH(/Madeleine)))
    // /Madeleine/GeneralFeel/Sorry
    DEFINE_INFORM_AGENT(CSorry,
        PRECONDITION(C("general_feeling") != CString("good"))
        PROMPT("inform sorry_youre_bad"))

29-38 RavenClaw Execution (slide 36 is titled "RavenClaw Execution / Input Pass")
Animation over the Madeleine task tree (concept labels on the diagram: chart, diagnostic, general_feeling, have_fever), showing the Dialog Stack and the Expectation Agenda at each step. Stacks are listed top first.
29: dialog stack empty
30: stack: Madeleine
31: stack: Welcome, Madeleine
32: system: "Hi, this is Madeleine, the automated…"; stack: Madeleine
33: stack: LoadSymptoms, Madeleine; from this slide on the tree also shows R:Headache (concept headache) and three more request agents labeled only "R:"
34: stack: Madeleine
35: stack: GeneralFeel, Madeleine
36: stack: HowAreYou, GeneralFeel, Madeleine
    system: "How are you feeling today?"
    expectation agenda, one level per agent on the stack:
        HowAreYou: general_feeling: [good], [bad], [soso]
        GeneralFeel: general_feeling: [good], [bad], [soso]
        Madeleine: general_feeling: [good], [bad], [soso]; have_fever: [fever], ![yes], ![no]; headache: [headache], ![yes], ![no]; cough: [cough], ![yes], ![no]; …
    user: "Not so good, I think I have a fever"
    parse: [soso](not so good) [fever](I think I have a fever)
37: stack: GeneralFeel, Madeleine
38: stack: Sorry, GeneralFeel, Madeleine; system: "Oh, I’m sorry to hear that… Let me take your temperature…"
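
The input pass and the stack updates in this sequence can be approximated with a short Python sketch. It is a simplified reading of the slides, not RavenClaw code: `EXPECTATIONS`, `build_agenda`, `bind_input`, and `next_inform` are invented names, the `!`-prefixed agenda entries are left out, and the stack is stored bottom first.

```python
from typing import Dict, List

# Per-agent expectations (concept -> grammar slots), copied from slide 36
# without the "!"-prefixed entries.
EXPECTATIONS: Dict[str, Dict[str, List[str]]] = {
    "HowAreYou": {"general_feeling": ["good", "bad", "soso"]},
    "GeneralFeel": {"general_feeling": ["good", "bad", "soso"]},
    "Madeleine": {
        "general_feeling": ["good", "bad", "soso"],
        "have_fever": ["fever"],
        "headache": ["headache"],
        "cough": ["cough"],
    },
}


def build_agenda(stack: List[str]) -> List[Dict[str, List[str]]]:
    """One agenda level per agent on the stack, focused agent first."""
    return [EXPECTATIONS.get(agent, {}) for agent in reversed(stack)]


def bind_input(agenda: List[Dict[str, List[str]]],
               parsed_slots: List[str]) -> Dict[str, str]:
    """Bind each parsed slot at the first (most focused) level expecting it."""
    concepts: Dict[str, str] = {}
    for slot in parsed_slots:
        for level in agenda:
            target = next((c for c, slots in level.items() if slot in slots),
                          None)
            if target is not None:
                concepts[target] = slot
                break
    return concepts


def next_inform(concepts: Dict[str, str]) -> str:
    """Mirror the Glad / Sorry preconditions from the sample DTS code."""
    return "Glad" if concepts.get("general_feeling") == "good" else "Sorry"


stack = ["Madeleine", "GeneralFeel", "HowAreYou"]  # bottom ... top
concepts = bind_input(build_agenda(stack), ["soso", "fever"])
stack.pop()                          # HowAreYou completes once its concept is filled
stack.append(next_inform(concepts))  # GeneralFeel schedules its next child
print(concepts, stack)
```

Because the agenda also includes the levels below the focused agent, the unrequested "I think I have a fever" is still bound to have_fever at the Madeleine level instead of being dropped.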

39 RavenClaw – Other features
Dialogue Engine transparently provides a set of conversational skills
Universal dialogue mechanisms: Repeat, Suspend / Resume, Quit
Help: Help!, Where are we?, What can I say?
Error handling: explicit and implicit confirmations; strategies for recovering from non-understandings
Dynamic dialogue task generation
Dynamic dialogue control policy

41 Backend & Domain Agents
Various problem-specific solutions:
RoomLine: connects to a static Perl database or to the CMU CorporateTime server
Let’s Go! Bus Information System: connects to a PostgreSQL database
Sublime: connects to a MySQL database; also functions as a web server; DTW search domain agent
Basically, build your own; we provide a stub for interfacing with the Galaxy HUB

43 Rosetta Language Generation
Template- and stochastic-based language generation
Input: (act, object, {slot=value})
Output: text (tagged with concepts)
Template examples (Perl):
    # welcome to the system
    "welcome" => "Welcome to RoomLine, the automated conference room ".
                 "reservation system.",
    # greet user
    "greet_user" => ["Hi, <user_name>.",
                     "Hi, <user_name>, good to hear from you again."],
    # inform the user that the system has misunderstood the times (order)
    "wrong_time_order" => sub {
        my %args = @_;
        my $time_interval_as_string =
            get_wrong_time_interval_as_string(\%args, "room_query.date_time.time");
        my $answer = "I'm sorry, I must have misunderstood the ".
                     "time you needed the room. ";
        $answer .= "I heard $time_interval_as_string. ";
        return ["$answer So, let's see ... ",
                "$answer So, let's try this again ... ",
                "$answer So, let's try this once more ... "];
    },
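
The three kinds of template on this slide (a fixed string, a list of variants to choose from, and a function that computes its variants) can be mimicked in Python as follows. This is a sketch of the idea rather than Rosetta itself: the `TEMPLATES` table keyed by (act, object), the `generate` function, and the slot name `time_interval` are assumptions made for the example.

```python
import random
import re
from typing import Callable, Dict, List, Optional, Tuple, Union

Slots = Dict[str, str]
Template = Union[str, List[str], Callable[[Slots], List[str]]]


def wrong_time_order(slots: Slots) -> List[str]:
    """Computed template: builds its variants from the slot values."""
    answer = ("I'm sorry, I must have misunderstood the time you needed "
              f"the room. I heard {slots['time_interval']}.")
    return [f"{answer} So, let's see ...",
            f"{answer} So, let's try this again ..."]


TEMPLATES: Dict[Tuple[str, str], Template] = {
    ("inform", "welcome"):
        "Welcome to RoomLine, the automated conference room reservation system.",
    ("inform", "greet_user"):
        ["Hi, <user_name>.", "Hi, <user_name>, good to hear from you again."],
    ("inform", "wrong_time_order"): wrong_time_order,
}


def generate(act: str, obj: str, slots: Slots,
             rng: Optional[random.Random] = None) -> str:
    """Pick one variant for (act, object) and fill in <slot> placeholders."""
    rng = rng or random.Random(0)
    template = TEMPLATES[(act, obj)]
    if callable(template):
        template = template(slots)
    variants = [template] if isinstance(template, str) else template
    chosen = rng.choice(variants)
    # Placeholders with no matching slot are left untouched.
    return re.sub(r"<(\w+)>", lambda m: slots.get(m.group(1), m.group(0)),
                  chosen)


print(generate("inform", "greet_user", {"user_name": "Dan"}))
```

Choosing among several variants at random is the cheap way to keep a system from sounding identical on every turn, which matters most for prompts repeated after errors.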

45 Synthesis
Cepstral Theta synthesis: open-domain unit-selection synthesis; SSML tags [currently working on barge-in location]
Festival synthesis: diphone synthesis; open-domain and limited-domain unit-selection synthesis; SABLE tags; server running separately on a Linux box

47 Miscellaneous – Documentation
Transmitted largely by oral tradition :)
A bit of documentation is available: research papers, slides
WIKI: mostly for developers, postings of updates, recent developments; hopefully more introductory materials soon. More in the works
Tutorials: 2 available, but a bit outdated

48 Miscellaneous – Portability
Current systems work on PC Windows platforms; Galaxy has a Linux version
Components are C, C++ (Visual Studio 6.0, Visual Studio .NET), Perl
How about using different input / output components? Modify the RavenClaw DMInterface class; this has been done for the Gemini parser / language generator

49 Miscellaneous – Research Platform
Communicator / RavenClaw framework is a research platform!
Constantly evolving
Modular: easy to change, develop and test new technologies
Research on a variety of topics in a real-world, full-blown system: recognition, language understanding, dialogue management, language generation, synthesis
Your work can be evaluated / reused easily across multiple existing systems

50 Miscellaneous - Download
Download a version of RoomLine
An installation script can seed your own project from this RoomLine version

51 Miscellaneous – RavenClaw Team
Dan Bohus, Antoine Raux, Jahanzeb Sherwani, Thomas Harris, Satanjeev Banerjee, Brian Langner
More users / developers / documentation writers are always welcome!!
Dialogs on Dialogs Reading Group

53 Error awareness and recovery
Problem: lack of robustness when faced with understanding errors
Solution: build mechanisms for acting robustly at the dialogue management level
Error awareness: building better confidence annotators, hypothesis selection; transference across domains
Error recovery strategies: recovery from non-understandings
Error handling decision process: scalable, adaptable, task-independent architecture for making error handling decisions

54 Let’s Go! Research
Speech Recognition: acoustic adaptation on non-native speech; WER: 50% → 30%
Speech Synthesis: flexible and natural F0 modeling (F0 unit selection); emphasis on erroneous/uncertain words for utterance confirmation

55 Sublime
Interface for personalized information management
Narrow functionality in unrestricted domains
Currently, handle information without understanding it
Eventually, learn relationships and a shallow ontology

56 That’s all, folks! THANK YOU!

