DIALOG SYSTEMS FOR AUTOMOTIVE ENVIRONMENTS

Presenter: Joseph Picone
Inst. for Signal and Info. Processing, Dept. Electrical and Computer Eng., Mississippi State University

Co-Authors: Julie Baca, Feng Zheng, Hualin Gao
Center for Advanced Vehicular Systems, Mississippi State University, Mississippi State, Mississippi

URL:
EUROSPEECH

INTRODUCTION: IN-VEHICLE DIALOG SYSTEMS
- In-vehicle dialog systems improve information access.
- Advanced user interfaces enhance workforce training and increase manufacturing efficiency.
- Both environments require noise robustness to improve recognition performance.
- The work relies on advanced statistical models and machine learning technology.
- Multidisciplinary team (IE, ECE, CS).

SYSTEM ARCHITECTURE: DARPA COMMUNICATOR FRAMEWORK
[Figure: dialog system architecture]

SYSTEM ARCHITECTURE: PUBLIC DOMAIN ASR
….
- Uses the publicly available ISIP speech recognition toolkit.
- Implements a standard HMM-based, speaker-independent, continuous speech recognition system.
- Complete toolkits are available for many popular tasks, including conversational speech.
- On-line educational materials.
- Extensive documentation.

SYSTEM ARCHITECTURE: ASR SYSTEM COMPONENTS
- Transduction: Andrea NC-65 head-mounted microphone
- Feature extraction: standard 39-element MFCCs
- Acoustic modeling: 8-mixture Gaussian HMMs
- Lexicon: 7,100 words (5K WSJ, 2K names)
- Language modeling: interpolated bigram (perplexity: ~70)
- Search: hierarchical Viterbi beam
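
The language modeling entry above can be illustrated with a minimal sketch of an interpolated bigram and its perplexity. This is not the ISIP toolkit's implementation: the function names, the fixed interpolation weight `lam`, and the toy sentences are assumptions chosen for illustration, and the sketch assumes every test word was seen in training.

```python
import math
from collections import Counter

def train(sentences):
    """Count unigrams and bigrams, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def prob(word, prev, unigrams, bigrams, lam=0.7):
    """P(word | prev) = lam * bigram estimate + (1 - lam) * unigram estimate.

    lam is a hypothetical fixed weight; real systems tune it on held-out data.
    """
    total = sum(unigrams.values())
    p_uni = unigrams[word] / total
    p_bi = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
    return lam * p_bi + (1 - lam) * p_uni

def perplexity(sentences, unigrams, bigrams, lam=0.7):
    """exp of the average negative log probability per predicted token."""
    log_sum, n = 0.0, 0
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            log_sum += math.log(prob(word, prev, unigrams, bigrams, lam))
            n += 1
    return math.exp(-log_sum / n)

# Toy data (made up for this sketch, not from the pilot corpus).
sentences = ["i want to drive to campus", "i need to go to campus"]
unigrams, bigrams = train(sentences)
ppl = perplexity(sentences, unigrams, bigrams)
```

Interpolating with the unigram estimate keeps unseen word pairs from receiving zero probability, which matters when the lexicon mixes general vocabulary with names.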

SYSTEM ARCHITECTURE: NATURAL LANGUAGE UNDERSTANDING
- Uses the Phoenix semantic case frame parser from Colorado Univ. (CU).
- Employs a semantic grammar consisting of case frames with named slots.

FRAME: Drive
  [route]
  [distance]

[route]
  (*IWANT [go_verb] [arrive_loc])
IWANT
  (I want *to)
  (I would *like *to)
  (I will)
  (I need *to)
[go_verb]
  (go)
  (drive)
  (get)
  (reach)
[arrive_loc]
  [*to [placename] [cityname]]

SYSTEM ARCHITECTURE: NATURAL LANGUAGE UNDERSTANDING
Example input: “I want to drive from Columbus Mississippi to New York.”

SYSTEM ARCHITECTURE: NLU MODULE
- Accepts ungrammatical input, e.g., “I want… I need to drive to the campus post office.”
- The current version of the semantic grammar contains over 500 rules and 2,000 words.
- Developed from a pilot test corpus of sentence patterns.

[Figure: parse tree]
Route
  IWANT: “I need to”
  go_verb: “drive”
  arrive_loc
    placename: “post office”
    cityname: “campus”
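
The slot notation used by the semantic grammar (a leading * marks an optional element, and bracketed or upper-case names refer to other nets) can be sketched with a small recursive matcher. This is a simplified illustration, not the Phoenix parser: the `GRAMMAR` dictionary layout, the function names, and the `[placename]` entries are hypothetical, `[arrive_loc]` is reduced to a single place name, and unlike Phoenix the sketch requires the whole utterance to match.

```python
# Each net maps to a list of alternative patterns; each pattern is a token list.
GRAMMAR = {
    "IWANT": [["i", "want", "*to"], ["i", "would", "*like", "*to"],
              ["i", "will"], ["i", "need", "*to"]],
    "[go_verb]": [["go"], ["drive"], ["get"], ["reach"]],
    "[placename]": [["post", "office"], ["campus"]],   # hypothetical entries
    "[arrive_loc]": [["*to", "*the", "[placename]"]],  # simplified for the sketch
    "[route]": [["*IWANT", "[go_verb]", "[arrive_loc]"]],
}

def match_net(net, words, start):
    """Yield every index at which `net` can end when it starts at `start`."""
    for pattern in GRAMMAR[net]:
        yield from match_seq(pattern, words, start)

def match_seq(pattern, words, pos):
    """Yield every end index for matching `pattern` against words[pos:]."""
    if not pattern:
        yield pos
        return
    head, rest = pattern[0], pattern[1:]
    symbol = head.lstrip("*")
    if head.startswith("*"):
        # Optional element: one alternative is to skip it entirely.
        yield from match_seq(rest, words, pos)
    if symbol in GRAMMAR:
        # Reference to another net: try each way that net can match here.
        for end in match_net(symbol, words, pos):
            yield from match_seq(rest, words, end)
    elif pos < len(words) and words[pos] == symbol:
        # Literal word.
        yield from match_seq(rest, words, pos + 1)

def parses(net, utterance):
    """True if `net` covers the entire utterance."""
    words = utterance.lower().split()
    return any(end == len(words) for end in match_net(net, words, 0))
```

For example, `parses("[route]", "I need to drive to the post office")` succeeds because IWANT, [go_verb], and [arrive_loc] each consume a contiguous part of the utterance, while `parses("[route]", "I will campus")` fails because no [go_verb] follows IWANT.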

SYSTEM ARCHITECTURE: DIALOG MANAGER
- Controls interaction between user and system.
- Accepts parsed input from the NLU module.
- Determines the data requested, obtains the data, and controls its presentation to the user.

User: “How can I get to campus?”
System: “Are you going to a specific location on campus?”
User: “Where is engineering?”
System: “What department?”
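
The clarification behavior in the exchange above can be sketched as a simple decision step: if the destination slot is missing or too general, prompt the user; otherwise issue a query. This is a hedged illustration only, not the CU dialog manager. The `REFINEMENTS` table, the `next_action` function, and the fallback prompt are hypothetical; the two clarifying prompts are taken from the example dialog.

```python
# Destinations that are too general to route to, with the prompt used to
# narrow them down (prompts taken from the example dialog).
REFINEMENTS = {
    "campus": "Are you going to a specific location on campus?",
    "engineering": "What department?",
}

def next_action(frame):
    """Decide the next system move from the slots filled so far.

    `frame` is a dict of slot name -> value, as an NLU module might produce.
    Returns ("prompt", text) to ask the user, or ("query", destination)
    when the destination is specific enough to look up.
    """
    destination = frame.get("arrive_loc")
    if destination is None:
        # Hypothetical fallback prompt, not from the source.
        return ("prompt", "Where would you like to go?")
    if destination in REFINEMENTS:
        return ("prompt", REFINEMENTS[destination])
    return ("query", destination)

# Simulated turns: each user turn overwrites the destination slot.
frame = {}
frame["arrive_loc"] = "campus"
first = next_action(frame)
frame["arrive_loc"] = "post office"
second = next_action(frame)
```

Keeping the decision in a pure function of the frame makes each turn easy to test in isolation from speech recognition and parsing.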

SYSTEM ARCHITECTURE: DIALOG MANAGER
- Derived from the CU toolkit.
- The bulk of development lies in the construction of domain-specific frames, rules, and slots.
- Example frames and associated queries:
  Drive_Direction: “How can I get from Lee Boulevard to Kroger?”
  Drive_Address: “Where is the campus bakery?”
  Drive_Distance: “How far is China Garden?”
  Drive_Quality: “Find me the most scenic route to Scott Field.”
  Drive_Turn: “I am on Nash Street. What’s my next turn?”

APPLICATION DEVELOPMENT: GIS BACKEND
- The Geographic Information System (GIS) contains map routing data for MSU and the surrounding area.
- The dialog manager (DM) first determines the nature of the query, then:
  - obtains route data from the GIS database
  - handles presentation of the data to the user
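
The two DM steps above (obtain route data, then present it) can be sketched against a toy road graph. Everything here is an assumption made for illustration: the real GIS schema and routing method are not described in the slides, the `ROADS` distances are invented, "Hypothetical Road" is a placeholder, and Dijkstra's algorithm is swapped in as a generic shortest-path method.

```python
import heapq

# Toy road graph: node -> {neighbor: distance in miles}. Distances are made up.
ROADS = {
    "Lee Boulevard": {"Nash Street": 0.5, "Hypothetical Road": 1.2},
    "Nash Street": {"Lee Boulevard": 0.5, "Kroger": 1.0},
    "Hypothetical Road": {"Lee Boulevard": 1.2, "Kroger": 0.6},
    "Kroger": {"Nash Street": 1.0, "Hypothetical Road": 0.6},
}

def shortest_route(start, goal):
    """Dijkstra's algorithm; returns (distance, path) or None if unreachable."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, step in ROADS[node].items():
            if neighbor not in visited:
                heapq.heappush(queue, (dist + step, neighbor, path + [neighbor]))
    return None

def answer(frame, slots):
    """Dispatch on the frame type, query the toy graph, and format a reply."""
    result = shortest_route(slots["from"], slots["to"])
    if result is None:
        return "No route found."
    dist, path = result
    if frame == "Drive_Direction":
        return "Go via " + ", then ".join(path[1:]) + "."
    if frame == "Drive_Distance":
        return f"{dist:.1f} miles."
    return "Unsupported query."
```

For instance, `answer("Drive_Direction", {"from": "Lee Boulevard", "to": "Kroger"})` routes through Nash Street in this toy graph, since that path is shorter than the alternative.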

APPLICATION DEVELOPMENT: PILOT SYSTEM
- Obtained domain-specific data by:
  1. Initial data gathering and system testing
  2. Retesting after enhancing the LM and semantic grammar
- Initial efforts focused on reducing OOV utterances and parsing errors for the NLU module.
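
One way to track the OOV (out-of-vocabulary) reduction described above is to measure the fraction of utterances containing at least one word missing from the lexicon. The sketch below assumes that definition, whitespace tokenization, and a made-up lexicon and test set; the slides do not state how the OOV figures were computed.

```python
def oov_utterance_rate(utterances, lexicon):
    """Fraction of utterances with at least one word not in the lexicon."""
    if not utterances:
        return 0.0
    oov_count = sum(
        1 for utterance in utterances
        if any(word not in lexicon for word in utterance.lower().split())
    )
    return oov_count / len(utterances)

# Toy lexicon and utterances (made up for this sketch).
lexicon = {"how", "far", "is", "where", "the", "campus", "bakery", "kroger"}
utterances = [
    "How far is Kroger",
    "Where is the campus bakery",
    "Where is the stadium",      # "stadium" is out of vocabulary
    "How far is the campus",
]
rate = oov_utterance_rate(utterances, lexicon)
```

Rerunning the same measurement after adding the missing words to the lexicon gives a before-and-after comparison in the style of the pilot retest.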

APPLICATION DEVELOPMENT: RESULTS

Refinements to NLU System:

Vers Test    Pre    Post    Pre    Post    Pre    Post
OOV          25%    0%      36%    0%      4%     0%
Parser       80%    3%      60%    5%      46%    11%

Overall System Enhancements:

Test No.    NLU Parser Error Rate    DM Error Rate
1           43%                      49%
2           6%                       3%

SUMMARY AND CONCLUSIONS: WIZARD OF OZ DATA
- Users participate in multiple scenarios in which they query for information (e.g., hotel and meeting locations).
- Tasks vary across scenarios according to the role the user plays:
  - First-time visitors
  - New residents
  - Long-time residents

SUMMARY AND CONCLUSIONS: FURTHER DEVELOPMENT
- Established a preliminary dialog system for future data collection and research.
- Demonstrated significant domain-specific improvements for in-vehicle dialog systems.
- Created a testbed for future studies of workforce training applications.
- Extended the ISIP public domain toolkit and released relevant resources into the public domain.

SUMMARY: RELEVANT RESOURCES
- CAVS Dialog System: review our experimental results and download the in-vehicle prototype architecture and associated components.
- Natural Language and Dialog Management Toolkits (CU): explore tools to build NLU and DM components for a specific domain.
- Speech Recognition Toolkit (ISIP): examine a state-of-the-art public domain ASR toolkit for integration in a dialog system.