Speech recognition grammars as TRINDIKIT resources


Speech recognition grammars as TRINDIKIT resources David Hjelm 2003-12-12

TRINDIKIT
- Framework for building dialogue systems
- Written in SICStus Prolog
- Contains predefined modules for input, output, interpretation, etc.
- Total Information State (TIS) holds information accessible by modules
- As long as different modules behave similarly with respect to the TIS, they are interchangeable

Nuance
- Speech recognition, voice authentication and text-to-speech engines
- APIs to create speech-recognition/text-to-speech clients in Java, C++ and C
- Clients can read and write audio in several ways:
  - native sound card
  - telephony card
  - IP telephony
  - from audio files

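Speech recognition basics
[Diagram: recognition pipeline]
- feature extraction produces acoustic features
- viterbi search with the acoustic model (N-gram) turns the acoustic features into a phoneme or word lattice
- viterbi search or parsing with the language model (N-gram or PCFG) turns the lattice into a word lattice or n-best list of sentences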

Nuance SR models
- Acoustic models (master packages)
  - One or several for each language, plus some multilingual ones
- Language models
  - written using Nuance's Grammar Specification Language (GSL)
  - PCFG, but SLMs can actually be used as categories (SLMs are trained from corpus data separately)
  - compiled using a specific master package into a recognition package (acoustic + language model)

Nuance GSL
- EBNF variant augmented with
  - optional probabilities
  - optional rudimentary slot-filling semantics
  - a lot of other special stuff, e.g.
    - SLM inclusion
    - external grammar references
    - external rule references
    - special words for e.g. pauses and telephony touch-tones
- Must not be left-recursive

Example Nuance grammars
Without probabilities or semantics a grammar can look like this:
    .Top [ Cmd Q ]
    Cmd ( [ stop play pause ] ?it )
    Q ( is [ (the vcr) it ] [ stopped playing paused ] )
- Start symbol(s) are preceded by '.'
- Nonterminals are uppercase
- Terminals are lowercase

More example Nuance grammars
Probabilistic grammar:
    .Top [ Cmd~0.6 Q~0.4 ]
    Cmd ( [ stop~0.2 play~0.4 pause~0.3 ] ?it~0.3 )
    Q ( is [ (the vcr)~0.3 it~0.7 ] [ stopped playing paused ] )
Slot-filling grammar:
    .Top [ Cmd {<cmd $return>} Q {<q $return>} ]
    Cmd ( [ stop {return(stop)} play {return(play)} pause {return(pause)} ] ?it )
    Q ( is [ (the vcr) it ] [ stopped {return(stop)} playing {return(play)} paused {return(pause)} ] )
Of course they can be combined…

Static or dynamic grammar compilation
Nuance's recognize function takes one argument, which is one of the following:
- a start symbol in the current statically compiled recognition package. In this case recognition is performed using the grammar specified.
- a GSL expression. In this case the GSL expression is compiled dynamically. It cannot contain recursive rules, but it can point to a precompiled 'grammar object' which does.

Current TRINDIKIT – Nuance interface
- TRINDIKIT modules exist for Nuance speech input and Nuance speech output.
- OAA is used for the communication between TRINDIKIT (Prolog) and the Nuance client (Java).
- Each OAA agent connects to a facilitator and declares a set of capabilities. Agents can then pose queries to the facilitator, which delegates each query to the appropriate agent(s) and returns an answer to the requesting agent.

Current TRINDIKIT – Nuance interface
[Architecture diagram. Components: OAA facilitator, TRINDIKIT, OAA gateway, Nuance java client, ASR server, TTS server. Audio channels: native sound card, telephony card, IP telephony.]

Current TRINDIKIT – Nuance interface
Nuance java client
- provides (partial) access to Nuance java API via OAA
- loads recognition package at startup
- performs SR using one of its top level grammars
TRINDIKIT input module
- checks name of dummy resource $asr_grammar for name of top level grammar
- calls OAA solvable nscPlayAndRecognize(+Grammar,?Result)
Major disadvantages
- Recognition package must be compiled before using the system and specified when running the java application
- Actual ASR grammar is not a part of TRINDIKIT: it cannot be modified or checked for coverage by modules

Upcoming TRINDIKIT – Nuance interface
Nuance java client
- provides (partial) OAA access to Nuance java API
- loads empty recognition package at startup
- can compile GSL into a Nuance Grammar Object (NGO) via OAA
- performs SR using a GSL expression which points at an NGO
TRINDIKIT input module
- checks resource $sr_grammar for actual speech recognition grammar
- makes sure $sr_grammar is compiled into an NGO at start-up
- calls OAA solvable nscPlayAndRecognize(+GSL,?Result) where GSL = '<file:/path/to/ngo>'

Upcoming TRINDIKIT – Nuance interface
[Architecture diagram. Components: OAA facilitator, TRINDIKIT, OAA gateway, Nuance java client, Compilation server, ASR server, TTS server. Audio channels: native sound card, telephony card, IP telephony.]

Different ways for implementing sr_grammar resource
Keep the GSL expression making up the Nuance grammar as a prolog string or atom
- Easy for Nuance input module
- Really hard for other modules trying to reason about the SR grammar
Define the EBNF rules as prolog terms
- Quite easy for Nuance input module (convert EBNF to GSL)
- Enables reasoning about rules and categories by other modules
- Hard to find a working EBNF prolog notation

Different ways for implementing sr_grammar resource
Define grammar as a set of context free grammar rules (chosen method)
- Some computation by Nuance input module (needs to convert CFG to BNF to GSL)
- Enables reasoning about rules and categories by other modules
- Enables efficient parsing (if needed)
- Easy to find a prolog notation
- Portable: the same grammar can be ported to many different speech recognizer grammar formats, as long as they are CFG-equivalent

CFG resource definition
resource relations:
- start_symbol(S) where S is a nonterminal
- rule(LHS,RHS) where LHS is a nonterminal and RHS is a list of nonterminals/terminals
- rules(Rules) where Rules is the set of rules in the resource
resource operations (not yet implemented):
- add_rule(rule(LHS,RHS))
- delete_rule(rule(LHS,RHS))
- add_rules(Rules)
- delete_rules(Rules)
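The relations and operations listed above can be mimicked in a few lines. The sketch below is in Python purely for illustration (the real resource would be a Prolog module); the class name `CfgResource` and the tagged-tuple encoding of symbols are assumptions, not TRINDIKIT code.

```python
from dataclasses import dataclass, field


def nonterminal(name):
    """Mirror of the Prolog term nonterminal(Name)."""
    return ("nonterminal", name)


def terminal(word):
    """Mirror of the Prolog term terminal("word")."""
    return ("terminal", word)


@dataclass
class CfgResource:
    """Hypothetical stand-in for the CFG resource, not TRINDIKIT code."""
    start: tuple
    _rules: set = field(default_factory=set)

    # --- resource relations ---
    def start_symbol(self):
        return self.start

    def rule(self, lhs, rhs):
        """True if rule(LHS,RHS) is in the resource."""
        return (lhs, tuple(rhs)) in self._rules

    def rules(self):
        """The set of all rules, as (lhs, rhs) pairs."""
        return frozenset(self._rules)

    # --- resource operations ---
    def add_rule(self, lhs, rhs):
        if lhs[0] != "nonterminal":
            raise ValueError("LHS must be a nonterminal")
        self._rules.add((lhs, tuple(rhs)))

    def delete_rule(self, lhs, rhs):
        self._rules.discard((lhs, tuple(rhs)))

    def add_rules(self, rules):
        for lhs, rhs in rules:
            self.add_rule(lhs, rhs)

    def delete_rules(self, rules):
        for lhs, rhs in rules:
            self.delete_rule(lhs, rhs)


grammar = CfgResource(nonterminal("np"))
grammar.add_rules([
    (nonterminal("np"), [nonterminal("det"), nonterminal("n")]),
    (nonterminal("det"), [terminal("a")]),
    (nonterminal("n"), [terminal("car")]),
])
```

Storing right-hand sides as tuples keeps rules hashable, so the rule collection really behaves as a set, as the `rules(Rules)` relation describes.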

CFG rule format
Example rules:
    rule( nonterminal(np), [ nonterminal(det), nonterminal(n) ] ).
    rule( nonterminal(det), [ terminal("a") ] ).
    rule( nonterminal(n), [ terminal("car") ] ).
Convenient when reasoning about rules in grammar but not very convenient when writing grammars…
Solution:
- write rules in EBNF-ish notation using operators
- convert EBNF-ish rules to CFG rules

'blockworld': example CFG resource
ebnf2cfg:assert_rules/0 converts EBNF rules to CFG rules and asserts them.
    :- module( blockworld , [rules/1,rule/2,start_symbol/1] ).
    :- ensure_loaded( ebnf2cfg ).
    top( np ).
    np => det, adj* , n, loc? .
    adj => colour | size.
    colour => "blue" | "red" | "green".
    size => "big" | "small".
    det => "a".
    n => "sphere" | "cube" | "pyramid".
    loc => prep , np.
    prep => "in" | "on" | "under" | "above".
    :- assert_rules.
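How ebnf2cfg expands the `*`, `?` and `|` operators is not shown in the slides. The Python sketch below is one possible scheme, given only for illustration: choices become separate rules, and `x*` and `x?` become fresh helper nonterminals with an empty alternative. The nested-tuple encoding of EBNF and the helper names such as `np_1` are assumptions. Repetition is expanded right-recursively, since GSL must not be left-recursive.

```python
from itertools import count


def ebnf_to_cfg(ebnf_rules):
    """Convert {name: expr} into a list of (lhs, rhs) CFG rules.

    expr is a nonterminal name (str), ("t", word), ("seq", e1, e2, ...),
    ("alt", e1, e2, ...), ("star", e) or ("opt", e).
    An empty rhs list stands for the empty string.
    """
    cfg = []
    fresh = count(1)

    def nt(name):
        return ("nonterminal", name)

    def parts(expr, tag):
        # ("alt", a, b) with tag "alt" -> [a, b]; anything else is one part
        if isinstance(expr, tuple) and expr[0] == tag:
            return list(expr[1:])
        return [expr]

    def define(lhs, expr):
        # One CFG rule per top-level alternative
        for alternative in parts(expr, "alt"):
            rhs = [symbol(item, lhs) for item in parts(alternative, "seq")]
            cfg.append((nt(lhs), rhs))

    def symbol(expr, owner):
        # Return a single CFG symbol, adding helper rules where needed
        if isinstance(expr, str):
            return nt(expr)
        if expr[0] == "t":
            return ("terminal", expr[1])
        helper = f"{owner}_{next(fresh)}"  # assumed not to clash with user names
        if expr[0] == "opt":
            define(helper, expr[1])
            cfg.append((nt(helper), []))
        elif expr[0] == "star":
            cfg.append((nt(helper), []))
            cfg.append((nt(helper), [symbol(expr[1], helper), nt(helper)]))
        else:  # nested sequence or choice
            define(helper, expr)
        return nt(helper)

    for lhs, expr in ebnf_rules.items():
        define(lhs, expr)
    return cfg


# The blockworld grammar from the slide, in the assumed tuple encoding
blockworld = {
    "np": ("seq", "det", ("star", "adj"), "n", ("opt", "loc")),
    "adj": ("alt", "colour", "size"),
    "colour": ("alt", ("t", "blue"), ("t", "red"), ("t", "green")),
    "size": ("alt", ("t", "big"), ("t", "small")),
    "det": ("t", "a"),
    "n": ("alt", ("t", "sphere"), ("t", "cube"), ("t", "pyramid")),
    "loc": ("seq", "prep", "np"),
    "prep": ("alt", ("t", "in"), ("t", "on"), ("t", "under"), ("t", "above")),
}
cfg_rules = ebnf_to_cfg(blockworld)
```

The empty alternatives this scheme introduces would have to be removed, or mapped to an optional-element construct, before the rules are handed to a recognizer format that has no empty rules.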

Using CFG resource with Nuance input module
    input:init :-
        check_condition( $sr_grammar::start_symbol(Start) ),
        check_condition( $sr_grammar::rules(set(Rules)) ),
        cfg2gsl(dynamic, Start, Rules, GSL),
        oaag:solve(nscCurrentMasterPackage(Package)),
        % compile the GSL into an NGO unless that has already been done
        (   oaag:solve(nscGslCompiledToNGO(GSL, Package, Path))
        ->  true
        ;   oaag:solve(nscCompileGslToNGO(GSL, Package, Path))
        ), !.
    input:input :-
        oaag:solve(nscGslCompiledToNGO(GSL, Package, Path)),
        % the slide joins GSL here; Path is assumed to be intended,
        % since the recognizer expects '<file:/path/to/ngo>'
        join_atoms(['<file:/', Path, '>'], NGOGSL),
        recognize_score(NGOGSL, String, Score),
        apply_update( set( input, String ) ),
        apply_update( score := Score ).
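The cfg2gsl step is not spelled out in the slides. As a rough illustration in Python, the sketch below groups CFG rules by left-hand side and prints them with the bracket conventions visible in the example grammars: `[ ]` for alternatives, `( )` for sequences, a leading `.` on the start symbol, capitalised nonterminals and lower-case terminals. The function name `cfg_to_gsl` and the input encoding are assumptions, rules with an empty right-hand side are rejected rather than translated, and the output has not been checked against the Nuance compiler.

```python
from collections import defaultdict


def cfg_to_gsl(start, rules):
    """Render CFG rules as GSL-like text.

    start: name of the start nonterminal.
    rules: iterable of (lhs_name, rhs), rhs being a non-empty list of
           ("nonterminal", name) or ("terminal", word) pairs.
    """
    by_lhs = defaultdict(list)
    for lhs, rhs in rules:
        if not rhs:
            raise ValueError(f"empty right-hand side for {lhs!r} is not handled")
        by_lhs[lhs].append(rhs)

    def nt(name):
        # Nonterminals start with an upper-case letter
        return name[:1].upper() + name[1:]

    def sym(symbol):
        kind, value = symbol
        if kind == "nonterminal":
            return nt(value)
        words = value.lower().split()  # terminals are lower-case
        return words[0] if len(words) == 1 else "(" + " ".join(words) + ")"

    def seq(rhs, top_level):
        rendered = [sym(s) for s in rhs]
        if len(rendered) == 1 and not top_level:
            return rendered[0]
        return "( " + " ".join(rendered) + " )"

    lines = []
    for lhs, alternatives in by_lhs.items():
        if len(alternatives) == 1:
            body = seq(alternatives[0], top_level=True)
        else:
            body = "[ " + " ".join(seq(a, top_level=False) for a in alternatives) + " ]"
        prefix = "." if lhs == start else ""
        lines.append(f"{prefix}{nt(lhs)} {body}")
    return "\n".join(lines)


example_rules = [
    ("np", [("nonterminal", "det"), ("nonterminal", "n")]),
    ("det", [("terminal", "a")]),
    ("n", [("terminal", "car")]),
    ("n", [("terminal", "Sphere")]),
]
gsl_text = cfg_to_gsl("np", example_rules)
```

Normalising case inside the converter is one way to meet the requirement that nonterminals be upper-case and terminals lower-case in GSL without constraining how the CFG resource itself is written.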

What must be done before CFG resource can be used with Nuance?
- Write actual code of input module (some parts are missing)
- Implement nscGetMasterPackage(?Pkg) solvable
- Make sure that all nonterminals are upper-case and all terminals are lower-case in GSL
- Write real CFG resource (use existing Nuance grammar)
- Testing, testing and testing…

What should be done?
- Documentation of java and prolog code
- Trindikit manual
- Eliminate left-recursion
- Convert to Chomsky Normal Form (?)
- Parser/generator for testing CFGs inside of prolog
- Multilingual nuance input module
- Batch scripts for running with ease
- Asynchronous input algorithm
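Since GSL grammars must not be left-recursive, the left-recursion item is worth a small illustration. The Python sketch below uses the textbook rewrite for direct left recursion in a variant that adds no empty rules: A → A α | β becomes A → β | β A_tail and A_tail → α | α A_tail. It is not taken from the slides, it does not handle indirect left recursion, and the `_tail` helper name is an assumption.

```python
def remove_direct_left_recursion(rules):
    """rules: {nonterminal: [alternative, ...]}, each alternative a list of
    symbol names. Returns a new dict without direct left recursion."""
    result = {}
    for a, alternatives in rules.items():
        # Tails (alpha) of the left-recursive alternatives A -> A alpha
        recursive = [alt[1:] for alt in alternatives if alt and alt[0] == a]
        # The remaining alternatives (beta)
        others = [alt for alt in alternatives if not alt or alt[0] != a]
        if not recursive:
            result[a] = [list(alt) for alt in alternatives]
            continue
        if not others:
            raise ValueError(f"{a} has only left-recursive alternatives")
        if any(not alpha for alpha in recursive):
            raise ValueError(f"{a} -> {a} is a cycle and cannot be rewritten")
        tail = a + "_tail"  # hypothetical helper name, assumed unused
        result[a] = ([list(beta) for beta in others]
                     + [list(beta) + [tail] for beta in others])
        result[tail] = ([list(alpha) for alpha in recursive]
                        + [list(alpha) + [tail] for alpha in recursive])
    return result


left_recursive = {
    "np": [["np", "pp"], ["det", "n"]],
    "pp": [["prep", "np"]],
}
rewritten = remove_direct_left_recursion(left_recursive)
```

Both grammars derive `det n` followed by zero or more `pp`, but in the rewritten one no rule for `np` starts with `np`.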

What can be done?
- PCFG resource
  - if EBNF format is used, how to calculate weights when converting to PCFG? (this has been solved in Nuance though, but is it a proper solution?)
- SLM resource
  - SLM resource would probably not store entire model in memory
- Nuance semantics + CFG/PCFG
  - can GoDiS semantics be expressed?
- Convert typed unification grammars to CFG resources
  - DCG with typed features (regulus), SKVATT(?), HPSG, Grammatical Framework
  - CFG approximation, e.g. by limiting sentence length or letting grammar overgenerate
  - problem: any interesting grammar will overgenerate a lot

What can be done?
- Write modules for Java Speech API, ViaVoice, etc. using the same CFG resource
- Use several recognition grammars in sequence (one after the other on the same input)
- Dynamically generate recognition grammar based on IS contents and/or system expectations
- Let the system learn new words: "How do you spell that?"