Language Model Grammar Conversion

Presentation transcript:

Language Model Grammar Conversion
  Wesley Holland, Julie Baca, Dhruva Duncan, Joseph Picone
  Center for Advanced Vehicular Systems, Mississippi State University
  [Diagram: XML, ABNF, IHD, BNF, JSGF]

Speech Recognition
  Acoustic Model
    Maps audio data to words or phonemes
  Language Model
    Specifies order in which a sequence of words or phonemes is likely to occur
    Described using grammar

Grammar Specifications
  Backus-Naur Form (BNF)
  Augmented BNF (ABNF)
  JSpeech Grammar Format (JSGF)
  Speech Recognition Grammar Specification (SRGS)
  ISIP Hierarchical Digraph (IHD)

  BNF
    <A>::=aB
    <B>::=bB
    <B>::=ε
  ABNF
    <A>::=ab*
  JSGF
    <A>=a(b)*;
  XML-SRGS
    a <item repeat="0-"> b </item>
  IHD
    [Diagram]

Conversion Design
  Goals
    JSGF ↔ IHD
    XML-SRGS ↔ IHD
    Determination of equivalence
    Grammar minimization
  Final Architecture
    [Diagram: XML, ABNF, BNF, IHD, JSGF]

The main goal of our conversion software is to allow our speech recognizer to process language models with JSGF and XML-SRGS format grammar specifications. It is also desirable to support this conversion in the reverse direction (i.e., IHD → JSGF and IHD → XML-SRGS). This serves a dual purpose. The first advantage is that it allows indirect conversion from JSGF to XML-SRGS and vice versa through IHD. The second is that it increases the flexibility and compatibility of our system by allowing different aspects of recognition to occur in different grammar specifications. With this capability, a language model can be trained in our system (in IHD), and recognition can then be performed in a different system (in JSGF or XML).

Two secondary goals of our conversion software are to provide capabilities for determination of grammar equivalence and grammar minimization. These goals require the ability to reduce each grammar format to a common elementary representation. Because textbook algorithms are available for determination of equivalence and minimization, we chose normalized BNF as the common elementary format of our system.

Although an initial attempt was made at converting JSGF and XML-SRGS directly to BNF, differences arose in the way regular expression structures were handled in the two conversion algorithms. To standardize this handling, both JSGF and XML-SRGS are converted to a common ABNF format before regular expression operators are expanded.

Note that BNF and ABNF, as academic specifications, have no mechanism for specifying weights. Our system uses external structures to maintain weight information during these stages of conversion.

JSGF/XML-SRGS → ABNF
  JSGF → ABNF
    Trivial
    Similar in syntax and structure to ABNF
  XML-SRGS → ABNF
    Harder than JSGF
    Different in syntax and structure from ABNF
    Requires enumeration of certain repeat attributes

  XML-SRGS                             ABNF
  <item repeat="1-2"> a b </item>      <S>::=(ab)|(abab)
  <item repeat="2-"> a b </item>       <S>::=abab(ab)*
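
The enumeration of repeat attributes shown above can be sketched in a few lines of Python. This is only an illustration, not the converter's actual code: the function name `expand_repeat` is hypothetical, the body is treated as an opaque string, and the sketch assumes the repeat value has one of the forms "n", "m-n", or "m-".

```python
def expand_repeat(body: str, repeat: str) -> str:
    """Expand a repeat value ("n", "m-n", or "m-") into an ABNF-like expression.

    Hypothetical helper for illustration; `body` is the already-converted
    content of the item, e.g. "ab".
    """
    if "-" not in repeat:
        low = int(repeat)
        high = low
    else:
        low_text, high_text = repeat.split("-", 1)
        low = int(low_text)
        # An empty upper bound ("m-") means "m or more".
        high = int(high_text) if high_text else None

    if high is None:
        # m mandatory copies followed by a Kleene star.
        return body * low + f"({body})*"

    if low == high:
        return body * low if low else "ε"

    # Bounded range: enumerate every allowed repetition count.
    alternatives = [f"({body * n})" if n else "ε" for n in range(low, high + 1)]
    return "|".join(alternatives)


print(expand_repeat("ab", "1-2"))
print(expand_repeat("ab", "2-"))
```

The two calls reproduce the right-hand sides of the two slide examples. A bounded range grows linearly with the upper bound, which is one reason the XML-SRGS path is harder than the JSGF path.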

XML-SRGS → ABNF (continued)
  Different weighting mechanisms (weight and repeat-prob attributes)

  a <item repeat="0-" repeat-prob=".45"> b </item>
  <one-of>
    <item weight=".4">c</item>
    <item weight=".6">d</item>
  </one-of>

The weight and repeat-prob attributes do not convey the same concept. Nonetheless, for conversion to an accepting finite state machine, both pieces of information must be taken into account.

ABNF → BNF
  Complicated
  Accomplished using a recursive algorithm that extracts sets of normalized BNF rules from a set of ABNF rules
    Break rule into multiple rules at each top-level alternation.
    Recurse on each rule.
    For each concatenation, Kleene star, or Kleene plus, extract a set of left symbols and a set of right symbols.
    For n left symbols and m right symbols, create n × m connecting rules.

  Normalized BNF
    Consists of rules of the following formats:
      (RULE_NAME)::=(TERMINAL),(NON_TERMINAL)
      (RULE_NAME)::=(NON_TERMINAL)
      (RULE_NAME)::=ε
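
One way to picture the "left symbols / right symbols" step is the textbook Glushkov (position) construction, sketched below in Python. This is a swapped-in standard technique, not necessarily the algorithm used in the converter. The tuple-based expression tree, the function names `analyze` and `to_normalized_bnf`, and the rule names `S` and `R<n>` are all assumptions made for the example.

```python
from collections import defaultdict


def analyze(node, symbols, follow):
    """Return (nullable, first, last) for an expression tree.

    Nodes are tuples: ("sym", "a"), ("cat", x, y), ("alt", x, y),
    ("star", x), ("plus", x). Each terminal occurrence gets a position.
    """
    kind = node[0]
    if kind == "sym":
        pos = len(symbols)
        symbols.append(node[1])
        return False, {pos}, {pos}
    if kind == "alt":
        n1, f1, l1 = analyze(node[1], symbols, follow)
        n2, f2, l2 = analyze(node[2], symbols, follow)
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":
        n1, f1, l1 = analyze(node[1], symbols, follow)
        n2, f2, l2 = analyze(node[2], symbols, follow)
        # n left symbols times m right symbols: n x m connections.
        for left in l1:
            follow[left] |= f2
        return n1 and n2, (f1 | f2 if n1 else f1), (l1 | l2 if n2 else l2)
    if kind in ("star", "plus"):
        n, f, l = analyze(node[1], symbols, follow)
        # Looping back: every last symbol connects to every first symbol.
        for left in l:
            follow[left] |= f
        return kind == "star" or n, f, l
    raise ValueError(f"unknown node kind: {kind!r}")


def to_normalized_bnf(node):
    """Emit rules of the form NAME::=terminal,NAME or NAME::=ε."""
    symbols, follow = [], defaultdict(set)
    nullable, first, last = analyze(node, symbols, follow)
    rules = [f"S::={symbols[p]},R{p}" for p in sorted(first)]
    if nullable:
        rules.append("S::=ε")
    for p in range(len(symbols)):
        rules += [f"R{p}::={symbols[q]},R{q}" for q in sorted(follow[p])]
        if p in last:
            rules.append(f"R{p}::=ε")
    return rules


# ab*  (the running example from the Grammar Specifications slide)
for rule in to_normalized_bnf(("cat", ("sym", "a"), ("star", ("sym", "b")))):
    print(rule)
```

In this sketch alternation is handled by set union rather than by splitting the rule first, so it differs in detail from the steps listed on the slide. Weights are ignored entirely.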

BNF ↔ IHD
  Each arc translates to a normalized BNF rule
  Terminals correspond to nodes; concatenations correspond to arcs

  BNF
    RS→R0        R3→C,R3
    RS→R1        R3→C,RT
    R0→A,R3      RT→ε
    R1→B,R3

  IHD
    Nodes
      1: A
      2: B
      3: C
    Arcs
      (S,1)      (2,3)
      (S,2)      (3,3)
      (1,3)      (3,T)
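
The arc-to-rule correspondence in this example can be sketched as follows. The function name `ihd_to_bnf` and the data layout (a dict of node labels and a list of arc pairs) are assumptions for illustration. The sketch names each rule after its node id (R1, R2, R3), whereas the slide uses R0 and R1 for nodes 1 and 2.

```python
def ihd_to_bnf(nodes, arcs):
    """Translate each arc of a digraph into one normalized BNF rule.

    nodes: mapping from node id to terminal, e.g. {1: "A"}.
    arcs:  list of (source, destination) pairs; "S" and "T" are the
           start and terminal markers used on the slide.
    """
    rules = []
    for src, dst in arcs:
        if src == "S":
            # Arcs leaving the start marker emit no terminal.
            rules.append(f"RS→R{dst}")
        else:
            # Leaving a node emits that node's terminal.
            rules.append(f"R{src}→{nodes[src]},R{dst}")
    # The terminal marker derives the empty string.
    rules.append("RT→ε")
    return rules


nodes = {1: "A", 2: "B", 3: "C"}
arcs = [("S", 1), ("S", 2), (1, 3), (2, 3), (3, 3), (3, "T")]
for rule in ihd_to_bnf(nodes, arcs):
    print(rule)
```

Apart from the rule names, the output matches the seven rules listed on the slide: one rule per arc, plus the rule for the terminal marker.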

BNF → JSGF/XML-SRGS
  Rule-by-rule
  Trivial

  BNF
    <A>::=aB
    <B>::=bB
    <B>::=ε

  JSGF
    <A>=aB;
    <B>=b*;

  XML-SRGS
    <rule id="a"> a <ruleref uri="#b"/> </rule>
    <rule id="b">
      <one-of>
        <item> b </item>
        <ruleref special="NULL"/>
      </one-of>
    </rule>
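
A rule-by-rule translation to JSGF might look like the sketch below. The function name `bnf_to_jsgf` and the tuple layout `(rule_name, terminal, non_terminal)` are assumptions. The sketch uses JSGF's `<NULL>` special rule for ε and keeps right recursion as-is, so it emits `<B> = b <B> | <NULL>;` rather than the more compact `<B>=b*;` shown on the slide.

```python
def bnf_to_jsgf(rules):
    """Group normalized BNF rules by name and emit one JSGF rule per name.

    rules: list of (rule_name, terminal, non_terminal) tuples, where
           terminal and non_terminal may be None.
    """
    grouped = {}
    for name, terminal, non_terminal in rules:
        parts = []
        if terminal:
            parts.append(terminal)
        if non_terminal:
            parts.append(f"<{non_terminal}>")
        # An empty right-hand side is the ε rule.
        expansion = " ".join(parts) or "<NULL>"
        grouped.setdefault(name, []).append(expansion)
    return [f"<{name}> = {' | '.join(alts)};" for name, alts in grouped.items()]


bnf = [("A", "a", "B"), ("B", "b", "B"), ("B", None, None)]
for line in bnf_to_jsgf(bnf):
    print(line)
```

A full grammar file would also need the JSGF header and grammar declaration, which are omitted here.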

Software Tools
  ISIP Network Converter
    Console tool to perform conversions to and from arbitrary grammar formats
  ISIP Network Builder
    Java-based graphical tool to design grammars as finite state machines
    Can export grammars to JSGF, XML-SRGS, ABNF, BNF, and IHD
  ISIP Language Model Tester
    Console tool for testing grammars
    Can generate valid sentences in a given grammar
    Can parse sentences and determine whether they are accepted by a given grammar

Minimization
  Happens in BNF
  Iterate over rule set, merging redundant rules
  Rules can be merged if the nonterminals of both rules reference the same terminal
  Example: [figure]
  Conversion (especially from XML-SRGS) introduces redundancies.
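
A minimal sketch of one merging pass is given below, under a stated assumption about what "redundant" means: two rule names are treated as redundant when their sets of right-hand sides are identical. This is weaker than full textbook minimization (for example, it will not merge two names that differ only in a self-loop), and the function name `merge_redundant` and tuple layout are hypothetical.

```python
def merge_redundant(rules, start="S"):
    """Repeatedly merge rule names whose right-hand sides are identical.

    rules: iterable of (rule_name, terminal, non_terminal) tuples.
    start: name of the start rule, which is never renamed.
    """
    rules = set(rules)
    while True:
        # Collect the set of right-hand sides for every rule name.
        bodies = {}
        for name, terminal, nxt in rules:
            bodies.setdefault(name, set()).add((terminal, nxt))

        # Names with the same body collapse onto the first one seen;
        # the start rule sorts first so it always survives.
        seen, rename = {}, {}
        for name in sorted(bodies, key=lambda n: (n != start, n)):
            key = frozenset(bodies[name])
            if key in seen:
                rename[name] = seen[key]
            else:
                seen[key] = name

        if not rename:
            return rules
        rules = {(rename.get(n, n), t, rename.get(x, x)) for n, t, x in rules}


redundant = [
    ("S", "a", "R1"), ("S", "a", "R2"),
    ("R1", "b", "R3"), ("R2", "b", "R3"),
    ("R3", None, None),
]
for rule in sorted(merge_redundant(redundant), key=str):
    print(rule)
```

In the example, R1 and R2 both expand to `b,R3`, so R2 is folded into R1 and the two start rules collapse into one.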