1
Language Model Grammar Conversion
Wesley Holland, Julie Baca, Dhruva Duncan, Joseph Picone
Center for Advanced Vehicular Systems, Mississippi State University
[Diagram labels: XML, ABNF, BNF, IHD, JSGF]
2
Speech Recognition
Acoustic Model
  Maps audio data to words or phonemes
Language Model
  Specifies the order in which a sequence of words or phonemes is likely to occur
  Described using a grammar
3
Grammar Specifications
  Backus-Naur Form (BNF)
  Augmented BNF (ABNF)
  JSpeech Grammar Format (JSGF)
  Speech Recognition Grammar Specification (SRGS)
  ISIP Hierarchical Digraph (IHD)

Example (a followed by zero or more b) in each format:
  BNF:       <A>::=aB   <B>::=bB   <B>::=ε
  ABNF:      <A>::=ab*
  JSGF:      <A>=a(b)*;
  XML-SRGS:  a <item repeat="0-"> b </item>
  IHD:       [digraph figure]
4
Conversion Design
Goals
  JSGF ↔ IHD
  XML-SRGS ↔ IHD
  Determination of equivalence
  Grammar minimization
Final Architecture
  [Diagram labels: XML, ABNF, BNF, IHD, JSGF]

The main goal of our conversion software is to allow our speech recognizer to process language models whose grammars are specified in JSGF or XML-SRGS. It is also desirable to support conversion in the reverse direction (i.e., IHD → JSGF and IHD → XML-SRGS). This serves a dual purpose. First, it allows indirect conversion from JSGF to XML-SRGS and vice versa through IHD. Second, it increases the flexibility and compatibility of our system by allowing different stages of recognition to use different grammar specifications. With this capability, a language model can be trained in our system (in IHD), and recognition can then be performed in a different system (in JSGF or XML-SRGS).

Two secondary goals of our conversion software are to determine grammar equivalence and to minimize grammars. Both require the ability to reduce each grammar format to a common elementary representation. Because textbook algorithms are available for equivalence checking and minimization, we chose normalized BNF as the common elementary format of our system.

An initial attempt was made to convert JSGF and XML-SRGS directly to BNF, but the two conversion algorithms handled regular expression structures differently. To standardize this handling, both JSGF and XML-SRGS are converted to a common ABNF format before regular expression operators are expanded.

Note that BNF and ABNF, as academic specifications, have no mechanism for specifying weights. Our system uses external structures to maintain weight information during these stages of conversion.
5
JSGF/XML-SRGS → ABNF
JSGF → ABNF
  Trivial
  Similar in syntax and structure to ABNF
XML-SRGS → ABNF
  Harder than JSGF
  Different in syntax and structure from ABNF
  Requires enumeration of certain repeat attributes

  XML-SRGS                            ABNF
  <item repeat='1-2'> a b </item>     <S>::=(ab)|(abab)
  <item repeat='2-'> a b </item>      <S>::=abab(ab)*
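The repeat enumeration shown in the table can be sketched in a few lines of Python. This is only an illustration, not the converter's actual code: the function name `expand_repeat` and the choice to pass the item body as an already-concatenated ABNF string are assumptions made for the example.

```python
def expand_repeat(body: str, repeat: str) -> str:
    """Expand an SRGS repeat value ("n", "m-n" or "m-") into an ABNF-style expression.

    `body` is the item content already written as an ABNF symbol string,
    e.g. "ab" for <item> a b </item>. (Hypothetical helper, for illustration.)
    """
    if "-" in repeat:
        low_text, high_text = repeat.split("-", 1)
        low = int(low_text)
        high = int(high_text) if high_text else None  # "m-" has no upper bound
    else:
        low = high = int(repeat)

    if high is None:
        # "m-": m mandatory copies followed by a Kleene star.
        return body * low + f"({body})*"
    if high < low:
        raise ValueError(f"invalid repeat range: {repeat!r}")

    # "m-n": enumerate one alternative per allowed repetition count.
    alternatives = []
    for k in range(low, high + 1):
        alternatives.append(f"({body * k})" if k else "ε")
    return "|".join(alternatives)


print("<S>::=" + expand_repeat("ab", "1-2"))  # <S>::=(ab)|(abab)
print("<S>::=" + expand_repeat("ab", "2-"))   # <S>::=abab(ab)*
```

Bounded ranges are enumerated as alternatives, while an open-ended range only needs its mandatory prefix written out before the star.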
6
JSGF/XML-SRGS → ABNF
XML-SRGS → ABNF (continued)
  Different weighting mechanisms (weight and repeat-prob attributes)

  a <item repeat="0-" repeat-prob=".45"> b </item>
  <one-of> <item weight=".4">c</item> <item weight=".6">d</item> </one-of>

The weight and repeat-prob attributes do not convey the same concept. Nonetheless, for conversion to an accepting finite state machine, both pieces of information must be taken into account.
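Since ABNF and BNF carry no weights, the deck mentions keeping weight information in external structures. A minimal sketch of that idea, using Python's standard `xml.etree.ElementTree`, might collect both attributes into a side table before conversion. The wrapping `<rule id="s">` element, the function name `collect_weights`, and the table layout are assumptions for this example, not the authors' design.

```python
import xml.etree.ElementTree as ET

# The slide's fragment, wrapped in a <rule> element (an assumption) so that
# it parses as a single well-formed XML document.
SNIPPET = """
<rule id="s">
  a
  <item repeat="0-" repeat-prob=".45"> b </item>
  <one-of>
    <item weight=".4">c</item>
    <item weight=".6">d</item>
  </one-of>
</rule>
"""


def collect_weights(xml_text):
    """Return a side table mapping (item index, item text) to its weighting attributes."""
    root = ET.fromstring(xml_text)
    table = {}
    for index, item in enumerate(root.iter("item")):  # document order
        entry = {}
        if "weight" in item.attrib:
            entry["weight"] = float(item.attrib["weight"])
        if "repeat-prob" in item.attrib:
            entry["repeat_prob"] = float(item.attrib["repeat-prob"])
        if entry:
            table[(index, (item.text or "").strip())] = entry
    return table


for key, value in collect_weights(SNIPPET).items():
    print(key, value)
```

The two attributes are stored under separate keys because, as noted above, they do not mean the same thing: one weights an alternative, the other weights a repetition.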
7
ABNF → BNF
Normalized BNF
  Consists of rules of the following formats:
    (RULE_NAME)::=(TERMINAL),(NON_TERMINAL)
    (RULE_NAME)::=(NON_TERMINAL)
    (RULE_NAME)::=ε
ABNF → BNF
  Complicated
  Accomplished using a recursive algorithm that extracts sets of normalized BNF rules from a set of ABNF rules
  Break each rule into multiple rules at each top-level alternation. Recurse on each rule.
  For each concatenation, Kleene star, or Kleene plus, extract a set of left symbols and a set of right symbols.
  For n left symbols and m right symbols, create n × m connecting rules.
[Diagram labels: ABNF, BNF]
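The slide only outlines the recursive algorithm, so the following is a stand-in rather than the authors' method: it uses the textbook position (Glushkov-style) construction, which also connects every "left" symbol to every "right" symbol at each concatenation, star, or plus. The tuple-based AST, the function names, and the `RS`/`RT`/`R<n>` rule names are assumptions for this sketch. Alternation is handled in the same pass instead of by splitting rules first.

```python
def analyse(node, symbols, follow):
    """Return (nullable, first, last) for an AST node; record symbols and follow sets."""
    kind = node[0]
    if kind == "sym":
        position = len(symbols)          # each terminal occurrence gets its own position
        symbols.append(node[1])
        follow[position] = set()
        return False, {position}, {position}
    if kind == "alt":
        n1, f1, l1 = analyse(node[1], symbols, follow)
        n2, f2, l2 = analyse(node[2], symbols, follow)
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":
        n1, f1, l1 = analyse(node[1], symbols, follow)
        n2, f2, l2 = analyse(node[2], symbols, follow)
        for left in l1:                  # n left symbols x m right symbols
            follow[left] |= f2
        first = (f1 | f2) if n1 else f1
        last = (l1 | l2) if n2 else l2
        return n1 and n2, first, last
    if kind in ("star", "plus"):
        n, f, l = analyse(node[1], symbols, follow)
        for left in l:                   # loop back from the end to the start
            follow[left] |= f
        return (kind == "star") or n, f, l
    raise ValueError(f"unknown node kind: {kind!r}")


def to_normalized_bnf(ast):
    """Return normalized BNF rules as (name, terminal, nonterminal) triples."""
    symbols, follow = [], {}
    nullable, first, last = analyse(ast, symbols, follow)
    rules = [("RS", None, f"R{p}") for p in sorted(first)]
    if nullable:
        rules.append(("RS", None, "RT"))
    for p, terminal in enumerate(symbols):
        for q in sorted(follow[p]):
            rules.append((f"R{p}", terminal, f"R{q}"))
        if p in last:
            rules.append((f"R{p}", terminal, "RT"))
    rules.append(("RT", None, None))     # RT::=ε
    return rules


def show(rule):
    """Format one triple in the three normalized BNF rule shapes."""
    name, terminal, target = rule
    if terminal is None and target is None:
        return f"{name}::=ε"
    if terminal is None:
        return f"{name}::={target}"
    return f"{name}::={terminal},{target}"


# <A>::=ab*
AB_STAR = ("cat", ("sym", "a"), ("star", ("sym", "b")))
for rule in to_normalized_bnf(AB_STAR):
    print(show(rule))
```

Every emitted rule has one of the three normalized shapes listed above, so the output can be fed directly to later stages that expect normalized BNF.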
8
BNF ↔ IHD
  Each arc translates to a normalized BNF rule
  Terminals correspond to nodes; concatenations correspond to arcs

BNF
  RS→R0       R3→C,R3
  RS→R1       R3→C,RT
  R0→A,R3     RT→ε
  R1→B,R3

IHD
  Nodes  1: A   2: B   3: C
  Arcs   (S,1)  (2,3)  (S,2)  (3,3)  (1,3)  (3,T)
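The arc-to-rule correspondence can be sketched as follows. This is a simplified illustration under assumptions: the dictionary and list layout for nodes and arcs is invented here, and rule names are derived directly from node numbers (R1, R2, R3), so they differ from the labels used on the slide.

```python
def ihd_to_bnf(nodes, arcs):
    """Translate digraph arcs into normalized BNF rules, one rule per arc.

    nodes maps a node id to its terminal; arcs is a list of (source, destination)
    pairs, where "S" and "T" stand for the start and terminal points.
    """
    rules = []
    for source, destination in arcs:
        target = f"R{destination}"
        if source == "S":
            # Leaving the start point emits no terminal.
            rules.append(f"RS→{target}")
        else:
            # Leaving a node emits that node's terminal, then continues at the destination.
            rules.append(f"R{source}→{nodes[source]},{target}")
    rules.append("RT→ε")  # reaching the terminal point ends the sentence
    return rules


NODES = {1: "A", 2: "B", 3: "C"}
ARCS = [("S", 1), ("S", 2), (1, 3), (2, 3), (3, 3), (3, "T")]

for rule in ihd_to_bnf(NODES, ARCS):
    print(rule)
```

For the six arcs on the slide this yields six arc rules plus the closing ε rule, which matches the count of seven rules in the BNF column.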
9
BNF → JSGF/XML-SRGS
  Rule-by-rule
  Trivial

BNF
  <A>::=aB   <B>::=bB   <B>::=ε
JSGF
  <A>=aB;   <B>=b*;
XML-SRGS
  <rule id="a"> a <ruleref uri="#b"/> </rule>
  <rule id="b"> <one-of> <item> b </item> <ruleref special="NULL"/> </one-of> </rule>
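A rule-by-rule translation to JSGF might look like the sketch below. It is an illustration only: the triple representation of rules and the function name `bnf_to_jsgf` are assumptions, and unlike the slide's output it does not collapse the recursive `<B>` rule into `b*`, but keeps the right recursion and uses JSGF's special `<NULL>` rule for ε.

```python
def bnf_to_jsgf(rules, start, grammar_name="converted"):
    """Emit a JSGF grammar from normalized BNF triples (name, terminal, nonterminal)."""
    alternatives = {}
    for name, terminal, target in rules:
        if terminal is None and target is None:
            expansion = "<NULL>"                 # RULE::=ε
        elif terminal is None:
            expansion = f"<{target}>"            # RULE::=NON_TERMINAL
        else:
            expansion = f"{terminal} <{target}>"  # RULE::=TERMINAL,NON_TERMINAL
        alternatives.setdefault(name, []).append(expansion)

    lines = ["#JSGF V1.0;", f"grammar {grammar_name};"]
    for name, expansions in alternatives.items():
        prefix = "public " if name == start else ""
        body = " | ".join(expansions)
        lines.append(f"{prefix}<{name}> = {body};")
    return "\n".join(lines)


# <A>::=aB  <B>::=bB  <B>::=ε
RULES = [("A", "a", "B"), ("B", "b", "B"), ("B", None, None)]
print(bnf_to_jsgf(RULES, start="A"))
```

Grouping rules that share a name into one JSGF definition with `|` keeps the output to one definition per rule name.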
10
Software Tools
ISIP Network Converter
  Console tool to perform conversions to and from arbitrary grammar formats
ISIP Network Builder
  Java-based graphical tool to design grammars as finite state machines
  Can export grammars to JSGF, XML-SRGS, ABNF, BNF, and IHD
ISIP Language Model Tester
  Console tool for testing grammars
  Can generate valid sentences in a given grammar
  Can parse sentences and determine whether they are accepted by a given grammar
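The two tester capabilities, generating sentences and checking acceptance, can be illustrated on a normalized BNF grammar. This is not the ISIP tool's interface or implementation; the triple representation and the function names are assumptions. The generator assumes every rule name has at least one rule and that the grammar can reach an ε rule.

```python
import random

# The grammar from the BNF ↔ IHD slide as (name, terminal, nonterminal) triples.
RULES = [
    ("RS", None, "R0"), ("RS", None, "R1"),
    ("R0", "A", "R3"), ("R1", "B", "R3"),
    ("R3", "C", "R3"), ("R3", "C", "RT"),
    ("RT", None, None),
]


def closure(names, rules):
    """Follow NAME::=NAME rules until no new rule names appear."""
    seen, stack = set(names), list(names)
    while stack:
        name = stack.pop()
        for head, terminal, target in rules:
            if head == name and terminal is None and target is not None and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def accepts(words, rules, start="RS"):
    """Simulate the grammar as a nondeterministic finite state machine."""
    current = closure({start}, rules)
    for word in words:
        moved = {target for head, terminal, target in rules
                 if head in current and terminal == word}
        current = closure(moved, rules)
    # Accept if some reachable rule name can derive ε.
    return any(head in current and terminal is None and target is None
               for head, terminal, target in rules)


def generate(rules, start="RS", rng=random):
    """Random walk through the rules; returns one valid sentence as a word list."""
    words, name = [], start
    while True:
        head, terminal, target = rng.choice([r for r in rules if r[0] == name])
        if terminal is not None:
            words.append(terminal)
        if target is None:
            return words
        name = target


print(accepts(["A", "C", "C"], RULES))   # True
print(accepts(["A"], RULES))             # False
print(generate(RULES, rng=random.Random(0)))
```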
11
Minimization
  Happens in BNF
  Conversion (especially from XML-SRGS) introduces redundancies.
  Iterate over rule set, merging redundant rules
  Rules can be merged if the nonterminals of both rules reference the same terminal
  Example: [figure]
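The merge criterion on the slide is terse, so the sketch below implements one possible reading of it: two rule names are treated as redundant when their sets of right-hand sides are identical, and merging repeats until nothing changes. This interpretation, the triple representation, and the function name `merge_redundant` are all assumptions. It is a simple redundancy merge, not a full state-minimization algorithm.

```python
def merge_redundant(rules, start="RS"):
    """Merge rule names whose right-hand sides are identical, until a fixed point."""
    rules = set(rules)
    while True:
        # Collect the set of right-hand sides for every rule name.
        bodies = {}
        for head, terminal, target in rules:
            bodies.setdefault(head, set()).add((terminal, target))

        # Pick one representative name per distinct body (start rule first).
        canonical, rename = {}, {}
        for head in sorted(bodies, key=lambda h: (h != start, h)):
            key = frozenset(bodies[head])
            if key in canonical:
                rename[head] = canonical[key]
            else:
                canonical[key] = head

        if not rename:
            return sorted(rules, key=lambda r: tuple("" if x is None else x for x in r))

        # Rewrite heads and references; duplicates collapse because rules is a set.
        rules = {(rename.get(h, h), t, rename.get(g, g)) for h, t, g in rules}


# R1/R2 and R3/R4 are duplicated paths for the sentence "a b".
REDUNDANT = [
    ("RS", None, "R1"), ("RS", None, "R2"),
    ("R1", "a", "R3"), ("R2", "a", "R4"),
    ("R3", "b", "RT"), ("R4", "b", "RT"),
    ("RT", None, None),
]
for rule in merge_redundant(REDUNDANT):
    print(rule)
```

Merging R3 and R4 in the first pass is what makes R1 and R2 identical in the second, which is why the procedure iterates instead of making a single sweep.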