Published by Tiphaine Lamontagne. Modified over 5 years ago.
1
Overview of Language Model Classes and Release Progress
[Title-slide diagram; recoverable labels: min, XML, ABNF, IHD, BNF, JSGF]
Daniel May
Intelligent Electronic Systems
Human and Systems Engineering
Department of Electrical and Computer Engineering
2
Language Model Classes
Overview
Language Model Classes
  LanguageModelIHD: Explanation of IHD→BNF and BNF→IHD conversions.
  LanguageModelABNF: Explanation and example of the ABNF→BNF conversion algorithm.
  LanguageModelBNF: Explanation of the graph minimization algorithm.
  LanguageModelXML and LanguageModelJSGF
Network Utilities: isip_network_builder, isip_network_converter
Release Progress
  Outstanding Issues
  Plan
  Deadline
3
IHD→BNF
Class: LanguageModelIHD
What is Normalized BNF?
  Normalized BNF consists only of the following three rule forms:
    1. (RULE_NAME)→(TERMINAL),(NON_TERMINAL)
    2. (RULE_NAME)→(NON_TERMINAL)
    3. (RULE_NAME)→(EPSILON)
IHD→BNF
  Straightforward conversion process
  Each IHD arc is converted to a normalized BNF rule
Example:
  IHD: [graph figure]
  BNF: RS→R0; RS→R1; R0→A,R3; R1→B,R3; R3→C,R3; R3→C,RT; RT→ε
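A minimal sketch of the arc-to-rule mapping, for illustration only. The function name ihd_to_bnf, the dictionary and list layout, and the explicit rule-name table are assumptions and are not taken from LanguageModelIHD. The graph used is the three-node graph (A, B, C) that the example rules describe; arc weights are omitted.

```python
# Sketch only: data layout and names are hypothetical, not the ISIP class API.

def ihd_to_bnf(nodes, arcs, names):
    """Convert each IHD arc into one normalized BNF rule."""
    rules = []
    for src, dst in arcs:
        target = "RT" if dst == "T" else names[dst]
        if src == "S":
            # Arc leaving the start node: form 2, (RULE_NAME)→(NON_TERMINAL)
            rules.append(f"RS→{target}")
        else:
            # Arc leaving a symbol node: form 1, (RULE_NAME)→(TERMINAL),(NON_TERMINAL)
            rules.append(f"{names[src]}→{nodes[src]},{target}")
    # The terminal node contributes the single epsilon rule (form 3).
    rules.append("RT→ε")
    return rules

nodes = {1: "A", 2: "B", 3: "C"}
names = {1: "R0", 2: "R1", 3: "R3"}  # rule names as they appear in the example
arcs = [("S", 1), ("S", 2), (1, 3), (2, 3), (3, 3), (3, "T")]
bnf_rules = ihd_to_bnf(nodes, arcs, names)
```

Each arc yields exactly one rule, so six arcs plus the epsilon rule give the seven rules of the example.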
4
BNF→IHD
Class: LanguageModelIHD
Straightforward conversion process
  Simply the reverse of the IHD→BNF process
  Unique nodes are identified by unique instances of (RULE_NAME)→(TERMINAL)
  Concatenation tokens (",") correspond to arcs and are weighted
Example:
  BNF: RS→R0; RS→R1; R0→A,R3; R1→B,R3; R3→C,R3; R3→C,RT; RT→ε
  IHD:
    Nodes: 1: A; 2: B; 3: C
    Arcs: (S,1) (S,2) (1,3) (2,3) (3,3) (3,T)
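The reverse direction can be sketched in two passes: first collect one node per unique (RULE_NAME)→(TERMINAL) pair, then turn each rule into an arc. The function name bnf_to_ihd and the string-based rule format are assumptions made for this illustration, and weights are left out. The sketch also assumes that only the start rule RS uses the (RULE_NAME)→(NON_TERMINAL) form.

```python
# Sketch only: names and data layout are hypothetical, weights are omitted.

def bnf_to_ihd(rules):
    parsed = []
    for rule in rules:
        name, rhs = rule.split("→")
        parsed.append((name, rhs.split(",")))

    # Pass 1: one node per unique (rule name, terminal) pair.
    node_id, by_name, nodes = {}, {}, {}
    for name, parts in parsed:
        if len(parts) == 2 and (name, parts[0]) not in node_id:
            new_id = len(node_id) + 1
            node_id[(name, parts[0])] = new_id
            by_name.setdefault(name, []).append(new_id)
            nodes[new_id] = parts[0]

    # Pass 2: every concatenation (and every start rule) becomes an arc.
    def targets(rule_name):
        return ["T"] if rule_name == "RT" else by_name[rule_name]

    arcs = []
    for name, parts in parsed:
        if len(parts) == 2:
            src = node_id[(name, parts[0])]
            arcs.extend((src, dst) for dst in targets(parts[1]))
        elif name == "RS":
            arcs.extend(("S", dst) for dst in targets(parts[0]))
        # The remaining form, RT→ε, marks the terminal node and adds no arc.
    return nodes, arcs

rules = ["RS→R0", "RS→R1", "R0→A,R3", "R1→B,R3", "R3→C,R3", "R3→C,RT", "RT→ε"]
ihd_nodes, ihd_arcs = bnf_to_ihd(rules)
```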
5
ABNF→BNF
Class: LanguageModelABNF
Complicated!
  Accomplished using a recursive algorithm that extracts sets of "right symbols" and "left symbols" and builds a set of normalized BNF rules.
  A set of right and left symbols is found when a concatenation, Kleene star ("*"), or Kleene plus ("+") is encountered.
  If n left symbols and m right symbols are found, n × m BNF rules are created.
ABNF rules are processed one at a time.
  We iterate over the tokens in each rule from left to right and look for concatenation, Kleene star, and Kleene plus tokens.
  When one of these tokens is encountered, the recursive methods findLeftSymbols() and findRightSymbols() are called. Each returns a set of symbols.
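The n × m step can be sketched as a plain cross product of the two symbol sets. The helper build_rules, the symbols X, Y, P, Q, and the rule-name table are all hypothetical and exist only to show the counting; they are not part of LanguageModelABNF.

```python
from itertools import product

def build_rules(left_symbols, right_symbols, rule_name):
    """Create one normalized BNF rule per (left, right) pair: n x m rules in total."""
    return [
        f"{rule_name[left]}→{left},{rule_name[right]}"
        for left, right in product(left_symbols, right_symbols)
    ]

# Hypothetical symbols and rule names, chosen only for illustration.
rule_name = {"X": "R0", "Y": "R1", "P": "R2", "Q": "R3"}
rules = build_rules(["X", "Y"], ["P", "Q"], rule_name)
```

With two left symbols and two right symbols, the call produces 2 × 2 = 4 rules.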
6
Class: LanguageModelABNF
Example
We must first construct a set of nodes using unique combinations of (RULE_NAME)→(TERMINAL).
IHD: [graph figure]
ABNF: RS→R0; R0→A,*(B|+(C,D)|E),F,RT; RT→ε
Nodes: R0→A; R1→B; R2→C; R3→D; R4→E; R5→F
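A minimal sketch of the node-naming step, assuming a toy tokenizer for this example's notation: single characters are terminals, and a name that starts with R and has more than one character is a rule name. The regular expression and the function name_terminals are assumptions for illustration. The slides do not say whether a repeated terminal shares a node, so the sketch simply gives every terminal occurrence its own name.

```python
import re

# Toy tokenizer for the example notation only (an assumption, not the ISIP parser).
TOKEN = re.compile(r"R\w+|\w|[,*+|()]")

def name_terminals(abnf_rhs):
    """Pair each terminal occurrence with a fresh rule name R0, R1, ..."""
    pairs = []
    for token in TOKEN.findall(abnf_rhs):
        is_rule_name = token.startswith("R") and len(token) > 1
        if token.isalnum() and not is_rule_name:
            pairs.append((f"R{len(pairs)}", token))
    return pairs

node_rules = name_terminals("A,*(B|+(C,D)|E),F,RT")
```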
7
Class: LanguageModelABNF
Example
IHD: [graph figure]
ABNF: RS→R0; R0→A,*(B|+(C,D)|E),F,RT; RT→ε
Current Rule: RS→R0
BNF Rules: (none yet)
Left Symbols: (none)
Right Symbols: (none)
This rule contains no tokens of interest, so we move on to the next rule.
8
Class: LanguageModelABNF
Example
IHD: [graph figure]
ABNF: RS→R0; R0→A,*(B|+(C,D)|E),F,RT; RT→ε
Current Rule: R0→A,*(B|+(C,D)|E),F,RT
BNF Rules: (none yet)
Left Symbols: A
Right Symbols: (none)
As we iterate from left to right, we encounter a concatenation token. The findLeftSymbols method returns "A".
9
Class: LanguageModelABNF
Example
IHD: [graph figure]
ABNF: RS→R0; R0→A,*(B|+(C,D)|E),F,RT; RT→ε
Current Rule: R0→A,*(B|+(C,D)|E),F,RT
BNF Rules: (none yet)
Left Symbols: A
Right Symbols: (none)
When findRightSymbols is called, we encounter a Kleene star.
10
Class: LanguageModelABNF
Example
IHD: [graph figure]
ABNF: RS→R0; R0→A,*(B|+(C,D)|E),F,RT; RT→ε
Current Rule: R0→A,*(B|+(C,D)|E),F,RT
BNF Rules: (none yet)
Left Symbols: A
Right Symbols: F
The findRightSymbols method must be called on the token following the next concatenation at this nesting level.
11
Class: LanguageModelABNF
Example
IHD: [graph figure]
ABNF: RS→R0; R0→A,*(B|+(C,D)|E),F,RT; RT→ε
Current Rule: R0→A,*(B|+(C,D)|E),F,RT
BNF Rules: (none yet)
Left Symbols: A
Right Symbols: F
Next, findRightSymbols is called on the token following the Kleene star. In this case, it is an opening parenthesis.
12
Class: LanguageModelABNF
Example
IHD: [graph figure]
ABNF: RS→R0; R0→A,*(B|+(C,D)|E),F,RT; RT→ε
Current Rule: R0→A,*(B|+(C,D)|E),F,RT
BNF Rules: (none yet)
Left Symbols: A
Right Symbols: F, B
For an opening parenthesis, we call findRightSymbols on the token following it.
13
Class: LanguageModelABNF
Example
IHD: [graph figure]
ABNF: RS→R0; R0→A,*(B|+(C,D)|E),F,RT; RT→ε
Current Rule: R0→A,*(B|+(C,D)|E),F,RT
BNF Rules: (none yet)
Left Symbols: A
Right Symbols: F, B, E
We also look for alternation tokens and call findRightSymbols on the tokens following them.
14
Class: LanguageModelABNF
Example
IHD: [graph figure]
ABNF: RS→R0; R0→A,*(B|+(C,D)|E),F,RT; RT→ε
Current Rule: R0→A,*(B|+(C,D)|E),F,RT
BNF Rules: (none yet)
Left Symbols: A
Right Symbols: F, B, E, C
The Kleene plus is ignored since it isn't currently relevant, and findRightSymbols is called on the open parenthesis.
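The right-symbol search for everything that follows "A" can be approximated with a textbook first-set computation over a small expression tree. This is a substituted technique, not the token-level recursion of findRightSymbols: the tuple-based tree, the helper nullable, and the function right_symbols are assumptions made for this sketch.

```python
# Sketch only: a first-set computation standing in for findRightSymbols.
# Expressions are tuples: ("sym", name), ("seq", [..]), ("alt", [..]),
# ("star", expr), ("plus", expr). This representation is hypothetical.

def nullable(expr):
    """True if the expression can match the empty sequence."""
    kind = expr[0]
    if kind == "sym":
        return False
    if kind == "star":
        return True
    if kind == "plus":
        return nullable(expr[1])
    if kind == "alt":
        return any(nullable(e) for e in expr[1])
    return all(nullable(e) for e in expr[1])  # "seq"

def right_symbols(expr):
    """Symbols that can appear first when matching the expression."""
    kind = expr[0]
    if kind == "sym":
        return {expr[1]}
    if kind in ("star", "plus"):
        return right_symbols(expr[1])
    if kind == "alt":
        return set().union(*(right_symbols(e) for e in expr[1]))
    found = set()
    for e in expr[1]:  # "seq": keep going while the prefix can be skipped
        found |= right_symbols(e)
        if not nullable(e):
            break
    return found

# Everything after "A" in R0→A,*(B|+(C,D)|E),F,RT (RT omitted for brevity).
group = ("star", ("alt", [("sym", "B"),
                          ("plus", ("seq", [("sym", "C"), ("sym", "D")])),
                          ("sym", "E")]))
rest = ("seq", [group, ("sym", "F")])
found = right_symbols(rest)
```

Because the starred group may match nothing, the search continues past it to F, which matches the step in which the token after the next concatenation is examined.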
15
Class: LanguageModelABNF
Example
IHD: [graph figure]
ABNF: RS→R0; R0→A,*(B|+(C,D)|E),F,RT; RT→ε
Current Rule: R0→A,*(B|+(C,D)|E),F,RT
BNF Rules: R0→A,R5; R0→A,R1; R0→A,R4; R0→A,R2
Left Symbols: A
Right Symbols: F, B, E, C
Now we can construct a set of BNF rules from the right and left symbols.
16
Class: LanguageModelABNF
Example
IHD: [graph figure]
ABNF: RS→R0; R0→A,*(B|+(C,D)|E),F,RT; RT→ε
Current Rule: R0→A,*(B|+(C,D)|E),F,RT
BNF Rules: R0→A,R5; R0→A,R1; R0→A,R4; R0→A,R2
Left Symbols: (none)
Right Symbols: (none)
The next token of interest is a Kleene star. For these, we want a self loop on each rule segment separated by alternations.
17
Class: LanguageModelABNF
Example
IHD: [graph figure]
ABNF: RS→R0; R0→A,*(B|+(C,D)|E),F,RT; RT→ε
Current Rule: R0→A,*(B|+(C,D)|E),F,RT
BNF Rules: R0→A,R5; R0→A,R1; R0→A,R4; R0→A,R2
Left Symbols: (none)
Right Symbols: (none)
Since the following token is an open parenthesis, we find all rule segments separated by alternation tokens.
18
Class: LanguageModelABNF
Example
IHD: [graph figure]
ABNF: RS→R0; R0→A,*(B|+(C,D)|E),F,RT; RT→ε
Current Rule: R0→A,*(B|+(C,D)|E),F,RT
BNF Rules: R0→A,R5; R0→A,R1; R0→A,R4; R0→A,R2
Left Symbols: (none)
Right Symbols: (none)
A different set of rules is created for each segment.
19
Class: LanguageModelABNF
Example
IHD: [graph figure]
ABNF: RS→R0; R0→A,*(B|+(C,D)|E),F,RT; RT→ε
Current Rule: R0→A,*(B|+(C,D)|E),F,RT
BNF Rules: R0→A,R5; R0→A,R1; R0→A,R4; R0→A,R2
Left Symbols: (none)
Right Symbols: (none)
findRightSymbols is called on the first token of each segment, and findLeftSymbols is called on the last.
20
Class: LanguageModelABNF
Example
IHD: [graph figure]
ABNF: RS→R0; R0→A,*(B|+(C,D)|E),F,RT; RT→ε
Current Rule: R0→A,*(B|+(C,D)|E),F,RT
BNF Rules: R0→A,R5; R0→A,R1; R0→A,R4; R0→A,R2; R1→B,R1
Left Symbols: B
Right Symbols: B
findRightSymbols is called on the first token of each segment, and findLeftSymbols is called on the last.
21
Class: LanguageModelABNF
Example
IHD: [graph figure]
ABNF: RS→R0; R0→A,*(B|+(C,D)|E),F,RT; RT→ε
Current Rule: R0→A,*(B|+(C,D)|E),F,RT
BNF Rules: R0→A,R5; R0→A,R1; R0→A,R4; R0→A,R2; R1→B,R1; R3→D,R2
Left Symbols: D
Right Symbols: C
findRightSymbols is called on the first token of each segment, and findLeftSymbols is called on the last.
22
Class: LanguageModelABNF
Example
IHD: [graph figure]
ABNF: RS→R0; R0→A,*(B|+(C,D)|E),F,RT; RT→ε
Current Rule: R0→A,*(B|+(C,D)|E),F,RT
BNF Rules: R0→A,R5; R0→A,R1; R0→A,R4; R0→A,R2; R1→B,R1; R3→D,R2; R4→E,R4
Left Symbols: E
Right Symbols: E
findRightSymbols is called on the first token of each segment, and findLeftSymbols is called on the last.
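The per-segment loop rules for the Kleene star can be sketched as follows, taking the right and left symbols of each segment as already computed. The function loop_rules and the list-of-pairs layout are hypothetical; the rule names are those of the node table (R1→B, R2→C, R3→D, R4→E).

```python
# Sketch only: segments are given as (right symbols, left symbols) pairs,
# i.e. the results of findRightSymbols on the first token of a segment and
# findLeftSymbols on its last token. Names and layout are assumptions.

def loop_rules(segments, rule_name):
    rules = []
    for right, left in segments:
        for last in left:
            for first in right:
                # Loop from the end of the segment back to its beginning.
                rules.append(f"{rule_name[last]}→{last},{rule_name[first]}")
    return rules

rule_name = {"B": "R1", "C": "R2", "D": "R3", "E": "R4"}
segments = [
    (["B"], ["B"]),  # segment B
    (["C"], ["D"]),  # segment +(C,D)
    (["E"], ["E"]),  # segment E
]
star_rules = loop_rules(segments, rule_name)
```

The sketch follows the per-segment description on the slides; it does not add transitions from one alternative to another.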
23
Class: LanguageModelABNF
Example
IHD: [graph figure]
ABNF: RS→R0; R0→A,*(B|+(C,D)|E),F,RT; RT→ε
Current Rule: R0→A,*(B|+(C,D)|E),F,RT
BNF Rules: R0→A,R5; R0→A,R1; R0→A,R4; R0→A,R2; R1→B,R1; R3→D,R2; R4→E,R4; R2→C,R3
Left Symbols: C
Right Symbols: D
The next token of interest is another concatenation. Again, we find a set of right and left symbols and build rules.
24
Class: LanguageModelABNF
Example
IHD: [graph figure]
ABNF: RS→R0; R0→A,*(B|+(C,D)|E),F,RT; RT→ε
Current Rule: R0→A,*(B|+(C,D)|E),F,RT
BNF Rules: R0→A,R5; R0→A,R1; R0→A,R4; R0→A,R2; R1→B,R1; R3→D,R2; R4→E,R4; R2→C,R3; R4→E,R5; R3→D,R5; R1→B,R5
Left Symbols: E, D, B
Right Symbols: F
The next token of interest is another concatenation. Again, we find a set of right and left symbols and build rules.
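The left-symbol search mirrors the right-symbol search: it collects the symbols that can end an expression. The sketch below uses a textbook last-set computation over a hypothetical tuple-based expression tree, which stands in for the token-level recursion of findLeftSymbols; all function names are assumptions.

```python
# Sketch only: a last-set computation standing in for findLeftSymbols.
# Expressions are tuples: ("sym", name), ("seq", [..]), ("alt", [..]),
# ("star", expr), ("plus", expr). This representation is hypothetical.

def nullable(expr):
    kind = expr[0]
    if kind == "sym":
        return False
    if kind == "star":
        return True
    if kind == "plus":
        return nullable(expr[1])
    if kind == "alt":
        return any(nullable(e) for e in expr[1])
    return all(nullable(e) for e in expr[1])  # "seq"

def left_symbols(expr):
    """Symbols that can appear last when matching the expression."""
    kind = expr[0]
    if kind == "sym":
        return {expr[1]}
    if kind in ("star", "plus"):
        return left_symbols(expr[1])
    if kind == "alt":
        return set().union(*(left_symbols(e) for e in expr[1]))
    found = set()
    for e in reversed(expr[1]):  # "seq": scan from the right
        found |= left_symbols(e)
        if not nullable(e):
            break
    return found

# The group *(B|+(C,D)|E) that precedes F in the current rule.
group = ("star", ("alt", [("sym", "B"),
                          ("plus", ("seq", [("sym", "C"), ("sym", "D")])),
                          ("sym", "E")]))
rule_name = {"B": "R1", "D": "R3", "E": "R4", "F": "R5"}
lefts = left_symbols(group)
new_rules = sorted(f"{rule_name[s]}→{s},{rule_name['F']}" for s in lefts)
```

Three left symbols and one right symbol give 3 × 1 = 3 new rules.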
25
Class: LanguageModelABNF
Example
IHD: [graph figure]
ABNF: RS→R0; R0→A,*(B|+(C,D)|E),F,RT; RT→ε
Current Rule: R0→A,*(B|+(C,D)|E),F,RT
BNF Rules: R0→A,R5; R0→A,R1; R0→A,R4; R0→A,R2; R1→B,R1; R3→D,R2; R4→E,R4; R2→C,R3; R4→E,R5; R3→D,R5; R1→B,R5
Left Symbols: F
Right Symbols: (none)
The next token of interest is another concatenation, but this time the right symbol is a non-terminal.
26
Class: LanguageModelABNF
Example
IHD: [graph figure]
ABNF: RS→R0; R0→A,*(B|+(C,D)|E),F,RT; RT→ε
Current Rule: R0→A,*(B|+(C,D)|E),F,RT
BNF Rules: R0→A,R5; R0→A,R1; R0→A,R4; R0→A,R2; R1→B,R1; R3→D,R2; R4→E,R4; R2→C,R3; R4→E,R5; R3→D,R5; R1→B,R5; R5→F,RT; RT→ε
Left Symbols: F
Right Symbols: ε
When findRightSymbols is called on a non-terminal, it is called in turn on the first token of the referenced rule.
27
Class: LanguageModelABNF
Example
IHD: [graph figure]
ABNF: RS→R0; R0→A,*(B|+(C,D)|E),F,RT; RT→ε
Current Rule: RS→R0
BNF Rules: R0→A,R5; R0→A,R1; R0→A,R4; R0→A,R2; R1→B,R1; R3→D,R2; R4→E,R4; R2→C,R3; R4→E,R5; R3→D,R5; R1→B,R5; R5→F,RT; RT→ε; RS→R0
Left Symbols: Start
Right Symbols: A
BNF start rules are found by calling findRightSymbols on the first token of the ABNF start rules.
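Resolving a non-terminal can be sketched as a lookup followed by a recursive call on the first token of the referenced rule. The function right_symbols_of and the dictionary layout are hypothetical. The sketch assumes that the first token of each referenced rule is a plain symbol or another rule name, not an operator or parenthesis.

```python
# Sketch only: grammar maps a rule name to the token list of its right-hand side.
# Only the first token of each rule is consulted (an assumption of this sketch).

def right_symbols_of(token, grammar, seen=None):
    seen = set() if seen is None else seen
    if token not in grammar:
        return {token}            # a terminal (or ε) is its own right symbol
    if token in seen:
        return set()              # guard against rules that refer to themselves
    seen.add(token)
    return right_symbols_of(grammar[token][0], grammar, seen)

grammar = {
    "RS": ["R0"],
    "R0": ["A", ",", "*", "(", "B", "|", "+", "(", "C", ",", "D", ")", "|", "E", ")",
           ",", "F", ",", "RT"],
    "RT": ["ε"],
}
after_f = right_symbols_of("RT", grammar)   # right symbol of F,RT
start = right_symbols_of("RS", grammar)     # right symbols of the start rule
```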
28
Class: LanguageModelABNF
Weights
  ABNF does not have a mechanism for defining weights on arcs because ABNF has no knowledge of arcs. Arcs are just implied by the grammar representation.
  When converting from IHD to any other format that uses ABNF as an intermediate, weights are included on the open parenthesis tokens preceding non-terminal and terminal symbols.
  In some cases, the ABNF rules must be restructured to support weights. This will only be the case if the source of the grammar is not ISIP internal.
Testing
  The ABNF→BNF algorithm has been thoroughly tested on ABNF grammars derived from XML, but more testing needs to be done on arbitrary ABNF grammars.
29
Graph Minimization Class: LanguageModelBNF
Converting from XML introduces redundancy. Although the resulting graphs are equivalent to the originals, they are much larger and nearly impossible to interpret visually.
The minimize method in LanguageModelBNF can be used to remove redundancy once the language model is in BNF representation.
The algorithm iterates over all rule pairs and determines whether the two rules can be merged into a single rule.
Rules can be merged if the non-terminals of both rules reference the same terminal and if the weights on the concatenation tokens are the same.
When two rules are merged, all other rules that refer to them must be updated.
Example: [graph figure]
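One plausible reading of the merge step, sketched under stated assumptions: two rule names are merged when they emit the same terminal with the same weights and also lead to the same successors. The successor condition is an assumption of this sketch and is not stated on the slide. The tuple layout and the function minimize are hypothetical and do not reproduce the minimize method of LanguageModelBNF.

```python
# Sketch only: rules are (name, terminal, target, weight) tuples; a rule of the
# form (RULE_NAME)→(NON_TERMINAL) has terminal None. Layout is hypothetical.

def minimize(rules):
    rules = set(rules)
    while True:
        # Signature of a rule name: every (terminal, target, weight) it produces.
        signature = {}
        for name, terminal, target, weight in rules:
            if terminal is not None:
                signature.setdefault(name, set()).add((terminal, target, weight))

        # Rule names with identical signatures are redundant copies.
        groups = {}
        for name, sig in signature.items():
            groups.setdefault(frozenset(sig), []).append(name)

        rename = {}
        for members in groups.values():
            keep = min(members)
            for other in members:
                if other != keep:
                    rename[other] = keep
        if not rename:
            return rules

        # Update every rule that mentions a merged name, then try again.
        rules = {
            (rename.get(name, name), terminal, rename.get(target, target), weight)
            for name, terminal, target, weight in rules
        }

redundant = [
    ("RS", None, "R1", 1.0), ("RS", None, "R2", 1.0),
    ("R1", "A", "R3", 1.0), ("R2", "A", "R3", 1.0),   # R1 and R2 are duplicates
    ("R3", "B", "RT", 1.0), ("RT", "ε", None, None),
]
minimized = minimize(redundant)
```

Like the approach described on the slide, this only removes duplicated structure; it is not a true FSM minimization.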
30
Class: LanguageModelBNF
Testing
  Currently, this minimization algorithm has been tested by visually inspecting the original graph and the resulting graph and verifying that they are equivalent.
  The isip_lm_tester tool will be able to test it more thoroughly once the language model parsing capability is complete.
  Eventually, we should probably implement a true FSM minimization algorithm.
31
LanguageModelXML LanguageModelJSGF
Class: LanguageModelXML and LanguageModelJSGF
LanguageModelXML
  Wesley has completed this class and checked it in. Minor changes are made every once in a while, but overall, the conversions from BNF to XML and from XML to ABNF are working fine.
LanguageModelJSGF
  This class will be implemented similarly to LanguageModelXML. The underlying JSGF representation is ABNF.
  JSGF parsing algorithms already exist, but currently the JSGF tokens are converted directly to IHD.
  This was supposed to be finished several weeks ago, but issues regarding ABNF to BNF conversion and graph minimization have caused delays.
32
isip_network_converter
Other Language Model Related Utilities
isip_network_converter
  Changes have been made to incorporate XML, BNF, and ABNF.
  A minimize option has been added that invokes the minimization routine when the language model is in BNF representation.
isip_network_builder
  The changes to allow network_builder to save in other formats are pending.
isip_lm_tester
  Won is in the process of adding parsing capability to this tool. Currently, the tool can only generate random transcriptions.
  Soon, it will be able to parse transcriptions and verify that they are valid given a particular language model.
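Checking whether a transcription is valid for a language model can be sketched as a walk over normalized BNF rules. This is only an illustration of the idea: the function accepts and the tuple layout are hypothetical and are not the isip_lm_tester implementation. The grammar is the small IHD→BNF example (RS→R0, RS→R1, R0→A,R3, R1→B,R3, R3→C,R3, R3→C,RT, RT→ε).

```python
# Sketch only: rules are (name, terminal, target) tuples; terminal is None for
# (RULE_NAME)→(NON_TERMINAL) and "ε" for the terminating rule. Weights omitted.

def accepts(rules, words, start="RS"):
    def closure(names):
        # Follow rules of the form name→non_terminal without consuming a word.
        seen, stack = set(names), list(names)
        while stack:
            current = stack.pop()
            for name, terminal, target in rules:
                if name == current and terminal is None and target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen

    active = closure({start})
    for word in words:
        reached = {target for name, terminal, target in rules
                   if name in active and terminal == word}
        active = closure(reached)
        if not active:
            return False
    # Valid only if some active rule name can terminate with ε.
    return any(name in active and terminal == "ε" for name, terminal, _ in rules)

rules = [
    ("RS", None, "R0"), ("RS", None, "R1"),
    ("R0", "A", "R3"), ("R1", "B", "R3"),
    ("R3", "C", "R3"), ("R3", "C", "RT"),
    ("RT", "ε", None),
]
valid = accepts(rules, ["A", "C", "C"])
invalid = accepts(rules, ["C", "A"])
```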
33
Release Progress
Outstanding Issues
  LanguageModelJSGF (Daniel)
  Diagnose methods and documentation (Daniel, Seungchan, Ted)
  isip_lm_tester parsing capability (Won)
  isip_transform and isip_transform_builder (Sridhar)
  Varmint backlog (Everyone)
Schedule/Deadline
  March 10: All code and documentation will be completed, tested, and checked in (code freeze).
  After March 10, we will begin running regression and code integrity tests.
  March 31: Release date