Download presentation
Presentation is loading. Please wait.
1
Tools and Analyses for Ambiguous Input Streams Andrew Begel and Susan L. Graham University of California, Berkeley LDTA Workshop - April 3, 2004
2
2April 3, 2004LDTA 2004 Harmonia: Language-aware Editing Programming by Voice –Code dictation –Voice-based editing commands Program Transformations –Transformation actions –Pattern-matching constructs
3
3April 3, 2004LDTA 2004 Harmonia: Language-aware Editing Programming by Voice –Code dictation –Voice-based editing commands Program Transformations –Transformation actions –Pattern-matching constructs Human Speech
4
4April 3, 2004LDTA 2004 Harmonia: Language-aware Editing Programming by Voice –Code dictation –Voice-based editing commands Program Transformations –Transformation actions –Pattern-matching constructs Human Speech Embedded Languages
5
5April 3, 2004LDTA 2004 Harmonia: Language-aware Editing Programming by Voice –Code dictation –Voice-based editing commands Program Transformations –Transformation actions –Pattern-matching constructs Each kind of input stream ambiguity requires new language analyses Human Speech Embedded Languages
6
6April 3, 2004LDTA 2004 Speech Example for (int i = 0; i < 10; i++ ) { } for int i equals zero i less than ten i plus plus
7
7April 3, 2004LDTA 2004 Ambiguities for (int i = 0; i < 10; i++ ) { } 4 int eye equals 0 aye less then 10 i plus plus
8
8April 3, 2004LDTA 2004 Ambiguities for (int i = 0; i < 10; i++ ) { } 4 int eye equals 0 aye less then 10 i plus plus KW or #? ID Spelling? KW or ID?
9
9April 3, 2004LDTA 2004 Another Utterance for times ate equals zero two plus equals one
10
10April 3, 2004LDTA 2004 Many Valid Parses! 4 * 8 = zero; to += won for times ate equals zero two plus equals one for (times; ate == 0; to += 1) { } fore.times(8).equalsZero(2, plus == 1)
11
11April 3, 2004LDTA 2004 Embedded Language Example C and Regexps embedded in Flex Flex Rule for Identifiers [_a-zA-Z]([_a-zA-Z0-9])* i++; RETURN_TOKEN(ID);
12
12April 3, 2004LDTA 2004 Embedded Language Example C and Regexps embedded in Flex Flex Rule for Identifiers [_a-zA-Z]([_a-zA-Z0-9])* i++; RETURN_TOKEN(ID); Why not this interpretation? [_a-zA-Z]([_a-zA-Z0-9])* i++ ; RETURN_TOKEN(ID);
13
13April 3, 2004LDTA 2004 Legacy Language Example Fortran DO 57 I = 3,10
14
14April 3, 2004LDTA 2004 Legacy Language Example Fortran Do Loop DO 57 I = 3,10
15
15April 3, 2004LDTA 2004 Legacy Language Example Fortran Do Loop DO 57 I = 3,10 DO 57 I = 3
16
16April 3, 2004LDTA 2004 Legacy Language Example Fortran Do Loop DO 57 I = 3,10 Assignment DO 57 I = 3
17
17April 3, 2004LDTA 2004 Legacy Language Example Fortran Do Loop DO 57 I = 3,10 Assignment DO57I = 3
18
18April 3, 2004LDTA 2004 Legacy Language Example PL/I Non-reserved Keywords IF IF = THEN THEN THEN = ELSE ELSE ELSE = END END
19
19April 3, 2004LDTA 2004 Legacy Language Example PL/I Non-reserved Keywords IF IF = THEN THEN THEN = ELSE ELSE ELSE = END END KW ID
20
20April 3, 2004LDTA 2004 Single Spelling Multiple Spellings Single Lexical Category Unambiguous Homophone IDs Lexical misspellings Multiple Lexical Categories Non-reserved keywords Ambiguous interpretations Homophones Input Stream Classification
21
21April 3, 2004LDTA 2004 Single Spelling Multiple Spellings Single Lexical Category Unambiguous Homophone IDs Lexical misspellings Multiple Lexical Categories Non-reserved keywords Ambiguous interpretations Homophones Input Stream Classification Embedded Languages Fall in all Four Categories!
22
22April 3, 2004LDTA 2004 GLR Analysis Architecture Lexer GLR Parser Semantics FORI FOR I for (i = 0; i < 10; i++ ) { } (
23
23April 3, 2004LDTA 2004 GLR Analysis Architecture Lexer GLR Parser Semantics FORI FOR I for (i = 0; i < 10; i++ ) { } ( Handles syntactic ambiguities
24
24April 3, 2004LDTA 2004 Our Contribution: XGLR Analysis Architecture Lexer XGLR Parser Semantics for i equals zero... FORI FOR I
25
25April 3, 2004LDTA 2004 Our Contribution: XGLR Analysis Architecture Lexer XGLR Parser Semantics for i equals zero... FORI 4EYE FOR I Handles input stream ambiguities
26
26April 3, 2004LDTA 2004 LR Parsing IDKW# 1S2S3Err 2R1S4Err 3S9R3S7 I ID FOR KW = 1 Parse Table 0 # Input StreamParse Stack
27
27April 3, 2004LDTA 2004 LR Parsing IDKW# 1S2S3Err 2R1S4Err 3S9R3S7 I ID FOR KW = 1 Parse Table 0 # Input StreamParse Stack
28
28April 3, 2004LDTA 2004 LR Parsing IDKW# 1S2S3Err 2R1S4Err 3S9R3S7 I ID = KW 1 Parse Table 0 # Input StreamParse Stack FOR KW 3
29
29April 3, 2004LDTA 2004 GLR Parsing IDKW# 1S2 S3 R5 Err 2 R1 R2 S4Err 3S9R3S7 I ID FOR KW = 1 Parse Table 0 # Input StreamParse Stack
30
30April 3, 2004LDTA 2004 GLR Parsing IDKW# 1S2 S3 R5 Err 2 R1 R2 S4Err 3S9R3S7 I ID FOR KW = 1 Parse Table 0 # Input StreamParse Stack
31
31April 3, 2004LDTA 2004 GLR Parsing IDKW# 1S2 S3 R5 Err 2 R1 R2 S4Err 3S9R3S7 I ID FOR KW = 1 Parse Table 0 # Input StreamParse Stack 2 5
32
32April 3, 2004LDTA 2004 GLR Parsing IDKW# 1S2 S3 R5 Err 2 R1 R2 S4Err 3S9R3S7 I ID = KW 1 Parse Table 0 # Input StreamParse Stack 2 FOR KW 3 FOR KW 4 5
33
33April 3, 2004LDTA 2004 Single Spelling Multiple Spellings Single Lexical Category Not ShownExample 1 Multiple Lexical Categories Example 2Example 1 XGLR in Action
34
34April 3, 2004LDTA 2004 23FOR Parsing Homophones BAR
35
35April 3, 2004LDTA 2004 23FOR FORE 4 ID KW NUM XGLR Extension: Multiple Spellings, Single and Multiple Lexical Categories BAR FOUR
36
36April 3, 2004LDTA 2004 23FOR FORE 4 23 ID KW NUM XGLR Extension: Parsers fork due to input ambiguity BAR FOUR
37
37April 3, 2004LDTA 2004 23 26 FOR FORE 4 29 35 23 ID KW NUM Each parser shifts its now unambiguous input BAR FOUR
38
38April 3, 2004LDTA 2004 23 26 FOR FORE 4 29 35 BAR 23 ID KW NUM The next input is lexed unambiguously FOUR
39
39April 3, 2004LDTA 2004 23 26 FOR FORE 4 29 35 BAR 23 42 49 ID KW NUM ID is only a valid lookahead for two parsers FOUR
40
40April 3, 2004LDTA 2004 Parsing Embedded Languages Example BNF Grammar Contains Languages L and W b L loop L d W END L loop L LOOP L | d W WHILE W NUM W do W do W DO W | L W
41
41April 3, 2004LDTA 2004 Parsing Embedded Languages Example BNF Grammar Contains Languages L and W b L loop L d W END L loop L LOOP L | d W WHILE W NUM W do W do W DO W | LOOP WHILE 34 END WHILE 56 DO END L W
42
42April 3, 2004LDTA 2004
43
43April 3, 2004LDTA 2004
44
44April 3, 2004LDTA 2004
45
45April 3, 2004LDTA 2004
46
46April 3, 2004LDTA 2004 S0 Parsing Embedded Languages LOOPWHILE34
47
47April 3, 2004LDTA 2004 S0LOOPWHILE34 Current parse state has ambiguous lexical language
48
48April 3, 2004LDTA 2004 S 0 0 W L LOOPWHILE34 XGLR Extension: Fork parsers, assign one to each lexical language
49
49April 3, 2004LDTA 2004 S 0 0LOOP W L W L KW ID WHILE34 XGLR Extension: Single spelling, Multiple lexical categories Lex lookahead both in language L and W
50
50April 3, 2004LDTA 2004 S 4 L 0 0LOOP W L W L KW ID WHILE34 Only LOOP L is valid lookahead, and is shifted
51
51April 3, 2004LDTA 2004 S 4 W 0 0LOOP W L W L KW ID WHILE34 XGLR Extension: State 4 has lexer lookaheads only in language W
52
52April 3, 2004LDTA 2004 S 4 W 0 0LOOP W L W L KW ID WHILE 34 KW W Lex lookahead in language W
53
53April 3, 2004LDTA 2004 S 4 W 0 0LOOP W L W L KW ID WHILE 34 KW W 1 W REDUCE by rule 2 and GOTO state 1 loop L
54
54April 3, 2004LDTA 2004 S 4 W 0 0LOOP W L W L KW ID WHILE 34 KW W 1 W loop L
55
55April 3, 2004LDTA 2004 S 4 W 0 0LOOP W L W L KW ID WHILE 34 KW W 1 W 2 W Shift into state 2 loop L
56
56April 3, 2004LDTA 2004 S 4 W 0 0LOOP W L W L KW ID WHILE 34 KW W 1 W 2 W W NUM XGLR Extension: Lex lookahead in language W loop L
57
57April 3, 2004LDTA 2004 S 4 W 0 0LOOP W L W L KW ID WHILE34 KW W 1 W 2 WW NUM loop L
58
58April 3, 2004LDTA 2004 4 W 0 0LOOP W L W L KW ID WHILE34 KW W 1 W 2 WW NUM 3 W Shift into state 3 S loop L
59
59April 3, 2004LDTA 2004 4 W 0 0LOOP W L W L KW ID WHILE34 KW W 1 W 2 WW NUM 3 W Shift into state 3, which has ambiguous lexical language S loop L
60
60April 3, 2004LDTA 2004 4 W 0 0LOOP W L W L KW ID WHILE34 KW W 1 W 2 WW NUM 3 W 3 L XGLR Extension: Single spelling, Multiple lexical categories Fork parsers, assign one to each lexical language S loop L
61
61April 3, 2004LDTA 2004 GLR Ambiguity Support 1. Fork parser on shift-reduce conflict 2. Fork parser on reduce-reduce conflict
62
62April 3, 2004LDTA 2004 XGLR Ambiguity Support 1. Fork parser on shift-reduce conflict 2. Fork parser on reduce-reduce conflict
63
63April 3, 2004LDTA 2004 XGLR Ambiguity Support 1. Fork parser on shift-reduce conflict 2. Fork parser on reduce-reduce conflict 3. Fork parsers on ambiguous lexical language Single spelling, Multiple lexical categories 4. Fork parsers on ambiguous lexical lookahead Single/Multiple Spellings, Multiple lexical categories Shift-shift conflict resolution
64
64April 3, 2004LDTA 2004 XGLR Ambiguities Many GLR programming language specs have finite, few ambiguities XGLR language specs also have finite, but slightly more, ambiguities –Lexical ambiguity due to ambiguous input does result in more ambiguous parse forests
65
65April 3, 2004LDTA 2004 XGLR Ambiguities Many GLR programming language specs have finite, few ambiguities XGLR language specs also have finite, but slightly more, ambiguities –Lexical ambiguity due to ambiguous input does result in more ambiguous parse forests Ambiguity causes parsers to fork GLR maintains efficiency by merging parsers when ambiguity is over
66
66April 3, 2004LDTA 2004 Parser Merging GLR: Parsers merge when in same parse state DO KW 8 31 DO KW 5 57 # 5
67
67April 3, 2004LDTA 2004 Parser Merging GLR: Parsers merge when in same parse state DO KW 8 31 DO KW 5 57 # 4 5 #
68
68April 3, 2004LDTA 2004 Parser Merging XGLR: Parsers merge when in same parse state and same lexical state DO KW 8 31 DO KW 5 57 # A AA AA W 5 A
69
69April 3, 2004LDTA 2004 Parser Merging XGLR: Parsers merge when in same parse state and same lexical state DO KW 8 31 DO KW 5 A AA AA W 5 A 57 # # A W
70
70April 3, 2004LDTA 2004 Parser Merging XGLR: Parsers merge when in same parse state and same lexical state DO KW 8 31 DO KW 5 A AA AA W 5 A 57 # # A W 4 W
71
71April 3, 2004LDTA 2004 Parser Merging XGLR: Parsers merge when in same parse state and same lexical state DO KW 8 31 DO KW 5 A AA AA W 5 A 57 # # A W 4 A
72
72April 3, 2004LDTA 2004 Parser Merging XGLR: Parsers merge when in same parse state and same lexical state DO KW 8 31 DO KW 5 A AA AA W 5 A 57 # # A W 4 A
73
73April 3, 2004LDTA 2004 Out of Sync Parsers DO57I=3 XGLR: Parsers merge when in same parse state and same lexical state and same input position 8 1 A W 5 A
74
74April 3, 2004LDTA 2004 Out of Sync Parsers 57I=3 XGLR: Parsers merge when in same parse state and same lexical state and same input position 8 1 A W 5 A DO KW A DO57I ID W =3
75
75April 3, 2004LDTA 2004 Out of Sync Parsers 57I=3 XGLR: Parsers merge when in same parse state and same lexical state and same input position 8 1 A W 5 A DO KW A DO57I ID W =3 3 A 5 W
76
76April 3, 2004LDTA 2004 Out of Sync Parsers I=3 XGLR: Parsers merge when in same parse state and same lexical state and same input position 8 1 A W 5 A DO KW A DO57I ID W 3 3 A 5 W = KW W 57 ID A
77
77April 3, 2004LDTA 2004 Out of Sync Parsers I=3 XGLR: Parsers merge when in same parse state and same lexical state and same input position 8 1 A W 5 A DO KW A DO57I ID W 3 3 A 5 W = KW W 57 ID A 6 W 4 A
78
78April 3, 2004LDTA 2004 Out of Sync Parsers =3 XGLR: Parsers merge when in same parse state and same lexical state and same input position 8 1 A W 5 A DO KW A DO57I ID W 3 3 A 5 W = KW W 57 ID A 6 W 4 A # W I A
79
79April 3, 2004LDTA 2004 Out of Sync Parsers =3 XGLR: Parsers merge when in same parse state and same lexical state and same input position 8 1 A W 5 A DO KW A DO57I ID W 3 3 A 5 W = KW W 57 ID A 6 W 4 A # W I A 9 W
80
80April 3, 2004LDTA 2004 Out of Sync Parsers =3 XGLR: Parsers merge when in same parse state and same lexical state and same input position 8 1 A W 5 A DO KW A DO57I ID W 3 3 A 5 W = KW W 57 ID A 6 W 4 A # W I A 9 W
81
81April 3, 2004LDTA 2004 Implementation Keep map: lookahead parser to use when looking for parsers to merge with Sort parsers by position of lookahead in the input –Enables pruning of map as parsers move past a particular input location –Extra memory required is bounded by dynamic separation between first and last parsers
82
82April 3, 2004LDTA 2004 Related Work GLR Parsing Algorithm –Tomita [1985] –Farshi [1991] –Rekers [1992] –Johnstone et. al. [2002] Incremental GLR –Wagner [1997] GLR Implementations (that I heard of before today) –ASF+SDF [1993] –Elkhound [2004] –Bison [2003] –DParser [2002] –Aycock and Horspool [1999] Scannerless Parsing (or Context-Free Scanning) –Salomon and Cormack [1989] –Visser [1997] van den Brand [2002] Ambiguous Input Streams –Aycock and Horspool [2001] Embedded Languages –ASF+SDF [1997] –Van de Vanter and Boshernitsan (CodeProcessor) [2000]
83
83April 3, 2004LDTA 2004 Future Work Semantic Analysis of Embedded Languages Automated Semantic Disambiguation
84
84April 3, 2004LDTA 2004 Contributions 1. Generalized GLR to handle input stream ambiguities 2. Classified input stream ambiguities into four categories 3. Implemented XGLR algorithm in Harmonia framework 4. Constructed combined lexer and parser generator to support embedded languages and lexical ambiguities at each stage of analysis 5. Enabled analysis of embedded languages, programming by voice, and legacy languages
85
85April 3, 2004LDTA 2004
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.