1
Parsing Discrete Mathematics and Its Applications Baojian Hua bjhua@ustc.edu.cn
2
Derivations
A string is valid in a language if and only if there exists a derivation from the start symbol that produces it
Begin with the start symbol, and apply grammar rules until you produce the string
Note that the final string (sentence) consists of only terminals
3
Question
Given a formal grammar G and a sentence (program) p, is p derivable from grammar G?
Or equivalently, is a given program p valid according to some language's syntax (say C)?
4
Example: Context-Free Grammar
S ::= x A | y B
A ::= u C | v C
B ::= t
C ::= w | z
// derivable? xum
5
Example: Context-Free Grammar
// derivable? xum xuwz
S ::= x A | y B
A ::= u C | v C
B ::= t
C ::= w | z
6
Example: Context-Free Grammar
// derivable? xum xuwz xwu
S ::= x A | y B
A ::= u C | v C
B ::= t
C ::= w | z
7
Example: Context-Free Grammar
// derivable? xum xuwz xwu xuz
S ::= x A | y B
A ::= u C | v C
B ::= t
C ::= w | z
8
Lexical Analyzer
The lexical analyzer translates the source program into a stream of lexical tokens
Source program: stream of (ASCII or Unicode) characters
Lexical token: compiler data structure that represents the occurrence of a terminal symbol
A valid sentence consists of only allowable terminals
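A minimal sketch of such a lexical analyzer, assuming the single-character terminals of the example grammar (x, y, u, v, t, w, z). The names TokenKind, Token, lex_next and lex_all are hypothetical, and TOK_EOF and TOK_ERROR are extra kinds assumed here for end of input and for characters that are not allowable terminals.

```c
#include <assert.h>
#include <stddef.h>

/* One kind per terminal, plus two assumed kinds for end of input
   and for characters outside the terminal set. */
typedef enum {
    TOK_X, TOK_Y, TOK_U, TOK_V, TOK_T, TOK_W, TOK_Z, TOK_EOF, TOK_ERROR
} TokenKind;

typedef struct {
    TokenKind kind;
    size_t pos;   /* offset of the token in the source text */
} Token;

/* Return the token starting at src[*pos]; advance *pos past it
   unless the end of the input has been reached. */
Token lex_next(const char *src, size_t *pos)
{
    assert(src != NULL && pos != NULL);
    Token tok = { TOK_ERROR, *pos };
    char c = src[*pos];
    if (c == '\0') {
        tok.kind = TOK_EOF;
        return tok;
    }
    switch (c) {
    case 'x': tok.kind = TOK_X; break;
    case 'y': tok.kind = TOK_Y; break;
    case 'u': tok.kind = TOK_U; break;
    case 'v': tok.kind = TOK_V; break;
    case 't': tok.kind = TOK_T; break;
    case 'w': tok.kind = TOK_W; break;
    case 'z': tok.kind = TOK_Z; break;
    default:  tok.kind = TOK_ERROR; break;
    }
    (*pos)++;
    return tok;
}

/* Write at most cap tokens into out[], stopping after the first
   TOK_EOF or TOK_ERROR; return the number of tokens written. */
size_t lex_all(const char *src, Token *out, size_t cap)
{
    size_t pos = 0, n = 0;
    while (n < cap) {
        Token t = lex_next(src, &pos);
        out[n++] = t;
        if (t.kind == TOK_EOF || t.kind == TOK_ERROR)
            break;
    }
    return n;
}
```

Under these assumptions, lexing xuz yields the kinds for x, u, z followed by TOK_EOF, while lexing xum stops at a TOK_ERROR token for m, which is not in the terminal set.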
9
Example: Context-Free Grammar
// all terminals T={x, y, u, v, t, w, z}
S ::= x A | y B
A ::= u C | v C
B ::= t
C ::= w | z
10
Example: Context-Free Grammar
// all terminals T={x, y, u, v, t, w, z}
// allowable strings T*
S ::= x A | y B
A ::= u C | v C
B ::= t
C ::= w | z
11
Predictive Parsing
Parsing: recognizing a string and doing something useful with it
The most naïve approach to implementing a parser is recursive descent
A form of top-down parsing
Not as powerful as other methods, but easy enough to implement by hand
12
Predictive Parsing
// Valid? xum xuwz xwu xuz
S ::= x A | y B
A ::= u C | v C
B ::= t
C ::= w | z
13
A Predictive Parser in C (Sketch)
tokenTy token;   // current lookahead token

void parseS ()
{
  switch (token.kind) {
  case x:   // S ::= x A
    token = nextToken ();
    parseA ();
    break;
  case y:   // S ::= y B
    token = nextToken ();
    parseB ();
    break;
  default:
    error (…);
  }
}
// other functions are similar
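The sketch above can be completed into a small runnable recognizer. This is one possible version, not the author's code: it assumes every terminal is a single character, so the current character stands in for the current token, and the names Parser, advance, fail and derivable are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Parser state: a cursor into the input and an error flag.
   *p plays the role of the current token. */
typedef struct {
    const char *p;
    bool ok;
} Parser;

static void advance(Parser *ps) { if (*ps->p != '\0') ps->p++; }
static void fail(Parser *ps)    { ps->ok = false; }

/* C ::= w | z */
static void parseC(Parser *ps)
{
    switch (*ps->p) {
    case 'w': case 'z': advance(ps); break;
    default: fail(ps);
    }
}

/* B ::= t */
static void parseB(Parser *ps)
{
    if (*ps->p == 't') advance(ps); else fail(ps);
}

/* A ::= u C | v C */
static void parseA(Parser *ps)
{
    switch (*ps->p) {
    case 'u': case 'v': advance(ps); parseC(ps); break;
    default: fail(ps);
    }
}

/* S ::= x A | y B */
static void parseS(Parser *ps)
{
    switch (*ps->p) {
    case 'x': advance(ps); parseA(ps); break;
    case 'y': advance(ps); parseB(ps); break;
    default: fail(ps);
    }
}

/* A string is derivable if parseS succeeds and consumes all of it. */
bool derivable(const char *input)
{
    assert(input != NULL);
    Parser ps = { input, true };
    parseS(&ps);
    return ps.ok && *ps.p == '\0';
}
```

With this version, derivable accepts xuz and rejects xum (m is not a terminal), xuwz (input left over after C) and xwu (A cannot start with w).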
14
Output: Abstract Syntax Tree
Input: xuz
S
  x
  A
    u
    C
      z
15
A Predictive Parser Emitting AST in C (Sketch)
tokenTy token;   // current lookahead token

S parseS ()
{
  switch (token.kind) {
  case x:   // S ::= x A
    token = nextToken ();
    a=parseA ();             // a: sub-tree for A
    return newS1 (x, a);
  case y:   // S ::= y B
    token = nextToken ();
    b=parseB ();             // b: sub-tree for B
    return newS2 (y, b);
  default:
    error (…);
  }
}
// other functions are similar
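A runnable variant of the AST-emitting sketch, under simplifying assumptions that are not in the slides: terminals are single characters, every node type is folded into one hypothetical Node struct instead of separate constructors such as newS1 and newS2, and a NULL return replaces the call to error.

```c
#include <assert.h>
#include <stdlib.h>

/* One node per nonterminal: the terminal it matched and the sub-tree
   of the nonterminal that follows (NULL for B and C). */
typedef struct Node {
    char nonterminal;      /* 'S', 'A', 'B' or 'C' */
    char terminal;         /* the character consumed by this rule */
    struct Node *child;
} Node;

static Node *new_node(char nonterminal, char terminal, Node *child)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->nonterminal = nonterminal;
    n->terminal = terminal;
    n->child = child;
    return n;
}

void free_tree(Node *n)
{
    while (n != NULL) {
        Node *next = n->child;
        free(n);
        n = next;
    }
}

/* C ::= w | z */
static Node *parseC(const char **p)
{
    char c = **p;
    if (c != 'w' && c != 'z') return NULL;
    (*p)++;
    return new_node('C', c, NULL);
}

/* B ::= t */
static Node *parseB(const char **p)
{
    char c = **p;
    if (c != 't') return NULL;
    (*p)++;
    return new_node('B', c, NULL);
}

/* A ::= u C | v C */
static Node *parseA(const char **p)
{
    char c = **p;
    if (c != 'u' && c != 'v') return NULL;
    (*p)++;
    Node *child = parseC(p);
    return child ? new_node('A', c, child) : NULL;
}

/* S ::= x A | y B */
static Node *parseS(const char **p)
{
    char c = **p;
    Node *child;
    if (c == 'x')      { (*p)++; child = parseA(p); }
    else if (c == 'y') { (*p)++; child = parseB(p); }
    else return NULL;
    return child ? new_node('S', c, child) : NULL;
}

/* Return the tree for input, or NULL if input is not a sentence. */
Node *parse(const char *input)
{
    assert(input != NULL);
    const char *p = input;
    Node *tree = parseS(&p);
    if (tree != NULL && *p != '\0') {   /* trailing input: reject */
        free_tree(tree);
        tree = NULL;
    }
    return tree;
}
```

For the input xuz this builds the chain S(x) -> A(u) -> C(z). Each node is allocated only after its sub-tree has parsed successfully, so a failed parse leaves nothing to free.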
16
Predictive Parsing Difficulties
// derivable? xuz
S ::= x A | x B
A ::= u C | v C
B ::= t
C ::= w | z
// both alternatives for S begin with x, so the next token alone cannot select one
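One common remedy, which the slides do not cover, is left factoring: pull the shared prefix x out of both alternatives, giving S ::= x R and R ::= A | B. The choice between A and B is then made one token later, where u or v selects A and t selects B. The sketch below assumes single-character terminals; the nonterminal R and the function derivable_factored are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* C ::= w | z */
static bool parseC(const char **p)
{
    if (**p == 'w' || **p == 'z') { (*p)++; return true; }
    return false;
}

/* B ::= t */
static bool parseB(const char **p)
{
    if (**p == 't') { (*p)++; return true; }
    return false;
}

/* A ::= u C | v C */
static bool parseA(const char **p)
{
    if (**p == 'u' || **p == 'v') { (*p)++; return parseC(p); }
    return false;
}

/* R ::= A | B, selected by the next character:
   u or v can only start A, t can only start B. */
static bool parseR(const char **p)
{
    switch (**p) {
    case 'u': case 'v': return parseA(p);
    case 't':           return parseB(p);
    default:            return false;
    }
}

/* S ::= x R  (left-factored form of S ::= x A | x B) */
static bool parseS(const char **p)
{
    if (**p != 'x') return false;
    (*p)++;
    return parseR(p);
}

bool derivable_factored(const char *input)
{
    assert(input != NULL);
    const char *p = input;
    return parseS(&p) && *p == '\0';
}
```

With the factored grammar, xuz and xt are both accepted using a single token of lookahead.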
17
Or Even Worse
1 E ::= id
2     | num
3     | E + E
4     | E * E
5     | ( E )

15*(3+4)

E
By 4 => E * E
By 5, 3 => E * (E + E)
By 2 => E * (E + 4)
By 2 => E * (3 + 4)
By 2 => 15 * (3 + 4)
18
Or Even Worse
15*(3+4)

rightmost derivation:
E
E * E
E * (E + E)
E * (E + 4)
E * (3 + 4)
15 * (3 + 4)

leftmost derivation:
E
E * E
15 * E
15 * (E + E)
15 * (3 + E)
15 * (3 + 4)
19
Ambiguous grammars
A grammar is ambiguous if there is a sentence with >1 parse tree
15 * 3 + 4
Parse tree 1 (E * E at the root): E( E(15) * E( E(3) + E(4) ) )
Parse tree 2 (E + E at the root): E( E( E(15) * E(3) ) + E(4) )
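To see why the two parse trees matter, the sketch below builds both by hand and evaluates them. The Expr type and the function names are hypothetical; the slides only show the trees.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { EXPR_NUM, EXPR_ADD, EXPR_MUL } ExprKind;

typedef struct Expr {
    ExprKind kind;
    int value;                        /* used when kind == EXPR_NUM */
    const struct Expr *left, *right;  /* used for EXPR_ADD and EXPR_MUL */
} Expr;

/* Evaluate a tree bottom-up. */
int eval(const Expr *e)
{
    assert(e != NULL);
    switch (e->kind) {
    case EXPR_NUM: return e->value;
    case EXPR_ADD: return eval(e->left) + eval(e->right);
    case EXPR_MUL: return eval(e->left) * eval(e->right);
    }
    return 0;  /* not reached for well-formed trees */
}

/* Tree with E * E at the root: 15 * (3 + 4). */
int eval_mul_at_root(void)
{
    Expr n15 = { EXPR_NUM, 15, NULL, NULL };
    Expr n3  = { EXPR_NUM, 3, NULL, NULL };
    Expr n4  = { EXPR_NUM, 4, NULL, NULL };
    Expr sum = { EXPR_ADD, 0, &n3, &n4 };
    Expr root = { EXPR_MUL, 0, &n15, &sum };
    return eval(&root);
}

/* Tree with E + E at the root: (15 * 3) + 4. */
int eval_add_at_root(void)
{
    Expr n15 = { EXPR_NUM, 15, NULL, NULL };
    Expr n3  = { EXPR_NUM, 3, NULL, NULL };
    Expr n4  = { EXPR_NUM, 4, NULL, NULL };
    Expr prod = { EXPR_MUL, 0, &n15, &n3 };
    Expr root = { EXPR_ADD, 0, &prod, &n4 };
    return eval(&root);
}
```

The first tree evaluates to 105 and the second to 49, so the same sentence 15 * 3 + 4 gets two different meanings depending on which tree the parser picks.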
20
Eliminating ambiguity
In programming language syntax, ambiguity often arises from missing operator precedence or associativity
Does * have higher precedence than +?
Are * and + left associative?
Can sometimes rewrite the grammar to disambiguate this
Beyond the scope of this course
21
Unambiguous Grammar
Ambiguous:
E ::= id | num | E + E | E * E | ( E )
Unambiguous:
E ::= E + T | T
T ::= T * F | F
F ::= id | num | ( E )
Accepts the same language, but parses unambiguously
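A hedged sketch of an evaluator for the E/T/F grammar. Recursive descent cannot call itself on a left-recursive rule such as E ::= E + T without looping forever, so this version replaces the left recursion with iteration (E is a T followed by zero or more "+ T"), which keeps + and * left associative. It handles num and parentheses only; id is left out because it would need a symbol table. Integer overflow is not checked, and the names Parser and evaluate are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    const char *p;   /* cursor into the input */
    bool ok;         /* cleared on the first syntax error */
} Parser;

static void skip_spaces(Parser *ps)
{
    while (isspace((unsigned char)*ps->p))
        ps->p++;
}

static int parseE(Parser *ps);

/* F ::= num | ( E ) */
static int parseF(Parser *ps)
{
    skip_spaces(ps);
    if (isdigit((unsigned char)*ps->p)) {
        int v = 0;
        while (isdigit((unsigned char)*ps->p)) {
            v = v * 10 + (*ps->p - '0');
            ps->p++;
        }
        return v;
    }
    if (*ps->p == '(') {
        ps->p++;
        int v = parseE(ps);
        skip_spaces(ps);
        if (*ps->p == ')') {
            ps->p++;
            return v;
        }
    }
    ps->ok = false;
    return 0;
}

/* T ::= T * F | F, written as F followed by any number of "* F" */
static int parseT(Parser *ps)
{
    int v = parseF(ps);
    skip_spaces(ps);
    while (ps->ok && *ps->p == '*') {
        ps->p++;
        v *= parseF(ps);
        skip_spaces(ps);
    }
    return v;
}

/* E ::= E + T | T, written as T followed by any number of "+ T" */
static int parseE(Parser *ps)
{
    int v = parseT(ps);
    skip_spaces(ps);
    while (ps->ok && *ps->p == '+') {
        ps->p++;
        v += parseT(ps);
        skip_spaces(ps);
    }
    return v;
}

/* Store the value of src in *result and return true, or return
   false if src is not a sentence of the grammar. */
bool evaluate(const char *src, int *result)
{
    assert(src != NULL && result != NULL);
    Parser ps = { src, true };
    int v = parseE(&ps);
    skip_spaces(&ps);
    if (!ps.ok || *ps.p != '\0')
        return false;
    *result = v;
    return true;
}
```

Because * is handled one level below +, the input 15 * 3 + 4 has only one reading here, (15 * 3) + 4, and parentheses are needed to get 15 * (3 + 4).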
22
Limitations with Predictive Parsing
Rewriting grammar: to resolve ambiguity
Grammars/trees are ugly
But … easy to write code by hand, and very good for error reporting
23
Doing better
We can do better
We can use a parsing algorithm that can handle all deterministic context-free languages (though not all context-free grammars)
Remember: a context-free language might have many different context-free grammars
24
The Yacc Tool
semantic analyzer specification -> Yacc -> parser
Originally developed for C, and now almost every mainstream language has its own Yacc-like tool: bison (C), ml-yacc (SML), Cup (Java), GPPG (C#), …
25
Whole Structure
source code -> lexical analyzer -> tokens -> parser -> abstract syntax tree -> other part -> Pentium