A Programming Languages Syntax Analysis (1)
2 Lexical Analysis - errors Introduction to Parsing Recursive Grammars Derivations and parse trees Ambiguous Grammars
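3 Lexical Errors Only a small percentage of errors can be recognised during Lexical Analysis. Consider: fi (good == “bad) –The misspelt keyword fi still scans as a valid identifier, so the lexer cannot flag it –The unterminated string “bad is an error the lexer can detect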
4 Examples from the Oberon language (QUT) –Line ends inside literal string –Illegal character in input file –Input file ends inside a comment –Invalid exponent in REAL constant –Number too long –Illegal use of underscore in identifier
5 In general What does a lexical error mean? Strategies: –“Panic-mode”: delete chars from input until something matches –Inserting characters –Re-ordering characters –Replacing characters An error like “illegal character” should simply be reported sensibly
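The panic-mode strategy above can be sketched as follows. This is a minimal illustration, not the scanner of any particular compiler: the token patterns and the names `TOKEN_PATTERNS` and `scan` are assumptions made for the example.

```python
import re

# Hypothetical token patterns for a toy language; not taken from the slides.
TOKEN_PATTERNS = [
    ("NUMBER", re.compile(r"\d+")),
    ("IDENT", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("OP", re.compile(r"==|[-+*/()]")),
]

def scan(text):
    """Return (tokens, errors), skipping characters that match no pattern."""
    tokens, errors = [], []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for kind, pattern in TOKEN_PATTERNS:
            match = pattern.match(text, pos)
            if match:
                tokens.append((kind, match.group()))
                pos = match.end()
                break
        else:
            # Panic mode: report the offending character, delete it, carry on.
            errors.append(f"illegal character {text[pos]!r} at position {pos}")
            pos += 1
    return tokens, errors
```

With these patterns, `scan("3 $ 4")` reports the `$` and still returns both numbers, while `scan("fi (good == bad)")` reports nothing: the misspelt keyword is a perfectly good identifier as far as the scanner is concerned.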
6 Syntax Analysis aka Parsing Grouping together tokens into larger structures Analogous to lexical analysis Input: –Tokens (output of Lexical Analyzer) Output: –Structured representation of original program
7 Parsing Fundamentals Source program: –3 + 4 After Lexical Analysis: ???
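One possible answer to the question above, as a small sketch. The token names `number` and `plus` follow the next slide; the regular expressions and the names `TOKEN_SPEC`, `MASTER` and `tokenize` are assumptions for illustration.

```python
import re

# Token names follow the slides ("number plus number"); the regexes are assumptions.
TOKEN_SPEC = [("number", r"\d+"), ("plus", r"\+"), ("skip", r"\s+")]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(source):
    """Turn source text into a list of (token name, lexeme) pairs."""
    tokens = []
    for m in MASTER.finditer(source):
        if m.lastgroup != "skip":      # whitespace separates tokens but is not one
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("3 + 4"))
```

The parser only looks at the token names (number plus number); the lexemes `3`, `+` and `4` ride along for later phases. Note that `finditer` silently skips characters that match no pattern, so this sketch does no error reporting.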
8 Parsing Expression → number plus number –Similar to regular definitions: Concatenation Choice Expression → number Operator number Operator → + | - | * | / Repetition is done differently
9 BNF Grammar Expression → number Operator number Operator → + | - | * | / Meta-symbols: → | Structure on the left is defined to consist of the choices on the right hand side Different conventions for writing BNF Grammars: <expression> ::= number <operator> number
10 Derivations Derivation: –Sequence of replacements of structure names by choices on the RHS of grammar rules –Begin: structure name –End: string of token symbols –At each step one replacement is made Exp → Exp Op Exp | number Op → + | - | * | /
11 Example Derivation Example: Note the different arrows: ⇒ Derivation applies grammar rules → Used to define grammar rules Non-terminals: Exp, Op Terminals: number, * Terminals: because they terminate the derivation
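A derivation can be replayed mechanically. The sketch below assumes the grammar Exp → Exp Op Exp | number, Op → + | - | * | / from the previous slide and always rewrites the leftmost non-terminal; the names `GRAMMAR` and `derive_leftmost`, and the choice of `number * number` as the target, are illustrative assumptions.

```python
# Each non-terminal maps to its list of alternatives (right-hand sides).
GRAMMAR = {
    "Exp": [["Exp", "Op", "Exp"], ["number"]],
    "Op": [["+"], ["-"], ["*"], ["/"]],
}

def derive_leftmost(start, choices):
    """Apply the chosen alternative to the leftmost non-terminal at each step."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Find the leftmost non-terminal in the current sentential form.
        index = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        form[index:index + 1] = GRAMMAR[form[index]][choice]
        steps.append(" ".join(form))
    return steps

steps = derive_leftmost("Exp", [0, 1, 2, 1])
print(" => ".join(steps))
# Exp => Exp Op Exp => number Op Exp => number * Exp => number * number
```

Each `=>` is one derivation step; the derivation stops when only terminals (`number`, `*`) remain.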
12 Derivations (2) E → ( E ) | a What sentences does this grammar generate? An example derivation: E ⇒ ( E ) ⇒ ( ( E ) ) ⇒ ( ( a ) ) Note that this is what we couldn’t achieve with regular definitions –See pg 96 in Aho, Sethi, Ullman
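To make the language of E → ( E ) | a concrete, here is a small sketch that lists its shortest sentences and recognises membership by recursion. The function names `sentences` and `is_sentence` are invented for this example.

```python
def sentences(max_depth):
    """Sentences of E -> ( E ) | a using the rule E -> ( E ) at most max_depth times."""
    result = []
    sentence = "a"                         # terminating case: E -> a
    for _ in range(max_depth + 1):
        result.append(sentence)
        sentence = "(" + sentence + ")"    # general case: E -> ( E )
    return result

def is_sentence(text):
    """Recursive recogniser mirroring the two alternatives of the grammar."""
    if text == "a":
        return True
    return (len(text) >= 3 and text[0] == "(" and text[-1] == ")"
            and is_sentence(text[1:-1]))

print(sentences(2))   # ['a', '(a)', '((a))']
```

The recogniser has to match every opening parenthesis with a closing one at the same depth, which is exactly the counting that regular definitions cannot express.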
13 Recursive Grammars E → ( E ) | a –is recursive E → ( E ) is the general case E → a is the terminating case We have no * operator in context free grammars –Repetition = recursion E → E α | β –derives β, βα, βαα, βααα, … –All strings beginning with β followed by zero or more repetitions of α: βα*
14 Recursive Grammars (2) a+ (regular expression) –E → E a | a (1) –Or –E → a E | a (2) 2 different grammars can derive the same language (1) is left recursive (2) is right recursive a* –Implies we need the empty production –E → E a | ε
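The claim that grammars (1) and (2) derive the same language can be checked for short strings by brute force. This is a sketch under stated assumptions: `language` is a hypothetical helper that enumerates sentential forms, and it only terminates for grammars like these, where every recursive production adds a terminal.

```python
def language(grammar, start, max_len):
    """All terminal strings with at most max_len symbols derivable from start."""
    found = set()
    seen = {(start,)}
    frontier = [(start,)]
    while frontier:
        form = frontier.pop()
        # Index of the leftmost non-terminal, or None if the form is a sentence.
        index = next((i for i, s in enumerate(form) if s in grammar), None)
        if index is None:
            found.add("".join(form))
            continue
        for rhs in grammar[form[index]]:
            new = form[:index] + tuple(rhs) + form[index + 1:]
            # Terminals are never removed, so forms with too many can be dropped.
            terminals = sum(1 for s in new if s not in grammar)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                frontier.append(new)
    return found

LEFT = {"E": [["E", "a"], ["a"]]}    # (1) E -> E a | a
RIGHT = {"E": [["a", "E"], ["a"]]}   # (2) E -> a E | a
STAR = {"E": [["E", "a"], []]}       # E -> E a | empty production

print(sorted(language(LEFT, "E", 3)))   # ['a', 'aa', 'aaa']
```

For any bound, `language(LEFT, "E", n)` and `language(RIGHT, "E", n)` agree, and `STAR` adds only the empty string, matching the difference between a+ and a*.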
15 Recursive Grammars (3) Require recursive data structures: trees Parse Trees [Figure: parse tree with exp, op, number and * nodes] Exp → Exp Op Exp | number Op → + | - | * | /
16 Parse Trees & Derivations Leaves = terminals Interior nodes = non-terminals If we replace the non-terminals right to left: –The parse tree sequence is right to left –A rightmost derivation corresponds to a reverse post-order traversal If we derive left to right: –A leftmost derivation corresponds to a pre-order traversal –Parse trees encode information about the derivation process
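The correspondence between derivation order and tree traversal can be checked on a small parse tree. The sketch below assumes the tree for `number * number` under Exp → Exp Op Exp | number; the `#1`, `#2`, `#3` suffixes are only labels added here to tell the three Exp nodes apart.

```python
# Parse tree as (label, children); leaves (terminals) have an empty child list.
TREE = ("Exp#1", [
    ("Exp#2", [("number", [])]),
    ("Op", [("*", [])]),
    ("Exp#3", [("number", [])]),
])

def preorder(node):
    """Interior nodes in pre-order: parent first, then children left to right."""
    label, children = node
    if not children:
        return []
    order = [label]
    for child in children:
        order += preorder(child)
    return order

def postorder(node):
    """Interior nodes in post-order: children left to right, then the parent."""
    label, children = node
    if not children:
        return []
    order = []
    for child in children:
        order += postorder(child)
    return order + [label]

print(preorder(TREE))          # ['Exp#1', 'Exp#2', 'Op', 'Exp#3']
print(postorder(TREE)[::-1])   # ['Exp#1', 'Exp#3', 'Op', 'Exp#2']
```

The first list is the order in which a leftmost derivation expands the non-terminals; the second, the reversed post-order, is the order used by a rightmost derivation.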
17 Abstract Syntax Trees [Figure: parse tree for 3 + 4 (exp, op and number nodes), the token sequence, and the abstract syntax tree (+ with children 3 and 4)] Parse trees contain surplus information The abstract syntax tree is all the information we actually need
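Dropping the surplus information can be sketched as a small tree transformation. The tuple encoding of the parse tree for `3 + 4` and the function name `to_ast` are assumptions chosen for this example, not a standard representation.

```python
# Interior nodes are (name, [children]); leaves are (token name, lexeme).
PARSE_TREE = ("exp", [
    ("exp", [("number", "3")]),
    ("op", [("plus", "+")]),
    ("exp", [("number", "4")]),
])

def to_ast(node):
    """Collapse a parse tree for exp -> exp op exp | number into an AST."""
    label, body = node
    if isinstance(body, str):        # leaf: keep only the lexeme
        return body
    if len(body) == 1:               # chain node such as exp -> number
        return to_ast(body[0])
    left, op, right = body           # exp -> exp op exp
    return (to_ast(op), to_ast(left), to_ast(right))

print(to_ast(PARSE_TREE))   # ('+', '3', '4')
```

The result keeps the operator and its two operands and nothing else: the `exp`, `op` and `number` nodes only recorded which grammar rules were used.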
18 An exercise Consider the grammar S -> ( L ) | a L -> L , S | S (a) What are the terminals, nonterminals and start symbol? (b) Find leftmost and rightmost derivations and parse trees for the following sentences: i. (a,a) ii. (a, (a,a)) iii. (a, ((a,a), (a,a)))