Recap Mooly Sagiv
Outline
– Subjects Studied
– Questions & Answers
Lexical Analysis (Scanning)
input – program text (file)
output – sequence of tokens
– Read the input file
– Identify language keywords and standard identifiers
– Handle include files and macros
– Count line numbers
– Remove whitespace
– Report illegal symbols
– [Produce a symbol table]
The Lexical Analysis Problem
Given
– A set of token descriptions
– An input string
Partition the string into tokens (class, value)
Ambiguity resolution
– The longest matching token
– Between two equal-length tokens, select the first
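The two ambiguity rules can be seen in a tiny hand-written scanner. This is a sketch under assumptions, not JLex output: the token classes (`TK_IF`, `TK_ID`, `TK_NUM`, `TK_EQ`, `TK_ASSIGN`), the matcher functions and `scan` are hypothetical names. Each token description is a function returning the length of its longest match; the scanner keeps the longest one, and a strict `>` comparison means the rule listed first wins a tie (so `if` is a keyword but `iffy` is an identifier).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token classes, listed in priority order. */
typedef enum { TK_IF, TK_ID, TK_NUM, TK_EQ, TK_ASSIGN, TK_COUNT } TokenClass;

/* Each matcher returns the length of the longest prefix of s it accepts (0 = no match). */
static size_t match_if(const char *s)     { return strncmp(s, "if", 2) == 0 ? 2 : 0; }
static size_t match_id(const char *s)     { size_t n = 0; while (islower((unsigned char)s[n])) n++; return n; }
static size_t match_num(const char *s)    { size_t n = 0; while (isdigit((unsigned char)s[n])) n++; return n; }
static size_t match_eq(const char *s)     { return strncmp(s, "==", 2) == 0 ? 2 : 0; }
static size_t match_assign(const char *s) { return s[0] == '=' ? 1 : 0; }

static size_t (*const matchers[TK_COUNT])(const char *) = {
    match_if, match_id, match_num, match_eq, match_assign
};

typedef struct { TokenClass cls; size_t start, len; } Token;

/* Partitions input into tokens. Returns the token count, or -1 on an
   illegal symbol or when more than max_tokens tokens would be produced. */
int scan(const char *input, Token *out, int max_tokens) {
    size_t pos = 0;
    int count = 0;
    assert(input != NULL && out != NULL);
    while (input[pos] != '\0') {
        if (isspace((unsigned char)input[pos])) { pos++; continue; }  /* remove whitespace */
        size_t best_len = 0;
        TokenClass best = TK_COUNT;
        for (int c = 0; c < TK_COUNT; c++) {
            size_t len = matchers[c](input + pos);
            /* longest match wins; strict '>' keeps the earlier rule on equal lengths */
            if (len > best_len) { best_len = len; best = (TokenClass)c; }
        }
        if (best_len == 0 || count == max_tokens) return -1;
        out[count++] = (Token){ best, pos, best_len };
        pos += best_len;
    }
    return count;
}
```

For example, `"if iffy == 42 = x"` yields `TK_IF`, `TK_ID`, `TK_EQ`, `TK_NUM`, `TK_ASSIGN`, `TK_ID`: `==` beats `=` by length, and `if` goes to `TK_IF` because that rule is listed before `TK_ID`.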
JLex
Input – regular expressions and actions (Java code)
Output – a scanner program that reads the input and applies the corresponding action whenever one of the regular expressions is matched
[Figure: regular expressions → JLex → scanner; input program → scanner → tokens]
Summary
For most programming languages, lexical analyzers can be easily constructed automatically
Exceptions:
– Fortran
– PL/1
Lex/Flex/JLex are useful beyond compilers
Syntax Analysis (Parsing)
input – sequence of tokens
output – abstract syntax tree
Report syntax errors
– e.g., unbalanced parentheses
[Create "symbol table"]
[Create a pretty-printed version of the program]
In some cases the tree need not be generated (one-pass compilers)
Pushdown Automaton
[Figure: a control unit driven by the parser table, reading the input (terminated by $) and maintaining a stack (bottom marked $)]
Efficient Parsers
Pushdown automata
Deterministic
Report an error as soon as the input is not a prefix of a valid program
Not usable for all context free grammars
[Figure: context free grammar → CUP → parser (or "ambiguity errors"); tokens → parser → parse tree]
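The "report an error as soon as the input is not a prefix of a valid program" property is easiest to see on a language that needs a stack. The sketch below is a deterministic pushdown recognizer for balanced `()` and `[]`; it is an illustration chosen for this recap, not a parser generated by CUP, and `check_brackets` is a hypothetical name. It returns the position of the first symbol after which no completion can yield a balanced string.

```c
#include <assert.h>
#include <stddef.h>

/* Returns -1 if s is balanced over ()[]; otherwise the index of the first
   symbol at which the input stops being a prefix of any balanced string
   (strlen(s) if the input simply ends too early). */
long check_brackets(const char *s) {
    char stack[256];    /* sketch: fixed depth; overflow is reported as an error */
    size_t top = 0;     /* empty stack plays the role of the bottom marker $ */
    size_t i;
    assert(s != NULL);
    for (i = 0; s[i] != '\0'; i++) {
        char c = s[i];
        if (c == '(' || c == '[') {
            if (top == sizeof stack) return (long)i;
            stack[top++] = c;                       /* push the open bracket */
        } else if (c == ')' || c == ']') {
            char want = (c == ')') ? '(' : '[';
            if (top == 0 || stack[top - 1] != want)
                return (long)i;                     /* earliest possible error */
            top--;                                  /* pop the matching bracket */
        } else {
            return (long)i;                         /* illegal symbol */
        }
    }
    return top == 0 ? -1 : (long)i;
}
```

On `"())("` the error is reported at index 2, before the rest of the input is read; on `"(()"` it can only be detected at the end, since `(()` is still a prefix of `(())`.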
Kinds of Parsers
Top-Down (Predictive Parsing) LL
– Construct the parse tree in a top-down manner
– Find the leftmost derivation
– For every non-terminal and token, predict the next production
– Preorder tree traversal
Bottom-Up LR
– Construct the parse tree in a bottom-up manner
– Find the rightmost derivation in reverse order
– For every potential right-hand side and token, decide when a production is found
– Postorder tree traversal
Top-Down Parsing
[Figure: the parse tree is grown from the root downward while the input tokens t1, t2, … are read]
Bottom-Up Parsing
[Figure: the parse tree is built upward from the input tokens t1 … t8, with inner nodes numbered 1, 2, 3 in the order they are recognized]
Example Grammar for Predictive LL Top-Down Parsing
expression → digit | '(' expression operator expression ')'
operator → '+' | '*'
digit → '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
static int Parse_Expression(Expression **expr_p) {
    Expression *expr = *expr_p = new_expression();

    /* try to parse a digit */
    if (Token.class == DIGIT) {
        expr->type = 'D';
        expr->value = Token.repr - '0';
        get_next_token();
        return 1;
    }

    /* try to parse a parenthesized expression: '(' expression operator expression ')' */
    if (Token.class == '(') {
        expr->type = 'P';
        get_next_token();
        if (!Parse_Expression(&expr->left)) Error("missing expression");
        if (!Parse_Operator(&expr->oper)) Error("missing operator");
        if (!Parse_Expression(&expr->right)) Error("missing expression");
        if (Token.class != ')') Error("missing )");
        get_next_token();
        return 1;
    }

    return 0;
}
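A self-contained version of this recursive descent parser can be run directly. It is a sketch under simplifying assumptions: every token is a single character of a string (so the `Token`, `get_next_token` and `new_expression` helpers of the slide are replaced by the hypothetical globals `input` and `failed`), errors set a flag instead of printing, and the tree is evaluated to show that it was built correctly.

```c
#include <assert.h>
#include <stdlib.h>

/* AST node with the slide's fields: type 'D' (digit) or 'P' (parenthesized). */
typedef struct Expression {
    char type;
    int value;                        /* used when type == 'D' */
    struct Expression *left, *right;  /* used when type == 'P' */
    char oper;                        /* '+' or '*', used when type == 'P' */
} Expression;

static const char *input;  /* hypothetical token stream: one char per token, no spaces */
static int failed;         /* set by syntax_error() */

static void syntax_error(void) { failed = 1; }

static int parse_operator(char *oper) {
    if (*input == '+' || *input == '*') { *oper = *input++; return 1; }
    return 0;
}

static int parse_expression(Expression **expr_p) {
    Expression *expr = *expr_p = calloc(1, sizeof *expr);
    assert(expr != NULL);
    if (*input >= '0' && *input <= '9') {       /* expression -> digit */
        expr->type = 'D';
        expr->value = *input++ - '0';
        return 1;
    }
    if (*input == '(') {       /* expression -> '(' expression operator expression ')' */
        expr->type = 'P';
        input++;
        if (!parse_expression(&expr->left)) syntax_error();
        if (!parse_operator(&expr->oper)) syntax_error();
        if (!parse_expression(&expr->right)) syntax_error();
        if (*input == ')') input++; else syntax_error();
        return 1;
    }
    free(expr);                                 /* no alternative applies */
    *expr_p = NULL;
    return 0;
}

static int eval(const Expression *e) {
    if (e->type == 'D') return e->value;
    int l = eval(e->left), r = eval(e->right);
    return e->oper == '+' ? l + r : l * r;
}

static void free_tree(Expression *e) {
    if (e == NULL) return;
    free_tree(e->left);
    free_tree(e->right);
    free(e);
}

/* Parses src as one complete expression; on success stores its value and returns 1. */
int parse_and_eval(const char *src, int *value) {
    Expression *tree = NULL;
    input = src;
    failed = 0;
    int ok = parse_expression(&tree) && !failed && *input == '\0';
    if (ok) *value = eval(tree);
    free_tree(tree);
    return ok;
}
```

For instance, `parse_and_eval("(2*(3+4))", &v)` succeeds with `v == 14`, while `"(1+2"` and `"1+2"` are rejected (the grammar requires every operator to be parenthesized).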
Parsing Expressions
Try every alternative production
– For P → A1 A2 … An | B1 B2 … Bm
– If A1 succeeds, call A2; if A2 succeeds, call A3, and so on; if A2 fails, report an error
– Otherwise try B1
Recursive descent parsing
Can be applied for certain grammars
Generalization: LL(1) parsing
int P(...) {
    /* try to parse the alternative P -> A1 A2 ... An */
    if (A1(...)) {
        if (!A2()) Error("Missing A2");
        if (!A3()) Error("Missing A3");
        ...
        if (!An()) Error("Missing An");
        return 1;
    }
    /* try to parse the alternative P -> B1 B2 ... Bm */
    if (B1(...)) {
        if (!B2()) Error("Missing B2");
        if (!B3()) Error("Missing B3");
        ...
        if (!Bm()) Error("Missing Bm");
        return 1;
    }
    return 0;
}
Predictive Parser for Arithmetic Expressions
Grammar:
1 E → E + T
2 E → T
3 T → T * F
4 T → F
5 F → id
6 F → (E)
C-code?
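One possible answer, sketched under assumptions: rules 1 and 3 are left-recursive, so a procedure for E that starts by calling itself would never terminate. The standard rewrite E → T { + T } and T → F { * F } turns the repetition into loops while keeping + and * left-associative. Here `id` is assumed to be a single lowercase letter, and the hypothetical function `to_postfix` emits postfix code so the resulting tree shape is visible.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *in;  /* hypothetical input: single-letter ids, '+', '*', '(', ')' */
static char *out;       /* postfix output; needs room for strlen(input) + 1 chars */
static int ok;

static void parse_E(void);

static void parse_F(void) {                 /* F -> id | ( E )  (rules 5-6) */
    if (islower((unsigned char)*in)) {
        *out++ = *in++;
    } else if (*in == '(') {
        in++;
        parse_E();
        if (*in == ')') in++; else ok = 0;
    } else {
        ok = 0;
    }
}

static void parse_T(void) {                 /* T -> F { * F }  (rules 3-4) */
    parse_F();
    while (ok && *in == '*') { in++; parse_F(); *out++ = '*'; }
}

static void parse_E(void) {                 /* E -> T { + T }  (rules 1-2) */
    parse_T();
    while (ok && *in == '+') { in++; parse_T(); *out++ = '+'; }
}

/* Returns 1 and writes the postfix form of src into buf on success, 0 on a syntax error. */
int to_postfix(const char *src, char *buf) {
    assert(src != NULL && buf != NULL);
    in = src;
    out = buf;
    ok = 1;
    parse_E();
    if (*in != '\0') ok = 0;                /* trailing input is an error */
    *out = '\0';
    return ok;
}
```

With this sketch, `a+b*c` becomes `abc*+` and `a+b+c` becomes `ab+c+`, reflecting the precedence and left associativity encoded by rules 1 to 4.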
Bottom-Up Syntax Analysis
Input
– A context free grammar
– A stream of tokens
Output
– A syntax tree or error
Method
– Construct the parse tree in a bottom-up manner
– Find the rightmost derivation in reverse order
– For every potential right-hand side and token, decide when a production is found
– Report an error as soon as the input is not a prefix of a valid program
Constructing LR(0) parsing table
– Add a production S' → S$
– Construct a finite automaton accepting "valid stack symbols"
States are sets of items A → α•β
– The states of the automaton become the states of the parsing table
– Determine shift operations
– Determine goto operations
– Determine reduce operations
– Report an error when conflicts arise
[Figure: LR(0) automaton for the grammar S → E$, E → T, E → E + T, T → i, T → (E); items are numbered 1–15 and grouped into states, with transitions labeled E, T, i, (, ), + and $]
Parsing "(i)$"
[Figure: the same LR(0) automaton, used to parse the input (i)$]
Summary (Bottom-Up)
LR is a powerful technique
Generates efficient parsers
Generation tools exist for LALR(1)
– Bison, yacc, CUP
But some grammars need to be tuned
– Shift/Reduce conflicts
– Reduce/Reduce conflicts
– Efficiency of the generated parser
Summary (Parsing)
Context free grammars provide a natural way to define the syntax of programming languages
Ambiguity may be resolved
Predictive parsing is natural
– Good error messages
– Natural error recovery
– But not expressive enough
LR bottom-up parsing is more expressive
Abstract Syntax
Intermediate program representation
Defines a tree that preserves the program hierarchy
Generated by the parser
Declared using an (ambiguous) context free grammar (relatively flat)
– Not meant for parsing
Keywords and punctuation symbols are not stored (not relevant once the tree exists)
Big programs can also be handled (possibly via virtual memory)
Semantic Analysis
Requirements related to the "context" in which a construct occurs
Examples
– Name resolution
– Scoping
– Type checking
– Escape
Implemented via AST traversals
Guides subsequent compiler phases
Abstract Interpretation
Static analysis
Automatically identify program properties
– No user-provided loop invariants
Sound but incomplete methods
– But can be rather precise
Non-standard interpretation of the program's operational semantics
Applications
– Compiler optimization
– Code quality tools: identify potential bugs, prove the absence of runtime errors, partial correctness
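Constant Propagation
z = 3
while (x > 0) {
    if (x == 1) y = 7;
    else y = z + 4;
}
assert(y == 7)
[Figure: abstract states at successive program points (? = not known to be a constant): [x ↦ ?, y ↦ ?, z ↦ ?], [x ↦ ?, y ↦ ?, z ↦ 3], [x ↦ 1, y ↦ ?, z ↦ 3], [x ↦ 1, y ↦ 7, z ↦ 3], [x ↦ ?, y ↦ 7, z ↦ 3], [x ↦ ?, y ↦ ?, z ↦ 3]]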
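The analysis on this example can be reproduced with a small abstract interpreter. This sketch assumes the usual constant-propagation lattice per variable (BOTTOM = no information yet, a constant, TOP = "?"), treats the input x and the uninitialized y as TOP, and, unlike the figure, does not refine x to 1 inside the true branch. All names (`AbsVal`, `join`, `analyze_example`, ...) are hypothetical.

```c
#include <assert.h>

/* Per-variable lattice: BOTTOM < every constant < TOP ("?"). */
typedef enum { BOTTOM, CONSTANT, TOP } Kind;
typedef struct { Kind kind; int value; } AbsVal;
typedef struct { AbsVal x, y, z; } State;

static AbsVal cst(int v) { AbsVal a = { CONSTANT, v }; return a; }
static AbsVal top(void)  { AbsVal a = { TOP, 0 }; return a; }

/* Least upper bound: equal constants stay constant, different ones become "?". */
static AbsVal join(AbsVal a, AbsVal b) {
    if (a.kind == BOTTOM) return b;
    if (b.kind == BOTTOM) return a;
    if (a.kind == CONSTANT && b.kind == CONSTANT && a.value == b.value) return a;
    return top();
}

static AbsVal add(AbsVal a, AbsVal b) {    /* abstract version of '+' */
    if (a.kind == BOTTOM || b.kind == BOTTOM) { AbsVal bot = { BOTTOM, 0 }; return bot; }
    if (a.kind == TOP || b.kind == TOP) return top();
    return cst(a.value + b.value);
}

static State join_state(State a, State b) {
    State r = { join(a.x, b.x), join(a.y, b.y), join(a.z, b.z) };
    return r;
}

static int same(AbsVal a, AbsVal b) {
    return a.kind == b.kind && (a.kind != CONSTANT || a.value == b.value);
}

/* Analyzes: z = 3; while (x > 0) { if (x == 1) y = 7; else y = z + 4; }
   and returns the abstract state at the loop exit, where assert(y == 7) sits. */
State analyze_example(void) {
    State entry = { top(), top(), top() };
    entry.z = cst(3);                                 /* z = 3 */
    State head = entry;
    for (;;) {                                        /* iterate until the loop head is stable */
        State then_s = head, else_s = head;
        then_s.y = cst(7);                            /* y = 7 */
        else_s.y = add(else_s.z, cst(4));             /* y = z + 4 */
        State body_end = join_state(then_s, else_s);  /* both branches give y = 7 */
        State next = join_state(entry, body_end);     /* entry edge joined with back edge */
        if (same(next.x, head.x) && same(next.y, head.y) && same(next.z, head.z)) break;
        head = next;
    }
    return head;
}
```

Both branches produce y = 7, but joining with the loop-entry state (where the loop body may never have run) gives y = ?, matching the last state in the figure: the assertion cannot be proved by this analysis.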
/* c */
L0: a := 0
/* ac */
L1: b := a + 1
/* bc */
    c := c + b
/* bc */
    a := b * 2
/* ac */
    if c < N goto L1
/* c */
    return c
[Figure: the same statements as a control flow graph: a := 0; b := a + 1; c := c + b; a := b * 2; c < N goto L1; return c]
[Figure, built up over several slides: the control flow graph a := 0; b := a + 1; c := c + b; a := b * 2; c < N goto L1; return c, annotated step by step with live-variable sets: first {c}; then {c}, {c, b}; then {c}, {c, b}, {c, a}, {c, b}; finally {c, a}, {c, b}, {c, a}, {c, b}]
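The live-variable sets for this loop can be computed with the iterative procedure summarized next. The sketch assumes each variable is one bit of an `unsigned` mask, treats N as a constant rather than a variable, and hard-codes the control flow graph; `use_set`, `def_set`, `succ` and `liveness` are hypothetical names. It applies in[n] = use[n] ∪ (out[n] − def[n]) and out[n] = ∪ in[s] over successors s, starting from the optimistic "nothing is live".

```c
#include <assert.h>
#include <stddef.h>

/* Statements (0-based): 0: a := 0   1: L1: b := a + 1   2: c := c + b
   3: a := b * 2   4: if c < N goto L1   5: return c */
enum { A = 1 << 0, B = 1 << 1, C = 1 << 2, NSTMT = 6 };

static const unsigned use_set[NSTMT] = { 0, A, C | B, B, C, C };
static const unsigned def_set[NSTMT] = { A, B, C, A, 0, 0 };
/* Up to two successors per statement, -1 = none. */
static const int succ[NSTMT][2] = { {1, -1}, {2, -1}, {3, -1}, {4, -1}, {5, 1}, {-1, -1} };

/* Iterates the liveness equations to a fixpoint; returns the number of passes. */
int liveness(unsigned live_in[NSTMT], unsigned live_out[NSTMT]) {
    int passes = 0, changed = 1;
    assert(live_in != NULL && live_out != NULL);
    for (int n = 0; n < NSTMT; n++) live_in[n] = live_out[n] = 0;  /* most optimistic value */
    while (changed) {
        changed = 0;
        passes++;
        for (int n = NSTMT - 1; n >= 0; n--) {      /* backward problem: visit in reverse */
            unsigned out = 0;
            for (int k = 0; k < 2; k++)
                if (succ[n][k] >= 0) out |= live_in[succ[n][k]];
            unsigned in = use_set[n] | (out & ~def_set[n]);
            if (in != live_in[n] || out != live_out[n]) changed = 1;
            live_in[n] = in;
            live_out[n] = out;
        }
    }
    return passes;
}
```

The fixpoint gives live-in sets {c}, {a, c}, {b, c}, {b, c}, {a, c}, {c}, which agree with the comments in the annotated program; the second pass is needed only because the back edge carries a into the loop test.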
Summary: Iterative Procedure
Analyze one procedure at a time
– More precise solutions exist
Construct a control flow graph for the procedure
Initialize the values at every node to the most optimistic value
Iterate until convergence
Basic Compiler Phases
Overall Structure
Techniques Studied
– Simple code generation
– Basic blocks
– Global register allocation
– Activation records
– Object-oriented languages
– Assembler/Linker/Loader
Heap Memory Management
Part of the runtime system
Utilities for dynamic memory allocation
Utilities for automatic memory reclamation
– Garbage Collection
Garbage Collection
Techniques
– Mark and sweep
– Copying collection
– Reference counting
Modes
– Generational
– Incremental vs. stop-the-world
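Mark and sweep can be sketched on a toy heap. The assumptions here are loud ones: the heap is a fixed array of objects with at most two pointer fields stored as indices, roots are passed in explicitly, marking is recursive, and the collector runs stop-the-world; `heap`, `alloc_object` and `collect` are hypothetical names, not part of any runtime.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy heap: each object has up to two pointer fields (indices, -1 = null). */
enum { HEAP_SIZE = 8 };

typedef struct {
    bool allocated;
    bool marked;
    int field[2];
} Object;

static Object heap[HEAP_SIZE];

/* Mark phase: depth-first traversal from one root (recursive for brevity). */
static void mark(int obj) {
    if (obj < 0 || heap[obj].marked) return;
    assert(heap[obj].allocated);
    heap[obj].marked = true;
    mark(heap[obj].field[0]);
    mark(heap[obj].field[1]);
}

/* Sweep phase: free unmarked objects and clear marks; returns the number freed. */
static int sweep(void) {
    int freed = 0;
    for (int i = 0; i < HEAP_SIZE; i++) {
        if (!heap[i].allocated) continue;
        if (heap[i].marked) heap[i].marked = false;
        else { heap[i].allocated = false; freed++; }
    }
    return freed;
}

/* Stop-the-world collection: mark everything reachable from the roots, then sweep. */
int collect(const int *roots, int nroots) {
    for (int i = 0; i < nroots; i++) mark(roots[i]);
    return sweep();
}

/* Allocates the first free slot; returns its index, or -1 when the heap is full. */
int alloc_object(int f0, int f1) {
    for (int i = 0; i < HEAP_SIZE; i++)
        if (!heap[i].allocated) {
            heap[i] = (Object){ true, false, { f0, f1 } };
            return i;
        }
    return -1;
}
```

An unreachable object that points to itself is reclaimed by this collector, whereas plain reference counting would never see its count drop to zero; that cycle case is one reason tracing collectors are listed alongside reference counting.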