Designing a Predictive Parser
Consider a production A → α. FIRST(α) is the set of tokens that can appear as the leftmost symbol of α or of any string derived from α. For example, FIRST(type) = {↑, array, integer, char, num}.
Now consider productions of the form A → α and A → β. The sets FIRST(α) and FIRST(β) should be disjoint. Then we can implement predictive parsing. Initially, the current nonterminal is the start symbol and the lookahead is the leftmost input token. When expanding A, we find which FIRST set the lookahead symbol belongs to and use the corresponding production. Each nonterminal in the chosen right-hand side results in a call to the corresponding procedure, and each terminal is matched against the lookahead.
Problems with Top-Down Parsing
Left recursion in a CFG may cause the parser to loop forever. Indeed, for the production A → Aα we would write the procedure
procedure A { if lookahead belongs to FIRST(Aα) then call the procedure A }
Because A calls itself before consuming any input, the lookahead never changes and the recursion never terminates. Solution: remove the left recursion without changing the language defined by the grammar.
Dealing with Left Recursion
Solution: an algorithm to remove left recursion.
Basic idea: A → Aα | β becomes
A → βR
R → αR | ε
Example (here α is + term or - term, and β is term):
expr → expr + term | expr - term | term
term → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
becomes
expr → term rest
rest → + term rest | - term rest | ε
What Happens to Semantic Actions?
With left recursion:
expr → expr + term {print('+')}
expr → expr - term {print('-')}
expr → term
term → 0 {print('0')}
term → 1 {print('1')}
…
term → 9 {print('9')}
Without left recursion:
expr → term rest
rest → + term {print('+')} rest | - term {print('-')} rest | ε
term → 0 {print('0')}
term → 1 {print('1')}
…
term → 9 {print('9')}
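Comparing Grammars with Left Recursion
Notice the location of the semantic actions in the tree. What is the order of processing?
[Figure: annotated parse tree for 9 - 5 + 2 under the left-recursive grammar, with the actions {print('9')}, {print('-')}, {print('5')}, {print('+')}, {print('2')} attached as leaves of the expr and term nodes]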
Comparing Grammars without Left Recursion
Now notice the location of the semantic actions in the tree for the revised grammar. What is the order of processing in this case?
[Figure: annotated parse tree for 9 - 5 + 2 under the revised grammar (expr → term rest), with {print('9')}, {print('5')}, {print('2')} under the term nodes and {print('-')}, {print('+')} under the rest nodes]
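The Lexical Analysis Process: A Graphical Depiction
[Figure: the lexical analyzer lexan() and its interactions with the input and its caller]
lexan(), the lexical analyzer, uses getchar() to read a character and pushes back a character c using ungetc(c, stdin). It returns a token to its caller and sets the global variable tokenval to the token's attribute value.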
The Lexical Analysis Process: Functional Responsibilities
The input character string is broken down into tokens. White space and comments are filtered out. Individual tokens and their associated attribute values are identified. The symbol table is initialized, and entries are constructed for each "appropriate" token.
Under what conditions will a character be pushed back?
Example of a Lexical Analyzer
function lexan: integer;
var lexbuf : array [0..100] of char;
    c : char;
begin
    loop begin
        read a character into c;
        if c is a blank or a tab then
            do nothing
        else if c is a newline then
            lineno := lineno + 1
        else if c is a digit then begin
            set tokenval to the value of this and following digits;
            return NUM
        end
Algorithm for Lexical Analyzer (continued)
        else if c is a letter then begin
            place c and successive letters and digits into lexbuf;
            p := lookup(lexbuf);
            if p = 0 then
                p := insert(lexbuf, ID);
            tokenval := p;
            return the token field of table entry p
        end
        else begin
            set tokenval to NONE; /* there is no attribute */
            return integer encoding of character c
        end
    end
end
Note: insert and lookup operations occur against the symbol table!
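Symbol Table Considerations
Operations: insert(string, token_ID) and lookup(string).
Notice: reserved words are placed into the symbol table for easy lookup. Attributes may be associated with each entry, e.g., via semantic actions, or typing information (id: integer, etc.).
[Figure: the symtable array and the lexemes array]
symtable:
entry   lexptr points to   token   attributes
1       div                div
2       mod                mod
3       count              id
4       i                  id
lexemes array: d i v EOS m o d EOS c o u n t EOS i EOS
Each entry's lexptr gives the position in the lexemes array where its lexeme begins; each lexeme is terminated by EOS.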
A Brief Look at Code Generation
Code generation is the back end of the compilation process, which will not be our emphasis; we'll focus on the front end. Important concepts to re-emphasize:
•• Abstract stack machine for intermediate code generation: (i) basic arithmetic, (ii) stack manipulation, (iii) flow control.
•• L-value vs. r-value of an identifier: in I := 5; the identifier I denotes a location (its l-value). In I := I + 1; the I on the left still denotes a location, while the I on the right denotes its contents (its r-value).
A Brief Look at Code Generation
Employ statement templates for code generation. Each template characterizes the translation of one construct, and there are different templates for each major programming-language construct: if, while, procedure, etc.
WHILE:
    label test
    code for expr
    gofalse out
    code for stmt
    goto test
    label out
IF:
    code for expr
    gofalse out
    code for stmt
    label out
Concluding Remarks / Looking Ahead
We've reviewed and highlighted the entire compilation process, introduced context-free grammars (CFGs) and illustrated their relationship to compiler theory, and reviewed many different versions of parse trees that assist in both recognition and translation. Next, we'll return to the beginning, lexical analysis, and explore its close relationship to regular expressions, grammars, and finite automata.