1 Week 3 Questions / Concerns What’s due: Lab1b due Friday at midnight Lab1b check-off next week (schedule will be announced on Monday) Homework #2 due next Monday (Draw a parse tree) Homework #3 due next Wednesday (Define grammar for your language) Homework #4 due next Thursday (Grammar modifications) Top down parser Grammar modifications
2 Structure of Compilers
(Diagram) skeletal source program -> preprocessor -> Modified Source Program -> Lexical Analyzer (scanner) -> Tokens -> Syntax Analysis (Parser) -> Syntactic Structure -> Semantic Analysis -> Intermediate Representation -> Optimizer -> Code Generator -> Target machine code
Symbol Table
3 Parser Choose a type of parser Top-Down parser Bottom-Up parser Choose a parsing technique Recursive Descent Table driven parser (LL(1) or LR(1)) Generate a grammar for your language Modify the grammar to fit the particular parsing technique Remove lambda productions Remove unit productions Remove left recursion Left factor the grammar
4 Parser
The parser is just a matching tool.
It matches a list of tokens against the grammar rules to determine whether they form legal constructs/statements.
Yes/No machine
Context-Free: it doesn’t care about context (types), it just cares about syntax.
If it looks like an assignment statement, then it is an assignment statement.
Example (accepted by the parser even though the types do not match): int x; x = “Hello”;
5 Grammar #1
S -> aaSc | B
B -> bbbB | λ
Generate a parse tree for the input string aaaabbbcc
6 Grammar #2 S -> E E -> E + E E -> E * E E -> a |b | c Generate a parse tree for the input string a + b * c
7 Grammar #3 Lua Grammar
8 Grammar
Two formats: Context-Free Grammar, Extended Backus-Naur Form
Lua example
EBNF: laststat ::= return [explist] | break
CFG:  Laststat -> return LaststatOptional | break
      LaststatOptional -> Explist | λ
EBNF: varlist ::= var {`,´ var}
CFG:  Varlist -> Var Varlist2
      Varlist2 -> `,´ Var Varlist2 | λ
9 Grammar
Two formats: Context-Free Grammar, Extended Backus-Naur Form
Mini C example
EBNF: Program = Definition { Definition }
CFG:  Program -> Definition MoreDefinitions
      MoreDefinitions -> Definition MoreDefinitions | λ
EBNF: Definition = Data_definition | Function_definition
CFG:  Definition -> Data_definition | Function_definition
EBNF: Function_definition = ['int'] Function_header Function_body
CFG:  Function_definition -> OptionalType Function_header Function_body
      OptionalType -> ‘int’ | λ
10 Top-down parser
Start with the start symbol of the grammar.
Grab an input token and select a production rule.
Use a stack to store the production rule.
Try to parse that rule by matching input tokens.
Keep going until all of the input tokens have been processed.
If the rule is not the right one, put all the tokens back and try a different rule (backtracking).
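The procedure above can be sketched in a few lines of Python. This is only an illustration, not the course's parser: the names GRAMMAR, parse, and accepts are made up for this example, the grammar is Grammar #1 from the earlier slide, and an empty list stands for a lambda production. The list of pending symbols plays the role of the stack, and "putting the tokens back" happens implicitly because a failed attempt never modifies the token list.

```python
# Grammar #1; [] stands for a lambda production (assumed encoding).
GRAMMAR = {
    "S": [["a", "a", "S", "c"], ["B"]],
    "B": [["b", "b", "b", "B"], []],
}

def parse(symbols, tokens):
    """Return True if the pending symbols can derive exactly the tokens."""
    if not symbols:
        return not tokens  # success only if every token was consumed
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Non-terminal: try each rule in turn. When one fails we simply
        # try the next rule with the same, untouched tokens (backtracking).
        return any(parse(rule + rest, tokens) for rule in GRAMMAR[first])
    # Terminal: it must match the next input token.
    return bool(tokens) and tokens[0] == first and parse(rest, tokens[1:])

def accepts(text):
    """Treat each character of text as one token and start from S."""
    return parse(["S"], list(text))
```

Calling accepts("aaaabbbcc") first tries S -> aaSc twice, fails on the third attempt when it sees b, and falls back to S -> B. Note that this sketch would loop forever on a left-recursive grammar.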
11 Top-down Parser Ideal grammar: Unique rule for each type of token. One-token look ahead
12 One token look ahead
Stat -> local function Name Funcbody | local Namelist LocalOptional
Based on the one token “local” we should be able to pick one unique rule so we don’t have to backtrack.
If we combine these two rules into one by factoring out the common part, we eliminate the need for backtracking.
13 One token look ahead
Stat -> local function Name Funcbody | local Namelist LocalOptional
Left factor the grammar:
Stat -> local Morelocal
Morelocal -> function Name Funcbody | Namelist LocalOptional
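Left factoring is mechanical enough to sketch in code. The helper names below (common_prefix, left_factor) and the list-of-symbols encoding are assumptions made for this illustration; rules that start with the same symbol are grouped, their longest common prefix is kept, and the differing tails move to a new non-terminal whose name the caller supplies.

```python
def common_prefix(rules):
    """Longest list of leading symbols shared by every rule."""
    prefix = []
    for column in zip(*rules):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix

def left_factor(name, rules, new_names):
    """Left factor the rules of one non-terminal.

    new_names supplies a fresh non-terminal name for each group of
    rules that shares a first symbol (hypothetical helper, one pass only).
    """
    names = iter(new_names)
    groups = {}
    for rule in rules:
        key = rule[0] if rule else None  # None marks a lambda rule
        groups.setdefault(key, []).append(rule)
    result = {name: []}
    for key, group in groups.items():
        if key is None or len(group) == 1:
            result[name].extend(group)  # nothing to factor
        else:
            prefix = common_prefix(group)
            new_name = next(names)
            result[name].append(prefix + [new_name])
            result[new_name] = [rule[len(prefix):] for rule in group]
    return result

stat = [
    ["local", "function", "Name", "Funcbody"],
    ["local", "Namelist", "LocalOptional"],
]
factored = left_factor("Stat", stat, ["Morelocal"])
```

Here factored holds Stat -> local Morelocal and Morelocal -> function Name Funcbody | Namelist LocalOptional. A single pass is not always enough; the new non-terminal's rules may themselves need factoring.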
14 Top-down Parser
Ideal grammar: Unique rule for each type of token.
One-token look ahead
Minimize unit productions
  Unit productions don’t parse tokens immediately; they require another production.
  It’s hard to tell which tokens match a unit production, so there are more chances for backtracking.
15 Minimize Unit Productions
S -> aaSc
S -> B
B -> bbbB
B -> λ
(Parse tree: S at the root with the single child B, which expands to b b b B)
16 Minimize Unit Productions Exp -> nil | false | true | Number | String | `...´ | Functioncall | Prefixexp | Tableconstructor | Exp Binop Exp | Unop Exp
17 Remove Unit Productions
Before:
S -> aaSc
S -> B
B -> bbbB
B -> λ
After:
S -> aaSc
S -> bbbB
S -> λ
B -> bbbB
B -> λ
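A minimal sketch of this substitution, assuming the grammar is stored as a dict from non-terminal to a list of rules and [] encodes λ. The function name remove_unit_productions is made up for this example. For each non-terminal it follows chains of unit rules and collects every non-unit rule reachable that way.

```python
def remove_unit_productions(grammar):
    """Replace every rule A -> B (B a single non-terminal) by B's rules."""
    result = {}
    for head in grammar:
        rules, seen, pending = [], {head}, [head]
        while pending:
            current = pending.pop()
            for rule in grammar[current]:
                is_unit = len(rule) == 1 and rule[0] in grammar
                if is_unit:
                    # Follow the unit rule instead of keeping it.
                    if rule[0] not in seen:
                        seen.add(rule[0])
                        pending.append(rule[0])
                elif rule not in rules:
                    rules.append(rule)
        result[head] = rules
    return result

before = {
    "S": [["a", "a", "S", "c"], ["B"]],
    "B": [["b", "b", "b", "B"], []],  # [] is the lambda production
}
after = remove_unit_productions(before)
```

The result gives S the rules aaSc, bbbB, and λ, and leaves B unchanged, matching the slide. The seen set keeps the loop from cycling on grammars such as A -> B, B -> A.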
18 Minimize Unit Productions Exp -> nil | false | true | Number | String | `...´ | Functioncall | Prefixexp | Tableconstructor | Exp Binop Exp | Unop Exp Exp -> nil | false | true | Number | String | `...´ | Functioncall| Prefixexp | { Fieldlistoptional }| Exp Binop Exp | Unop Exp
19 Minimize Unit Productions Exp -> nil | false | true | Number | String | `...´ | Functioncall | Prefixexp | Tableconstructor | Exp Binop Exp | Unop Exp Exp -> nil | false | true | Number | String | `...´ | Prefixexp Args | Prefixexp `:´ Name Args | Prefixexp | { Fieldlistoptional } | Exp Binop Exp | Unop Exp
20 Minimize Unit Productions Exp -> nil | false | true | Number | String | `...´ | Functioncall | Prefixexp | Tableconstructor | Exp Binop Exp | Unop Exp Exp -> nil | false | true | Number | String | `...´ | Prefixexp Args | Prefixexp `:´ Name Args | Prefixexp | { Fieldlistoptional } | Exp Binop Exp | Unop Exp More left factoring needed
21 Top-down Parser
Ideal grammar: Unique rule for each type of token.
One-token look ahead
Minimize unit productions
  Unit productions don’t parse tokens immediately; they require another production.
  It’s hard to tell which tokens match a unit production, so there are more chances for backtracking.
Lambda productions are okay, but we have to process them accordingly.
  Removing lambdas always adds more rules.
  It’s not possible to remove all lambda productions and still yield unique token-rule matching.
Remove left recursion in the grammar.
22 Grammar (left recursive vs. right recursive)
Right recursion: A -> aA, A -> λ
Left recursion: A -> Aa, A -> λ
(Figure: derivation trees for the right-recursive and the left-recursive grammar)
The only non-recursive rule is λ.
Same grammar?
23 Grammar (left recursive vs. right recursive)
Right recursion: A -> aA, A -> λ
Left recursion: A -> Aa, A -> λ
(Figure: derivation trees for the right-recursive and the left-recursive grammar)
Which one works for top down?
24 Grammar (left recursive vs. right recursive)
Right recursion: A -> aA, A -> b
Left recursion: A -> Aa, A -> b
(Figure: derivation trees for the right-recursive and the left-recursive grammar, each ending in b)
The non-recursive rules are not only λ.
Same grammar?
25 Remove Left Recursion in the Grammar
Example: A -> Aa, A -> b (A -> A… is the left recursive rule)
Step 1: Make all left recursive rules right recursive, but give them a new non-terminal: drop the leading A and append the new non-terminal, so A -> Aa becomes X -> aX
Step 2: Add a lambda production to the new non-terminal: X -> λ
Step 3: Identify all non-recursive rules: A -> b
Step 4: Append the new non-terminal to the end of all non-recursive rules: A -> bX
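The four steps translate almost directly into code. The sketch below is an illustration under assumed conventions (rules as lists of symbols, [] for λ, and a caller-supplied name for the new non-terminal); remove_left_recursion is a hypothetical helper that handles only immediate left recursion on one non-terminal.

```python
def remove_left_recursion(name, rules, new_name):
    """Remove immediate left recursion from the rules of one non-terminal."""
    # Tails of the left recursive rules: A -> A tail
    recursive = [rule[1:] for rule in rules if rule and rule[0] == name]
    # Step 3: the non-recursive rules
    others = [rule for rule in rules if not rule or rule[0] != name]
    if not recursive:
        return {name: rules}  # nothing to do
    return {
        # Step 4: append the new non-terminal to every non-recursive rule
        name: [rule + [new_name] for rule in others],
        # Step 1: right recursive rules on the new non-terminal,
        # Step 2: plus a lambda production
        new_name: [tail + [new_name] for tail in recursive] + [[]],
    }

example = remove_left_recursion("A", [["A", "a"], ["b"]], "X")
```

For the slide's example this yields A -> bX and X -> aX | λ. A degenerate rule such as A -> A is not handled and should be deleted from the grammar beforehand.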
26 Grammar (left recursive vs. right recursive)
After removing left recursion: A -> bX, X -> aX | λ
Original left recursion: A -> Aa, A -> b
(Figure: derivation trees for the two grammars)
The non-recursive rules are not only λ.
Same grammar?
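To see why the rewritten grammar suits a top-down parser, here is a small recursive descent recognizer for A -> bX, X -> aX | λ. It is a sketch only; the function names parse_A, parse_X, and accepts are invented for this example, and each character is treated as one token. Every decision is made by looking at a single token, so no backtracking is needed.

```python
def parse_A(tokens, pos=0):
    """A -> b X. Returns the position after the parsed A."""
    if pos < len(tokens) and tokens[pos] == "b":
        return parse_X(tokens, pos + 1)
    raise SyntaxError(f"expected 'b' at position {pos}")

def parse_X(tokens, pos):
    """X -> a X | lambda. Returns the position after the parsed X."""
    if pos < len(tokens) and tokens[pos] == "a":
        return parse_X(tokens, pos + 1)  # X -> a X
    return pos  # X -> lambda: consume nothing

def accepts(text):
    """True if the whole text is derived from A."""
    tokens = list(text)
    try:
        return parse_A(tokens) == len(tokens)
    except SyntaxError:
        return False
```

Both grammars on the slide describe the strings b, ba, baa, and so on, but a function written directly from A -> Aa would call itself before consuming any token and never terminate.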
27 Remove Left Recursion
Before:
S -> Sab
S -> c
S -> d
After:
X -> abX
X -> λ
S -> cX
S -> dX
28 Remove Left Recursion
Before:
PARAMLIST -> IDLIST : TYPE | PARAMLIST ; IDLIST : TYPE
After:
PARAMLIST2 -> ; IDLIST : TYPE PARAMLIST2
PARAMLIST2 -> λ
PARAMLIST -> IDLIST : TYPE PARAMLIST2
29 Remove Unit Production Example
Before:
S -> abSc
S -> A
S -> AB
A -> aA
A -> λ
B -> bbB
B -> λ
After:
S -> abSc
S -> aA
S -> λ
S -> AB
A -> aA
A -> λ
B -> bbB
B -> λ
30 Remove Unit Production Example
Before:
TERM -> FACTOR
FACTOR -> id | id ( EXPR_LIST ) | num | ( EXPRESSION ) | not FACTOR
After:
TERM -> id | id ( EXPR_LIST ) | num | ( EXPRESSION ) | not FACTOR
FACTOR -> id | id ( EXPR_LIST ) | num | ( EXPRESSION ) | not FACTOR
31 Left Factor Example
Before:
S -> abS
S -> aaA
S -> a
A -> bA
A -> λ
After:
S -> aX
X -> bS
X -> aA
X -> λ
A -> bA
A -> λ
32 Left Factor Example
Before:
EXPRESSION -> SIMPLE_EXPR | SIMPLE_EXPR relop SIMPLE_EXPR
After:
EXPRESSION -> SIMPLE_EXPR RestOfExp
RestOfExp -> λ | relop SIMPLE_EXPR
33 In-Class Exercise #5 Remove Unit Production S -> abS | bSa | A | d A -> c | dA Left Factor this grammar FACTOR -> id | id ( EXPR_LIST ) | num | ( EXPRESSION ) | not FACTOR Remove Left recursion: SIMPLE_EXPR -> TERM | SIGN TERM | SIMPLE_EXPR addop TERM