1
CH2.1 CSE4100 Chapter 2: A Simple One Pass Compiler Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut 371 Fairfield Way, Unit 2155 Storrs, CT 06269-3155 steve@engr.uconn.edu http://www.engr.uconn.edu/~steve (860) 486 - 4818 Material for course thanks to: Laurent Michel Aggelos Kiayias Robert LeBarre
2
CH2.2 CSE4100
The Entire Compilation Process
Grammars for Syntax Definition
Syntax-Directed Translation
Parsing - Top Down & Predictive
Pulling Together the Pieces
The Lexical Analysis Process
Symbol Table Considerations
A Brief Look at Code Generation
Concluding Remarks/Looking Ahead
3
CH2.3 CSE4100 Grammars for Syntax Definition
A Context-Free Grammar (CFG) Is Utilized to Describe the Syntactic Structure of a Language.
A CFG Is Characterized By:
1. A Set of Tokens or Terminal Symbols
2. A Set of Non-terminals
3. A Set of Production Rules, Each of the Form NT → {T, NT}*
4. A Non-terminal Designated As the Start Symbol
4
CH2.4 CSE4100 Grammars for Syntax Definition
Example CFG:
list → list + digit
list → list - digit
list → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
(The "|" means OR, so we could have written: list → list + digit | list - digit | digit)
5
CH2.5 CSE4100 Grammars are Used to Derive Strings
Using the CFG defined on the previous slide, we can derive the string 9 - 5 + 2 as follows:
list ⇒ list + digit              (P1: list → list + digit)
     ⇒ list - digit + digit      (P2: list → list - digit)
     ⇒ digit - digit + digit     (P3: list → digit)
     ⇒ 9 - digit + digit         (P4: digit → 9)
     ⇒ 9 - 5 + digit             (P4: digit → 5)
     ⇒ 9 - 5 + 2                 (P4: digit → 2)
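The derivation above can be replayed mechanically. The following is a minimal sketch, assuming the grammar is stored as a dictionary of productions; the names GRAMMAR, apply, and derive_9_minus_5_plus_2 are illustrative and do not come from the slides.

```python
# The list/digit grammar from the slides, keyed by non-terminal.
GRAMMAR = {
    "list": [["list", "+", "digit"], ["list", "-", "digit"], ["digit"]],
    "digit": [[str(d)] for d in range(10)],
}


def apply(form, index, production):
    """Rewrite the non-terminal at position `index` using `production`."""
    head = form[index]
    assert production in GRAMMAR[head], "not a production of " + head
    return form[:index] + production + form[index + 1:]


def derive_9_minus_5_plus_2():
    """Return every sentential form in the derivation of 9 - 5 + 2."""
    steps = [["list"]]
    choices = [
        (0, ["list", "+", "digit"]),  # P1
        (0, ["list", "-", "digit"]),  # P2
        (0, ["digit"]),               # P3
        (0, ["9"]),                   # P4
        (2, ["5"]),                   # P4
        (4, ["2"]),                   # P4
    ]
    for index, production in choices:
        steps.append(apply(steps[-1], index, production))
    return [" ".join(form) for form in steps]
```

Each call to apply performs one derivation step, so the returned list holds the start symbol followed by the six sentential forms shown on the slide.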
6
CH2.6 CSE4100 Grammars are Used to Derive Strings
This derivation could also be represented via a Parse Tree (parents on left, children indented to the right):
list
    list
        list
            digit
                9
        -
        digit
            5
    +
    digit
        2
Derivation: list ⇒ list + digit ⇒ list - digit + digit ⇒ digit - digit + digit ⇒ 9 - digit + digit ⇒ 9 - 5 + digit ⇒ 9 - 5 + 2
7
CH2.7 CSE4100 A More Complex Grammar
block → begin opt_stmts end
opt_stmts → stmt_list | ε
stmt_list → stmt_list ; stmt | stmt
What is this grammar for? What does "ε" represent? What kind of production rule is this?
8
CH2.8 CSE4100 Defining a Parse Tree
More Formally, a Parse Tree for a CFG Has the Following Properties:
Root Is Labeled With the Start Symbol
Each Leaf Node Is Labeled With a Token or ε
Each Interior Node Is Labeled With a Non-Terminal
If A → x1 x2 … xn, Then A Is an Interior Node; x1, x2, …, xn Are the Children of A (Left to Right) and May Be Non-Terminals or Tokens
9
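CH2.9 CSE4100 Other Important Concepts: Ambiguity
Grammar: string → string + string | string - string | 0 | 1 | … | 9
Two derivations (Parse Trees) for the same token string 9 - 5 + 2:
Tree 1: root operator is +, left subtree is 9 - 5, right subtree is 2, i.e., (9 - 5) + 2
Tree 2: root operator is -, left subtree is 9, right subtree is 5 + 2, i.e., 9 - (5 + 2)
Why is this a Problem?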
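One way to see the ambiguity concretely is to count the parse trees that the grammar string → string + string | string - string | digit admits for a given token string. The sketch below is an illustration under that assumption; count_parse_trees is a hypothetical helper, not something defined in the slides.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for tokens under string -> string op string | digit."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from `string`.
        if hi - lo == 1:
            return 1 if tokens[lo].isdigit() else 0
        total = 0
        # Try every interior operator as the root of the tree.
        for k in range(lo + 1, hi - 1):
            if tokens[k] in ("+", "-"):
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))
```

For the token string 9 - 5 + 2 the function reports two trees, matching the two groupings on the slide. The groupings also evaluate differently: (9 - 5) + 2 is 6 while 9 - (5 + 2) is 2, which is why an ambiguous grammar is a problem for a translator.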
10
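CH2.10 CSE4100 Other Important Concepts: Associativity of Operators
Left vs. Right
Right associative (the parse tree grows down to the right, so a = b = c groups as a = (b = c)):
right → letter = right | letter
letter → a | b | c | … | z
Left associative (the parse tree grows down to the left, so 9 - 5 + 2 groups as (9 - 5) + 2):
list → list + digit | list - digit | digit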
11
CH2.11 CSE4100 Other Important Concepts: Operator Precedence
What does 9 + 5 * 2 mean?
Typically the precedence order, from highest to lowest, is: ( ), then * and /, then + and -
This can be incorporated into a grammar via rules:
expr → expr + term | expr - term | term
term → term * factor | term / factor | factor
factor → digit | ( expr )
digit → 0 | 1 | 2 | 3 | … | 9
Precedence Achieved by:
One non-terminal (expr, term) for each precedence level
Rules for each level are left recursive, so the operators associate to the left
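To make the effect of the layered grammar visible, here is a small evaluator sketch with one function per non-terminal (expr, term, factor). It is an illustration, not the course's code: the left-recursive rules are realized as loops (an assumption made so that the sketch terminates), and the function name evaluate and the "$" end marker are hypothetical.

```python
def evaluate(source):
    """Evaluate single-digit arithmetic using the expr/term/factor grammar."""
    tokens = [c for c in source if not c.isspace()] + ["$"]  # "$" = end marker
    pos = 0

    def peek():
        return tokens[pos]

    def advance():
        nonlocal pos
        pos += 1

    def factor():  # factor -> digit | ( expr )
        tok = peek()
        if tok.isdigit():
            advance()
            return int(tok)
        if tok == "(":
            advance()
            value = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        raise SyntaxError("unexpected token " + repr(tok))

    def term():  # term -> term * factor | term / factor | factor
        value = factor()
        while peek() in ("*", "/"):
            op = peek()
            advance()
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def expr():  # expr -> expr + term | expr - term | term
        value = term()
        while peek() in ("+", "-"):
            op = peek()
            advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expr()
    if peek() != "$":
        raise SyntaxError("trailing input")
    return result
```

Because term is called from inside expr, a multiplication is always completed before the surrounding addition, so 9 + 5 * 2 is read as 9 + (5 * 2). Accumulating left to right in each loop gives the left associativity that the left-recursive rules express.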
12
CH2.12 CSE4100 Syntax-Directed Translation
Associate Attributes With Grammar Rules & Constructs and Translate As Parsing Occurs
Our Example Uses Infix to Postfix Notation Translation for Expressions
Translation May Be Defined Inductively As Postfix(E), Where E Is an Expression:
1. If E is a variable | constant, then Postfix(E) = E
2. If E is E1 op E2, then Postfix(E) = Postfix(E1 op E2) = Postfix(E1) Postfix(E2) op
3. If E is (E1), then Postfix(E) = Postfix(E1)
Examples:
( 9 - 5 ) + 2  →  9 5 - 2 +
9 - ( 5 + 2 )  →  9 5 2 + -
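The inductive definition maps directly onto a recursive function. The sketch below assumes expressions are already available as nested tuples (left, op, right), with a plain string for a variable or constant; that representation and the name postfix are illustrative choices, not part of the slides. Parentheses need no case of their own because the nesting of the tuples already records the grouping (rule 3).

```python
def postfix(e):
    """Translate a nested-tuple expression into postfix notation."""
    if isinstance(e, str):
        # Rule 1: a variable or constant translates to itself.
        return e
    # Rule 2: E1 op E2 becomes Postfix(E1) Postfix(E2) op.
    left, op, right = e
    return postfix(left) + " " + postfix(right) + " " + op
```

For example, postfix((("9", "-", "5"), "+", "2")) corresponds to ( 9 - 5 ) + 2, and postfix(("9", "-", ("5", "+", "2"))) corresponds to 9 - ( 5 + 2 ).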
13
CH2.13 CSE4100 Syntax-Directed Definition (2 parts)
Each Production Has a Set of Semantic Rules
Each Grammar Symbol Has a Set of Attributes
For the Following Example, String Attribute "t" Is Associated With Each Grammar Symbol, i.e.,
expr → expr - term | expr + term | term
term → 0 | 1 | 2 | 3 | … | 9
What is a Derivation for 9 + 5 - 2?
14
CH2.14 CSE4100 Syntax-Directed Definition (2 parts)
Each Production Rule of the CFG Has a Semantic Rule
Production               Semantic Rule
expr → expr1 + term      expr.t := expr1.t || term.t || '+'
expr → expr1 - term      expr.t := expr1.t || term.t || '-'
expr → term              expr.t := term.t
term → 0                 term.t := '0'
term → 1                 term.t := '1'
…
term → 9                 term.t := '9'
(expr1 denotes the occurrence of expr on the right-hand side; "||" is string concatenation.)
Note: Semantic Rules for expr Use Synthesized Attributes, Whose Values at a Node Are Computed From the Attribute Values of That Node's Children.
15
CH2.15 CSE4100 Semantic Rules are Embedded in Parse Tree
Annotated parse tree for 9 - 5 + 2 (parents on left, children indented to the right):
expr.t = 95-2+
    expr.t = 95-
        expr.t = 9
            term.t = 9
                9
        -
        term.t = 5
            5
    +
    term.t = 2
        2
How Do Semantic Rules Work?
What Type of Tree Traversal is Being Performed?
How Can We More Closely Associate Semantic Rules With Production Rules?
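The attribute values in the annotated tree can be computed by visiting children before their parent, i.e., a depth-first, postorder traversal. The following sketch assumes a simple Node class with a label, a list of children, and a slot for the attribute t; Node, annotate, and build_tree are hypothetical names introduced only for this illustration.

```python
class Node:
    """Parse-tree node: a label, ordered children, and the attribute t."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.t = None


def annotate(node):
    """Fill in node.t for the whole subtree and return it."""
    for child in node.children:  # children first: postorder
        annotate(child)
    kids = node.children
    if not kids:
        node.t = node.label  # a token leaf carries its own text
    elif len(kids) == 1:
        node.t = kids[0].t  # expr -> term, term -> digit
    else:
        # expr -> expr1 op term: expr.t := expr1.t || term.t || op
        node.t = kids[0].t + kids[2].t + kids[1].label
    return node.t


def build_tree():
    """Parse tree for 9 - 5 + 2 under the expr/term grammar."""
    def term(d):
        return Node("term", [Node(d)])

    inner = Node("expr", [Node("expr", [term("9")]), Node("-"), term("5")])
    return Node("expr", [inner, Node("+"), term("2")])
```

Calling annotate(build_tree()) yields the root value 95-2+, and the intermediate expr node ends up with 95-, as in the tree above.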
16
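CH2.16 CSE4100 Examples
A production such as
rest → + term rest
becomes, with an embedded semantic action,
rest → + term {print('+')} rest
(Print '+' After term for postfix translation)
Translation scheme for the expression grammar:
expr → expr + term {print('+')}
expr → expr - term {print('-')}
expr → term
term → 0 {print('0')}
term → 1 {print('1')}
…
term → 9 {print('9')}
[Figure: parse tree for 9 - 5 + 2 with the actions {print('9')}, {print('5')}, {print('-')}, {print('2')}, {print('+')} attached as extra leaves]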
17
CH2.17 CSE4100 Parsing - Top-Down & Predictive
Top-Down Parsing: Parse tree / derivation of a token string occurs in a top-down fashion.
For Example, Consider:
type → simple | ↑ id | array [ simple ] of type
simple → integer | char | num dotdot num
Suppose input is: array [ num dotdot num ] of integer
The parse would begin with:
type → array [ simple ] of type
18
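CH2.18 CSE4100 Top-Down Parse (type = start symbol)
Input (tokens): array [ num dotdot num ] of integer
[Figure: two snapshots of the growing parse tree. First, the root type is expanded to array [ simple ] of type. Second, the child simple is expanded to num dotdot num.]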
19
CH2.19 CSE4100 Top-Down Parse (type = start symbol)
Input: array [ num dotdot num ] of integer
[Figure: two further snapshots of the parse tree. First, the trailing type is expanded to simple. Second, that simple is expanded to integer, which completes the parse.]
20
CH2.20 CSE4100 Top-Down Process: Recursive Descent or Predictive Parsing
Parser Operates by Attempting to Match Tokens in the Input Stream
Utilize Both Grammar and Input Below to Motivate Code for Algorithm
Input: array [ num dotdot num ] of integer
type → simple | ↑ id | array [ simple ] of type
simple → integer | char | num dotdot num

procedure match ( t : token ) ;
begin
    if lookahead = t then
        lookahead := nexttoken
    else error
end ;
21
CH2.21 CSE4100 Top-Down Algorithm (Continued)

procedure type ;
begin
    if lookahead is in { integer, char, num } then
        simple
    else if lookahead = '↑' then begin
        match ( '↑' ) ; match ( id )
    end
    else if lookahead = array then begin
        match ( array ) ; match ( '[' ) ; simple ;
        match ( ']' ) ; match ( of ) ; type
    end
    else error
end ;

procedure simple ;
begin
    if lookahead = integer then
        match ( integer )
    else if lookahead = char then
        match ( char )
    else if lookahead = num then begin
        match ( num ) ; match ( dotdot ) ; match ( num )
    end
    else error
end ;
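The match, type, and simple procedures translate almost line for line into a runnable recognizer. The sketch below makes several assumptions for illustration: tokens are whitespace-separated words, the pointer alternative is spelled with the ASCII token "^", "$" marks end of input, and errors are reported by raising a hypothetical ParseError. The method is named type_ only to avoid shadowing Python's built-in type.

```python
class ParseError(Exception):
    """Raised when the lookahead token cannot be matched."""


class TypeParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" = assumed end-of-input marker
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, t):
        # Consume the lookahead if it is the expected token.
        if self.lookahead == t:
            self.pos += 1
        else:
            raise ParseError("expected %r, saw %r" % (t, self.lookahead))

    def type_(self):
        # type -> simple | ^ id | array [ simple ] of type
        if self.lookahead in ("integer", "char", "num"):
            self.simple()
        elif self.lookahead == "^":
            self.match("^")
            self.match("id")
        elif self.lookahead == "array":
            self.match("array")
            self.match("[")
            self.simple()
            self.match("]")
            self.match("of")
            self.type_()
        else:
            raise ParseError("unexpected %r" % self.lookahead)

    def simple(self):
        # simple -> integer | char | num dotdot num
        if self.lookahead in ("integer", "char"):
            self.match(self.lookahead)
        elif self.lookahead == "num":
            self.match("num")
            self.match("dotdot")
            self.match("num")
        else:
            raise ParseError("unexpected %r" % self.lookahead)


def accepts(text):
    """Return True if `text` is a sentence of the type grammar."""
    parser = TypeParser(text.split())
    try:
        parser.type_()
        parser.match("$")
        return True
    except ParseError:
        return False
```

The single lookahead token is enough to choose a production here because the alternatives of each non-terminal begin with different tokens, which is what makes the parser predictive.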
22
CH2.22 CSE4100 Problem with Top Down Parsing
Left Recursion in CFG May Cause Parser to Loop Forever
Left-recursive grammar:
expr → expr + term | expr - term | term
term → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Solution: Algorithm to Remove Left Recursion
Revised grammar:
expr → term rest
rest → + term rest | - term rest | ε
term → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
New Semantic Actions:
rest → + term {print('+')} rest | - term {print('-')} rest | ε
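With left recursion removed, the grammar expr → term rest can be parsed by recursive descent, and the embedded print actions produce the postfix form. The sketch below is one possible rendering under stated assumptions: input is a string of single digits and operators, output is collected in a list rather than printed, and the names translate and "$" (end marker) are illustrative.

```python
def translate(source):
    """Translate single-digit infix (+ and - only) to postfix."""
    tokens = [c for c in source if not c.isspace()] + ["$"]  # "$" = end marker
    pos = 0
    out = []

    def match(t):
        nonlocal pos
        if tokens[pos] != t:
            raise SyntaxError("expected %r, saw %r" % (t, tokens[pos]))
        pos += 1

    def term():
        # term -> digit {print(digit)}
        tok = tokens[pos]
        if not tok.isdigit():
            raise SyntaxError("digit expected, saw %r" % tok)
        match(tok)
        out.append(tok)

    def rest():
        # rest -> + term {print('+')} rest | - term {print('-')} rest | epsilon
        if tokens[pos] in ("+", "-"):
            op = tokens[pos]
            match(op)
            term()
            out.append(op)  # the action sits after term, before rest
            rest()
        # otherwise take the epsilon alternative and consume nothing

    def expr():
        # expr -> term rest
        term()
        rest()

    expr()
    match("$")
    return "".join(out)
```

A procedure written directly from expr → expr + term would call itself before consuming any input, which is the infinite loop the slide warns about; rest avoids this because it always consumes an operator before recursing.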
23
CH2.23 CSE4100 Comparing Grammars with Left Recursion
Notice Location of Semantic Actions in Tree
What is Order of Processing?
[Figure: parse tree for 9 - 5 + 2 under the left-recursive grammar (expr, term), with the action leaves {print('9')}, {print('5')}, {print('-')}, {print('2')}, {print('+')}]
24
CH2.24 CSE4100 Comparing Grammars without Left Recursion
Now, Notice Location of Semantic Actions in Tree for Revised Grammar
What is Order of Processing in this Case?
[Figure: parse tree for 9 - 5 + 2 under the revised grammar (expr, term, rest), with the action leaves {print('9')}, {print('5')}, {print('-')}, {print('2')}, {print('+')}]
25
CH2.25 CSE4100 The Lexical Analysis Process: A Graphical Depiction
lexan ( ), the lexical analyzer:
Uses getchar ( ) to read a character from the input
Pushes back character c using ungetc (c, stdin)
Returns token to caller
Sets global variable tokenval to attribute value
26
CH2.26 CSE4100 The Lexical Analysis Process: Functional Responsibilities
Input Token String Is Broken Down
White Space and Comments Are Filtered Out
Individual Tokens With Associated Values Are Identified
Symbol Table Is Initialized and Entries Are Constructed for Each "Appropriate" Token
Under What Conditions Will a Character Be Pushed Back?
Can You Cite Some Examples in Programming Language Statements?
27
CH2.27 CSE4100 Algorithm for Lexical Analyzer

function lexan : integer ;
var lexbuf : array [ 0 .. 100 ] of char ;
    c : char ;
begin
    loop begin
        read a character into c ;
        if c is a blank or a tab then
            do nothing
        else if c is a newline then
            lineno := lineno + 1
        else if c is a digit then begin
            set tokenval to the value of this and following digits ;
            return NUM
        end
28
CH2.28 CSE4100 Algorithm for Lexical Analyzer

        else if c is a letter then begin
            place c and successive letters and digits into lexbuf ;
            p := lookup ( lexbuf ) ;
            if p = 0 then
                p := insert ( lexbuf, ID ) ;
            tokenval := p ;
            return the token field of table entry p
        end
        else begin  /* token is a single character */
            set tokenval to NONE ;  /* there is no attribute */
            return integer encoding of character c
        end
    end
end

Note: Insert / Lookup operations occur against the Symbol Table!
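The lexan algorithm can be sketched as a Python generator. This is a simplified illustration with several stated assumptions: the whole input is held in a string, the symbol table is a plain dictionary from lexeme to token, tokenval for identifiers is the lexeme itself rather than a table index, and the token names "NUM" and "ID" are stand-ins for the integer codes used in the pseudocode. Stopping the scan at the first character that does not belong to the token plays the role of pushing that character back.

```python
NUM, ID, NONE = "NUM", "ID", None  # stand-in token codes (assumed)


def lexan(text, symtable):
    """Yield (token, tokenval) pairs for `text`, updating `symtable`."""
    i = 0
    while i < len(text):
        c = text[i]
        if c in " \t\n":
            i += 1  # white space is filtered out
        elif c.isdigit():
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            yield NUM, int(text[i:j])  # tokenval is the numeric value
            i = j  # the non-digit at j is left unread ("pushed back")
        elif c.isalpha():
            j = i
            while j < len(text) and text[j].isalnum():
                j += 1
            lexeme = text[i:j]
            if lexeme not in symtable:  # lookup failed, so insert as ID
                symtable[lexeme] = ID
            yield symtable[lexeme], lexeme
            i = j
        else:
            yield c, NONE  # single-character token, no attribute
            i += 1
```

Seeding the table with reserved words before scanning, for example {"div": "DIV", "mod": "MOD"}, makes the scanner return their tokens instead of ID, which is the reason the slides place reserved words in the symbol table.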
29
CH2.29 CSE4100 Symbol Table Considerations
ARRAY symtable:
index   lexptr        token   attributes
0       (unused)
1       → "div"       div
2       → "mod"       mod
3       → "count"     id
4       → "i"         id
ARRAY lexemes: d i v EOS m o d EOS c o u n t EOS i EOS
OPERATIONS:
Insert (string, token_ID)
Lookup (string)
NOTICE:
Reserved words are placed into the symbol table for easy lookup
Attributes may be associated with each entry, e.g., semantic actions or typing info (id → integer, etc.)
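The two-array layout can be sketched as follows. This is a minimal illustration, assuming that lexemes are stored back to back in one string, each terminated by an EOS character, that entry 0 is left unused so Lookup can return 0 for "not found", and that Lookup is a simple linear search; the class and method names are hypothetical.

```python
EOS = "\0"  # end-of-string marker separating lexemes (assumed encoding)


class SymbolTable:
    def __init__(self):
        self.lexemes = ""      # all lexemes, each followed by EOS
        self.entries = [None]  # entries[p] = (lexptr, token); slot 0 unused

    def insert(self, s, token):
        """Append lexeme `s` with `token` and return its entry index."""
        self.entries.append((len(self.lexemes), token))
        self.lexemes += s + EOS
        return len(self.entries) - 1

    def lexeme(self, p):
        """Return the lexeme that entry p points to."""
        lexptr, _ = self.entries[p]
        end = self.lexemes.index(EOS, lexptr)
        return self.lexemes[lexptr:end]

    def token(self, p):
        return self.entries[p][1]

    def lookup(self, s):
        """Return the index of the entry for `s`, or 0 if absent."""
        for p in range(len(self.entries) - 1, 0, -1):
            if self.lexeme(p) == s:
                return p
        return 0
```

Inserting div, mod, count, and i in that order reproduces the layout on the slide: the entries occupy indices 1 through 4 and the lexemes string holds the four names separated by EOS.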
30
CH2.30 CSE4100 A Brief Look at Code Generation
Back-end of Compilation Process, Which Will Not Be Our Emphasis
We'll Focus on Front-end
Important Concepts to Re-emphasize:
Abstract Syntax Machine for Intermediate Code Generation
L-value Vs. R-value
    I := 5 ;        L-value: the Location of I
    I := I + 1 ;    R-value: the Contents of I (on the right-hand side)
May Be Attributes in Symbol Table
31
CH2.31 CSE4100 A Brief Look at Code Generation
Employ Statement Templates for Code Generation.
Each Template Characterizes the Translation
Different Templates for Each Major Programming Language Construct: if, while, procedure, etc.

IF template:
    code for expr
    gofalse out
    code for stmt
    label out

WHILE template:
    label test
    code for expr
    gofalse out
    code for stmt
    goto test
    label out
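The two templates can be sketched as functions that splice already generated code for expr and stmt between the jump and label instructions. The sketch assumes instructions are plain strings, that fresh label names are produced by a counter, and that the sample instructions in the usage note are placeholders; CodeGen, new_label, gen_if, and gen_while are hypothetical names.

```python
class CodeGen:
    """Emit code for if and while statements from fixed templates."""

    def __init__(self):
        self.next_label = 1

    def new_label(self):
        # Each statement needs labels that no other statement uses.
        name = "L%d" % self.next_label
        self.next_label += 1
        return name

    def gen_if(self, expr_code, stmt_code):
        out = self.new_label()
        return (list(expr_code)
                + ["gofalse " + out]
                + list(stmt_code)
                + ["label " + out])

    def gen_while(self, expr_code, stmt_code):
        test = self.new_label()
        out = self.new_label()
        return (["label " + test]
                + list(expr_code)
                + ["gofalse " + out]
                + list(stmt_code)
                + ["goto " + test, "label " + out])
```

For instance, CodeGen().gen_while(["code for expr"], ["code for stmt"]) returns the six lines of the WHILE template with test and out replaced by L1 and L2. Because the functions accept and return instruction lists, the result of one template can be passed as the stmt code of another, which is how nested statements are handled.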
32
CH2.32 CSE4100 Concluding Remarks / Looking Ahead
We've Reviewed / Highlighted the Entire Compilation Process
Introduced Context-free Grammars (CFG) and Indicated / Illustrated Their Relationship to Compiler Theory
Reviewed Many Different Versions of Parse Trees That Assist in Both Recognition and Translation
We'll Return to the Beginning: Lexical Analysis
We'll Explore the Close Relationship of Lexical Analysis to Regular Expressions, Grammars, and Finite Automata