Published by Rodger Tucker. Modified over 9 years ago.
1
Syntax Analysis (Chapter 4)
Course Overview
PART I: overview material
1 Introduction
2 Language processors (tombstone diagrams, bootstrapping)
3 Architecture of a compiler
PART II: inside a compiler
4 Syntax analysis
5 Contextual analysis
6 Runtime organization
7 Code generation
PART III: conclusion
8 Interpretation
9 Review
2
The “Phases” of a Compiler
Source Program → Syntax Analysis → Abstract Syntax Tree → Contextual Analysis → Decorated Abstract Syntax Tree → Code Generation → Object Code
Error Reports are produced along the way. This chapter covers Syntax Analysis.
3
In Chapter 4
Syntax Analysis
– Scanning: recognize “words” or “tokens” in the input
– Parsing: recognize the structure of the program
   Different parsing strategies
   How to construct a recursive descent parser
– AST Construction
Use of theoretical “Tools”:
– Regular Expressions and Finite-State Machines
– Grammars
– Extended BNF notation
– First sets and Follow sets
4
Syntax Analysis
The “job” of syntax analysis is to read the source program (a text file) and determine its structure.
Subphases:
– Scanning
– Parsing
– Constructing an internal representation of the source text that shows its structure (usually an AST)
Note: A single-pass compiler usually does not explicitly construct an AST.
5
Multi Pass Compiler
Dependency diagram of a typical Multi Pass Compiler: the Compiler Driver calls each phase in turn.
– Syntactic Analyzer: input Source Text, output AST (this chapter)
– Contextual Analyzer: input AST, output Decorated AST
– Code Generator: input Decorated AST, output Object Code
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
6
Syntax Analysis (dataflow chart)
Source Program (stream of characters) → Scanner → stream of “tokens” → Parser → Abstract Syntax Tree
Both the Scanner and the Parser produce Error Reports.
7
(1) Scan: Divide Input into Tokens
An example Mini-Triangle source program:
  let var y: Integer
  in !new year
     y := y+1
The scanner divides it into this token stream (the comment “!new year” yields no token):
  let | var | ident. y | colon : | ident. Integer | in | ident. y | becomes := | ident. y | op. + | intlit 1 | eot
Tokens are “words” in the input, for example keywords, operators, identifiers, literals, etc.
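To make the scanning step concrete, here is a minimal hand-written scanner sketch in Python, covering just enough of Mini-Triangle to tokenize the example program. The token kind names (`ident`, `intlit`, `becomes`, ...), the keyword list and the operator characters are our own assumptions chosen to mirror the token stream above; this is not the textbook's scanner.

```python
from __future__ import annotations
from dataclasses import dataclass

# Assumed keyword and symbol sets for a Mini-Triangle subset (illustrative only).
KEYWORDS = {"let", "in", "var", "const", "begin", "end", "if", "then", "else", "while", "do"}
OPERATOR_CHARS = set("+-*/<>=\\")
PUNCTUATION = {";": "semicolon", "(": "lparen", ")": "rparen"}

@dataclass
class Token:
    kind: str       # "ident", "intlit", "op", ..., or the keyword itself
    spelling: str   # the characters that make up the token

def scan(source: str) -> list[Token]:
    tokens: list[Token] = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch.isspace():                      # blanks only separate tokens
            i += 1
        elif ch == "!":                       # comment: skip to end of line
            while i < n and source[i] != "\n":
                i += 1
        elif ch.isalpha():                    # identifier or keyword
            j = i + 1
            while j < n and source[j].isalnum():
                j += 1
            word = source[i:j]
            tokens.append(Token(word if word in KEYWORDS else "ident", word))
            i = j
        elif ch.isdigit():                    # integer literal
            j = i + 1
            while j < n and source[j].isdigit():
                j += 1
            tokens.append(Token("intlit", source[i:j]))
            i = j
        elif source.startswith(":=", i):      # must be tested before ":"
            tokens.append(Token("becomes", ":="))
            i += 2
        elif ch == ":":
            tokens.append(Token("colon", ":"))
            i += 1
        elif ch in PUNCTUATION:
            tokens.append(Token(PUNCTUATION[ch], ch))
            i += 1
        elif ch in OPERATOR_CHARS:
            tokens.append(Token("op", ch))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at offset {i}")
    tokens.append(Token("eot", ""))           # end-of-text marker
    return tokens

program = "let var y: Integer\nin !new year\n   y := y+1\n"
for token in scan(program):
    print(token.kind, token.spelling)
```

Checking for `:=` before `:` is the usual longest-match rule: the scanner always prefers the longer token when two could start at the same position.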
8
(2) Parse: Determine structure of program
The parser analyzes the structure of the token stream with respect to the grammar of the language.
[Figure: parse tree over the token stream let, var, id. y, col. :, id. Int, in, id. y, bec. :=, id. y, op. +, intlit 1, eot, with nodes Program, single-Command, Declaration, single-Declaration, Type Denoter, V-Name, Expression, primary-Exp, Ident, Op. and Int.Lit]
9
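(3) AST Construction
[Figure: AST for the example program. A Program node contains a LetCommand whose declaration is a VarDecl (Ident y, SimpleType with Ident Integer) and whose body is an AssignCommand with target SimpleVar (Ident y) and expression BinaryExpr (VNameExp of SimpleVar y, Op +, Int.Expr of Int.Lit 1)]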
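One possible way to represent such a tree in code is one class per node type. The Python sketch below borrows the node names from the figure (writing Int.Expr and Int.Lit as `IntExpr` and `IntLit`); the field names, the abstract base classes and the `show` helper are hypothetical and need not match the actual Triangle compiler's classes.

```python
from __future__ import annotations
from dataclasses import dataclass, fields, is_dataclass

# Terminal nodes just carry the token spelling.
@dataclass
class Ident:
    spelling: str

@dataclass
class Op:
    spelling: str

@dataclass
class IntLit:
    spelling: str

# Abstract phrase categories (plain base classes without fields).
class Command: pass
class Declaration: pass
class Expression: pass
class TypeDenoter: pass
class VName: pass

@dataclass
class SimpleType(TypeDenoter):
    ident: Ident

@dataclass
class VarDecl(Declaration):
    ident: Ident
    type_denoter: TypeDenoter

@dataclass
class SimpleVar(VName):
    ident: Ident

@dataclass
class VNameExp(Expression):
    vname: VName

@dataclass
class IntExpr(Expression):
    lit: IntLit

@dataclass
class BinaryExpr(Expression):
    left: Expression
    op: Op
    right: Expression

@dataclass
class AssignCommand(Command):
    vname: VName
    expr: Expression

@dataclass
class LetCommand(Command):
    decl: Declaration
    body: Command

@dataclass
class Program:
    command: Command

def show(node) -> str:
    """Render a tree as nested constructor calls, e.g. Ident('y')."""
    if not is_dataclass(node):
        return repr(node)
    parts = [show(getattr(node, f.name)) for f in fields(node)]
    return f"{type(node).__name__}({', '.join(parts)})"

# The AST of:  let var y: Integer in y := y+1
tree = Program(
    LetCommand(
        VarDecl(Ident("y"), SimpleType(Ident("Integer"))),
        AssignCommand(
            SimpleVar(Ident("y")),
            BinaryExpr(VNameExp(SimpleVar(Ident("y"))), Op("+"), IntExpr(IntLit("1"))),
        ),
    )
)
print(show(tree))
```

Unlike the parse tree, the AST keeps no nodes for punctuation such as `:=` or `:`, nor for chain rules like single-Command or primary-Exp.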
10
Grammars
RECAP:
– The syntax of a language can be specified by means of a CFG (Context-Free Grammar).
– A CFG can be expressed in BNF (Backus-Naur Form).
Example: Mini-Triangle grammar in BNF
  Program ::= single-Command
  Command ::= single-Command
            | Command ; single-Command
  single-Command ::= V-name := Expression
                   | begin Command end
                   | ...
11
Grammars (continued)
For our convenience, we will use EBNF or “Extended BNF” rather than simple BNF.
EBNF = BNF + regular expressions
Example: Mini-Triangle in EBNF
  Program ::= single-Command
  Command ::= (single-Command ;)* single-Command
  single-Command ::= V-name := Expression
                   | begin Command end
                   | ...
Here * means 0 or more occurrences of the preceding (parenthesized) item.
12
Regular Expressions
REs are a notation for expressing a set of strings of terminal symbols.
Different kinds of RE:
  ε       The empty string
  t       Generates only the string t
  X Y     Generates any string xy such that x is generated by X and y is generated by Y
  X | Y   Generates any string which is generated either by X or by Y
  X*      The concatenation of zero or more strings generated by X
  (X)     Used for grouping
13
RE: Examples
What set of strings does each of the following REs generate?
1. ε
2. M(r|s)“.”
3. (foo|bar)*
4. (foo|bar)(foo|bar)*
5. (0|1|2|3|4|5|6|7|8|9)*
6. 0|(1|..|9)(0|1|..|9)*
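One way to explore these exercises is to translate the REs into Python's `re` syntax and test candidate strings with `re.fullmatch`, which succeeds only if the whole string is generated (not just a prefix). The translations (escaping the literal “.”, writing (0|1|…|9) as the class [0-9]) and the sample strings are our own; example 1 is left out.

```python
import re

# Slide RE -> our Python regex translation.
EXAMPLES = {
    'M(r|s)"."': r"M(r|s)\.",
    "(foo|bar)*": r"(foo|bar)*",
    "(foo|bar)(foo|bar)*": r"(foo|bar)(foo|bar)*",
    "(0|1|2|3|4|5|6|7|8|9)*": r"[0-9]*",
    "0|(1|..|9)(0|1|..|9)*": r"0|[1-9][0-9]*",
}

def generates(pattern: str, s: str) -> bool:
    # fullmatch: the WHOLE string must be generated by the pattern.
    return re.fullmatch(pattern, s) is not None

SAMPLES = ["", "Mr.", "Ms.", "Mx.", "foo", "barfoo", "fo", "0", "07", "42"]

for slide_re, pattern in EXAMPLES.items():
    members = [repr(s) for s in SAMPLES if generates(pattern, s)]
    print(f"{slide_re:24} generates {', '.join(members)}")
```

Comparing examples 3 and 4 shows that X X* excludes the empty string while X* includes it; example 6 describes decimal numerals without leading zeros, so "07" is rejected.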
14
Regular Expressions
The “languages” that can be defined by REs and CFGs have been extensively studied by theoretical computer scientists. Some important conclusions and terminology:
– RE is a “weaker” formalism than CFG: any language expressible by an RE can be expressed by a CFG, but not the other way around!
– The languages expressible as REs are called regular languages.
– Generally, a language that exhibits “self-embedding” cannot be expressed by an RE.
– Programming languages exhibit self-embedding. (Examples: an expression can contain another expression, and a command can contain another command.)
15
Extended BNF
Extended BNF combines BNF with REs. A production in EBNF looks like
  LHS ::= RHS
where LHS is a non-terminal symbol and RHS is an extended regular expression.
An extended RE is just like a regular expression, except that it is composed of terminals and non-terminals of the grammar.
Simply put, EBNF adds these notations to BNF:
– ( ... ) for the purpose of grouping, and
– * for denoting “0 or more repetitions of … ”
16
Extended BNF: an Example
Example: a simple expression language
  Expression ::= PrimaryExp (Operator PrimaryExp)*
  PrimaryExp ::= Literal | Identifier | ( Expression )
  Identifier ::= Letter (Letter|Digit)*
  Literal ::= Digit Digit*
  Letter ::= a | b | c | ... | z
  Digit ::= 0 | 1 | 2 | 3 | 4 | ... | 9
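EBNF maps almost directly onto a recursive-descent parser: each non-terminal becomes a method, ( ... )* becomes a while loop, and | becomes a test on the next character. The Python sketch below assumes Operator ::= + | - | * | / (the slide does not define Operator), skips blanks between tokens, and builds nested tuples; it applies no operator precedence, because the grammar itself has none.

```python
# Assumptions: Operator ::= + | - | * | /  (not defined on the slide),
# Letter is a..z as on the slide, and blanks between tokens are skipped.
import string

LETTERS = set(string.ascii_lowercase)
DIGITS = set(string.digits)
OPERATORS = set("+-*/")

class Parser:
    def __init__(self, text: str):
        self.text = text
        self.pos = 0

    def current(self) -> str:
        """The next character, or "" at end of input."""
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def skip_blanks(self) -> None:
        while self.current().isspace():
            self.pos += 1

    def accept(self, expected: str) -> None:
        if self.current() != expected:
            raise SyntaxError(f"expected {expected!r} at position {self.pos}")
        self.pos += 1

    # Expression ::= PrimaryExp (Operator PrimaryExp)*
    def parse_expression(self):
        tree = self.parse_primary()
        self.skip_blanks()
        while self.current() in OPERATORS:            # the * becomes a loop
            op = self.current()
            self.pos += 1
            tree = (op, tree, self.parse_primary())   # left-associative, no precedence
            self.skip_blanks()
        return tree

    # PrimaryExp ::= Literal | Identifier | ( Expression )
    def parse_primary(self):
        self.skip_blanks()
        ch = self.current()
        if ch in DIGITS:                   # starters[Literal]
            return ("lit", self.parse_literal())
        if ch in LETTERS:                  # starters[Identifier]
            return ("id", self.parse_identifier())
        if ch == "(":                      # starters[( Expression )]
            self.accept("(")
            tree = self.parse_expression()
            self.skip_blanks()
            self.accept(")")
            return tree
        raise SyntaxError(f"unexpected {ch!r} at position {self.pos}")

    # Identifier ::= Letter (Letter|Digit)*
    def parse_identifier(self) -> str:
        start = self.pos
        self.pos += 1                      # the Letter, already checked by the caller
        while self.current() in LETTERS or self.current() in DIGITS:
            self.pos += 1
        return self.text[start:self.pos]

    # Literal ::= Digit Digit*
    def parse_literal(self) -> str:
        start = self.pos
        self.pos += 1
        while self.current() in DIGITS:
            self.pos += 1
        return self.text[start:self.pos]

def parse(text: str):
    parser = Parser(text)
    tree = parser.parse_expression()
    parser.skip_blanks()
    if parser.current() != "":
        raise SyntaxError(f"unexpected {parser.current()!r} at position {parser.pos}")
    return tree

print(parse("a1 + (b * 42)"))   # ('+', ('id', 'a1'), ('*', ('id', 'b'), ('lit', '42')))
```

Each alternative of PrimaryExp is chosen by looking at a single character, which works because the three alternatives start with disjoint sets of symbols.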
17
A little bit of useful theory
We will now look at a few useful bits of theory. These will be necessary later when we implement parsers.
– Grammar transformations: a grammar can be transformed in a number of ways without changing its meaning (i.e. its language, or the set of strings that it generates).
– The definition and computation of starter sets (first sets), follow sets, and nullable symbols.
18
Grammar Transformations
Left factorization
  X Y | X Z   becomes   X ( Y | Z )
Example:
  single-Command ::= V-name := Expression
                   | if Expression then single-Command
                   | if Expression then single-Command else single-Command
becomes
  single-Command ::= V-name := Expression
                   | if Expression then single-Command ( ε | else single-Command )
Here X = if Expression then single-Command, Y = ε, and Z = else single-Command.
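Left factorization matters for a recursive-descent parser: after the transformation, one token of lookahead (whether the next token is `if`, and later whether it is `else`) decides every choice. A Python sketch under a hypothetical simplification: tokens are blank-separated words, and V-name and Expression are each a single identifier.

```python
from __future__ import annotations

KEYWORDS = {"if", "then", "else"}

class CommandParser:
    def __init__(self, tokens: list[str]):
        self.tokens = tokens + ["<eot>"]      # end-of-text sentinel
        self.i = 0

    def current(self) -> str:
        return self.tokens[self.i]

    def accept(self, expected: str) -> None:
        if self.current() != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.current()!r}")
        self.i += 1

    def parse_identifier(self) -> str:
        # Hypothetical simplification: V-name and Expression are single identifiers.
        tok = self.current()
        if not tok.isidentifier() or tok in KEYWORDS:
            raise SyntaxError(f"expected an identifier, found {tok!r}")
        self.i += 1
        return tok

    # single-Command ::= V-name := Expression
    #                  | if Expression then single-Command ( ε | else single-Command )
    def parse_single_command(self):
        if self.current() == "if":
            self.accept("if")
            condition = self.parse_identifier()
            self.accept("then")
            then_part = self.parse_single_command()
            else_part = None                  # the ε alternative
            if self.current() == "else":      # one token decides: ε or the else-branch
                self.accept("else")
                else_part = self.parse_single_command()
            return ("if", condition, then_part, else_part)
        target = self.parse_identifier()
        self.accept(":=")
        value = self.parse_identifier()
        return (":=", target, value)

def parse_command(text: str):
    parser = CommandParser(text.split())
    tree = parser.parse_single_command()
    parser.accept("<eot>")
    return tree

print(parse_command("if a then if b then x := y else x := z"))
```

With the unfactored grammar, a parser seeing `if` could not tell which of the two if-alternatives to follow without looking arbitrarily far ahead. As written, an optional `else` attaches to the nearest unmatched `if`: in the example above it belongs to `if b`.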
19
Grammar Transformations (continued)
Elimination of Left Recursion
  N ::= X | N Y   becomes   N ::= X Y*
Example:
  Identifier ::= Letter | Identifier Letter | Identifier Digit
  Identifier ::= Letter | Identifier (Letter|Digit)
  Identifier ::= Letter (Letter|Digit)*
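A transformation is only useful if the language stays the same. The Python sketch below does not prove that, but it brute-force checks the example on a tiny alphabet (Letter = a|b, Digit = 0|1, an assumption to keep the enumeration small): it enumerates every string up to a bounded length from the left-recursive BNF and from the EBNF version and compares the two sets.

```python
from __future__ import annotations
from itertools import product

# Tiny alphabet (assumption); the slide's Letter is a..z and Digit is 0..9.
LETTERS, DIGITS = "ab", "01"

def left_recursive_language(max_len: int) -> set[str]:
    """Strings of length <= max_len generated by
    Identifier ::= Letter | Identifier Letter | Identifier Digit."""
    language = set(LETTERS)                   # Identifier ::= Letter
    frontier = set(LETTERS)
    while frontier:                           # Identifier ::= Identifier (Letter|Digit)
        frontier = {s + c for s in frontier if len(s) < max_len for c in LETTERS + DIGITS}
        language |= frontier
    return language

def ebnf_language(max_len: int) -> set[str]:
    """Strings of length <= max_len generated by Identifier ::= Letter (Letter|Digit)*."""
    return {first + "".join(rest)
            for n in range(max_len)           # n = number of (Letter|Digit) repetitions
            for first in LETTERS
            for rest in product(LETTERS + DIGITS, repeat=n)}

for k in range(1, 6):
    assert left_recursive_language(k) == ebnf_language(k)
print(len(ebnf_language(4)))
```

The practical payoff is in recursive descent: the Y* form becomes a while loop, whereas a direct transcription of N ::= N Y would call itself before consuming any input and never terminate.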
20
Grammar Transformations (continued)
Substitution of non-terminal symbols
  N ::= X               N ::= X
  M ::= N   becomes     M ::= X
Example:
  single-Command ::= for controlVar := Expression direction Expression do single-Command
  direction ::= to | downto
becomes
  single-Command ::= for controlVar := Expression (to|downto) Expression do single-Command
21
Starter Sets (a.k.a. First Sets)
Informal Definition: The starter set of an RE X is the set of terminal symbols that can occur as the start of any string generated by X.
Example: starters[ ( “+” | - | ε ) (0 | 1 | … | 9)+ ] = { +, -, 0, 1, …, 9 }
Formal Definition:
  starters[ε] = { }
  starters[t] = { t }   (where t is any terminal symbol)
  starters[X Y] = starters[X]   (if X doesn’t generate ε)
  starters[X Y] = starters[X] ∪ starters[Y]   (if X generates ε)
  starters[X | Y] = starters[X] ∪ starters[Y]
  starters[X*] = starters[X]
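The formal definition is directly recursive, so it transcribes case by case into code. In this Python sketch the RE classes (`Eps`, `T`, `Seq`, `Alt`, `Star`) and the helper `generates_empty` are hypothetical names of ours; it handles REs over terminals only (non-terminals are not handled here). The digit part of the slide's example is modelled with *, the only repetition the definition covers; since X+ starts with the same symbols as X, the result is the same.

```python
from __future__ import annotations
from dataclasses import dataclass
from functools import reduce

# Hypothetical RE representation: one class per case of the definition.
@dataclass(frozen=True)
class Eps:            # ε
    pass

@dataclass(frozen=True)
class T:              # a terminal symbol t
    symbol: str

@dataclass(frozen=True)
class Seq:            # X Y
    left: object
    right: object

@dataclass(frozen=True)
class Alt:            # X | Y
    left: object
    right: object

@dataclass(frozen=True)
class Star:           # X*
    body: object

def generates_empty(x) -> bool:
    """True if the RE x generates the empty string ε."""
    if isinstance(x, (Eps, Star)):
        return True
    if isinstance(x, T):
        return False
    if isinstance(x, Seq):
        return generates_empty(x.left) and generates_empty(x.right)
    if isinstance(x, Alt):
        return generates_empty(x.left) or generates_empty(x.right)
    raise TypeError(f"not an RE: {x!r}")

def starters(x) -> frozenset[str]:
    """starters[x], following the formal definition case by case."""
    if isinstance(x, Eps):
        return frozenset()
    if isinstance(x, T):
        return frozenset({x.symbol})
    if isinstance(x, Seq):
        if generates_empty(x.left):
            return starters(x.left) | starters(x.right)
        return starters(x.left)
    if isinstance(x, Alt):
        return starters(x.left) | starters(x.right)
    if isinstance(x, Star):
        return starters(x.body)
    raise TypeError(f"not an RE: {x!r}")

def alt_of(symbols: str):
    """(s1 | s2 | ... | sn) built from single-character terminals."""
    return reduce(Alt, (T(s) for s in symbols))

sign = Alt(T("+"), Alt(T("-"), Eps()))       # ( "+" | - | ε )
number = Seq(sign, Star(alt_of("0123456789")))
print(sorted(starters(number)))
```

Because the sign part can generate ε, the X Y case adds the digits to the starter set; remove the ε alternative and only + and - remain.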
22
Derivations
Replacing a non-terminal:
  S ::= E
  E ::= T | E + T
  T ::= i | ( E )
  S => E => E + T => T + T => i + T => i + i
This is a left-most derivation (it replaces the left-most non-terminal at each step). Can you find the corresponding right-most derivation? Can you find a derivation that is neither left-most nor right-most?
23
Sentential forms
A sentential form is a sequence of grammar symbols that can be derived from the start symbol.
A sentence is a sentential form that contains only terminal symbols, that is, a string that can be generated using the grammar.
  S => E => E + T => T + T => i + T => i + i
Every form in this derivation is a sentential form; only the last one, i + i, is a sentence.
24
Ambiguous grammars
A grammar is ambiguous if some sentence has more than one distinct parse tree. Equivalently, a grammar is ambiguous if some sentence has more than one left-most derivation, or more than one right-most derivation.
  S ::= E
  E ::= i | ( E ) | E + E
Does i + i demonstrate the ambiguity?
  E => E + E => i + E => i + i
This is the only left-most derivation of i + i, so it does not.
Does i + i + i demonstrate an ambiguity?
  E => E + E => i + E => i + E + E => i + i + E => i + i + i
  E => E + E => E + E + E => i + E + E => i + i + E => i + i + i
These are two distinct left-most derivations of the same sentence, so i + i + i does demonstrate the ambiguity.
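For this small grammar the number of distinct parse trees of a sentence can be counted directly: try every way the sentence could have been produced by E ::= i, E ::= ( E ) or E ::= E + E (splitting at each +) and multiply the counts of the parts. The Python sketch below (the function name and token-list input are our choices) memoizes on token ranges with `functools.lru_cache`; a count above 1 means the sentence is ambiguous.

```python
from functools import lru_cache

def count_parse_trees(tokens) -> int:
    """Number of distinct parse trees of tokens under E ::= i | ( E ) | E + E."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo: int, hi: int) -> int:
        # Parse trees deriving toks[lo:hi] from E.
        n = 0
        if hi - lo == 1 and toks[lo] == "i":
            n += 1                                       # E ::= i
        if hi - lo >= 3 and toks[lo] == "(" and toks[hi - 1] == ")":
            n += count(lo + 1, hi - 1)                   # E ::= ( E )
        for mid in range(lo + 1, hi - 1):                # E ::= E + E, split at a '+'
            if toks[mid] == "+":
                n += count(lo, mid) * count(mid + 1, hi)
        return n

    return count(0, len(toks))

for sentence in ["i + i", "i + i + i", "i + i + i + i", "( i + i ) + i"]:
    print(f"{sentence:15} {count_parse_trees(sentence.split())}")
```

Here i + i has one tree, i + i + i has two (one for each grouping), and explicit parentheses bring the count back to one.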
25
Augmented grammars
We augment grammars to ensure that we can recognize and handle the end of the input string:
  S ::= E
  E ::= i | ( E ) | E + E
becomes
  S’ ::= S $
  S ::= E
  E ::= i | ( E ) | E + E
Here $ denotes the end-of-file token.
26
Nullable, First sets (starter sets), and Follow sets
– A non-terminal is nullable if it derives the empty string.
– First(N), or starters(N), is the set of all terminals that can begin a sentence derived from N.
– Follow(N) is the set of terminals that can follow N in some sentential form.
Next we will see algorithms to compute each of these.
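As a preview, here is a sketch of one common way to compute all three sets together, which may differ in detail from the algorithms presented next: start from empty sets and keep applying the defining rules until no set changes (a fixed-point iteration). The dict-of-lists grammar encoding, the use of [] for an ε-production, and the name `analyze` are our own assumptions.

```python
from __future__ import annotations

def analyze(grammar: dict[str, list[list[str]]]):
    """Fixed-point computation of nullable, First and Follow for a BNF grammar.
    Keys of `grammar` are non-terminals; any other symbol is a terminal.
    An empty list [] is an ε-production."""
    nonterminals = set(grammar)
    nullable: set[str] = set()
    first = {n: set() for n in nonterminals}
    follow = {n: set() for n in nonterminals}

    def first_of(symbols):
        """First set of a symbol sequence, plus whether the whole sequence is nullable."""
        result = set()
        for sym in symbols:
            if sym not in nonterminals:        # a terminal ends the scan
                result.add(sym)
                return result, False
            result |= first[sym]
            if sym not in nullable:
                return result, False
        return result, True

    changed = True
    while changed:                             # repeat until no set grows
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                rhs_first, rhs_nullable = first_of(rhs)
                if rhs_nullable and lhs not in nullable:
                    nullable.add(lhs)
                    changed = True
                if not rhs_first <= first[lhs]:
                    first[lhs] |= rhs_first
                    changed = True
                for k, sym in enumerate(rhs):  # Follow: what can come after sym?
                    if sym in nonterminals:
                        rest_first, rest_nullable = first_of(rhs[k + 1:])
                        new = rest_first | (follow[lhs] if rest_nullable else set())
                        if not new <= follow[sym]:
                            follow[sym] |= new
                            changed = True
    return nullable, first, follow

# The augmented grammar: S' ::= S $, S ::= E, E ::= i | ( E ) | E + E
GRAMMAR = {
    "S'": [["S", "$"]],
    "S": [["E"]],
    "E": [["i"], ["(", "E", ")"], ["E", "+", "E"]],
}
nullable, first, follow = analyze(GRAMMAR)
print(sorted(first["E"]), sorted(follow["E"]))
```

The loop terminates because the sets only grow and are bounded by the finite set of symbols. The $ of the augmented grammar ends up in Follow(S) and Follow(E) without any special case.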