Download presentation
Presentation is loading. Please wait.
1
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -1- Compiler Construction Principles & Implementation Techniques Dr. Ying JIN Associate Professor Oct. 2007
2
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -2- Role of Parsing in a Compiler Lexical Analysis scanning Syntax Analysis Parsing Semantic Analysis Intermediate Code Optimization Intermediate Code Generation Target Code Generation analysis/front endsynthesis/back end
3
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -3- What will be introduced About Parsing –General information about a Parser (parsing) Functional requirement (input, output, main function) Process –General techniques in developing a scanner How to define the syntax of a programming language? –Why not regular expressions? (not enough) –Context free grammar ( 上下文无关文法 ) How to implement a parser with respect to its definition of syntax? –Top-down Parsing –Bottom-up Parsing
4
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -4- Knowledge Relation Graph Develop a Parser Syntax definition basing on Context Free Grammar using implement Top-down Bottom-up
5
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -5- § 3 Context Free Grammar & Parsing 3.1 The Parsing Process (语法分析过程) 3.2 Context-free Grammars (上下文无关文法) 3.3 Parse Trees and Abstract Syntax Tree (语法分析树和抽象语法树) 3.4 Ambiguous (二义性) 3.5 Syntax of Sample Language (简单语言的语法)
6
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -6- 3.1 The Parsing Process Function of a Parser –Input : Token / TokenList –Output: internal representation of syntactic structure –Process Read tokens Establish syntactic structure – parse tree/syntax tree, according to the syntax definition (context free grammar); Check syntactical errors
7
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -7- Syntactic Structures Rules for describing the structure of a well-defined program (1) Program (2) Declaration - Constant declaration - Type declaration - Variable Declaration - Procedure/function declaration (3) Body (4) Statements (assignment, conditional, loop, function call) (5) Expressions (arithmetic, logical, boolean)
8
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -8- Syntax Errors Different Types of Syntax Errors –Following token for a syntactic structure is wrong ( 后继单词错 ) –Identifier/constant error ( 标识符或者常量错 ) –Keyword error( 关键字错 ) –Start token for a syntactic structure is wrong( 开始单词错 ) –Unbalanced parentheses( 括号配对错 )
9
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -9- Syntax Errors int GetMax( int x ; int y) { if (x>y) {return x } else return y; } } vod main() { real 10, y; 10 + ; GetMax( x, y) ; } 后继单词错 括号配对错 关键字错, 开始单词错 标识符错 开始单词错
10
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -10- Dealing with Errors Once an error is detected, how to deal with it? –Quit immediately, not practical –Error recovery –Error repair –Error correction –There is no “perfect” way to do it!
11
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -11- Different Types of Parsing Methods Universal parsing methods for any grammars –Cocke-Younger-Kasami algorithm –Earley’s algorithm Top-down parsing methods (limited grammars)- predictive –Recursive descendent parsing ( 递归下降法 ) –LL(k) -- k=1 Bottom-up parsing methods (limited grammars) – shift-reduce –SLR(k) –LR(k) –LALR(k) –Operator-precedence parsing ( 简单优先关系法 ) k=1 inefficient
12
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -12- § 3 Context Free Grammar & Parsing 3.1 The Parsing Process (语法分析过程) 3.2 Context-free Grammars (上下文无关文法) 3.3 Parse Trees and Abstract Syntax Tree (语法分析树和抽象语法树) 3.4 Ambiguous (二义性) 3.5 Syntax of Sample Language (简单语言的语法)
13
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -13- 3.2 Context-free Grammars What is a Grammar? Chomsky classification of Grammars; Context Free Grammar (some concepts)
14
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -14- What is a Grammar? To define syntactic structure? A grammar G is a quadruple (V T, V N, S, P) V T is a finite set of terminal symbols( 有限的终极符集合 ) V N 是 is a finite set of non-terminal symbols( 有限的非终极符 集合 ) S is start symbol , S V N P is a set of production rules( 产生式的集合 ) , each production rule has following form : , where , (V T V N )*
15
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -15- 文法的分类 O 型文法 : 也称为短语文法,其产生式具有形式 : → ,其中 , (V T V N )* ,并且 至少含一个非终极符 ; 1 型文法 : 也称为上下文有关文法。它是 0 型文法的特例,即 要求 | | | | (S→ 除外,但 S 不得出现于产生式右部 ) 2 型文法 : 也称为上下文无关文法。它是 1 型文法的特例,即 要求产生式左部是一个非终极符 : A→ 3 型文法 : 也称为正则文法。它是 2 型文法的特例,即其产生 式的右部至多有两个符号,而且具有下面形式之一 : A →a , A →a B, 其中 A,B V N , a V T
16
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -16- 文法的分类 描述能力 O 型文法 > 1 型文法 > 2 型文法 > 3 型文法 对应自动机 –O 型文法 : 图灵机 –1 型文法 : 线性有界自动机 –2 型文法 : 下推自动机 –3 型文法 : 有限自动机
17
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -17- Context Free Grammar (CFG) 定义为四元组 (V T, V N, S, P) V T 是有限的终极符集合 V N 是有限的非终极符集合 S 是开始符, S V N P 是产生式的集合,且具有下面的形式: A X 1 X 2 …X n 其中 A V N , X i (V T V N ) ,右部可空。
18
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -18- Example V T = {i, n, +, *, (, )} V N = {E, T, F } P: E T E E + T T F T T * F F (E) F i F n S = E
19
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -19- Derivation ( 直接推导 ) : if there is a production A , we can have A , where represents one step derivation ( 用 A → 一步推导 ). We can say that is derived from A ; 的含义是,使用一条规则,代替 左边的某个 符号,产生 右端的符号串
20
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -20- + : represents one or more steps derivation ( 通过一步或多步可推导出 ) * : 表示 通过 0 步或多步可推导出
21
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -21- Example (E) (E+T) E + (i+n) E+E+T * E+i+F P: (1) E T (2) E E + T (3) T F (4) T T * F (5) F (E) (6) F i (7) F n
22
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -22- 句型: if S * ,则称符号串 为 G 的句型。我们用 SF(G) 表示文法 G 的所有句型的集合 ; Sentence( 句子 ) :只包含终极符的句型被称为 G 的句子 Language( 语言 ) : L( G ) = { u | S + u, u V T * } the set of all sentences of G; G = (V T,V N, S, P)
23
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -23- Example 句型 : E, T, E+T, F, T*F, i, n, (E), ……. 句子 : i, n, (i), (n), i+i, i+n, …… 语言 :{i, n, (i), (n), i+i, i+n, ……} P: (1) E T (2) E E + T (3) T F (4) T T * F (5) F (E) (6) F i (7) F n
24
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -24- Leftmost(rightmost) derivation 最左(右)推导: 如果进行推导时选择的是句型中的最左 ( 右)非终极符,则 称这种推导为最左 ( 右 ) 推导,并用符号 lm ( r m )表示最 左(右)推导。 左(右)句型: 用最左推导方式导出的句型,称为左句型,而用最右推导方 式导出的句型,称为右句型 ( 规范句型 ) 。 conclusion : each sentence has its rightmost or leftmost derivation (但 对句型此结论不成立)
25
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -25- Example 最左推导 : i+T*n +F lm i + F*n + F 最右推导 : i+T*n +F lm i + T*n +i 左句型 : i + i*F 右句型 : E+(i) 特例 : i + (T*i) P: (1) E T (2) E E + T (3) T F (4) T T * F (5) F (E) (6) F i (7) F n
26
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -26- Some Notes CFG (V T,V N, S, P) will be used to define syntactic structure of a programming language; Normally V T will be set of tokens of the programming language; one terminal symbol might be one token, one token type, or one symbol representing certain structure; Non-terminal symbols act as intermediate representation of certain structure; Productions are rules on how to derive syntactic structure;
27
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -27- Example(1) Arithmetic Expressions V T = {id, num, (, ), +, -, *, /} V N = {Exp} P: Exp Exp + Exp Exp Exp – Exp Exp Exp * Exp Exp Exp / Exp Exp (Exp) Exp id Exp num S = Exp
28
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -28- Example(2) Program V T = {VarDec, TypeDec, ConstDec, MainFun, FunDec } V N = {Program, Dec, Decs} P: Program Decs MainFun Decs Dec Dec VarDec Dec ConstDec Dec FunDec Dec TypeDec Decs Dec Decs Dec Decs S = Program
29
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -29- Example(3) Variable Declaration V T = {Type, id, ;,, } V N = {VarDec, VarDef, VarDefList, idList} P: VarDec VarDec VarDef ; VarDec VarDef; VarDec S = VarDec VarDef Type idList idList id idList id, idList P: 变量声明 变量声明 一个变量定义 ; 变量声明 一个变量定义 ; 变量声明 变量定义 类型 标识符序列 标识符序列 一个标识符 标识符序列 一个标识符, 标识符序列
30
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -30- Extended BNF ( 扩展巴克斯范式 ) Extended Notations –Optional | A | | A A A –Repetition * or {} A A | (left recursive) A * A A | (right recursive) A *
31
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -31- Some algorithms on Grammar Transformation 文法等价变化 : L(G1) = L(G2) 增补文法 : the start symbol does not appear in the right part of any productions; ( 定理 2.1) –Z S
32
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -32- Some algorithms on Grammar Transformation 文法 G 中除了 S 以外的空产生式 (A ) 可以消除 ; ( 定理 2.2) – 找出所有可以推导出 的非终极符, 记为 S ; – 对于所有产生式 A X1 X2 …… Xi-1XiXi+1……Xn Xi S 增加 A X1 X2 …… Xi-1Xi+1……Xn 直到没有新的产生式产生 删除对应空产生式, 除了 S
33
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -33- Example S | A a BB A B B | a B |b {S, B, A} A B A BB S a BB S A a BB S A a B S a B S a BB S Aa S A a B S a S a B S A a S a
34
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -34- Some algorithms on Grammar Transformation 消除无用产生式 ( 定理 2.3) – 无用产生式 : A , A 不出现在任何句型中 ; –How to find such non-terminal symbols? – 生成会出现在句型中的非终极符集合 SS SS = {S} Si SS, Si 1,……, Si n 把 1,……, n 中的所有非终极符加入 SS 中 ; 重复直到没有新的加入 ; – 不属于 SS 的非终极符, 它们的产生式是无用产生式, 删除掉 ;
35
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -35- Example S | A a BB A B B | a B |b C c {S} {S, A, B} C c 是无用产生式
36
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -36- Some algorithms on Grammar Transformation 消除特型产生式 : ( 定理 2.4) 特型产生式 : A B Algorithm – 对每个非终极符 A, 构造 SA = {B| A +B, B V N } – 如果 C SA, 而且 C 不是特型产生式, 则增加 A ; – 删除特型产生式 ; – 删除无用产生式 ;
37
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -37- Example S | A A B | a B b S: {A,B} A: {B} S a S b A b S | a | b A a | b B b
38
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -38- Some algorithms on Grammar Transformation 消除公共前缀 (left factoring) 公共前缀 –A 1 | … | n| 1|…| m 提取公因子 –A A’| 1|…| m –A’ 1 | … | n
39
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -39- Some algorithms on Grammar Transformation 消除左递归 (left recursion) – 直接左递归 :A A 1 | … | A n| 1|…| m – 消除方法 : A 1A’|…| mA’ A 1A’ | … | nA’|
40
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -40- Example P: (1) E T (2) E E + T (3) T F (4) T T * F (5) F (E) (6) F i (7) F n E T E ’ E ’ +T E ’ | E E + T | T = + T 1 = T
41
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -41- Some algorithms on Grammar Transformation 消除左递归 (left recursion) – 间接左递归 : – 消除方法 : Pre-conditions Algorithm S A b A S a | b 1:S 2:A A Aba | b A bA ’ A ’ baA ’ |
42
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -42- § 3 Context Free Grammar & Parsing 3.1 The Parsing Process (语法分析过程) 3.2 Context-free Grammars (上下文无关文法) 3.3 Parse Trees and Abstract Syntax Tree (语法分析树和抽象语法树) 3.4 Ambiguous (二义性) 3.5 Syntax of Sample Language (简单语言的语法)
43
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -43- 3.3 Parse Trees & Abstract Syntax Tree Problem: Derivation( 推导 ) is a way to construct a sentence from the start symbol; –Many derivation for the same sentence; –Not uniquely represent the structure of the sentence Parse tree: one way to represent the structure of a sentence;
44
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -44- Example P: (1) E T (2) E E + T (3) T F (4) T T * F (5) F (E) (6) F i (7) F n sentence : i + i * n Several derivations for it Parse Tree
45
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -45- Parse Tree A labeled tree for a CFG The root must be labeled with the start symbol; Each node has a symbol associated with it; Each leaf must be labeled with a terminal symbol ; For each node which is associated with a non- terminal symbol A, has n sons, from left to right they are associated with symbols B1, …, Bn, then there must be a production A B1 … Bn
46
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -46- Abstract Syntax Tree Problem with Parse tree –Includes much more than necessary nodes Abstract Syntax Tree –contains only those nodes necessary for compilation sentence : i + i * n
47
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -47- § 3 Context Free Grammar & Parsing 3.1 The Parsing Process (语法分析过程) 3.2 Context-free Grammars (上下文无关文法) 3.3 Parse Trees and Abstract Syntax Tree (语法分析树和抽象语法树) 3.4 Ambiguous (二义性) 3.5 Syntax of Sample Language (简单语言的语法)
48
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -48- 3.4 Ambiguous Grammar 二义性文法 –For a Grammar G, if there exists one sentence which has more than one parse tree, G is called ambiguous Grammar;
49
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -49- Example P: Exp Exp + Exp Exp Exp – Exp Exp Exp * Exp Exp Exp / Exp Exp (Exp) Exp id Exp num id + id + id
50
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -50- § 3 Context Free Grammar & Parsing 3.1 The Parsing Process (语法分析过程) 3.2 Context-free Grammars (上下文无关文法) 3.3 Parse Trees and Abstract Syntax Tree (语法分析树和抽象语法树) 3.4 Ambiguous (二义性) 3.5 Syntax of Sample Language (简单语言的语法)
51
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -51- 3.5 Syntax of Sample Language CFG for ToyL Programming Language
52
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -52- Toy language Syntax (structure of program) {}{} variable declaration sequence of statements var int x, y; …… +, -, *, /, (, ) x := a+b; if x>0 then … else … ; while x<10 { … }; read(x); write(x+y) ;
53
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -53- CFG for ToyL V T = { id, num, +, -, *, /, >, <, =, {, }, :, ;, (, ), var, if, then, else, while, read, write, int, real, bool, ass} V N = { Prg, VarDec, VarDefList, VarDef, Body, IdList, Stms, Type, Stm, Exp} S = Prg
54
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -54- CFG for ToyL Variable Declaration: VarDec VarDec var VarDefList ; Program: Prg VarDec Body Variable Definition List: VarDefList VarDef; VarDec VarDef ; VarDec
55
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -55- CFG for ToyL Variable Definition: VarDef Type IdList IdList id IdList id, IdList Type int Type real Type bool
56
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -56- CFG for ToyL Body: Body { Stms } Stms Stm Stm Stm; Stms
57
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -57- CFG for ToyL Statement: Stm Assig | if-S | while-S | read-S | write-S Assig id ass Exp If-S if Exp then Stms else Stms while-S while Exp Body read-S read (id) write-S write (Exp)
58
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -58- CFG for ToyL Expressions: Exp E | relE relE E > E | E < E | E=E P: (1) E T (2) E E + T (3) E E - T (4) T F (5) T T * F (6) T T / F (7) F (E) (8) F i (9) F n
59
College of Computer Science & Technology Compiler Construction Principles & Implementation Techniques -59- Homework Define the syntax of following C statements with CFG –Assignment –If statement –While statement –Case statement Assuming that CFGs for expressions and variables have already defined;
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.