Last Chapter Review
Source code: characters are combined into lexemes; lexemes are classified as tokens; each token is described by a pattern
Pattern description: non-formal description, or formal description by regular expression
Basic concepts: alphabet, letters, string (a combination of letters), language (a set of strings), names
String operations: concatenation, exponentiation
Language operations: union L ∪ M, concatenation LM, closure L*, positive closure L+
Computer realization: transition diagrams, nondeterministic finite automata (NFA), deterministic finite automata (DFA), equivalence of NFA and DFA, minimization of finite automata
Techniques: manual construction, syntax-directed construction, subset construction, merging indistinguishable states, state enumeration, Lex

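Chapter 3 Syntax Analysis
[Figure: position of the parser. Source program → Lexical Analyzer → token → Parser → parse tree → Rest of Front End → intermediate representation. The parser sends "get next token" requests to the lexical analyzer; a Symbol Table is attached to the phases.]
Contents:
Context-Free Grammars
Top-Down Parsing and Bottom-Up Parsing
Automatic generation of parsers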

Syntax analysis: tokens → syntactic phrases (parse tree)
Example: initial + rate * 60 becomes the token string id + id * num
[Figure: parse tree of the expression, with leaves identifier (initial), identifier (rate) and num (60) combined by * and +]
Natural-language analogy: "Excellent DLUT Student" is analyzed as adjective (excellent) + noun (DLUT Student), that is, attribute + object

[Figure: character string → Lexical Analyzer (regular expressions) → tokens → Syntactic Analyzer → parse tree. Syntactic units: expression, statement, program block, program.]

3.1 Context-free Grammar
3.1.1 Context-free Grammar Definition
A regular expression defines only a simple language: it can express a given structure repeated a fixed number of times or an unspecified number of times.
Ex: a(ba)^5, a(ba)*
Regular expressions cannot define properly balanced parentheses or nested block structure.
Ex: the set of strings of paired parentheses; {wcw | w is a string of a's and b's}

3.1 Context-free Grammar
A context-free grammar is a 4-tuple (V_T, V_N, S, P)
V_T: set of terminals
V_N: set of nonterminals
S: start symbol
P: set of productions, each of the form A → α
Ex: ({id, +, *, -, (, )}, {expr, op}, expr, P)
expr → expr op expr
expr → (expr)
expr → - expr
expr → id
op → +
op → *

3.1 Context-free Grammar
Simplified Representation
The following symbols usually represent terminals:
1) Lowercase letters early in the alphabet, e.g. a, b, c
2) Boldface strings, e.g. id, while
3) Digits 0, 1, …, 9
4) Punctuation, e.g. parentheses, comma
5) Operator symbols, e.g. +, -
The following symbols usually represent nonterminals:
1) Uppercase letters early in the alphabet, e.g. A, B, C
2) The letter S, which usually represents the start symbol
3) Lowercase names, e.g. expr, stmt
In addition:
1) Uppercase letters late in the alphabet, such as X, Y, represent grammar symbols, that is, either nonterminals or terminals
2) Lowercase letters late in the alphabet, such as u, v, represent strings of terminals
3) Lowercase Greek letters represent strings of grammar symbols
4) If A → α1 and A → α2, we write A → α1 | α2

3.1 Context-free Grammar
Ex: ({id, +, *, -, (, )}, {expr, op}, expr, P)
expr → expr op expr
expr → (expr)
expr → - expr
expr → id
op → +
op → *
Simplified representation:
E → E A E | (E) | -E | id
A → + | *
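A grammar written as a 4-tuple (terminals, nonterminals, start symbol, productions) maps naturally onto a small data structure. The following is only a sketch: the class name `Grammar`, its field names, and the `is_well_formed` check are illustrative assumptions, not part of the slides. It encodes the simplified grammar E → E A E | (E) | -E | id, A → + | *.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """Hypothetical container for a context-free grammar (V_T, V_N, S, P)."""
    terminals: frozenset      # V_T
    nonterminals: frozenset   # V_N
    start: str                # S
    productions: tuple        # P, as (head, body) pairs; body is a tuple of symbols

    def is_well_formed(self):
        """True if S is a nonterminal, V_T and V_N are disjoint, every head is
        a nonterminal, and every body symbol is a declared grammar symbol."""
        symbols = self.terminals | self.nonterminals
        return (
            self.start in self.nonterminals
            and not (self.terminals & self.nonterminals)
            and all(head in self.nonterminals for head, _ in self.productions)
            and all(sym in symbols for _, body in self.productions for sym in body)
        )


# E -> E A E | ( E ) | - E | id      A -> + | *
G = Grammar(
    terminals=frozenset({"id", "+", "*", "-", "(", ")"}),
    nonterminals=frozenset({"E", "A"}),
    start="E",
    productions=(
        ("E", ("E", "A", "E")),
        ("E", ("(", "E", ")")),
        ("E", ("-", "E")),
        ("E", ("id",)),
        ("A", ("+",)),
        ("A", ("*",)),
    ),
)
```

Storing each alternative as its own (head, body) pair keeps the "A → α" form explicit; the `|` shorthand is only a notational convenience.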

3.1 Context-free Grammar
Comparison: context-free grammar and regular expression
Context-free grammar:
E → E A E | (E) | -E | id
A → + | *
Regular expression:
letter → [A-Za-z]
digit → [0-9]
id → letter (letter | digit)*

3.1 Context-free Grammar
3.1.2 Derivations
Productions are treated as rewriting rules: each derivation step replaces a nonterminal by the body of one of its productions.
Ex: E → E + E | E * E | (E) | -E | id
E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id)
Notation: S ⇒* α (S derives α in zero or more steps), S ⇒+ w (S derives w in one or more steps)
Terms to define: sentential form, sentence, context-free language, equivalent grammars

3.1 Context-free Grammar
E → E + E | E * E | (E) | -E | id
E ⇒ -E ⇒ -(E) ⇒ -(E + E)
Leftmost derivation:
E ⇒lm -E ⇒lm -(E) ⇒lm -(E + E) ⇒lm -(id + E) ⇒lm -(id + id)
Rightmost derivation (canonical derivation):
E ⇒rm -E ⇒rm -(E) ⇒rm -(E + E) ⇒rm -(E + id) ⇒rm -(id + id)
Question: leftmost derivation vs. rightmost derivation?
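The difference between a leftmost and a rightmost derivation is only which nonterminal gets rewritten at each step. The sketch below replays a given list of production bodies on the grammar E → E + E | E * E | (E) | -E | id; the function name `derive` and its parameters are hypothetical, and because E is the only nonterminal the code does not check which head a body belongs to.

```python
NONTERMINALS = {"E"}


def derive(start, bodies, rightmost=False):
    """Rewrite the leftmost (or rightmost) nonterminal with each body in turn.

    Returns the list of sentential forms as space-separated strings.
    Assumes every intermediate form still contains a nonterminal.
    """
    form = [start]
    steps = [" ".join(form)]
    for body in bodies:
        positions = [k for k, sym in enumerate(form) if sym in NONTERMINALS]
        i = positions[-1] if rightmost else positions[0]
        form[i:i + 1] = body          # replace the chosen nonterminal by the body
        steps.append(" ".join(form))
    return steps


# Same productions in the same order: -E, (E), E + E, id, id
BODIES = [["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]]
leftmost = derive("E", BODIES)
rightmost = derive("E", BODIES, rightmost=True)
```

Both runs end in the same sentence; they differ only in the fifth sentential form, `- ( id + E )` versus `- ( E + id )`.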

3.1 Context-free Grammar
3.1.3 Parse Tree
E ⇒lm -E ⇒lm -(E) ⇒lm -(E + E) ⇒lm -(id + E) ⇒lm -(id + id)
E ⇒rm -E ⇒rm -(E) ⇒rm -(E + E) ⇒rm -(E + id) ⇒rm -(id + id)

3.1 Context-free Grammar
3.1.4 Ambiguity
Two leftmost derivations of id * id + id:
E ⇒ E * E ⇒ id * E ⇒ id * E + E ⇒ id * id + E ⇒ id * id + id
E ⇒ E + E ⇒ E * E + E ⇒ id * E + E ⇒ id * id + E ⇒ id * id + id
[Figure: the two corresponding parse trees, one rooted at E * E and one rooted at E + E]
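A grammar is ambiguous when some sentence has more than one parse tree. For the grammar E → E + E | E * E | (E) | -E | id this can be checked mechanically by counting trees over every token span. The sketch below is one way to do that; the function name `count_parse_trees` is an assumption, and the counting rules are hard-coded for this one grammar rather than derived from a general algorithm.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count the parse trees of `tokens` under E -> E+E | E*E | (E) | -E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways E derives tokens[i:j].
        if i >= j:
            return 0
        total = 0
        if j - i == 1 and tokens[i] == "id":            # E -> id
            total += 1
        if tokens[i] == "-":                            # E -> - E
            total += count(i + 1, j)
        if tokens[i] == "(" and tokens[j - 1] == ")":   # E -> ( E )
            total += count(i + 1, j - 1)
        for k in range(i + 1, j - 1):                   # E -> E + E | E * E
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


trees = count_parse_trees("id * id + id".split())
```

A count of 2 for `id * id + id` corresponds to the two derivations on the slide: one groups the expression as id * (id + id), the other as (id * id) + id.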

3.2 Language and Grammar
Advantages of context-free grammars:
A grammar gives a precise, easy-to-understand specification of the syntax of a language
An efficient parser can be generated automatically from a grammar
A grammar defines the hierarchical structure of the language
A language defined by a grammar is easier to modify
Disadvantage of context-free grammars:
A grammar can describe most, but not all, of the syntax of a language

3.2 Language and Grammar
3.2.1 Comparison: Regular Expression and Context-free Grammar
Regular expression: (a|b)*ab
Grammar:
A0 → a A0 | b A0 | a A1
A1 → b A2
A2 → ε
[Figure: NFA for (a|b)*ab with states 0, 1, 2; start state 0 loops on a and b, 0 goes to 1 on a, 1 goes to 2 on b]

3.2 Language and Grammar
3.2.1 Comparison: Regular Expression and Context-free Grammar
NFA → context-free grammar:
1) Determine the set of terminals (the input alphabet of the NFA)
2) For each state i, create a nonterminal A_i
3) If state i has a transition to state j on input a, add the production A_i → a A_j
4) If i is an accepting state, add A_i → ε
[Figure: NFA with states 0, 1, 2; start state 0 loops on a and b, 0 goes to 1 on a, 1 goes to 2 on b]
Grammar obtained from the NFA:
A0 → a A0 | b A0 | a A1
A1 → b A2
A2 → ε
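The NFA-to-grammar construction (one nonterminal A_i per state, A_i → a A_j per transition, A_i → ε per accepting state) is short enough to write out. This is a minimal sketch: the function name `nfa_to_grammar` and the triple-based encoding of transitions are assumptions made for illustration, and ε-transitions are not handled because the slide does not cover them.

```python
def nfa_to_grammar(transitions, accepting):
    """Convert an NFA to a grammar.

    transitions: iterable of (i, symbol, j) triples, meaning i --symbol--> j
    accepting:   set of accepting states
    Returns a dict from nonterminal name to a list of bodies; a body is a
    tuple of symbols and the empty tuple () stands for epsilon.
    """
    productions = {}
    for i, a, j in transitions:
        # A_i -> a A_j for every transition from i to j on a
        productions.setdefault(f"A{i}", []).append((a, f"A{j}"))
        productions.setdefault(f"A{j}", [])
    for i in accepting:
        # A_i -> epsilon for every accepting state i
        productions.setdefault(f"A{i}", []).append(())
    return productions


# NFA for (a|b)*ab: state 0 loops on a and b, 0 -a-> 1, 1 -b-> 2, 2 accepting
nfa = [(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2)]
grammar = nfa_to_grammar(nfa, accepting={2})
```

For this NFA the result matches the grammar on the slide: A0 → a A0 | b A0 | a A1, A1 → b A2, A2 → ε.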

3.2 Language and Grammar
3.2.2 Reasons for Separating Lexical Analysis from Parsing
Why use regular expressions to define lexical syntax?
Lexical rules are simple and do not need the power of a context-free grammar.
Describing tokens with regular expressions is concise and easy to understand.
A lexical analyzer built from regular expressions is efficient.

3.2 Language and Grammar
Reasons for separating lexical analysis from syntax analysis:
It simplifies the design
It improves the compiler's efficiency
It enhances the compiler's portability
It makes the compiler front end easier to partition into modules

3.2 Language and Grammar
3.2.3 Verifying the Language Generated by a Grammar
G: S → (S) S | ε
L(G) = the set of strings of balanced parentheses

3.2 Language and Grammar
3.2.3 Verifying the Language Generated by a Grammar
G: S → (S) S | ε
L(G) = the set of strings of balanced parentheses
First direction: show that every sentence derivable from S is balanced.
Inductive proof on the number of steps n in a derivation

3.2 Language and Grammar
3.2.3 Verifying the Language Generated by a Grammar
G: S → (S) S | ε
L(G) = the set of strings of balanced parentheses
Inductive proof on the number of steps n in a derivation
Basis: S ⇒ ε, and the empty string is balanced
Hypothesis: every derivation of fewer than n steps produces a balanced sentence
Inductive step: an n-step leftmost derivation has the form S ⇒ (S) S ⇒* (x) S ⇒* (x) y
x and y are each derived from S in fewer than n steps, so both are balanced, and therefore (x) y is balanced

3.2 Language and Grammar
3.2.3 Verifying the Language Generated by a Grammar
G: S → (S) S | ε
L(G) = the set of strings of balanced parentheses
Second direction: show that every balanced string is derivable from S.
Induction on the length of the string

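3.2 Language and Grammar
3.2.3 Verifying the Language Generated by a Grammar
G: S → (S) S | ε
L(G) = the set of strings of balanced parentheses
Induction on the length of the string: every balanced string is derivable from S
Basis: the empty string is derivable, since S ⇒ ε
Hypothesis: every balanced string of length less than 2n is derivable from S
Inductive step: consider a balanced string w of length 2n (n ≥ 1). Let (x) be the shortest nonempty balanced prefix of w, so w = (x) y. Both x and y are balanced and shorter than 2n, hence derivable from S, and
S ⇒ (S) S ⇒* (x) S ⇒* (x) y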
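The claim that S → (S) S | ε generates exactly the balanced-parentheses strings can also be checked empirically on short strings. The sketch below is not a substitute for the inductive proof; it compares a recognizer for the grammar with an independent counter-based balance test. The names `derivable`, `balanced` and `agree_up_to` are hypothetical, and the recognizer relies on the observation that the production (S) S applies exactly when the next symbol is "(".

```python
from itertools import product


def derivable(w):
    """True iff S derives w under S -> ( S ) S | epsilon."""
    def parse_S(i):
        # Returns the index just past the S starting at i, or None on failure.
        if i < len(w) and w[i] == "(":          # S -> ( S ) S
            j = parse_S(i + 1)
            if j is None or j >= len(w) or w[j] != ")":
                return None
            return parse_S(j + 1)
        return i                                 # S -> epsilon
    return parse_S(0) == len(w)


def balanced(w):
    """Counter-based check; assumes w contains only '(' and ')'."""
    depth = 0
    for ch in w:
        depth += 1 if ch == "(" else -1
        if depth < 0:                            # a ')' with no matching '('
            return False
    return depth == 0


def agree_up_to(max_len):
    """Compare both tests on every string over {(, )} up to max_len symbols."""
    return all(
        derivable("".join(p)) == balanced("".join(p))
        for n in range(max_len + 1)
        for p in product("()", repeat=n)
    )
```

Calling `agree_up_to(8)` exercises all 511 strings of length at most 8; agreement on these supports, but does not prove, that L(G) is the set of balanced strings.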

3.2 Language and Grammar
3.2.4 Proper Expression Grammar
Expression productions: E → E + E | E * E | (E) | -E | id
View the expression hierarchically: id * id * (id+id) + id * id + id
[Figure: two parse trees, one rooted at E * E and one rooted at E + E]

3.2 Language and Grammar
3.2.4 Proper Expression Grammar
View the expression hierarchically: id * id * (id+id) + id * id + id
First term: id * id * (id+id)

3.2 Language and Grammar
3.2.4 Proper Expression Grammar
View the expression hierarchically: id * id * (id+id) + id * id + id
First term: id * id * (id+id)
Grammar:
expr → expr + term | term

3.2 Language and Grammar
3.2.4 Proper Expression Grammar
View the expression hierarchically: id * id * (id+id) + id * id + id
First term: id * id * (id+id)
Grammar:
expr → expr + term | term
term → term * factor | factor

3.2 Language and Grammar
3.2.4 Proper Expression Grammar
View the expression hierarchically: id * id * (id+id) + id * id + id
First term: id * id * (id+id)
Grammar:
expr → expr + term | term
term → term * factor | factor
factor → id | (expr)

3.2 Language and Grammar
expr → expr + term | term
term → term * factor | factor
factor → id | (expr)
[Figure: parse trees of id * id * id and id + id * id]
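The layered grammar expr → expr + term | term, term → term * factor | factor, factor → id | (expr) fixes both precedence (* binds tighter than +) and left associativity. The sketch below shows the resulting tree shapes. It is an illustration under stated assumptions: the function name `parse` is hypothetical, the left-recursive productions are realized with loops rather than recursion, and the result is a simplified operator tree of nested tuples, not the full parse tree with expr, term and factor nodes.

```python
def parse(tokens):
    """Parse a token list and return a nested-tuple operator tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {peek()!r}")
        pos += 1

    def factor():                       # factor -> id | ( expr )
        if peek() == "(":
            expect("(")
            node = expr()
            expect(")")
            return node
        expect("id")
        return "id"

    def term():                         # term -> term * factor | factor
        node = factor()
        while peek() == "*":
            expect("*")
            node = ("*", node, factor())   # grow to the left: left associative
        return node

    def expr():                         # expr -> expr + term | term
        node = term()
        while peek() == "+":
            expect("+")
            node = ("+", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree


product_tree = parse("id * id * id".split())
sum_tree = parse("id + id * id".split())
```

In `sum_tree` the multiplication sits below the addition, which is the grouping id + (id * id); in `product_tree` the left operand is itself a product, which is the grouping (id * id) * id.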

3.2 Language and Grammar
3.2.5 Eliminating Ambiguity
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other
Sentential form: if expr then if expr then stmt else stmt
Two leftmost derivations:
stmt ⇒ if expr then stmt ⇒ if expr then if expr then stmt else stmt
stmt ⇒ if expr then stmt else stmt ⇒ if expr then if expr then stmt else stmt

3.2 Language and Grammar
Unambiguous grammar:
stmt → matched_stmt
     | unmatched_stmt
matched_stmt → if expr then matched_stmt else matched_stmt
     | other
unmatched_stmt → if expr then stmt
     | if expr then matched_stmt else unmatched_stmt

3.2 Language and Grammar
3.2.6 Elimination of Left Recursion
Left-recursive grammar: A ⇒+ Aα for some string α
Immediate left recursion: A → Aα | β
Strings generated: β α … α (β followed by zero or more α)
Eliminating immediate left recursion:
A → β A′
A′ → α A′ | ε
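The rewrite of A → Aα | β into A → β A′, A′ → α A′ | ε extends directly to several α and β alternatives. The following is a minimal sketch of that transformation for a single nonterminal; the function name, the tuple encoding of bodies, and the choice of `'` as the suffix for the new nonterminal are assumptions, and the code assumes there is no production A → A and that at least one non-left-recursive alternative exists.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite the productions of `head` to remove immediate left recursion.

    bodies: list of tuples of symbols; the empty tuple () stands for epsilon.
    Returns a dict mapping each nonterminal to its new list of bodies.
    """
    # A -> A alpha : keep alpha (the part after the leading A)
    alphas = [body[1:] for body in bodies if body and body[0] == head]
    # A -> beta : alternatives that do not start with A
    betas = [body for body in bodies if not body or body[0] != head]
    if not alphas:
        return {head: list(bodies)}      # nothing to eliminate
    new = head + "'"
    return {
        head: [beta + (new,) for beta in betas],            # A  -> beta A'
        new: [alpha + (new,) for alpha in alphas] + [()],   # A' -> alpha A' | epsilon
    }


# expr -> expr + term | term
result = eliminate_immediate_left_recursion(
    "expr", [("expr", "+", "term"), ("term",)]
)
```

For the expression grammar this yields expr → term expr′ and expr′ → + term expr′ | ε, which generates the same strings term + term + … without left recursion.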

Exercises: 3.1, 3.2
