Lecture 3: Parsing CS 540 George Mason University.


CS 540 Spring 2009 GMU
Static Analysis - Parsing
[Diagram: compiler phases. Source language → Scanner (lexical analysis) → tokens → Parser (syntax analysis) → syntactic structure → Semantic Analysis (IC generator) → syntactic/semantic structure → Code Optimizer → Code Generator → target language. The phases share a Symbol Table.]
Parser (syntax analysis):
– Syntax described formally
– Tokens organized into syntax tree that describes structure
– Error checking

Static Analysis - Parsing
We can use context free grammars to specify the syntax of programming languages.

Context Free Grammars
Definition: A context free grammar is a formal model that consists of:
1. A set of tokens or terminal symbols V_t
2. A set of nonterminal symbols V_n
3. A start symbol S (in V_n)
4. A finite set of productions of the form A → R_1 … R_n (n >= 0), where A is in V_n and each R_i is in V_n ∪ V_t

CFG Examples
Example 1: V_t = {+, -, 0..9}, V_n = {L, D}, start symbol s = L
L → L + D | L – D | D
D → 0 | … | 9
Example 2: V_t = {(, )}, V_n = {L}, start symbol s = L
L → ( L ) L
L → ε
Notation: → indicates a production; | is shorthand for multiple productions with the same left-hand side.

Languages
[Diagram: nested language classes, from innermost to outermost, each with the form of its productions]
– Regular: A → a B, C → ε
– Context free: A → α
– Context sensitive: α A β → α γ β
– Type 0: α → β

Any regular language can be expressed using a CFG
Starting with an NFA:
For each state S_i in the NFA
– Create non-terminal A_i
– If transition (S_i, a) = S_k, create production A_i → a A_k
– If transition (S_i, ε) = S_k, create production A_i → A_k
– If S_i is a final state, create production A_i → ε
– If S_i is the NFA start state, s = A_i
What does the existence of this algorithm tell us about the relationship between regular and context free languages?
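The construction above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function name `nfa_to_cfg`, the triple format for transitions, and the use of an empty string for ε are all assumptions made for the example.

```python
from collections import defaultdict

EPS = ""  # assumption: the empty string marks an epsilon transition


def nfa_to_cfg(transitions, start, finals):
    """Turn an NFA into right-linear productions.

    transitions: iterable of (state, symbol, next_state) triples.
    Returns (start_nonterminal, productions); productions maps each
    nonterminal to a list of right-hand sides (tuples of symbols).
    """
    def name(state):
        return f"A{state}"          # one nonterminal A_i per state S_i

    productions = defaultdict(list)
    for state, symbol, nxt in transitions:
        if symbol == EPS:
            productions[name(state)].append((name(nxt),))         # A_i -> A_k
        else:
            productions[name(state)].append((symbol, name(nxt)))  # A_i -> a A_k
    for state in finals:
        productions[name(state)].append(())                       # A_i -> ε
    return name(start), dict(productions)


# NFA for ab*a: 1 -a-> 2, 2 -b-> 2, 2 -a-> 3, with state 3 final.
start, grammar = nfa_to_cfg(
    [(1, "a", 2), (2, "b", 2), (2, "a", 3)], start=1, finals={3}
)
for lhs, rhss in grammar.items():
    for rhs in rhss:
        print(lhs, "->", " ".join(rhs) if rhs else "ε")
```

Every production has at most one nonterminal, and it is always the last symbol, which is exactly the shape of a regular (right-linear) grammar.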

NFA to CFG Example
Regular expression: ab*a
[Figure: NFA with states 1, 2, 3; an a-transition from 1 to 2, a b-loop on 2, and an a-transition from 2 to 3]
A_1 → a A_2
A_2 → b A_2
A_2 → a A_3
A_3 → ε

Writing Grammars
When writing a grammar (or RE) for some language, the following must be true:
1. All strings generated are in the language.
2. Your grammar produces all strings in the language.

Try these:
– Integers divisible by 2
– Legal postfix expressions
– Floating point numbers with no extra zeros
– Strings of 0,1 where there are more 0s than 1s

Parsing
The task of parsing is figuring out what the parse tree looks like for a given input and language. If a string is in the given language, a parse tree must exist. However, just because a parse tree exists for some string in a given language doesn’t mean a given algorithm can find it.

Parse Trees
The parse tree for a string in the language defined by grammar G is defined as follows:
– The root is the start symbol of G
– The leaves are terminals or ε. When visited from left to right, the leaves form the input string
– The interior nodes are non-terminals of G
– For every non-terminal A in the tree with children B_1 … B_k, there is some production A → B_1 … B_k

Parse Tree for (())()
Grammar:
L → ( L ) L
L → ε
[Figure: parse tree for (())() with root L; every ( L ) L node has four children and every remaining L leaf derives ε]
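To make the parse-tree definition concrete, here is a minimal sketch that builds the tree for the grammar L → ( L ) L | ε by recursive descent. It is an illustration under stated assumptions, not the lecture's own code: the tuple representation `("L", children)` and the helper names `parse_L`, `parse` and `leaves` are made up for the example.

```python
def parse_L(s, pos=0):
    """Parse nonterminal L starting at s[pos]; return (tree, next_pos)."""
    if pos < len(s) and s[pos] == "(":
        # Production L -> ( L ) L
        inner, pos = parse_L(s, pos + 1)
        if pos >= len(s) or s[pos] != ")":
            raise SyntaxError(f"expected ')' at position {pos}")
        rest, pos = parse_L(s, pos + 1)
        return ("L", ["(", inner, ")", rest]), pos
    # Production L -> ε
    return ("L", ["ε"]), pos


def parse(s):
    """Parse the whole string or raise SyntaxError."""
    tree, pos = parse_L(s)
    if pos != len(s):
        raise SyntaxError(f"unexpected {s[pos]!r} at position {pos}")
    return tree


def leaves(tree):
    """Read the leaves left to right, dropping ε."""
    if isinstance(tree, str):
        return "" if tree == "ε" else tree
    return "".join(leaves(child) for child in tree[1])


tree = parse("(())()")
print(tree)
print(leaves(tree))  # the leaves spell out the input string
```

The root is the start symbol, interior nodes are all L, and each node's children match the right-hand side of one production, which are the four properties listed in the definition.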

Parsing
Parsing algorithms are based on the idea of derivations.
General Algorithms:
– LL (top down)
– LR (bottom up)

Single Step Derivation
Definition: Given α A β (with α, β in (V_n ∪ V_t)*) and a production A → γ, α A β ⇒ α γ β is a single step derivation.
Examples:
L + D ⇒ L – D + D (by L → L – D)
( L ) ( L ) ⇒ ( ( L ) L ) ( L ) (by L → ( L ) L)
Greek letters (α, β, γ, …) denote a (possibly empty) sequence of terminals and non-terminals.

Derivations
Definition: A sequence of the form w_0 ⇒ w_1 ⇒ … ⇒ w_n is a derivation of w_n from w_0 (w_0 ⇒* w_n)
Example:
L
⇒ ( L ) L (by L → ( L ) L)
⇒ ( ) L (by L → ε)
⇒ ( ) (by L → ε)
So L ⇒* ()
If w_i has non-terminal symbols, it is referred to as a sentential form.

Derivation of (())()
L
⇒ ( L ) L (by L → ( L ) L)
⇒ ( L ) ( L ) L (by L → ( L ) L)
⇒ ( L ) ( L ) (by L → ε)
⇒ ( ( L ) L ) ( L ) (by L → ( L ) L)
⇒ ( ( ) L ) ( L ) (by L → ε)
⇒ ( ( ) L ) ( ) (by L → ε)
⇒ ( ( ) ) ( ) (by L → ε)
So L ⇒* (())()

L(G), the language generated by grammar G, is {w in V_t* : s ⇒* w, for start symbol s}
We’ve just shown that both () and (())() are in L(G) for the previous grammar.

Leftmost Derivations
– A derivation where the leftmost nonterminal is always chosen
– If a string is in a given language (i.e. a derivation exists), then a leftmost derivation must exist
– Rightmost derivation defined as you would expect

Leftmost Derivation for (())()
L
⇒ ( L ) L (by L → ( L ) L)
⇒ ( ( L ) L ) L (by L → ( L ) L)
⇒ ( ( ) L ) L (by L → ε)
⇒ ( ( ) ) L (by L → ε)
⇒ ( ( ) ) ( L ) L (by L → ( L ) L)
⇒ ( ( ) ) ( ) L (by L → ε)
⇒ ( ( ) ) ( ) (by L → ε)
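The difference between leftmost and rightmost derivations is only which occurrence of the nonterminal gets rewritten at each step. The sketch below replays a list of production choices either way for the grammar L → ( L ) L | ε. The names `PRODUCTIONS`, `derive`, `"paren"` and `"eps"` are assumptions chosen for this illustration.

```python
# The two productions of the balanced-parentheses grammar, keyed by a
# hypothetical short name.
PRODUCTIONS = {
    "paren": ["(", "L", ")", "L"],  # L -> ( L ) L
    "eps": [],                      # L -> ε
}


def derive(choices, leftmost=True):
    """Apply each chosen production to the leftmost (or rightmost) L.

    Returns the list of sentential forms as space-separated strings.
    """
    form = ["L"]
    steps = [" ".join(form)]
    for choice in choices:
        if leftmost:
            i = form.index("L")
        else:
            i = len(form) - 1 - form[::-1].index("L")
        form[i:i + 1] = PRODUCTIONS[choice]   # single step derivation
        steps.append(" ".join(form))
    return steps


choices = ["paren", "paren", "eps", "eps", "paren", "eps", "eps"]
for line in derive(choices, leftmost=True):
    print(line if line else "ε")
```

For this particular string the same sequence of choices works for both orders; calling `derive(choices, leftmost=False)` yields the rightmost derivation instead.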

Rightmost Derivation for (())()
L
⇒ ( L ) L (by L → ( L ) L)
⇒ ( L ) ( L ) L (by L → ( L ) L)
⇒ ( L ) ( L ) (by L → ε)
⇒ ( L ) ( ) (by L → ε)
⇒ ( ( L ) L ) ( ) (by L → ( L ) L)
⇒ ( ( L ) ) ( ) (by L → ε)
⇒ ( ( ) ) ( ) (by L → ε)

Ambiguity
A grammar is ambiguous if there are two (or more) parse trees or leftmost derivations for some string in the language.
E → E + E
E → E – E
E → 0 | … | 9
[Figure: two different parse trees for 2 – 3 + 4, one with + at the root and one with – at the root]

Two leftmost derivations
Derivation 1: E ⇒ E + E ⇒ E – E + E ⇒ 2 – E + E ⇒ 2 – 3 + E ⇒ 2 – 3 + 4
Derivation 2: E ⇒ E – E ⇒ 2 – E ⇒ 2 – E + E ⇒ 2 – 3 + E ⇒ 2 – 3 + 4
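Ambiguity matters because the two trees can mean different things. The following sketch enumerates every parse tree of a token list under E → E + E | E – E | digit and evaluates each one. The function names `parse_trees` and `evaluate` are assumptions for this example, and the code uses the ASCII `-` in place of the typeset minus sign.

```python
def parse_trees(tokens):
    """Return all parse trees for tokens under E -> E + E | E - E | digit.

    A tree is either a digit string or a (left, operator, right) tuple.
    """
    if len(tokens) == 1:
        tok = tokens[0]
        return [tok] if len(tok) == 1 and tok.isdigit() else []
    trees = []
    # Try every operator position as the root of the tree.
    for i in range(1, len(tokens) - 1):
        if tokens[i] in ("+", "-"):
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((left, tokens[i], right))
    return trees


def evaluate(tree):
    """Compute the value a parse tree denotes."""
    if isinstance(tree, str):
        return int(tree)
    left, op, right = tree
    if op == "+":
        return evaluate(left) + evaluate(right)
    return evaluate(left) - evaluate(right)


for tree in parse_trees("2 - 3 + 4".split()):
    print(tree, "=", evaluate(tree))
```

The input has two trees: one groups it as (2 - 3) + 4 and the other as 2 - (3 + 4), so the same string gets two different values.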

An ambiguous grammar can sometimes be made unambiguous:
E → E + T | E – T | T
T → 0 | … | 9
The left-recursive form enforces the correct (left) associativity.
Precedence can be specified as well:
E → E + T | E – T | T
T → T * F | T / F | F
F → ( E ) | 0 | … | 9

Input: begin simplestmt ; simplestmt ; end
Grammar:
P → begin SS end
SS → S ; SS
SS → ε
S → simplestmt
S → begin SS end
[Figure: parse tree for the input under this grammar]

Top Down (LL) Parsing
Input: begin simplestmt ; simplestmt ; end
Grammar:
P → begin SS end
SS → S ; SS
SS → ε
S → simplestmt
S → begin SS end
[Figure, six animation frames: the parse tree is built from the root P downward, adding the children of one nonterminal per frame until the leaves match the input]
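One common way to realize top-down parsing is a recursive-descent parser with one function per nonterminal and one token of lookahead. The sketch below does this for the begin/end grammar and records each production as it is chosen. It is a hand-written illustration, not the lecture's algorithm: the class name `Parser`, its methods, and the `$` end marker are assumptions.

```python
class Parser:
    """Predictive recursive-descent parser for:
    P -> begin SS end ; SS -> S ; SS | ε ; S -> simplestmt | begin SS end
    """

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks end of input
        self.pos = 0
        self.trace = []                     # productions in the order applied

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def parse_P(self):
        self.trace.append("P -> begin SS end")
        self.match("begin")
        self.parse_SS()
        self.match("end")

    def parse_SS(self):
        # Pick SS -> S ; SS when the lookahead can start an S.
        if self.peek() in ("simplestmt", "begin"):
            self.trace.append("SS -> S ; SS")
            self.parse_S()
            self.match(";")
            self.parse_SS()
        else:
            self.trace.append("SS -> ε")

    def parse_S(self):
        if self.peek() == "simplestmt":
            self.trace.append("S -> simplestmt")
            self.match("simplestmt")
        else:
            self.trace.append("S -> begin SS end")
            self.match("begin")
            self.parse_SS()
            self.match("end")


def parse(text):
    """Parse whitespace-separated tokens; return the productions used."""
    parser = Parser(text.split())
    parser.parse_P()
    parser.match("$")
    return parser.trace


for production in parse("begin simplestmt ; simplestmt ; end"):
    print(production)
```

The productions come out in the order of a leftmost derivation, which is the order in which a top-down parser grows the tree from the root.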

Bottom-up (LR) Parsing
Input: begin simplestmt ; simplestmt ; end
Grammar:
P → begin SS end
SS → S ; SS
SS → ε
S → simplestmt
S → begin SS end
[Figure, six animation frames: the parse tree is built from the leaves upward, adding one parent node per frame until the root P is reached]
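A bottom-up parser shifts tokens onto a stack and reduces the top of the stack whenever it matches the right-hand side of a production. The sketch below is a hand-written shift-reduce recognizer specialized to the begin/end grammar. It is not a table-driven LR parser, and the reduce conditions were chosen by hand for this grammar only; the function name `shift_reduce` and the `$` end marker are assumptions.

```python
def shift_reduce(tokens):
    """Recognize the begin/end grammar bottom-up.

    Returns the reductions in the order performed, or raises SyntaxError.
    """
    stack, reductions = [], []
    rest = list(tokens) + ["$"]  # "$" marks end of input

    def reduce_all():
        # Keep reducing while the top of the stack matches a right-hand side.
        while True:
            look = rest[0]
            if stack[-1:] == ["simplestmt"]:
                stack[-1:] = ["S"]
                reductions.append("S -> simplestmt")
            elif look == "end" and stack[-1:] in (["begin"], [";"]):
                stack.append("SS")               # empty statement list
                reductions.append("SS -> ε")
            elif stack[-3:] == ["S", ";", "SS"]:
                stack[-3:] = ["SS"]
                reductions.append("SS -> S ; SS")
            elif stack[-3:] == ["begin", "SS", "end"]:
                # The outermost block is P; any nested block is a statement S.
                lhs = "P" if len(stack) == 3 and look == "$" else "S"
                stack[-3:] = [lhs]
                reductions.append(f"{lhs} -> begin SS end")
            else:
                return

    while rest[0] != "$":
        stack.append(rest.pop(0))                # shift
        reduce_all()
    if stack != ["P"]:
        raise SyntaxError(f"could not reduce input to P, stack: {stack}")
    return reductions


for step in shift_reduce("begin simplestmt ; simplestmt ; end".split()):
    print(step)
```

Read from last to first, the reductions form a rightmost derivation of the input, which is why LR parsing is described as tracing a rightmost derivation in reverse.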

Parsing
General Algorithms:
– LL (top down)
– LR (bottom up)
Both algorithms are driven by the input grammar and the input to be parsed.
Two important sets: FIRST and FOLLOW

FIRST Sets
FIRST(α) is the set of all terminal symbols that can begin some sentential form in a derivation that starts with α:
α ⇒ … ⇒ a β
FIRST(α) = {a in V_t | α ⇒* a β} ∪ {ε} if α ⇒* ε
Example:
S → simple | begin S end
FIRST(S) = {simple, begin}

To compute FIRST across strings of terminals and non-terminals:
FIRST(ε) = {ε}
FIRST(A α) =
– {A} if A is a terminal
– (FIRST(A) – {ε}) ∪ FIRST(α) if A ⇒* ε
– FIRST(A) otherwise

Computing FIRST sets
Initially FIRST(A) (for all A) is empty
1. For productions A → c β, where c in V_t: add {c} to FIRST(A)
2. For productions A → ε: add {ε} to FIRST(A)
3. For productions A → α B β, where α ⇒* ε and NOT (B ⇒* ε): add FIRST(α B) to FIRST(A)
4. For productions A → α, where α ⇒* ε: add FIRST(α) and {ε} to FIRST(A)
Note on rule 3: FIRST(α B) is the same thing as (FIRST(α) – {ε}) ∪ FIRST(B)
Repeat until no FIRST set changes.
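The four rules can be folded into one fixed-point loop: compute FIRST of each right-hand side from the current sets, and repeat until nothing changes. The sketch below is one possible formulation, not the lecture's code. The grammar encoding (a dict from nonterminal to a list of right-hand sides, with `[]` for ε) and the names `first_of_string` and `compute_first` are assumptions.

```python
EPS = "ε"


def first_of_string(symbols, first, nonterminals):
    """FIRST of a sequence of symbols, using the FIRST sets known so far."""
    result = set()
    for sym in symbols:
        if sym not in nonterminals:   # a terminal begins the string: stop
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:     # sym cannot derive ε: stop
            return result
    result.add(EPS)                   # every symbol (or none) can derive ε
    return result


def compute_first(grammar):
    """grammar: nonterminal -> list of right-hand sides (lists of symbols)."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                    # iterate until a fixed point
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                new = first_of_string(rhs, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


# Grammar with nullable nonterminals R and S.
grammar = {
    "P": [["i"], ["c"], ["n", "T", "S"]],
    "Q": [["P"], ["a", "S"], ["d", "S", "c", "S", "T"]],
    "R": [["b"], []],
    "S": [["e"], ["R", "n"], []],
    "T": [["R", "S", "q"]],
}
for nt, symbols in compute_first(grammar).items():
    print(f"FIRST({nt}) =", sorted(symbols))
```

Because sets only ever grow and the alphabet is finite, the loop is guaranteed to terminate.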

Example 1
S → a S e
S → B
B → b B e
B → C
C → c C e
C → d
Start with the ‘simplest’ non-terminal: FIRST(C) = {c,d}
Now that we know FIRST(C): FIRST(B) = {b,c,d}
Finally: FIRST(S) = {a,b,c,d}

Example 2
P → i | c | n T S
Q → P | a S | d S c S T
R → b | ε
S → e | R n | ε
T → R S q
FIRST(P) = {i,c,n}
FIRST(Q) = {i,c,n,a,d}
FIRST(R) = {b,ε}
FIRST(S) = {e,b,n,ε}
FIRST(T) = {b,e,n,q}
Note: S ⇒ R n ⇒ n because R ⇒* ε, so n is in FIRST(S)
Note: T ⇒ R S q ⇒ S q ⇒ q because both R and S ⇒* ε, so FIRST(T) contains FIRST(R) – {ε}, FIRST(S) – {ε}, and q

Example 3
S → a S e | S T S
T → R S e | Q
R → r S r | ε
Q → S T | ε
FIRST(S) = {a}
FIRST(R) = {r,ε}
FIRST(T) = {r,a,ε}
FIRST(Q) = {a,ε}

FOLLOW Sets
FOLLOW(A) is the set of terminals (including end of file - $) that may follow non-terminal A in some sentential form.
FOLLOW(A) = {c in V_t | S ⇒+ …Ac…} ∪ {$} if S ⇒+ …A
For example, consider L ⇒+ (())(L)L
Both ‘)’ and end of file can follow L
NOTE: ε is never in FOLLOW sets

Computing FOLLOW(A)
1. If A is the start symbol, put $ in FOLLOW(A)
2. For productions of the form B → α A β: add FIRST(β) – {ε} to FOLLOW(A)
INTUITION: Suppose B → A X and FIRST(X) = {c}
S ⇒+ α B β ⇒ α A X β ⇒+ α A c δ β
The terminal c that follows A here comes from FIRST(X).

3. For productions of the form B → α A, or B → α A β where β ⇒* ε: add FOLLOW(B) to FOLLOW(A)
INTUITION:
– Suppose B → Y A
S ⇒+ α B β ⇒ α Y A β
– Suppose B → A X and X ⇒* ε
S ⇒+ α B β ⇒ α A X β ⇒* α A β
In both cases whatever begins β follows A, and β begins with a member of FOLLOW(B).
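The three FOLLOW rules can also be run as a fixed-point computation on top of FIRST. The sketch below is a minimal illustration, assuming the same dict-of-lists grammar encoding as before (with `[]` for ε); the names `first_of`, `compute_first` and `compute_follow` are hypothetical, and the sample grammar is the usual expression grammar with E' and T'.

```python
EPS, END = "ε", "$"


def first_of(symbols, first):
    """FIRST of a symbol sequence; a terminal's FIRST is the terminal itself."""
    out = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    return out | {EPS}


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                new = first_of(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, start):
    first = compute_first(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                      # rule 1
    changed = True
    while changed:
        changed = False
        for lhs, rhss in grammar.items():
            for rhs in rhss:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:      # only nonterminals have FOLLOW
                        continue
                    tail = first_of(rhs[i + 1:], first)
                    new = tail - {EPS}          # rule 2
                    if EPS in tail:             # rule 3: the tail can vanish
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


grammar = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
for nt, symbols in compute_follow(grammar, "E").items():
    print(f"FOLLOW({nt}) =", sorted(symbols))
```

Rules 2 and 3 are applied together for each occurrence of a nonterminal: FIRST of whatever comes after it, plus FOLLOW of the left-hand side when that remainder can derive ε.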

Example 4
S → a S e | B
B → b B C f | C
C → c C g | d | ε
FIRST(C) = {c,d,ε}
FIRST(B) = {b,c,d,ε}
FIRST(S) = {a,b,c,d,ε}
Assume the first non-terminal is the start symbol.
Using rule #1:
FOLLOW(S) = {$}
Using rule #2:
FOLLOW(C) = {f,g}
FOLLOW(B) = {c,d,f}
FOLLOW(S) = {$,e}
Using rule #3:
FOLLOW(S) = {$,e}
FOLLOW(B) = {c,d,f} ∪ FOLLOW(S) = {c,d,e,f,$}
FOLLOW(C) = {f,g} ∪ FOLLOW(B) = {c,d,e,f,g,$}

Example 5
S → A B C | A D
A → ε | a A
B → b | c | ε
C → D d C
D → e b | f c
FIRST(D) = {e,f}
FIRST(C) = {e,f}
FIRST(B) = {b,c,ε}
FIRST(A) = {a,ε}
FIRST(S) = {a,b,c,e,f}
FOLLOW(S) = {$}
FOLLOW(A) = {b,c,e,f}
FOLLOW(B) = {e,f}
FOLLOW(C) = {$}
FOLLOW(D) = {d,$}
Note: d is in FOLLOW(D) because of C → D d C, and $ is in FOLLOW(D) because of S → A D.

Example 6
S → ( A ) | ε
A → T E
E → & T E | ε
T → ( A ) | a | b | c
FIRST(T) = {(,a,b,c}
FIRST(E) = {&,ε}
FIRST(A) = {(,a,b,c}
FIRST(S) = {(,ε}
FOLLOW(S) = {$}
FOLLOW(A) = { ) }
FOLLOW(E) = FOLLOW(A) = { ) }
FOLLOW(T) = (FIRST(E) – {ε}) ∪ FOLLOW(A) ∪ FOLLOW(E) = {&, )}

Example 7
E → T E’
E’ → + T E’ | ε
T → F T’
T’ → * F T’ | ε
F → ( E ) | id
FIRST(F) = FIRST(T) = FIRST(E) = {(,id}
FIRST(T’) = {*,ε}
FIRST(E’) = {+,ε}
FOLLOW(E) = {$,)}
FOLLOW(E’) = FOLLOW(E) = {$,)}
FOLLOW(T) = (FIRST(E’) – {ε}) ∪ FOLLOW(E) ∪ FOLLOW(E’) = {+,$,)}
FOLLOW(T’) = FOLLOW(T) = {+,$,)}
FOLLOW(F) = (FIRST(T’) – {ε}) ∪ FOLLOW(T) ∪ FOLLOW(T’) = {*,+,$,)}