Advanced Programming
Andrew Black and Tim Sheard
Lecture 10: Grammars, NFAs

Grammars

A grammar consists of:
– A set of tokens (terminals): T
– A set of non-terminals: N
– A set of productions { lhs -> rhs, ... }, where lhs is in N and rhs is a sequence over N ∪ T
– A start symbol: S (in N)

In Haskell

data Production symbol = Prod { lhs :: symbol, rhs :: [symbol] }

data Grammar symbol = Gr {
    -- the set of terminal symbols
    term :: [symbol]
    -- the set of non-terminals
  , nonterm :: [symbol]
    -- the set of productions
  , prod :: [Production symbol]
    -- the start symbol
  , start :: symbol
  }

Record syntax

Each field name becomes a selector function:

Main> :t term
term :: Grammar a -> [a]
Main> :t nonterm
nonterm :: Grammar a -> [a]
Main> :t prod
prod :: Grammar a -> [Production a]
Main> :t start
start :: Grammar a -> a
Main> :t lhs
lhs :: Production a -> a
Main> :t rhs
rhs :: Production a -> [a]

data Grammar symbol =
  Gr { term :: [symbol], nonterm :: [symbol], prod :: [Production symbol], start :: symbol }

Example Grammar

g1 = Gr ["1","2",",","[","]"]
        ["list1","elem","list2","list3"]
        [Prod "list1" ["[", "list2", "]"]
        ,Prod "list2" []
        ,Prod "list2" ["elem", "list3"]
        ,Prod "list3" [",", "elem", "list3"]
        ,Prod "list3" []
        ,Prod "elem"  ["1"]
        ,Prod "elem"  ["2"]]
        "list1"

Pretty printed

Terminals = {",", "1", "2", "[", "]"}
NonTerminals = {elem, list1, list2, list3}
Start = list1

list1 -> [ list2 ]
list2 ->
list2 -> elem list3
list3 -> , elem list3
list3 ->
elem  -> 1
elem  -> 2

Shortcut construction

We can build a grammar from a list of its productions:
– Provide only the productions
– All lhs symbols comprise N
– All other symbols comprise T
– The lhs of the first production is S

shortcut ps = Gr term nonterm ps start
  where (Prod start _ : _) = ps
        nonterm = norm (map (\ (Prod lhs rhs) -> lhs) ps)
        all     = concat (map (\ (Prod lhs rhs) -> lhs:rhs) ps)
        term    = norm all \\ nonterm
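
The code above uses norm and (\\) without defining them. A minimal guess at what they stand for: (\\) is list difference from Data.List, and norm presumably normalizes a symbol list by removing duplicates (sorting as well keeps later table comparisons order-independent). Treat the definition below as an assumption, not as the slides' own code.

import Data.List (nub, sort, (\\))

-- Assumed helper: normalize a list of symbols by dropping duplicates
-- and sorting, so two lists with the same elements compare equal.
norm :: Ord a => [a] -> [a]
norm = sort . nub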

Example 2

prods2 =
  [Prod "Sent"       ["nounPhrase","verbPhrase"]
  ,Prod "nounPhrase" ["properNoun"]
  ,Prod "nounPhrase" ["article", "noun"]
  ,Prod "article"    ["a "]
  ,Prod "article"    ["the "]
  ,Prod "properNoun" ["Tom "]
  ,Prod "noun"       ["cat "]
  ,Prod "noun"       ["man "]
  ,Prod "verbPhrase" ["verb", "object"]
  ,Prod "verb"       ["ate "]
  ,Prod "verb"       ["stole "]
  ,Prod "object"     ["article", "adjective", "noun"]
  ,Prod "adjective"  ["pretty "]
  ,Prod "adjective"  ["red "]
  ]

g2 = shortcut prods2   -- used as g2 in later slides

Constructed & Pretty Printed

Main> shortcut prods2
Terminals = {"Tom ", "a ", "ate ", "cat ", "man ", "pretty ", "red ", "stole ", "the "}
NonTerminals = {Sent, adjective, article, noun, nounPhrase, object, properNoun, verb, verbPhrase}
Start = Sent

Sent -> nounPhrase verbPhrase
nounPhrase -> properNoun
nounPhrase -> article noun
article -> a
article -> the
properNoun -> Tom
noun -> cat
noun -> man
verbPhrase -> verb object
verb -> ate
verb -> stole
object -> article adjective noun
adjective -> pretty
adjective -> red

Example 3

prods3 =
  [Prod "E"  ["T", "E'", "$"]
  ,Prod "E'" ["+", "T", "E'"]
  ,Prod "E'" []
  ,Prod "T"  ["F", "T'"]
  ,Prod "T'" ["*", "F", "T'"]
  ,Prod "T'" []
  ,Prod "F"  ["(", "E", ")"]
  ,Prod "F"  ["Id"]
  ,Prod "Id" ["x"]
  ]

g3 = shortcut prods3

Pretty Printed

Terminals = {"$", "(", ")", "*", "+", "x"}
NonTerminals = {E, E', F, Id, T, T'}
Start = E

E  -> T E' $
E' -> + T E'
E' ->
T  -> F T'
T' -> * F T'
T' ->
F  -> ( E )
F  -> Id
Id -> x

Meaning of a grammar

A grammar can be given several meanings:
– As a generator of strings
– As a recognizer of strings

A recursive grammar, such as g1, can generate an infinite set of strings.

Generating Grammars

Start with a non-terminal and apply rewriting rules:
– Pick a non-terminal in the current string and replace it with the rhs of one of its productions.
– Repeat until no non-terminals remain.

A sentence of G is a string of only terminal symbols derivable from the start symbol S in one or more steps; L(G) is the set of all such sentences.
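
For example, here is one derivation using grammar g1 from earlier, rewriting the leftmost non-terminal at each step:

list1 => [ list2 ] => [ elem list3 ] => [ 1 list3 ] => [ 1 , elem list3 ] => [ 1 , 2 list3 ] => [ 1 , 2 ]

so "[1,2]" is a sentence of g1.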

Technique

– Start with a non-terminal, e.g. nounPhrase.
– Pick a production, e.g. nounPhrase -> article noun.
– Generate all strings from each symbol in the rhs:
    article = ["a ","the "]
    noun    = ["cat ","man "]
– Make all possible combinations:
    "a cat" "a man" "the cat" "the man"
– If there is more than one production for a symbol, compute the sentences for each one, and then union them together.

Sent -> nounPhrase verbPhrase
nounPhrase -> properNoun
nounPhrase -> article noun
article -> a
article -> the
properNoun -> Tom
noun -> cat
noun -> man
verbPhrase -> verb object
verb -> ate
verb -> stole
object -> article adjective noun
adjective -> pretty
adjective -> red

All possible combinations

oneEach :: [[String]] -> [String]
oneEach []     = [""]
oneEach (x:xs) = [ a ++ b | a <- x, b <- oneEach xs ]

Main> oneEach [["x","y"],["1","2"],["A","B","C"]]
["x1A","x1B","x1C","x2A","x2B","x2C","y1A","y1B","y1C","y2A","y2B","y2C"]

First Try

genA :: Grammar String -> String -> [String]
genA (Gr term nonterm ps start) symbol
  | elem symbol term = [symbol]
genA (gram@(Gr term nonterm ps start)) symbol = concat many
  where startsWith sym (Prod lhs rhs) = lhs == sym
        prods = filter (startsWith symbol) ps
        oneRhs (Prod lhs rhs) = oneEach (map (genA gram) rhs)
        many = map oneRhs prods

Test it

test1 = mapM putStrLn (genA g2 "Sent")

Main> test1
Tom ate a pretty cat
Tom ate a pretty man
Tom ate a red cat
Tom ate a red man
Tom ate the pretty cat
Tom ate the pretty man
Tom ate the red cat
Tom ate the red man
Tom stole a pretty cat
Tom stole a pretty man
Tom stole a red cat
Tom stole a red man
Tom stole the pretty cat
Tom stole the pretty man
Tom stole the red cat
Tom stole the red man
a cat ate a pretty cat
a cat ate a pretty man
...

Let's try it on grammar g1. What happens? Why?
(g1 is recursive: list3 refers to itself, so genA recurses forever and produces no output. The next slide cuts off the generation.)

Terminals = {",", "1", "2", "[", "]"}
NonTerminals = {elem, list1, list2, list3}
Start = list1

list1 -> [ list2 ]
list2 ->
list2 -> elem list3
list3 -> , elem list3
list3 ->
elem -> 1
elem -> 2

Cut off generation after using a non-terminal n times

Create a table of cutoff depths for each non-terminal:

type Tab = [(String,Int)]

Operations on tables:

-- Decrementing the cutoff for a given symbol
decrement :: String -> Tab -> Tab
decrement s [] = []
decrement s ((x,n):xs)
  | s == x    = (x,n-1) : xs
  | True      = (x,n)   : decrement s xs

-- Finding the cutoff
find :: Eq a => a -> [(a,b)] -> b
find s ((x,n):xs)
  | s == x    = n
  | True      = find s xs

Second try

genB :: Grammar String -> Tab -> String -> [String]
genB (Gr term nonterm ps start) table symbol
  | elem symbol term = [symbol]
genB (gram@(Gr term nonterm ps start)) table symbol
  | find symbol table <= 0 = []
  | True = concat many
  where startsWith sym (Prod lhs rhs) = lhs == sym
        prods = filter (startsWith symbol) ps
        new sym = decrement sym table
        oneRhs (Prod lhs rhs) = oneEach (map (genB gram (new lhs)) rhs)
        many = map oneRhs prods

Putting it all together

gen n (gram@(Gr term nonterm ps start)) = genB gram table start
  where table = map f nonterm
        f sym = (sym,n)

Test it

Main> gen 3 g1
["[]","[1,1,1]","[1,1,2]","[1,1]","[1,2,1]","[1,2,2]","[1,2]","[1]",
 "[2,1,1]","[2,1,2]","[2,1]","[2,2,1]","[2,2,2]","[2,2]","[2]"]

Top Down Parsing

Begin with the start symbol and try to derive, from the root, a parse tree that matches the given string.

Consider the grammar

Exp -> id | Exp + Exp | Exp * Exp | ( Exp )

which derives x, x+x, x+x+x, x * y, x + y * z, ...

Example Parse (top down)

parse tree so far            remaining input

      Exp                    x + y * z
    /  |  \
  Exp  +  Exp

      Exp                    y * z
    /  |  \
  Exp  +  Exp
   |
 id(x)

Top Down Parse (cont.)

      Exp                    y * z
    /  |  \
  Exp  +  Exp
   |      /  |  \
 id(x)  Exp  *  Exp

      Exp                    z
    /  |  \
  Exp  +  Exp
   |      /  |  \
 id(x)  Exp  *  Exp
         |
       id(y)

Top Down Parse (cont.)

      Exp
    /  |  \
  Exp  +  Exp
   |      /  |  \
 id(x)  Exp  *  Exp
         |       |
       id(y)   id(z)

Predictive Parsers

– Use a stack to avoid recursion.
– Encode the diagrams in a table.
– The Nullable, First, and Follow functions:
  – Nullable: can a symbol derive the empty string? False for every terminal symbol.
  – First: all the terminals that a non-terminal could possibly derive as its first symbol.
      term or nonterm -> set(term)
      sequence of (term + nonterm) -> set(term)
  – Follow: all the terminals that could immediately follow the string derived from a non-terminal.
      nonterm -> set(term)

Example First and Follow Sets

E  -> T E' $
E' -> + T E'
E' -> ε
T  -> F T'
T' -> * F T'
T' -> ε
F  -> ( E )
F  -> id

First E  = {"(", "id"}     Follow E  = {")", "$"}
First F  = {"(", "id"}     Follow F  = {"+", "*", ")", "$"}
First T  = {"(", "id"}     Follow T  = {"+", ")", "$"}
First E' = {"+", ε}        Follow E' = {")", "$"}
First T' = {"*", ε}        Follow T' = {"+", ")", "$"}

First of a terminal is itself. First can be extended to sequences of symbols.

Nullable

– If E -> ε then E is nullable.
– If E -> A B C, and all of A, B, C are nullable, then E is nullable.

Nullable(E') = true
Nullable(T') = true
Nullable for all other symbols is false.

This is a fixpoint computation.

E  -> T E' $
E' -> + T E'
E' -> ε
T  -> F T'
T' -> * F T'
T' -> ε
F  -> ( E )
F  -> id

In Haskell

We'll represent nullable, first, and follow sets as tables.

data Table elem = Tab [(String,elem)]
  deriving Show

instance Eq elem => Eq (Table elem) where
  (Tab xs) == (Tab ys) = sameRhs xs ys
    where sameRhs [] [] = True
          sameRhs ((x,rhs1):xs) ((y,rhs2):ys) = rhs1 == rhs2 && sameRhs xs ys
          sameRhs _ _ = False

Operations on Tables

get :: String -> Table elem -> elem
get s (Tab xs) = find s xs

set :: Table elem -> String -> elem -> Table elem
set (Tab xs) s y = Tab (update xs)
  where update [] = []
        update ((t,_):ys) | t == s = (t,y) : ys
        update (y:ys) = y : update ys

add :: String -> String -> Table [String] -> Table [String]
add symbol element (Tab xs) = Tab (insert xs)
  where insert [] = []
        insert ((s,xs):ys) | s == symbol = (s, norm (element:xs)) : ys
        insert (y:ys) = y : insert ys

nullable :: Table Bool -> String -> Bool
nullable tab s = get s tab

nullStep (Gr term nonterm ps start) table = newtable
  where nulls (Prod lhs rhs) = all (nullable table) rhs
        nullps   = filter nulls ps
        newtable = foldl acc table nullps
        acc tab (Prod lhs rhs) = set tab lhs True

null (gram@(Gr term nonterm ps start)) = fixpoint (nullStep gram) simple
  where simple = Tab (map f (term ++ nonterm))
        f x = (x, False)
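
The null function above iterates nullStep with a helper called fixpoint that is not shown on these slides. Below is a minimal sketch of what it presumably does (an assumption on my part), using the Eq instance for Table defined earlier.

-- Assumed helper: apply a step function repeatedly until the result
-- stops changing, then return it.
fixpoint :: Eq a => (a -> a) -> a -> a
fixpoint step x
  | x' == x   = x
  | otherwise = fixpoint step x'
  where x' = step x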

Main> null g1
Tab [(",",False),("1",False),("2",False),("[",False),("]",False),("elem",False),
     ("list1",False),("list2",True),("list3",True)]

Main> null g2
Tab [("Tom ",False),("a ",False),("ate ",False),("cat ",False),("man ",False),
     ("pretty ",False),("red ",False),("stole ",False),("the ",False),
     ("Sent",False),("adjective",False),("article",False),("noun",False),
     ("nounPhrase",False),("object",False),("properNoun",False),("verb",False),
     ("verbPhrase",False)]

Main> null g3
Tab [("$",False),("(",False),(")",False),("*",False),("+",False),("x",False),
     ("E",False),("E'",True),("F",False),("Id",False),("T",False),("T'",True)]

Computing First

Use the following rules until no more terminals can be added to any FIRST set.

1) If X is a terminal, FIRST(X) = {X}.
2) If X -> ε is a production, then add ε to FIRST(X) (or set nullable of X to true).
3) If X is a non-terminal and X -> Y1 Y2 ... Yk, add a to FIRST(X) if a is in FIRST(Yi) and for all j < i, ε is in FIRST(Yj).
   E.g. if Y1 can derive ε, then if a is in FIRST(Y2) it is surely in FIRST(X) as well.

Example First Computation

Terminals
– First($) = {$}
– First(*) = {*}
– First(+) = {+}
...

Empty productions
– Add ε to First(E'), add ε to First(T')

Other non-terminals, computing from the lowest layer (F) up:
First(F)  = {id, ( }
First(T') = { ε, * }
First(T)  = First(F) = {id, ( }
First(E') = { ε, + }
First(E)  = First(T) = {id, ( }

E  -> T E' $
E' -> + T E'
E' -> ε
T  -> F T'
T' -> * F T'
T' -> ε
F  -> ( E )
F  -> id
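
The slides give Haskell only for nullable, but the same Table machinery can carry FIRST as well. The sketch below is not from the slides: firstStep and firstTable are my names, it assumes the norm and fixpoint helpers sketched earlier plus the slides' null (with Prelude's null hidden), and ε-membership is tracked by the nullable table rather than stored in the FIRST sets.

-- One step of the FIRST computation, following rules 1-3 above.
-- `nulls` is the nullable table; `table` maps every symbol to the
-- terminals currently known to be in its FIRST set.
firstStep :: Grammar String -> Table Bool -> Table [String] -> Table [String]
firstStep (Gr term nonterm ps start) nulls table = foldl accProd table ps
  where
    -- the prefix Y1..Yi of a rhs whose FIRST sets can contribute:
    -- Y1, plus each later symbol preceded only by nullable symbols
    reachable [] = []
    reachable (y:ys)
      | get y nulls = y : reachable ys
      | otherwise   = [y]
    -- add FIRST(Y) to FIRST(lhs) for every contributing Y
    accProd tab (Prod lhs rhs) =
      foldl (\t y -> foldl (\t' a -> add lhs a t') t (get y t)) tab (reachable rhs)

-- Iterate to a fixpoint, starting from FIRST(t) = {t} for terminals
-- and FIRST(A) = {} for non-terminals.
firstTable :: Grammar String -> Table [String]
firstTable gram@(Gr term nonterm ps start) = fixpoint (firstStep gram nulls) init0
  where
    nulls = null gram    -- the slides' nullable table
    init0 = Tab ([ (t,[t]) | t <- term ] ++ [ (n,[]) | n <- nonterm ])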

Computing Follow

Use the following rules until nothing can be added to any FOLLOW set.

1) Place $ (the end-of-input marker) in FOLLOW(S), where S is the start symbol.
2) If A -> α B β, then everything in FIRST(β) except ε is in FOLLOW(B).
3) If there is a production A -> α B, or A -> α B β where FIRST(β) contains ε (i.e. β can derive the empty string), then everything in FOLLOW(A) is in FOLLOW(B).

Ex. Follow Computation

Rule 1, start symbol:
– Add $ to Follow(E)

Rule 2, productions with embedded non-terminals:
– Add First( ) ) = { ) } to Follow(E)
– Add First( $ ) = { $ } to Follow(E')
– Add First(E') = {+, ε} to Follow(T)
– Add First(T') = {*, ε} to Follow(F)

Rule 3, non-terminal in last position:
– Add Follow(E') to Follow(E') (doesn't do much)
– Add Follow(T) to Follow(T')
– Add Follow(T) to Follow(F), since T' -> ε
– Add Follow(T') to Follow(F), since T' -> ε

E  -> T E' $
E' -> + T E'
E' -> ε
T  -> F T'
T' -> * F T'
T' -> ε
F  -> ( E )
F  -> id
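
In the same style, here is a sketch of the FOLLOW computation. It is not from the slides: followStep and followTable are my names, it reuses the FIRST sketch above, and it needs tails from Data.List.

import Data.List (tails)

-- One step of the FOLLOW computation, following rules 1-3 above.
followStep :: Grammar String -> Table Bool -> Table [String]
           -> Table [String] -> Table [String]
followStep (Gr term nonterm ps start) nulls firsts table = foldl accProd table ps
  where
    accProd tab (Prod lhs rhs) = foldl (accAt lhs) tab (tails rhs)
    -- an occurrence of a non-terminal b followed by the symbols `rest`
    accAt lhs tab (b:rest)
      | elem b nonterm =
          let tab1 = foldl (\t a -> add b a t) tab (firstOfSeq rest)   -- rule 2
          in if all (\y -> get y nulls) rest                           -- rule 3
               then foldl (\t a -> add b a t) tab1 (get lhs tab1)
               else tab1
    accAt _ tab _ = tab
    -- FIRST of a sequence of symbols, ε excluded (the nullable table covers ε)
    firstOfSeq [] = []
    firstOfSeq (y:ys)
      | get y nulls = get y firsts ++ firstOfSeq ys
      | otherwise   = get y firsts

followTable :: Grammar String -> Table [String]
followTable gram@(Gr term nonterm ps start) =
    fixpoint (followStep gram nulls firsts) init0
  where
    nulls  = null gram
    firsts = firstTable gram
    init0  = add start "$" (Tab [ (n,[]) | n <- nonterm ])   -- rule 1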

Table from First and Follow

1. For each production A -> α do steps 2 and 3.
2. For each terminal a in First(α), add A -> α to M[A,a].
3. If ε is in First(α), add A -> α to M[A,b] for each terminal b in Follow(A). If ε is in First(α) and $ is in Follow(A), add A -> α to M[A,$].

First E  = {"(", "id"}     Follow E  = {")", "$"}
First F  = {"(", "id"}     Follow F  = {"+", "*", ")", "$"}
First T  = {"(", "id"}     Follow T  = {"+", ")", "$"}
First E' = {"+", ε}        Follow E' = {")", "$"}
First T' = {"*", ε}        Follow T' = {"+", ")", "$"}

1  E  -> T E' $
2  E' -> + T E'
3      |  ε
4  T  -> F T'
5  T' -> * F T'
6      |  ε
7  F  -> ( E )
8      |  id

(The slide also shows the empty table M[A,t], with a column for each terminal +, *, ), (, id, $ and a row for each non-terminal E, E', T, T', F, to be filled in using these rules; the completed table is on the next slide.)

Predictive Parsing Table

         id        +         *          (         )    $
E        T E' $                         T E' $
E'                 + T E'                         ε    ε
T        F T'                           F T'
T'                 ε         * F T'               ε    ε
F        id                             ( E )

(blank entries are errors)
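
A sketch, not from the slides, of how rules 1-3 on the previous slide could be coded on top of the FIRST/FOLLOW sketches above. ParseTable and parseTable are my names; the table is just an association list from (non-terminal, terminal) pairs to productions.

type ParseTable = [((String,String), Production String)]

parseTable :: Grammar String -> ParseTable
parseTable gram@(Gr term nonterm ps start) = concatMap entries ps
  where
    nulls   = null gram
    firsts  = firstTable gram
    follows = followTable gram
    firstOfSeq [] = []
    firstOfSeq (y:ys) | get y nulls = get y firsts ++ firstOfSeq ys
                      | otherwise   = get y firsts
    nullSeq = all (\y -> get y nulls)
    entries p@(Prod a alpha) =
      [ ((a,t), p) | t <- norm (firstOfSeq alpha) ]          -- rule 2
      ++ (if nullSeq alpha                                   -- rule 3
            then [ ((a,b), p) | b <- get a follows ]
            else [])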

Table Driven Algorithm

push start symbol
repeat
  let X = top of stack, A = next input token
  if terminal(X) then
    if X = A then pop X; remove A from the input
    else error()
  else  (* nonterminal(X) *)
    if M[X,A] = Y1 Y2 ... Yk then pop X; push Yk, Yk-1, ..., Y1
    else error()
until stack is empty, input = $
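
The loop above, written as a Haskell sketch over the ParseTable association list from the previous sketch (llParse is my name, not the slides'). Since g3 treats "$" as an ordinary terminal at the end of E's production, this driver simply runs until both the stack and the input are exhausted.

llParse :: Grammar String -> ParseTable -> [String] -> Bool
llParse (Gr term nonterm ps start) table input = run [start] input
  where
    run [] [] = True                            -- stack empty, input consumed
    run (x:stack) (a:as)
      | elem x term = x == a && run stack as    -- match terminal, advance input
      | otherwise   = case lookup (x,a) table of
          Just (Prod _ rhs) -> run (rhs ++ stack) (a:as)   -- pop X, push Y1..Yk
          Nothing           -> False                        -- no table entry: error
    run _ _ = False

-- e.g. llParse g3 (parseTable g3) ["x","+","x","*","x","$"]   (expected True)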