CSCI 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong LR(0) grammars.


1 CSCI 3130: Automata theory and formal languages Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130 The Chinese University of Hong Kong LR(0) grammars Fall 2011

2 Parsing computer programs
First the javac compiler does a lexical analysis:
    if (n == 0) { return x; }
    if (ID == INT_LIT) { return ID; }
ID = identifier (name of variable, procedure, class, ...)
INT_LIT = integer literal (value)
The alphabet of the Java CFG consists of symbols like: Σ = { if, return, (, ), {, }, ;, ==, ID, INT_LIT, ... }
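
The lexical analysis step can be sketched in a few lines of Python. This is only an illustration, not how javac works: the token patterns and the name `tokenize` are assumptions made here, and they cover just the handful of symbols that appear in the example statement.

```python
import re

# Hypothetical token patterns for a tiny Java subset (an assumption for this sketch).
# Order matters: keywords are tried before identifiers.
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:if|return)\b"),
    ("INT_LIT", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("SYMBOL", r"==|[(){};]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Map program text to a list of symbols of the grammar's alphabet."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        kind = match.lastgroup
        if kind in ("KEYWORD", "SYMBOL"):
            tokens.append(match.group())   # keywords and punctuation stand for themselves
        elif kind != "SKIP":
            tokens.append(kind)            # identifiers and literals become ID / INT_LIT
        pos = match.end()
    return tokens

print(" ".join(tokenize("if (n == 0) { return x; }")))
```

The parser then works on the token sequence only; the actual names and values are not part of the grammar's alphabet.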

3 Parsing computer programs
[Figure: the parse tree of a Java statement. Source text: if (n == 0) { return x; }. Token sequence: if (ID == INT_LIT) { return ID; }. Node labels: Statement, ParExpression, Expression, ExpressionRest, Infixop, Primary, Literal, Identifier, Block, BlockStatements, BlockStatement.]

4 CFG of the Java programming language
Identifier:
    ID
QualifiedIdentifier:
    Identifier { . Identifier }
Literal:
    IntegerLiteral
    FloatingPointLiteral
    CharacterLiteral
    StringLiteral
    BooleanLiteral
    NullLiteral
Expression:
    Expression1 [AssignmentOperator Expression1]
AssignmentOperator:
    =  +=  -=  *=  /=  &=  |=
…
from http://java.sun.com/docs/books/jls/second_edition/html/syntax.doc.html#52996

5 Parsing Java programs
class Point2d {
    /* The X and Y coordinates of the point--instance variables */
    private double x;
    private double y;
    private boolean debug; // A trick to help with debugging

    public Point2d (double px, double py) { // Constructor
        x = px;
        y = py;
        debug = false; // turn off debugging
    }

    public Point2d () { // Default constructor
        this (0.0, 0.0); // Invokes 2 parameter Point2D constructor
    }

    // Note that a this() invocation must be the BEGINNING of
    // statement body of constructor

    public Point2d (Point2d pt) { // Another constructor
        x = pt.getX();
        y = pt.getY();
    }
    …
Simple Java program: about 500 symbols

6 Parsing algorithms
How long would it take to parse this program?
    try all parse trees: about 10^80 years
    CYK algorithm: about 1 week!
Can we parse faster? No! CYK is the fastest known general-purpose parsing algorithm for CFGs
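
For reference, here is a minimal sketch of the CYK algorithm mentioned on this slide. It assumes the grammar is already in Chomsky normal form; the example grammar for {a^n b^n : n ≥ 1} and the names `cyk`, `UNIT_RULES` and `PAIR_RULES` are illustrative assumptions, not taken from the slides. The three nested loops over length, start position and split point are where the cubic running time comes from.

```python
# Hypothetical CNF grammar for {a^n b^n : n >= 1}:
#   S -> AB | AT,  T -> SB,  A -> a,  B -> b
UNIT_RULES = [("A", "a"), ("B", "b")]
PAIR_RULES = [("S", ("A", "B")), ("S", ("A", "T")), ("T", ("S", "B"))]

def cyk(word, unit_rules=UNIT_RULES, pair_rules=PAIR_RULES, start="S"):
    """Return True if `word` is derivable from `start` in the given CNF grammar."""
    n = len(word)
    if n == 0:
        return False  # this sketch ignores the empty word
    # table[i][l] = set of variables that derive the substring word[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][1] = {head for head, terminal in unit_rules if terminal == ch}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for head, (first, second) in pair_rules:
                    if first in left and second in right:
                        table[i][length].add(head)
    return start in table[0][n]

print(cyk("aabb"), cyk("aab"))
```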

7 Another way of thinking
Scientist: Find an algorithm that can parse any CFG
Engineer: Design your CFG so it can be parsed very quickly

8 Parsing left to right
Grammar: S → Tc (1), T → TA (2) | A (3), A → aTb (4) | ab (5)
input: abaabbc
[Figure: the parse tree of abaabbc.]
Try to match to the left of •

9 Items
Grammar: S → Tc (1), T → TA (2), T → A (3), A → aTb (4), A → ab (5)
An item is a production augmented with a •
The item is complete if the • is the last symbol
[Figure: example items, one for each production of the grammar.]
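
As a small illustration of this definition, the sketch below enumerates every item of the slide's grammar. The tuple representation `(head, body, dot)` and the helper names are assumptions made for this example.

```python
# Grammar from the slides: S -> Tc, T -> TA | A, A -> aTb | ab.
GRAMMAR = [("S", "Tc"), ("T", "TA"), ("T", "A"), ("A", "aTb"), ("A", "ab")]

def items_of(head, body):
    """All items of one production: the dot can sit at positions 0..len(body)."""
    return [(head, body, dot) for dot in range(len(body) + 1)]

def is_complete(item):
    """An item is complete when the dot is the last symbol."""
    head, body, dot = item
    return dot == len(body)

def show(item):
    head, body, dot = item
    return f"{head} -> {body[:dot]}•{body[dot:]}"

ALL_ITEMS = [item for head, body in GRAMMAR for item in items_of(head, body)]

for item in ALL_ITEMS:
    print(show(item), "(complete)" if is_complete(item) else "")
```

A production with a body of length k has k + 1 items, exactly one of which is complete.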

10 Meaning of items
Grammar: S → Tc (1), T → TA (2) | A (3), A → aTb (4) | ab (5)
Items represent possibilities at various stages of the parsing process
[Figure: a partial parse tree over the input abaabbc, annotated with items of A → aTb and A → ab.]

11 Meaning of items
Grammar: S → Tc (1), T → TA (2) | A (3), A → aTb (4) | ab (5)
When a complete item occurs, a part of the parse tree is discovered
[Figure: partial parse trees over the input abaabbc, before and after a complete item is found.]

12 LR(0) parsing
Grammar: A → aAb | ab
input: aabb
Move from left to right
Keep track of all possible valid items
Prune the invalid items
When a complete item occurs, build part of parse tree
[Figure: the valid items registry, holding items of A → aAb and A → ab.]

13-18 LR(0) parsing (continued)
[Figure: animation frames of slide 12 for the grammar A → aAb | ab and input aabb. As the input is read from left to right, the valid items registry is updated, invalid items are pruned, and each complete item adds a node A to the parse tree.]

19 two kinds of actions
shift: no complete item in registry
reduce: exactly one complete item in registry
[Figure: two example registries for A → aAb | ab on input aabb, one for each kind of action.]

20 LR(0) implementation: first take
Grammar: A → aAb | ab
input: aabb

stack   action
ε       S
a       S
aa      S
aab     R
aA      S
aAb     R
A

[Figure: the valid items for each stack, drawn from A → aAb and A → ab.]

21 valid item update rules
notation: a, b : terminals; A, B, C : variables; α, β, δ : mixed strings
initial valid items: S → •α
shift updates:
    A → α•aβ   read a: becomes A → αa•β;   read b: disappears
    A → α•Bβ   read x: disappears
reduce updates:
    A → α•Bβ   reduce B: becomes A → αB•β;   reduce C: disappears
common updates:
    A → α•Bβ   adds B → •δ

22 valid item update rules
notation: a, b : terminals; A, B, C : variables; α, β, δ : mixed strings; X : terminal or variable
initial valid items: S → •α
shift updates: A → α•aβ becomes A → αa•β on read a
reduce updates: A → α•Bβ becomes A → αB•β on reduce B
both cases: A → α•Xβ becomes A → αX•β on X
common updates: A → α•Bβ adds B → •δ
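
The update rules can be tried out on the running grammar A → aAb | ab with a short Python sketch. The item encoding `(head, body, dot)` and the function names `closure`, `initial_items` and `update` are assumptions made for this illustration; `update` handles shift and reduce alike by advancing the dot over one symbol and dropping every item that expected something else.

```python
GRAMMAR = {"A": ["aAb", "ab"]}   # productions of the running example
START = "A"

def closure(items):
    """Common update: if A -> α•Bβ is valid, add B -> •δ for every production B -> δ."""
    result = set(items)
    pending = list(items)
    while pending:
        _, body, dot = pending.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            for delta in GRAMMAR[body[dot]]:
                item = (body[dot], delta, 0)
                if item not in result:
                    result.add(item)
                    pending.append(item)
    return frozenset(result)

def initial_items():
    """Initial valid items: S -> •α for every production of the start variable."""
    return closure({(START, body, 0) for body in GRAMMAR[START]})

def update(items, symbol):
    """Shift (terminal) or reduce (variable) update over `symbol`.

    Items with `symbol` right after the dot advance; all other items disappear.
    """
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == symbol}
    return closure(moved)

registry = initial_items()
for symbol in "aab":
    registry = update(registry, symbol)
print(sorted(registry))   # only the complete item for A -> ab remains
```

After reading aab the registry holds exactly one item, and it is complete, so the parser reduces by A → ab.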

23 LR(0) parsing: NFA representation
notation: a, b : terminals; A, B, C : variables; α, β, δ : mixed strings; X : terminal or variable
For every item S → •α: an ε-transition from q0 to S → •α
For every item A → α•Xβ: a transition labeled X from A → α•Xβ to A → αX•β
For every pair of items A → α•Cβ, C → •δ: an ε-transition from A → α•Cβ to C → •δ

24 NFA example
Grammar: A → aAb | ab
NFA alphabet is Σ = {a, b, A}
start state is q0
other states are items
[Figure: the NFA for this grammar, with transitions labeled a, b, A and ε between q0 and the items of A → aAb and A → ab.]

25 NFA to DFA conversion
[Figure: the NFA for A → aAb | ab next to the equivalent DFA, whose states are sets of items. Transitions not shown lead to a dead state q_die.]

26 Shift states and reduce states
[Figure: the DFA for A → aAb | ab with its states numbered 1 to 5.]
States 1, 2, 3 are shift states
States 4, 5 are reduce states
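
The DFA and the shift/reduce classification can be reproduced with a short sketch. It builds the sets of items directly, which gives the same automaton as converting the NFA, and leaves the dead state implicit as a missing transition. All function names are assumptions made here, and the state numbers come from breadth-first order, so they need not match the numbering on the slide.

```python
from collections import deque

GRAMMAR = {"A": ["aAb", "ab"]}   # productions of the running example
START = "A"

def closure(items):
    """Add B -> •δ whenever some item has the dot in front of the variable B."""
    result = set(items)
    pending = list(items)
    while pending:
        _, body, dot = pending.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            for delta in GRAMMAR[body[dot]]:
                item = (body[dot], delta, 0)
                if item not in result:
                    result.add(item)
                    pending.append(item)
    return frozenset(result)

def goto(items, symbol):
    """Advance the dot over `symbol` in every item that allows it."""
    return closure({(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == symbol})

def build_dfa():
    """Return (states, edges): item sets numbered from 1, and labeled transitions."""
    start = closure({(START, body, 0) for body in GRAMMAR[START]})
    states = {start: 1}
    edges = {}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for symbol in sorted({b[d] for _, b, d in state if d < len(b)}):
            target = goto(state, symbol)
            if target not in states:
                states[target] = len(states) + 1
                queue.append(target)
            edges[(states[state], symbol)] = states[target]
    return states, edges

def kind(state):
    """Shift state: no complete item. Reduce state: one item, and it is complete."""
    complete = [item for item in state if item[2] == len(item[1])]
    if not complete:
        return "shift"
    return "reduce" if len(state) == 1 else "conflict"

STATES, EDGES = build_dfa()
KINDS = {number: kind(items) for items, number in STATES.items()}
print(KINDS)
print(EDGES)
```

For this grammar the construction yields five states, three of them shift states and two reduce states, in agreement with the slide.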

27 LR(0) parsing: second take
Grammar: A → aAb | ab
input: aabb
[Figure: the DFA with states 1 to 5.]

stack   state   action
ε       1       S
a       2       S
aa      2       S
aab     5       R
aA      ?

How do we know what to reduce to?

28 remember state in stack!
Grammar: A → aAb | ab
input: aabb
[Figure: the DFA with states 1 to 5.]

stack      action
1          S
1a2        S
1a2a2      S
1a2a2b5    R   (backtrack two steps)
1a2A

29 remember state in stack!
Grammar: A → aAb | ab
input: aabb
[Figure: the DFA with states 1 to 5.]

stack      action
1          S
1a2        S
1a2a2      S
1a2a2b5    R
1a2A3      S
1a2A3b4    R
A

30 PDA for LR(0) parsing
Grammar: A → aAb | ab
input: aabb
[Figure: the DFA with states 1 to 5.]

stack   action
1       S
12      S
122     S
1225    R
123     S
1234    R
A

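31 PDA for LR(0) parsing
[Figure: the DFA with states 1 to 5, extended to a PDA with the transitions ε, ε/$ at the start and A, $/ε at the end.]
In the reduce state for A → ab: pop the states for b and a, take the transition labeled A out of the state now on top, and push the state it leads to
In the reduce state for A → aAb: pop the states for b, A and a, take the transition labeled A out of the state now on top, and push the state it leads to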
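
The stack discipline of this PDA can be sketched as a small table-driven loop. The transition table below is read off the stack traces on the preceding slides (1a2, 1a2a2, 1a2a2b5, 1a2A3, 1a2A3b4); the function name `parse`, the table names and the acceptance test are assumptions made for this illustration.

```python
# DFA for A -> aAb | ab, with the state numbers used in the slides' traces.
TRANSITIONS = {(1, "a"): 2, (2, "a"): 2, (2, "b"): 5, (2, "A"): 3, (3, "b"): 4}
# Reduce states: the variable to reduce to and the length of the production body.
REDUCE = {4: ("A", 3), 5: ("A", 2)}
START_STATE, START_SYMBOL = 1, "A"

def parse(word):
    """Return the sequence of state stacks visited, or None if `word` is rejected."""
    stack = [START_STATE]
    trace = ["".join(map(str, stack))]
    pos = 0
    while True:
        state = stack[-1]
        if state in REDUCE:
            head, length = REDUCE[state]
            del stack[-length:]            # pop one state per symbol of the body
            if stack == [START_STATE] and head == START_SYMBOL and pos == len(word):
                return trace               # whole input reduced to the start variable
            nxt = TRANSITIONS.get((stack[-1], head))
        else:
            if pos == len(word):
                return None                # shift state, but no input left
            nxt = TRANSITIONS.get((state, word[pos]))
            pos += 1
        if nxt is None:
            return None                    # missing transition: the dead state
        stack.append(nxt)
        trace.append("".join(map(str, stack)))

print(parse("aabb"))
```

Because every state is either a shift state or a reduce state, the loop never has to guess, which is what makes the PDA deterministic.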

32 Example 1
L = {w#w^R : w ∈ {a, b}*}
A → aAa | bAb | #
[Figure: the NFA for this grammar, with start state q0 and the items of A → aAa, A → bAb and A → # as its other states.]

33 Example 1
Grammar: A → aAa | bAb | #
input: ba#ab
[Figure: the DFA for this grammar, with states numbered 1 to 8, and the parse tree of ba#ab.]

stack    action
1        S
14       S
143      S
1432     R
1435     S
14357    R
146      S
1468     R

34 LR(0) grammars and deterministic PDAs
The PDA for LR(0) parsing is deterministic
Some CFLs require non-deterministic PDAs, e.g. L = {ww^R : w ∈ {a, b}*}
What goes wrong when we do LR(0) parsing on L?

35 Example 2
L = {ww^R : w ∈ {a, b}*}
A → aAa | bAb | ε
[Figure: the NFA for this grammar, with start state q0 and the items of A → aAa, A → bAb and A → ε as its other states.]

36 Example 2
L = {ww^R : w ∈ {a, b}*}
A → aAa | bAb | ε
input: abba
[Figure: the DFA for this grammar, with a state marked "shift-reduce conflict".]

37 Parsing computer programs
[Figure: the parse tree of if (n == 0) { return x; } from slide 3, with the input extended by else { return x + 1; }.]

38 Parsing computer programs
[Figure: the parse tree of if (n == 0) { return x; } else { return x + 1; }, in which Statement expands to if ParExpression Statement else Statement.]
LR(0) parsers cannot tell apart if ... then from if ... then ... else

39 When you can’t LR(0) parse
LR(0) parser can perform two actions:
    no complete item is valid: shift (S)
    there is one valid item, and it is complete: reduce (R)
What if:
    some valid items complete, some not: S / R conflict
    more than one valid complete item: R / R conflict
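
The four cases can be written as a small classification function over a set of valid items. The item encoding `(head, body, dot)` and the name `classify` are assumptions made for this sketch; when a set has both kinds of conflict, the function reports the shift/reduce one first.

```python
def classify(items):
    """Decide what an LR(0) parser can do with a given set of valid items."""
    complete = [item for item in items if item[2] == len(item[1])]
    if not complete:
        return "shift"
    if len(items) == 1:
        return "reduce"            # one valid item, and it is complete
    if len(complete) < len(items):
        return "S/R conflict"      # some valid items complete, some not
    return "R/R conflict"          # more than one valid complete item

# Initial valid items for A -> aAa | bAb | ε (the grammar of Example 2).
# The empty body makes A -> • complete right away.
EXAMPLE_2_START = {("A", "aAa", 0), ("A", "bAb", 0), ("A", "", 0)}

print(classify(EXAMPLE_2_START))
```

For the palindrome grammar of Example 2 the conflict already shows up in the initial set of valid items, before any input is read.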

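40 Hierarchy of context-free grammars
[Figure: nested classes of grammars, from the largest to the smallest: context-free grammars, LR(∞) grammars, …, LR(1) grammars, LR(0) grammars. Labels: java, perl, python, …; LR(0) grammars: parse using LR(0) algorithm.]
to be continued…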

