COMBINING COMPATIBLE STATES DURING LR(1) PARSER CONSTRUCTION

The LR(0) algorithm for constructing parsers is one in which contexts are not evaluated, and states are considered identical if they consist of the same set of marked productions.

But this algorithm is insufficient for actual programming languages, producing parsers with numerous conflicts.

The LR(1) algorithm, when applied to constructing parsers for real computer languages such as Java or C++, results in a parsing machine that is an order of magnitude or more larger than the one produced by the LR(0) algorithm for the same grammar.

On the other hand, the LR(1) algorithm, which you made use of in your last assignment, produces parsers for the large grammars employed for actual computer languages that are a few orders of magnitude larger than those produced by the LR(0) algorithm.

As a compromise, various methods, including the one employed by Yacc, have been devised for subsets of the LR(1) grammars, using a hybrid approach.

This works well for most programming languages, but imposes a greater responsibility on the compiler writer to come up with a grammar that does not lead to conflicts (i.e. to cases where more than one action is defined at a parsing-machine state for the same next input symbol). These methods only work for a subset of the LR(1) grammars, and there are applications, including ones involving natural language processing, for which they are inadequate.

However, one can employ a definition of compatibility between states which works for all LR(1) grammars and which produces parsers of the same size as those referred to previously.

DEFINITION. The nucleus of a state consists of the configurations in the state in which the marker is in a position greater than zero. Example: a configuration of the form A → bc.d, {x,y} would be a member of the state's nucleus, but a configuration such as A → .bcd, {x,y} would not.
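
A rough illustration only: a minimal Python sketch of one way configurations and the nucleus test might be represented. The Config type and its field names are hypothetical, not taken from any particular implementation.

from dataclasses import dataclass
from typing import FrozenSet, Tuple

@dataclass(frozen=True)
class Config:
    lhs: str                  # left-hand side, e.g. "A"
    rhs: Tuple[str, ...]      # right-hand side, e.g. ("b", "c", "d")
    dot: int                  # marker position within rhs
    contexts: FrozenSet[str]  # context (lookahead) set, e.g. frozenset({"x", "y"})

def nucleus(state):
    """Return the configurations whose marker is in a position greater than zero."""
    return [c for c in state if c.dot > 0]

# A → bc.d, {x,y} belongs to the nucleus; A → .bcd, {x,y} does not.
state = [
    Config("A", ("b", "c", "d"), 2, frozenset({"x", "y"})),
    Config("A", ("b", "c", "d"), 0, frozenset({"x", "y"})),
]
print(len(nucleus(state)))  # prints 1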

DEFINITION OF COMPATIBILITY BETWEEN LR(1) STATES. Let S and S' be two states in an LR(1) parsing machine whose nuclei consist of the same marked productions, which we will denote as P1, …, Pn. For 1 ≤ t ≤ n, let Ut denote the set of contexts associated with marked production Pt in state S, and let U't denote the set of contexts associated with that marked production in state S'. Then states S and S' are compatible if, for all 1 ≤ i < j ≤ n, at least one of the following conditions holds:
(a) Ui ∩ U'j = ∅ and U'i ∩ Uj = ∅ (∅ is the empty set, i.e. the intersections involved are both empty)
(b) Ui ∩ Uj ≠ ∅
(c) U'i ∩ U'j ≠ ∅
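
A minimal sketch of this test in Python, assuming the two nuclei have already been matched up so that position t of each list holds the context set of marked production Pt in S and S' respectively. The function name and representation are illustrative only.

from itertools import combinations

def compatible(U, U_prime):
    """Test conditions (a)-(c) above for every pair of nucleus configurations."""
    for i, j in combinations(range(len(U)), 2):
        cond_a = not (U[i] & U_prime[j]) and not (U_prime[i] & U[j])
        cond_b = bool(U[i] & U[j])
        cond_c = bool(U_prime[i] & U_prime[j])
        if not (cond_a or cond_b or cond_c):
            return False  # this pair fails all three conditions
    return True           # every pair passed (vacuously true for a single config)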

Note. If states S and S' are as described above, and their nuclei consist of only a single configuration, then according to the above definition they are compatible (the conditions hold vacuously, since there is no pair i < j to test).

In the case where S and S' as described above are compatible, one can combine the states into a single state whose nucleus consists of the same marked productions listed above, while for 1 ≤ t ≤ n, the set of contexts associated with marked production Pt is Ut ∪ U't.
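
Continuing the illustrative sketch, the contexts of the combined state are just the pointwise unions (merge_contexts is again an invented name):

def merge_contexts(U, U_prime):
    """Pointwise union of the context sets of two compatible states."""
    return [u | u_prime for u, u_prime in zip(U, U_prime)]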

One way of looking at the definition is to say that every pair of configurations in the nuclei must pass a test, and that two states are compatible only if every pair in fact passes.

Fortunately, in grammars for actual programming languages such as Java, C++, etc., there are at most 6 configurations in the nucleus of any state. The states may be large, with many immediate successors, but the nuclei are all quite small.

EXAMPLES. We show only the nucleus of the states in these examples, since, according to the definition, states are compatible if and only if their nuclei are.

S:
A → ab.c {x,y}
B → b.n {s,t}
C → rb.ed {u,v}

S':
A → ab.c {d}
B → b.n {s}
C → rb.ed {x,v}

The above two states are not compatible, because the pair consisting of the first and third configurations fails the test. For this pair, condition (a) of the definition is not true, since the context of the first configuration of S contains an x, and so does the context of the third configuration of S'. In addition, neither condition (b) nor condition (c) is true.

S:
A → ab.c {x,y}
B → b.n {s,t}
C → rb.ed {x,v}

S':
A → ab.c {x,y,d}
B → b.n {s}
C → rb.ed {x,v}

The first and third configurations in this case pass the test, because condition (b) of the definition applies to the first and third configurations of S: both of these configurations contain x in their set of contexts. The states in this case are compatible. Remember that, while every pair of configurations in the nucleus must pass the test, only one of conditions (a), (b) or (c) needs to be true for a given pair for it to pass.

Since the states are compatible, they can be combined to form one whose nucleus is:

A → ab.c {x,y,d}
B → b.n {s,t}
C → rb.ed {x,v}
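
As a quick sanity check, feeding the context sets of these two examples to the compatible and merge_contexts sketches given earlier (assuming both are in scope) reproduces the conclusions above:

# Example 1: not compatible
U       = [{"x", "y"}, {"s", "t"}, {"u", "v"}]
U_prime = [{"d"},      {"s"},      {"x", "v"}]
print(compatible(U, U_prime))        # False

# Example 2: compatible, so the contexts may be merged
U       = [{"x", "y"},      {"s", "t"}, {"x", "v"}]
U_prime = [{"x", "y", "d"}, {"s"},      {"x", "v"}]
print(compatible(U, U_prime))        # True
print(merge_contexts(U, U_prime))    # [{'x','y','d'}, {'s','t'}, {'x','v'}] (set order may vary)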

Note. In the figure on the next slide, where we omit the context sets of various configurations (i.e. only show the marked production involved), the implication is that they are irrelevant to the assertions being made about the figure.

States 2 and 8 are not compatible, since the first configuration of state 2 has d as a context in common with the second configuration of state 8. In fact, if we were to combine states 2 and 8, the result would have a combination of states 3 and 9 as its u-successor. This state would have a conflict, in that it would have reduce actions, when the next input symbol was d, for both Z → tu and V → ε.

Now consider the altered machine obtained if the production X → aYd were replaced by (say) X → aYa. In this case the first configuration of state 2 would be Y → t.W {a}. It would then follow that states 2 and 8 were compatible and could safely be combined to form:

Y → t.W {a, e}
Z → t.u {c, d}
W → .uV

The journal paper describing this method of combining states contains a formal proof of its correctness. But since ours is a practically oriented course, we will just consider an informal justification, based on a few examples, to supply a flavor of the reasoning involved.

The main argument is that if the parsing machine containing the states S and S', as described in the definition of compatibility, has no conflicts, and S and S' are compatible, then the parsing machine obtained by combining them will also have no conflicts.

The argument is by contradiction. Let's consider examples of the various ways that two configurations in the combination of S and S' could have conflicts, or lead to conflicts between other pairs of configurations in states reachable from the combined state. In each case we hope to show either that the parsing machine as it was before S and S' were combined contained conflicts in the first place, or that S and S' could not in fact have been compatible.

Case 1. Let configs 1 and 2 of the combined state formed from states S and S' be:

A → r B.uv {a,b}
C → t B.uv {a,c}

Since the machine as it was before the combination contained no conflicts, and in particular did not contain a conflict in the uv-successor of these states, either (1) state S must have contained the a in its version of config 1, while state S' contained the a in its version of config 2, or (2) vice versa.

Case 1 contd.

A → r B.uv {a,b}
C → t B.uv {a,c}

In either case condition (a) of the definition would not be true for the two configs. Conditions (b) and (c) would not be true either: if S (or S') had a common symbol in the contexts of both of its own versions of these configs, it would already have had a reduce-reduce conflict in its uv-successor. So states S and S' could not have been compatible in the first place.
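
To make Case 1 concrete, here is one hypothetical split of the a between S and S' (the other split fails symmetrically), again using the compatible sketch from earlier; the context symbols are invented for illustration:

# Split (1): S carries the a in config 1, S' carries the a in config 2.
U       = [{"a", "b"}, {"c"}]       # contexts of configs 1 and 2 in S
U_prime = [{"b"},      {"a", "c"}]  # contexts of configs 1 and 2 in S'
print(compatible(U, U_prime))       # False -- (a), (b) and (c) all fail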

Case 2. Let configs 1 and 2 of the combined state be:

A → r B.uv {a,b}
D → t B.Ca
C → .uv {a}

(the configuration C → .uv {a} is generated in the closure of D → t B.Ca). Either S or S' must contain A → r B.uv with a in its context; and since S and S' have the same marked productions in their nuclei, both contain D → t B.Ca, and hence both contain C → .uv {a} in their closures. That state of the original parsing machine would therefore have had a conflict at its uv-successor. This is in contradiction to our assumption that the original parsing machine was conflict-free.

Case 3. Let configs 1 and 2 be:

A → s B.Ea
E → .uv {a}
D → t B.Ca
C → .uv {a}

(here E → .uv {a} and C → .uv {a} are generated in the closures of the two nucleus configurations). Since S and S' share the same nucleus, here again the original parsing machine would have had conflicts in the uv-successors of both S and S'.

Case 4. Let configs 1 and 2 be:

A → r B.uv
D → t B.uvr

Here too the original parsing machine would have had conflicts in the uv-successors of both S and S'. In this case the conflict would have been between a reduction and a transition (a shift-reduce conflict).

EXERCISE. Construct an LR(1) parsing machine for the grammar on the next slide, combining compatible states as you encounter them.

program → main ; statement_list end main ;
statement_list → statement_list statement | statement
statement → assign_statement | while_statement | do_statement
assign_statement → identifier = identifier
while_statement → while ( condition ) statement_list wend
condition → identifier = identifier
do_statement → do identifier = number to number ; statement_list end do ;
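
For anyone who wants to attack the exercise programmatically, one purely illustrative way to encode the grammar as data for a home-grown LR(1) constructor (symbol spellings are taken directly from the productions above):

# Each production is (left-hand side, tuple of right-hand-side symbols).
GRAMMAR = [
    ("program",          ("main", ";", "statement_list", "end", "main", ";")),
    ("statement_list",   ("statement_list", "statement")),
    ("statement_list",   ("statement",)),
    ("statement",        ("assign_statement",)),
    ("statement",        ("while_statement",)),
    ("statement",        ("do_statement",)),
    ("assign_statement", ("identifier", "=", "identifier")),
    ("while_statement",  ("while", "(", "condition", ")", "statement_list", "wend")),
    ("condition",        ("identifier", "=", "identifier")),
    ("do_statement",     ("do", "identifier", "=", "number", "to", "number", ";",
                          "statement_list", "end", "do", ";")),
]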