CS 3813 Introduction to Formal Languages and Automata Chapter 6 Simplification of Context-free Grammars and Normal Forms

Presentation transcript:

CS 3813 Introduction to Formal Languages and Automata Chapter 6 Simplification of Context-free Grammars and Normal Forms These class notes are based on material from our textbook, An Introduction to Formal Languages and Automata, 4th ed., by Peter Linz, published by Jones and Bartlett Publishers, Inc., Sudbury, MA. They are intended for classroom use only and are not a substitute for reading the textbook.

Parsing Given a string w and a grammar G, a parser finds a derivation of the string w from the grammar G, or else determines that the string is not part of the language. Thus, a parser solves the membership problem for a language, which is the problem of deciding, for any string w and grammar G, whether w belongs to the language generated by G. Typically, a parser also constructs a parse tree for the string (which can be used by a compiler for code generation).

Two questions Can we solve the membership problem for context-free languages? That is, can we develop a parsing algorithm for any context-free language? If so, can we develop an efficient parsing algorithm? We saw in the previous chapter that we can, if we place restrictions on the grammar.

Simplified forms and normal forms Simplified forms can eliminate ambiguity and otherwise “improve” a grammar. What we would like is for all productions in a CFG to be in a form such that the length of the sentential form never decreases during a derivation. Once the productions are in this form, whenever we find in the process of deriving a string that the sentential form is longer than the input string, we know that this branch of the derivation cannot produce the input string and can be abandoned.

Simplified forms and normal forms Normal forms of context-free grammars are interesting in that, although they are restricted forms, it can be shown that every CFG can be converted to a normal form. The two types of normal forms that we will look at are Chomsky normal form and Greibach normal form.

The empty string The empty string often complicates things, so we would like to define (and work with) a subset of a language that excludes the empty string. Let L be a context-free language and let G’ = (V, T, S, P) be a context-free grammar for L - {λ}. Then, when λ is in L, we can construct a grammar G that generates L by adding the following to G’: 1. Create a new start variable, S0. 2. Add two new production rules: S0 → S and S0 → λ.

The empty string Most of the proofs for CFG languages are demonstrated by using λ-free languages. It usually can be shown quite easily that the proof can also be extended to “equivalent” languages for which the only difference is the acceptance of the empty string. (yes, this is handwaving, but...)

Simplified forms Theorem 6.1: Let G = (V, T, S, P) be a context-free grammar. Suppose that P contains a production rule of the form A → x1Bx2. Assume that A and B are different variables and that B → y1 | y2 | ... | yn is the set of all productions in P which have B as the left side.

Simplified forms Theorem 6.1: (continued) Let G’ = (V, T, S, P’) be the grammar in which P’ is constructed by deleting A → x1Bx2 from P, and adding to it A → x1y1x2 | x1y2x2 | ... | x1ynx2. Then it may be shown that L(G’) = L(G) (see the Linz textbook, p. 151, for the proof).

Simplified forms Example: A → a | aaA | abBc; B → abbA | b. Here we can’t eliminate all rules with B on the left side, but we can eliminate B from the right side of any A rules. The equivalent productions would be: A → a | aaA | ababbAc | abbc; B → abbA | b.
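The substitution in Theorem 6.1 can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the notes: the grammar is a dict from a variable to a list of right-hand-side strings, every symbol is a single character, and the hypothetical helper `substitute` expands only the first occurrence of B in the chosen production.

```python
def substitute(productions, a, rhs, b):
    """Replace the production a -> rhs by one production per b-rule (Theorem 6.1).

    Only the first occurrence of b in rhs is expanded; a and b must differ.
    """
    if a == b:
        raise ValueError("A and B must be different variables")
    i = rhs.index(b)                      # position of B in x1 B x2
    x1, x2 = rhs[:i], rhs[i + 1:]
    new_rules = [x1 + y + x2 for y in productions[b]]
    result = {v: list(rules) for v, rules in productions.items()}
    pos = result[a].index(rhs)
    result[a][pos:pos + 1] = new_rules    # keep the original ordering of A-rules
    return result


grammar = {"A": ["a", "aaA", "abBc"], "B": ["abbA", "b"]}
print(substitute(grammar, "A", "abBc", "B")["A"])
# prints ['a', 'aaA', 'ababbAc', 'abbc']
```

The B-rules are left in place, matching the example: whether they are still needed is a separate question answered by the reachability check.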

Simplified forms Example: Suppose that our complete simplified grammar is: S → A; A → a | aaA | ababbAc | abbc; B → abbA | b. Since you can’t get to B from S, there is no longer any way that any B rules can play a part in any derivation; they are useless.

Simplified forms Another example: Suppose that our grammar is: S → aSb | λ | A; A → aA. Notice that the production rule A → aA can never be used to produce a sequence of all terminals. It is therefore useless. The production rule S → A is also useless. (Why?) Both of these rules may be deleted without changing the language generated by the grammar.

Reachable Definition: A variable A in a CFG G = (V, T, S, P) is reachable if S ⇒* xAy for some x, y ∈ (V ∪ T)*. Reachable variables are variables that appear in strings derivable from S.

Example S → EA; A → abA | ab; C → EC | Ab; E → bC; G → EbE | CE | ba. Reachable variables: R0 = {S}, R1 = {S, E, A}, R2 = {S, E, A, C}, R3 = {S, E, A, C}.
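The sets R0, R1, R2, ... can be computed with a simple graph search. The sketch below is an illustration, not the textbook's algorithm verbatim; it assumes the grammar is a dict from variable to a list of right-hand-side strings, that every variable appears as a key, and that any symbol that is not a key is a terminal. The name `reachable` is made up for this example.

```python
def reachable(productions, start):
    """Return the variables that occur in some sentential form derivable from start."""
    found = {start}
    frontier = [start]
    while frontier:
        var = frontier.pop()
        for rhs in productions.get(var, []):
            for sym in rhs:
                # only variables (dict keys) are tracked; terminals are skipped
                if sym in productions and sym not in found:
                    found.add(sym)
                    frontier.append(sym)
    return found


grammar = {
    "S": ["EA"],
    "A": ["abA", "ab"],
    "C": ["EC", "Ab"],
    "E": ["bC"],
    "G": ["EbE", "CE", "ba"],
}
print(sorted(reachable(grammar, "S")))
# prints ['A', 'C', 'E', 'S']
```

G never appears on a right side reachable from S, so it is left out, in agreement with R3.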

Useful variables Definition: Let G = (V, T, S, P) be a context-free grammar. Let A ∈ V; then A is live iff there is at least one string w ∈ T* such that A ⇒* w. Informally, live variables are those from which strings of terminals can be derived. Variables which are not live are said to be dead.

Example S → AB | CD | ADF | CF | EA; A → abA | ab; B → bB | aD | BF | aF; C → cB | EC | Ab; D → bB | FFB; E → bC | AB; F → abbF | baF | bD | BB; G → EbE | CE | ba. Live variables: L0 = {A, G}, L1 = {A, G, C}, L2 = {A, G, C, E}, L3 = {A, G, C, E, S}.
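The live set can be found by iterating to a fixed point, as the sequence L0, L1, L2, L3 suggests. The following is a minimal sketch under the same assumed representation (dict of variable to right-hand-side strings, single-character symbols, non-key symbols treated as terminals); `live_variables` is a hypothetical name. A single pass here may add more variables than one step of the hand computation, but the final set is the same.

```python
def live_variables(productions):
    """Return the variables from which some string of terminals can be derived."""
    live = set()
    changed = True
    while changed:
        changed = False
        for var, rules in productions.items():
            if var in live:
                continue
            # var is live if some rule uses only terminals and already-live variables
            if any(all(s in live or s not in productions for s in rhs) for rhs in rules):
                live.add(var)
                changed = True
    return live


grammar = {
    "S": ["AB", "CD", "ADF", "CF", "EA"],
    "A": ["abA", "ab"],
    "B": ["bB", "aD", "BF", "aF"],
    "C": ["cB", "EC", "Ab"],
    "D": ["bB", "FFB"],
    "E": ["bC", "AB"],
    "F": ["abbF", "baF", "bD", "BB"],
    "G": ["EbE", "CE", "ba"],
}
print(sorted(live_variables(grammar)))
# prints ['A', 'C', 'E', 'G', 'S']
```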

Useful variables Definition 6.1 (modified): A variable A in a CFG G = (V, T, S, P) is useful if, for some string w ∈ L(G), there is a derivation of w that takes the form S ⇒* xAy ⇒* w. Informally, a variable is useful if it can be used in a derivation of a string in the language L(G). A variable which is not useful is said to be useless. Variables which are dead are useless. Variables which are not reachable are useless.

Useless variables So a variable is useless if either: 1. it is not live (i.e., cannot derive a terminal string), or 2. it is not reachable from the start symbol. A production is useless if it involves any useless variables.

Exercise Example: Given G = ({S, A, B, C}, {a, b}, S, P), with P = { S → aS | A | C; A → a; B → aa; C → aCb }, eliminate all useless variables and productions. First, we find any dead variables. C can never generate a string of all terminals, because its only production, C → aCb, reintroduces C. C is dead.

Exercise Delete any productions involving C. New grammar: S → aS | A; A → a; B → aa. Next, we check to see if there are any variables which cannot be reached from the start symbol. To do this, we may use a dependency graph.

Exercise Example: S → aS | A | C; A → a; B → aa; C → aCb. [Dependency graph: edges S → A and S → C; B has no incoming edge.] Clearly, B is not reachable from S.

Exercise Delete any productions involving B. New grammar: S → aS | A; A → a. The only productions that were deleted from the original grammar were useless. This new grammar generates all and only the strings generated by the original grammar. It is equivalent to the original grammar.
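The two-pass cleanup used in this exercise (dead variables first, then unreachable ones) can be sketched as one function. This is an illustrative sketch, assuming a dict of variable to right-hand-side strings with single-character symbols; `remove_useless` is a made-up name. The order of the passes matters: deleting dead variables can make other variables unreachable.

```python
def remove_useless(productions, start):
    """Drop dead variables first, then unreachable ones, with their productions."""
    # Pass 1: find live variables by iterating to a fixed point.
    live = set()
    changed = True
    while changed:
        changed = False
        for var, rules in productions.items():
            if var not in live and any(
                all(s in live or s not in productions for s in rhs) for rhs in rules
            ):
                live.add(var)
                changed = True
    # Keep only live variables and rules that mention no dead variable.
    pruned = {
        v: [r for r in rules if all(s in live or s not in productions for s in r)]
        for v, rules in productions.items()
        if v in live
    }
    # Pass 2: keep only variables reachable from the start symbol.
    seen, stack = {start}, [start]
    while stack:
        for rhs in pruned.get(stack.pop(), []):
            for s in rhs:
                if s in pruned and s not in seen:
                    seen.add(s)
                    stack.append(s)
    return {v: rules for v, rules in pruned.items() if v in seen}


grammar = {"S": ["aS", "A", "C"], "A": ["a"], "B": ["aa"], "C": ["aCb"]}
print(remove_useless(grammar, "S"))
# prints {'S': ['aS', 'A'], 'A': ['a']}
```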

Useless variables Theorem 6.2: Let G = (V, T, S, P) be a context-free grammar. Then there exists an equivalent grammar G’ = (V’, T’, S, P’) that does not contain any useless variables or productions. See pp. 155 and 156 in the Linz text for the formal proof. Note that useless variables may be removed from V to give V’, and any terminals not occurring in any useful production may be removed from T to give T’.

Simplified forms and normal forms Two undesirable types of productions in a CFG can keep the length of sentential forms from increasing: λ-productions - these productions are of the form A → λ, and they actually decrease the length of the string. Unit productions - these productions are of the form A → B, and they allow rules to be applied to a string without increasing the length of the string and without getting us any closer to the goal of ending up with a string of all terminals.

λ-productions Definition 6.2: Any production of a context-free grammar of the form A → λ is called a λ-production. Any variable A for which the derivation A ⇒* λ is possible is called nullable.

Nullable variables A nullable variable in a context-free grammar G = (V, T, S, P) is defined as follows: 1. Any variable A for which P contains the production A → λ is nullable. 2. If P contains the production A → B1B2…Bn and B1, B2, …, Bn are all nullable variables, then A is nullable. 3. No other variables in V are nullable. The nullable variables in V are precisely those variables A for which A ⇒* λ.

The effect of λ-productions Suppose we are trying to see if our CFG generates the string aabaa, which contains 5 terminal characters. In the process of applying productions, we have generated an intermediate string, aaYbYaa, containing 7 characters. Since λ-productions decrease the length of the string, it might still be possible to generate aabaa from aaYbYaa (if there were a derivation path Y ⇒* λ).

λ-productions Note that without λ-productions, a grammar would have no way to reduce the number of characters in its intermediate strings. In such a grammar, we could stop processing intermediate strings as soon as they exceeded the length of the target string.

λ-productions So, given a CFG G without λ-productions, we could determine if a given string x of length |x| belonged to L(G) simply by applying production rules and generating all sentential forms of length up to |x|. If x had not been generated by that point, it could not belong to the language.
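The length-based cutoff described here can be turned into a brute-force membership test. The sketch below is an assumption-laden illustration rather than a practical parser: it explores leftmost derivations breadth-first, discards any sentential form longer than the target, and remembers forms it has already seen so the search terminates. The grammar is a dict of variable to right-hand-side strings with single-character symbols; `generates` is a hypothetical name, and the example grammar is the λ-free grammar S → aS1b | ab, S1 → aS1b | ab with S1 written as T.

```python
from collections import deque


def generates(productions, start, target):
    """Decide membership by exploring leftmost derivations, pruning long forms.

    Assumes the grammar has no lambda-productions, so sentential forms never shrink.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        # locate the leftmost variable; forms without one are finished words
        idx = next((i for i, s in enumerate(form) if s in productions), None)
        if idx is None:
            continue
        for rhs in productions[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            if len(new) <= len(target) and new not in seen:
                seen.add(new)
                queue.append(new)
    return False


grammar = {"S": ["aTb", "ab"], "T": ["aTb", "ab"]}
print(generates(grammar, "S", "aabb"))  # prints True
print(generates(grammar, "S", "aab"))   # prints False
```

The number of forms explored can grow exponentially with the length of the target, which is why this only settles the membership question in principle.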

λ-productions Given the grammar S → aS1b; S1 → aS1b | λ. What is the effect of the production S1 → λ? The effect is to delete S1 from any sentential form in which it occurs.

λ-productions If we apply the production S1 → λ to S → aS1b, the resulting production rule is S → ab. If we apply the production S1 → λ to S1 → aS1b, the resulting production rule is S1 → ab.

λ-productions Therefore, we can eliminate any λ-productions from this grammar by adding the new productions obtained by substituting λ for S1 wherever S1 appears on the right-hand side of the production rules, and then deleting the λ-production. When we do this, we obtain the equivalent grammar: S → aS1b | ab; S1 → aS1b | ab.

λ-productions Theorem 6.3: Let G be any context-free grammar with λ not in L(G). Then there exists an equivalent grammar G’ having no λ-productions.

Algorithm FindNull Establish the set N0, which is the set of all variables A in the grammar that go directly to λ. Now loop: on each pass through the loop, add to this set every variable B that has a production whose right side consists entirely of variables already in the set. Stop when no new variables were added to the set during the last iteration of the loop.

Example Let G be the CFG with the productions: S → ABCBCDA; A → CD; B → Cb; C → a | λ; D → bD | λ. Here, C and D are nullable because there are production rules C → λ and D → λ. But A is also nullable, because A → CD, and both C and D are nullable.
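FindNull can be written as a short fixed-point loop. The sketch below assumes a representation that is not part of the notes: a dict from variable to a list of right-hand-side strings, single-character symbols, and the empty string "" standing for λ. The name `find_nullable` is made up for this example.

```python
def find_nullable(productions):
    """Return the variables A with A =>* lambda; "" stands for lambda."""
    # N0: variables with a direct lambda-production
    nullable = {v for v, rules in productions.items() if "" in rules}
    changed = True
    while changed:
        changed = False
        for var, rules in productions.items():
            if var in nullable:
                continue
            # a rule qualifies only if every symbol on its right side is nullable
            if any(rhs and all(s in nullable for s in rhs) for rhs in rules):
                nullable.add(var)
                changed = True
    return nullable


grammar = {
    "S": ["ABCBCDA"],
    "A": ["CD"],
    "B": ["Cb"],
    "C": ["a", ""],
    "D": ["bD", ""],
}
print(sorted(find_nullable(grammar)))
# prints ['A', 'C', 'D']
```

B is not nullable because its only rule contains the terminal b, and so S is not nullable either.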

Algorithm: Eliminate λ-productions Given a CFG G = (V, T, S, P), construct a CFG G’ = (V, T, S, P’) with no λ-productions as follows: 1. Initialize P’ = P. 2. Find all nullable variables in V, using FindNull. 3. For every production A → x in P (x ∈ (V ∪ T)*), where x contains nullable variables, add to P’ every production that can be obtained from this one by deleting from x one or more of the occurrences in x of nullable variables. 4. Delete all λ-productions from P’. 5. In addition, delete any duplicates and delete productions of the form A → A.

Example Given a context-free grammar with the following production rules, find the nullable variables: S → ABC; A → B | a; B → C | b | λ; C → AB | D; D → Cd. N0 = {B}, N1 = {B, A}, N2 = {B, A, C}, N3 = {B, A, C, S}.

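Example (continued) Original grammar: S → ABC; A → B | a; B → C | b | λ; C → AB | D; D → Cd. Nullable variables: N = {A, B, C, S}. Productions obtained by deleting nullable variables: S → ABC becomes S → ABC | BC | AC | AB | A | B | C; C → AB | D becomes C → AB | A | B | D; D → Cd becomes D → Cd | d.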

Example (continued) S → ABC | AB | AC | BC | A | B | C; A → B | a; B → C | b; C → AB | A | B | D; D → Cd | d. Note that we have gotten rid of all λ-productions. However, other beneficial changes can still be made.
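The five steps of the elimination algorithm can be sketched as follows. This is an illustration under assumptions not stated in the notes: the grammar is a dict of variable to right-hand-side strings, symbols are single characters, and "" stands for λ. The helper name `eliminate_lambda` is hypothetical. Each nullable symbol on a right side is either kept or dropped, which `itertools.product` enumerates.

```python
from itertools import product


def eliminate_lambda(productions):
    """Return an equivalent rule set (up to lambda) with no lambda-productions."""
    # Step 2: nullable variables by fixed point (an empty right side counts as nullable)
    nullable = set()
    changed = True
    while changed:
        changed = False
        for var, rules in productions.items():
            if var not in nullable and any(
                all(s in nullable for s in rhs) for rhs in rules
            ):
                nullable.add(var)
                changed = True
    result = {}
    for var, rules in productions.items():
        new_rules = []
        for rhs in rules:
            # Step 3: each nullable symbol may be kept or dropped; others are kept
            options = [(s, "") if s in nullable else (s,) for s in rhs]
            for choice in product(*options):
                candidate = "".join(choice)
                # Steps 4 and 5: skip lambda, A -> A, and duplicates
                if candidate and candidate != var and candidate not in new_rules:
                    new_rules.append(candidate)
        result[var] = new_rules
    return result


grammar = {
    "S": ["ABC"],
    "A": ["B", "a"],
    "B": ["C", "b", ""],
    "C": ["AB", "D"],
    "D": ["Cd"],
}
for var, rules in eliminate_lambda(grammar).items():
    print(var, "->", " | ".join(rules))
```

On this grammar the result has the same productions as the worked example, though the alternatives for S may be listed in a different order.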

Implications of Theorem 6.3: Let G = (V, T, S, P) be any context-free grammar, and let G’ be the grammar obtained from G by the previous algorithm. Then: 1. G’ has no λ-productions, and 2. L(G’) = L(G) - {λ}. 3. Moreover, if G is unambiguous, then so is G’.

Unit productions Definition 6.3: Any production of a context-free grammar of the form A → B, where A, B ∈ V, is called a unit-production.

Unit productions Theorem 6.4: Let G = (V, T, S, P) be any context-free grammar without λ-productions. Then there exists a context-free grammar G’ = (V’, T’, S, P’) that does not have any unit-productions and that is equivalent to G.

Definition of A-derivable variables The set of “A-derivable variables” is the set of all variables B for which A ⇒* B. 1. If A → B is a production, then B is A-derivable. 2. If C is A-derivable, C → B is a production, and B ≠ A, then B is A-derivable. 3. No other variables are A-derivable.

Algorithm: Eliminating Unit Productions Given a context-free grammar G = (V, T, S, P) with no λ-productions, construct a grammar G’ = (V, T, S, P’) having no unit productions as follows: 1. Initialize P’ to be P. 2. For each A ∈ V, find the set of A-derivable variables. 3. For every pair (A, B) such that B is A-derivable, and every non-unit production B → x (where x ∈ (V ∪ T)+), add the production A → x to P’. 4. Delete all unit productions from P’.

Example Original grammar: S → S+T | T; T → T*F | F; F → (S) | a. From rule 1: {S-derivable} = {T}, {T-derivable} = {F}. After applying rule 2: {S-derivable} = {T, F}. Resulting grammar: S → S+T | T*F | (S) | a; T → T*F | (S) | a; F → (S) | a.
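The unit-production algorithm can be sketched directly from its four steps. The code below is an illustration only, assuming a dict of variable to right-hand-side strings in which every symbol is one character and a right side is a unit production exactly when it is a single variable; `eliminate_unit` is a hypothetical name.

```python
def eliminate_unit(productions):
    """Return an equivalent rule set with no unit productions.

    Assumes the grammar has no lambda-productions.
    """
    def is_unit(rhs):
        return len(rhs) == 1 and rhs in productions

    result = {}
    for a in productions:
        # Step 2: collect the A-derivable variables by following unit productions
        derivable, stack = [], [a]
        while stack:
            for rhs in productions[stack.pop()]:
                if is_unit(rhs) and rhs != a and rhs not in derivable:
                    derivable.append(rhs)
                    stack.append(rhs)
        # Steps 1 and 4: start from A's own rules minus the unit productions
        rules = [r for r in productions[a] if not is_unit(r)]
        # Step 3: copy every non-unit rule of each A-derivable variable
        for b in derivable:
            for r in productions[b]:
                if not is_unit(r) and r not in rules:
                    rules.append(r)
        result[a] = rules
    return result


grammar = {"S": ["S+T", "T"], "T": ["T*F", "F"], "F": ["(S)", "a"]}
for var, rules in eliminate_unit(grammar).items():
    print(var, "->", " | ".join(rules))
# prints
# S -> S+T | T*F | (S) | a
# T -> T*F | (S) | a
# F -> (S) | a
```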

Summary Theorem 6.5: Let L be a context-free language that does not contain λ. Then there exists a context-free grammar that generates L and that does not have any useless productions, λ-productions, or unit-productions. Proof: Find a CFG that generates L. Apply the procedures in theorems 6.2, 6.3, and 6.4. The result is an equivalent CFG that generates L but does not have any useless productions, λ-productions, or unit-productions.

Summary Note that the procedures specified above must be applied in a particular order. The procedure for removing λ-productions can create new unit-productions, and the procedure for eliminating unit-productions must start with a CFG that has no λ-productions. The required sequence is: 1. Remove λ-productions. 2. Remove unit productions. 3. Remove useless productions.

Unit productions Given a context-free grammar G’ without λ-productions or unit productions, any production rule must either: convert a non-terminal to a terminal, or replace a non-terminal with at least two other symbols.

Simplified forms What does this mean for us? Given such a grammar G, it means that if a string x is in L(G) and |x| = k, then starting from S there are no more than 2k - 1 steps in any derivation of x.

Chomsky Normal Form There are other ways to limit the form a grammar can have. A context-free grammar in Chomsky Normal Form (CNF) has all of its rules restricted so that there are no more than two symbols, either one terminal or two variables, on the right-hand side of a production rule. This seems very restrictive, but actually every context-free grammar can be converted into Chomsky Normal Form.

Chomsky Normal Form Definition 6.4: A context-free grammar is in Chomsky Normal Form (CNF) if every production is one of these two types: A → BC or A → a, where A, B, and C are variables and a is a terminal symbol.

Chomsky normal form For languages that include the empty string λ, the rule S → λ may also be allowed, where S is the start symbol, as long as S does not occur on the right-hand side of any rule.

Chomsky Normal Form Theorem 6.6: Any context-free grammar G = (V, T, S, P) with λ ∉ L(G) has an equivalent grammar G’ = (V’, T’, S, P’) in Chomsky Normal Form. (Actually, for languages that include the empty string λ, the rule S → λ may also be allowed, where S is the start symbol, as long as S does not occur on the right-hand side of any rule.)

Chomsky Normal Form: Proof by construction Given a CFG G = (V, T, S, P), to convert it to Chomsky Normal Form: 1. Eliminate λ-productions and unit-productions from G, producing a CFG G’ = (V, T, S, P’), such that L(G’) = L(G) - {λ}. 2. Convert G’ into G’’ = (V’’, T, S, P’’) so that every production is either of the form A → B1B2…Bk (where k ≥ 2 and each Bi is a variable in V’’), or of the form A → a.

Chomsky Normal Form Basically, what you are doing in step 2 is restricting the right sides of productions to be either single terminals or strings of two or more variables. What we don’t want is strings of length ≥ 2 that have one or more terminals in them. If we have strings like this, for every terminal a appearing in such a string: 1. Add a new variable, X_a, and add a new production, X_a → a. 2. Replace a by X_a in all the productions where it appears (except those in the form A → a).

Chomsky Normal Form (continued) 3. Convert G’’ into G’’’ = (V’’’, T, S, P’’’). To do this, replace each production having more than two variables on the right by an equivalent set of productions, each one having exactly two variables on the right. (Create new variables as necessary to accomplish this.) For example, the production A → BCD would be replaced with A → BZ1 and Z1 → CD. Done!

Example Original grammar: S → AB | ab; A → ABAB | BA; B → ab | b. After step 2: S → AB | X_aX_b; X_a → a; X_b → b; A → ABAB | BA; B → X_aX_b | b.

Example After step 2: S → AB | X_aX_b; X_a → a; X_b → b; A → ABAB | BA; B → X_aX_b | b. After step 3: S → AB | X_aX_b; X_a → a; X_b → b; A → AY1 | BA; Y1 → BY2; Y2 → AB; B → X_aX_b | b.

Example If you recognize that A → ABAB has two copies of the same pair of variables, you could substitute the following instead (but the first procedure works equally well). After step 3: S → AB | X_aX_b; X_a → a; X_b → b; A → Y1Y1 | BA; Y1 → AB; B → X_aX_b | b.
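Steps 2 and 3 of the construction can be sketched in Python. Because the new variables have multi-character names, this sketch assumes each right side is a tuple of symbol strings rather than a plain string. The function name `to_cnf` and the generated names `X_a`, `Y1`, `Y2`, ... are illustrative choices, and the code assumes they do not clash with existing variables and that λ- and unit-productions have already been removed. It follows the first (mechanical) procedure, not the shortcut that shares Y1.

```python
def to_cnf(productions):
    """Apply steps 2 and 3 of the CNF construction.

    productions maps a variable to a list of right sides, each a tuple of symbols.
    """
    variables = set(productions)
    step2 = {}
    terminal_vars = {}  # terminal -> its stand-in variable, e.g. "a" -> "X_a"

    # Step 2: in right sides of length >= 2, replace each terminal a by X_a
    for var, rules in productions.items():
        step2[var] = []
        for rhs in rules:
            if len(rhs) >= 2:
                rhs = tuple(
                    s if s in variables else terminal_vars.setdefault(s, "X_" + s)
                    for s in rhs
                )
            step2[var].append(rhs)
    for term, name in terminal_vars.items():
        step2[name] = [(term,)]

    # Step 3: split right sides with more than two variables using fresh Y variables
    counter = 0
    final = {}
    for var, rules in step2.items():
        final.setdefault(var, [])
        for rhs in rules:
            head = var
            while len(rhs) > 2:
                counter += 1
                fresh = "Y" + str(counter)
                final.setdefault(head, []).append((rhs[0], fresh))
                head, rhs = fresh, rhs[1:]
            final.setdefault(head, []).append(rhs)
    return final


grammar = {
    "S": [("A", "B"), ("a", "b")],
    "A": [("A", "B", "A", "B"), ("B", "A")],
    "B": [("a", "b"), ("b",)],
}
for var, rules in to_cnf(grammar).items():
    print(var, "->", " | ".join(" ".join(r) for r in rules))
```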

Proof (concluded) This constitutes a proof by construction that any CFG can be converted to CNF. Later, this will be used to prove that there are languages which are not context-free.

Greibach Normal Form Greibach Normal Form is similar to Chomsky Normal Form, except that every production is of the form A → ax, where a is a terminal symbol and x is a string of zero or more variables. Note that GNF puts a limit on where terminals and variables can appear (restrictions on their relative positions) rather than on the number of symbols on the right-hand side of the production rules.

Greibach Normal Form Definition 6.5: A context-free grammar is said to be in Greibach Normal Form if all productions have the form A → ax, where a ∈ T and x ∈ V*.

Greibach Normal Form Example: Convert the following grammar into GNF: S → abSb | aa. Introduce new variables A and B to stand for a and b respectively, and substitute: S → aBSB | aA; A → a; B → b.
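Definition 6.5 is easy to check mechanically, which gives a quick way to confirm the result of a conversion like the one above. The sketch below only tests the form of a grammar; it does not perform the conversion. It assumes a dict of variable to right-hand-side strings with single-character symbols, and `is_gnf` is a made-up name.

```python
def is_gnf(productions):
    """True if every right side is one terminal followed by zero or more variables."""
    variables = set(productions)
    return all(
        len(rhs) >= 1
        and rhs[0] not in variables                  # leading symbol is a terminal
        and all(s in variables for s in rhs[1:])     # the rest are variables
        for rules in productions.values()
        for rhs in rules
    )


before = {"S": ["abSb", "aa"]}
after = {"S": ["aBSB", "aA"], "A": ["a"], "B": ["b"]}
print(is_gnf(before))  # prints False
print(is_gnf(after))   # prints True
```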

Greibach Normal Form Theorem 6.7: Any context-free grammar G = (V, T, S, P) with λ ∉ L(G) has an equivalent grammar G’ = (V’, T’, S, P’) in Greibach Normal Form. It is hard to prove this, and it is hard to construct an easy-to-implement algorithm for performing the conversion.

A membership algorithm for CFG’s The famous linguist Noam Chomsky showed that every context-free grammar can be converted to an equivalent grammar in Chomsky normal form. Why should you care about this? The fact that any CFG can be converted to Chomsky normal form lets us develop a parsing algorithm that shows that the membership problem can be solved for context-free languages (CFLs).

Some motivation Here is the idea of the algorithm: For a grammar in Chomsky normal form, any derivation of a string w has 2n-1 steps, where n is the length of w. (Why?) So, it is only necessary to check derivations of 2n-1 steps to decide whether G generates w. Of course, this parsing algorithm is inefficient! It would never be used in practice. But it solves the membership problem for CFLs.

The CYK algorithm The membership algorithm for CFG’s that is usually cited is the CYK algorithm, named for its three developers (Cocke, Younger, and Kasami). It works by breaking down the problem into a sequence of smaller problems and solving them. Details may be found in the Linz textbook. This algorithm can be shown to run in time proportional to |w|^3.
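A compact version of CYK shows how the smaller problems fit together: for each substring of w, record which variables can derive it, building up from length 1. This is a minimal sketch rather than the textbook's presentation. It assumes a CNF grammar given as a dict of variable to right-hand-side strings with single-character symbols; the function name `cyk` and the example grammar (which generates strings of the form a^n b^n with n ≥ 1) are illustrative.

```python
def cyk(productions, start, w):
    """CYK membership test for a grammar in Chomsky normal form."""
    n = len(w)
    if n == 0:
        return False  # the empty string needs separate handling
    # table[(i, j)] holds the variables that derive the substring w[i:j]
    table = {}
    for i, ch in enumerate(w):
        table[(i, i + 1)] = {v for v, rules in productions.items() if ch in rules}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = set()
            for k in range(i + 1, j):  # split point: w[i:k] and w[k:j]
                for v, rules in productions.items():
                    for rhs in rules:
                        if (
                            len(rhs) == 2
                            and rhs[0] in table[(i, k)]
                            and rhs[1] in table[(k, j)]
                        ):
                            cell.add(v)
            table[(i, j)] = cell
    return start in table[(0, n)]


grammar = {"S": ["AT", "AB"], "T": ["SB"], "A": ["a"], "B": ["b"]}
print(cyk(grammar, "S", "aabb"))  # prints True
print(cyk(grammar, "S", "abab"))  # prints False
```

The three nested loops over length, start position, and split point are where the cubic running time comes from.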

LL grammars A top-down parser finds a leftmost derivation of a string. “Top-down” means to start with the start symbol and show how to derive the string from it. An LL(k) grammar allows a parser to perform a left-to-right scan of the input to find a leftmost derivation, using k symbols of lookahead to select the next rule. Many compilers have been written using LL parsers. But LL grammars are not sufficiently general to generate all deterministic CFLs. This led to study of more general deterministic grammars, especially LR grammars.

LR grammars A bottom-up parser finds a rightmost derivation of a string, constructed in reverse. “Bottom-up” means to start with a string and “reduce” it to the start symbol. An LR(k) grammar allows a parser to perform a left-to-right scan of the input to produce a rightmost derivation (in reverse), using k symbols of lookahead to select the next rule. The class of languages generated by LR(1) grammars is exactly the deterministic CFLs. Two subclasses of LR(1) grammars, called SLR(1) (for “simple” LR) and LALR(1) (for “lookahead” LR), are commonly used for programming languages.

Parsing algorithms Parsing is an extremely important topic in the design and compilation of programming languages. You will study parsing algorithms based on various LL and LR grammars in a course on compiler design. Most of what we have studied in these chapters about regular and context-free languages provides the mathematical foundation for designing good compilers. (It has many other applications as well.)

Efficient parsing Programming languages are context-free languages, and parsing is central to any programming language compiler. Many parsing algorithms for context-free grammars have been developed over the years. Most simulate pushdown automata. However, some PDAs cannot be simulated efficiently by computer programs because they are nondeterministic. Efficient parsers simulate deterministic PDAs.

Regular grammars ⊂ CFG’s A word is a string of all terminals. A semiword is a string of 0 or more terminals concatenated with exactly one nonterminal on the right. So, for example, abcA is a semiword. A CFG is called a regular grammar if each of its productions is one of the two forms: Nonterminal → semiword, or Nonterminal → word.
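The word/semiword test can be written as a small predicate. This is an illustrative sketch, assuming a dict of variable to right-hand-side strings with single-character symbols, where any symbol that is not a key is a terminal; `is_regular` is a hypothetical name.

```python
def is_regular(productions):
    """True if every right side is a word or a semiword (terminals, then one variable)."""
    variables = set(productions)

    def ok(rhs):
        # strip a single trailing variable, if present; the rest must be terminals
        body = rhs[:-1] if rhs and rhs[-1] in variables else rhs
        return all(s not in variables for s in body)

    return all(ok(rhs) for rules in productions.values() for rhs in rules)


print(is_regular({"S": ["abS", "b"]}))   # prints True
print(is_regular({"S": ["aSb", "ab"]}))  # prints False
```

The second grammar is context-free but not regular in this sense, because S appears in the middle of a right side.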

Regular grammars All regular languages can be generated by regular grammars. All regular grammars generate regular languages. Context-free grammars are more powerful than regular grammars. Regular languages are a proper subset of context-free languages, so CFG’s can generate all regular languages (as well as non-regular context-free languages).