CSCI 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong Ambiguity.


CSCI 3130: Automata theory and formal languages
Ambiguity
Parsing algorithm for CFGs
Andrej Bogdanov
The Chinese University of Hong Kong
Fall 2010

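Ambiguity
A grammar is ambiguous if some string has more than one parse tree.
E → E + E | E * E | ( E ) | N
N → 1N | 2N | 1 | 2
Example: 1+2*2 has two parse trees.
– With E → E + E at the root, the string is read as 1 + (2 * 2) = 5
– With E → E * E at the root, the string is read as (1 + 2) * 2 = 6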
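The ambiguity can be checked mechanically by counting parse trees. The sketch below is only an illustration for this one grammar; the function name count_parse_trees is made up here, and the rules are hard-coded rather than read from a grammar description.

```python
from functools import lru_cache


def count_parse_trees(s: str) -> int:
    """Count parse trees of s under E -> E+E | E*E | (E) | N, N -> 1N | 2N | 1 | 2."""

    @lru_cache(maxsize=None)
    def count_e(i: int, j: int) -> int:
        # Number of distinct parse trees with which E derives s[i:j].
        if i >= j:
            return 0
        sub = s[i:j]
        total = 0
        # E -> N: a nonempty string of 1s and 2s has exactly one N-tree.
        if all(ch in "12" for ch in sub):
            total += 1
        # E -> ( E )
        if sub[0] == "(" and sub[-1] == ")":
            total += count_e(i + 1, j - 1)
        # E -> E + E and E -> E * E: try every operator as the root.
        for k in range(i + 1, j - 1):
            if s[k] in "+*":
                total += count_e(i, k) * count_e(k + 1, j)
        return total

    return count_e(0, len(s))


print(count_parse_trees("1+2*2"))    # 2
print(count_parse_trees("(1+2)*2"))  # 1
```

A count above 1 for any string shows that the grammar is ambiguous; a count of 1 for one string says nothing about the others.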

Disambiguation
Sometimes we can rewrite the grammar to remove the ambiguity.
E → E + E | E * E | ( E ) | N
N → 1N | 2N | 1 | 2
In this grammar + and * have the same precedence!
Divide the expression into terms and factors.
[Figure: 2 * (1 + 2 * 2) with its terms (T) and factors (F) marked]

Disambiguation
Ambiguous grammar:
E → E + E | E * E | ( E ) | N
N → 1N | 2N | 1 | 2
Unambiguous grammar:
– An expression is a sum of one or more terms: E → T | E + T
– Each term is a product of one or more factors: T → F | T * F
– Each factor is a parenthesized expression or a number: F → ( E ) | 1 | 2
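One way to see the effect of the term/factor split is a small evaluator that follows the three rules. This is a sketch, not a general parser: the name evaluate is chosen here, and the left-recursive rules E → E + T and T → T * F are implemented as loops, which accepts the same strings and groups operands left to right.

```python
def evaluate(expr: str) -> int:
    """Parse and evaluate expr under E -> T | E+T, T -> F | T*F, F -> (E) | 1 | 2."""
    expr = expr.replace(" ", "")
    pos = 0

    def peek() -> str:
        return expr[pos] if pos < len(expr) else ""

    def parse_e() -> int:
        # E: a sum of one or more terms.
        nonlocal pos
        value = parse_t()
        while peek() == "+":
            pos += 1
            value += parse_t()
        return value

    def parse_t() -> int:
        # T: a product of one or more factors.
        nonlocal pos
        value = parse_f()
        while peek() == "*":
            pos += 1
            value *= parse_f()
        return value

    def parse_f() -> int:
        # F: a parenthesized expression or the number 1 or 2.
        nonlocal pos
        ch = peek()
        if ch == "(":
            pos += 1
            value = parse_e()
            if peek() != ")":
                raise SyntaxError(f"expected ')' at position {pos}")
            pos += 1
            return value
        if ch in ("1", "2"):
            pos += 1
            return int(ch)
        raise SyntaxError(f"unexpected input at position {pos}")

    result = parse_e()
    if pos != len(expr):
        raise SyntaxError(f"unexpected input at position {pos}")
    return result


print(evaluate("1+2*2"))      # 5
print(evaluate("2*(1+2*2)"))  # 10
```

Because * is only handled inside a term, 1+2*2 can only be read as 1 + (2 * 2).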

Parsing example
Input: 2 * ( * 2) + 1
E → T | E + T
T → F | T * F
F → ( E ) | 1 | 2
[Figure: parse tree of the input under this grammar]

Disambiguation
Disambiguation is not always possible
– There exist inherently ambiguous languages
– There is no general procedure for disambiguation
In programming languages, ambiguity comes from precedence rules, and we can resolve it as in the example.
In English, ambiguity is sometimes a problem: "He ate the cookies on the floor"

Parsing
Do we have a method for building a parse tree?
Can we tell if the parse tree is unique?
S → 0S1 | 1S0S1 | T
T → S | ε
input: 00111

First attempt
Maybe we can try all possible derivations of the input x:
S → 0S1 | 1S0S1 | T
T → S | ε
Derivations starting from S:
– S ⇒ 0S1, 1S0S1, T
– 0S1 ⇒ 00S11, 01S0S11, 0T1
– T ⇒ S, ε
– 1S0S1 ⇒ 10S10S1, ...
When do we stop?

Problems
How do we know when to stop?
S → 0S1 | 1S0S1 | T
T → S | ε
Derivations starting from S:
– S ⇒ 0S1, 1S0S1
– 0S1 ⇒ 00S11, 01S0S11, 0T1
– 1S0S1 ⇒ 10S10S1, ...
When do we stop?

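Problems
Idea: Stop the derivation when its length exceeds |x|
Not right because of ε-productions: an intermediate string can be longer than x and still derive x, since variables can later vanish
S → 0S1 | 1S0S1 | T
T → S | ε
Example: S ⇒ 0S1 ⇒ 01S0S11 ⇒ 01S011
We want to eliminate ε-productions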

Problems
Loops among the variables (S → T → S) might make us go on forever
We want to eliminate such loops
S → 0S1 | 1S0S1 | T
T → S | ε
x = 00111

Removal of ε-productions
A variable N is nullable if there is a derivation N ⇒* ε
How to remove ε-productions:
1. Find all nullable variables N
2. For every production of the form A → αNβ, add another production A → αβ
3. If N → ε is a production, remove it
4. If S is nullable, add the special production S → ε

Example
Find all nullable variables
Grammar:
S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b
Nullable variables: B, C, D

Finding nullable variables
To find nullable variables, we work backwards
– First, mark all variables A s.t. A → ε as nullable
– Then, as long as there are productions of the form A → A1…Ak where all of A1, …, Ak are marked as nullable, mark A as nullable
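The marking procedure can be sketched as a loop that runs until nothing new gets marked. The grammar encoding is an assumption made for this sketch: a dict from each variable to a list of right-hand sides, each a tuple of symbols, with ε written as the empty tuple.

```python
def nullable_variables(grammar):
    """Return the set of variables that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # An empty body counts as "all symbols nullable", so A -> ε
            # productions are picked up in the first pass.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


# The example grammar: S -> ACD, A -> a, B -> ε, C -> ED | ε, D -> BC | b, E -> b
GRAMMAR = {
    "S": [("A", "C", "D")],
    "A": [("a",)],
    "B": [()],
    "C": [("E", "D"), ()],
    "D": [("B", "C"), ("b",)],
    "E": [("b",)],
}

print(sorted(nullable_variables(GRAMMAR)))  # ['B', 'C', 'D']
```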

Eliminating ε-productions
Grammar:
S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b
Nullable variables: B, C, D
For every production of the form A → αNβ, add another production A → αβ
If N → ε is a production, remove it
Productions added: D → C, S → AD, D → B, D → ε, S → AC, S → A, C → E
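A possible sketch of the elimination step, using the same assumed encoding (dict from variable to a list of tuple bodies, ε as the empty tuple). Instead of adding productions one at a time, it generates every way of dropping nullable symbols from a body in one go, which yields the same set of productions. The function names are chosen for this sketch.

```python
from itertools import product


def nullable_variables(grammar):
    """Variables that can derive the empty string (fixed-point marking)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(
                all(sym in nullable for sym in body) for body in bodies
            ):
                nullable.add(head)
                changed = True
    return nullable


def eliminate_epsilon(grammar, start):
    """Return an equivalent grammar whose only ε-production is start -> ε (if needed)."""
    nullable = nullable_variables(grammar)
    result = {}
    for head, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            # Each nullable symbol may be kept or dropped independently.
            options = [
                ((sym,), ()) if sym in nullable else ((sym,),) for sym in body
            ]
            for choice in product(*options):
                new_body = tuple(sym for part in choice for sym in part)
                if new_body:  # never keep an ε-production here
                    new_bodies.add(new_body)
        result[head] = new_bodies
    if start in nullable:
        result[start].add(())  # the special production S -> ε
    return result


GRAMMAR = {
    "S": [("A", "C", "D")],
    "A": [("a",)],
    "B": [()],
    "C": [("E", "D"), ()],
    "D": [("B", "C"), ("b",)],
    "E": [("b",)],
}

for head, bodies in eliminate_epsilon(GRAMMAR, "S").items():
    print(head, "->", " | ".join("".join(b) for b in sorted(bodies)))
```

On the example, B ends up with no productions at all, so the productions that mention B can never finish a derivation.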

Dealing with loops
A unit production is a production of the form A1 → A2 where A1 and A2 are both variables
Example
Grammar:
S → 0S1 | 1S0S1 | T
T → S | R | ε
R → 0SR
Unit productions: S → T, T → S, T → R

Removal of unit productions
If there is a cycle of unit productions A1 → A2 → ... → Ak → A1, delete it and replace everything with A1
Example
Before:
S → 0S1 | 1S0S1 | T
T → S | R | ε
R → 0SR
After (T is replaced by S in the {S, T} cycle):
S → 0S1 | 1S0S1
S → R | ε
R → 0SR

Removal of unit productions
For other unit productions, replace every chain A1 → A2 → ... → Ak → α by productions A1 → α, ..., Ak → α
Example: S → R → 0SR is replaced by S → 0SR, R → 0SR
Before:
S → 0S1 | 1S0S1 | R | ε
R → 0SR
After:
S → 0S1 | 1S0S1 | 0SR | ε
R → 0SR
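The sketch below removes unit productions with a different but equivalent technique from the two steps above: for each variable it collects the variables reachable through unit productions only, then copies their non-unit right-hand sides. Cycles need no special treatment this way, but variables are not merged, so T survives as an unreachable copy of S. The encoding (dict from variable to a list of tuple bodies) and the function name are assumptions of this sketch.

```python
def eliminate_unit_productions(grammar):
    """Return an equivalent grammar with no productions of the form A -> B."""
    variables = set(grammar)

    def is_unit(body):
        return len(body) == 1 and body[0] in variables

    result = {}
    for head in grammar:
        # Variables reachable from head using unit productions only.
        reachable, stack = {head}, [head]
        while stack:
            var = stack.pop()
            for body in grammar[var]:
                if is_unit(body) and body[0] not in reachable:
                    reachable.add(body[0])
                    stack.append(body[0])
        # head inherits every non-unit body of every reachable variable.
        result[head] = {
            tuple(body)
            for var in reachable
            for body in grammar[var]
            if not is_unit(body)
        }
    return result


# S -> 0S1 | 1S0S1 | T, T -> S | R | ε, R -> 0SR
GRAMMAR = {
    "S": [tuple("0S1"), tuple("1S0S1"), ("T",)],
    "T": [("S",), ("R",), ()],
    "R": [tuple("0SR")],
}

for head, bodies in eliminate_unit_productions(GRAMMAR).items():
    print(head, "->", " | ".join("".join(b) or "ε" for b in sorted(bodies)))
```

For S this gives 0S1, 1S0S1, 0SR and ε, the same right-hand sides as the result of the two-step procedure.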

Recap
After eliminating ε-productions and unit productions, we know that every derivation S ⇒* a1…ak, where a1, …, ak are terminals, doesn't shrink in length and doesn't go into cycles
Exception: S → ε
– We will not use this rule at all, except to check if ε ∈ L
Note
– ε-productions must be eliminated before unit productions

Example: testing membership
Original grammar:
S → 0S1 | 1S0S1 | T
T → S | ε
After eliminating unit and ε-productions:
S → ε | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1
[Figure: tree of derivations from S for the input x; a branch is cut off once it can yield only strings of length ≥ 6]

Algorithm 1 for testing membership
How to check if a string x ≠ ε is in L(G):
1. Eliminate all ε-productions and unit productions
2. Let X := S
3. While some new rule R can be applied to X
   – Apply R to X
   – If X = x, you have found a derivation for x
   – If |X| > |x|, backtrack
4. If no more rules can be applied to X, x is not in L
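A minimal sketch of Algorithm 1 as a depth-first search over sentential forms. It assumes the grammar already has no ε-productions and no unit productions, always expands the leftmost variable, and remembers the forms it has seen so that none is expanded twice; those last two choices and the name derives are this sketch's own.

```python
def derives(grammar, start, x):
    """Return True if the nonempty string x can be derived from start.

    grammar: dict from variable to a list of bodies (tuples of symbols),
    with no ε-productions and no unit productions.
    """
    target = tuple(x)
    seen = set()
    stack = [(start,)]
    while stack:
        form = stack.pop()
        if form == target:
            return True  # found a derivation for x
        if form in seen or len(form) > len(target):
            continue  # backtrack: already explored, or too long
        seen.add(form)
        # Apply every rule to the leftmost variable of the current form.
        for i, sym in enumerate(form):
            if sym in grammar:
                for body in grammar[sym]:
                    stack.append(form[:i] + tuple(body) + form[i + 1:])
                break
    return False  # no more rules can be applied


# S -> 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1 (S -> ε is left out, since x ≠ ε)
GRAMMAR = {
    "S": [tuple(b) for b in ("01", "101", "0S1", "10S1", "1S01", "1S0S1")],
}

print(derives(GRAMMAR, "S", "0011"))   # True
print(derives(GRAMMAR, "S", "00111"))  # False
```

The search terminates because forms never shrink and only finitely many forms have length at most |x|, but that number grows exponentially with |x|.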

Practical limitations of Algorithm 1
This method can be very slow if x is long
Example: G = CFG of the Java programming language, x = code for a 200-line Java program. On such an input the algorithm might take an impractically large number of steps!
There is a faster algorithm, but it requires that we do some more transformations on the grammar

Chomsky Normal Form
A CFG is in Chomsky Normal Form if every production (except S → ε) is of the form A → BC or A → a
Convert to Chomsky Normal Form:
Start with A → BcDE
1. Replace terminals with new variables: A → BCDE, C → c
2. Break up sequences with new variables: A → BX1, X1 → CX2, X2 → DE, C → c
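The two conversion steps can be sketched as follows, assuming ε-productions and unit productions were removed beforehand. The fresh variable names (T_c for a terminal c, and X1, X2, ...) are hypothetical and assumed not to clash with variables already in the grammar.

```python
def to_cnf(grammar):
    """Convert a grammar without ε- or unit productions to Chomsky Normal Form.

    grammar: dict from variable to a list of bodies (tuples of symbols).
    """
    variables = set(grammar)
    result = {head: [] for head in grammar}
    terminal_var = {}
    counter = 0

    def var_for(terminal):
        # Step 1 helper: one new variable per terminal, e.g. T_c -> c.
        if terminal not in terminal_var:
            name = f"T_{terminal}"
            terminal_var[terminal] = name
            result[name] = [(terminal,)]
        return terminal_var[terminal]

    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) <= 1:
                result[head].append(tuple(body))  # A -> a (or S -> ε) stays
                continue
            # Step 1: replace terminals inside long bodies with new variables.
            symbols = [s if s in variables else var_for(s) for s in body]
            # Step 2: break up sequences longer than two with new variables.
            current = head
            while len(symbols) > 2:
                counter += 1
                fresh = f"X{counter}"
                result[current].append((symbols[0], fresh))
                result[fresh] = []
                current = fresh
                symbols = symbols[1:]
            result[current].append(tuple(symbols))
    return result


# A -> BcDE, with placeholder rules for B, D, E so that they count as variables
GRAMMAR = {
    "A": [("B", "c", "D", "E")],
    "B": [("b",)],
    "D": [("d",)],
    "E": [("e",)],
}

for head, bodies in to_cnf(GRAMMAR).items():
    print(head, "->", " | ".join(" ".join(b) for b in bodies))
```

For A → BcDE this produces A → B X1, X1 → T_c X2, X2 → D E and T_c → c, which is the result above with T_c in the role of C.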

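Algorithm 2 for testing membership
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
x = baaba
Idea: We generate each substring of x bottom up
Table (row = substring length, entries left to right by starting position, "–" = no variable):
5:  SAC
4:  –    SAC
3:  –    B    B
2:  SA   B    SC   SA
1:  B    AC   AC   B    AC
x:  b    a    a    b    a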

Parse tree reconstruction
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
x = baaba
[Figure: the filled table for x = baaba]
Tracing back the derivations, we obtain the parse tree

Cocke-Younger-Kasami algorithm
Input: Grammar G in CNF, string x = x1…xk
Cell ij records the variables that can derive the substring xi…xj
For cells in the last row:
– If there is a production A → xi, put A in table cell ii
For cells st in other rows:
– If there is a production A → BC where B is in cell sj and C is in cell (j+1)t, for some s ≤ j < t, put A in cell st
[Figure: triangular table with cells 11, 22, …, kk in the last row and cell 1k at the top]
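A compact sketch of the table-filling procedure. It uses 0-based positions instead of the 1-based cell names above, and splits the substring from s to t into a left part ending at j and a right part starting at j + 1. The grammar encoding (dict from variable to a list of tuple bodies) and the names cyk_table and cyk_accepts are assumptions of this sketch.

```python
def cyk_table(grammar, x):
    """Fill the CYK table for a CNF grammar and a nonempty string x.

    table[s, t] is the set of variables that derive x[s..t] (inclusive).
    """
    k = len(x)
    table = {}
    # Last row: substrings of length 1, using productions A -> a.
    for i in range(k):
        table[i, i] = {
            head
            for head, bodies in grammar.items()
            for body in bodies
            if len(body) == 1 and body[0] == x[i]
        }
    # Other rows: longer substrings, using productions A -> BC.
    for length in range(2, k + 1):
        for s in range(k - length + 1):
            t = s + length - 1
            cell = set()
            for j in range(s, t):  # left part x[s..j], right part x[j+1..t]
                for head, bodies in grammar.items():
                    for body in bodies:
                        if (
                            len(body) == 2
                            and body[0] in table[s, j]
                            and body[1] in table[j + 1, t]
                        ):
                            cell.add(head)
            table[s, t] = cell
    return table


def cyk_accepts(grammar, start, x):
    """Return True if the nonempty string x is in the language of the grammar."""
    return len(x) > 0 and start in cyk_table(grammar, x)[0, len(x) - 1]


# S -> AB | BC, A -> BA | a, B -> CC | b, C -> AB | a
GRAMMAR = {
    "S": [("A", "B"), ("B", "C")],
    "A": [("B", "A"), ("a",)],
    "B": [("C", "C"), ("b",)],
    "C": [("A", "B"), ("a",)],
}

table = cyk_table(GRAMMAR, "baaba")
print(sorted(table[0, 4]))                 # ['A', 'C', 'S']
print(cyk_accepts(GRAMMAR, "S", "baaba"))  # True
```

The three nested position loops give a number of steps proportional to k³ times the size of the grammar.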