Ambiguity Parsing algorithms

Slides:



Advertisements
Similar presentations
CSCI 3130: Formal Languages and Automata Theory Tutorial 5
Advertisements

Closure Properties of CFL's
CYK Parser Von Carla und Cornelia Kempa. Overview Top-downBottom-up Non-directional methods Unger ParserCYK Parser.
Simplifying CFGs There are several ways in which context-free grammars can be simplified. One natural way is to eliminate useless symbols those that cannot.
Fall 2004COMP 3351 Simplifications of Context-Free Grammars.
Prof. Busch - LSU1 Simplifications of Context-Free Grammars.
FORMAL LANGUAGES, AUTOMATA, AND COMPUTABILITY
Western Michigan University CS6800 Advanced Theory of Computation Spring 2014 By Abduljaleel Alhasnawi & Rihab Almalki.
The CYK Algorithm David Rodriguez-Velazquez CS – 6800 Summer I
Chapter 4 Normal Forms for CFGs Chomsky Normal Form n Defn A CFG G = (V, , P, S) is in chomsky normal form if each rule in G has one of.
CS5371 Theory of Computation
CSC 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong Context-free.
1 CSC 3130: Automata theory and formal languages Tutorial 4 KN Hung Office: SHB 1026 Department of Computer Science & Engineering.
Costas Buch - RPI1 Simplifications of Context-Free Grammars.
CS Master – Introduction to the Theory of Computation Jan Maluszynski - HT Lecture 4 Context-free grammars Jan Maluszynski, IDA, 2007
1 Simplifications of Context-Free Grammars. 2 A Substitution Rule substitute B equivalent grammar.
1 CSCI 3130: Formal Languages and Automata Theory Tutorial 4 Hung Chun Ho Office: SHB 1026 Department of Computer Science & Engineering.
Tutorial CSC3130 : Formal Languages and Automata Theory Tu Shikui ( ) SHB 905, Office hour: Thursday 2:30pm-3:30pm
1 Module 32 Chomsky Normal Form (CNF) –4 step process.
CONVERTING TO CHOMSKY NORMAL FORM
CSCI 2670 Introduction to Theory of Computing September 21, 2004.
CSCI 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong Context-free.
CSCI 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong Ambiguity.
Context Free Grammar. Introduction Why do we want to learn about Context Free Grammars?  Used in many parsers in compilers  Yet another compiler-compiler,
The CYK Algorithm Presented by Aalapee Patel Tyler Ondracek CS6800 Spring 2014.
Membership problem CYK Algorithm Project presentation CS 5800 Spring 2013 Professor : Dr. Elise de Doncker Presented by : Savitha parur venkitachalam.
CS 44 – Jan. 29 Expression grammars –Associativity √ –Precedence CFG for entire language (handout) CYK algorithm –General technique for testing for acceptance.
Lecture 11 Theory of AUTOMATA
CSCI 3130: Formal languages and automata theory Tutorial 4 Chin.
1 Simplification of Context-Free Grammars Some useful substitution rules. Removing useless productions. Removing -productions. Removing unit-productions.
Closure Properties Lemma: Let A 1 and A 2 be two CF languages, then the union A 1  A 2 is context free as well. Proof: Assume that the two grammars are.
CS 208: Computing Theory Assoc. Prof. Dr. Brahim Hnich Faculty of Computer Sciences Izmir University of Economics.
Introduction Finite Automata accept all regular languages and only regular languages Even very simple languages are non regular (  = {a,b}): - {a n b.
CSC312 Automata Theory Lecture # 26 Chapter # 12 by Cohen Context Free Grammars.
CSCI 3130: Formal languages and automata theory Andrej Bogdanov The Chinese University of Hong Kong Limitations.
CSC 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong Normal forms.
Exercises on Chomsky Normal Form and CYK parsing
1 Context Free Grammars Xiaoyin Wang CS 5363 Spring 2016.
Theory of Languages and Automata By: Mojtaba Khezrian.
CSCI 3130: Formal languages and automata theory Andrej Bogdanov The Chinese University of Hong Kong Decidable.
Theory of Computation Automata Theory Dr. Ayman Srour.
CSCI 2670 Introduction to Theory of Computing September 16, 2004.
CSCI 3130: Formal languages and automata theory Andrej Bogdanov The Chinese University of Hong Kong Polynomial.
Normal Forms for CFG’s Eliminating Useless Variables Removing Epsilon
David Rodriguez-Velazquez CS – 6800 Summer I
Context-Free Grammars: an overview
LR(k) grammars The Chinese University of Hong Kong Fall 2009
Complexity and Computability Theory I
7. Properties of Context-Free Languages
Simplifications of Context-Free Grammars
Simplifications of Context-Free Grammars
Polynomial time The Chinese University of Hong Kong Fall 2010
Pushdown automata and CFG ↔ PDA conversions
LR(1) grammars The Chinese University of Hong Kong Fall 2010
7. Properties of Context-Free Languages
CHAPTER 2 Context-Free Languages
فصل دوم Context-Free Languages
NFAs, DFAs, and regular expressions
Decidable and undecidable languages
CSE 105 theory of computation
Theory of Computation Lecture #
LR(1) grammars The Chinese University of Hong Kong Fall 2011
NFA to DFA conversion and regular expressions
Pushdown automata The Chinese University of Hong Kong Fall 2011
The Cocke-Kasami-Younger Algorithm
LR(k) grammars The Chinese University of Hong Kong Fall 2008
Normal forms and parsing
Answer Questions about Exam2 problems
Normal Forms for Context-free Grammars
Context-Free Languages
Presentation transcript:

Ambiguity Parsing algorithms The Chinese University of Hong Kong Fall 2011 CSCI 3130: Automata theory and formal languages Ambiguity Parsing algorithms Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130

Ambiguity E  E + E | E * E | (E) | N N  1N | 2N | 1 | 2  E * + N 1 2 E + * N 1 2 1+2*2 = 5 = 6 A CFG is ambiguous if some string has more than one parse tree

Example Is S  SS | x ambiguous? Yes, because x S x S xxx

Disambiguation S  SS | x S  Sx | x x Sometimes we can rewrite the grammar to remove ambiguity

Disambiguation F T Divide expression into terms and factors F E  E + E | E * E | (E) | N N  1N | 2N | 1 | 2 same precedence! F T Divide expression into terms and factors F 2 * (1 + 2 * 2)

Disambiguation An expression is a sum of one or more terms E  E + E | E * E | (E) | N N  1N | 2N | 1 | 2 An expression is a sum of one or more terms E  T | E + T Each term is a product of one or more factors T  F | T * F Each factor is a parenthesized expression or a number F  (E) | 1 | 2

Parsing example E T E T F T F E F T E T E F T T F F F + T F T F * E ( ) F T E + T E + F T * T F F F 2 * (1 + 1 + 2 * 2) + 1

Disambiguation Disambiguation is not always possible because There exist inherently ambiguous languages There is no general procedure for disambiguation In programming languages, ambiguity comes from precedence rules, and we can do like in example In English, ambiguity is sometimes a problem: He ate the cookies on the floor

Ambiguity in English He ate the cookies on the floor

Parsing S → 0S1 | 1S0S | T input: 0011 T → S | e How would we program the computer to build a parse tree for us?

Parsing ✔ S → 0S1 | 1S0S | T input: 0011 T → S | e ⇒ ... S 0S1 1S0S T ⇒ 00S11 01S0S1 0T1 ⇒ ⇒ 000S111 00T11 ... ... ⇒ ⇒ 00S11 0011 ✔ S  10S10S ... ⇒ First idea: Try all derivations

Problems Trying all derivations may take a very long time  If input is not in the language, parsing will never stop 

When to stop Idea 2: Stop when S → 0S1 | 1S0S | T T → S | e Problems: |derived string| > |input| Problems: S  0S1  0T1  01 1 3 2 S  T  S  T  … Derived strings may shrink because of “e-productions” Derivation may loop because of “unit productions” Task: remove e and unit productions

Removal of -productions A variable N is nullable if it derives the empty string N   * Identify all nullable variables N   Remove nullable variables carefully  If start variable S is nullable: Add a new start variable S’ Add special productions S’ → S | 

Example grammar nullable variables S  ACD A a B   C  ED |  D  BC | b E  b B C D If X  , mark X as nullable If X  YZ…W, all marked nullable, mark X as nullable also. Repeat the following: Identify all nullable variables 

Eliminating e-productions D  C S  AD D  B D  e S  AC S  A C  E S  ACD A a B   C  ED |  D  BC | b E  b nullable: B, C, D If you see X → N, add X →  If you see N → , remove it. For every nullable N:  Remove nullable variables carefully

Eliminating unit productions A unit production is a production of the form A → B grammar: unit productions graph: S → 0S1 | 1S0S | T T → S | R |  R → 0SR S T R

Removal of unit productions  If there is a cycle of unit productions delete it and replace everything with A A → B → ... → C → A S T R  S → 0S1 | 1S0S | T T → S | R |  R → 0SR S → 0S1 | 1S0S S → R |  R → 0SR  replace T by S

Removal of unit productions  Replace every chain by A → , B → ,... , C →  A → B → ... → C →  S R S → 0S1 | 1S0S | R |  R → 0SR S → 0S1 | 1S0S | 0SR |  R → 0SR S → R → 0SR is replaced by S → 0SR, R → 0SR

Recap If input is not in the language, parsing will never stop Problem: Solution: Eliminate  productions Eliminate unit productions important to do in this order Try all possible derivations but stop parsing when |derived string| > |input|

Example input: 0011 conclusion: 0011 ∉ L S → 0S1 | 0S0S | T T → S | 0 0S0S 0S1 ⇒ ✘ ⇒ 001 00S11 00S0S1 ✘ too long too long ⇒ 000S 00S10S 00S0S0S ⇒ 0000 1000S1 1000S0S ✘ too long too long too long too long

Problems Trying all derivations may take a very long time  If input is not in the language, parsing will never stop 

A faster way to parse: the Cocke-Younger-Kasami algorithm Preparations A faster way to parse: the Cocke-Younger-Kasami algorithm To use it we must prepare the CFG: Eliminate  productions Eliminate unit productions Convert CFG to Chomsky Normal Form

Chomsky Normal Form A CFG is in Chomsky Normal Form if every production* has the form A → BC or A → a Convert to Chomsky Normal Form: Noam Chomsky A → BcDE A → BCDE C → c A → BX X → CY Y → DE break up sequences with new variables replace terminals with new variables C → c * Exception: We allow S → e for start variable only

Cocke-Younger-Kasami algorithm SAC S  AB | BC A  BA | a B  CC | b C  AB | a – SAC – B B SA B SC SA B AC AC B AC x = baaba b a a b a Idea: We generate each substring of x bottom up

Parse tree reconstruction b AC B SA SC – SAC S  AB | BC A  BA | a B  CC | b C  AB | a x = baaba Tracing back the derivations, we obtain the parse tree

Cocke-Younger-Kasami algorithm Grammar without e and unit productions in Chomsky Normal Form table cells Input string x = x1…xk 1k … … For all cells in last row If there is a production A  xi Put A in table cell ii For cells st in other rows If there is a production A  BC where B is in cell sj and C is in cell (j+1)t Put A in cell st 12 23 11 22 kk x1 x2 … xk 1 s j t k Cell ij remembers all possible derivations of substring xi…xj