Membership problem: CYK Algorithm
Project presentation, CS 5800, Spring 2013
Professor: Dr. Elise de Doncker
Presented by: Savitha parur venkitachalam
Membership problem
To determine whether a given string is a member of the language defined by a context-free grammar.
Given a context-free grammar G and a string w, where G = (V, ∑, P, S):
V: finite set of variables
∑ (the alphabet): finite set of terminal symbols
P: finite set of rules
S: start symbol (distinguished element of V)
V and ∑ are assumed to be disjoint.
Is w in the language of G?
CYK Algorithm
Developed by J. Cocke, D. Younger, and T. Kasami to answer the membership problem
Input grammar should be in Chomsky normal form:
A → BC
A → a
S → λ
where B, C ∈ V - {S}
Uses bottom-up parsing
Uses dynamic programming (a table-filling algorithm)
Complexity: O(n^3), where n is the length of the input string
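As a minimal sketch of what the Chomsky normal form restriction means in practice, the check below tests each rule against the three allowed shapes. The rule representation (a list of `(head, body)` tuples, with the empty tuple standing for λ) and the name `is_cnf` are assumptions made for illustration; they are not part of the presentation.

```python
def is_cnf(rules, variables, terminals, start):
    """Return True if every rule has one of the shapes A -> BC, A -> a, S -> λ.

    rules: iterable of (head, body) pairs, where body is a tuple of symbols
           and the empty tuple stands for λ (this representation is an assumption).
    """
    for head, body in rules:
        if head not in variables:
            return False
        if len(body) == 0:
            # Only the start symbol may derive the empty string.
            if head != start:
                return False
        elif len(body) == 1:
            # A -> a: the body must be a single terminal.
            if body[0] not in terminals:
                return False
        elif len(body) == 2:
            # A -> BC: two variables, neither of which is the start symbol.
            if any(sym not in variables or sym == start for sym in body):
                return False
        else:
            return False
    return True


# Hypothetical grammars used only to exercise the check.
good = [("S", ("A", "B")), ("A", ("a",)), ("B", ("b",)), ("S", ())]
bad = [("S", ("A", "B", "A")), ("A", ("a",)), ("B", ("b",))]
print(is_cnf(good, {"S", "A", "B"}, {"a", "b"}, "S"))
print(is_cnf(bad, {"S", "A", "B"}, {"a", "b"}, "S"))
```

The first grammar passes; the second is rejected because one rule has three symbols on its right-hand side.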
CYK basic ideas
CYK works on two basic ideas.
1. Consider the rules satisfying substrings of each length from 1 to N.
Let the string to search be abca.
Length 1: a, b, c, a
Length 2: ab, bc, ca
Length 3: abc, bca
Length 4: abca
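The enumeration above can be sketched in a few lines. The helper name `substrings_by_length` is hypothetical and only illustrates the order in which CYK visits substrings.

```python
def substrings_by_length(w):
    """Map each length 1..len(w) to the substrings of that length, left to right."""
    n = len(w)
    return {
        length: [w[start:start + length] for start in range(n - length + 1)]
        for length in range(1, n + 1)
    }


for length, subs in substrings_by_length("abca").items():
    print(length, subs)
```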
CYK basic ideas
2. Longer substrings can be parsed by combining the parses of shorter ones.
E.g. abc can be split as a.bc or ab.c; if we know the rules to form a and bc (or ab and c), then we know the rules to form abc.
A substring can be given as
S_{i,j} = (S_{i,i}, S_{i+1,j}), (S_{i,i+1}, S_{i+2,j}), …, (S_{i,j-1}, S_{j,j})
where i is the start index and j is the end index.
For example, bcd can be formed from abcd as
S_{2,4} = (S_{2,2}, S_{3,4}), (S_{2,3}, S_{4,4}) = (b.cd), (bc.d)
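A small sketch of the split rule, using the same 1-based, inclusive indices as the slide. The function name `splits` is an assumption for illustration.

```python
def splits(w, i, j):
    """Yield every two-part split of the substring at positions i..j (1-based, inclusive)."""
    for k in range(i, j):
        left = w[i - 1:k]   # positions i..k
        right = w[k:j]      # positions k+1..j
        yield left, right


print(list(splits("abcd", 2, 4)))
```

For positions 2 to 4 of abcd this yields the two splits b.cd and bc.d, matching the example.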
CYK table filling
W_{i,j} = (W_{i,i}, W_{i+1,j}), (W_{i,i+1}, W_{i+2,j}), …, (W_{i,j-1}, W_{j,j})
Fill the table with the variables whose rules derive each substring.
If the final (top) box contains the start symbol, then the string is a member of the language.

W_{1,4}
W_{1,3}  W_{2,4}
W_{1,2}  W_{2,3}  W_{3,4}
W_{1,1}  W_{2,2}  W_{3,3}  W_{4,4}
W1       W2       W3       W4
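The table-filling step might look like the following sketch. It assumes the grammar is already in Chomsky normal form and is stored as two dictionaries; the names `cyk_table`, `is_member`, `unit_rules`, and `pair_rules` are hypothetical, as is the toy grammar at the end. It handles non-empty strings only.

```python
def cyk_table(w, unit_rules, pair_rules):
    """Fill the CYK table for a non-empty string w.

    unit_rules: dict mapping a terminal to the set of variables A with A -> terminal
    pair_rules: dict mapping (B, C) to the set of variables A with A -> BC
    Returns a dict keyed by (i, j), 1-based and inclusive.
    """
    n = len(w)
    table = {}
    # Bottom row: substrings of length 1.
    for i in range(1, n + 1):
        table[i, i] = set(unit_rules.get(w[i - 1], set()))
    # Higher rows: substrings of length 2..n, built from shorter ones.
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            cell = set()
            for k in range(i, j):  # split into (i..k) and (k+1..j)
                for b in table[i, k]:
                    for c in table[k + 1, j]:
                        cell |= pair_rules.get((b, c), set())
            table[i, j] = cell
    return table


def is_member(w, unit_rules, pair_rules, start="S"):
    """Return True if the start symbol appears in the cell for the whole string."""
    return start in cyk_table(w, unit_rules, pair_rules)[1, len(w)]


# Toy grammar (an assumption for illustration): S -> AB, A -> a, B -> b
unit_rules = {"a": {"A"}, "b": {"B"}}
pair_rules = {("A", "B"): {"S"}}
print(is_member("ab", unit_rules, pair_rules))
print(is_member("ba", unit_rules, pair_rules))
```

The three nested loops over length, start position, and split point are where the cubic dependence on the string length comes from.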
Table filling example
Search string: 'cbba'

W_{1,4}
W_{1,3}  W_{2,4}
W_{1,2}  W_{2,3}  W_{3,4}
{A, C}   {B}      {B}      {A}
c        b        b        a
To fill the next row of the table, consider
W_{i,j} = (W_{i,i}, W_{i+1,j}), (W_{i,i+1}, W_{i+2,j}), …, (W_{i,j-1}, W_{j,j})
W_{1,2} = (W_{1,1}, W_{2,2}) = {A, C}{B} = {AB, CB}
Rules to form AB or CB = {S, C}
W_{2,3} = (W_{2,2}, W_{3,3}) = {B}{B} = {BB}
Rules to form BB = ∅
W_{3,4} = (W_{3,3}, W_{4,4}) = {B}{A} = {BA}
Rules to form BA = {C}

W_{1,2}  W_{2,3}  W_{3,4}
{A, C}   {B}      {B}      {A}
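The step of pairing two cells and looking up matching rules can be sketched as below. The slides do not list the grammar, so `PAIR_RULES` is a hypothetical set of binary rules chosen only because it agrees with the cell contents shown; the name `combine` is likewise an assumption.

```python
from itertools import product

# Hypothetical binary rules, chosen only to agree with the cells on the slide:
# S -> AB, C -> CB, C -> BA
PAIR_RULES = {("A", "B"): {"S"}, ("C", "B"): {"C"}, ("B", "A"): {"C"}}


def combine(left_cell, right_cell, pair_rules):
    """Return the variables X with a rule X -> YZ, Y in left_cell and Z in right_cell."""
    result = set()
    for pair in product(left_cell, right_cell):
        result |= pair_rules.get(pair, set())
    return result


# Bottom row for 'cbba', as used in the example: W_{1,1} .. W_{4,4}.
row1 = [{"A", "C"}, {"B"}, {"B"}, {"A"}]
# Second row: each cell pairs two adjacent bottom-row cells.
row2 = [combine(row1[i], row1[i + 1], PAIR_RULES) for i in range(3)]
print(row2)
```

Under these assumed rules, the second row comes out as {S, C}, the empty set, and {C}, in that order.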
Table:

W_{1,4}
W_{1,3}  W_{2,4}
{S, C}   ∅        {C}
{A, C}   {B}      {B}      {A}
W_{1,3}  W_{2,4}
{S, C}   ∅        {C}
{A, C}   {B}      {B}      {A}

W_{i,j} = (W_{i,i}, W_{i+1,j}), (W_{i,i+1}, W_{i+2,j}), …, (W_{i,j-1}, W_{j,j})
W_{1,3} = (W_{1,1}, W_{2,3}), (W_{1,2}, W_{3,3}) = {A, C}∅ ∪ {S, C}{B} = {SB, CB}
(Pairing with the empty cell W_{2,3} contributes nothing.)
Rules to form SB or CB = {C}
W_{2,4} = (W_{2,2}, W_{3,4}), (W_{2,3}, W_{4,4}) = {B}{C} ∪ ∅{A} = {BC}
Rules to form BC = {B}
Table:

W_{1,4}
{C}      {B}
{S, C}   ∅        {C}
{A, C}   {B}      {B}      {A}
W_{i,j} = (W_{i,i}, W_{i+1,j}), (W_{i,i+1}, W_{i+2,j}), …, (W_{i,j-1}, W_{j,j})
W_{1,4} = (W_{1,1}, W_{2,4}), (W_{1,2}, W_{3,4}), (W_{1,3}, W_{4,4}) = {A, C}{B} ∪ {S, C}{C} ∪ {C}{A} = {AB, CB, SC, CC, CA}
Rules to form AB or CB or SC or CC or CA = {S, C, A}

W_{1,4}
{C}      {B}
{S, C}   ∅        {C}
{A, C}   {B}      {B}      {A}
Final table:

{S, C, A}
{C}      {B}
{S, C}   ∅        {C}
{A, C}   {B}      {B}      {A}

The top cell, W_{1,4}, represents the original string and contains the start symbol 'S'.
Result: 'cbba' is a member of the language.
Design
Read the input grammar from a file, or prompt the user to input the rules.
Check whether the grammar is in CNF.
If the grammar is in CNF, start filling the table.
Output: 'String is a member of the input grammar' or 'String is not a member of the input grammar'
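For the first design step, reading the rules could be sketched as follows. The presentation does not specify an input format, so the one used here (one head per line, `->` before the bodies, `|` between alternatives, whitespace between symbols, `λ` for the empty string) is purely an assumption, as is the name `parse_grammar`.

```python
def parse_grammar(lines):
    """Parse lines such as 'S -> A B | a' into a list of (head, body) pairs.

    The input format is an assumption; the empty tuple stands for λ.
    """
    rules = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        head, arrow, rhs = line.partition("->")
        if not arrow:
            raise ValueError(f"missing '->' in rule: {line!r}")
        for alternative in rhs.split("|"):
            # Drop the λ marker so that an empty body is the empty tuple.
            symbols = tuple(s for s in alternative.split() if s != "λ")
            rules.append((head.strip(), symbols))
    return rules


# Hypothetical grammar text, as it might appear in an input file.
text = """
S -> A B | λ
A -> a
B -> b
"""
for rule in parse_grammar(text.splitlines()):
    print(rule)
```

Because the function accepts any iterable of lines, an open file object could be passed in directly instead of `text.splitlines()`.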
References
Thomas A. Sudkamp, Languages and Machines: An Introduction to the Theory of Computer Science
“Parsing”, Internet:
Questions