CS 208: Computing Theory Assoc. Prof. Dr. Brahim Hnich Faculty of Computer Sciences Izmir University of Economics
Context-Free Languages
So far … Methods for describing regular languages: finite automata (deterministic and nondeterministic) and regular expressions. They are all equivalent, and limited: they cannot describe some simple languages, like \(\{0^n1^n \mid n \text{ is positive}\}\). Now, we introduce a more powerful method for describing languages: context-free grammars (CFGs).
Are CFGs useful? Extremely useful! Artificial intelligence: natural language processing. Programming languages: specification and compilation.
Example: This is a CFG, which we call G1: A → 0A1; A → B; B → #. Each line is a substitution rule (or production rule). A and B are called variables (or non-terminals). 0, 1, and # are called terminals. A is the start variable.
Rules: We use a CFG to describe a language by generating each string of that language: (1) write down the start variable; (2) pick a variable that is written down and a production rule whose left-hand side is that variable; (3) replace that variable with the right-hand side of the rule; (4) repeat steps 2 and 3 until no variables remain.
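The generation procedure can be sketched in Python. This is a minimal illustration, assuming every grammar symbol is a single character and that the rule to apply is chosen by a caller-supplied function; `derive` and `choose_n_times` are hypothetical names. It always rewrites the leftmost variable, which is one valid way to "pick a variable written down".

```python
# G1 as a dict: variable -> list of right-hand sides (single-character symbols assumed).
G1 = {"A": ["0A1", "B"], "B": ["#"]}

def derive(grammar, start, choose):
    """Write down the start variable, then keep replacing a variable by the
    right-hand side of one of its rules until no variables remain.
    choose(var, options) decides which rule to apply (hypothetical hook)."""
    form = start
    steps = [form]
    while True:
        # position of the leftmost variable still written down, if any
        pos = next((i for i, s in enumerate(form) if s in grammar), None)
        if pos is None:
            return steps
        var = form[pos]
        rhs = choose(var, grammar[var])
        form = form[:pos] + rhs + form[pos + 1:]
        steps.append(form)

def choose_n_times(n):
    """Hypothetical strategy: use A → 0A1 exactly n times, then A → B (and B → #)."""
    count = 0
    def choose(var, options):
        nonlocal count
        if var == "A" and count < n:
            count += 1
            return options[0]
        return options[-1]
    return choose

print(" ⇒ ".join(derive(G1, "A", choose_n_times(2))))
# A ⇒ 0A1 ⇒ 00A11 ⇒ 00B11 ⇒ 00#11
```

Passing `lambda var, options: random.choice(options)` instead (after `import random`) would generate random strings of the language.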
Derivations with G1 (A → 0A1; A → B; B → #): A ⇒ 0A1 ⇒ 0B1 ⇒ 0#1; A ⇒ 0A1 ⇒ 00A11 ⇒ 00B11 ⇒ 00#11; A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000#111.
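Parse tree: Parse tree for 0#1 in G1, built from the derivation A ⇒ 0A1 ⇒ 0B1 ⇒ 0#1. [Figure: the root A has children 0, A, 1; the inner A has the single child B; B has the single child #.]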
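Parse tree: Parse tree for 00#11 in G1, built from the derivation A ⇒ 0A1 ⇒ 00A11 ⇒ 00B11 ⇒ 00#11. [Figure: the root A has children 0, A, 1; that inner A again has children 0, A, 1; the innermost A has the single child B; B has the single child #.]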
Context-free languages: All strings generated by a CFG constitute the language of the grammar. Example: \(L(G_1) = \{0^n\#1^n \mid n \ge 0\}\) (n = 0 is included, since A ⇒ B ⇒ #). Any language generated by a context-free grammar is called a context-free language.
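A small sketch that decides membership in L(G1) and, when the string belongs to it, returns its parse tree as nested tuples. It relies on the structure of G1: every rule either wraps the string in 0 … 1 (A → 0A1) or bottoms out at A → B → #, so "#" itself (n = 0) is in the language. `parse_g1` and `tree_yield` are hypothetical helper names.

```python
def parse_g1(w):
    """Return a parse tree for w in G1 (A → 0A1 | B, B → #), or None if w is not in L(G1).
    A tree is a tuple (variable, child, child, ...); leaves are terminal strings."""
    if w == "#":
        return ("A", ("B", "#"))            # A → B → #
    if len(w) >= 3 and w[0] == "0" and w[-1] == "1":
        inner = parse_g1(w[1:-1])           # A → 0A1: parse the middle part
        if inner is not None:
            return ("A", "0", inner, "1")
    return None

def tree_yield(tree):
    """Read the leaves of a parse tree from left to right."""
    if isinstance(tree, str):
        return tree
    return "".join(tree_yield(child) for child in tree[1:])

print(parse_g1("0#1"))   # ('A', '0', ('A', ('B', '#')), '1')
print(parse_g1("0#11"))  # None
```

The yield of the returned tree is always the input string, which is exactly what a parse tree for w must satisfy.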
A useful abbreviation: The production rules A → 0A1, A → B, B → # can be written as A → 0A1 | B and B → #.
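Another example: CFG G2 describing a fragment of English: <SENTENCE> → <NOUN-PHRASE><VERB-PHRASE>; <NOUN-PHRASE> → <CMPLX-NOUN> | <CMPLX-NOUN><PREP-PHRASE>; <VERB-PHRASE> → <CMPLX-VERB> | <CMPLX-VERB><PREP-PHRASE>; <PREP-PHRASE> → <PREP><CMPLX-NOUN>; <CMPLX-NOUN> → <ARTICLE><NOUN>; <CMPLX-VERB> → <VERB> | <VERB><NOUN-PHRASE>; <ARTICLE> → a | the; <NOUN> → boy | girl | flower; <VERB> → touches | likes | sees; <PREP> → with.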
Another example: Examples of strings belonging to L(G2): "a boy sees"; "the boy sees a flower"; "a girl with a flower likes the boy with a flower".
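Another example: Derivation of "a boy sees": <SENTENCE> ⇒ <NOUN-PHRASE><VERB-PHRASE> ⇒ <CMPLX-NOUN><VERB-PHRASE> ⇒ <ARTICLE><NOUN><VERB-PHRASE> ⇒ a <NOUN><VERB-PHRASE> ⇒ a boy <VERB-PHRASE> ⇒ a boy <CMPLX-VERB> ⇒ a boy <VERB> ⇒ a boy sees.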
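The following sketch checks sentences against G2 by plain backtracking and counts the parse trees it finds. It assumes G2 is the usual textbook version of this English fragment, with the variable names written below (an assumption: only the terminals are certain), and that words are separated by spaces. Backtracking of this kind terminates only because G2 has no ε-rules and no left recursion; `ends` and `count_parses` are hypothetical names.

```python
# Assumed rules of G2: variable -> list of right-hand sides (lists of symbols).
G2 = {
    "SENTENCE": [["NOUN-PHRASE", "VERB-PHRASE"]],
    "NOUN-PHRASE": [["CMPLX-NOUN"], ["CMPLX-NOUN", "PREP-PHRASE"]],
    "VERB-PHRASE": [["CMPLX-VERB"], ["CMPLX-VERB", "PREP-PHRASE"]],
    "PREP-PHRASE": [["PREP", "CMPLX-NOUN"]],
    "CMPLX-NOUN": [["ARTICLE", "NOUN"]],
    "CMPLX-VERB": [["VERB"], ["VERB", "NOUN-PHRASE"]],
    "ARTICLE": [["a"], ["the"]],
    "NOUN": [["boy"], ["girl"], ["flower"]],
    "VERB": [["touches"], ["likes"], ["sees"]],
    "PREP": [["with"]],
}

def ends(grammar, symbols, words, pos):
    """Yield the input position reached by each way of deriving the
    symbol list `symbols` from words[pos:] (one yield per derivation)."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in grammar:                            # variable: try each of its rules
        for body in grammar[first]:
            for mid in ends(grammar, body, words, pos):
                yield from ends(grammar, rest, words, mid)
    elif pos < len(words) and words[pos] == first:  # terminal: must match the next word
        yield from ends(grammar, rest, words, pos + 1)

def count_parses(grammar, start, sentence):
    """Number of parse trees for the sentence; 0 means it is not in the language."""
    words = sentence.split()
    return sum(1 for end in ends(grammar, [start], words, 0) if end == len(words))

print(count_parses(G2, "SENTENCE", "a boy sees"))   # 1
print(count_parses(G2, "SENTENCE", "boy sees"))     # 0
print(count_parses(G2, "SENTENCE", "a girl with a flower likes the boy with a flower"))  # 2
```

The count of 2 for the last sentence reflects that "with a flower" can attach either to "the boy" or to the verb phrase "likes the boy".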
Formal definitions: A context-free grammar is a 4-tuple (V, Σ, R, S), where V is a finite set of variables; Σ is a finite set of terminals, disjoint from V; R is a finite set of rules, each rule consisting of a variable and a string of variables and terminals; and S ∈ V is the start variable.
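Formal definitions: If u, v, and w are strings of variables and terminals, and A → w is a rule of the grammar, then uAv yields uwv, written \(uAv \Rightarrow uwv\). We write \(u \Rightarrow^* v\) if \(u = v\) or if there is a sequence \(u_1, \ldots, u_k\) with \(k \ge 0\) such that \(u \Rightarrow u_1 \Rightarrow \cdots \Rightarrow u_k \Rightarrow v\).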
Formal definitions: The language of grammar G is \(L(G) = \{w \in \Sigma^* \mid S \Rightarrow^* w\}\).
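The yields relation and \(\Rightarrow^*\) can be made concrete with a small search. This is a sketch under simplifying assumptions: every symbol is a single character, rules are (head, body) string pairs, and the grammar has no ε-rules, so sentential forms longer than w can safely be discarded (true for G1, not for grammars with ε-rules). `yields` and `derives` are hypothetical names.

```python
from collections import deque

# G1 as (head, body) pairs; every symbol is a single character (assumption).
G1_RULES = [("A", "0A1"), ("A", "B"), ("B", "#")]

def yields(form, rules):
    """All strings v with form ⇒ v: rewrite one occurrence of a variable A
    in form = uAv as uwv, for some rule A → w."""
    result = set()
    for i, symbol in enumerate(form):
        for head, body in rules:
            if symbol == head:
                result.add(form[:i] + body + form[i + 1:])
    return result

def derives(start, w, rules):
    """Decide start ⇒* w by breadth-first search over sentential forms.
    Assumes no ε-rules, so forms longer than w can be pruned."""
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == w:
            return True
        for nxt in yields(form, rules):
            if len(nxt) <= len(w) and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

print(derives("A", "00#11", G1_RULES))  # True
print(derives("A", "0#11", G1_RULES))   # False
```

Because only finitely many strings of length at most |w| exist, the search always terminates under the no-ε-rule assumption.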
Example: Consider G3 = ({S}, {(, )}, R, S), where R is S → (S) | SS | ε. What is the language of G3? Examples: (), (()((()))), … L(G3) is the set of strings of properly nested parentheses.
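Membership in the language of the parentheses grammar can be checked without the grammar, using the standard depth-counter characterization of properly nested strings (a swapped-in technique, not a derivation in the grammar): the running count of "(" minus ")" must never go negative and must end at zero. `properly_nested` is a hypothetical name.

```python
def properly_nested(s):
    """True iff s consists only of '(' and ')' and is properly nested."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:       # a ')' with no matching '(' before it
                return False
        else:
            return False        # only '(' and ')' are terminals of this grammar
    return depth == 0           # every '(' was closed

print(properly_nested("(()((())))"))  # True
print(properly_nested(")("))          # False
```

The empty string passes the check, matching the rule S → ε.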
Example: Consider G4 = (V, Σ, R, E), where V = {E, T, F}, Σ = {a, +, ×, (, )}, and R is E → E + T | T; T → T × F | F; F → (E) | a. What is the language of G4? Examples: a+a+a, (a+a)×a. E stands for expression, T for term, and F for factor, so this grammar describes arithmetic expressions over a built with +, ×, and parentheses.
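A recursive-descent sketch for G4 that returns each expression fully parenthesized, so the grouping the grammar imposes becomes visible. Recursive descent cannot use the left-recursive rules E → E + T and T → T × F directly, so, as is common, they are rewritten as loops (E is a T followed by zero or more "+ T"); this keeps the same language and the same left-to-right grouping. It assumes × is written as the character "×" and that "a" is the only operand; `parse_expression` is a hypothetical name.

```python
def parse_expression(text):
    """Parse a string over {a, +, ×, (, )} with E → E + T | T, T → T × F | F,
    F → (E) | a, and return it fully parenthesized. Raises SyntaxError if the
    string is not in the language."""
    tokens = list(text.replace(" ", ""))
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}")
        pos += 1

    def expr():                      # E → T (+ T)*, grouped to the left
        node = term()
        while peek() == "+":
            expect("+")
            node = f"({node}+{term()})"
        return node

    def term():                      # T → F (× F)*, grouped to the left
        node = factor()
        while peek() == "×":
            expect("×")
            node = f"({node}×{factor()})"
        return node

    def factor():                    # F → (E) | a
        if peek() == "(":
            expect("(")
            node = expr()
            expect(")")
            return node
        expect("a")
        return "a"

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return result

print(parse_expression("a+a×a"))    # (a+(a×a))
print(parse_expression("(a+a)×a"))  # ((a+a)×a)
```

In G4, T sits below E, so × binds tighter than +: a+a×a comes out as (a+(a×a)) and nothing else.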
Ambiguity: Sometimes a grammar can generate the same string in several different ways. Such a string has several different parse trees, and thus several different meanings. This is a very serious problem: think of a C program that could have multiple interpretations. If a grammar generates some string in several different ways, we say that the grammar is ambiguous.
Example: Consider G5: <EXPR> → <EXPR> + <EXPR> | <EXPR> × <EXPR> | ( <EXPR> ) | a. G5 is ambiguous because a+a×a has two parse trees. [Figure: two parse trees for a+a×a, one grouping it as a+(a×a) and the other as (a+a)×a.]
Formal definition: ambiguity. A string w is generated ambiguously in CFG G if it has two or more different leftmost derivations. A derivation is leftmost if at every step the variable being replaced is the leftmost remaining variable. Grammar G is ambiguous if it generates some string ambiguously.
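To see the definition at work, the sketch below enumerates all leftmost derivations of a target string in the ambiguous expression grammar. Assumptions: the single character E stands for <EXPR>, × is written as "×", and since the grammar has no ε-rules, sentential forms longer than the target, or whose terminal prefix already disagrees with it, are pruned. `leftmost_derivations` is a hypothetical name.

```python
# E stands for <EXPR>; every symbol is a single character (assumption).
G5 = {"E": ["E+E", "E×E", "(E)", "a"]}

def leftmost_derivations(grammar, form, target):
    """Yield every leftmost derivation (list of sentential forms) of target from form."""
    i = next((k for k, s in enumerate(form) if s in grammar), None)
    if i is None:                              # no variables left
        if form == target:
            yield [form]
        return
    if len(form) > len(target) or form[:i] != target[:i]:
        return                                 # too long, or terminal prefix disagrees
    for body in grammar[form[i]]:              # always rewrite the leftmost variable
        nxt = form[:i] + body + form[i + 1:]
        for rest in leftmost_derivations(grammar, nxt, target):
            yield [form] + rest

for d in leftmost_derivations(G5, "E", "a+a×a"):
    print(" ⇒ ".join(d))
```

The string a+a×a has exactly two leftmost derivations, one starting E ⇒ E+E and the other E ⇒ E×E, matching its two parse trees, whereas (a+a)×a has only one.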
Chomsky Normal Form (CNF): A grammar is in Chomsky normal form if every rule has the form A → BC or A → a, with S → ε also permitted, where S is the start variable, a is any terminal, and A, B, and C are any variables, except that B and C may not be the start variable.
Theorem: Any context-free language is generated by a context-free grammar in Chomsky normal form. How? Add a new start variable S0; eliminate all ε-rules of the form A → ε; eliminate all unit rules of the form A → B; patch up the rules so that the grammar still generates the same language; convert the remaining rules into the proper form.
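Steps to convert any grammar into CNF: Step 1: Add a new start variable S0 and the rule S0 → S, where S is the original start variable. This guarantees that the start variable never appears on the right-hand side of a rule.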
Steps to convert any grammar into CNF: Step 2: Repeat: remove some ε-rule A → ε, where A is not the start variable; then, for each occurrence of A on the right-hand side of a rule, add a new rule with that occurrence deleted. E.g., for a rule R → uAvAw, where u, v, and w are strings of variables and terminals, add the rules R → uvAw, R → uAvw, and R → uvw. For a rule R → A, add R → ε, unless R → ε has already been removed. Until all ε-rules not involving the start variable have been removed.
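Step 2 can be sketched by following the repeat loop directly. Representation (an assumption): a grammar is a dict from each variable to a set of right-hand sides, each a tuple of symbols, with () standing for ε; any symbol that is not a key is a terminal. Running it on the example grammar after Step 1 reproduces the rules at the end of Step 2. `remove_epsilon_rules` is a hypothetical name.

```python
from itertools import combinations

def remove_epsilon_rules(rules, start):
    """Step 2 sketch: returns a new grammar without ε-rules, except possibly for start."""
    rules = {v: set(bodies) for v, bodies in rules.items()}
    removed = set()                     # variables whose ε-rule was already removed
    while True:
        # pick some ε-rule A → ε where A is not the start variable
        target = next((v for v, bodies in rules.items()
                       if v != start and () in bodies), None)
        if target is None:
            return rules
        rules[target].discard(())
        removed.add(target)
        for head, bodies in rules.items():
            for body in list(bodies):   # snapshot: new rules are added below
                spots = [i for i, s in enumerate(body) if s == target]
                # one new rule per non-empty choice of occurrences to delete,
                # e.g. R → uAvAw gives R → uvAw, R → uAvw and R → uvw
                for k in range(1, len(spots) + 1):
                    for drop in combinations(spots, k):
                        new = tuple(s for i, s in enumerate(body) if i not in drop)
                        if new == () and head in removed:
                            continue    # R → ε was removed before: don't bring it back
                        bodies.add(new)

step1 = {"S0": {("S",)}, "S": {("A", "S", "A"), ("a", "B")},
         "A": {("B",), ("S",)}, "B": {("b",), ()}}
print(remove_epsilon_rules(step1, "S0"))
```

Each variable's ε-rule is removed at most once, so the loop runs at most once per variable.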
Steps to convert any grammar into CNF: Step 3: Eliminate unit rules. Repeat: remove some unit rule A → B; then, for each rule B → u (u a string of variables and terminals), add A → u, unless A → u has already been removed. Until all unit rules have been removed.
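A matching sketch of Step 3, using the same dict-of-sets representation (an assumption). The loop removes unit rules in whatever order it finds them, which may differ from the order in the worked example; for the example grammar the final result is the same. `remove_unit_rules` is a hypothetical name.

```python
def remove_unit_rules(rules):
    """Step 3 sketch: returns a new grammar with no rules of the form A → B."""
    rules = {v: set(bodies) for v, bodies in rules.items()}
    removed = set()                          # unit rules (A, (B,)) already removed
    while True:
        unit = next(((a, body) for a, bodies in rules.items()
                     for body in bodies
                     if len(body) == 1 and body[0] in rules), None)
        if unit is None:
            return rules
        a, body = unit
        rules[a].discard(body)
        removed.add(unit)
        for u in list(rules[body[0]]):       # for each rule B → u ...
            if (a, u) not in removed:        # ... add A → u unless it was removed
                rules[a].add(u)

after_step2 = {
    "S0": {("S",)},
    "S": {("A", "S", "A"), ("a", "B"), ("a",), ("S", "A"), ("A", "S"), ("S",)},
    "A": {("B",), ("S",)},
    "B": {("b",)},
}
print(remove_unit_rules(after_step2))
```

Every unit rule can be removed at most once, and new rules are copies of existing right-hand sides, so the loop terminates.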
Steps to convert any grammar into CNF: Step 4: Convert the remaining rules. Replace each rule \(A \to u_1u_2\cdots u_k\), where \(k \ge 3\) and each \(u_i\) is a terminal or a variable, with the rules \(A \to u_1A_1\), \(A_1 \to u_2A_2\), \(A_2 \to u_3A_3\), …, \(A_{k-2} \to u_{k-1}u_k\), where the \(A_i\) are new variables. Then, in every rule with k = 2 (including the rules just created), replace each terminal \(u_i\) with a new variable \(U_i\) and add the rule \(U_i \to u_i\).
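Step 4 as a sketch, again using the dict-of-sets representation. Two choices here go beyond the step as stated and are assumptions: a new variable is reused whenever the same suffix or terminal appears again (as the worked example does with A1 and U), and new variables are named A1, A2, … and U_a, U_b, …, assuming those names do not clash with existing variables. `convert_remaining_rules` is a hypothetical name.

```python
def convert_remaining_rules(rules):
    """Step 4 sketch; assumes Steps 1-3 already ran, so no unit rules remain."""
    out = {v: set() for v in rules}
    suffix_var = {}   # suffix tuple -> new variable A_i that derives it
    term_var = {}     # terminal -> new variable U that derives it

    def var_for_terminal(t):
        if t not in term_var:
            term_var[t] = f"U_{t}"           # hypothetical naming scheme
            out[term_var[t]] = {(t,)}        # add the rule U_t → t
        return term_var[t]

    def var_for_suffix(suffix):
        if suffix not in suffix_var:
            name = f"A{len(suffix_var) + 1}" # hypothetical naming scheme
            suffix_var[suffix] = name
            out[name] = set()
            add(name, suffix)                # A_i → rest of the rule, converted recursively
        return suffix_var[suffix]

    def add(head, rhs):
        if len(rhs) <= 1:                    # A → a, or S0 → ε: already proper
            out[head].add(rhs)
        elif len(rhs) == 2:                  # replace terminals by their U variables
            out[head].add(tuple(s if s in out else var_for_terminal(s) for s in rhs))
        else:                                # A → u1 A_1, with A_1 deriving u2 … uk
            add(head, (rhs[0], var_for_suffix(rhs[1:])))

    for head in list(rules):
        for rhs in sorted(rules[head]):      # sorted so new names are deterministic
            add(head, rhs)
    return out

long_rules = {("A", "S", "A"), ("a", "B"), ("a",), ("S", "A"), ("A", "S")}
after_step3 = {"S0": long_rules, "S": long_rules,
               "A": long_rules | {("b",)}, "B": {("b",)}}
print(convert_remaining_rules(after_step3))
```

On the example this reproduces the final grammar, with U_a playing the role of U.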
Example: Start with S → ASA | aB; A → B | S; B → b | ε.
Example: Step 1: add a new start variable and the rule S0 → S: S0 → S; S → ASA | aB; A → B | S; B → b | ε.
Example: Step 2: remove the ε-rule B → ε: S0 → S; S → ASA | aB | a; A → B | S | ε; B → b.
Example: Step 2: remove the ε-rule A → ε: S0 → S; S → ASA | aB | a | SA | AS | S; A → B | S; B → b.
Example: Step 3: remove the unit rule S → S: S0 → S; S → ASA | aB | a | SA | AS; A → B | S; B → b.
Example: Step 3: remove the unit rule S0 → S (for each rule S → u, add S0 → u): S0 → ASA | aB | a | SA | AS; S → ASA | aB | a | SA | AS; A → B | S; B → b.
Example: Step 3: remove the unit rule A → B (from B → b, add A → b): S0 → ASA | aB | a | SA | AS; S → ASA | aB | a | SA | AS; A → S | b; B → b.
Example: Step 3: remove the unit rule A → S (for each rule S → u, add A → u): S0 → ASA | aB | a | SA | AS; S → ASA | aB | a | SA | AS; A → b | ASA | aB | a | SA | AS; B → b.
Example: Step 4: convert the remaining rules (ASA becomes AA1 with A1 → SA, and aB becomes UB with U → a): S0 → AA1 | UB | a | SA | AS; S → AA1 | UB | a | SA | AS; A → b | AA1 | UB | a | SA | AS; A1 → SA; U → a; B → b.
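A checker for the CNF conditions, applied to the final grammar of the example, confirms that Step 4 produced Chomsky normal form. It uses the same dict-of-sets representation as the step sketches (an assumption); `is_cnf` is a hypothetical name.

```python
def is_cnf(rules, start):
    """Every rule must be A → BC (B, C non-start variables), A → a (a terminal),
    or start → ε. Variables are the dict keys; other symbols are terminals."""
    for head, bodies in rules.items():
        for body in bodies:
            if len(body) == 2 and all(s in rules and s != start for s in body):
                continue
            if len(body) == 1 and body[0] not in rules:
                continue
            if len(body) == 0 and head == start:
                continue
            return False
    return True

# Final grammar of the example.
example = {
    "S0": {("A", "A1"), ("U", "B"), ("a",), ("S", "A"), ("A", "S")},
    "S": {("A", "A1"), ("U", "B"), ("a",), ("S", "A"), ("A", "S")},
    "A": {("b",), ("A", "A1"), ("U", "B"), ("a",), ("S", "A"), ("A", "S")},
    "A1": {("S", "A")},
    "U": {("a",)},
    "B": {("b",)},
}
print(is_cnf(example, "S0"))  # True
```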
Pushdown Automata
Pushdown automata: Pushdown automata (PDAs) are like nondeterministic finite automata but have an extra component called a stack. They can push symbols onto the stack and pop them (read them back) later. The stack is potentially unbounded.
[Figure: schematic of a PDA: a state control reads the input (a a b a) and has access to a stack holding x, y, z.]
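Formal Definition: A pushdown automaton is a 6-tuple \((Q, \Sigma, \Gamma, \delta, q_0, F)\), where Q is a finite set of states; Σ is a finite set of symbols called the input alphabet; Γ is a finite set called the stack alphabet; \(\delta : Q \times \Sigma_\varepsilon \times \Gamma_\varepsilon \to \mathcal{P}(Q \times \Gamma_\varepsilon)\) is the transition function, where \(\Sigma_\varepsilon = \Sigma \cup \{\varepsilon\}\) and \(\Gamma_\varepsilon = \Gamma \cup \{\varepsilon\}\); \(q_0 \in Q\) is the start state; and \(F \subseteq Q\) is the set of accept (final) states.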
Conventions: Question: when is the stack empty? Start by pushing a $ onto the stack; if you see it again, the stack is empty. Question: when is the input string empty? It doesn't matter: accept states accept only if the input has been exhausted.
Notation: A transition labeled a, b → c means: read a from the input, pop b from the stack, and push c onto the stack. Meaning of ε: if a = ε, don't read any input; if b = ε, don't pop any symbol; if c = ε, don't push any symbol.
Example: Recall \(\{0^n1^n \mid n \text{ is positive}\}\), which is not regular. Consider the following PDA: read the input symbols; for each 0, push it onto the stack; as soon as a 1 is seen, pop a 0 for each 1 read; accept if the stack is empty when the last symbol has been read; reject if the stack is non-empty at the end, if input symbols remain once the stack is empty, if a 0 is read after a 1, etc.
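The procedure just described, sketched directly with a Python list as the stack. It is deterministic because this particular language needs no guessing; the bottom marker $ follows the conventions slide, and `accepts_0n1n` is a hypothetical name. Since n must be positive here, the empty string is rejected.

```python
def accepts_0n1n(w):
    """Push each 0, pop one 0 per 1, and accept only if the stack is back to
    the bottom marker exactly when the input runs out (n >= 1)."""
    stack = ["$"]                  # bottom marker, so "stack empty" is detectable
    seen_one = False
    for ch in w:
        if ch == "0":
            if seen_one:           # a 0 after a 1: reject
                return False
            stack.append("0")
        elif ch == "1":
            seen_one = True
            if stack[-1] != "0":   # only $ left: this 1 has no matching 0
                return False
            stack.pop()
        else:
            return False           # not an input symbol of this language
    return seen_one and stack == ["$"]

print([accepts_0n1n(w) for w in ["01", "0011", "001", "011", "010", ""]])
# [True, True, False, False, False, False]
```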
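Example: [Figure: state diagram of a PDA recognizing \(\{0^n1^n \mid n \text{ is positive}\}\); its transitions are labeled ε,ε → $ (push the bottom marker), 0,ε → 0 (push each 0), 1,0 → ε (pop a 0 for each 1), and ε,$ → ε (pop the marker once all 0s are matched).]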
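Example: [Figure: state diagram of a nondeterministic PDA recognizing \(\{a^ib^jc^k \mid i = j \text{ or } i = k\}\); its transitions are labeled ε,ε → $; a,ε → a; several ε,ε → ε; b,a → ε; b,ε → ε; c,a → ε; c,ε → ε; and ε,$ → ε.]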
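Below is a sketch of a general nondeterministic PDA simulator, run on one arrangement of the transitions for \(\{a^ib^jc^k \mid i = j \text{ or } i = k\}\) that is consistent with the labels in the diagram. After pushing the a's, the machine guesses (with ε,ε → ε moves) whether to match the a's against the b's or against the c's. The state names q1 to q7 and this exact arrangement are assumptions. The simulator explores configurations breadth-first and accepts when it reaches an accept state with the input exhausted. It terminates here because, apart from the single initial $, the stack only grows when an a is read.

```python
from collections import deque

EPS = ""  # ε: read nothing / pop nothing / push nothing

# (state, read, pop) -> list of (new state, push); hypothetical state names.
DELTA = {
    ("q1", EPS, EPS): [("q2", "$")],
    ("q2", "a", EPS): [("q2", "a")],
    ("q2", EPS, EPS): [("q3", EPS), ("q5", EPS)],  # guess: match a's with b's, or with c's
    ("q3", "b", "a"): [("q3", EPS)],               # branch i = j
    ("q3", EPS, "$"): [("q4", EPS)],
    ("q4", "c", EPS): [("q4", EPS)],
    ("q5", "b", EPS): [("q5", EPS)],               # branch i = k: skip the b's
    ("q5", EPS, EPS): [("q6", EPS)],
    ("q6", "c", "a"): [("q6", EPS)],
    ("q6", EPS, "$"): [("q7", EPS)],
}
START, ACCEPT = "q1", {"q4", "q7"}

def pda_accepts(w, delta=DELTA, start=START, accept=ACCEPT):
    """Breadth-first search over configurations (state, input position, stack);
    the stack is a tuple whose last element is the top."""
    initial = (start, 0, ())
    seen = {initial}
    queue = deque([initial])
    while queue:
        state, pos, stack = queue.popleft()
        if state in accept and pos == len(w):
            return True            # accept only once the input is exhausted
        reads = [EPS] + ([w[pos]] if pos < len(w) else [])
        pops = [EPS] + ([stack[-1]] if stack else [])
        for a in reads:
            for b in pops:
                for new_state, c in delta.get((state, a, b), []):
                    new_stack = stack[:-1] if b else stack
                    if c:
                        new_stack = new_stack + (c,)
                    config = (new_state, pos + (1 if a else 0), new_stack)
                    if config not in seen:
                        seen.add(config)
                        queue.append(config)
    return False

print([pda_accepts(w) for w in ["aabbc", "aabcc", "aabc"]])  # [True, True, False]
```

Swapping in a different `delta` table (for example, the 0ⁿ1ⁿ machine) reuses the same simulator unchanged.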
Theorem: A language is context-free if and only if some pushdown automaton recognizes it. Proof: we will skip it (those interested may read the book). Corollary: Every regular language is a context-free language, since a finite automaton is simply a PDA that ignores its stack. [Figure: the regular languages drawn as a subset of the context-free languages.]
Conclusions: Context-free grammars (definition, ambiguity, Chomsky normal form); pushdown automata (definition). Next: Part C, Computability Theory.