Languages & Grammars
Grammars
A set of rules that govern the structure of a language.
[Figure: example sentences built from "Fritz" or "The dog" followed by "ate" or "left"]
Formal Grammar Notation
G = {V,T,S,P}
–V are variables
–T are terminals (Fritz)
–S is the start variable
–P are the production rules
Let W be a string of variables and terminals.
W ⇒ Y means that W can be transformed into Y using the production rules.
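A minimal sketch of this notation as data, assuming every symbol is a single character. The class name `Grammar`, its field names, and the methods `is_well_formed` and `step` are hypothetical choices made for illustration. The example grammar S -> aSb | ab is the one used later in the deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    variables: frozenset   # V
    terminals: frozenset   # T
    start: str             # S
    productions: tuple     # P, as (left, right) pairs of strings

    def is_well_formed(self):
        """Check S is in V, V and T are disjoint, and rules use only known symbols."""
        symbols = self.variables | self.terminals
        return (
            self.start in self.variables
            and not (self.variables & self.terminals)
            and all(
                set(left) <= symbols
                and set(right) <= symbols
                and any(c in self.variables for c in left)
                for left, right in self.productions
            )
        )

    def step(self, w):
        """Return every Y with w => Y: rewrite one occurrence of a left side."""
        out = set()
        for left, right in self.productions:
            i = w.find(left)
            while i != -1:
                out.add(w[:i] + right + w[i + len(left):])
                i = w.find(left, i + 1)
        return out


g = Grammar(frozenset("S"), frozenset("ab"), "S", (("S", "aSb"), ("S", "ab")))
print(g.is_well_formed())      # True
print(sorted(g.step("aSb")))   # ['aaSbb', 'aabb']
```

Storing left sides as strings rather than single variables keeps the sketch usable for grammars that are not context-free.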
Languages + Grammars
A grammar defines a language.
Many grammars can define the same language.
Grammars that generate the same language are equivalent.
Formal languages
Forget words; letters only
S is usually the start symbol
What language?
S -> aA | λ
A -> bS
Exercises... or Puzzles?
Write a grammar that generates: L = {a^n b^m : n < 2, m <= 2}
Write a grammar that generates: L = {a^n b^m : n > 0, m >= 0}
Write a grammar that generates all strings over {a,b} with exactly 2 a's
Write a grammar that generates: L = {a^n b^n : n > 0}
Write a grammar that generates: L = {w ∈ {a}* : |w| mod 4 = 0}
Mathy Questions
Suppose a grammar for L1 has start symbol S1 and a grammar for L2 has start symbol S2. What grammar describes
–L1 ∪ L2?
–L1L2?
–L1*?
Can you prove that your answer is correct?
Can you prove your neighbor's answer is wrong?
Recursion
Generate a+
–S -> aS | a
Generate a*
–S -> aS | λ
Impose Order
Language: a+b+
–S -> AB
–A -> aA | a
–B -> bB | b
Relationship between symbols
Language: a^n b^n, n > 0
–S -> aSb | ab
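To see what these grammar patterns actually produce, one option is to enumerate short sentences by brute force. The sketch below is a hypothetical helper named `generate`, not a standard algorithm from the slides. It assumes uppercase letters are variables, lowercase letters are terminals, and the grammar has no λ-productions, so any sentential form longer than the bound can be discarded.

```python
from collections import deque


def generate(productions, start="S", max_len=4):
    """Breadth-first search over sentential forms.

    Returns all sentences of length <= max_len, shortest first.
    Assumes no lambda-productions, so forms never shrink.
    """
    sentences = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, or None if the form is a sentence.
        idx = next((i for i, c in enumerate(form) if c.isupper()), None)
        if idx is None:
            sentences.add(form)
            continue
        for rhs in productions[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(sentences, key=lambda s: (len(s), s))


print(generate({"S": ["aS", "a"]}, max_len=3))
# ['a', 'aa', 'aaa']
print(generate({"S": ["AB"], "A": ["aA", "a"], "B": ["bB", "b"]}, max_len=3))
# ['ab', 'aab', 'abb']
print(generate({"S": ["aSb", "ab"]}, max_len=4))
# ['ab', 'aabb']
```

Expanding only the leftmost variable is enough here, because every sentence of a context-free grammar has a leftmost derivation.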
Context-Free Grammars
A grammar is context-free if every production rule has exactly one non-terminal, and nothing else, on its left-hand side.
A -> aSa
A -> AB
A -> a
Not context-free: ABB -> aaSB
Members of CFLs
To see if a string is a member of a CFL, begin with the start symbol and repeatedly replace a non-terminal with the right side of one of its production rules until the string is produced.
S -> AB
A -> aaA | λ
B -> Bb | λ
Derivation of string aab:
S ⇒ AB ⇒ aaAB ⇒ aaB ⇒ aaBb ⇒ aab
So aab is a sentence in the language, and aaAB is a sentential form.
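A derivation like this one can be checked mechanically. The sketch below assumes uppercase letters are variables and writes λ as the empty string. The function names `is_derivation` and `is_sentence` are hypothetical.

```python
def is_derivation(steps, productions):
    """True if each form follows from the previous one by rewriting one variable."""
    for before, after in zip(steps, steps[1:]):
        ok = any(
            before[:i] + rhs + before[i + 1:] == after
            for i, symbol in enumerate(before)
            for rhs in productions.get(symbol, [])
        )
        if not ok:
            return False
    return True


def is_sentence(form):
    """A sentence contains terminals only; a sentential form may contain variables."""
    return not any(c.isupper() for c in form)


# Grammar from the slide; "" encodes lambda.
P = {"S": ["AB"], "A": ["aaA", ""], "B": ["Bb", ""]}
steps = ["S", "AB", "aaAB", "aaB", "aaBb", "aab"]
print(is_derivation(steps, P))   # True
print(is_sentence("aab"))        # True
print(is_sentence("aaAB"))       # False
```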
Theory
Not all context-free languages are regular.
Example: a^n b^n
Can you write a CF grammar?
Can you write an automaton?
Theory
BUT: All regular languages are also context-free.
How could we prove this?
[Figure: Reg drawn inside CF]
Automata to Grammar
States are non-terminals
Alphabet letters are the terminals
Start state corresponds to the start symbol
A transition from state A to state B on letter a becomes the rule A -> aB
Final states go to lambda
S -> aA
A -> aA | aS | bB | C
B -> bC
C -> λ
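The conversion can be sketched in a few lines. The automaton diagram is not part of this text, so the transition list below is an assumption, reconstructed backwards from the grammar above. The function name `automaton_to_grammar` is hypothetical, and a λ-move is written as the empty string.

```python
def automaton_to_grammar(transitions, start, finals):
    """Build a right-linear grammar from automaton transitions.

    transitions: iterable of (state, letter, next_state); letter "" is a lambda-move.
    Returns (start_symbol, productions), with "" on a right side meaning lambda.
    """
    productions = {}
    for state, letter, nxt in transitions:
        # Each transition becomes the rule state -> letter next_state.
        productions.setdefault(state, []).append(letter + nxt)
    for state in finals:
        # Each final state may stop: state -> lambda.
        productions.setdefault(state, []).append("")
    return start, productions


# Assumed automaton, reconstructed from the grammar on the slide.
transitions = [
    ("S", "a", "A"),
    ("A", "a", "A"),
    ("A", "a", "S"),
    ("A", "b", "B"),
    ("A", "", "C"),
    ("B", "b", "C"),
]
start, P = automaton_to_grammar(transitions, "S", ["C"])
for variable, rules in P.items():
    print(variable, "->", " | ".join(r or "λ" for r in rules))
```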
Derivations
A leftmost derivation always replaces the leftmost variable in the sentential form next.
A rightmost derivation always replaces the rightmost variable.
Derivations can be shown using a derivation tree.
Example
S -> aAB
A -> bBb
B -> A | λ
Derive the string abbbb
Leftmost: S ⇒ aAB ⇒ abBbB ⇒ abAbB ⇒ abbBbbB ⇒ abbbbB ⇒ abbbb
Rightmost:
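The leftmost derivation can be replayed mechanically by listing which right-hand side is used at each step. This is a sketch with a hypothetical helper `derive`. It treats any symbol that has productions as a variable and writes λ as the empty string. Passing `leftmost=False` rewrites the rightmost variable instead, which is one way to check a hand-written answer to the rightmost case.

```python
def derive(start, choices, productions, leftmost=True):
    """Apply one right-hand side per step to the leftmost (or rightmost) variable."""
    form = start
    steps = [form]
    for rhs in choices:
        positions = [i for i, c in enumerate(form) if c in productions]
        if not positions:
            raise ValueError(f"no variable left to rewrite in {form!r}")
        i = positions[0] if leftmost else positions[-1]
        if rhs not in productions[form[i]]:
            raise ValueError(f"{form[i]} -> {rhs!r} is not a production")
        form = form[:i] + rhs + form[i + 1:]
        steps.append(form)
    return steps


# Grammar from the slide; "" encodes lambda.
P = {"S": ["aAB"], "A": ["bBb"], "B": ["A", ""]}
steps = derive("S", ["aAB", "bBb", "A", "bBb", "", ""], P)
print(" => ".join(steps))
# S => aAB => abBbB => abAbB => abbBbbB => abbbbB => abbbb
```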
Derivation Trees
Ordered tree in which:
–Interior nodes are left-hand sides of rules (variables)
–Children of a node are the symbols of the right-hand side, in order
–Root is the start symbol
–Leaves are terminals (or λ)
Reading the leaves from left to right gives the yield of the tree (a sentence in the language)
Example:
S -> aAB
A -> bBb
B -> A | λ
Show derivation tree for derivation of abbbb
S
  a
  A
    b
    B
      A
        b
        B
          λ
        b
    b
  B
    λ
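Reading off the yield is a left-to-right walk over the leaves. A minimal sketch, assuming a tree is stored as a (label, children) pair and a λ leaf contributes nothing. The names `leaf` and `tree_yield` are hypothetical.

```python
def leaf(symbol):
    """A leaf node: a terminal or λ, with no children."""
    return (symbol, [])


def tree_yield(node):
    """Concatenate the leaves from left to right, skipping λ."""
    label, children = node
    if not children:
        return "" if label == "λ" else label
    return "".join(tree_yield(child) for child in children)


# Derivation tree for abbbb under S -> aAB, A -> bBb, B -> A | λ.
tree = ("S", [
    leaf("a"),
    ("A", [
        leaf("b"),
        ("B", [("A", [leaf("b"), ("B", [leaf("λ")]), leaf("b")])]),
        leaf("b"),
    ]),
    ("B", [leaf("λ")]),
])
print(tree_yield(tree))  # abbbb
```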
Ambiguity
A context-free grammar is ambiguous if some string in its language has more than one derivation tree.
Programming languages should not be ambiguous.
Try to find an equivalent, unambiguous grammar, but it can't always be done.
If there is any unambiguous grammar for a language, the language is unambiguous.
If the language has no unambiguous grammar, the language is inherently ambiguous.
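One way to demonstrate ambiguity is to count derivation trees for a single string: a count above 1 means the grammar is ambiguous. The sketch below uses a hypothetical expression grammar, E -> E+E | E*E | a, which is not from the slides. It assumes single-character symbols, no λ-productions, and no unit productions, which guarantees the recursion terminates. The name `count_trees` is also hypothetical.

```python
from functools import lru_cache


def count_trees(productions, start, target):
    """Count derivation trees of `target` from `start`.

    Assumes no lambda-productions and no unit productions.
    """

    @lru_cache(maxsize=None)
    def count_symbol(symbol, s):
        if symbol not in productions:          # terminal: must match exactly
            return 1 if s == symbol else 0
        return sum(count_seq(rhs, s) for rhs in productions[symbol])

    @lru_cache(maxsize=None)
    def count_seq(seq, s):
        if not seq:
            return 1 if not s else 0
        if len(seq) == 1:
            return count_symbol(seq, s)
        head, rest = seq[0], seq[1:]
        # Each symbol derives at least one character, so keep len(rest) for the tail.
        return sum(
            count_symbol(head, s[:k]) * count_seq(rest, s[k:])
            for k in range(1, len(s) - len(rest) + 1)
        )

    return count_symbol(start, target)


P = {"E": ["E+E", "E*E", "a"]}          # hypothetical example grammar
print(count_trees(P, "E", "a+a"))       # 1
print(count_trees(P, "E", "a+a*a"))     # 2, so the grammar is ambiguous
```

The two trees for a+a*a correspond to grouping the string as (a+a)*a or as a+(a*a).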
Exercises
Grammar:
–S -> Sa | SSb | b | λ
Give a rightmost derivation of the string bbaaba
Draw its derivation tree
Show that the grammar is ambiguous