Computability Joke. Context-free grammars Parsing. Chomsky Homework: Design grammar for [simple] computer language
Proof by induction: Requires the subject domain to be classified by natural numbers: 0 or 1 or some starting point, and then all numbers following. Prove a starting case, for example, N=1. Then prove either
    FOR ALL k: if it is true for k, then it is true for k+1, or
    FOR ALL k: if it is true for every p<k, then it is true for k.
Think of the induction step as a shortcut to proving the theorem for 2, 3, 4, 5, … SO, with my screaming capital letters as a hint, what was wrong with the "All horses are the same color" proof?
Preview on proofs: Another typical form of proof is by construction: build / design the FSM, etc. Another is by contradiction: assume the result is false and show that this leads to some falsehood. One category is: assume you can make a list of all Xs…. then some special example must be on the list, but then ….
Hierarchy: Moving from languages defined by FSMs (aka finite state automata), equivalent to non-deterministic FSMs, equivalent to regular expressions, to languages defined by context-free grammars, equivalent to [non-deterministic] push-down automata (PDAs). It will turn out that deterministic PDAs are less powerful. An FSM can be considered a special type of PDA.
Grammar: A grammar has a [finite] alphabet A (sometimes called terminals) plus a finite set V of variables. The starting symbol S is a member of V. A production rule is a mapping/substitution of strings. A grammar has a finite set of production rules. A context-free grammar has production rules of the form: a single variable X goes to a string of symbols from A and V,
    X → string of letters from A and V
A non-context-free production rule would be aXb → adWb, meaning, when X is in the context of a and b, then you can substitute dW. Rules with the same left side can be combined using |.
Derivations: Starting from the starting symbol and applying the rules until there are no more variables, just terminals, is a derivation. A string is in the language defined by the grammar if there is a derivation of it. Think of the variables as the parts of speech.
Example: Let the alphabet of terminals be: (, ), +, *, v, w, x, y, z. Let the variables be
    E, the starting symbol, think of it as expression
    F, factor
    OP, operator (I use two letters for readability)
Rules are
    E → ( E ) | E OP E | F
    F → v | w | x | y | z
    OP → + | *
Sample derivation:
    E → E OP E
    E → ( E )
    E → E OP E
    E → F
    F → v
    OP → +
    F → w
    etc. FINISH!
Draw as a tree. Trees in computer science are upside down!
Parsing a string is producing a set of rules, often recorded using a tree, that derive (cover) the string. So for the string (x+y):
    E → ( E )
    E → E OP E
    E → F, F → x
    OP → +
    E → F, F → y
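The rules listed for (x+y) can be replayed mechanically to check that they really cover the string. The following is a minimal sketch, not part of the original slides: it encodes the example grammar (E, F, OP) as a Python dictionary and applies each chosen right-hand side to the leftmost remaining variable. The names GRAMMAR and leftmost_derivation are made up for illustration.

```python
# Example grammar from the slides; each variable maps to its alternatives.
GRAMMAR = {
    "E": [["(", "E", ")"], ["E", "OP", "E"], ["F"]],
    "F": [["v"], ["w"], ["x"], ["y"], ["z"]],
    "OP": [["+"], ["*"]],
}

def leftmost_derivation(start, choices, grammar=GRAMMAR):
    """Apply each right-hand side in `choices` to the leftmost variable.

    Returns the list of sentential forms, starting with the start symbol.
    """
    form = [start]
    history = [" ".join(form)]
    for rhs in choices:
        # Position of the leftmost symbol that is still a variable.
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:
            raise ValueError("no variable left to replace")
        var = form[idx]
        if list(rhs) not in grammar[var]:
            raise ValueError(f"{var} -> {' '.join(rhs)} is not a rule")
        form[idx:idx + 1] = rhs
        history.append(" ".join(form))
    return history

steps = leftmost_derivation("E", [
    ["(", "E", ")"],    # E -> ( E )
    ["E", "OP", "E"],   # E -> E OP E
    ["F"], ["x"],       # E -> F, F -> x
    ["+"],              # OP -> +
    ["F"], ["y"],       # E -> F, F -> y
])
for form in steps:
    print(form)
```

The last line printed is ( x + y ), so this sequence of rules is one derivation of the string. Reading the same list from the string back up to E is the parsing direction.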
Parsing: If there isn't a parse tree, then the string isn't in the language, though showing that no parse tree exists may require some proof…
Derivation vs Parsing: Opposite directions. A derivation starts from the starting symbol and generates a string; the goal of parsing is to start from the string and find a derivation that generates it. In compiling, parsing produces information that directs the compiler to generate code.
Exercises: Produce the tree(s) for
    x
    x + (v*w)
    x + y * z
    (x*y)+(v*w)
When are trees the same and when are they different? Ambiguity is when the trees are really different, not just expanded in a different order. This will be made formal next.
Leftmost derivation: A derivation of a string w in a grammar G is a leftmost derivation if at each step the leftmost remaining variable is the one replaced. A string is derived ambiguously in a CF grammar if it has two or more different leftmost derivations. A grammar G is ambiguous if it generates some string ambiguously.
Compare for ambiguity:
Grammar 1: variables E, T (for term), F (for factor), alphabet { a, +, *, (, ) }. Rules:
    E → E+T | T
    T → T*F | F
    F → (E) | a
Grammar 2: variable E, alphabet { a, +, *, (, ) }. Rules:
    E → E+E | E*E | (E) | a
Try each on the strings:
    a+a*a
    a+(a*a)
    (a+a)*a
    a+a+a+a
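One way to check answers to this comparison is to count parse trees by brute force, since each parse tree corresponds to exactly one leftmost derivation. The sketch below is an illustration under stated assumptions, not a general parser: it assumes the grammar has no ε rules and no cycles of unit rules (true for both grammars here), and the names count_trees, G1, and G2 are made up for this example.

```python
from functools import lru_cache

# Grammar 1 (E, T, F) and Grammar 2 (E only); right-hand sides are tuples.
G1 = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("a",)],
}
G2 = {
    "E": [("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")"), ("a",)],
}

def count_trees(rules, start, s):
    """Number of distinct parse trees of s from start.

    Assumes no epsilon rules and no unit-rule cycles, so every symbol
    covers at least one character and the recursion terminates.
    """
    @lru_cache(maxsize=None)
    def count_sym(sym, i, j):
        if sym not in rules:                       # terminal symbol
            return 1 if j - i == 1 and s[i] == sym else 0
        return sum(count_seq(rhs, i, j) for rhs in rules[sym])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        head, rest = rhs[0], rhs[1:]
        if not rest:
            return count_sym(head, i, j)
        total = 0
        # head covers s[i:k]; leave at least one character per later symbol
        for k in range(i + 1, j - len(rest) + 1):
            left = count_sym(head, i, k)
            if left:
                total += left * count_seq(rest, k, j)
        return total

    return count_sym(start, 0, len(s))

for text in ["a+a*a", "a+(a*a)", "(a+a)*a", "a+a+a+a"]:
    print(text, count_trees(G1, "E", text), count_trees(G2, "E", text))
```

A count above 1 means the string is derived ambiguously in that grammar. Grammar 1 gives 1 for every test string, while Grammar 2 gives 2 for a+a*a and 5 for a+a+a+a.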
Regular languages: All regular languages are context free languages! Proof: Consider the FSM that recognizes a language. Define the following context-free grammar:
    the alphabet for the FSM is the terminal alphabet
    let each state of the FSM be a variable
    let the initial state be the initial variable
Rules are:
    if there is an arrow from state V to state W labeled with letter a, then add the production rule V → aW
    if state X is an accepting state, add the rule X → ε
So… the strings generated by the grammar are exactly the strings in the language.
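The construction in this proof is mechanical enough to sketch in code. The example below is an illustration only: the two-state machine (accepting strings over {0, 1} that end in 1), the state names q0 and q1, and the helper names dfa_to_grammar and generate are all assumptions made for the example, not part of the slides.

```python
def dfa_to_grammar(transitions, start, accepting):
    """Build grammar rules from an FSM.

    transitions maps (state, letter) -> next state.
    Each rule is (variable, right_side); right_side is (letter, variable)
    for V -> aW, or the empty tuple for X -> epsilon.
    """
    rules = []
    for (state, letter), target in transitions.items():
        rules.append((state, (letter, target)))    # V -> aW
    for state in accepting:
        rules.append((state, ()))                  # X -> epsilon
    return start, rules

def generate(start, rules, max_len):
    """All terminal strings of length <= max_len derivable from start."""
    results = set()
    frontier = [("", start)]   # (terminals so far, the single variable left)
    while frontier:
        prefix, var = frontier.pop()
        for lhs, rhs in rules:
            if lhs != var:
                continue
            if rhs == ():
                results.add(prefix)                # derivation ends here
            elif len(prefix) < max_len:
                letter, nxt = rhs
                frontier.append((prefix + letter, nxt))
    return results

# Hypothetical FSM: accepts strings over {0, 1} that end in 1.
transitions = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q0", ("q1", "1"): "q1",
}
start_var, rules = dfa_to_grammar(transitions, "q0", {"q1"})
print(sorted(generate(start_var, rules, 2)))   # ['01', '1', '11']
```

Every sentential form in such a derivation has the shape "terminals so far, then one variable", and that variable is the state the FSM would be in after reading those terminals.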
CF languages: Each regular language is CF, but not vice versa… Recall B = {0^i 1^i | i>=0}. B is the strings with some number of 0s followed by the same number of 1s. This was shown to be non-regular. Let the grammar be
    S → 0S1 | ε
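The two alternatives of this grammar translate directly into a recursive membership test. This is a small sketch for illustration (the function name in_B is made up): the empty string matches S → ε, and anything else must be a 0, then a shorter string in B, then a 1, matching S → 0S1.

```python
def in_B(s):
    """True if s has the form 0^i 1^i, following S -> 0S1 | epsilon."""
    if s == "":
        return True                       # S -> epsilon
    # S -> 0S1: peel one 0 off the front and one 1 off the back
    return s[0] == "0" and s[-1] == "1" and in_B(s[1:-1])

print(in_B("000111"), in_B("0101"))
```

The recursion depth plays the role of the count that a finite state machine cannot keep.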
Chomsky normal form: A CF grammar is in Chomsky normal form if each rule is of the form
    A → BC
    A → a
where A, B, and C are variables, a is any terminal, and B and C are not the start variable S. It is permitted (but not required) to have the rule S → ε, but no other variable can produce the empty string. There are several other normal forms.
Outline of proof for Chomsky NF: Any context free language can be generated by a grammar in Chomsky normal form.
    Create a new start variable to prevent the start variable being on the right.
    Eliminate A → ε rules. If there is a rule R → uAv, add rule R → uv. If R → uAvAw, add R → uvAw | uAvw | uvw.
    Remove unit rules A → B. If B → u, then add A → u (unless previously removed).
    If A → u1u2…uk and k>=3, add new variables Ai and replace with A → u1A1, A1 → u2A2, etc.
    If A → u1u2 and u1 or u2 is a terminal, replace each terminal ui with a new variable Ui and add Ui → ui.
Read on-line, Sipser text on reserve, videos, etc. for complete proof.
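As an illustration of just one step of this outline, the splitting of a long rule A → u1u2…uk into a chain of two-symbol rules can be sketched as below. This is not the full conversion. The function name binarize and the naming scheme for the new variables (A_1, A_2, …) are assumptions for the example, and the sketch assumes those names do not collide with existing variables.

```python
def binarize(lhs, rhs):
    """Split lhs -> rhs into rules whose right sides have at most 2 symbols.

    For k >= 3 symbols this yields lhs -> u1 A_1, A_1 -> u2 A_2, ...,
    ending with a rule for the last two symbols.
    """
    if len(rhs) < 3:
        return [(lhs, list(rhs))]          # already short enough
    rules = []
    current = lhs
    for i, sym in enumerate(rhs[:-2], start=1):
        fresh = f"{lhs}_{i}"               # hypothetical new variable name
        rules.append((current, [sym, fresh]))
        current = fresh
    rules.append((current, list(rhs[-2:])))
    return rules

for rule in binarize("A", ["u1", "u2", "u3", "u4"]):
    print(rule)
```

Applied to S → 0S1, the same function gives S → 0 S_1 and S_1 → S 1. The terminals 0 and 1 still have to be replaced by variables in the final step of the outline.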
Example: B = {0^i 1^i | i>=0}. Convert S → 0S1 | ε to CNF.
    add new start and new rule: S0 → S
    remove S → ε and add S → 01 | 0S1 and S0 → ε
    replace unit rule (S0 → S): S0 → 01 | 0S1 | ε and S → 01 | 0S1
    address other problems by creating new variables:
        S0 → A0A1 | A0A3 | ε
        S → A0A1 | A0A3
        A0 → 0
        A1 → 1
        A3 → SA1
Does this work (produce strings in the pattern)? Claim: yes, because notice that an A3 only arises if there was an A0 before it.
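One way to test the claim on small strings is the CYK algorithm, a standard table-filling membership test that works on grammars in Chomsky normal form. CYK is not covered in these slides, so treat the following as a side sketch. The function name cyk and the way the rules are split into UNARY and BINARY lists are choices made for this example.

```python
# CNF grammar from the example: variable -> terminal rules and
# variable -> two-variable rules, plus the special rule S0 -> epsilon.
UNARY = [("A0", "0"), ("A1", "1")]
BINARY = [
    ("S0", "A0", "A1"), ("S0", "A0", "A3"),
    ("S", "A0", "A1"), ("S", "A0", "A3"),
    ("A3", "S", "A1"),
]

def cyk(word, start="S0", unary=UNARY, binary=BINARY, start_derives_empty=True):
    """Return True if the CNF grammar derives word."""
    n = len(word)
    if n == 0:
        return start_derives_empty
    # table[i][length] = set of variables that derive word[i:i+length]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][1] = {var for var, terminal in unary if terminal == ch}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for var, b, c in binary:
                    if b in left and c in right:
                        table[i][length].add(var)
    return start in table[0][n]

for w in ["", "01", "0011", "001", "10"]:
    print(repr(w), cyk(w))
```

The strings "", "01", and "0011" are accepted, while "001" and "10" are rejected, which agrees with the pattern 0^i 1^i. This checks examples only; it is not a proof of the claim.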
Intuition…. Context free grammars appear to be able to keep track of things…. Even the leftmost derivation rule still has something like recursion.
Preview: Will define push-down automata, a type of machine equivalent to context-free grammars for defining languages. Pumping lemma.
Classwork/Homework: Create a grammar for a simple programming language:
    assignment statements
    if statements
    function calls
    expressions can include function calls as well as operators and parentheses
    terminals are names and numbers (lexical units) plus operators (+ and *) and parentheses, brackets, and ;