Enter Chomsky Grammars
What has Chomsky* to do with computing?
Linguistics and computing intersect in various places. The things used to create computer software (programming languages, compilers, text editors, etc.) all have elements of linguistics. More importantly, any computing problem can be seen as a "language recognition" problem! Even problems that seem only remotely connected to language recognition, such as adding two numbers, can be seen this way. Consider $L = \{ a^m b^n c^x \mid x = m + n \}$, i.e., the strings in which there is one c for each and every occurrence of an a and of a b. Recognizing L amounts to adding two numbers: the a's represent the first number, the b's the second number, and the c's their sum (all in unary form). Thus, the theoretician rightly views the whole business of computing simply as "language recognition". And describing languages in precise ways is definitely an issue …
* Noam Chomsky is, for linguists, what Einstein is for physicists. Chomsky, a Professor of Linguistics at MIT, is also a political analyst well known for his criticisms of US foreign policy.
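To make the addition example concrete, here is a minimal Python sketch of a recognizer for L. The function name in_addition_language is our own (hypothetical), and we assume m and n may be 0, so the empty string encodes 0 + 0 = 0. The recognizer counts the a's, then the b's, then the c's, and accepts only if nothing else follows and the number of c's equals the sum.

```python
def in_addition_language(w: str) -> bool:
    """Return True if w has the form a^m b^n c^x with x = m + n (unary addition)."""
    i = 0
    m = 0
    # Count the leading a's: the first addend.
    while i < len(w) and w[i] == "a":
        m += 1
        i += 1
    n = 0
    # Count the b's that follow: the second addend.
    while i < len(w) and w[i] == "b":
        n += 1
        i += 1
    x = 0
    # Count the trailing c's: the claimed sum.
    while i < len(w) and w[i] == "c":
        x += 1
        i += 1
    # Reject leftover symbols (e.g. an a after a b), then check the sum.
    return i == len(w) and x == m + n
```

For example, aabccc is accepted (2 + 1 = 3), while aabcc is rejected.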
What Chomsky set out to do: "Assuming the set of grammatical sentences of English to be given, we now ask what sort of device can produce this set …" (N. Chomsky, Syntactic Structures, 1957, proposition 3.1.) Chomsky "invented" a powerful theoretical tool, known as grammars, for generating the strings (patterns) of a given type, i.e., a language, and thereby launched a new field: Formal Languages.
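Grammars are "string generators"
[Figure: a language recognizer (e.g., a finite state automaton with states 0, 1, 2, 3) takes strings such as a and abb as input; a language generator (e.g., a grammar) produces strings such as a and abb as output.]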
A fairy tale
God loves chickens and even numbers. He decides to create a world with only chickens, an even number of them, and to allow them to "multiply". (At no point in time would he allow an odd number of chickens in the world.) God is intelligent (!) and does the following trick:
1. He creates the first egg.
2. He creates the following rule: from each egg can come out either two little eggs or two chickens. (Again, from each little egg can come out either two tiny eggs or two chickens, and so on.)
A fairy tale (continued)
The Rule: each egg (shell) can break and let out either two chickens or two little shells, which can in turn let out two chickens or two tiny shells, which can …
[Figure: one possible "chicken world".]
Using this rule, one can generate any even number of chickens.
Grammar rule that mimics bursting shells: Shell → Shell Shell | aa
It generates the language $L = \{ a^{2k} \mid k \ge 1 \}$, i.e., the strings of a's of even length greater than 0.
Grammar rules operate in a way similar to the way "God's rule" works in the fairy tale.
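The bursting-shell rule can be explored by computing every string it produces up to a length bound. This is a minimal sketch (the name shell_language is our own, hypothetical choice; Python 3.9+ assumed for the set[str] hint). A Shell derives either aa or the concatenation of two strings that Shells derive, so repeatedly gluing derived strings together until nothing new appears collects every derivable string up to the bound.

```python
def shell_language(max_len: int) -> set[str]:
    """All strings of length <= max_len derivable from Shell -> Shell Shell | aa."""
    # Base case: the rule Shell -> aa (two chickens).
    derived = {"aa"} if max_len >= 2 else set()
    while True:
        # Shell -> Shell Shell: glue together any two strings derived so far.
        new = {x + y for x in derived for y in derived if len(x) + len(y) <= max_len}
        if new <= derived:  # nothing new appeared: fixed point reached
            return derived
        derived |= new


print(sorted(shell_language(8), key=len))  # ['aa', 'aaaa', 'aaaaaa', 'aaaaaaaa']
```

Only even-length strings of a's ever appear, matching the fairy tale's promise that the number of chickens is never odd.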
The Two Commandments
Anyone who wishes to write grammar rules to generate a language L has to follow these "commandments":
1. Thou shalt not produce any string that is "outside the set" (strings that don't belong to L).
2. Thou shalt produce ALL strings that are "inside the set" (strings that belong to L).
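A finite check can make the two commandments concrete. The sketch below (check_commandments is a hypothetical helper of our own) assumes the grammar's output up to some length is already available as a set of strings, and that L is given as a Python predicate. It returns the strings that break commandment 1 and those that break commandment 2. Such a check only covers strings up to max_len: it can expose a violation, but it cannot prove that a grammar obeys the commandments for all lengths.

```python
from itertools import product
from typing import Callable


def check_commandments(generated: set[str], in_L: Callable[[str], bool],
                       alphabet: str, max_len: int) -> tuple[set[str], set[str]]:
    """Compare a grammar's output with L on all strings of length <= max_len.

    Returns (strings breaking commandment 1, strings breaking commandment 2).
    """
    # Commandment 1: nothing generated may lie outside L.
    outside = {w for w in generated if len(w) <= max_len and not in_L(w)}
    # Commandment 2: every string of L (up to max_len) must be generated.
    missing = set()
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if in_L(w) and w not in generated:
                missing.add(w)
    return outside, missing
```

For example, with L the even-length strings of a's (greater than 0) and a candidate output {aa, aaa, aaaa} checked up to length 6, the helper reports aaa as breaking commandment 1 and aaaaaa as breaking commandment 2.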
An example
Let Σ = {a, b} and $L = \{ w \in \Sigma^* \mid w \text{ is of even length} \}$.
Production rules (substitution rules):
S → SS
S → aa | ab | ba | bb
S → ε
Did you notice? Finite state automata use loops (cycles) to produce patterns repetitively, whereas grammars use recursion (here, the rule S → SS) for the same purpose.
Example: let's see how the string abbaba (which is of even length) can be generated (derived) from this grammar.
Derivation 1: S ⇒ SS ⇒ SSS ⇒ abSS ⇒ abbaS ⇒ abbaba
Derivation 2: S ⇒ SS ⇒ abS ⇒ abSS ⇒ abbaS ⇒ abbaba
A grammar that allows two different left-most* derivations for the same string (as above) is generally not considered "good"; such a grammar is called ambiguous. This example is used just to show how a grammar works, and one can easily modify this grammar to avoid such multiple derivations.
*In a left-most derivation, the left-most non-terminal (capital-letter symbol) is always expanded first.
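The two derivations of abbaba can be replayed mechanically. In this sketch, RULES and leftmost_derivation are illustrative names of our own, and the empty string "" stands for ε. Each step rewrites the left-most capital letter with the chosen right-hand side, raising an error if that choice is not a rule of the grammar.

```python
# Rules of the even-length grammar; "" stands for the empty string ε.
RULES = {"S": ["SS", "aa", "ab", "ba", "bb", ""]}


def leftmost_derivation(start: str, choices: list[str]) -> list[str]:
    """Apply each chosen right-hand side to the left-most non-terminal in turn."""
    form = start
    steps = [form]
    for rhs in choices:
        # Locate the left-most non-terminal (capital letter).
        i = next((k for k, ch in enumerate(form) if ch.isupper()), None)
        if i is None:
            raise ValueError(f"{form!r} has no non-terminal left to expand")
        if rhs not in RULES[form[i]]:
            raise ValueError(f"{form[i]} -> {rhs!r} is not a rule")
        form = form[:i] + rhs + form[i + 1:]
        steps.append(form)
    return steps


d1 = leftmost_derivation("S", ["SS", "SS", "ab", "ba", "ba"])
d2 = leftmost_derivation("S", ["SS", "ab", "SS", "ba", "ba"])
print(" => ".join(d1))  # S => SS => SSS => abSS => abbaS => abbaba
print(" => ".join(d2))  # S => SS => abS => abSS => abbaS => abbaba
```

Both sequences of choices are legal left-most derivations ending in abbaba, which is exactly the double derivation the slide warns about.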
Grammar: Definition
A grammar G is a 4-tuple (V, Σ, R, S), where
V is a finite set of non-terminals (capital letters),
Σ is a finite set of terminals (small letters),
R is a finite set of rules, and
S ∈ V is the start symbol, a special non-terminal.
V and Σ are disjoint.
All grammar rules that we develop in this course will have only one non-terminal symbol on the left side of "→". A grammar with such a restriction is known as a context-free grammar.
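One way to make the 4-tuple concrete is a small Python class whose constructor enforces the conditions listed in the definition. This is a sketch under our own assumptions: every symbol is a single character, the empty string "" stands for ε, rules are stored as a dictionary from a left side to its right sides, and the names Grammar and Sigma are illustrative. It requires Python 3.9+ for the built-in generic type hints.

```python
from dataclasses import dataclass


@dataclass
class Grammar:
    """A grammar G = (V, Sigma, R, S) with single-character symbols."""
    V: frozenset[str]         # non-terminals (capital letters)
    Sigma: frozenset[str]     # terminals (small letters)
    R: dict[str, list[str]]   # left side -> right sides; "" means ε
    S: str                    # start symbol

    def __post_init__(self) -> None:
        if self.V & self.Sigma:
            raise ValueError("V and Sigma must be disjoint")
        if self.S not in self.V:
            raise ValueError("the start symbol must be a non-terminal")
        symbols = self.V | self.Sigma
        for lhs, rhss in self.R.items():
            # Context-free restriction: the left side is one non-terminal.
            if lhs not in self.V:
                raise ValueError(f"left side {lhs!r} is not a single non-terminal")
            for rhs in rhss:
                if any(ch not in symbols for ch in rhs):
                    raise ValueError(f"unknown symbol in {lhs} -> {rhs!r}")
```

For example, the even-length grammar from the previous example can be written as Grammar(frozenset("S"), frozenset("ab"), {"S": ["SS", "aa", "ab", "ba", "bb", ""]}, "S"), whereas a rule with left side aS is rejected because it is not context-free.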
Exercises (see next page for answers to some of the questions)
$L_1 = \{ a^n \mid n \ge 1 \}$ (one or more a's)
$L_2 = \{ a^n \mid n \ge 1 \} \cup \{ b^n \mid n \ge 1 \}$ (one or more a's OR one or more b's)
$L_3 = \{ a^m b^n \mid m \ge 0, n \ge 1 \}$ (zero or more a's followed by one or more b's)
$L_4 = \{a, b\}^* - \{\varepsilon\}$ (any non-empty combination of a's and b's)
$L_5 = \{ x\,abb \mid x \in \{a, b\}^* \}$ (zero or more a's and b's followed by abb)
$L_6 = \{ w \in \{a, b\}^* : |w| = 2 \}$ (a's and b's of length 2)
$L_7 = \{ w \in \{a, b\}^* : |w| \ne 2 \}$ (a's and b's of length other than 2)
$L_8 = \{ w \in \{a, b\}^* : |w| \text{ is even} \}$ (a's and b's of even length)
$L_9 = \{ w \in \{a, b\}^* : |w| \text{ is odd} \}$ (a's and b's of odd length)
$L_{10} = \{ w \in \{a, b\}^* : w \text{ does not have two consecutive } b\text{'s} \}$
$L_{11} = \{ w w^R \mid w \in \{a, b\}^* \}$ (any string followed by its reverse)
$L_{12} = \{ w \in \{a, b\}^* \mid w \text{ is a palindrome} \}$
$L_{13} = \{ w \in \{[, ]\}^* \mid w \text{ has balanced brackets} \}$
$L_{14} = \{ w \in \{a, b\}^* \mid w \text{ contains } bbb \}$
$L_{15} = \{ a^n b^n \mid n > 0 \}$
$L_{16} = \{ a^n b^m c^m d^n \mid n, m > 0 \}$
$L_{17} = \{ a^m b^n \mid m, n > 0,\ m < n \}$
$L_{18} = \{ a^m b^n \mid m, n > 0,\ m \ne n \}$
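When testing a grammar you have written for one of these exercises, it helps to have a membership test for the target language to compare against (for instance, with strings generated up to some length). The sketch below writes each test directly from the set definitions; it is not a grammar and gives no rules away. MEMBER and balanced are names of our own (hypothetical), and the regular expressions use Python's standard re.fullmatch, which also rejects any symbol outside the stated alphabet.

```python
import re


def balanced(w: str) -> bool:
    """L13: the brackets balance and never close more than have been opened."""
    depth = 0
    for ch in w:
        if ch not in "[]":
            return False
        depth += 1 if ch == "[" else -1
        if depth < 0:
            return False
    return depth == 0


def ab(w: str) -> bool:
    """True if w uses only the symbols a and b (including the empty string)."""
    return re.fullmatch(r"[ab]*", w) is not None


# Membership tests written directly from the set definitions of L1 to L18.
MEMBER = {
    1: lambda w: re.fullmatch(r"a+", w) is not None,
    2: lambda w: re.fullmatch(r"a+|b+", w) is not None,
    3: lambda w: re.fullmatch(r"a*b+", w) is not None,
    4: lambda w: re.fullmatch(r"[ab]+", w) is not None,
    5: lambda w: re.fullmatch(r"[ab]*abb", w) is not None,
    6: lambda w: ab(w) and len(w) == 2,
    7: lambda w: ab(w) and len(w) != 2,
    8: lambda w: ab(w) and len(w) % 2 == 0,
    9: lambda w: ab(w) and len(w) % 2 == 1,
    10: lambda w: ab(w) and "bb" not in w,
    11: lambda w: ab(w) and len(w) % 2 == 0 and w == w[::-1],  # ww^R = even palindrome
    12: lambda w: ab(w) and w == w[::-1],
    13: balanced,
    14: lambda w: ab(w) and "bbb" in w,
    15: lambda w: re.fullmatch(r"a+b+", w) is not None and w.count("a") == w.count("b"),
    16: lambda w: (re.fullmatch(r"a+b+c+d+", w) is not None
                   and w.count("a") == w.count("d") and w.count("b") == w.count("c")),
    17: lambda w: re.fullmatch(r"a+b+", w) is not None and w.count("a") < w.count("b"),
    18: lambda w: re.fullmatch(r"a+b+", w) is not None and w.count("a") != w.count("b"),
}
```

Note that ww^R is exactly the set of even-length palindromes, which is why L11 is tested that way.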
Examples
$L_1 = \{ a^n \mid n \ge 1 \}$ (one or more a's):
S → aS | a
$L_2 = \{ a^n \mid n \ge 1 \} \cup \{ b^n \mid n \ge 1 \}$ (one or more a's OR one or more b's):
S → A | B, A → aA | a, B → bB | b
$L_3 = \{ a^m b^n \mid m \ge 0, n \ge 1 \}$ (zero or more a's followed by one or more b's):
S → AB, A → aA | ε, B → bB | b
$L_4 = \{a, b\}^* - \{\varepsilon\}$ (any non-empty combination of a's and b's):
S → aS | bS | a | b
$L_5 = \{ x\,abb \mid x \in \{a, b\}^* \}$ (zero or more a's and b's followed by abb):
Please work it out yourself!
$L_6 = \{ w \in \{a, b\}^* : |w| = 2 \}$ (a's and b's of length 2):
S → AA, A → a | b
$L_7 = \{ w \in \{a, b\}^* : |w| \ne 2 \}$ (a's and b's of length other than 2):
S → ε | a | b | aaA | abA | baA | bbA, A → aA | bA | a | b
$L_8 = \{ w \in \{a, b\}^* : |w| \text{ is even} \}$ (a's and b's of even length):
S → aaS | bbS | abS | baS | ε
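Answers like these can be checked against the two commandments by listing every string a grammar derives up to some length and comparing with the set definition. The sketch below is our own (generate and strings_up_to are hypothetical names) and assumes single-character symbols, capital letters as non-terminals, and "" for ε. It explores left-most derivations, pruning a sentential form once it holds more than max_len terminals or more than 2 * max_len + 2 symbols. That bound is a heuristic that suffices for these example grammars; it is not a general guarantee for every grammar. For L7 we use the rule list S → ε | a | b | aaA | abA | baA | bbA.

```python
from itertools import product


def generate(rules: dict[str, list[str]], start: str, max_len: int) -> set[str]:
    """Terminal strings of length <= max_len reachable by left-most derivations."""
    seen, frontier, words = {start}, [start], set()
    while frontier:
        form = frontier.pop()
        i = next((k for k, ch in enumerate(form) if ch.isupper()), None)
        if i is None:  # no non-terminal left: a finished string
            words.add(form)
            continue
        for rhs in rules[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            terminals = sum(1 for ch in new if not ch.isupper())
            # Heuristic pruning bound (see lead-in), plus a visited set.
            if terminals <= max_len and len(new) <= 2 * max_len + 2 and new not in seen:
                seen.add(new)
                frontier.append(new)
    return words


def strings_up_to(max_len: int) -> list[str]:
    """Every string over {a, b} of length 0 to max_len."""
    return ["".join(p) for n in range(max_len + 1) for p in product("ab", repeat=n)]


L7_RULES = {"S": ["", "a", "b", "aaA", "abA", "baA", "bbA"],
            "A": ["aA", "bA", "a", "b"]}
L8_RULES = {"S": ["aaS", "bbS", "abS", "baS", ""]}

# Both commandments hold for all strings up to length 6.
assert generate(L7_RULES, "S", 6) == {w for w in strings_up_to(6) if len(w) != 2}
assert generate(L8_RULES, "S", 6) == {w for w in strings_up_to(6) if len(w) % 2 == 0}
```

If the two sets differ, the strings only on the grammar's side break commandment 1 and the strings only on the definition's side break commandment 2.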
More examples
$L_9 = \{ w \in \{a, b\}^* : |w| \text{ is odd} \}$ (a's and b's of odd length):
Please work it out yourself!
$L_{10} = \{ w \in \{a, b\}^* : w \text{ does not have two consecutive } b\text{'s} \}$:
S → baS | … (Work out the remaining rules for this language yourself!)
$L_{14} = \{ w \in \{a, b\}^* \mid w \text{ contains } bbb \}$:
S → LbbbL, L → aL | bL | ε
Some of the remaining (harder) problems will be solved in class.
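The L14 grammar can be read directly as a membership test: L → aL | bL | ε derives every string over {a, b}, so S → LbbbL derives w exactly when w splits as x + bbb + y with x and y over {a, b}. The sketch below (derives_L and derives_S are our own, hypothetical names) tries every split point and then confirms, for all strings up to length 8, that this agrees with "contains bbb".

```python
from itertools import product


def derives_L(x: str) -> bool:
    """L -> aL | bL | ε derives exactly the strings over {a, b}."""
    return all(ch in "ab" for ch in x)


def derives_S(w: str) -> bool:
    """S -> L bbb L: try every way of splitting w as x + "bbb" + y."""
    return any(w[i:i + 3] == "bbb" and derives_L(w[:i]) and derives_L(w[i + 3:])
               for i in range(len(w) - 2))


# For every string over {a, b} up to length 8, the grammar accepts
# exactly the strings that contain bbb.
for n in range(9):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        assert derives_S(w) == ("bbb" in w)
```

Note that bbbb splits in two ways (x = ε, y = b and x = b, y = ε), so, like the grammar in the earlier even-length example, this grammar allows two left-most derivations for some strings.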