1
Formal Languages and Grammars
Xiaoyin Wang CS 5363 Spring 2016
2
Last Class Compilers: Introduction Why Compilers? Input and Output
Structure of Compilers Compiler Design
3
Today’s Class Formal Languages What are languages?
What are grammars? Phrase-Structure Grammars Classification of Grammars (and corresponding languages)
4
Intro to Languages English grammar tells us whether a given combination of words is a valid sentence. The syntax of a sentence concerns its form, while the semantics concerns its meaning. Example: "the mouse wrote a poem". From a syntax point of view, this is a valid sentence.
5
Intro to Languages From a semantics point of view, not so fast… perhaps in Disneyland. Natural languages (English, French, etc.) have very complex rules of syntax, and these rules are not necessarily well-defined. Examples: "Long time no see." "Here you are."
6
Formal Language Formal language: well-defined, with a finite set of rules.
The key problem: how to express infinitely many sentences with a finite set of rules? The answer is recursion. Example: the natural numbers {1, 2, 3, …}
7
Grammars A formal grammar G is a compact, precise mathematical definition of a language L. It is not a list of examples. It is finite and fixed. It typically uses recursion to describe an infinite language.
8
Grammars A grammar implies an algorithm that would generate all legal sentences of the language. Often, it takes the form of a set of recursive definitions. An example of a recursive definition: <expression> -> <expression> + 1, <expression> -> 1. It generates {1, 1+1, 1+1+1, …}.
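The recursive definition above can be turned into a tiny generator. This is only a sketch: the function name `expressions` and the `depth` bound are illustrative assumptions, added so that the otherwise infinite language can be listed up to a cutoff.

```python
def expressions(depth):
    """List the sentences obtained with at most `depth` uses of the recursive rule.

    `depth` is a hypothetical cutoff; the language itself is infinite.
    """
    # Base rule: <expression> -> 1
    sentences = ["1"]
    for _ in range(depth):
        # Recursive rule: <expression> -> <expression> + 1
        sentences.append(sentences[-1] + "+1")
    return sentences


print(expressions(3))
```

Two finite rules are enough here because the first rule refers back to `<expression>` itself.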
9
What is Grammar? Answer two key questions:
Is a combination of words a valid sentence in a formal language? How can we generate the valid sentences of a formal language? Grammars provide models for both natural languages and programming languages.
10
Grammars Example: A grammar that generates a subset of the English language
12
A derivation of “the boy sleeps”:
13
A derivation of “a dog runs”:
14
Language of the grammar
L = { “a boy runs”, “a boy sleeps”, “the boy runs”, “the boy sleeps”, “a dog runs”, “a dog sleeps”, “the dog runs”, “the dog sleeps” } Problems with the Grammar?
15
Notation [Figure: a production rule annotated with its parts: the variable (non-terminal) and the terminals, which are symbols of the vocabulary, and the production rule itself.]
16
BCT2083 DISCRETE STRUCTURE & APPLICATIONS
Basic Terminology A vocabulary (or alphabet) V is a finite non-empty set of elements called symbols. Example: V = {a, b, c, A, B, C, S}. A word (or sentence) over V is a string of finite length of elements of V. Example: Aba. The empty (or null) string ε is the string with no symbols. CHAPTER 5
17
Basic Terminology V* is the set of all words over V. Example: V* = {Aba, BBa, bAA, cab, …}. A language over V is a subset of V*. We can give some criteria for a word to be in a language.
18
Phrase-Structure Grammars
A popular way to specify a grammar recursively. A phrase-structure grammar (abbr. PSG) G = (V, T, S, P) is a 4-tuple, in which: V is a vocabulary (set of symbols), the "template vocabulary" of the language. T ⊆ V is a set of symbols called terminals, the actual symbols of the language. Also, N :≡ V − T is a set of special "symbols" called non-terminals (representing concepts like "noun").
19
Phrase-Structure Grammars
S ∈ N is a special non-terminal, the start symbol. In our example the start symbol was "sentence". P is a set of productions (to be defined): rules for substituting one sentence fragment for another. Every production rule must contain at least one non-terminal on its left side.
20
Productions A production p ∈ P is a pair p = (b, a) of sentence fragments a, b. We often denote the production as b → a, read "replace b by a". Call b the "before" string and a the "after" string. b must contain at least one non-terminal.
21
Productions: Examples S -> b, S -> ε, A -> B, S -> AB,
S -> aSb, AS -> BB, aSb -> AB, AaBb -> CD, …
22
Derivation Let G=(V,T,S,P) be a phrase-structure grammar
Let w0 = l z0 r (the concatenation of l, z0, and r) and w1 = l z1 r be strings over V. If z0 → z1 is a production of G, we say that w1 is directly derivable from w0, and we write w0 => w1.
23
Derivation If w0, w1, …, wn are strings over V such that w0 => w1, w1 => w2, …, wn-1 => wn, then we say that wn is derivable from w0, and write w0 =>* wn. The sequence of steps used to obtain wn from w0 is called a derivation.
24
Examples Productions: S -> A, A -> aB, B -> b
Derivations: S => A => aB => ab S =>* ab
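The relations => and =>* can be checked mechanically. The following is a minimal sketch, assuming every symbol is a single character so that a sentence template can be stored as a Python string; the names `step` and `derivable` and the `max_steps` bound are hypothetical helpers, not part of the slides.

```python
def step(w, productions):
    """Return every string w1 with w => w1 (one production applied at one position)."""
    results = set()
    for before, after in productions:
        i = w.find(before)
        while i != -1:
            # w = l + before + r  becomes  w1 = l + after + r
            results.add(w[:i] + after + w[i + len(before):])
            i = w.find(before, i + 1)
    return results


def derivable(start, target, productions, max_steps=10):
    """Check start =>* target using at most `max_steps` direct derivations."""
    frontier = {start}
    for _ in range(max_steps):
        if target in frontier:
            return True
        # Apply one more derivation step to every string reached so far.
        frontier = set().union(*(step(w, productions) for w in frontier))
    return target in frontier


# Productions from the example: S -> A, A -> aB, B -> b
P = [("S", "A"), ("A", "aB"), ("B", "b")]
print(step("S", P))
print(derivable("S", "ab", P))
```

The step bound only keeps the search finite; it is not part of the definition of =>*.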
25
Languages from PSGs The recursive definition of the language L defined by the PSG G = (V, T, S, P): Rule 1: S ∈ LT (LT is L’s template language). The start symbol is a sentence template (member of LT).
26
Languages from PSGs Rule 2: ∀(b→a) ∈ P: ∀l,r ∈ V*: lbr ∈ LT → lar ∈ LT.
Abbreviate this using lbr ⇒ lar (read, "lar is directly derivable from lbr"). Any production, after substituting in any fragment of any sentence template, yields another sentence template. Rule 3: ∀σ ∈ LT: (¬∃n ∈ N: n ∈ σ) → σ ∈ L. All sentence templates that contain no non-terminal symbols are sentences in L.
27
Example Let G = (V, T, S, P), where V = {a, b, A, S}, T = {a, b}, S is the start symbol, and P = {S → aA, A → aa, A → b}.
Process: L(G)T = {S}; L(G)T = {S, aA}; L(G)T = {S, aA, aaa, ab}; L(G) = {aaa, ab}.
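The process in this example is a fixed-point computation: keep applying productions until no new sentence template appears, then keep the templates made only of terminals. A sketch under two assumptions: symbols are single characters, and the grammar has finitely many templates (otherwise the loop would not terminate). The function name `templates_and_language` is illustrative.

```python
def templates_and_language(start, productions, terminals):
    """Compute the template language and L(G) for a grammar with finitely many templates."""
    templates = {start}              # Rule 1: the start symbol is a template
    frontier = {start}
    while frontier:
        new = set()
        for w in frontier:
            for before, after in productions:
                i = w.find(before)
                while i != -1:
                    # Rule 2: l+before+r in LT implies l+after+r in LT
                    new.add(w[:i] + after + w[i + len(before):])
                    i = w.find(before, i + 1)
        frontier = new - templates   # only templates not seen before
        templates |= frontier
    # Rule 3: templates without non-terminals are sentences of L(G)
    language = {w for w in templates if all(c in terminals for c in w)}
    return templates, language


P = [("S", "aA"), ("A", "aa"), ("A", "b")]
templates, language = templates_and_language("S", P, {"a", "b"})
print(sorted(templates))
print(sorted(language))
```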
28
Language Let G = (V, T, S, P) be a phrase-structure grammar. The language generated by G (or the language of G), denoted by L(G), is the set of all strings of terminals that are derivable from the start symbol S: L(G) = {w ∈ T* | S =>* w}
29
Languages of PSG EXAMPLE: Let G = (V, T, S, P), where V = {a, b, A, B, S}, T = {a, b}, S is the start symbol, and P = {S → Ab | aB, A → aS, B → Sb, A → a, B → b}. G is a phrase-structure grammar. What sentences can be generated with this grammar?
30
Languages of PSG P = {S → Ab | aB, A → aS, B → Sb, A → a, B → b}. Language of the grammar:
S => Ab => ab
S => Ab => aSb => aAbb => aabb
…
S => aB => aSb => aaBb => aabb
L = {a^n b^n, n >= 1}
How to prove? Mathematical induction.
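Before attempting a proof, the conjectured language can be sanity-checked by brute force. The sketch below enumerates every sentence of the grammar up to a length bound and compares the result with a^n b^n. It assumes single-character symbols and relies on the fact that no production of this grammar shortens a string, so templates longer than `max_len` can be discarded safely. The name `generate` is hypothetical, and a finite check like this is evidence, not a proof.

```python
def generate(start, productions, terminals, max_len):
    """Return all terminal strings of length <= max_len derivable from `start`."""
    seen = {start}
    frontier = {start}
    while frontier:
        new = set()
        for w in frontier:
            for before, after in productions:
                i = w.find(before)
                while i != -1:
                    w1 = w[:i] + after + w[i + len(before):]
                    if len(w1) <= max_len:   # valid pruning: productions never shrink
                        new.add(w1)
                    i = w.find(before, i + 1)
        frontier = new - seen
        seen |= frontier
    return {w for w in seen if set(w) <= terminals}


P = [("S", "Ab"), ("S", "aB"), ("A", "aS"), ("B", "Sb"), ("A", "a"), ("B", "b")]
found = generate("S", P, {"a", "b"}, max_len=6)
expected = {"a" * n + "b" * n for n in range(1, 4)}
print(sorted(found, key=len))
print(found == expected)
```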
31
Languages of PSG P = {S → Ab | aB, A → aS, B → Sb, A → a, B → b}
L1 = {a^n b^n, n >= 1} (1) Proving L1 ⊆ L(P)
32
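Languages of PSG P = {S → Ab | aB, A → aS, B → Sb, A → a, B → b} L1 = {a^n b^n, n >= 1} (2) Proving L(P) ⊆ L1: use induction on the number of derivation steps. Enumerating all productions, a derivation of x+2 steps must begin with S => Ab => aSb or S => aB => aSb, so it can only derive a string of the form awb, where w is derived from S in x steps. By the induction hypothesis w belongs to L1, so awb also belongs to L1.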
33
Another PSG Example – English Fragment
We have G = (V, T, S, P), where: V = {(sentence), (noun phrase), (verb phrase), (article), (adjective), (noun), (verb), (adverb), a, the, large, hungry, rabbit, mathematician, eats, hops, quickly, wildly} T = {a, the, large, hungry, rabbit, mathematician, eats, hops, quickly, wildly} S = (sentence) P = (see next slide)
34
Productions for our Language
P = {
(sentence) → (noun phrase) (verb phrase)
(noun phrase) → (article) (adjective) (noun)
(noun phrase) → (article) (noun)
(verb phrase) → (verb) (adverb)
(verb phrase) → (verb)
(article) → a
(article) → the
(adjective) → large
(adjective) → hungry
(noun) → rabbit
(noun) → mathematician
(verb) → eats
(verb) → hops
(adverb) → quickly
(adverb) → wildly
}
35
A Sample Sentence Derivation
On each step, we apply a production to a fragment of the previous sentence template to get a new sentence template. Finally, we end up with a sequence of terminals (real words), that is, a sentence of our language L.
(sentence)
⇒ (noun phrase) (verb phrase)
⇒ (article) (adj.) (noun) (verb phrase)
⇒ (art.) (adj.) (noun) (verb) (adverb)
⇒ the (adj.) (noun) (verb) (adverb)
⇒ the large (noun) (verb) (adverb)
⇒ the large rabbit (verb) (adverb)
⇒ the large rabbit hops (adverb)
⇒ the large rabbit hops quickly
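Because this English fragment has no recursive productions, its whole language can be enumerated. The sketch below stores the productions in a dictionary (the name `GRAMMAR` and the function `expand` are illustrative choices) and expands each non-terminal into every word sequence it can derive.

```python
from itertools import product

# Productions of the English fragment; any key is a non-terminal,
# any symbol that is not a key is treated as a terminal word.
GRAMMAR = {
    "sentence": [["noun phrase", "verb phrase"]],
    "noun phrase": [["article", "adjective", "noun"], ["article", "noun"]],
    "verb phrase": [["verb", "adverb"], ["verb"]],
    "article": [["a"], ["the"]],
    "adjective": [["large"], ["hungry"]],
    "noun": [["rabbit"], ["mathematician"]],
    "verb": [["eats"], ["hops"]],
    "adverb": [["quickly"], ["wildly"]],
}


def expand(symbol):
    """Return every sequence of terminal words derivable from `symbol`."""
    if symbol not in GRAMMAR:          # terminal: derives only itself
        return [[symbol]]
    results = []
    for body in GRAMMAR[symbol]:
        # Combine every choice for each symbol on the right-hand side.
        for parts in product(*(expand(s) for s in body)):
            results.append([word for part in parts for word in part])
    return results


sentences = [" ".join(words) for words in expand("sentence")]
print(len(sentences))
print("the large rabbit hops quickly" in sentences)
```

This exhaustive approach terminates only because no non-terminal can derive itself; a recursive grammar would need a depth or length bound.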
36
Another Example Let G = ({a, b, A, B, S}, {a, b}, S,
{S → ABa, A → BB, B → ab, AB → b}), where the four components are V, T, S, and P. One possible derivation in this grammar is: S ⇒ ABa ⇒ Aaba ⇒ BBaba ⇒ Bababa ⇒ abababa.
37
Defining the PSG Types Type 0: Phrase-structure grammars: no restrictions on the production rules. Type 1: Context-sensitive PSGs: all after fragments are either at least as long as the corresponding before fragments, or empty: if b → a, then |b| ≤ |a| ∨ a = ε.
38
Defining the PSG Types Type 2: Context-free PSGs: all before fragments have length 1 and are non-terminals: if b → a, then |b| = 1 (b ∈ N).
Type 3: Regular PSGs: all before fragments have length 1 and are non-terminals, and all after fragments are either single terminals or a pair of a terminal followed by a non-terminal: if b → a, then a ∈ T ∨ a ∈ TN.
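The type definitions can be read as checks on individual productions. The sketch below returns the most restrictive type that all productions of a grammar satisfy. It assumes single-character symbols, non-terminals and terminals passed as sets, and the non-contracting reading |b| ≤ |a| for Type 1; the function name `grammar_type` and the sample grammars are illustrative.

```python
def grammar_type(productions, nonterminals, terminals):
    """Return 3, 2, 1, or 0: the most restrictive type met by every production."""

    def regular(before, after):
        single_terminal = len(after) == 1 and after in terminals
        terminal_then_nonterminal = (
            len(after) == 2 and after[0] in terminals and after[1] in nonterminals
        )
        return before in nonterminals and (single_terminal or terminal_then_nonterminal)

    def context_free(before, after):
        # Membership in a set of single characters implies |before| = 1.
        return before in nonterminals

    def context_sensitive(before, after):
        # Assumption: "not shorter" (|b| <= |a|), or an empty after fragment.
        return len(before) <= len(after) or after == ""

    for level, check in ((3, regular), (2, context_free), (1, context_sensitive)):
        if all(check(b, a) for b, a in productions):
            return level
    return 0


N, T = {"S", "A", "B"}, {"a", "b", "c"}
print(grammar_type([("S", "aB"), ("B", "bB"), ("B", "a")], N, T))
print(grammar_type([("S", "aSa"), ("S", "b")], N, T))
print(grammar_type([("S", "aSBc"), ("S", "abc"), ("cB", "Bc"), ("bB", "bb")], N, T))
print(grammar_type([("S", "ABa"), ("A", "BB"), ("B", "ab"), ("AB", "b")], N, T))
```

Note that this classifies a grammar, not a language: a language may have a more restrictive grammar than the one examined.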
39
Types of Grammars - Chomsky hierarchy of languages
Venn Diagram of Grammar Types: Type 0 – Phrase-structure Grammars Type 1 – Context-Sensitive Type 2 – Context-Free Type 3 – Regular
40
Examples: Regular Grammars
Only the last symbol on the right-hand side can be a non-terminal. E.g., P = {S → aB, B → bB, B → a} generates the language ab*a. The restriction can be placed on either the right or the left, e.g., P = {S → Ba, B → Bb, B → a}. Such grammars are also called right-linear and left-linear grammars. Why are regular languages not suitable for programming languages?
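A right-linear grammar can be run like a nondeterministic finite automaton: the non-terminal still to be expanded plays the role of the state. The sketch below tests membership this way for the first grammar above and compares the result with the regular expression `ab*a` on all short words. The triple encoding `(non-terminal, terminal, next non-terminal or None)` and the name `accepts` are assumptions made for this illustration.

```python
import re
from itertools import product

# S -> aB, B -> bB, B -> a; None marks a production with no trailing non-terminal.
RIGHT_LINEAR = [("S", "a", "B"), ("B", "b", "B"), ("B", "a", None)]


def accepts(word, productions, start="S"):
    """Membership test that treats pending non-terminals as automaton states."""
    states = {start}
    for ch in word:
        # Follow every production whose terminal matches the next character.
        states = {nxt for (lhs, t, nxt) in productions if lhs in states and t == ch}
    # The word is derived exactly when the derivation can be finished (state None).
    return None in states


words = ["".join(p) for n in range(0, 7) for p in product("ab", repeat=n)]
agree = all(accepts(w, RIGHT_LINEAR) == bool(re.fullmatch(r"ab*a", w)) for w in words)
print(accepts("abba", RIGHT_LINEAR), accepts("ab", RIGHT_LINEAR), agree)
```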
41
Examples: Context Free Grammars
Only one non-terminal on the left. E.g., P = {S → aSa, S → b} generates the language a^n b a^n, n >= 0. Widely used to specify programming languages. Example of a non-context-free language? {a^n b^n c^n, n >= 1}
42
Examples: Context Sensitive Grammars
The left side may contain multiple symbols. P = {S → aSBc | abc, cB → Bc, bB → bb} generates the language a^n b^n c^n, n >= 1. Example derivation: S => aSBc => aabcBc => aabBcc => aabbcc. How to generate a^n b^n c^n d^n, n >= 1? P = {S → aBSCd | abcd, Ba → aB, dC → Cd, cC → cc, Bb → bb}
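Derivations in such grammars can be found by breadth-first search over sentence templates. The sketch below assumes single-character symbols and uses the fact that neither grammar above has a production that shortens a string, so any template longer than the target can be skipped. The name `find_derivation` is hypothetical.

```python
from collections import deque


def find_derivation(start, target, productions):
    """Return a shortest derivation start => ... => target as a list, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        w = queue.popleft()
        if w == target:
            path = []
            while w is not None:       # walk back to the start symbol
                path.append(w)
                w = parent[w]
            return path[::-1]
        for before, after in productions:
            i = w.find(before)
            while i != -1:
                w1 = w[:i] + after + w[i + len(before):]
                # Productions never shrink a string, so longer templates are useless.
                if len(w1) <= len(target) and w1 not in parent:
                    parent[w1] = w
                    queue.append(w1)
                i = w.find(before, i + 1)
    return None


ABC = [("S", "aSBc"), ("S", "abc"), ("cB", "Bc"), ("bB", "bb")]
ABCD = [("S", "aBSCd"), ("S", "abcd"), ("Ba", "aB"), ("dC", "Cd"),
        ("cC", "cc"), ("Bb", "bb")]

print(" => ".join(find_derivation("S", "aabbcc", ABC)))
print(find_derivation("S", "aabcbc", ABC))
print(" => ".join(find_derivation("S", "aabbccdd", ABCD)))
```

The search is exponential in general; it is meant for checking small examples by hand, not as a practical parser.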
43
Examples: Context Sensitive Grammars
For the English grammar: "an apple" and "a book"
<noun_phrase> → <article> <noun>
<noun> → <vowel_noun> | <other_noun>
<article> <vowel_noun> → an <vowel_noun>
<article> <other_noun> → a <other_noun>
44
Unrestricted grammars
No restrictions on the productions. Membership can be undecidable. Their languages are those accepted by Turing machines. Cannot be easily described by a simple mathematical formula.
45
Property of Grammars: Regular Language
Closure Properties: Intersection, Union, Complement, Concatenation, Kleene Star. Decidable Properties: Membership, Emptiness, Containment.
46
Property of Grammars: Context Free Language
Closure Properties: Union, Concatenation, Kleene Star. Decidable Properties: Membership (cubic in the sentence length), Emptiness (linear in the number of productions).
47
Property of Grammars: Context Sensitive Language
Closure Properties: Union, Intersection, Complement, Concatenation, Kleene Star. Decidable Properties: Membership (PSPACE-complete).
48
Today’s Class Formal Languages What are languages?
What are grammars? Phrase-Structure Grammars Classification of Grammars (and corresponding languages)
49
Next Class Context Free Grammar Grammar Norms CYK Algorithm
Derivation Tree and Ambiguity