CSCI 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong Context-free.


1 CSCI 3130: Automata theory and formal languages
Andrej Bogdanov
http://www.cse.cuhk.edu.hk/~andrejb/csc3130
The Chinese University of Hong Kong
Context-free languages
Fall 2010

2 Context-free grammar
A → 0A1
A → B
B → #
A, B are variables. 0, 1, # are terminals. A is the start variable.
This is a derivation: A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000#111

3 Context-free grammar
A context-free grammar (CFG) is (V, Σ, R, S) where
– V is a finite set of variables or non-terminals
– Σ is a finite set of terminals (V ∩ Σ = ∅)
– R is a set of productions or substitution rules of the form A → α, where A is a variable in V and α is a string of variables and terminals
– S is a variable called the start variable
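The definition can be written down directly as a data structure. The following is a minimal Python 3 sketch, not part of the original slides: the names CFG and is_well_formed are illustrative assumptions, and the example grammar is the one from slide 2.

```python
from typing import NamedTuple

class CFG(NamedTuple):
    variables: frozenset   # V
    terminals: frozenset   # Sigma
    rules: tuple           # R: pairs (A, alpha), alpha a tuple of symbols
    start: str             # S

def is_well_formed(g: CFG) -> bool:
    """Check the conditions listed in the definition of a CFG."""
    if g.variables & g.terminals:        # V and Sigma must be disjoint
        return False
    if g.start not in g.variables:       # S must be a variable
        return False
    symbols = g.variables | g.terminals
    # Every rule A -> alpha needs A in V and alpha over V and Sigma.
    return all(
        head in g.variables and all(s in symbols for s in body)
        for head, body in g.rules
    )

# The grammar from slide 2: A -> 0A1, A -> B, B -> #
G = CFG(
    variables=frozenset({"A", "B"}),
    terminals=frozenset({"0", "1", "#"}),
    rules=(("A", ("0", "A", "1")), ("A", ("B",)), ("B", ("#",))),
    start="A",
)
print(is_well_formed(G))  # True
```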

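4 The grammar of English
[Figure: parse tree of the sentence "a girl with a flower likes the boy". The words are tagged ART NOUN PREP ART NOUN VERB ART NOUN; the internal nodes are CMPLX-NOUN, PREP-PHRASE, NOUN-PHRASE, CMPLX-VERB and VERB-PHRASE, under the root SENTENCE.]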

5 The grammar of English
SENTENCE → NOUN-PHRASE VERB-PHRASE
NOUN-PHRASE → CMPLX-NOUN
NOUN-PHRASE → CMPLX-NOUN PREP-PHRASE
VERB-PHRASE → CMPLX-VERB
VERB-PHRASE → CMPLX-VERB PREP-PHRASE
PREP-PHRASE → PREP CMPLX-NOUN
CMPLX-NOUN → ARTICLE NOUN
CMPLX-VERB → VERB NOUN-PHRASE
CMPLX-VERB → VERB
ARTICLE → a
ARTICLE → the
NOUN → boy
NOUN → girl
NOUN → flower
VERB → likes
VERB → touches
VERB → sees
PREP → with
variables: SENTENCE, NOUN-PHRASE, …
terminals: a, the, boy, girl, flower, likes, touches, sees, with
start variable: SENTENCE
This grammar describes (a part of) English

6 Derivations in English
(1) SENTENCE → NOUN-PHRASE VERB-PHRASE
(2) NOUN-PHRASE → CMPLX-NOUN
(3) NOUN-PHRASE → CMPLX-NOUN PREP-PHRASE
(4) VERB-PHRASE → CMPLX-VERB
(5) VERB-PHRASE → CMPLX-VERB PREP-PHRASE
(6) PREP-PHRASE → PREP CMPLX-NOUN
(7) CMPLX-NOUN → ARTICLE NOUN
(8) CMPLX-VERB → VERB NOUN-PHRASE
(9) CMPLX-VERB → VERB
(10) ARTICLE → a
(11) ARTICLE → the
(12) NOUN → boy
(13) NOUN → girl
(14) NOUN → flower
(15) VERB → likes
(16) VERB → touches
(17) VERB → sees
(18) PREP → with
Derivation (the rule applied at each step is shown in parentheses):
SENTENCE
⇒ NOUN-PHRASE VERB-PHRASE (1)
⇒ CMPLX-NOUN VERB-PHRASE (2)
⇒ ARTICLE NOUN VERB-PHRASE (7)
⇒ a NOUN VERB-PHRASE (10)
⇒ a boy VERB-PHRASE (12)
⇒ a boy CMPLX-VERB (4)
⇒ a boy VERB (9)
⇒ a boy sees (17)
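The derivation of "a boy sees" can be replayed mechanically. This is a minimal Python 3 sketch, not taken from the slides: the rule numbering follows the order in which the rules are listed, and the helper name apply_rule is an assumption. Each rule is applied to the leftmost occurrence of its left-hand side.

```python
# Rules of the English grammar, numbered in the order they are listed.
RULES = {
    1: ("SENTENCE", ["NOUN-PHRASE", "VERB-PHRASE"]),
    2: ("NOUN-PHRASE", ["CMPLX-NOUN"]),
    3: ("NOUN-PHRASE", ["CMPLX-NOUN", "PREP-PHRASE"]),
    4: ("VERB-PHRASE", ["CMPLX-VERB"]),
    5: ("VERB-PHRASE", ["CMPLX-VERB", "PREP-PHRASE"]),
    6: ("PREP-PHRASE", ["PREP", "CMPLX-NOUN"]),
    7: ("CMPLX-NOUN", ["ARTICLE", "NOUN"]),
    8: ("CMPLX-VERB", ["VERB", "NOUN-PHRASE"]),
    9: ("CMPLX-VERB", ["VERB"]),
    10: ("ARTICLE", ["a"]),
    11: ("ARTICLE", ["the"]),
    12: ("NOUN", ["boy"]),
    13: ("NOUN", ["girl"]),
    14: ("NOUN", ["flower"]),
    15: ("VERB", ["likes"]),
    16: ("VERB", ["touches"]),
    17: ("VERB", ["sees"]),
    18: ("PREP", ["with"]),
}

def apply_rule(form, rule_no):
    """Rewrite the leftmost occurrence of the rule's variable in `form`."""
    head, body = RULES[rule_no]
    i = form.index(head)          # raises ValueError if the rule does not apply
    return form[:i] + body + form[i + 1:]

form = ["SENTENCE"]
for rule_no in [1, 2, 7, 10, 12, 4, 9, 17]:
    form = apply_rule(form, rule_no)
    print(" ".join(form), f"({rule_no})")
# The last line printed is: a boy sees (17)
```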

7 Grammars for programming languages
E → E + E
E → E * E
E → ( E )
E → 0
E → 1
…
E → 9
Variables: E
Terminals: + * ( ) 0 1 2 3 4 5 6 7 8 9
E ⇒ E * E ⇒ ( E ) * E ⇒ ( E + E ) * E ⇒ (2 + E ) * E ⇒ (2 + 3) * E ⇒ (2 + 3) * 5
(2 + 3) * 5 meaning: “add 2 and 3, and then multiply by 5”
bash-3.2$ python
Python 2.6.5 (r265:79359, Mar 24 2010, 01:32:55)
>>> (2+3)*5
25

8 Notation and conventions
E → E + E
E → E * E
E → ( E )
E → N
N → 0N
N → 1N
N → 0
N → 1
Shorthand:
E → E + E | E * E | ( E ) | N
N → 0N | 1N | 0 | 1
Variables: E, N
Terminals: +, *, (, ), 0, 1
Start variable: E
Conventions: variables are in UPPERCASE; the start variable comes first

9 Derivation
E → E + E | E * E | ( E ) | N
N → 0N | 1N | 0 | 1
A derivation is a sequential application of productions:
E ⇒ E * E ⇒ ( E ) * E ⇒ ( E ) * N ⇒ ( E + E ) * N ⇒ ( E + E ) * 1 ⇒ ( E + N ) * 1 ⇒ ( N + N ) * 1 ⇒ ( N + 1N ) * 1 ⇒ ( N + 10) * 1 ⇒ (1 + 10) * 1
α ⇒ β: β is obtained from α in one production
α ⇒* β: β is obtained from α in zero or more productions
E ⇒* (1 + 10) * 1
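The relation "obtained in one production" is easy to compute for a given form. The sketch below is an illustration in Python 3, not part of the slides: it writes forms as strings without spaces, the name one_step is an assumption, and the derivation of (1+10)*1 is listed with every single-production step written out.

```python
# Grammar: E -> E+E | E*E | (E) | N,  N -> 0N | 1N | 0 | 1
RULES = {"E": ["E+E", "E*E", "(E)", "N"], "N": ["0N", "1N", "0", "1"]}

def one_step(form):
    """All forms obtainable from `form` by applying exactly one production."""
    out = set()
    for i, sym in enumerate(form):
        for body in RULES.get(sym, []):
            out.add(form[:i] + body + form[i + 1:])
    return out

derivation = [
    "E", "E*E", "(E)*E", "(E)*N", "(E+E)*N", "(E+E)*1",
    "(E+N)*1", "(N+N)*1", "(N+1N)*1", "(N+10)*1", "(1+10)*1",
]

# Each form must be reachable from the previous one in a single production.
ok = all(b in one_step(a) for a, b in zip(derivation, derivation[1:]))
print(ok)  # True
```

Since each consecutive pair is related by ⇒, the whole list witnesses E ⇒* (1+10)*1.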

10 Context-free languages
The language of a CFG is the set of all strings of terminals that can be derived from the start variable:
L(G) = {w : w ∈ Σ* and S ⇒* w}
Questions we will ask:
– I give you a CFG, what is the language?
– I give you a language, write a CFG for it
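For a small grammar, the short strings of L(G) can be listed by searching over derivations. This is a rough Python 3 sketch under stated assumptions, not a general algorithm from the slides: the function name language_up_to is made up, symbols are single characters, and the search only terminates for grammars in which no chain of rules can loop without adding a terminal (true for the grammar A → 0A1 | B, B → # used here).

```python
from collections import deque

def language_up_to(rules, start, max_len):
    """Strings of terminals of length <= max_len derivable from `start`.

    Breadth-first search over sentential forms, always expanding the
    leftmost variable. Forms with more than max_len terminals are dropped.
    """
    found, seen, queue = set(), {(start,)}, deque([(start,)])
    while queue:
        form = queue.popleft()
        idx = next((i for i, s in enumerate(form) if s in rules), None)
        if idx is None:                       # no variables left: a string of L(G)
            found.add("".join(form))
            continue
        for body in rules[form[idx]]:
            new = form[:idx] + tuple(body) + form[idx + 1:]
            terminals = sum(1 for s in new if s not in rules)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(found, key=lambda w: (len(w), w))

rules = {"A": ["0A1", "B"], "B": ["#"]}
print(language_up_to(rules, "A", 5))  # ['#', '0#1', '00#11']
```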

11 Analysis example 1
A → 0A1 | B
B → #
Can you derive:
00#11   A ⇒ 0A1 ⇒ 00A11 ⇒ 00B11 ⇒ 00#11
#       A ⇒ B ⇒ #
00#111  No, the numbers of 0s and 1s are not equal
00##11  No, there are too many #
L(G) = {0^n # 1^n : n ≥ 0}

12 Analysis example 1
A → 0A1 | B
B → #
variables: A, B
terminals: 0, 1, #
start variable: A
Can you derive: 00#11, 00#111, 00##11, #
What is the language of this CFG?
L = {0^n # 1^n : n ≥ 0}
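Once the language is known to be {0^n # 1^n : n ≥ 0}, the four test strings can be checked directly against that description. A minimal Python 3 sketch (the function name in_language is an assumption for illustration):

```python
def in_language(w: str) -> bool:
    """True if w has the form 0^n # 1^n for some n >= 0."""
    left, sep, right = w.partition("#")   # split at the first #
    return (
        sep == "#"
        and set(left) <= {"0"}            # only 0s before the #
        and set(right) <= {"1"}           # only 1s after it (so no second #)
        and len(left) == len(right)
    )

for w in ["00#11", "00#111", "00##11", "#"]:
    print(repr(w), in_language(w))
# '00#11' True, '00#111' False, '00##11' False, '#' True
```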

13 Analysis example 2
S → SS | (S) | ε
Can you derive:
()      S ⇒ (S) (2) ⇒ () (3), numbering the three productions from left to right
(()())  S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())

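14 Parse trees
S → SS | (S) | ε
A parse tree gives a more compact representation:
S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())
[Figure: parse tree for (()()). The root S has children ( S ); that S has children S S; each of these has children ( S ), and the innermost S goes to ε.]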

15 Parse trees
One parse tree can represent several derivations:
S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())
S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ (()S) ⇒ (()(S)) ⇒ (()())
S ⇒ (S) ⇒ (SS) ⇒ (S(S)) ⇒ ((S)(S)) ⇒ (()(S)) ⇒ (()())
S ⇒ (S) ⇒ (SS) ⇒ (S(S)) ⇒ (S()) ⇒ ((S)()) ⇒ (()())
[Figure: the parse tree for (()()) shared by all of these derivations.]
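One way to see that a tree stands for several derivations is to fix an expansion order and read a single derivation off the tree. The sketch below, in Python 3, is an illustration and not part of the slides: it encodes a tree node as a pair (variable, children), an assumed representation, and always expands the leftmost variable.

```python
def render(form):
    """Write a form as a string; unexpanded nodes show their variable."""
    return "".join(x[0] if isinstance(x, tuple) else x for x in form)

def leftmost_derivation(tree):
    """Sentential forms of the leftmost derivation described by `tree`.

    A node is (variable, children); a child is a node or a terminal string.
    A node with no children stands for a production with body ε.
    """
    form = [tree]
    steps = [render(form)]
    while True:
        i = next((k for k, x in enumerate(form) if isinstance(x, tuple)), None)
        if i is None:                     # only terminals left
            return steps
        form[i:i + 1] = form[i][1]        # replace the node by its children
        steps.append(render(form))

# Parse tree of (()()) for S -> SS | (S) | ε
pair = ("S", ["(", ("S", []), ")"])       # S -> (S) with the inner S -> ε
tree = ("S", ["(", ("S", [pair, pair]), ")"])

print(" => ".join(leftmost_derivation(tree)))
# S => (S) => (SS) => ((S)S) => (()S) => (()(S)) => (()())
```

Expanding the rightmost variable instead would read a different derivation off the same tree.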

16 Analysis example 2
S → SS | (S) | ε
Can you derive:
(()()   No, because the numbers of ( and ) are not equal
())())  No, because there is a prefix with an excess of )

17 Analysis example 2
S → SS | (S) | ε
L(G) = {w : w has the same number of ( and ), and no prefix of w has more ) than (}
Parsing rules:
– Divide w up into blocks with the same number of ( and )
– Each block is in L(G)
– Parse each block recursively
[Figure: parse tree for the example string (()())(), built block by block.]
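Both conditions in the description of L(G) can be checked with a single counter, and the same counter finds the blocks. A minimal Python 3 sketch, with the function names in_language and blocks chosen for illustration; blocks assumes its input is already in L(G) and returns the shortest possible blocks.

```python
def in_language(w):
    """Same number of ( and ), and no prefix with more ) than (."""
    depth = 0
    for c in w:
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
        else:
            return False          # only ( and ) are terminals
        if depth < 0:             # some prefix has an excess of )
            return False
    return depth == 0

def blocks(w):
    """Divide a string of L(G) into shortest blocks with equal ( and )."""
    out, depth, start = [], 0, 0
    for i, c in enumerate(w):
        depth += 1 if c == "(" else -1
        if depth == 0:            # the counter returns to 0: a block ends here
            out.append(w[start:i + 1])
            start = i + 1
    return out

print(in_language("(()())()"), blocks("(()())()"))   # True ['(()())', '()']
print(in_language("(()()"), in_language("())())"))   # False False
```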

18 Design example 1
L = {0^n 1^n | n ≥ 0}
These strings have recursive structure:
000000111111
0000011111
00001111
000111
0011
01
ε
S → 0S1 | ε

19 Design example 2
L = numbers without leading zeros
allowed: 0, 109, 2, 23
not allowed: ε, 01, 003
S → 0 | LN
N → ND | ε
D → 0 | L
L → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Example: 1052870032 is a leading digit L followed by any number N
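The grammar says a number is either the single digit 0 or a nonzero leading digit followed by any digits. That description also happens to be a regular expression, which gives a quick way to test the examples. A small Python 3 sketch using the standard re module (the name is_number is an assumption):

```python
import re

# S -> 0 | LN : either "0", or a leading digit 1-9 followed by any digits.
NUMBER = re.compile(r"0|[1-9][0-9]*")

def is_number(w: str) -> bool:
    """True if w is a number without leading zeros."""
    return NUMBER.fullmatch(w) is not None

print([w for w in ["0", "109", "2", "23", "", "01", "003"] if is_number(w)])
# ['0', '109', '2', '23']
```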

20 Design examples
L = {0^n 1^n 0^m 1^m | n ≥ 0, m ≥ 0}
Examples: 010011, 000111, 00110011
These strings have two parts:
L1 = {0^n 1^n | n ≥ 0}
L2 = {0^m 1^m | m ≥ 0}
L = L1 L2
Rules for L1: S1 → 0 S1 1 | ε
L2 is the same as L1
S → S1 S1
S1 → 0 S1 1 | ε

21 Design examples
L = {0^n 1^m 0^m 1^n | n ≥ 0, m ≥ 0}
Examples: 011001, 1100, 0011, 00110011
These strings have nested structure:
inner part: 1^m 0^m
outer part: 0^n 1^n
S → 0S1 | I
I → 1I0 | ε
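The nested structure maps directly onto two recursive functions, one per variable. The following Python 3 sketch is illustrative only: outer plays the role of S (apply S → 0S1 n times, then S → I) and inner plays the role of I (apply I → 1I0 m times, then I → ε); both names are assumptions.

```python
def inner(m: int) -> str:
    """String derived from I using I -> 1I0 exactly m times, then I -> ε."""
    return "" if m == 0 else "1" + inner(m - 1) + "0"

def outer(n: int, m: int) -> str:
    """String derived from S using S -> 0S1 exactly n times, then S -> I."""
    return inner(m) if n == 0 else "0" + outer(n - 1, m) + "1"

print(outer(1, 2), outer(0, 2), outer(2, 0), outer(2, 2))
# 011001 1100 0011 00110011
```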

22 Design examples
L = {x : x has two 0-blocks with the same number of 0s}
allowed: 01011, 001011001, 10010101001
not allowed: 01001000, 01111
Example: 10010011010010110 = A B C (initial part, middle part, final part)
A: ε, or ends in 1
C: ε, or begins with 1

23 Design examples
Example: 10010011010010110 = A B C
A: ε, or ends in 1
C: ε, or begins with 1
U: any string
D: begins and ends in 1
B has recursive structure: in 00110100 the part D = 1101 is surrounded by the same number of 0s on each side, at least one 0
S → ABC
A → ε | U1
U → 0U | 1U | ε
C → ε | 1U
B → 0D0 | 0B0
D → 1U1 | 1
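The grammar can be cross-checked against a direct test of the property. The sketch below, in Python 3, reads a 0-block as a maximal run of 0s, which is what the grammar enforces by making A end in 1, C begin with 1, and D begin and end in 1. The function name is an assumption for illustration.

```python
import re
from collections import Counter

def has_two_equal_zero_blocks(x: str) -> bool:
    """True if two maximal runs of 0s in x have the same length."""
    lengths = Counter(len(run) for run in re.findall(r"0+", x))
    return any(count >= 2 for count in lengths.values())

allowed = ["01011", "001011001", "10010101001", "10010011010010110"]
not_allowed = ["01001000", "01111"]
print([has_two_equal_zero_blocks(x) for x in allowed])      # [True, True, True, True]
print([has_two_equal_zero_blocks(x) for x in not_allowed])  # [False, False]
```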

24 Context-free versus regular
Write a CFG for the language (0 + 1)*111
S → U111
U → 0U | 1U | ε
Can you do so for every regular language?
Every regular language is context-free
[Figure: diagram relating regular expression, NFA and DFA.]

25 From regular to context-free
regular expression      CFG
∅                       grammar with no rules
ε                       S → ε
a (alphabet symbol)     S → a
E1 + E2                 S → S1 | S2
E1 E2                   S → S1 S2
E1*                     S → S S1 | ε
Here S1 and S2 are the start symbols of the grammars for E1 and E2.
In all cases, S becomes the new start symbol
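The table is a recursive construction: build grammars for the subexpressions, then add the rules for the new start symbol. Below is a minimal Python 3 sketch of that recursion. The representation is an assumption made for illustration: regular expressions are nested tuples such as ("union", e1, e2), variables are named S1, S2, … in the order they are created, and a rule with an empty body stands for ε.

```python
from itertools import count

def regex_to_cfg(expr):
    """Return (start, rules) for a regular expression given as nested tuples.

    rules is a list of (variable, body) pairs, body being a list of symbols.
    """
    fresh = count(1)
    rules = []

    def build(e):
        s = f"S{next(fresh)}"             # new start symbol for this subexpression
        kind = e[0]
        if kind == "empty":               # ∅: grammar with no rules
            pass
        elif kind == "eps":               # ε: S -> ε
            rules.append((s, []))
        elif kind == "sym":               # a: S -> a
            rules.append((s, [e[1]]))
        elif kind == "union":             # E1 + E2: S -> S1 | S2
            s1, s2 = build(e[1]), build(e[2])
            rules.append((s, [s1]))
            rules.append((s, [s2]))
        elif kind == "concat":            # E1 E2: S -> S1 S2
            s1, s2 = build(e[1]), build(e[2])
            rules.append((s, [s1, s2]))
        elif kind == "star":              # E1*: S -> S S1 | ε
            s1 = build(e[1])
            rules.append((s, [s, s1]))
            rules.append((s, []))
        else:
            raise ValueError(f"unknown expression kind: {kind}")
        return s

    start = build(expr)
    return start, rules

# (0 + 1)*
start, rules = regex_to_cfg(("star", ("union", ("sym", "0"), ("sym", "1"))))
print(start)
for head, body in rules:
    print(head, "->", " ".join(body) or "ε")
```

For (0 + 1)* this yields start symbol S1 with rules S1 → S1 S2 | ε, S2 → S3 | S4, S3 → 0, S4 → 1.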

26 Context-free versus regular
Is every context-free language regular?
S → 0S1 | ε
L = {0^n 1^n : n ≥ 0} is context-free but not regular
[Figure: the regular languages drawn inside the context-free languages.]

