CSC 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong Context-free.

Slides:



Advertisements
Similar presentations
1 Parsing The scanner recognizes words The parser recognizes syntactic units Parser operations: Check and verify syntax based on specified syntax rules.
Advertisements

CFGs and PDAs Sipser 2 (pages ). Long long ago…
Grammars, constituency and order A grammar describes the legal strings of a language in terms of constituency and order. For example, a grammar for a fragment.
CFGs and PDAs Sipser 2 (pages ). Last time…
CS5371 Theory of Computation
Context-Free Grammars Lecture 7
January 14, 2015CS21 Lecture 51 CS21 Decidability and Tractability Lecture 5 January 14, 2015.
1 The Parser Its job: –Check and verify syntax based on specified syntax rules –Report errors –Build IR Good news –the process can be automated.
Chapter 3: Formal Translation Models
1 Context-Free Languages. 2 Regular Languages 3 Context-Free Languages.
Specifying Languages CS 480/680 – Comparative Languages.
Lecture 9UofH - COSC Dr. Verma 1 COSC 3340: Introduction to Theory of Computation University of Houston Dr. Verma Lecture 9.
Prof. Busch - LSU1 Context-Free Languages. Prof. Busch - LSU2 Regular Languages Context-Free Languages.
Fall 2005Costas Busch - RPI1 Context-Free Languages.
S YNTAX. Outline Programming Language Specification Lexical Structure of PLs Syntactic Structure of PLs Context-Free Grammar / BNF Parse Trees Abstract.
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
Chapter 4 Context-Free Languages Copyright © 2011 The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 1.
Lecture 16 Oct 18 Context-Free Languages (CFL) - basic definitions Examples.
CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars.
CSCI 2670 Introduction to Theory of Computing September 21, 2004.
Syntax Analysis The recognition problem: given a grammar G and a string w, is w  L(G)? The parsing problem: if G is a grammar and w  L(G), how can w.
CSCI 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong Context-free.
A sentence (S) is composed of a noun phrase (NP) and a verb phrase (VP). A noun phrase may be composed of a determiner (D/DET) and a noun (N). A noun phrase.
Context-free Grammars [Section 2.1] - more powerful than regular languages - originally developed by linguists - important for compilation of programming.
Classification of grammars Definition: A grammar G is said to be 1)Right-linear if each production in P is of the form A  xB or A  x where A and B are.
Context Free Grammars CIS 361. Introduction Finite Automata accept all regular languages and only regular languages Many simple languages are non regular:
CSCI 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong Ambiguity.
Chapter 5 Context-Free Grammars
Grammars CPSC 5135.
PART I: overview material
Languages & Grammars. Grammars  A set of rules which govern the structure of a language Fritz Fritz The dog The dog ate ate left left.
1 Context-Free Languages. 2 Regular Languages 3 Context-Free Languages.
1 Context-Free Languages. 2 Regular Languages 3 Context-Free Languages.
Lecture # 9 Chap 4: Ambiguous Grammar. 2 Chomsky Hierarchy: Language Classification A grammar G is said to be – Regular if it is right linear where each.
CS 3240: Languages and Computation Context-Free Languages.
Context Free Grammars. Context Free Languages (CFL) The pumping lemma showed there are languages that are not regular –There are many classes “larger”
CSCI 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong Context-free.
CMSC 330: Organization of Programming Languages Context-Free Grammars.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 3: Introduction to Syntactic Analysis.
CSE 105 Theory of Computation Alexander Tsiatas Spring 2012 Theory of Computation Lecture Slides by Alexander Tsiatas is licensed under a Creative Commons.
Context Free Grammars 1. Context Free Languages (CFL) The pumping lemma showed there are languages that are not regular –There are many classes “larger”
CS 208: Computing Theory Assoc. Prof. Dr. Brahim Hnich Faculty of Computer Sciences Izmir University of Economics.
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
1 Parse Trees Definitions Relationship to Left- and Rightmost Derivations Ambiguity in Grammars.
Grammars Hopcroft, Motawi, Ullman, Chap 5. Grammars Describes underlying rules (syntax) of programming languages Compilers (parsers) are based on such.
Grammars CS 130: Theory of Computation HMU textbook, Chap 5.
Unit-3 Parsing Theory (Syntax Analyzer) PREPARED BY: PROF. HARISH I RATHOD COMPUTER ENGINEERING DEPARTMENT GUJARAT POWER ENGINEERING & RESEARCH INSTITUTE.
11 CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 7 School of Innovation, Design and Engineering Mälardalen University 2012.
Introduction Finite Automata accept all regular languages and only regular languages Even very simple languages are non regular (  = {a,b}): - {a n b.
CSC312 Automata Theory Lecture # 26 Chapter # 12 by Cohen Context Free Grammars.
CSCI 2670 Introduction to Theory of Computing September 14, 2005.
CSC 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong Pushdown.
Transparency No. 1 Formal Language and Automata Theory Homework 5.
CSC 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong Pushdown.
Compiler Construction Lecture Five: Parsing - Part Two CSC 2103: Compiler Construction Lecture Five: Parsing - Part Two Joyce Nakatumba-Nabende 1.
CSCI 2670 Introduction to Theory of Computing September 16, 2004.
1 Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5.
Context-Free Grammars: an overview
Formal Language & Automata Theory
CS510 Compiler Lecture 4.
Fall Compiler Principles Context-free Grammars Refresher
Ambiguity Parsing algorithms
PARSE TREES.
Context-Free Languages
CSE 311: Foundations of Computing
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Fall Compiler Principles Context-free Grammars Refresher
COSC 3340: Introduction to Theory of Computation
Context-Free Grammars
Presentation transcript:

CSC 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong Context-free languages Fall 2008

Context-free grammar This is an a different model for describing languages The language is specified by productions (substitution rules) that tell how strings can be obtained, e.g. Using these rules, we can derive strings like this: A → 0A1 A → B B → # A, B are variables 0, 1, # are terminals A is the start variable A  0A1  00A11  000A111  000B111  000#111

Some natural examples Context-free grammars were first used for natural languages a girl with a flower likes the boy SENTENCE NOUN-PHRASE VERB-PHRASE ARTNOUNPREPARTNOUNVERBARTNOUN CMPLX-NOUN PREP-PHRASE CMPLX-VERB NOUN-PHRASE

Natural languages We can describe (some fragments) of the English language by a context-free grammar: SENTENCE → NOUN-PHRASE VERB-PHRASE NOUN-PHRASE → CMPLX-NOUN NOUN-PHRASE → CMPLX-NOUN PREP-PHRASE VERB-PHRASE → CMPLX-VERB VERB-PHRASE → CMPLX-VERB PREP-PHRASE PREP-PHRASE → PREP CMPLX-NOUN CMPLX-NOUN → ARTICLE NOUN CMPLX-VERB → VERB NOUN-PHRASE CMPLX-VERB → VERB ARTICLE → a ARTICLE → the NOUN → boy NOUN → girl NOUN → flower VERB → likes VERB → touches VERB → sees PREP → with variables: SENTENCE, NOUN-PHRASE, … terminals: a, the, boy, girl, flower, likes, touches, sees, with start variable: SENTENCE

Programming languages Context-free grammars are also used to describe (parts of) programming languages For instance, expressions like (2 + 3) * 5 or * 7 can be described by the CFG  +  *  ( )  0  1 …  9 Variables: Terminals: +, *, (, ), 0, 1, …, 9

Motivation for studying CFGs Context-free grammars are essential for understanding the meaning of computer programs They are used in compilers code: (2 + 3) * 5 meaning: “add 2 and 3, and then multiply by 5 ”

Definition of context-free grammar A context-free grammar (CFG) is a 4-tuple (V, T, P, S) where –V is a finite set of variables or non-terminals –T is a finite set of terminals ( V  T =  ) –P is a set of productions or substitution rules of the form where A is a symbol in V and  is a string over V  T –S is a variable in V called the start variable A → 

Shorthand notation for productions When we have multiple productions with the same variable on the left like we can write this in shorthand as E  E + E E  E * E E  (E) E  N E  E + E | E * E | (E) | 0 | 1 N  0N | 1N | 0 | 1 Variables: E, N Terminals: +, *, (, ), 0, 1 Start variable: E N  0N N  1N N  0 N  1

Derivation A derivation is a sequential application of productions: E derivation  E * E  (E) * E  (E) * N  (E + E ) * N  (E + E ) * 1  (E + N) * 1  (N + N) * 1  (N + 1N) * 1  (N + 10) * 1  (1 + 10) * 1  means  can be obtained from  with one production  * means  can be obtained from  after zero or more productions

Language of a CFG The language of a CFG (V, T, P, S) is the set of all strings containing only terminals that can be derived from the start variable S This is a language over the alphabet T A language L is context-free if it is the language of some CFG L = {  |   T* and S   } *

Example 1 Is the string 00#11 in L ? How about 00#111, 00#0#1#11 ? What is the language of this CFG? A → 0A1 | B B → # variables: A, B terminals: 0, 1, # start variable: A L = {0 n #1 n : n ≥ 0}

Example 2 Give derivations of (), (()()) How about ()) ? S  SS | (S) |  convention: variables in uppercase, terminals in lowercase, start variable first S  (S)(rule 2)  ()(rule 3) S  (S)(rule 2)  (SS)(rule 1)  ((S)S)(rule 2)  ((S)(S))(rule 2)  (()(S))(rule 3)  (()())(rule 3)

Examples: Designing CFGs Write a CFG for the following languages –Linear equations over x, y, z, like: x + 5y – z = 9 11x – y = 2 –Numbers without leading zeros, e.g., 109, 0 but not 019 –The language L = {a n b n c m d m | n  0, m  0} –The language L = {a n b m c m d n | n  0, m  0}

Context-free versus regular Write a CFG for the language (0 + 1)*111 Can you do so for every regular language? Proof: S  A111 A   | 0A | 1A Every regular language is context-free regular expression DFANFA

From regular to context-free regular expression   a (alphabet symbol) E 1 + E 2 CFG E1E2E1E2 E1*E1* grammar with no rules S  →  S →  a S  → S 1 | S 2 S  → S 1 S 2 S  → SS 1 |  In all cases, S becomes the new start symbol

Context-free versus regular Is every context-free language regular? No! We already saw some examples: This language is context-free but not regular A → 0A1 | B B → # L = {0 n #1 n : n ≥ 0}

Parse tree Derivations can also be represented using parse trees E  E + E  V + E  x + E  x + (E)  x + (E  E)  x + (V  E)  x + (y  E)  x + (y  V)  x + (y  z) E EE+ V(E) E  E VV x yz E  E + E | E - E | (E) | V V  x | y | z

Definition of parse tree A parse tree for a CFG G is an ordered tree with labels on the nodes such that –Every internal node is labeled by a variable –Every leaf is labeled by a terminal or  –Leaves labeled by  have no siblings –If a node is labeled A and has children A 1, …, A k from left to right, then the rule is a production in G. A → A 1 …A k

Left derivation Always derive the leftmost variable first: Corresponds to a left-to-right traversal of parse tree E  E + E  V + E  x + E  x + (E)  x + (E  E)  x + (V  E)  x + (y  E)  x + (y  V)  x + (y  z) E EE+ V(E) E  E VV x yz

Ambiguity A grammar is ambiguous if some strings have more than one parse tree Example: E  E + E | E  E | (E) | V V  x | y | z x + y + z E EE+ EE+V VVx yz E EE+ EE+V VV xy z

Why ambiguity matters The parse tree represents the intended meaning: x + y + z E EE+ EE+V VVx yz E EE+ EE+V VV xy z “first add y and z, and then add this to x ” “first add x and y, and then add z to this”

Why ambiguity matters Suppose we also had multiplication: E  E + E | E  E | E  E | (E) | V V  x | y | z x  y + z E EE* EE+V VVx yz E EE+ EE  V VV xy z “first x  y, then + z ”“first y + z, then x  ” 

Disambiguation Sometimes we can rewrite the grammar to remove the ambiguity Rewrite grammar so  cannot be broken by + : E  E + E | E  E | E  E | (E) | V V  x | y | z E  T | E + T | E  T T  F | T  F F  (E) | V V  x | y | z T stands for term: x * (y + z) F stands for factor: x, (y + z) A term always splits into factors A factor is either a variable or a parenthesized expression

Disambiguation Example x  y + z E  T | E + T | E  T T  F | T  F F  (E) | V V  x | y | z V VV F T F T E E T F

Disambiguation Can we always disambiguate a grammar? No, for two reasons –There exists an inherently ambiguous context-free L : Every CFG for this language is ambiguous –There is no general procedure that can tell if a grammar is ambiguous However, grammars used in programming languages can typically be disambiguated

Another Example Is ab, baba, abbbaa in L ? How about a, bba ? What is the language of this CFG? Is the CFG ambiguous? S  aB | bA A  a | aS | bAA B  b | bS | aBB