Formal Languages Context free languages provide a convenient notation for recursive description of languages. The original goal of formalizing the structure.

Slides:



Advertisements
Similar presentations
Formal Language, chapter 10, slide 1Copyright © 2007 by Adam Webber Chapter Ten: Grammars.
Advertisements

Grammars, constituency and order A grammar describes the legal strings of a language in terms of constituency and order. For example, a grammar for a fragment.
Chapter Chapter Summary Languages and Grammars Finite-State Machines with Output Finite-State Machines with No Output Language Recognition Turing.
1 Introduction to Computability Theory Lecture3: Regular Expressions Prof. Amos Israeli.
CS5371 Theory of Computation
1 Module 28 Context Free Grammars –Definition of a grammar G –Deriving strings and defining L(G) Context-Free Language definition.
1 Lecture 29 Context Free Grammars –Examples of “real-life” grammars –Definition of a grammar G –Deriving strings and defining L(G) Context-Free Language.
Deterministic FA/ PDA Sequential Machine Theory Prof. K. J. Hintz Department of Electrical and Computer Engineering Lecture 4 Updated by Marek Perkowski.
1 Context Free Languages October 2, Announcement HW 3 due now.
Normal forms for Context-Free Grammars
Transparency No. P2C1-1 Formal Language and Automata Theory Part II Pushdown Automata and Context-Free Languages.
Chapter 3: Formal Translation Models
Chapter 4 Context-Free Languages Copyright © 2011 The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 1.
Languages & Strings String Operations Language Definitions.
CS/IT 138 THEORY OF COMPUTATION Chapter 1 Introduction to the Theory of Computation.
Lecture Two: Formal Languages Formal Languages, Lecture 2, slide 1 Amjad Ali.
Formal Languages Context free languages provide a convenient notation for recursive description of languages. The original goal of CFL was to formalize.
Context-free Grammars Example : S   Shortened notation : S  aSaS   | aSa | bSb S  bSb Which strings can be generated from S ? [Section 6.1]
A sentence (S) is composed of a noun phrase (NP) and a verb phrase (VP). A noun phrase may be composed of a determiner (D/DET) and a noun (N). A noun phrase.
Languages, Grammars, and Regular Expressions Chuck Cusack Based partly on Chapter 11 of “Discrete Mathematics and its Applications,” 5 th edition, by Kenneth.
Context Free Grammars CIS 361. Introduction Finite Automata accept all regular languages and only regular languages Many simple languages are non regular:
Grammars CPSC 5135.
Discrete Structure Li Tak Sing( 李德成 ) Lectures
1 A well-parenthesized string is a string with the same number of (‘s as )’s which has the property that every prefix of the string has at least as many.
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
Unit-3 Parsing Theory (Syntax Analyzer) PREPARED BY: PROF. HARISH I RATHOD COMPUTER ENGINEERING DEPARTMENT GUJARAT POWER ENGINEERING & RESEARCH INSTITUTE.
Introduction Finite Automata accept all regular languages and only regular languages Even very simple languages are non regular (  = {a,b}): - {a n b.
CSC312 Automata Theory Lecture # 26 Chapter # 12 by Cohen Context Free Grammars.
Formal Languages and Grammars
1 A well-parenthesized string is a string with the same number of (‘s as )’s which has the property that every prefix of the string has at least as many.
CSE 311 Foundations of Computing I Lecture 19 Recursive Definitions: Context-Free Grammars and Languages Autumn 2012 CSE
Context Free Grammars & Parsing CPSC 388 Fall 2001 Ellen Walker Hiram College.
Lecture 17: Theory of Automata:2014 Context Free Grammars.
Chapter 2. Formal Languages Dept. of Computer Engineering, Hansung University, Sung-Dong Kim.
1 Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5.
Last Chapter Review Source code characters combination lexemes tokens pattern Non-Formalization Description Formalization Description Regular Expression.
Chapter 1 INTRODUCTION TO THE THEORY OF COMPUTATION.
PROGRAMMING LANGUAGES
Context-Free Languages & Grammars (CFLs & CFGs) (part 2)
Describing Syntax and Semantics
Grammars.
Context-Free Languages & Grammars (CFLs & CFGs)
System Software Unit-1 (Language Processors) A TOY Compiler
Context-Free Grammars: an overview
Formal Language & Automata Theory
CS 404 Introduction to Compiler Design
Chapter 3 Context-Free Grammar and Parsing
Complexity and Computability Theory I
Natural Language Processing - Formal Language -
Context free grammar.
FORMAL LANGUAGES AND AUTOMATA THEORY
PARSE TREES.
Context-Free Languages
Context-Free Grammars and Languages
Context-Free Grammars
CS21 Decidability and Tractability
CHAPTER 2 Context-Free Languages
CSE 311: Foundations of Computing
CSE 105 theory of computation
CSC 4181Compiler Construction Context-Free Grammars
Context-Free Grammars 1
Finite Automata and Formal Languages
Context Free Expressions
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
CSC 4181 Compiler Construction Context-Free Grammars
Theory of Computation Lecture #
CSE 105 theory of computation
Regular Grammars.
COSC 3340: Introduction to Theory of Computation
CSE 105 theory of computation
Presentation transcript:

Formal Languages Context free languages provide a convenient notation for recursive description of languages. The original goal of formalizing the structure of natural languages is still elusive, but CFGs are now the universally accepted formalism for definition of (the syntax of) programming languages. Writing parsers has become an almost fully automated process thanks to this theory.

A Simple Grammar for English Example taken from Floyd & Beigel. <Sentence> ® <Subject> <Predicate> <Subject> <Pronoun1> | <Pronoun2> <Pronoun1> I | we | you | he | she | it | they <Noun Phrase> <Simple Noun Phrase> | <Article> <Noun Phrase> <Article> a | an | the <Predicate> <Noun> | <Adjective> <Simple Noun Phrase> <Simple Noun Phrase> <Verb> | <Verb> <Object> <Object> <Pronoun2> | <Noun Phrase> <Pronoun2> me | us | you | him | her | it | them <Noun> . . . <Verb>

Example Derive the sentence “She drives a shiny black car” from these rules. sentence subject predicate pronoun1 She verb object drives Noun phrase adjective Simple Noun phrase Simple noun phrase article a shiny black car Noun

<sentence> Þ <subject> <predicate> Þ <pronoun> <predicate> Þ She <predicate> Þ She <verb> <object> Þ She drives <object> Þ She drives <simple noun phrase> Þ She drives <article> <noun phrase> Þ She drives a <noun phrase> Þ She drives a <adjective> <noun phrase> Þ She drives a shiny <noun phrase> Þ She drives a shiny <adjective> <simple noun phrase> Þ She drives a shiny black <simple noun phrase> Þ She drives a shiny black <noun> Þ She drives a shiny black car

A Grammar for Expressions ® <Term> | <Expression> + <Term> <Term> <Factor> | <Term> * <Factor> <Factor> <Identifer> | ( <Expression> ) <Identifier> x | y | z | … In class exercise: Derive x + ( y * 3) x + z * w + q

Definition of Context-Free-Grammars A CFG is a quadruple G= (V,T,P,S), where V is a finite set of variables (nonterminals, syntactic categories) T is a finite set of terminals P is a finite set of productions -- rules of the form X¾®a, where XÎV and aÎ(V È T)* S, the start symbol, is an element of V   Vertical bar (|), as used in the examples on the previous slide, is used to denote a set of several productions (with the same lhs).

Example <Expression> ® <Term> | <Expression> + <Term> <Term> <Factor> | <Term> * <Factor> <Factor> <Identifer> | ( <Expression> ) <Identifier> x | y | z | … V = {<Expression>, <Term>, <Factor>, <Identifier>} T = {+, *, (, ), x, y, z, …} P = { <Expression> ® <Term> <Expression> ® <Expression> + <Term> <Term> ® <Factor> <Term> ® <Term> * <Factor> <Factor> ® <Identifer> <Factor> ® ( <Expression> ) <Identifier> ® x <Identifier> ® y <Identifier> ® z <Identifier> ® … } S = <Expression>

Notational Conventions a,b,c, … (lower case, beginning of alphabet) are concrete terminals; u,v,w,x,y,z (lower case, end of alphabet) are for strings of terminals a,b,g, … (Greek letters) are for strings over (T È V) (sentential forms) A,B,C, … (capitals, beginning of alphabet) are for variables (for non-terminals). X,Y,Z are for variables standing for terminals.

Short-hand Note. We often abbreviate a context free grammar, such as: G2 = ( V={S}, T={(,)}, P={ S ® e, S ® SS, S ® (S) }, S=S} By giving just its productions S ® e | SS |(S) And by using the following conventions. The start symbol is the lhs of the first production. Multiple production for the same lhs non-terminal can be grouped together by using vertical bar ( | ) Non-terminals are capitalized. Terminal-symbols are lower case or non-alphabetic.

Derivations The single-step derivation relation Þ on (VÈ T)* is defined by: a Þ b iff b is obtained from a by replacing an occurrence of the lhs of a production with its rhs. That is, a'Aa'' Þ a'ga'' is true iff A ® g is a production. We write a Þ* b when b can be obtained from a through a sequence of several (possibly zero) derivation steps. The language of the CFG , G, is the set L(G) = {wÎT* | S Þ* w} (where S is the start symbol of G)   Context-free languages are languages of the form L(G)

Example 1 The familiar non-regular language L = { akbk | k ³ 0 } is context-free. The grammar G1 for it is given by T={a,b}, V={S}, and productions: S ® L S ® a S b   Here is a derivation showing a3b3Î L(G): S Þ2 aSb Þ2 aaSbb Þ2 aaaSbbb Þ1 aaabbb (Note: we sometimes label the arrow with a subscript which tells the production used to enable the transformation)

Example 1 continued Note, however, that the fact L=L(G1) is not totally obvious. We need to prove set inclusion both ways. To prove L Í L(G1) we must show that there exists a derivation for every string akbk; this is done by induction on k. For the converse, L(G1) Í L, we need to show that if S Þ* w and wÎT*, then wÎ L. This is done by induction on the length of derivation of w.

Example 2 The language of balanced parentheses is context-free. It is generated by the following grammar: G2 = ( V={S}, T={(,)}, P={ S ® L | SS |(S)}, S=S}  

Example 3 Consider the grammar: S ® AS | L A ® 0A1 | A1 | 01 The derivation: SÞ AS Þ A1S Þ 011S Þ 011AS Þ 0110A1S Þ 0110011S Þ 0110011 shows that 0110011 Î L(G3).

Example 3 notes The language L(G3) consists of strings wÎ{0,1}* such that: P(w): Either w=e, or w begins with 0, and every block of 0's in w is followed by at least as many 1's Again, the proof that G3 generates all and only strings that satisfy P(w) is not obvious. It requires a two-part inductive proof.

Leftmost and Rightmost Derivations The same string w usually has many possible derivations S  a0Þa1Þa2Þ … Þ an  w We call a derivation leftmost if in every step aiÞai+1, it is the first (leftmost) variable in ai$ that is being replaced with the rhs of a production. Similarly, in a rightmost derivation, it is always the last variable that gets replaced.   The above derivation of the string 0110011 in the grammar G3 is leftmost. Here is a rightmost derivation of the same string: S Þ AS Þ AAS Þ AA Þ A0A1 Þ A0011 Þ A10011 Þ 0110011 S ® AS | e A ® 0A1 | A1 | 01

Facts Every Regular Language is also a Context Free Language How might we prove this? Choose one of the many specifications for regular languages Show that every instance of that kind of specification has a total mapping into a Context Free Grammar What is an appropriate choice?

In Class Exercise Map the Regular Expressions into a Context Free language. data RegExp a = Lambda -- the empty string "" | Empty -- the empty set | One a -- a singleton set {a} | Union (RegExp a) (RegExp a) -- union of two RegExp | Cat (RegExp a) (RegExp a) -- Concatenation | Star (RegExp a) -- Kleene closure

Find CFG for these languages {an b an | n Є Nat} { w | w Є {a,b}*, and w is a palindrome of even length} {an bk | n,k Є Nat, n ≤ k} {an bk | n,k Є Nat, n ≥ k} { w | w Є {a,b}*, w has equal number of a’s and b’s }