A sentence (S) is composed of a noun phrase (NP) and a verb phrase (VP). A noun phrase may be composed of a determiner (D/DET) and a noun (N). A noun phrase.

Slides:



Advertisements
Similar presentations
C O N T E X T - F R E E LANGUAGES ( use a grammar to describe a language) 1.
Advertisements

Grammars, constituency and order A grammar describes the legal strings of a language in terms of constituency and order. For example, a grammar for a fragment.
Grammars, Languages and Parse Trees. Language Let V be an alphabet or vocabulary V* is set of all strings over V A language L is a subset of V*, i.e.,
COGN1001: Introduction to Cognitive Science Topics in Computer Science Formal Languages and Models of Computation Qiang HUO Department of Computer.
Chapter Chapter Summary Languages and Grammars Finite-State Machines with Output Finite-State Machines with No Output Language Recognition Turing.
ISBN Chapter 3 Describing Syntax and Semantics.
CS5371 Theory of Computation
CSC 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong Context-free.
Chapter 3 Describing Syntax and Semantics Sections 1-3.
PZ02A - Language translation
Context-Free Grammars Lecture 7
Discussion #31/20 Discussion #3 Grammar Formalization & Parse-Tree Construction.
Chapter 3 Describing Syntax and Semantics Sections 1-3.
January 14, 2015CS21 Lecture 51 CS21 Decidability and Tractability Lecture 5 January 14, 2015.
Chapter 3 Describing Syntax and Semantics Sections 1-3.
Dr. Muhammed Al-Mulhem 1ICS ICS 535 Design and Implementation of Programming Languages Part 1 Fundamentals (Chapter 4) Compilers and Syntax.
Chapter 3: Formal Translation Models
COP4020 Programming Languages
Languages and Grammars MSU CSE 260. Outline Introduction: E xample Phrase-Structure Grammars: Terminology, Definition, Derivation, Language of a Grammar,
1 Syntax and Semantics The Purpose of Syntax Problem of Describing Syntax Formal Methods of Describing Syntax Derivations and Parse Trees Sebesta Chapter.
Formal Grammars Denning, Sections 3.3 to 3.6. Formal Grammar, Defined A formal grammar G is a four-tuple G = (N,T,P,  ), where N is a finite nonempty.
Introduction Syntax: form of a sentence (is it valid) Semantics: meaning of a sentence Valid: the frog writes neatly Invalid: swims quickly mathematics.
CMSC 330: Organization of Programming Languages
CS 355 – PROGRAMMING LANGUAGES Dr. X. Topics Introduction The General Problem of Describing Syntax Formal Methods of Describing Syntax.
Winter 2007SEG2101 Chapter 71 Chapter 7 Introduction to Languages and Compiler.
1 Chapter 3 Describing Syntax and Semantics. 3.1 Introduction Providing a concise yet understandable description of a programming language is difficult.
Context-Free Grammars
CS Describing Syntax CS 3360 Spring 2012 Sec Adapted from Addison Wesley’s lecture notes (Copyright © 2004 Pearson Addison Wesley)
Grammars CPSC 5135.
PART I: overview material
Languages & Grammars. Grammars  A set of rules which govern the structure of a language Fritz Fritz The dog The dog ate ate left left.
So far... A language is a set of strings over an alphabet. We have defined languages by: (i) regular expressions (ii) finite state automata Both (i) and.
C H A P T E R TWO Syntax and Semantic.
ISBN Chapter 3 Describing Syntax and Semantics.
TextBook Concepts of Programming Languages, Robert W. Sebesta, (10th edition), Addison-Wesley Publishing Company CSCI18 - Concepts of Programming languages.
Copyright © by Curt Hill Grammar Types The Chomsky Hierarchy BNF and Derivation Trees.
1 Syntax In Text: Chapter 3. 2 Chapter 3: Syntax and Semantics Outline Syntax: Recognizer vs. generator BNF EBNF.
CMSC 330: Organization of Programming Languages Context-Free Grammars.
Parsing Introduction Syntactic Analysis I. Parsing Introduction 2 The Role of the Parser The Syntactic Analyzer, or Parser, is the heart of the front.
CFG1 CSC 4181Compiler Construction Context-Free Grammars Using grammars in parsers.
CMSC 330: Organization of Programming Languages
Chapter 3 Describing Syntax and Semantics
1 Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
1 A well-parenthesized string is a string with the same number of (‘s as )’s which has the property that every prefix of the string has at least as many.
ISBN Chapter 3 Describing Syntax and Semantics.
CS 208: Computing Theory Assoc. Prof. Dr. Brahim Hnich Faculty of Computer Sciences Izmir University of Economics.
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
Grammars Hopcroft, Motawi, Ullman, Chap 5. Grammars Describes underlying rules (syntax) of programming languages Compilers (parsers) are based on such.
Grammars CS 130: Theory of Computation HMU textbook, Chap 5.
Introduction Finite Automata accept all regular languages and only regular languages Even very simple languages are non regular (  = {a,b}): - {a n b.
Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 2 Syntax A language that is simple to parse.
CSC312 Automata Theory Lecture # 26 Chapter # 12 by Cohen Context Free Grammars.
Formal Languages and Grammars
Chapter 4: Syntax analysis Syntax analysis is done by the parser. –Detects whether the program is written following the grammar rules and reports syntax.
1 A well-parenthesized string is a string with the same number of (‘s as )’s which has the property that every prefix of the string has at least as many.
Copyright © 2006 Addison-Wesley. All rights reserved.1-1 ICS 410: Programming Languages Chapter 3 : Describing Syntax and Semantics Syntax.
CSE 311 Foundations of Computing I Lecture 18 Recursive Definitions: Context-Free Grammars and Languages Autumn 2011 CSE 3111.
Chapter 3 – Describing Syntax CSCE 343. Syntax vs. Semantics Syntax: The form or structure of the expressions, statements, and program units. Semantics:
Modeling Arithmetic, Computation, and Languages Mathematical Structures for Computer Science Chapter 8 Copyright © 2006 W.H. Freeman & Co.MSCS SlidesAlgebraic.
Describing Syntax and Semantics Chapter 3: Describing Syntax and Semantics Lectures # 6.
Automata and Languages What do these have in common?
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
CSE 311: Foundations of Computing
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
Presentation transcript:

A sentence (S) is composed of a noun phrase (NP) and a verb phrase (VP). A noun phrase may be composed of a determiner (D/DET) and a noun (N). A noun phrase may also be composed of an adjective (ADJ) and a noun (N) A verb phrase may be composed of a verb (V) and a noun (N) or noun phrase (NP). CMSC 330 English Grammar

CMSC 330: Organization of Programming Languages Context-Free Grammars

CMSC 3303 Program structure Syntax Source code form “What a program looks like” In general, syntax is described using grammars. Semantics Execution behavior “What a program does”

CMSC 3304 Motivation Programs are just strings of text –But they’re strings that have a certain structure A C program is a list of declarations and definitions A function definition contains parameters and a body A function body is a sequence of statements A statement is either an expression, an if, a goto, etc. An expression may be assignment, addition, subtraction, etc We want to solve two problems –We want to describe programming languages precisely –We need to describe more than the regular languages Recall that regular expressions, DFAs, and NFAs are limited in their expressiveness

CMSC 3305 Context-Free Grammars (CFGs) A way of generating sets of strings or languages They subsume regular expressions (and DFAs and NFAs) –There is a CFG that generates any regular language –(But regular expressions are a better notation for languages which are regular.) They can be used to describe programming languages –They (mostly) describe the parsing process

CMSC 3306 Simple Example S  0 | 1 | 0S | 1S |  This is the same as the regular expression –(0|1)* But CFGs can do a lot more!

CMSC 3307 Formal Definition A context-free grammar G is a 4-tuple: –Σ – a finite set of terminal or alphabet symbols Often written in lowercase –V – a finite, nonempty set of nonterminal symbols Often written in uppercase Sometimes called variables No common elements; i.e., it must be that V ∩ Σ = ∅ –P – a set of productions of the form V → (Σ|V)* Informally this means that the nonterminal can be replaced by the string of zero or more terminals or non-terminals to the right of the → –S ∊ V – the start symbol

CMSC 3308 Notational Shortcuts If S is not specified, assume the left-hand side of the first listed production is the start symbol Productions with the same left-hand sides are usually combined with | If a production has an empty right-hand side it means ε

CMSC 3309 Informal Definition of Acceptance A string is accepted by a CFG if there is some path that can be followed starting at the start symbol which generates the string Example: S  0 | 1 | 0S | 1S |  0101: S  0S  01S  010S  0101

CMSC Example: Arithmetic Expressions (Limited) E → a | b | c | E+E | E-E | E*E | (E) –An expression E is either a letter a, b, or c –Or an E followed by + followed by an E –etc. This describes or generates a set of strings –{a, b, c, a+b, a+a, a*c, a-(b*a), c*(b + a), …} Example strings not in the language –d, c(a), a+, b**c, etc.

CMSC Formal Description of Example Formally, the grammar we just showed is –Σ = { +, -, *, (, ), a, b, c } –V = { E } –P = { E → a, E → b, E → c, E → E-E, E → E+E, E → E*E, E → (E)} –S = E

CMSC Uniqueness of Grammars Grammars are not unique. Different grammars can generate the same set of strings. The following grammar generates the same set of strings as the previous grammar: E → E+T | E-T | T T → T*P | P P → (E) | a | b | c

CMSC Another Example Grammar S → aS | T T → bT | U U → cU | ε What are some strings in the language?

CMSC Practice Try to make a grammar which accepts… 0*|1* a n b n –Remember, we couldn't do this with a regex! Give some example strings from this language: S  0 | 1S What language is it?

CMSC Backus-Naur Form Context-free grammar production rules are also called Backus-Naur Form or BNF –A production like A → B c D is written in BNF as ::= c (Non-terminals written with angle brackets and ::= is used instead of →) –Often used to describe language syntax John Backus –Chair of the Algol committee in the early 1960s Peter Naur –Secretary of the committee, who used this notation to describe Algol in 1962

Type 0: Any formal grammar Turing machines Type-1: Linear bounded automata Type-2: Pushdown automata Type-3: Regular expressions Finite state automata Chomsky Hierarchy Categorization of various languages and grammars Each is strictly more descriptive than the previous First described by Noam Chomsky in 1956 CMSC 330

17 Sentential Forms A sentential form is a string of terminals and nonterminals produced from the start symbol Inductively: –The start symbol –If αAδ is a sentential form for a grammar, where (α and δ ∊ (V|Σ)*), and A → γ is a production, then αγδ is a sentential form for the grammar In this case, we say that αAδ derives αγδ in one step, which is written as αAδ ⇒ αγδ

CMSC Derivations ⇒ is used to indicate a derivation of one step ⇒ + is used to indicate a derivation of one or more steps ⇒ * indicates a derivation of zero or more steps Example: S  0|1|0S|1S|  0101: S ⇒ 0S ⇒ 01S ⇒ 010S ⇒ 0101 S ⇒ S ⇒ * S

CMSC Language Generated by Grammar A slightly more formal definition… The language defined by a CFG is the set of all sentential forms made up of only terminals. Example: S  0|1|0S|1S|  In language:Not in language: 01, 000, 11,  …0S, a, 11S, …

CMSC Example S → aS | T T → bT | U U → cU | ε A derivation: –S ⇒ aS ⇒ aT ⇒ aU ⇒ acU ⇒ ac Abbreviated as S ⇒ + ac So S, aS, aT, aU, acU, ac are all sentential forms for this grammar –S ⇒ T ⇒ U ⇒ ε Is there any derivation –S ⇒ + ccc ? S ⇒ + Sa ? –S ⇒ + bab ? S ⇒ + bU ?

CMSC The Language Generated by a CFG The language generated by a grammar G is L(G) = { ω | ω ∊ Σ* and S ⇒ + ω } –(where S is the start symbol of the grammar and Σ is the alphabet for that grammar) I.e., all sentential forms with only terminals I.e., all strings over Σ that can be derived from the start symbol via one or more productions

CMSC Example (cont’d) S → aS | T T → bT | U U → cU | ε Generates what language? Do other grammars generate this language? S → ABC A → aA | ε B → bB | ε C → cC | ε –So grammars are not unique

CMSC Parse Trees A parse tree shows how a string is produced by a grammar –The root node is the start symbol –Each interior node is a nonterminal –Children of node are symbols on r.h.s of production applied to that nonterminal –Leaves are all terminal symbols Reading the leaves left-to-right shows the string corresponding to the tree

CMSC Example S → aS | T T → bT | U U → cU | ε S ⇒ aS ⇒ aT ⇒ aU ⇒ acU ⇒ ac

CMSC Parse Trees for Expressions A parse tree shows the structure of an expression as it corresponds to a grammar E → a | b | c | d | E+E | E-E | E*E | (E) aa*cc*(b+d)

CMSC Practice E → a | b | c | d | E+E | E-E | E*E | (E) Make a parse tree for… a*b a+(b-c) d*(d+b)-a (a+b)*(c-d) a+(b-c)*d