Grammars, Languages and Parse Trees

Language
- Let V be an alphabet or vocabulary
- V* is the set of all strings over V
- A language L is a subset of V*, i.e., L ⊆ V*
- L may be finite or infinite
- A programming language is a language in this sense:
  – the set of all possible programs (each valid program is one very long string)
  – programs with syntax errors are not in the set
  – there are infinitely many programs
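As a small aside (not on the original slides), here is a minimal Python sketch of V*: it enumerates every string over a two-letter alphabet up to a chosen length, i.e., a finite slice of the infinite set V*. The function name and the alphabet are our own choices.

    from itertools import product

    def strings_over(V, max_len):
        """Enumerate every string over alphabet V of length 0..max_len (a finite slice of V*)."""
        result = [""]  # the empty string is always in V*
        for n in range(1, max_len + 1):
            result.extend("".join(t) for t in product(V, repeat=n))
        return result

    V = {"a", "b"}
    print(strings_over(sorted(V), 2))  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']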

Language Representation
- Finite language
  – enumerate all its sentences
- Infinite language
  – cannot be specified by enumeration
  – use a generative device, i.e., a grammar, which specifies the set of all legal sentences and is defined recursively (or inductively)

Sample Grammar
Simple arithmetic expressions (E)
- Basis rules:
  – a Variable is an E
  – an Integer is an E
- Inductive rules:
  – if E1 and E2 are Es, so is (E1 + E2)
  – if E1 and E2 are Es, so is (E1 * E2)
- Examples: x, y, 3, 12, (x + y), (z * (x + y)), ((z * (x + y)) + 12)
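To make the basis/inductive reading concrete, here is a rough Python sketch of a recognizer that follows these rules directly. It is our own illustration, not the parsing algorithm developed later, and it simplifies variables to letter-only and integers to digit-only strings.

    def is_E(s):
        """Mirror the basis/inductive rules for E (a rough sketch, not the lecture's parser).
        Basis: a variable (letters only here) or an integer (digits) is an E.
        Induction: (E1 + E2) and (E1 * E2) are Es."""
        s = s.replace(" ", "")
        if s.isalpha() or s.isdigit():             # basis rules
            return True
        if s.startswith("(") and s.endswith(")"):  # inductive rules
            inner, depth = s[1:-1], 0
            for i, ch in enumerate(inner):
                depth += (ch == "(") - (ch == ")")
                if depth == 0 and ch in "+*":      # top-level operator splits E1 and E2
                    return is_E(inner[:i]) and is_E(inner[i+1:])
        return False

    print(is_E("((z * (x + y)) + 12)"))  # True
    print(is_E("3 * z"))                 # False: not fully parenthesized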

Production Rules
- Use symbols (a.k.a. syntactic categories) and meta-symbols to define the basis and inductive rules
- For our example:
  – basis rules:      E → V      E → I
  – inductive rules:  E → (E + E)      E → (E * E)

Formal Definition of a Grammar
G = (VN, VT, S, Φ), where
- VN, VT are the sets of non-terminal and terminal symbols
- S ∈ VN is the start symbol
- Φ is a finite set of relations from (VT ∪ VN)+ to (VT ∪ VN)*
An element (α, β) of Φ is written α → β and is called a production rule or a rewrite rule.
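A grammar really is just this 4-tuple of data. As an illustration (our own plain-Python rendering, not from the slides; field names are ours), it can be written down directly:

    from collections import namedtuple

    # Our own rendering of the 4-tuple above; productions are (lhs, rhs) string
    # pairs, one character per symbol.
    Grammar = namedtuple("Grammar", ["nonterminals", "terminals", "start", "productions"])

    expr_grammar = Grammar(
        nonterminals={"E", "V", "I", "D", "L"},
        terminals=set("0123456789xyz+*()"),
        start="E",
        productions=[("E", "V"), ("E", "I"), ("E", "(E+E)"), ("E", "(E*E)"),
                     ("V", "L"), ("I", "D"), ("D", "1"), ("L", "x")],  # a few rules, for brevity
    )
    print(expr_grammar.start, "->", expr_grammar.productions[0][1])  # E -> V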

Sample Grammar Revisited
1. E → V | I | (E + E) | (E * E)
2. V → L | VL | VD
3. I → D | ID
4. D → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
5. L → x | y | z

VN: E, V, I, D, L
VT: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, x, y, z
S = E
Φ: rules 1-5

Another Simple Grammar
Symbols:
- S: sentence
- SP: subject phrase
- VP: verb phrase
- NP: noun phrase
- V: verb
- O: object
- A: article
- N: noun
Rules:
- S → SP VP
- SP → A N
- A → a | the
- N → monkey | banana | tree
- VP → V O
- V → ate | climbs
- O → NP
- NP → A N
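Since the rules are generative, a program can generate sentences from them. The sketch below is our own illustration (the dict encoding and the function name are assumptions); it expands symbols at random using exactly the rules above:

    import random

    # The sentence grammar from the slide as a Python dict: each alternative is a
    # tuple of symbols; lower-case strings are terminals.
    rules = {
        "S":  [("SP", "VP")],
        "SP": [("A", "N")],
        "A":  [("a",), ("the",)],
        "N":  [("monkey",), ("banana",), ("tree",)],
        "VP": [("V", "O")],
        "V":  [("ate",), ("climbs",)],
        "O":  [("NP",)],
        "NP": [("A", "N")],
    }

    def generate(symbol):
        """Expand a symbol by picking one production at random until only terminals remain."""
        if symbol not in rules:            # terminal
            return [symbol]
        expansion = random.choice(rules[symbol])
        return [word for part in expansion for word in generate(part)]

    print(" ".join(generate("S")))  # e.g. "the monkey ate a banana" (or a sillier sentence)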

Context-Free Grammar
- A context-free grammar is a grammar with the following restriction:
  – the relation Φ is a finite set of relations from VN to (VT ∪ VN)+
  – i.e., the left-hand side of a production is a single non-terminal
  – and the right-hand side of a production cannot be empty
- Context-free grammars generate context-free languages
- With slight variations, essentially all programming languages are context-free languages
- We will focus on context-free grammars
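The restriction is easy to check mechanically. A small sketch of our own follows (productions encoded as (lhs, rhs) string pairs, one character per symbol); it rejects G1 from the next slide because of rules like Ba → aB:

    def is_context_free(productions, nonterminals):
        """Check the restriction above: every left-hand side is a single non-terminal
        and no right-hand side is empty."""
        return all(len(lhs) == 1 and lhs in nonterminals and len(rhs) >= 1
                   for lhs, rhs in productions)

    # G1 from the next slide has rules such as Ba -> aB, so it fails the check:
    g1 = [("S", "aBSc"), ("S", "abc"), ("Ba", "aB"), ("Bb", "bb")]
    print(is_context_free(g1, {"S", "B"}))  # False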

More Grammars
G1 = (VN, VT, S, Φ), where:
- VN = {S, B}
- VT = {a, b, c}
- S = S
- Φ = { S → aBSc, S → abc, Ba → aB, Bb → bb }

G2 = (VN, VT, S, Φ), where:
- VN = {I, L, D}
- VT = {a, b, …, z, 0, 1, …, 9}
- S = I
- Φ = { I → L | ID | IL, L → a | b | … | z, D → 0 | 1 | … | 9 }

G3 = (VN, VT, S, Φ), where:
- VN = {S, A, B}
- VT = {a, b}
- S = S
- Φ = { S → aA, A → aA | bB, B → bB | ε }

Which are context-free?

Direct Derivative
- Let G = (VN, VT, S, Φ) be a grammar, and let α, β ∈ (VN ∪ VT)*
- β is said to be a direct derivative of α, written α ⇒ β, if there are strings φ1 and φ2 such that:
  – α = φ1 L φ2, β = φ1 λ φ2, L ∈ VN, and L → λ is a production of G
- We go from α to β using a single rule
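As a sketch of this definition (our own code, assuming every symbol is a single character), the function below finds all direct derivatives of α obtainable with one given production, by trying every position where the left-hand side occurs:

    def direct_derivatives(alpha, production):
        """All strings beta with alpha => beta using the single rule lhs -> rhs."""
        lhs, rhs = production
        results = []
        for i in range(len(alpha)):
            if alpha.startswith(lhs, i):                                # alpha = phi1 L phi2
                results.append(alpha[:i] + rhs + alpha[i + len(lhs):])  # beta = phi1 lambda phi2
        return results

    print(direct_derivatives("IDD", ("D", "0")))  # ['I0D', 'ID0']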

Examples of Direct Derivatives
G = (VN, VT, S, Φ), where:
- VN = {I, L, D}
- VT = {a, b, …, z, 0, 1, …, 9}
- S = I
- Φ = { I → L | ID | IL, L → a | b | … | z, D → 0 | 1 | … | 9 }

  α    | β    | Rule used | φ1 | φ2
  I    | L    | I → L     | ε  | ε
  Ib   | Lb   | I → L     | ε  | b
  Lb   | ab   | L → a     | ε  | b
  IDD  | I0D  | D → 0     | I  | D

Derivation
- Let G = (VN, VT, S, Φ) be a grammar
- A string α produces ω (or α reduces to ω, or ω is a derivation of α), written α ⇒+ ω, if there are strings φ1, …, φn (n ≥ 1) such that:
  – α ⇒ φ1 ⇒ φ2 ⇒ … ⇒ φn-1 ⇒ φn = ω
- We go from α to ω using several rules

Example of Derivation
1. E → V | I | (E + E) | (E * E)
2. V → L | VL | VD
3. I → D | ID
4. D → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
5. L → x | y | z

Can we derive ( ( z * ( x + y ) ) + 12 )?
E ⇒ ( E + E )
  ⇒ ( ( E * E ) + E )
  ⇒ ( ( E * ( E + E ) ) + E )
  ⇒ ( ( V * ( V + V ) ) + I )
  ⇒ ( ( L * ( L + L ) ) + ID )
  ⇒ ( ( z * ( x + y ) ) + DD )
  ⇒ ( ( z * ( x + y ) ) + 12 )

How about: ( x + 2 )    ( 21 * ( x4 + 7 ) )    3 * z    2y

Grammar-generated Language
- If G is a grammar with start symbol S, a sentential form is any derivative of S
- The language L generated by a grammar G is the set of all sentential forms whose symbols are all terminals:
  L(G) = { ω | S ⇒+ ω and ω ∈ VT* }

Example of Language
Let G = (VN, VT, S, Φ), where:
- VN = {I, L, D}
- VT = {a, b, …, z, 0, 1, …, 9}
- S = I
- Φ = { I → L | ID | IL, L → a | b | … | z, D → 0 | 1 | … | 9 }

L(G) = { abc12, x, m, a1b2c3, … }

For example: I ⇒ ID ⇒ IDD ⇒ ILDD ⇒ ILLDD ⇒ LLLDD ⇒ aLLDD ⇒ abLDD ⇒ abcDD ⇒ abc1D ⇒ abc12
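A brute-force way to watch L(G) emerge (our own sketch, not part of the lecture): rewrite the leftmost non-terminal breadth-first and collect the all-terminal strings, capping the string length so the search stays finite.

    from collections import deque

    rules = {"I": ["L", "ID", "IL"],
             "L": list("abcdefghijklmnopqrstuvwxyz"),
             "D": list("0123456789")}

    def sample_language(start, max_len, limit):
        """Return up to `limit` members of L(G) of length <= max_len."""
        found, queue, seen = [], deque([start]), {start}
        while queue and len(found) < limit:
            form = queue.popleft()
            if all(sym not in rules for sym in form):   # all terminals: a sentence of L(G)
                found.append(form)
                continue
            i = next(i for i, sym in enumerate(form) if sym in rules)  # leftmost non-terminal
            for rhs in rules[form[i]]:
                new = form[:i] + rhs + form[i+1:]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
        return found

    print(sample_language("I", 3, 30))  # single letters first, then identifiers like 'a0', 'a1', ...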

Syntax Analysis: Parsing
- The parse of a sentence is the construction of a derivation for that sentence
- Parsing a sentence results in
  – acceptance or rejection
  – and, if accepted, a parse tree
- We are looking for an algorithm to parse a sentence (i.e., to parse a program) and produce a parse tree

Parse Trees
- A parse tree is composed of
  – interior nodes representing elements of VN
  – leaf nodes representing elements of VT
- For each interior node N, the transition from N to its children represents the application of one production rule
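A minimal way to represent such a tree in code (our own sketch; the class and method names are assumptions): interior nodes carry a non-terminal and a list of children, leaves carry a terminal, and reading the leaves left to right recovers the derived sentence.

    class Node:
        def __init__(self, symbol, children=None):
            self.symbol = symbol
            self.children = children or []   # empty for leaves (terminals)

        def leaves(self):
            """The terminal string this subtree derives, read left to right."""
            if not self.children:
                return self.symbol
            return "".join(child.leaves() for child in self.children)

    # The subtree for E -> (E + E), with E -> V -> L -> x on the left and E -> I -> D -> 3 on the right:
    tree = Node("E", [Node("("),
                      Node("E", [Node("V", [Node("L", [Node("x")])])]),
                      Node("+"),
                      Node("E", [Node("I", [Node("D", [Node("3")])])]),
                      Node(")")])
    print(tree.leaves())  # (x+3)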

Parse Tree Construction
- Top-down
  – start with the root (the start symbol)
  – proceed downward to the leaves using productions
- Bottom-up
  – start from the leaves
  – proceed upward to the root
- Although these seem like reasonable approaches to developing a parsing algorithm, we'll see later that neither is ideal: we'll find a better way!

Top-Down Example
Grammar:
1. A → V | I | (A + A) | (A * A)
2. V → L | VL | VD
3. I → D | ID
4. D → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
5. L → x | y | z

Target sentence: ( ( z * ( x + y ) ) + 12 )

Top-down: grow the tree from the root A toward the leaves:
A
( A + A )
( ( A * A ) + A )
( ( A * ( A + A ) ) + I )
( ( V * ( V + V ) ) + ID )
( ( L * ( L + L ) ) + DD )
( ( z * ( x + y ) ) + 12 )

Bottom-Up Example
Grammar:
1. A → V | I | (A + A) | (A * A)
2. V → L | VL | VD
3. I → D | ID
4. D → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
5. L → x | y | z

Target sentence: ( ( z * ( x + y ) ) + 12 )

Bottom-up: start from the leaves and reduce toward the root A:
( ( z * ( x + y ) ) + 12 )
( ( L * ( L + L ) ) + DD )
( ( V * ( V + V ) ) + ID )
( ( A * ( A + A ) ) + I )
( ( A * A ) + A )
( A + A )
A
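For comparison, here is what a hand-written top-down parser for rule 1 of this grammar might look like (a sketch of our own, not the algorithm the course develops later; variables and integers are limited to single symbols, so deriving 12 would also need rules 2-4). It returns the parse tree as nested tuples:

    def parse_A(s, i=0):
        """Top-down sketch for rule 1: A -> V | I | (A + A) | (A * A).
        Returns (tree, next_index) or raises ValueError."""
        if i < len(s) and s[i] == "(":
            left, i = parse_A(s, i + 1)
            if i >= len(s) or s[i] not in "+*":
                raise ValueError("expected + or *")
            op = s[i]
            right, i = parse_A(s, i + 1)
            if i >= len(s) or s[i] != ")":
                raise ValueError("expected )")
            return ("A", "(", left, op, right, ")"), i + 1
        if i < len(s) and (s[i].isalpha() or s[i].isdigit()):
            return ("A", s[i]), i + 1
        raise ValueError("expected a variable, integer, or (")

    tree, end = parse_A("((z*(x+y))+1)")
    print(tree)   # nested tuples mirroring the parse tree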

Lexical Analyzer and Parser
- Lexical analyzers
  – input: individual symbols (strings of length 1, i.e., characters)
  – output: classified tokens
- Parsers
  – input: classified tokens
  – output: a parse tree (i.e., a syntactically correct program)
- A syntactically correct program will run, but will it do what you want?
  [a monkey ate a banana / a banana climbs the tree]
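A toy scanner illustrating this division of labor (our own sketch; the token class names are assumptions): it maps raw characters to classified tokens, which is exactly what the parser consumes.

    import re

    TOKEN_SPEC = [("INTEGER", r"\d+"), ("VARIABLE", r"[a-z]\w*"),
                  ("OP", r"[+*]"), ("PAREN", r"[()]"), ("SKIP", r"\s+")]

    def tokenize(text):
        """Yield (token_class, lexeme) pairs, skipping whitespace."""
        pattern = "|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC)
        for m in re.finditer(pattern, text):
            if m.lastgroup != "SKIP":
                yield (m.lastgroup, m.group())

    print(list(tokenize("(z * (x + y)) + 12")))
    # [('PAREN', '('), ('VARIABLE', 'z'), ('OP', '*'), ...]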

Backus-Naur Form (BNF)
- A traditional meta-language for representing grammars of programming languages
  – every non-terminal is enclosed in angle brackets < >
  – instead of the symbol →, we use ::=
- Example:
  I → L | ID | IL          <I> ::= <L> | <I><D> | <I><L>
  L → a | b | … | z        <L> ::= a | b | … | z
  D → 0 | 1 | … | 9        <D> ::= 0 | 1 | … | 9
- WHY?