Author: 陳鍾誠
Affiliation: Department of Information Management, Kinmen Institute of Technology
URL:
Date: 2016/6/4
Syntax of Programming Languages: Grammar


Grammar

Language

Recursive Definition

Mathematical Expression

Structure of Expressions

Formal Language

Backus-Naur Form (BNF), by J. Backus and P. Naur

EBNF (Extended BNF)

BNF → EBNF

Formalism (formal notation): N. Chomsky, father of modern structural linguistics

Differing structural trees for the same expression

Problem of different structural trees

No Ambiguous Sentence

Context-Free Language
Syntactic equations of the form defined in EBNF generate context-free languages. The term "context-free" is due to Chomsky and stems from the fact that substituting the symbol to the left of = by a sequence derived from the expression to the right of = is always permitted, regardless of the context in which the symbol is embedded within the sentence. It has turned out that this restriction to context freedom (in the sense of Chomsky) is quite acceptable for programming languages, and that it is even desirable. Context dependence in another sense, however, is indispensable. We will return to this topic in a later chapter.

Regular Expression
A language is regular if its syntax can be expressed by a single EBNF expression. The requirement that a single equation suffices also implies that only terminal symbols occur in the expression. Such an expression is called a regular expression.

Syntax Analysis vs. Regular Expression
The reason for our interest in regular languages lies in the fact that programs for the recognition of regular sentences are particularly simple and efficient. By "recognition" we mean the determination of the structure of the sentence, and thereby naturally the determination of whether the sentence is well formed, that is, whether it belongs to the language. Sentence recognition is called syntax analysis.

Regular Expression vs. State Machine
For the recognition of regular sentences a finite automaton, also called a state machine, is necessary and sufficient. In each step the state machine reads the next symbol and changes state. The resulting state is solely determined by the previous state and the symbol read. If the resulting state is unique, the state machine is deterministic; otherwise it is nondeterministic. If the state machine is formulated as a program, the state is represented by the current point of program execution.
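As a small illustration (a sketch, not taken from the slides), the Python below recognizes the regular language of identifiers, letter {letter | digit}, in two ways: with an explicit transition table, and as a program whose state is simply the current point of execution. The state numbering and the helper names (char_class, is_identifier, is_identifier_program) are hypothetical.

```python
# Deterministic state machine for  ident = letter {letter | digit}.
# States (numbering chosen for this sketch): 0 = start,
# 1 = inside an identifier (accepting), 2 = dead state.

def char_class(ch: str) -> str:
    """Map a character to the input class used by the transition table."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"


# The next state depends only on the current state and the symbol read.
TRANSITIONS = {
    0: {"letter": 1, "digit": 2, "other": 2},
    1: {"letter": 1, "digit": 1, "other": 2},
    2: {"letter": 2, "digit": 2, "other": 2},
}
ACCEPTING = {1}


def is_identifier(text: str) -> bool:
    """Table-driven version: the state is an explicit variable."""
    state = 0
    for ch in text:
        state = TRANSITIONS[state][char_class(ch)]
    return state in ACCEPTING


def is_identifier_program(text: str) -> bool:
    """Program version: the state is the current point of execution."""
    i = 0
    if i < len(text) and char_class(text[i]) == "letter":        # state 0
        i += 1
        while i < len(text) and char_class(text[i]) != "other":  # state 1
            i += 1
        return i == len(text)
    return False                                                  # dead state
```

Both functions accept exactly the same strings; the table version makes the "next state depends only on state and symbol" rule explicit, while the program version hides the state in the control flow.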

EBNF → Program
The analyzing program can be derived directly from the defining syntax in EBNF. For each EBNF construct K there exists a translation rule which yields a program fragment Pr(K). The translation rules from EBNF to program text are shown below. Therein sym denotes a global variable always representing the symbol last read from the source text by a call to procedure next. Procedure error terminates program execution, signaling that the symbol sequence read so far does not belong to the language.
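The sketch below applies this kind of translation by hand to a small, hypothetical grammar (expression, term, factor over single-letter identifiers); it is not the grammar from the slides. sym, next and error play the roles described above, the comments paraphrase how each EBNF construct maps to code, and every non-blank input character is assumed to be one symbol so that no separate scanner is needed.

```python
# Recursive-descent recognizer obtained from EBNF-to-program translation.
# Hypothetical example grammar (not from the slides):
#   expression = term {("+" | "-") term}.
#   term       = factor {("*" | "/") factor}.
#   factor     = letter | "(" expression ")".
# Simplification: every non-blank input character is one symbol.

class ParseError(Exception):
    """Raised by error(): the symbols read so far are not in the language."""


class Recognizer:
    def __init__(self, text: str):
        self.chars = [c for c in text if not c.isspace()]
        self.pos = 0
        self.sym = ""              # the symbol last read ('' = end of input)
        self.next()

    def next(self) -> None:
        self.sym = self.chars[self.pos] if self.pos < len(self.chars) else ""
        self.pos += 1

    def error(self) -> None:
        raise ParseError(f"unexpected symbol {self.sym!r}")

    def expect(self, x: str) -> None:
        # Pr("x"): IF sym = "x" THEN next ELSE error
        if self.sym == x:
            self.next()
        else:
            self.error()

    def expression(self) -> None:
        # Sequence f0 f1: Pr(f0); Pr(f1)
        self.term()
        # Repetition {exp}: WHILE sym IN first(exp) DO Pr(exp)
        while self.sym in ("+", "-"):
            self.next()
            self.term()

    def term(self) -> None:
        self.factor()
        while self.sym in ("*", "/"):
            self.next()
            self.factor()

    def factor(self) -> None:
        # Choice t0 | t1: pick the alternative whose first set contains sym
        if self.sym.isalpha():
            self.next()
        elif self.sym == "(":
            self.next()
            self.expression()
            self.expect(")")
        else:
            self.error()


def accepts(text: str) -> bool:
    r = Recognizer(text)
    try:
        r.expression()
        if r.sym != "":
            r.error()              # leftover symbols: not a sentence
        return True
    except ParseError:
        return False
```

For example, accepts("(a+b)*c") is True while accepts("(a+b") is False, because expect(")") finds the end of input instead.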

Analyzing program

EBNF with only 1 rule

First()

Precondition

Lexical Analysis for Identifier

Lexical Analysis for Integer

Scanner
The process of syntax analysis is based on a procedure to obtain the next symbol. This procedure in turn is based on the definition of symbols in terms of sequences of one or more characters. This latter procedure is called a scanner, and syntax analysis on this second, lower level is called lexical analysis.

Lexical Analysis vs. Syntax Analysis

A Scanner Example
As an example we show a scanner for a parser of EBNF. Its terminal symbols and their definitions in terms of characters are:

Procedure GetSym() (1)

Procedure GetSym() (2)

Procedure GetSym() (3)
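The GetSym() code itself is not recoverable from the slides. The following is a hypothetical Python sketch of a scanner for EBNF text, assuming the terminal symbols are identifiers, double-quoted literals and the single characters = | ( ) [ ] { } . ; the symbol kind names (eql, bar, lparen, and so on) and the get_sym/tokenize names are made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical symbol set (the slide's own list is not recoverable):
# identifiers, "quoted" literals and these single-character symbols.
PUNCT = {"=": "eql", "|": "bar", "(": "lparen", ")": "rparen",
         "[": "lbrak", "]": "rbrak", "{": "lbrace", "}": "rbrace",
         ".": "period"}


@dataclass
class Symbol:
    kind: str   # "ident", "literal", a PUNCT value, or "other"
    text: str


class Scanner:
    def __init__(self, source: str):
        self.src = source
        self.i = 0

    def get_sym(self) -> Optional[Symbol]:
        """Return the next symbol, or None at the end of the input."""
        src, n = self.src, len(self.src)
        while self.i < n and src[self.i].isspace():      # skip blanks
            self.i += 1
        if self.i >= n:
            return None
        start, ch = self.i, src[self.i]
        if ch.isalpha():                                 # letter {letter | digit}
            while self.i < n and src[self.i].isalnum():
                self.i += 1
            return Symbol("ident", src[start:self.i])
        if ch == '"':                                    # '"' {character} '"'
            end = src.find('"', start + 1)
            if end < 0:
                raise ValueError("unterminated literal")
            self.i = end + 1
            return Symbol("literal", src[start + 1:end])
        self.i += 1                                      # one-character symbol
        return Symbol(PUNCT.get(ch, "other"), ch)


def tokenize(source: str) -> List[Tuple[str, str]]:
    scanner, out = Scanner(source), []
    while (sym := scanner.get_sym()) is not None:
        out.append((sym.kind, sym.text))
    return out
```

The parser only ever calls get_sym(); it never sees individual characters, which is exactly the split between lexical analysis and syntax analysis described above.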

Syntax Analysis Overview
- Goal: determine whether the input token stream satisfies the syntax of the program
- What do we need to do this?
  - An expressive way to describe the syntax
  - A mechanism that determines whether the input token stream satisfies the syntax description
- For lexical analysis:
  - Regular expressions describe tokens
  - Finite automata are the mechanisms that generate tokens from the input stream

Just Use Regular Expressions?
- REs can expressively describe tokens, and they are easy to implement via DFAs
- So why not just use them to describe the syntax of a programming language?
- NO! They don't have enough power to express any non-trivial syntax
  - Example: nested constructs (blocks, expressions, statements)
  - Detecting balanced braces: {{} {} {{} { }}} { {{{{ }}}}} ...
  - This needs unbounded counting!
  - FSAs cannot count except in a strictly modulo fashion
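To make the counting argument concrete, here is a hedged Python sketch (function names hypothetical). balanced uses an unbounded counter; balanced_bounded models what a finite automaton can do, with states 0..max_depth plus a dead state, so it must reject some balanced strings.

```python
def balanced(text: str) -> bool:
    """Balanced braces via an unbounded counter (which a DFA does not have)."""
    depth = 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:          # a '}' with no matching '{'
                return False
    return depth == 0


def balanced_bounded(text: str, max_depth: int) -> bool:
    """A finite-state version: only depths 0..max_depth can be remembered,
    so any nesting deeper than max_depth is rejected even if balanced."""
    depth = 0
    for ch in text:
        if ch == "{":
            depth += 1
            if depth > max_depth:  # no state left to record this depth
                return False
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0
```

For any fixed max_depth there is a balanced string nested one level deeper, which is why no finite automaton recognizes the whole language.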

Context-Free Grammars
A CFG consists of 4 components:
- Terminal symbols = tokens or ε
- Non-terminal symbols = syntactic variables
- Start symbol S = a special non-terminal
- Productions of the form LHS → RHS
  - LHS = a single non-terminal
  - RHS = a string of terminals and non-terminals
  - They specify how non-terminals may be expanded
The language generated by a grammar is the set of strings of terminals derived from the start symbol by repeatedly applying the productions; L(G) denotes the language generated by grammar G. Example:
S → a S a
S → T
T → b T b
T → ε

CFG - Example
Grammar for the balanced-parentheses language:
S → ( S ) S
S → ε
- 1 non-terminal: S
- 2 terminals: "(", ")"
- Start symbol: S
- 2 productions
If the grammar accepts a string, there is a derivation of that string using the productions. For "(())":
S ⇒ (S)S ⇒ (S)ε ⇒ ((S)S)ε ⇒ ((ε)S)ε ⇒ ((ε)ε)ε = (())
Question: why is the final S required?
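A minimal recursive-descent recognizer for this grammar, written as a Python sketch (names hypothetical), also answers the question: with_tail=False drops the final S, and the recognizer then rejects "()()", because without that S a parenthesized group can never be followed by another one at the same level.

```python
def parse_S(s: str, i: int, with_tail: bool) -> int:
    """Parse one S starting at s[i]; return the index just past it.
    with_tail=True : S -> ( S ) S | ε
    with_tail=False: S -> ( S )   | ε   (final S dropped)"""
    if i < len(s) and s[i] == "(":            # choose S -> ( S ) ...
        i = parse_S(s, i + 1, with_tail)
        if i >= len(s) or s[i] != ")":
            raise ValueError(f"expected ')' at index {i}")
        i += 1
        if with_tail:
            i = parse_S(s, i, with_tail)      # the trailing S
    return i                                  # otherwise S -> ε


def accepts_parens(s: str, with_tail: bool = True) -> bool:
    try:
        return parse_S(s, 0, with_tail) == len(s)
    except ValueError:
        return False
```

The choice between the two productions is made by looking at a single character: "(" selects S → ( S ) S, anything else selects S → ε.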

More on CFGs
- Shorthand notation: a vertical bar separates multiple productions for the same non-terminal
  S → a S a | T
  T → b T b | ε
- CFGs are powerful enough to express the syntax of most programming languages
- Derivation = successive application of productions starting from S
- Acceptance = determining whether there is a derivation for an input token stream

A Parser
- Inputs: a context-free grammar G and a token stream s (from the lexer)
- Output: Yes if s is in L(G); No otherwise, together with error messages
Syntax analyzers (parsers) are CFG acceptors that also output the corresponding derivation when the token stream is accepted. Various kinds exist: LL(k), LR(k), SLR, LALR.

RE is a Subset of CFG
A grammar can be built inductively for each RE:
- ε: S → ε
- a: S → a
- R1 R2: S → S1 S2
- R1 | R2: S → S1 | S2
- R1*: S → S1 S | ε
where G1 is the grammar for R1, with start symbol S1, and G2 is the grammar for R2, with start symbol S2.
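The inductive construction above can be sketched in Python. Regexes are assumed to be given as nested tuples, such as ("star", ("alt", ("sym", "a"), ("sym", "b"))) for (a|b)*; this encoding, the function name re_to_cfg and the fresh non-terminal names S1, S2, ... are choices made for the example.

```python
from itertools import count


def re_to_cfg(regex, fresh=None):
    """Return (start, productions) following the inductive construction:
         ε        ->  S -> ε
         a        ->  S -> a
         R1 R2    ->  S -> S1 S2
         R1 | R2  ->  S -> S1 | S2
         R1*      ->  S -> S1 S | ε
    Productions are (lhs, rhs) pairs; rhs is a tuple, and () means ε."""
    fresh = fresh if fresh is not None else count(1)
    start = f"S{next(fresh)}"                 # a new non-terminal per node
    kind = regex[0]
    if kind == "eps":
        return start, [(start, ())]
    if kind == "sym":
        return start, [(start, (regex[1],))]
    if kind in ("cat", "alt"):
        s1, p1 = re_to_cfg(regex[1], fresh)
        s2, p2 = re_to_cfg(regex[2], fresh)
        if kind == "cat":
            return start, [(start, (s1, s2))] + p1 + p2
        return start, [(start, (s1,)), (start, (s2,))] + p1 + p2
    if kind == "star":
        s1, p1 = re_to_cfg(regex[1], fresh)
        return start, [(start, (s1, start)), (start, ())] + p1
    raise ValueError(f"unknown regex node {kind!r}")
```

For (a|b)* this yields S1 → S2 S1 | ε, S2 → S3 | S4, S3 → a, S4 → b: a grammar for the same language, which is the sense in which every regular language is context-free.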

Grammar for Sum Expression
Grammar:
S → E + S | E
E → number | (S)
Expanded:
S → E + S
S → E
E → number
E → (S)
- 4 productions
- 2 non-terminals: S, E
- 4 terminals: "(", ")", "+", number
- Start symbol: S

Constructing a Derivation
- Start from S (the start symbol)
- Use productions to derive a sequence of tokens
- For arbitrary strings α, β, γ and a production A → β, a single step of the derivation is α A γ ⇒ α β γ (substitute β for A)
- Example: S ⇒ E + S ⇒ (S) + S ⇒ (E + S) + S ⇒ (E + S) + E
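The single-step rule α A γ ⇒ α β γ can be written directly as list surgery. This Python sketch (the grammar encoding and the names step and example_derivation are hypothetical, with "num" standing in for number) replays the short derivation S ⇒ E + S ⇒ (S) + S ⇒ (E + S) + S ⇒ (E + S) + E one step at a time.

```python
# The sum-expression grammar as data; "num" stands in for the terminal number.
GRAMMAR = {
    "S": [["E", "+", "S"], ["E"]],
    "E": [["num"], ["(", "S", ")"]],
}


def step(form, index, rhs):
    """One derivation step  alpha A gamma => alpha beta gamma,
    with A = form[index] and beta = rhs (must be a production A -> beta)."""
    A = form[index]
    if rhs not in GRAMMAR.get(A, []):
        raise ValueError(f"{A} -> {' '.join(rhs)} is not a production")
    return form[:index] + rhs + form[index + 1:]


def example_derivation():
    """Replay S => E + S => (S) + S => (E + S) + S => (E + S) + E."""
    forms = [["S"]]
    for index, rhs in [(0, ["E", "+", "S"]), (0, ["(", "S", ")"]),
                       (1, ["E", "+", "S"]), (6, ["E"])]:
        forms.append(step(forms[-1], index, rhs))
    return [" ".join(f) for f in forms]
```

Each call to step rewrites exactly one non-terminal, so the list returned by example_derivation is a sequence of sentential forms, each derived from the previous one in a single step.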

Class Problem
S → E + S | E
E → number | (S)
Derive: (1 + 2 + (3 + 4)) + 5

Parse Tree
[Figure: parse tree for (1 + 2 + (3 + 4)) + 5]
- Parse tree = tree representation of the derivation
- Leaves of the tree are terminals
- Internal nodes are non-terminals
- No information about the order of the derivation steps

Parse Tree vs Abstract Syntax Tree
[Figure: the parse tree for (1 + 2 + (3 + 4)) + 5 beside its abstract syntax tree]
- The parse tree is also called "concrete syntax"
- The AST discards (abstracts away) unneeded information, giving a more compact format
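A hedged Python sketch of the difference: a recursive-descent parser for S → E + S | E, E → number | (S) builds the full parse tree, and to_ast then drops the parentheses and the single-child S and E nodes. The tuple representation and the function names are assumptions of this example, and the parser picks S → E + S simply by checking whether the next token is +.

```python
import re


def tokenize(text):
    # numbers, or any single non-blank character
    return re.findall(r"\d+|\S", text)


def parse_S(toks, i):
    """S -> E + S | E.  Returns (parse_tree, index_after)."""
    e, i = parse_E(toks, i)
    if i < len(toks) and toks[i] == "+":
        s, i = parse_S(toks, i + 1)
        return ("S", [e, "+", s]), i
    return ("S", [e]), i


def parse_E(toks, i):
    """E -> number | ( S )"""
    if i < len(toks) and toks[i].isdigit():
        return ("E", [toks[i]]), i + 1
    if i < len(toks) and toks[i] == "(":
        s, i = parse_S(toks, i + 1)
        if i >= len(toks) or toks[i] != ")":
            raise ValueError(f"expected ')' at token {i}")
        return ("E", ["(", s, ")"]), i + 1
    raise ValueError(f"unexpected token at {i}")


def parse(text):
    toks = tokenize(text)
    tree, i = parse_S(toks, 0)
    if i != len(toks):
        raise ValueError(f"unexpected token {toks[i]!r}")
    return tree


def to_ast(node):
    """Keep only numbers and '+' nodes; drop parentheses and chain nodes."""
    if isinstance(node, str):                 # a number token
        return int(node)
    label, kids = node
    if label == "E":                          # E -> number | ( S )
        return to_ast(kids[0] if len(kids) == 1 else kids[1])
    if len(kids) == 1:                        # S -> E
        return to_ast(kids[0])
    return ("+", to_ast(kids[0]), to_ast(kids[2]))   # S -> E + S
```

For "(1+2+(3+4))+5" the AST is ("+", ("+", 1, ("+", 2, ("+", 3, 4))), 5): the grouping is all that remains of the parse tree's S, E and parenthesis nodes.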

Derivation Order
- We can choose to apply productions in any order: select a non-terminal and substitute the RHS of one of its productions
- Two standard orders: leftmost and rightmost
- Leftmost derivation: in the string, find the leftmost non-terminal and apply a production to it, e.g. E + S ⇒ 1 + S
- Rightmost derivation: the same, but find the rightmost non-terminal, e.g. E + S ⇒ E + E + S

Leftmost/Rightmost Derivation Examples
S → E + S | E
E → number | (S)
Leftmost derivation of (1 + 2 + (3 + 4)) + 5:
S ⇒ E+S ⇒ (S)+S ⇒ (E+S)+S ⇒ (1+S)+S ⇒ (1+E+S)+S ⇒ (1+2+S)+S ⇒ (1+2+E)+S ⇒ (1+2+(S))+S ⇒ (1+2+(E+S))+S ⇒ (1+2+(3+S))+S ⇒ (1+2+(3+E))+S ⇒ (1+2+(3+4))+S ⇒ (1+2+(3+4))+E ⇒ (1+2+(3+4))+5
Now, the rightmost derivation of the same input string:
S ⇒ E+S ⇒ E+E ⇒ E+5 ⇒ (S)+5 ⇒ (E+S)+5 ⇒ (E+E+S)+5 ⇒ (E+E+E)+5 ⇒ (E+E+(S))+5 ⇒ (E+E+(E+S))+5 ⇒ (E+E+(E+E))+5 ⇒ (E+E+(E+4))+5 ⇒ (E+E+(3+4))+5 ⇒ (E+2+(3+4))+5 ⇒ (1+2+(3+4))+5
Result: the same parse tree. The same productions are chosen, but in a different order.
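Because the parse tree fixes which productions are used, both derivations can be replayed from it by expanding the leftmost (or rightmost) remaining non-terminal each time. This Python sketch (names and tree encoding hypothetical) reproduces the two derivations listed above for (1+2+(3+4))+5.

```python
import re


def parse(text):
    """Parse tree for S -> E + S | E, E -> number | ( S ).
    Internal nodes are (non-terminal, children); leaves are token strings."""
    toks = re.findall(r"\d+|\S", text)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def eat(tok):
        nonlocal pos
        if peek() != tok:
            raise ValueError(f"expected {tok!r}, got {peek()!r}")
        pos += 1
        return tok

    def S():
        e = E()
        if peek() == "+":
            return ("S", [e, eat("+"), S()])
        return ("S", [e])

    def E():
        nonlocal pos
        tok = peek()
        if tok is not None and tok.isdigit():
            pos += 1
            return ("E", [tok])
        return ("E", [eat("("), S(), eat(")")])

    tree = S()
    if pos != len(toks):
        raise ValueError(f"unexpected token {toks[pos]!r}")
    return tree


def derivation(tree, leftmost=True):
    """Sentential forms obtained by always expanding the leftmost
    (or rightmost) remaining non-terminal of the given parse tree."""
    form = [tree]

    def render():
        return "".join(x[0] if isinstance(x, tuple) else x for x in form)

    steps = [render()]
    while True:
        nonterminals = [i for i, x in enumerate(form) if isinstance(x, tuple)]
        if not nonterminals:
            return steps
        i = nonterminals[0] if leftmost else nonterminals[-1]
        form[i:i + 1] = form[i][1]       # replace A by its children (the RHS)
        steps.append(render())


if __name__ == "__main__":
    tree = parse("(1+2+(3+4))+5")
    print(" => ".join(derivation(tree)))
    print(" => ".join(derivation(tree, leftmost=False)))
```

Both derivations have the same number of steps, one per internal node of the tree, which is another way of seeing that they apply the same productions in a different order.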

Class Problem
S → E + S | E
E → number | (S) | -S
Do the rightmost derivation of: 1 + (2 + -(3 + 4)) + 5

Ambiguous Grammars
- In the sum expression grammar, leftmost and rightmost derivations produced identical parse trees
- The + operator associates to the right in the parse tree regardless of derivation order, e.g. 1+2+(3+4) is grouped as 1+(2+(3+4))

An Ambiguous Grammar
- + associates to the right because of the right-recursive production S → E + S
- Consider another grammar: S → S + S | S * S | number
- Ambiguous grammar = different derivations produce different parse trees
- More specifically, G is ambiguous if there are 2 distinct leftmost (rightmost) derivations for some sentence

Ambiguous Grammar - Example
S → S + S | S * S | number
Consider the expression: 1 + 2 * 3
Derivation 1: S ⇒ S+S ⇒ 1+S ⇒ 1+S*S ⇒ 1+2*S ⇒ 1+2*3
Derivation 2: S ⇒ S*S ⇒ S+S*S ⇒ 1+S*S ⇒ 1+2*S ⇒ 1+2*3
[Figure: the two parse trees, 1 + (2 * 3) from Derivation 1 and (1 + 2) * 3 from Derivation 2]
Obviously not equal!

Impact of Ambiguity
- Different parse trees correspond to different evaluations!
- Thus, program meaning is not defined!!
[Figure: the same two parse trees] 1 + (2 * 3) = 7, whereas (1 + 2) * 3 = 9
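A short Python sketch (not from the slides; names hypothetical) enumerates every parse tree that S → S + S | S * S | number allows for 1 + 2 * 3, by trying each operator as the root of an S → S op S step, and evaluates them.

```python
def all_trees(tokens):
    """All parse trees of S -> S + S | S * S | number for a token list
    such as [1, "+", 2, "*", 3]; trees are (operator, left, right)."""
    if len(tokens) == 1:
        return [tokens[0]]                      # S -> number
    trees = []
    for k in range(1, len(tokens), 2):          # each operator as the root
        for left in all_trees(tokens[:k]):
            for right in all_trees(tokens[k + 1:]):
                trees.append((tokens[k], left, right))
    return trees


def evaluate(tree):
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a * b


for t in all_trees([1, "+", 2, "*", 3]):
    print(t, "=", evaluate(t))
# ('+', 1, ('*', 2, 3)) = 7
# ('*', ('+', 1, 2), 3) = 9
```

The number of trees grows quickly with the number of operators (three operators already give five trees), so an ambiguous grammar leaves a parser with no principled way to pick the intended meaning.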

Can We Get Rid of Ambiguity?
- Ambiguity is a function of the grammar, not the language!
- A context-free language L is inherently ambiguous if all grammars for L are ambiguous
- Every deterministic CFL has an unambiguous grammar
  - So, no deterministic CFL is inherently ambiguous
  - No inherently ambiguous programming languages have been invented
- To construct a useful parser, we must devise an unambiguous grammar

Eliminating Ambiguity
Often we can eliminate ambiguity by adding non-terminals and allowing recursion only on the right or only on the left:
S → S + T | T
T → T * num | num
- The T non-terminal enforces precedence
- Left recursion gives left associativity
[Figure: parse tree of 1 + 2 * 3 under this grammar, with 2 * 3 grouped under T]
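A recursive-descent parser cannot implement S → S + T literally, since S would call itself before consuming any input. A common workaround, sketched here in Python (names hypothetical), rewrites the left recursion as iteration, S = T {"+" T} and T = num {"*" num}, and builds each new node on top of the tree built so far, which keeps the left-associative grouping.

```python
import re


def parse(text):
    """Parser for S -> S + T | T, T -> T * num | num.
    Left recursion is rewritten as iteration (S -> T {+ T}, T -> num {* num});
    trees are nested tuples (operator, left, right)."""
    toks = re.findall(r"\d+|\S", text)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def num():
        nonlocal pos
        tok = peek()
        if tok is None or not tok.isdigit():
            raise ValueError(f"expected a number, got {tok!r}")
        pos += 1
        return int(tok)

    def T():
        nonlocal pos
        tree = num()
        while peek() == "*":
            pos += 1
            tree = ("*", tree, num())     # left operand = tree built so far
        return tree

    def S():
        nonlocal pos
        tree = T()
        while peek() == "+":
            pos += 1
            tree = ("+", tree, T())       # * binds tighter: T parses it first
        return tree

    tree = S()
    if pos != len(toks):
        raise ValueError(f"unexpected token {toks[pos]!r}")
    return tree
```

parse("1+2*3") gives ("+", 1, ("*", 2, 3)), the only tree this grammar allows, and parse("1+2+3") gives ("+", ("+", 1, 2), 3), i.e. left associativity.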

A Closer Look at Eliminating Ambiguity
Precedence is enforced by:
- Introducing a distinct non-terminal for each precedence level
- Specifying the operators for a given precedence level in the RHS of that level's production
- Accessing higher-precedence operators by referencing the next-higher-precedence non-terminal

Associativity
An operator is either left-, right- or non-associative:
- Left: a + b + c = (a + b) + c
- Right: a ^ b ^ c = a ^ (b ^ c)
- Non: a < b < c is illegal (thus undefined)
The position of the recursion relative to the operator dictates the associativity:
- Left (right) recursion → left (right) associativity
- Non: don't be recursive; simply reference the next-higher-precedence non-terminal on both sides of the operator
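The three cases can be sketched in one small Python parser for a hypothetical grammar C → E < E | E, E → E - F | F, F → num ^ F | num: a loop for the left-recursive -, a right-recursive call for ^, and no recursion at all around <, so a second < is left over and rejected. The grammar and all names are assumptions of this example.

```python
import re


def parse(text):
    """Sketch for the hypothetical grammar
         C -> E < E | E        (non-associative: no recursion around <)
         E -> E - F | F        (left recursion  -> left-associative -)
         F -> num ^ F | num    (right recursion -> right-associative ^)
    Trees are nested tuples (operator, left, right)."""
    toks = re.findall(r"\d+|\S", text)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def num():
        nonlocal pos
        tok = peek()
        if tok is None or not tok.isdigit():
            raise ValueError(f"expected a number, got {tok!r}")
        pos += 1
        return int(tok)

    def F():
        nonlocal pos
        base = num()
        if peek() == "^":
            pos += 1
            return ("^", base, F())       # recursion on the right of ^
        return base

    def E():
        nonlocal pos
        tree = F()
        while peek() == "-":              # left recursion done as a loop
            pos += 1
            tree = ("-", tree, F())
        return tree

    def C():
        nonlocal pos
        left = E()
        if peek() == "<":
            pos += 1
            return ("<", left, E())       # E on both sides, never C
        return left

    tree = C()
    if pos != len(toks):
        raise ValueError(f"unexpected token {toks[pos]!r}")
    return tree


def evaluate(t):
    if isinstance(t, int):
        return t
    op, a, b = t
    a, b = evaluate(a), evaluate(b)
    if op == "-":
        return a - b
    if op == "^":
        return a ** b
    return a < b
```

So 8-3-2 groups as (8-3)-2 = 3, 2^3^2 groups as 2^(3^2) = 512, and parse("1<2<3") raises ValueError because the grammar has no way to chain comparisons.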

Class Problem (Tough)
S → S + S | S - S | S * S | S / S | (S) | -S | S ^ S | number
Enforce the standard arithmetic precedence rules and remove all ambiguity from the above grammar.
Precedence (high to low):
- (), unary -
- ^
- *, /
- +, -
Associativity: ^ is right-associative; the rest are left-associative.