Defining Program Syntax Chapter TwoModern Programming Languages, 2nd ed.1.

Slides:



Advertisements
Similar presentations
ISBN Chapter 3 Describing Syntax and Semantics.
Advertisements

CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax Lecture 2 - Syntax, Spring CSE3302 Programming Languages, UT-Arlington ©Chengkai.
Honors Compilers An Introduction to Grammars Feb 12th 2002.
176 Formal Languages and Applications: We know that Pascal programming language is defined in terms of a CFG. All the other programming languages are context-free.
CS 330 Programming Languages 09 / 13 / 2007 Instructor: Michael Eckmann.
Chapter 3 Describing Syntax and Semantics Sections 1-3.
PZ02A - Language translation
Context-Free Grammars Lecture 7
Chapter 3 Describing Syntax and Semantics Sections 1-3.
Chapter 3 Program translation1 Chapt. 3 Language Translation Syntax and Semantics Translation phases Formal translation models.
Chapter 3 Describing Syntax and Semantics Sections 1-3.
Chapter 2-a Defining Program Syntax
30-Jun-15 BNF. Metalanguages A metalanguage is a language used to talk about a language (usually a different one) We can use English as its own metalanguage.
Chapter 3: Formal Translation Models
COP4020 Programming Languages
S YNTAX. Outline Programming Language Specification Lexical Structure of PLs Syntactic Structure of PLs Context-Free Grammar / BNF Parse Trees Abstract.
(2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies.
Chapter 2 Syntax A language that is simple to parse for the compiler is also simple to parse for the human programmer. N. Wirth.
1 Syntax and Semantics The Purpose of Syntax Problem of Describing Syntax Formal Methods of Describing Syntax Derivations and Parse Trees Sebesta Chapter.
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
Chapter TwoModern Programming Languages1 Defining Program Syntax.
Chapter TwoModern Programming Languages1 Defining Program Syntax.
Chpater 3. Outline The definition of Syntax The Definition of Semantic Most Common Methods of Describing Syntax.
Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.
Chapter 10: Compilers and Language Translation Invitation to Computer Science, Java Version, Third Edition.
CSI 3120, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next.
CS 355 – PROGRAMMING LANGUAGES Dr. X. Topics Introduction The General Problem of Describing Syntax Formal Methods of Describing Syntax.
Syntax and Backus Naur Form
Syntax: 10/18/2015IT 3271 Semantics: Describe the structures of programs Describe the meaning of programs Programming Languages (formal languages) -- How.
A sentence (S) is composed of a noun phrase (NP) and a verb phrase (VP). A noun phrase may be composed of a determiner (D/DET) and a noun (N). A noun phrase.
CS Describing Syntax CS 3360 Spring 2012 Sec Adapted from Addison Wesley’s lecture notes (Copyright © 2004 Pearson Addison Wesley)
Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 2 Syntax A language that is simple to parse.
PART I: overview material
Programming Languages Third Edition Chapter 6 Syntax.
3-1 Chapter 3: Describing Syntax and Semantics Introduction Terminology Formal Methods of Describing Syntax Attribute Grammars – Static Semantics Describing.
ProgrammingLanguages Programming Languages Language Syntax This lecture introduces the the lexical structure of programming languages; the context-free.
C H A P T E R TWO Syntax and Semantic.
ISBN Chapter 3 Describing Syntax and Semantics.
TextBook Concepts of Programming Languages, Robert W. Sebesta, (10th edition), Addison-Wesley Publishing Company CSCI18 - Concepts of Programming languages.
COP 4620 / 5625 Programming Language Translation / Compiler Writing Fall 2003 Lecture 3, 09/11/2003 Prof. Roy Levow.
Copyright © by Curt Hill Grammar Types The Chomsky Hierarchy BNF and Derivation Trees.
1 Syntax In Text: Chapter 3. 2 Chapter 3: Syntax and Semantics Outline Syntax: Recognizer vs. generator BNF EBNF.
Copyright © Curt Hill Languages and Grammars This is not English Class. But there is a resemblance.
CMSC 330: Organization of Programming Languages Context-Free Grammars.
Introduction to Parsing
CPS 506 Comparative Programming Languages Syntax Specification.
Chapter 3 Describing Syntax and Semantics
Syntax The Structure of a Language. Lexical Structure The structure of the tokens of a programming language The scanner takes a sequence of characters.
1 Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
ISBN Chapter 3 Describing Syntax and Semantics.
Syntax and Grammars.
The Interpreter Pattern (Behavioral) ©SoftMoore ConsultingSlide 1.
Syntax and Semantics Form and Meaning of Programming Languages Copyright © by Curt Hill.
Programming Languages and Design Lecture 2 Syntax Specifications of Programming Languages Instructor: Li Ma Department of Computer Science Texas Southern.
LECTURE 4 Syntax. SPECIFYING SYNTAX Programming languages must be very well defined – there’s no room for ambiguity. Language designers must use formal.
©SoftMoore ConsultingSlide 1 Context-Free Grammars.
BNF A CFL Metalanguage Some Variations Particular View to SLK Copyright © 2015 – Curt Hill.
Chapter 3 – Describing Syntax CSCE 343. Syntax vs. Semantics Syntax: The form or structure of the expressions, statements, and program units. Semantics:
Chapter 3 – Describing Syntax
PROGRAMMING LANGUAGES
Automata and Languages What do these have in common?
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
Chapter Twelve: Context-Free Languages
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
High-Level Programming Language
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
COMPILER CONSTRUCTION
Faculty of Computer Science and Information System
Presentation transcript:

Defining Program Syntax Chapter TwoModern Programming Languages, 2nd ed.1

Syntax And Semantics n Programming language syntax: how programs look, their form and structure – Syntax is defined using a kind of formal grammar n Programming language semantics: what programs do, their behavior and meaning – Semantics is harder to define—more on this in Chapter 23 Chapter TwoModern Programming Languages, 2nd ed.2

Outline n Grammar and parse tree examples n BNF and parse tree definitions n Constructing grammars n Phrase structure and lexical structure n Other grammar forms Chapter TwoModern Programming Languages, 2nd ed.3

An English Grammar Chapter TwoModern Programming Languages, 2nd ed.4 A sentence is a noun phrase, a verb, and a noun phrase. A noun phrase is an article and a noun. A verb is… An article is… A noun is... ::= ::= loves | hates|eats ::= a | the ::= dog | cat | rat

How The Grammar Works n The grammar is a set of rules that say how to build a tree—a parse tree n You put at the root of the tree n The grammar’s rules say how children can be added at any point in the tree n For instance, the rule says you can add nodes,, and, in that order, as children of Chapter TwoModern Programming Languages, 2nd ed.5 ::=

A Parse Tree Chapter TwoModern Programming Languages, 2nd ed.6 <S><S> the dog the cat loves

A Programming Language Grammar n An expression can be the sum of two expressions, or the product of two expressions, or a parenthesized subexpression Or it can be one of the variables a, b or c Chapter TwoModern Programming Languages, 2nd ed.7 ::= + | * | ( ) | a | b | c

A Parse Tree Chapter TwoModern Programming Languages, 2nd ed.8 + ( ) * ( ) ab ((a+b)*c) c

Outline n Grammar and parse tree examples n BNF and parse tree definitions n Constructing grammars n Phrase structure and lexical structure n Other grammar forms Chapter TwoModern Programming Languages, 2nd ed.9

Chapter TwoModern Programming Languages, 2nd ed.10 ::= ::= loves | hates|eats ::= a | the ::= dog | cat | rat tokens non-terminal symbols start symbol a production

BNF Grammar Definition n A BNF grammar consists of four parts: – The set of tokens – The set of non-terminal symbols – The start symbol – The set of productions Chapter TwoModern Programming Languages, 2nd ed.11

Definition, Continued n The tokens are the smallest units of syntax – Strings of one or more characters of program text – They are atomic: not treated as being composed from smaller parts n The non-terminal symbols stand for larger pieces of syntax – They are strings enclosed in angle brackets, as in – They are not strings that occur literally in program text – The grammar says how they can be expanded into strings of tokens n The start symbol is the particular non-terminal that forms the root of any parse tree for the grammar Chapter TwoModern Programming Languages, 2nd ed.12

Definition, Continued n The productions are the tree-building rules Each one has a left-hand side, the separator ::=, and a right-hand side – The left-hand side is a single non-terminal – The right-hand side is a sequence of one or more things, each of which can be either a token or a non-terminal n A production gives one possible way of building a parse tree: it permits the non-terminal symbol on the left-hand side to have the things on the right- hand side, in order, as its children in a parse tree Chapter TwoModern Programming Languages, 2nd ed.13

Alternatives n When there is more than one production with the same left-hand side, an abbreviated form can be used n The BNF grammar can give the left-hand side, the separator ::=, and then a list of possible right-hand sides separated by the special symbol | Chapter TwoModern Programming Languages, 2nd ed.14

Example Chapter TwoModern Programming Languages, 2nd ed.15 Note that there are six productions in this grammar. It is equivalent to this one: ::= + | * | ( ) | a | b | c ::= + ::= * ::= ( ) ::= a ::= b ::= c

Empty The special nonterminal is for places where you want the grammar to generate nothing n For example, this grammar defines a typical if-then construct with an optional else part: Chapter TwoModern Programming Languages, 2nd ed.16 ::= if then ::= else |

Parse Trees n To build a parse tree, put the start symbol at the root n Add children to every non-terminal, following any one of the productions for that non-terminal in the grammar n Done when all the leaves are tokens n Read off leaves from left to right—that is the string derived by the tree Chapter TwoModern Programming Languages, 2nd ed.17

Practice Chapter TwoModern Programming Languages, 2nd ed.18 Show a parse tree for each of these strings: a+b a*b+c (a+b) (a+(b)) ::= + | * | ( ) | a | b | c

Compiler Note n What we just did is parsing: trying to find a parse tree for a given string n That’s what compilers do for every program you try to compile: try to build a parse tree for your program, using the grammar for whatever language you used n Take a course in compiler construction to learn about algorithms for doing this efficiently Chapter TwoModern Programming Languages, 2nd ed.19

Language Definition n We use grammars to define the syntax of programming languages n The language defined by a grammar is the set of all strings that can be derived by some parse tree for the grammar n As in the previous example, that set is often infinite (though grammars are finite) n Constructing grammars is a little like programming... Chapter TwoModern Programming Languages, 2nd ed.20

Outline n Grammar and parse tree examples n BNF and parse tree definitions n Constructing grammars n Phrase structure and lexical structure n Other grammar forms Chapter TwoModern Programming Languages, 2nd ed.21

Constructing Grammars n Most important trick: divide and conquer n Example: the language of Java declarations: a type name, a list of variables separated by commas, and a semicolon n Each variable can be followed by an initializer: Chapter TwoModern Programming Languages, 2nd ed.22 float a; boolean a,b,c; int a=1, b, c=1+2;

Example, Continued n Easy if we postpone defining the comma- separated list of variables with initializers: n Primitive type names are easy enough too: n (Note: skipping constructed types: class names, interface names, and array types) Chapter TwoModern Programming Languages, 2nd ed.23 ::= ; ::= boolean | byte | short | int | long | char | float | double

Example, Continued n That leaves the comma-separated list of variables with initializers n Again, postpone defining variables with initializers, and just do the comma- separated list part: Chapter TwoModern Programming Languages, 2nd ed.24 ::= |,

Example, Continued n That leaves the variables with initializers: n For full Java, we would need to allow pairs of square brackets after the variable name n There is also a syntax for array initializers And definitions for and Chapter TwoModern Programming Languages, 2nd ed.25 ::= | =

Outline n Grammar and parse tree examples n BNF and parse tree definitions n Constructing grammars n Phrase structure and lexical structure n Other grammar forms Chapter TwoModern Programming Languages, 2nd ed.26

Where Do Tokens Come From? n Tokens are pieces of program text that we do not choose to think of as being built from smaller pieces Identifiers ( count ), keywords ( if ), operators ( == ), constants ( ), etc. n Programs stored in files are just sequences of characters n How is such a file divided into a sequence of tokens? Chapter TwoModern Programming Languages, 2nd ed.27

Lexical Structure And Phrase Structure n Grammars so far have defined phrase structure: how a program is built from a sequence of tokens n We also need to define lexical structure: how a text file is divided into tokens Chapter TwoModern Programming Languages, 2nd ed.28

One Grammar For Both n You could do it all with one grammar by using characters as the only tokens n Not done in practice: things like white space and comments would make the grammar too messy to be readable Chapter TwoModern Programming Languages, 2nd ed.29 ::= if then ::= else |

Separate Grammars n Usually there are two separate grammars – One says how to construct a sequence of tokens from a file of characters – One says how to construct a parse tree from a sequence of tokens Chapter TwoModern Programming Languages, 2nd ed.30 ::= | ::= | | ::= | | ::= | | | …

Separate Compiler Passes n The scanner reads the input file and divides it into tokens according to the first grammar n The scanner discards white space and comments n The parser constructs a parse tree (or at least goes through the motions—more about this later) from the token stream according to the second grammar Chapter TwoModern Programming Languages, 2nd ed.31

Historical Note #1 n Early languages sometimes did not separate lexical structure from phrase structure – Early Fortran and Algol dialects allowed spaces anywhere, even in the middle of a keyword – Other languages like PL/I allow keywords to be used as identifiers n This makes them harder to scan and parse n It also reduces readability Chapter TwoModern Programming Languages, 2nd ed.32

Historical Note #2 n Some languages have a fixed-format lexical structure—column positions are significant – One statement per line (i.e. per card) – First few columns for statement label – Etc. n Early dialects of Fortran, Cobol, and Basic n Most modern languages are free-format: column positions are ignored Chapter TwoModern Programming Languages, 2nd ed.33

Outline n Grammar and parse tree examples n BNF and parse tree definitions n Constructing grammars n Phrase structure and lexical structure n Other grammar forms Chapter TwoModern Programming Languages, 2nd ed.34

Other Grammar Forms n BNF variations n EBNF variations n Syntax diagrams Chapter TwoModern Programming Languages, 2nd ed.35

BNF Variations n Some use  or = instead of ::= n Some leave out the angle brackets and use a distinct typeface for tokens Some allow single quotes around tokens, for example to distinguish ‘ | ’ as a token from | as a meta-symbol Chapter TwoModern Programming Languages, 2nd ed.36

EBNF Variations n Additional syntax to simplify some grammar chores: – {x} to mean zero or more repetitions of x – [x] to mean x is optional (i.e. x | ) – () for grouping – | anywhere to mean a choice among alternatives – Quotes around tokens, if necessary, to distinguish from all these meta-symbols Chapter TwoModern Programming Languages, 2nd ed.37

EBNF Examples n Anything that extends BNF this way is called an Extended BNF: EBNF n There are many variations Chapter TwoModern Programming Languages, 2nd ed.38 ::= { ;} ::= if then [else ] ::= { ( | ) ;} ::= a[1] ::= ‘ a[1] ’

Syntax Diagrams n Syntax diagrams (“railroad diagrams”) n Start with an EBNF grammar n A simple production is just a chain of boxes (for nonterminals) and ovals (for terminals): Chapter TwoModern Programming Languages, 2nd ed.39 ifthenelse exprstmt if-stmt ::= if then else

Bypasses n Square-bracket pieces from the EBNF get paths that bypass them Chapter TwoModern Programming Languages, 2nd ed.40 ifthenelse exprstmt if-stmt ::= if then [else ]

Branching n Use branching for multiple productions Chapter TwoModern Programming Languages, 2nd ed.41 ::= + | * | ( ) | a | b | c

Loops n Use loops for EBNF curly brackets Chapter TwoModern Programming Languages, 2nd ed.42 ::= {+ }

Syntax Diagrams, Pro and Con n Easier for people to read casually n Harder to read precisely: what will the parse tree look like? n Harder to make machine readable (for automatic parser-generators) Chapter TwoModern Programming Languages, 2nd ed.43

Formal Context-Free Grammars n In the study of formal languages and automata, grammars are expressed in yet another notation: n These are called context-free grammars n Other kinds of grammars are also studied: regular grammars (weaker), context- sensitive grammars (stronger), etc. Chapter TwoModern Programming Languages, 2nd ed.44 S  aSb | X X  cX | ε

Many Other Variations n BNF and EBNF ideas are widely used n Exact notation differs, in spite of occasional efforts to get uniformity n But as long as you understand the ideas, differences in notation are easy to pick up Chapter TwoModern Programming Languages, 2nd ed.45

Example Chapter TwoModern Programming Languages, 2nd ed.46 WhileStatement: while ( Expression ) Statement DoStatement: do Statement while ( Expression ) ; BasicForStatement: for ( ForInit opt ; Expression opt ; ForUpdate opt ) Statement [from The Java™ Language Specification, Third Edition, James Gosling et. al.]

Conclusion n We use grammars to define programming language syntax, both lexical structure and phrase structure n Connection between theory and practice – Two grammars, two compiler passes – Parser-generators can write code for those two passes automatically from grammars Chapter TwoModern Programming Languages, 2nd ed.47

Conclusion, Continued n Multiple audiences for a grammar – Novices want to find out what legal programs look like – Experts—advanced users and language system implementers—want an exact, detailed definition – Tools—parser and scanner generators—want an exact, detailed definition in a particular, machine-readable form Chapter TwoModern Programming Languages, 2nd ed.48