COMPILER CONSTRUCTION WEEK-2: LANGUAGE DESCRIPTION - SYNTACTIC STRUCTURE

An Overview: Clear and complete descriptions of a language are needed by programmers, implementers, and even language designers. The syntax of a language specifies how programs in the language are built up. The semantics of the language specifies what programs mean. For example, dates are built up from digits, represented by D, and the symbol / as follows: D D / D D / D D D D. According to this syntax, 01/02/2001 is a date. The syntax does not identify which day this date refers to. In the United States, this date refers to January 2, 2001, but elsewhere 01 is interpreted as the day and 02 as the month, so the date refers to February 1, 2001. The same syntax therefore has different semantics in different parts of the world.
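A minimal Python sketch of the date example, separating the two concerns: one function checks only the syntax D D / D D / D D D D, and another assigns a meaning under a chosen convention. The names `is_date`, `interpret`, and the `convention` parameter are hypothetical, chosen for illustration.

```python
import re
from datetime import date

# Syntax only: D D / D D / D D D D, where each D is a digit.
DATE_SYNTAX = re.compile(r"\d\d/\d\d/\d\d\d\d")

def is_date(text: str) -> bool:
    """Check the form of the string; say nothing about its meaning."""
    return DATE_SYNTAX.fullmatch(text) is not None

def interpret(text: str, convention: str) -> date:
    """Give a syntactically valid date a meaning.

    convention="US" reads the first field as the month;
    any other value reads the first field as the day.
    """
    if not is_date(text):
        raise ValueError(f"not a date: {text!r}")
    first, second, year = (int(part) for part in text.split("/"))
    if convention == "US":
        month, day = first, second
    else:
        day, month = first, second
    return date(year, month, day)

print(is_date("01/02/2001"))            # True
print(interpret("01/02/2001", "US"))     # 2001-01-02
print(interpret("01/02/2001", "other"))  # 2001-02-01
```

The same string passes the same syntax check in both cases; only `interpret` differs, which is exactly the syntax/semantics split described above.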

Expression Notations: Expressions such as a + b * c have been in use for centuries and were a starting point for the design of programming languages. For example, the mathematical expression $(-b + \sqrt{b^2 - 4ac}) / (2a)$ can be written in Fortran as (-b + sqrt(b * b - 4.0 * a * c)) / (2.0 * a). Programming languages use a mix of infix, prefix, and postfix notations. (Assignment) A binary operator is applied to two operands. In infix notation, a binary operator is written between its operands, as in the expression a + b. The alternatives are prefix notation, in which the operator is written first, as + a b, and postfix notation, in which the operator is written last, as a b +. As a rule, an expression can be enclosed within parentheses without affecting its value: expression E has the same value as (E). Prefix and postfix notations are sometimes called parenthesis-free because, as we shall see, the operands of each operator can be found unambiguously, without the need for parentheses.

Prefix Notation: An expression in prefix notation is written as follows. The prefix notation for a constant or a variable is the constant or variable itself. The application of an operator op to sub-expressions E1 and E2 is written in prefix notation as op E1 E2. An advantage of prefix notation is that it is easy to decode during a left-to-right scan of an expression. If a prefix expression begins with the operator +, the next expression after + must be the first operand of +, and the expression after that must be the second operand of +. For example, the sum of x and y is written in prefix notation as + x y. The product of + x y and z is written as * + x y z. Thus + 20 30 equals 50, and * + 20 30 60 = * 50 60 = 3000. Grouping the operands differently gives a different value: * 20 + 30 60, the product of 20 and + 30 60, equals * 20 90 = 1800.
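The left-to-right decoding rule above translates directly into a recursive evaluator: read an operator, then read one complete expression for each operand. This is a minimal sketch, assuming space-separated tokens, integer constants, and only the binary operators + - * (the `OPS` table and `eval_prefix` name are hypothetical).

```python
import operator

# Binary operators understood by this sketch (an assumption; extend as needed).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def eval_prefix(expr: str) -> int:
    """Evaluate a prefix expression whose tokens are separated by spaces."""
    tokens = iter(expr.split())

    def parse() -> int:
        token = next(tokens, None)
        if token is None:
            raise ValueError("expression ended too early")
        if token in OPS:
            first = parse()   # the next complete expression is operand 1
            second = parse()  # the expression after that is operand 2
            return OPS[token](first, second)
        return int(token)     # a constant is its own prefix form

    value = parse()
    if next(tokens, None) is not None:
        raise ValueError("tokens left over after a complete expression")
    return value

print(eval_prefix("+ 20 30"))       # 50
print(eval_prefix("* + 20 30 60"))  # 3000
print(eval_prefix("* 20 + 30 60"))  # 1800
```

No parentheses are needed anywhere: the position of each operator alone determines its operands.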

Postfix Notation: An expression in postfix notation is written as follows. The postfix notation for a constant or a variable is the constant or variable itself. The application of an operator op to sub-expressions E1 and E2 is written in postfix notation as E1 E2 op. An advantage of postfix notation is that it can be evaluated mechanically with the help of a stack data structure. For example, the sum of x and y is written in postfix notation as x y +. The product of x y + and z is written as x y + z *. Thus 20 30 + equals 50, and 20 30 + 60 * = 50 60 * = 3000. Grouping the operands differently gives a different value: 20 30 60 + *, the product of 20 and 30 60 +, equals 20 90 * = 1800.
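The stack-based evaluation mentioned above can be sketched as follows: operands are pushed as they are read, and each operator pops its two operands and pushes the result. This assumes space-separated tokens, integer constants, and the binary operators + - * only; `eval_postfix` is a hypothetical name.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def eval_postfix(expr: str) -> int:
    """Evaluate a space-separated postfix expression with a stack."""
    stack: list[int] = []
    for token in expr.split():
        if token in OPS:
            if len(stack) < 2:
                raise ValueError(f"operator {token!r} lacks two operands")
            second = stack.pop()  # the right operand was pushed last
            first = stack.pop()
            stack.append(OPS[token](first, second))
        else:
            stack.append(int(token))  # operands are pushed as they are read
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]

print(eval_postfix("20 30 +"))       # 50
print(eval_postfix("20 30 + 60 *"))  # 3000
print(eval_postfix("20 30 60 + *"))  # 1800
```

Popping `second` before `first` matters for non-commutative operators: `10 4 -` must give 6, not -6.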

Infix Notation: In infix notation, operators appear between their operands; + appears between a and b in the sum a + b. An advantage of infix notation is that it is familiar and hence easy to read. Infix notation comes with rules for precedence and associativity. How is an expression like a + b * c to be decoded? Is it the sum of a and b * c, or is it the product of a + b and c? The operator * usually takes its operands before + does. An operator at a higher precedence level takes its operands before an operator at a lower precedence level. The BODMAS rule of school arithmetic is an example of such precedence rules. Associativity settles the order among operators at the same level: because - is left-associative, a - b - c means (a - b) - c.
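One way to see precedence and associativity at work is to convert infix to the parenthesis-free postfix form. The sketch below uses Dijkstra's shunting-yard algorithm, a standard technique not described in the slides; it assumes binary, left-associative operators + - * /, single-token operands, and space-separated input.

```python
# Higher number = higher precedence (an assumption matching usual arithmetic).
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_postfix(tokens: list[str]) -> list[str]:
    """Convert infix tokens to postfix with the shunting-yard algorithm."""
    output: list[str] = []
    stack: list[str] = []
    for token in tokens:
        if token in PRECEDENCE:
            # Pop operators that take their operands first: those with higher
            # precedence, or equal precedence (left associativity).
            while (stack and stack[-1] in PRECEDENCE
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[token]):
                output.append(stack.pop())
            stack.append(token)
        elif token == "(":
            stack.append(token)
        elif token == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced parentheses")
            stack.pop()  # discard the matching "("
        else:
            output.append(token)  # operands go straight to the output
    while stack:
        top = stack.pop()
        if top == "(":
            raise ValueError("unbalanced parentheses")
        output.append(top)
    return output

print(" ".join(to_postfix("a + b * c".split())))      # a b c * +
print(" ".join(to_postfix("( a + b ) * c".split())))  # a b + c *
print(" ".join(to_postfix("a - b - c".split())))      # a b - c -
```

The first result shows a + b * c decoded as the sum of a and b * c; the third shows left associativity grouping a - b - c as (a - b) - c.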

Mixfix Notation: Operations specified by a combination of symbols do not fit neatly into the prefix, infix, and postfix classification. For example, the keywords if, then, and else are used together in the expression if a > b then a else b. The meaningful components of this expression are the condition a > b and the expressions a and b. If a > b evaluates to true, the value of the expression is a; otherwise, it is b. When symbols or keywords appear interspersed with the components of an expression, the operation is said to be in mixfix notation.
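Python's conditional expression is a mixfix construct of the same kind, although it orders the components differently: the keywords if and else are interleaved with the three components as `a if a > b else b` (C writes the same operation as `a > b ? a : b`). A small illustrative function; the name `larger` is hypothetical.

```python
def larger(a: int, b: int) -> int:
    # Mixfix: the keywords "if" and "else" are interspersed with the three
    # components: the value a, the condition a > b, and the value b.
    return a if a > b else b

print(larger(5, 3))  # 5
print(larger(2, 7))  # 7
```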

Abstract Syntax Trees: The abstract syntax of a language identifies the meaningful components of each construct in the language. The prefix expression + a b, the infix expression a + b, and the postfix expression a b + all have the same meaningful components: the operator + and the sub-expressions a and b. The corresponding tree has a node labeled + whose children are a and b. A better grammar can be designed if the abstract syntax of a language is known before the grammar is specified. An operator and its operands are represented by a node and its children. A tree consists of a node with k ≥ 0 trees as its children. When k = 0, a tree consists of just a node, with no children. A node with no children is called a leaf. The root of a tree is a node with no parent; that is, it is not a child of any node.
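To make "same meaningful components" concrete, here is a minimal sketch of an abstract syntax tree for binary expressions, with one function per notation that writes the same tree out as prefix, infix, or postfix text. The class and function names are hypothetical; infix output is fully parenthesized so that no precedence rules are needed.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    name: str  # a constant or variable: a node with no children

@dataclass
class Node:
    op: str         # the operator labels the node
    left: "Tree"    # first operand (a child subtree)
    right: "Tree"   # second operand (a child subtree)

Tree = Union[Leaf, Node]

def prefix(t: Tree) -> str:
    if isinstance(t, Leaf):
        return t.name
    return f"{t.op} {prefix(t.left)} {prefix(t.right)}"

def postfix(t: Tree) -> str:
    if isinstance(t, Leaf):
        return t.name
    return f"{postfix(t.left)} {postfix(t.right)} {t.op}"

def infix(t: Tree) -> str:
    # Fully parenthesized, so the output never depends on precedence rules.
    if isinstance(t, Leaf):
        return t.name
    return f"({infix(t.left)} {t.op} {infix(t.right)})"

tree = Node("*", Node("+", Leaf("x"), Leaf("y")), Leaf("z"))
print(prefix(tree))   # * + x y z
print(infix(tree))    # ((x + y) * z)
print(postfix(tree))  # x y + z *
```

The tree is built once; the three notations are just three traversal orders over it.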

Lexical Syntax: Keywords like if and symbols like <= are treated as units in a programming language, just as words are treated as units in English. The meaning of the word dote (to love or admire) bears no relation to the meaning of dot, despite the similarity of their written representations. The two-character symbol <= is treated as a unit in Pascal and C. It is distinct from the one-character symbols < and =, which have different meanings of their own. Different languages may also spell the same unit differently: <> in Pascal and != in C both mean "not equal", and mod in Pascal and % in C both denote the remainder operation. Grammars deal with units called tokens: the syntax of a programming language is specified in terms of units called tokens or terminals. A lexical syntax for a language specifies the correspondence between the written representation of the language and the tokens or terminals in a grammar for the language. Alphabetic character sequences that are treated as units in a language are called keywords.
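A lexical analyzer is what turns written text into tokens. The sketch below, built on Python's `re` module, tries two-character symbols before one-character ones so that <= becomes a single token, skips white space, and marks reserved words as keywords. The keyword set and symbol list are assumptions loosely modeled on Pascal, not a complete lexical syntax for any language.

```python
import re

KEYWORDS = {"if", "then", "else"}  # assumed keyword set

# Within the symbol group, two-character symbols are listed first, so "<="
# becomes one token instead of "<" followed by "=".
TOKEN_RE = re.compile(r"""
    (?P<space>\s+)
  | (?P<number>\d+)
  | (?P<name>[A-Za-z][A-Za-z0-9]*)
  | (?P<symbol><=|>=|<>|:=|[<>=+\-*/()])
""", re.VERBOSE)

def tokenize(text: str) -> list[tuple[str, str]]:
    """Split text into (kind, spelling) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        kind, spelling = m.lastgroup, m.group()
        pos = m.end()
        if kind == "space":
            continue  # white space separates tokens but is not a token
        if kind == "name" and spelling in KEYWORDS:
            kind = "keyword"
        tokens.append((kind, spelling))
    return tokens

print(tokenize("if a <= b then a else b"))
```

Note that "a < = b" (with a space) yields the two separate tokens < and =, which is why the written form matters to the lexical syntax.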

Lexical Syntax: Like white space between tokens, comments are ignored. Informal descriptions usually suffice for white space, comments, and the correspondence between tokens and their spellings, so lexical syntax will not be formalized. Real numbers are a possible exception. The most complex rules in a lexical syntax are typically the ones describing the syntax of real numbers, because parts of the syntax are optional. The following are some of the ways of writing the same number: 314.E-2 = 3.14 = 0.314E+1 = 0.314E1, and the leading 0 can sometimes be dropped, as in .314E1.
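The optional parts are what make real numbers awkward. A regular expression can capture one possible rule; this sketch assumes a real number must contain a decimal point or an exponent (so 314 alone is an integer, not a real), that digits may be missing on one side of the point but not both, and that the exponent sign is optional. Actual languages differ on these choices.

```python
import re

# digits "." [digits] | "." digits, each with an optional exponent,
# or digits followed by a mandatory exponent.
REAL = re.compile(r"(?:\d+\.\d*|\.\d+)(?:[eE][+-]?\d+)?|\d+[eE][+-]?\d+")

def is_real(text: str) -> bool:
    return REAL.fullmatch(text) is not None

spellings = ["314.E-2", "3.14", "0.314E+1", "0.314E1", ".314E1"]
print(all(is_real(s) for s in spellings))            # True
print({float(s) for s in spellings})                 # {3.14}
print(is_real("."), is_real("E5"), is_real("314"))   # False False False
```

All five spellings pass the lexical check and denote the same value, which `float` confirms by collapsing them to a single element.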

Context-Free Grammars: The concrete syntax of a language describes its written representation, including lexical details such as the placement of keywords and punctuation marks. Context-free grammars, or simply grammars, are a notation for specifying concrete syntax. BNF (Backus-Naur Form) is one way of writing grammars. (Assignment) A grammar for a language imposes a hierarchical structure, called a parse tree, on programs in the language. The following is a parse tree for the string 3.14 in a language of real numbers:

Context-Free Grammars: [Figure: parse tree for 3.14, with root real-number and interior nodes integer-part, fraction, and digit]

Context-Free Grammars: The leaves at the bottom of a parse tree are labeled with terminals or tokens like 3; tokens represent themselves. By contrast, the other nodes of a parse tree are labeled with non-terminals like real-number and digit; non-terminals represent language constructs. Each node in the parse tree is based on a production, a rule that defines a non-terminal in terms of a sequence of terminals and non-terminals. The root of the parse tree for 3.14 is based on the following informally stated production: a real number consists of an integer part, a point, and a fraction part. Together, the tokens, the non-terminals, the productions, and a distinguished non-terminal, called the starting non-terminal, constitute a grammar for a language. The starting non-terminal may represent a portion of a complete program when fragments of a programming language are studied. Both tokens and non-terminals are referred to as grammar symbols, or simply symbols.
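A parser builds the parse tree by applying productions. This recursive-descent sketch assumes the following BNF productions, which formalize the informal production above (they are an assumption, not quoted from the slides): real-number ::= integer-part . fraction; integer-part ::= digit | integer-part digit; fraction ::= digit | digit fraction. Interior nodes are tuples headed by a non-terminal; leaves are the tokens themselves.

```python
DIGITS = "0123456789"

def parse_real(text: str) -> tuple:
    """Return the parse tree of a real number such as 3.14."""
    pos = 0

    def at_digit() -> bool:
        return pos < len(text) and text[pos] in DIGITS

    def digit() -> tuple:
        nonlocal pos
        if not at_digit():
            raise ValueError(f"digit expected at position {pos}")
        pos += 1
        return ("digit", text[pos - 1])  # the leaf is the token itself

    def integer_part() -> tuple:
        # integer-part ::= digit | integer-part digit
        # The left recursion is replaced by a loop that nests to the left.
        tree = ("integer-part", digit())
        while at_digit():
            tree = ("integer-part", tree, digit())
        return tree

    def fraction() -> tuple:
        # fraction ::= digit | digit fraction  (right recursion)
        first = digit()
        if at_digit():
            return ("fraction", first, fraction())
        return ("fraction", first)

    whole = integer_part()
    if pos >= len(text) or text[pos] != ".":
        raise ValueError(f"'.' expected at position {pos}")
    pos += 1
    tree = ("real-number", whole, ".", fraction())
    if pos != len(text):
        raise ValueError(f"unexpected character at position {pos}")
    return tree

print(parse_real("3.14"))
```

Each nested tuple corresponds to one production applied at one node of the tree.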

Definition of Context-Free Grammars: Given a set of symbols, a string over the set is a finite sequence of zero or more symbols from the set. The number of symbols in the sequence is said to be the length of the string. The length of the string teddy is 5. An empty string is a string of length zero. A context-free grammar, or simply grammar, has four parts:
- A set of tokens or terminals; these are the atomic symbols in the language.
- A set of non-terminals; these are the variables representing constructs in the language.
- A set of rules called productions for identifying the components of a construct. Each production has a non-terminal as its left side, the symbol =, and a string over the sets of terminals and non-terminals as its right side.
- A non-terminal chosen as the starting non-terminal; it represents the main construct of the language.
Unless otherwise stated, the productions for the starting non-terminal appear first.
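The four-part definition maps naturally onto a small data structure. This sketch stores each production as a pair (left side, right side), with the right side a tuple of symbols (an empty tuple would be the empty string), and checks the constraints stated in the definition. The `Grammar` class and its `check` method are hypothetical, and the real-number productions are the same assumed ones used for 3.14.

```python
from dataclasses import dataclass

@dataclass
class Grammar:
    terminals: set[str]
    nonterminals: set[str]
    productions: list[tuple[str, tuple[str, ...]]]  # (left side, right side)
    start: str

    def check(self) -> None:
        """Raise ValueError if the four parts are inconsistent."""
        if self.terminals & self.nonterminals:
            raise ValueError("a symbol cannot be both terminal and non-terminal")
        if self.start not in self.nonterminals:
            raise ValueError("the starting symbol must be a non-terminal")
        symbols = self.terminals | self.nonterminals
        for left, right in self.productions:
            if left not in self.nonterminals:
                raise ValueError(f"left side {left!r} is not a non-terminal")
            for sym in right:  # the right side is a string over both sets
                if sym not in symbols:
                    raise ValueError(f"unknown symbol {sym!r}")

digits = {str(d) for d in range(10)}
real_numbers = Grammar(
    terminals=digits | {"."},
    nonterminals={"real-number", "integer-part", "fraction", "digit"},
    productions=[("real-number", ("integer-part", ".", "fraction")),
                 ("integer-part", ("digit",)),
                 ("integer-part", ("integer-part", "digit")),
                 ("fraction", ("digit",)),
                 ("fraction", ("digit", "fraction"))]
                + [("digit", (d,)) for d in sorted(digits)],
    start="real-number",
)
real_numbers.check()           # passes: the four parts are consistent
print(len("teddy"), len(""))   # 5 0
```

Following the convention above, the production for the starting non-terminal is listed first.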

BNF (Backus-Naur Form): The concept of a context-free grammar, consisting of terminals, non-terminals, productions, and a starting non-terminal, is independent of the notation used to write grammars. BNF is one such notation, made popular by its use to organize the report on the Algol-60 programming language.

Grammars for Expressions: A well-designed grammar can make it easy to pick out the meaningful components of a construct. In other words, with a well-designed grammar, parse trees are similar enough to abstract syntax trees that the grammar can be used to organize a language description or a program that exploits the syntax. An example of a program that exploits syntax is an expression evaluator that analyzes and evaluates expressions. After expressions, the remaining syntax is often easy.
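An expression evaluator organized around a grammar might look like this recursive-descent sketch, with one function per non-terminal. The grammar in the docstring is a common textbook one, assumed here rather than taken from the slides; its layering of expr, term, and factor encodes that * takes its operands before + and -. Only integers, + - *, and parentheses are handled.

```python
import re

def evaluate(text: str) -> int:
    """Evaluate an integer expression by recursive descent.

    Assumed grammar (left recursion rewritten as loops):
        expr   ::= expr + term | expr - term | term
        term   ::= term * factor | factor
        factor ::= number | ( expr )
    """
    tokens = re.findall(r"\d+|\S", text)  # numbers, or any other single char
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        token = peek()
        if token is None or (expected is not None and token != expected):
            raise ValueError(f"expected {expected or 'a token'} at token {pos}")
        pos += 1
        return token

    def expr() -> int:
        value = term()
        while peek() in ("+", "-"):
            op = take()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs  # left-associative
        return value

    def term() -> int:
        value = factor()
        while peek() == "*":
            take()
            value *= factor()
        return value

    def factor() -> int:
        token = take()
        if token == "(":
            value = expr()
            take(")")
            return value
        if token.isdigit():
            return int(token)
        raise ValueError(f"unexpected token {token!r}")

    result = expr()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return result

print(evaluate("2 + 3 * 4"))    # 14
print(evaluate("(2 + 3) * 4"))  # 20
print(evaluate("10 - 4 - 3"))   # 3
```

Because the call structure mirrors the grammar, the sequence of calls traces out the parse tree, and each function's return value is the value of its subtree.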

Variants of Grammars: Other ways of writing grammars are Extended BNF (EBNF) and syntax charts. EBNF is an extension of BNF that allows lists and optional elements to be specified. Lists or sequences of elements appear frequently in the syntax of programming languages. The appeal of EBNF is convenience, not additional capability, since anything that can be specified with EBNF can also be specified using BNF. Syntax charts are a graphical notation for grammars. They have visual appeal; again, anything that can be specified using syntax charts can also be specified using BNF. (Assignment)
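The claim that EBNF adds convenience but not capability can be illustrated with a comma-separated list, a hypothetical example not taken from the slides. In EBNF, list ::= item { , item }, where the braces mean "zero or more times"; the equivalent BNF needs recursion: list ::= item | list , item. The two parsers below accept the same inputs and return the same items; here an item is assumed to be any token other than a comma.

```python
def is_item(token: str) -> bool:
    return token != ","  # assumption: any non-comma token is an item

def parse_list_ebnf(tokens: list[str]) -> list[str]:
    # EBNF: list ::= item { "," item }  -- the braces become a loop
    if not tokens or not is_item(tokens[0]):
        raise ValueError("item expected")
    items = [tokens[0]]
    i = 1
    while i < len(tokens):
        if tokens[i] != "," or i + 1 >= len(tokens) or not is_item(tokens[i + 1]):
            raise ValueError(f"malformed list at token {i}")
        items.append(tokens[i + 1])
        i += 2
    return items

def parse_list_bnf(tokens: list[str]) -> list[str]:
    # BNF: list ::= item | list "," item  -- repetition through recursion
    if len(tokens) == 1 and is_item(tokens[0]):
        return [tokens[0]]                                  # list ::= item
    if len(tokens) >= 3 and tokens[-2] == "," and is_item(tokens[-1]):
        return parse_list_bnf(tokens[:-2]) + [tokens[-1]]   # list ::= list "," item
    raise ValueError("malformed list")

example = "a , b , c".split()
print(parse_list_ebnf(example))  # ['a', 'b', 'c']
print(parse_list_bnf(example))   # ['a', 'b', 'c']
```

The EBNF version reads more directly, which is its whole appeal; the BNF version shows the same language can be described without the extra notation.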