Lee CSCE 314 TAMU 1 CSCE 314 Programming Languages Syntactic Analysis Dr. Hyunyoung Lee.

Slides:



Advertisements
Similar presentations
Lecture # 8 Chapter # 4: Syntax Analysis. Practice Context Free Grammars a) CFG generating alternating sequence of 0’s and 1’s b) CFG in which no consecutive.
Advertisements

Chapter Chapter Summary Languages and Grammars Finite-State Machines with Output Finite-State Machines with No Output Language Recognition Turing.
ICE1341 Programming Languages Spring 2005 Lecture #4 Lecture #4 In-Young Ko iko.AT. icu.ac.kr iko.AT. icu.ac.kr Information and Communications University.
Chapter 2 Syntax. Syntax The syntax of a programming language specifies the structure of the language The lexical structure specifies how words can be.
ISBN Chapter 3 Describing Syntax and Semantics.
C. Varela; Adapted w/permission from S. Haridi and P. Van Roy1 Declarative Computation Model Defining practical programming languages Carlos Varela RPI.
CS 330 Programming Languages 09 / 13 / 2007 Instructor: Michael Eckmann.
Chapter 3 Describing Syntax and Semantics Sections 1-3.
PZ02A - Language translation
Context-Free Grammars Lecture 7
Chapter 3 Describing Syntax and Semantics Sections 1-3.
1 Foundations of Software Design Lecture 23: Finite Automata and Context-Free Grammars Marti Hearst Fall 2002.
Chapter 3 Describing Syntax and Semantics Sections 1-3.
Dr. Muhammed Al-Mulhem 1ICS ICS 535 Design and Implementation of Programming Languages Part 1 Fundamentals (Chapter 4) Compilers and Syntax.
Chapter 3: Formal Translation Models
Specifying Languages CS 480/680 – Comparative Languages.
COP4020 Programming Languages
Languages and Grammars MSU CSE 260. Outline Introduction: E xample Phrase-Structure Grammars: Terminology, Definition, Derivation, Language of a Grammar,
(2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies.
1 Syntax and Semantics The Purpose of Syntax Problem of Describing Syntax Formal Methods of Describing Syntax Derivations and Parse Trees Sebesta Chapter.
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
Syntax & Semantic Introduction Organization of Language Description Abstract Syntax Formal Syntax The Way of Writing Grammars Formal Semantic.
CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars.
Chapter 1 Introduction Dr. Frank Lee. 1.1 Why Study Compiler? To write more efficient code in a high-level language To provide solid foundation in parsing.
CS 355 – PROGRAMMING LANGUAGES Dr. X. Topics Introduction The General Problem of Describing Syntax Formal Methods of Describing Syntax.
Winter 2007SEG2101 Chapter 71 Chapter 7 Introduction to Languages and Compiler.
Languages, Grammars, and Regular Expressions Chuck Cusack Based partly on Chapter 11 of “Discrete Mathematics and its Applications,” 5 th edition, by Kenneth.
CS 331, Principles of Programming Languages Chapter 2.
Context Free Grammars CIS 361. Introduction Finite Automata accept all regular languages and only regular languages Many simple languages are non regular:
Context-Free Grammars
Grammars CPSC 5135.
LANGUAGE DESCRIPTION: SYNTACTIC STRUCTURE
3-1 Chapter 3: Describing Syntax and Semantics Introduction Terminology Formal Methods of Describing Syntax Attribute Grammars – Static Semantics Describing.
C H A P T E R TWO Syntax and Semantic.
ISBN Chapter 3 Describing Syntax and Semantics.
TextBook Concepts of Programming Languages, Robert W. Sebesta, (10th edition), Addison-Wesley Publishing Company CSCI18 - Concepts of Programming languages.
1 Syntax In Text: Chapter 3. 2 Chapter 3: Syntax and Semantics Outline Syntax: Recognizer vs. generator BNF EBNF.
Copyright © Curt Hill Languages and Grammars This is not English Class. But there is a resemblance.
Parsing Introduction Syntactic Analysis I. Parsing Introduction 2 The Role of the Parser The Syntactic Analyzer, or Parser, is the heart of the front.
Bernd Fischer RW713: Compiler and Software Language Engineering.
CPS 506 Comparative Programming Languages Syntax Specification.
D Goforth COSC Translating High Level Languages.
D Goforth COSC Translating High Level Languages Note error in assignment 1: #4 - refer to Example grammar 3.4, p. 126.
1 Language translation Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Sections
ISBN Chapter 3 Describing Syntax and Semantics.
LESSON 04.
CS 331, Principles of Programming Languages Chapter 2.
Syntax and Grammars.
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
Unit-3 Parsing Theory (Syntax Analyzer) PREPARED BY: PROF. HARISH I RATHOD COMPUTER ENGINEERING DEPARTMENT GUJARAT POWER ENGINEERING & RESEARCH INSTITUTE.
Syntax and Semantics Form and Meaning of Programming Languages Copyright © by Curt Hill.
Introduction Finite Automata accept all regular languages and only regular languages Even very simple languages are non regular (  = {a,b}): - {a n b.
CSC312 Automata Theory Lecture # 26 Chapter # 12 by Cohen Context Free Grammars.
Programming Languages and Design Lecture 2 Syntax Specifications of Programming Languages Instructor: Li Ma Department of Computer Science Texas Southern.
Chapter 4: Syntax analysis Syntax analysis is done by the parser. –Detects whether the program is written following the grammar rules and reports syntax.
C H A P T E R T W O Syntax and Semantic. 2 Introduction Who must use language definitions? Other language designers Implementors Programmers (the users.
CS 326 Programming Languages, Concepts and Implementation Instructor: Mircea Nicolescu Lecture 3.
Copyright © 2006 Addison-Wesley. All rights reserved.1-1 ICS 410: Programming Languages Chapter 3 : Describing Syntax and Semantics Syntax.
Spring 16 CSCI 4430, A Milanova 1 Announcements HW1 will be out this evening Due Monday, 2/8 Submit in HW Server AND at start of class on 2/8 A review.
Chapter 3 – Describing Syntax CSCE 343. Syntax vs. Semantics Syntax: The form or structure of the expressions, statements, and program units. Semantics:
Describing Syntax and Semantics Chapter 3: Describing Syntax and Semantics Lectures # 6.
Chapter 3 – Describing Syntax
Describing Syntax and Semantics
Programming Languages
Chapter 3 – Describing Syntax
CSE 3302 Programming Languages
Lecture 7: Introduction to Parsing (Syntax Analysis)
Lecture 4: Lexical Analysis & Chomsky Hierarchy
COMPILER CONSTRUCTION
Faculty of Computer Science and Information System
Presentation transcript:

Lee CSCE 314 TAMU 1 CSCE 314 Programming Languages Syntactic Analysis Dr. Hyunyoung Lee

Lee CSCE 314 TAMU 2 Language = syntax + semantics The syntax of a language is concerned with the form of a program: how expressions, commands, declarations etc. are put together to result in the final program. The semantics of a language is concerned with the meaning of a program: how the programs behave when executed on computers Syntax defines the set of valid programs, semantics how valid programs behave What Is a Programming Language?

Lee CSCE 314 TAMU 3 Syntax: grammatical structure lexical – how words are formed phrasal – how sentences are formed from words Semantics: meaning of programs Informal: English documents such as reference manuals Formal: 1. Operational semantics (execution on an abstract machine) 2. Denotational semantics (meaning defined as a mathematical function from input to output, definition compositional) 3. Axiomatic semantics (each construct is defined by pre- and post- conditions) Programming Language Definition

Lee CSCE 314 TAMU 4 Defines legal programs: programs that can be executed by machine Defined by grammar rules Define how to make “sentences” out of “words” For programming languages Sentences are called statements, expressions, terms, commands, and so on Words are called tokens Grammar rules describe both tokens and statements Often, grammars alone cannot capture exactly the set of valid programs. Grammars combined with additional rules are a common approach. Language Syntax

Lee CSCE 314 TAMU 5 Statement is a sequence of tokens Token is a sequence of characters Lexical analyzer produces a sequence of tokens from a character sequence Parser produces a statement representation from the token sequence Statements are represented as parse trees (abstract syntax tree) Language Syntax (Cont.)

Lee CSCE 314 TAMU 6 BNF is a common notation to define programming language grammars A BNF grammar G = (N, T, P, S) A set of non-terminal symbols N A set of terminal symbols T (tokens) A set of grammar rules P A start symbol S Grammar rule form (describe context-free grammars): ::= Backus-Naur Form (BNF)

Lee CSCE 314 TAMU 7 BNF rules for robot commands A robot arm accepts any command from the set {up, down, left, right} Rules: ::= | ::= up ::= down ::= left ::= right Examples of accepted sequences up down left up up right Examples of BNF

Lee CSCE 314 TAMU 8 From left to right Generates the following sequence Each terminal symbol is added to the sequence Each non-terminal is replaced by its definition For each |, pick any of the alternatives Note that a grammar can be used to both generate a statement, and verify that a statement is legal The latter is the task of parsing – find out if a sentence (program) is in a language, and how the grammar generates the sentence How to Read Grammar Rules

Lee CSCE 314 TAMU 9 Constructs and notation: nonterminal x ::= Body is defined by Body the sequence followed by { } the sequence of zero or more occurrences of { }+ the sequence of one or more occurrences of [ ] zero or one occurrence of Example ::= | ::= + |... ::= if then { elseif then }+ [ else ] end |... ::= | return |... Extended BNF

Lee CSCE 314 TAMU 10 Example Grammar Rules (Part of C++ Grammar) expression-statement: expression opt ; compound-statement: { statement-seq opt } statement-seq: statement statement-seq statement A.5 Statements statement: labeled-statement expression-statement compound-statement selection-statement iteration-statement jump-statement declaration-statement try-block labeled-statement: identifier : statement case constant-expression : statement default : statement selection-statement: if ( condition ) statement if ( condition ) statement else statement switch ( condition ) statement condition: expression type-specifier-seq declarator = assignment-expression iteration-statement: while ( condition ) statement do statement while ( expression ) ; for ( for-init-statement ; condition opt ; expression opt ) statement for-init-statement: expression-statement simple-declaration jump-statement: break ; continue ; return expression opt ; goto identifier ; declaration-statement: block-declaration

Lee CSCE 314 TAMU 11 A grammar G = (N, T, S, P) with the set of alphabet V is called context free if and only if all productions in P are of the form A -> B where A is a single nonterminal symbol and B is in V *. The reason this is called “context free” is that the production A -> B can be applied whenever the symbol A occurs in the string, no matter what else is in the string. Example: The grammar G = ( {S}, {a,b}, S, P ) where P = { S -> ab | aSb } is context free. The language generated by G is L(G) = { a n b n | n >= 1}. Context Free Grammars

Lee CSCE 314 TAMU 12 Concrete syntax tree Result of using a PL grammar to parse a program is a parse tree Contains every symbol in the input program, and all non-terminals used in the program’s derivation Abstract syntax tree (AST) Many symbols in input text are uninteresting (punctuation, such as commas in parameter lists, etc.) AST only contains “meaningful” information Other simplifications can also be made, e.g., getting rid of syntactic sugar, removing intermediate non- terminals, etc. Concrete vs. Abstract Syntax

Lee CSCE 314 TAMU 13 A grammar is ambiguous if there exists a string which gives rise to more than one parse tree E.g., infix binary operators ‘-’ ::= | ‘-’ Now parse 1 – Ambiguity (1) As (1-2)-3 As 1-(2-3)

Lee CSCE 314 TAMU 14 E.g., infix binary operators ‘+’ and ‘*’ ::= | + | * | == Now parse * 3 Ambiguity (2) As (1+2)*3 As 1+(2*3)

Lee CSCE 314 TAMU Between two calls to the same binary operator Associativity rules left-associative: a op b op c parsed as (a op b) op c right-associative: a op b op c parsed as a op (b op c) By disambiguating the grammar ::= | ‘-’ vs. ::= | ‘-’ 2. Between two calls to different binary operator Precedence rules if op1 has higher-precedence than op2 then a op1 b op2 c => (a op1 b) op2 c if op2 has higher-precedence than op1 then a op1 b op2 c => a op1 (b op2 c) Resolving Ambiguities

Lee CSCE 314 TAMU 16 Rewriting the ambiguous grammar: ::= | + | * | == Let us give * the highest precedence, + the next highest, and == the lowest ::= { == } ::= | + ::= | * Resolving Ambiguities (Cont.)

Lee CSCE 314 TAMU 17 Ambiguity in grammar is not a problem occurring only with binary operators For example, ::= if then | if then else Now consider the string: if A then if B then X else Y 1. if A then ( if B then X else Y ) ? 2. if A then ( if B then X ) else Y ? Dangling-Else Ambiguity

Lee CSCE 314 TAMU 18 Four classes of grammars that define particular classes of languages 1.Regular grammars 2.Context free grammars 3.Context sensitive grammars 4.Phrase-structure (unrestricted) grammars Ordered from less expressive to more expressive (but faster to slower to parse) Two first ones are of interest in theory of programming languages Chomsky Hierarchy

Lee CSCE 314 TAMU 19 Productions are of the form A -> aB or A -> a where A, B are nonterminal symbols and a is a terminal symbol. Can contain S -> λ. Example regular grammar G = ({A, S}, {a, b, c}, S, P), where P consists of the following productions: S -> aA A -> bA | cA | a G generates the following words aa, aba, aca, abba, abca, acca, abbba, abbca, abcba, … The language L(G) in regular expression: a(b+c)*a Regular Grammar

Lee CSCE 314 TAMU 20 The following three formalisms all express the same set of (regular) languages: 1.Regular grammars 2.Regular expressions 3.Finite state automata Not very expressive. For example, the language L = { a n b n | n >= 1 } is not regular. Question: Can you relate this language L to parsing programming languages? Answer: balancing parentheses Regular Languages

Lee CSCE 314 TAMU 21 A finite state automaton M=(S, I, f, s 0, F) consists of: a finite set S of states a finite set of input alphabet I a transition function f: SxI -> S that assigns to a given current state and input the next state of the automaton an initial state s 0, and a subset F of S consisting of accepting (or final) states Example: 1.Regular grammar 3. FSA S -> aA A -> bA | cA | a 2.Regular expression a(b+c)*a Finite State Automata S F A a b c a

Lee CSCE 314 TAMU 22 Regular languages are not sufficient for expressing the syntax of practical programming languages, so why use them? Simpler (and faster) implementation of the tedious (and potentially slow) “character-by- character” processing: DFA gives a direct implementation strategy Separation of concerns – deal with low level issues (tabs, linebreaks, token positions) in isolation: grammars for parsers need not go below token level Why a Separate Lexer?

Lee CSCE 314 TAMU Phrase-structure (unrestricted) grammars A -> B where A is string in V * containing at least one nonterminal symbol, and B is a string in V *. 2. Context sensitive grammars lAr -> lwr where A is a nonterminal symbol, and w a nonempty string in V *. Can contain S -> λ if S does not occur on RHS of any production. 3. Context free grammars A -> B where A is a nonterminal symbol. 4. Regular grammars A -> aB or A -> a where A, B are nonterminal symbols and a is a terminal symbol. Can contain S -> λ. Summary of the Productions