Dr. Philip Cannata 1 Lexical and Syntactic Analysis Chomsky Grammar Hierarchy Lexical Analysis – Tokenizing Syntactic Analysis – Parsing Hmm Concrete Syntax.

Presentation transcript:

Dr. Philip Cannata 1 Lexical and Syntactic Analysis
– Chomsky Grammar Hierarchy
– Lexical Analysis – Tokenizing
– Syntactic Analysis – Parsing
– Hmm Concrete Syntax
– Hmm Abstract Syntax
Programming Languages
Noam Chomsky

Dr. Philip Cannata 2 Chomsky Hierarchy
– Regular grammar: used for tokenizing
– Context-free grammar (BNF): used for parsing
– Context-sensitive grammar: not really used for programming languages

Dr. Philip Cannata 3 Regular Grammar
Simplest; least powerful
Equivalent to:
– Regular expression (think of Perl)
– Finite-state automaton
Right regular grammar: ω ∈ Terminal*, A and B ∈ Nonterminal
A → ω B
A → ω
Example:
Integer → 0 Integer | 1 Integer | ... | 9 Integer | 0 | 1 | ... | 9
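The equivalence between the Integer grammar and a regular expression can be sketched in a few lines of Python. The function names are hypothetical and chosen only for illustration. The first function follows the two right-regular productions directly, and the second uses the regular expression [0-9]+.

```python
import re

DIGITS = "0123456789"


def is_integer_by_grammar(s: str) -> bool:
    """Recognize s with the right regular grammar
    Integer -> d Integer | d, where d is one of 0..9."""
    if not s or s[0] not in DIGITS:
        return False                       # every production starts with a digit
    if len(s) == 1:
        return True                        # Integer -> d
    return is_integer_by_grammar(s[1:])    # Integer -> d Integer


def is_integer_by_regex(s: str) -> bool:
    """The same language as a regular expression: one or more digits."""
    return re.fullmatch(r"[0-9]+", s) is not None
```

Both functions should agree on every input, for example accepting "352" and rejecting "3a5" and the empty string.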

Dr. Philip Cannata 4 Regular Grammar
Less powerful than context-free grammars.
The following is not a regular language: { aⁿ bⁿ | n ≥ 1 }
i.e., a regular grammar cannot balance: ( ), { }, begin end
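A small Python sketch shows why these languages need more than a finite-state machine: recognizing them requires a count that can grow without bound. The function names below are hypothetical, and the code is an illustration rather than anything from the slides.

```python
def is_anbn(s: str) -> bool:
    """Recognize { a^n b^n | n >= 1 } by comparing counts."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n


def is_balanced(s: str) -> bool:
    """Check that parentheses in s are balanced using an unbounded counter."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:          # a ')' with no matching '('
                return False
    return depth == 0
```

The counter `depth` can take arbitrarily large values, which is exactly the kind of memory a fixed, finite set of states cannot provide.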

Dr. Philip Cannata 5 Regular Expressions
x          a character x
\x         an escaped character, e.g., \n
{ name }   a reference to a name
M | N      M or N
M N        M followed by N
M*         zero or more occurrences of M
M+         one or more occurrences of M
M?         zero or one occurrence of M
[aeiou]    the set of vowels
[0-9]      the set of digits
.          any single character
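As a rough illustration of how such regular expressions drive tokenizing, here is a minimal tokenizer sketch built on Python's standard `re` module. The token names (INT, ID, OP, SKIP) and the particular operator set are assumptions made for this example, not part of the slides.

```python
import re

# Each token class is described by one regular expression (names are illustrative).
TOKEN_SPEC = [
    ("INT",  r"[0-9]+"),                 # one or more digits
    ("ID",   r"[A-Za-z][A-Za-z0-9]*"),   # a letter followed by letters or digits
    ("OP",   r"[+\-*/=;()]"),            # a single operator or punctuation character
    ("SKIP", r"[ \t\n]+"),               # whitespace, discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Split text into (kind, lexeme) pairs, raising on an unknown character."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

For instance, `tokenize("z = x + 2 * y;")` yields the identifier z, the operator =, and so on, with whitespace dropped.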

Dr. Philip Cannata 6 Regular Expressions

Dr. Philip Cannata 7 Regular Expressions

Dr. Philip Cannata 8 Finite State Automaton for Identifiers
(S, a2i$) ├ (I, 2i$) ├ (I, i$) ├ (I, $) ├ (F, )
Thus: (S, a2i$) ├* (F, )
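The configuration sequence above can be reproduced with a small table-driven automaton. This is a sketch under assumptions: the state names S, I and F come from the slide, but the transition table (a letter starts an identifier, letters and digits continue it, and $ ends it) is inferred from the trace, and the helper names are hypothetical.

```python
import string

# Assumed transitions: S --letter--> I, I --letter or digit--> I, I --$--> F
TRANSITIONS = {
    ("S", "letter"): "I",
    ("I", "letter"): "I",
    ("I", "digit"): "I",
    ("I", "$"): "F",
}


def char_class(ch):
    """Map a character to the input class used in the transition table."""
    if ch in string.ascii_letters:
        return "letter"
    if ch in string.digits:
        return "digit"
    return ch


def run(text):
    """Return (accepted, trace), where trace lists (state, remaining input) pairs."""
    state, rest = "S", text
    trace = [(state, rest)]
    while rest:
        state = TRANSITIONS.get((state, char_class(rest[0])))
        if state is None:            # no transition defined: reject
            return False, trace
        rest = rest[1:]
        trace.append((state, rest))
    return state == "F", trace
```

Calling `run("a2i$")` produces the same five configurations as the slide and ends in the final state F.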

Dr. Philip Cannata 9 Deterministic Finite State Automaton Examples

Dr. Philip Cannata 10 Context-Free Grammar
Production: α → β
α ∈ Nonterminal
β ∈ (Nonterminal ∪ Terminal)*
i.e., the left-hand side is a single nonterminal, and the right-hand side is a string of nonterminals and/or terminals (possibly empty).

Dr. Philip Cannata 11 Context-Sensitive Grammar
Production: α → β, with |α| ≤ |β|
α, β ∈ (Nonterminal ∪ Terminal)*
i.e., the left-hand side can be composed of strings of terminals and nonterminals.

Dr. Philip Cannata 12 Syntax
The syntax of a programming language is a precise description of all its grammatically correct programs.
Precise syntax was first used with Algol 60, and has been used ever since.
Three levels:
– Lexical syntax: all the basic symbols of the language (names, values, operators, etc.)
– Concrete syntax: rules for writing expressions, statements and programs.
– Abstract syntax: internal representation of the program, favoring content over form.

Dr. Philip Cannata 13 Grammars
Grammars: metalanguages used to define the concrete syntax of a language.
Backus Normal Form, also called Backus Naur Form (BNF):
– Stylized version of a context-free grammar (cf. Chomsky hierarchy)
– First used to define the syntax of Algol 60
– Now used to define the syntax of most major languages
Production: α → β
α ∈ Nonterminal
β ∈ (Nonterminal ∪ Terminal)*
i.e., the left-hand side is a single nonterminal, and β is a string of nonterminals and/or terminals (possibly empty).
Example:
Integer → Digit | Integer Digit
Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Dr. Philip Cannata 14 Extended BNF (EBNF)
Additional metacharacters:
{ }   a series of zero or more
( )   must pick one from a list
[ ]   pick none or one from a list
Example:
Expression -> Term { ( + | - ) Term }
IfStatement -> if ( Expression ) Statement [ else Statement ]
EBNF is no more powerful than BNF, but its production rules are often simpler and clearer.
Javacc EBNF:
( … )*   a series of zero or more
( … )+   a series of one or more
[ … ]    optional
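One reason EBNF rules are convenient is that they map directly onto code: a { } repetition becomes a loop and a ( | ) choice becomes a branch. The sketch below illustrates this for Expression -> Term { ( + | - ) Term }. It assumes, purely for the example, that a Term is an unsigned integer literal and that the input is already split into tokens. The function name is hypothetical.

```python
def parse_expression(tokens):
    """Evaluate tokens according to Expression -> Term { ( + | - ) Term }.

    Assumption for this sketch: a Term is an unsigned integer literal.
    """
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens) or not (tokens[pos].isascii() and tokens[pos].isdigit()):
            raise SyntaxError(f"expected an integer at token {pos}")
        value = int(tokens[pos])
        pos += 1
        return value

    value = term()
    # The EBNF braces { ... } become a while loop.
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]           # the ( + | - ) choice
        pos += 1
        right = term()
        value = value + right if op == "+" else value - right
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return value
```

Because the loop folds each new Term into the running value, the operators come out left-associative: the tokens of "5 - 4 + 3" evaluate to 4.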

Dr. Philip Cannata 15 For more details, see Chapter 2 of “Programming Language Pragmatics, Third Edition” by Michael L. Scott.

Dr. Philip Cannata 16 Instance of a Programming Language:
int main () { return 0 ; }
Internal Parse Tree
Abstract Syntax
Program (abstract syntax):
Function = main; Return type = int
params =
Block:
Return:
Variable: return#main, LOCAL addr=0
IntValue: 0

Dr. Philip Cannata 17 Now we’ll focus on the internal parse tree

Dr. Philip Cannata 18 Parse Trees
Integer → Digit | Integer Digit
Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Parse Tree for 352 as an Integer
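Since the tree itself is a figure on the slide, here is a sketch that builds the parse tree for 352 as nested tuples. The tuple representation and the function names are assumptions made for this illustration. Each additional digit wraps the tree so far in a new Integer node, mirroring the left-recursive production Integer → Integer Digit.

```python
def parse_integer(s):
    """Build a parse tree for s using Integer -> Digit | Integer Digit."""
    if not s or any(c not in "0123456789" for c in s):
        raise ValueError("not an Integer")
    tree = ("Integer", ("Digit", s[0]))             # Integer -> Digit
    for c in s[1:]:
        tree = ("Integer", tree, ("Digit", c))      # Integer -> Integer Digit
    return tree


def leaves(tree):
    """Read the terminals of the tree from left to right."""
    if tree[0] == "Digit":
        return tree[1]
    return "".join(leaves(child) for child in tree[1:])
```

For "352" the result nests to the left: the outermost Integer node has the subtree for 35 and the Digit 2 as its children, and reading the leaves left to right gives back "352".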

Dr. Philip Cannata 19 Arithmetic Expression Grammar
Expr → Expr + Term | Expr - Term | Term
Term → 0 | ... | 9 | ( Expr )
Parse of

Dr. Philip Cannata 20 Associativity and Precedence
A grammar can be used to define associativity and precedence among the operators in an expression.
E.g., + and - are left-associative operators in mathematics; * and / have higher precedence than + and -.
Consider the following grammar:
Expr -> Expr + Term | Expr - Term | Term
Term -> Term * Factor | Term / Factor | Term % Factor | Factor
Factor -> Primary ** Factor | Primary
Primary -> 0 | ... | 9 | ( Expr )

Dr. Philip Cannata 21 Associativity and Precedence Parse of 4**2**3 + 5 * 6 + 7

Dr. Philip Cannata 22 Associativity and Precedence
Precedence   Associativity   Operators
3            right           **
2            left            * / %
1            left            + -
Note: These relationships are shown by the structure of the parse tree: highest precedence at the bottom, and left-associativity on the left at each level.
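The effect of the Expr/Term/Factor/Primary grammar can be checked with a small recursive-descent evaluator. This is a sketch, not the course's parser: the left-recursive rules are rewritten as loops (a standard transformation that preserves left associativity), "/" is taken to be integer division, and all class and function names are hypothetical.

```python
import re


def tokenize(text):
    """Split text into single digits, parentheses and operators ('**' is one token)."""
    compact = "".join(text.split())
    tokens = re.findall(r"\*\*|[0-9+\-*/%()]", compact)
    if "".join(tokens) != compact:
        raise SyntaxError("unexpected character in input")
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):        # Expr -> Expr + Term | Expr - Term | Term
        value = self.term()
        while self.peek() in ("+", "-"):
            op, right = self.advance(), self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):        # Term -> Term * Factor | Term / Factor | Term % Factor | Factor
        value = self.factor()
        while self.peek() in ("*", "/", "%"):
            op, right = self.advance(), self.factor()
            if op == "*":
                value *= right
            elif op == "/":
                value //= right      # assumption: integer division
            else:
                value %= right
        return value

    def factor(self):      # Factor -> Primary ** Factor | Primary (right-recursive)
        base = self.primary()
        if self.peek() == "**":
            self.advance()
            return base ** self.factor()
        return base

    def primary(self):     # Primary -> 0 | ... | 9 | ( Expr )
        tok = self.advance()
        if tok is not None and tok.isdigit():
            return int(tok)
        if tok == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text):
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

With this sketch, `evaluate("4**2**3 + 5 * 6 + 7")` groups as 4**(2**3) + (5 * 6) + 7 and gives 65573, while `evaluate("5 - 4 + 3")` groups as (5 - 4) + 3 and gives 4.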

Dr. Philip Cannata 23 Ambiguous Grammars
A grammar is ambiguous if one of its strings has two or more different parse trees.
Example:
Expr -> Expr Op Expr | ( Expr ) | Integer
Op -> + | - | * | / | % | **
Equivalent to the previous grammar, but ambiguous.
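To make the ambiguity concrete, the following sketch enumerates every parse tree that Expr -> Expr Op Expr | Integer allows for a token list, then evaluates each tree. The parenthesized alternative is left out to keep the example short, the input 5 - 4 + 3 is chosen here for illustration, and the function names are hypothetical.

```python
import operator

OPS = {
    "+": operator.add, "-": operator.sub, "*": operator.mul,
    "/": operator.floordiv, "%": operator.mod, "**": operator.pow,
}


def parse_trees(tokens):
    """Return every parse tree for tokens under Expr -> Expr Op Expr | Integer."""
    if len(tokens) == 1 and tokens[0].isascii() and tokens[0].isdigit():
        return [int(tokens[0])]                        # Expr -> Integer
    trees = []
    for i in range(1, len(tokens) - 1):
        if tokens[i] in OPS:                           # try each operator as the root
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((left, tokens[i], right))
    return trees


def value(tree):
    """Evaluate a tree produced by parse_trees."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    return OPS[op](value(left), value(right))
```

For the tokens of "5 - 4 + 3" this finds two trees, (5 - 4) + 3 with value 4 and 5 - (4 + 3) with value -2, so the grammar alone does not determine the meaning of the expression.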

Dr. Philip Cannata 24 Ambiguous Parse of 5 – Ambiguous Grammars

Dr. Philip Cannata 25 Ambiguous Grammars: Dangling Else
IfStatement -> if ( Expression ) Statement | if ( Expression ) Statement else Statement
Statement -> Assignment | IfStatement | Block
Block -> { Statements }
Statements -> Statements Statement | Statement
With which ‘if’ does the following ‘else’ associate?
if (x < 0) if (y < 0) y = y - 1; else y = 0;
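One common resolution, used by C and Java, is to attach each else to the nearest if that does not yet have one. A recursive-descent parser gets this behavior naturally by consuming an else as soon as it sees one. The sketch below assumes a simplified token stream in which conditions and simple statements are single opaque tokens; the token names and the function name are hypothetical.

```python
def parse_statement(tokens, pos=0):
    """Parse one statement starting at pos; return (tree, next position).

    Simplified input: 'if' is followed by one condition token, and any
    other token is treated as a simple statement.
    """
    if tokens[pos] == "if":
        cond = tokens[pos + 1]
        then_part, pos = parse_statement(tokens, pos + 2)
        else_part = None
        # Greedy choice: an 'else' seen here belongs to this, the nearest, 'if'.
        if pos < len(tokens) and tokens[pos] == "else":
            else_part, pos = parse_statement(tokens, pos + 1)
        return ("if", cond, then_part, else_part), pos
    return tokens[pos], pos + 1
```

Writing the slide's example as the tokens if, x<0, if, y<0, y=y-1, else, y=0, the parser attaches the else to the inner if, so the outer if has no else part.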

Dr. Philip Cannata 26 Dangling Else Ambiguous Grammars

Dr. Philip Cannata 27 Hmm BNF (i.e., Concrete Syntax)
Program : { [ Declaration ] | retType Identifier Function | MyClass | MyObject }
Function : ( ) Block
MyClass : Class Identifier { { retType Identifier Function } Constructor { retType Identifier Function } }
MyObject : Identifier Identifier = create Identifier callArgs
Constructor : Identifier ( [ { Parameter } ] ) block
Declaration : Type Identifier [ [ Literal ] ] { , Identifier [ [ Literal ] ] }
Type : int | bool | float | list | tuple | object | string | void
Statements : { Statement }
Statement : ; | Declaration | Block | ForEach | Assignment | IfStatement | WhileStatement | CallStatement | ReturnStatement
Block : { Statements }
ForEach : for ( Expression <- Expression ) Block
Assignment : Identifier [ [ Expression ] ] = Expression ;
Parameter : Type Identifier
IfStatement : if ( Expression ) Block [ else IfStatement | Block ]
WhileStatement : while ( Expression ) Block

Dr. Philip Cannata 28 Hmm BNF (i.e., Concrete Syntax)
Expression : Conjunction { || Conjunction }
Conjunction : Equality { && Equality }
Equality : Relation [ EquOp Relation ]
EquOp : == | !=
Relation : Addition [ RelOp Addition ]
RelOp : < | <= | > | >=
Addition : Term { AddOp Term }
AddOp : + | -
Term : Factor { MulOp Factor }
MulOp : * | / | %
Factor : [ UnaryOp ] Primary
UnaryOp : - | !
Primary : callOrLambda | IdentifierOrArrayRef | Literal | subExpressionOrTuple | ListOrListComprehension | ObjFunction
callOrLambda : Identifier callArgs | LambdaDef
callArgs : ( [ Expression | passFunc { , Expression | passFunc } ] )
passFunc : Identifier ( Type Identifier { Type Identifier } )
LambdaDef : ( \\ Identifier { , Identifier } -> Expression )

Dr. Philip Cannata 29 Hmm BNF (i.e., Concrete Syntax)
IdentifierOrArrayRef : Identifier [ [Expression] ]
subExpressionOrTuple : ([ Expression [,[ Expression {, Expression } ] ] ] )
ListOrListComprehension: [ Expression {, Expression } ] | | Expression[<- Expression ] {, Expression[<- Expression ] } ]
ObjFunction: Identifier. Identifier. Identifier callArgs
Identifier : (a |b|…|z| A | B |…| Z){ (a |b|…|z| A | B |…| Z )|(0 | 1 |…| 9)}
Literal : Integer | True | False | ClFloat | ClString
Integer : Digit { Digit }
ClFloat: 0 | 1 |…| 9 {0 | 1 |…| 9}.{0 | 1 |…| 9}
ClString: ” {~[“] }”

Dr. Philip Cannata 30 Associativity and Precedence for Hmm
Operator      Associativity
Unary - !     none
* /           left
+ -           left
< <= > >=     none
== !=         none
&&            left
||            left

Dr. Philip Cannata 31 Hmm Parse Tree Example z = x + 2 * y;

Dr. Philip Cannata 32 Now we’ll focus on the Abstract Syntax

Dr. Philip Cannata 33 Hmm Parse Tree z = x + 2 * y;

Dr. Philip Cannata 34 Very Approximate Hmm Abstract Syntax

Dr. Philip Cannata 35 Very Approximate Hmm Abstract Syntax
Assignment = Variable target; Expression source
Expression = VariableRef | Value | Binary | Unary
VariableRef = Variable | ArrayRef
Variable = String id
ArrayRef = String id; Expression index
Value = IntValue | BoolValue | FloatValue | CharValue
Binary = Operator op; Expression term1, term2
Unary = UnaryOp op; Expression term
Operator = ArithmeticOp | RelationalOp | BooleanOp
IntValue = Integer intValue
…
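A few of these abstract syntax rules can be sketched as Python dataclasses. This is only an illustration: it covers Assignment, Variable, IntValue and Binary, represents Operator as a plain string, and adds a hypothetical `evaluate` function and environment dictionary that are not part of the slides.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Variable:            # Variable = String id
    id: str


@dataclass
class IntValue:            # IntValue = Integer intValue
    int_value: int


@dataclass
class Binary:              # Binary = Operator op; Expression term1, term2
    op: str                # assumption: the operator is kept as a string
    term1: "Expression"
    term2: "Expression"


Expression = Union[Variable, IntValue, Binary]   # subset of the slide's Expression


@dataclass
class Assignment:          # Assignment = Variable target; Expression source
    target: Variable
    source: Expression


def evaluate(expr, env):
    """Evaluate an expression tree against a dict of variable values."""
    if isinstance(expr, IntValue):
        return expr.int_value
    if isinstance(expr, Variable):
        return env[expr.id]
    left, right = evaluate(expr.term1, env), evaluate(expr.term2, env)
    if expr.op == "+":
        return left + right
    if expr.op == "*":
        return left * right
    raise ValueError(f"unsupported operator {expr.op!r}")


# Abstract syntax for: z = x + 2 * y
stmt = Assignment(
    Variable("z"),
    Binary("+", Variable("x"), Binary("*", IntValue(2), Variable("y"))),
)
```

The tree keeps only the content of the statement: the semicolon and the precedence rules of the concrete syntax are gone, and the nesting of the Binary nodes alone records that 2 * y is computed before the addition. With x = 1 and y = 3, `evaluate(stmt.source, {"x": 1, "y": 3})` gives 7.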

Dr. Philip Cannata 36 Hmm Abstract Syntax – Binary Example
z = x + 2 * y
(Figure: abstract syntax tree with node labels Binary, Operator, Variable, Value and symbols =, +, *, x, 2, y)