Dr. Philip Cannata 1 Lexical and Syntactic Analysis Chomsky Grammar Hierarchy Lexical Analysis – Tokenizing Syntactic Analysis – Parsing Hmm Concrete Syntax.


1 Dr. Philip Cannata 1 Lexical and Syntactic Analysis
Chomsky Grammar Hierarchy
Lexical Analysis – Tokenizing
Syntactic Analysis – Parsing
Hmm Concrete Syntax
Hmm Abstract Syntax
Programming Languages
Noam Chomsky

2 Dr. Philip Cannata 2 Chomsky Hierarchy
Regular grammar – used for tokenizing
Context-free grammar (BNF) – used for parsing
Context-sensitive grammar – not really used for programming languages

3 Dr. Philip Cannata 3 Regular Grammar
Simplest; least powerful
Equivalent to:
– Regular expression (think of Perl)
– Finite-state automaton
Right regular grammar: ω ∈ Terminal*, A and B ∈ Nonterminal
A → ω B
A → ω
Example:
Integer → 0 Integer | 1 Integer | ... | 9 Integer | 0 | 1 | ... | 9

4 Dr. Philip Cannata 4 Regular Grammar
Less powerful than context-free grammars
The following is not a regular language: { aⁿ bⁿ | n ≥ 1 }
i.e., a regular grammar cannot balance: ( ), { }, begin end
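To make the limitation concrete, here is a minimal Python sketch of a recognizer for { aⁿ bⁿ | n ≥ 1 }. The function name `is_anbn` is hypothetical and not part of the slides. The point is that the recognizer needs an unbounded counter, which is exactly the memory a finite-state automaton does not have.

```python
def is_anbn(text: str) -> bool:
    """Return True if text is a^n b^n for some n >= 1."""
    count = 0  # unbounded counter: the memory a finite automaton lacks
    i = 0
    # Count the leading a's.
    while i < len(text) and text[i] == "a":
        count += 1
        i += 1
    if count == 0:
        return False
    # Consume one b for each a that was counted.
    while i < len(text) and text[i] == "b" and count > 0:
        count -= 1
        i += 1
    # Accept only if every a was matched and no input remains.
    return count == 0 and i == len(text)
```

A finite automaton has a fixed number of states, so it cannot distinguish arbitrarily large values of `count`.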

5 Dr. Philip Cannata 5 Regular Expressions
x           a character x
\x          an escaped character, e.g., \n
{ name }    a reference to a name
M | N       M or N
M N         M followed by N
M*          zero or more occurrences of M
M+          one or more occurrences of M
M?          zero or one occurrence of M
[aeiou]     the set of vowels
[0-9]       the set of digits
.           any single character
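Most of the operators in this table can be tried out directly. The sketch below uses Python's `re` module as a stand-in (an assumption: the slides do not name a regex engine, and the `{ name }` reference form belongs to lexer generators, so it is not shown). The helper `matches` is a hypothetical name.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# (pattern, input, expected result), one row per operator in the table.
examples = [
    (r"x", "x", True),            # a character x
    (r"\n", "\n", True),          # an escaped character
    (r"a|b", "b", True),          # M | N
    (r"ab", "ab", True),          # M N (concatenation)
    (r"a*", "", True),            # M*  zero or more
    (r"a+", "", False),           # M+  needs at least one
    (r"ab?", "a", True),          # M?  zero or one
    (r"[aeiou]", "e", True),      # the set of vowels
    (r"[0-9]+", "352", True),     # digits, one or more
    (r".", "x", True),            # any single character
    (r".", "\n", False),          # in Python, '.' excludes newline by default
]

for pattern, text, expected in examples:
    assert matches(pattern, text) == expected
```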

6 Dr. Philip Cannata 6 Regular Expressions

7 Dr. Philip Cannata 7 Regular Expressions

8 Dr. Philip Cannata 8 Finite State Automaton for Identifiers
(S, a2i$) ⊢ (I, 2i$) ⊢ (I, i$) ⊢ (I, $) ⊢ (F, )
Thus: (S, a2i$) ⊢* (F, )
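The trace above can be reproduced with a small table-driven automaton. This is a sketch in Python under stated assumptions: the state names S, I and F and the end marker $ come from the slide, while the transition table is inferred from the trace (letter first, then letters or digits), and the names `TRANSITIONS`, `char_class` and `run` are hypothetical.

```python
# Transition table: (state, character class) -> next state.
TRANSITIONS = {
    ("S", "letter"): "I",
    ("I", "letter"): "I",
    ("I", "digit"): "I",
    ("I", "end"): "F",
}

def char_class(ch: str) -> str:
    """Map a character to the class used in the transition table."""
    if ch == "$":
        return "end"
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"

def run(text: str):
    """Return (trace, accepted); trace lists (state, remaining input) pairs."""
    state = "S"
    trace = [(state, text)]
    for i, ch in enumerate(text):
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:          # no transition defined: reject
            return trace, False
        trace.append((state, text[i + 1:]))
    return trace, state == "F"

trace, accepted = run("a2i$")
print(" |- ".join(f"({s}, {rest})" for s, rest in trace), accepted)
```

Because the table has no entry for a digit in state S, an input such as `2a$` is rejected on its first character.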

9 Dr. Philip Cannata 9 Deterministic Finite State Automaton Examples

10 Dr. Philip Cannata 10 Context-Free Grammar
Production: α → β
α ∈ Nonterminal
β ∈ (Nonterminal ∪ Terminal)*
i.e., the left-hand side is a single nonterminal, and the right-hand side is a string of nonterminals and/or terminals (possibly empty).

11 Dr. Philip Cannata 11 Context-Sensitive Grammar
Production: α → β, where |α| ≤ |β|
α, β ∈ (Nonterminal ∪ Terminal)*
i.e., the left-hand side can be composed of strings of terminals and nonterminals.

12 Dr. Philip Cannata 12 Syntax
The syntax of a programming language is a precise description of all its grammatically correct programs.
Precise syntax was first used with Algol 60, and has been used ever since.
Three levels:
– Lexical syntax: all the basic symbols of the language (names, values, operators, etc.)
– Concrete syntax: rules for writing expressions, statements and programs.
– Abstract syntax: internal representation of the program, favoring content over form.

13 Dr. Philip Cannata 13 Grammars
Grammars: Metalanguages used to define the concrete syntax of a language.
Backus Normal Form, also called Backus-Naur Form (BNF)
– Stylized version of a context-free grammar (cf. Chomsky hierarchy)
– First used to define the syntax of Algol 60
– Now used to define the syntax of most major languages
Production: α → β
α ∈ Nonterminal
β ∈ (Nonterminal ∪ Terminal)*
i.e., the left-hand side is a single nonterminal, and β is a string of nonterminals and/or terminals (possibly empty).
Example
Integer → Digit | Integer Digit
Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

14 Dr. Philip Cannata 14 Extended BNF (EBNF)
Additional metacharacters
{ }   a series of zero or more
( )   must pick one from a list
[ ]   pick none or one from a list
Example
Expression -> Term { ( + | - ) Term }
IfStatement -> if ( Expression ) Statement [ else Statement ]
EBNF is no more powerful than BNF, but its production rules are often simpler and clearer.
JavaCC EBNF
( … )*   a series of zero or more
( … )+   a series of one or more
[ … ]    optional
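One reason EBNF rules read more simply is that they map almost directly onto code: the `{ ... }` repetition becomes a loop. The following Python sketch parses `Expression -> Term { ( + | - ) Term }`. It assumes, purely for illustration, that a Term is a single digit; `tokenize` and `parse_expression` are hypothetical names.

```python
DIGITS = set("0123456789")

def tokenize(src: str):
    """Split into single-character tokens, dropping whitespace."""
    return [ch for ch in src if not ch.isspace()]

def parse_expression(tokens):
    """Expression -> Term { ( + | - ) Term }, returned as nested tuples."""
    pos = 0

    def term():
        # Assumption for this sketch: a Term is one decimal digit.
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] not in DIGITS:
            raise SyntaxError(f"digit expected at token {pos}")
        value = int(tokens[pos])
        pos += 1
        return value

    tree = term()
    # The EBNF braces { ... } turn into this while loop.
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        pos += 1
        tree = (op, tree, term())  # earlier result becomes the left operand
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse_expression(tokenize("5 - 4 + 3")))
```

Because each pass through the loop wraps the tree built so far as the left operand, the result groups to the left.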

15 Dr. Philip Cannata 15 For more details, see Chapter 2 of “Programming Language Pragmatics, Third Edition” by Michael L. Scott.

16 Dr. Philip Cannata 16 Instance of a Programming Language:
int main () { return 0 ; }
Internal Parse Tree
Abstract Syntax
Program (abstract syntax):
  Function = main; Return type = int
  params =
  Block:
    Return:
      Variable: return#main, LOCAL addr=0
      IntValue: 0

17 Dr. Philip Cannata 17 Now we’ll focus on the internal parse tree

18 Dr. Philip Cannata 18 Parse Trees
Integer → Digit | Integer Digit
Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Parse Tree for 352 as an Integer
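The shape of that parse tree follows from the left-recursive rule `Integer → Integer Digit`: each additional digit adds a new Integer node above the previous one. A minimal Python sketch, assuming nested tuples as the tree representation (the function names `parse_integer` and `render` are hypothetical):

```python
def parse_integer(text: str):
    """Build the parse tree for text under Integer -> Digit | Integer Digit."""
    if not text or any(ch not in "0123456789" for ch in text):
        raise ValueError("not an Integer")
    # Base case of the grammar: Integer -> Digit
    tree = ("Integer", ("Digit", text[0]))
    # Each further digit applies Integer -> Integer Digit once more.
    for ch in text[1:]:
        tree = ("Integer", tree, ("Digit", ch))
    return tree

def render(tree, depth=0):
    """Return the tree as indented lines, one node per line."""
    label, *children = tree
    if label == "Digit":
        return ["  " * depth + f"Digit {children[0]}"]
    lines = ["  " * depth + label]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines

print("\n".join(render(parse_integer("352"))))
```

The deepest Integer node holds the first digit, 3, and the root's right child is the last digit, 2.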

19 Dr. Philip Cannata 19 Arithmetic Expression Grammar
Expr → Expr + Term | Expr - Term | Term
Term → 0 | ... | 9 | ( Expr )
Parse of 5 - 4 + 3

20 Dr. Philip Cannata 20 Associativity and Precedence
A grammar can be used to define associativity and precedence among the operators in an expression.
E.g., + and - are left-associative operators in mathematics; * and / have higher precedence than + and -.
Consider the following grammar:
Expr -> Expr + Term | Expr - Term | Term
Term -> Term * Factor | Term / Factor | Term % Factor | Factor
Factor -> Primary ** Factor | Primary
Primary -> 0 | ... | 9 | ( Expr )

21 Dr. Philip Cannata 21 Associativity and Precedence Parse of 4**2**3 + 5 * 6 + 7

22 Dr. Philip Cannata 22 Associativity and Precedence
Precedence   Associativity   Operators
3            right           **
2            left            * / %
1            left            + -
Note: These relationships are shown by the structure of the parse tree: highest precedence at the bottom, and left-associativity on the left at each level.
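The way the grammar encodes this table can be seen by writing one function per nonterminal and printing the grouping it produces. The Python sketch below is an illustration, not the course's parser: the left-recursive rules for Expr and Term are rewritten as loops (a common recursive-descent technique), while the right-recursive Factor rule stays recursive. The names `Parser`, `tokenize` and `parenthesize` are hypothetical.

```python
import re

# '**' must be tried before '*' so it is read as one token.
TOKEN = re.compile(r"\*\*|[0-9]|[-+*/%()]")

def tokenize(src: str):
    stripped = "".join(src.split())
    tokens = TOKEN.findall(stripped)
    if "".join(tokens) != stripped:
        raise SyntaxError("unexpected character in input")
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):      # Expr -> Expr + Term | Expr - Term | Term
        left = self.term()
        while self.peek() in ("+", "-"):      # loop: groups to the left
            op = self.take()
            left = f"({left} {op} {self.term()})"
        return left

    def term(self):      # Term -> Term * Factor | ... | Factor
        left = self.factor()
        while self.peek() in ("*", "/", "%"):
            op = self.take()
            left = f"({left} {op} {self.factor()})"
        return left

    def factor(self):    # Factor -> Primary ** Factor | Primary
        base = self.primary()
        if self.peek() == "**":               # recursion: groups to the right
            self.take()
            return f"({base} ** {self.factor()})"
        return base

    def primary(self):   # Primary -> 0 | ... | 9 | ( Expr )
        tok = self.take()
        if tok is not None and tok.isdigit():
            return tok
        if tok == "(":
            inner = self.expr()
            if self.take() != ")":
                raise SyntaxError("')' expected")
            return inner
        raise SyntaxError(f"unexpected token {tok!r}")

def parenthesize(src: str) -> str:
    parser = Parser(tokenize(src))
    result = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return result

print(parenthesize("4**2**3 + 5 * 6 + 7"))
```

Operators handled deeper in the call chain (`factor` below `term` below `expr`) bind more tightly, which mirrors "highest precedence at the bottom" of the parse tree.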

23 Dr. Philip Cannata 23 Ambiguous Grammars
A grammar is ambiguous if one of its strings has two or more different parse trees.
Example:
Expr -> Expr Op Expr | ( Expr ) | Integer
Op -> + | - | * | / | % | **
Equivalent to the previous grammar, but ambiguous.

24 Dr. Philip Cannata 24 Ambiguous Parse of 5 – 4 + 3 Ambiguous Grammars
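The ambiguity can be checked mechanically by enumerating every parse that the rule `Expr -> Expr Op Expr | Integer` allows for a given string. The Python sketch below does this for a pre-tokenized input. It leaves out the parenthesized alternative for brevity and assumes a well-formed list that alternates operands and operators; `all_parses` is a hypothetical name.

```python
def all_parses(tokens):
    """Return every parse of tokens under Expr -> Expr Op Expr | Integer,
    written as fully parenthesized strings."""
    if len(tokens) == 1:
        return [tokens[0]]                   # Expr -> Integer
    trees = []
    # Operators sit at odd positions; any of them may be the root Op.
    for i in range(1, len(tokens), 2):
        for left in all_parses(tokens[:i]):
            for right in all_parses(tokens[i + 1:]):
                trees.append(f"({left} {tokens[i]} {right})")
    return trees

for tree in all_parses(["5", "-", "4", "+", "3"]):
    print(tree)
```

The two groupings it finds, `(5 - (4 + 3))` and `((5 - 4) + 3)`, denote different values (-2 and 4), which is why the ambiguity matters in practice.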

25 Dr. Philip Cannata 25 Ambiguous Grammars – Dangling Else
IfStatement -> if ( Expression ) Statement | if ( Expression ) Statement else Statement
Statement -> Assignment | IfStatement | Block
Block -> { Statements }
Statements -> Statements Statement | Statement
With which ‘if’ does the following ‘else’ associate?
if (x < 0) if (y < 0) y = y - 1; else y = 0;

26 Dr. Philip Cannata 26 Dangling Else Ambiguous Grammars
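One common resolution, the one C and Java specify, is to attach each else to the nearest unmatched if. A recursive-descent parser gets this behaviour naturally, because the innermost call sees the `else` first. The Python sketch below is a simplified illustration under stated assumptions: the input is already split into tokens, conditions and assignments are treated as opaque strings, blocks are omitted, and `parse_statement` is a hypothetical name.

```python
def parse_statement(tokens, pos=0):
    """Parse one statement starting at pos; return (tree, next_pos).

    Statement   -> Assignment | IfStatement
    IfStatement -> if Cond Statement [ else Statement ]
    """
    if tokens[pos] != "if":
        return tokens[pos], pos + 1          # an Assignment, kept opaque
    cond = tokens[pos + 1]
    then_branch, pos = parse_statement(tokens, pos + 2)
    else_branch = None
    # The innermost active call reaches this test first, so a following
    # 'else' is claimed by the nearest 'if'.
    if pos < len(tokens) and tokens[pos] == "else":
        else_branch, pos = parse_statement(tokens, pos + 1)
    return ("if", cond, then_branch, else_branch), pos

tokens = ["if", "x < 0", "if", "y < 0", "y = y - 1;", "else", "y = 0;"]
tree, _ = parse_statement(tokens)
print(tree)
```

In the resulting tree the outer `if` has no else branch; the `else y = 0;` belongs to the inner `if (y < 0)`.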

27 Dr. Philip Cannata 27 Hmm BNF (i.e., Concrete Syntax)
Program : { [ Declaration ] | retType Identifier Function | MyClass | MyObject }
Function : ( ) Block
MyClass : Class Identifier { { retType Identifier Function } Constructor { retType Identifier Function } }
MyObject : Identifier Identifier = create Identifier callArgs
Constructor : Identifier ( [ { Parameter } ] ) block
Declaration : Type Identifier [ [ Literal ] ] { , Identifier [ [ Literal ] ] }
Type : int | bool | float | list | tuple | object | string | void
Statements : { Statement }
Statement : ; | Declaration | Block | ForEach | Assignment | IfStatement | WhileStatement | CallStatement | ReturnStatement
Block : { Statements }
ForEach : for ( Expression <- Expression ) Block
Assignment : Identifier [ [ Expression ] ] = Expression ;
Parameter : Type Identifier
IfStatement : if ( Expression ) Block [ elseifStatement | Block ]
WhileStatement : while ( Expression ) Block

28 Dr. Philip Cannata 28 Hmm BNF (i.e., Concrete Syntax)
Expression : Conjunction { || Conjunction }
Conjunction : Equality { && Equality }
Equality : Relation [ EquOp Relation ]
EquOp : == | !=
Relation : Addition [ RelOp Addition ]
RelOp : < | <= | > | >=
Addition : Term { AddOp Term }
AddOp : + | -
Term : Factor { MulOp Factor }
MulOp : * | / | %
Factor : [ UnaryOp ] Primary
UnaryOp : - | !
Primary : callOrLambda | IdentifierOrArrayRef | Literal | subExpressionOrTuple | ListOrListComprehension | ObjFunction
callOrLambda : Identifier callArgs | LambdaDef
callArgs : ( [ Expression | passFunc { , Expression | passFunc } ] )
passFunc : Identifier ( Type Identifier { Type Identifier } )
LambdaDef : ( \\ Identifier { , Identifier } -> Expression )

29 Dr. Philip Cannata 29 Hmm BNF (i.e., Concrete Syntax)
IdentifierOrArrayRef : Identifier [ [ Expression ] ]
subExpressionOrTuple : ( [ Expression [ , [ Expression { , Expression } ] ] ] )
ListOrListComprehension : [ Expression { , Expression } ] | | Expression [ <- Expression ] { , Expression [ <- Expression ] } ]
ObjFunction : Identifier . Identifier . Identifier callArgs
Identifier : ( a | b | … | z | A | B | … | Z ) { ( a | b | … | z | A | B | … | Z ) | ( 0 | 1 | … | 9 ) }
Literal : Integer | True | False | ClFloat | ClString
Integer : Digit { Digit }
ClFloat : 0 | 1 | … | 9 { 0 | 1 | … | 9 } . { 0 | 1 | … | 9 }
ClString : " { ~["] } "

30 Dr. Philip Cannata 30 Associativity and Precedence for Hmm
Clite Operator   Associativity
Unary - !        none
* /              left
+ -              left
< <= > >=        none
== !=            none
&&               left
||               left

31 Dr. Philip Cannata 31 Hmm Parse Tree Example z = x + 2 * y;

32 Dr. Philip Cannata 32 Now we’ll focus on the Abstract Syntax

33 Dr. Philip Cannata 33 Hmm Parse Tree z = x + 2 * y;

34 Dr. Philip Cannata 34 Very Approximate Hmm Abstract Syntax

35 Dr. Philip Cannata 35 Very Approximate Hmm Abstract Syntax
Assignment = Variable target; Expression source
Expression = VariableRef | Value | Binary | Unary
VariableRef = Variable | ArrayRef
Variable = String id
ArrayRef = String id; Expression index
Value = IntValue | BoolValue | FloatValue | CharValue
Binary = Operator op; Expression term1, term2
Unary = UnaryOp op; Expression term
Operator = ArithmeticOp | RelationalOp | BooleanOp
IntValue = Integer intValue
…

36 Dr. Philip Cannata 36 Hmm Abstract Syntax – Binary Example
z = x + 2 * y
(Figure: abstract syntax tree for the assignment; node labels Binary, Operator, Variable, Value; leaves =, +, *, x, 2, y)
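The abstract syntax rules for Assignment, Binary, Variable and IntValue translate readily into small record types. The Python sketch below builds the tree for z = x + 2 * y using dataclasses whose field names follow the rules (target, source, op, term1, term2, id, intValue). It is an approximation: Operator is reduced to a plain string, and the `evaluate` helper, which handles only + and *, is a hypothetical addition for illustration.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Variable:
    id: str

@dataclass
class IntValue:
    intValue: int

@dataclass
class Binary:
    op: str                     # simplification: Operator as a string
    term1: "Expression"
    term2: "Expression"

Expression = Union[Variable, IntValue, Binary]

@dataclass
class Assignment:
    target: Variable
    source: Expression

# z = x + 2 * y: '*' binds tighter, so it sits below '+' in the tree.
tree = Assignment(
    target=Variable("z"),
    source=Binary("+", Variable("x"),
                  Binary("*", IntValue(2), Variable("y"))),
)

def evaluate(expr: Expression, env: dict) -> int:
    """Hypothetical helper: evaluate an expression; only + and * handled."""
    if isinstance(expr, IntValue):
        return expr.intValue
    if isinstance(expr, Variable):
        return env[expr.id]
    left = evaluate(expr.term1, env)
    right = evaluate(expr.term2, env)
    if expr.op == "+":
        return left + right
    if expr.op == "*":
        return left * right
    raise ValueError(f"unsupported operator {expr.op!r}")

print(tree)
print(evaluate(tree.source, {"x": 1, "y": 3}))
```

Unlike the parse tree, this structure keeps no nodes for punctuation or for the intermediate nonterminals, which is what "favoring content over form" amounts to.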

