Lecture 4: Lexical Analysis & Chomsky Hierarchy


1 Lecture 4: Lexical Analysis & Chomsky Hierarchy
(Revised based on Tucker’s slides) 12/10/2018 Lecture 4: Lexical Analysis & Chomsky Grammar

2 Revisit Expression Grammar
Let us consider the following grammar for Assignment:
Assignment -> ID '=' Exp
Exp -> Exp + Term | Term
Term -> Term * Integer | Integer | ID
Integer -> 0 | 1 | … | 9 | 0 Integer | 1 Integer | … | 9 Integer
ID -> a | b | … | z | a ID | b ID | … | z ID
Build a parse tree for: abc = x + y
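One way to check a hand-drawn parse tree is to let a small program build it. The sketch below is an illustration, not part of the original slides: the function names are hypothetical, ID and Integer are treated as whole tokens instead of being expanded character by character, and the left-recursive rules for Exp and Term are unrolled into loops that still produce left-nested trees.

```python
import re

def tokenize(text):
    # ID = lowercase letters, Integer = digits, plus the symbols = + *.
    # Anything else (including spaces) is skipped in this sketch.
    return re.findall(r"[a-z]+|[0-9]+|[=+*]", text)

def parse_term(tokens, pos):
    # Term -> Term * Integer | Integer | ID (left recursion unrolled into a loop)
    if pos >= len(tokens) or not tokens[pos].isalnum():
        raise SyntaxError(f"expected Integer or ID at token {pos}")
    kind = "Integer" if tokens[pos].isdigit() else "ID"
    node = ("Term", (kind, tokens[pos]))
    pos += 1
    while pos < len(tokens) and tokens[pos] == "*":
        if pos + 1 >= len(tokens) or not tokens[pos + 1].isdigit():
            raise SyntaxError(f"expected Integer at token {pos + 1}")
        node = ("Term", node, "*", ("Integer", tokens[pos + 1]))
        pos += 2
    return node, pos

def parse_exp(tokens, pos):
    # Exp -> Exp + Term | Term (left recursion unrolled into a loop)
    term, pos = parse_term(tokens, pos)
    node = ("Exp", term)
    while pos < len(tokens) and tokens[pos] == "+":
        term, pos = parse_term(tokens, pos + 1)
        node = ("Exp", node, "+", term)
    return node, pos

def parse_assignment(text):
    # Assignment -> ID '=' Exp
    tokens = tokenize(text)
    if len(tokens) < 3 or not tokens[0].isalpha() or tokens[1] != "=":
        raise SyntaxError("expected: ID = Exp")
    exp, pos = parse_exp(tokens, 2)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return ("Assignment", ("ID", tokens[0]), "=", exp)

print(parse_assignment("abc = x + y"))
```

Each tuple is one interior node of the tree: its first element is the nonterminal and the remaining elements are its children in left-to-right order.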

3 Lecture 4: Lexical Analysis & Chomsky Grammar
Levels of Syntax
Lexical syntax = all the basic symbols of the language (names, values, operators, etc.)
Concrete syntax = rules for writing expressions, statements and programs.
Abstract syntax = internal representation of the program, favoring content over form.

4 Lecture 4: Lexical Analysis & Chomsky Grammar
Expression Grammar
For the following grammar:
Concrete syntax:
Assignment -> ID '=' Exp
Exp -> Exp + Term | Term
Term -> Term * Integer | Integer
Lexical syntax:
Integer -> 0 | 1 | … | 9 | 0 Integer | 1 Integer | … | 9 Integer
ID -> a | b | … | z | a ID | b ID | … | z ID

5 Lecture 4: Lexical Analysis & Chomsky Grammar
Regular Grammar
Simplest; least powerful
Concentrate on the lexical syntax
Right regular grammar (ω ∈ T*, B ∈ N, a ∈ T):
A → ωB
A → ε (ε is the empty string)
A → a

6 Lecture 4: Lexical Analysis & Chomsky Grammar
Regular Grammar
Left regular grammar (ω ∈ T*, B ∈ N, a ∈ T):
A → Bω
A → ε
A → a
A regular grammar is either a left regular grammar or a right regular grammar.
Consider the following grammar:
S → aA
A → Sb
S → ε
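The grammar S → aA, A → Sb, S → ε uses one right-regular rule and one left-regular rule, so it is neither a left nor a right regular grammar. A quick way to see what it generates is to enumerate its sentences. The sketch below is an illustration only (the `sentences` helper and the `RULES` table are hypothetical names); it expands the leftmost nonterminal breadth-first and keeps every derived string with at most `max_len` terminals.

```python
from collections import deque

# The grammar from the slide; "" stands for the empty string ε.
RULES = {"S": ["aA", ""], "A": ["Sb"]}

def sentences(max_len):
    """Return every terminal string of length <= max_len derivable from S."""
    seen, found = {"S"}, set()
    queue = deque(["S"])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, if any.
        idx = next((i for i, ch in enumerate(form) if ch in RULES), None)
        if idx is None:
            found.add(form)          # no nonterminals left: a sentence
            continue
        for rhs in RULES[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(ch not in RULES for ch in new)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(found, key=len)

print(sentences(6))
```

Every string produced has the shape aⁿbⁿ, which suggests that mixing left and right rules gives more power than a regular grammar has.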

7 Lecture 4: Lexical Analysis & Chomsky Grammar
Regular Grammars
Equivalent to: regular expression, finite-state automaton
Used in construction of tokenizers
Less powerful than context-free grammars
Not regular languages: { aⁿ bⁿ | n ≥ 1 } and { aᵐ bⁿ | 1 ≤ m ≤ n }
i.e., cannot balance: ( ), { }, begin end
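Balancing needs memory that grows with the nesting depth, which a finite-state automaton does not have. A minimal sketch of the extra machinery, assuming a hypothetical helper named `balanced`: it uses a stack, the same kind of storage a pushdown automaton adds on top of a finite-state control.

```python
# Maps each closing bracket to the opening bracket it must match.
PAIRS = {")": "(", "}": "{"}

def balanced(text):
    """Return True if every ( and { in text is closed in the right order."""
    stack = []
    for ch in text:
        if ch in "({":
            stack.append(ch)                  # remember the open bracket
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False                  # wrong or missing opener
    return not stack                          # nothing may stay open

print(balanced("{ ( ) ( ) }"), balanced("( ( )"))
```

The stack can grow without bound, so no fixed number of states can stand in for it.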

8 Compilers & Interpreters
Source Program
→ Lexical Analyzer → Tokens
→ Syntactic Analyzer (find syntax errors) → Abstract Syntax
→ Semantic Analyzer (find semantic errors) → Intermediate Code (IC)
→ Code Optimizer → Intermediate Code (IC)
→ Code Generator → Machine Code

9 Lecture 4: Lexical Analysis & Chomsky Grammar
Purpose: transform program representation
Input: printable ASCII characters
Output: tokens (terminals T)
Discard: whitespace, comments
Defn: A token is a logically cohesive sequence of characters representing a single symbol. A token corresponds to a terminal symbol in the CFG.

10 Lecture 4: Lexical Analysis & Chomsky Grammar
Example Tokens
Identifiers
Literals: 123, 5.67, 'x', true
Keywords: bool char ...
Operators: + - * / ...
Punctuation: ; , ( ) { }

11 Lecture 4: Lexical Analysis & Chomsky Grammar
Other Sequences
Whitespace: space tab
Comments: // any-char* end-of-line
End-of-line
End-of-file
All of the above languages can be defined by a CFG.

12 Lecture 4: Lexical Analysis & Chomsky Grammar
Regular Expressions
RegExpr     Meaning
x           a character x
\x          an escaped character, e.g., \n
{ name }    a reference to a name
M | N       M or N
M N         M followed by N
M*          zero or more occurrences of M

13 Lecture 4: Lexical Analysis & Chomsky Grammar
RegExpr     Meaning
M+          one or more occurrences of M
M?          zero or one occurrence of M
[aeiou]     the set of vowels
[0-9]       the set of digits
.           any single character
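Most of this notation carries over directly to Python's standard `re` module, which makes it easy to try out. The sketch below is an illustration under two assumptions: the sample patterns and texts are made up for the demonstration, and the Lex-style `{name}` reference, which `re` does not have, is imitated by splicing a Python string into the pattern. In Python, `.` matches any single character except a newline by default.

```python
import re

DIGIT = "[0-9]"  # stands in for a Lex-style named definition {Digit}

# notation -> (Python pattern, sample text the pattern should match in full)
SAMPLES = {
    "x":       (r"a", "a"),
    "\\x":     (r"\n", "\n"),
    "{name}":  (DIGIT + "+", "2018"),
    "M | N":   (r"cat|dog", "dog"),
    "M N":     (r"ab", "ab"),
    "M*":      (r"a*", ""),
    "M+":      (r"a+", "aaa"),
    "M?":      (r"ab?", "a"),
    "[aeiou]": (r"[aeiou]", "e"),
    "[0-9]":   (r"[0-9]", "7"),
    ".":       (r".", "%"),
}

def check(samples):
    """Report, per notation, whether the pattern matches the whole sample."""
    return {notation: re.fullmatch(pattern, text) is not None
            for notation, (pattern, text) in samples.items()}

for notation, ok in check(SAMPLES).items():
    print(f"{notation:8} {ok}")
```

`re.fullmatch` is used so that a pattern must describe the entire string, which is how a regular expression denotes a language.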

14 Lecture 4: Lexical Analysis & Chomsky Grammar
Clite Lexical Syntax
Category      Definition
anyChar       [ -~]
Letter        [a-zA-Z]
Digit         [0-9]
Whitespace    [ \t]
Eol           \n
Eof           \004

15 Lecture 4: Lexical Analysis & Chomsky Grammar
Category      Definition
Identifier    {Letter}({Letter} | {Digit})*
integerLit    {Digit}+
floatLit      {Digit}+\.{Digit}+
charLit       '{anyChar}'

16 Lecture 4: Lexical Analysis & Chomsky Grammar
Category      Definition
Operator      = | || | && | == | != | < | <= | > | >= | + | - | * | / | ! | [ | ]
Separator     : | . | { | } | ( | )
Comment       // ({anyChar} | {Whitespace})* {Eol}
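The Clite categories can be turned into a small tokenizer almost mechanically. The sketch below is one possible hand translation into Python's `re` syntax, not the output of a lexer generator: the names `TOKEN_SPEC`, `MASTER` and `tokenize` are hypothetical, keywords are not separated from identifiers, and the Separator set is taken as listed on the slide. Python tries alternatives in order and does not pick the longest match, so longer or more specific patterns (Comment, floatLit, two-character operators) are listed first.

```python
import re

# (category, pattern) pairs; order matters because the first match wins.
TOKEN_SPEC = [
    ("Comment",    r"//[ -~\t]*\n"),            # // ({anyChar}|{Whitespace})* {Eol}
    ("floatLit",   r"[0-9]+\.[0-9]+"),
    ("integerLit", r"[0-9]+"),
    ("charLit",    r"'[ -~]'"),
    ("Identifier", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("Operator",   r"\|\||&&|==|!=|<=|>=|[=<>+\-*/!\[\]]"),
    ("Separator",  r"[:.{}()]"),
    ("Whitespace", r"[ \t\n]+"),                # Whitespace and Eol together
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})"
                             for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return (category, text) pairs, discarding whitespace and comments."""
    tokens, pos = [], 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(
                f"illegal character {source[pos]!r} at offset {pos}")
        if match.lastgroup not in ("Whitespace", "Comment"):
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

for token in tokenize("abc = x + 42 // note\ny2 = 3.14 * (a1 != 'q')"):
    print(token)
```

Each returned pair corresponds to one terminal symbol that a parser for the concrete syntax would consume.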

17 Lecture 4: Lexical Analysis & Chomsky Grammar
Generators
Input: usually regular expression
Output: table (slow), code
C/C++: Lex, Flex
Java: JLex

18 Lecture 4: Lexical Analysis & Chomsky Grammar
Chomsky Hierarchy
Regular grammar -- least powerful
Context-free grammar (BNF)
Context-sensitive grammar
Unrestricted grammar

19 Context-free Grammars
BNF: a stylized form of CFG
Equivalent to a pushdown automaton
For a wide class of unambiguous CFGs, there are table-driven, linear-time parsers

20 Context-Sensitive Grammars
Production: α → β, with |α| ≤ |β|
α, β ∈ (N ∪ T)*
i.e., the left-hand side can be composed of strings of terminals and nonterminals

21 Regular Expression Exercise
Describe the languages denoted by the following REs:
0(0|1)*0
((ε|0)1*)*
(0|1)*0(0|1)(0|1)
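Listing the short strings an RE accepts is a useful first step before describing its language in words. The sketch below is an exploration aid, not a solution: the helper name `matching_strings` is hypothetical, and in Python's `re` syntax the empty alternative (ε) is written as an empty branch, as in `(|0)`.

```python
import re
from itertools import product

def matching_strings(pattern, max_len):
    """All binary strings of length <= max_len that the pattern matches in full."""
    found = []
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            s = "".join(chars)
            if re.fullmatch(pattern, s):
                found.append(s)
    return found

for pattern in [r"0(0|1)*0", r"((|0)1*)*", r"(0|1)*0(0|1)(0|1)"]:
    print(pattern, matching_strings(pattern, 3))
```

Raising `max_len` by one or two usually makes the pattern behind the listed strings easier to spot.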

22 Regular Expression Exercise
Consider a small language using only the letters “z” and “o”, and the slash char “/”. A comment in this language starts with “/o” and ends after the very next “o/”. Comments do not nest. (The regular notations that can be used are A|B, AB, A*, A+, ε)
Valid: /o/zzzz/oo/, /ozz/oz////o/
Invalid: /o/, /ozzzooo/zzzo/

23 Regular Expression Exercise
Consider a small language using only the letters “z”, “o”, and the slash char “/”. A comment in this language starts with “/o” and ends after the very next “o/”. Comments do not nest. (The regular notations that can be used are A|B, AB, A*, A+, ε)
/o(o*z|/)*o+/
/o(o|z|/)*o/
/o/*(o*z/*)*o+/
/o(/|oz|oo)*o+/
/o(/*o*z)*/*o+/
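The candidate REs can be compared against the prose definition by brute force. The sketch below is a checking aid under stated assumptions: `is_comment` is a hypothetical reference predicate written from the wording above (open with "/o", then the first "o/" after the opener must be the end of the string, so "/o/" is rejected), and `agrees` tests every string over z, o and / up to a bounded length. Agreement up to that bound is evidence, not a proof.

```python
import re
from itertools import product

CANDIDATES = [
    r"/o(o*z|/)*o+/",
    r"/o(o|z|/)*o/",
    r"/o/*(o*z/*)*o+/",
    r"/o(/|oz|oo)*o+/",
    r"/o(/*o*z)*/*o+/",
]

def is_comment(s):
    # Opens with "/o"; the first "o/" found after the opener must end the string.
    return s.startswith("/o") and len(s) >= 4 and s.find("o/", 2) == len(s) - 2

def agrees(pattern, max_len=8):
    """Return (True, None) or (False, first counterexample found)."""
    for n in range(max_len + 1):
        for chars in product("zo/", repeat=n):
            s = "".join(chars)
            if (re.fullmatch(pattern, s) is not None) != is_comment(s):
                return False, s
    return True, None

for pattern in CANDIDATES:
    print(pattern, agrees(pattern))
```

A reported counterexample is either a valid comment the RE rejects or an invalid string it accepts; checking it by hand shows which.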

24 Regular Expression Exercise
All strings of 0’s and 1’s that satisfy the following condition:
all binary strings except the empty string
contains at least three 1s
does not contain the substring 110
length is at least 1 and at most 3

