Lecture 4: Lexical Analysis & Chomsky Hierarchy

(Revised based on Tucker’s slides) 12/10/2018 Lecture 4: Lexical Analysis & Chomsky Grammar

Revisit Expression Grammar
Let us consider the following grammar for Assignment:

    Assignment -> ID '=' Exp
    Exp        -> Exp + Term | Term
    Term       -> Term * Integer | Integer | ID
    Integer    -> 0 | 1 | … | 9 | 0 Integer | 1 Integer | … | 9 Integer
    ID         -> a | b | … | z | a ID | b ID | … | z ID

Build a parse tree for: abc = x + y
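The parse-tree exercise can be tried out with a small recursive-descent sketch. This is only one possible way to do it, not the lecture's method: the function names (`tokenize`, `parse_assignment`, `parse_exp`, `parse_term`, `leaf`) are made up for illustration, ID and Integer are treated as single leaves instead of being expanded character by character, and the left-recursive rules are handled with loops that still build left-leaning trees.

```python
import re

# Hypothetical token pattern: lowercase identifiers, integers, and = + *
TOKEN = re.compile(r"\s*([a-z]+|[0-9]+|[=+*])")

def tokenize(text):
    """Split the input into tokens, skipping blanks."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def leaf(token, allow_id):
    """Build an Integer leaf, or an ID leaf where the grammar allows one."""
    if token.isdigit():
        return ("Integer", token)
    if allow_id and token.isalpha():
        return ("ID", token)
    raise SyntaxError(f"unexpected token {token!r}")

def parse_term(tokens):
    # Term -> Term * Integer | Integer | ID
    if not tokens:
        raise SyntaxError("expected Integer or ID")
    tree, rest = ("Term", leaf(tokens[0], allow_id=True)), tokens[1:]
    while rest and rest[0] == "*":
        if len(rest) < 2:
            raise SyntaxError("expected Integer after '*'")
        tree = ("Term", tree, "*", leaf(rest[1], allow_id=False))
        rest = rest[2:]
    return tree, rest

def parse_exp(tokens):
    # Exp -> Exp + Term | Term
    term, rest = parse_term(tokens)
    tree = ("Exp", term)
    while rest and rest[0] == "+":
        term, rest = parse_term(rest[1:])
        tree = ("Exp", tree, "+", term)
    return tree, rest

def parse_assignment(text):
    # Assignment -> ID '=' Exp
    tokens = tokenize(text)
    if len(tokens) < 3 or not tokens[0].isalpha() or tokens[1] != "=":
        raise SyntaxError("expected: ID = Exp")
    exp, rest = parse_exp(tokens[2:])
    if rest:
        raise SyntaxError(f"unexpected token {rest[0]!r}")
    return ("Assignment", ("ID", tokens[0]), "=", exp)

tree = parse_assignment("abc = x + y")
```

Each tuple is a tree node whose first element names the nonterminal; the nesting of `Exp` inside `Exp` mirrors the left recursion in the rule Exp -> Exp + Term.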

Levels of Syntax
- Lexical syntax = all the basic symbols of the language (names, values, operators, etc.)
- Concrete syntax = rules for writing expressions, statements and programs.
- Abstract syntax = internal representation of the program, favoring content over form.

So Expression Grammar
For the following grammar:

Concrete syntax:
    Assignment -> ID '=' Exp
    Exp        -> Exp + Term | Term
    Term       -> Term * Integer | Integer

Lexical syntax:
    Integer    -> 0 | 1 | … | 9 | 0 Integer | 1 Integer | … | 9 Integer
    ID         -> a | b | … | z | a ID | b ID | … | z ID

Regular Grammar
- Simplest; least powerful
- Concentrates on the lexical syntax
- Right regular grammar, where ω ∈ T*, B ∈ N, a ∈ T:
      A → ωB
      A → ε (ε is the empty string) or A → ω
      A → a

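Regular Grammar
- Left regular grammar, where ω ∈ T*, B ∈ N, a ∈ T:
      A → Bω
      A → ε
      A → a
- A regular grammar is either a left regular grammar or a right regular grammar.
- Consider the following grammar:
      S → aA
      A → Sb
      S → ε
  It mixes a right regular rule (S → aA) with a left regular rule (A → Sb), so it is neither left regular nor right regular. It generates { aⁿ bⁿ | n ≥ 0 }, which is not a regular language.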

Regular Grammars
- Equivalent to:
    - Regular expression
    - Finite-state automaton
- Used in construction of tokenizers
- Less powerful than context-free grammars
- Not regular languages: { aⁿ bⁿ | n ≥ 1 } and { aᵐ bⁿ | 1 ≤ m ≤ n }
  i.e., cannot balance: ( ), { }, begin end
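To make the link between a regular grammar and a finite-state automaton concrete, here is a minimal sketch of a two-state automaton for the ID rule from the first slide (one or more lowercase letters). The state names, the `TRANSITIONS` table, and the `accepts` function are illustrative assumptions, not something taken from the slides.

```python
import string

# Right regular grammar:  ID -> a | ... | z | a ID | ... | z ID
# As an automaton: "start" has read nothing, "in_id" has read one or more letters.
TRANSITIONS = {
    ("start", "letter"): "in_id",
    ("in_id", "letter"): "in_id",
}
ACCEPTING = {"in_id"}

def char_class(ch):
    """Map a character to the input class used in the transition table."""
    return "letter" if ch in string.ascii_lowercase else "other"

def accepts(text):
    """Run the automaton over text; a missing transition means rejection."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return False
    return state in ACCEPTING
```

The automaton only remembers which state it is in, never how many characters it has read. That is why a language such as { aⁿ bⁿ | n ≥ 1 }, which needs an unbounded count, is out of reach.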

Compilers & Interpreters
Source Program
  → Lexical Analyzer → Tokens
  → Syntactic Analyzer (find syntax errors) → Abstract Syntax
  → Semantic Analyzer (find semantic errors) → Intermediate Code (IC)
  → Code Optimizer → Intermediate Code (IC)
  → Code Generator → Machine Code

- Purpose of lexical analysis: transform program representation
- Input: printable ASCII characters
- Output: tokens (terminals T)
- Discard: whitespace, comments
- Defn: A token is a logically cohesive sequence of characters representing a single symbol. A token corresponds to a terminal symbol in the CFG.

Example Tokens
- Identifiers
- Literals: 123, 5.67, 'x', true
- Keywords: bool char ...
- Operators: + - * / ...
- Punctuation: ; , ( ) { }

Other Sequences
- Whitespace: space tab
- Comments: // any-char* end-of-line
- End-of-line
- End-of-file
All of the above can be defined by a context-free grammar.

Regular Expressions

RegExpr     Meaning
x           a character x
\x          an escaped character, e.g., \n
{ name }    a reference to a name
M | N       M or N
M N         M followed by N
M*          zero or more occurrences of M
M+          one or more occurrences of M
M?          zero or one occurrence of M
[aeiou]     the set of vowels
[0-9]       the set of digits
.           any single character
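Most of this notation carries over directly to Python's `re` module, so the table can be tried out with a short sketch. The `matches` helper is a made-up convenience wrapper; the `{ name }` reference form is a generator convention and is not part of Python's syntax, so it is left out here.

```python
import re

def matches(pattern, text):
    """True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

checks = {
    "M | N": matches(r"a|b", "b"),
    "M N": matches(r"ab", "ab"),
    "M* (zero occurrences)": matches(r"a*", ""),
    "M+ (needs at least one)": not matches(r"a+", ""),
    "M? (at most one)": not matches(r"a?", "aa"),
    "[aeiou]": matches(r"[aeiou]", "e"),
    "[0-9]": matches(r"[0-9]+", "2018"),
    "escaped character": matches(r"\n", "\n"),
    "any single character": matches(r".", "x"),
}
```

One difference worth knowing: in Python, `.` does not match a newline unless the `re.DOTALL` flag is given.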

Clite Lexical Syntax

Category     Definition
anyChar      [ -~]
Letter       [a-zA-Z]
Digit        [0-9]
Whitespace   [ \t]
Eol          \n
Eof          \004
Identifier   {Letter}({Letter} | {Digit})*
integerLit   {Digit}+
floatLit     {Digit}+\.{Digit}+
charLit      '{anyChar}'
Operator     = | || | && | == | != | < | <= | > | >= | + | - | * | / | ! | [ | ]
Separator    : | . | { | } | ( | )
Comment      // ({anyChar} | {Whitespace})* {Eol}
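The definitions above are close enough to Python's regular-expression syntax that a rough tokenizer can be sketched from them. This is an illustration under several assumptions, not the Clite implementation: `DEFS`, `expand`, `TOKEN_SPEC`, and `tokenize` are made-up names, keywords and Eof are not handled, and because Python tries alternatives in order (first match, not longest match) the comment rule, floatLit, and the two-character operators are listed before the shorter patterns they overlap with.

```python
import re

# Named building blocks from the table; {Name} in a pattern refers to one of these.
DEFS = {
    "anyChar": r"[ -~]",
    "Letter": r"[a-zA-Z]",
    "Digit": r"[0-9]",
    "Whitespace": r"[ \t]",
    "Eol": r"\n",
}

def expand(pattern):
    """Replace each {Name} reference with its definition."""
    for name, definition in DEFS.items():
        pattern = pattern.replace("{" + name + "}", definition)
    return pattern

OPERATORS = ["||", "&&", "==", "!=", "<=", ">=",
             "=", "<", ">", "+", "-", "*", "/", "!", "[", "]"]
SEPARATORS = [":", ".", "{", "}", "(", ")"]

TOKEN_SPEC = [
    ("Comment", expand(r"//(?:{anyChar}|{Whitespace})*{Eol}")),
    ("Whitespace", expand(r"{Whitespace}+")),
    ("Eol", expand(r"{Eol}")),
    ("Identifier", expand(r"{Letter}(?:{Letter}|{Digit})*")),
    ("floatLit", expand(r"{Digit}+\.{Digit}+")),
    ("integerLit", expand(r"{Digit}+")),
    ("charLit", expand(r"'{anyChar}'")),
    ("Operator", "|".join(re.escape(op) for op in OPERATORS)),
    ("Separator", "|".join(re.escape(sep) for sep in SEPARATORS)),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))
DISCARD = {"Comment", "Whitespace", "Eol"}

def tokenize(source):
    """Return (category, text) pairs, dropping whitespace and comments."""
    tokens, pos = [], 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"illegal character {source[pos]!r} at position {pos}")
        if m.lastgroup not in DISCARD:
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

tokens = tokenize("count = count + 12 // add\nz[0] = 3.5 * 'a'\n")
```

The comment and the whitespace disappear from the result, which matches the "Discard" role of the lexical analyzer described earlier.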

Generators
- Input: usually regular expression
- Output: table (slow), code
- C/C++: Lex, Flex
- Java: JLex

Chomsky Hierarchy
- Regular grammar -- least powerful
- Context-free grammar (BNF)
- Context-sensitive grammar
- Unrestricted grammar

Context-free Grammars
- BNF is a stylized form of CFG
- Equivalent to a pushdown automaton
- For a wide class of unambiguous CFGs, there are table-driven, linear-time parsers

Context-Sensitive Grammars
- Production: α → β, where |α| ≤ |β| and α, β ∈ (N ∪ T)*
- i.e., the left-hand side can be composed of strings of terminals and nonterminals

Regular Expression Exercise
Describe the languages denoted by the following REs:
- 0(0|1)*0
- ((ε|0)1*)*
- (0|1)*0(0|1)(0|1)
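A proposed description for each RE can be sanity-checked by brute force: enumerate every short binary string and compare the regular expression with a plain predicate. The sketch below assumes one possible description per RE (the predicates are suggestions, not an official answer key), writes (ε|0) as `0?`, and uses made-up helper names `binary_strings` and `agrees`.

```python
import re
from itertools import product

def binary_strings(max_len):
    """Yield every string over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

# (regular expression, candidate description as a predicate)
CASES = [
    # begins and ends with 0, length at least 2
    (r"0(0|1)*0", lambda s: len(s) >= 2 and s[0] == "0" and s[-1] == "0"),
    # every binary string, including the empty one
    (r"(0?1*)*", lambda s: True),
    # the third symbol from the end is 0
    (r"(0|1)*0(0|1)(0|1)", lambda s: len(s) >= 3 and s[-3] == "0"),
]

def agrees(pattern, predicate, max_len=8):
    """True when pattern and predicate accept exactly the same short strings."""
    return all(
        (re.fullmatch(pattern, s) is not None) == predicate(s)
        for s in binary_strings(max_len)
    )

results = [agrees(pattern, predicate) for pattern, predicate in CASES]
```

Agreement up to a fixed length is evidence, not a proof, but a wrong description usually fails on some very short string.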

Consider a small language using only the letters “z” and “o”, and the slash char “/”. A comment in this language starts with “/o” and ends after the very next “o/”. Comments do not nest. (The regular notations that can be used are A|B, AB, A*, A+, …)
Valid: /o/zzzz/oo/, /ozz/oz////o/
Invalid: /o/, /ozzzooo/zzzo/

Consider a small language using only the letters “z”, “o”, and the slash char “/”. A comment in this language starts with “/o” and ends after the very next “o/”. Comments do not nest. (The regular notations that can be used are A|B, AB, A*, A+, …)
Candidate regular expressions:
- /o(o*z|/)*o+/
- /o(o|z|/)*o/
- /o/*(o*z/*)*o+/
- /o(/|oz|oo)*o+/
- /o(/*o*z)*/*o+/
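The candidates can be screened against the sample strings from the previous slide with Python's `re` module, where `/` needs no escaping. This is a quick filter under that assumption, with made-up names (`CANDIDATES`, `passes_samples`); passing four samples does not prove a candidate correct, it only eliminates the ones that fail.

```python
import re

CANDIDATES = [
    r"/o(o*z|/)*o+/",
    r"/o(o|z|/)*o/",
    r"/o/*(o*z/*)*o+/",
    r"/o(/|oz|oo)*o+/",
    r"/o(/*o*z)*/*o+/",
]
VALID = ["/o/zzzz/oo/", "/ozz/oz////o/"]
INVALID = ["/o/", "/ozzzooo/zzzo/"]

def passes_samples(pattern):
    """True when pattern accepts every valid sample and rejects every invalid one."""
    accepts_valid = all(re.fullmatch(pattern, s) for s in VALID)
    rejects_invalid = not any(re.fullmatch(pattern, s) for s in INVALID)
    return accepts_valid and rejects_invalid

screened = [passes_samples(pattern) for pattern in CANDIDATES]
```

On these samples the second candidate is eliminated because it accepts /ozzzooo/zzzo/, and the fourth because it rejects both valid comments (its body cannot contain a lone "z").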

All strings of 0’s and 1’s that satisfy each of the following conditions:
- all binary strings except the empty string
- contains at least three 1s
- does not contain the substring 110
- length is at least 1 and at most 3
