Published by Ἀράχνη Πρωτονοτάριος. Modified over 6 years ago.
1
Lecture 4: Lexical Analysis & Chomsky Hierarchy
(Revised based on Tucker’s slides)
12/10/2018 Lecture 4: Lexical Analysis & Chomsky Grammar
2
Revisit Expression Grammar
Let us consider the following grammar for Assignment:
    Assignment -> ID '=' Exp
    Exp -> Exp + Term | Term
    Term -> Term * Integer | Integer | ID
    Integer -> 0 | 1 | … | 9 | 0 Integer | 1 Integer | … | 9 Integer
    ID -> a | b | … | z | a ID | b ID | … | z ID
Build a parse tree for: abc = x + y
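The parse-tree exercise can be explored with a small hand-written parser. The following is only a sketch under stated assumptions: the function names are hypothetical, ID and Integer lexemes are recognized with a regular expression instead of the character-level productions, and the left-recursive rules are unrolled into loops (which still yields the left-nested tree the grammar implies).

```python
import re

# Hypothetical helper names; the grammar itself is the one on the slide.
TOKEN = re.compile(r"\s*([a-z]+|[0-9]+|[=+*])")

def tokenize(text):
    """Split the input into ID, Integer, and operator lexemes."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse_leaf(tokens, allow_id):
    """Integer, or ID where the grammar allows it."""
    if tokens and tokens[0].isdigit():
        return ("Integer", tokens[0])
    if tokens and tokens[0].isalpha() and allow_id:
        return ("ID", tokens[0])
    raise SyntaxError("expected Integer" + (" or ID" if allow_id else ""))

def parse_term(tokens):
    """Term -> Term * Integer | Integer | ID (left recursion unrolled)."""
    tree, rest = ("Term", parse_leaf(tokens, allow_id=True)), tokens[1:]
    while rest and rest[0] == "*":
        tree = ("Term", tree, "*", parse_leaf(rest[1:], allow_id=False))
        rest = rest[2:]
    return tree, rest

def parse_exp(tokens):
    """Exp -> Exp + Term | Term (left recursion unrolled)."""
    term, rest = parse_term(tokens)
    tree = ("Exp", term)
    while rest and rest[0] == "+":
        term, rest = parse_term(rest[1:])
        tree = ("Exp", tree, "+", term)
    return tree, rest

def parse_assignment(tokens):
    """Assignment -> ID '=' Exp"""
    if len(tokens) < 3 or not tokens[0].isalpha() or tokens[1] != "=":
        raise SyntaxError("expected: ID '=' Exp")
    exp, rest = parse_exp(tokens[2:])
    if rest:
        raise SyntaxError(f"unexpected token {rest[0]!r}")
    return ("Assignment", ("ID", tokens[0]), "=", exp)

tree = parse_assignment(tokenize("abc = x + y"))
print(tree)
```

Each tuple is one node of the tree: the first element is the nonterminal and the remaining elements are its children, in left-to-right order.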
3
Levels of Syntax
Lexical syntax = all the basic symbols of the language (names, values, operators, etc.)
Concrete syntax = rules for writing expressions, statements and programs.
Abstract syntax = internal representation of the program, favoring content over form.
4
Expression Grammar
For the following grammar:
Concrete syntax:
    Assignment -> ID '=' Exp
    Exp -> Exp + Term | Term
    Term -> Term * Integer | Integer
Lexical syntax:
    Integer -> 0 | 1 | … | 9 | 0 Integer | 1 Integer | … | 9 Integer
    ID -> a | b | … | z | a ID | b ID | … | z ID
5
Regular Grammar
Simplest; least powerful
Concentrates on the lexical syntax
Right regular grammar (ω ∈ T*, B ∈ N, a ∈ T); every production has one of these forms:
    A → aB
    A → ε (ε is the empty string) or A → ω
    A → a
6
Regular Grammar
Left regular grammar (ω ∈ T*, B ∈ N, a ∈ T); every production has one of these forms:
    A → Ba
    A → ε
    A → a
A regular grammar is either a left regular grammar or a right regular grammar.
Consider the following grammar:
    S → aA
    A → Sb
    S → ε
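To see what the grammar S → aA, A → Sb, S → ε generates, its derivations can be enumerated mechanically. The sketch below is illustrative only: the function name and the length bound are assumptions, uppercase letters stand for nonterminals, and the empty Python string stands for ε. Note that the grammar mixes a right-regular rule (S → aA) with a left-regular rule (A → Sb), so it is neither a left nor a right regular grammar.

```python
from collections import deque

# Productions of the grammar on the slide; "" stands for ε.
# Uppercase letters are nonterminals, lowercase letters are terminals.
PRODUCTIONS = {"S": ["aA", ""], "A": ["Sb"]}

def generate(start="S", max_len=6):
    """Return all terminal strings of length <= max_len derivable from start."""
    results, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, or None if the form is all terminals.
        idx = next((i for i, c in enumerate(form) if c.isupper()), None)
        if idx is None:
            results.add(form)
            continue
        for rhs in PRODUCTIONS[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(1 for c in new if c.islower())
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(results, key=len)

print(generate())  # ['', 'ab', 'aabb', 'aaabbb']
```

Every generated string has the shape aⁿ bⁿ, which is the kind of balanced language that no regular grammar can describe.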
7
Regular Grammars
Equivalent to:
    Regular expression
    Finite-state automaton
Used in construction of tokenizers
Less powerful than context-free grammars
Not regular languages: { aⁿ bⁿ | n ≥ 1 } and { aᵐ bⁿ | 1 ≤ m ≤ n }
i.e., a regular grammar cannot balance: ( ), { }, begin end
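The contrast between a regular and a non-regular language can be made concrete with a table-driven finite-state automaton. This is a sketch, not material from the slides: the state names and function names are hypothetical. The automaton accepts the regular language a+b+ (any positive number of a's, then any positive number of b's), while checking aⁿ bⁿ requires comparing two unbounded counts, which a fixed set of states cannot do.

```python
# DFA for the regular language a+b+ ; the state names are hypothetical.
DFA = {
    ("start", "a"): "in_a",
    ("in_a", "a"): "in_a",
    ("in_a", "b"): "in_b",
    ("in_b", "b"): "in_b",
}
ACCEPTING = {"in_b"}

def dfa_accepts(text):
    """Run the DFA; a missing table entry means the input is rejected."""
    state = "start"
    for ch in text:
        state = DFA.get((state, ch))
        if state is None:
            return False
    return state in ACCEPTING

def is_anbn(text):
    """Recognize a^n b^n (n >= 1) by counting, which a DFA cannot do."""
    n = len(text) // 2
    return n >= 1 and text == "a" * n + "b" * n

print(dfa_accepts("aaabb"), is_anbn("aaabb"))  # True False
print(dfa_accepts("aabb"), is_anbn("aabb"))    # True True
```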
8
Compilers & Interpreters
Source Program
  → Lexical Analyzer → Tokens
  → Syntactic Analyzer (find syntax errors) → Abstract Syntax
  → Semantic Analyzer (find semantic errors) → Intermediate Code (IC)
  → Code Optimizer → Intermediate Code (IC)
  → Code Generator → Machine Code
9
Lexical Analyzer
Purpose: transform program representation
Input: printable ASCII characters
Output: tokens (terminals T)
Discard: whitespace, comments
Defn: A token is a logically cohesive sequence of characters representing a single symbol.
A token corresponds to a terminal symbol in the CFG.
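The input/output/discard description above can be sketched as a small tokenizer built on Python's `re` module. The category names and patterns here are illustrative assumptions, not the course's definitions, and keywords are not distinguished from identifiers.

```python
import re

# Token categories are illustrative; the names are not taken from the slides.
# Order matters: COMMENT must come before OPERATOR so "//" is not read as "/".
TOKEN_SPEC = [
    ("COMMENT",    r"//[^\n]*"),
    ("WHITESPACE", r"[ \t\n]+"),
    ("FLOAT",      r"[0-9]+\.[0-9]+"),
    ("INTEGER",    r"[0-9]+"),
    ("IDENTIFIER", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("OPERATOR",   r"==|!=|<=|>=|&&|\|\||[=<>+\-*/!]"),
    ("SEPARATOR",  r"[;,(){}]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))
DISCARD = {"COMMENT", "WHITESPACE"}

def tokenize(source):
    """Return (category, lexeme) pairs, dropping whitespace and comments."""
    tokens, pos = [], 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"illegal character {source[pos]!r} at {pos}")
        if m.lastgroup not in DISCARD:
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("x = y1 + 42; // done\n"))
```

Each pair in the result plays the role of one terminal symbol handed to the parser; the whitespace and the comment never reach it.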
10
Example Tokens
Identifiers
Literals: 123, 5.67, 'x', true
Keywords: bool char ...
Operators: + - * / ...
Punctuation: ; , ( ) { }
11
Other Sequences
Whitespace: space tab
Comments: // any-char* end-of-line
End-of-line
End-of-file
Each of the above can be defined by a CFG.
12
Regular Expressions
RegExpr     Meaning
x           a character x
\x          an escaped character, e.g., \n
{ name }    a reference to a name
M | N       M or N
M N         M followed by N
M*          zero or more occurrences of M
13
Regular Expressions (continued)
RegExpr     Meaning
M+          one or more occurrences of M
M?          zero or one occurrence of M
[aeiou]     the set of vowels
[0-9]       the set of digits
.           any single character
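Most of the notation in these tables carries over directly to Python's `re` module, so the meanings can be checked interactively. The sketch below pairs each operator with one sample string and the expected outcome of a full match; the sample strings are arbitrary choices. The `{ name }` reference form is a lexer-generator convention and has no direct equivalent in `re`, so it is left out.

```python
import re

# (pattern, sample string, whether the WHOLE string should match)
examples = [
    (r"a|b",     "b",  True),   # M | N : M or N
    (r"ab",      "ab", True),   # M N   : M followed by N
    (r"a*",      "",   True),   # M*    : zero or more occurrences
    (r"a+",      "",   False),  # M+    : one or more occurrences
    (r"a?",      "aa", False),  # M?    : zero or one occurrence
    (r"[aeiou]", "e",  True),   # a set of characters
    (r"[0-9]",   "x",  False),  # a range of characters
    (r".",       "#",  True),   # any single character
    (r"\n",      "\n", True),   # an escaped character
]

results = [bool(re.fullmatch(p, s)) == expected for p, s, expected in examples]
print(all(results))  # True
```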
14
Clite Lexical Syntax
Category      Definition
anyChar       [ -~]
Letter        [a-zA-Z]
Digit         [0-9]
Whitespace    [ \t]
Eol           \n
Eof           \004
15
Clite Lexical Syntax (continued)
Category      Definition
Identifier    {Letter}({Letter} | {Digit})*
integerLit    {Digit}+
floatLit      {Digit}+\.{Digit}+
charLit       '{anyChar}'
16
Clite Lexical Syntax (continued)
Category      Definition
Operator      = | || | && | == | != | < | <= | > | >= | + | - | * | / | ! | [ | ]
Separator     : | . | { | } | ( | )
Comment       // ({anyChar} | {Whitespace})* {Eol}
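The `{Name}` references in the Clite definitions can be expanded mechanically into ordinary regular expressions. The sketch below does this with Python's `re` module; the `expand` and `matches` helpers are hypothetical names introduced for illustration, and only a subset of the categories is included.

```python
import re

# Definitions copied from the Clite tables; {Name} refers to another category.
DEFINITIONS = {
    "anyChar": r"[ -~]",
    "Letter": r"[a-zA-Z]",
    "Digit": r"[0-9]",
    "Identifier": r"{Letter}({Letter}|{Digit})*",
    "integerLit": r"{Digit}+",
    "floatLit": r"{Digit}+\.{Digit}+",
    "charLit": r"'{anyChar}'",
}

def expand(name):
    """Replace every {Name} reference with its recursively expanded definition."""
    return re.sub(
        r"\{(\w+)\}",
        lambda m: "(?:" + expand(m.group(1)) + ")",
        DEFINITIONS[name],
    )

def matches(category, lexeme):
    """True if the whole lexeme belongs to the given category."""
    return re.fullmatch(expand(category), lexeme) is not None

print(expand("Identifier"))         # (?:[a-zA-Z])((?:[a-zA-Z])|(?:[0-9]))*
print(matches("Identifier", "x1"))  # True
print(matches("floatLit", "5."))    # False
```

Wrapping each expansion in a non-capturing group keeps operators such as `+` applied to the whole referenced definition rather than to its last character.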
17
Generators
Input: usually regular expression
Output: table (slow), code
C/C++: Lex, Flex
Java: JLex
18
Chomsky Hierarchy
Regular grammar -- least powerful
Context-free grammar (BNF)
Context-sensitive grammar
Unrestricted grammar
19
Context-free Grammars
BNF is a stylized form of CFG
Equivalent to a pushdown automaton
For a wide class of unambiguous CFGs, there are table-driven, linear-time parsers
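The extra power of a pushdown automaton comes from its stack. As a sketch only (the function name and the bracket set are assumptions), a stack is enough to check the kind of bracket balancing that regular grammars cannot express:

```python
# Closing bracket -> matching opening bracket.
PAIRS = {")": "(", "}": "{"}

def balanced(text):
    """Check bracket balance with an explicit stack, as a pushdown automaton would."""
    stack = []
    for ch in text:
        if ch in PAIRS.values():
            stack.append(ch)          # push every opening bracket
        elif ch in PAIRS:
            # A closing bracket must match the most recent unmatched opener.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack                  # nothing may be left open

print(balanced("{(a)(b)}"))  # True
print(balanced("(()"))       # False
```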
20
Context-Sensitive Grammars
Production: α → β
|α| ≤ |β|
α, β ∈ (N ∪ T)*
i.e., the left-hand side can be composed of strings of terminals and nonterminals
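The length condition |α| ≤ |β| is easy to check mechanically. In the sketch below, the function name is hypothetical, and the first grammar is a standard textbook grammar for { aⁿ bⁿ cⁿ | n ≥ 1 } that is not taken from these slides; note how its left-hand sides `cB` and `bB` mix terminals and nonterminals.

```python
# Each production is a (lhs, rhs) pair; every character is one grammar symbol.
# Uppercase = nonterminal, lowercase = terminal.
ANBNCN = [("S", "abc"), ("S", "aSBc"), ("cB", "Bc"), ("bB", "bb")]

# A made-up grammar that violates the condition: "AB" -> "a" shrinks the string.
CONTRACTING = [("S", "AB"), ("AB", "a")]

def is_noncontracting(productions):
    """True if |lhs| <= |rhs| holds for every production."""
    return all(len(lhs) <= len(rhs) for lhs, rhs in productions)

print(is_noncontracting(ANBNCN))       # True
print(is_noncontracting(CONTRACTING))  # False
```

One derivation in the first grammar is S ⇒ aSBc ⇒ aabcBc ⇒ aabBcc ⇒ aabbcc.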
21
Regular Expression Exercise
Describe the languages denoted by the following REs:
    0(0|1)*0
    ((ε|0)1*)*
    (0|1)*0(0|1)(0|1)
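One way to approach this exercise is to conjecture a description for each RE and test it against every short binary string. The sketch below does that with Python's `re` module. The three descriptions are conjectures, not answers given in the slides, the helper names are hypothetical, and the second RE is written with an empty alternative standing for ε. Agreement up to a length bound is supporting evidence, not a proof.

```python
import re
from itertools import product

def binary_strings(max_len):
    """Yield every string over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

# (RE in Python syntax, conjectured description as a predicate)
CONJECTURES = [
    # begins and ends with 0, length at least 2
    (r"0(0|1)*0", lambda s: len(s) >= 2 and s[0] == "0" and s[-1] == "0"),
    # every binary string, including the empty one
    (r"((|0)1*)*", lambda s: True),
    # third symbol from the end is 0
    (r"(0|1)*0(0|1)(0|1)", lambda s: len(s) >= 3 and s[-3] == "0"),
]

def counterexamples(pattern, predicate, max_len=8):
    """Strings on which the RE and the conjectured description disagree."""
    return [s for s in binary_strings(max_len)
            if (re.fullmatch(pattern, s) is not None) != predicate(s)]

for pattern, predicate in CONJECTURES:
    print(pattern, counterexamples(pattern, predicate))
```

An empty list next to a pattern means the conjecture survived every string up to the chosen length.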
22
Regular Expression Exercise
Consider a small language using only the letters “z” and “o”, and the slash character “/”. A comment in this language starts with “/o” and ends after the very next “o/”. Comments do not nest. The regular-expression notations that can be used include A|B, AB, A*, and A+.
Valid: /o/zzzz/oo/, /ozz/oz////o/
Invalid: /o/, /ozzzooo/zzzo/
23
Regular Expression Exercise
Consider a small language using only the letters “z” and “o”, and the slash character “/”. A comment in this language starts with “/o” and ends after the very next “o/”. Comments do not nest. The regular-expression notations that can be used include A|B, AB, A*, and A+.
Candidate regular expressions:
    /o(o*z|/)*o+/
    /o(o|z|/)*o/
    /o/*(o*z/*)*o+/
    /o(/|oz|oo)*o+/
    /o(/*o*z)*/*o+/
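The candidates can be screened against the valid and invalid sample comments from the exercise. This is a sketch using Python's `re` module with a hypothetical helper name. Passing the four samples is a necessary check only; it does not prove that a candidate matches exactly the comments of the language.

```python
import re

CANDIDATES = [
    r"/o(o*z|/)*o+/",
    r"/o(o|z|/)*o/",
    r"/o/*(o*z/*)*o+/",
    r"/o(/|oz|oo)*o+/",
    r"/o(/*o*z)*/*o+/",
]
VALID = ["/o/zzzz/oo/", "/ozz/oz////o/"]
INVALID = ["/o/", "/ozzzooo/zzzo/"]

def passes_examples(pattern):
    """True if the pattern accepts every valid sample and rejects every invalid one."""
    def accepts(s):
        return re.fullmatch(pattern, s) is not None
    return all(accepts(s) for s in VALID) and not any(accepts(s) for s in INVALID)

survivors = [p for p in CANDIDATES if passes_examples(p)]
print(survivors)
```

On these samples the second candidate is eliminated because it accepts `/ozzzooo/zzzo/`, and the fourth because it cannot match a lone `z` and so rejects both valid comments.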
24
Regular Expression Exercise
All strings of 0’s and 1’s that satisfy the following conditions (one RE per condition):
    all binary strings except the empty string
    contains at least three 1s
    does not contain the substring 110
    length is at least 1 and at most 3
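Proposed answers to this exercise can be checked by brute force against the stated conditions. The regular expressions below are one possible set of answers, offered as a sketch and not taken from the slides; the helper names are hypothetical, and agreement on all strings up to a length bound is evidence, not a proof.

```python
import re
from itertools import product

# (candidate RE in Python syntax, the condition it is meant to capture)
ANSWERS = [
    (r"(0|1)+",                      lambda s: len(s) >= 1),
    (r"(0|1)*1(0|1)*1(0|1)*1(0|1)*", lambda s: s.count("1") >= 3),
    (r"(0|10)*1*",                   lambda s: "110" not in s),
    (r"(0|1)(0|1)?(0|1)?",           lambda s: 1 <= len(s) <= 3),
]

def all_binary_strings(max_len):
    """Every string over {0, 1} of length 0..max_len."""
    return ("".join(bits)
            for n in range(max_len + 1)
            for bits in product("01", repeat=n))

def agrees(pattern, condition, max_len=8):
    """True if the RE and the condition agree on every string up to max_len."""
    return all((re.fullmatch(pattern, s) is not None) == condition(s)
               for s in all_binary_strings(max_len))

for pattern, condition in ANSWERS:
    print(pattern, agrees(pattern, condition))
```

The third answer relies on the observation that once two consecutive 1s appear, only 1s may follow, so the string is a run of `0` and `10` blocks followed by a run of 1s.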