Lexical Analysis

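Why separate lexical and syntax analyses?
- Simpler design: the parser sees tokens rather than characters, so white space, comments, and other character-level details are handled once, in the lexer.
- Efficiency: the lexer, which touches every input character, can use specialized techniques such as input buffering and table-driven DFAs.
- Portability: peculiarities of the input device and character set are confined to the lexer.
by Neng-Fa Zhou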

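Tokens, Patterns, Lexemes
- Tokens: the terminal symbols in the grammar
- Patterns: descriptions of the form that the lexemes of a token may take
- Lexemes: the words (character sequences) in the source program that match a token's pattern
For example, the lexeme count matches the pattern letter(letter|digit)* of the token id.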

Languages
- A fixed and finite alphabet (vocabulary)
- Finite-length sentences
- A possibly infinite number of sentences
Examples
- Natural numbers: {1, 2, 3, ..., 10, 11, ...}
- Strings over {a, b}: a^n b a^n
Terms on parts of a string
- prefix, suffix, substring, proper ....

Operations on Languages
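The operations on this slide are usually defined as follows (a summary assuming L and M are languages over the same alphabet); the examples that follow use each of them:

```latex
\begin{aligned}
L \cup M &= \{\, s \mid s \in L \text{ or } s \in M \,\} && \text{union}\\
LM &= \{\, st \mid s \in L,\ t \in M \,\} && \text{concatenation}\\
L^{0} &= \{\varepsilon\},\quad L^{i} = L^{i-1}L && \text{exponentiation}\\
L^{*} &= \textstyle\bigcup_{i \ge 0} L^{i} && \text{Kleene closure (zero or more)}\\
L^{+} &= \textstyle\bigcup_{i \ge 1} L^{i} && \text{positive closure (one or more)}
\end{aligned}
```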

Examples
L = {A, B, ..., Z, a, b, ..., z}, D = {0, 1, ..., 9}
- L ∪ D: the set of letters and digits
- LD: strings consisting of a letter followed by a digit
- L^4: four-letter strings
- L*: all strings of letters, including ε
- L(L ∪ D)*: strings of letters and digits beginning with a letter
- D+: strings of one or more digits

Regular Expressions (REs)
- ε is an RE.
- Each symbol in Σ is an RE.
- If r and s are REs, then so are:
  (r) | (s): alternation ("or")
  (r)(s): concatenation
  (r)*: zero or more instances
  (r)+: one or more instances
  (r)?: zero or one instance

Precedence of Operators
All operators are left associative. From highest to lowest precedence:
- r*, r+, r? (highest)
- rs (concatenation)
- r|s (lowest)
Examples, with Σ = {a, b}:
1. a|b
2. (a|b)(a|b)
3. a*
4. (a|b)*
5. a|a*b
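Applying these rules, the five example REs denote the languages below. Example 5 shows precedence at work: it groups as a|((a*)b), so it denotes a together with every string of zero or more a's followed by b.

```latex
\begin{aligned}
L(a|b) &= \{a, b\}\\
L((a|b)(a|b)) &= \{aa, ab, ba, bb\}\\
L(a^{*}) &= \{\varepsilon, a, aa, aaa, \dots\}\\
L((a|b)^{*}) &= \text{all strings of $a$'s and $b$'s, including } \varepsilon\\
L(a|a^{*}b) &= \{a\} \cup \{a^{n}b \mid n \ge 0\} = \{a, b, ab, aab, \dots\}
\end{aligned}
```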

Algebraic Properties of RE
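A commonly listed set of algebraic laws for REs r, s and t is given below; it is a standard summary and may not match the slide's own table line for line.

```latex
\begin{aligned}
r|s &= s|r && \text{($|$ is commutative)}\\
r|(s|t) &= (r|s)|t && \text{($|$ is associative)}\\
r(st) &= (rs)t && \text{(concatenation is associative)}\\
r(s|t) &= rs|rt,\quad (s|t)r = sr|tr && \text{(concatenation distributes over $|$)}\\
\varepsilon r &= r\varepsilon = r && \text{($\varepsilon$ is the identity for concatenation)}\\
r^{*} &= (r|\varepsilon)^{*} && \text{($\varepsilon$ is guaranteed in a closure)}\\
r^{**} &= r^{*} && \text{($^{*}$ is idempotent)}
\end{aligned}
```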

Regular Definitions
d1 → r1
d2 → r2
....
dn → rn
Each ri is an RE over Σ ∪ {d1, d2, ..., d(i-1)}, so each definition may use only names defined before it: regular definitions are not recursive.
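As an illustration (the names letter, digit, id, digits and number are chosen here, not taken from the slides), identifiers of the form L(L ∪ D)* and the number syntax of Example-2 can be written as regular definitions:

```latex
\begin{aligned}
\mathit{letter} &\to A \mid B \mid \cdots \mid Z \mid a \mid b \mid \cdots \mid z\\
\mathit{digit} &\to 0 \mid 1 \mid \cdots \mid 9\\
\mathit{id} &\to \mathit{letter}\,(\mathit{letter} \mid \mathit{digit})^{*}\\
\mathit{digits} &\to \mathit{digit}\ \mathit{digit}^{*}\\
\mathit{number} &\to \mathit{digits}\ \bigl(\texttt{.}\ \mathit{digits}\ \bigl((e \mid E)(\texttt{+} \mid \texttt{-})?\ \mathit{digits}\bigr)?\bigr)?
\end{aligned}
```

Each right-hand side uses only names defined on earlier lines, as the non-recursion rule requires.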

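Example-1 (counting lines and characters)
%{
#include <stdio.h>
/* %option noyywrap: stop at end of input (a yywrap() that returns 0
   would instead tell the scanner that more input follows) */
int num_lines = 0, num_chars = 0;
%}
%option noyywrap
%%
\n      { ++num_lines; ++num_chars; }
.       { ++num_chars; }
%%
int main(void)
{
    yylex();
    printf("# of lines = %d, # of chars = %d\n", num_lines, num_chars);
    return 0;
}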

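Example-2 (recognizing numbers)
%{
#include <stdio.h>
%}
%option noyywrap
D       [0-9]
INT     {D}{D}*
%%
{INT}("."{INT}((e|E)("+"|"-")?{INT})?)?   { printf("valid %s\n", yytext); }
[ \t\n]+                                   { /* skip white space */ }
.                                          { printf("unrecognized %s\n", yytext); }
%%
int main(int argc, char *argv[])
{
    ++argv, --argc;                        /* skip the program name */
    yyin = (argc > 0) ? fopen(argv[0], "r") : stdin;
    if (yyin == NULL) { perror(argv[0]); return 1; }
    yylex();
    return 0;
}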

java.util.regex
import java.util.regex.*;

class Number {
    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: java Number <string>");
            return;
        }
        String regExNum = "\\d+(\\.\\d+((e|E)(\\+|-)?\\d+)?)?";
        // Pattern.matches succeeds only if the whole string matches
        if (Pattern.matches(regExNum, args[0]))
            System.out.println("valid");
        else
            System.out.println("invalid");
    }
}

String Pattern Matching in Perl
print "Input a string: ";
$_ = <STDIN>;
chomp($_);
if (/^[0-9]+(\.[0-9]+((e|E)(\+|-)?[0-9]+)?)?$/) {
    print "valid\n";
} else {
    print "invalid\n";
}

Finite Automata
Nondeterministic finite automaton (NFA): NFA = (S, Σ, T, s0, F)
- S: a set of states
- Σ: a set of input symbols
- T: a transition mapping from a state and an input symbol (or ε) to a set of states
- s0: the start state
- F: the set of final (accepting) states

Example

Deterministic Finite Automata (DFA)
A DFA is an NFA in which T is a transition function: there are no ε-transitions, and only one arc goes out from each node on each symbol.

Simulating a DFA
s = s0;
c = nextchar();
while (c != eof) {
    s = move(s, c);
    if (s == error_s) break;    /* no transition on c: reject */
    c = nextchar();             /* advance to the next input symbol */
}
if (s is in F) return "yes"; else return "no";

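From RE to NFA
- ε: a new start state i with an ε-arc to a new final state f
- a in Σ: a new start state i with an arc labeled a to a new final state f
- s|t: a new start state i with ε-arcs to the start states of N(s) and N(t), and ε-arcs from their final states to a new final state f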

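From RE to NFA (cont.)
- st: the final state of N(s) is merged with the start state of N(t)
- s*: a new start state i and a new final state f, with ε-arcs from i to the start of N(s), from i to f (zero repetitions), from the final state of N(s) back to its start (more repetitions), and from the final state of N(s) to f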

Example: (a|b)*a

Building a Lexical Analyzer
RE → NFA: Algorithm 3.23 (Thompson's construction)
NFA → DFA: Algorithm 3.32 (subset construction)
The DFA is then run by an emulator.

Conversion of an NFA into a DFA
Intuition:
- In a DFA, move(s, a) is a function: it yields a single state.
- In an NFA, move(s, a) is a mapping: it yields a set of states.
- Each DFA state reachable from s0 on an input string corresponds to the set of NFA states reachable on the same string.

Computation of ε-Closure
ε-closure(T): the set of NFA states reachable from some state in T on ε-transitions alone (including the states of T themselves).

From an NFA to a DFA (the subset construction)

Example: NFA and DFA

Algorithm 3.39 (minimizing the number of states of a DFA)
P = {F, S-F};
repeat
    P0 = P;
    for each group G in P0 do begin
        partition G into subgroups such that two states s and t of G are in
        the same subgroup iff for all input symbols a, s and t have
        transitions on a to states in the same group of P0;
        replace G in P by the set of all subgroups formed;
    end
until P == P0;

Example: minimized DFA (states A and C merged into AC)
State | a | b
AC    | B | AC
B     | B | D
D     | B | E
E     | B | AC

Construct a DFA Directly from a Regular Expression

Implementation Issues: Input Buffering
- Read in characters one by one: unable to look ahead, and inefficient
- Read in a whole string and store it in memory: requires a big buffer
- Buffer pairs

Buffer Pairs

Use Sentinels

Lexical Analyzer

Lex: a tool for automatically generating lexical analyzers

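Lex Specifications
declarations
%%
translation rules
%%
auxiliary procedures

The translation rules have the form
p1 {action1}
p2 {action2}
...
pn {actionn}
where each pi is a regular expression and each actioni is a program fragment executed when pi matches.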

Lex Regular Expressions

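yylex()
yylex() {
    switch (pattern_match()) {   /* number of the rule whose pattern matched */
        case 1: {action1}
        ...
        case n: {actionn}
    }
}
pattern_match() finds the longest prefix of the remaining input that matches some pattern; if several patterns match that longest prefix, the rule listed first wins.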

Example
%{
#include <stdio.h>
#include <stdlib.h>   /* atoi, atof */
%}
%option noyywrap
DIGIT    [0-9]
ID       [a-z][a-z0-9]*
%%
{DIGIT}+                              { printf("An integer: %s (%d)\n", yytext, atoi(yytext)); }
{DIGIT}+"."{DIGIT}*                   { printf("A float: %s (%g)\n", yytext, atof(yytext)); }
if|then|begin|end|procedure|function  { printf("A keyword: %s\n", yytext); }
{ID}                                  { printf("An identifier: %s\n", yytext); }
"+"|"-"|"*"|"/"                       { printf("An operator: %s\n", yytext); }
"{"[^}\n]*"}"                         { /* eat up one-line comments */ }
[ \t\n]+                              { /* eat up white space */ }
.                                     { printf("Unrecognized character: %s\n", yytext); }
%%
int main(int argc, char *argv[])
{
    ++argv, --argc;                   /* skip over the program name */
    yyin = (argc > 0) ? fopen(argv[0], "r") : stdin;
    if (yyin == NULL) { perror(argv[0]); return 1; }
    yylex();
    return 0;
}