Published by Ευτύχιος Ἀδάμ Γερμανού. Modified over 6 years ago.
1
Lexical Analysis
Why separate lexical and syntax analyses?
Simpler design, efficiency, and portability.
by Neng-Fa Zhou
2
Tokens, Patterns, Lexemes
Tokens: terminal symbols in the grammar
Patterns: descriptions of a class of tokens (the set of strings a token may be spelled as)
Lexemes: words in the source program, each matching the pattern of some token
3
Languages
A language has a fixed and finite alphabet (vocabulary) and consists of finite-length sentences; the number of sentences may be infinite.
Examples: the natural numbers {1,2,3,...,10,11,...}; the strings a^n b a^n over {a,b}.
Terms on parts of a string: prefix, suffix, substring, proper ....
4
Operations on Languages
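For reference, the usual definitions of these operations on languages L and M (union, concatenation, Kleene closure, positive closure) are:

```latex
\begin{aligned}
L \cup M &= \{\, s \mid s \in L \text{ or } s \in M \,\} && \text{(union)}\\
LM &= \{\, st \mid s \in L \text{ and } t \in M \,\} && \text{(concatenation)}\\
L^{*} &= \textstyle\bigcup_{i=0}^{\infty} L^{i}, \quad L^{0} = \{\varepsilon\},\ L^{i} = L^{i-1}L && \text{(Kleene closure)}\\
L^{+} &= \textstyle\bigcup_{i=1}^{\infty} L^{i} && \text{(positive closure)}
\end{aligned}
```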
5
Examples: L = {A,B,...,Z,a,b,...,z}, D = {0,1,...,9}
L ∪ D : the set of letters and digits
LD : strings consisting of a letter followed by a digit
L^4 : four-letter strings
L* : all strings of letters, including ε
L(L ∪ D)* : strings of letters and digits beginning with a letter
D+ : strings of one or more digits
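A minimal C sketch of membership tests for three of these languages. It assumes the C locale, where isalpha() accepts exactly the letters in L and isdigit() exactly the digits in D; the function names are our own.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* LD: exactly one letter followed by exactly one digit. */
bool in_LD(const char *s)
{
    return strlen(s) == 2 && isalpha((unsigned char)s[0])
                          && isdigit((unsigned char)s[1]);
}

/* L^4: exactly four letters. */
bool in_L4(const char *s)
{
    if (strlen(s) != 4)
        return false;
    for (int i = 0; i < 4; i++)
        if (!isalpha((unsigned char)s[i]))
            return false;
    return true;
}

/* D+: one or more digits, so the empty string is excluded. */
bool in_Dplus(const char *s)
{
    if (*s == '\0')
        return false;
    for (; *s != '\0'; s++)
        if (!isdigit((unsigned char)*s))
            return false;
    return true;
}
```

The difference between D* and D+ shows up only on the empty string, which is why in_Dplus rejects it explicitly.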
6
Regular Expression (RE)
ε is a RE, and each symbol in Σ is a RE.
Let r and s be REs. Then the following are REs:
(r) | (s) : or
(r)(s) : concatenation
(r)* : zero or more instances
(r)+ : one or more instances
(r)? : zero or one instance
7
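Precedence of Operators
From high to low: r*, r+, r? (highest); rs (concatenation); r|s (lowest). All operators are left associative.
Examples over Σ = {a,b}:
1. a|b denotes {a, b}
2. (a|b)(a|b) denotes {aa, ab, ba, bb}
3. a* denotes {ε, a, aa, aaa, ...}
4. (a|b)* denotes all strings of a's and b's, including ε
5. a|a*b groups as a|(a*b) and denotes {a, b, ab, aab, ...}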
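The precedence rules can be checked with the POSIX <regex.h> interface (available on Unix-like systems; it is not part of ISO C). The helper name re_match is our own; each pattern is wrapped in ^(...)$ only so that the whole string must match.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if s matches the POSIX extended RE pattern, 0 if it does not,
   and -1 if the pattern fails to compile. */
int re_match(const char *pattern, const char *s)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, s, 0, NULL, 0);   /* no submatch positions needed */
    regfree(&re);
    return rc == 0;
}
```

For example, re_match("^(a|a*b)$", "aa") returns 0, because a|a*b groups as a|(a*b); forcing the other grouping with "^((a|a*)b)$" rejects "a" instead.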
8
Algebraic Properties of RE
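The algebraic laws for REs are commonly listed as follows, where r = s means that r and s denote the same language:

```latex
\begin{aligned}
r|s &= s|r && \text{($|$ is commutative)}\\
r|(s|t) &= (r|s)|t && \text{($|$ is associative)}\\
r(st) &= (rs)t && \text{(concatenation is associative)}\\
r(s|t) &= rs|rt, \quad (s|t)r = sr|tr && \text{(concatenation distributes over $|$)}\\
\varepsilon r &= r\varepsilon = r && \text{($\varepsilon$ is the identity for concatenation)}\\
r^{*} &= (r|\varepsilon)^{*} && \text{($\varepsilon$ is guaranteed in a closure)}\\
r^{**} &= r^{*} && \text{($*$ is idempotent)}
\end{aligned}
```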
9
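Regular Definitions
d1 → r1
d2 → r2
...
dn → rn
Each di is a distinct name, and each ri is a RE over Σ ∪ {d1, d2, ..., di-1}. Because ri may refer only to names defined before it, the definitions are not recursive.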
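Because regular definitions are not recursive, they can be expanded textually into a single RE. A sketch of this idea using C string-literal concatenation; the names LETTER, DIGIT, ID and is_id are our own, and the matching step again assumes POSIX <regex.h>.

```c
#include <regex.h>
#include <stddef.h>

/* Regular definitions as string macros. Each name uses only names
   defined above it, so the preprocessor expands them textually. */
#define LETTER "[A-Za-z]"
#define DIGIT  "[0-9]"
#define ID     LETTER "(" LETTER "|" DIGIT ")*"   /* "[A-Za-z]([A-Za-z]|[0-9])*" */

/* Returns 1 if s is an ID, 0 if not, -1 if the pattern fails to compile. */
int is_id(const char *s)
{
    regex_t re;
    int rc;

    if (regcomp(&re, "^" ID "$", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Lex's definitions section works the same way: in Example-2, INT is written in terms of the earlier name D.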
10
Example-1
%{
#include <stdio.h>
int num_lines = 0, num_chars = 0;
%}
%%
\n      ++num_lines; ++num_chars;
.       ++num_chars;
%%
int main(void)
{
    yylex();
    printf("# of lines = %d, # of chars = %d\n", num_lines, num_chars);
    return 0;
}
int yywrap(void) { return 1; }   /* 1: no more input after end of file */
11
Example-2
%{
#include <stdio.h>
%}
D       [0-9]
INT     {D}{D}*
%%
{INT}("."{INT}((e|E)("+"|-)?{INT})?)?   { printf("valid %s\n", yytext); }
.                                       { printf("unrecognized %s\n", yytext); }
%%
int main(int argc, char *argv[])
{
    ++argv, --argc;                     /* skip over the program name */
    if (argc > 0)
        yyin = fopen(argv[0], "r");
    else
        yyin = stdin;
    if (yyin == NULL) { perror(argv[0]); return 1; }
    yylex();
    return 0;
}
int yywrap(void) { return 1; }
12
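java.util.regex
import java.util.regex.*;

class Number {
    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: java Number <string>");
            return;
        }
        // digits, optionally followed by a fraction and an optional exponent
        String regExNum = "\\d+(\\.\\d+((e|E)(\\+|-)?\\d+)?)?";
        if (Pattern.matches(regExNum, args[0]))   // must match the whole string
            System.out.println("valid");
        else
            System.out.println("invalid");
    }
}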
13
String Pattern Matching in Perl
print "Input a string :";
$_ = <STDIN>;
chomp($_);
if (/^[0-9]+(\.[0-9]+((e|E)(\+|-)?[0-9]+)?)?$/) {
    print "valid\n";
} else {
    print "invalid\n";
}
14
Finite Automata
Nondeterministic finite automaton (NFA): NFA = (S, Σ, T, s0, F), where
S: a set of states
Σ: a set of input symbols
T: a transition mapping from (state, symbol or ε) pairs to sets of states
s0: the start state
F: the set of final (accepting) states
15
Example
16
Deterministic Finite Automata (DFA)
A DFA is a special case of an NFA in which T is a transition function: there are no ε-transitions, and each state has only one arc going out on each symbol.
17
Simulating a DFA
s = s0;
c = nextchar;
while (c != eof) {
    s = move(s, c);
    if (s == error_s) break;
    c = nextchar;
}
if (s is in F) return "yes";
else return "no";
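A table-driven version of the loop above, as a sketch. The DFA (its state numbers and character classes are our own) recognizes the unsigned numbers of the Java and Perl examples, digits(.digits((e|E)(+|-)?digits)?)?, and the end of the C string plays the role of eof.

```c
#include <stdbool.h>

enum { ERR = -1 };                          /* the error state */
enum { C_DIGIT, C_DOT, C_EXP, C_SIGN, C_OTHER, NCLASSES };

/* Map a character to a column of the transition table. */
static int char_class(char c)
{
    if (c >= '0' && c <= '9') return C_DIGIT;
    if (c == '.') return C_DOT;
    if (c == 'e' || c == 'E') return C_EXP;
    if (c == '+' || c == '-') return C_SIGN;
    return C_OTHER;
}

/* move_table[s][class] for digits(.digits((e|E)(+|-)?digits)?)? */
static const int move_table[7][NCLASSES] = {
    /*        digit  .    e/E  +/-  other */
    /* 0 */ {  1,   ERR, ERR, ERR, ERR },   /* start */
    /* 1 */ {  1,    2,  ERR, ERR, ERR },   /* integer part (accepting) */
    /* 2 */ {  3,   ERR, ERR, ERR, ERR },   /* just read '.' */
    /* 3 */ {  3,   ERR,  4,  ERR, ERR },   /* fraction (accepting) */
    /* 4 */ {  6,   ERR, ERR,  5,  ERR },   /* just read e or E */
    /* 5 */ {  6,   ERR, ERR, ERR, ERR },   /* just read the exponent sign */
    /* 6 */ {  6,   ERR, ERR, ERR, ERR },   /* exponent digits (accepting) */
};
static const bool accepting[7] = { false, true, false, true, false, false, true };

bool dfa_accepts(const char *input)
{
    int s = 0;                              /* s = s0 */
    for (; *input != '\0'; input++) {       /* '\0' plays the role of eof */
        s = move_table[s][char_class(*input)];
        if (s == ERR)
            return false;                   /* s == error_s: reject */
    }
    return accepting[s];                    /* is s in F? */
}
```

Keeping move as a table rather than a chain of if statements is what lets a generator such as Lex emit the same driver loop for every specification and vary only the table.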
18
From RE to NFA: the cases ε, a (for a in Σ), and s|t
19
From RE to NFA (cont.): the cases st and s*
20
Example (a|b)*a
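Thompson's construction can be sketched in C as one fragment builder per case (a symbol, s|t, st, s*), plus an ε-closure-based simulation to check the result. The representation is our own choice: state sets are 64-bit masks (so at most 64 states), and concatenation adds an ε-edge instead of merging two states, which accepts the same language.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_STATES 64   /* state sets are stored as 64-bit masks */
#define EPS (-1)        /* edge label meaning epsilon */

/* In Thompson's construction each state has at most two outgoing edges. */
struct state { int label[2], to[2], nout; };
struct frag { int start, accept; };   /* one start state, one accepting state */
static struct state st[MAX_STATES];
static int nstates;
static void edge(int from, int label, int to)
{
    st[from].label[st[from].nout] = label;
    st[from].to[st[from].nout++] = to;
}

/* Allocate a fragment with two fresh states and no edges yet. */
static struct frag fresh(void)
{
    struct frag f = { nstates, nstates + 1 };
    st[f.start].nout = st[f.accept].nout = 0;
    nstates += 2;
    return f;
}

/* Case a: start --a--> accept. */
static struct frag sym(int c)
{
    struct frag f = fresh();
    edge(f.start, c, f.accept);
    return f;
}
/* Case s|t: branch into s or t, then rejoin at a new accepting state. */
static struct frag alt(struct frag s, struct frag t)
{
    struct frag f = fresh();
    edge(f.start, EPS, s.start);
    edge(f.start, EPS, t.start);
    edge(s.accept, EPS, f.accept);
    edge(t.accept, EPS, f.accept);
    return f;
}
/* Case st: an epsilon edge joins s's accepting state to t's start state. */
static struct frag cat(struct frag s, struct frag t)
{
    edge(s.accept, EPS, t.start);
    return (struct frag){ s.start, t.accept };
}
/* Case s*: skip s entirely, or loop back to repeat it. */
static struct frag star(struct frag s)
{
    struct frag f = fresh();
    edge(f.start, EPS, s.start);
    edge(f.start, EPS, f.accept);
    edge(s.accept, EPS, s.start);
    edge(s.accept, EPS, f.accept);
    return f;
}

/* epsilon-closure: keep adding epsilon successors until nothing changes. */
static uint64_t closure(uint64_t set)
{
    uint64_t old;
    do {
        old = set;
        for (int s = 0; s < nstates; s++)
            for (int k = 0; (set >> s & 1) && k < st[s].nout; k++)
                if (st[s].label[k] == EPS)
                    set |= (uint64_t)1 << st[s].to[k];
    } while (set != old);
    return set;
}

/* Simulate the NFA by tracking the set of states reachable so far. */
bool nfa_accepts(struct frag n, const char *in)
{
    uint64_t cur = closure((uint64_t)1 << n.start);
    for (; *in != '\0'; in++) {
        uint64_t next = 0;
        for (int s = 0; s < nstates; s++)
            for (int k = 0; (cur >> s & 1) && k < st[s].nout; k++)
                if (st[s].label[k] == (unsigned char)*in)
                    next |= (uint64_t)1 << st[s].to[k];
        cur = closure(next);
    }
    return cur >> n.accept & 1;
}

/* Build (a|b)*a bottom-up, one operator at a time. */
struct frag build_example(void)
{
    nstates = 0;
    struct frag a = sym('a');
    struct frag b = sym('b');
    struct frag loop = star(alt(a, b));
    return cat(loop, sym('a'));
}
```

With this numbering, build_example() allocates 10 states, and nfa_accepts(n, "ba") holds while nfa_accepts(n, "ab") does not.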
21
Building Lexical Analyzer
RE → NFA: Algorithm 3.23 (Thompson's construction)
NFA → DFA: Algorithm 3.32 (subset construction)
DFA → emulator
22
Conversion of an NFA into a DFA
Intuition: in a DFA, move(s,a) is a function (one next state); in an NFA, move(s,a) is a mapping to a set of states. A state reachable from s0 in the DFA on an input string corresponds to the set of NFA states that are reachable on the same string.
23
Computation of ε-Closure
ε-closure(T): the set of NFA states reachable from some state in T by ε-transitions alone (including the states of T themselves).
24
From an NFA to a DFA (The subset construction)
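A sketch of the subset construction for (a|b)*a. The NFA is written out by hand as an edge list; its numbering is hypothetical (it is what a Thompson-style builder that joins concatenated fragments with an ε-edge would produce), and each DFA state is stored as a bitmask of NFA states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical NFA for (a|b)*a: {from, label, to}, where label 0 marks an
   epsilon edge. Start state 6, accepting state 9. */
struct nfa_edge { int from; char label; int to; };
static const struct nfa_edge edges[] = {
    {0, 'a', 1}, {2, 'b', 3}, {4, 0, 0}, {4, 0, 2}, {1, 0, 5}, {3, 0, 5},
    {6, 0, 4},   {6, 0, 7},   {5, 0, 4}, {5, 0, 7}, {7, 0, 8}, {8, 'a', 9},
};
#define NEDGES ((int)(sizeof edges / sizeof edges[0]))
#define START 6
#define ACCEPT 9
static const char alphabet[] = "ab";
#define NSYMS ((int)sizeof alphabet - 1)

/* epsilon-closure(T): add epsilon successors until nothing changes. */
static uint32_t eps_closure(uint32_t set)
{
    uint32_t old;
    do {
        old = set;
        for (int i = 0; i < NEDGES; i++)
            if (edges[i].label == 0 && (set >> edges[i].from & 1))
                set |= (uint32_t)1 << edges[i].to;
    } while (set != old);
    return set;
}

/* move(T, c): NFA states reachable from T on one edge labeled c. */
static uint32_t move(uint32_t set, char c)
{
    uint32_t out = 0;
    for (int i = 0; i < NEDGES; i++)
        if (edges[i].label == c && (set >> edges[i].from & 1))
            out |= (uint32_t)1 << edges[i].to;
    return out;
}

#define MAX_DSTATES 32
uint32_t dstates[MAX_DSTATES];   /* DFA state d is the NFA-state set dstates[d] */
int dtran[MAX_DSTATES][NSYMS];   /* dtran[d][k]: successor of d on alphabet[k] */
int ndstates;

/* Returns the number of DFA states, or -1 if the table overflows.
   States with index >= d are still unmarked; an empty set, if it arose,
   would simply become a dead state. */
int subset_construction(void)
{
    ndstates = 0;
    dstates[ndstates++] = eps_closure((uint32_t)1 << START);
    for (int d = 0; d < ndstates; d++) {            /* mark state d */
        for (int k = 0; k < NSYMS; k++) {
            uint32_t u = eps_closure(move(dstates[d], alphabet[k]));
            int j = 0;
            while (j < ndstates && dstates[j] != u)
                j++;
            if (j == ndstates) {                    /* u is a new DFA state */
                if (ndstates == MAX_DSTATES)
                    return -1;
                dstates[ndstates++] = u;
            }
            dtran[d][k] = j;
        }
    }
    return ndstates;
}

/* A DFA state accepts iff its set contains the NFA's accepting state. */
bool dfa_accepting(int d)
{
    return dstates[d] >> ACCEPT & 1;
}
```

On this NFA the construction finds three DFA states. The start state and the state reached on b are both non-accepting and have identical transitions, so state minimization would merge them.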
25
Example: NFA and DFA
26
Algorithm 3.39 (minimizing the number of states of a DFA)
P = {F, S-F};
repeat
    P0 = P;
    for each group G in P0 do begin
        partition G into subgroups such that two states s and t of G are in the same subgroup iff, for all input symbols a, s and t have transitions on a to states in the same group of P0;
        replace G in P by the set of all subgroups formed;
    end
until P == P0;
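A sketch of this partition refinement in C. The five-state input DFA (states A to E, symbols a and b) is hypothetical, chosen so that the result has the shape of the AC/B/D/E example table; new subgroups are formed by comparing each state with earlier states of the same group.

```c
#include <stdbool.h>

#define NS 5     /* DFA states A..E are numbered 0..4 */
#define NSYM 2   /* input symbols a and b are numbered 0 and 1 */

/* Hypothetical input DFA; E is the only accepting state. */
static const int delta[NS][NSYM] = {
    /* A */ {1, 2},
    /* B */ {1, 3},
    /* C */ {1, 2},
    /* D */ {1, 4},
    /* E */ {1, 2},
};
static const bool accepting[NS] = { false, false, false, false, true };

int group[NS];   /* group[s]: number of the group of P containing state s */

/* Refine P until no group splits; returns the number of groups. */
int minimize(void)
{
    int count = 0;
    for (int s = 0; s < NS; s++)
        group[s] = accepting[s];         /* P = {F, S-F} */
    for (;;) {
        int next[NS], n = 0;
        for (int s = 0; s < NS; s++) {
            next[s] = -1;
            /* s joins the subgroup of an earlier state t of the same group
               whose transitions on every symbol reach the same groups. */
            for (int t = 0; t < s && next[s] < 0; t++) {
                bool same = group[t] == group[s];
                for (int a = 0; a < NSYM && same; a++)
                    same = group[delta[t][a]] == group[delta[s][a]];
                if (same)
                    next[s] = next[t];
            }
            if (next[s] < 0)
                next[s] = n++;           /* first state of a new subgroup */
        }
        for (int s = 0; s < NS; s++)
            group[s] = next[s];
        if (n == count)                  /* group count unchanged: P == P0 */
            return n;
        count = n;
    }
}
```

Because each round only splits groups, comparing the number of groups is enough to detect P == P0. Here minimize() returns 4, with A and C in the same group.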
27
Example (transition table of the minimized DFA)
State   a    b
AC      B    AC
B       B    D
D       B    E
E       B    AC
28
Construct a DFA Directly from a Regular Expression
29
Implementation Issues
Input buffering:
Read in characters one by one: unable to look ahead, and inefficient.
Read in a whole string and store it in memory: requires a big buffer.
Buffer pairs.
30
Buffer Pairs
31
Use Sentinels
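A minimal sketch of a buffer pair with sentinels. It assumes the input is a C string rather than a file and that '\0' never occurs in the source text, so '\0' can serve as the sentinel. The halves are tiny (N = 4) so they are refilled often; a real lexer would also keep a lexemeBegin pointer and must not refill the half it points into.

```c
#include <stdio.h>    /* EOF */
#include <string.h>   /* memcpy */

#define N 4              /* size of each buffer half */
#define SENTINEL '\0'    /* assumption: '\0' never occurs in the source text */

/* buf[0..N-1] is the first half, buf[N] its sentinel slot;
   buf[N+1..2N] is the second half, buf[2N+1] its sentinel slot. */
static char buf[2 * N + 2];
static const char *src;  /* hypothetical input source: a C string, not a file */
static char *forward;    /* lookahead pointer */

/* Load up to N characters into one half and terminate it with the sentinel. */
static void fill(char *half)
{
    size_t n = 0;
    while (n < N && src[n] != '\0')
        n++;
    memcpy(half, src, n);
    src += n;
    half[n] = SENTINEL;
}

void init_buffer(const char *text)
{
    src = text;
    fill(buf);
    forward = buf;
}

/* Return the next character or EOF. Only after reading a sentinel do we
   check which case applies: end of the first half, end of the second
   half, or the real end of the input. */
int next_char(void)
{
    for (;;) {
        char c = *forward++;
        if (c != SENTINEL)
            return (unsigned char)c;
        if (forward == buf + N + 1) {            /* end of first half */
            fill(buf + N + 1);                   /* forward already points at half 2 */
        } else if (forward == buf + 2 * N + 2) { /* end of second half */
            fill(buf);
            forward = buf;
        } else {                                 /* sentinel inside a half: end of input */
            forward--;                           /* later calls keep returning EOF */
            return EOF;
        }
    }
}
```

The point of the sentinel is that the common case costs one comparison per character; the boundary tests run only when a sentinel is actually read.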
32
Lexical Analyzer
33
Lex: a tool for automatically generating lexical analyzers
34
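Lex Specifications
declarations
%%
translation rules
%%
auxiliary procedures
The translation rules have the form
p1 {action1}
p2 {action2}
...
pn {actionn}
where each pi is a regular expression and each actioni is the code to run when pi matches.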
35
Lex Regular Expressions
36
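yylex()
yylex() {
    /* pattern_match() picks the rule whose pattern matches the longest
       prefix of the remaining input; ties go to the earliest rule */
    switch (pattern_match()) {
    case 1: {action1}
    ...
    case n: {actionn}
    }
}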
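A sketch of the rule-selection step, with three hypothetical rules (the keyword if, identifiers [a-z][a-z0-9]*, and digit strings), each reporting how long a prefix it matches. The C locale is assumed so that islower() accepts exactly a-z. The longest match wins, and on a tie the earlier rule wins, which is why keyword rules are listed before {ID} in Lex specifications.

```c
#include <ctype.h>
#include <stddef.h>

/* Each hypothetical rule returns the length of the longest prefix of s it
   matches, or 0 if it does not match at all. */
typedef size_t (*rule_fn)(const char *s);

static size_t match_if(const char *s)            /* if */
{
    return (s[0] == 'i' && s[1] == 'f') ? 2 : 0;
}

static size_t match_id(const char *s)            /* [a-z][a-z0-9]* */
{
    size_t n = 0;
    if (!islower((unsigned char)s[0]))
        return 0;
    do
        n++;
    while (islower((unsigned char)s[n]) || isdigit((unsigned char)s[n]));
    return n;
}

static size_t match_num(const char *s)           /* [0-9]+ */
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Rules in the order they appear in the specification. */
static const rule_fn rules[] = { match_if, match_id, match_num };

/* Return the 1-based number of the winning rule and store the lexeme
   length in *len; return 0 if no rule matches. */
int pattern_match(const char *s, size_t *len)
{
    int best = 0;
    size_t best_len = 0;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i](s);
        if (n > best_len) {      /* strictly longer wins; a tie keeps the earlier rule */
            best_len = n;
            best = (int)i + 1;
        }
    }
    *len = best_len;
    return best;
}
```

On "if x" the keyword rule and the identifier rule both match two characters and rule 1 wins; on "iffy" the identifier rule matches four characters and wins.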
37
Example
%{
#include <stdio.h>
#include <stdlib.h>   /* atoi, atof */
%}
DIGIT    [0-9]
ID       [a-z][a-z0-9]*
%%
{DIGIT}+                                { printf("An integer: %s (%d)\n", yytext, atoi(yytext)); }
{DIGIT}+"."{DIGIT}*                     { printf("A float: %s (%g)\n", yytext, atof(yytext)); }
if|then|begin|end|procedure|function    { printf("A keyword: %s\n", yytext); }
{ID}                                    { printf("An identifier: %s\n", yytext); }
"+"|"-"|"*"|"/"                         { printf("An operator: %s\n", yytext); }
"{"[^}\n]*"}"                           { /* eat up one-line comments */ }
[ \t\n]                                 { /* eat up white space */ }
.                                       { printf("Unrecognized character: %s\n", yytext); }
%%
int main(int argc, char *argv[])
{
    ++argv, --argc;                     /* skip over the program name */
    if (argc > 0)
        yyin = fopen(argv[0], "r");
    else
        yyin = stdin;
    if (yyin == NULL) { perror(argv[0]); return 1; }
    yylex();
    return 0;
}
int yywrap(void) { return 1; }