Lexical Analysis, by Neng-Fa Zhou


1 Lexical Analysis
• Why separate lexical and syntax analyses?
  – simpler design
  – efficiency
  – portability

2 Tokens, Patterns, Lexemes
  – Tokens: terminal symbols in the grammar
  – Patterns: descriptions of classes of tokens
  – Lexemes: words in the source program
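
The three terms can be illustrated with a small Python sketch. It is not part of the original slides; the token names (id, num, assign, plus) and their patterns are assumptions chosen for the example. Each pattern describes a class of tokens, and each matched piece of the source text is a lexeme.

```python
import re

# Hypothetical token classes: (token name, pattern describing the class).
PATTERNS = [
    ("id", r"[A-Za-z][A-Za-z0-9]*"),
    ("num", r"[0-9]+"),
    ("assign", r"="),
    ("plus", r"\+"),
]

# One alternation with a named group per token class.
SCANNER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in PATTERNS))

def lexemes(source):
    """Return (token, lexeme) pairs; characters matching no pattern are skipped."""
    return [(m.lastgroup, m.group()) for m in SCANNER.finditer(source)]

pairs = lexemes("count = count + 42")
```

Here the lexeme "count" appears twice and maps to the same token, id, both times.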

3 Languages
  – Fixed and finite alphabet (vocabulary)
  – Finite-length sentences
  – Possibly infinite number of sentences
• Examples
  – Natural numbers {1, 2, 3, ..., 10, 11, ...}
  – Strings over {a, b} of the form a^n b a^n
• Terms for parts of a string
  – prefix, suffix, substring, proper....

4 Operations on Languages

5 Examples
L = {A, B, ..., Z, a, b, ..., z}
D = {0, 1, ..., 9}
  – L ∪ D: the set of letters and digits
  – LD: strings consisting of a letter followed by a digit
  – L^4: four-letter strings
  – L*: all strings of letters, including ε
  – L(L ∪ D)*: strings of letters and digits beginning with a letter
  – D+: strings of one or more digits
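
These operations can be tried out on small finite sets. The Python sketch below is an illustration only: it shrinks the alphabets to two symbols each so the results stay small, and the helper names (concat, power, star_up_to) are hypothetical. Because L* is infinite, star_up_to only builds a bounded approximation.

```python
def concat(X, Y):
    """Concatenation XY: every string of X followed by every string of Y."""
    return {x + y for x in X for y in Y}

def power(X, n):
    """X^n: concatenation of X with itself n times (X^0 contains only the empty string)."""
    result = {""}
    for _ in range(n):
        result = concat(result, X)
    return result

def star_up_to(X, k):
    """Finite approximation of X*: strings built from at most k elements of X."""
    result = set()
    for n in range(k + 1):
        result |= power(X, n)
    return result

L = {"a", "b"}   # stand-in for the letters
D = {"0", "1"}   # stand-in for the digits

union = L | D                                      # L ∪ D
letter_digit = concat(L, D)                        # LD
four_letters = power(L, 4)                         # L^4
identifiers = concat(L, star_up_to(L | D, 2))      # part of L(L ∪ D)*
```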

6 Regular Expression (RE)
• ε is a RE
• a symbol in Σ is a RE
• Let r and s be REs.
  – (r) | (s): or
  – (r)(s): concatenation
  – (r)*: zero or more instances
  – (r)+: one or more instances
  – (r)?: zero or one instance

7 Precedence of Operators
  high:  r*  r+  r?
         rs
  low:   r|s
  all left associative
• Examples, with Σ = {a, b}
  1. a|b
  2. (a|b)(a|b)
  3. a*
  4. (a|b)*
  5. a|a*b
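
The effect of precedence can be checked with Python's re module, which uses the same relative precedence for *, concatenation and |. This is a sketch for experimentation, not part of the original slides; the helper name matches is hypothetical.

```python
import re

def matches(regex, s):
    """True if the whole string s is in the language of regex."""
    return re.fullmatch(regex, s) is not None

# Example 5: a|a*b groups as a|((a*)b), because * binds tighter than
# concatenation, and concatenation binds tighter than |.
same_language = all(
    matches("a|a*b", s) == matches("a|((a*)b)", s)
    for s in ["", "a", "b", "ab", "aa", "aab", "ba"]
)

# Parentheses change the grouping: (a|a)*b requires a trailing b.
differs_on_a = matches("a|a*b", "a") and not matches("(a|a)*b", "a")
```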

8 Algebraic Properties of RE

9 Regular Definitions
  d_1 -> r_1
  d_2 -> r_2
  ....
  d_n -> r_n
• Each r_i is a RE over Σ ∪ {d_1, d_2, ..., d_{i-1}}
• not recursive

10 Examples
• Identifiers
    letter -> A | B | ... | Z | a | b | ... | z
    digit -> 0 | 1 | ... | 9
    id -> letter ( letter | digit )*
• Decimal integers in Java
    DecimalNumeral -> 0 | nonZeroDigit digit*
• Hexadecimal integers
    HexaNumeral -> (0x | 0X) hexadigit+
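
Regular definitions can be expanded mechanically into ordinary regular expressions by substituting each name with its already expanded body. The Python sketch below shows one way to do this. The definitions of nonZeroDigit and hexadigit are assumptions (the slide does not spell them out), the hexadecimal rule is assumed to require at least one digit, and the expand helper is hypothetical.

```python
import re

# Ordered list of (name, body); a body may refer to earlier names as {name}.
DEFINITIONS = [
    ("letter", "[A-Za-z]"),
    ("digit", "[0-9]"),
    ("nonZeroDigit", "[1-9]"),          # assumed definition
    ("hexadigit", "[0-9A-Fa-f]"),       # assumed definition
    ("id", "{letter}({letter}|{digit})*"),
    ("DecimalNumeral", "0|{nonZeroDigit}{digit}*"),
    ("HexaNumeral", "(0x|0X){hexadigit}+"),
]

def expand(definitions):
    """Substitute earlier definitions into later ones, in order."""
    expanded = {}
    for name, body in definitions:
        # Only names defined earlier are available, so a recursive or
        # forward reference raises KeyError instead of looping.
        expanded[name] = "(?:" + body.format(**expanded) + ")"
    return expanded

RE = expand(DEFINITIONS)

def is_a(name, s):
    return re.fullmatch(RE[name], s) is not None
```

For example, is_a("DecimalNumeral", "042") is false because a nonzero decimal numeral may not start with 0.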

11 Example-1
%{
int num_lines = 0, num_chars = 0;
%}
%%
\n      ++num_lines; ++num_chars;
.       ++num_chars;
%%
int main(void) {
    yylex();
    printf("# of lines = %d, # of chars = %d\n", num_lines, num_chars);
    return 0;
}
/* Return 1 at end of file to signal that there is no more input. */
int yywrap(void) { return 1; }

12 Example-2
D       [0-9]
INT     {D}{D}*
%%
{INT}("."{INT}((e|E)("+"|-)?{INT})?)?   {printf("valid %s\n", yytext);}
.                                       {printf("unrecognized %s\n", yytext);}
%%
int main(int argc, char *argv[]) {
    ++argv, --argc;   /* skip over the program name */
    if (argc > 0) yyin = fopen(argv[0], "r");
    else yyin = stdin;
    yylex();
    return 0;
}
/* Return 1 at end of file to signal that there is no more input. */
int yywrap(void) { return 1; }

13 java.util.regex
import java.util.regex.*;

class Number {
    public static void main(String[] args) {
        // Digits, then an optional fraction with an optional exponent.
        String regExNum = "\\d+(\\.\\d+((e|E)(\\+|-)?\\d+)?)?";
        // Pattern.matches succeeds only if the entire input matches.
        if (Pattern.matches(regExNum, args[0]))
            System.out.println("valid");
        else
            System.out.println("invalid");
    }
}

14 String Pattern Matching in Perl
print "Input a string: ";
$_ = <STDIN>;    # read one line from standard input
chomp($_);
if (/^[0-9]+(\.[0-9]+((e|E)(\+|-)?[0-9]+)?)?$/) {
    print "valid\n";
} else {
    print "invalid\n";
}

15 Finite Automata
• Nondeterministic finite automaton (NFA)
  NFA = (S, T, s0, F)
  – S: a set of states
  – T: a transition mapping
  – s0: the start state
  – F: final states or accepting states

16 Example

17 Deterministic Finite Automata (DFA)
  – T: a transition function
  – There is only one arc going out from each node on each symbol.

18 Simulating a DFA
s = s0;
c = nextchar;
while (c != eof) {
    s = move(s, c);
    c = nextchar;
}
if (s is in F) return "yes";
else return "no";
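
The loop above translates almost line for line into Python. The sketch below is an illustration rather than the author's code: the two-state transition table is an assumed DFA for (a|b)*a, and a missing table entry is treated as rejection, a choice the pseudocode leaves open.

```python
def simulate_dfa(move, start, accepting, text):
    """Run the DFA over text and answer "yes" or "no" as in the pseudocode."""
    state = start
    for ch in text:
        state = move.get((state, ch))
        if state is None:          # no arc for this symbol: reject (assumption)
            return "no"
    return "yes" if state in accepting else "no"

# Assumed DFA for (a|b)*a: state 1 means "the last symbol read was a".
MOVE = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 0,
}
START, ACCEPTING = 0, {1}
```

Each input character costs one table lookup, so the running time is linear in the length of the input.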

19 From RE to NFA
  – ε
  – a in Σ
  – s|t

20 From RE to NFA (cont.)
  – st
  – s*
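
The five cases (ε, a symbol, s|t, st, s*) can be sketched in Python in the style of Thompson's construction. The diagrams from the slides are not reproduced in this transcript, so the code is a reconstruction under assumptions: the Frag class and all function names are hypothetical, and concatenation links the two fragments with an ε edge instead of merging states, which is a simplification.

```python
from itertools import count

EPS = None          # label used for ε-transitions in this sketch
_ids = count()      # source of fresh state numbers

class Frag:
    """An NFA fragment with one start state and one accepting state."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges

def sym(a):
    s, f = next(_ids), next(_ids)
    return Frag(s, f, [(s, a, f)])

def eps():
    return sym(EPS)

def alt(s, t):      # s|t: new start and accept states around both fragments
    i, f = next(_ids), next(_ids)
    new = [(i, EPS, s.start), (i, EPS, t.start), (s.accept, EPS, f), (t.accept, EPS, f)]
    return Frag(i, f, s.edges + t.edges + new)

def cat(s, t):      # st: ε edge from the end of s to the start of t
    return Frag(s.start, t.accept, s.edges + t.edges + [(s.accept, EPS, t.start)])

def star(s):        # s*: allow skipping s or repeating it
    i, f = next(_ids), next(_ids)
    new = [(i, EPS, s.start), (i, EPS, f), (s.accept, EPS, s.start), (s.accept, EPS, f)]
    return Frag(i, f, s.edges + new)

def closure(states, edges):
    """States reachable from `states` using ε edges only."""
    seen, stack = set(states), list(states)
    while stack:
        u = stack.pop()
        for src, label, dst in edges:
            if src == u and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(nfa, text):
    current = closure({nfa.start}, nfa.edges)
    for ch in text:
        moved = {dst for src, label, dst in nfa.edges if src in current and label == ch}
        current = closure(moved, nfa.edges)
    return nfa.accept in current

# (a|b)*a built bottom-up from the combinators.
NFA_EXAMPLE = cat(star(alt(sym("a"), sym("b"))), sym("a"))
```

Every combinator adds at most two states, so the NFA has a number of states linear in the size of the regular expression.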

21 Example (a|b)*a

22 Building a Lexical Analyzer
  RE -> NFA -> DFA -> Emulator
  – RE to NFA: Algorithm 3.23 (Thompson's construction)
  – NFA to DFA: Algorithm 3.32 (Subset construction)

23 Conversion of an NFA into a DFA
• Intuition
  – move(s,a) is a function in a DFA: it yields a single state
  – move(s,a) is a mapping in an NFA: it yields a set of states
• A state reachable from s0 in the DFA on an input string corresponds to the set of states in the NFA that are reachable on the same string.

24 Computation of ε-Closure
  – ε-Closure(T): the set of NFA states reachable from some NFA state s in T by ε-transitions alone.
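
The definition can be computed with a simple worklist. The Python sketch below assumes the NFA is stored as a dictionary from (state, label) to a set of successor states; that representation, the EPS marker, and the small example NFA are all assumptions made for illustration.

```python
EPS = "ε"   # label used for ε-transitions in this sketch

def eps_closure(transitions, T):
    """Return all states reachable from the states in T by ε-transitions alone."""
    closure = set(T)          # every state reaches itself with zero transitions
    stack = list(T)
    while stack:
        s = stack.pop()
        for t in transitions.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

# Hypothetical NFA fragment: 0 -ε-> 1, 0 -ε-> 3, 3 -ε-> 4, 1 -a-> 2, 4 -b-> 5.
NFA = {
    (0, EPS): {1, 3},
    (3, EPS): {4},
    (1, "a"): {2},
    (4, "b"): {5},
}
```

Each state is pushed on the stack at most once, so the computation terminates even if the ε-transitions form a cycle.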

25 From an NFA to a DFA (The subset construction)
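
A minimal Python sketch of the subset construction follows. The slide's own algorithm listing is not in this transcript, so this is the commonly taught worklist formulation, written under assumptions: the NFA is a dictionary from (state, label) to a set of states, each DFA state is a frozenset of NFA states, and the example NFA for (a|b)*a is made up for illustration.

```python
from collections import deque

EPS = "ε"   # label used for ε-transitions in this sketch

def eps_closure(nfa, states):
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def move(nfa, states, symbol):
    """NFA states reachable from any state in `states` on one `symbol` arc."""
    return {t for s in states for t in nfa.get((s, symbol), ())}

def subset_construction(nfa, start, accepting, alphabet):
    start_set = frozenset(eps_closure(nfa, {start}))
    dfa = {}                                # DFA state -> {symbol: DFA state}
    queue = deque([start_set])
    while queue:
        S = queue.popleft()
        if S in dfa:
            continue
        dfa[S] = {}
        for a in alphabet:
            U = frozenset(eps_closure(nfa, move(nfa, S, a)))
            if U:                           # skip the empty (dead) state
                dfa[S][a] = U
                if U not in dfa:
                    queue.append(U)
    dfa_accepting = {S for S in dfa if S & accepting}
    return start_set, dfa, dfa_accepting

def dfa_accepts(start, dfa, accepting, text):
    state = start
    for ch in text:
        state = dfa[state].get(ch)
        if state is None:
            return False
    return state in accepting

# Hypothetical NFA for (a|b)*a: 0 -ε-> 1, 1 loops on a and b, 1 -a-> 2.
NFA = {(0, EPS): {1}, (1, "a"): {1, 2}, (1, "b"): {1}}
START_SET, DFA, DFA_ACCEPTING = subset_construction(NFA, 0, {2}, "ab")
```

For this NFA the construction yields three DFA states: {0, 1}, {1, 2} and {1}, of which only {1, 2} is accepting.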

26 Example: from an NFA to a DFA

27 Algorithm 3.39
Π = {F, S-F};
do begin
    Π_new = Π;
    for each group G in Π do begin
        partition G into subgroups such that two states s and t of G are in the same subgroup iff for all input symbols a, s and t have transitions on a to states in the same group of Π;
        replace G in Π_new by the set of all subgroups formed;
    end
    if (Π_new = Π) return;
    Π = Π_new;
end;
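
The partition refinement can be sketched in Python as follows. This is an illustration under assumptions, not the author's code: the DFA is assumed to be total (a transition for every state and symbol), the function name minimize is hypothetical, and the five-state example DFA is assumed, chosen so that its result agrees with the AC, B, D, E groups of the example table.

```python
def minimize(states, alphabet, delta, accepting):
    """Split states into groups of equivalent states by repeated refinement."""
    accepting = set(accepting)
    # Initial partition {F, S-F}; drop a group if it is empty.
    partition = [g for g in (accepting, set(states) - accepting) if g]
    while True:
        group_of = {s: i for i, g in enumerate(partition) for s in g}
        new_partition = []
        for group in partition:
            subgroups = {}
            for s in sorted(group):
                # States stay together iff every symbol leads to the same group.
                signature = tuple(group_of[delta[s][a]] for a in alphabet)
                subgroups.setdefault(signature, set()).add(s)
            new_partition.extend(subgroups.values())
        # Refinement only splits groups, so an equal count means no change.
        if len(new_partition) == len(partition):
            return new_partition
        partition = new_partition

# Assumed DFA with states A..E and accepting state E.
DELTA = {
    "A": {"a": "B", "b": "C"},
    "B": {"a": "B", "b": "D"},
    "C": {"a": "B", "b": "C"},
    "D": {"a": "B", "b": "E"},
    "E": {"a": "B", "b": "C"},
}
GROUPS = minimize("ABCDE", "ab", DELTA, {"E"})
```

On this input the refinement first separates D from {A, B, C}, then B from {A, C}, and then stops because A and C agree on every symbol.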

28 Example
state   a   b
AC      B   AC
B       B   D
D       B   E
E       B   AC

29 Construct a DFA Directly from a Regular Expression

30 Implementation Issues
• Input buffering
  – Read in characters one by one
      Unable to look ahead
      Inefficient
  – Read in a whole string and store it in memory
      Requires a big buffer
  – Buffer pairs

31 Buffer Pairs

32 Use Sentinels
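
The figures for buffer pairs and sentinels are not in this transcript, so the Python sketch below shows one common way the scheme is described rather than the slide's exact version. Two halves of n characters are each followed by a sentinel slot, and the same sentinel also marks the real end of input. The class name, the choice of "\0" as sentinel, and the tiny half size are assumptions; lexeme start pointers and retraction are left out.

```python
import io

EOF = "\0"   # sentinel character; assumed never to occur in the source text

class TwoBufferReader:
    def __init__(self, stream, n=4):
        self.stream, self.n = stream, n
        # Layout: [half 0: n chars][sentinel][half 1: n chars][sentinel]
        self.buf = [EOF] * (2 * n + 2)
        self.forward = 0
        self._load(0)

    def _load(self, half):
        base = half * (self.n + 1)
        chunk = self.stream.read(self.n)
        for i, ch in enumerate(chunk):
            self.buf[base + i] = ch
        # A short chunk puts the sentinel inside the half: true end of input.
        self.buf[base + len(chunk)] = EOF

    def next_char(self):
        ch = self.buf[self.forward]
        if ch != EOF:                     # common case: a single comparison
            self.forward += 1
            return ch
        if self.forward == self.n:        # end of half 0: refill half 1
            self._load(1)
            self.forward = self.n + 1
        elif self.forward == 2 * self.n + 1:   # end of half 1: refill half 0
            self._load(0)
            self.forward = 0
        else:
            return None                   # sentinel inside a half: end of input
        return self.next_char()

def read_all(text, n=4):
    reader = TwoBufferReader(io.StringIO(text), n)
    return "".join(iter(reader.next_char, None))
```

The point of the sentinel is that the common path tests only for one character; the checks for the end of a half run only when the sentinel is seen.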

33 Lexical Analyzer

34 Lex
• A tool for automatically generating lexical analyzers

35 Lex Specifications
declarations
%%
translation rules
%%
auxiliary procedures
• Each translation rule pairs a pattern with an action:
p1    {action1}
p2    {action2}
...
pn    {actionn}

36 Lex Regular Expressions

37 yylex()
yylex() {
    switch (pattern_match()) {
        case 1: {action1}
        case 2: {action2}
        ...
        case n: {actionn}
    }
}
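
How pattern_match() picks a rule can be modeled in a few lines of Python. Lex chooses the rule with the longest match and, on a tie, the rule listed first; the sketch below imitates that with the re module. The rule names and the tokenize function are hypothetical, and the patterns are a simplified subset of a typical scanner, not a translation of any particular Lex file.

```python
import re

# Ordered rules: (name, pattern). Order matters only when match lengths tie.
RULES = [
    ("integer", r"[0-9]+"),
    ("float", r"[0-9]+\.[0-9]*"),
    ("keyword", r"if|then|begin|end|procedure|function"),
    ("identifier", r"[a-z][a-z0-9]*"),
    ("operator", r"[+\-*/]"),
    ("whitespace", r"[ \t\n]+"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]

def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in COMPILED:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches have equal length.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:             # default rule: one unrecognized character
            best_name, best_len = "unrecognized", 1
        if best_name != "whitespace":     # this action discards white space
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

TOKENS = tokenize("if x1 + 3.14 iffy 42 ?")
```

The input "if" matches both the keyword and identifier rules with length 2, so the earlier keyword rule wins, while "iffy" is an identifier because that match is longer.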

38 Example
%{
#include <stdlib.h>   /* atoi, atof */
%}
DIGIT    [0-9]
ID       [a-z][a-z0-9]*
%%
{DIGIT}+                {printf("An integer: %s (%d)\n", yytext, atoi(yytext));}
{DIGIT}+"."{DIGIT}*     {printf("A float: %s (%g)\n", yytext, atof(yytext));}
if|then|begin|end|procedure|function   {printf("A keyword: %s\n", yytext);}
{ID}                    {printf("An identifier %s\n", yytext);}
"+"|"-"|"*"|"/"         {printf("An operator %s\n", yytext);}
"{"[^}\n]*"}"           {/* eat up one-line comments */}
[ \t\n]+                {/* eat up white space */}
.                       {printf("Unrecognized character: %s\n", yytext);}
%%
int main(int argc, char *argv[]) {
    ++argv, --argc;   /* skip over the program name */
    if (argc > 0) yyin = fopen(argv[0], "r");
    else yyin = stdin;
    yylex();
    return 0;
}

