1 Lex
2 Lex is a lexical analyzer
Input:
Var = 12 + 9; if (test > 20) temp = 0; else while (a < 20) temp++;
Output:
Ident: Var
Oper: =
Integer: 12
Oper: +
Integer: 9
Semicolon: ;
Keyword: if
Paren: (
Ident: test
Oper: >
....
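The input/output pairing above can be imitated in a few lines of Python. This is only a sketch of what a Lex-generated scanner does, not Lex itself: the token names follow the labels on the slide, while `TOKEN_SPEC`, `MASTER`, and `tokenize` are hypothetical names chosen for illustration, and the keyword and operator sets are assumed from the sample input.

```python
import re

# Token classes in priority order; names mirror the labels on the slide.
# The keyword and operator sets are assumptions based on the sample input.
TOKEN_SPEC = [
    ("Keyword",   r"\b(?:if|else|while)\b"),
    ("Ident",     r"[A-Za-z]+"),
    ("Integer",   r"[0-9]+"),
    ("Oper",      r"\+\+|[+\-=<>]"),
    ("Paren",     r"[()]"),
    ("Semicolon", r";"),
    ("Skip",      r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "Skip":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

if __name__ == "__main__":
    for kind, lexeme in tokenize("Var = 12 + 9; if (test > 20) temp = 0;"):
        print(f"{kind}: {lexeme}")
```

Python's `re` alternation picks the first alternative that matches rather than the longest one, so the keyword pattern is wrapped in `\b` to keep a word such as `ifend` from being split into a keyword plus an identifier.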
3 Lex regular expressions: for each kind of string there is a regular expression
"if" "then" /* keywords */
"+" "-" "=" /* operators */
4 Lex regular expressions
(0|1|2|3|4|5|6|7|8|9)+ /* integers */
(a|b|...|z|A|B|...|Z)+ /* identifiers */
5 integers: (0|1|2|3|4|5|6|7|8|9)+ can be abbreviated as [0-9]+
6 identifiers: (a|b|...|z|A|B|...|Z)+ can be abbreviated as [a-zA-Z]+
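As a quick check that the long alternations and the character classes describe the same strings, the sketch below compares both forms with Python's `re` module. Python's regex syntax is close to Lex's for these patterns but is not identical in general; the names `LONG_INT`, `SHORT_INT`, `LONG_ID`, `SHORT_ID`, and `same_verdict` are hypothetical.

```python
import re
import string

# Spelled-out alternations next to their character-class abbreviations.
LONG_INT = re.compile(r"(0|1|2|3|4|5|6|7|8|9)+")
SHORT_INT = re.compile(r"[0-9]+")
LONG_ID = re.compile("(" + "|".join(string.ascii_letters) + ")+")
SHORT_ID = re.compile(r"[a-zA-Z]+")

def same_verdict(long_re, short_re, text):
    """True when both patterns agree on whether the whole text matches."""
    return (long_re.fullmatch(text) is not None) == (short_re.fullmatch(text) is not None)

if __name__ == "__main__":
    for sample in ["0", "2024", "temp", "Var", "12a", ""]:
        print(repr(sample), same_verdict(LONG_INT, SHORT_INT, sample),
              same_verdict(LONG_ID, SHORT_ID, sample))
```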
7 Each regular expression has an action. Examples:
Regular expression    Action
\n                    linenum++;
[a-zA-Z]+             printf("identifier");
[0-9]+                printf("integer");
8 Default action: ECHO; prints the matched string to the output
9 A small program
%%
[a-zA-Z]+    printf("Identifier\n");
[0-9]+       printf("Integer\n");
[ \t\n]      ; /* skip spaces */
test var Input Output Integer Identifier Integer
11 Another program
%{
int linenum = 1;
%}
%%
[a-zA-Z]+    printf("Identifier\n");
[0-9]+       printf("Integer\n");
[ \t]        ; /* skip spaces */
\n           linenum++;
.            printf("Error in line: %d\n", linenum);
test var temp Input Output Integer Identifier Integer Error in line 3 Identifier
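The behaviour of the second program (count lines, report anything that is neither a word, a number, nor a blank) can be imitated in Python. This is a sketch under assumptions, not the output of Lex: `RULES` and `scan` are hypothetical names, and the sample input in the usage lines is made up for illustration.

```python
import re

# One alternative per rule of the program: identifier, integer, blank,
# newline, and a catch-all for any other single character.
RULES = re.compile(
    r"(?P<ident>[a-zA-Z]+)|(?P<integer>[0-9]+)|(?P<space>[ \t])|(?P<newline>\n)|(?P<other>.)"
)

def scan(text):
    """Return the lines the program would print for the given input."""
    linenum = 1
    output = []
    for match in RULES.finditer(text):
        kind = match.lastgroup
        if kind == "ident":
            output.append("Identifier")
        elif kind == "integer":
            output.append("Integer")
        elif kind == "newline":
            linenum += 1
        elif kind == "other":
            output.append(f"Error in line: {linenum}")
        # blanks are skipped, as in the empty action of the [ \t] rule
    return output

if __name__ == "__main__":
    # Hypothetical input: the '?' on the third line triggers the error rule.
    for line in scan("12 test\nvar\n9 ? temp\n"):
        print(line)
```

Keeping `linenum` as ordinary state updated by the newline rule mirrors the `%{ ... %}` declaration in the Lex program, where the counter is a plain C variable shared by all actions.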
13 Lex matches the longest input string
Regular expressions: "if", "ifend"
Input: ifend    Matches: "ifend"
Input: if       Matches: "if"
Input: ifn      Matches: no match
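The longest-match rule can be sketched as follows: try every pattern at the current position and keep the longest lexeme. `longest_match` is a hypothetical helper, and it assumes each pattern is simple enough that Python's own match for it is already the longest one (true for the literal patterns used here). For the input `ifn`, no rule matches the whole string, but a scanner working left to right would still consume the prefix `if`, which is what the helper reports.

```python
import re

def longest_match(patterns, text):
    """Return the longest non-empty prefix of text matched by any pattern, or None.

    On a tie the earlier pattern wins, which is how Lex breaks ties.
    """
    best = None
    for pattern in patterns:
        match = re.match(pattern, text)
        if match and match.end() > 0 and (best is None or match.end() > len(best)):
            best = match.group()
    return best

if __name__ == "__main__":
    rules = ["if", "ifend"]
    for sample in ["ifend", "if", "ifn", "xyz"]:
        print(sample, "->", longest_match(rules, sample))
```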
14 Internal structure of Lex
Regular expressions -> NFA -> DFA -> minimal DFA
The final states of the DFA are associated with actions
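The NFA-to-DFA step of this pipeline is usually done with the subset construction. The sketch below is a minimal version under stated assumptions: the NFA is given as a dictionary from `(state, symbol)` to a set of states, `None` stands for an epsilon move, and the example NFA for the two patterns "if" and "ifend" is hand-built for illustration. It does not minimize the DFA or attach actions, and it is not a description of how any particular Lex implementation is organized.

```python
from collections import deque

EPS = None  # label used for epsilon moves in this sketch

def epsilon_closure(states, delta):
    """All NFA states reachable from `states` using only epsilon moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in delta.get((state, EPS), ()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return frozenset(closure)

def nfa_to_dfa(start, accepting, delta, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    dfa_start = epsilon_closure({start}, delta)
    dfa_delta = {}
    dfa_accepting = set()
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        if current & accepting:
            dfa_accepting.add(current)
        for symbol in alphabet:
            moved = set()
            for state in current:
                moved |= delta.get((state, symbol), set())
            if not moved:
                continue  # no transition on this symbol
            target = epsilon_closure(moved, delta)
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa_start, dfa_accepting, dfa_delta

def dfa_accepts(dfa, text):
    """Run the DFA on text; a missing transition means rejection."""
    state, accepting, delta = dfa
    for ch in text:
        state = delta.get((state, ch))
        if state is None:
            return False
    return state in accepting

# Hand-built NFA for "if" | "ifend": state 0 branches by epsilon into two chains.
NFA_DELTA = {
    (0, EPS): {1, 4},
    (1, "i"): {2}, (2, "f"): {3},
    (4, "i"): {5}, (5, "f"): {6}, (6, "e"): {7}, (7, "n"): {8}, (8, "d"): {9},
}
DFA = nfa_to_dfa(0, {3, 9}, NFA_DELTA, "ifend")

if __name__ == "__main__":
    for sample in ["if", "ifend", "ifn"]:
        print(sample, dfa_accepts(DFA, sample))
```

In the resulting DFA the state reached after reading `if` contains NFA state 3, so it is final; a scanner would record the action for "if" there and keep reading in case the longer "ifend" also matches.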
15 Compilers
16 Compiler: Program -> Compiler -> Machine Code
Program:
v = 5; if (v>5) x = 12 + v; while (x !=3) { x = x - 3; v = 10; }
Machine Code:
Add v,v,0
cmp v,5
jmplt ELSE
THEN: add x, 12,v
ELSE: WHILE: cmp x,3
...
17 Compiler: program -> lexical analyzer -> parser -> machine code
18 Parser knows the grammar of the programming language
19 Parser
PROGRAM -> STMT_LIST
STMT_LIST -> STMT STMT_LIST | STMT;
STMT -> EXPR ; | IF_STMT | WHILE_STMT | { STMT_LIST }
EXPR -> EXPR + EXPR | EXPR - EXPR | ID
IF_STMT -> if (EXPR) then STMT | if (EXPR) then STMT else STMT
WHILE_STMT -> while (EXPR) do STMT
20 The parser constructs the derivation for the particular input program
Grammar: E -> E + E | E * E | INT
Input: 10 + 2 * 5
Derivation: E => E + E => E + E * E => 10 + E * E => 10 + 2 * E => 10 + 2 * 5
21 From the derivation to the derivation tree
        E
      / | \
     E  +  E
     |   / | \
    10  E  *  E
        |     |
        2     5
22 From the derivation tree to machine code
mult t1, 2, 5
add t2, 10, t1
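The step from derivation tree to machine code can be sketched as a post-order walk: generate code for both operands first, then emit one instruction for the operator. The tree encoding (nested tuples), the helper names `generate` and `compile_expression`, and the example input 10 + 2 * 5 are assumptions made for illustration; the `mult`/`add` mnemonics and the `t1`, `t2` temporaries follow the slide.

```python
def generate(tree, code, counter):
    """Post-order walk: emit code for the operands, then for the operator.

    A tree is either an int (a leaf) or a tuple (op, left, right).
    Returns the name that holds the value of the subtree.
    """
    if isinstance(tree, int):
        return str(tree)
    op, left, right = tree
    left_name = generate(left, code, counter)
    right_name = generate(right, code, counter)
    counter[0] += 1
    temp = f"t{counter[0]}"
    mnemonic = {"+": "add", "*": "mult"}[op]
    code.append(f"{mnemonic} {temp}, {left_name}, {right_name}")
    return temp

def compile_expression(tree):
    """Return the list of instructions for an expression tree."""
    code = []
    generate(tree, code, [0])
    return code

if __name__ == "__main__":
    # Tree for 10 + 2 * 5, with * grouped below + as in the derivation tree.
    for instruction in compile_expression(("+", 10, ("*", 2, 5))):
        print(instruction)
```

Because the multiplication sits lower in the tree, its instruction is emitted first and its temporary is then used as an operand of the addition.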
23 Parsing
24 Parser: grammar + input string -> derivation
25 Example: input -> Parser -> derivation?
26-30 Exhaustive Search: Phase 1, Phase 2, Phase 3 [figures: the derivations examined in each phase]
31 Final result of exhaustive search (top-down parsing): input -> Parser -> derivation
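Exhaustive search parsing can be sketched as a breadth-first search over leftmost derivations, where each level of the search plays the role of one phase. The grammar below (S -> SS | aSb | ab) and the function name `exhaustive_parse` are hypothetical and chosen only for illustration; the sketch assumes single-character symbols and a grammar without lambda-productions, so that any sentential form longer than the target can be discarded.

```python
from collections import deque

def exhaustive_parse(grammar, start, target):
    """Breadth-first search over leftmost derivations.

    grammar maps each variable to a list of right-hand sides.
    Returns the list of sentential forms of one derivation, or None.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        derivation = queue.popleft()
        form = derivation[-1]
        if form == target:
            return derivation
        # Position of the leftmost variable, if any.
        index = next((i for i, ch in enumerate(form) if ch in grammar), None)
        if index is None:
            continue  # all terminals, but not the target
        for rhs in grammar[form[index]]:
            new_form = form[:index] + rhs + form[index + 1:]
            # Without lambda-productions a form never shrinks, so longer forms are dead ends.
            if len(new_form) <= len(target) and new_form not in seen:
                seen.add(new_form)
                queue.append(derivation + [new_form])
    return None

if __name__ == "__main__":
    grammar = {"S": ["SS", "aSb", "ab"]}  # hypothetical example grammar
    print(" => ".join(exhaustive_parse(grammar, "S", "aabb")))
```

The length check is what makes the search terminate; with lambda-productions in the grammar that pruning would no longer be valid.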
32 Time complexity of exhaustive search
Suppose there are no productions of the form $A \to \lambda$ or $A \to B$
Number of phases for string $w$: at most $2|w|$
33 Time for phase 1: $k$ possible derivations, for a grammar with $k$ rules
34 Time for phase 2: $k^2$ possible derivations
35 Time for phase $2|w|$: $k^{2|w|}$ possible derivations
36 Total time needed for string $w$: $k + k^2 + \cdots + k^{2|w|}$. Extremely bad!!!
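To get a feel for how quickly this blows up, the sketch below evaluates the bound for concrete sizes. It assumes the standard textbook bound for a grammar with $k$ rules and no $\lambda$- or unit productions, namely $k + k^2 + \cdots + k^{2|w|}$; the function name `exhaustive_search_bound` is hypothetical.

```python
def exhaustive_search_bound(k, n):
    """Upper bound k + k^2 + ... + k^(2n) for a grammar with k rules
    and an input string of length n (assumes no lambda or unit productions)."""
    return sum(k ** phase for phase in range(1, 2 * n + 1))

if __name__ == "__main__":
    # The bound grows exponentially with the length of the input.
    for n in (1, 2, 5, 10):
        print(n, exhaustive_search_bound(4, n))
```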
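37 There exist faster algorithms for specialized grammars
S-grammar: every production has the form $A \to ax$, where $a$ is a symbol and $x$ is a string of variables, and each pair $(A, a)$ appears in at most one production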
38 S-grammar example: Each string has a unique derivation
39 For S-grammars: in exhaustive search parsing there is only one choice in each phase
Time for a phase: 1
Total time for parsing string $w$: $|w|$
40 For general context-free grammars: there exists a parsing algorithm that parses a string $w$ in time $|w|^3$