
1 1 Lex & Yacc

2 2 Compilation Process: Source Code -> Lexical Analyzer -> Syntax Analyzer -> Intermediate Code Generator -> Code Generator -> Machine Code. The Symbol Table is shared by the phases.


4 4 Lexical Analyzer (Scanner, Lexer)
– A pattern matcher
– Extracts lexemes from a given input string
– Produces the corresponding token for each lexeme
– Detects errors in tokens
– Front end of a syntax analyzer
– Works as a finite-state automaton
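The scanner's job described above can be sketched by hand in C. This is only an illustration, not what lex generates: the token kinds (TOK_NUMBER, TOK_WORD, TOK_ERROR, TOK_EOF), the struct token type and the function next_token are hypothetical names made up for this example. Each while loop plays the role of one state of a small finite-state automaton.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds, chosen only for this illustration. */
enum token_kind { TOK_NUMBER, TOK_WORD, TOK_ERROR, TOK_EOF };

struct token {
    enum token_kind kind;
    const char *lexeme;  /* start of the matched text inside the input */
    size_t length;       /* number of characters in the lexeme */
};

/* Scan one token starting at *cursor and advance *cursor past it. */
struct token next_token(const char **cursor)
{
    assert(cursor != NULL && *cursor != NULL);
    const char *p = *cursor;

    /* State "between tokens": skip whitespace. */
    while (*p == ' ' || *p == '\t' || *p == '\n')
        p++;

    struct token tok = { TOK_EOF, p, 0 };
    if (*p == '\0') {            /* end of input: report TOK_EOF */
        *cursor = p;
        return tok;
    }

    const char *start = p;
    if (isdigit((unsigned char)*p)) {
        /* State "in number": stay here while digits keep coming. */
        while (isdigit((unsigned char)*p))
            p++;
        tok.kind = TOK_NUMBER;
    } else if (isalpha((unsigned char)*p)) {
        /* State "in word": stay here while letters keep coming. */
        while (isalpha((unsigned char)*p))
            p++;
        tok.kind = TOK_WORD;
    } else {
        /* No pattern starts with this character: a lexical error. */
        p++;
        tok.kind = TOK_ERROR;
    }
    tok.lexeme = start;
    tok.length = (size_t)(p - start);
    *cursor = p;
    return tok;
}
```

Calling next_token repeatedly on the string "abc 42 ?" would yield a word, a number, an error token for "?", and then TOK_EOF. A lex-generated scanner does the same kind of work, but builds its automaton from the regular expressions in the specification.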

5 5 Syntax Analyzer (Parser)
– Checks the syntactic correctness of the input string
– Produces a complete parse tree, or a trace of its structure
– On error, displays a message and tries to recover so that it can detect as many errors as possible
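The idea of checking syntax and recovering after an error can be sketched with a tiny hand-written recursive-descent checker in C. Everything here is an assumption for illustration: the grammar (a statement is NUMBER ('+' NUMBER)* ';'), the function names, and the recovery strategy (skip to just past the next ';', often called panic mode). It does not build a parse tree, and it is not how yacc-generated parsers are implemented.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical grammar for this sketch:  stmt -> NUMBER ('+' NUMBER)* ';' */

static const char *skip_spaces(const char *p)
{
    while (*p == ' ' || *p == '\t' || *p == '\n')
        p++;
    return p;
}

/* Consume one NUMBER; return the position after it, or NULL if absent. */
static const char *parse_number(const char *p)
{
    p = skip_spaces(p);
    if (!isdigit((unsigned char)*p))
        return NULL;
    while (isdigit((unsigned char)*p))
        p++;
    return p;
}

/* Parse one statement; return the position after ';' or NULL on error. */
static const char *parse_stmt(const char *p)
{
    p = parse_number(p);
    if (p == NULL)
        return NULL;
    for (;;) {
        p = skip_spaces(p);
        if (*p == ';')
            return p + 1;
        if (*p != '+')
            return NULL;
        p = parse_number(p + 1);
        if (p == NULL)
            return NULL;
    }
}

/* Check the whole input and return the number of erroneous statements. */
int count_syntax_errors(const char *input)
{
    assert(input != NULL);
    int errors = 0;
    const char *p = skip_spaces(input);
    while (*p != '\0') {
        const char *next = parse_stmt(p);
        if (next == NULL) {
            errors++;
            fprintf(stderr, "syntax error near: %.10s\n", p);
            /* Recovery: skip to just past the next ';' and keep going,
               so that later errors are still detected. */
            while (*p != '\0' && *p != ';')
                p++;
            if (*p == ';')
                p++;
        } else {
            p = next;
        }
        p = skip_spaces(p);
    }
    return errors;
}
```

With the input "1 + 2; 3 + ; 4 4; 5;" the checker reports two errors (the second and third statements) instead of stopping at the first one, which is the point of recovery.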

6 6 LEX
– A tool for automatically generating a lexer (scanner) from a lex specification (.l file)
– A lexer or scanner performs lexical analysis: breaking an input stream into meaningful units, or tokens
– Takes a set of descriptions of the possible tokens (i.e., regular expressions)

7 7 Skeleton of a lex specification (.l file), x.l:
%{ %}                     this part is embedded into lex.yy.c
[DEFINITION SECTION]      substitutions, code and start states; copied into lex.yy.c
%%
[RULES SECTION]           defines how to scan and what action to take for each token
%%
C auxiliary subroutines   any user code, for example a main function that calls the scanning function yylex()
lex.yy.c is generated by running: > lex x.l

8 8 The rules section
%%
[RULES SECTION]
pattern { action }
…
%%
Patterns are specified by regular expressions. For example:
%%
[A-Za-z]+ { printf("this is a word"); }
%%

9 9 The rules of a lex specification file
The rules section is a list of rules, each of the form:
pattern { corresponding actions }
For example:
[1-9][0-9]* { yylval = atoi(yytext); return NUMBER; }
Here [1-9][0-9]* is the pattern, written as a regular expression, and the actions are C statements.


12 12 Lex Reg Exp (cont)
x|y      x or y
{i}      definition of i
x/y      x, only if followed by y (y not removed from input)
x{m,n}   m to n occurrences of x
^x       x, but only at beginning of line
x$       x, but only at end of line
"s"      exactly what is in the quotes (except for "\" and following character)
A regular expression finishes with a space, tab or newline
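Some of the operators in this table (x|y, x{m,n}, ^x and x$) also exist in POSIX extended regular expressions, so they can be tried from plain C with regcomp and regexec from <regex.h>. This is only an approximation under stated assumptions: the helper name matches is made up for this example, regexec searches anywhere in the string whereas lex matches at the current input position, and the lex-specific forms ({i} definitions, x/y trailing context, "s" quoting) are not covered.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if the POSIX extended regex `pattern` matches somewhere in
   `text`, 0 if it does not, and -1 if the pattern fails to compile. */
int matches(const char *pattern, const char *text)
{
    assert(pattern != NULL && text != NULL);
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

For instance, matches("^ab", "abc") succeeds while matches("^ab", "cab") does not, and matches("^a{2,3}$", "aa") succeeds while the same pattern rejects "a" and "aaaa".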

13 13 Meta-characters
– Meta-characters do not match themselves, because they are used in the preceding regular expressions: ( ) [ ] { } + / , ^ * | . \ " $ ? - %
– To match a meta-character, prefix it with "\"
– To match a backslash, tab or newline, use \\, \t, or \n

14 14 Two Rules
1. lex will always match the longest token possible (measured in number of characters).
2. If two or more possible tokens have the same length, the token whose regular expression is defined first in the lex specification is favored.

15 15 LEX Rules Disambiguation
Example patterns: "ali", "aliye", [a-zA-Z]+
– Rules defined earlier take precedence over rules defined later
– Each input is matched with at most one pattern
– The action for the longest possible match among the patterns is executed
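The two disambiguation rules can be modelled in a few lines of C. This sketch assumes the three example patterns appear in the specification in the order "ali", "aliye", [a-zA-Z]+; the function names (match_literal, match_letters, pick_rule) are hypothetical and the matching is hand-coded rather than generated by lex.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length of the match if `input` starts with the literal `lit`, else 0. */
static size_t match_literal(const char *input, const char *lit)
{
    size_t n = strlen(lit);
    return strncmp(input, lit, n) == 0 ? n : 0;
}

/* Length of the longest prefix of `input` matching [a-zA-Z]+, else 0. */
static size_t match_letters(const char *input)
{
    size_t n = 0;
    while (isalpha((unsigned char)input[n]))
        n++;
    return n;
}

/* Rules in assumed specification order:
   0 = "ali", 1 = "aliye", 2 = [a-zA-Z]+.
   Returns the index of the winning rule, or -1 if no rule matches;
   *matched_len receives the length of the winning match. */
int pick_rule(const char *input, size_t *matched_len)
{
    assert(input != NULL && matched_len != NULL);
    size_t lens[3];
    lens[0] = match_literal(input, "ali");
    lens[1] = match_literal(input, "aliye");
    lens[2] = match_letters(input);

    int best = -1;
    size_t best_len = 0;
    for (int i = 0; i < 3; i++) {
        /* Longest match wins; strict '>' keeps the earlier rule on a tie. */
        if (lens[i] > best_len) {
            best_len = lens[i];
            best = i;
        }
    }
    *matched_len = best_len;
    return best;
}
```

On the input "ali " rules 0 and 2 both match three characters, so rule 0 wins because it comes first. On "aliye" rules 1 and 2 tie at five characters and rule 1 wins. On "aliyev" only [a-zA-Z]+ can match all six characters, so rule 2 wins by length.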


17 17 LEX: A Simple Example
%{
/* firstlexer.l : Our First Lexer */
#include <stdio.h>
%}
%%
[\t ]+               ; /* Ignore whitespace */
like|need|love|care  { printf("%s: verb\n", yytext); }
[a-zA-Z]+            { printf("%s: yet not defined\n", yytext); }
.|\n                 { ECHO; }
%%
int main(void) { yylex(); return 0; }
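What the two word rules in this example decide for a single, complete word can be mirrored in plain C. This is a rough hand-written approximation, not the generated scanner: classify_word is a hypothetical helper, and it assumes its argument is one whole word made only of letters. Because lex prefers the longest match, a word such as "likes" goes to the [a-zA-Z]+ rule rather than being split into "like" plus "s".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the label the example lexer would print for one complete word.
   Assumes `word` consists only of letters. */
const char *classify_word(const char *word)
{
    assert(word != NULL);
    static const char *const verbs[] = { "like", "need", "love", "care" };
    for (size_t i = 0; i < sizeof verbs / sizeof verbs[0]; i++) {
        /* An exact match ties in length with [a-zA-Z]+, and the verb
           rule is listed first, so the verb rule wins. */
        if (strcmp(word, verbs[i]) == 0)
            return "verb";
    }
    /* Anything longer or different is matched by [a-zA-Z]+ instead. */
    return "yet not defined";
}
```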

18 18 LEX Compilation
> lex firstlexer.l     # Generates lex.yy.c
> cc lex.yy.c -lfl     # Compiles the scanner and links the flex library

19 19 LEX Built-in Functions & Variables
yymore() – appends the next matched string to the current contents of yytext
yyless(n) – removes from yytext all but the first n characters
unput(c) – returns character c to the input stream
yywrap() – may be replaced by the user; it is called by the lexical analyser whenever it reads an EOF as the first character when trying to match a regular expression

20 20 LEX Built-in Functions & Variables
yytext – where the most recently matched text is stored
yyleng – number of characters in the most recently matched text
yylval – value associated with the current token
yyin – points to the file currently being scanned by the lexer
yyout – points to the file to which the lexer's output is written

