Systems Programming & Operating Systems, Unit – III
International Institute of Information Technology, Pune
Department of Computer Engineering
Case Study: Overview of LEX and YACC
Prof. Deptii Chaudhari, I2IT, Pune

LEX & YACC

What is Lex?
Lex is a lexical analyser generator: from a set of patterns it produces a lexical analyser (scanner). The scanner's main job is to break up an input stream into more usable elements (tokens), or in other words, to identify the "interesting bits" in a text file.

What is Yacc?
Yacc is a parser generator. In the course of its normal work, the generated parser also verifies that the input is syntactically sound. YACC stands for "Yet Another Compiler Compiler", because this kind of analysis of text files is normally associated with writing compilers.
LEX Program Structure

%{
   C global variables, prototypes, comments
%}
Definitions
%%
Rules (pattern and action)
%%
User Subroutine Section (Optional)
In the rules section, each rule is made up of two parts: a pattern and an action, separated by whitespace. The lexer that lex generates executes the action when it recognizes the pattern.

The user subroutine section consists of any legal C code. Lex copies it to the C file after the end of the lex-generated code.

Lex translates the lex specification into a C source file called lex.yy.c, which we compile and link with the lex library using -ll. We can then run the resulting program to check that it works as expected.
Example

%{
#include <stdio.h>
%}
%%
[0123456789]+           printf("NUMBER\n");
[a-zA-Z][a-zA-Z0-9]*    printf("WORD\n");

Running the Program

$ lex example_lex.l
$ gcc lex.yy.c -ll
$ ./a.out
Pattern Matching Primitives

Metacharacter   Matches
.               any character except newline
\n              newline
*               zero or more copies of the preceding expression
+               one or more copies of the preceding expression
?               zero or one copy of the preceding expression
^               beginning of line
$               end of line
a|b             a or b
(ab)+           one or more copies of ab (grouping)
"a+b"           literal "a+b" (C escapes still work)
[]              character class
Pattern Matching Examples

Expression      Matches
abc             abc
abc*            ab, abc, abcc, abccc, ...
abc+            abc, abcc, abccc, abcccc, ...
a(bc)+          abc, abcbc, abcbcbc, ...
a(bc)?          a, abc
[abc]           one of: a, b, c
[a-z]           any letter, a through z
[a\-z]          one of: a, -, z
[-az]           one of: -, a, z
[A-Za-z0-9]+    one or more alphanumeric characters
[ \t\n]+        whitespace
[^ab]           anything except: a, b
[a^b]           one of: a, ^, b
[a|b]           one of: a, |, b
a|b             one of: a, b
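Several rows of the table can be checked without running lex at all. The sketch below uses the POSIX regex library (regcomp, regexec), whose extended syntax is close to lex patterns for these cases but is not the same engine. The helper name matches_fully is hypothetical. One known difference: inside a POSIX bracket expression a backslash is an ordinary character, so the [a\-z] row should not be tested this way.

```c
#include <assert.h>  /* assert */
#include <regex.h>   /* POSIX regcomp, regexec, regfree */
#include <stdio.h>   /* snprintf, NULL */

/* Hypothetical helper, not part of lex.
   Returns 1 if the whole of text matches pattern, 0 if it does not,
   and -1 if the pattern is too long or fails to compile. */
int matches_fully(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    assert(pattern != NULL && text != NULL);

    /* Wrap the pattern as ^(pattern)$ so that a partial match does not count. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;

    /* REG_EXTENDED selects the syntax with +, ?, | and grouping. */
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

For instance, matches_fully("a(bc)+", "abcbc") is expected to return 1 and matches_fully("abc+", "ab") to return 0, in line with the rows above.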
Operation of yylex()

When lex compiles the input specification, it generates the C file lex.yy.c, which contains the routine int yylex(void). This routine reads the input and tries to match it against the token patterns specified in the rules section. On a match, the associated action is executed.

Calling yylex() starts the pattern-matching process. Lex stores the matched string at the address pointed to by yytext and its length in yyleng. The value of the token is kept in the variable yylval, which the action sets.
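To make the roles of yytext and yyleng concrete, here is a hand-written sketch of what a scanner for the NUMBER and WORD rules of the earlier example might do. It is not the code lex generates (lex builds a table-driven automaton). The names struct scanner, next_token and the TOK_ constants are hypothetical, and the sketch assumes the default C locale, where isalpha accepts only a-z and A-Z.

```c
#include <assert.h>  /* assert */
#include <ctype.h>   /* isdigit, isalpha, isalnum */
#include <stddef.h>  /* size_t, NULL */

enum token { TOK_EOF = 0, TOK_NUMBER, TOK_WORD, TOK_OTHER };

struct scanner {
    const char *cursor;  /* next unread input character */
    const char *text;    /* start of the last match, in the role of yytext
                            (not NUL-terminated in this sketch) */
    size_t leng;         /* length of the last match, in the role of yyleng */
};

/* In the role of yylex(): consume one token and report its kind. */
enum token next_token(struct scanner *s)
{
    const char *p;
    enum token kind;

    assert(s != NULL && s->cursor != NULL);
    p = s->cursor;
    s->text = p;

    if (*p == '\0') {
        s->leng = 0;
        return TOK_EOF;
    }
    if (isdigit((unsigned char)*p)) {
        /* [0123456789]+ : take the longest run of digits */
        while (isdigit((unsigned char)*p))
            p++;
        kind = TOK_NUMBER;
    } else if (isalpha((unsigned char)*p)) {
        /* [a-zA-Z][a-zA-Z0-9]* : a letter, then letters or digits */
        while (isalnum((unsigned char)*p))
            p++;
        kind = TOK_WORD;
    } else {
        /* anything else is passed through one character at a time */
        p++;
        kind = TOK_OTHER;
    }
    s->leng = (size_t)(p - s->text);
    s->cursor = p;
    return kind;
}
```

A caller would initialise cursor to the start of the input and call next_token in a loop until it returns TOK_EOF, reading text and leng after each call.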
%{
#include <stdio.h>
int com = 0;    /* number of comments seen */
%}
%%
"/*"[^\n]+"*/"    { com++; fprintf(yyout, " "); }
%%
int main()
{
    printf("Write a C program\n");
    yyout = fopen("output", "w");
    yylex();
    printf("Comment=%d\n", com);
    return 0;
}

Sample run:

$ cc lex.yy.c -ll
$ ./a.out
Write a C program
#include<stdio.h>
int main()
{
int a, b; /*float c;*/
printf("Hi"); /*printf("Hello");*/
}
Comment=2
$ cat output
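The comment rule has two properties that are easy to miss: it never crosses a newline, and because lex takes the longest match, it runs from the first opening marker on a line to the last closing marker on that line. The plain C sketch below counts matches under those assumptions, without lex. The function name count_line_comments is hypothetical, and the sketch only counts; it does not write the stripped text.

```c
#include <assert.h>  /* assert */
#include <stddef.h>  /* size_t, NULL */
#include <string.h>  /* strchr, strlen */

/* Hypothetical helper: counts how often the single-line comment rule
   would fire on src. At most one match per line is possible, because
   the longest match swallows everything up to the last closing marker. */
int count_line_comments(const char *src)
{
    int count = 0;
    const char *line;

    assert(src != NULL);
    line = src;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        const char *open = NULL;   /* first slash-star on this line */
        const char *close = NULL;  /* last star-slash on this line */
        size_t i;

        for (i = 0; i + 1 < len; i++) {
            if (open == NULL && line[i] == '/' && line[i + 1] == '*')
                open = line + i;
            if (line[i] == '*' && line[i + 1] == '/')
                close = line + i;
        }

        /* [^\n]+ needs at least one character between the two markers. */
        if (open != NULL && close != NULL && close >= open + 3)
            count++;

        line += len;
        if (*line == '\n')
            line++;
    }
    return count;
}
```

On the two commented lines of the sample run this sketch returns 2, the same as the Comment=2 line. A comment that spans several lines is not counted, which is also how the lex rule behaves.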
Lex Predefined Variables
YACC

YACC is a parser generator that takes an input file with an attribute-enriched BNF (Backus–Naur Form) grammar specification. It generates the output C file y.tab.c, which contains the function int yyparse(void) that implements the parser. This function automatically invokes yylex() every time it needs a token to continue parsing.
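The division of labour between yyparse() and yylex() can be imitated by hand. The sketch below is a small recursive-descent parser for integer arithmetic, not the table-driven LALR parser that yacc generates, and its grammar (+, -, *, /, parentheses) is an assumption, since the grammar file is not shown here. All names (struct parser, scan_token, parse_line) are hypothetical. What it shares with a yacc parser is that the parser asks the scanner for one token at a time, only when it needs one.

```c
#include <assert.h>  /* assert */
#include <ctype.h>   /* isdigit */
#include <stddef.h>  /* NULL */

enum { END = 0, NUMBER = 256 };  /* single characters stand for themselves */

struct parser {
    const char *cursor;  /* unread input */
    int token;           /* current lookahead token */
    int value;           /* in the role of yylval for NUMBER */
    int ok;              /* cleared on a syntax error or division by zero */
};

/* In the role of yylex(): skip blanks and return the next token code. */
static int scan_token(struct parser *p)
{
    while (*p->cursor == ' ' || *p->cursor == '\t')
        p->cursor++;
    if (*p->cursor == '\0' || *p->cursor == '\n')
        return END;
    if (isdigit((unsigned char)*p->cursor)) {
        p->value = 0;
        while (isdigit((unsigned char)*p->cursor))
            p->value = p->value * 10 + (*p->cursor++ - '0');
        return NUMBER;
    }
    return (unsigned char)*p->cursor++;
}

/* The parser pulls exactly one token whenever it has used the current one. */
static void advance(struct parser *p) { p->token = scan_token(p); }

static int parse_expr(struct parser *p);

/* factor: NUMBER | '(' expr ')' */
static int parse_factor(struct parser *p)
{
    int v = 0;
    if (p->token == NUMBER) {
        v = p->value;
        advance(p);
    } else if (p->token == '(') {
        advance(p);
        v = parse_expr(p);
        if (p->token == ')')
            advance(p);
        else
            p->ok = 0;
    } else {
        p->ok = 0;
    }
    return v;
}

/* term: factor (('*' | '/') factor)* */
static int parse_term(struct parser *p)
{
    int v = parse_factor(p);
    while (p->ok && (p->token == '*' || p->token == '/')) {
        int op = p->token;
        int rhs;
        advance(p);
        rhs = parse_factor(p);
        if (op == '*')
            v *= rhs;
        else if (rhs != 0)
            v /= rhs;
        else
            p->ok = 0;
    }
    return v;
}

/* expr: term (('+' | '-') term)* */
static int parse_expr(struct parser *p)
{
    int v = parse_term(p);
    while (p->ok && (p->token == '+' || p->token == '-')) {
        int op = p->token;
        int rhs;
        advance(p);
        rhs = parse_term(p);
        v = (op == '+') ? v + rhs : v - rhs;
    }
    return v;
}

/* In the role of yyparse(): returns 0 on success and stores the value,
   returns 1 on a syntax error or division by zero. */
int parse_line(const char *input, int *result)
{
    struct parser p;
    int v;

    assert(input != NULL && result != NULL);
    p.cursor = input;
    p.value = 0;
    p.ok = 1;
    advance(&p);
    v = parse_expr(&p);
    if (!p.ok || p.token != END)
        return 1;
    *result = v;
    return 0;
}
```

Returning 0 for success mirrors the convention of yyparse(). The token codes follow the yacc convention that named tokens such as NUMBER get values above the character range, while a single character is its own token.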
Structure of YACC Program

%{
   C global variables, prototypes, comments
%}
Definitions
%%
Context-free grammar and action for each production
%%
Subroutines/Functions
arithmatic.l

%{
#include <stdio.h>
#include <stdlib.h>    /* atoi */
#include "y.tab.h"
extern int yylval;
%}
%%
[0-9]+    { yylval = atoi(yytext); return NUMBER; }
[ \t]     ;    /* skip blanks and tabs */
[\n]      return 0;
.         return yytext[0];
%%
int yywrap() { return 1; }

How To Run:

$ yacc -d arithmatic.y
$ lex arithmatic.l
$ gcc lex.yy.c y.tab.c
$ ./a.out