CS 404 Introduction to Compiler Design
Lecture
Ahmed Ezzat
Lex Introduction
- A lexical analyzer generator
- A Unix utility
- Generates C code
- Takes regular expressions for the patterns
- Takes additional C code (the actions)
- Builds a combined NFA for all patterns, then converts it to a DFA that recognizes all of them
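Lex builds its DFA automatically. To show what a table-driven scanner does with such an automaton, the sketch below hand-codes a tiny DFA for two assumed patterns, identifiers `[A-Za-z_][A-Za-z0-9_]*` and integers `[0-9]+`, and keeps the longest match as lex does. The names `dfa_match`, `delta` and `accepting` are hypothetical; this is not the code lex generates.

```c
#include <ctype.h>
#include <stddef.h>

/* Token kinds this hypothetical scanner can report. */
enum { TOK_NONE = 0, TOK_ID = 1, TOK_INT = 2 };

/* Input character classes keep the transition table small. */
enum { C_LETTER, C_DIGIT, C_OTHER, NCLASSES };

/* DFA states: 0 = start, 1 = in identifier, 2 = in integer; -1 = dead. */
static const int delta[3][NCLASSES] = {
    /*            letter digit other */
    /* start */ {  1,     2,   -1 },
    /* ident */ {  1,     1,   -1 },
    /* int   */ { -1,     2,   -1 },
};

/* Token reported when the DFA is in each state (TOK_NONE = not accepting). */
static const int accepting[3] = { TOK_NONE, TOK_ID, TOK_INT };

static int char_class(int c) {
    if (isalpha(c) || c == '_') return C_LETTER;
    if (isdigit(c)) return C_DIGIT;
    return C_OTHER;
}

/* Run the DFA from s; return the token kind of the longest match and
   store its length in *len (0 if nothing matched). */
int dfa_match(const char *s, size_t *len) {
    int state = 0, best_tok = TOK_NONE;
    size_t i = 0, best_len = 0;
    while (s[i] != '\0') {
        int next = delta[state][char_class((unsigned char)s[i])];
        if (next < 0) break;                 /* dead state: stop scanning */
        state = next;
        i++;
        if (accepting[state] != TOK_NONE) {  /* remember the last accept */
            best_tok = accepting[state];
            best_len = i;
        }
    }
    *len = best_len;
    return best_tok;
}
```

For example, `dfa_match("count1 = 42", &n)` reports an identifier of length 6, and the scanner would resume at the blank.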
Using Lex
- Lex specifications are usually put in a file ending in .l
- To compile and run:
  - lex spec_file.l (creates lex.yy.c)
  - cc -o scan lex.yy.c -ll (compile and link with the lex library)
  - ./scan (run the program)
Lex Input Specification File
%{
Declarations /* optional, copied to the C program */
%}
Definitions /* optional, can give names to R.E.s */
%%
Rules /* required, regular expressions and actions */
%%
User subroutines /* optional, copied to the C program */
Yacc Introduction
- A parser generator
- Stands for “yet another compiler-compiler”
- Generates C code
- Used in conjunction with lex
- Uses a shift-reduce parsing algorithm (LALR)
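Yacc chooses between shifting and reducing by consulting LALR tables it builds from the grammar. The hand-written sketch below does not build such tables; for the toy grammar `E -> E '+' NUM | NUM` (single-digit NUM, an assumption for brevity) it recognizes handles by inspecting the top of the stack directly, only to show the two actions and how a value attribute travels with each stack entry. The function name `shift_reduce_sum` is hypothetical.

```c
#include <ctype.h>

/* Grammar symbols kept on the parse stack. */
enum { SYM_NUM, SYM_PLUS, SYM_E };

/* Shift-reduce parse of
     E -> E '+' NUM
     E -> NUM
   Returns 1 and stores the sum in *result on success, 0 on a syntax error. */
int shift_reduce_sum(const char *in, int *result) {
    int sym[64], val[64];                /* symbol stack and value stack */
    int top = -1;                        /* index of the top entry */
    const char *p = in;
    for (;;) {
        /* Reduce while the top of the stack holds a handle. */
        if (top >= 2 && sym[top] == SYM_NUM && sym[top - 1] == SYM_PLUS
            && sym[top - 2] == SYM_E) {
            val[top - 2] += val[top];    /* E -> E '+' NUM */
            top -= 2;
            continue;
        }
        if (top == 0 && sym[0] == SYM_NUM) {
            sym[0] = SYM_E;              /* E -> NUM */
            continue;
        }
        while (*p == ' ') p++;
        if (*p == '\0') break;           /* nothing left to shift */
        if (top + 1 >= 64) return 0;     /* stack overflow */
        /* Shift the next token together with its value. */
        if (isdigit((unsigned char)*p)) {
            top++; sym[top] = SYM_NUM; val[top] = *p - '0';
        } else if (*p == '+') {
            top++; sym[top] = SYM_PLUS; val[top] = 0;
        } else {
            return 0;                    /* unexpected character */
        }
        p++;
    }
    /* Accept only if the whole input was reduced to a single E. */
    if (top == 0 && sym[0] == SYM_E) { *result = val[0]; return 1; }
    return 0;
}
```

On `1 + 2 + 3` the stack goes through shift 1, reduce to E, shift +, shift 2, reduce E + NUM, and so on, ending with a single E whose value is 6.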
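Using Yacc
- Yacc specifications are usually put in a file ending in .y
- To compile and run:
  - yacc -d spec_file.y (creates y.tab.c; -d also writes y.tab.h)
  - lex spec_file.l (creates lex.yy.c)
  - gcc -c y.tab.c lex.yy.c (compile both to object files)
  - gcc -o parse y.tab.o lex.yy.o -ly -ll (link with the yacc and lex libraries; -ly comes first so the library main() calls yyparse)
  - ./parse (run the program)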
Yacc Input Specification File
%{
Declarations /* optional, copied to the C program */
%}
Definitions /* optional, e.g., %token, %union, %start, precedence declarations */
%%
Productions /* required, CFG rules and their actions */
%%
User subroutines /* optional, copied to the C program */
Parser-Lexer Communication
- The lexer and the parser must use the same set of token constants (they can be defined in either place)
  - For example, the parser program defines the token IDENTIFIER, and the lexer includes “y.tab.h”
- They may share the same symbol table
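The point of a shared header is that the code the lexer returns and the code the parser tests for are the same number. The single-file sketch below plays both roles: the `#define`s stand in for what a header such as y.tab.h provides, with assumed values (yacc chooses the real ones above the range of single characters, so a token like `'='` can be its own code). `next_token` and `is_simple_assignment` are hypothetical names.

```c
#include <ctype.h>

/* Stand-in for a shared header: assumed token codes, not yacc's output. */
#define IDENTIFIER 258
#define NUMBER     259

/* Lexer side: return the code of the next token and advance *p.
   Single characters such as '=' are their own token codes; 0 means end. */
int next_token(const char **p) {
    while (**p == ' ') (*p)++;
    if (**p == '\0') return 0;
    if (isalpha((unsigned char)**p)) {
        while (isalnum((unsigned char)**p)) (*p)++;
        return IDENTIFIER;
    }
    if (isdigit((unsigned char)**p)) {
        while (isdigit((unsigned char)**p)) (*p)++;
        return NUMBER;
    }
    return (unsigned char)*(*p)++;       /* any other character */
}

/* Parser side: accept exactly  IDENTIFIER '=' NUMBER  and nothing more.
   Returns 1 if the input matches, 0 otherwise. */
int is_simple_assignment(const char *src) {
    const int expected[] = { IDENTIFIER, '=', NUMBER, 0 };
    for (int i = 0; i < 4; i++)
        if (next_token(&src) != expected[i]) return 0;
    return 1;
}
```

If the two sides disagreed on the value of `IDENTIFIER`, every comparison in the parser would fail, which is why the constants are defined once and shared.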
Global Variables
- yytext: the text that just matched a pattern
- yyleng: the length of the lexeme stored in yytext
- yylval: the value associated with a token, passed from the lexer to the parser; its type (YYSTYPE) is often a union:
  - e.g., yylval.token = 100
  - e.g., yylval.string = strdup(yytext) (copy the text, because the yytext buffer is reused for the next token)
- yylloc: the location (line and column) of a token
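The sketch below hand-writes stand-ins for these globals to show how a scanner fills them for each token and why a string value must be copied out of `yytext`. The union layout, the token codes `NUMBER` and `NAME`, and the functions `mini_yylex` and `mini_set_input` are assumptions for illustration; real lex and yacc define these globals themselves.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define NUMBER 258   /* assumed token codes */
#define NAME   259

/* Semantic value type in the style of YYSTYPE. */
typedef union {
    int   number;    /* valid when the token is NUMBER */
    char *string;    /* valid when the token is NAME (caller frees) */
} YYSTYPE;

/* Hand-written stand-ins for the scanner's globals. */
char    yytext[64];  /* text of the lexeme just matched */
int     yyleng;      /* its length */
YYSTYPE yylval;      /* value handed to the parser */

static const char *input;  /* hypothetical input cursor */

void mini_set_input(const char *s) { input = s; }

/* Scan one token, filling yytext, yyleng and yylval; 0 means end. */
int mini_yylex(void) {
    while (*input == ' ') input++;
    if (*input == '\0') return 0;
    const char *start = input;
    int tok;
    if (isdigit((unsigned char)*input)) {
        while (isdigit((unsigned char)*input)) input++;
        tok = NUMBER;
    } else if (isalpha((unsigned char)*input)) {
        while (isalnum((unsigned char)*input)) input++;
        tok = NAME;
    } else {
        input++;
        tok = (unsigned char)*start;               /* single-character token */
    }
    yyleng = (int)(input - start);
    if (yyleng >= (int)sizeof yytext) yyleng = (int)sizeof yytext - 1;  /* truncate */
    memcpy(yytext, start, (size_t)yyleng);
    yytext[yyleng] = '\0';
    if (tok == NUMBER) {
        yylval.number = atoi(yytext);
    } else if (tok == NAME) {
        /* Copy: yytext is overwritten by the next call. */
        yylval.string = malloc((size_t)yyleng + 1);
        if (yylval.string) memcpy(yylval.string, yytext, (size_t)yyleng + 1);
    }
    return tok;
}
```

After scanning `rate 60`, the saved `yylval.string` still reads "rate" even though `yytext` now holds "60"; pointing at `yytext` directly would not survive the second call.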
Global Functions
- yylex(): the lexer C routine produced by lex
  - Returns the code of the token just found (0 at end of input)
  - Is called by the parser each time it needs the next token
- yyparse(): the parser function produced by yacc
  - Returns 0 if parsing succeeds, 1 if it fails
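The calling pattern between the two functions can be sketched by hand: the parser pulls one token at a time from the lexer and reports success or failure with yyparse's return convention. The grammar `list : NUMBER | list ',' NUMBER`, the names `my_yylex`, `my_yyparse` and `lval`, and the token code are all assumptions; the parser is a simple loop, not the LALR tables yacc would generate.

```c
#include <ctype.h>

#define NUMBER 258          /* assumed token code */

static const char *src;     /* hypothetical input cursor */
static int lval;            /* value of the last NUMBER, like yylval */

/* Lexer: return a token code, or 0 at end of input. */
static int my_yylex(void) {
    while (*src == ' ') src++;
    if (*src == '\0') return 0;
    if (isdigit((unsigned char)*src)) {
        lval = 0;
        while (isdigit((unsigned char)*src)) lval = lval * 10 + (*src++ - '0');
        return NUMBER;
    }
    return (unsigned char)*src++;
}

/* Parser for  list : NUMBER | list ',' NUMBER ;
   Calls the lexer for every token. Returns 0 on success and 1 on a
   syntax error; the sum of the numbers is stored in *sum. */
int my_yyparse(const char *text, int *sum) {
    src = text;
    *sum = 0;
    for (;;) {
        if (my_yylex() != NUMBER) return 1;    /* expected a number */
        *sum += lval;
        int tok = my_yylex();
        if (tok == 0) return 0;                 /* end of input: accept */
        if (tok != ',') return 1;               /* expected ',' or end */
    }
}
```

A caller would typically write `if (my_yyparse(text, &s) != 0) /* report error */;`, the same way a driver checks the result of yyparse().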
Syntax-Directed Translation
- Perform analysis and translation while parsing:
  - Generate intermediate code
  - Save information in the symbol table
  - Issue error messages
- A separate pass over the parse tree is possible, but parse trees are often large and traversing them takes time
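A classic small case of translating while parsing is turning infix expressions into postfix code: each grammar rule's action emits output as soon as the parser recognizes it, so no parse tree is ever built. The recursive-descent sketch below assumes single-digit operands and only `+` and `-`; the names `to_postfix`, `expr`, `term` and `Translator` are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Translator state: input cursor, output buffer and an error flag. */
typedef struct {
    const char *in;
    char *out;
    size_t len, cap;
    int error;
} Translator;

static void emit(Translator *t, char c) {
    if (t->len + 1 < t->cap) { t->out[t->len++] = c; t->out[t->len] = '\0'; }
    else t->error = 1;                      /* output buffer too small */
}

/* term -> digit   { emit(digit) } */
static void term(Translator *t) {
    if (isdigit((unsigned char)*t->in)) emit(t, *t->in++);
    else t->error = 1;                      /* error: digit expected */
}

/* expr -> term { ('+' | '-') term { emit(op) } } */
static void expr(Translator *t) {
    term(t);
    while (!t->error && (*t->in == '+' || *t->in == '-')) {
        char op = *t->in++;
        term(t);
        emit(t, op);                        /* action runs after the right operand */
    }
}

/* Translate infix to postfix in a single pass, with no parse tree.
   Returns 0 on success, 1 on a syntax error or a full buffer. */
int to_postfix(const char *infix, char *out, size_t cap) {
    Translator t = { infix, out, 0, cap, 0 };
    if (cap > 0) out[0] = '\0';
    expr(&t);
    if (*t.in != '\0') t.error = 1;         /* trailing characters */
    return t.error;
}
```

For example, `to_postfix("9-5+2", buf, sizeof buf)` leaves `95-2+` in `buf`.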
How to Perform Syntax-Directed Translation
- Associate “attributes” with the grammar symbols of the CFG
- One symbol may have many attributes:
  - Integer value, e.g., 10
  - String value, e.g., “island”
  - Variable type, e.g., int
  - Function return type, e.g., boolean
  - Function number of arguments
- Compute attribute values from existing values using “semantic rules”
Synthesized and Inherited Attributes
- Synthesized attribute: the value is computed from the values of the node's children
  - e.g., values of terminals
  - e.g., values of arithmetic expressions
- Inherited attribute: the value is computed from the values of the node's parent or siblings
  - e.g., type
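The standard textbook example of an inherited attribute is a declaration such as `int a, b, c`, with rules `D -> T L  { L.in = T.type }` and `L -> L , id | id  { addtype(id, L.in) }`: the type is passed down from the sibling T rather than computed from L's own children. The recursive-descent sketch below follows those rules; only the types `int` and `float` are recognized, and `symtab`, `addtype`, `declare` and `lookup_type` are hypothetical helpers.

```c
#include <ctype.h>
#include <string.h>

/* A tiny hypothetical symbol table. */
#define MAX_SYMS 16
static struct { char name[16]; char type[8]; } symtab[MAX_SYMS];
static int nsyms;

/* addtype(id, type): record the type of an identifier in the table. */
static void addtype(const char *name, size_t n, const char *type) {
    if (nsyms >= MAX_SYMS || n >= sizeof symtab[0].name) return;  /* full or too long */
    memcpy(symtab[nsyms].name, name, n);
    symtab[nsyms].name[n] = '\0';
    strcpy(symtab[nsyms].type, type);
    nsyms++;
}

/* L -> id { ',' id }   Every id receives the inherited attribute L.in.
   Returns 0 on success, 1 if an identifier is missing. */
static int id_list(const char **p, const char *in_type) {
    for (;;) {
        while (**p == ' ') (*p)++;
        if (!isalpha((unsigned char)**p)) return 1;   /* identifier expected */
        const char *start = *p;
        while (isalnum((unsigned char)**p)) (*p)++;
        addtype(start, (size_t)(*p - start), in_type);
        while (**p == ' ') (*p)++;
        if (**p != ',') return 0;
        (*p)++;
    }
}

/* D -> T L   with  L.in = T.type.  Returns 0 on success, 1 on error. */
int declare(const char *decl) {
    const char *p = decl;
    const char *type;
    if (strncmp(p, "int ", 4) == 0)        { type = "int";   p += 4; }
    else if (strncmp(p, "float ", 6) == 0) { type = "float"; p += 6; }
    else return 1;
    if (id_list(&p, type) != 0) return 1;  /* T.type flows down as L.in */
    return *p == '\0' ? 0 : 1;
}

/* Return the recorded type of name, or NULL if it was never declared. */
const char *lookup_type(const char *name) {
    for (int i = 0; i < nsyms; i++)
        if (strcmp(symtab[i].name, name) == 0) return symtab[i].type;
    return NULL;
}
```

After `declare("int a, b, c")`, each of a, b and c is recorded as int even though the list L itself never sees the keyword.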
More Concepts
- A side effect is an action of a semantic rule other than computing an attribute value
  - e.g., printing a value
  - e.g., updating a global variable
- A syntax-directed definition is a CFG together with semantic rules
- An attribute grammar is a syntax-directed definition whose semantic rules have no side effects
- An annotated parse tree is a parse tree that shows the value of the attributes at each node
How to Evaluate the Value of an Attribute
- Traverse the tree in an order that respects the dependencies between attributes
- With synthesized attributes only, a depth-first (postorder) traversal is enough
- For more complex dependencies, build a dependency graph, which must be a DAG (directed acyclic graph), and evaluate the attributes in a topological order of that graph
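Once the dependency graph is built, any topological order is a valid evaluation order, and a cycle means no order exists. The sketch below uses Kahn's algorithm over an adjacency matrix; the numbering of attribute instances and the name `eval_order` are assumptions, and building the graph from an actual parse tree is left out.

```c
#include <stddef.h>

#define MAX_ATTRS 32

/* dep[a][b] != 0 means attribute b is computed from attribute a (edge a -> b).
   Writes a topological order of the n attributes into order[] and returns 0,
   or returns 1 if the graph has a cycle (no evaluation order exists). */
int eval_order(int n, unsigned char dep[][MAX_ATTRS], int order[]) {
    int indeg[MAX_ATTRS] = {0};
    int queue[MAX_ATTRS], head = 0, tail = 0, count = 0;
    if (n < 0 || n > MAX_ATTRS) return 1;
    for (int a = 0; a < n; a++)
        for (int b = 0; b < n; b++)
            if (dep[a][b]) indeg[b]++;          /* count unmet dependencies */
    for (int v = 0; v < n; v++)
        if (indeg[v] == 0) queue[tail++] = v;   /* depends on nothing */
    while (head < tail) {
        int a = queue[head++];
        order[count++] = a;                     /* a can be evaluated now */
        for (int b = 0; b < n; b++)
            if (dep[a][b] && --indeg[b] == 0) queue[tail++] = b;
    }
    return count == n ? 0 : 1;
}
```

For the declaration example, numbering T.type as 0, L.in as 1 and two identifier entries as 2 and 3, the edges 0 → 1, 1 → 2 and 1 → 3 give the order 0, 1, 2, 3; adding an edge 3 → 0 creates a cycle and the function reports failure.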