Compiler Tools Lex/Yacc – Flex & Bison. Compiler Front End (from Engineering a Compiler) Scanner (Lexical Analyzer) Maps stream of characters into words.

Presentation transcript:

Compiler Tools Lex/Yacc – Flex & Bison

Compiler Front End (from Engineering a Compiler)
Scanner (Lexical Analyzer)
- Maps a stream of characters into words, the basic unit of syntax.
- x = x + y ; becomes a sequence of classified words.
- The actual word is its lexeme.
- Its part of speech (or syntactic category) is called its token type.
- The scanner discards white space & (often) comments.
- Speed is an issue in scanning, so use a specialized recognizer.
Diagram: Source code -> Scanner -> tokens -> Parser -> Intermediate Representation, with errors reported along the way.
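
To make the lexeme / token type distinction concrete, here is a minimal hand-written sketch in C. It is not what flex generates; the names Token, TokenType and next_token are hypothetical, and the three categories are an assumption chosen only to cover an input like x = x + y ;.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token types for this sketch; a real scanner defines more. */
typedef enum { TOK_EOF, TOK_ID, TOK_NUMBER, TOK_OP } TokenType;

typedef struct {
    TokenType type;     /* syntactic category (token type) */
    const char *start;  /* first character of the lexeme */
    size_t length;      /* number of characters in the lexeme */
} Token;

/* Read one token starting at *cursor and advance *cursor past it. */
Token next_token(const char **cursor)
{
    const char *p = *cursor;
    Token tok;

    while (isspace((unsigned char)*p))
        p++;                                  /* discard white space */

    tok.start = p;
    if (*p == '\0') {
        tok.type = TOK_EOF;
    } else if (isalpha((unsigned char)*p)) {
        while (isalnum((unsigned char)*p))
            p++;
        tok.type = TOK_ID;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            p++;
        tok.type = TOK_NUMBER;
    } else {
        p++;                                  /* any other single character */
        tok.type = TOK_OP;
    }
    tok.length = (size_t)(p - tok.start);
    *cursor = p;
    return tok;
}
```

Calling next_token repeatedly on "x = x + y ;" yields an identifier, an operator, an identifier, an operator, an identifier, an operator, and finally TOK_EOF; the white space never reaches the caller.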

The Front End (from Engineering a Compiler)
Parser
- Checks the stream of classified words (parts of speech) for grammatical correctness.
- Determines whether the code is syntactically well-formed.
- Guides checking at deeper levels than syntax.
- Builds an IR representation of the code.
Parsing is harder than scanning, so it is better to put more rules in the scanner (whitespace etc.).

Flex – Fast Lexical Analyzer
Flow: regular expressions & C-code rules -> FLEX -> lex.yy.c (contains yylex()) -> compile -> executable scanner (a program to recognize patterns in text), which analyzes its input and executes the matching actions.
Here’s where we’ll put the regular expressions to good use! (Scanner generator)

Flex input file: 3 sections, separated by lines containing only %%
    definitions
    %%
    rules
    %%
    user code

Definition Section Examples
Form: name definition
    DIGIT  [0-9]
    ID     [a-z][a-z0-9]*
A subsequent reference to {DIGIT}+"."{DIGIT}* is identical to: ([0-9])+"."([0-9])*
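
As a rough illustration of what the expanded pattern ([0-9])+"."([0-9])* accepts, the C sketch below checks the same three pieces by hand. The function name match_float_prefix is hypothetical, and this is not how flex matches (flex compiles all patterns into a single automaton).

```c
#include <ctype.h>
#include <stddef.h>

/* Return the length of the prefix of s that matches [0-9]+"."[0-9]*,
   or 0 if s does not start with such a prefix. */
size_t match_float_prefix(const char *s)
{
    size_t i = 0;

    while (isdigit((unsigned char)s[i]))
        i++;                        /* {DIGIT}+ : one or more digits */
    if (i == 0)
        return 0;                   /* at least one digit is required */
    if (s[i] != '.')
        return 0;                   /* "." : a literal dot */
    i++;
    while (isdigit((unsigned char)s[i]))
        i++;                        /* {DIGIT}* : zero or more digits */
    return i;
}
```

Under this sketch "3.14" and "3." match in full, while "42" and ".5" do not match at all, because the pattern needs a digit before the dot and the dot itself.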

C Code
C code can be included in the definitions section:
    %{
    /* This is a comment inside the definition */
    /* may need headers, e.g.: */
    #include <stdio.h>   // for printf in BB
    #include <stdlib.h>  // for exit(0) in BB
    %}

Rules
The rules section of the flex input contains a series of rules of the form:
    pattern action
In the definitions and rules sections, any indented text or text enclosed in %{ and %} is copied verbatim to the output (with the %{ %}'s removed). The %{ %}'s must appear unindented on lines by themselves.

Example: Simple Pascal-like recognizer
Definitions section:
    /* scanner for a toy Pascal-like language */
    %{
    /* needed for the call to atof() below */
    #include <stdlib.h>
    %}
    DIGIT  [0-9]
    ID     [a-z][a-z0-9]*
Remember that %{ and %} are on lines by themselves, unindented. The lines between them are inserted as-is into the resulting code. DIGIT and ID are definitions that can be used in the rules section.

Example continued
Rules section:
    %%
    {DIGIT}+                               { printf("An integer: %s (%d)\n", yytext, atoi(yytext)); }
    {DIGIT}+"."{DIGIT}*                    { printf("A float: %s (%g)\n", yytext, atof(yytext)); }
    if|then|begin|end|procedure|function   { printf("A keyword: %s\n", yytext); }
    {ID}                                   { printf("An identifier: %s\n", yytext); }
    "+"|"-"|"*"|"/"                        { printf("An operator: %s\n", yytext); }
    "{"[^}\n]*"}"                          /* eat up one-line comments */
    [ \t\n]+                               /* eat up whitespace */
    .                                      { printf("Unrecognized character: %s\n", yytext); }
Each rule is a pattern followed by an action. yytext is the text that matched the pattern (a char*).

Example continued
User code (required for flex, in library for lex):
    %%
    int yywrap(void) { return 1; }  // needed to link, unless libfl.a is available
                                    // OR put %option noyywrap at the top of the flex file
    int main(int argc, char **argv)
    {
        ++argv, --argc;  /* skip over program name */
        if (argc > 0)
            yyin = fopen(argv[0], "r");  /* lex input file */
        else
            yyin = stdin;
        yylex();  /* lexer function produced by lex */
        return 0;
    }

Lex techniques
Hardcoding lists (such as the keyword alternatives above) in the patterns is not very effective. Often a symbol table is used instead. There is an example in lex & yacc, not covered in class, but see me if you’re interested.
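
One way to read the symbol-table suggestion: let a single identifier pattern match every word, then look the word up in a table inside the action. The C sketch below assumes that reading; classify_word and the WORD_* constants are hypothetical names, and the table simply reuses the keywords from the toy Pascal example.

```c
#include <stddef.h>
#include <string.h>

enum { WORD_IDENTIFIER = 0, WORD_KEYWORD = 1 };

/* Keyword table; adding a keyword means adding one entry here
   instead of editing a pattern. */
static const char *const keywords[] = {
    "if", "then", "begin", "end", "procedure", "function"
};

/* Classify a word that the scanner has already matched as an identifier. */
int classify_word(const char *word)
{
    size_t i;

    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strcmp(word, keywords[i]) == 0)
            return WORD_KEYWORD;
    }
    return WORD_IDENTIFIER;
}
```

A linear search is fine for six entries; a larger table would more likely use hashing or a sorted array with binary search.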

Bison – like Yacc (yet another compiler compiler)
Flow: context-free grammar in BNF form, LALR(1)* -> Bison -> Bison parser (C program), which groups tokens according to the grammar rules.
The Bison parser provides yyparse. You must provide:
- the lexical analyzer (e.g., flex)
- an error-handling routine named yyerror
- a main routine that calls yyparse
* LALR(1) stands for Look-Ahead LR with one token of lookahead; LR means a left-to-right scan producing a rightmost derivation in reverse.

Bison Parser
Same sections as flex (yacc came first): definitions, rules, C-Code.
We’ll discuss rules first, then definitions and C-Code.

Bison Parser – Rule Section
Consider the CFG rule <statement> -> ID = <expression>
It would be written in the bison “rules” section as:
    statement:  NAME '=' expression
             |  expression           { printf("= %d\n", $1); }
             ;
    expression: NUMBER '+' NUMBER    { $$ = $1 + $3; }
             |  NUMBER '-' NUMBER    { $$ = $1 - $3; }
             |  NUMBER               { $$ = $1; }
             ;
Use : between lhs and rhs, separate the parts with white space, and place ; at the end. What are $$, $1 and $3? Next slide…
NOTE: The first rule in statement won’t be operational yet…

More on bison Rules and Actions
- $1, $3 refer to RHS values. $$ sets the value of the LHS.
- In expression, $$ = $1 + $3 means it sets the value of the lhs (expression) to NUMBER ($1) + NUMBER ($3).
- A rule action is executed when the parser reduces that rule (it will have recognized both NUMBER symbols).
- The lexer should have returned a value via yylval (next slide).
    statement:  NAME '=' expression
             |  expression           { printf("= %d\n", $1); }
             ;
    expression: NUMBER '+' NUMBER    { $$ = $1 + $3; }
             |  NUMBER '-' NUMBER    { $$ = $1 - $3; }
             ;
In NUMBER '+' NUMBER the right-hand-side symbols are numbered $1, $2, $3 from left to right, and $$ is the expression on the left-hand side.
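
Bison keeps the semantic values of recognized symbols on a stack, and a reduction replaces the values of the right-hand side with the single value of the left-hand side. The C sketch below imitates that step for NUMBER '+' NUMBER. ValueStack, push_value and reduce_add are hypothetical names, the fixed capacity is an assumption, and the caller is assumed to have pushed at least three values.

```c
#include <stddef.h>

#define STACK_MAX 32   /* assumed capacity; no overflow checks in this sketch */

typedef struct {
    int values[STACK_MAX];
    size_t count;
} ValueStack;

/* Shift: push the semantic value of a recognized symbol. */
void push_value(ValueStack *s, int v)
{
    s->values[s->count++] = v;
}

/* Reduce expression: NUMBER '+' NUMBER { $$ = $1 + $3; } */
void reduce_add(ValueStack *s)
{
    int v1 = s->values[s->count - 3];   /* $1: first NUMBER */
    /* $2 is the '+' token; its value is not used */
    int v3 = s->values[s->count - 1];   /* $3: second NUMBER */

    s->count -= 3;                      /* pop the three RHS values */
    push_value(s, v1 + v3);             /* push $$ for the LHS */
}
```

After pushing 1, a placeholder for '+', and 2, a call to reduce_add leaves a single value, 3, on the stack, which is what a later action sees as its own $1.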

Coordinating flex and bison
Example to return an int value:
    [0-9]+    { yylval = atoi(yytext); return NUMBER; }
- return NUMBER returns the recognized token.
- The assignment to yylval sets the value for use in actions. This one just returns the numeric value of the string stored in yytext.
- atoi is a C function to convert a string to an integer.
- In prior flex examples we just returned tokens, not values.
Also need to skip whitespace and return symbols:
    [ \t]     ;                   /* ignore white space */
    \n        return 0;           /* logical EOF */
    .         return yytext[0];
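
The contract between the two tools is small: each call to the lexer returns a token code, and the token's value, if it has one, is left in yylval. The hand-written C sketch below imitates the rules on this slide under that contract. sketch_yylex, sketch_yylval, sketch_input and the code 259 are stand-ins invented for the illustration, not the generated names.

```c
#include <ctype.h>

enum { SKETCH_NUMBER = 259 };   /* hypothetical token code, above the char range */

int sketch_yylval;              /* stand-in for yylval */
const char *sketch_input;       /* stand-in for the scanner's input */

/* Return the next token code; for SKETCH_NUMBER also set sketch_yylval. */
int sketch_yylex(void)
{
    while (*sketch_input == ' ' || *sketch_input == '\t')
        sketch_input++;                           /* [ \t]  ignore white space */

    if (isdigit((unsigned char)*sketch_input)) {  /* [0-9]+ */
        sketch_yylval = 0;
        while (isdigit((unsigned char)*sketch_input)) {
            sketch_yylval = sketch_yylval * 10 + (*sketch_input - '0');
            sketch_input++;
        }
        return SKETCH_NUMBER;
    }
    if (*sketch_input == '\n' || *sketch_input == '\0')
        return 0;                                 /* \n  logical EOF */

    return (unsigned char)*sketch_input++;        /* .   the character itself */
}
```

With sketch_input pointing at "12 + 7\n", successive calls return SKETCH_NUMBER (value 12), the character code of '+', SKETCH_NUMBER (value 7), and then 0.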

Bison Rule Details
- Unlike flex, bison doesn’t care about line boundaries, so add white space for readability.
- The symbol on the lhs of the first rule is the start symbol; this can be overridden with a %start declaration in the definition section.
- Symbols in bison have values, which must be “declared” as some type:
    - YYSTYPE determines the type.
    - The default for all values is int.
    - We’ll be using different types for YYSTYPE in the SimpleCalc exercises.

Bison Parser – Definition Section
- Tokens used in the grammar should be defined. Example rule:
    expression: NUMBER '+' NUMBER { $$ = $1 + $3; }
  The token NUMBER should be defined. Later we’ll see cases where expression should also be defined, and how to define tokens with other data types.
- %token must be lowercase, e.g.:
    %token NUMBER
- From the tokens that are defined, Bison will create an appropriate header file (when run with the -d option).
- Single-quoted characters can be used as tokens without declaring them, e.g., '+', '=' etc.

Lex - Definition Section
Must include the header created by bison.
Must declare yylval as extern.
    %{
    #include "simpleCalc.tab.h"
    extern int yylval;
    %}

Bison Parser – C Section
At a minimum, provide yyerror and main routines:
    void yyerror(const char *errmsg)
    {
        fprintf(stderr, "%s\n", errmsg);
    }
    int main(void)
    {
        return yyparse();
    }

Bison Intro Exercise
Download SimpleCalc.y, SimpleCalc.l and mbison.bat
Create the calculator executable:
    mbison simpleCalc
FYI, mbison includes these steps:
    bison -d simpleCalc.y
    flex -L -osimpleCalc.c simpleCalc.l
    gcc -c simpleCalc.c
    gcc -c simpleCalc.tab.c
    gcc -Lc:\progra~1\gnuwin32\lib simpleCalc.o simpleCalc.tab.o -osimpleCalc.exe -lfl -ly
Test with valid sentences (e.g., 3+6-4) and invalid sentences.

Understanding simpleCalc
simpleCalc.l:
    %{
    #include "simpleCalc.tab.h"
    extern int yylval;
    %}
    %%
    [0-9]+    { yylval = atoi(yytext); return NUMBER; }
    [ \t]     ;                   /* ignore white space */
    \n        return 0;           /* logical EOF */
    .         return yytext[0];
    %%
    /* 5. Other C code that we need. */
    void yyerror(const char *errmsg) { fprintf(stderr, "%s\n", errmsg); }
    int main(void) { return yyparse(); }
simpleCalc.tab.h:
    #ifndef YYTOKENTYPE
    # define YYTOKENTYPE
    /* Put the tokens into the symbol table, so that GDB and other debuggers know about them. */
    enum yytokentype { NAME = 258, NUMBER = 259 };
    #endif
    /* Tokens. */
    #define NAME 258
    #define NUMBER 259
Explanation: When the lexer recognizes a number ([0-9]+) it returns the token NUMBER and sets yylval to the corresponding integer value. When the lexer sees a newline it returns 0. If it sees a space or tab it ignores it. When it sees any other character it returns that character (the first character in the yytext buffer). If yyparse recognizes it, good! Otherwise the parser can generate an error.

Understanding simpleCalc, continued
    %token NAME NUMBER
    %%
    statement:  NAME '=' expression
             |  expression                { printf("= %d\n", $1); }
             ;
    expression: expression '+' NUMBER     { $$ = $1 + $3; }
             |  expression '-' NUMBER     { $$ = $1 - $3; }
             |  NUMBER                    { $$ = $1; }
             ;
Explanation: execute simpleCalc and enter the expression 1+2.
1. The main program calls yyparse.
2. yyparse calls lex to recognize 1 as a NUMBER (puts 1 in yylval), then reduces NUMBER to expression, which sets $$ = $1.
3. It calls lex, which returns +; this matches '+' in the first expression rhs.
4. It calls lex to recognize 2 as a NUMBER (puts 2 in yylval).
5. It recognizes expression + NUMBER and “reduces” this rule, doing the action { $$ = $1 + $3 }.
6. It recognizes expression as a statement, so it does the printf action.
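
Because the rules are left-recursive (expression: expression '+' NUMBER), values are combined from left to right as each operator and NUMBER arrive. The C sketch below reproduces that order with a plain loop; it is not an LR parser, and eval_expression and read_number are hypothetical names. It assumes input with no white space, such as 3+6-4.

```c
#include <ctype.h>

/* Read an unsigned decimal number at *cursor; return 0 on success, -1 otherwise. */
static int read_number(const char **cursor, int *value)
{
    const char *p = *cursor;

    if (!isdigit((unsigned char)*p))
        return -1;
    *value = 0;
    while (isdigit((unsigned char)*p)) {
        *value = *value * 10 + (*p - '0');
        p++;
    }
    *cursor = p;
    return 0;
}

/* Evaluate NUMBER (('+' | '-') NUMBER)* left to right.
   Returns 0 and stores the value in *result, or -1 on a syntax error. */
int eval_expression(const char *text, int *result)
{
    int value, rhs;

    if (read_number(&text, &value) != 0)     /* expression: NUMBER */
        return -1;
    while (*text == '+' || *text == '-') {
        char op = *text++;
        if (read_number(&text, &rhs) != 0)
            return -1;
        /* expression: expression op NUMBER   { $$ = $1 op $3; } */
        value = (op == '+') ? value + rhs : value - rhs;
    }
    if (*text != '\0')
        return -1;                           /* trailing input is an error */
    *result = value;
    return 0;
}
```

For "10-2-3" the sketch computes (10-2)-3 = 5, matching the grouping the left-recursive grammar produces; a right-recursive grammar would instead group it as 10-(2-3).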

Adding other variable types*
YYSTYPE determines the data type of the values returned by the lexer. If the lexer returns different types depending on what is read, include a union:
    %union {          // C feature, allows one memory area to
        char cval;    // be interpreted in different ways.
        char *sval;   // For bison, will be used with yylval
        int ival;
    }
The union will be placed at the top of your .y file (in the definitions section). Tokens and non-terminals should be defined using the union.
* relates to SimpleCalc exercise 2

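Adding other variable types - Example
Definitions in simpleCalc.y:
    %union {
        float fval;
        int ival;
    }
    %token <ival> NUMBER
    %token <fval> FNUMBER
    %type expression
Each %token or %type declaration names, in angle brackets, the union member that holds that symbol's value; the %type declaration for expression takes such a tag as well.
Use the union in rules in simpleCalc.l:
    {DIGIT}+    { yylval.ival = atoi(yytext); return NUMBER; }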
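
The union itself is ordinary C: one storage area, with the token code telling you which member is currently valid. The sketch below shows that pairing outside of bison. SketchValue, store_number and the TOKEN_* codes are hypothetical names, and deciding by the presence of a dot is a simplification of what the two flex patterns would do.

```c
#include <stdlib.h>
#include <string.h>

/* Plays the role of YYSTYPE in this sketch. */
typedef union {
    float fval;
    int ival;
} SketchValue;

enum { TOKEN_INT = 1, TOKEN_FLOAT = 2 };   /* hypothetical token codes */

/* Store the number in the matching union member and report which one is valid. */
int store_number(const char *text, SketchValue *value)
{
    if (strchr(text, '.') != NULL) {
        value->fval = (float)atof(text);   /* like yylval.fval = atof(yytext) */
        return TOKEN_FLOAT;
    }
    value->ival = atoi(text);              /* like yylval.ival = atoi(yytext) */
    return TOKEN_INT;
}
```

Reading the member that was not written last gives a meaningless value, which is why the parser needs each token and non-terminal to be declared with the member it uses.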

Processing lexemes in flex*
Sometimes you want to modify a lexeme before it is passed to bison. This can be done by putting a function call in the flex rules.
Example: to convert input to lower case
- put a prototype for your function in the definition section (above the first %%)
- write the function definition in the C-code section (bottom of file)
- call your function when the token is recognized. Use strdup to pass the value to bison.
* relates to SimpleCalc exercise 3

Example continued
    %{
    #include <ctype.h>    /* tolower */
    #include <string.h>   /* strdup */
    #include "example.tab.h"
    void make_lower(char *text_in);   /* need prototype here */
    %}
    %%
    [a-zA-Z]+    { make_lower(yytext);             /* function call to process text */
                   yylval.sval = strdup(yytext);   /* make duplicate using strdup */
                   return KEYWORD; }               /* return token type */
    %%
    /* function code in C section */
    void make_lower(char *text_in)
    {
        int i;
        for (i = 0; text_in[i] != '\0'; ++i)
            text_in[i] = tolower((unsigned char)text_in[i]);
    }
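
The copy matters because yytext points into the scanner's own buffer, which is reused for the next match; a pointer saved without copying would later refer to different text. The C sketch below shows the effect with an ordinary array standing in for that buffer. copy_lexeme is a hypothetical, portable stand-in for strdup (which is POSIX rather than ISO C before C23).

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Convert a NUL-terminated string to lower case in place. */
void make_lower(char *text)
{
    size_t i;

    for (i = 0; text[i] != '\0'; ++i)
        text[i] = (char)tolower((unsigned char)text[i]);
}

/* Return a heap copy of text, or NULL if allocation fails.
   The caller owns the copy and must free() it. */
char *copy_lexeme(const char *text)
{
    size_t n = strlen(text) + 1;
    char *copy = malloc(n);

    if (copy != NULL)
        memcpy(copy, text, n);
    return copy;
}
```

If a buffer holding "BEGIN" is lowered, copied, and then overwritten with the next lexeme, the copy still reads "begin"; whoever receives the copy is responsible for freeing it.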

Adding actions to rules*
For more complex processing, functions can be added to bison. Remember to add a prototype at the top, and the function at the bottom.
* relates to SimpleCalc exercise 4

Processing more than one line*
To process more than one line, ensure the \n is simply ignored. Use a recursive rule to allow multiple inputs.
* relates to SimpleCalc exercise 4

Summary of steps (from online manual)
The actual language-design process using Bison, from grammar specification to a working compiler or interpreter, has these parts:
1. Formally specify the grammar in a form recognized by Bison (i.e., machine-readable BNF). For each grammatical rule in the language, describe the action that is to be taken when an instance of that rule is recognized. The action is described by a sequence of C statements.
2. Write a lexical analyzer to process input and pass tokens to the parser.
3. Write a controlling function (main) that calls the Bison-produced parser.
4. Write error-reporting routines.