Lexical and Parser Tools
CSE 413, Autumn 2002, Programming Languages
9-December-2002, cse413-23-Tools, © 2002 University of Washington


References
» Modern Compiler Implementation in Java, Appel
» lex & yacc, Levine, Mason, Brown
» The Lex & Yacc Page (C)
» GNU flex and bison (C)
» JLex and CUP (Java)

Structure of a Compiler
First approximation
» Front end: analysis. Read the source program and understand its structure and meaning
» Back end: synthesis. Generate an equivalent target language program
[Diagram: Source → Front End → Back End → Target]

Front End
Split into two parts
» Scanner: converts the character stream to a token stream; also strips out white space and comments
» Parser: reads the token stream and generates the IR
Both of these can be generated automatically or written by hand
» Source language specified by a formal grammar
» Tools read the grammar and generate the scanner and parser (either table-driven or hard coded)
[Diagram: source → Scanner → tokens → Parser → IR]
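To make the scanner's job concrete, here is a minimal hand-written sketch in C. It is not the course's scanner: the token kinds, the struct token layout, and the function name next_token are all hypothetical, and the only comment form handled is the // line comment.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch only. */
enum token_kind { TOK_EOF, TOK_IDENT, TOK_NUMBER, TOK_PUNCT };

struct token {
    enum token_kind kind;
    const char *start;   /* first character of the lexeme */
    size_t length;       /* number of characters in the lexeme */
};

/* Scan one token starting at *cursor and advance *cursor past it.
   White space and // comments are skipped and never returned. */
struct token next_token(const char **cursor)
{
    const char *p = *cursor;
    struct token tok;

    for (;;) {
        while (isspace((unsigned char)*p))
            p++;
        if (p[0] == '/' && p[1] == '/') {
            while (*p != '\0' && *p != '\n')
                p++;
            continue;            /* look for more white space or comments */
        }
        break;
    }

    tok.start = p;
    if (*p == '\0') {
        tok.kind = TOK_EOF;
    } else if (isalpha((unsigned char)*p)) {
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
        tok.kind = TOK_IDENT;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            p++;
        tok.kind = TOK_NUMBER;
    } else {
        p++;                     /* any other character is its own token */
        tok.kind = TOK_PUNCT;
    }
    tok.length = (size_t)(p - tok.start);
    *cursor = p;
    return tok;
}
```

A caller would keep a const char * cursor into the source text and call next_token(&cursor) until it returns TOK_EOF. A generated scanner does the same work from a pattern table instead of hand-written branches.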

Lex and Yacc
[Diagram 1: source → Scanner → tokens → Parser → parse tree → rest of the application. The scanner is the lexical analyzer generated by Lex and is called as yylex(); the parser is generated by Yacc and is called as yyparse().]
[Diagram 2: source → Scanner → data structures → rest of the application. Lex used on its own, with the scanner called as yylex().]

Lex
Lex is a lexical analyzer generator
Lex helps write programs whose control flow is directed by instances of regular expressions in the input stream
» editor-script type transformations
» segmenting input in preparation for a parsing routine, i.e., tokenizing
You define the scanner by providing the patterns to recognize and the actions to take

Lex Input
Declarations - optional user-supplied header code
Definitions - optional definitions to simplify the Rules section
Rules - token definition patterns and associated actions in C
User subroutines - optional helper functions

%{
Declarations
%}
Definitions
%%
Rules
%%
User Subroutines

Lex Output
Generates a scanner function written in C
» the file is lex.yy.c
» the function is yylex()
» yylex() implements the DFA that corresponds to the regular expressions you supplied, using:
    a transition table for the DFA
    action code invoked at the accept states
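The table-plus-actions idea can be sketched in a few lines of C. This is an illustration under assumptions, not the code Lex emits: the two patterns ([a-z]+ and [0-9]+), the state names, and the function dfa_match are hypothetical, and real generated scanners use compressed tables.

```c
#include <stddef.h>

/* Hypothetical DFA for two patterns: [a-z]+ and [0-9]+. */
enum { S_START, S_IDENT, S_NUMBER, S_DEAD, NUM_STATES };
enum { C_LETTER, C_DIGIT, C_OTHER, NUM_CLASSES };
enum { ACCEPT_NONE, ACCEPT_IDENT, ACCEPT_NUMBER };

/* transition[state][character class] gives the next state. */
static const int transition[NUM_STATES][NUM_CLASSES] = {
    /* S_START  */ { S_IDENT, S_NUMBER, S_DEAD },
    /* S_IDENT  */ { S_IDENT, S_DEAD,   S_DEAD },
    /* S_NUMBER */ { S_DEAD,  S_NUMBER, S_DEAD },
    /* S_DEAD   */ { S_DEAD,  S_DEAD,   S_DEAD },
};

/* Which action, if any, fires when the DFA stops in each state. */
static const int accept_action[NUM_STATES] = {
    ACCEPT_NONE, ACCEPT_IDENT, ACCEPT_NUMBER, ACCEPT_NONE
};

static int char_class(char c)
{
    if (c >= 'a' && c <= 'z') return C_LETTER;
    if (c >= '0' && c <= '9') return C_DIGIT;
    return C_OTHER;
}

/* Run the DFA from the start of input, remembering the last accepting
   state seen (longest match). Returns the action to run and stores the
   length of the matched lexeme in *match_len (0 if nothing matched). */
int dfa_match(const char *input, size_t *match_len)
{
    int state = S_START;
    int action = ACCEPT_NONE;
    size_t i;

    *match_len = 0;
    for (i = 0; input[i] != '\0'; i++) {
        state = transition[state][char_class(input[i])];
        if (state == S_DEAD)
            break;
        if (accept_action[state] != ACCEPT_NONE) {
            action = accept_action[state];
            *match_len = i + 1;
        }
    }
    return action;
}
```

The scanner loop would call something like dfa_match at the current input position, run the C action selected by the returned code, and then skip match_len characters.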

Lex Rules
Lines in the rules section have the form
    expression action
expression is the pattern to recognize
» regular expressions as we have defined them in class, with extensions for convenience
action is the action to take when the pattern is recognized
» arbitrary C code that becomes part of the generated scanner code

Pattern definition match operators

x        the character "x"
[xy]     any character selected from the list given (in this case, x or y)
[x-z]    any character from the range given (in this case, x, y or z)
[^x]     any character but x, i.e., the complement of the character(s) specified
.        any single character but newline
x?       an optional x, i.e., 0 or 1 occurrences of x
x*       0 or more instances of x
x+       1 or more instances of x, equivalent to xx*
x{m,n}   m to n occurrences of x
^x       an x at the beginning of a line
x$       an x at the end of a line
\x       an "x", even if x otherwise has some special meaning
"xy"     the string "xy", even if xy would otherwise have some special meaning
x|y      an x or a y
x/y      an x, but only if followed by y
{xx}     the translation of xx from the definitions section
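Several of these operators can be tried without running Lex, because POSIX extended regular expressions share them. The sketch below uses the POSIX regcomp/regexec interface, which is a stand-in and not part of Lex; the helper name ere_matches is hypothetical. The Lex-only forms x/y, "xy", and {xx} have no direct POSIX equivalent and are not covered.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if text matches the POSIX extended regular expression,
   0 if it does not, and -1 if the pattern fails to compile. */
int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int status;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    status = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return status == 0;
}
```

For example, ere_matches("^a{2,3}$", "aaa") exercises x{m,n}, and ere_matches("^ab?c$", "ac") exercises x?. The ^ and $ anchors are needed here because regexec otherwise accepts a match anywhere in the text, whereas a Lex rule always matches at the current input position.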

Actions
When an expression is matched, Lex executes the corresponding action.
» the action is defined as one or more C statements
» if a section of the input is not matched by any pattern, the default action is taken, which copies the input to the output
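The effect of one rule plus the default action can be imitated by hand. The sketch below assumes a single hypothetical rule, [0-9]+ with an action that writes '#', and copies everything else through unchanged, as the default action would. The function name filter_numbers and its buffer-based interface are inventions for this example.

```c
#include <ctype.h>
#include <stddef.h>

/* Rewrite in into out: every run of digits (the pattern [0-9]+) is
   replaced by a single '#', and every other character is copied
   unchanged (the default action). Returns the number of characters
   written, not counting the terminating NUL, or -1 if out is too small. */
long filter_numbers(const char *in, char *out, size_t out_size)
{
    size_t n = 0;

    while (*in != '\0') {
        if (n + 1 >= out_size)
            return -1;                    /* no room for char plus NUL */
        if (isdigit((unsigned char)*in)) {
            while (isdigit((unsigned char)*in))
                in++;                     /* consume the longest match */
            out[n++] = '#';               /* the rule's action */
        } else {
            out[n++] = *in++;             /* default action: echo */
        }
    }
    if (out_size == 0)
        return -1;
    out[n] = '\0';
    return (long)n;
}
```

With input "room 101, floor 7" the output buffer holds "room #, floor #". This is the editor-script style of transformation that Lex is suited to.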

Context for the action code
Several global variables are defined at the point when the action code is called
» yytext - null-terminated string containing the lexeme
» yyleng - the length of the yytext string
» yylval - structured variable holding attributes of the recognized token
» yylloc - structured variable holding location information about the recognized token
yylval and yylloc are supplied by the Yacc/bison parser rather than by Lex itself; the scanner's actions fill them in for the parser to read.

count.l - count chars, words, lines
The declarations sit between %{ and %}, the rules between the two %% lines, and the user code after the second %%.

%{
int numchar = 0, numword = 0, numline = 0;
%}
%%
\n          { numline++; numchar++; }
[^ \t\n]+   { numword++; numchar += yyleng; }
.           { numchar++; }
%%
int main(void)
{
    yylex();
    printf("%d\t%d\t%d\n", numchar, numword, numline);
    return 0;
}
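As a cross-check on what the three rules compute, here is a hand-written C equivalent that works on a string instead of standard input. It is a sketch for comparison only; struct counts and count_text are hypothetical names, and the branches mirror the rules in the order they appear in count.l.

```c
/* Totals kept by the three rules of count.l. */
struct counts {
    int numchar;
    int numword;
    int numline;
};

/* Apply the same three rules to a NUL-terminated string. */
struct counts count_text(const char *text)
{
    struct counts c = { 0, 0, 0 };
    const char *p = text;

    while (*p != '\0') {
        if (*p == '\n') {
            /* rule 1: \n */
            c.numline++;
            c.numchar++;
            p++;
        } else if (*p != ' ' && *p != '\t') {
            /* rule 2: [^ \t\n]+ takes the longest run of non-blanks */
            const char *start = p;
            while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '\n')
                p++;
            c.numword++;
            c.numchar += (int)(p - start);
        } else {
            /* rule 3: . matches the remaining single characters */
            c.numchar++;
            p++;
        }
    }
    return c;
}
```

For the text "hello world\nfoo\tbar\n" this yields 20 characters, 4 words, and 2 lines. A single non-blank character matches both rule 2 and rule 3 with the same length, and Lex resolves such ties in favor of the earlier rule, which is why it counts as a word.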

Create and run count

cse413]$ flex count.l
cse413]$ gcc -o count lex.yy.c -ll
cse413]$ ./count < count.l
cse413]$ ./count < lex.yy.c

» flex count.l creates the scanner source file lex.yy.c
» gcc builds the executable program count (-ll links the Lex library; some flex installations use -lfl instead)
» the first run applies the program to the definition file count.l
» the second run applies the program to the generated source file lex.yy.c

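histo.l - histogram of word lengths
The declarations come first, then the rules, then the user code. There is no main() here: the default main() from the Lex library calls yylex(), and yywrap() runs at end of input to print the table.

%{
int lengs[100];
%}
%%
[a-zA-Z]+   { if (yyleng < 100) lengs[yyleng]++; /* ignore words too long for the table */ }
.|\n        ;
%%
int yywrap(void)
{
    int i;
    printf("Length No. words\n");
    for (i = 0; i < 100; i++)
        if (lengs[i] > 0)
            printf("%5d%10d\n", i, lengs[i]);
    return 1;   /* 1 tells the scanner there is no more input */
}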

Create and run histo

cse413]$ flex histo.l
cse413]$ gcc -o histo lex.yy.c -ll
cse413]$ ./histo < histo.l
Length No. words

» flex histo.l creates the scanner source file lex.yy.c
» gcc builds the executable program histo
» the run applies the program to the definition file histo.l

Yacc: Yet Another Compiler Compiler
Yacc provides a general tool for describing the input to a computer program.
» i.e., Yacc helps write programs whose actions are directed by a language generated by some grammar
The Yacc user specifies the structures of the input, together with code to be invoked as each such structure is recognized.
» Yacc turns the specification into a subroutine that handles the input process

Yacc Input
Declarations - optional user-supplied header code
Definitions - optional definitions to simplify the Rules section
Productions - the grammar to parse and the associated actions
User subroutines - optional helper functions

%{
Declarations
%}
Definitions
%%
Productions
%%
User Subroutines

Yacc Output
Generates a parser function written in C
» the file is y.tab.c (bison names it after the grammar file instead, e.g. dp.tab.c for dp.y)
» the function is yyparse()
» yyparse() implements a bottom-up, LALR(1) parser for the grammar you supplied

Yacc Productions
Lines in the productions section have the form
    production action
production is the grammar expression
» almost exactly the same as the productions we have defined in class
action is the action to take when the production is recognized
» arbitrary C code that becomes part of the generated parser code

Example productions

parameters :
      parameter
    | parameters ',' parameter
    ;

parameter :
      T_KW_INT T_ID
        { printf("PARSED PARAMETER %s\n", $2); }
    ;

declarations :
      declaration
    | declarations declaration
    ;

declaration :
      T_KW_INT T_ID ';'
        { printf("PARSED LOCAL VARIABLE %s\n", $2); }
    ;
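The left-recursive parameters production describes one or more parameter items separated by commas. The sketch below recognizes the same token sequences with a plain loop, which is not how the LALR(1) parser generated by Yacc works internally; it only shows which inputs the grammar accepts. The token codes and the function count_parameters are hypothetical.

```c
#include <stddef.h>

/* Hypothetical token codes for this sketch; 0 marks the end of the list. */
enum { DEMO_END = 0, DEMO_KW_INT = 1, DEMO_ID = 2, DEMO_COMMA = ',' };

/* Recognize the language of
       parameters : parameter | parameters ',' parameter ;
       parameter  : T_KW_INT T_ID ;
   over a DEMO_END-terminated token array. Returns the number of
   parameters found, or -1 if the tokens do not match the grammar. */
int count_parameters(const int *tokens)
{
    int count = 0;
    size_t i = 0;

    for (;;) {
        /* parameter : T_KW_INT T_ID */
        if (tokens[i] != DEMO_KW_INT || tokens[i + 1] != DEMO_ID)
            return -1;
        i += 2;
        count++;

        if (tokens[i] == DEMO_END)
            return count;          /* the list ends after a parameter */
        if (tokens[i] != DEMO_COMMA)
            return -1;
        i++;                       /* a comma must be followed by another parameter */
    }
}
```

An empty list and a list ending in a comma are both rejected, matching the grammar, which has no empty alternative for parameters.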

Production format
A grammar production has the form
» A : BODY ;
» A is the non-terminal
» BODY is a sequence of zero or more non-terminals and terminals
» ':' takes the place of '::=' or '→' in our grammars
» ';' means the end of the production or set of productions

Actions
When a production is matched, Yacc executes the corresponding action.
» the action is defined as one or more C statements
» the statements can do anything
» if no action code is specified, the default action is to return the value of the first item on the right-hand side of the production (that is, $$ = $1) for use in higher-level accumulations

Context for the action code
The values of the items in the production are available when the action code is called
You can set the value of this non-terminal by setting $$
You can access the values of the parsed items with $1, $2, etc.

parameter :
      T_KW_INT T_ID
        { printf("PARSED PARAMETER %s\n", $2); }
    ;
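One way to picture $$ and $1, $2, ... is as positions on a stack of semantic values that the parser maintains beside its state stack. The sketch below models only that value stack, with int values and a hypothetical production sum : sum '+' NUMBER { $$ = $1 + $3; }. All names here (value_stack, vs_dollar, vs_reduce, and so on) are made up for illustration and are not Yacc's internals.

```c
#include <assert.h>

#define STACK_MAX 64

/* A stack of semantic values; top is the number of values held. */
struct value_stack {
    int values[STACK_MAX];
    int top;
};

void vs_push(struct value_stack *s, int v)
{
    assert(s->top < STACK_MAX);
    s->values[s->top++] = v;
}

/* Value of $i for a production whose right-hand side has rhs_len
   symbols: the symbols occupy the top rhs_len stack slots. */
int vs_dollar(const struct value_stack *s, int rhs_len, int i)
{
    assert(1 <= i && i <= rhs_len && rhs_len <= s->top);
    return s->values[s->top - rhs_len + (i - 1)];
}

/* Finish a reduction: pop the right-hand-side values and push $$. */
void vs_reduce(struct value_stack *s, int rhs_len, int result)
{
    assert(rhs_len <= s->top);
    s->top -= rhs_len;
    vs_push(s, result);
}

/* Reduce by the hypothetical  sum : sum '+' NUMBER { $$ = $1 + $3; } */
void reduce_sum(struct value_stack *s)
{
    int result = vs_dollar(s, 3, 1) + vs_dollar(s, 3, 3);
    vs_reduce(s, 3, result);
}

/* Reduce with no action code: the default is $$ = $1. */
void reduce_default(struct value_stack *s, int rhs_len)
{
    vs_reduce(s, rhs_len, vs_dollar(s, rhs_len, 1));
}
```

Pushing 2, a placeholder for '+', and 5, then calling reduce_sum, leaves a single value 7 on the stack, which is what a later production would see as one of its own $n values.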

dp.y: declarations and definitions

%{
/* miscellaneous C declarations */
#include <stdio.h>          /* printf, fprintf used in actions and yyerror */
#define YYERROR_VERBOSE 1
int yylex (void);           /* scanner generated from dp.l */
void yyerror (char *s);
%}

/* token attributes */
%union {
    int intValue;
    char *stringValue;
}

/* tokens and keywords */
%token <stringValue> T_ID
%token <intValue> T_INT
%token T_KW_INT
%token T_KW_RETURN
%token T_KW_IF
%token T_KW_ELSE
%token T_KW_WHILE
%token T_OP_EQ
%token T_OP_ASSIGN
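The %union and %token lines determine what the scanner and parser share: a set of integer token codes and a C union that carries each token's attribute. The sketch below imitates that arrangement with hypothetical names (demo_token, demo_value, format_token). The numeric codes are an assumption modeled on bison's habit of numbering named tokens from 258 upward, leaving smaller values for single-character tokens such as ';'. The real definitions come from the generated header, not from this code.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the generated token codes and value union. */
enum demo_token { DEMO_T_ID = 258, DEMO_T_INT = 259, DEMO_T_KW_INT = 260 };

union demo_value {
    int intValue;          /* attribute of T_INT */
    char *stringValue;     /* attribute of T_ID */
};

/* Write a printable form of a token into out, choosing the union member
   that matches the token code. Returns the length snprintf reports. */
int format_token(int token, union demo_value value, char *out, size_t out_size)
{
    switch (token) {
    case DEMO_T_ID:
        return snprintf(out, out_size, "T_ID(%s)", value.stringValue);
    case DEMO_T_INT:
        return snprintf(out, out_size, "T_INT(%d)", value.intValue);
    case DEMO_T_KW_INT:
        return snprintf(out, out_size, "T_KW_INT");
    default:
        if (token > 0 && token < 256)   /* single-character token */
            return snprintf(out, out_size, "'%c'", token);
        return snprintf(out, out_size, "token %d", token);
    }
}
```

Only the code that knows the token kind can tell which union member is valid, which is why a token that carries a value has to be associated with a specific member of the union.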

dp.y: productions and actions

%%
program :
      functionDefinition
    | program functionDefinition
    ;

functionDefinition :
      T_KW_INT T_ID '(' ')' '{' statements '}'
        { printf("PARSED FUNCTION %s\n", $2); }
    | T_KW_INT T_ID '(' parameters ')' '{' statements '}'
        { printf("PARSED FUNCTION %s\n", $2); }
    | T_KW_INT T_ID '(' ')' '{' declarations statements '}'
        { printf("PARSED FUNCTION %s\n", $2); }
    | T_KW_INT T_ID '(' parameters ')' '{' declarations statements '}'
        { printf("PARSED FUNCTION %s\n", $2); }
    ;

etc, etc

Look familiar?

dp.y: user subroutines

%%
/* simple error reporting function */
void yyerror (char *s)
{
    fprintf (stderr, "Err: %s\n", s);
}

/* simple main program */
int main (void)
{
    return yyparse ();
}

dp.l: lexical (part 1)

%{
#include <stdlib.h>   /* atoi */
#include <string.h>   /* strdup */
#include "dp.tab.h"
%}
alpha       [a-zA-Z]
numeric     [0-9]
comment     "//".*$
whitespace  [ \t\n\r\v\f]+
%%
[!>+\-*(){},;]  { return yytext[0]; }
"int"           { return T_KW_INT; }
"return"        { return T_KW_RETURN; }
"if"            { return T_KW_IF; }
"else"          { return T_KW_ELSE; }
"while"         { return T_KW_WHILE; }
"=="            { return T_OP_EQ; }
"="             { return T_OP_ASSIGN; }

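dp.l: lexical (part 2)

{whitespace}                    { }
{alpha}({alpha}|{numeric}|_)*   { yylval.stringValue = strdup(yytext); return T_ID; }
{numeric}+                      { yylval.intValue = atoi(yytext); return T_INT; }
{comment}                       { }

Because the keyword rules in part 1 come before the identifier rule, text such as "int" is returned as T_KW_INT rather than T_ID: when two patterns match the same length of input, Lex picks the rule listed first.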
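The identifier rule copies yytext with strdup before returning, because the scanner reuses the yytext storage for the next token. The sketch below imitates that reuse with a hypothetical buffer, demo_yytext, and a stand-in demo_scan function; it is not flex's actual buffer management. The copy is made with malloc and memcpy, since strdup is a POSIX function and not part of older ISO C standards.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for yytext: one buffer that every scan overwrites. */
char demo_yytext[64];

/* Pretend to scan a token by placing its lexeme in the shared buffer. */
void demo_scan(const char *lexeme)
{
    strncpy(demo_yytext, lexeme, sizeof demo_yytext - 1);
    demo_yytext[sizeof demo_yytext - 1] = '\0';
}

/* Make a private heap copy of the current lexeme, as strdup would.
   The caller owns the result and must free() it. Returns NULL if
   memory cannot be allocated. */
char *save_lexeme(const char *text)
{
    size_t n = strlen(text) + 1;
    char *copy = malloc(n);

    if (copy != NULL)
        memcpy(copy, text, n);
    return copy;
}
```

A pointer saved directly from demo_yytext shows the newest lexeme after the next demo_scan call, while the pointer returned by save_lexeme still holds the old one. The parser actions in dp.y print $2 well after the scanner has moved on, so they need the copy.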

Sample run of the dp parser

cse413]$ ./parse < scanx.d
PARSED PARAMETER x
PARSED PARAMETER y
PARSED FUNCTION someFunction
PARSED PARAMETER x
PARSED LOCAL VARIABLE aSimpleInt
PARSED LOCAL VARIABLE a_S_I_2
PARSED LOCAL VARIABLE k
PARSED LOCAL VARIABLE z
PARSED FUNCTION main
cse413]$

Summary
The actions can be any code you need
If you ever need to build a program that parses character strings into data structures
» think Lex/Yacc, flex/bison, JLex/CUP
You can quickly build a solid parser with well-defined tokens and grammar
» tokens are regular expressions
» grammar is standard Backus-Naur Form