Tools for building compilers Clara Benac Earle

Tools to help build a compiler
C
– Lexical analyzer generators: Lex, flex
– Syntax analyzer generator: yacc
Java
– Lexical analyzer generators: JLex, JFlex
– Syntax analyzer generator: CUP
These tools and their documentation can be found on the internet.

Lex: Lexical Analyzer Generator
example.l → Lex compiler → lex.yy.c → C compiler → a.exe

Description
– A tool for generating scanners
– The scanner is described as pairs of regular expressions and C code
– Flex generates as output a C source file, lex.yy.c, which defines a routine yylex()
– This file is compiled to produce an executable
– When the executable is run, it analyzes its input for occurrences of the regular expressions. Whenever it finds one, it executes the corresponding C code

Format of the input file
The flex input file consists of three sections, separated by lines containing only %%:
Definitions
%%
Rules
%%
User code

Skeleton of a lex specification (.l file)
%{
(this part will be embedded into *.c)
%}
[DEFINITION SECTION] (substitutions, code and start states; will be copied into *.c)
%%
[RULES SECTION] (define how to scan and what action to take for each token)
%%
[USER CODE SECTION] (any user code; for example, a main function to call the scanning function yylex())

The definition section
– Contains name definitions and declarations of start conditions
– Name definitions have the form: name definition
– Examples:
DIGIT [0-9]
ID [a-z][a-z0-9]*

The rules section
Form:
%%
pattern { action }
…
%%
Patterns are specified by regular expressions
Example:
%%
[A-Za-z]* { printf("this is a word"); }
%%

Extended regular expressions
x        match the character "x"
.        any character except newline
[ ]      a character class
[xy]     match either an "x" or a "y"
[a-z]    match any letter from "a" to "z"
[^a-z]   any character but those in the class
r*       zero or more r's
r+       one or more r's
r?       zero or one r
{name}   the expansion of the name definition

Extended regular expressions
x|y      x or y
x/y      x, only if followed by y (y not removed from input)
x{m,n}   m to n occurrences of x
^x       x, but only at beginning of line
x$       x, but only at end of line
"s"      exactly what is in the quotes (except for "\" and following character)
A regular expression finishes with a space, tab or newline

Meta-characters
– meta-characters (do not match themselves, because they are used in the preceding regular expressions): ( ) [ ] { } + / , ^ * | . \ " $ ? - %
– to match a meta-character, prefix it with "\"
– to match a backslash, tab or newline, use \\, \t, or \n

Regular Expression Examples
– an integer: [1-9][0-9]*
– a word (e.g. cat): [a-zA-Z]+
– a (possibly) signed integer: [-+]?[1-9][0-9]*
– a floating point number: [0-9]*"."[0-9]+
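The patterns above can be tried out without running lex. The following is a minimal sketch in C that assumes a POSIX system providing <regex.h>. POSIX extended regular expressions are close to, but not identical with, lex syntax, so the quoted "." of the lex pattern is written here as an escaped \. instead. The helper matches_whole and the macro names are hypothetical and are not part of lex.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* The slide's patterns, rewritten as POSIX extended regular expressions. */
#define INTEGER        "[1-9][0-9]*"
#define WORD           "[a-zA-Z]+"
#define SIGNED_INTEGER "[-+]?[1-9][0-9]*"
#define FLOATING       "[0-9]*\\.[0-9]+"

/* Returns 1 if the whole of `text` matches `pattern`, 0 otherwise.
 * Hypothetical helper for experimenting; not a lex or flex function. */
int matches_whole(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int matched;

    assert(pattern != NULL && text != NULL);

    /* Anchor the pattern so that it has to cover the entire string. */
    int n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return 0;

    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    matched = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

With these definitions, matches_whole(INTEGER, "42") returns 1, while matches_whole(INTEGER, "042") returns 0 because the first digit must be in the range 1 to 9.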

Two Rules
1. Lex will always match the longest (in number of characters) token possible.
2. If two or more possible tokens are of the same length, the token whose regular expression is defined first in the lex specification is favored.
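The two rules can be illustrated with a small hand-written simulation in C. This is only a sketch of the selection policy: a generated scanner works from transition tables and does not try each rule in turn. The two "rules" (a keyword if and an identifier [a-z]+) and every function name here are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Each hypothetical rule reports how many leading characters of s it
 * matches; 0 means the rule does not match at all. */
static size_t match_keyword_if(const char *s)   /* "if" */
{
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

static size_t match_identifier(const char *s)   /* [a-z]+ */
{
    size_t n = 0;
    while (islower((unsigned char)s[n]))
        n++;
    return n;
}

typedef size_t (*matcher_fn)(const char *);

/* Rules in specification order: index 0 is the rule "defined first". */
static const matcher_fn rules[] = { match_keyword_if, match_identifier };

/* Returns the index of the winning rule, or -1 if no rule matches.
 * The length of the winning match is stored in *len. */
int pick_rule(const char *input, size_t *len)
{
    int best = -1;
    size_t best_len = 0;

    assert(input != NULL && len != NULL);
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i](input);
        /* Rule 1: a strictly longer match wins.
         * Rule 2: on a tie, the earlier rule is kept. */
        if (n > best_len) {
            best = (int)i;
            best_len = n;
        }
    }
    *len = best_len;
    return best;
}
```

For the input "if" both rules match two characters, so the keyword rule (index 0) wins by the second rule. For "iffy" the identifier rule matches four characters and wins by the first rule.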

How the input is matched
Once the match is determined, the text corresponding to the match is made available in the global character pointer yytext, and its length in the global integer yyleng. The action corresponding to the matched pattern is then executed, and then the remaining input is scanned for another match.
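The match, act, continue cycle described above can be sketched by hand in C. The variables tok_text and tok_leng are hypothetical stand-ins for yytext and yyleng, and the two patterns (words and unsigned numbers) are chosen only for illustration. Unlike yytext, tok_text points into the input and is not NUL-terminated after tok_leng characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-ins for yytext and yyleng. */
static const char *tok_text;
static size_t tok_leng;

/* Scans input, counting words ([A-Za-z]+) and numbers ([0-9]+).
 * Returns the total number of characters covered by matched tokens. */
size_t count_tokens(const char *input, int *words, int *numbers)
{
    const char *p = input;
    size_t total = 0;

    assert(input != NULL && words != NULL && numbers != NULL);
    *words = 0;
    *numbers = 0;

    while (*p != '\0') {
        size_t n = 0;
        if (isalpha((unsigned char)*p)) {
            while (isalpha((unsigned char)p[n]))
                n++;
            tok_text = p;           /* the match is made available ... */
            tok_leng = n;
            (*words)++;             /* ... and the action is executed */
            total += tok_leng;
        } else if (isdigit((unsigned char)*p)) {
            while (isdigit((unsigned char)p[n]))
                n++;
            tok_text = p;
            tok_leng = n;
            (*numbers)++;
            total += tok_leng;
        } else {
            n = 1;                  /* no pattern applies: skip one character */
        }
        p += n;                     /* scan the remaining input */
    }
    return total;
}
```

For the input "abc 12 de" this sketch counts two words and one number and returns 7, the combined length of the three tokens.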

Actions
– Can be any arbitrary C statement
– Normally they are written between {}
– If the action is empty, then when the pattern is matched the input token is simply discarded
– The action "|" means "same as the action for the next rule"

Actions: examples
%%
[ \t\n]+ ;
":=" return ASIG;
"<" return MINOR;
"if" return IF;

Start conditions
A mechanism for conditionally activating rules. A rule prefixed with <comment> is active only while the scanner is in the comment start condition.
%s comment
%%
"/*" { BEGIN comment; }
<comment>"*/" { BEGIN 0; /* back to the initial state */ }
<comment>. { }
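The idea behind start conditions can be sketched as an explicit state variable in C. The sketch below loosely mirrors the comment example: one state stands for the initial condition and one for the comment condition, and each branch is active in only one of them. It is a hand-written illustration, not the code flex generates, and the names INITIAL_SC, COMMENT_SC and strip_comments are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the two start conditions. */
enum start_condition { INITIAL_SC, COMMENT_SC };

/* Copies `in` to `out`, dropping C-style block comments.
 * `out` must have room for at least as many bytes as `in`,
 * including the terminating NUL. */
void strip_comments(const char *in, char *out)
{
    enum start_condition state = INITIAL_SC;
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);
    while (in[i] != '\0') {
        if (state == INITIAL_SC && in[i] == '/' && in[i + 1] == '*') {
            state = COMMENT_SC;     /* plays the role of BEGIN comment */
            i += 2;
        } else if (state == COMMENT_SC && in[i] == '*' && in[i + 1] == '/') {
            state = INITIAL_SC;     /* plays the role of BEGIN 0 */
            i += 2;
        } else if (state == COMMENT_SC) {
            i++;                    /* active only inside a comment: discard */
        } else {
            out[o++] = in[i++];     /* outside comments: copy the character */
        }
    }
    out[o] = '\0';
}
```

Given the input "a/*b*/c", the function writes "ac" to out. An unterminated comment simply swallows the rest of the input.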

Special Functions
yytext – where the text matched most recently is stored
yyleng – number of characters in the text most recently matched
yylval – associated value of the current token
yymore() – append the next string matched to the current contents of yytext
yyless(n) – remove from yytext all but the first n characters
unput(c) – return character c to the input stream
yywrap()
– may be replaced by the user
– is called by the lexical analyzer whenever it reads an EOF as the first character when trying to match a regular expression

Let us run a lex program