Lexical Analysis with lex(1) and flex(1) © 2011 Clinton Jeffery.

Reading
  Read Sections 3-5 of Lexical Analysis with Flex
  Check out the class lecture notes
  Ask questions from either source
    – Preferred venues: in class, or in CS Forums

Traits of Scanners
  Function: convert from chars to tokens
  Identify and categorize kinds of tokens
  Detect boundaries between tokens
  Discard comments and whitespace
  Remember line/col #’s for error reporting
  Report lexical errors
  Run as fast as possible

Regular Expressions
  ε is a r.e.
  Any char in the alphabet is a r.e.
  If r and s are r.e.’s then r | s is a r.e.
  If r and s are r.e.’s then r s is a r.e.
  If r is a r.e. then r* is a r.e.
  If r is a r.e. then (r) is a r.e.

Common extensions to regular expression notation
  r+       is equivalent to rr*
  r?       is equivalent to r|ε
  [abc]    is equivalent to a|b|c
  [a-z]    is equivalent to a|b|…|z
  [^abc]   matches any single character except a, b, or c
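
These equivalences can be checked mechanically. The sketch below assumes a POSIX system with <regex.h>, whose extended regular expressions share this notation with lex but are a separate implementation. The helper names matches_whole and agree_on are hypothetical and are not part of lex.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Return 1 if the whole string s matches the POSIX extended regular
 * expression pattern, 0 otherwise. Hypothetical helper for illustration. */
int matches_whole(const char *pattern, const char *s)
{
    char anchored[256];
    regex_t re;
    int rc;

    assert(pattern != NULL && s != NULL);
    /* Wrap the pattern as ^(pattern)$ so that only a full match counts. */
    rc = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    assert(rc > 0 && (size_t)rc < sizeof anchored);

    rc = regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB);
    assert(rc == 0);                 /* the pattern must compile */
    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Return 1 if patterns p and q agree on every string in samples[0..n-1]. */
int agree_on(const char *p, const char *q,
             const char *const samples[], int n)
{
    int i;

    for (i = 0; i < n; i++)
        if (matches_whole(p, samples[i]) != matches_whole(q, samples[i]))
            return 0;
    return 1;
}
```

Calling agree_on("a+", "aa*", samples, n) on a handful of sample strings is a quick sanity check, not a proof: it only compares the two patterns on the samples supplied.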

Lex’s extended regular expressions
  \c       escapes for most operators
  "s"      match C string as-is (superescape)
  r{m,n}   match r between m and n times
  r/s      match r when s follows
  ^r       match r when at beginning of line
  r$       match r when at end of line

Lexical Attributes
  A lexical attribute is a piece of information about a token
  Compiler writer can define as needed
  Typically:
    – Category      integer code, used in parsing
    – Lexeme        actual string as appears in source
    – Line, column  location in source code
    – Value         for literals, the binary they represent
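
The line and column attributes are usually maintained by updating a counter after every match. A minimal sketch follows, assuming 1-based lines and columns and counting a tab as one column; struct position and advance_position are hypothetical names, not something lex provides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical source position: both fields are 1-based. */
struct position {
    int line;
    int column;
};

/* Advance pos past the characters of lexeme, as a scanner would after
 * each match. A newline starts a new line and resets the column. */
void advance_position(struct position *pos, const char *lexeme)
{
    const char *p;

    assert(pos != NULL && lexeme != NULL);
    for (p = lexeme; *p != '\0'; p++) {
        if (*p == '\n') {
            pos->line++;
            pos->column = 1;
        } else {
            pos->column++;
        }
    }
}
```

In a lex specification, a call such as advance_position(&pos, yytext) could sit at the start of every action, so that each token records the position held just before the call.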

Meanings of the word “token”
  A single word from the source code
  An integer code that categorizes a word
  A set of lexical attributes that are computed from a single word of input
  An instance of a class (given by category)

Lex public interface
  FILE *yyin;       /* set before calling yylex() */
  int yylex();      /* call once per token */
  char yytext[];    /* chars matched by yylex() */
  int yywrap();     /* end-of-file handler */
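
To show how a caller drives this interface, here is a hand-written stand-in for the generated scanner. It is not what lex emits. The token codes IDENT, ASTERISK and PERIOD, the 256-byte buffer, and the choice to return any other character as its own code are all assumptions made for the sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token categories; a real project defines its own. */
enum { IDENT = 257, ASTERISK, PERIOD };

FILE *yyin;            /* set before calling yylex() */
char yytext[256];      /* chars matched by yylex() */

/* End-of-file handler: returning 1 means there is no more input. */
int yywrap(void) { return 1; }

/* Hand-written stand-in for a generated yylex(): returns one token
 * category per call, and 0 once the input is exhausted. */
int yylex(void)
{
    int c;
    size_t n = 0;

    assert(yyin != NULL);
    for (;;) {
        c = getc(yyin);
        if (c == EOF) {
            if (yywrap())
                return 0;
            continue;               /* yywrap() supplied a new yyin */
        }
        if (!isspace(c))
            break;                  /* whitespace is discarded */
    }

    yytext[n++] = (char)c;
    if (isalpha(c)) {
        /* Extend the identifier as far as it goes (longest match). */
        while ((c = getc(yyin)) != EOF && isalnum(c)) {
            if (n < sizeof yytext - 1)
                yytext[n++] = (char)c;
        }
        if (c != EOF)
            ungetc(c, yyin);        /* read one char too far: push it back */
        yytext[n] = '\0';
        return IDENT;
    }
    yytext[n] = '\0';
    if (yytext[0] == '*')
        return ASTERISK;
    if (yytext[0] == '.')
        return PERIOD;
    return (unsigned char)yytext[0]; /* any other char stands for itself */
}
```

A caller sets yyin to an open file, then calls yylex() repeatedly until it returns 0, reading yytext after each call to get the matched characters.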

.l file format
  header
  %%
  body
  %%
  helper functions

Lex header
  C code inside %{ … %}
    – prototypes for helper functions
    – #include’s that #define integer token categories
  Macro definitions, e.g.
    letter  [a-zA-Z]
    digit   [0-9]
    ident   {letter}({letter}|{digit})*
  Warning: macros are fraught with peril

Lex body
  Regular expressions with semantic actions
    " "       { /* discard */ }
    {ident}   { return IDENT; }
    "*"       { return ASTERISK; }
    "."       { return PERIOD; }
  Match the longest r.e. possible
  Break ties with whichever appears first
  If no rule matches: copy the unmatched character to stdout
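
The two disambiguation rules (longest match, then earliest rule) can be sketched with POSIX regular expressions. This is only an illustration, assuming a POSIX system with <regex.h>. A lex-generated scanner does not try the rules one at a time; it compiles them all into a single automaton. The name pick_rule is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Return the index of the rule that lex-style disambiguation would pick
 * at the start of input, or -1 if no rule matches. The length of the
 * winning match is stored in *len. Hypothetical helper. */
int pick_rule(const char *const rules[], int nrules,
              const char *input, int *len)
{
    int best = -1, i;

    assert(rules != NULL && input != NULL && len != NULL);
    *len = 0;
    for (i = 0; i < nrules; i++) {
        char anchored[256];
        regex_t re;
        regmatch_t m[1];
        int rc;

        /* Anchor each rule so that it must match at the first character. */
        rc = snprintf(anchored, sizeof anchored, "^(%s)", rules[i]);
        assert(rc > 0 && (size_t)rc < sizeof anchored);
        rc = regcomp(&re, anchored, REG_EXTENDED);
        assert(rc == 0);

        /* Only a strictly longer match replaces the current winner, so
         * on a tie the earlier rule stays. Empty matches never win. */
        if (regexec(&re, input, 1, m, 0) == 0 && (int)m[0].rm_eo > *len) {
            best = i;
            *len = (int)m[0].rm_eo;
        }
        regfree(&re);
    }
    return best;
}
```

With the rules "if" and "[a-z]+" in that order, the input "if(x)" selects the keyword rule because the two matches tie at two characters, while "iffy" selects the identifier rule because its match is longer.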

Lex helper functions
  Follows rules of ordinary C code
  Compute lexical attributes
  Do stuff the regular expressions can’t do
  Write a yywrap() to switch files on EOF
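
A yywrap() that switches files might look like the sketch below. The queue of pending inputs, its fixed size, and the name queue_input are assumptions for illustration. In a real lex program the generated scanner defines yyin; it is defined here only so that the sketch compiles on its own.

```c
#include <assert.h>
#include <stdio.h>

/* Normally supplied by the generated scanner. */
FILE *yyin;

/* Hypothetical queue of inputs still waiting to be scanned. */
#define MAX_PENDING 16
static FILE *pending[MAX_PENDING];
static int npending = 0;
static int next_pending = 0;

/* Add an already opened file to the queue; returns 0 on success and
 * -1 if the queue is full. */
int queue_input(FILE *f)
{
    assert(f != NULL);
    if (npending == MAX_PENDING)
        return -1;
    pending[npending++] = f;
    return 0;
}

/* Called by the scanner at end of file. Returning 0 tells it that yyin
 * now points at fresh input; returning 1 tells it to stop. */
int yywrap(void)
{
    if (next_pending < npending) {
        if (yyin != NULL)
            fclose(yyin);            /* done with the exhausted file */
        yyin = pending[next_pending++];
        return 0;
    }
    return 1;
}
```

The driver would open each file named on the command line, assign the first to yyin, queue the rest with queue_input(), and then call yylex() as usual.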

struct token – typical compiler
  struct token {
      int category;
      char *text;
      int linenumber;
      int column;
      char *filename;
      union literal value;
  };
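
A helper that fills in such a struct for each match could be sketched as follows. The slide does not define union literal, so the layout used here is an assumption, and make_token, free_token and copy_string are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The slide leaves union literal undefined; this layout is an assumption. */
union literal {
    int ival;
    double dval;
    char *sval;
};

struct token {
    int category;
    char *text;
    int linenumber;
    int column;
    char *filename;
    union literal value;
};

/* Copy a string into freshly allocated memory; NULL on failure. */
static char *copy_string(const char *s)
{
    char *p = malloc(strlen(s) + 1);

    if (p != NULL)
        strcpy(p, s);
    return p;
}

/* Allocate a token and copy the lexeme and file name into it. The value
 * field starts out zeroed; the caller fills it in for literals.
 * Returns NULL if memory runs out. */
struct token *make_token(int category, const char *text,
                         int line, int column, const char *filename)
{
    struct token *t;

    assert(text != NULL && filename != NULL);
    t = calloc(1, sizeof *t);
    if (t == NULL)
        return NULL;
    t->category = category;
    t->linenumber = line;
    t->column = column;
    t->text = copy_string(text);
    t->filename = copy_string(filename);
    if (t->text == NULL || t->filename == NULL) {
        free(t->text);
        free(t->filename);
        free(t);
        return NULL;
    }
    return t;
}

/* Release a token created by make_token(). */
void free_token(struct token *t)
{
    if (t == NULL)
        return;
    free(t->text);
    free(t->filename);
    free(t);
}
```

The lexeme is copied because yytext is overwritten by the next call to yylex(); a token that kept only the pointer would see its text change underneath it.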

“string removal tool”
%%
"zap me"

whitespace trimmer
%%
[ \t]+    putchar(' ');
[ \t]+    /* drop entirely */

string replacement
%{
#include <unistd.h>   /* declares getlogin() */
%}
%%
username    printf("%s", getlogin());

Line/character counter
        int lines = 0, chars = 0;
%%
\n      ++lines; ++chars;
.       ++chars;
%%
int main(void)
{
    yylex();
    printf("lines: %d chars: %d\n", lines, chars);
    return 0;
}

Example: C reals
  Is it: [0-9]*.[0-9]*
  Is it: ([0-9]+.[0-9]* | [0-9]*.[0-9]+)
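
One way to explore the question is to try both candidates on a few strings. The sketch below assumes a POSIX system with <regex.h> and makes two changes to the patterns as written on the slide, both of which are assumptions about what was intended: the dot is escaped as \. so that it means a literal period (an unescaped . matches any character except newline in lex), and the blanks around | are removed (in lex a blank ends the pattern). The names whole_match, is_real_1 and is_real_2 are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* First candidate: optional digits, a period, optional digits. */
#define CANDIDATE_1 "[0-9]*\\.[0-9]*"
/* Second candidate: at least one digit on one side of the period. */
#define CANDIDATE_2 "[0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+"

/* Return 1 if the whole string s matches pattern, 0 otherwise. */
static int whole_match(const char *pattern, const char *s)
{
    char anchored[256];
    regex_t re;
    int rc;

    assert(pattern != NULL && s != NULL);
    /* The parentheses keep the anchors outside any top-level alternation. */
    rc = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    assert(rc > 0 && (size_t)rc < sizeof anchored);
    rc = regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB);
    assert(rc == 0);
    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

int is_real_1(const char *s) { return whole_match(CANDIDATE_1, s); }
int is_real_2(const char *s) { return whole_match(CANDIDATE_2, s); }
```

Both candidates accept 3.14 and reject 42, but only the first accepts a lone period, which is why the second form insists on a digit on at least one side.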