Tutorial On Lex & Yacc.


Purpose of Tutorial
1. Provide a brief, non-technical, black-box introduction to lex and yacc.
2. Show how to run lex and yacc.

Lex: what is it?
Lex is a tool for automatically generating a lexer (scanner) from a lex specification (.l file). A lexer performs lexical analysis: it breaks an input stream into meaningful units, or tokens. For example, consider breaking a text file up into individual words.
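To make the word-splitting example concrete, here is a minimal hand-written sketch in C of the kind of loop a generated scanner runs for a single pattern such as [A-Za-z]+. The names next_word and count_words are invented for this illustration and are not part of lex; a real lex-generated scanner is table-driven and far more general.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: scans one word starting at or after *pos.
 * On success stores the word's start and length and advances *pos;
 * returns 1 if a word was found, 0 at end of input. */
int next_word(const char *text, size_t *pos, const char **start, size_t *len)
{
    assert(text != NULL && pos != NULL && start != NULL && len != NULL);

    /* Skip everything that is not a letter (spaces, digits, punctuation). */
    while (text[*pos] != '\0' && !isalpha((unsigned char)text[*pos]))
        (*pos)++;
    if (text[*pos] == '\0')
        return 0;

    /* Consume the longest run of letters, as the pattern [A-Za-z]+ would. */
    *start = text + *pos;
    *len = 0;
    while (isalpha((unsigned char)text[*pos])) {
        (*pos)++;
        (*len)++;
    }
    return 1;
}

/* Counts the words in text by calling next_word() until input runs out. */
size_t count_words(const char *text)
{
    size_t pos = 0, len = 0, count = 0;
    const char *start = NULL;

    while (next_word(text, &pos, &start, &len))
        count++;
    return count;
}
```

Calling count_words("breaking a text file up") walks the string once and reports five words. With lex, the same behaviour comes from one pattern and one action instead of hand-written loops.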

Skeleton of a lex specification (.l file)
Running lex on x.l generates a *.c file.
%{
< C global variables, prototypes, comments >
%}
[DEFINITION SECTION]
%%
[RULES SECTION]
%%
< C auxiliary subroutines >
    %{ ... %}: this part will be embedded into *.c.
    Definition section: substitutions, code and start states; will be copied into *.c.
    Rules section: defines how to scan and what action to take for each token.
    C auxiliary subroutines: any user code. For example, a main function to call the scanning function yylex().

The rules section
%%
[RULES SECTION]
<pattern> { <action to take when matched> }
…
Patterns are specified by regular expressions. For example:
[A-Za-z]+ { printf("this is a word"); }

Regular Expression Basics
    .      matches any single character except \n
    *      matches 0 or more instances of the preceding regular expression
    +      matches 1 or more instances of the preceding regular expression
    ?      matches 0 or 1 of the preceding regular expression
    |      matches the preceding or following regular expression
    [ ]    defines a character class
    ()     groups the enclosed regular expression into a new regular expression
    "…"    matches everything within the quotes literally

Lex Reg Exp (cont)
    x|y       x or y
    {i}       definition of i
    x/y       x, only if followed by y (y not removed from input)
    x{m,n}    m to n occurrences of x
    ^x        x, but only at beginning of line
    x$        x, but only at end of line
    "s"       exactly what is in the quotes (except for "\" and following character)
A regular expression finishes with a space, tab or newline.

Meta-characters
Meta-characters do not match themselves, because they are used in the preceding regular expressions:
    ( ) [ ] { } < > + / , ^ * | . \ " $ ? - %
To match a meta-character, prefix it with "\".
To match a backslash, tab or newline, use \\, \t, or \n.

Regular Expression Examples
    an integer (12345):                               [1-9][0-9]*
    a word (cat):                                     [a-zA-Z]+
    a (possibly) signed integer (12345 or -12345):    [-+]?[1-9][0-9]*
    a floating point number (1.2345):                 [0-9]*"."[0-9]+
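These patterns can be tried out without running lex. The following is a minimal sketch in C, assuming a POSIX system that provides <regex.h>. The helper matches_whole and the PAT_ constants are names invented for this illustration. POSIX extended syntax differs slightly from lex: the quoted "." becomes \. and the anchors ^ and $ are added so that the whole string has to match.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if text matches the POSIX extended
 * regular expression pattern, 0 otherwise (also 0 if pattern is invalid). */
int matches_whole(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;                       /* pattern failed to compile */
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* The slide's patterns, anchored so the whole string must match. */
const char PAT_INTEGER[] = "^[1-9][0-9]*$";
const char PAT_WORD[]    = "^[a-zA-Z]+$";
const char PAT_SIGNED[]  = "^[-+]?[1-9][0-9]*$";
const char PAT_FLOAT[]   = "^[0-9]*\\.[0-9]+$";   /* lex "." is \. here */
```

For instance, matches_whole(PAT_SIGNED, "-12345") succeeds, while matches_whole(PAT_INTEGER, "012") fails because the pattern requires a non-zero first digit. For the same reason the integer pattern as written does not accept a lone 0.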

Lex Regular Expressions
Lex uses an extended form of regular expression (c: character; x, y: regular expressions; s: string; m, n: integers; i: identifier).
    c         any character except meta-characters (see the Meta-characters slide)
    [...]     the list of enclosed chars (may be a range)
    [^...]    the list of chars not enclosed
    .         any ASCII char except newline
    xy        concatenation of x and y
    x*        zero or more occurrences of x
    x+        one or more occurrences of x (i.e. x* but not the empty string)
    x?        an optional x (x or the empty string)

Regular Expression Examples
    a delimiter for an English sentence:      "."|"?"|!   OR   [.?!]
    C++ comment (// call foo() here!!):       "//".*
    white space:                              [ \t]+
    English sentence (Look at this!):         ([ \t]+|[a-zA-Z]+)+("."|"?"|!)

Special Functions
    yytext       where text matched most recently is stored
    yyleng       number of characters in text most recently matched
    yylval       associated value of current token
    yymore()     append next string matched to current contents of yytext
    yyless(n)    remove from yytext all but the first n characters
    unput(c)     return character c to input stream
    yywrap()     may be replaced by user
The yywrap function is called by the lexical analyser whenever it inputs an EOF as the first character when trying to match a regular expression.

Let us run a lex program

Yacc: what is it?
Yacc is a tool for automatically generating a parser from a grammar written in a yacc specification (.y file). A grammar specifies a set of production rules, which define a language. A production rule specifies a sequence of symbols (a sentence) that is legal in the language.

Skeleton of a yacc specification (.y file)
Running yacc on x.y generates a *.c file.
%{
< C global variables, prototypes, comments >
%}
[DEFINITION SECTION]
%%
[PRODUCTION RULES SECTION]
%%
< C auxiliary subroutines >
    %{ ... %}: this part will be embedded into *.c.
    Definition section: contains token declarations. Tokens are recognized in the lexer.
    Production rules section: defines how to "understand" the input language, and what actions to take for each "sentence".
    C auxiliary subroutines: any user code. For example, a main function to call the parser function yyparse().

Structure of yacc File
    Definition section: declarations of tokens; type of values used on parser stack
    Rules section: list of grammar rules with semantic routines
    User code

The Production Rules Section
%%
production : symbol1 symbol2 … { action }
           | symbol3 symbol4 … { action }
           | …
production : symbol1 symbol2 { action }

An example
%%
statement : expression { printf(" = %g\n", $1); }
          ;
expression : expression '+' expression { $$ = $1 + $3; }
           | expression '-' expression { $$ = $1 - $3; }
           | NUMBER { $$ = $1; }
           ;
According to these two productions, 5 + 4 - 3 + 2 is parsed into a tree.
[Figure: parse tree with statement at the root, expression nodes below it, and number leaves 5, 4, 3, 2 joined by the operators + and -.]

Choosing a Grammar
First grammar:
    S -> E
    E -> E + T
    E -> T
    T -> T * F
    T -> T / F
    T -> F
    F -> ( E )
    F -> ID
Second grammar:
    S -> E
    E -> E + E
    E -> E - E
    E -> E * E
    E -> E / E
    E -> ( E )
    E -> ID

Precedence and Associativity
%right '='
%left '-' '+'
%left '*' '/'
%right '^'
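In yacc, operators declared on later lines bind more tightly than those declared earlier, and %left or %right decides how a chain of operators at the same level groups. The sketch below illustrates the effect with a small hand-written precedence-climbing evaluator in C. This is not the LALR algorithm yacc uses, only a way to see what the declarations mean. All names (Cursor, prec, eval and so on) are invented for this illustration, the '=' level is left out, and operands are limited to unsigned integers.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Cursor over the expression text (hypothetical helper type). */
typedef struct { const char *p; } Cursor;

static void skip_spaces(Cursor *c)
{
    while (*c->p == ' ' || *c->p == '\t')
        c->p++;
}

/* Precedence level of a binary operator; a higher number binds tighter,
 * mirroring the order of the %left/%right lines. */
static int prec(char op)
{
    switch (op) {
    case '+': case '-': return 1;   /* %left  '-' '+' */
    case '*': case '/': return 2;   /* %left  '*' '/' */
    case '^':           return 3;   /* %right '^'     */
    default:            return 0;   /* not an operator */
    }
}

/* Integer power by repeated multiplication (exp assumed non-negative). */
static long ipow(long base, long exp)
{
    long r = 1;
    while (exp-- > 0)
        r *= base;
    return r;
}

static long parse_expr(Cursor *c, int min_prec);

/* Operand: an unsigned integer or a parenthesised expression. */
static long parse_primary(Cursor *c)
{
    long v = 0;
    skip_spaces(c);
    if (*c->p == '(') {
        c->p++;
        v = parse_expr(c, 1);
        skip_spaces(c);
        if (*c->p == ')')
            c->p++;
        return v;
    }
    while (isdigit((unsigned char)*c->p))
        v = v * 10 + (*c->p++ - '0');
    return v;
}

/* Precedence climbing: keep folding operators whose level is >= min_prec. */
static long parse_expr(Cursor *c, int min_prec)
{
    long lhs = parse_primary(c);
    for (;;) {
        char op;
        int p;
        long rhs;
        skip_spaces(c);
        op = *c->p;
        p = prec(op);
        if (p == 0 || p < min_prec)
            return lhs;
        c->p++;
        /* Left-associative: the right side must bind tighter (p + 1).
         * Right-associative '^': the right side may stay at level p. */
        rhs = parse_expr(c, op == '^' ? p : p + 1);
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs = rhs != 0 ? lhs / rhs : 0; break; /* guard against /0 */
        case '^': lhs = ipow(lhs, rhs); break;
        }
    }
}

/* Evaluates an integer expression such as "5 + 4 - 3 + 2". */
long eval(const char *text)
{
    Cursor c;
    assert(text != NULL);
    c.p = text;
    return parse_expr(&c, 1);
}
```

With these levels, eval("2 + 3 * 4") groups as 2 + (3 * 4), eval("10 - 4 - 3") groups as (10 - 4) - 3 because '-' is left-associative, and eval("2 ^ 3 ^ 2") groups as 2 ^ (3 ^ 2) because '^' is right-associative.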

Defining Values
Default: $$ = $1;
expr : expr '+' term { $$ = $1 + $3; }
     | term { $$ = $1; }
     ;
term : term '*' factor { $$ = $1 * $3; }
     | factor { $$ = $1; }
     ;
factor : '(' expr ')' { $$ = $2; }
       | ID
       | NUM
       ;
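One way to picture $$, $1, $2 and $3 is as slots on a value stack: when a rule is reduced, the values of its right-hand-side symbols are the top entries of the stack, and the action's $$ replaces them. The following C sketch models only that bookkeeping. The ValueStack type and the function names are invented for this illustration, and a real generated parser keeps a state stack alongside the values, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_MAX 64

/* Hypothetical value stack holding one value per grammar symbol. */
typedef struct {
    double vals[STACK_MAX];
    size_t top;                 /* number of values currently on the stack */
} ValueStack;

void vs_init(ValueStack *s) { s->top = 0; }

void vs_push(ValueStack *s, double v)
{
    assert(s->top < STACK_MAX);
    s->vals[s->top++] = v;
}

/* Reduce by  expr : expr '+' term  { $$ = $1 + $3; }
 * The three right-hand-side values are the top three stack entries. */
void reduce_add(ValueStack *s)
{
    double d1, d3;
    assert(s->top >= 3);
    d1 = s->vals[s->top - 3];   /* $1: value of expr */
    d3 = s->vals[s->top - 1];   /* $3: value of term */
    s->top -= 3;                /* pop $1, $2, $3 */
    vs_push(s, d1 + d3);        /* push $$ */
}

/* Reduce by  factor : '(' expr ')'  { $$ = $2; } */
void reduce_paren(ValueStack *s)
{
    double d2;
    assert(s->top >= 3);
    d2 = s->vals[s->top - 2];   /* $2: value of expr */
    s->top -= 3;
    vs_push(s, d2);
}

/* Walks the value-changing reductions for the input "( 2 ) + 3".
 * Unit rules such as term : factor use the default $$ = $1, which leaves
 * the stack contents unchanged, so they are skipped here. */
double demo_paren_sum(void)
{
    ValueStack s;
    vs_init(&s);
    vs_push(&s, 0.0);           /* '(' carries no useful value */
    vs_push(&s, 2.0);           /* expr */
    vs_push(&s, 0.0);           /* ')' */
    reduce_paren(&s);           /* factor : '(' expr ')' */
    vs_push(&s, 0.0);           /* '+' */
    vs_push(&s, 3.0);           /* term */
    reduce_add(&s);             /* expr : expr '+' term */
    return s.vals[s.top - 1];
}
```

demo_paren_sum() ends with a single value on the stack, the sum of 2 and 3, which is what the action $$ = $1 + $3 would hand up to the enclosing rule.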

Example: Lex (scanner.l)
%{
#include <stdio.h>
#include "y.tab.h"
%}
id      [_a-zA-Z][_a-zA-Z0-9]*
wspc    [ \t\n]+
semi    [;]
comma   [,]
%%
int     { return INT; }
char    { return CHAR; }
float   { return FLOAT; }
{comma} { return COMMA; } /* Necessary? */
{semi}  { return SEMI; }
{id}    { return ID; }
{wspc}  { ; }

Example: Definitions (decl.y)
%{
#include <stdio.h>
#include <stdlib.h>
%}
%start line
%token CHAR COMMA FLOAT ID INT SEMI
%%

Example: Rules (decl.y)
/* This production is not part of the "official" grammar. Its primary
   purpose is to recover from parser errors, so it's probably best if you
   leave it here. */
line : /* lambda */
     | line decl
     | line error { printf("Failure :-(\n"); yyerrok; yyclearin; }
     ;

Example: Rules (decl.y)
decl : type ID list { printf("Success!\n"); }
     ;
list : COMMA ID list
     | SEMI
     ;
type : INT | CHAR | FLOAT
     ;
%%
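To see which token sequences these rules accept, the grammar can be mirrored by a small hand-written recognizer in C. This is only a sketch: yacc generates a table-driven LALR(1) parser, not code like this, and the error-recovery alternative of line is left out. The enum stands in for the token codes that y.tab.h would supply, END is an extra end-of-input marker added for the sketch, and the match_ function names are invented.

```c
#include <assert.h>
#include <stddef.h>

/* Token codes; in the real build these come from y.tab.h. END marks the
 * end of the token array and is an addition for this sketch. */
enum Token { END = 0, CHAR, COMMA, FLOAT, ID, INT, SEMI };

/* type : INT | CHAR | FLOAT */
static int is_type(enum Token t)
{
    return t == INT || t == CHAR || t == FLOAT;
}

/* list : COMMA ID list | SEMI
 * Returns the number of tokens consumed, or 0 if toks does not start a list. */
static size_t match_list(const enum Token *toks)
{
    size_t n = 0;
    while (toks[n] == COMMA && toks[n + 1] == ID)
        n += 2;
    return toks[n] == SEMI ? n + 1 : 0;
}

/* decl : type ID list
 * Returns the number of tokens consumed, or 0 on a syntax error. */
size_t match_decl(const enum Token *toks)
{
    size_t rest;
    assert(toks != NULL);
    if (!is_type(toks[0]) || toks[1] != ID)
        return 0;
    rest = match_list(toks + 2);
    return rest ? 2 + rest : 0;
}

/* line : (empty) | line decl
 * Returns 1 if the END-terminated array is zero or more declarations. */
int match_line(const enum Token *toks)
{
    size_t used;
    assert(toks != NULL);
    while (*toks != END) {
        used = match_decl(toks);
        if (used == 0)
            return 0;
        toks += used;
    }
    return 1;
}
```

The token sequence INT ID COMMA ID SEMI, which the scanner would produce for input such as "int a, b;", is accepted. INT ID COMMA SEMI is rejected because list requires an ID after every COMMA.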

Example: Supplementary Code (decl.y)
extern FILE *yyin;
int yyparse(void);

int main(void)
{
    /* Keep parsing until the scanner's input file is exhausted. */
    do {
        yyparse();
    } while (!feof(yyin));
    return 0;
}

int yyerror(char *s)
{
    /* Don't have to do anything! */
    return 0;
}

Let us Run a Program