Regular Languages.

Slides:



Advertisements
Similar presentations
Lexical Analysis Consider the program: #include main() { double value = 0.95; printf("value = %f\n", value); } How is this translated into meaningful machine.
Advertisements

Lex -- a Lexical Analyzer Generator (by M.E. Lesk and Eric. Schmidt) –Given tokens specified as regular expressions, Lex automatically generates a routine.
CS252: Systems Programming
Lexical Analysis with lex(1) and flex(1) © 2011 Clinton Jeffery.
176 Formal Languages and Applications: We know that Pascal programming language is defined in terms of a CFG. All the other programming languages are context-free.
Languages, grammars, and regular expressions
Tools for building compilers Clara Benac Earle. Tools to help building a compiler C –Lexical Analyzer generators: Lex, flex, –Syntax Analyzer generator:
A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
Compilers: lex/3 1 Compiler Structures Objectives – –describe lex – –give many examples of lex's use , Semester 1, Lex.
Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
Grammars CPSC 5135.
Review: Regular expression: –How do we define it? Given an alphabet, Base case: – is a regular expression that denote { }, the set that contains the empty.
PL&C Lab, DongGuk University Compiler Lecture Note, MiscellaneousPage 1 Miscellaneous 컴파일러 입문.
Lecture 2: Lexical Analysis
Scanning & FLEX CPSC 388 Ellen Walker Hiram College.
FLEX Fast Lexical Analyzer EECS Introduction Flex is a lexical analysis (scanner) generator. Flex is provided with a user input file or Standard.
Flex: A fast Lexical Analyzer Generator CSE470: Spring 2000 Updated by Prasad.
GREP. Whats Grep? Grep is a popular unix program that supports a special programming language for doing regular expressions The grammar in use for software.
Introduction to Lex Ying-Hung Jiang
1 Using Lex. Flex – Lexical Analyzer Generator A language for specifying lexical analyzers Flex compilerlex.yy.clang.l C compiler -lfl a.outlex.yy.c a.outtokenssource.
Introduction to Lex Fan Wu
Introduction to Lexical Analysis and the Flex Tool. © Allan C. Milne Abertay University v
Lexical Analysis with lex(1) and flex(1) © 2014 Clinton Jeffery.
Flex Fast LEX analyzer CMPS 450. Lexical analysis terms + A token is a group of characters having collective meaning. + A lexeme is an actual character.
Practical 1-LEX Implementation
1 Lex & Yacc. 2 Compilation Process Lexical Analyzer Source Code Syntax Analyzer Symbol Table Intermed. Code Gen. Code Generator Machine Code.
1 Using Yacc. 2 Introduction Grammar –CFG –Recursive Rules Shift/Reduce Parsing –See Figure 3-2. –LALR(1) –What Yacc Cannot Parse It cannot deal with.
ICS312 LEX Set 25. LEX Lex is a program that generates lexical analyzers Converting the source code into the symbols (tokens) is the work of the C program.
1 LEX & YACC Tutorial February 28, 2008 Tom St. John.
PL&C Lab, DongGuk University Compiler Lecture Note, MiscellaneousPage 1 Yet Another Compiler-Compiler Stephen C. Johnson July 31, 1978 YACC.
Scanner Generation Using SLK and Flex++ Followed by a Demo Copyright © 2015 Curt Hill.
Chapter 4: Syntax analysis Syntax analysis is done by the parser. –Detects whether the program is written following the grammar rules and reports syntax.
More yacc. What is yacc – Tool to produce a parser given a grammar – YACC (Yet Another Compiler Compiler) is a program designed to compile a LALR(1) grammar.
Elements of Computing Systems, Nisan & Schocken, MIT Press, Chapter 10: Compiler I: Syntax Analysis slide 1www.nand2tetris.org Building.
ICS611 Lex Set 3. Lex and Yacc Lex is a program that generates lexical analyzers Converting the source code into the symbols (tokens) is the work of the.
9-December-2002cse Tools © 2002 University of Washington1 Lexical and Parser Tools CSE 413, Autumn 2002 Programming Languages
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture Ahmed Ezzat.
CS 3304 Comparative Languages
NFAs, scanners, and flex.
Tutorial On Lex & Yacc.
Parsing Bottom Up CMPS 450 J. Moloney CMPS 450.
Chapter 2 Scanning – Part 1 June 10, 2018 Prof. Abdelaziz Khamis.
Using SLK and Flex++ Followed by a Demo
Pushdown automata Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section
Natural Language Processing - Formal Language -
RegExps & DFAs CS 536.
PROGRAMMING LANGUAGES
Context-free Languages
TDDD55- Compilers and Interpreters Lesson 2
Bison: Parser Generator
ENERGY 211 / CME 211 Lecture 15 October 22, 2008.
Regular Grammar.
Review: Compiler Phases:
CS 3304 Comparative Languages
Lecture 4: Lexical Analysis & Chomsky Hierarchy
CS 3304 Comparative Languages
Compiler Structures 3. Lex Objectives , Semester 2,
Pushdown automata Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section
Compiler Lecture Note, Miscellaneous
Pushdown automata Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section
Discrete Maths 13. Grammars Objectives
More on flex.
Pushdown automata Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section
Regular Expressions and Lexical Analysis
CMPE 152: Compiler Design December 4 Class Meeting
Systems Programming & Operating Systems Unit – III
Pushdown automata Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section
COMPILER CONSTRUCTION
Pushdown automata Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section
Presentation transcript:

Regular Languages

Grammar G = {S, N, T, R} where G is the grammar, S is the nonterminal starting symbol, N is the set of all nonterminals: for example: {S, A, B}, T is the set of terminals: for example: {a, b} R is the set of production rules, for example: S -> AB AB -> aABb A -> a B -> b

Chomsky Hierarchy Type 3: regular grammar Type 2: context-free grammar Type 1: context-sensitive grammar Type 0: unrestricted grammar

Regular Grammar Regular grammar places severe restrictions on the production rules: A single nonterminal element allowed on the left side On the right side a rule is allowed to have: nothing a single terminal element a single nonterminal element a single terminal element and a single nonterminal element The order of nonterminal and terminal elements must remain the same within the grammar, it could be either: nonterminal followed by terminal (left linear) terminal followed by nonterminal (right linear)

Regular Language Language generated by regular grammar Language parser can be implemented using FSM Fast parsing Rules too simplistic (very poor support for nesting statements) Can be used as a scanner

Scanner / Lexer Scanner is a tool for matching the input for described patterns Lexer is a scanner for lexical elements of (some) language For example 111 – numeric constant (integer) 1.11 – numeric constant (real) a11 – variable "a11" – string constant

Regular Expressions (REGEXP) Wildcard characters instead of rules "+" – previous character (or character class) repeated 1 or more times a+ – a, aa, aaa, etc "*" – previous character (or character class) repeated 0 or more times ab* – a, ab, abb, etc "[]" – for describing character class [abc] – a, b, c "-" – for character range [a-z] – any lowercase latin character "^" – any character except the following character [^abc] – any character except a, b, c

Regular Expressions (REGEXP) "." – any character except newline "|" – choice a|b – a, b "{}" – string length a{3} – aaa a{2,4} – aa, aaa, aaaa a{3,} – aaa, aaaa, aaaaa, etc to match a wildcard character it needs to be escaped \+ \- \*

Flex Open source rewrite of AT&T tool lex Generates a scanner based on rules section Uses regular expressions for pattern matching Can be used as a lexer in conjunction with a parser Offers C functions and global variables for operating the scanner

Flex source file structure 4 sections: top definitions / options %% patterns / rules code Some sections may be empty

Flex file %top{ #include "langtypes.h" } %option case-insensitive NUM [0-9] %x STR %% {NUM}+ { printf("This looks like an integer: %d\n", atoi(yytext)); } "\"" { BEGIN(STR); } <STR>[^\"]+ { printf("quoted text: %s", yytext); } <STR>"\"" { BEGIN(INITIAL); } int main(void){ return yylex();

Flex: useful stuff Options %option noyywrap - stops after scanning current file %option prefix="smth" - renames functions and variables from yy* to smth* %option case-insensitive - scanner is case insensitive %option yylineno - counts linenumbers, must have a rule for \n to function properly %option warn - prints warnings %option stack - allows explicit manipulation of states via a stack Variables / Functions FILE *yyin - input file, by default set to stdin char *yytext - the currently matched input int yylineno - the line number in current input file int yylex() - starts the scanning process, returns when return statements present in rules or end of input BEGIN(); - macro for explicit machine state Pushdown stack manipulation yy_push_state() - pushes current state to top of stack and switches to supplied state yy_pop_state() - pops state from stack and switches to it yy_top_state() - returns the top state without altering stack