Lexical Analysis - Scanner - Contd
Computer Science, Rensselaer Polytechnic
66.648 Compiler Design
Lecture 3 (01/21/98)


Lecture Outline
- More on Lexical Analyzer
- Examples and Algorithms
- Administration

Non-regular Languages
Regular expressions can denote only a fixed number of repetitions or an unspecified (arbitrary) number of repetitions; they cannot require two parts of a string to agree with each other in count or content. Examples of nonregular languages:
1. The set of all strings of balanced parentheses, e.g., (()), (()()(())), etc. Nested comments are nonregular for the same reason.
2. The set of all palindromes of the form {wv | v is the reverse of w, w is a string over the alphabet}.
3. Repeated strings {ww | w is a string over the alphabet}.
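The three example languages are easy to recognize once unbounded memory is available, which is exactly what a finite automaton lacks. The following is a minimal sketch in Python; the function names are illustrative and not from the lecture. Each recognizer either keeps a counter that can grow without limit or compares two halves of the input, neither of which a fixed number of states can do.

```python
def is_balanced(s: str) -> bool:
    """Recognize strings of balanced parentheses over the alphabet {'(', ')'}."""
    depth = 0  # number of '(' seen so far that are still unmatched
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with no matching '('
                return False
        else:
            return False  # character outside the alphabet
    return depth == 0


def is_wv_palindrome(s: str) -> bool:
    """Recognize {wv | v is the reverse of w}."""
    half, odd = divmod(len(s), 2)
    return odd == 0 and s[:half] == s[half:][::-1]


def is_repeated(s: str) -> bool:
    """Recognize {ww | w is a string over the alphabet}."""
    half, odd = divmod(len(s), 2)
    return odd == 0 and s[:half] == s[half:]
```

The counter in is_balanced can take arbitrarily large values on deeply nested input, so no fixed set of states can stand in for it.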

Constructing an NFA from a regular expression
An NFA for a regular expression can be constructed as follows:
1. For a single alphabet symbol (or epsilon), the NFA has two states, the start state and the final state, joined by one edge/transition labeled with that symbol.
2. For E1.E2, construct a new start state and a new final state. From the new start state, add an edge labeled with epsilon to the start state of E1. From the final state of E1, add an epsilon transition to the start state of E2. Add an epsilon transition from the final state of E2 to the new final state.
3. For E1|E2, construct a new start state and a new final state. Add epsilon transitions from the new start state to the start states of E1 and E2, and epsilon transitions from the final states of E1 and E2 to the new final state.
4. For E*, construct a new start state and a new final state. Add an epsilon transition from the new start state to the start state of E, and an epsilon transition from the final state of E to the new final state. Add an epsilon transition from the final state of E back to the start state of E (repetition), and one from the new start state directly to the new final state (zero occurrences of E).
This gives an algorithm to construct the transition graph from a regular expression, e.g., for identifiers, comments, and floating constants.
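The four construction rules can be sketched as a small builder. This is an illustrative Python sketch, not code from the lecture or the text: the class name NFABuilder, the use of None as the epsilon label, and the (start, final) pair representation are all assumptions. The star rule includes an epsilon edge from the new start state to the new final state so that zero occurrences of E are accepted.

```python
from itertools import count

EPS = None  # label used for epsilon transitions (a convention of this sketch)


class NFABuilder:
    """Builds NFA fragments; each fragment is a (start_state, final_state) pair."""

    def __init__(self):
        self._ids = count()  # source of fresh state numbers
        self.edges = {}      # state -> list of (label, target_state)

    def _new_state(self):
        s = next(self._ids)
        self.edges[s] = []
        return s

    def _edge(self, src, label, dst):
        self.edges[src].append((label, dst))

    def symbol(self, a):
        # Rule 1: two states joined by one transition labeled a (or EPS).
        s, f = self._new_state(), self._new_state()
        self._edge(s, a, f)
        return s, f

    def concat(self, e1, e2):
        # Rule 2: new start -> E1, E1 -> E2, E2 -> new final, all on epsilon.
        s, f = self._new_state(), self._new_state()
        self._edge(s, EPS, e1[0])
        self._edge(e1[1], EPS, e2[0])
        self._edge(e2[1], EPS, f)
        return s, f

    def union(self, e1, e2):
        # Rule 3: new start branches to E1 and E2; both finals join the new final.
        s, f = self._new_state(), self._new_state()
        self._edge(s, EPS, e1[0])
        self._edge(s, EPS, e2[0])
        self._edge(e1[1], EPS, f)
        self._edge(e2[1], EPS, f)
        return s, f

    def star(self, e):
        # Rule 4: enter E, leave E, loop back for repetition, or skip E entirely.
        s, f = self._new_state(), self._new_state()
        self._edge(s, EPS, e[0])
        self._edge(e[1], EPS, f)
        self._edge(e[1], EPS, e[0])
        self._edge(s, EPS, f)
        return s, f


# Usage: a(a|1)*, a toy stand-in for letter(letter|digit)*.
b = NFABuilder()
ident = b.concat(b.symbol("a"), b.star(b.union(b.symbol("a"), b.symbol("1"))))
```

Every rule adds exactly two states, so the size of the resulting NFA is linear in the size of the regular expression.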

Simulation of NFA
The epsilon closure of a state x is the set of states that can be reached from x (including x itself) by following only transitions labeled with epsilon.
We want to get the next token from the input stream. Properties:
1. The token is the longest sequence of characters starting at the current position that matches a regular expression for a token.
2. The input buffer is repositioned to the first character following the token.
3. Nothing gets read after the end-of-file.

Algorithm (Alg. 3.3, page 126 of the text)
getNextToken() {
    t.error = true;   // t is the token that will be found
    S = epsilon_closure({start});
    while (true) {
        if (S is empty) break;
        if (S contains a final state) {
            t.error = false;
            // fill in t.line and other attributes
        }
        if (end_of_file) break;
        c = getchar();
        T = move(S, c);
        S = epsilon_closure(T);
    }
    reset_inputbuffer(t.line, t.lastcol + 1);
    return t;
}
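A runnable sketch of this simulation in Python follows. It is not the text's code: the hand-built NFA (lowercase identifiers and integer constants), the table layout, and the convention of returning the next position instead of resetting a shared buffer are assumptions made for illustration. The loop remembers the last point at which a final state was in S, which gives the longest match.

```python
import string

EPS = None  # epsilon label (a convention of this sketch)
LETTERS = frozenset(string.ascii_lowercase)
DIGITS = frozenset(string.digits)

# Hypothetical NFA for [a-z]+ | [0-9]+ : state -> list of (label, target).
EDGES = {
    0: [(EPS, 1), (EPS, 3)],
    1: [(LETTERS, 2)],
    2: [(LETTERS, 2)],
    3: [(DIGITS, 4)],
    4: [(DIGITS, 4)],
}
START = 0
FINAL = {2: "tok_id", 4: "tok_intconst"}  # final state -> token kind


def epsilon_closure(states):
    """All states reachable from `states` using only epsilon transitions."""
    closure = set(states)
    work = list(states)
    while work:
        s = work.pop()
        for label, dst in EDGES.get(s, []):
            if label is EPS and dst not in closure:
                closure.add(dst)
                work.append(dst)
    return closure


def move(states, c):
    """States reachable from `states` by one transition on character c."""
    return {dst for s in states for label, dst in EDGES.get(s, [])
            if label is not EPS and c in label}


def get_next_token(text, pos):
    """Return (kind, lexeme, next_pos) for the longest match at pos, or None."""
    S = epsilon_closure({START})
    best = None  # last accepting configuration seen so far
    i = pos
    while S:
        kinds = sorted(FINAL[s] for s in S if s in FINAL)
        if kinds:
            best = (kinds[0], text[pos:i], i)
        if i >= len(text):  # end of input: nothing is read past it
            break
        S = epsilon_closure(move(S, text[i]))
        i += 1
    return best
```

For example, get_next_token("abc12", 0) yields the identifier abc with next position 3, and calling it again at position 3 yields the integer constant 12. A result of None corresponds to t.error remaining true.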

Analysis of the Algorithm
Simulation time = O(size of input string) for a fixed NFA; each input character costs at most O(size of NFA) work.
Simulation space = O(size of NFA).
It is inefficient to read the entire program as scanner input. The scanner converts the characters into tokens on the fly. It keeps an internal buffer of bounded size, large enough to hold the largest possible token plus the largest lookahead needed. This is usually much smaller than the entire program.

Discussion contd
In practice, the parser often drives the scanner: it requests the next token whenever it needs one. The parser uses these tokens to construct a parse tree (by performing shift/reduce operations).

High-level Structure of a scanner
repeat {
    t = getNextToken();
    if (t.error) {
        print error message;
        exit from compiler or recover from the error;
    }
    output_token(t);
} until (t.EOF)

Output tokens for sample program
Token        Attrib   Line
tok_public            1
tok_class             1
tok_id       first    1
tok_lbrace            1
tok_public            2
tok_static            2
tok_void              2
tok_main              2
tok_lparen            2

Lex program format
%{
included as is
%}
definitions
%%
patterns    actions
%%
program

Sample lex program
%{
char reserved_word[12][20];
%}
%%
[a-z]+  { int i = lookup(yytext);   /* index into reserved_word, or -1 */
          if (i == -1) { printf("tok_id\t%s\t%d\n", yytext, yylineno); }
          else { printf("tok_%s\t\t%d\n", reserved_word[i], yylineno); } }
[0-9]+  { printf("tok_intconst\t%s\t%d\n", yytext, yylineno); }

Program Contd
"="   printf("tok_eq\t\t%d\n", yylineno);
";"   printf("tok_semi\t\t%d\n", yylineno);
"("   printf("tok_lparen\t\t%d\n", yylineno);
")"   printf("tok_rparen\t\t%d\n", yylineno);
"{"   printf("tok_lbrace\t\t%d\n", yylineno);
"}"   printf("tok_rbrace\t\t%d\n", yylineno);
"["   printf("tok_lsqb\t\t%d\n", yylineno);
"]"   printf("tok_rsqb\t\t%d\n", yylineno);
%%
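For readers without lex at hand, the same rules can be approximated with Python's re module. This is a rough sketch under several assumptions: the reserved-word set is only the subset visible in the sample output slide (the slides do not list all 12 entries), the whitespace and newline rules are added here, and an unmatched character raises an error rather than being echoed as lex does by default. Python's alternation picks the first matching alternative rather than the longest one; that makes no difference here because no two rules can start with the same character.

```python
import re

# Assumed subset of the reserved words; the slides do not give the full table.
RESERVED = {"public", "class", "static", "void", "main"}

RULES = [
    ("word", r"[a-z]+"),
    ("tok_intconst", r"[0-9]+"),
    ("tok_eq", r"="),
    ("tok_semi", r";"),
    ("tok_lparen", r"\("),
    ("tok_rparen", r"\)"),
    ("tok_lbrace", r"\{"),
    ("tok_rbrace", r"\}"),
    ("tok_lsqb", r"\["),
    ("tok_rsqb", r"\]"),
    ("newline", r"\n"),    # added in this sketch to count lines
    ("skip", r"[ \t]+"),   # added in this sketch to drop blanks
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in RULES))


def tokenize(text):
    """Return a list of (token, attribute, line) triples."""
    out, line, pos = [], 1, 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"line {line}: unexpected character {text[pos]!r}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "newline":
            line += 1
        elif kind == "skip":
            continue
        elif kind == "word":
            if lexeme in RESERVED:
                out.append((f"tok_{lexeme}", "", line))
            else:
                out.append(("tok_id", lexeme, line))
        elif kind == "tok_intconst":
            out.append((kind, lexeme, line))
        else:
            out.append((kind, "", line))
    return out
```

Running tokenize on the two lines "public class first {" and "public static void main (" gives the same token, attribute, and line triples as the sample output slide.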

Administration
- We are in Chapter 3 of Aho, Sethi and Ullman's book. Please read that chapter and Chapter 1, which we covered in Lectures 1 and 2.
- Work out the first few exercises of Chapter 3.
- Lex and Yacc manuals are handed out. Please read them.

The first project is on the web. It consists of three parts: 1) write a lex program; 2) write a YACC program; 3) write five sample Java programs, which can be either applets or application programs.

Comments and Feedback
- Please let me know if you have not found a project partner.
- A sample Java compiler is on the class home page.