Com2010 - Functional Programming Lexical Analysis Marian Gheorghe Lecture 15 Module homepage Mole & ©University of Sheffieldcom2010.



Recognisers and Translators
17.1 Finite State Machine (FSM)
17.2 Translator
17.3 Parser

Words
For a given set S we may define sequences of symbols over S. More precisely, for S = {s_1, …, s_n}, x = x_1…x_p is a sequence of symbols over S if x_k is from S for every k = 1..p. We denote by Seq(S) the set of all sequences over S.
If Letters is the alphabet {‘a’..‘z’, ‘A’..‘Z’}, then “John”, “home” and “word” are sequences of symbols over Letters (they belong to Seq(Letters)), whereas “word1” and “long_sentence” do not belong to Seq(Letters).
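As a small illustration, membership in Seq(S) can be checked in Haskell by testing every symbol of a sequence against S. The names `letters` and `inSeq` are hypothetical helpers introduced only for this sketch; they do not come from the slides.

```haskell
-- The alphabet Letters, written as a Haskell list.
letters :: [Char]
letters = ['a' .. 'z'] ++ ['A' .. 'Z']

-- Hypothetical helper: True when every symbol of xs is taken from the set s,
-- i.e. when xs belongs to Seq(s).
inSeq :: Eq a => [a] -> [a] -> Bool
inSeq s xs = all (`elem` s) xs
```

With these definitions, `inSeq letters "John"` holds, while `inSeq letters "word1"` and `inSeq letters "long_sentence"` do not, because ‘1’ and ‘_’ are not in Letters.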

Languages
In general, not all sequences are of interest: within Seq(S) we identify a (proper) subset, called a language, whose members share some specific properties.
Example: for the language of words starting with a capital letter, “John” and “Identifier” belong to the language; “word” does not.
How do we specify, recognise and process a language? There are specific mechanisms for recognising words or sentences (regular expressions, automata, syntax diagrams, formal grammars) and for translating them into other things (extensions of the former).
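One simple way to picture a language in Haskell is as a membership predicate on strings. The sketch below encodes the example language of capitalised words; the type synonym `Language` and the function `capitalised` are assumptions made for illustration, not definitions from the lecture.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper)

-- Hypothetical representation: a language is a membership test on strings.
type Language = String -> Bool

-- Words over Letters whose first symbol is a capital letter.
capitalised :: Language
capitalised []       = False
capitalised (c : cs) = isAsciiUpper c && all isLetterC cs
  where
    isLetterC x = isAsciiLower x || isAsciiUpper x
```

For instance, `filter capitalised ["John", "Identifier", "word"]` keeps only the first two words.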

Finite State Machine (FSM): example
[Figure: transition diagram of the example FSM]
0 – initial state; 3, 7 – final states; a, b, c – transition labels

FSM - recognizer
Sequences accepted: aba, abcbab, ababbbbb, bac, bca, …
Sequences rejected: ab, abc, ba, …
An FSM recognizes a language: all the sequences accepted by it (paths from the initial state to a final state).
We will use only deterministic FSMs.

FSM in Haskell (1)
    type SetOf a = [a]
    data Automaton = FSM (SetOf State)
                         (SetOf Label)
                         (SetOf Transition)
                         InitialState
                         (SetOf State)   -- set of final states
Example above: States = 0, 1, …, 7; Labels = a, b, c; Transitions = the transition diagram; Initial state = 0; Final states = 3, 7

FSM in Haskell (2)
Where the components are:
    type State = Int
    type Label = Char
    data Transition = Move State Label State
    type InitialState = State

    automatonEx = FSM [0..7] ['a','b','c']
                      [Move 0 'a' 1, Move 1 'b' 2, Move 2 'c' 1, Move 2 'a' 3,
                       Move 3 'b' 3, Move 0 'b' 4, Move 4 'a' 5, Move 5 'c' 7,
                       Move 4 'c' 6, Move 6 'a' 7]
                      0 [3,7]

Matching an input against an FSM
[Figure: transition diagram of the example FSM]
“abcbab” is recognized by automatonEx

Selecting components of an FSM
The various components of an FSM are obtained through select functions:
    tr :: Automaton -> SetOf Transition   -- all transitions of an automaton
    tr (FSM _ _ t _ _) = t

    inState :: Transition -> State        -- input transition state
    inState (Move s _ _) = s

    outState :: Transition -> State       -- output transition state
    outState (Move _ _ s) = s

    label :: Transition -> Label          -- transition label
    label (Move _ x _) = x

Extracting transitions
All the transitions leaving a state s and labelled with the same given symbol x:
    oneMove :: Automaton -> State -> Label -> SetOf Transition
    oneMove a s x = [t | t <- tr a, inState t == s, label t == x]
If the FSM is deterministic, how many elements can such a list have?
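The question above is tied to what determinism means for a transition list: no pair (source state, label) may occur more than once. A minimal sketch of such a check follows; `isDeterministic` and `exampleMoves` are hypothetical names added for illustration, and the transition list is copied from automatonEx.

```haskell
import Data.List (nub)

type State = Int
type Label = Char
data Transition = Move State Label State

-- Hypothetical helper: a transition list is deterministic when every
-- (source state, label) pair appears at most once.
isDeterministic :: [Transition] -> Bool
isDeterministic ts = length keys == length (nub keys)
  where
    keys = [(s, x) | Move s x _ <- ts]

-- The transitions of automatonEx.
exampleMoves :: [Transition]
exampleMoves =
  [ Move 0 'a' 1, Move 1 'b' 2, Move 2 'c' 1, Move 2 'a' 3, Move 3 'b' 3
  , Move 0 'b' 4, Move 4 'a' 5, Move 5 'c' 7, Move 4 'c' 6, Move 6 'a' 7 ]
```

Here `isDeterministic exampleMoves` holds; adding a second transition from state 0 on ‘a’, such as `Move 0 'a' 2`, would make the check fail.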

FSM recogniser in Haskell
A recogniser that matches an input string against an FSM, starting from a state s, is recursively defined thus:
    recogniser :: Automaton -> State -> String -> State
    -- empty input: stay in the current state
    -- (clause added so that head and tail are never applied to [])
    recogniser _ s [] = s
    recogniser a s xs
      -- 0 or > 1 transitions; returns a dummy state (-1)
      | length ts /= 1 = -1
      -- no further inputs; returns next state
      | tail_xs == []  = os
      -- still inputs to be processed
      | otherwise      = recogniser a os tail_xs
      where ts      = oneMove a s (head xs)
            tail_xs = tail xs
            os      = outState (head ts)
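The pieces from the previous slides can be assembled into one self-contained program. The sketch below is a compact variant that uses pattern matching instead of head and tail; the function `accepts`, which compares the state returned by `recogniser` with the set of final states, is a hypothetical helper and not part of the slides.

```haskell
type SetOf a = [a]
type State = Int
type Label = Char
type InitialState = State

data Transition = Move State Label State

data Automaton = FSM (SetOf State) (SetOf Label) (SetOf Transition)
                     InitialState (SetOf State)

automatonEx :: Automaton
automatonEx = FSM [0 .. 7] ['a', 'b', 'c']
  [ Move 0 'a' 1, Move 1 'b' 2, Move 2 'c' 1, Move 2 'a' 3, Move 3 'b' 3
  , Move 0 'b' 4, Move 4 'a' 5, Move 5 'c' 7, Move 4 'c' 6, Move 6 'a' 7 ]
  0 [3, 7]

-- All transitions leaving state s with label x.
oneMove :: Automaton -> State -> Label -> SetOf Transition
oneMove (FSM _ _ ts _ _) s x = [t | t@(Move s0 y _) <- ts, s0 == s, y == x]

-- Follows the input from state s; returns the state reached,
-- or the dummy state -1 when there is not exactly one transition.
recogniser :: Automaton -> State -> String -> State
recogniser _ s [] = s
recogniser a s (x : xs) =
  case oneMove a s x of
    [Move _ _ os] -> recogniser a os xs
    _             -> -1

-- Hypothetical helper: an input is accepted when the run that starts in the
-- initial state ends in a final state.
accepts :: Automaton -> String -> Bool
accepts a@(FSM _ _ _ q0 finals) xs = recogniser a q0 xs `elem` finals
```

With this sketch, `accepts automatonEx` holds for "aba", "abcbab", "ababbbbb", "bac" and "bca", and fails for "ab", "abc" and "ba", matching the accepted and rejected sequences listed earlier.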

FSM Translator = Automaton with outputs
[Figure: transition diagram with transitions labelled input/output: a/x, b/y, c/z]
Inputs: a, b, c; Outputs: x, y, z
“abcbab” → “xyzyxy”

Automaton with outputs in Haskell
    data AutomatonO = FSMO (SetOf State)
                           (SetOf InputLabel)
                           (SetOf OutputLabel)
                           (SetOf Transition)
                           InitialState
                           (SetOf State)   -- set of final states
where
    type InputLabel = Char
    type OutputLabel = Char
    data Transition = Move State InputLabel OutputLabel State

FSM Translator in Haskell
A translator may be defined thus:
    translator :: AutomatonO -> (State, OutString) -> InString -> (State, OutString)
where InString and OutString are both defined as String and denote the input and output strings, respectively.
In this case the equations defining translator work on tuples instead of states. The tuples have the form (state, outSymbols), where outSymbols is a string that collects the output labels of the transitions followed so far.
Exercise. Define translator and the associated select functions.
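One possible shape for the exercise is sketched below. It is only an illustration under stated assumptions: the example automaton `automatonOEx` reuses the transitions of automatonEx and attaches the outputs a/x, b/y, c/z to them, because the original diagram is not available here, and the names `trO` and `oneMoveO` are hypothetical.

```haskell
type SetOf a = [a]
type State = Int
type InputLabel = Char
type OutputLabel = Char
type InitialState = State
type InString = String
type OutString = String

data Transition = Move State InputLabel OutputLabel State

data AutomatonO = FSMO (SetOf State) (SetOf InputLabel) (SetOf OutputLabel)
                       (SetOf Transition) InitialState (SetOf State)

-- Select function: all transitions of an automaton with outputs.
trO :: AutomatonO -> SetOf Transition
trO (FSMO _ _ _ t _ _) = t

-- All transitions leaving state s on input x.
oneMoveO :: AutomatonO -> State -> InputLabel -> SetOf Transition
oneMoveO a s x = [t | t@(Move s0 y _ _) <- trO a, s0 == s, y == x]

-- Consumes the input and appends the output label of each transition taken.
-- Stops with the dummy state -1 when there is not exactly one transition.
translator :: AutomatonO -> (State, OutString) -> InString -> (State, OutString)
translator _ result [] = result
translator a (s, out) (x : xs) =
  case oneMoveO a s x of
    [Move _ _ y os] -> translator a (os, out ++ [y]) xs
    _               -> (-1, out)

-- Assumed example: automatonEx with outputs a/x, b/y, c/z.
automatonOEx :: AutomatonO
automatonOEx = FSMO [0 .. 7] "abc" "xyz"
  [ Move 0 'a' 'x' 1, Move 1 'b' 'y' 2, Move 2 'c' 'z' 1, Move 2 'a' 'x' 3
  , Move 3 'b' 'y' 3, Move 0 'b' 'y' 4, Move 4 'a' 'x' 5, Move 5 'c' 'z' 7
  , Move 4 'c' 'z' 6, Move 6 'a' 'x' 7 ]
  0 [3, 7]
```

Under these assumptions, `translator automatonOEx (0, "") "abcbab"` yields `(3, "xyzyxy")`, in line with the translation shown on the diagram slide.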

From Translator to Scanners
The most basic level of a programming language definition is the lexical level. It consists of lexical units, or tokens. Examples:
identifiers (oneMove, automatonEx)
constants or literals: numeric (1, 235), alphanumeric (“string”)
operators (+, -, *, /), delimiters (;), etc.
Another type of translator is defined by aggregating some inputs and sending them out in certain states. These translators are widely used to recognise lexical units and are called lexical analysers or scanners.

Lexical analyser as an FSM Translator
[Figure: transition diagram of the lexical analyser, with transitions labelled letter, digit, letterDigit, character, ‘{’ and ‘-’]
letter is any of ‘a’..‘z’ or ‘A’..‘Z’; digit is any of ‘0’..‘9’; letterDigit is either letter or digit; character is any acceptable character

Recognising lexical units
For a string like “ident 453 Id7t”, the above automaton produces the lexical units ident and Id7t, which are recognised in final state 1, and 453, which is recognised in final state 2.
A comment is a sequence starting with ‘{-’, ending with ‘-}’ and containing any characters in between. It is recognised in final state 6 and then discarded. For example, the string “34 {-comment-}” produces only one token, 34.
Important! To ease the recognition of lexical units, we assume that tokens are always separated by spaces (‘ ’); consequently, every final state has a transition to the initial state, 0, labelled ‘ ’.

Extended automaton
The following definition is an extension of that given for an FSM:
    data ExtAutomaton = EFSM (SetOf State)
                             (SetOf Label)
                             (SetOf Transition)
                             InitialState
                             (SetOf State)
                             (SetOf FinalStateType)   -- new!!
where
    type FinalStateType = (State, TokenUnit)
    type TokenUnit = (Int, String)
In our example only states 1 and 2 occur in the list of FinalStateType. The final state 6 is not in this list.

Translator
The translator, which is called a lexical analyser, uses a translation function defined thus:
    translation :: ExtAutomaton -> (State, SetOf TokenUnit) -> InputSequence -> String -> (State, SetOf TokenUnit)
translation takes:
- an extended automaton;
- a tuple whose first component is a state (in general the initial state) and whose second component is a list of token units (in general empty);
- an input sequence of characters;
- a string in which the current lexical unit is collected (initially empty).
It produces the last state, where the translation process stops, and the sequence of token units recognised.

Example
If the input string is "ident {-comment-} 346 lastIdent " then
    translation extAutomaton (0,[]) "ident {-comment-} 346 lastIdent" [] ⇒ (1,[(1,"ident"),(2,"346"),(1,"lastIdent")])
So the translation stops in state 1, a final state in which lastIdent has been recognised, and produces the following token units: (1,"ident"), (2,"346"), (1,"lastIdent").

Translation function
The translation function is defined by the following algorithm:
- when the input string is empty, it stops and produces the current state and the list of token units;
- otherwise (the input string is not empty):
  - if the character at the head of the input string is not ‘ ’, it is added to the string collecting the current lexical unit, and translation resumes from the next state with the string collected so far and the rest of the input string;
  - if the current character is ‘ ’ and the previous state is in the list of FinalStateType, a token unit is recognised and added to the list of token units, and translation resumes from the next state with an empty string, in which the next lexical unit will be collected, and the rest of the input string;
  - otherwise (the previous state is not in that list), the collected token is discarded and translation resumes from the next state with an empty string, in which the next lexical unit will be collected, and the rest of the input string.
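The algorithm above can be sketched in Haskell as follows. Several parts are assumptions rather than the lecture's own code: the transition list of `extAutomaton` is a hypothetical reconstruction of the scanner diagram (states 3 to 5 handle the inside of a comment), "any acceptable character" is taken to be printable ASCII, InputSequence is taken to be String, the strings stored in the FinalStateType list are descriptive names chosen here, and `next` is a helper introduced for the sketch.

```haskell
type SetOf a = [a]
type State = Int
type Label = Char
type InitialState = State
type TokenUnit = (Int, String)
type FinalStateType = (State, TokenUnit)
type InputSequence = String   -- assumption: a plain string of characters

data Transition = Move State Label State

data ExtAutomaton = EFSM (SetOf State) (SetOf Label) (SetOf Transition)
                         InitialState (SetOf State) (SetOf FinalStateType)

-- Hypothetical reconstruction of the scanner automaton.
extAutomaton :: ExtAutomaton
extAutomaton =
  EFSM [0 .. 6] chars moves 0 [1, 2, 6]
       [(1, (1, "identifier")), (2, (2, "number"))]
  where
    letters = ['a' .. 'z'] ++ ['A' .. 'Z']
    digits  = ['0' .. '9']
    chars   = [' ' .. '~']   -- assumed meaning of "any acceptable character"
    moves   = [Move 0 c 1 | c <- letters] ++ [Move 1 c 1 | c <- letters ++ digits]
           ++ [Move 0 c 2 | c <- digits]  ++ [Move 2 c 2 | c <- digits]
           ++ [Move 0 '{' 3, Move 3 '-' 4, Move 4 '-' 5, Move 5 '}' 6, Move 5 '-' 5]
           ++ [Move 4 c 4 | c <- chars, c /= '-']
           ++ [Move 5 c 4 | c <- chars, c `notElem` "-}"]
           ++ [Move s ' ' 0 | s <- [0, 1, 2, 6]]

-- Next state, or the dummy state -1 when there is not exactly one transition.
next :: ExtAutomaton -> State -> Label -> State
next (EFSM _ _ ts _ _ _) s x =
  case [s' | Move s0 y s' <- ts, s0 == s, y == x] of
    [s'] -> s'
    _    -> -1

translation :: ExtAutomaton -> (State, SetOf TokenUnit) -> InputSequence -> String
            -> (State, SetOf TokenUnit)
translation _ result [] _ = result
translation a@(EFSM _ _ _ _ _ fts) (s, toks) (c : cs) cur
  -- not a space: keep collecting the current lexical unit
  | c /= ' '                       = translation a (s', toks) cs (cur ++ [c])
  -- space after a state listed in FinalStateType: emit a token unit
  | Just (code, _) <- lookup s fts = translation a (s', toks ++ [(code, cur)]) cs ""
  -- space after any other state: discard what was collected
  | otherwise                      = translation a (s', toks) cs ""
  where
    s' = next a s c
```

Because this sketch follows the algorithm literally, a token is emitted only when the space after it is read, so the input is expected to end with a space. For "ident {-comment-} 346 lastIdent " it returns `(0, [(1,"ident"), (2,"346"), (1,"lastIdent")])`: the token units agree with the example slide, while the final state is 0 rather than 1, which suggests the original implementation treats the end of the input differently.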