Department of Software & Media Technology
Scanning, or Lexical Analysis
Regular Grammars
A regular grammar consists of non-terminals (arbitrary names), terminals (characters), and productions limited to the following two forms:
Non-terminal ::= terminal
Non-terminal ::= terminal Non-terminal
A character class (e.g., digit) is treated as a single terminal.
Regular grammars cannot count. A size limit on identifiers or literals can be expressed only by spelling out a separate non-terminal for each position, which is impractical; proper nesting (balanced parentheses) cannot be expressed at all.
8 January 2004 Department of Software & Media Technology
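To make "cannot count" concrete, here is a minimal Python sketch of a balanced-parentheses check (the function name `balanced` is hypothetical). It needs an integer depth counter that can grow without bound, which is exactly the memory a machine with a fixed, finite set of states lacks.

```python
def balanced(text: str) -> bool:
    """Return True if every '(' in text is matched by a later ')'.

    The depth counter is unbounded; a finite-state recognizer has no
    equivalent, because it can only remember one of finitely many states.
    """
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with no matching '('
                return False
    return depth == 0
```

By the pigeonhole principle, a machine with k states reaches the same state after two different numbers of opening parentheses, i < j; it must then treat "(" * i + ")" * i and "(" * j + ")" * i alike, although only the first is balanced.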
Regular Grammars
A grammar for real literals with no exponent:
digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
REALVAL ::= digit REALVAL1
REALVAL1 ::= digit REALVAL1 (integer part of arbitrary size)
REALVAL1 ::= . INTEGERVAL
INTEGERVAL ::= digit INTEGERVAL (fractional part of arbitrary size)
INTEGERVAL ::= digit
The start symbol is REALVAL.
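Since every production consumes exactly one terminal and then names at most one non-terminal, the grammar can be read directly as a loop whose only memory is the non-terminal still to be derived. The sketch below is a hypothetical recognizer `is_realval`, assuming the integer-part rule REALVAL1 ::= digit REALVAL1; the extra name INTEGERVAL_DONE is an assumption standing for "the rule INTEGERVAL ::= digit may have ended the derivation here".

```python
DIGITS = set("0123456789")  # the character class `digit`, treated as one terminal


def is_realval(text: str) -> bool:
    """Recognize REALVAL (a real literal with no exponent, e.g. "12.50")."""
    # `state` names the non-terminal still to be derived. INTEGERVAL_DONE
    # (hypothetical) means at least one fractional digit has been read,
    # so INTEGERVAL ::= digit could have finished the derivation.
    state = "REALVAL"
    for ch in text:
        if state == "REALVAL" and ch in DIGITS:
            state = "REALVAL1"          # REALVAL ::= digit REALVAL1
        elif state == "REALVAL1" and ch in DIGITS:
            state = "REALVAL1"          # REALVAL1 ::= digit REALVAL1
        elif state == "REALVAL1" and ch == ".":
            state = "INTEGERVAL"        # REALVAL1 ::= . INTEGERVAL
        elif state in ("INTEGERVAL", "INTEGERVAL_DONE") and ch in DIGITS:
            state = "INTEGERVAL_DONE"   # INTEGERVAL ::= digit [INTEGERVAL]
        else:
            return False                # no production matches this character
    return state == "INTEGERVAL_DONE"
```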
Regular Expressions
Regular expressions (REs) are defined by an alphabet (the terminal symbols) and three operations:
Alternation: RE1 | RE2
Concatenation: RE1 RE2
Repetition: RE* (zero or more occurrences of RE)
REs describe exactly the languages generated by regular grammars (the regular languages).
Regular expressions are more convenient for some applications, such as writing token definitions for a scanner generator like Lex.
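As an illustration using Python's standard `re` module (a choice of convenience; the slides do not name a tool), the three operations compose into patterns for number literals. `[0-9]` is a character class treated as one terminal; the names `REAL`, `NUMBER`, and `matches` are hypothetical.

```python
import re

# The character class `digit`, treated as a single terminal symbol.
DIGIT = "[0-9]"

# Concatenation and repetition: digit digit* . digit digit*
# (a real literal with no exponent).
REAL = f"{DIGIT}{DIGIT}*\\.{DIGIT}{DIGIT}*"

# Alternation: a real literal or an integer literal.
NUMBER = f"(?:{REAL})|(?:{DIGIT}{DIGIT}*)"


def matches(pattern: str, text: str) -> bool:
    """True if the whole of text belongs to the language of pattern."""
    return re.fullmatch(pattern, text) is not None
```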
Finite State Machines or Finite Automata (FSM or FA)
A language defined by a grammar is a (possibly infinite) set of strings.
An automaton is a computation that determines whether a given string belongs to a specified language.
A finite state machine (FSM) is an automaton that recognizes regular languages (those described by regular expressions).
It is the simplest kind of automaton: its only memory is a single number, the current state, drawn from a fixed finite set.
Specifying a Finite State Machine (FA)
An FSM is specified by a set of labeled states and directed arcs between states, each arc labeled with a character.
One or more states may be terminal (accepting).
One distinguished state is the start state.
The automaton makes a transition from state S1 to state S2 if and only if there is an arc from S1 to S2 labeled with the next character in the input.
A token is legal if the automaton stops in a terminal state after consuming all of the token's characters.
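A minimal sketch of this specification as data, assuming the arcs are stored in a dictionary keyed by (state, character); the class `Automaton`, its field names, and the example machine `ab_star` are hypothetical. A missing key means there is no arc for that character, so the token is rejected.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Automaton:
    start: str                          # the distinguished start state
    accepting: frozenset[str]           # terminal (accepting) states
    arcs: dict[tuple[str, str], str]    # (state, character) -> next state

    def accepts(self, token: str) -> bool:
        state = self.start              # the machine's only memory
        for ch in token:
            key = (state, ch)
            if key not in self.arcs:    # no arc labeled ch: reject
                return False
            state = self.arcs[key]
        return state in self.accepting  # legal iff we stop in a terminal state


# Example machine for the language a b* (hypothetical state names).
ab_star = Automaton(
    start="S",
    accepting=frozenset({"A"}),
    arcs={("S", "a"): "A", ("A", "b"): "A"},
)
```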
FA from Grammar
Create one state for each non-terminal, plus one final (accepting) state.
A rule of the form Nt1 ::= terminal generates a transition from state Nt1 to the final state on an arc labeled by the terminal.
A rule of the form Nt1 ::= terminal Nt2 generates a transition from state Nt1 to state Nt2 on an arc labeled by the terminal.
The state for the start symbol is the start state.
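A sketch of this construction in Python, assuming productions are written as tuples (lhs, terminal) or (lhs, terminal, rhs) with the character class digit as one terminal, and taking the integer-part rule of the real-literal grammar as REALVAL1 ::= digit REALVAL1. The names `fa_from_grammar`, `FINAL`, and `REAL_GRAMMAR` are hypothetical. Arcs map to sets of states because the construction can yield more than one arc per label, as INTEGERVAL does on digit.

```python
from collections import defaultdict

FINAL = "FINAL"  # the extra accepting state


def fa_from_grammar(productions):
    """Build arcs from productions (lhs, terminal) or (lhs, terminal, rhs).

    Returns a dict mapping (state, terminal) -> set of next states.
    """
    arcs = defaultdict(set)
    for rule in productions:
        if len(rule) == 2:              # Nt1 ::= terminal
            lhs, term = rule
            arcs[(lhs, term)].add(FINAL)
        else:                           # Nt1 ::= terminal Nt2
            lhs, term, rhs = rule
            arcs[(lhs, term)].add(rhs)
    return dict(arcs)


REAL_GRAMMAR = [
    ("REALVAL", "digit", "REALVAL1"),
    ("REALVAL1", "digit", "REALVAL1"),
    ("REALVAL1", ".", "INTEGERVAL"),
    ("INTEGERVAL", "digit", "INTEGERVAL"),
    ("INTEGERVAL", "digit"),
]
arcs = fa_from_grammar(REAL_GRAMMAR)
```

The set-valued arc on (INTEGERVAL, digit) is the first hint of non-determinism: the machine cannot tell from one digit whether the literal ends there.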
Graphic representation of FA
[Figure: FA for identifiers, with arcs labeled letter, digit, and underscore]
FA from RE
Each RE corresponds to a regular grammar.
For every RE there is a natural translation to an FSM.
Alternation often leads to non-deterministic machines: when two alternatives can begin with the same character, the start state needs two arcs labeled with that character.
Deterministic Finite Automata (DFA)
For every state S and every character C, there is at most one arc leaving S labeled with C.
A DFA is easier to implement: no backtracking is needed.
Conventions for drawing DFAs:
Error transitions are not shown explicitly.
Input symbols that lead to the same transition are grouped together, and the group can be given a name (e.g., letter, digit).
Stopping conditions and actions are still not displayed.
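The determinism condition can be checked mechanically. In this sketch (the function `is_deterministic`, the arc format, and both example arc lists are assumptions), arcs are (state, label, next_state) triples whose labels may be named groups such as letter or digit, following the grouping convention above; absent arcs are the implicit error transitions.

```python
def is_deterministic(arcs) -> bool:
    """Return True if no (state, label) pair has two different target states.

    arcs: iterable of (state, label, next_state) triples. Error transitions
    are implicit, so they are simply not listed.
    """
    target_of = {}
    for state, label, target in arcs:
        if target_of.setdefault((state, label), target) != target:
            return False  # a second arc with the same label from the same state
    return True


IDENTIFIER_ARCS = [
    ("start", "letter", "in_id"),
    ("in_id", "letter", "in_id"),
    ("in_id", "digit", "in_id"),
]
INT_OR_REAL_ARCS = [
    ("start", "digit", "int"),
    ("start", "digit", "real_whole"),  # same label, different target
]
```

The check assumes the named groups are disjoint; if two groups overlapped, it would have to compare the underlying characters instead of the labels.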
Non-Deterministic Finite Automata (NFA)
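A non-deterministic FA has at least one state with two arcs, labeled with the same character, leading to two distinct states.
Example: from the start state, a digit can begin either an integer literal or a real literal.
A direct implementation requires backtracking: if the arc chosen at such a state turns out to be wrong, the scanner must return to that state and try the other arc. Alternatively, a simulation can track the set of all states the machine could be in.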
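A minimal backtracking sketch for the int-or-real example, under stated assumptions: the state names, the token names INT and REAL, and the functions `label` and `run` are hypothetical, and each digit is mapped to the group name digit before its arcs are looked up.

```python
# NFA for: integer literal | real literal (no exponent).
# From "start", a digit may begin either alternative.
NFA = {
    ("start", "digit"): ["int", "real_whole"],
    ("int", "digit"): ["int"],
    ("real_whole", "digit"): ["real_whole"],
    ("real_whole", "."): ["real_dot"],
    ("real_dot", "digit"): ["real_frac"],
    ("real_frac", "digit"): ["real_frac"],
}
ACCEPTING = {"int": "INT", "real_frac": "REAL"}


def label(ch: str) -> str:
    """Map a character to its arc label (digits are grouped)."""
    return "digit" if ch in "0123456789" else ch


def run(text: str, state: str = "start", pos: int = 0):
    """Return the token kind if some path accepts text, else None.

    Tries the arcs in order and backtracks to the last choice point
    when a path gets stuck.
    """
    if pos == len(text):
        return ACCEPTING.get(state)
    for nxt in NFA.get((state, label(text[pos])), []):
        result = run(text, nxt, pos + 1)
        if result is not None:  # this choice worked
            return result
    return None                 # every arc failed: backtrack
```

On "4.25" the first choice, int, gets stuck at the dot, so the search backs up to the start state and succeeds through real_whole.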
Lookahead & Backtracking in NFA
[Figure: identifier DFA: start --letter--> in_id; in_id --letter, digit--> in_id; in_id --[other]--> finish, with action: return ID]
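Lookahead in code: in this hypothetical `scan_identifier` (letters and digits restricted to ASCII, an assumption), the loop stays in in_id while the next character is a letter or digit. The character that ends the identifier corresponds to the bracketed [other] arc: it is examined but not consumed, so the next call to the scanner begins at it.

```python
def scan_identifier(text: str, pos: int):
    """Scan an identifier (a letter, then letters or digits) starting at pos.

    Returns (lexeme, next_pos). The character at next_pos, if any, was only
    looked at, never consumed.
    """
    def is_letter(ch: str) -> bool:
        return ch.isascii() and ch.isalpha()

    def is_digit(ch: str) -> bool:
        return ch in "0123456789"

    if pos >= len(text) or not is_letter(text[pos]):   # state: start
        raise ValueError(f"no identifier at position {pos}")
    end = pos + 1                                        # state: in_id
    while end < len(text) and (is_letter(text[end]) or is_digit(text[end])):
        end += 1
    # [other]: text[end] is lookahead; return ID without consuming it.
    return text[pos:end], end
```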
Implementation of FA
[Figure: the identifier DFA from the previous slide (states start, in_id, finish; arcs letter, digit, [other]; action: return ID)]
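A common way to implement a DFA is a transition table indexed by state and character group, plus a flag saying whether the arc consumes its character. The sketch below encodes the identifier DFA this way; `TABLE`, `char_class`, `next_token`, and the token name ID are illustrative assumptions, and reaching the end of the input is treated like seeing an [other] character.

```python
def char_class(ch: str) -> str:
    """Group input characters that cause the same transition."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch in "0123456789":
        return "digit"
    return "other"


# (state, character class) -> (next state, consume the character?)
TABLE = {
    ("start", "letter"): ("in_id", True),
    ("in_id", "letter"): ("in_id", True),
    ("in_id", "digit"): ("in_id", True),
    ("in_id", "other"): ("finish", False),  # [other]: look ahead only
}


def next_token(text: str, pos: int):
    """Run the table-driven DFA from pos; return (kind, lexeme, next_pos)."""
    state, start = "start", pos
    while state != "finish":
        cls = char_class(text[pos]) if pos < len(text) else "other"
        if (state, cls) not in TABLE:       # implicit error transition
            raise ValueError(f"unexpected character at position {pos}")
        state, consume = TABLE[(state, cls)]
        if consume:
            pos += 1
    return "ID", text[start:pos], pos
```

Changing the recognized language then means changing the table, not the driver loop.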
From RE to DFA & RE to NFA
[Figure: the identifier DFA again (states start, in_id, finish; arcs letter, digit, [other]; action: return ID)]
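The figure for this slide did not survive extraction. For the RE-to-NFA direction, a standard technique is Thompson's construction, shown here as a swapped-in illustration rather than the slide's own method: each RE operation combines smaller machines using ε (empty-string) arcs. The tuple encoding of the RE and the names `thompson`, `eps_closure`, and `nfa_accepts` are assumptions of this sketch. The resulting NFA can then be turned into a DFA by the subset construction.

```python
from itertools import count

EPS = None  # label for epsilon (empty-string) arcs


def thompson(re_ast):
    """Build an NFA for an RE tree: ("sym", c) | ("cat", r1, r2) |
    ("alt", r1, r2) | ("star", r). Returns (start, accept, arcs)."""
    ids = count()
    arcs = []  # (src, label, dst) triples

    def build(node):
        kind = node[0]
        if kind == "sym":
            s, a = next(ids), next(ids)
            arcs.append((s, node[1], a))
            return s, a
        if kind == "cat":
            s1, a1 = build(node[1])
            s2, a2 = build(node[2])
            arcs.append((a1, EPS, s2))            # glue the machines in sequence
            return s1, a2
        if kind == "alt":
            s, a = next(ids), next(ids)
            for sub in node[1:]:
                ss, sa = build(sub)
                arcs.append((s, EPS, ss))         # choose a branch...
                arcs.append((sa, EPS, a))         # ...and rejoin afterwards
            return s, a
        if kind == "star":
            s, a = next(ids), next(ids)
            ss, sa = build(node[1])
            arcs.extend([(s, EPS, ss), (s, EPS, a),     # enter or skip
                         (sa, EPS, ss), (sa, EPS, a)])  # repeat or leave
            return s, a
        raise ValueError(f"unknown node {kind!r}")

    start, accept = build(re_ast)
    return start, accept, arcs


def eps_closure(states, arcs):
    """All states reachable from `states` using only epsilon arcs."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, lab, dst in arcs:
            if src == q and lab is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def nfa_accepts(nfa, text):
    """Simulate the NFA by tracking the set of current states."""
    start, accept, arcs = nfa
    current = eps_closure({start}, arcs)
    for ch in text:
        moved = {dst for src, lab, dst in arcs if src in current and lab == ch}
        current = eps_closure(moved, arcs)
    return accept in current


# (a|b)* a : strings over {a, b} that end in a
ENDS_IN_A = ("cat", ("star", ("alt", ("sym", "a"), ("sym", "b"))), ("sym", "a"))
```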
NFA to DFA
There is an algorithm (the subset construction) for converting a non-deterministic machine into an equivalent deterministic one.
The result may have exponentially more states: an NFA with n states can yield a DFA with up to 2^n states.
Intuitively, new states are needed to express uncertainty about the token, e.g., "int or real" after the first digit has been read.
Other algorithms minimize the number of states of an FSM, decide whether two FSMs are equivalent, and so on.
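A minimal sketch of the subset construction, assuming an NFA without ε arcs stored as a dictionary from (state, symbol) to a set of states; the int-or-real machine `INT_OR_REAL` and the function name are hypothetical. Each DFA state is a set of NFA states, which is how the DFA records its uncertainty about whether the token is an int or a real.

```python
from collections import deque


def subset_construction(nfa, start, accepting, alphabet):
    """Convert an NFA without epsilon arcs into a DFA.

    nfa: dict (state, symbol) -> set of states.
    Returns (dfa_start, dfa_arcs, dfa_accepting); DFA states are frozensets.
    """
    dfa_start = frozenset({start})
    dfa_arcs = {}
    todo, seen = deque([dfa_start]), {dfa_start}
    while todo:
        current = todo.popleft()
        for sym in alphabet:
            target = frozenset(q for s in current for q in nfa.get((s, sym), ()))
            if not target:
                continue  # error transition, left implicit
            dfa_arcs[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    dfa_accepting = {S for S in seen if S & accepting}
    return dfa_start, dfa_arcs, dfa_accepting


INT_OR_REAL = {
    ("start", "digit"): {"int", "real_whole"},
    ("int", "digit"): {"int"},
    ("real_whole", "digit"): {"real_whole"},
    ("real_whole", "."): {"real_dot"},
    ("real_dot", "digit"): {"real_frac"},
    ("real_frac", "digit"): {"real_frac"},
}
start, arcs, accepting = subset_construction(
    INT_OR_REAL, "start", {"int", "real_frac"}, ["digit", "."])
```

Here the DFA state {int, real_whole} is accepting because the digits read so far already form an integer, yet a following . can still turn the token into a real.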
Example DFA
Another view of the same DFA
Yet another view of the same DFA
State Minimization in DFA
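The slide's figure did not survive extraction, so this is a stand-in: a Moore-style partition-refinement sketch, named here as one standard technique rather than the slide's own. It assumes a complete DFA (every state has an arc for every symbol) with no unreachable states; `minimize` and the example DFA below are hypothetical.

```python
def minimize(states, alphabet, delta, accepting):
    """Group equivalent states of a complete DFA by partition refinement.

    delta: dict (state, symbol) -> state, defined for every pair.
    Returns a set of frozensets; each frozenset is one state of the minimal DFA.
    """
    # Initial partition: accepting vs. non-accepting states.
    block = {q: (q in accepting) for q in states}
    while True:
        # Signature: the state's current block plus the block each symbol leads to.
        signature = {
            q: (block[q],) + tuple(block[delta[(q, a)]] for a in alphabet)
            for q in states
        }
        numbering = {}
        refined = {q: numbering.setdefault(signature[q], len(numbering))
                   for q in states}
        if len(numbering) == len(set(block.values())):
            break  # no block was split, so the partition is stable
        block = refined
    groups = {}
    for q in states:
        groups.setdefault(block[q], set()).add(q)
    return {frozenset(g) for g in groups.values()}


# Strings over {a, b} ending in a; states 1 and 2 are redundant copies.
DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 2, (1, "b"): 0,
    (2, "a"): 1, (2, "b"): 0,
}
```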
TINY DFA
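The TINY DFA itself is not recoverable from this text. Assuming TINY is the small teaching language from Louden's Compiler Construction: Principles and Practice (reserved words if, then, else, end, repeat, until, read, write; identifiers of letters only; numbers of digits only; := for assignment; comments in braces), a scanner driven by the same kind of states could look like the hypothetical `tiny_tokens` below. The state names in the comments follow that assumption.

```python
RESERVED = {"if", "then", "else", "end", "repeat", "until", "read", "write"}
SPECIAL = set("+-*/=<();")
DIGITS = "0123456789"


def tiny_tokens(src: str):
    """Yield (kind, lexeme) pairs for a TINY-like source string.

    Each branch corresponds to one DFA state; a token is emitted when
    the DONE state would be reached.
    """
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        if ch in " \t\r\n":                     # START: skip whitespace
            i += 1
        elif ch == "{":                         # INCOMMENT until '}'
            close = src.find("}", i)
            if close == -1:
                raise ValueError("unterminated comment")
            i = close + 1
        elif ch == ":":                         # INASSIGN: '=' must follow
            if not src.startswith(":=", i):
                raise ValueError(f"expected ':=' at position {i}")
            yield ("ASSIGN", ":=")
            i += 2
        elif ch in DIGITS:                      # INNUM
            j = i
            while j < n and src[j] in DIGITS:
                j += 1
            yield ("NUM", src[i:j])
            i = j                               # src[j] was lookahead only
        elif ch.isascii() and ch.isalpha():     # INID: letters only
            j = i
            while j < n and src[j].isascii() and src[j].isalpha():
                j += 1
            word = src[i:j]
            yield (word.upper() if word in RESERVED else "ID", word)
            i = j
        elif ch in SPECIAL:                     # single-character tokens
            yield (ch, ch)
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
```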
Lex for Scanner
Lex Conventions for RE
Format of a Lex Input File
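Lex itself generates C code, so the following is only a Python imitation of one of its conventions: when several rules match, Lex takes the longest match, and among equally long matches it prefers the rule listed first (which is how a keyword rule placed before the identifier rule wins on `if`). In a Lex input file such rules sit in the rules section, between the two %% lines. The rule list `RULES` and the function `lex` are hypothetical.

```python
import re

# Rules in priority order, as in the rules section of a Lex file.
RULES = [
    ("IF", r"if"),
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),
    ("NUM", r"[0-9]+"),
    ("ASSIGN", r":="),
    ("WS", r"[ \t\n]+"),
]
COMPILED = [(name, re.compile(pat)) for name, pat in RULES]


def lex(src: str):
    """Tokenize using Lex-style disambiguation: the longest match wins,
    and among equally long matches the earlier rule wins."""
    pos, out = 0, []
    while pos < len(src):
        best_name, best_len = None, 0
        for name, rx in COMPILED:
            m = rx.match(src, pos)
            # Strictly longer only, so ties keep the earlier rule.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":  # like a Lex rule with an empty action
            out.append((best_name, src[pos:pos + best_len]))
        pos += best_len
    return out
```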