Published by Amanda McCoy. Modified over 6 years ago.
1
Welcome to a journey to Compilers (CS419)
Lecture 5: Lexical Analysis: Regular Definitions + Transition Diagram
Dr. Hussien Sharaf, Dr. Mohammad Nassef
Department of Computer Science, Faculty of Computers and Information (FCI), Cairo University
2
Hierarchy of languages
Regular Languages
Context-Free Languages
Recursive Languages
Recursively Enumerable Languages
Non-Recursively Enumerable Languages
3
Three Views
Three equivalent formal ways to look at this approach: Regular Expressions, Finite State Automata, and Regular Grammars. All three describe the Regular Languages.
4
How can we design a LA (lexical analyzer)?
Steps:
1. Identify Tokens, Lexemes, Patterns.
2. Write Regular Expressions for patterns.
3. Write Regular Definitions.
4. Draw Transition Diagrams.
5. Design Non-deterministic Finite Automata (NFA).
6. Transform NFA to Deterministic Finite Automata (DFA).
5
Review: Tokens .. Patterns .. Lexemes
Token | Informal description | Sample lexemes
if | Characters i, f | if
else | Characters e, l, s, e | else
comparison | < or > or <= or >= or == or != | <=, !=
id | Letter followed by letters and digits | pi, score, D2
number | Any numeric constant | 0, 6.02e23
literal | Anything but ", surrounded by "s | "core dumped"
Example statement: printf("total = %d\n", score);
6
Lexeme and Token
SEG2101 Chapter 8 Winter 2007
Statement: Index = 2 * count + 17;
Token | Lexeme
identifier | Index
equal_sign | =
int_literal | 2
multi_op | *
identifier | count
plus_op | +
int_literal | 17
semicolon | ;
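The token/lexeme pairing above can be reproduced with a small regex-based scanner. This is only a sketch: the token names come from the slide, but the patterns, the `skip` entry for whitespace, and the function name `tokenize` are assumptions made for illustration.

```python
import re

# Token names follow the slide; the patterns are illustrative assumptions.
TOKEN_SPEC = [
    ("int_literal", r"[0-9]+"),
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("equal_sign", r"="),
    ("multi_op", r"\*"),
    ("plus_op", r"\+"),
    ("semicolon", r";"),
    ("skip", r"\s+"),  # whitespace is matched but not reported
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token, lexeme) pairs for the input text."""
    pos = 0
    result = []
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "skip":
            result.append((m.lastgroup, m.group()))
        pos = m.end()
    return result


for token, lexeme in tokenize("Index = 2 * count + 17;"):
    print(token, lexeme)
```

Each lexeme is the actual substring found in the input, while the token is the class it belongs to, which is the distinction the slide draws.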
7
How can we design a LA (lexical analyzer)?
Steps:
1. Identify Tokens, Lexemes, Patterns.
2. Write Regular Expressions for patterns.
3. Write Regular Definitions.
4. Draw Transition Diagrams.
5. Design Non-deterministic Finite Automata (NFA).
6. Transform NFA to Deterministic Finite Automata (DFA).
8
Writing regular definitions
9
Regular Definitions
If Σ is an alphabet of basic symbols, then a regular definition is a sequence of definitions of the form:
d1 -> r1
d2 -> r2
...
dn -> rn
where each di is a distinct name, and each ri is a regular expression over the symbols in Σ ∪ {d1, d2, …, di-1}, i.e., the basic symbols and the previously defined names.
10
lec01-lexicalanalyzer June 25, 2018
Regular Definitions
We can give names to regular expressions, and use these names as symbols to define other regular expressions.
A regular definition is a sequence of definitions of the form:
d1 -> r1
d2 -> r2
…
dn -> rn
where each di is a new symbol and each ri is a regular expression over the symbols in Σ ∪ {d1, d2, ..., di-1}, that is, the alphabet plus the previously defined symbols.
Transition: Writing regular expressions for some languages can be difficult, because the expressions become quite complex. In those cases, we may use regular definitions.
11
Regular Definitions Example
Example: Identifiers in Pascal
letter -> A | B | ... | Z | a | b | ... | z
digit -> 0 | 1 | ... | 9
id -> letter (letter | digit)*
If we try to write the regular expression for identifiers without using regular definitions, it becomes complex:
(A|...|Z|a|...|z) ( (A|...|Z|a|...|z) | (0|...|9) )*
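The same naming idea carries over to code: each named definition becomes a string that later definitions reuse. The sketch below is an illustration in Python's `re` syntax, with character classes standing in for the long alternations; the helper name `is_identifier` is an assumption.

```python
import re

# Each regular definition is a named pattern; later names reuse earlier ones.
letter = r"[A-Za-z]"
digit = r"[0-9]"
ident = rf"{letter}(?:{letter}|{digit})*"  # id -> letter (letter | digit)*

ID_RE = re.compile(ident)


def is_identifier(s):
    """True if the whole string s matches the id definition."""
    return ID_RE.fullmatch(s) is not None


print(is_identifier("score"), is_identifier("D2"), is_identifier("2D"))
```

Writing `ident` in terms of `letter` and `digit` keeps the pattern readable, which is the point of regular definitions.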
12
Example: Recognition of tokens
The starting point is the language grammar, which tells us which tokens are needed.
Grammar:
stmt -> if expr then stmt
      | if expr then stmt else stmt
      | Ɛ
expr -> term relop term
      | term
term -> id
      | number
The next step is to formalize the patterns!
Regular Definitions:
digit -> [0-9]
digits -> digit+
number -> digits (. digits)? (E [+-]? digits)?
letter -> [A-Za-z_]
id -> letter (letter | digit)*
if -> if
then -> then
else -> else
relop -> < | > | <= | >= | = | <>
We also need to handle whitespace:
ws -> (blank | tab | newline)+
Transition: In the previous section we learned how to express patterns using regular expressions. Now we will study how to examine the input string and find a prefix that is a lexeme matching one of the patterns.
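The regular definitions above can be transcribed into Python's `re` syntax to try out the "find a prefix that is a lexeme" idea. This is a sketch under assumptions: the helper `match_prefix` is hypothetical, and because Python tries alternatives left to right rather than picking the longest, the two-character operators are listed first in `relop`.

```python
import re

digit = r"[0-9]"
digits = rf"{digit}+"
# number -> digits (. digits)? (E [+-]? digits)?
number = rf"{digits}(?:\.{digits})?(?:E[+-]?{digits})?"
letter = r"[A-Za-z_]"
ident = rf"{letter}(?:{letter}|{digit})*"
# Longer operators first: Python alternation is ordered, not longest-match.
relop = r"<=|>=|<>|<|>|="
ws = r"[ \t\n]+"


def match_prefix(pattern, text):
    """Return the prefix of text matched by pattern, or None."""
    m = re.match(pattern, text)
    return m.group() if m else None


print(match_prefix(number, "6.02E23;"))
print(match_prefix(relop, "<= 3"))
print(match_prefix(ident, "x1=2"))
```

The keywords if, then, and else also match `ident`, so a scanner built this way would need a separate keyword check after matching an identifier.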
13
Examples of Regular Definitions
Example: Pascal identifiers
Example: floating point numbers
14
How can we design a LA (lexical analyzer)?
Steps:
1. Identify Tokens, Lexemes, Patterns.
2. Write Regular Expressions for patterns.
3. Write Regular Definitions.
4. Draw Transition Diagrams.
5. Design Non-deterministic Finite Automata (NFA).
6. Transform NFA to Deterministic Finite Automata (DFA).
15
Drawing transition diagrams
16
Three Views Again!
Three equivalent formal ways to look at this approach: Regular Expressions, Finite State Automata, and Regular Grammars, all describing the Regular Languages.
Diagram labels: Specification, Implementation, Representation.
17
Transition Diagram
State: represents a condition that could occur during scanning.
- start/initial state
- accepting/final state: lexeme found
- intermediate state
Edge: directs from one state to another, labeled with one symbol or a set of symbols.
Transition: As an intermediate step in the construction of a lexical analyzer, we first convert patterns into transition diagrams. Here, we perform the conversion by hand, but later you will see that there is a way to construct these diagrams from collections of regular expressions.
Note on edges: If we are in some state s, and the next input symbol is a, we look for an edge out of state s labeled by a. If we find such an edge, we advance the forward pointer and enter the state of the transition diagram to which that edge leads.
18
Simple notations
Transition diagrams for a, a*, a+, and (a|b)* (diagrams not reproduced).
19
Transition Diagram
An FA can be represented using transition diagrams. Corresponding to the FA definition, a transition diagram has:
- States, represented by circles;
- An alphabet (Σ), represented by labels on edges;
- Transitions, represented by labeled directed edges between states. The label is the input symbol;
- One start state, shown with an incoming arrow;
- One or more final states, represented by double circles.
Example: transition diagram to recognize (a+b)*abb, where + denotes union (states q0 to q3; diagram not reproduced).
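A transition diagram maps directly onto a lookup table from (state, symbol) to next state. The sketch below uses one possible deterministic diagram for (a|b)*abb with states q0 to q3. The slide's own diagram is not reproduced in this transcript, so the specific edges here are an assumption and may differ from the slide.

```python
# One deterministic diagram for (a|b)*abb; the edges are an assumption,
# since the slide's diagram is not available in this transcript.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q1", ("q2", "b"): "q3",
    ("q3", "a"): "q1", ("q3", "b"): "q0",
}
START = "q0"
FINAL = {"q3"}  # drawn as a double circle


def accepts(s):
    """Follow one labeled edge per input symbol, then check the final state."""
    state = START
    for ch in s:
        nxt = DELTA.get((state, ch))
        if nxt is None:  # no edge labeled ch leaves this state
            return False
        state = nxt
    return state in FINAL


print(accepts("aabb"), accepts("abba"))
```

Each state records how much of the suffix abb has been seen so far, which is why any a sends the machine back to q1.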
20
Transition diagrams
Transition diagram for relop -> < | > | <= | >= | = | <>
21
Transition-Diagram-Based Lexical Analyzer
Each state is represented by a piece of code.
A variable state holds the number of the current state for a transition diagram.
A switch based on the value of state takes us to the code for each of the possible states, where we find the action of that state.
Often, the code for a state is itself a switch statement or multi-way branch that determines the next state by reading and examining the next input character.
Figure: Implementation of relop transition diagram
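The state-variable-plus-switch scheme described above might look like the following for relop. This is a sketch, not the slide's code (which is not reproduced in this transcript): the state numbers, the attribute names LT, LE, EQ, NE, GT, GE, and the function name `get_relop` are assumptions, and an if/elif chain stands in for the switch.

```python
def get_relop(text, pos=0):
    """Return ((token, attribute), next_pos), or None if no relop starts at pos.

    The variable `state` holds the current state; each branch is that
    state's code, which inspects the next character to pick the next state.
    """
    state = 0
    i = pos
    while True:
        c = text[i] if i < len(text) else ""  # "" stands for end of input
        if state == 0:
            if c == "<":
                state = 1
            elif c == "=":
                return ("relop", "EQ"), i + 1
            elif c == ">":
                state = 6
            else:
                return None  # not a relational operator
            i += 1
        elif state == 1:  # seen "<"
            if c == "=":
                return ("relop", "LE"), i + 1
            if c == ">":
                return ("relop", "NE"), i + 1
            return ("relop", "LT"), i  # retract: c is not consumed
        elif state == 6:  # seen ">"
            if c == "=":
                return ("relop", "GE"), i + 1
            return ("relop", "GT"), i  # retract: c is not consumed


print(get_relop("<= 3"))
print(get_relop(">x"))
```

The "retract" cases return the position of the unread character, so the next token scan starts exactly where this lexeme ended.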
22
Transition diagrams (cont.)
A transition diagram for id's and keywords
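One common way to handle keywords with a single identifier diagram is to scan letter (letter | digit)* and then look the lexeme up in a keyword table. The sketch below assumes that approach. The slide's diagram is not reproduced here and may do it differently, and the names `KEYWORDS` and `get_id_or_keyword` are hypothetical.

```python
KEYWORDS = {"if", "then", "else"}  # reserved words from the example grammar


def is_letter(c):
    return c.isascii() and (c.isalpha() or c == "_")


def is_digit(c):
    return c.isascii() and c.isdigit()


def get_id_or_keyword(text, pos=0):
    """Scan letter (letter | digit)* from pos.

    Returns ((token, lexeme), next_pos), or None if no identifier starts here.
    """
    if pos >= len(text) or not is_letter(text[pos]):
        return None
    i = pos + 1
    # Loop on the letter/digit edge until some other character is seen.
    while i < len(text) and (is_letter(text[i]) or is_digit(text[i])):
        i += 1
    lexeme = text[pos:i]
    # Keywords look like identifiers, so decide after the scan.
    token = lexeme if lexeme in KEYWORDS else "id"
    return (token, lexeme), i


print(get_id_or_keyword("if x"))
print(get_id_or_keyword("ifx1 = 3"))
```

Deciding after the full scan means that ifx1 is reported as an id rather than as the keyword if followed by x1.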
23
Transition diagrams (cont.)
A transition diagram for floating point numbers
24
Transition diagrams (cont.)
Transition diagram for whitespace
25
Practice
Draw the transition diagram for recognizing the following regular expression (3 possible answers!):
a(a+b)*a
(Answer diagrams not reproduced.)
26
Finite automata (FA)
Finite state automata (FSA)
Finite state machines (FSM)
27
Finite Automaton
Input: a string.
Output: “Accept” or “Reject”.
(Diagram: Input String -> Finite Automaton -> Output)
28
Finite automata baaa*!
29
Formally
An FSA is a 5-tuple consisting of:
Q: a set of states {q0, q1, q2, q3, q4}
Σ: an alphabet of symbols {a, b, !}
q0: a start state in Q
F: a set of final states in Q, {q4}
δ(q, i): a transition function mapping Q x Σ to Q
(State diagram with states q0 to q4 and edge labels b, a, ! not reproduced.)
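The 5-tuple can be written down literally as data. In the sketch below the sets Q, Σ, and F come from the slide, while the individual transitions are read off the expression baaa*! and are therefore an assumption about the diagram. The helper `is_well_formed` is hypothetical.

```python
Q = {"q0", "q1", "q2", "q3", "q4"}  # states
SIGMA = {"a", "b", "!"}             # alphabet
START = "q0"                        # start state
F = {"q4"}                          # final states
# Transition function as a table; pairs that are absent have no transition.
# These edges are inferred from baaa*!, not copied from the slide's diagram.
DELTA = {
    ("q0", "b"): "q1",
    ("q1", "a"): "q2",
    ("q2", "a"): "q3",
    ("q3", "a"): "q3",
    ("q3", "!"): "q4",
}


def is_well_formed(states, alphabet, start, finals, delta):
    """Check that every component refers only to declared states and symbols."""
    return (
        start in states
        and finals <= states
        and all(
            q in states and sym in alphabet and target in states
            for (q, sym), target in delta.items()
        )
    )


print(is_well_formed(Q, SIGMA, START, F, DELTA))
```

Storing δ as a partial table is a design choice for brevity. A total function would add an explicit dead state for every missing pair.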
30
Finite automata
An FSA recognizes (accepts) strings of a regular language, for example:
baa!
baaa!
baaaa!
…
Tape metaphor: will this input be accepted?
(Input tape figure not reproduced.)
31
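Another View: A State Transition Table for SheepTalk
State | Input b | Input a | Input !
0 | 1 | - | -
1 | - | 2 | -
2 | - | 3 | -
3 | - | 3 | 4
4 | - | - | -
Rows 0 to 4 correspond to states q0 to q4; "-" means there is no transition, and state 4 is the final state.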
32
FSA Recognition: Possible Goals
- Determine whether a string should be accepted by a machine
- Or… determine whether a string is in the language defined by the automaton
- Or… determine whether a regular expression matches a string
This process can be represented with an input tape.
33
Input Tape
Tape: a b !
Starting in q0, the first symbol a has no outgoing transition, so the machine stops: REJECT.
Slide from Dorr/Monz
34
Input Tape
The machine moves through q0, q1, q2, q3, q4 as it reads the b, the a's, and the final !, ending in the final state q4: ACCEPT.
Slide from Dorr/Monz
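The tape walk-throughs above can be replayed with a table-driven recognizer. This is a sketch under assumptions: the table rows are derived from baaa*! with states numbered 0 to 4, and the function name `recognize` and its return shape (accepted flag plus visited states) are illustrative choices.

```python
# Row = current state, column = input symbol; a missing entry means no transition.
TABLE = {
    0: {"b": 1},
    1: {"a": 2},
    2: {"a": 3},
    3: {"a": 3, "!": 4},
    4: {},
}
START = 0
FINAL = {4}


def recognize(tape):
    """Read the tape left to right; return (accepted, list of visited states)."""
    state = START
    trace = [state]
    for symbol in tape:
        nxt = TABLE[state].get(symbol)
        if nxt is None:  # no transition for this symbol: reject immediately
            return False, trace
        state = nxt
        trace.append(state)
    # End of tape: accept only if the machine stopped in a final state.
    return state in FINAL, trace


print(recognize("baaa!"))
print(recognize("ab!"))
```

For the rejected tape the trace stops at the start state, matching the slide where the machine never leaves q0.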