Chapter 2 Scanning – Part 1 June 10, 2018 Prof. Abdelaziz Khamis.

1 Chapter 2 Scanning – Part 1 June 10, 2018 Prof. Abdelaziz Khamis

2 Chapter 2 – Part 1: Topics
The Scanning Process
Regular Expressions
    RE Definition
    RE Examples
    Extensions of RE Operations
Finite Automata
    Definition and Examples of DFA
    Stopping Conditions and Actions
    Implementation of DFA

3 The Scanning Process
Input: source code as a stream of characters
Output: meaningful units called tokens
    Keywords: such as IF and THEN, which represent the strings of characters “if” and “then”
    Symbols: such as PLUS and MINUS, which represent the characters “+” and “-”
    Variable tokens: such as ID and NUM, which represent identifiers and numbers

4 The Scanning Process (Continued)
Tokens are logical entities that are usually defined as an enumeration type.
For example, tokens might be defined in C as:
    typedef enum { IF, THEN, PLUS, MINUS, ID, NUM, … } TokenType;
Tokens as logical entities must be distinguished from the strings of characters they represent (called the string value or lexeme).
    Some tokens have only one lexeme: reserved words have this property.
    Other tokens have many different string values: the tokens ID and NUM have this property.
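
The distinction between a token and its lexeme can be sketched in C as follows. This is only an illustration: the Token record, the MAX_LEXEME limit, and the classifyWord helper are assumptions made for this sketch and do not come from the slides, and the enumeration is cut down to the six tokens named above.

```c
#include <string.h>

#define MAX_LEXEME 40   /* assumed limit, chosen only for this sketch */

typedef enum { IF, THEN, PLUS, MINUS, ID, NUM } TokenType;

/* Hypothetical record that keeps the logical token together with the
   string of characters (lexeme) it was recognized from. */
typedef struct {
    TokenType type;
    char lexeme[MAX_LEXEME + 1];
} Token;

/* Classify a lexeme that is already known to consist of letters:
   each reserved word has exactly one lexeme, every other word is an ID. */
TokenType classifyWord(const char *lexeme) {
    if (strcmp(lexeme, "if") == 0) return IF;
    if (strcmp(lexeme, "then") == 0) return THEN;
    return ID;
}
```

Here "if" is the only lexeme of the token IF, while "x", "count", and "iff" are all lexemes of the single token ID.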

5 The Scanning Process (Continued)
Although the task of the scanner is to convert the entire source program into a sequence of tokens, the scanner will rarely do this all at once.
Instead, the scanner will operate under the control of the parser, returning the single next token from the input on demand via a function that will have the C declaration:
    TokenType getToken();
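
A minimal sketch of this on-demand arrangement is shown below. It assumes the input is held in a string buffer, and it adds two tokens that the slides do not list (ENDFILE and ERROR) so the caller can tell when to stop. The names initScanner and countTokens are hypothetical; countTokens merely stands in for a parser that pulls one token at a time.

```c
#include <ctype.h>

/* Reduced token set; ENDFILE and ERROR are assumptions of this sketch. */
typedef enum { PLUS, MINUS, ID, NUM, ENDFILE, ERROR } TokenType;

static const char *src = "";   /* hypothetical input buffer */
static int pos = 0;            /* index of the next unread character */

void initScanner(const char *text) { src = text; pos = 0; }

/* Returns the single next token from the input each time it is called. */
TokenType getToken(void) {
    while (src[pos] == ' ') pos++;              /* skip blanks */
    unsigned char c = (unsigned char)src[pos];
    if (c == '\0') return ENDFILE;
    if (c == '+') { pos++; return PLUS; }
    if (c == '-') { pos++; return MINUS; }
    if (isalpha(c)) {                           /* identifier: letters */
        while (isalpha((unsigned char)src[pos])) pos++;
        return ID;
    }
    if (isdigit(c)) {                           /* number: digits */
        while (isdigit((unsigned char)src[pos])) pos++;
        return NUM;
    }
    pos++;                                      /* skip unknown character */
    return ERROR;
}

/* Stand-in for a parser: requests tokens one by one until ENDFILE. */
int countTokens(const char *text) {
    int n = 0;
    initScanner(text);
    while (getToken() != ENDFILE) n++;
    return n;
}
```

The scanner never builds the whole token sequence; it only remembers how far it has read, and each call to getToken continues from there.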

6 The Scanning Process (Continued)
The scanning process is a special case of pattern matching, so we need to study methods of:
    Pattern specification: we will use regular expressions for pattern specification.
    Pattern recognition: we will use finite automata for recognizing patterns.
We turn now to the study of regular expressions and finite automata.

7 Regular Expressions
A regular expression is an expression that matches a set of strings (the “language” of the regular expression).
In its basic form, a regular expression is built up out of basic expressions (individual symbols) and the operations choice (|), concatenation (no operator), and repetition (*).

8 Regular Expressions (Continued)
A regular expression may also contain certain other meta-symbols:
    ε for the empty string
    Parentheses for grouping (to change precedence, just as in arithmetic)
    Others as needed to extend the operator set in useful ways

9 Regular Expressions (Continued)
Examples of Regular Expressions
    a(b|c)* : strings beginning with a, followed by any number of b’s and c’s
    ab|c* : the string ab, or the strings ε, c, cc, …
    (a|c)*b(a|c)* : strings that contain exactly one b
    (b|ε)(a|ab)* : strings of a’s and b’s with no two consecutive b’s
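
These examples can be tried out with the POSIX regular expression functions regcomp and regexec, as in the sketch below. This assumes a POSIX system, since <regex.h> is not part of standard C. The helper matchesWhole and the constant names are hypothetical. Two adjustments are needed for POSIX extended syntax: the patterns are anchored with ^ and $ so that the whole string must match, and (b|ε) is written b? because POSIX leaves an empty alternative undefined.

```c
#include <regex.h>
#include <stddef.h>

/* The four example expressions, anchored to match whole strings. */
const char *RE_A_THEN_BC  = "^a(b|c)*$";
const char *RE_AB_OR_CS   = "^(ab|c*)$";
const char *RE_ONE_B      = "^(a|c)*b(a|c)*$";
const char *RE_NO_BB      = "^b?(a|ab)*$";      /* b? stands for (b|ε) */

/* Returns 1 if s matches the extended regular expression re,
   0 if it does not, and -1 if re fails to compile. */
int matchesWhole(const char *re, const char *s) {
    regex_t compiled;
    if (regcomp(&compiled, re, REG_EXTENDED | REG_NOSUB) != 0) return -1;
    int rc = regexec(&compiled, s, 0, NULL, 0);
    regfree(&compiled);
    return rc == 0;
}
```

For instance, matchesWhole(RE_NO_BB, "bab") succeeds, while matchesWhole(RE_NO_BB, "abba") fails because of the two consecutive b’s.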

10 Regular Expressions (Continued)
Common extensions of R.E. operations:
    +  : one or more repetitions (r+ is equivalent to rr*)
    [] : a class or range of characters ([abcd] = [a-d] = a|b|c|d)
    .  : any character (.*b.* matches strings that contain at least one b)
    ^  : negation of a set ([^abc] = any character except a, b, or c)
    ?  : optional sub-expression ((+|-)?[0-9] = [0-9] | +[0-9] | -[0-9])
    \  : “escape” for an operator or meta-symbol

11 Regular Expressions (Continued)
More Examples of Regular Expressions:
Numbers
    nat = [0-9]+
    signedNat = [-+]? nat
    number = signedNat(\. nat)? ([Ee] signedNat)?
Comments
    {this is a Pascal comment}
        This can be specified as: {[^}]*}
    -- this is an Ada comment
        This can be specified as: --[^newline]*
Exercises (Textbook: Page 91): 2.1 (a, c, e, g, h), 2.4
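
As a rough check of these definitions, the named sub-expressions can be pasted together with C string-literal concatenation and handed to the POSIX regex functions. The sketch assumes a POSIX system with <regex.h>; the macro names and the helper fullMatch are hypothetical. The braces of the Pascal comment are written as bracket expressions ([{] and [}]) because a bare { starts an interval in POSIX extended syntax.

```c
#include <regex.h>
#include <stddef.h>

/* The slide's definitions, built up from named parts. */
#define NAT        "[0-9]+"
#define SIGNED_NAT "[-+]?" NAT
#define NUMBER     "^" SIGNED_NAT "(\\." NAT ")?([Ee]" SIGNED_NAT ")?$"

/* {[^}]*} with the literal braces protected by bracket expressions. */
#define PASCAL_COMMENT "^[{][^}]*[}]$"

/* Returns 1 if s matches re as a whole, 0 if not, -1 on a bad pattern. */
int fullMatch(const char *re, const char *s) {
    regex_t compiled;
    if (regcomp(&compiled, re, REG_EXTENDED | REG_NOSUB) != 0) return -1;
    int rc = regexec(&compiled, s, 0, NULL, 0);
    regfree(&compiled);
    return rc == 0;
}
```

With these definitions "6.02E+23" is a number, but "1." and ".5" are not, because the fractional part requires a nat on both sides of the point.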

12 Finite Automata
A scanner is an implementation of a deterministic finite automaton (DFA), which is a machine model of a particular kind of computation: accepting input strings according to a pattern.
The following diagram shows a representation of a DFA that accepts the pattern for identifiers.
[Diagram: from the start state, a transition labeled "letter" leads to the accepting state, which has a self-loop labeled "letter | digit".]
Examples (Textbook: Pages 51-53): 2.6, 2.7, 2.8 and 2.9
Exercises (Textbook: Page 91): 2.8 (a, c, e, g, h)

13 Finite Automata (Continued)
Parts of a DFA
    An alphabet (Σ)
    A finite set of states (S)
    A transition function (T : S × Σ → S)
    One start state (s0 ∈ S)
    A set of accepting states (A ⊆ S)
This definition does not describe every aspect of the behavior of a DFA algorithm:
    Error handling
    Stopping conditions
    Actions when reaching an accepting state, or when matching a character during a transition

14 Finite Automata (Continued)
Stopping Conditions and Actions
A DFA matches the longest possible input string before stopping (Principle of Longest Substring).
A typical action when making a transition is to move the character from the input string to the token string.
A typical action when reaching an accepting state is to return the token just recognized.
The following diagram expresses the principle of longest substring: the DFA continues to match letters and digits until a delimiter is found.
[Diagram: from the start state, "letter" leads to state in_id; in_id has a self-loop labeled "letter | digit"; from in_id, "[other]" leads to the finish state, with the action "return ID".]
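
The behavior in the diagram can be sketched in C as follows. This is an illustration under stated assumptions, not the course's implementation: the input is a string rather than a stream, the function name scanIdentifier and the state names are hypothetical, and the caller must supply a buffer of at least one byte. The transition action copies each matched character into the token string, and the delimiter that ends the identifier is examined but not consumed.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum { START, IN_ID, FINISH, ERROR_STATE } State;

/* Scans the longest identifier at the beginning of input.
   Copies its lexeme into token (capacity cap, cap >= 1) and returns the
   number of characters consumed, or 0 if input does not start with a letter. */
size_t scanIdentifier(const char *input, char *token, size_t cap) {
    State state = START;
    size_t pos = 0, len = 0;
    while (state != FINISH && state != ERROR_STATE) {
        unsigned char c = (unsigned char)input[pos];   /* lookahead */
        switch (state) {
        case START:
            state = isalpha(c) ? IN_ID : ERROR_STATE;
            break;
        case IN_ID:
            if (!isalnum(c)) state = FINISH;           /* [other]: stop */
            break;
        default:
            break;
        }
        if (state == IN_ID) {
            /* Transition action: move the character into the token string
               (silently truncating if the buffer is full), then consume it. */
            if (len + 1 < cap) token[len++] = (char)c;
            pos++;
        }
    }
    token[len] = '\0';
    return state == FINISH ? pos : 0;
}
```

On the input "x1+y" the function consumes two characters and leaves "+" unread, so the next scan starts at the delimiter.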

15 Implementation of DFA
There are several ways to translate a DFA into code:
Using doubly nested case analysis
    Use a variable to maintain the current state, and write the transitions as a doubly nested case statement inside a loop: the outer case statement tests the current state, and the nested one tests the input character for that state.
Using the transition table (table driven)
    Express the DFA as a data structure, and then write “generic” code that takes its actions from the data structure.
    A simple data structure that is adequate for this purpose is a two-dimensional array, indexed by state and input character, that holds the values of the transition function.

16 Implementation of DFA (Continued)
Using doubly nested case analysis
Sample code for a TINY ID:
    state = start;
    advance(input);                      /* read the first character */
    while (state != finish && state != error)
        switch (state) {
        case start:
            if (isalpha(input)) {
                advance(input);
                state = in_id;
            }
            else state = error;
            break;
        case in_id:
            if (!isalpha(input)) state = finish;   /* delimiter found */
            else advance(input);
            break;
        default:
            break;
        }
    if (state == finish) return ID;
    else return ERROR;

17 Implementation of DFA (Continued)
Using the transition table (Table driven)
Data for DFA in C++:
    class State {
    public:
        string name;
        int index;      // unique integer
        bool isFinal;   // true if final state
    };
    State start;
    State next[MAXSTATES][MAXCHARS];

18 Implementation of DFA (Continued)
Using the transition table (Table driven)
Implementing DFA in C++:
    // Construct all the states, set isFinal
    // Create next array from transitions
    bool machine(istream &sin) {
        State S = start;
        int c;                            // int, so that EOF can be detected
        while ((c = sin.get()) != EOF) {
            S = next[S.index][c];
            // Action for S would go here...
        }
        return S.isFinal;
    }
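
For comparison, a table-driven recognizer can also be sketched in plain C with a concrete table. The sketch below assumes the identifier pattern (a letter followed by letters or digits), a 256-entry character axis, and an explicit error state. All names in it (nextState, isFinal, buildTable, accepts) are chosen for this illustration. Like machine above, it reads the whole input and reports whether the final state is accepting.

```c
#include <ctype.h>

enum { S_START, S_IN_ID, S_ERROR, NUM_STATES };
#define NUM_CHARS 256

/* Transition function as a two-dimensional array: [state][character]. */
static int nextState[NUM_STATES][NUM_CHARS];

/* Only S_IN_ID is an accepting state. */
static const int isFinal[NUM_STATES] = { 0, 1, 0 };

/* Fill the table once, before any input is processed. */
void buildTable(void) {
    for (int c = 0; c < NUM_CHARS; c++) {
        nextState[S_START][c] = isalpha(c) ? S_IN_ID : S_ERROR;
        nextState[S_IN_ID][c] = isalnum(c) ? S_IN_ID : S_ERROR;
        nextState[S_ERROR][c] = S_ERROR;     /* errors are permanent */
    }
}

/* Generic driver: it knows nothing about identifiers, only the table. */
int accepts(const char *s) {
    int state = S_START;
    for (; *s != '\0'; s++)
        state = nextState[state][(unsigned char)*s];
    return isFinal[state];
}
```

Changing the recognized pattern only requires a different table; the accepts loop stays the same, which is the main appeal of the table-driven method.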

19 Implementation of DFA (Continued)
Using the transition table (Table driven)
Examples:
[Transition table with columns a and b and rows s0, s1, s2; the cell entries did not survive in the transcript.]

