Published by Candace Lee. Modified over 8 years ago
1
Scanning & Regular Expressions
CPSC 388
Ellen Walker
Hiram College
2
Scanning
Input: characters from the source code
Output: tokens
–Keywords: IF, THEN, ELSE, FOR …
–Symbols: PLUS, LBRACE, SEMI …
–Variable tokens: ID, NUM, augmented with the matched string or numeric value
3
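TokenType
Enumerated type (a C/C++ construct)
typedef enum {IF, THEN, ELSE, …} TokenType;
IF, THEN, ELSE (etc.) are now named constants (enumerators) of type TokenType
In C++, enum TokenType {IF, THEN, ELSE, …}; has the same effect without the typedef.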
4
Using TokenType
void someFun(TokenType tt) {
    …
    switch (tt) {
        case IF: … break;
        case THEN: … break;
        …
    }
}
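A compilable sketch of the two slides above, assuming a small illustrative set of token types (the slides elide the full list with "…"). The helper `tokenName` is hypothetical; it only shows a switch that handles every enumerator.

```cpp
#include <string>

// Illustrative subset of token types (an assumption; the real list is longer).
enum TokenType { IF, THEN, ELSE, FOR, PLUS, LBRACE, SEMI, ID, NUM };

// Hypothetical helper: map each TokenType to a printable name with a switch.
std::string tokenName(TokenType tt) {
    switch (tt) {
        case IF:     return "IF";
        case THEN:   return "THEN";
        case ELSE:   return "ELSE";
        case FOR:    return "FOR";
        case PLUS:   return "PLUS";
        case LBRACE: return "LBRACE";
        case SEMI:   return "SEMI";
        case ID:     return "ID";
        case NUM:    return "NUM";
    }
    return "UNKNOWN";  // not reached for valid enumerators
}
```

For example, `tokenName(SEMI)` returns "SEMI". Because each case returns, no `break` is needed here.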
5
Token Class (partial)
class Token {
public:
    TokenType tokenval;   // which kind of token
    string tokenchars;    // the characters of the token
    double numval;        // numeric value (for NUM tokens)
};
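One way the partial Token class might be filled out, shown as a sketch: the constructor and the `classifyWord` helper are hypothetical, and keywords are assumed to be lowercase in the source language. A reserved-word table lets a scanner read a whole word first and only then decide between a keyword token and an ID token that carries its string value.

```cpp
#include <map>
#include <string>

// Illustrative subset of token types (assumed for this sketch).
enum TokenType { IF, THEN, ELSE, FOR, ID, NUM };

class Token {
public:
    TokenType tokenval;      // which kind of token
    std::string tokenchars;  // the characters matched (the lexeme)
    double numval;           // numeric value, meaningful only for NUM

    Token(TokenType tt, const std::string& chars, double val = 0.0)
        : tokenval(tt), tokenchars(chars), numval(val) {}
};

// Hypothetical helper: a reserved-word lookup decides keyword vs. identifier.
Token classifyWord(const std::string& word) {
    static const std::map<std::string, TokenType> keywords = {
        {"if", IF}, {"then", THEN}, {"else", ELSE}, {"for", FOR}};
    auto it = keywords.find(word);
    if (it != keywords.end()) return Token(it->second, word);
    return Token(ID, word);  // not reserved: an ID that keeps its string value
}
```

With this table, `classifyWord("then")` yields a THEN token, while `classifyWord("count")` yields an ID whose `tokenchars` is "count".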
6
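Interlude: References and Pointers
Java has primitives and references
–Primitives are int, char, double, etc.
–References “point to” objects
C++ variables hold their values directly, even whole objects; there is no automatic reference to an object as in Java
–But one of the primitive types is the “address” (a pointer), which serves the purpose of a reference.
–(C++ also has references such as istream &, but unlike Java references they cannot be null or be reassigned.)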
7
Interlude: References and Pointers
To declare a pointer, put * after the type:
char x;    // a character
char *y;   // a pointer to a character
Using pointers:
x = 'a';
y = &x;    // y gets the address of x
*y = 'b';  // the thing pointed at by y becomes 'b'
           // note that x is now also 'b'!
8
Interlude: References and Pointers
Continuing the example…
cout << x << endl;                       // prints b
cout << *y << endl;                      // prints b
cout << static_cast<void*>(y) << endl;   // prints a hex address
cout << static_cast<void*>(&x) << endl;  // same as above
cout << &y << endl;                      // a different address: where the pointer itself is stored
(The casts are needed because cout prints a char* as a C string, not as an address.)
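The pointer example from the two slides above, gathered into one compilable function (the name `pointerDemo` is hypothetical). It writes through `y` and returns the final value of `x`, which shows that changing `*y` changed `x`.

```cpp
#include <iostream>

// Runs the slides' pointer example, printing to `out`, and returns x.
char pointerDemo(std::ostream& out) {
    char x;      // a character
    char *y;     // a pointer to a character
    x = 'a';
    y = &x;      // y gets the address of x
    *y = 'b';    // write through the pointer: x is now 'b'

    out << x << '\n';                             // prints b
    out << *y << '\n';                            // prints b
    out << static_cast<const void*>(y) << '\n';   // the address stored in y
    out << static_cast<const void*>(&x) << '\n';  // the same address
    out << &y << '\n';                            // where y itself lives (char** prints as an address)
    return x;
}
```

Calling `pointerDemo(std::cout)` prints b twice, then the same address twice, then a different address, and returns 'b'. Without the casts, `out << y` would treat `y` as a null-terminated string and read past `x`.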
9
getToken(): A scanning function
Token *getToken(istream &sin)
–Read characters from sin until a complete token is extracted; return (a pointer to) the token, or NULL at end of input
–Usually called by the parser
–Note: the version in the book uses global variables and returns only the token type
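A minimal sketch of what getToken might look like. It assumes a toy token set (keywords if/then/else/for, identifiers, unsigned integers, +, {, ;) and an ERROR type for any other character; none of these choices come from the course or the book. It keeps the slide's signature, returning a Token allocated with new (the caller must delete it) and a null pointer at end of input. `scanString` is a hypothetical driver that uses the calling loop shown on the next slide.

```cpp
#include <cctype>
#include <cstdio>    // EOF
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Illustrative token set (an assumption; the slides elide the full list).
enum TokenType { IF, THEN, ELSE, FOR, PLUS, LBRACE, SEMI, ID, NUM, ERROR };

class Token {
public:
    TokenType tokenval;      // which kind of token
    std::string tokenchars;  // the characters that make up the token
    double numval;           // numeric value, used only for NUM
};

// Reads characters from sin until one complete token is extracted.
// Returns a Token allocated with new (the caller deletes it), or nullptr at end of input.
Token *getToken(std::istream &sin) {
    int c = sin.get();
    while (c != EOF && std::isspace(c)) c = sin.get();  // skip whitespace
    if (c == EOF) return nullptr;

    Token *tok = new Token{ERROR, std::string(1, static_cast<char>(c)), 0.0};
    if (std::isalpha(c)) {
        // Longest run of letters/digits, then decide keyword vs. identifier.
        while (std::isalnum(sin.peek())) tok->tokenchars += static_cast<char>(sin.get());
        static const std::map<std::string, TokenType> keywords = {
            {"if", IF}, {"then", THEN}, {"else", ELSE}, {"for", FOR}};
        auto it = keywords.find(tok->tokenchars);
        tok->tokenval = (it != keywords.end()) ? it->second : ID;
    } else if (std::isdigit(c)) {
        // Unsigned integer literal; its value is stored in numval.
        while (std::isdigit(sin.peek())) tok->tokenchars += static_cast<char>(sin.get());
        tok->tokenval = NUM;
        tok->numval = std::stod(tok->tokenchars);
    } else if (c == '+') {
        tok->tokenval = PLUS;
    } else if (c == '{') {
        tok->tokenval = LBRACE;
    } else if (c == ';') {
        tok->tokenval = SEMI;
    }  // any other character stays ERROR
    return tok;
}

// Hypothetical driver: scans a whole string and collects the token types.
std::vector<TokenType> scanString(const std::string &text) {
    std::istringstream in(text);
    std::vector<TokenType> types;
    Token *myToken = getToken(in);
    while (myToken != nullptr) {
        types.push_back(myToken->tokenval);
        delete myToken;  // getToken allocated it with new
        myToken = getToken(in);
    }
    return types;
}
```

For example, `scanString("if x1 + 42;")` yields IF, ID, PLUS, NUM, SEMI. The `peek()` calls let the scanner stop at the first character that cannot extend the current token without consuming it.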
10
Using getToken
Token *myToken = getToken(cin);
while (myToken != NULL) {
    // process the token
    switch (myToken->tokenval) {
        // cases for each token type
    }
    delete myToken;   // free the token, assuming getToken allocated it with new
    myToken = getToken(cin);
}
11
Result of getToken
12
Tokens and Languages
The set of valid tokens of a particular type is a language (in the formal sense).
More specifically, it is a regular language.
13
Language Formalities
Language: set of strings
String: sequence of symbols
Alphabet: set of legal symbols for strings
–Generally Σ is used to denote an alphabet
14
Example Languages
L1 = {aa, ab, bb}, Σ = {a, b}
L2 = {ε, ab, abab, …}, Σ = {a, b}
L3 = {strings of N a’s where N is an odd integer}, Σ = {a}
L4 = {ε} (one string with no symbols)
L5 = { } (no strings at all), i.e. L5 = Ø
15
Denoting Languages
Expressions (regular languages only)
Grammars
–Sets of rewrite rules that express all and only the strings in the language
Automata
–Machines that “accept” all and only the strings in the language
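As an illustration of the automaton approach, here is a sketch of a two-state deterministic finite automaton for L3 (an odd number of a’s over Σ = {a}), written directly as a C++ loop; the function name `acceptsOddAs` is hypothetical. It accepts exactly the strings denoted by the regular expression a(aa)*.

```cpp
#include <string>

// Two-state DFA for L3: start in EVEN; ODD is the only accepting state.
bool acceptsOddAs(const std::string& input) {
    enum State { EVEN, ODD };
    State state = EVEN;
    for (char c : input) {
        if (c != 'a') return false;            // symbol not in the alphabet: reject
        state = (state == EVEN) ? ODD : EVEN;  // each 'a' flips the parity
    }
    return state == ODD;
}
```

The machine needs only to remember whether it has seen an even or odd number of a’s so far, which is why two states suffice.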
16
Primitive Regular Expressions
Ø
–L(Ø) = { } (no strings)
ε
–L(ε) = {ε} (one string, no symbols)
a, where a is a member of Σ
–L(a) = {a} (one string, one symbol)
17
Combining Regular Expressions
Choice: r | s (sometimes written r + s)
–L(r | s) = L(r) ∪ L(s)
Concatenation: rs
–L(rs) = L(r)L(s)
–All combinations of one string from L(r) followed by one string from L(s)
Repetition: r*
–L(r*) = {ε} ∪ L(r) ∪ L(rr) ∪ L(rrr) ∪ …
–0 or more strings from L(r) concatenated
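The three definitions can be made concrete on finite sets of strings. Because L(r*) is usually infinite, this sketch truncates concatenation and repetition to strings of length at most `maxLen`; the names `Lang`, `unionOf`, `concat` and `star` are hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <string>

using Lang = std::set<std::string>;  // a (finite) language

// Choice: L(r | s) = L(r) U L(s)
Lang unionOf(const Lang& r, const Lang& s) {
    Lang out = r;
    out.insert(s.begin(), s.end());
    return out;
}

// Concatenation: every string of r followed by every string of s (length <= maxLen).
Lang concat(const Lang& r, const Lang& s, std::size_t maxLen) {
    Lang out;
    for (const auto& x : r)
        for (const auto& y : s)
            if (x.size() + y.size() <= maxLen) out.insert(x + y);
    return out;
}

// Repetition: {empty string} U L(r) U L(rr) U ..., truncated to length <= maxLen.
Lang star(const Lang& r, std::size_t maxLen) {
    Lang result{""};    // zero repetitions: the empty string
    Lang frontier{""};  // strings found in the previous round
    while (!frontier.empty()) {
        Lang next = concat(frontier, r, maxLen);
        Lang fresh;
        for (const auto& w : next)
            if (result.insert(w).second) fresh.insert(w);  // keep only new strings
        frontier = fresh;
    }
    return result;
}
```

For example, `star({"ab"}, 4)` gives {"", "ab", "abab"}: the first three terms of L((ab)*). The loop stops once a round produces no new strings, which must happen because only finitely many strings have length at most `maxLen`.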
18
Precedence
Repetition before concatenation
Concatenation before choice
Use parentheses to override:
–aa* vs. (aa)*
–ab|c vs. a(b|c)
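The precedence rules can be checked with the C++ standard library’s `std::regex`, whose default ECMAScript grammar also gives `*` the highest precedence and `|` the lowest. `inLanguage` is a hypothetical helper; `std::regex_match` succeeds only when the whole string matches.

```cpp
#include <regex>
#include <string>

// True if the entire string s is in the language denoted by pattern.
bool inLanguage(const std::string& pattern, const std::string& s) {
    return std::regex_match(s, std::regex(pattern));
}
```

With it, `inLanguage("aa*", "a")` is true while `inLanguage("(aa)*", "a")` is false, and `inLanguage("ab|c", "c")` is true while `inLanguage("ab|c", "ac")` is false, because ab|c means (ab)|c rather than a(b|c).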
20
R.E.’s for Examples
L1 = aa | ab | bb
L1 = a(a|b) | bb
L1 = aa | (a|b)b
L2 = (ab)* (not ab*!)
L3 = a(aa)*
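A brute-force way to sanity-check answers like these, as a sketch: enumerate every string over Σ up to some length and compare `std::regex_match` against a predicate describing the language. Agreement up to a length bound is evidence, not a proof of equivalence. `allStrings` and `agreesUpTo` are hypothetical helpers.

```cpp
#include <functional>
#include <regex>
#include <string>
#include <vector>

// All strings over `alphabet` of length 0..maxLen, shortest first.
std::vector<std::string> allStrings(const std::string& alphabet, int maxLen) {
    std::vector<std::string> result{""};
    std::vector<std::string> frontier{""};
    for (int len = 1; len <= maxLen; ++len) {
        std::vector<std::string> next;
        for (const std::string& s : frontier)
            for (char c : alphabet) next.push_back(s + c);
        result.insert(result.end(), next.begin(), next.end());
        frontier = next;
    }
    return result;
}

// True if `pattern` accepts exactly the strings satisfying inL, for all strings up to maxLen.
bool agreesUpTo(const std::string& pattern,
                const std::function<bool(const std::string&)>& inL,
                const std::string& alphabet, int maxLen) {
    std::regex re(pattern);
    for (const std::string& s : allStrings(alphabet, maxLen))
        if (std::regex_match(s, re) != inL(s)) return false;
    return true;
}
```

For L3 one could write `agreesUpTo("a(aa)*", [](const std::string& s) { return s.size() % 2 == 1; }, "a", 9)`, which returns true. The same check rejects ab* for L2, since ab* accepts "a" but misses the empty string.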
21
What are these languages?
a* | b* | c*
a*b*c*
(a*b*)*
a(a|b)*c
(a|b|c)*bab(a|b|c)*
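To explore expressions like these, it can help to list the short strings each one generates. This sketch (the helper name `sampleLanguage` is hypothetical) enumerates all strings over a given alphabet up to a length bound, shortest first, and keeps those that `std::regex_match` accepts.

```cpp
#include <regex>
#include <string>
#include <vector>

// Every string over `alphabet` of length 0..maxLen that `pattern` matches in full,
// listed shortest first (alphabet order within each length).
std::vector<std::string> sampleLanguage(const std::string& pattern,
                                        const std::string& alphabet, int maxLen) {
    std::regex re(pattern);
    std::vector<std::string> matches;
    std::vector<std::string> level{""};  // all strings of the current length
    for (int len = 0; len <= maxLen; ++len) {
        for (const std::string& s : level)
            if (std::regex_match(s, re)) matches.push_back(s);
        std::vector<std::string> next;   // extend each string by one symbol
        for (const std::string& s : level)
            for (char c : alphabet) next.push_back(s + c);
        level = next;
    }
    return matches;
}
```

For instance, `sampleLanguage("a*|b*|c*", "abc", 2)` returns "", a, b, c, aa, bb, cc. Listing a few lengths this way often suggests the pattern, which can then be stated precisely.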
22
What are the RE’s?
In the alphabet {a,b,c}:
–All strings that are in alphabetical order
–All strings in which the first a comes before the first b, and the first b before the first c, e.g. ababbabca
–All strings that contain “abc”
–All strings that do not contain “abc”
23
Extended Reg. Exp’s
Additional operations for convenience
–r+ = rr* (one or more repetitions)
–. = any character in the alphabet
–.* = any possible string from the alphabet
–[a-z] = a|b|c|…|z
–[^aeiou] = b|c|d|f|g|h|j|… (any character except a, e, i, o, u)
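The extended operators are available directly in `std::regex`. One caution for this sketch: in `std::regex`, `[^aeiou]` and `.` range over all characters (`.` excludes line terminators), not just a slide-style alphabet. `fullMatch` and `isLowerIdentifier` are hypothetical helpers, and the identifier pattern is an illustrative ID token, not one defined in the course.

```cpp
#include <regex>
#include <string>

// Whole-string membership test (ECMAScript grammar, the std::regex default).
bool fullMatch(const std::string& pattern, const std::string& s) {
    return std::regex_match(s, std::regex(pattern));
}

// Illustrative ID token: a lowercase letter followed by lowercase letters or digits.
bool isLowerIdentifier(const std::string& s) {
    return fullMatch("[a-z][a-z0-9]*", s);
}
```

Here `a+` and `aa*` accept the same strings, `[^aeiou]` accepts "b" but not "e", and `isLowerIdentifier("x1")` is true while `isLowerIdentifier("1x")` is false.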