1
Regular Definition and Transition Diagrams
CSE-415
2
Contents
Regular Definition
Transition Diagrams (Nodes + Edges)
Identifiers
Keywords
Relational Operators
Constants
Unsigned Numbers
White Space
3
Regular Definition
Giving names to regular expressions is referred to as a regular definition. If Σ is an alphabet of basic symbols, then a regular definition is a sequence of definitions of the form:
d1 → r1
d2 → r2
...
dn → rn
where each di is a new symbol not in Σ, and each ri is a regular expression over Σ ∪ {d1, d2, ..., di-1}.
4
Example
C identifiers are strings of letters, digits and underscores that do not begin with a digit. Regular definition for the language of C identifiers:
letter → A | B | ... | Z | a | b | ... | z | _
digit → 0 | 1 | ... | 9
id → letter (letter | digit)*
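The regular definition for C identifiers can be mirrored almost symbol for symbol with Python's `re` module. This is only a sketch: the names `LETTER`, `DIGIT`, `ID` and `is_identifier` are illustrative and do not come from the slides, and keywords are not excluded here.

```python
import re

# Each named pattern corresponds to one line of the regular definition:
#   letter -> A|B|...|Z|a|b|...|z|_
#   digit  -> 0|1|...|9
#   id     -> letter(letter|digit)*
LETTER = r"[A-Za-z_]"
DIGIT = r"[0-9]"
ID = rf"{LETTER}(?:{LETTER}|{DIGIT})*"  # later definitions reuse earlier names

ID_RE = re.compile(ID)


def is_identifier(text: str) -> bool:
    """Return True if the whole string matches the id definition."""
    return ID_RE.fullmatch(text) is not None
```

Building `ID` from `LETTER` and `DIGIT` reflects the rule that each definition may only refer to the basic alphabet and to names defined before it.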
5
Transition Diagrams
Patterns are represented as flowcharts called transition diagrams. A transition diagram consists of a set of nodes (states) and edges that connect one state to another.
6
White Space
start → q0
q0 --delim--> q1
q1 --delim--> q1 (loop)
q1 --other--> q2 * (final state)
delim → blank | newline | tab
7
Node
Nodes represent states.
A state summarizes the characters seen between the lexemeBegin pointer and the forward pointer. A transition diagram always begins at the start state q0. One or more states are final (accepting) states. An action may be attached to a final state, typically to indicate that a token is returned to the parser.
8
Continue…
Retraction: if the lexeme does not include the symbol that took us to the accepting state, we retract the forward pointer one position. Such a final state is marked with a *.
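The roles of lexemeBegin, forward and retraction can be illustrated by simulating the white-space diagram (q0 on delim to q1, q1 looping on delim, q1 on other to the starred state q2). This is a minimal sketch: the function name `scan_whitespace` is hypothetical, and treating end of input as an "other" character is an assumption.

```python
DELIMS = {" ", "\n", "\t"}  # delim -> blank | newline | tab


def scan_whitespace(source: str, lexeme_begin: int):
    """Run the white-space diagram starting at lexeme_begin.

    Returns (lexeme, forward) on success, or None if the diagram
    cannot leave q0.
    """
    state = "q0"
    forward = lexeme_begin
    while True:
        # Assumption: end of input behaves like an "other" character.
        ch = source[forward] if forward < len(source) else ""
        forward += 1  # forward now points past the character just read
        if state == "q0":
            if ch in DELIMS:
                state = "q1"
            else:
                return None  # no edge from q0 on this character
        elif state == "q1":
            if ch not in DELIMS:
                state = "q2"  # the "other" edge reaches the final state
        if state == "q2":
            forward -= 1  # the * on q2: retract forward one position
            return source[lexeme_begin:forward], forward
```

For the input `" \t x"` starting at position 0, the scan reads three delimiters and then `x`; the retraction leaves `forward` at 3, so `x` is not part of the lexeme and is available to the next token.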
9
Identifiers/Keywords
First Approach:
start → q0
q0 --letter--> q1
q1 --letter or digit--> q1 (loop)
q1 --other--> q2 * return(getToken(), installID())
Pattern for identifier: ([a-z]|[A-Z]) ([a-z]|[A-Z]|[0-9])*
Regular Definition: id → letter (letter | digit)*
10
More…
Reserved words are installed in the symbol table initially as special entries.
installID(): When an identifier is found, a call to installID() places it in the symbol table if it is not already there and returns a pointer to the symbol table entry for the lexeme found. Any identifier that is not in the symbol table during lexical analysis cannot be a reserved word, so its token is id.
getToken(): The function getToken() examines the symbol table entry for the lexeme found and returns whatever token name the symbol table says this lexeme represents: either id or one of the keyword tokens initially installed in the table.
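The first approach might be sketched as follows, with a dictionary standing in for the symbol table. The reserved words chosen, the token names (`"IF"`, `"ID"` and so on), and the use of the lexeme itself as the "pointer" to the table entry are all assumptions made for illustration.

```python
import string

KEYWORDS = ("if", "else", "end")  # illustrative reserved words (assumed)

# Symbol table: lexeme -> token name. Reserved words are installed up front.
symbol_table = {kw: kw.upper() for kw in KEYWORDS}


def install_id(lexeme: str) -> str:
    """Add the lexeme as an id if absent; return its key as the 'pointer'."""
    symbol_table.setdefault(lexeme, "ID")
    return lexeme


def get_token(lexeme: str) -> str:
    """Return the token name the symbol table records for this lexeme."""
    return symbol_table[lexeme]


def scan_identifier(source: str, begin: int):
    """Follow the identifier diagram: a letter, then (letter | digit)*."""
    if begin >= len(source) or source[begin] not in string.ascii_letters:
        return None
    forward = begin + 1
    alnum = string.ascii_letters + string.digits
    while forward < len(source) and source[forward] in alnum:
        forward += 1
    # The loop stops on the first "other" character without consuming it,
    # which has the same effect as reading it and then retracting.
    lexeme = source[begin:forward]
    entry = install_id(lexeme)  # install first so get_token finds the entry
    return get_token(lexeme), entry, forward
```

Because the reserved words are already in the table, `install_id` leaves them untouched and `get_token` reports the keyword token; any other lexeme is entered as an id.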
11
Keywords
Second Approach:
start → q0
q0 --E--> q1 --N--> q2 --D--> q3 --blank or newline--> q4 * return(1 ,)
q1 --L--> q5 --S--> q6 --E--> q7 --blank or newline--> q8 * return(2 ,)
q0 --I--> q9 --F--> q10 --blank or newline--> q11 * return(3 ,)
Pattern for keywords: e(nd | lse) | if
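The second approach hard-codes each keyword into the diagram. A table-driven sketch is given below; the assignment of state names to edges is one reading of the diagram (END via q1 to q4, ELSE via q5 to q8, IF via q9 to q11), and the case-insensitive matching and the treatment of end of input as a newline are assumptions. Only the numeric code from each return action is modelled.

```python
# (state, input symbol) -> next state, for the keywords END, ELSE and IF.
TRANSITIONS = {
    ("q0", "E"): "q1", ("q1", "N"): "q2", ("q2", "D"): "q3",
    ("q3", "delim"): "q4",
    ("q1", "L"): "q5", ("q5", "S"): "q6", ("q6", "E"): "q7",
    ("q7", "delim"): "q8",
    ("q0", "I"): "q9", ("q9", "F"): "q10",
    ("q10", "delim"): "q11",
}
ACCEPTING = {"q4": 1, "q8": 2, "q11": 3}  # codes returned for END, ELSE, IF


def scan_keyword(source: str, begin: int):
    """Return (code, lexeme, forward) for a keyword at begin, else None."""
    state, forward = "q0", begin
    while state not in ACCEPTING:
        # Assumption: end of input is treated like a newline.
        ch = source[forward] if forward < len(source) else "\n"
        # Assumption: letters are matched case-insensitively.
        symbol = "delim" if ch in (" ", "\n") else ch.upper()
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return None  # the diagram has no edge for this character
        forward += 1
    forward -= 1  # starred final state: retract past the delimiter
    return ACCEPTING[state], source[begin:forward], forward
```

Compared with the symbol-table approach, every new keyword adds states and edges here, which is why the number of states grows quickly with the size of the keyword set.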
12
Relational Operators * start < Not = or < return(relop ,LT ) q0
return(relop , LE) = q8 * return(relop,EQ) q11
13
Constants
start → q0
q0 --digit--> (state looping on digit) --not digit--> (final state) * return(number, INSTALL())
Pattern for constant: [0-9] ([0-9])*
Regular Definition: number → digit (digit)*
14
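Unsigned Numbers
Example: speed of light, 3.0 E+8 m/sec
start → q0
q0 --digit--> q1
q1 --digit--> q1 (loop)
q1 --.--> q2
q1 --E--> q4
q1 --other--> q8 *
q2 --digit--> q3
q3 --digit--> q3 (loop)
q3 --E--> q4
q3 --other--> q9 *
q4 --+ or - --> q5
q4 --digit--> q6
q5 --digit--> q6
q6 --digit--> q6 (loop)
q6 --other--> q7 *
Final states q7, q8 and q9: return(number, pointer to table)
Pattern for Unsigned Number: [0-9]([0-9])*(\.[0-9]([0-9])*)?(E[+-]?[0-9]([0-9])*)?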
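The unsigned-number diagram can be simulated with a transition table. In this sketch the state names follow one reading of the diagram: q1 is the integer part, q3 the fraction, q4 to q6 the exponent, and q7, q8, q9 the starred final states reached on "other". The helper names `classify` and `scan_unsigned` are hypothetical, and end of input is assumed to count as "other".

```python
DIGITS = set("0123456789")


def classify(ch: str) -> str:
    """Map a character to the edge label used in the diagram."""
    if ch in DIGITS:
        return "digit"
    if ch == ".":
        return "."
    if ch == "E":
        return "E"
    if ch in ("+", "-"):
        return "sign"
    return "other"


TRANSITIONS = {
    ("q0", "digit"): "q1",
    ("q1", "digit"): "q1", ("q1", "."): "q2", ("q1", "E"): "q4",
    ("q2", "digit"): "q3",
    ("q3", "digit"): "q3", ("q3", "E"): "q4",
    ("q4", "sign"): "q5", ("q4", "digit"): "q6",
    ("q5", "digit"): "q6",
    ("q6", "digit"): "q6",
}
# "other" edges: taken on any character with no explicit edge from the state.
OTHER = {"q1": "q8", "q3": "q9", "q6": "q7"}
FINAL = {"q7", "q8", "q9"}


def scan_unsigned(source: str, begin: int):
    """Return (lexeme, forward) for an unsigned number at begin, else None."""
    state, forward = "q0", begin
    while state not in FINAL:
        # Assumption: end of input is classified as "other".
        ch = source[forward] if forward < len(source) else ""
        forward += 1
        nxt = TRANSITIONS.get((state, classify(ch)), OTHER.get(state))
        if nxt is None:
            return None  # stuck in q0, q2, q4 or q5: the diagram fails
        state = nxt
    forward -= 1  # all three final states are starred: retract
    return source[begin:forward], forward
```

Note that the sketch simply fails on inputs such as `3.x` or `3Ex`, where the diagram gets stuck after the point or the E; a full lexer would need a fallback strategy, which the slides do not describe.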
15
End