Published by Mervyn Claud Morton. Modified over 9 years ago.
1 Course Overview
PART I: overview material
  1 Introduction
  2 Language processors (tombstone diagrams, bootstrapping)
  3 Architecture of a compiler
PART II: inside a compiler
  4 Syntax analysis
  5 Contextual analysis
  6 Runtime organization
  7 Code generation
PART III: conclusion
  8 Interpretation
  9 Review
Supplementary material: Theoretical foundations (Regular expressions)
2 Regular Expressions
A finite state machine is a good "visual" aid
– but it is not very suitable as a specification (its textual description is too clumsy)
Regular expressions are a suitable specification
– a more compact way to define a language that can be accepted by an FSM
Regular expressions are used to give the lexical description of a programming language
– define each "token" (keywords, identifiers, literals, operators, punctuation, etc.)
– define white-space, comments, etc.; these are not tokens, but must be recognized and ignored
3 Example: Pascal identifier
Lexical specification (in English):
– a letter, followed by zero or more letters or digits
Lexical specification (as a regular expression):
– letter . (letter | digit)*
|    means "or"
.    means "followed by" (the dot may be omitted)
*    means "zero or more instances of"
( )  are used for grouping
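The identifier specification can be tried out with Python's `re` module. This is only a sketch: it assumes "letter" means the ASCII letters and "digit" means 0 to 9, and the name `is_identifier` is made up for this illustration. Python writes concatenation by juxtaposition, so the dot of the slide notation is simply omitted.

```python
import re

# letter . (letter | digit)*  written with character classes.
# Assumption: "letter" = ASCII letters, "digit" = 0-9.
IDENTIFIER = re.compile(r"[A-Za-z]([A-Za-z]|[0-9])*")

def is_identifier(text: str) -> bool:
    """True if the whole string is a letter followed by letters or digits."""
    return IDENTIFIER.fullmatch(text) is not None

for candidate in ["x", "count2", "2count", "", "a_b"]:
    print(repr(candidate), is_identifier(candidate))
```

`fullmatch` is used because a lexical specification describes the whole token, not a substring of it.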
4 Operands of a regular expression
Operands are the same as the labels on the edges of an FSM
– single characters, or
– the special character ε (the empty string)
"letter" is a shorthand for
– a | b | c | ... | z | A | B | C | ... | Z
"digit" is a shorthand for
– 0 | 1 | 2 | ... | 9
Sometimes we put the characters in quotes
– necessary when denoting | . * ( )
5 Precedence of | . * operators
Consider the regular expressions:
– letter.letter | digit*
– letter.(letter | digit)*

Regular Expression Operator   Analogous Arithmetic Operator   Precedence
|                             plus                            lowest
.                             times                           middle
*                             exponentiation                  highest
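The effect of precedence on the two expressions can be checked with a short experiment. Python's `re` syntax follows the same ordering (repetition binds tightest, then concatenation, then alternation), so the unparenthesized form is read as (letter.letter) | (digit*). The helper name `accepted_by` and the ASCII character classes are assumptions made for this sketch.

```python
import re

LETTER = "[A-Za-z]"
DIGIT = "[0-9]"

# letter.letter | digit*   is parsed as   (letter.letter) | (digit*)
unparenthesized = re.compile(f"{LETTER}{LETTER}|{DIGIT}*")
# letter.(letter | digit)*  groups the alternation before the star applies
parenthesized = re.compile(f"{LETTER}({LETTER}|{DIGIT})*")

def accepted_by(text):
    """Return (matches unparenthesized form, matches parenthesized form)."""
    return (unparenthesized.fullmatch(text) is not None,
            parenthesized.fullmatch(text) is not None)

for text in ["ab", "123", "a1", "", "abc"]:
    print(repr(text), accepted_by(text))
```

Only "ab" is accepted by both: the first expression describes "exactly two letters, or any number of digits", while the second describes identifiers.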
6 TEST YOURSELF
Question 1: Describe (in English) the language defined by each of the following regular expressions:
– letter (letter* | digit*)
– (letter | _ ) (letter | digit | _ )*
– digit* "." digit*
– digit digit* "." digit digit*
7 TEST YOURSELF
Question 2: Write a regular expression for each of these languages:
– The set of all C++ reserved words
  Examples: if, while, for, class, int, case, char, true, false
– C++ string literals that begin with " and end with " and don't contain any other " except possibly in the escape sequence \"
  Example: "The escape sequence \" occurs in this string"
– C++ comments that begin with /* and end with */ and don't contain any other */ within the string
  Example: /* This is a comment * still the same comment */
8 Example: Integer Literals
An integer literal with an optional sign can be defined in English as:
– "(nothing or + or -) followed by one or more digits"
The corresponding regular expression is:
– (+ | - | ε) (digit . digit*)
A new convenient operator '+'
– same precedence as '*'
– digit digit* is the same as digit+, which means "one or more digits"
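As a quick check, the integer-literal expression can be transcribed into Python's `re` syntax twice: once literally, with an empty alternative standing for "nothing", and once with the shorthand operators. The names `INTEGER_LITERAL`, `INTEGER_SHORT` and `is_integer` are invented for this sketch, and "digit" is assumed to mean 0 to 9.

```python
import re

# Literal transcription: (+ | - | nothing) followed by digit digit*.
# '+' must be escaped because Python uses it as the "one or more" operator.
INTEGER_LITERAL = re.compile(r"(\+|-|)[0-9][0-9]*")
# Same language using the shorthand operators '?' and '+'.
INTEGER_SHORT = re.compile(r"[+-]?[0-9]+")

def is_integer(text: str) -> bool:
    return INTEGER_LITERAL.fullmatch(text) is not None

for text in ["42", "+7", "-0", "", "+", "4-2", "--1"]:
    same = is_integer(text) == (INTEGER_SHORT.fullmatch(text) is not None)
    print(repr(text), is_integer(text), "forms agree:", same)
```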
9 Language Defined by a Regular Expression
Recall: language = set of strings
Language defined by an automaton
– the set of strings accepted by the automaton
Language defined by a regular expression
– the set of strings that match the expression

Regular Exp.    Corresponding Set of Strings
ε               {""}
a               {"a"}
a.b.c           {"abc"}
a | b | c       {"a", "b", "c"}
(a | b | c)*    {"", "a", "b", "c", "aa", "ab", ..., "bccabb", ...}
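The "set of strings" view can be made concrete by brute force: enumerate every string over a small alphabet up to some length and keep the ones that match. The function name `language` and the length bound are assumptions of this sketch; the bound is needed because a language such as (a | b | c)* is infinite.

```python
import re
from itertools import product

def language(pattern, alphabet, max_len):
    """All strings over `alphabet` of length <= max_len that match `pattern`."""
    regex = re.compile(pattern)
    words = ("".join(chars)
             for n in range(max_len + 1)
             for chars in product(alphabet, repeat=n))
    return [w for w in words if regex.fullmatch(w)]

print(language("", "abc", 2))          # the empty expression
print(language("a", "abc", 2))
print(language("abc", "abc", 3))
print(language("a|b|c", "abc", 2))
print(language("(a|b|c)*", "abc", 2))  # only the strings up to length 2
```

With the bound set to 2, the last call returns 13 strings (1 of length zero, 3 of length one, 9 of length two), which is the start of the infinite set shown in the table.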
10 Concept of Reg Exp Generating a String
Rewrite the regular expression until only a sequence of letters (a string) is left
Example
  (0|1)* 2 (0|1)*
  (0|1) (0|1)* 2 (0|1)*
  1 (0|1)* 2 (0|1)*
  1 2 (0|1)*
  1 2 (0|1) (0|1)*
  1 2 (0|1)
  1 2 0
Replacement Rules
1) r1 | r2 ––> r1
2) r1 | r2 ––> r2
3) r* ––> r r*
4) r* ––> ε
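The replacement rules can be sketched as a small random generator. The tuple encoding of expressions (`"sym"`, `"cat"`, `"alt"`, `"star"`), the function name `generate` and the 50% chance of continuing a star are all assumptions made for illustration; the slides only give the four rules.

```python
import random

def generate(expr, rng):
    """Rewrite `expr` into a string by applying the four replacement rules."""
    kind = expr[0]
    if kind == "sym":                      # a single character: nothing to rewrite
        return expr[1]
    if kind == "cat":                      # rewrite both parts of a concatenation
        return generate(expr[1], rng) + generate(expr[2], rng)
    if kind == "alt":                      # rules 1 and 2: pick r1 or r2
        return generate(rng.choice(expr[1:]), rng)
    if kind == "star":
        out = ""
        while rng.random() < 0.5:          # rule 3: r* -> r r*
            out += generate(expr[1], rng)
        return out                         # rule 4: r* -> empty string
    raise ValueError(f"unknown expression kind: {kind}")

BIT = ("alt", ("sym", "0"), ("sym", "1"))
# (0|1)* 2 (0|1)*
EXPR = ("cat", ("star", BIT), ("cat", ("sym", "2"), ("star", BIT)))

rng = random.Random()
print([generate(EXPR, rng) for _ in range(5)])
```

Each run may print different strings, but every one contains exactly one 2 surrounded by 0s and 1s.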
11 Non-determinism in Generation
Different rule applications may yield different final results
Example 1
  (0|1)* 2 (0|1)*
  (0|1) (0|1)* 2 (0|1)*
  1 (0|1)* 2 (0|1)*
  1 2 (0|1)*
  1 2 (0|1) (0|1)*
  1 2 (0|1)
  1 2 0
Example 2
  (0|1)* 2 (0|1)*
  (0|1) (0|1)* 2 (0|1)*
  0 (0|1)* 2 (0|1)*
  0 2 (0|1)*
  0 2 (0|1) (0|1)*
  0 2 (0|1)
  0 2 1
12 Concept of Language Generated by Reg Exp
The set of all strings generated by a regular expression is the language of the regular expression
In general, the language may be infinite
A string generated by a regular expression is often called a "token"
13 Examples of Languages and Reg Exp
Σ = { 0, 1, . }
– (0 | 1)+ "." (0 | 1)* | (0 | 1)* "." (0 | 1)+    binary floating point numbers
– (0 0)*    even-length all-zero strings
– 1* (0 1* 0 1*)*    binary strings with an even number of zeros
Σ = { a, b, c, 0, 1, 2 }
– (a|b|c)(a|b|c|0|1|2)*    alphanumeric identifiers
– (0|1|2)+    trinary numbers
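One way to gain confidence in such expressions is to compare them with a direct test on every short string. The sketch below does this for the "even number of zeros" expression and spot-checks the binary floating point one; the names `EVEN_ZEROS`, `BINARY_FLOAT`, `all_binary_strings` and the length bound of 8 are assumptions of this illustration, and passing the check is evidence rather than a proof.

```python
import re
from itertools import product

EVEN_ZEROS = re.compile(r"1*(01*01*)*")
BINARY_FLOAT = re.compile(r"(0|1)+\.(0|1)*|(0|1)*\.(0|1)+")

def all_binary_strings(max_len):
    """Yield every string over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

# Strings where the regular expression and a direct count of zeros disagree.
mismatches = [s for s in all_binary_strings(8)
              if (EVEN_ZEROS.fullmatch(s) is not None) != (s.count("0") % 2 == 0)]
print("disagreements:", mismatches)

for text in ["1.0", "1.", ".1", ".", "10"]:
    print(repr(text), BINARY_FLOAT.fullmatch(text) is not None)
```

The floating point expression needs at least one digit on one side of the dot, so a lone "." is rejected while "1." and ".1" are accepted.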
14 Reg Exp Notational Shorthand
R+       one or more strings of R: R(R*)
R?       optional R: (R | ε)
[abcd]   one of the listed characters: (a|b|c|d)
[a-z]    one character from this range: (a|b|c|d...|z)
[^abc]   anything but one of the listed chars
[^a-z]   any one character not from this range
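Python's `re` module happens to support the same shorthand, so each abbreviation can be compared with its expansion by brute force. The helper `same_language` and the test alphabet "abcde" are assumptions of this sketch; in particular, the expansion given for `[^abc]` is only valid relative to that small alphabet.

```python
import re
from itertools import product

def same_language(p, q, alphabet="abcde", max_len=4):
    """True if p and q accept exactly the same strings up to max_len."""
    rp, rq = re.compile(p), re.compile(q)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            word = "".join(chars)
            if (rp.fullmatch(word) is None) != (rq.fullmatch(word) is None):
                return False
    return True

PAIRS = [
    (r"(ab)+", r"(ab)(ab)*"),   # R+  is  R(R*)
    (r"(ab)?", r"(ab|)"),       # R?  is  (R | empty)
    (r"[abcd]", r"(a|b|c|d)"),
    (r"[a-d]", r"(a|b|c|d)"),
    (r"[^abc]", r"(d|e)"),      # holds only over the alphabet {a, b, c, d, e}
]

for short, expanded in PAIRS:
    print(short, "vs", expanded, same_language(short, expanded))
```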
15 Equivalence of FSM and Regular Expressions
Theorem:
– For each finite state machine M, we can construct a regular expression R such that M and R accept the same language.
– [proof omitted]
Theorem:
– For each regular expression R, we can construct a finite state machine M such that R and M accept the same language.
– [proof outline follows]
16 Regular Expressions to NFSM (1)
For each kind of reg exp, define a NFSM
– Notation: NFSM for reg exp M [figure: NFSM drawn as a box labeled M]
– For ε [figure: NFSM for ε]
– For input a [figure: NFSM with an edge labeled a]
17 Regular Expressions to NFSM (2)
For A . B [figure: NFSM built from the NFSMs for A and B]
For A | B [figure: NFSM built from the NFSMs for A and B]
18 Regular Expressions to NFSM (3)
For A* [figure: NFSM built from the NFSM for A]
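The case-by-case construction (one NFSM shape for the empty string, a single input, A . B, A | B and A*) can be sketched in code. The exact state and edge layout of the figures is not available here, so this sketch assumes the common layout known as Thompson's construction; the class `NFSM` and the helpers `symbol`, `epsilon`, `concat`, `union`, `star`, `closure` and `accepts` are names invented for the illustration.

```python
from itertools import count

EPS = None          # edge label used for moves that consume no input
_ids = count()      # source of fresh state numbers

class NFSM:
    def __init__(self, start, accept, edges):
        self.start, self.accept = start, accept
        self.edges = edges               # list of (source, label, target)

def symbol(a):                           # NFSM for a single input a
    s, f = next(_ids), next(_ids)
    return NFSM(s, f, [(s, a, f)])

def epsilon():                           # NFSM for the empty string
    s, f = next(_ids), next(_ids)
    return NFSM(s, f, [(s, EPS, f)])

def concat(a, b):                        # NFSM for A . B
    return NFSM(a.start, b.accept,
                a.edges + b.edges + [(a.accept, EPS, b.start)])

def union(a, b):                         # NFSM for A | B
    s, f = next(_ids), next(_ids)
    extra = [(s, EPS, a.start), (s, EPS, b.start),
             (a.accept, EPS, f), (b.accept, EPS, f)]
    return NFSM(s, f, a.edges + b.edges + extra)

def star(a):                             # NFSM for A*
    s, f = next(_ids), next(_ids)
    extra = [(s, EPS, a.start), (s, EPS, f),
             (a.accept, EPS, a.start), (a.accept, EPS, f)]
    return NFSM(s, f, a.edges + extra)

def closure(m, states):
    """All states reachable from `states` without consuming input."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in m.edges:
            if src == q and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(m, text):
    """Simulate the NFSM by tracking the set of states it may be in."""
    current = closure(m, {m.start})
    for ch in text:
        moved = {dst for src, label, dst in m.edges
                 if src in current and label == ch}
        current = closure(m, moved)
    return m.accept in current

# (1|0)*1
machine = concat(star(union(symbol("1"), symbol("0"))), symbol("1"))
for text in ["1", "01", "1101", "", "10"]:
    print(repr(text), accepts(machine, text))
```

Each helper returns a machine with exactly one start and one accepting state, which is what lets the pieces be glued together without inspecting their insides.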
19 Example of RegExp -> NFSM conversion
Consider the regular expression (1|0)*1
The NFSM is
[figure: NFSM with states A, B, C, D, E, F, G, H, I, J and edges labeled 1, 0, 1]
20 Converting NFSM to DFSM
Simulate the NFSM
Each state of the DFSM
– is a non-empty subset of states of the NFSM
Start state of the DFSM
– is the set of NFSM states reachable from the NFSM start state using only ε-moves
Add a transition S -a-> S' to the DFSM iff
– S' is the set of NFSM states reachable from any state in S after consuming only the input a, considering ε-moves as well
21 Remarks on converting NFSM to DFSM
An NFSM may be in many states at any time
How many different states?
– If there are N states, the NFSM must be in some subset of those N states
How many subsets are there?
– 2^N = finitely many
– For example, if N = 5 then 2^N = 32 subsets
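22 NFSM -> DFSM Example
[figure: NFSM with states A to J and edges labeled 1, 0, 1, together with the resulting DFSM, whose states are ABCDHI, FGHIABCD and EJGHIABCD and whose transitions are labeled 0 and 1]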
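The conversion can be sketched as the usual subset construction and run on the (1|0)*1 example. The NFSM figure itself is not available, so the edge table below is an assumption: it is one layout of states A to J that is consistent with the DFSM state names ABCDHI, FGHIABCD and EJGHIABCD. The names `NFSM_EDGES`, `closure`, `subset_construction` and `name` are likewise invented for this illustration.

```python
EPS = ""   # label used for moves that consume no input

# Assumed NFSM for (1|0)*1; reconstructed, not copied from the figure.
NFSM_EDGES = {
    ("A", EPS): {"B", "H"},
    ("B", EPS): {"C", "D"},
    ("C", "1"): {"E"},
    ("D", "0"): {"F"},
    ("E", EPS): {"G"},
    ("F", EPS): {"G"},
    ("G", EPS): {"A", "H"},
    ("H", EPS): {"I"},
    ("I", "1"): {"J"},
}
START, ACCEPT, ALPHABET = "A", "J", "01"

def closure(states):
    """States reachable from `states` using only moves that consume no input."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for nxt in NFSM_EDGES.get((q, EPS), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(seen)

def subset_construction():
    """Return (start state, transition table) of the DFSM."""
    start = closure({START})
    table, todo = {}, [start]
    while todo:
        subset = todo.pop()
        if subset in table:
            continue
        table[subset] = {}
        for a in ALPHABET:
            moved = set()
            for q in subset:
                moved |= NFSM_EDGES.get((q, a), set())
            target = closure(moved)
            if target:                     # DFSM states are non-empty subsets
                table[subset][a] = target
                todo.append(target)
    return start, table

def name(subset):
    return "".join(sorted(subset))

start, table = subset_construction()
for subset, moves in table.items():
    mark = " (accepting)" if ACCEPT in subset else ""
    print(name(subset) + mark, {a: name(t) for a, t in moves.items()})
```

The names are printed in alphabetical order, so FGHIABCD appears as ABCDFGHI and EJGHIABCD as ABCDEGHIJ. Only 3 of the 2^10 = 1024 possible subsets are reachable, which is typical: the exponential bound is a worst case.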
23 TEST YOURSELF
Question 3: First convert each of these regular expressions to a NFSM
– (a | b | ε) (a | b)
– (ab | ba)* (aa | bb)
Question 4: Next convert each resulting NFSM to a DFSM