1
Lecture 2: Lexical Analysis
Topics
  Sample Simple Compiler
  Operations on strings
  Regular expressions
  Finite Automata
Readings:
January 11, 2006
CSCE 531 Compiler Construction
2
– 2 – CSCE 531 Spring 2006
Overview
Last Time
  A little History
  Compilers vs Interpreter
  Data-Flow View of Compilers
  Regular Languages
  Course Pragmatics
Today’s Lecture
  Why Study Compilers?
  xx
References
  Chapter 2, Chapter 3
Assignment
  Due Wednesday Jan 18
  3.3a; 3.5a,b; 3.6a,b,c; 3.7a; 3.8b
3
A Simple Compiler for Expressions
Chapter Two Overview
  Structure of the simple compiler, really just a translator for infix expressions → postfix
  Grammars
  Parse Trees
  Syntax directed Translation
  Predictive Parsing
Translator for Simple Expressions
  Grammar
  Rewritten grammar (equivalent one better for pred. parsing)
  Parsing modules fig 2.24
  Specification of Translator fig 2.35
  Structure of translator fig 2.36
4
Grammars
A grammar (or, more correctly, a context-free grammar) has
  A set of tokens, also known as terminals
  A set of nonterminals
  A set of productions of the form nonterminal → sequence of tokens and/or nonterminals
  A special nonterminal, the start symbol
Example
  E → E + E
  E → E * E
  E → digit
5
Derivations
A derivation is a sequence of rewritings of a string of grammar symbols using the productions in a grammar.
We use the symbol ⇒ to denote that one string of grammar symbols is obtained by rewriting another using a production.
X ⇒ Y if there is a production N → β where
  The nonterminal N occurs in the sequence X of grammar symbols
  And Y is the same as X except that β replaces one occurrence of N
Example
  E ⇒ E+E ⇒ d+E ⇒ d+E*E ⇒ d+E+E*E ⇒ d+d+E*E ⇒ d+d+d*E ⇒ d+d+d*d
6
Parse Trees
A graphical presentation of a derivation, satisfying
  Root is the start symbol
  Each leaf is a token or ε (note different font from text)
  Each interior node is a nonterminal
  If A is a parent with children X_1, X_2, …, X_n then A → X_1 X_2 … X_n is a production
7
Syntax directed Translation
Frequently the rewriting by a production will be called a reduction, or reducing by the particular production.
Syntax directed translation attaches actions (code) that are done when the reductions are performed.
Example
  E → E + T   {print('+');}
  E → E - T   {print('-');}
  E → T
  T → 0       {print('0');}
  T → 1       {print('1');}
  …
  T → 9       {print('9');}
8
Equivalent Grammars
9
Specification of the translator (figure 2.38)
  S  → L eof
  L  → E ; L
  L  → ε
  E  → T E'
  E' → + T { print('+'); } E'
  E' → - T { print('-'); } E'
  E' → ε
  T  → F T'
  T' → * F { print('*'); } T'
  T' → / F { print('/'); } T'
  T' → ε
  F  → ( E )
  F  → id  { print(id.lexeme); }
  F  → num { print(num.value); }
10
Translating to code
  E  → T E'
  E' → + T { print('+'); } E'
  E' → - T { print('-'); } E'
  E' → ε

  Expr(){
      int t;
      term();
      while(1)
          switch(lookahead){
          case '+': case '-':
              t = lookahead;
              match(lookahead); term(); emit(t, NONE);
              continue;
  …
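The translation scheme above can be tried out with a small self-contained sketch. This is not the course's Figure 2.36 code: the Translator struct, the buffer-based emit, and the function infix_to_postfix are hypothetical names introduced here, and operands are simplified to runs of letters and digits standing in for id and num. Each loop plays the role of the tail nonterminal E' or T', printing the operator only after its right operand has been translated.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical translator state: input cursor, output buffer, error flag. */
typedef struct {
    const char *in;   /* next unread input character */
    char *out;        /* output buffer for the postfix text */
    size_t len, cap;  /* characters written so far, buffer capacity */
    int error;        /* set to 1 on a syntax error or buffer overflow */
} Translator;

static void emit(Translator *t, char c) {
    if (t->len + 1 < t->cap) t->out[t->len++] = c;  /* keep room for '\0' */
    else t->error = 1;
}

static int lookahead(Translator *t) {
    while (*t->in == ' ') t->in++;        /* skip blanks between tokens */
    return (unsigned char)*t->in;
}

static void expr(Translator *t);

/* F -> ( E ) | id | num : operands are copied straight to the output */
static void factor(Translator *t) {
    int c = lookahead(t);
    if (c == '(') {
        t->in++;
        expr(t);
        if (lookahead(t) == ')') t->in++;
        else t->error = 1;
    } else if (isalnum(c)) {
        while (isalnum((unsigned char)*t->in)) emit(t, *t->in++);
        emit(t, ' ');
    } else {
        t->error = 1;
    }
}

/* T -> F T',  T' -> * F {print('*')} T' | / F {print('/')} T' | epsilon */
static void term(Translator *t) {
    factor(t);
    for (;;) {
        int c = lookahead(t);
        if (t->error || (c != '*' && c != '/')) return;  /* T' -> epsilon */
        t->in++;                                         /* match operator */
        factor(t);
        emit(t, (char)c);                                /* the print action */
        emit(t, ' ');
    }
}

/* E -> T E',  E' -> + T {print('+')} E' | - T {print('-')} E' | epsilon */
static void expr(Translator *t) {
    term(t);
    for (;;) {
        int c = lookahead(t);
        if (t->error || (c != '+' && c != '-')) return;  /* E' -> epsilon */
        t->in++;
        term(t);
        emit(t, (char)c);
        emit(t, ' ');
    }
}

/* Translates one infix expression into dst; returns 0 on success, -1 on error. */
int infix_to_postfix(const char *src, char *dst, size_t cap) {
    Translator t = { src, dst, 0, cap, 0 };
    if (cap == 0) return -1;
    expr(&t);
    if (lookahead(&t) != '\0') t.error = 1;              /* leftover input */
    if (t.len > 0 && dst[t.len - 1] == ' ') t.len--;     /* trim last blank */
    dst[t.len] = '\0';
    return t.error ? -1 : 0;
}
```

With a 64-byte buffer, translating 9-5+2 yields the text 9 5 - 2 +, and an incomplete input such as 1 + returns -1.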
11
Overview of the Code
Figure 2.36
/class/csce531-001
12
Operations on Strings
A language over an alphabet is a set of strings of characters from the alphabet.
Operations on strings: let x = x_1 x_2 … x_n and t = t_1 t_2 … t_m, then
  Concatenation: xt = x_1 x_2 … x_n t_1 t_2 … t_m
  Alternation: x|t = either x_1 x_2 … x_n or t_1 t_2 … t_m
13
Operations on Sets of Strings
Operations on sets of strings: for these let S = {s_1, s_2, …, s_m} and R = {r_1, r_2, …, r_n}
  Alternation: S | R = S ∪ R = {s_1, s_2, …, s_m, r_1, r_2, …, r_n}
  Concatenation: SR = {sr | s ∈ S and r ∈ R}
      = {s_1 r_1, s_1 r_2, …, s_1 r_n, s_2 r_1, …, s_2 r_n, …, s_m r_1, …, s_m r_n}
  Power: S^2 = S S, S^3 = S^2 S, S^n = S^{n-1} S
      What is S^0?
  Kleene Closure: S* = ∪_{i=0}^{∞} S^i; note S^0 = {ε}, so ε is in S*
14
Operations cont. Kleene Closure
Powers:
  S^2 = S S
  S^3 = S^2 S
  …
  S^n = S^{n-1} S
  What is S^0?
Kleene Closure: S* = ∪_{i=0}^{∞} S^i; note S^0 = {ε}, so ε is in S*
15
Examples of Operations on Sets of Strings
Operations on sets of strings: for these let S = {a,b,c} and R = {t,u}
  Alternation: S | R = S ∪ R = {a,b,c,t,u}
  Concatenation: SR = {sr | s ∈ S and r ∈ R} = {at, au, bt, bu, ct, cu}
  Power: S^2 = {aa, ab, ac, ba, bb, bc, ca, cb, cc}
      S^3 = {aaa, aab, aac, …, ccc}, 27 elements
  Kleene closure: S* = {any string of any length of a’s, b’s and c’s, including the empty string ε}
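The set operations in these examples can be checked mechanically. The following is only a sketch: the StringSet type, its fixed capacities, and the set_* function names are assumptions made for illustration and do not come from the course code. The output set must not be the same object as either input.

```c
#include <stdio.h>
#include <string.h>

#define MAX_STRINGS 128   /* assumed capacity limits for this sketch */
#define MAX_LEN 16

typedef struct {
    int count;
    char items[MAX_STRINGS][MAX_LEN];
} StringSet;

/* Returns 1 if str is already a member of set. */
int set_contains(const StringSet *set, const char *str) {
    for (int i = 0; i < set->count; i++)
        if (strcmp(set->items[i], str) == 0) return 1;
    return 0;
}

/* Adds str unless already present; returns 0 on success, -1 if it does not fit. */
int set_add(StringSet *set, const char *str) {
    if (set_contains(set, str)) return 0;
    if (set->count >= MAX_STRINGS || strlen(str) >= MAX_LEN) return -1;
    strcpy(set->items[set->count++], str);
    return 0;
}

/* Alternation: out = a U b */
int set_union(const StringSet *a, const StringSet *b, StringSet *out) {
    out->count = 0;
    for (int i = 0; i < a->count; i++)
        if (set_add(out, a->items[i])) return -1;
    for (int i = 0; i < b->count; i++)
        if (set_add(out, b->items[i])) return -1;
    return 0;
}

/* Concatenation: out = { st | s in a and t in b } */
int set_concat(const StringSet *a, const StringSet *b, StringSet *out) {
    char buf[2 * MAX_LEN];
    out->count = 0;
    for (int i = 0; i < a->count; i++)
        for (int j = 0; j < b->count; j++) {
            snprintf(buf, sizeof buf, "%s%s", a->items[i], b->items[j]);
            if (set_add(out, buf)) return -1;
        }
    return 0;
}

/* Power: out = s^n, where s^0 is the set holding only the empty string. */
int set_power(const StringSet *s, int n, StringSet *out) {
    StringSet tmp = {0};
    out->count = 0;
    if (set_add(out, "")) return -1;
    for (int k = 0; k < n; k++) {
        if (set_concat(out, s, &tmp)) return -1;   /* s^(k+1) = s^k s */
        *out = tmp;
    }
    return 0;
}
```

For S = {a,b,c} and R = {t,u}, set_concat produces the six strings at, au, bt, bu, ct, cu, and set_power with n = 3 produces 27 strings. The Kleene closure is infinite whenever S contains a non-empty string, so a program can only enumerate it up to some chosen power.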
16
Examples of Operations on Sets of Strings
17
Regular Expressions
For a given alphabet Σ the following are regular expressions:
  If a ∈ Σ then a is a regular expression and L(a) = { a }
  ε is a regular expression and L(ε) = { ε }
  ∅ is a regular expression and L(∅) = ∅, the empty language
And if s and t are regular expressions denoting languages L(s) and L(t) respectively, then
  st is a regular expression and L(st) = L(s) L(t)
  s | t is a regular expression and L(s | t) = L(s) ∪ L(t)
  s* is a regular expression and L(s*) = L(s)*
18
Why Regular Expressions?
We use regular expressions to describe the tokens.
Examples:
  Reg expr for C identifiers
  C identifiers? Any string of letters, underscores and digits that starts with a letter or underscore
  ID reg expr = (letter | underscore) (letter | underscore | digit)*
  Or more explicitly
  ID reg expr = (a|b|…|z|A|B|…|Z|_)(a|b|…|z|A|B|…|Z|_|0|1|…|9)*
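As a rough illustration, the identifier expression can be checked by hand-written code that mirrors its two parts, one test for the first symbol and a loop for the starred part. The function name is_identifier is made up for this sketch, and it relies on isalpha and isalnum, which in the default C locale accept exactly the ASCII letters and digits.

```c
#include <ctype.h>

/* Returns 1 if s matches (letter | underscore)(letter | underscore | digit)*, else 0. */
int is_identifier(const char *s) {
    const unsigned char *p = (const unsigned char *)s;
    if (!(isalpha(*p) || *p == '_')) return 0;       /* first symbol */
    for (p++; *p != '\0'; p++)
        if (!(isalnum(*p) || *p == '_')) return 0;   /* zero or more further symbols */
    return 1;
}
```

Under those assumptions, fort and _x9 are accepted, while 9lives, a-b and the empty string are rejected.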
19
Pop Quiz
Given r and s are regular expressions, then
  What is rε? r | ε?
  Describe the language denoted by 0*110*
  Describe the language denoted by (0|1)*110*
  Give a regular expression for the language of strings of 0’s and 1’s that end in a 1
  Give a regular expression for the language of strings of 0’s and 1’s such that every 0 is followed by a 1
20
Recognizers of Regular Languages
To develop efficient lexical analyzers (scanners) we will rely on a mathematical model called finite automata, similar to the state machines that you have probably seen. In particular we will use deterministic finite automata, DFAs.
The construction of a lexical analyzer will then proceed as:
  1. Identify all tokens
  2. Develop regular expressions for each
  3. Convert the regular expressions to finite automata
  4. Use the transition table for the finite automata as the basis for the scanner
We will actually use the tools lex and/or flex for steps 3 and 4.
21
Transition Diagram for a DFA
Start in state s_0; then if the input is “f” make a transition to state s_1.
Then from state s_1, if the input is “o”, make a transition to state s_2.
And from state s_2, if the input is “r”, make a transition to state s_3.
The double circle denotes an “accepting state”, which means we recognized the token.
Actually there is a missing state and transition.
[Diagram: s_0 --f--> s_1 --o--> s_2 --r--> s_3 (double circle)]
22
Now what about “fort”
The string “fort” is an identifier, not the keyword “for” followed by “t”.
Thus we can’t really recognize the token until we see a terminator: whitespace or a special symbol, one of , ; ( ) { } [ ]
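One common way to handle this, sketched below, is to keep consuming characters while they can still extend a name and to decide between keyword and identifier only once the terminator has been seen. The token codes and the function next_token are hypothetical, and everything that is not a name is simplified to a one-character token.

```c
#include <ctype.h>
#include <string.h>

typedef enum { TOK_FOR, TOK_ID, TOK_OTHER } TokenKind;  /* hypothetical token codes */

/*
 * Scans one token starting at *pp, advances *pp past it,
 * and stores the lexeme length in *len.
 * At end of input it returns TOK_OTHER with *len == 0.
 */
TokenKind next_token(const char **pp, size_t *len) {
    const char *p = *pp;
    while (isspace((unsigned char)*p)) p++;            /* skip leading whitespace */
    const char *start = p;
    if (isalpha((unsigned char)*p) || *p == '_') {
        /* Consume until a terminator: any character that cannot extend the name. */
        while (isalnum((unsigned char)*p) || *p == '_') p++;
        *len = (size_t)(p - start);
        *pp = p;
        /* Only now, with the whole lexeme known, decide keyword vs identifier. */
        if (*len == 3 && strncmp(start, "for", 3) == 0) return TOK_FOR;
        return TOK_ID;
    }
    if (*p != '\0') p++;                                /* single-character token */
    *len = (size_t)(p - start);
    *pp = p;
    return TOK_OTHER;
}
```

On the input fort this returns TOK_ID with length 4, while on for( it returns TOK_FOR with length 3 and leaves the cursor on the parenthesis.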
23
Deterministic Finite Automata
A deterministic finite automaton (DFA) is a mathematical model that consists of
  1. a set of states S
  2. a set of input symbols Σ, the input alphabet
  3. a transition function δ: S × Σ → S that for each state and each input maps to the next state
  4. a state s_0 that is distinguished as the start state
  5. a set of states F distinguished as accepting (or final) states (written S_F on later slides)
24
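DFA to recognize keyword “for”
Σ = {a, b, c, …, z, A, B, …, Z, 0, …, 9, ‘,’, ‘;’, …}
S = {s_0, s_1, s_2, s_3, s_dead}
s_0 is the start state
S_F = {s_3}
δ given by the table below (entries off the f-o-r path are filled in as s_dead)

            f        o        r        Others
  s_0       s_1      s_dead   s_dead   s_dead
  s_1       s_dead   s_2      s_dead   s_dead
  s_2       s_dead   s_dead   s_3      s_dead
  s_3       s_dead   s_dead   s_dead   s_dead
  s_dead    s_dead   s_dead   s_dead   s_dead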
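A transition table like this maps directly onto a two-dimensional array. The sketch below assumes the transitions s_0 to s_1 on f, s_1 to s_2 on o, and s_2 to s_3 on r, with every other entry going to the dead state; the enum names, the column_of helper, and accepts_for are illustrative choices, not part of the course code.

```c
#include <stddef.h>

enum { S0, S1, S2, S3, S_DEAD, NUM_STATES };        /* states of the "for" DFA */
enum { COL_F, COL_O, COL_R, COL_OTHER, NUM_COLS };  /* columns of the table */

/* delta[state][column]; entries off the f-o-r path are assumed to be S_DEAD. */
static const int delta[NUM_STATES][NUM_COLS] = {
    /*              f       o       r       others */
    /* S0     */ { S1,     S_DEAD, S_DEAD, S_DEAD },
    /* S1     */ { S_DEAD, S2,     S_DEAD, S_DEAD },
    /* S2     */ { S_DEAD, S_DEAD, S3,     S_DEAD },
    /* S3     */ { S_DEAD, S_DEAD, S_DEAD, S_DEAD },
    /* S_DEAD */ { S_DEAD, S_DEAD, S_DEAD, S_DEAD },
};

/* Maps an input character to its table column. */
static int column_of(int c) {
    switch (c) {
    case 'f': return COL_F;
    case 'o': return COL_O;
    case 'r': return COL_R;
    default:  return COL_OTHER;
    }
}

/* Returns 1 if the DFA is in the accepting state S3 after reading all of s. */
int accepts_for(const char *s) {
    int state = S0;
    for (size_t i = 0; s[i] != '\0'; i++)
        state = delta[state][column_of((unsigned char)s[i])];
    return state == S3;
}
```

Keeping the automaton in a table means the driver loop never changes; only the table and the column mapping differ from one DFA to the next, which is the form that scanner generators emit.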
25
Language Accepted by a DFA
A string x_0 x_1 … x_n is accepted by a DFA M = (Σ, S, s_0, δ, S_F) if s_{i+1} = δ(s_i, x_i) for i = 0, 1, …, n and s_{n+1} ∈ S_F,
i.e. if x_0 x_1 … x_n determines a path through the state diagram for the DFA that ends in an accepting state.
Then the language accepted by the DFA M = (Σ, S, s_0, δ, S_F), denoted L(M), is the set of all strings accepted by M.
26
What is the Language Accepted by…
27
DFA1.c
/*
 * Deterministic Finite Automata Simulation
 *
 * One line of input is read and then processed character by character.
 * Thus '\n' (EOL) is treated as the end of input.
 * The major functions are:
 *    delta(s,c) - that implements the transition function, and
 *    accept(s)  - that tells whether state s is an accepting state or not.
 * The particular DFA recognizes strings of digits that end in 000.
 * The DFA has:
 *    S = {0, 1, 2, 3, DEAD_STATE}
 *    Transitions on 0: S0=>S1, S1=>S2, S2=>S3, S3=>S3
 *    Transitions on non-zero digits: S0=>S0, S1=>S0, S2=>S0, S3=>S0
 *    Transitions on non-digits: Si=> DEAD_STATE
 *
 */
28
#include <stdio.h>

#define DEAD_STATE -1
#define ACCEPT 1
#define DO_NOT 0
#define EOL '\n'

/* Prototypes for the functions defined on the following slides. */
int delta(int s, int c);
int accept(int state);

int main(void){
    int c;
    int state;

    state = 0;    /* state 0 is the start state */
    /* Read one line; also stop at end of file or once the DFA is dead. */
    while((c = getchar()) != EOL && c != EOF && state != DEAD_STATE){
        state = delta(state, c);
    }
    if(accept(state)){
        printf("Accept!\n");
    }else{
        printf("Do not accept!\n");
    }
    return 0;
}
29
/* DFA Transition function delta */
/* delta(s,c) = transition from state s on input c */
int delta(int s, int c){
    switch (s){
    case 0: if (c == '0') return 1;
            else if((c > '0') && (c <= '9')) return 0;
            else return(DEAD_STATE);
            break;
    case 1: if (c == '0') return 2;
            else if((c > '0') && (c <= '9')) return 0;
            else return(DEAD_STATE);
            break;
    case 2: if (c == '0') return 3;
            else if((c > '0') && (c <= '9')) return 0;
            else return(DEAD_STATE);
            break;
    case 3: if (c == '0') return 3;
            else if((c > '0') && (c <= '9')) return 0;
            else return(DEAD_STATE);
            break;
    case DEAD_STATE: return DEAD_STATE;
            break;
    default:
            printf("Bad State\n");
            return(DEAD_STATE);
    }
}
30
/* accept(state) = ACCEPT if state is the accepting state 3, else DO_NOT */
int accept(int state){
    if (state == 3) return ACCEPT;
    else return DO_NOT;
}
31
Non-Deterministic Finite Automata
What does deterministic mean?
In a non-deterministic finite automaton (NFA) we relax the restriction that the transition function δ maps every state and every element of the alphabet to a unique state, i.e. δ: S × Σ → S.
An NFA can:
  Have multiple transitions from a state for the same input
  Have ε transitions, where a transition from one state to another can be accomplished without consuming an input character
  Not have transitions defined for every state and every input
Note for NFAs δ: S × (Σ ∪ {ε}) → 2^S, where 2^S is the power set of S
32
Language Accepted by an NFA
A string x_0 x_1 … x_n is accepted by an NFA M = (Σ, S, s_0, δ, S_F) if there are states with s_{i+1} ∈ δ(s_i, x_i) for i = 0, 1, …, n and s_{n+1} ∈ S_F,
i.e. if x_0 x_1 … x_n can determine a path through the state diagram for the NFA that ends in an accepting state, taking ε transitions wherever necessary.
Then the language accepted by the NFA M = (Σ, S, s_0, δ, S_F), denoted L(M), is the set of all strings accepted by M.
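Acceptance by an NFA can be simulated by tracking the set of all states the machine could be in, rather than guessing a single path. The sketch below uses a small made-up NFA for (0|1)*11 with one artificial ε transition so that all three NFA features appear: two transitions on the same input, an ε move, and undefined entries. The state numbering, the trans table, and the function names are assumptions for illustration; state sets are stored as bitmasks.

```c
#include <stddef.h>

#define NUM_STATES 4
#define EPS 2                 /* column index used for epsilon moves */

/*
 * Hypothetical example NFA for (0|1)*11:
 *   0 --0,1--> 0,  0 --1--> 1,  1 --1--> 2,  2 --eps--> 3 (accepting)
 * trans[s][a] is a bitmask of successor states; bit i stands for state i.
 * A zero entry means no transition is defined.
 */
static const unsigned trans[NUM_STATES][3] = {
    /*        on '0'     on '1'                on eps     */
    /* 0 */ { 1u << 0,   (1u << 0) | (1u << 1), 0       },
    /* 1 */ { 0,         1u << 2,               0       },
    /* 2 */ { 0,         0,                     1u << 3 },
    /* 3 */ { 0,         0,                     0       },
};
static const unsigned accepting = 1u << 3;

/* Adds every state reachable through epsilon moves alone. */
static unsigned eps_closure(unsigned set) {
    unsigned prev;
    do {
        prev = set;
        for (int s = 0; s < NUM_STATES; s++)
            if (set & (1u << s)) set |= trans[s][EPS];
    } while (set != prev);
    return set;
}

/* All states reachable from set by consuming input symbol a (0 or 1). */
static unsigned step(unsigned set, int a) {
    unsigned next = 0;
    for (int s = 0; s < NUM_STATES; s++)
        if (set & (1u << s)) next |= trans[s][a];
    return eps_closure(next);
}

/* Returns 1 if some path for input ends in an accepting state. */
int nfa_accepts(const char *input) {
    unsigned current = eps_closure(1u << 0);     /* state 0 is the start state */
    for (size_t i = 0; input[i] != '\0'; i++) {
        if (input[i] != '0' && input[i] != '1') return 0;   /* not in the alphabet */
        current = step(current, input[i] - '0');
    }
    return (current & accepting) != 0;
}
```

Each distinct value of current corresponds to one state of an equivalent DFA, which is the idea behind converting an NFA to a DFA.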
33
Language Accepted by an NFA
34
Thompson Construction
For any regular expression R construct an NFA, M, that accepts the language denoted by R, i.e., L(M) = L(R).
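A minimal sketch of the construction follows, with one builder per regular-expression operator (symbol, concatenation, alternation, Kleene closure). Every name here (State, Frag, frag_symbol, frag_concat, frag_alt, frag_star, nfa_match) is an assumption made for illustration, the 64-state limit exists only so that a state set fits in one machine word, and parsing of the regular expression text is left out, so fragments are combined by hand. Each fragment keeps the invariant that its accept state has no outgoing edges until a builder links it onward.

```c
#include <stdlib.h>

#define MAX_STATES 64    /* assumed capacity: one bit per state in a 64-bit set */
#define NONE (-1)

typedef struct {
    int sym;     /* input symbol on the edge to next1; 0 marks epsilon edges */
    int next1;   /* first successor state, or NONE */
    int next2;   /* second successor (epsilon only), or NONE */
} State;

typedef struct { int start, accept; } Frag;   /* fragment: one start, one accept */
typedef unsigned long long StateSet;

static State states[MAX_STATES];
static int nstates = 0;

static int new_state(void) {
    if (nstates >= MAX_STATES) abort();       /* sketch: no graceful recovery */
    states[nstates].sym = 0;
    states[nstates].next1 = NONE;
    states[nstates].next2 = NONE;
    return nstates++;
}

/* Symbol c: start --c--> accept */
Frag frag_symbol(int c) {
    Frag f;
    f.start = new_state();
    f.accept = new_state();
    states[f.start].sym = c;
    states[f.start].next1 = f.accept;
    return f;
}

/* Concatenation st: epsilon edge from the accept state of s to the start of t */
Frag frag_concat(Frag s, Frag t) {
    Frag f = { s.start, t.accept };
    states[s.accept].next1 = t.start;
    return f;
}

/* Alternation s|t: new start branches to both, both accepts join a new accept */
Frag frag_alt(Frag s, Frag t) {
    Frag f;
    f.start = new_state();
    f.accept = new_state();
    states[f.start].next1 = s.start;
    states[f.start].next2 = t.start;
    states[s.accept].next1 = f.accept;
    states[t.accept].next1 = f.accept;
    return f;
}

/* Kleene closure s*: skip s entirely, or loop back after each pass through s */
Frag frag_star(Frag s) {
    Frag f;
    f.start = new_state();
    f.accept = new_state();
    states[f.start].next1 = s.start;
    states[f.start].next2 = f.accept;
    states[s.accept].next1 = s.start;
    states[s.accept].next2 = f.accept;
    return f;
}

/* Adds every state reachable through epsilon edges alone. */
static StateSet closure(StateSet set) {
    StateSet prev;
    do {
        prev = set;
        for (int s = 0; s < nstates; s++) {
            if (!((set >> s) & 1) || states[s].sym != 0) continue;
            if (states[s].next1 != NONE) set |= 1ULL << states[s].next1;
            if (states[s].next2 != NONE) set |= 1ULL << states[s].next2;
        }
    } while (set != prev);
    return set;
}

/* Simulates the NFA fragment f on input; returns 1 if it accepts. */
int nfa_match(Frag f, const char *input) {
    StateSet cur = closure(1ULL << f.start);
    for (; *input != '\0'; input++) {
        StateSet next = 0;
        for (int s = 0; s < nstates; s++)
            if (((cur >> s) & 1) && states[s].sym == (unsigned char)*input)
                next |= 1ULL << states[s].next1;
        cur = closure(next);
    }
    return ((cur >> f.accept) & 1) != 0;
}
```

Building (a|b)*abb means applying frag_star to frag_alt of the two symbol fragments and then concatenating the fragments for a, b and b in turn. The result uses 14 states and accepts abb and babb but not ab.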