Download presentation
Presentation is loading. Please wait.
Published byRachel Jordan Modified over 9 years ago
1
Lexical Analysis - An Introduction
2
The Front End The purpose of the front end is to deal with the input language Perform a membership test: code source language? Is the program well-formed (semantically) ? Build an IR version of the code for the rest of the compiler Source code Front End Errors Machine code Back End IR
3
The Front End Scanner Maps stream of characters into words Basic unit of syntax x = x + y ; becomes Characters that form a word are its lexeme Its part of speech (or syntactic category) is called its token type Scanner discards white space & (often) comments Source code Scanner IR Parser Errors tokens Speed is an issue in scanning use a specialized recognizer
4
The Front End Parser Checks stream of classified words (parts of speech) for grammatical correctness Determines if code is syntactically well- formed Guides checking at deeper levels than syntax Builds an IR representation of the code Source code Scanner IR Parser Errors tokens
5
The Big Picture Language syntax is specified with parts of speech, not words Syntax checking matches parts of speech against a grammar 1. goal expr 2. expr expr op term 3. | term 4. term number 5. | id 6. op + 7. | – S = goal T = { number, id, +, - } N = { goal, expr, term, op } P = { 1, 2, 3, 4, 5, 6, 7}
6
The Big Picture Language syntax is specified with parts of speech, not words Syntax checking matches parts of speech against a grammar 1. goal expr 2. expr expr op term 3. | term 4. term number 5. | id 6. op + 7. | – S = goal T = { number, id, +, - } N = { goal, expr, term, op } P = { 1, 2, 3, 4, 5, 6, 7} No words here! Parts of speech, not words!
7
The Big Picture Scanner Generator specifications source codeparts of speech & words tables or code Specifications written as “regular expressions” Represent words as indices into a global table
8
The Big Picture Why study lexical analysis? Goals: To simplify specification & implementation of scanners To understand the underlying techniques and technologies
9
How to implement a scanner Regular Expressions NFADFA
10
Regular Expressions Lexical patterns form a regular language *** any finite language is regular *** Regular expressions ( RE s) describe regular languages Ever type “rm *.o a.out” ?
11
Regular Expressions Regular Expression (over alphabet ) is a RE denoting the set { } If a is in , then a is a RE denoting {a} If x and y are RE s denoting L(x) and L(y) then x |y is an RE denoting L(x) L(y) xy is an RE denoting L(x)L(y) x * is an RE denoting L(x)*
12
Set Operations ( review )
13
Examples of Regular Expressions Identifiers: Letter (a|b|c| … |z|A|B|C| … |Z) Digit (0|1|2| … |9) Identifier Letter ( Letter | Digit ) * Numbers: Integer (+|-| ) (0| (1|2|3| … |9)(Digit * ) ) Decimal Integer. Digit * Real ( Integer | Decimal ) E (+|-| ) Digit * Complex ( Real, Real )
14
Regular Expressions ( the point ) Regular expressions can be used to specify the words to be translated to parts of speech by a lexical analyzer Using results from automata theory and theory of algorithms, we can automatically build recognizers from regular expressions We study REs and associated theory to automate scanner construction !
15
Consider the problem of recognizing ILOC register names Register r (0|1|2| … | 9) (0|1|2| … | 9) * Allows registers of arbitrary number Requires at least one digit RE corresponds to a recognizer (or DFA) Example S0S0 S2S2 S1S1 r (0|1|2| … 9) accepting state (0|1|2| … 9) Recognizer for Register
16
DFA operation Start in state S 0 & take transitions on each input character DFA accepts a word x iff x leaves it in a final state (S 2 ) So, r17 takes it through s 0, s 1, s 2 and accepts r takes it through s 0, s 1 and fails Example S0S0 S2S2 S1S1 r (0|1|2| … 9) accepting state (0|1|2| … 9) Recognizer for Register
17
Example ( continued ) To be useful, recognizer must turn into code Char next character State s 0 while (Char EOF ) State (State,Char) Char next character if (State is a final state ) then report success else report failure Skeleton recognizer Table encoding RE r 0,1,2,3, 4,5,6,7, 8,9 All other s s0s0 s1s1 sese sese s1s1 sese s2s2 sese s2s2 sese s2s2 sese sese sese sese sese
18
Example ( continued ) r 0,1,2,3,4,5,6, 7,8,9 All other s s0s0 s 1 start s e error s e error s1s1 s e erro r s 2 add s e error s2s2 s e erro r s 2 add s e error sese s e erro r s e error s e error Char next character State s 0 while (Char EOF ) State (State,Char) perform specified action Char next character if (State is a final state ) then report success else report failure Skeleton recognizer Table encoding RE
19
Algorithm Project 1 Open a file to read from Open a file to write to Open a file to read from Open a file to write to Create a scanner object Call a method from Scanner class to scan, classify each token and write to the output file
20
Algorithm scanner Read a line from input file c i = first character of this line Read a line from input file c i = first character of this line Recognize the meta character Current Token = meta character Print out token with new line Recognize the meta character Current Token = meta character Print out token with new line c i ==‘#’ || (c i ==‘/’ && c i+1 ==‘/’ false true C i ==‘”’ true Recognize the string (read until you reach another “) Current Token = string Print out the token Recognize the string (read until you reach another “) Current Token = string Print out the token C i is a digit Recognize the number Current Token = number Print out the token Recognize the number Current Token = number Print out the token C i is a letter Recognize the id token is not a keyword true false true Token is an ID, print with tag true false Token is a keyword Print the token Token is a keyword Print the token C i is symbol true Recognize the symbol Print Recognize the symbol Print false i+= len of token Print c i i++ Print c i i++ i+= len of token C i is symbol Not the end of file the end of line the end of file Read another line
21
Sample:test1.c #include void sample() { int b=4; printf("Helloworld %d",b); } int main() { sample(); }
22
Token list #include ---- meta void ---- keyword sample ---- id ( ---- symbol ) ---- symbol { ---- symbol int ---- 2 b ---- id = ---- symbol 4 ---- number ; ---- symbol printf ---- keyword ( ---- symbol "Helloworld %d" ---- string, ---- symbol b ---- id ) ---- symbol ; ---- symbol } ---- symbol int ---- keyword main ---- id ( ---- symbol ) ---- symbol { ---- symbol sample ---- symbol ( ---- symbol ) ---- symbol ; ---- symbol }---- symbol
23
Result #include void CS322sample() { int CS322b=4; printf("Helloworld %d",CS322b); } int CS322main() { CS322sample(); }
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.