Presentation is loading. Please wait.

Presentation is loading. Please wait.

Lexical Analysis - An Introduction. The Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source.

Similar presentations


Presentation on theme: "Lexical Analysis - An Introduction. The Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source."— Presentation transcript:

1 Lexical Analysis - An Introduction

2 The Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source language? Is the program well-formed (semantically) ? Build an IR version of the code for the rest of the compiler Source code Front End Errors Machine code Back End IR

3 The Front End Scanner Maps stream of characters into words Basic unit of syntax x = x + y ; becomes Characters that form a word are its lexeme Its part of speech (or syntactic category) is called its token type Scanner discards white space & (often) comments Source code Scanner IR Parser Errors tokens Speed is an issue in scanning  use a specialized recognizer

4 The Front End Parser Checks stream of classified words (parts of speech) for grammatical correctness Determines if code is syntactically well- formed Guides checking at deeper levels than syntax Builds an IR representation of the code Source code Scanner IR Parser Errors tokens

5 The Big Picture Language syntax is specified with parts of speech, not words Syntax checking matches parts of speech against a grammar 1. goal  expr 2. expr  expr op term 3. | term 4. term  number 5. | id 6. op  + 7. | – S = goal T = { number, id, +, - } N = { goal, expr, term, op } P = { 1, 2, 3, 4, 5, 6, 7}

6 The Big Picture Language syntax is specified with parts of speech, not words Syntax checking matches parts of speech against a grammar 1. goal  expr 2. expr  expr op term 3. | term 4. term  number 5. | id 6. op  + 7. | – S = goal T = { number, id, +, - } N = { goal, expr, term, op } P = { 1, 2, 3, 4, 5, 6, 7} No words here! Parts of speech, not words!

7 The Big Picture Scanner Generator specifications source codeparts of speech & words tables or code Specifications written as “regular expressions” Represent words as indices into a global table

8 The Big Picture Why study lexical analysis? Goals: To simplify specification & implementation of scanners To understand the underlying techniques and technologies

9 How to implement a scanner Regular Expressions NFADFA

10 Regular Expressions Lexical patterns form a regular language *** any finite language is regular *** Regular expressions ( RE s) describe regular languages Ever type “rm *.o a.out” ?

11 Regular Expressions Regular Expression (over alphabet  )  is a RE denoting the set {  } If a is in , then a is a RE denoting {a} If x and y are RE s denoting L(x) and L(y) then x |y is an RE denoting L(x)  L(y) xy is an RE denoting L(x)L(y) x * is an RE denoting L(x)*

12 Set Operations ( review )

13 Examples of Regular Expressions Identifiers: Letter  (a|b|c| … |z|A|B|C| … |Z) Digit  (0|1|2| … |9) Identifier  Letter ( Letter | Digit ) * Numbers: Integer  (+|-|  ) (0| (1|2|3| … |9)(Digit * ) ) Decimal  Integer. Digit * Real  ( Integer | Decimal ) E (+|-|  ) Digit * Complex  ( Real, Real )

14 Regular Expressions ( the point ) Regular expressions can be used to specify the words to be translated to parts of speech by a lexical analyzer Using results from automata theory and theory of algorithms, we can automatically build recognizers from regular expressions  We study REs and associated theory to automate scanner construction !

15 Consider the problem of recognizing ILOC register names Register  r (0|1|2| … | 9) (0|1|2| … | 9) * Allows registers of arbitrary number Requires at least one digit RE corresponds to a recognizer (or DFA) Example S0S0 S2S2 S1S1 r (0|1|2| … 9) accepting state (0|1|2| … 9) Recognizer for Register

16 DFA operation Start in state S 0 & take transitions on each input character DFA accepts a word x iff x leaves it in a final state (S 2 ) So, r17 takes it through s 0, s 1, s 2 and accepts r takes it through s 0, s 1 and fails Example S0S0 S2S2 S1S1 r (0|1|2| … 9) accepting state (0|1|2| … 9) Recognizer for Register

17 Example ( continued ) To be useful, recognizer must turn into code Char  next character State  s 0 while (Char  EOF ) State   (State,Char) Char  next character if (State is a final state ) then report success else report failure Skeleton recognizer Table encoding RE  r 0,1,2,3, 4,5,6,7, 8,9 All other s s0s0 s1s1 sese sese s1s1 sese s2s2 sese s2s2 sese s2s2 sese sese sese sese sese

18 Example ( continued )  r 0,1,2,3,4,5,6, 7,8,9 All other s s0s0 s 1 start s e error s e error s1s1 s e erro r s 2 add s e error s2s2 s e erro r s 2 add s e error sese s e erro r s e error s e error Char  next character State  s 0 while (Char  EOF ) State   (State,Char) perform specified action Char  next character if (State is a final state ) then report success else report failure Skeleton recognizer Table encoding RE

19 Algorithm Project 1 Open a file to read from Open a file to write to Open a file to read from Open a file to write to Create a scanner object Call a method from Scanner class to scan, classify each token and write to the output file

20 Algorithm scanner Read a line from input file c i = first character of this line Read a line from input file c i = first character of this line Recognize the meta character Current Token = meta character Print out token with new line Recognize the meta character Current Token = meta character Print out token with new line c i ==‘#’ || (c i ==‘/’ && c i+1 ==‘/’ false true C i ==‘”’ true Recognize the string (read until you reach another “) Current Token = string Print out the token Recognize the string (read until you reach another “) Current Token = string Print out the token C i is a digit Recognize the number Current Token = number Print out the token Recognize the number Current Token = number Print out the token C i is a letter Recognize the id token is not a keyword true false true Token is an ID, print with tag true false Token is a keyword Print the token Token is a keyword Print the token C i is symbol true Recognize the symbol Print Recognize the symbol Print false i+= len of token Print c i i++ Print c i i++ i+= len of token C i is symbol Not the end of file the end of line the end of file Read another line

21 Sample:test1.c #include void sample() { int b=4; printf("Helloworld %d",b); } int main() { sample(); }

22 Token list #include ---- meta void ---- keyword sample ---- id ( ---- symbol ) ---- symbol { ---- symbol int ---- 2 b ---- id = ---- symbol 4 ---- number ; ---- symbol printf ---- keyword ( ---- symbol "Helloworld %d" ---- string, ---- symbol b ---- id ) ---- symbol ; ---- symbol } ---- symbol int ---- keyword main ---- id ( ---- symbol ) ---- symbol { ---- symbol sample ---- symbol ( ---- symbol ) ---- symbol ; ---- symbol }---- symbol

23 Result #include void CS322sample() { int CS322b=4; printf("Helloworld %d",CS322b); } int CS322main() { CS322sample(); }


Download ppt "Lexical Analysis - An Introduction. The Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source."

Similar presentations


Ads by Google