Download presentation
Presentation is loading. Please wait.
Published byHelen Foster Modified over 9 years ago
1
1 November 1, 2015 1 November 1, 2015November 1, 2015November 1, 2015 Azusa, CA Sheldon X. Liang Ph. D. Computer Science at Azusa Pacific University Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS400 Compiler Construction
2
2 The Reason Why Lexical Analysis is a Separate Phase Simplifies the design of the compiler –LL(1) or LR(1) parsing with 1 token lookahead would not be possible (multiple characters/tokens to match) Provides efficient implementation –Systematic techniques to implement lexical analyzers by hand or automatically from specifications –Stream buffering methods to scan input Improves portability –Non-standard symbols and alternate character encodings can be normalized (e.g. trigraphs) November 1, 2015 2 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction
3
3 Interaction of the Lexical Analyzer with the Parser Lexical Analyzer Parser Source Program Token, tokenval Symbol Table Get next token error November 1, 2015 3 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction
4
4 Attributes of Tokens Lexical analyzer y := 31 + 28*x Parser token tokenval (token attribute) November 1, 2015 4 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction
5
5 Formalization November 1, 2015 5 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction Lexical Analysis & Lexical Analyzer Generators Regular Expressions Finite Automata RE Conversion FA Lexer Design
6
6 November 1, 2015 6 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction Keep in mind following questions Token –Lexical units –Atom parse element –Abstracted in syntax: e.g. Id Lexeme –Specific string making up token –Value / attribute related to a token –Concrete in language, e.g., Amt Spec of patterns for tokens –Alphabet - a finite set –String s - a finite sequence from –Language – a specific set of strings
7
7 Tokens, Patterns, and Lexemes A token is a classification of lexical units –For example: id and num Lexemes are the specific character strings that make up a token –For example: abc and 123 Patterns are rules describing the set of lexemes belonging to a token –For example: “letter followed by letters and digits” and “non-empty sequence of digits” November 1, 2015 7 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction
8
8 An alphabet is a finite set of symbols (characters) A string s is a finite sequence of symbols from – s denotes the length of string s – denotes the empty string, thus = 0 A language is a specific set of strings over some fixed alphabet November 1, 2015 8 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction Specification of Patterns for Tokens: Definitions
9
9 Specification of Patterns for Tokens: String Operations The concatenation of two strings x and y is denoted by xy The exponentation of a string s is defined by s 0 = s i = s i-1 s for i > 0 note that s = s = s November 1, 2015 9 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction
10
10 Union L M = {s s L or s M} Concatenation LM = {xy x L and y M} Exponentiation L 0 = { }; L i = L i-1 L Kleene closure L * = i=0,…, L i Positive closure L + = i=1,…, L i November 1, 2015 10 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction Specification of Patterns for Tokens: Language Operations
11
11 Basis symbols: – is a regular expression denoting language { } –a is a regular expression denoting {a} If r and s are regular expressions denoting languages L(r) and M(s) respectively, then –r s is a regular expression denoting L(r) M(s) –rs is a regular expression denoting L(r)M(s) –r * is a regular expression denoting L(r) * –(r) is a regular expression denoting L(r) A language defined by a regular expression is called a regular set November 1, 2015 11 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction Specification of Patterns for Tokens: Regular Expressions
12
12 Nondeterministic Finite Automata An NFA is a 5-tuple (S, , , s 0, F) where S is a finite set of states is a finite set of symbols, the alphabet is a mapping from S to a set of states s 0 S is the start state F S is the set of accepting (or final) states November 1, 2015 12 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction
13
13 Conversion of an NFA into a DFA The subset construction algorithm converts an NFA into a DFA using: -closure(s) = {s} {t s … t} -closure(T) = s T -closure(s) move(T,a) = {t s a t and s T} The algorithm produces: Dstates is the set of states of the new DFA consisting of sets of states of the NFA Dtran is the transition table of the new DFA November 1, 2015 13 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction
14
14 November 1, 2015 14 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction Got it with following questions Tokens –L–Lexical units –A–Atom parse element –A–Abstracted in syntax: e.g. Id Lexeme –S–Specific string making up token –V–Value / attribute related to a token –C–Concrete in language, e.g., Amt Spec of patterns for tokens –A–Alphabet - a finite set –S–String s - a finite sequence from –L–Language – a specific set of strings
15
15 Thank you very much! Questions? November 1, 2015 15 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.