Download presentation
Presentation is loading. Please wait.
Published byElaine Arnold Modified over 9 years ago
1
1 January 18, 2016 1 January 18, 2016January 18, 2016January 18, 2016 Azusa, CA Sheldon X. Liang Ph. D. Computer Science at Azusa Pacific University Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS400 Compiler Construction
2
2 Formalization January 18, 2016 2 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction Lexical Analysis & Lexical Analyzer Generators Regular Expressions Finite Automata RE Conversion FA Lexer Design
3
3 January 18, 2016 3 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction Keep in mind with following questions Regular Expressions –a–a concise and flexible means for identifying strings of text –w–written in a formal language –I–Interpreted by a RegEx processor Why RegEx –P–Precise definition of language –L–Layered definition of language –L–Lexical/Syntax/Semantic Further use of RegEx –S–Supportive foundation of Lexer –F–Formal communication –C–Common application ***
4
4 January 18, 2016 4 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction Why: Language Definition Problem How to precisely define language Layered structure of language definition –Start with a set of letters in language –Lexical structure - identifies “words” in language (each word is a sequence of letters) –Syntactic structure - identifies “sentences” in language (each sentence is a sequence of words) –Semantics - meaning of program (specifies what result should be for each input) –Today’s topic: lexical and syntactic structures
5
5 Basis symbols: – is a regular expression denoting language { } –a is a regular expression denoting {a} If r and s are regular expressions denoting languages L(r) and M(s) respectively, then –r s is a regular expression denoting L(r) M(s) –rs is a regular expression denoting L(r)M(s) –r * is a regular expression denoting L(r) * –(r) is a regular expression denoting L(r) A language defined by a regular expression is called a regular set January 18, 2016 5 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction Specification of Patterns for Tokens: Regular Expressions
6
6 January 18, 2016 6 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction
7
7 January 18, 2016 7 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction
8
8 Specification of Patterns for Tokens: Regular Definitions Regular definitions introduce a naming convention: d 1 r 1 d 2 r 2 … d n r n where each r i is a regular expression over {d 1, d 2, …, d i-1 } Any d j in r i can be textually substituted in r i to obtain an equivalent set of definitions January 18, 2016 8 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction
9
9 Specification of Patterns for Tokens: Regular Definitions Example: letter A B … Z a b … z digit 0 1 … 9 id letter ( letter digit ) * Regular definitions are not recursive: digits digit digits digitwrong! January 18, 2016 9 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction
10
10 Specification of Patterns for Tokens: Notational Shorthand The following shorthands are often used: r + = rr * r? = r [ a - z ] = a b c … z Examples: digit [ 0 - 9 ] num digit + (. digit + )? ( E (+ -)? digit + )? January 18, 2016 10 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction
11
11 Regular Definitions and Grammars stmt if expr then stmt if expr then stmt else stmt expr term relop term term term id num if if then then else else relop > >= = id letter ( letter | digit ) * num digit + (. digit + )? ( E (+ -)? digit + )? Grammar Regular definitions January 18, 2016 11 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction
12
12 Coding Regular Definitions in Transition Diagrams January 18, 2016 12 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction
13
13 Coding Regular Definitions in Transition Diagrams: Code token nexttoken() { while (1) { switch (state) { case 0: c = nextchar(); if (c==blank || c==tab || c==newline) { state = 0; lexeme_beginning++; } else if (c==‘<’) state = 1; else if (c==‘=’) state = 5; else if (c==‘>’) state = 6; else state = fail(); break; case 1: … case 9: c = nextchar(); if (isletter(c)) state = 10; else state = fail(); break; case 10: c = nextchar(); if (isletter(c)) state = 10; else if (isdigit(c)) state = 10; else state = 11; break; … int fail() { forward = token_beginning; swith (start) { case 0: start = 9; break; case 9: start = 12; break; case 12: start = 20; break; case 20: start = 25; break; case 25: recover(); break; default: /* error */ } return start; } Decides the next start state to check January 18, 2016 13 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction
14
14 Common Application of Regular Expressions January 18, 2016 14 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction Validate passwords and email addresses Extract specific sections from an HML page Parse data files Replace values (strings)
15
15 January 18, 2016 15 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction Got it with following questions Regular Expressions –a–a concise and flexible means for identifying strings of text –w–written in a formal language –I–Interpreted by a RegEx processor Why RegEx –P–Precise definition of language –L–Layered definition of language –L–Lexical/Syntax/Semantic Further use of RegEx –S–Supportive foundation of Lexer –F–Formal communication –C–Common application ***
16
16 Thank you very much! Questions? January 18, 2016 16 Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278 Department of Computer Science, http://www.apu.edu/clas/computerscience/http://www.apu.edu/clas/computerscience/ CS@APU: CS400 Compiler Construction
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.