1
Compiler Construction
Sohail Aslam Lecture 6 compiler: intro
2
How to Describe Tokens?
Regular languages are the most popular way to specify tokens:
Simple and useful theory
Easy to understand
Efficient implementations
3
Languages
Let Σ be a set of characters. Σ is called the alphabet.
A language over Σ is a set of strings of characters drawn from Σ.
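To make the definition concrete, this small Python sketch models an alphabet and a finite language as sets. Python is our choice; the slides use no programming language. The helper `is_language_over` is a hypothetical name. Real languages such as the set of all C++ programs are infinite, so a Python set can only hold finite examples.

```python
def is_language_over(language: set[str], sigma: set[str]) -> bool:
    """Return True if every string in `language` uses only characters from `sigma`."""
    # set("") is empty, so the empty string is a string over any alphabet.
    return all(set(word) <= sigma for word in language)


sigma = {"a", "b"}              # the alphabet Σ
lang = {"", "a", "ab", "bba"}   # a finite language over Σ (includes the empty string)

print(is_language_over(lang, sigma))     # True
print(is_language_over({"abc"}, sigma))  # False: 'c' is not in Σ
```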
4
Examples of Languages
Alphabet = English characters; Language = English sentences
Alphabet = ASCII; Language = C++, Java, or C# programs
5
Notation
Languages are sets of strings (finite sequences of characters).
We need some notation for specifying which sets we want.
6
Notation
For lexical analysis we care about regular languages.
Regular languages can be described using regular expressions.
7
Regular Languages
Each regular expression is a notation for a regular language (a set of words). If A is a regular expression, we write L(A) to refer to the language denoted by A.
8
Regular Expression
A regular expression (RE) is defined inductively:
a: an ordinary character from Σ
ε: the empty string
9
Regular Expression
R|S = either R or S
RS = R followed by S (concatenation)
R* = concatenation of R zero or more times (R* = ε | R | RR | RRR ...)
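The inductive definition can be mirrored by operations on sets of strings. This sketch rests on one stated assumption: R* denotes an infinite language, so every result is truncated to strings of at most `MAX_LEN` characters, a hypothetical bound. The function names `char`, `epsilon`, `union`, `concat`, and `star` are ours, not standard ones.

```python
# Languages are modelled as finite Python sets of strings. Because R* denotes an
# infinite language, concatenation keeps only strings of length <= MAX_LEN.
MAX_LEN = 4  # hypothetical bound chosen for illustration


def char(c: str) -> set[str]:
    """L(c) = {"c"} for an ordinary character c."""
    return {c}


def epsilon() -> set[str]:
    """L(ε) = {""}, the language containing only the empty string."""
    return {""}


def union(r: set[str], s: set[str]) -> set[str]:
    """L(R|S) = L(R) ∪ L(S)."""
    return r | s


def concat(r: set[str], s: set[str]) -> set[str]:
    """L(RS) = { xy : x in L(R), y in L(S) }, truncated to MAX_LEN."""
    return {x + y for x in r for y in s if len(x + y) <= MAX_LEN}


def star(r: set[str]) -> set[str]:
    """L(R*) = L(ε) ∪ L(R) ∪ L(RR) ∪ ..., truncated to MAX_LEN."""
    result = epsilon()
    frontier = epsilon()
    while True:
        # Extend only the strings added in the previous round by one more copy of R.
        frontier = concat(frontier, r) - result
        if not frontier:
            return result
        result |= frontier


print(sorted(star(char("a"))))                               # ['', 'a', 'aa', 'aaa', 'aaaa']
print(sorted(concat(union(char("a"), epsilon()), char("b"))))  # ['ab', 'b']
```

The second line computes the language of (a|ε)b. Truncation is safe here because strings only grow under concatenation, so every prefix of a short string in L(R*) is also short.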
10
RE Extensions
R? = ε | R (zero or one R)
R+ = RR* (one or more R)
(R) = R (grouping)
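These extensions add no expressive power, because each one abbreviates a basic RE. One way to check this on small inputs is to compare two patterns on every string over {a, b} up to a fixed length. The sketch assumes Python's standard `re` module as the RE engine, and the helper names are hypothetical. This is a bounded test, not a proof. Python has no ε symbol, so the empty alternative `(?:|a)` stands in for ε | a.

```python
import re
from itertools import product


def strings_over(alphabet: str, max_len: int):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def same_language(p: str, q: str, alphabet: str = "ab", max_len: int = 5) -> bool:
    """True if patterns p and q fully match exactly the same strings up to max_len."""
    return all(
        bool(re.fullmatch(p, w)) == bool(re.fullmatch(q, w))
        for w in strings_over(alphabet, max_len)
    )


print(same_language("a?", "(?:|a)"))  # True: R? = ε | R
print(same_language("a+", "aa*"))     # True: R+ = RR*
print(same_language("(ab)", "ab"))    # True: parentheses only group
```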
11
RE Extensions
[abc] = a|b|c (any of the listed characters)
[a-z] = a|b|...|z (range)
[^ab] = c|d|... (any character except 'a' and 'b')
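Character classes are likewise shorthand for alternations of single characters. This quick check uses Python's `re` module (our choice of engine) and a hypothetical helper, `class_equals_alternation`. It compares each class with its spelled-out alternation on every printable ASCII character.

```python
import re
import string


def class_equals_alternation(cls: str, alternation: str, candidates: str) -> bool:
    """True if the character class and the explicit alternation agree on every candidate."""
    return all(
        bool(re.fullmatch(cls, ch)) == bool(re.fullmatch(alternation, ch))
        for ch in candidates
    )


probe = string.printable  # all printable ASCII characters, used as test inputs

print(class_equals_alternation("[abc]", "a|b|c", probe))                            # True
print(class_equals_alternation("[a-z]", "|".join(string.ascii_lowercase), probe))  # True

# [^ab] matches any single character other than 'a' and 'b'.
print([ch for ch in "abcd" if re.fullmatch("[^ab]", ch)])                           # ['c', 'd']
```

In Python, `[^ab]` also matches digits, punctuation, and newline, not only the letters c, d, and so on.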
12
Regular Expression
RE        Strings in L(R)
a         "a"
ab        "ab"
a|b       "a", "b"
(a|ε)b    "ab", "b"
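The table can be reproduced mechanically. This sketch again assumes Python's `re` module and uses a hypothetical helper, `language_up_to`. It lists every string over {a, b} of length at most 2 that each pattern matches in full. Every string in these four languages has length at most 2, so the listed sets are the complete languages. As before, `(?:a|)b` stands in for (a|ε)b.

```python
import re
from itertools import product


def language_up_to(pattern: str, alphabet: str = "ab", max_len: int = 2) -> set[str]:
    """All strings over `alphabet` of length <= max_len that `pattern` fully matches."""
    words = ("".join(t) for n in range(max_len + 1) for t in product(alphabet, repeat=n))
    return {w for w in words if re.fullmatch(pattern, w)}


# One row per RE from the table: the pattern, then the sorted strings in L(R).
for pattern in ["a", "ab", "a|b", "(?:a|)b"]:
    print(f"{pattern:10} {sorted(language_up_to(pattern))}")
```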
13
Example: integers
integer: a non-empty string of digits
integer = digit digit*
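Here is a direct translation into Python's `re` syntax. It assumes digit means the ASCII characters 0 to 9, since the slide does not define digit. The pattern uses `[0-9]` rather than `\d` because, for str patterns, Python's `\d` also matches non-ASCII decimal digits. `fullmatch` requires the whole string, not just a prefix, to be an integer. As on the slide, there is no sign, so `-7` is rejected.

```python
import re

# digit = [0-9]; integer = digit digit*
INTEGER = re.compile(r"[0-9][0-9]*")


def is_integer(s: str) -> bool:
    """True if s is a non-empty string of ASCII digits."""
    return INTEGER.fullmatch(s) is not None


for s in ["0", "2025", "", "12a", "-7"]:
    print(repr(s), is_integer(s))
```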
14
Example: identifiers
identifier: a string of letters or digits starting with a letter (in C, the underscore counts as a letter)
C identifier: [a-zA-Z_][a-zA-Z0-9_]*
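The C identifier pattern works unchanged with Python's `re` module. `is_c_identifier` is a hypothetical helper name. On its own, this pattern also matches keywords such as `while`, and telling keywords apart from identifiers is a separate step not covered by this slide.

```python
import re

# A letter or underscore, followed by any number of letters, digits, or underscores.
C_IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_c_identifier(s: str) -> bool:
    """True if the whole of s matches the C identifier pattern."""
    return C_IDENTIFIER.fullmatch(s) is not None


for s in ["x", "_tmp1", "count2", "2count", "a-b", ""]:
    print(repr(s), is_c_identifier(s))
```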
15
Recap
Tokens: strings of characters representing lexical units of programs, such as identifiers, numbers, and operators.
16
Recap
Regular expressions: concise descriptions of tokens. A regular expression describes a set of strings.
17
Recap
Language L(R): the set of strings represented by a regular expression R, also called the language denoted by R.
18
How to Use REs
We need a mechanism to determine whether an input string w belongs to L(R), the language denoted by regular expression R.
19
Acceptor
Such a mechanism is called an acceptor.
[Figure: an acceptor for language L takes an input string w and answers "yes" if w ∈ L, "no" if w ∉ L]
20
Finite Automata (FA)
Specification: regular expressions
Implementation: finite automata
21
Finite Automata
A finite automaton consists of:
An input alphabet (Σ)
A set of states
A start (initial) state
A set of transitions
A set of accepting (final) states
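The five components map naturally onto a small data structure. This is one possible Python representation, not a standard API. The class name `FiniteAutomaton`, the string state names, and the use of a dictionary from (state, character) to next state are all assumptions. Because the dictionary allows at most one next state per (state, character) pair, the sketch covers deterministic automata only.

```python
from dataclasses import dataclass


@dataclass
class FiniteAutomaton:
    """The five components of a finite automaton (deterministic transitions assumed)."""
    alphabet: frozenset[str]                 # input alphabet Σ
    states: frozenset[str]                   # set of states
    start: str                               # start (initial) state
    transitions: dict[tuple[str, str], str]  # (state, character) -> next state
    accepting: frozenset[str]                # accepting (final) states

    def __post_init__(self) -> None:
        # Basic well-formedness checks on the components.
        if self.start not in self.states:
            raise ValueError("start state must be one of the states")
        if not self.accepting <= self.states:
            raise ValueError("accepting states must be a subset of the states")
        for (src, ch), dst in self.transitions.items():
            if src not in self.states or dst not in self.states:
                raise ValueError(f"transition {src!r} -{ch}-> {dst!r} uses an unknown state")
            if ch not in self.alphabet:
                raise ValueError(f"transition label {ch!r} is not in the alphabet")


# Example: an FA over {0, 1} that accepts only the string "1" (state names are hypothetical).
fa = FiniteAutomaton(
    alphabet=frozenset({"0", "1"}),
    states=frozenset({"A", "B"}),
    start="A",
    transitions={("A", "1"): "B"},
    accepting=frozenset({"B"}),
)
```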
22
Finite Automaton State Graphs
[Figure: graph notation for a state, the start state, and an accepting state]
23
Finite Automaton State Graphs
[Figure: a transition, drawn as an edge labelled with an input character such as a]
24
Finite Automata
A finite automaton accepts a string if we can follow transitions labelled with the characters of the string, in order, from the start state to some accepting state, consuming the whole string.
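The acceptance rule translates into a short loop. This sketch assumes a deterministic FA stored as a dictionary from (state, character) to next state. In it, a missing transition means the automaton is stuck and rejects the string. The function `accepts` and the example automaton `ENDS_IN_B`, which accepts strings over {a, b} ending in b, are hypothetical and made up for illustration.

```python
def accepts(transitions: dict[tuple[str, str], str], start: str,
            accepting: set[str], w: str) -> bool:
    """Follow the transition for each character of w in turn, starting at `start`.

    Accept iff every needed transition exists and the state reached after the
    last character is accepting.
    """
    state = start
    for ch in w:
        key = (state, ch)
        if key not in transitions:
            return False  # stuck: no transition for this character, so reject
        state = transitions[key]
    return state in accepting


# Hypothetical FA over {a, b}: B is reached exactly when the last character read was b.
ENDS_IN_B = {("A", "a"): "A", ("A", "b"): "B", ("B", "a"): "A", ("B", "b"): "B"}

for w in ["aab", "aba", "b", ""]:
    print(repr(w), accepts(ENDS_IN_B, "A", {"B"}, w))
```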
25
FA Example
An FA that accepts only “1”
[Figure: the start state with a transition labelled 1 to an accepting state]
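Encoding this FA as a transition table lets us run it on a few strings. The state names `A` (start) and `B` (accepting) are hypothetical. Any input other than exactly "1" either ends in the non-accepting start state or reaches a point with no transition to follow.

```python
# Hypothetical state names: "A" is the start state, "B" the accepting state.
TRANSITIONS = {("A", "1"): "B"}
START, ACCEPTING = "A", {"B"}


def accepts(w: str) -> bool:
    """Run the FA on w; a missing transition means reject."""
    state = START
    for ch in w:
        if (state, ch) not in TRANSITIONS:
            return False
        state = TRANSITIONS[(state, ch)]
    return state in ACCEPTING


for w in ["1", "", "0", "11"]:
    print(repr(w), accepts(w))
```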
26
FA Example
An FA that accepts any number of 1’s followed by a single 0
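Here is one FA for this language, sketched with two states named `A` (start) and `B` (accepting). A 1 loops on `A`, and a 0 moves to `B`, which has no outgoing transitions. The state names are hypothetical, and the slide's own figure is not reproduced here.

```python
# Hypothetical states: "A" (start) loops on 1; a 0 moves to the accepting state "B".
TRANSITIONS = {("A", "1"): "A", ("A", "0"): "B"}
START, ACCEPTING = "A", {"B"}


def accepts(w: str) -> bool:
    """Run the FA on w; a missing transition (e.g. anything after the 0) means reject."""
    state = START
    for ch in w:
        if (state, ch) not in TRANSITIONS:
            return False
        state = TRANSITIONS[(state, ch)]
    return state in ACCEPTING


for w in ["0", "1110", "", "111", "00", "01"]:
    print(repr(w), accepts(w))
```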
27
FA Example
An FA that accepts ab*a
Alphabet: {a, b}
[Figure: state graph with transitions labelled b, a, and a]
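An FA for ab*a can use three states, with hypothetical names: `A --a--> B`, a `b` self-loop on `B`, and `B --a--> C`, where `C` is accepting. This agrees with the transition labels visible on the slide (b, a, a), but the exact layout of the original figure is an assumption.

```python
# Hypothetical states: "A" (start), "B" (after the first a), "C" (accepting, after the final a).
TRANSITIONS = {("A", "a"): "B", ("B", "b"): "B", ("B", "a"): "C"}
START, ACCEPTING = "A", {"C"}


def accepts(w: str) -> bool:
    """Run the FA on w; a missing transition means reject."""
    state = START
    for ch in w:
        if (state, ch) not in TRANSITIONS:
            return False
        state = TRANSITIONS[(state, ch)]
    return state in ACCEPTING


for w in ["aa", "aba", "abbba", "a", "ab", "abab", "ba"]:
    print(repr(w), accepts(w))
```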
End of Lecture 6