Compiler Construction


1 Compiler Construction
Sohail Aslam Lecture 6 compiler: intro

2 How to Describe Tokens? Regular languages are the most popular way to specify tokens: the theory is simple and useful, they are easy to understand, and they have efficient implementations.

3 Languages Let Σ be a set of characters. Σ is called the alphabet.
A language over Σ is a set of strings of characters drawn from Σ.

4 Examples of Languages
Alphabet = English characters; Language = English sentences
Alphabet = ASCII; Language = C++, Java, or C# programs

5 Notation Languages are sets of strings (finite sequences of characters).
We need some notation for specifying which sets we want.

6 Notation For lexical analysis we care about regular languages.
Regular languages can be described using regular expressions.

7 Regular Languages Each regular expression is a notation for a regular language (a set of words). If A is a regular expression, we write L(A) to refer to the language denoted by A.

8 Regular Expression A regular expression (RE) is defined inductively. The base cases are:
a = an ordinary character from Σ
ε = the empty string

9 Regular Expression
R|S = either R or S
RS = R followed by S (concatenation)
R* = concatenation of R zero or more times (R* = ε | R | RR | RRR ...)
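The three operators above can be read as operations on sets of strings. The following is a minimal sketch, assuming languages are represented as finite Python sets; the helper names union, concat, and star are illustrative and not part of the lecture. Because L(R*) is infinite whenever R matches a non-empty string, star only builds a finite approximation up to max_reps repetitions.

```python
from itertools import product


def union(L1, L2):
    """L(R|S): strings that belong to either language."""
    return L1 | L2


def concat(L1, L2):
    """L(RS): every string of L1 followed by every string of L2."""
    return {x + y for x, y in product(L1, L2)}


def star(L, max_reps):
    """Finite approximation of L(R*): zero up to max_reps repetitions."""
    result = {""}       # zero repetitions gives the empty string
    current = {""}
    for _ in range(max_reps):
        current = concat(current, L)
        result |= current
    return result


A = {"a"}
B = {"b"}
print(sorted(union(A, B)))    # ['a', 'b']
print(sorted(concat(A, B)))   # ['ab']
print(sorted(star(A, 3)))     # ['', 'a', 'aa', 'aaa']
```

The empty string "" plays the role of ε here: it is the only member of the language produced by zero repetitions.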

10 RE Extensions
R? = ε | R (zero or one R)
R+ = RR* (one or more R)
(R) = R (grouping)

11 RE Extensions
[abc] = a|b|c (any of the listed characters)
[a-z] = a|b|...|z (range)
[^ab] = c|d|... (any character except ‘a’ and ‘b’)
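The extensions add convenience, not expressive power: each can be rewritten using only the basic operators. The sketch below spot-checks a few of these equivalences with Python's re module, which is used here only as a stand-in for a regular expression matcher. The helper same_language and the sample strings are assumptions made for illustration, and agreement on a handful of samples is a sanity check, not a proof. Python has no ε symbol, so the empty alternative in (|a) stands for ε | a.

```python
import re


def same_language(p1, p2, samples):
    """Return True if both patterns accept exactly the same sample strings."""
    return all(
        bool(re.fullmatch(p1, s)) == bool(re.fullmatch(p2, s))
        for s in samples
    )


samples = ["", "a", "b", "c", "aa", "ab", "abc", "z"]

print(same_language("a?", "(|a)", samples))       # R? = ε | R
print(same_language("a+", "aa*", samples))        # R+ = RR*
print(same_language("[abc]", "a|b|c", samples))   # any of the listed characters
print(same_language("[a-c]", "a|b|c", samples))   # range

# [^ab] accepts any single character other than 'a' and 'b'.
print([s for s in samples if re.fullmatch("[^ab]", s)])
```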

12 Regular Expression
RE        Strings in L(R)
a         “a”
ab        “ab”
a|b       “a”, “b”
(a|ε)b    “ab”, “b”

13 Example: integers integer: a non-empty string of digits
integer = digit digit*
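The definition digit digit* translates almost word for word into a hand-written check. This is a minimal sketch, assuming that digit means one of the ASCII characters 0 to 9 (the slide does not define it); the function name is_integer is illustrative.

```python
DIGITS = "0123456789"   # assumption: 'digit' means an ASCII decimal digit


def is_integer(s):
    """Return True if s is in L(digit digit*)."""
    # First 'digit': the string must start with a digit, so "" is rejected.
    if not s or s[0] not in DIGITS:
        return False
    # 'digit*': zero or more further digits, and nothing else.
    return all(ch in DIGITS for ch in s[1:])


for text in ["0", "2024", "", "12a", "-5"]:
    print(repr(text), is_integer(text))
```

Note that a sign is not part of this definition, so "-5" is rejected.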

14 Example: identifiers identifier: a string of letters or digits, starting with a letter
C identifier: [a-zA-Z_][a-zA-Z0-9_]*
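The C identifier pattern can be tried out directly, since Python's re module accepts the same character-class syntax. This is a sketch for experimentation only; the function name is_c_identifier is an assumption, and the pattern by itself does not exclude reserved words such as while, which a real scanner has to handle separately.

```python
import re

# The pattern from the slide, compiled once and reused.
C_IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_c_identifier(s):
    """Return True if the whole string s matches the identifier pattern."""
    return C_IDENTIFIER.fullmatch(s) is not None


for text in ["x", "_tmp1", "count_2", "2x", "", "a-b"]:
    print(repr(text), is_c_identifier(text))
```

fullmatch is used rather than match so that trailing characters, as in "a-b", cause rejection.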

15 Recap Tokens: strings of characters representing lexical units of programs such as identifiers, numbers, operators.

16 Recap Regular Expressions: concise description of tokens. A regular expression describes a set of strings.

17 Recap Language L(R): set of strings represented by a regular expression R. L(R) is the language denoted by regular expression R.

18 How to Use REs We need a mechanism to determine whether an input string w belongs to L(R), the language denoted by regular expression R.

19 Acceptor Such a mechanism is called an acceptor.
[Diagram: an acceptor for language L takes an input string w and answers yes if w ∈ L, no if w ∉ L.]
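In code, an acceptor is simply a function from strings to yes or no. The sketch below builds one from a regular expression using Python's re module as a stand-in for the matching mechanism. The name make_acceptor is hypothetical, and Python's engine supports features beyond regular languages, so this shows only the interface of an acceptor, not the finite-automaton implementation discussed in the lecture.

```python
import re


def make_acceptor(regex):
    """Return a function that answers whether w is in L(regex)."""
    compiled = re.compile(regex)

    def accepts(w):
        # fullmatch requires the entire string w to match, not just a prefix.
        return compiled.fullmatch(w) is not None

    return accepts


accepts = make_acceptor("ab*a")
for w in ["aa", "abba", "ab", ""]:
    print(repr(w), "yes" if accepts(w) else "no")
```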

20 Finite Automata (FA) Specification: Regular Expressions
Implementation: Finite Automata

21 Finite Automata A finite automaton consists of:
An input alphabet (Σ)
A set of states
A start (initial) state
A set of transitions
A set of accepting (final) states
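The five components map naturally onto a small record type. The following is one possible representation, assuming a deterministic automaton whose transitions are stored as a dictionary from (state, character) pairs to the next state. The class name FiniteAutomaton, the method is_well_formed, and the state names A and B are illustrative choices, not taken from the lecture. The instance built here is the machine that accepts only "1", one of the lecture's later examples, with an assumed alphabet of 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class FiniteAutomaton:
    alphabet: set       # input alphabet
    states: set         # set of states
    start: str          # start (initial) state
    transitions: dict   # (state, character) -> next state
    accepting: set      # accepting (final) states

    def is_well_formed(self):
        """Check that all components refer only to known states and characters."""
        if self.start not in self.states:
            return False
        if not self.accepting <= self.states:
            return False
        return all(
            src in self.states and dst in self.states and ch in self.alphabet
            for (src, ch), dst in self.transitions.items()
        )


only_one = FiniteAutomaton(
    alphabet={"0", "1"},
    states={"A", "B"},
    start="A",
    transitions={("A", "1"): "B"},
    accepting={"B"},
)
print(only_one.is_well_formed())
```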

22 Finite Automaton State Graphs [Figure: graph symbols for a state, the start state, and an accepting state]

23 Finite Automaton State Graphs [Figure: a transition, drawn as an edge labelled with a character such as a]

24 Finite Automata A finite automaton accepts a string if we can follow transitions labelled with the characters of the string from the start state to some accepting state.
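The acceptance rule can be sketched as a short loop. This version assumes a deterministic automaton (at most one transition per state and character) stored as a dictionary; the function name accepts, its parameters, and the state names s0 and s1 are assumptions for illustration. A missing transition means the string cannot be followed through the graph, so it is rejected.

```python
def accepts(transitions, start, accepting, w):
    """Follow one transition per character of w; accept if we end in an accepting state."""
    state = start
    for ch in w:
        key = (state, ch)
        if key not in transitions:
            return False          # no transition to follow: reject
        state = transitions[key]
    return state in accepting


# A two-state automaton with a single transition labelled 'a'.
transitions = {("s0", "a"): "s1"}
for w in ["a", "", "aa", "b"]:
    print(repr(w), accepts(transitions, "s0", {"s1"}, w))
```

Only "a" is accepted: the empty string ends in the non-accepting start state, and "aa" and "b" get stuck on a missing transition.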

25 FA Example An FA that accepts only “1”. [State graph: a single transition labelled 1 from the start state to an accepting state]

26 FA Example An FA that accepts any number of 1’s followed by a single 0
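Since the state graph for this slide did not survive in the transcript, here is one plausible reconstruction, hand-coded the way a small scanner might be. It assumes two states, named A (start) and B (accepting) purely for illustration: A loops on 1 and moves to B on 0, and B has no outgoing transitions.

```python
def accepts_ones_then_zero(w):
    """Accept strings made of any number of 1's followed by exactly one 0."""
    state = "A"                          # start state
    for ch in w:
        if state == "A" and ch == "1":
            state = "A"                  # loop: stay in A while reading 1's
        elif state == "A" and ch == "0":
            state = "B"                  # the single 0 moves to the accepting state
        else:
            return False                 # B has no outgoing transitions
    return state == "B"


for w in ["0", "10", "1110", "", "1", "00", "01"]:
    print(repr(w), accepts_ones_then_zero(w))
```

"Any number" includes zero, so "0" on its own is accepted.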

27 FA Example An FA that accepts ab*a. Alphabet: {a,b}. [State graph with edge labels a, b, a]
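A table-driven version of this automaton can be checked against the regular expression it is meant to implement. The sketch below assumes three states, named q0, q1, and q2 for illustration since the original graph is not preserved: q0 goes to q1 on a, q1 loops on b and goes to q2 on a, and q2 is accepting. It then compares the automaton with Python's re.fullmatch on every string over {a, b} up to length 6, which is a sanity check rather than a proof of equivalence.

```python
import re
from itertools import product

# Transition table: state -> {character -> next state}
TRANSITIONS = {
    "q0": {"a": "q1"},
    "q1": {"b": "q1", "a": "q2"},
    "q2": {},
}
START = "q0"
ACCEPTING = {"q2"}


def accepts(w):
    """Run the automaton on w; reject as soon as no transition applies."""
    state = START
    for ch in w:
        state = TRANSITIONS[state].get(ch)
        if state is None:
            return False
    return state in ACCEPTING


def all_strings(alphabet, max_len):
    """Yield every string over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


agree = all(
    accepts(w) == bool(re.fullmatch("ab*a", w))
    for w in all_strings("ab", 6)
)
print(agree)
```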
End of Lecture 6

