CSC3315 (Spring 2009)1 CSC 3315 Lexical and Syntax Analysis Hamid Harroud School of Science and Engineering, Akhawayn University
Lexical Analysis Convert source file characters into token stream. Remove content-free characters (comments, whitespace,...) Detect lexical errors (badly-formed literals, illegal characters,...) Output of lexical analysis is input to syntax analysis. Idea: Look for patterns in input character sequence, convert to tokens with attributes, and pass them to parser in stream.
Lexical Analysis Example
Specifying Lexical Analysers Can define lexical analyzer via list of pairs: (regular expression, action) where regular expression describes token pattern and action is a piece of code, parameterized by the matching lexeme, that returns a (token, attribute) pair Example (digit+, {return new Token(NUM,parseInt(lexeme));}) (alpha(alpha|digit) ∗, {return new Token(ID,lexeme);}) (space|tab|newline, {}) (.,.) So R.E’s can help us specify scanners.
Regular Expressions A regular expression (R.E.) is a concise formal characterization of a regular language. Example: The regular language containing all IDENTs is described by the regular expression letter (letter | digit) ∗ where “| ” means “or” and “e ∗ ” means “zero or more copies of e.” Regular languages are one particular kind of formal languages.
6 Finite Automaton Input “Accept” or “Reject” String Finite Automaton Output
7 Transition Graph initial state accepting state transition
8 Initial Configuration Input String
9 Reading the Input
10 Reading the Input
11 Reading the Input
12 Reading the Input
13 accept Input finished Reading the Input
14 Rejection
15 Rejection
16 Rejection
17 Rejection
18 reject Input finished Rejection
19 Another Rejection
20 reject Another Rejection
21 Another Example
22 Another Example
23 Another Example
24 Another Example
25 accept Another Example
26 Rejection Example
27 Rejection Example
28 Rejection Example
29 Rejection Example
30 reject Input finished Rejection Example
31 Languages Accepted by FAs FA The language contains all input strings accepted by = { strings that bring to an accepting state}
32 Example accept
33 Example accept
34 Formal Definition Finite Automaton (FA) : set of states : input alphabet : transition function : initial state : set of accepting states
35 Input Alphabet
36 Set of States
37 Initial State
38 Set of Accepting States
39 Transition Function
40 Transition Function
41 Transition Function
42 Transition Function
44 Extended Transition Function
45 Extended Transition Function
46 Example = { all strings with prefix } accept
47 Example = { all strings without substring }
48 Example
49 Regular Languages Definition: A language is regular if there is FA such that Observation: All languages accepted by FAs form the family of regular languages
50 { all strings with prefix } { all strings without substring } There exist automata that accept these Languages (see previous slides). Examples of Regular Languages There exist languages which are not Regular: There is no FA that accepts such a language.
51 Alphabet = Nondeterministic Finite Automaton (NFA)
52 Nondeterministic Finite Automaton (NFA) Two choices
53 First Choice Nondeterministic Finite Automaton (NFA)
54 Second Choice No transition: the automaton hangs Nondeterministic Finite Automaton (NFA)
55 An NFA accepts a string: when there is a computation of the NFA that accepts the string There is a computation: all the input is consumed and the automaton is in an accepting state An NFA rejects a string: when there is no computation of the NFA that accepts the string. Nondeterministic Finite Automaton (NFA)
56 Lambda Transitions
57 Another NFA Example
58 Another NFA Example (2)