Published by Ursula Bates. Modified over 9 years ago.
1
What is a language?
An alphabet is a well-defined set of characters. The symbol Σ is typically used to represent an alphabet.
A string is a finite sequence of alphabet symbols. It can be ε, the empty string (some texts use λ for the empty string).
A language, L, is simply any set of strings (finite or infinite) over a fixed alphabet. It can be { }, the empty language.
2
What is a language? (cont’d)
Examples:
Alphabet: A-Z; Language: English
Alphabet: ASCII; Language: C++
3
Suppose Σ = {a,b,c}. Some languages over Σ could be:
{aa,ab,ac,bb,bc,cc}
{ab,abc,abcc,abccc,...}
{ε}
{ }
{a,b,c,ε}
…
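As a minimal sketch (the names `SIGMA`, `EPSILON`, `is_language_over`, and `in_ab_then_cs` are illustrative and not from the slides), the alphabet and the finite languages listed above can be modeled as Python sets of strings, and the infinite language {ab, abc, abcc, ...} as a membership test:

```python
# Alphabet from the slide; symbols are single-character strings.
SIGMA = {"a", "b", "c"}
EPSILON = ""  # the empty string


def is_string_over(s, alphabet):
    """True if s is a finite sequence of symbols drawn from alphabet."""
    return all(ch in alphabet for ch in s)


def is_language_over(language, alphabet):
    """True if every string in the (finite) set is a string over alphabet."""
    return all(is_string_over(s, alphabet) for s in language)


# Finite languages over SIGMA, including the empty language.
finite_languages = [
    {"aa", "ab", "ac", "bb", "bc", "cc"},
    {EPSILON},
    set(),
    {"a", "b", "c", EPSILON},
]


def in_ab_then_cs(s):
    """Membership test for the infinite language {ab, abc, abcc, ...}."""
    return s.startswith("ab") and set(s[2:]) <= {"c"}
```

An infinite language cannot be stored as a set, so a predicate stands in for it here.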
4
What is a language? (cont’d)
Alphabet    Languages
{0,1}       {0,10,100,1000,10000,…}
            {0,1,00,11,000,111,…}
{a,b,c}     {abc, aabbcc, aaabbbccc}
5
Regular Languages
Formally describe tokens in the language, using:
Regular Expressions
NFA
DFA
6
Regular Expressions
A regular expression is a set of rules (techniques) for constructing sequences of symbols (strings) from an alphabet.
If A is a regular expression, then L(A) is the language defined by that regular expression.
L(“c”) is the language with the single word “c”.
L(“i” “f”) is the language with just “if” in it.
7
Regular Expressions (cont’d)
L(“if” | “then” | “else”) is the language with just the words “if”, “then”, and “else”. L((“0” | “1”)(“0” | “1”)) is the language consisting of “00”, “01”, “10” and “11”.
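These two languages can be checked with Python's `re` module. This is a sketch under the assumption that Python's pattern syntax is an acceptable stand-in for the slide notation; `re.fullmatch` is used so that the whole string must belong to the language. The helper name `matches` is illustrative.

```python
import re
from itertools import product


def matches(pattern, s):
    """True if the whole string s is in the language of pattern."""
    return re.fullmatch(pattern, s) is not None


KEYWORDS = r"if|then|else"
TWO_BITS = r"(0|1)(0|1)"

# Enumerate every string over {0,1} of length 0..3 and keep the matches.
two_bit_language = sorted(
    "".join(bits)
    for n in range(4)
    for bits in product("01", repeat=n)
    if matches(TWO_BITS, "".join(bits))
)
```

Enumerating all short strings and filtering is only practical for tiny alphabets, but it makes the set L(r) visible.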
8
Regular Expressions (cont’d)
Let Σ be an alphabet and r a regular expression. Then L(r) is the language characterized by the rules of r.
9
Rules
Fix an alphabet Σ.
ε is a regular expression (denotes the language {ε}).
If a is in Σ, then a is a regular expression (denotes the language {a}).
If r and s are regular expressions denoting L(r) and L(s) respectively, then so are:
(r) | (s) is a regular expression (denotes the language L(r) ∪ L(s))
(r)(s) is a regular expression (denotes the language L(r)L(s))
(r)* is a regular expression (denotes the language (L(r))*)
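The inductive rules translate directly into a recursive function. The sketch below assumes a hypothetical encoding of regular expressions as nested tuples (`("eps",)`, `("sym", a)`, `("alt", r, s)`, `("cat", r, s)`, `("star", r)`); because L(r) may be infinite, it only returns the strings of length at most `max_len`.

```python
def language(r, max_len):
    """Return the strings of L(r) whose length is at most max_len."""
    kind = r[0]
    if kind == "eps":                      # epsilon denotes {""}
        return {""}
    if kind == "sym":                      # a denotes {a}
        return {r[1]} if max_len >= 1 else set()
    if kind == "alt":                      # (r)|(s) denotes L(r) union L(s)
        return language(r[1], max_len) | language(r[2], max_len)
    if kind == "cat":                      # (r)(s) denotes L(r)L(s)
        left = language(r[1], max_len)
        right = language(r[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if kind == "star":                     # (r)* denotes (L(r))*
        base = language(r[1], max_len)
        result = {""}
        frontier = {""}
        while frontier:                    # keep appending until nothing new fits
            frontier = {
                x + y for x in frontier for y in base if len(x + y) <= max_len
            } - result
            result |= frontier
        return result
    raise ValueError(f"unknown node kind: {kind!r}")


A_OR_B = ("alt", ("sym", "a"), ("sym", "b"))
ANY_AB = ("star", A_OR_B)
ENDS_IN_B = ("cat", ANY_AB, ("sym", "b"))
```

Each branch of the function corresponds to one rule, which is the main point of the encoding; a production scanner would compile to an automaton instead.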
10
Example
11
Regular Expression Operation
There are three basic operations in regular expressions:
Alternation (union): RE1 | RE2
Concatenation: RE1 RE2
Repetition (closure): RE* (zero or more REs)
12
Regular Expression Operation
If P and Q are regular expressions over Σ, then so are:
P | Q (union): if P denotes the set {a,…,e} and Q denotes the set {0,…,9}, then P | Q denotes the set {a,…,e,0,…,9}.
PQ (concatenation): if P denotes the set {a,…,e} and Q denotes the set {0,…,9}, then PQ denotes the set {a0,…,e0,a1,…,e9}.
Q* (closure): if Q denotes the set {0,…,9}, then Q* denotes the set {ε,0,…,9,00,…,99,…}.
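The same three operations can be carried out directly on the sets P and Q. In this sketch the function names are illustrative, and the closure is truncated to at most `k` repetitions because Q* is infinite; note that zero repetitions contributes the empty string.

```python
def union(p, q):
    """Language denoted by P | Q."""
    return p | q


def concat(p, q):
    """Language denoted by PQ: every string of P followed by every string of Q."""
    return {x + y for x in p for y in q}


def power(q, i):
    """Q concatenated with itself i times; power(q, 0) is {""}."""
    result = {""}
    for _ in range(i):
        result = concat(result, q)
    return result


def closure_up_to(q, k):
    """Finite part of Q*: the union of Q^0, Q^1, ..., Q^k."""
    out = set()
    for i in range(k + 1):
        out |= power(q, i)
    return out


P = set("abcde")        # {a,...,e}
Q = set("0123456789")   # {0,...,9}
```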
13
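Examples
If Σ = {a,b}:
(a | b)*b denotes all strings of a's and b's that end with b
b(a | b)* denotes all strings of a's and b's that begin with b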
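A quick way to see what the two example expressions accept is to test a few strings with Python's `re.fullmatch`. This is a sketch; the constant and function names are illustrative.

```python
import re

ENDS_WITH_B = r"(a|b)*b"     # any mix of a's and b's, then a final b
STARTS_WITH_B = r"b(a|b)*"   # a leading b, then any mix of a's and b's


def accepts(pattern, s):
    """True if the entire string s is in the language of pattern."""
    return re.fullmatch(pattern, s) is not None
```

The string "b" belongs to both languages, while the empty string belongs to neither, since each expression requires at least one b.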
14
Regular Expression Overview
Expression   Meaning
ε            Empty pattern
a            Any pattern represented by ‘a’
ab           Strings with pattern ‘a’ followed by ‘b’
a|b          Strings consisting of pattern ‘a’ or ‘b’
a*           Zero or more occurrences of patterns in ‘a’
a+           One or more occurrences of patterns in ‘a’
a³           Patterns in ‘a’ repeated exactly 3 times
15
L(R) = the language defined by R
A regular expression R describes a set of strings of characters denoted L(R).
L(abc) = { abc }
L(hello|goodbye) = { hello, goodbye }
L(1(0|1)*) = all binary numbers that start with a 1
Each token can be defined using a regular expression.
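The claim about L(1(0|1)*) can be spot-checked in Python: the binary representation of every positive integer has no leading zeros, so it should be in the language. This is a sketch; `in_language` is an illustrative name.

```python
import re

BINARY_LEADING_ONE = re.compile(r"1(0|1)*")


def in_language(s):
    """True if s is a binary numeral that starts with 1."""
    return BINARY_LEADING_ONE.fullmatch(s) is not None


# bin(n) returns e.g. '0b101'; drop the '0b' prefix to get the digits.
numerals = [bin(n)[2:] for n in range(1, 200)]
```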
16
RE Notational Shorthand
R+        one or more strings of R: R(R*)
R?        optional R: (R|ε)
[abcd]    one of the listed characters: (a|b|c|d)
[a-z]     one character from this range: (a|b|c|d|...|z)
[^ab]     any character except the listed ones
[^a-z]    any character not from this range
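Each shorthand is only an abbreviation for an expression built from the basic operations. The sketch below checks this by brute force in Python, comparing two patterns on every string up to a small length; the helper `same_language` is illustrative, and the `[^ab]` comparison assumes the alphabet is restricted to {a,b,c,d}.

```python
import re
from itertools import product


def same_language(p1, p2, alphabet, max_len):
    """True if p1 and p2 accept the same strings of length <= max_len."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            s = "".join(symbols)
            if bool(re.fullmatch(p1, s)) != bool(re.fullmatch(p2, s)):
                return False
    return True


ALPHABET = "abcd"

# (shorthand, expansion) pairs; an empty alternative plays the role of epsilon.
EQUIVALENT_PAIRS = [
    (r"(ab)+", r"(ab)(ab)*"),
    (r"a?b", r"(a|)b"),
    (r"[abcd]", r"(a|b|c|d)"),
    (r"[a-d]", r"(a|b|c|d)"),
    (r"[^ab]", r"(c|d)"),  # valid only over the assumed alphabet {a,b,c,d}
]
```

A bounded check like this is evidence, not a proof; it is enough to catch a mistaken expansion such as confusing R+ with R*.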
17
Regular Expression, R    Strings in L(R)
a                        “a”
ab                       “ab”
a|b                      “a”, “b”
(ab)*                    “”, “ab”, “abab”, ...
(a|ε)b                   “ab”, “b”
digit = [0-9]            “0”, “1”, “2”, ...
posint = digit+          “8”, “412”, “23”, “34”, ...
18
More Examples
All strings that start with “tab” or end with “bat”:
tab{A,…,Z,a,…,z}* | {A,…,Z,a,…,z}*bat
All strings in which {1,2,3} exist in ascending order:
{A,…,Z}*1{A,…,Z}*2{A,…,Z}*3{A,…,Z}*
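In Python's notation the letter sets become character classes. The following sketch assumes `[A-Za-z]` for {A,…,Z,a,…,z} and `[A-Z]` for {A,…,Z}; the constant and function names are illustrative.

```python
import re

# Starts with "tab" or ends with "bat", letters only.
TAB_OR_BAT = re.compile(r"tab[A-Za-z]*|[A-Za-z]*bat")

# The digits 1, 2, 3 appear once each, in ascending order, among capitals.
ASCENDING_123 = re.compile(r"[A-Z]*1[A-Z]*2[A-Z]*3[A-Z]*")


def starts_tab_or_ends_bat(s):
    return TAB_OR_BAT.fullmatch(s) is not None


def has_123_in_order(s):
    return ASCENDING_123.fullmatch(s) is not None
```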
19
Defining Our Language
The first things we can define in our language are keywords. These are easy:
if | else | while | find | …
When we scan a file, we can either have a single token represent all keywords, or else break them down into groups, such as “commands”, “types”, etc.
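The two scanning choices can be contrasted in a few lines of Python. This is only a sketch: it covers just the four keywords that are listed, and the assignment of keywords to groups (`CONTROL`, `COMMAND`) is a hypothetical one chosen for illustration.

```python
import re

# Option 1: one token kind for every keyword.
KEYWORD = re.compile(r"if|else|while|find")

# Option 2: keywords split into groups (hypothetical grouping).
KEYWORD_GROUPS = {
    "if": "CONTROL",
    "else": "CONTROL",
    "while": "CONTROL",
    "find": "COMMAND",
}


def token_single(lexeme):
    """Return 'KEYWORD' for any keyword, otherwise None."""
    return "KEYWORD" if KEYWORD.fullmatch(lexeme) else None


def token_grouped(lexeme):
    """Return the keyword's group name, otherwise None."""
    return KEYWORD_GROUPS.get(lexeme)
```

A single token kind keeps the scanner small and pushes the distinction to the parser; grouped tokens do the opposite.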
20
Language Def (cont’d)
Next we will define integers in a language:
digit = “0” | “1” | “2” | “3” | “4” | “5” | “6” | “7” | “8” | “9”
integer = {digit}+
Note that we can abbreviate ranges using the dash (“-”). Thus, digit = 0-9
Relation = ‘<’ | ‘<=’ | ‘>’ | ‘>=’ | ‘<>’ | ‘=’
Floating point numbers are not much more complicated:
float = {digit}+ “.” {digit}+
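The named definitions compose naturally: `digit` is defined once and reused inside `integer` and `float`. A minimal Python sketch of that idea follows, with `classify` as an illustrative helper that reports which definition a complete lexeme satisfies.

```python
import re

DIGIT = r"[0-9]"  # digit = 0-9

# Each entry reuses DIGIT, mirroring the {digit} references in the definitions.
TOKEN_PATTERNS = [
    ("float", rf"{DIGIT}+\.{DIGIT}+"),
    ("integer", rf"{DIGIT}+"),
    ("relation", r"<=|>=|<>|<|>|="),
]


def classify(lexeme):
    """Return the name of the first definition that matches the whole lexeme."""
    for name, pattern in TOKEN_PATTERNS:
        if re.fullmatch(pattern, lexeme):
            return name
    return None
```

The longer relation operators are listed before the shorter ones because Python's alternation takes the first alternative that succeeds; this does not matter with `fullmatch`, but it would in a scanner that matches prefixes of the input.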
21
Language Def (cont’d)
Identifiers are strings of letters, underscores, or digits beginning with a letter.
letter = a-z | A-Z
digit = 0-9
identifier = ({letter})({letter} | “_” | {digit})*
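The identifier definition can be transcribed into Python almost symbol for symbol. Note that, as written, the expression requires the first character to be a letter, so a leading underscore is rejected. This is a sketch; `is_identifier` is an illustrative name.

```python
import re

LETTER = r"[a-zA-Z]"  # letter = a-z | A-Z
DIGIT = r"[0-9]"      # digit = 0-9

# identifier = ({letter})({letter} | "_" | {digit})*
IDENTIFIER = re.compile(rf"{LETTER}({LETTER}|_|{DIGIT})*")


def is_identifier(s):
    """True if the whole string s is an identifier under this definition."""
    return IDENTIFIER.fullmatch(s) is not None
```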
22
Real-world example
What is the regular expression that defines all phone numbers?
Σ = { 0-9 }
Area = {digit}³
Exchange = {digit}³
Local = {digit}⁴
Phone_number = “(” {Area} “)” {Exchange} {Local}
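A Python rendering of this definition might look as follows. It is a sketch that follows the definition literally: parentheses around the area code and no separators between the parts. The named groups and the function `parse_phone` are illustrative additions.

```python
import re

# "(" Area ")" Exchange Local, with 3, 3 and 4 digits respectively.
PHONE_NUMBER = re.compile(
    r"\((?P<area>[0-9]{3})\)"
    r"(?P<exchange>[0-9]{3})"
    r"(?P<local>[0-9]{4})"
)


def parse_phone(s):
    """Return the three parts as a dict, or None if s is not in the language."""
    m = PHONE_NUMBER.fullmatch(s)
    return m.groupdict() if m else None
```

Formats with dashes or spaces fall outside this language and would need additional alternatives.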