Published by Shana Patrick. Modified over 9 years ago.
1
Natural Language Processing Lecture 4: Regular Expressions and Automata
2
Definitions
A language is a set of strings.
String: a sequence of letters. Examples: “cat”, “dog”, “house”, …
Strings are defined over an alphabet.
3
Alphabets and Strings
We will use small alphabets.
[Figure: example alphabet and example strings]
4
Regular Expressions
In computer science, a regular expression (RE) is a language for specifying text search strings. A regular expression is a formula in a special language that specifies a simple class of strings. Formally, a regular expression is an algebraic notation for characterizing a set of strings. An RE search requires a pattern that we want to search for and a corpus of texts to search through.
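A minimal sketch of an RE search, using Python's standard `re` module. The three-line corpus and the pattern are invented for illustration; they are not taken from the lecture.

```python
import re

# Hypothetical corpus of texts to search through (invented for this sketch).
corpus = [
    "The cat sat on the mat.",
    "A dog barked at the house.",
    "No pets here.",
]

# The pattern we want to search for: the string "cat" or the string "dog".
pattern = re.compile(r"cat|dog")

# Keep every corpus line that contains at least one match of the pattern.
matching_lines = [line for line in corpus if pattern.search(line)]
print(matching_lines)
```

The search returns whole lines here, which mirrors how line-oriented search tools usually report matches.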
5
Basic Regular Expression Patterns
Brackets [] specify a disjunction of characters.
Brackets [] plus the dash - specify a range.
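The bracket notation can be tried out with Python's `re` module. The sample strings below are assumptions chosen for illustration.

```python
import re

# [wW] matches either "w" or "W": a disjunction of characters.
either_case = re.findall(r"[wW]oodchuck", "Woodchuck or woodchuck?")

# [0-9] matches any single digit; [A-Z] matches any single uppercase ASCII letter.
digits = re.findall(r"[0-9]", "Chapter 1: Down the Rabbit Hole, page 42")
capitals = re.findall(r"[A-Z]", "we should call it Drenched Blossoms")

print(either_case, digits, capitals)
```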
6
Basic Regular Expression Patterns
The caret ^ is used for negation (as the first character inside brackets) or just to mean ^.
The question mark ? marks optionality of the previous expression.
The period . specifies any single character.
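A short sketch of the caret, question mark, and period in Python's `re` module. The test strings are invented; note that in Python the period does not match a newline unless the `re.DOTALL` flag is set.

```python
import re

# [^a-z] matches any single character that is NOT a lowercase ASCII letter.
non_lower = re.findall(r"[^a-z]", "aB3c")

# When the caret is not the first character in the brackets, it just means "^".
literal_caret = re.findall(r"[e^]", "look up ^ here")

# ? makes the previous expression (here the letter s) optional.
optional_s = re.findall(r"woodchucks?", "woodchuck woodchucks")

# . matches any single character between "beg" and "n".
any_char = re.findall(r"beg.n", "begin began begun beg'n")

print(non_lower, literal_caret, optional_s, any_char)
```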
7
Disjunction, Grouping, and Precedence
Disjunction: /cat|dog/
Precedence: /gupp(y|ies)/
To find the English article the:
/the/
/[tT]he/
/[^a-zA-Z][tT]he[^a-zA-Z]/
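The three patterns for the article "the" behave differently, which a small experiment makes visible. The sentence and word lists below are assumptions made up for this sketch.

```python
import re

# Disjunction and grouping: full-string matches only.
cat_or_dog = [w for w in ["cat", "dog", "cow"] if re.fullmatch(r"cat|dog", w)]
guppies = [w for w in ["guppy", "guppies", "guppyies"]
           if re.fullmatch(r"gupp(y|ies)", w)]

text = "The other day the cat saw them there."

# /the/ misses the capitalized "The" and also fires inside longer words.
plain = re.findall(r"the", text)

# /[tT]he/ adds the capitalized form but still fires inside longer words.
either_case = re.findall(r"[tT]he", text)

# Requiring a non-letter on both sides removes the embedded matches, but it
# also misses "The" at the very start of the text, where no character precedes it.
bounded = re.findall(r"[^a-zA-Z][tT]he[^a-zA-Z]", text)

print(len(plain), len(either_case), bounded)
```

The last pattern shows why each refinement trades one kind of error for another: it removes false positives such as "other" but introduces a false negative at the start of the string.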
8
Aliases for common sets of characters
9
Regular expression operators for counting
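The table from this slide did not survive, so the sketch below only assumes that it covered the usual counting operators (`*`, `+`, `{n}`, `{n,m}`, `{n,}`) as they behave in Python's `re` module. The test strings are invented.

```python
import re

def matches(pattern, string):
    """Return True when the whole string matches the pattern."""
    return re.fullmatch(pattern, string) is not None

# * means zero or more of the previous expression; + means one or more.
star = matches(r"ba*!", "b!")
plus = matches(r"ba+!", "b!")

# {n} means exactly n occurrences.
exactly_two = matches(r"ba{2}!", "baa!")

# {n,m} means from n to m occurrences.
candidates = ["ba!", "baa!", "baaa!", "baaaa!"]
two_to_three = [s for s in candidates if matches(r"ba{2,3}!", s)]

# {n,} means at least n occurrences.
at_least_two = matches(r"ba{2,}!", "baaaaa!")

print(star, plus, exactly_two, two_to_three, at_least_two)
```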
10
Some characters that need to be backslashed
11
Finite State Automata
FSAs recognize the regular languages represented by regular expressions.
SheepTalk: /baa+!/
An FSA is a directed graph with labeled nodes and arc transitions.
Five states: q0 is the start state and q4 the final state; 5 transitions.
[Figure: SheepTalk FSA with states q0 to q4 and arcs q0 -b-> q1, q1 -a-> q2, q2 -a-> q3, q3 -a-> q3, q3 -!-> q4]
12
Formally
An FSA is a 5-tuple consisting of
Q: a set of states {q0, q1, q2, q3, q4}
Σ: an alphabet of symbols {a, b, !}
q0: a start state
F: a set of final states in Q {q4}
δ(q, i): a transition function mapping Q × Σ to Q
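One way to write the 5-tuple down in code is sketched here. The class name `FSA`, its field names, and the helper `is_well_formed` are assumptions made for this illustration; the transitions are the five SheepTalk arcs from the state transition table.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FSA:
    states: frozenset    # Q
    alphabet: frozenset  # Σ
    start: str           # q0
    finals: frozenset    # F, a subset of Q
    delta: dict          # δ stored as {(state, symbol): next_state}

sheeptalk = FSA(
    states=frozenset({"q0", "q1", "q2", "q3", "q4"}),
    alphabet=frozenset({"a", "b", "!"}),
    start="q0",
    finals=frozenset({"q4"}),
    delta={
        ("q0", "b"): "q1",
        ("q1", "a"): "q2",
        ("q2", "a"): "q3",
        ("q3", "a"): "q3",
        ("q3", "!"): "q4",
    },
)

def is_well_formed(m):
    """Check that every component only mentions members of Q and Σ."""
    arcs_ok = all(
        q in m.states and symbol in m.alphabet and target in m.states
        for (q, symbol), target in m.delta.items()
    )
    return m.start in m.states and m.finals <= m.states and arcs_ok

print(is_well_formed(sheeptalk), len(sheeptalk.delta))
```

Storing δ as a dictionary leaves undefined transitions simply absent, which matches the empty entries in the SheepTalk table.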
13
An FSA recognizes (accepts) strings of a regular language
baa!
baaa!
baaaa!
…
Tape input, a rejected input: aba!b
14
State Transition Table for SheepTalk

State   Input
        b   a   !
0       1   Ø   Ø
1       Ø   2   Ø
2       Ø   3   Ø
3       Ø   3   4
4       Ø   Ø   Ø

Accepted strings: baa!, baaa!, baaaa!, baaaaa!, ...
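The table above can drive a recognizer directly. This is a minimal sketch, not the lecture's own algorithm: the names `TABLE`, `FINAL_STATES`, and `recognize` are assumptions, and `None` stands in for the empty entry Ø.

```python
# Rows are states 0..4; None stands for Ø (no legal transition).
TABLE = {
    0: {"b": 1, "a": None, "!": None},
    1: {"b": None, "a": 2, "!": None},
    2: {"b": None, "a": 3, "!": None},
    3: {"b": None, "a": 3, "!": 4},
    4: {"b": None, "a": None, "!": None},
}
FINAL_STATES = {4}

def recognize(tape):
    """Return True if the machine ends in a final state after reading the tape."""
    state = 0
    for symbol in tape:
        # Symbols outside the alphabet are treated like an empty table entry.
        next_state = TABLE[state].get(symbol)
        if next_state is None:
            return False
        state = next_state
    return state in FINAL_STATES

for tape in ["baa!", "baaaaa!", "ba!", "aba!b"]:
    print(tape, recognize(tape))
```

The rejected input aba!b fails on its first symbol, because state 0 has no transition on a.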
15
Non-Deterministic FSAs for SheepTalk
[Figures: two non-deterministic FSAs for SheepTalk, each over states q0 to q4]
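The arc layout of the two figures is not recoverable from the transcript, so the sketch below assumes one common non-deterministic variant: state q2 has both a self-loop on a and an arc to q3 on a. The recognizer tracks the set of all states the machine could be in, which is one standard way to simulate non-determinism; the names are hypothetical.

```python
# Assumed NFA for SheepTalk: each entry maps (state, symbol) to a SET of states.
NFA_DELTA = {
    ("q0", "b"): {"q1"},
    ("q1", "a"): {"q2"},
    ("q2", "a"): {"q2", "q3"},  # the non-deterministic choice
    ("q3", "!"): {"q4"},
}

def nfa_accepts(tape, start="q0", finals=frozenset({"q4"})):
    """Simulate the NFA by following every possible path at once."""
    current = {start}
    for symbol in tape:
        current = {
            target
            for state in current
            for target in NFA_DELTA.get((state, symbol), set())
        }
        if not current:  # every path has died
            return False
    return bool(current & finals)

for tape in ["baa!", "baaa!", "ba!", "aba!b"]:
    print(tape, nfa_accepts(tape))
```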
16
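Finite Accepter
[Figure: an input string is fed to a finite automaton, whose output is “Accept” or “Reject”]
17
Transition Graph
[Figure: transition graph of the abba finite accepter, with its initial state, transitions, and final (“accept”) state labeled]
18
Initial Configuration
[Figure: initial configuration, showing the input string]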
19
Reading the Input
23
Output: “accept”
24
Rejection
28
Output: “reject”
29
Another Example
33
Output: “accept”
34
Rejection
38
Output: “reject”
39
Formalities
Deterministic Finite Accepter (DFA): M = (Q, Σ, δ, q0, F)
Q: set of states
Σ: input alphabet
δ: transition function
q0: initial state
F: set of final states
40
About Alphabets
Having an alphabet means that the input is drawn from a finite set of symbols. These symbols can, and will, stand for bigger objects that have internal structure.
41
Input Alphabet
42
Set of States
43
Initial State
44
Set of Final States
45
Transition Function
49
Transition Function
50
Extended Transition Function (reads the entire string)
54
Observation: if the extended transition function takes state q to state q′ on input string w, then there is a walk from q to q′ with label w.
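The extended transition function can be sketched recursively. The two-state DFA below (it tracks whether an even or odd number of a's has been read) is a hypothetical example, not one of the machines on the slides, and the names `DELTA`, `delta_star`, and `walk` are assumptions.

```python
# Hypothetical total transition function over the alphabet {a, b}.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q0", ("q1", "b"): "q1",
}

def delta_star(q, w):
    """Extended transition function: the state reached from q after reading all of w."""
    if w == "":
        return q  # reading the empty string leaves the state unchanged
    # Read all but the last symbol first, then take one ordinary transition.
    return DELTA[(delta_star(q, w[:-1]), w[-1])]

def walk(q, w):
    """List the states visited while reading w from q: the walk labeled w."""
    path = [q]
    for symbol in w:
        path.append(DELTA[(path[-1], symbol)])
    return path

print(delta_star("q0", "aba"), walk("q0", "aba"))
```

The last state of the walk always equals the value of `delta_star`, which is the observation stated above.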
55
Example
[Figure: DFA with its accept state marked]
56
Another Example
[Figure: DFA with its accept state marked]
57
More Examples
[Figure: DFA with an accept state and a trap state marked]
58
Language accepted = { all strings with prefix … }
[Figure: DFA with its accept state marked]
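The prefix on this slide was lost, so the sketch below assumes the prefix ab over the alphabet {a, b} purely for illustration. It shows how a trap state works: once the input deviates from the prefix, the machine enters a non-final state it can never leave.

```python
# Hypothetical DFA for "all strings with prefix ab" (the prefix is an assumption).
PREFIX_DELTA = {
    ("q0", "a"): "q1",     ("q0", "b"): "trap",
    ("q1", "a"): "trap",   ("q1", "b"): "q2",
    ("q2", "a"): "q2",     ("q2", "b"): "q2",    # accept state: stay forever
    ("trap", "a"): "trap", ("trap", "b"): "trap",  # trap state: no way out
}
PREFIX_FINALS = {"q2"}

def accepts_prefix_ab(w):
    """Run the DFA on w and report whether it stops in the accept state."""
    state = "q0"
    for symbol in w:
        state = PREFIX_DELTA[(state, symbol)]
    return state in PREFIX_FINALS

for w in ["ab", "abba", "ba", "a", ""]:
    print(repr(w), accepts_prefix_ab(w))
```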
59
Language accepted = { all strings without substring … }
60
Regular Languages
A language L is regular if there is a DFA M such that L = L(M).
All regular languages form a language family.
61
Example
The language is regular:
62
Dollars and Cents