1
Computational Language Finite State Machines and Regular Expressions
2
Plan: Regular expressions (introduction; operators; disjunction, precedence, substitution). Finite state machines (link with regular expressions; deterministic FSA; non-deterministic FSA). Lab session: regular expression implementation in UNIX (egrep)
3
Regular Expressions Basis of all web-based and word-processor-based searches. Definition 1: an algebraic notation for describing a set of strings. Definition 2: a set of rules that you can use to specify one or more items, such as words in a file, by using a single character string (Sarwar et al.)
4
Regular Expressions A search needs two things: a regular expression and a text corpus. Regular expression algebra has variants: Perl, Unix tools. Unix tools: egrep, sed, awk
5
Regular Expressions Find occurrences of /Nokia/ in the text: egrep -n 'Nokia' nokia_corpus.txt
7
Regular Expressions Suppress case distinctions Nokia or nokia
8
Regular Expressions set operator: egrep -n '[Nn]okia' nokia_corpus.txt
9
Regular Expressions Suppress other features, for example singular share or plural shares
10
Regular Expressions optional operator: egrep -n 'shares?' nokia_corpus.txt
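A minimal sketch of the optional operator, using Python's `re` module as a stand-in for egrep (the `?` operator behaves the same way in both). The sample lines are invented for illustration; they are not taken from nokia_corpus.txt.

```python
import re

# Made-up sample lines standing in for the corpus file.
lines = ["Nokia shares fell", "each share lost value", "a sharp drop"]

# Like egrep, keep every line in which the pattern matches somewhere.
# "shares?" means "share" followed by an optional "s".
matches = [line for line in lines if re.search(r"shares?", line)]
print(matches)  # ['Nokia shares fell', 'each share lost value']
```

"a sharp drop" is not selected because "sharp" does not contain "share".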
12
Regular Expressions Kleene operators: /string*/ “zero or more occurrences of previous character” /string+/ “1 or more occurrences of previous character”
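The difference between the two Kleene operators can be checked on a few candidate strings. This is a small sketch in Python's `re` (assumed here as a stand-in for egrep); the candidate strings are chosen for illustration.

```python
import re

candidates = ["b!", "ba!", "baa!", "baaa!"]

# fullmatch requires the whole string to match, which makes the counts visible.
star = [s for s in candidates if re.fullmatch(r"ba*!", s)]  # zero or more "a"
plus = [s for s in candidates if re.fullmatch(r"ba+!", s)]  # one or more "a"

print(star)  # ['b!', 'ba!', 'baa!', 'baaa!']
print(plus)  # ['ba!', 'baa!', 'baaa!']
```

Only "b!" separates the two: it has zero occurrences of "a", which `*` allows and `+` does not.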
14
Regular Expressions Wildcard operator: /string./ “any character after the previous character” Combine wildcard and kleene: /string.*/ “zero or more instances of any character after the previous character” /string.+/ “one or more instances of any character after the previous character”
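A short sketch of the wildcard alone and combined with the Kleene operators, again in Python's `re` as an assumed stand-in for egrep. The test strings are invented.

```python
import re

# /profit./ needs exactly one more character after "profit".
dot_hit = re.search(r"profit.", "profits") is not None
dot_miss = re.search(r"profit.", "profit") is None

# /profit.*/ accepts zero or more further characters; /profit.+/ needs at least one.
star_hit = re.search(r"profit.*", "profit") is not None
plus_miss = re.search(r"profit.+", "profit") is None

# .* is greedy: it runs on to the end of the line.
rest = re.search(r"profit.*", "net profit rose sharply").group()
print(rest)  # profit rose sharply
```

Because egrep prints whole lines, `profit.*` and plain `profit` select the same lines; the difference only shows in which part of the line is matched.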
15
Regular Expressions egrep -n 'profit.*' nokia_corpus.txt
16
Regular Expressions Anchors Beginning of line operator: ^ egrep '^said' nokia_corpus.txt End of line operator: $ egrep 'said$' nokia_corpus.txt
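The anchors can be sketched as follows, with Python's `re` assumed as a stand-in for egrep and three invented lines. Note that the end-of-line anchor goes after the pattern.

```python
import re

# Made-up lines; each is treated as one line of the corpus.
lines = ["said the chairman", "the chairman said", "he said so"]

at_start = [line for line in lines if re.search(r"^said", line)]
at_end = [line for line in lines if re.search(r"said$", line)]

print(at_start)  # ['said the chairman']
print(at_end)    # ['the chairman said']
```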
17
Regular Expressions Disjunction: set operator /[Ss]tring/ “a string which begins with either S or s” Range /[A-Z]tring/ “a string beginning with a capital letter” pipe | /string1|string2/ “either string 1 or string 2”
18
Regular Expressions Disjunction egrep -n 'weak|warning|drop' nokia_corpus.txt egrep -n 'weak.*|warn.*|drop.*' nokia_corpus.txt
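A sketch of the pipe and the range operator, using Python's `re` as an assumed stand-in for egrep. The sample lines and words are invented for illustration.

```python
import re

# Made-up corpus lines.
lines = ["weak demand", "a profit warning", "shares drop", "strong growth"]

# The pipe selects lines that contain any one of the alternatives.
hits = [line for line in lines if re.search(r"weak|warning|drop", line)]
print(hits)  # ['weak demand', 'a profit warning', 'shares drop']

# A range inside a set: any single capital letter, then "tring".
words = ["String", "string", "Xtring"]
capital = [w for w in words if re.fullmatch(r"[A-Z]tring", w)]
print(capital)  # ['String', 'Xtring']
```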
19
Regular Expressions Negation: /[^a-z]tring/ “any string that does not begin with a lowercase letter”
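A small check of the negated set, in Python's `re` (assumed stand-in for egrep) with invented test words. It shows that the negated set still consumes one character, and that this character can be a digit or punctuation mark as well as a capital letter.

```python
import re

words = ["String", "string", "9tring", "tring"]

# [^a-z] matches exactly one character that is NOT a lowercase letter.
not_lower = [w for w in words if re.fullmatch(r"[^a-z]tring", w)]
print(not_lower)  # ['String', '9tring']
```

"tring" on its own is rejected: there is no character in front of it for the set to match.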
20
Regular Expressions Precedence 1. Parentheses ( ) 2. Kleene and optional operators * + ? 3. Anchors and sequences 4. Disjunction operator | (a) /supply|iers/ matches /supply/ or /iers/ (b) /suppl(y|iers)/ matches /supply/ or /suppliers/
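The effect of precedence in cases (a) and (b) can be checked directly. A sketch in Python's `re`, assumed here as a stand-in for egrep:

```python
import re

words = ["supply", "iers", "suppliers"]

# (a) Sequence binds tighter than |, so the alternatives are "supply" and "iers".
without_parens = [w for w in words if re.fullmatch(r"supply|iers", w)]

# (b) Parentheses limit the disjunction to the ending.
with_parens = [w for w in words if re.fullmatch(r"suppl(y|iers)", w)]

print(without_parens)  # ['supply', 'iers']
print(with_parens)     # ['supply', 'suppliers']
```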
22
Regular Expressions Substitution sed 's/word1/word2/' corpus.txt Me: I am feeling a bit depressed today sed 's/I am/sorry to hear that you are/' corpus.txt Eliza: sorry to hear that you are feeling a bit depressed today
24
Regular Expressions Substitution sed 's/word1/word2/' corpus.txt Me: I wish I could shake this depression sed 's/wish I/am sure you/' corpus.txt Eliza: I am sure you could shake this depression
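The two Eliza substitutions can be reproduced with Python's `re.sub`, used here as an assumed stand-in for sed. `count=1` mirrors the sed command as written, which without a `g` flag replaces only the first match on each line.

```python
import re

first = re.sub(r"I am", "sorry to hear that you are",
               "I am feeling a bit depressed today", count=1)
second = re.sub(r"wish I", "am sure you",
                "I wish I could shake this depression", count=1)

print(first)   # sorry to hear that you are feeling a bit depressed today
print(second)  # I am sure you could shake this depression
```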
25
Finite State Transition Networks Finite State Automata (FSA) Like a regular expression, an FSA is used to recognise a set of strings, e.g. egrep -n 'baa+!' corpus.txt
29
Finite State Transition Networks Finite State Automata (FSA) Like a regular expression, an FSA is used to recognise a set of strings. Represented as a directed graph: a set of nodes representing states; a set of arcs, links between nodes, representing transitions between states. Arcs are labelled
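One way to write the directed graph down is as a list of labelled arcs. The sketch below does this for the pattern baa+!; the state numbers 0 to 4 and the choice of 4 as final state are assumptions made for illustration, since the slide's diagram is not reproduced in the text.

```python
# Each arc is (source state, label, target state).
arcs = [
    (0, "b", 1),
    (1, "a", 2),
    (2, "a", 3),
    (3, "a", 3),  # self-loop: any number of further "a" symbols
    (3, "!", 4),
]
final_states = {4}

# The nodes and the alphabet can be read off the arcs.
states = sorted({src for src, _, _ in arcs} | {dst for _, _, dst in arcs})
labels = sorted({label for _, label, _ in arcs})

print(states)  # [0, 1, 2, 3, 4]
print(labels)  # ['!', 'a', 'b']
```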
34
Finite State Automata How does it work? Used to recognise a set of strings. The candidate input string is represented as a segmented tape with a symbol in each cell. The string is fed into the machine one symbol at a time. If the symbol on the input matches the symbol on an arc, then A) move to the next state B) advance one symbol on the input string C) keep going until the input ends, and accept if the machine is then in a final state. Otherwise: stop and reject the string
35
Finite State Automata State Transition Table
40
Finite State Automata Algorithm for FSA (Jurafsky and Martin, p. 37)
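The recognition procedure and the state transition table can be sketched together. This is an illustration in the spirit of table-driven recognition, not a transcription of the textbook algorithm; the table is for the pattern baa+!, and the state numbering and the name `d_recognize` are assumptions.

```python
# State transition table: TABLE[state][symbol] gives the next state.
# A missing entry means there is no arc for that symbol.
TABLE = {
    0: {"b": 1},
    1: {"a": 2},
    2: {"a": 3},
    3: {"a": 3, "!": 4},
    4: {},
}
FINAL = {4}

def d_recognize(tape, start=0):
    """Return True if the deterministic FSA accepts the whole tape."""
    state = start
    for symbol in tape:
        if symbol not in TABLE[state]:
            return False  # no matching arc: stop and reject
        state = TABLE[state][symbol]  # move to next state, advance the tape
    return state in FINAL  # input used up: accept only in a final state

print(d_recognize("baaa!"))  # True
print(d_recognize("ba!"))    # False
```

"ba!" fails in state 2, which has no arc labelled "!", and "baa" fails because the input ends in state 3, which is not final.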
42
Finite State Automata FSAs and recognition FSAs and generation At each transition print out label of arc At final state stop printing
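Generation can be sketched by walking the same kind of graph and collecting arc labels. The graph is the assumed baa+! machine; choosing arcs at random is one possible policy for the state that has two outgoing arcs, not something the slide specifies.

```python
import random

# ARCS[state] lists the (label, next state) pairs leaving that state.
ARCS = {
    0: [("b", 1)],
    1: [("a", 2)],
    2: [("a", 3)],
    3: [("a", 3), ("!", 4)],
    4: [],
}
FINAL = {4}

def generate(rng, start=0):
    """Follow arcs from the start state, emitting each label, until a final state."""
    state, output = start, []
    while state not in FINAL:
        label, state = rng.choice(ARCS[state])
        output.append(label)  # "print out" the label of the arc taken
    return "".join(output)

sample = [generate(random.Random(seed)) for seed in range(5)]
print(sample)
```

Every generated string has the shape b, then two or more a, then !, so the machine generates exactly the strings it recognises.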
44
Finite State Automata Deterministic FSAs An FSA whose recognition behaviour is fully determined by the state it is in and the input symbol it is looking at Non-deterministic FSAs An FSA with decision points
45
Finite State Automata Deterministic FSAs Non-deterministic FSAs An FSA with decision points. A particular state may have a self-loop. Arcs may have ε transitions
46
Finite State Automata Deterministic FSAs Non-deterministic FSA Ways of handling a decision point: Backup: set a marker that can be returned to. Look-ahead: look ahead at the input. Parallelism: look at alternative paths in parallel
47
Finite State Automata Non-deterministic FSA: state transition table
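A non-deterministic transition table maps a state and a symbol to a set of next states. The sketch below uses the parallelism strategy: it tracks every state the machine could be in. The machine is an assumed non-deterministic variant of baa+! with a decision point in state 2, and ε transitions are left out to keep it short.

```python
# NTABLE[state][symbol] is the SET of possible next states.
NTABLE = {
    0: {"b": {1}},
    1: {"a": {2}},
    2: {"a": {2, 3}},  # decision point: loop in 2 or move on to 3
    3: {"!": {4}},
    4: {},
}
FINAL = {4}

def nd_recognize(tape, start=0):
    """Return True if at least one path through the NFA accepts the tape."""
    current = {start}
    for symbol in tape:
        current = {nxt
                   for state in current
                   for nxt in NTABLE[state].get(symbol, set())}
        if not current:
            return False  # every alternative path has died
    return bool(current & FINAL)

print(nd_recognize("baaa!"))  # True
print(nd_recognize("ba!"))    # False
```

Backup would explore one alternative at a time and return to the marker on failure; tracking a set of states reaches the same answer without backtracking.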
48
Finite State Automata Formal language Set of strings Finite symbol set, alphabet
50
Finite State Automata Formal language Set of strings Finite symbol set, alphabet L(m) = {baa!, baaa!, baaaa!, …} “formal language characterised by m” m = model L = formal language
51
Finite State Automata Formal language Set of strings Finite symbol set, alphabet L(m) = {baa!, baaa!, baaaa!, …} A formal language models a fragment of a natural language
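The idea of a language as a set of strings over a finite alphabet can be made concrete by enumeration. In this sketch the model m is assumed to be the pattern baa+! from the earlier egrep example, the alphabet is assumed to be {a, b, !}, and the function name `language` is illustrative.

```python
import re
from itertools import product

ALPHABET = "ab!"

def language(max_len):
    """List every string over ALPHABET, up to max_len symbols, that m accepts."""
    accepted = []
    for length in range(max_len + 1):
        for symbols in product(ALPHABET, repeat=length):
            candidate = "".join(symbols)
            if re.fullmatch(r"baa+!", candidate):
                accepted.append(candidate)
    return accepted

print(language(6))  # ['baa!', 'baaa!', 'baaaa!']
```

The full language is infinite; the length bound only makes a finite slice of it printable.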