Computational Language Finite State Machines and Regular Expressions.


1 Computational Language Finite State Machines and Regular Expressions

2 Plan Regular expressions: Introduction; Operators; Disjunction, precedence, substitution. Finite State Machines: Link with regular expressions; Deterministic FSA; Non-deterministic FSA. Lab session: regular expression implementation in UNIX (egrep)

3 Regular Expressions Basis of all web-based and word-processor-based searches Definition 1. An algebraic notation for describing a string Definition 2. A set of rules that you can use to specify one or more items, such as words in a file, by using a single character string (Sarwar et al.)

4 Regular Expressions regular expression, text corpus regular expression algebra has variants: Perl, Unix tools Unix tools: egrep, sed, awk

5 Regular Expressions Find occurrences of /Nokia/ in the text egrep -n 'Nokia' nokia_corpus.txt

6 Regular Expressions egrep -n 'Nokia' nokia_corpus.txt

7 Regular Expressions Suppress case distinctions Nokia or nokia

8 Regular Expressions set operator egrep -n '[Nn]okia' nokia_corpus.txt

9 Regular Expressions Suppress other features, for example singular share or plural shares

10 Regular Expressions optional operator egrep -n 'shares?' nokia_corpus.txt

11 Regular Expressions egrep -n 'shares?' nokia_corpus.txt

12 Regular Expressions Kleene operators: /string*/ “zero or more occurrences of previous character” /string+/ “1 or more occurrences of previous character”

13 Regular Expressions Wildcard operator: /string./ “any single character following the preceding character”

14 Regular Expressions Wildcard operator: /string./ “any single character following the preceding character” Combine wildcard and Kleene: /string.*/ “zero or more instances of any character following the preceding character” /string.+/ “one or more instances of any character following the preceding character”
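
The set, optional, Kleene and wildcard operators above can be tried out in a few lines. This is only a sketch: it uses Python's re module as a stand-in for egrep (for these particular patterns the two notations coincide), and the corpus lines and the grep helper are made up for illustration, not taken from nokia_corpus.txt.

```python
import re

def grep(pattern, lines):
    """Return the lines that contain a match, roughly as egrep would select them."""
    return [line for line in lines if re.search(pattern, line)]

# Hypothetical lines standing in for nokia_corpus.txt.
corpus = [
    "Nokia said its share price fell",
    "nokia shares dropped again",
    "profits were weak",
]

set_hits = grep(r"[Nn]okia", corpus)      # set operator: N or n
optional_hits = grep(r"shares?", corpus)  # the final s is optional
wildcard_hits = grep(r"profit.", corpus)  # "profit" plus any one character

# Kleene operators apply to the preceding character only (here the g).
words = ["strin", "string", "stringgg"]
star_hits = [w for w in words if re.fullmatch(r"string*", w)]  # zero or more g
plus_hits = [w for w in words if re.fullmatch(r"string+", w)]  # one or more g

# .* runs to the end of the line: the matched span grows, but the same
# lines are selected as with the plain pattern 'profit'.
span = re.search(r"profit.*", "profits were weak").group(0)
```

Note that /string*/ accepts "strin", because the star allows zero occurrences of the g.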

15 Regular Expressions egrep -n 'profit.*' nokia_corpus.txt

16 Regular Expressions Anchors Beginning of line operator: ^ egrep '^said' nokia_corpus.txt End of line operator: $ egrep 'said$' nokia_corpus.txt
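
A small sketch of the two anchors, again with Python's re module standing in for egrep. The three sample lines are invented for illustration. The caret goes before the pattern and the dollar sign after it.

```python
import re

# Hypothetical lines; each one contains "said" in a different position.
lines = [
    "said the chairman",
    "the chairman said",
    "he said it twice",
]

start_hits = [l for l in lines if re.search(r"^said", l)]  # line begins with said
end_hits = [l for l in lines if re.search(r"said$", l)]    # line ends with said
```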

17 Regular Expressions Disjunction: set operator /[Ss]tring/ “a string which begins with either S or s” Range /[A-Z]tring/ “a string beginning with a capital letter” pipe | /string1|string2/ “either string 1 or string 2”

18 Regular Expressions Disjunction egrep -n 'weak|warning|drop' nokia_corpus.txt egrep -n 'weak.*|warn.*|drop.*' nokia_corpus.txt

19 Regular Expressions Negation: /[^a-z]tring/ “any string that does not begin with a lowercase letter”
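
The set, range, pipe and negation forms can be compared side by side. A minimal sketch, assuming Python's re module as a stand-in for egrep; the candidate words are made up, and fullmatch is used so that each pattern has to cover the whole word.

```python
import re

def matching(pattern, words):
    """Return the words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]

words = ["String", "string", "1tring"]

set_hits = matching(r"[Ss]tring", words)       # S or s
range_hits = matching(r"[A-Z]tring", words)    # any capital letter
negated_hits = matching(r"[^a-z]tring", words) # anything except a lowercase letter

pipe_hits = matching(r"weak|warning|drop", ["weak", "warning", "drop", "weaker"])
```

The negated set still requires one character in that position, so it accepts "1tring" as well as "String".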

20 Regular Expressions Precedence 1. Parentheses 2. Kleene and optional operators * + ? 3. Anchors and sequences 4. Disjunction operator | (a) /supply|iers/ matches “supply” or “iers” (b) /suppl(y|iers)/ matches “supply” or “suppliers”
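
The effect of precedence on the two supply patterns can be checked directly. A sketch using Python's re module (an assumption; the slides use egrep), with fullmatch so that the whole word must be covered.

```python
import re

candidates = ["supply", "suppliers", "iers"]

# Sequence binds tighter than |, so this reads as (supply)|(iers).
without_parens = [w for w in candidates if re.fullmatch(r"supply|iers", w)]

# Parentheses limit the disjunction to the endings y and iers.
with_parens = [w for w in candidates if re.fullmatch(r"suppl(y|iers)", w)]
```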

21 Regular Expressions Substitution sed 's/word1/word2/' corpus.txt Me: I am feeling a bit depressed today sed 's/I am/sorry to hear that you are/' corpus.txt

22 Regular Expressions Substitution sed 's/word1/word2/' corpus.txt Me: I am feeling a bit depressed today sed 's/I am/sorry to hear that you are/' corpus.txt Eliza: sorry to hear that you are feeling a bit depressed today

23 Regular Expressions Substitution sed 's/word1/word2/' corpus.txt Me: I wish I could shake this depression sed Eliza: I am sure you could shake this depression

24 Regular Expressions Substitution sed 's/word1/word2/' corpus.txt Me: I wish I could shake this depression sed 's/wish I/am sure you/' corpus.txt Eliza: I am sure you could shake this depression
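
The two Eliza-style substitutions can be reproduced with re.sub. This is a sketch rather than the lab's method: Python stands in for sed, count=1 is used to mimic sed replacing only the first match on a line when no g flag is given, and the helper name eliza_reply is invented for illustration.

```python
import re

def eliza_reply(pattern, replacement, utterance):
    """Replace the first match only, like sed 's/pattern/replacement/'."""
    return re.sub(pattern, replacement, utterance, count=1)

reply_1 = eliza_reply(r"I am", "sorry to hear that you are",
                      "I am feeling a bit depressed today")
reply_2 = eliza_reply(r"wish I", "am sure you",
                      "I wish I could shake this depression")
```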

25 Finite State Transition Networks Finite State Automata (FSA) Just as a regular expression, used to recognise a set of strings e.g. egrep -n 'baa+!' corpus.txt

26 Finite State Transition Networks Finite State Automata (FSA) Just as a regular expression, used to recognise a set of strings Represented as a directed graph

27 Finite State Transition Networks Finite State Automata (FSA) Just as a regular expression, used to recognise a set of strings Represented as a directed graph Set of nodes representing states

28 Finite State Transition Networks Finite State Automata (FSA) Just as a regular expression, used to recognise a set of strings Represented as a directed graph Set of nodes representing states Set of arcs, links between nodes, representing transitions between states

29 Finite State Transition Networks Finite State Automata (FSA) Just as a regular expression, used to recognise a set of strings Represented as a directed graph Set of nodes representing states Set of arcs, links between nodes, representing transitions between states Arcs are labelled

30 Finite State Automata How does it work? used to recognise a set of strings

31 Finite State Automata How does it work? used to recognise a set of strings Candidate input string represented as a segmented tape with a symbol for each cell

32 Finite State Automata How does it work? used to recognise a set of strings Candidate input string represented as a segmented tape with a symbol for each cell String slowly fed into machine

33 Finite State Automata How does it work? used to recognise a set of strings Candidate input string represented as a segmented tape with a symbol for each cell String slowly fed into machine If symbol on input matches symbol on arc, then A) move to next state B) advance one symbol on input string C) keep going till final state or input ends

34 Finite State Automata How does it work? used to recognise a set of strings Candidate input string represented as a segmented tape with a symbol for each cell String slowly fed into machine If symbol on input matches symbol on arc, then A) move to next state B) advance one symbol on input string C) keep going till final state or input ends Otherwise: stop and reject string
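
The recognition steps above can be sketched as a table lookup in a loop. The machine below is an assumed deterministic FSA for /baa+!/; the state numbers and the dictionary layout are illustrative, since the original state transition tables are not reproduced in this transcript. The sketch also assumes that a string is accepted only if the whole input is consumed and the machine ends in a final state.

```python
# Assumed transition table for /baa+!/: (state, symbol) -> next state.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # self-loop for any further a
    (3, "!"): 4,
}
START = 0
FINAL = {4}

def recognise(tape):
    """Feed the tape one symbol at a time; reject as soon as no arc matches."""
    state = START
    for symbol in tape:
        next_state = TRANSITIONS.get((state, symbol))
        if next_state is None:
            return False      # no arc labelled with this symbol: stop and reject
        state = next_state    # move to the next state, advance one symbol
    return state in FINAL     # input exhausted: accept only in a final state

results = {s: recognise(s) for s in ["baa!", "baaaa!", "ba!", "baa", "baa!a"]}
```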

35 Finite State Automata State Transition Table

36 Finite State Automata State Transition Table

37 Finite State Automata State Transition Table

38 Finite State Automata State Transition Table

39 Finite State Automata State Transition Table

40 Finite State Automata Algorithm for FSA (Jurafsky and Martin, p. 37)

41 Finite State Automata FSAs and recognition

42 Finite State Automata FSAs and recognition FSAs and generation At each transition print out label of arc At final state stop printing
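
Generation can be sketched by walking the same kind of graph and collecting arc labels instead of checking them. The machine is again an assumed FSA for /baa+!/, and choosing arcs at random is just one illustrative way to pick a path; the slides do not say how a path is chosen.

```python
import random

# Assumed arcs for /baa+!/: state -> list of (label, next state).
ARCS = {
    0: [("b", 1)],
    1: [("a", 2)],
    2: [("a", 3)],
    3: [("a", 3), ("!", 4)],
}
START = 0
FINAL = {4}

def generate(rng):
    """Follow arcs from the start state, emitting each label, until a final state."""
    state = START
    output = []
    while state not in FINAL:
        label, state = rng.choice(ARCS[state])
        output.append(label)  # print out the label of the arc just taken
    return "".join(output)

sample = generate(random.Random(0))
```

Every string produced this way is one that the corresponding recogniser would accept.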

43 Finite State Automata Deterministic FSAs An FSA whose recognition behaviour is fully determined by the state it is in and the input symbol it is looking at

44 Finite State Automata Deterministic FSAs An FSA whose recognition behaviour is fully determined by the state it is in and the input symbol it is looking at Non-deterministic FSAs An FSA with decision points

45 Finite State Automata Deterministic FSAs Non-deterministic FSAs An FSA with decision points Self-loop may be in a particular state Arcs may have ε transitions

46 Finite State Automata Deterministic FSAs Non-deterministic FSA Backup: set a marker that can be returned to Look-ahead: look ahead at input Parallelism: look at alternative paths in parallel

47 Finite State Automata Non-deterministic FSA: state transition table
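
For a non-deterministic FSA, each cell of the transition table holds a set of states rather than a single state. The sketch below uses the parallelism strategy: it tracks every state the machine could be in at once. The machine is an assumed non-deterministic variant for /baa+!/ with one decision point; ε transitions are left out to keep the example short.

```python
# Assumed non-deterministic table: (state, symbol) -> set of possible next states.
NFSA = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},  # decision point: stay in 2 or move on to 3
    (3, "!"): {4},
}
START = 0
FINAL = {4}

def recognise_parallel(tape):
    """Follow all alternative paths at once by keeping a set of current states."""
    current = {START}
    for symbol in tape:
        current = {nxt
                   for state in current
                   for nxt in NFSA.get((state, symbol), set())}
        if not current:
            return False          # every path has died out
    return bool(current & FINAL)  # accept if any path ended in a final state

checks = {s: recognise_parallel(s) for s in ["baa!", "baaaa!", "ba!", "baa"]}
```

Backup would explore one path at a time and return to a saved marker on failure; the set-based version trades that bookkeeping for a slightly larger state per step.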

48 Finite State Automata Formal language Set of strings Finite symbol set, alphabet

49 Finite State Automata Formal language Set of strings Finite symbol set, alphabet

50 Finite State Automata Formal language Set of strings Finite symbol set, alphabet L(m) = {baa!, baaa!, baaaa!,…} “formal language characterised by m” m = model L = formal language

51 Finite State Automata Formal language Set of strings Finite symbol set, alphabet L(m) = {baa!, baaa!, baaaa!,…} A formal language models a fragment of a natural language
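
One way to make L(m) concrete is to enumerate every string over the alphabet up to some length and keep those the model accepts. This sketch assumes m is a deterministic FSA for /baa+!/ from the earlier slide and that the alphabet is {b, a, !}; the function names and the length bound are illustrative.

```python
from itertools import product

ALPHABET = ["b", "a", "!"]

# Assumed model m: transition table for /baa+!/.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,
    (3, "!"): 4,
}
FINAL = {4}

def accepts(string):
    """Return True if the model ends in a final state after reading the string."""
    state = 0
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return False
    return state in FINAL

def language(max_length):
    """List the members of L(m) whose length is at most max_length."""
    members = []
    for n in range(max_length + 1):
        for symbols in product(ALPHABET, repeat=n):
            candidate = "".join(symbols)
            if accepts(candidate):
                members.append(candidate)
    return members

fragment = language(6)
```

The full language is infinite, so any such listing is only a finite fragment of it.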

