Presentation is loading. Please wait.

Presentation is loading. Please wait.

CMSC 723 / LING 645: Intro to Computational Linguistics September 8, 2004: Monz Regular Expressions and Finite State Automata (J&M 2) Prof. Bonnie J. Dorr.

Similar presentations


Presentation on theme: "CMSC 723 / LING 645: Intro to Computational Linguistics September 8, 2004: Monz Regular Expressions and Finite State Automata (J&M 2) Prof. Bonnie J. Dorr."— Presentation transcript:

1 CMSC 723 / LING 645: Intro to Computational Linguistics September 8, 2004: Monz Regular Expressions and Finite State Automata (J&M 2) Prof. Bonnie J. Dorr Dr. Christof Monz TA: Adam Lee

2 Regular Expressions and Finite State Automata  REs: Language for specifying text strings  Search for document containing a string –Searching for “woodchuck” Finite-state automata (FSA) (singular: automaton) How much wood would a woodchuck chuck if a woodchuck would chuck wood? –Searching for “woodchucks” with an optional final “s”

3 Regular Expressions  Basic regular expression patterns  Perl-based syntax (slightly different from other notations for regular expressions)  Disjunctions /[wW]oodchuck/

4 Regular Expressions  Ranges [A-Z]  Negations [^Ss]

5 Regular Expressions  Optional characters ?, * and + –? (0 or 1) /colou?r/  color or colour –* (0 or more) /oo*h!/  oh! or Ooh! or Ooooh! *+*+ Stephen Cole Kleene –+ (1 or more) /o+h!/  oh! or Ooh! or Ooooh!  Wild cards. - /beg.n/  begin or began or begun

6 Regular Expressions  Anchors ^ and $ –/^[A-Z]/  “Ramallah, Palestine” –/^[^A-Z]/  “¿verdad?” “really?” –/\.$/  “It is over.” –/.$/  ?  Boundaries \b and \B –/\bon\b/  “on my way” “Monday” –/\Bon\b/  “automaton”  Disjunction | –/yours|mine/  “it is either yours or mine”

7 Disjunction, Grouping, Precedence  Column 1 Column 2 Column 3 … How do we express this? /Column [0-9]+ */ /(Column [0-9]+ +)*/  Precedence –Parenthesis () –Counters * + ? {} –Sequences and anchors the ^my end$ –Disjunction |  REs are greedy!

8 Perl Commands While ($line= ){ if ($line =~ /the/){ print “MATCH: $line”; }

9 Writing correct expressions Exercise: Write a Perl regular expression to match the English article “the”: /the/ /[tT]he/ /\b[tT]he\b/ /[^a-zA-Z][tT]he[^a-zA-Z]/ /(^|[^a-zA-Z])[tT]he[^a-zA-Z]/

10 A more complex example Exercise: Write a regular expression that will match “any PC with more than 500MHz and 32 Gb of disk space for less than $1000”: /$[0-9]+/ /$[0-9]+\.[0-9][0-9]/ /\b$[0-9]+(\.[0-9][0-9])?\b/ /\b$[0-9][0-9]?[0-9]?(\.[0-9][0-9])?\b/ /\b[0-9]+ *([MG]Hz|[Mm]egahertz| [Gg]igahertz)\b/ /\b[0-9]+ *(Mb|[Mm]egabytes?)\b/ /\b[0-9](\.[0-9]+) *(Gb|[Gg]igabytes?)\b/

11 Advanced operators should be _

12 Substitutions and Memory  Substitutions s/colour/color/ s/colour/color/g s/([Cc]olour)/$1olor/ /the (.*)er they were, the $1er they will be/ /the (.*)er they (.*), the $1er they $2/ Substitute as many times as possible! Case insensitive matching s/colour/color/i  Memory ( $1, $2, etc. refer back to matches)

13 Eliza [Weizenbaum, 1966] User: Men are all alike ELIZA: IN WHAT WAY User: They’re always bugging us about something or other ELIZA: CAN YOU THINK OF A SPECIFIC EXAMPLE? User: Well, my boyfriend made me come here ELIZA: YOUR BOYFRIEND MADE YOU COME HERE User: He says I’m depressed much of the time ELIZA: I AM SORRY TO HEAR THAT YOU ARE DEPRESSED

14 Eliza-style regular expressions s/.* YOU ARE (depressed|sad).*/I AM SORRY TO HEAR YOU ARE \1/ s/.* YOU ARE (depressed|sad).*/WHY DO YOU THINK YOU ARE \1/ s/.* all.*/IN WHAT WAY/ s/.* always.*/CAN YOU THINK OF A SPECIFIC EXAMPLE/ Step 1: replace first person with second person references s/\bI(’m| am)\b /YOU ARE/g s/\bmy\b /YOUR/g S/\bmine\b /YOURS/g Step 2: use additional regular expressions to generate replies Step 3: use scores to rank possible transformations

15 Finite-state Automata  Finite-state automata (FSA)  Regular languages  Regular expressions

16 Finite-state Automata (Machines) /^baa+!$/ q0q0 q1q1 q2q2 q3q3 q4q4 baa! a state transition final state baa! baaa! baaaa! baaaaa!...

17 Input Tape aba!b q0q0 0 1234 baa ! a REJECT

18 Input Tape baaa q0q0 q1q1 q2q2 q3q3 q3q3 q4q4 ! 0 1234 baa ! a ACCEPT

19 Finite-state Automata  Q: a finite set of N states –Q = {q 0, q 1, q 2, q 3, q 4 }   : a finite input alphabet of symbols –  = {a, b, !}  q 0 : the start state  F: the set of final states –F = {q 4 }   (q,i): transition function –Given state q and input symbol i, return new state q' –  (q 3,!)  q 4

20 State-transition Tables Input Stateba! 01ØØ 1Ø2Ø 2Ø3Ø 3Ø34 4:ØØØ

21 D-RECOGNIZE function D-RECOGNIZE (tape, machine) returns accept or reject index  Beginning of tape current-state  Initial state of machine loop if End of input has been reached then if current-state is an accept state then return accept else return reject elsif transition-table [current-state, tape[index]] is empty then return reject else current-state  transition-table [current-state, tape[index]] index  index + 1 end

22 Adding a failing state q0q0 q1q1 q2q2 q3q3 q4q4 baa! a qFqF a ! b ! b! b b a !

23 Adding an “all else” arc q0q0 q1q1 q2q2 q3q3 q4q4 baa! a qFqF = = = =

24 Languages and Automata  Can use FSA as a generator as well as a recognizer  Formal language L: defined by machine M that both generates and recognizes all and only the strings of that language. –L(M) = {baa!, baaa!, baaaa!, …}  Regular languages vs. non-regular languages

25 Languages and Automata  Deterministic vs. Non-deterministic FSAs  Epsilon (  ) transitions

26 Using NFSAs to accept strings  Backup: add markers at choice points, then possibly revisit unexplored arcs at marked choice point.  Look-ahead: look ahead in input  Parallelism: look at alternatives in parallel

27 Using NFSAs Input Stateba!  01ØØØ 1Ø2ØØ 2Ø2,3ØØ 3ØØ4Ø 4:ØØØØ

28 Readings for next time  J&M Chapter 3


Download ppt "CMSC 723 / LING 645: Intro to Computational Linguistics September 8, 2004: Monz Regular Expressions and Finite State Automata (J&M 2) Prof. Bonnie J. Dorr."

Similar presentations


Ads by Google