LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 15: 10/16.



Administrivia No lecture this Thursday

Today’s Topics Midterm review Finite State Transducers (FST)

Question 1
Download the file wsj.txt (~50K lines).
Write a Perl program that finds all lines containing any possible form of the idiom take ... advantage of ...
How many are there in wsj.txt?
Submit your program.
Submit the lines returned by your program.

Question 1
First hit on Google:
–take advantage (of someone) to use someone's weakness to improve your own situation. Mr. Smith often takes advantage of my friendship and leaves the unpleasant tasks for me to do. See also: advantage, take
–take advantage (of something) to use an opportunity to get or achieve something. He took advantage of the prison's education program to earn a college degree. There are peaches and strawberries grown on the farm, and I sure take full advantage of them. Usage notes: often said of someone who has opportunities that others do not have: The rich can take advantage of clever accounting tricks to avoid taxes. See also: advantage, take
–Cambridge Dictionary of American Idioms
–Cambridge University Press 2003

Answer 1
1. Investors took advantage of Tuesday 's stock rally
2. Like other forms of arbitrage, it merely seeks to take advantage of momentary discrepancies
3. As usually practiced it takes advantage of a rather basic concept
4. So if index arbitrage is simply taking advantage of thin inefficiencies
5. `` If you could get the rhythm of the program trading, you could take advantage of it. ''
6. Mrs. Gorman took advantage of low prices
7. According to Upjohn 's estimates, only 50 % to 60 % of the 1,100 eligible employees will take advantage of the plan.
8. Nissan has increased earnings more than market share by cutting costs and by taking advantage of a general surge
9. Mr. Peladeau took his first big gamble 25 years ago, when he took advantage of a strike at La Presse
10. In addition, the two companies will develop new steam turbine technology, such as the plants ordered by Florida Power, and even utilize each other 's plants at times to take advantage of currency fluctuations.
11. One of GE 's goals when it bought 80 % of Kidder in 1986 was to take advantage of `` syngeries ''
12. I take advantage of this opportunity given to me by The Wall Street Journal
    (not a hit, to be excluded:) And taking more direct action has the advantage of avoiding sharp increases
13. To take advantage of local expertise and custom
14. Several blue-chip companies tapped the new-issue market yesterday to take advantage of falling interest rates.
15. He also noted that a strong sterling market yesterday might have helped cocoa in New York as arbitragers took advantage of the currency move.
16. My kids ' college education looms as perhaps the greatest future opportunity for spending, although I 'll probably have to cash in their toy portfolio to take advantage of it.
17. As the ad 's tone implies, the Texas spirit is pretty xenophobic these days, and Lone Star is n't alone in trying to take advantage of that.
18. IBM, which Gartner Group said generates 22 % of its revenue in this market, should be able to take advantage of its loyal following
19. Erik Keller, a Gartner Group analyst, said organizational changes may still be required to really take advantage of CIM 's capabilities

Answer 1
20. These latter-day scalawags would be ill-advised to take advantage of the situation
21. Most of trading action now is from professional traders who are trying to take advantage of the price swings
22. For instance, First Quadrant Corp., an asset allocator based in Morristown, N.J., said it quickly boosted stock positions in its `` aggressive '' accounts to 75 % from 55 % to take advantage of plunging prices Friday.
23. Others are doing `` index arbitrage '' a strategy of taking advantage of price discrepancies
24. The campaign, created by Omnicom Group 's DDB Needham agency, takes advantage of the eye-catching photography
25. According to industry lawyers, the ruling gives pipeline companies an important second chance to resolve remaining disputes and take advantage of the cost-sharing mechanism.
26. Thanks to a new air-traffic agreement and the ability of Irish travel agents to issue Aeroflot tickets, tourists here are taking advantage of Aeroflot 's reasonable prices
27. But, `` You never can tell, '' he added, `` you have to take advantage of opportunities.
28. A broad rally began when several major processors began buying futures contracts, apparently to take advantage of the price dip.
29. `` We hope to take advantage of it, ''
30. And we hope to take advantage of panics
31. To take full advantage of the financial opportunities
32. Specifically, it must understand how real-estate markets overreact to shifts in regional economies and then take advantage of these opportunities.

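Answer 1
Perl Program:
–a simple way to exclude the case shown earlier
  open (F,$ARGV[0]) or die "$ARGV[0] not found!\n";
  while (<F>) {
    # print lines with a form of "take" followed by " advantage of",
    # unless the word "the" occurs in between (e.g. "taking ... has the advantage of")
    print $_ if (/\b(take|takes|taking|taken|took)\b(.*) advantage of/ && $2 !~ /\bthe\b/)
  }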

Question 2
Give a regular grammar in Prolog notation that accepts strings with an odd number of a’s (#a’s = 1,3,5,...) followed by an even number of b’s (#b’s = 2,4,6,...), i.e. a^n b^m, n odd, m even
Examples:
–aaabb
–abbbb
–aaaaabb
–*aabb
–*aaab
Submit your program
Show it works on the given examples

Answer 2
Regular grammar in Prolog DCG format:
  s --> [a], b.
  s --> [a], d.
  b --> [a], s.
  d --> [b], e.
  e --> [b].
  e --> [b], d.
Run
  | ?- s([a,a,a,b,b],[]).
  yes
  | ?- s([a,b,b,b,b],[]).
  yes
  | ?- s([a,a,a,a,a,b,b],[]).
  yes
  | ?- s([a,a,b,b],[]).
  no
  | ?- s([a,a,a,b],[]).
  no

Question 3
Using an extra argument with regular grammar rules in Prolog DCG format, give a grammar that accepts L = a^n b^m, where n is even (n = 2,4,6,...) and m is the odd number closest to but not exceeding n/2
Note: L is a non-regular language
Examples:
–aab
–aaaab
–*aaaabb
–aaaaaabbb
–*aaaaaabbbb
–aaaaaaaabbb
–*aaaaaaaabbbb
–*aaaaaaaabbbbb
Show your program works on the above examples

Answer 3
Program
  s(X) --> [a], b(s(X)).
  b(X) --> [a], c(s(X)).
  b(X) --> [a], s(s(X)).
  c(s(s(0))) --> [b].
  c(s(s(s(s(0))))) --> [b].
  c(s(s(X))) --> [b], d(X).
  d(s(s(X))) --> [b], c(X).
Run
  | ?- s(0,[a,a,b],[]).
  yes
  | ?- s(0,[a,a,a,a,b],[]).
  yes
  | ?- s(0,[a,a,a,a,b,b],[]).
  no
  | ?- s(0,[a,a,a,a,a,a,b,b,b],[]).
  yes
  | ?- s(0,[a,a,a,a,a,a,b,b,b,b],[]).
  no
  | ?- s(0,[a,a,a,a,a,a,a,a,b,b,b],[]).
  yes
  | ?- s(0,[a,a,a,a,a,a,a,a,b,b,b,b],[]).
  no
  | ?- s(0,[a,a,a,a,a,a,a,a,b,b,b,b,b],[]).
  no

Question 4
Give a regexp for the language described in Question 2: a^n b^m, n odd, m even

Answer 4
a^n b^m, n odd, m even
a(aa)*(bb)+
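One way to sanity-check the regexp is to write a DCG whose rules mirror its three parts, a, (aa)* and (bb)+. This is only a sketch: the nonterminal names odd_a_even_b, a_pairs and b_pairs are made up for illustration, and the grammar is not in the strict regular (right-linear) format asked for in Question 2.

```prolog
% a(aa)*(bb)+ : one a, then zero or more aa, then one or more bb
odd_a_even_b --> [a], a_pairs, [b,b], b_pairs.

a_pairs --> [].                  % (aa)* : zero repetitions
a_pairs --> [a,a], a_pairs.      %         or one more aa

b_pairs --> [].                  % the remaining (bb)* after the first bb
b_pairs --> [b,b], b_pairs.
```

For example, ?- phrase(odd_a_even_b, [a,a,a,b,b]). succeeds, while ?- phrase(odd_a_even_b, [a,a,b,b]). fails.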

Question 5
Give a regexp for the complement of the following FSA
[Figure: FSA state diagram]

Answer 5
Original machine is deterministic
Flip the states (final states become non-final and vice versa)
[Figure: original FSA and the same FSA with states flipped]

Answer 5
Notice 5 is a dead-end state
Erase it
[Figure: flipped FSA before and after erasing state 5]

Answer 5
Eliminated state 5
Eliminate states 2 and 4
[Figure: reduced machine with states 1 and 3 and arcs labelled ab and ba]
(ab|ba)*

Answer 5
Eliminated state 5
Equations
–E1 = aE2 | bE4 | λ
–E2 = bE3
–E4 = aE3
–E3 = aE2 | bE4 | λ
Eliminate E4
–E1 = aE2 | baE3 | λ
–E3 = aE2 | baE3 | λ
Eliminate E2
–E1 = abE3 | baE3 | λ
–E3 = abE3 | baE3 | λ
Group E3
–E1 = (ab|ba)E3 | λ
–E3 = (ab|ba)E3 | λ
Solve E3
–E3 = (ab|ba)*
–E1 = (ab|ba)(ab|ba)* | λ = (ab|ba)*

Question 6 Give the deterministic FSA corresponding to:

Answer 6
Deterministic machine
[Figure: deterministic FSA state diagram]

Finite State Transducers
Just like Finite State Automata (FSA) except for an output tape
Mealy Machine formulation:
–at each transition, an FST can read an input symbol and output a (different) symbol onto the output tape
Background reading
–Chapter 3 of the textbook

Morphology
–words are composed of morphemes
–morpheme: basic semantic unit, e.g. -ee in employee
–Inflectional: no change in category, e.g. V -ed → V
–can carry information about tense, person, number, gender, case etc.
–Derivational: category-changing, e.g. V -able → A
–very productive

Walkers. Standees. © Sandiway Fong sign above travelator at Pittsburgh International Airport

Today’s Topic Finite State Transducers (FST) for morphological processing –... also Prolog implementation

Recall Finite State Automata (FSA) from lecture 8
–(Q,s,f,Σ,δ)
1. set of states (Q): {s,x,y} (must be a finite set)
2. start state (s): s
3. end state(s) (f): y
4. alphabet (Σ): {a, b}
5. transition function δ: signature: character × state → state
   δ(a,s)=x
   δ(a,x)=x
   δ(b,x)=y
   δ(b,y)=y
[Figure: state diagram for the FSA with states s, x, y]
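The five-part definition can be written down almost literally as Prolog facts plus a small interpreter. This is a minimal sketch and not the encoding used later in the lecture (which has one predicate per state); the predicate names delta/3, start/1, final/1, accepts/1 and run/2 are assumptions made for illustration.

```prolog
% transition function: delta(Character, State, NextState)
delta(a, s, x).
delta(a, x, x).
delta(b, x, y).
delta(b, y, y).

start(s).      % start state
final(y).      % end state

% accepts(+Input): Input is a list of characters accepted by the machine
accepts(Input) :- start(S), run(Input, S).

run([], S) :- final(S).                              % input used up in an end state
run([C|Cs], S) :- delta(C, S, S1), run(Cs, S1).      % take one transition
```

For example, ?- accepts([a,a,b,b]). succeeds and ?- accepts([b]). fails, since there is no transition on b out of s.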

Modeling English Adjectives using FSA
–from section 3.2 of textbook
examples
–big, bigger, biggest, *unbig
–cool, cooler, coolest, coolly
–red, redder, reddest, *redly
–clear, clearer, clearest, clearly, unclear, unclearly
–happy, happier, happiest, happily
–unhappy, unhappier, unhappiest, unhappily
–real, *realer, *realest, unreal, really
fsa (3.4)
Initial machine is overly simple: need more classes to make finer-grained distinctions, e.g. *unbig

Modeling English Adjectives using FSA
divide adjectives into classes
examples
–adj-root2 : big, bigger, biggest, *unbig
–adj-root2 : cool, cooler, coolest, coolly
–adj-root2 : red, redder, reddest, *redly
–adj-root1 : clear, clearer, clearest, clearly, unclear, unclearly
–adj-root1 : happy, happier, happiest, happily
–adj-root1 : unhappy, unhappier, unhappiest, unhappily
–adj-root1 : real, *realer, *realest, unreal, really
fsa (3.5)
However...
Examples
–uncooler: Smoking uncool and getting uncooler. google: 22,800 (2006), 10,900 (2005)
–*realer: google: 3,500,000 (2006), 494,000 (2005)
–*realest: google: 795,000 (2006), 415,000 (2005)

Modeling English Adjectives using FSA
e.g. *unbig
–google: 2,590 hits (2007)
morphology is productive
–morphemes carry (compositional) meaning
–can be used for dramatic effect: unbig vs. small

The Mapping Problem To map between a surface form and the decomposition of a word into its components –e.g. root +  (person/number/gender) and other features using spelling rules Example: (3.11) Notes: ^ marks a morpheme boundary # is the end-of-word marker

Stage 1: Lexical → Intermediate Levels
example:
–f o x +N +PL (lexical)
–f o x ^s# (intermediate)
lexical level:
–uninflected “dictionary” level
intermediate level:
–replace abstract morphemes by concrete ones
key
–+N : noun (fox can also be a verb, but fox +V cannot combine with +PL)
–+PL : (abstract) plural morpheme, realized in English as s (basic case)
–boundary markers ^ and # for use by the spelling rule machine (later)

Stage 1: Lexical → Intermediate Levels
example:
–f o x +N +PL (lexical)
–f o x ^s# (intermediate)
machine idea
–character-by-character correspondences
–f → f
–o → o
–x → x
–+N → ε (ε = empty string)
–+PL → ^s#
use a Finite State Machine with input/output mapping
–Finite State Transducer (FST)

Stage 1: Lexical → Intermediate Levels
Example:
–g o o s e +N +PL (lexical)
–g e e s e # (intermediate)
Example:
–g o o s e +N +SG (lexical)
–g o o s e # (intermediate)
Example:
–m o u s e +N +PL (lexical)
–m i ε c e # (intermediate)
Example:
–s h e e p +N +PL (lexical)
–s h e e p # (intermediate)
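The goose examples can be sketched as a small transducer with one predicate per state and two arguments (lexical input, intermediate output). This is an illustration only and is not taken from the textbook machine: the predicate names goose/2, sg1..sg6 and pl1..pl6 are made up, and the singular and plural paths are kept completely separate, so Prolog's backtracking picks whichever one fits the +SG or +PL tag.

```prolog
% Two ways to leave the start state: a singular path and a plural path.
goose([g|A],[g|B]) :- sg1(A,B).
goose([g|A],[g|B]) :- pl1(A,B).

% Singular path: every character is copied; +N maps to nothing, +SG to #.
sg1([o|A],[o|B]) :- sg2(A,B).
sg2([o|A],[o|B]) :- sg3(A,B).
sg3([s|A],[s|B]) :- sg4(A,B).
sg4([e|A],[e|B]) :- sg5(A,B).
sg5(['+N'|A],B) :- sg6(A,B).
sg6(['+SG'],['#']).

% Plural path: o:e twice; +N maps to nothing, +PL to # (no ^s for this irregular noun).
pl1([o|A],[e|B]) :- pl2(A,B).
pl2([o|A],[e|B]) :- pl3(A,B).
pl3([s|A],[s|B]) :- pl4(A,B).
pl4([e|A],[e|B]) :- pl5(A,B).
pl5(['+N'|A],B) :- pl6(A,B).
pl6(['+PL'],['#']).
```

The query ?- goose([g,o,o,s,e,'+N','+PL'],X). first tries the singular path, fails at the +PL tag, and then finds X = [g,e,e,s,e,#] on the plural path.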

Stage 1: Lexical → Intermediate Levels
(3.11)
Notation: input:output; f means f:f

Extension to Finite State Transducers (FST) [Mealy machine extension to FSA]
–(Q,s,f,Σ,δ)
1. set of states (Q): {s,x,y} (must be a finite set)
2. start state (s): s
3. end state(s) (f): y
4. alphabet (Σ): pairs I:O
   –I = input alphabet, O = output alphabet
   –ε may be included in I and O
5. transition function (or matrix) δ: signature: i/o pair × state → state
   δ(a:b,s)=x
   δ(a:b,x)=x
   δ(b:a,x)=y
   δ(b:ε,y)=y
[Figure: state diagram for the FST with states s, x, y and arcs labelled a:b, b:a, b:ε]
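The transducer definition above can be sketched as a transition table plus a generic interpreter. The names fst_delta/3, fst_start/1, fst_final/1, transduce/2, step/3 and emit/3 are made up for this sketch, and the atom eps stands in for ε. Only ε on the output side is handled, which is all this particular machine needs.

```prolog
% fst_delta(Input:Output, State, NextState); eps stands for the empty string
fst_delta(a:b,   s, x).
fst_delta(a:b,   x, x).
fst_delta(b:a,   x, y).
fst_delta(b:eps, y, y).

fst_start(s).
fst_final(y).

% transduce(+Input, -Output): run the machine over the input list
transduce(In, Out) :- fst_start(S), step(In, S, Out).

step([], S, []) :- fst_final(S).
step([I|Is], S, Out) :-
    fst_delta(I:O, S, S1),       % pick a transition whose input label is I
    emit(O, Out, Rest),          % write its output label (unless it is eps)
    step(Is, S1, Rest).

emit(eps, Out, Out) :- !.        % eps: nothing is written to the output
emit(O, [O|Rest], Rest).
```

For example, ?- transduce([a,a,b,b],Out). gives Out = [b,b,a]: the two a's are rewritten as b, the first b as a, and the second b is deleted.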

Finite State Automata (FSA)
recall: one possible Prolog encoding strategy
–define one predicate for each state
  taking one argument (the input string)
  consume input character
  call next state with remaining input string
–query
  ?- s(L).
  call start state s

Finite State Automata (FSA)
–define one predicate for each state
  take one argument (the input string), and consume input character
  call next state with remaining input string
–query
  ?- s(L).
  i.e. call start state s
–state s: (start state)
  s([a|L]) :- x(L).
–state x:
  x([a|L]) :- x(L).
  x([b|L]) :- y(L).
–state y: (end state)
  y([]).
  y([b|L]) :- y(L).
[Figure: state diagram for the FSA with states s, x, y]
simple extension to FST: each predicate takes two arguments: input and output

Stage 1: Lexical → Intermediate Levels
example
  s0([f|L1],[f|L2]) :- s1(L1,L2).
  s0([c|L1],[c|L2]) :- s3(L1,L2).
  s1([o|L1],[o|L2]) :- s2(L1,L2).
  s2([x|L1],[x|L2]) :- s5(L1,L2).
  s3([a|L1],[a|L2]) :- s4(L1,L2).
  s4([t|L1],[t|L2]) :- s5(L1,L2).
  s5(['+N'|L1],L2) :- s6(L1,L2).              % +N:ε
  s6(['+PL'|L1],[^,s,#|L2]) :- s7(L1,L2).     % +PL:^s#
  s7([],[]).                                  % end state

Stage 1: Lexical → Intermediate Levels
FST queries
–lexical → intermediate
  ?- s0([f,o,x,'+N','+PL'],X).
  X = [f, o, x, ^, s, #]
–intermediate → lexical
  ?- s0(X,[f,o,x,^,s,#]).
  X = [f, o, x, '+N', '+PL']
–enumerator
  ?- s0(X,Y).
  X = [f, o, x, '+N', '+PL']
  Y = [f, o, x, ^, s, #] ;
  X = [c, a, t, '+N', '+PL']
  Y = [c, a, t, ^, s, #] ;
  No
inversion of a transducer T: T^-1
–switch input and output labels
–in Prolog, simply change the call
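To make "simply change the call" concrete, here is a self-contained sketch that repeats the fox/cat transducer and wraps it in two directional predicates. The wrapper names lexical_to_intermediate/2 and intermediate_to_lexical/2 are made up for illustration, and ^ and # are written as quoted atoms as a precaution.

```prolog
% fox/cat transducer: first argument lexical, second argument intermediate
s0([f|L1],[f|L2]) :- s1(L1,L2).
s0([c|L1],[c|L2]) :- s3(L1,L2).
s1([o|L1],[o|L2]) :- s2(L1,L2).
s2([x|L1],[x|L2]) :- s5(L1,L2).
s3([a|L1],[a|L2]) :- s4(L1,L2).
s4([t|L1],[t|L2]) :- s5(L1,L2).
s5(['+N'|L1],L2) :- s6(L1,L2).
s6(['+PL'|L1],['^',s,'#'|L2]) :- s7(L1,L2).
s7([],[]).

% T: lexical in, intermediate out
lexical_to_intermediate(Lex, Int) :- s0(Lex, Int).

% T^-1: the same machine, called with the known argument in the other position
intermediate_to_lexical(Int, Lex) :- s0(Lex, Int).
```

For example, ?- intermediate_to_lexical([c,a,t,'^',s,'#'],X). gives X = [c,a,t,'+N','+PL']. No clause had to be rewritten, because unification lets either argument of s0/2 be the unknown one.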

Stage 1: Lexical → Intermediate Levels
Figure 3.17 (top half): tape view of input/output pairs

The Mapping Problem
Example: (3.11)
(Context-Sensitive) Spelling Rule: (3.5)
–ε → e / { x, s, z } ^ __ s#
ε rewrites to letter e in left context x^ or s^ or z^ and right context s#
i.e. insert e after the ^ when you see x^s# or s^s# or z^s#
in particular, we have x^s# → x^es#
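The effect of the spelling rule can be sketched directly as a list rewrite. This is not the FST used for Stage 2, just an illustration of the rule as stated: the predicate names e_insert/2 and sibilant/1 are made up, the ^ boundary is kept in the output, and input that does not match the rule is copied through unchanged.

```prolog
% e_insert(+Intermediate, -Result): insert e after ^ in x^s#, s^s# or z^s#
e_insert([C,'^',s,'#'|T], [C,'^',e,s,'#'|S]) :-
    sibilant(C), !,              % left context x^, s^ or z^; right context s#
    e_insert(T, S).
e_insert([C|T], [C|S]) :-        % anything else is copied unchanged
    e_insert(T, S).
e_insert([], []).

sibilant(x).
sibilant(s).
sibilant(z).
```

For example, ?- e_insert([f,o,x,'^',s,'#'],S). gives S = [f,o,x,^,e,s,#], while [c,a,t,'^',s,'#'] comes back unchanged because t is not one of x, s, z.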

Stage 2: Intermediate → Surface Levels
also can be implemented using an FST
important!
–machine is designed to pass input not matching the rule through unmodified (rather than fail)
implements context-sensitive rule
–q0 to q2 : left context
–q3 to q0 : right context

Stage 2: Intermediate → Surface Levels
Example (3.17)

Stage 2: Intermediate → Surface Levels
Transition table for FST in 3.14 pg.79

Stage 2: Intermediate → Surface Levels
in Prolog (simplified)
  q0([],[]).                                         % final state
  q0([^|L1],L2) :- !, q0(L1,L2).                     % ^:ε
  q0([z|L1],[z|L2]) :- !, q1(L1,L2).
  % repeat for s,x
  q0([#|L1],[#|L2]) :- !, q0(L1,L2).
  q0([X|L1],[X|L2]) :- \+ mentioned(X), q0(L1,L2).   % other
! is known as the “cut” predicate
–it affects how Prolog backtracks for another solution
–it means “cut” the backtracking off
–Prolog will not try any other possible matching rule on backtracking
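A tiny example, separate from the transducer, shows what the cut changes. The predicates with_cut/2 and no_cut/2 are made up for this illustration; both classify a symbol, and the second clause of each is a catch-all.

```prolog
% With a cut: once the first clause matches, the catch-all is never tried.
with_cut('^', boundary) :- !.
with_cut(_, other).

% Without a cut: on backtracking Prolog also tries the catch-all clause.
no_cut('^', boundary).
no_cut(_, other).
```

Collecting all solutions makes the difference visible: ?- findall(K, with_cut('^',K), Ks). gives Ks = [boundary], while ?- findall(K, no_cut('^',K), Ks). gives Ks = [boundary, other]. In the transducer clauses above, the cuts play the same role of keeping a symbol such as ^ from being handled by a later clause as well.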

Exercise Ungraded exercise: –Implement 3.14 in Prolog –Make sure you can do e-insertion and the inverse operation, i.e. go from surface form to intermediate form