Fall 2005 Lecture Notes #4 EECS 595 / LING 541 / SI 661 Natural Language Processing.

1 Fall 2005 Lecture Notes #4 EECS 595 / LING 541 / SI 661 Natural Language Processing

2 Features and unification

3 Introduction Grammatical categories have properties Constraint-based formalisms Example: “this flights”: agreement is difficult to handle at the level of grammatical categories Example: “many water”: count/mass nouns Sample rule that takes into account features: S → NP VP (but only if the number of the NP is equal to the number of the VP)

4 Feature structures
[CAT NP, NUMBER SINGULAR, PERSON 3]
[CAT NP, AGREEMENT [NUMBER SG, PERSON 3]]
Feature paths: {x agreement number}

5 Unification
[NUMBER SG] ⊔ [NUMBER SG]: + (succeeds)
[NUMBER SG] ⊔ [NUMBER PL]: - (fails)
[NUMBER SG] ⊔ [NUMBER []] = [NUMBER SG]
[NUMBER SG] ⊔ [PERSON 3] = ?
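The unification cases on this slide can be tried out with a minimal sketch. It assumes feature structures are encoded as nested Python dicts, atomic values as strings, and the unspecified value [] as an empty dict; the names `unify` and `FAIL` are illustrative and not from the lecture. Reentrancy (shared substructures) is not modelled.

```python
FAIL = None  # hypothetical sentinel meaning "unification failed"


def unify(f, g):
    """Unify two feature structures (nested dicts with string atoms)."""
    # An empty structure [] is compatible with anything.
    if f == {}:
        return g
    if g == {}:
        return f
    if isinstance(f, dict) and isinstance(g, dict):
        result = dict(f)
        for feature, value in g.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is FAIL:
                    return FAIL  # conflicting values somewhere below
                result[feature] = merged
            else:
                result[feature] = value  # feature only present in g
        return result
    # Two atoms (or an atom and a complex structure): must be identical.
    return f if f == g else FAIL


print(unify({"NUMBER": "SG"}, {"NUMBER": "SG"}))  # same value
print(unify({"NUMBER": "SG"}, {"NUMBER": "PL"}))  # conflicting values
print(unify({"NUMBER": "SG"}, {"NUMBER": {}}))    # one side unspecified
print(unify({"NUMBER": "SG"}, {"PERSON": "3"}))   # disjoint features
```

Returning a fresh dict rather than modifying the arguments keeps the sketch easy to follow; a parser would typically unify destructively over a graph representation.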

6 Agreement
S → NP VP
  {NP AGREEMENT} = {VP AGREEMENT}
Does this flight serve breakfast? Do these flights serve breakfast?
S → Aux NP VP
  {Aux AGREEMENT} = {NP AGREEMENT}

7 Agreement
These flights / This flight
NP → Det Nominal
  {Det AGREEMENT} = {Nominal AGREEMENT}
Verb → serve
  {Verb AGREEMENT NUMBER} = PL
Verb → serves
  {Verb AGREEMENT NUMBER} = SG

8 Subcategorization
VP → Verb
  {VP HEAD} = {Verb HEAD}
  {VP HEAD SUBCAT} = INTRANS
VP → Verb NP
  {VP HEAD} = {Verb HEAD}
  {VP HEAD SUBCAT} = TRANS
VP → Verb NP NP
  {VP HEAD} = {Verb HEAD}
  {VP HEAD SUBCAT} = DITRANS

9 Regular Expressions and Automata

10 Regular expressions Searching for “woodchuck” Searching for “woodchucks” with an optional final “s” Regular expressions Finite-state automata (singular: automaton)

11 Regular expressions
Basic regular expression patterns; Perl-based syntax (slightly different from other notations for regular expressions)
Disjunctions: [abc]
Ranges: [A-Z]
Negations: [^Ss]
Optional characters: ? and *
Wild cards: .
Anchors: ^ and $, also \b and \B
Disjunction, grouping, and precedence: |

12 Writing correct expressions
Exercise: write a Perl regular expression to match the English article “the”:
/the/
/[tT]he/
/\b[tT]he\b/
/[^a-zA-Z][tT]he[^a-zA-Z]/
/(^|[^a-zA-Z])[tT]he[^a-zA-Z]/
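To see why each refinement of the “the” pattern is needed, the five patterns can be run over one sentence. This is a sketch using Python's `re` module rather than Perl (the syntax of these particular patterns is the same in both); the sample sentence and the helper `count_matches` are made up for illustration.

```python
import re

PATTERNS = [
    r"the",                            # misses "The", hits "other", "there"
    r"[tT]he",                         # handles capitalisation, still hits substrings
    r"\b[tT]he\b",                     # word boundaries on both sides
    r"[^a-zA-Z][tT]he[^a-zA-Z]",       # explicit non-letters, misses line-initial "The"
    r"(^|[^a-zA-Z])[tT]he[^a-zA-Z]",   # also allows start of line
]

SAMPLE = "The other one is there, not the one."


def count_matches(pattern, text):
    """Count non-overlapping matches of pattern in text."""
    return sum(1 for _ in re.finditer(pattern, text))


for pattern in PATTERNS:
    print(f"/{pattern}/ matches {count_matches(pattern, SAMPLE)} time(s)")
```

Only the third and fifth patterns find exactly the two real occurrences of the article in the sample.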

13 A more complex example
Exercise: write a regular expression that will match “any PC with more than 500 MHz and 32 Gb of disk space for less than $1000”:
/\$[0-9]+/
/\$[0-9]+\.[0-9][0-9]/
/(^|\W)\$[0-9]+(\.[0-9][0-9])?\b/
/\b[0-9]+ *(MHz|[Mm]egahertz|GHz|[Gg]igahertz)\b/
/\b[0-9]+ *(Mb|[Mm]egabytes?)\b/
/\b[0-9]+(\.[0-9]+)? *(Gb|[Gg]igabytes?)\b/

14 Advanced operators

15 Substitutions and memory
Substitutions: s/colour/color/
Memory (\1, \2, etc. refer back to matches): s/([0-9]+)/<\1>/
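For readers working outside Perl, the same two ideas (plain substitution and substitution with a back-reference) look as follows with Python's `re.sub`. The second replacement, which wraps each number in angle brackets, is an assumed illustration of how \1 is used; the function names are hypothetical.

```python
import re


def americanize(text):
    """Plain substitution, equivalent to s/colour/color/g."""
    return re.sub(r"colour", "color", text)


def bracket_numbers(text):
    """Substitution with memory: \\1 refers back to the digits captured by group 1."""
    return re.sub(r"([0-9]+)", r"<\1>", text)


print(americanize("the colour red"))
print(bracket_numbers("the 35 boxes"))
```

Note that `re.sub` replaces every match by default, whereas Perl's `s///` replaces only the first unless the `g` flag is given.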

16 Eliza [Weizenbaum, 1966] User: Men are all alike ELIZA: IN WHAT WAY User: They’re always bugging us about something or other ELIZA: CAN YOU THINK OF A SPECIFIC EXAMPLE? User: Well, my boyfriend made me come here ELIZA: YOUR BOYFRIEND MADE YOU COME HERE User: He says I’m depressed much of the time ELIZA: I AM SORRY TO HEAR THAT YOU ARE DEPRESSED

17 Eliza-style regular expressions
s/.* YOU ARE (depressed|sad).*/I AM SORRY TO HEAR YOU ARE \1/
s/.* YOU ARE (depressed|sad).*/WHY DO YOU THINK YOU ARE \1/
s/.* all.*/IN WHAT WAY/
s/.* always.*/CAN YOU THINK OF A SPECIFIC EXAMPLE/
Step 1: replace first person references with second person references
Step 2: use additional regular expressions to generate replies
Step 3: use scores to rank possible transformations
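Steps 1 and 2 can be sketched in a few lines of Python. This is an illustration under assumptions, not Weizenbaum's program: the person-swap list, the fallback reply, and the names `PERSON_SWAPS`, `RULES`, and `respond` are invented here, and rule order stands in for the scoring of Step 3 (the first matching rule wins).

```python
import re

# Step 1: first person -> second person (a small, assumed subset).
PERSON_SWAPS = [
    (r"\bI'm\b", "YOU ARE"),
    (r"\bI am\b", "YOU ARE"),
    (r"\bmy\b", "YOUR"),
]

# Step 2: reply rules taken from the slide; \1 copies the captured word.
RULES = [
    (r".* YOU ARE (depressed|sad).*", r"I AM SORRY TO HEAR YOU ARE \1"),
    (r".* all.*", "IN WHAT WAY"),
    (r".* always.*", "CAN YOU THINK OF A SPECIFIC EXAMPLE"),
]


def respond(utterance):
    """Return an Eliza-style reply for one user utterance."""
    text = utterance
    for pattern, replacement in PERSON_SWAPS:
        text = re.sub(pattern, replacement, text)
    for pattern, template in RULES:
        match = re.fullmatch(pattern, text)
        if match:
            return match.expand(template)
    return "PLEASE GO ON"  # hypothetical fallback, not from the slide


for line in ["Men are all alike",
             "They're always bugging us about something or other",
             "He says I'm depressed much of the time"]:
    print(respond(line))
```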

18 Finite-state automata Finite-state automata (FSA) Regular languages Regular expressions

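19 Finite-state automata (machines) Strings: baa! baaa! baaaa! baaaaa! ... [Figure: FSA with states q0 through q4; transitions b, a, a, ! lead from each state to the next, with an a self-loop on q3; q4 is the final state. Equivalent regular expression: /baa+!/. Legend: state, transition, final state.]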

20 Input tape [Figure: a tape holding the symbols a b a ! b, with the machine in state q0.]

21 Finite-state automata
Q: a finite set of N states q0, q1, …, qN−1
Σ: a finite input alphabet of symbols
q0: the start state
F: the set of final states
δ(q,i): transition function

22 State-transition tables
        Input
State   b   a   !
0       1   0   0
1       0   2   0
2       0   3   0
3       0   3   4
4       0   0   0
(An entry of 0 in the body of the table marks a missing transition.)

23 The FSM toolkit and friends
Developed at AT&T Research (Riley, Pereira, Mohri, Sproat)
Download: http://www.research.att.com/sw/tools/fsm/tech.html http://www.research.att.com/sw/tools/lextools/
Tutorial available
4 useful parts: FSM, Lextools, GRM, Dot (separate)
– /data2/tools/fsm-3.6/bin
– /data2/tools/lextools/bin
– /data2/tools/dot/bin

24 D-RECOGNIZE
function D-RECOGNIZE(tape, machine) returns accept or reject
  index ← Beginning of tape
  current-state ← Initial state of machine
  loop
    if End of input has been reached then
      if current-state is an accept state then
        return accept
      else
        return reject
    elsif transition-table[current-state, tape[index]] is empty then
      return reject
    else
      current-state ← transition-table[current-state, tape[index]]
      index ← index + 1
  end
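A runnable counterpart of D-RECOGNIZE, as a sketch: the transition table is encoded as a dict of dicts for the baa+! automaton, where a missing key plays the role of an empty table cell. The names `SHEEP_TABLE`, `ACCEPT_STATES`, and `d_recognize` are illustrative choices, not part of the lecture.

```python
# Transition table for the automaton accepting baa+!
# (state -> {input symbol -> next state}); absent symbols have no transition.
SHEEP_TABLE = {
    0: {"b": 1},
    1: {"a": 2},
    2: {"a": 3},
    3: {"a": 3, "!": 4},
    4: {},
}
ACCEPT_STATES = {4}


def d_recognize(tape, table=SHEEP_TABLE, start=0, accept=ACCEPT_STATES):
    """Return True if the deterministic FSA accepts the whole tape."""
    state = start
    for symbol in tape:
        next_state = table[state].get(symbol)
        if next_state is None:
            return False  # empty table cell: reject immediately
        state = next_state
    # End of input reached: accept only in an accept state.
    return state in accept


for tape in ["baa!", "baaaa!", "ba!", "aba!b"]:
    print(tape, "accept" if d_recognize(tape) else "reject")
```

The `for` loop replaces the explicit index of the pseudocode; rejecting on a missing entry corresponds to the "is empty" branch.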

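25 Adding a failing state [Figure: the FSA with states q0 through q4 (transitions b, a, a, ! and an a self-loop on q3), extended with a fail state qF that receives every a, b, or ! transition not otherwise defined.]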

26 Languages and automata Formal languages: regular languages, non-regular languages; deterministic vs. non-deterministic FSAs; epsilon (ε) transitions

27 Using NFSAs to accept strings Backup: add markers at choice points, then possibly revisit underexplored markers Look-ahead: look ahead in input Parallelism: look at alternatives in parallel

28 Using NFSAs
        Input
State   b   a     !   ε
0       1   0     0   0
1       0   2     0   0
2       0   2,3   0   0
3       0   0     4   0
4       0   0     0   0
(An entry of 0 in the body of the table marks a missing transition.)
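One way to run a non-deterministic table like this is the parallelism strategy: track the set of all states the machine could be in after each symbol. The sketch below assumes a dict encoding in which each cell holds a set of next states and the empty string "" labels ε-transitions; `NFSA_TABLE`, `epsilon_closure`, and `nd_recognize` are illustrative names.

```python
# state -> {symbol -> set of next states}; "" would label epsilon moves.
NFSA_TABLE = {
    0: {"b": {1}},
    1: {"a": {2}},
    2: {"a": {2, 3}},   # the non-deterministic choice point
    3: {"!": {4}},
    4: {},
}


def epsilon_closure(states, table):
    """All states reachable from `states` using only epsilon transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in table.get(state, {}).get("", ()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return closure


def nd_recognize(tape, table, start, accept):
    """Simulate all alternatives in parallel; True if any path accepts."""
    current = epsilon_closure({start}, table)
    for symbol in tape:
        moved = set()
        for state in current:
            moved.update(table.get(state, {}).get(symbol, ()))
        current = epsilon_closure(moved, table)
        if not current:
            return False  # every alternative has died
    return bool(current & accept)


for tape in ["baa!", "baaaa!", "ba!"]:
    print(tape, nd_recognize(tape, NFSA_TABLE, start=0, accept={4}))
```

Backup (an agenda of choice points explored depth-first or breadth-first) would accept the same strings; the set-based version simply avoids revisiting states.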

29 More about FSAs Transducers Equivalence of DFSAs and NFSAs Recognition as search: depth-first, breadth-first search

30 Recognition using NFSAs

31 Regular languages Operations on regular languages and FSAs: concatenation, closure, union Properties of regular languages (closed under concatenation, union (disjunction), intersection, difference, complementation, reversal, Kleene closure)

32 An exercise J&M 2.8. Write a regular expression for the language accepted by the NFSA in the Figure.

33 Morphology and Finite-State Transducers

34 Morphemes Stems, affixes Affixes: prefixes; suffixes; infixes (Tagalog: hingi “borrow” – humingi, agent); circumfixes (German: sagen – gesagt) Concatenative morphology Templatic morphology (Semitic languages): lmd (learn), lamad (he studied), limed (he taught), lumad (he was taught)

35 Morphological analysis rewrites unbelievably

36 Inflectional morphology Tense, number, person, mood, aspect Five verb forms in English 40+ forms in French Six cases in Russian: http://www.departments.bucknell.edu/russian/language/case.html Up to 40,000 forms in Turkish (you cause X to cause Y to … do Z)

37 Derivational morphology Nominalization: computerization, appointee, killer, fuzziness Formation of adjectives: computational, embraceable, clueless

38 Finite-state morphological parsing Cats: cat +N +PL Cat: cat +N +SG Cities: city +N +PL Geese: goose +N +PL Ducks: (duck +N +PL) or (duck +V +3SG) Merging: +V +PRES-PART Caught: (catch +V +PAST-PART) or (catch +V +PAST)
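The input/output behaviour on this slide can be imitated with a toy lookup-based parser. This is only a sketch of what a morphological parser returns, not a finite-state transducer: the tiny lexicon, the hard-coded suffix rules, and the names `NOUNS`, `VERBS`, `IRREGULAR`, and `parse` are all assumptions made for illustration.

```python
# Toy lexicon (assumed): just enough stems for the slide's examples.
NOUNS = {"cat", "city", "duck", "goose"}
VERBS = {"catch", "duck", "merge"}
IRREGULAR = {
    "geese": ["goose +N +PL"],
    "caught": ["catch +V +PAST-PART", "catch +V +PAST"],
}


def parse(word):
    """Return every analysis of `word` as a list of 'stem +FEATURES' strings."""
    w = word.lower()
    analyses = list(IRREGULAR.get(w, []))
    if w in NOUNS:
        analyses.append(f"{w} +N +SG")
    if w.endswith("ies") and w[:-3] + "y" in NOUNS:
        # Orthographic rule: y -> ie before the plural s (city / cities).
        analyses.append(f"{w[:-3]}y +N +PL")
    elif w.endswith("s"):
        stem = w[:-1]
        if stem in NOUNS:
            analyses.append(f"{stem} +N +PL")
        if stem in VERBS:
            analyses.append(f"{stem} +V +3SG")
    if w.endswith("ing"):
        # Try the bare stem and the stem with a deleted final e (merge / merging).
        for stem in (w[:-3], w[:-3] + "e"):
            if stem in VERBS:
                analyses.append(f"{stem} +V +PRES-PART")
    return analyses


for word in ["Cats", "Cat", "Cities", "Geese", "Ducks", "Merging", "Caught"]:
    print(word, "->", parse(word))
```

An ambiguous form such as Ducks yields two analyses, which is the behaviour the slide illustrates.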

39 Principles of morphological parsing Lexicon Morphotactics (e.g., plural follows noun) Orthography (easy → easier) Irregular nouns: e.g., geese, sheep, mice Irregular verbs: e.g., caught, ate, eaten

40 FSA for adjectives Big, bigger, biggest Cool, cooler, coolest, coolly Red, redder, reddest Clear, clearer, clearest, clearly, unclear, unclearly Happy, happier, happiest, happily Unhappy, unhappier, unhappiest, unhappily What about: unbig, redly, and realest?

41 Using FSA for recognition Is a string a legitimate word or not? Two-level morphology: lexical level + surface level (Koskenniemi 83) Finite-state transducers (FST) – used for regular relations Inversion and composition of FST

42 Orthographic rules Beg/begging Make/making Watch/watches Try/tries Panic/panicked

43 Combining FST lexicon and rules Cascades of transducers: the output of one becomes the input of another

44 Weighted Automata

45 Phonetic symbols IPA Arpabet Examples

46 Using WFST for language modeling Phonetic representation Part-of-speech tagging

47 Word Classes and Part Of Speech Tagging

48 Some POS statistics Preposition list from COBUILD Single-word particles Conjunctions Pronouns Modal verbs

49 Tagsets for English Penn Treebank Other tagsets (see Week 1 slides)

50 POS ambiguity
Degrees of ambiguity (DeRose 1988)
Rule-based POS tagging
– ENGTWOL (Voutilainen et al.)
– Sample rule: Adverbial-That rule (“it isn’t that odd”)
Given input: “that”
if (+1 A/ADV/QUANT); (+2 SENT-LIM); (NOT –1 SVOC/A) (the previous word is not a verb like “consider”)
then eliminate non-ADV tags
else eliminate ADV tag

51 Evaluating POS taggers Percent correct What is the lower bound on a system’s performance? What about the upper bound?

52 Kappa N: number of items (index i); n: number of categories (index j); k: number of annotators. When κ > .8, agreement is considered high.
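The kappa formula itself did not survive in this transcript. As a sketch, assuming the slide refers to the usual multi-annotator (Fleiss-style) kappa, the statistic can be computed from an N × n table whose entry (i, j) counts how many of the k annotators put item i in category j. The function name `fleiss_kappa` and the input layout are assumptions.

```python
def fleiss_kappa(counts):
    """Multi-annotator kappa from an N x n table of category counts.

    counts[i][j] = number of annotators assigning item i to category j;
    every row must sum to the same number of annotators k (k >= 2).
    """
    N = len(counts)        # number of items
    n = len(counts[0])     # number of categories
    k = sum(counts[0])     # number of annotators

    # Observed agreement: for each item, the share of annotator pairs that agree.
    per_item = [(sum(c * c for c in row) - k) / (k * (k - 1)) for row in counts]
    observed = sum(per_item) / N

    # Expected agreement by chance, from the overall category proportions.
    proportions = [sum(row[j] for row in counts) / (N * k) for j in range(n)]
    expected = sum(p * p for p in proportions)

    # Undefined (division by zero) if every annotation uses a single category.
    return (observed - expected) / (1 - expected)


# Three annotators, two categories, four items, full agreement on every item.
print(fleiss_kappa([[3, 0], [0, 3], [3, 0], [0, 3]]))
```

Values near 1 indicate agreement well above chance, 0 indicates chance-level agreement, and negative values indicate systematic disagreement.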

53 Midterm reading list Chapter 1 – Introduction Chapter 2 – Regular expressions and automata Chapter 3 – Morphology and finite-state transducers + FSM tutorial Chapter 8 – Word classes and POS tagging Chapter 9 – Context-free grammars for English Chapter 10 – Parsing with context-free grammars Chapter 11 - Features and unification

54 Syntaxscape Written by Juno Suk of Lucent http://www.cs.columbia.edu/~radev/syntaxscape/

56 Read by yourselves 9.9. Spoken language syntax 9.10. Grammar equivalence 9.11. Finite-state and context-free grammars

