
1 6/2/2015 CPSC503 Winter 2007: CPSC 503 Computational Linguistics, Lecture 2, Giuseppe Carenini

2 6/2/2015 CPSC503 Spring 2004: Today (Sep 13)
–Brief check of some background knowledge
–English Morphology
–FSA and Morphology
–Start: Finite State Transducers (FST) and Morphological Parsing/Generation

3 Knowledge-Formalisms Map (including some probabilistic formalisms)
Formalisms:
–State Machines (and prob. versions): Finite State Automata, Finite State Transducers, Markov Models
–Rule systems (and prob. versions): e.g., (Prob.) Context-Free Grammars
–Logical formalisms (First-Order Logics)
–AI planners
Linguistic knowledge: Morphology, Syntax, Semantics, Pragmatics, Discourse and Dialogue

4 Next Two Lectures
–State Machines (no prob.): Finite State Automata (and Regular Expressions), Finite State Transducers
–(English) Morphology
–Logical formalisms (First-Order Logics)
–Rule systems (and prob. version): e.g., (Prob.) Context-Free Grammars
–Syntax, Pragmatics, Discourse and Dialogue, Semantics
–AI planners

5 ??
[Figure: two state diagrams with numbered states (0 to 6); the arcs of the first are labeled b, a, a, a, ! and the arcs of the second b, a, b, a, !]
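As a refresher on how a deterministic FSA recognizes a string, here is a minimal sketch of a table-driven recognizer. It assumes a machine for the pattern /baa+!/ (a b, at least two a's, then !). The state numbers, the `TRANSITIONS` table and the `recognize` function are illustrative assumptions, not a transcription of the slide's figure.

```python
# Hypothetical transition table for a DFA accepting b a a+ !
# (state numbering is an assumption made for this sketch).
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,   # self-loop: any number of further a's
    (3, "!"): 4,
}
ACCEPT = {4}


def recognize(tape):
    """Return True if the DFA ends in an accept state after reading tape."""
    state = 0
    for symbol in tape:
        state = TRANSITIONS.get((state, symbol))
        if state is None:      # no arc for this symbol: reject
            return False
    return state in ACCEPT


print(recognize("baaa!"))  # string from the slide
print(recognize("baba!"))  # string from the slide
```

Under this assumed machine the first string is accepted and the second is rejected, because state 2 has no arc labeled b.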

6 ??
/CPSC50[34]/
/^([Ff]rom\b|[Ss]ubject\b|[Dd]ate\b)/
/[0-9]+(\.[0-9]+){3}/
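One way to check your reading of these three patterns is to try them with Python's `re` module. The sketch below copies the patterns from the slide; the test strings are made-up assumptions chosen to show one match and one non-match for each.

```python
import re

# Patterns copied from the slide (Perl-style delimiters dropped).
course = re.compile(r"CPSC50[34]")
header = re.compile(r"^([Ff]rom\b|[Ss]ubject\b|[Dd]ate\b)")
dotted = re.compile(r"[0-9]+(\.[0-9]+){3}")

# Hypothetical inputs, not from the lecture.
print(bool(course.search("CPSC504 meets today")))   # 3 or 4 after CPSC50
print(bool(course.search("CPSC505 meets today")))   # 5 is not in [34]
print(bool(header.search("Subject: hello")))        # header word at line start
print(bool(header.search("Fromage is cheese")))     # no word boundary after From
print(dotted.search("host 192.168.0.1 is up").group(0))  # four dot-separated numbers
print(dotted.search("version 1.2.3"))               # only three numbers: no match
```

The first pattern matches the two course codes CPSC503 and CPSC504, the second matches a line that starts with one of three header words, and the third matches four dot-separated runs of digits (for instance an IPv4-style address, though it does not check value ranges).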

7 Example of Usage: Text Searching/Editing
Find me all instances of the determiner “the” in an English text.
–To count them
–To substitute them with something else
You try: /the/ /[tT]he/ /\bthe\b/ /\b[tT]he\b/
The other cop went to the bank but there were no people there.
s/\b([tT]he|[Aa]n?)\b/DET/
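The four candidate patterns can be compared by counting their matches in the example sentence. This is a minimal sketch using Python's `re` module. Note one assumption: `re.sub` replaces every match, which corresponds to the substitution with a global flag rather than to a bare s/…/…/.

```python
import re

sentence = "The other cop went to the bank but there were no people there."

# Count matches for each candidate pattern from the slide.
patterns = [r"the", r"[tT]he", r"\bthe\b", r"\b[tT]he\b"]
counts = {p: len(re.findall(p, sentence)) for p in patterns}
for pattern, n in counts.items():
    print(f"/{pattern}/ -> {n}")

# Substitute determiners with DET (all occurrences).
tagged = re.sub(r"\b([tT]he|[Aa]n?)\b", "DET", sentence)
print(tagged)
```

/the/ over-matches inside “other” and “there” and misses the capitalized “The”. Only /\b[tT]he\b/ finds exactly the two determiners.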

8 Fundamental Relations
[Diagram relating three nodes (FSA, Regular Expressions, Many Linguistic Phenomena) through arcs labeled: model; implement (generate and recognize); describe]

9 Next Two Lectures
–State Machines (no prob.): Finite State Automata (and Regular Expressions), Finite State Transducers
–(English) Morphology
–Logical formalisms (First-Order Logics)
–Rule systems (and prob. version): e.g., (Prob.) Context-Free Grammars
–Syntax, Pragmatics, Discourse and Dialogue, Semantics
–AI planners

10 English Morphology
Def. The study of how words are formed from minimal meaning-bearing units (morphemes)
We can usefully divide morphemes into two classes
–Stems: the core meaning-bearing units
–Affixes: bits and pieces that adhere to stems to change their meanings and grammatical functions
Example: unhappily (un- + happy + -ly)

11 Word Classes
For now, the word classes are nouns, verbs, adjectives and adverbs. We’ll go into the gory details in Ch. 5.
Word class determines to a large degree the way that stems and affixes combine.

12 English Morphology
We can also divide morphology up into two broad classes
–Inflectional
–Derivational

13 Inflectional Morphology
The resulting word:
–Has the same word class as the original
–Serves a grammatical/semantic purpose different from the original

14 Nouns, Verbs and Adjectives (English)
Nouns are simple (not really)
–Markers for plural and possessive
Verbs are only slightly more complex
–Markers appropriate to the tense of the verb and to the person
Adjectives
–Markers for comparative and superlative

15 Regulars and Irregulars
Some words misbehave (refuse to follow the rules)
–Mouse/mice, goose/geese, ox/oxen
–Go/went, fly/flew
The terms regular and irregular will be used to refer to words that follow the rules and those that don’t.

16 Regular and Irregular Verbs
Regulars…
–Walk, walks, walking, walked, walked
Irregulars
–Eat, eats, eating, ate, eaten
–Catch, catches, catching, caught, caught
–Cut, cuts, cutting, cut, cut

17 Derivational Morphology
Derivational morphology is the messy stuff that no one ever taught you.
–Changes of word class
–Less productive (e.g., -ant derives N from V only with verbs of Latin origin!)

18 Derivational Examples: Verb/Adj to Noun
Suffix   Base          Derived noun
-ation   computerize   computerization
-ee      appoint       appointee
-er      kill          killer
-ness    fuzzy         fuzziness

19 Derivational Examples: Noun/Verb to Adj
Suffix   Base          Derived adjective
-al      computation   computational
-able    embrace       embraceable
-less    clue          clueless

20 Compute
Many paths are possible… Start with compute
–Computer -> computerize -> computerization
–Computation -> computational
–Computer -> computerize -> computerizable
–Compute -> computee

21 Summary
–State Machines (no prob.): Finite State Automata (and Regular Expressions), Finite State Transducers
–(English) Morphology
–Logical formalisms (First-Order Logics)
–Rule systems (and prob. version): e.g., (Prob.) Context-Free Grammars
–Syntax, Pragmatics, Discourse and Dialogue, Semantics
–AI planners

22 FSAs and Morphology
GOAL1: recognize whether a string is an English word
PLAN:
1. First we’ll capture the morphotactics (the rules governing the ordering of affixes in a language)
2. Then we’ll add in the actual stems

23 FSA for Portion of N Inflectional Morphology

24 Adding the Stems
But the FSA does not express that:
–Regular nouns ending in -s, -z, -sh, -ch, -x take -es in the plural (kiss, waltz, bush, rich, box)
–Regular nouns ending in -y preceded by a consonant change the -y to -i
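The two spelling rules above can be written down directly. The sketch below is a hypothetical `pluralize` helper for regular, lowercase nouns only. It assumes that -es is added after the -y to -i change (as in butterflies) and it ignores irregular nouns entirely.

```python
VOWELS = set("aeiou")
ES_ENDINGS = ("s", "z", "sh", "ch", "x")  # endings listed on the slide


def pluralize(noun):
    """Spell the plural of a regular lowercase noun (illustrative sketch)."""
    # Rule 1: nouns ending in -s, -z, -sh, -ch, -x take -es.
    if noun.endswith(ES_ENDINGS):
        return noun + "es"
    # Rule 2: consonant + y changes -y to -i (assumed here: then add -es).
    if len(noun) >= 2 and noun.endswith("y") and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"
    # Default: just add -s.
    return noun + "s"


for word in ["kiss", "waltz", "bush", "rich", "box", "butterfly", "boy", "cat"]:
    print(word, "->", pluralize(word))
```

A plain FSA over stems and the suffix -s would accept spellings such as “boxs”. Handling these adjustments is one motivation for moving from recognition to transducers.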

25 Small Fragment of V and N Derivational Morphology
–[noun_i] e.g. hospital
–[adj_al] e.g. formal
–[adj_ous] e.g. arduous
–[verb_j] e.g. speculate
–[verb_k] e.g. conserve

26 GOAL2: Morphological Parsing/Generation (vs. Recognition)
Recognition is usually not quite what we need.
–Usually given a word we need to find: the stem and its class and morphological features (parsing)
–Or we have a stem and its class and morphological features and we want to produce the word (production/generation)
Examples (parsing)
–From “cats” to “cat +N +PL”
–From “lies” to …

27 Computational problems in Morphology
–Recognition: recognize whether a string is an English word (FSA)
–Parsing/Generation: word <-> stem, class, lexical features … e.g., lies <-> lie +N +PL, lie +V +3SG
–Stemming: word -> stem … e.g., …

28 Finite State Transducers
FSA cannot help…
The simple story
–Add another tape
–Add extra symbols to the transitions
–On one tape we read “cats”, on the other we write “cat +N +PL”

29 FSTs
[Figure with two labeled directions: generation, parsing]

30 FST formal definition
–Q: a finite set of states
–I, O: input and output alphabets (which may include ε)
–Σ: a finite alphabet of complex symbols i:o, i ∈ I and o ∈ O
–q0: the start state
–F: a set of accept/final states (F ⊆ Q)
–A transition relation δ that maps Q × Σ to 2^Q

31 FST can be used as…
–Translators: input one string from I, output another from O (or vice versa)
–Recognizers: input a string from I × O
–Generators: output a string from I × O

32 Simple Example
Transitions (as a translator):
–c:c means read a c on one tape and write a c on the other (or vice versa)
–+N:ε means read a +N symbol on one tape and write nothing on the other (or vice versa)
–+PL:s means read +PL and write an s (or vice versa)
Arc labels in the figure: c:c, a:a, t:t, +N:ε, +PL:s, +SG:ε
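To make the “translator” reading concrete, here is a minimal sketch that runs these arcs in the generation direction (lexical tape in, surface tape out). The arc labels come from the slide. The state numbers, the `CAT_FST` table and the `generate` function are assumptions made for illustration.

```python
EPS = ""  # epsilon: write nothing on the surface tape

# (state, lexical symbol) -> (surface output, next state)
# State numbering is hypothetical; the slide only gives the arc labels.
CAT_FST = {
    (0, "c"): ("c", 1),
    (1, "a"): ("a", 2),
    (2, "t"): ("t", 3),
    (3, "+N"): (EPS, 4),
    (4, "+PL"): ("s", 5),
    (4, "+SG"): (EPS, 5),
}
FINAL = {5}


def generate(lexical):
    """Map a list of lexical symbols to a surface string, or None if rejected."""
    state, out = 0, []
    for symbol in lexical:
        if (state, symbol) not in CAT_FST:
            return None
        written, state = CAT_FST[(state, symbol)]
        out.append(written)
    return "".join(out) if state in FINAL else None


print(generate(["c", "a", "t", "+N", "+PL"]))
print(generate(["c", "a", "t", "+N", "+SG"]))
```

The lexical tape is a list rather than a string because +N, +PL and +SG are single symbols of the alphabet, not character sequences.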

33 Examples (as a translator)
[Figure: a lexical tape and a surface tape, with generation and parsing as the two directions; recoverable labels: cats, +N, +SG, cat]

34 More complex Example
Transitions (as a translator):
–l:l means read an l on one tape and write an l on the other (or vice versa)
–+N:ε means read a +N symbol on one tape and write nothing on the other (or vice versa)
–+PL:s means read +PL and write an s (or vice versa)
–…
Arc labels in the figure: l:l, i:i, e:e, +N:ε, +PL:s, +V:ε, +3SG:s (states q0 to q7)
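Run in the parsing direction (surface tape in, lexical tape out), this machine is nondeterministic: the surface s can come from +PL or from +3SG. The sketch below searches all paths and returns every analysis. The arc labels are from the slide, but how they attach to states q0 to q7, and the names `ARCS`, `FINALS`, `render` and `parse`, are assumptions.

```python
# (source state, lexical symbol, surface symbol, target state)
# "" on the surface side stands for epsilon. The wiring is hypothetical.
ARCS = [
    (0, "l", "l", 1),
    (1, "i", "i", 2),
    (2, "e", "e", 3),
    (3, "+N", "", 4),
    (4, "+PL", "s", 5),
    (3, "+V", "", 6),
    (6, "+3SG", "s", 7),
]
FINALS = {5, 7}


def render(symbols):
    """Format lexical symbols as 'stem +TAG +TAG'."""
    stem = "".join(s for s in symbols if not s.startswith("+"))
    tags = [s for s in symbols if s.startswith("+")]
    return " ".join([stem] + tags)


def parse(surface, state=0, pos=0, lexical=()):
    """Return all lexical analyses of a surface string (depth-first search)."""
    results = []
    if pos == len(surface) and state in FINALS:
        results.append(render(lexical))
    for src, lex, surf, dst in ARCS:
        if src != state:
            continue
        if surf == "":                       # epsilon: consume no surface input
            results += parse(surface, dst, pos, lexical + (lex,))
        elif surface.startswith(surf, pos):  # consume one surface symbol
            results += parse(surface, dst, pos + len(surf), lexical + (lex,))
    return results


print(parse("lies"))
```

The search terminates because this fragment has no cycle of epsilon arcs. A larger machine would need a guard against such cycles.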

35 Examples (as a translator)
[Figure: a lexical tape and a surface tape, with generation and parsing as the two directions; recoverable labels: lies, +V, +3SG, lie]

36 Examples (as a recognizer and a generator)
[Figure: a lexical tape and a surface tape; recoverable labels: lies, +V, +3SG, lie]

37 Next Time
–Finish FST and morphological analysis
–Porter Stemmer
–Read Ch. 3 up to 3.10 excluded (def. of FST: understand the one on slides) (3.4.1 optional)

