1/11/2016CPSC503 Winter 20101 CPSC 503 Computational Linguistics Lecture 2 Giuseppe Carenini.


1 CPSC 503 Computational Linguistics Lecture 2 Giuseppe Carenini

2 Today Sep 14
– Subscribe to mailing list cpsc503 (majordomo)
– Questionnaire
– Brief check of some background knowledge (& annotated corpora)
– English Morphology
– FSA and Morphology
– Start: Finite State Transducers (FST) and Morphological Parsing/Gen.

3 Finite state machines Regular Expressions & Finite State Automata 6.7 Finite State Transducers 2.0 Hidden-Markov Models 4.2 Basic Probability, Bayesian Statistics and Information Theory Conditional Probability Programming 7.2 Java Bayesian Networks 6.5 5.4 Python Entropy 5.4 3.4 Dynamic Programming Machine Learning 5.7 Supervised Classification (e.g., Decision Trees) Search Algorithms 4.5 6.0 Unsupervised Learning (e.g., clustering) Linguistics 4.3 2.4 Richer Formalisms Context-Free Grammar 4.3 First-Order Logics 5.4

4 Today Sep 14
– Brief check of some background knowledge
– English Morphology
– FSA and Morphology
– Start: Finite State Transducers (FST) and Morphological Parsing/Gen.

5 Knowledge-Formalisms Map (including probabilistic formalisms)
Formalisms:
– State Machines (and prob. versions) (Finite State Automata, Finite State Transducers, Markov Models)
– Rule systems (and prob. versions) (e.g., (Prob.) Context-Free Grammars)
– Logical formalisms (First-Order Logics, Prob. Logics)
– AI planners (MDP Markov Decision Processes)
Linguistic knowledge: Morphology, Syntax, Semantics, Pragmatics, Discourse and Dialogue

6 Next Two Lectures
Formalisms:
– State Machines (no prob.): Finite State Automata (and Regular Expressions), Finite State Transducers
– Rule systems (and prob. version) (e.g., (Prob.) Context-Free Grammars)
– Logical formalisms (First-Order Logics)
– AI planners
Linguistic knowledge: (English) Morphology, Syntax, Semantics, Pragmatics, Discourse and Dialogue

7 ?? [Figure: two finite state automata, each drawn over states 0 to 6; arc labels b, a, a, a, ! on the first and b, a, b, a, ! on the second]

8 ??
/CPSC50[34]/
/^([Ff]rom\b|[Ss]ubject\b|[Dd]ate\b)/
/[0-9]+(\.[0-9]+){3}/
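The three patterns on this slide can be tried directly. A minimal sketch using Python's standard `re` module; the variable names and the sample strings are made up for illustration and are not from the lecture.

```python
import re

# The three patterns from the slide, written as Python raw strings.
course = re.compile(r"CPSC50[34]")
header = re.compile(r"^([Ff]rom\b|[Ss]ubject\b|[Dd]ate\b)")
dotted = re.compile(r"[0-9]+(\.[0-9]+){3}")

# Hypothetical sample inputs (not from the lecture).
samples = ["CPSC504 meets today", "Subject: assignment 1", "ping 192.168.0.1"]

for text in samples:
    for name, pattern in [("course", course), ("header", header), ("dotted", dotted)]:
        m = pattern.search(text)
        if m:
            print(f"{name!r} matches {m.group(0)!r} in {text!r}")
```

The second pattern is anchored with `^`, so it only fires on a keyword at the start of the string, and the `\b` keeps it from matching a longer word such as "Subjects".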

9 Fundamental Relations
[Figure: diagram linking FSA, Regular Expressions, and Many Linguistic Phenomena; arrow labels: model, implement (generate and recognize), describe]
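One way to see how a regular expression and an FSA can stand for the same language is to write both and compare them on a few strings. The sketch below assumes the toy language /baa+!/ ("b", two or more "a", then "!"); the state numbering and the function name `fsa_accepts` are illustrative choices, not taken from the slides.

```python
import re

# Transition table of a deterministic FSA for the assumed language /baa+!/.
# A missing (state, symbol) entry means "reject".
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,   # loop: any number of extra a's
    (3, "!"): 4,
}
ACCEPT = {4}

def fsa_accepts(s):
    """Run the FSA over s and report whether it ends in an accept state."""
    state = 0
    for ch in s:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False
    return state in ACCEPT

# The table-driven recognizer and the regular expression agree on these strings.
for s in ["baa!", "baaaa!", "ba!", "baa", "abaa!"]:
    assert fsa_accepts(s) == bool(re.fullmatch(r"baa+!", s))
```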

10 Second Usage of RegExp: Text Searching/Editing
Find me all instances of the determiner “the” in an English text.
– To count them
– To substitute them with something else
You try: /the/ /[tT]he/ /\bthe\b/ /\b[tT]he\b/
The other cop went to the bank but there were no people there.
s/\b([tT]he|[Aa]n?)\b/DET/
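The four candidate patterns can be compared on the slide's example sentence. A minimal sketch with Python's `re` module; note that `re.sub` replaces every match, which corresponds to the substitution with a global flag (`s/.../.../g`) rather than a first-match-only substitution.

```python
import re

sentence = "The other cop went to the bank but there were no people there."

# Count the matches of each candidate pattern from the slide.
patterns = [r"the", r"[tT]he", r"\bthe\b", r"\b[tT]he\b"]
counts = {p: len(re.findall(p, sentence)) for p in patterns}
for p, n in counts.items():
    print(f"/{p}/ finds {n} match(es)")

# Substitute determiners with the tag DET (all occurrences).
tagged = re.sub(r"\b([tT]he|[Aa]n?)\b", "DET", sentence)
print(tagged)
```

The first two patterns over-count because they also match inside "other" and "there"; only the last one finds exactly the two determiners "The" and "the".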

11 Annotated Corpora Example
The CoNLL corpora provide chunk structures, which are encoded as flat trees.
– The CoNLL 2000 Corpus includes phrasal chunks
– The CoNLL 2002 Corpus includes named entity chunks
http://nltk.googlecode.com/svn/trunk/doc/howto/corpus.html

12 Next Two Lectures
Formalisms:
– State Machines (no prob.): Finite State Automata (and Regular Expressions), Finite State Transducers
– Rule systems (and prob. version) (e.g., (Prob.) Context-Free Grammars)
– Logical formalisms (First-Order Logics)
– AI planners
Linguistic knowledge: (English) Morphology, Syntax, Semantics, Pragmatics, Discourse and Dialogue

13 English Morphology
Def. The study of how words are formed from minimal meaning-bearing units (morphemes)
We can usefully divide morphemes into two classes
– Stems: The core meaning-bearing units
– Affixes: Bits and pieces that adhere to stems to change their meanings and grammatical functions
Examples: unhappily, ……………

14 Word Classes
– For now word classes: nouns, verbs, adjectives and adverbs.
– We’ll go into the gory details in Ch 5
– Word class determines to a large degree the way that stems and affixes combine

15 English Morphology
We can also divide morphology up into two broad classes
– Inflectional
– Derivational

16 Inflectional Morphology
The resulting word:
– Has the same word class as the original
– Serves a grammatical/semantic purpose different from the original

17 Nouns, Verbs and Adjectives (English)
Nouns are simple (not really)
– Markers for plural and possessive
Verbs are only slightly more complex
– Markers appropriate to the tense of the verb and to the person
Adjectives
– Markers for comparative and superlative

18 Regulars and Irregulars
Some words misbehave (refuse to follow the rules)
– Mouse/mice, goose/geese, ox/oxen
– Go/went, fly/flew
Regulars…
– Walk, walks, walking, walked, walked
Irregulars
– Eat, eats, eating, ate, eaten
– Catch, catches, catching, caught, caught
– Cut, cuts, cutting, cut, cut

19 Derivational Morphology
Derivational morphology is the messy stuff that no one ever taught you.
– Changes of word class
– Less productive (-ant V -> N only with V of Latin origin!)

20 Derivational Examples: Verb/Adj to Noun
Suffix   Base          Derived noun
-ation   computerize   computerization
-ee      appoint       appointee
-er      kill          killer
-ness    fuzzy         fuzziness

21 Derivational Examples: Noun/Verb to Adj
Suffix   Base          Derived adjective
-al      Computation   Computational
-able    Embrace       Embraceable
-less    Clue          Clueless

22 Compute
Many paths are possible… Start with compute
– Computer -> computerize -> computerization
– Computation -> computational
– Computer -> computerize -> computerizable
– Compute -> computee

23 Summary
Formalisms:
– State Machines (no prob.): Finite State Automata (and Regular Expressions), Finite State Transducers
– Rule systems (and prob. version) (e.g., (Prob.) Context-Free Grammars)
– Logical formalisms (First-Order Logics)
– AI planners
Linguistic knowledge: (English) Morphology, Syntax, Semantics, Pragmatics, Discourse and Dialogue

24 FSAs and Morphology
GOAL1: recognize whether a string is an English word
PLAN:
1. First we’ll capture the morphotactics (the rules governing the ordering of affixes in a language)
2. Then we’ll add in the actual stems
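The two-step plan can be sketched as a small recognizer: a transition table over morpheme classes encodes the morphotactics, and a tiny lexicon supplies the stems. Everything specific here is an assumption made for illustration (the state names, the class names, and the handful of words in `LEXICON`); it is not the automaton drawn in the lecture.

```python
# Step 1: morphotactics as transitions over morpheme classes (assumed names).
TRANSITIONS = {
    ("q0", "reg-noun"): "q1",
    ("q0", "irreg-sg-noun"): "q2",
    ("q0", "irreg-pl-noun"): "q2",
    ("q1", "plural-s"): "q2",
}
ACCEPT = {"q1", "q2"}

# Step 2: the actual stems and affixes for each class (toy lexicon).
LEXICON = {
    "reg-noun": {"cat", "dog", "fox"},
    "irreg-sg-noun": {"goose", "mouse", "ox"},
    "irreg-pl-noun": {"geese", "mice", "oxen"},
    "plural-s": {"s"},
}

def recognize(word, state="q0"):
    """True if word can be split into morphemes that drive the FSA to an accept state."""
    if word == "":
        return state in ACCEPT
    for (src, cls), dst in TRANSITIONS.items():
        if src != state:
            continue
        for morph in LEXICON[cls]:
            if word.startswith(morph) and recognize(word[len(morph):], dst):
                return True
    return False

for w in ["cats", "geese", "gooses", "foxs", "foxes"]:
    print(w, recognize(w))
```

With this lexicon the recognizer accepts "cats" and "geese" and rejects "gooses", but it also accepts "foxs" and rejects "foxes", because plain concatenation knows nothing about spelling changes.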

25 FSA for Portion of Noun Inflectional Morphology

26 Adding the Stems
But it does not express that:
– Reg nouns ending in -s, -z, -sh, -ch, -x take -es (kiss, waltz, bush, rich, box)
– Reg nouns ending in -y preceded by a consonant change the -y to -i
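The two spelling rules listed on this slide can be written down as a small function. This is only a sketch of those two rules (plus the default "add -s"); the name `pluralize` is hypothetical, irregular nouns are not handled, and the -y rule is assumed to be followed by -es as in "butterflies".

```python
import re

def pluralize(noun):
    """Apply the two regular-noun spelling rules, else just add -s."""
    # Nouns ending in -s, -z, -sh, -ch, -x take -es.
    if re.search(r"(s|z|sh|ch|x)$", noun):
        return noun + "es"
    # A final -y preceded by a consonant changes to -i before -es.
    if re.search(r"[^aeiou]y$", noun):
        return noun[:-1] + "ies"
    return noun + "s"

for noun in ["kiss", "waltz", "bush", "rich", "box", "butterfly", "boy", "cat"]:
    print(noun, "->", pluralize(noun))
```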

27 Small Fragment of V and N Derivational Morphology
[noun_i] e.g. hospital
[adj_al] e.g. formal
[adj_ous] e.g. arduous
[verb_j] e.g. speculate
[verb_k] e.g. conserve

28 GOAL2: Morphological Parsing/Generation (vs. Recognition)
Recognition is usually not quite what we need.
– Usually given a word we need to find: the stem and its class and morphological features (parsing)
– Or we have a stem and its class and morphological features and we want to produce the word (production/generation)
Examples (parsing)
– From “cats” to “cat +N +PL”
– From “lies” to ……

29 Computational problems in Morphology
– Recognition: recognize whether a string is an English word (FSA)
– Parsing/Generation: word, stem, class, lexical features …. e.g., lies: lie +N +PL, lie +V +3SG
– Stemming: word, stem ….

30 Finite State Transducers
FSA cannot help….
The simple story
– Add another tape
– Add extra symbols to the transitions
– On one tape we read “cats”, on the other we write “cat +N +PL”

31 FSTs [Figure: FST with the generation and parsing directions labeled]

32 (Simplified) FST formal definition (you can skip 3.4.1 unless you want to work on FST)
– Q: a finite set of states
– I, O: input and output alphabets (which may include ε)
– Σ: a finite alphabet of complex symbols i:o, i ∈ I and o ∈ O
– q0: the start state
– F: a set of accept/final states (F ⊆ Q)
– δ: a transition relation that maps Q × Σ to 2^Q
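The components of this definition map naturally onto a small data structure. The sketch below is one possible encoding, not a standard library API: the class name `FST`, its field names, and the `is_well_formed` check are all hypothetical, and ε is represented by the empty string.

```python
from dataclasses import dataclass

EPS = ""  # stands for ε

@dataclass
class FST:
    states: frozenset           # Q
    input_alphabet: frozenset   # I
    output_alphabet: frozenset  # O
    start: str                  # q0
    finals: frozenset           # F, a subset of Q
    delta: dict                 # maps (state, (i, o)) to a set of next states

    def is_well_formed(self):
        """Check that q0, F and every transition stay inside Q, I and O."""
        if self.start not in self.states or not self.finals <= self.states:
            return False
        for (q, (i, o)), targets in self.delta.items():
            if q not in self.states:
                return False
            if i not in self.input_alphabet or o not in self.output_alphabet:
                return False
            if not set(targets) <= self.states:
                return False
        return True

# A one-arc toy machine: reading +PL on one tape pairs with s on the other.
toy = FST(
    states=frozenset({"q0", "q1"}),
    input_alphabet=frozenset({"+PL", EPS}),
    output_alphabet=frozenset({"s", EPS}),
    start="q0",
    finals=frozenset({"q1"}),
    delta={("q0", ("+PL", "s")): {"q1"}},
)
print(toy.is_well_formed())
```

Because δ maps to a set of states (an element of 2^Q), the same state and symbol pair may lead to several next states, which is what lets a transducer return more than one analysis for an ambiguous word.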

33 FST can be used as…
– Translators: input one string from I, output another from O (or vice versa)
– Recognizers: input a string from I×O
– Generator: output a string from I×O

34 Simple Example
Transitions (as a translator):
– c:c means read a c on one tape and write a c on the other (or vice versa)
– +N:ε means read a +N symbol on one tape and write nothing on the other (or vice versa)
– +PL:s means read +PL and write an s (or vice versa)
Arcs: c:c a:a t:t +N:ε +PL:s +SG:ε
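The arcs of this simple example are enough to sketch a translator that runs in both directions. The state names q0 to q5, the choice of q5 as the only final state, and the function name `translate` are assumptions made for illustration; ε is written as the empty string.

```python
# Arcs as (source, lexical symbol, surface symbol, target); "" stands for ε.
ARCS = [
    ("q0", "c", "c", "q1"),
    ("q1", "a", "a", "q2"),
    ("q2", "t", "t", "q3"),
    ("q3", "+N", "", "q4"),
    ("q4", "+PL", "s", "q5"),
    ("q4", "+SG", "", "q5"),
]
FINALS = {"q5"}

def translate(symbols, parsing=False):
    """Return every output sequence the FST pairs with `symbols`.

    Generation reads the lexical side and writes the surface side;
    parsing swaps the two sides of every arc.
    """
    results = []

    def walk(state, pos, out):
        if pos == len(symbols) and state in FINALS:
            results.append(out)
        for src, lex, surf, dst in ARCS:
            if src != state:
                continue
            read, write = (surf, lex) if parsing else (lex, surf)
            new_out = out + [write] if write else out
            if read == "":
                walk(dst, pos, new_out)        # ε on the read side: consume nothing
            elif pos < len(symbols) and symbols[pos] == read:
                walk(dst, pos + 1, new_out)

    walk("q0", 0, [])
    return results

print(translate(["c", "a", "t", "+N", "+PL"]))   # generation
print(translate(list("cats"), parsing=True))     # parsing
```

The same arc list serves both directions; only the roles of "read" and "write" are exchanged, which is the "or vice versa" in the transition descriptions.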

35 Examples (as a translator)
[Figure: lexical and surface tapes, with generation and parsing as the two directions; labels shown: cats, +N +SG, cat]
Arcs: c:c a:a t:t +N:ε +PL:s +SG:ε

36 Slightly More Complex Example
Transitions (as a translator):
– l:l means read an l on one tape and write an l on the other (or vice versa)
– +N:ε means read a +N symbol on one tape and write nothing on the other (or vice versa)
– +PL:s means read +PL and write an s (or vice versa)
– …
Arcs: l:l i:i e:e +N:ε +PL:s +V:ε +3SG:s
States: q0 q1 q2 q3 q4 q5 q6 q7

37 Examples (as a translator)
[Figure: lexical and surface tapes, with generation and parsing as the two directions; labels shown: lies, +V +3SG, lie]
Arcs: l:l i:i e:e +N:ε +PL:s +V:ε +3SG:s
States: q0 q1 q2 q3 q4 q5 q6 q7

38 Examples (as a recognizer and a generator)
[Figure: lexical and surface tapes; labels shown: lies, +V +3SG, lie]
Arcs: l:l i:i e:e +N:ε +PL:s +V:ε +3SG:s
States: q0 q1 q2 q3 q4 q5 q6 q7
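The "lies" transducer can be sketched both as a recognizer of lexical/surface pairs and as a parser that enumerates analyses. The slides give the arc labels and the state names q0 to q7, but the wiring used below (a noun branch through q4 and q5, a verb branch through q6 and q7, with q5 and q7 final) is an assumption, as are the function names.

```python
# Arcs as (source, lexical symbol, surface symbol, target); "" stands for ε.
# The topology is an assumed reading of the slide's arc labels.
ARCS = [
    ("q0", "l", "l", "q1"),
    ("q1", "i", "i", "q2"),
    ("q2", "e", "e", "q3"),
    ("q3", "+N", "", "q4"),
    ("q4", "+PL", "s", "q5"),
    ("q3", "+V", "", "q6"),
    ("q6", "+3SG", "s", "q7"),
]
FINALS = {"q5", "q7"}

def accepts(lexical, surface, state="q0"):
    """Recognizer: is the pair (lexical symbol list, surface string) in the relation?"""
    if not lexical and not surface and state in FINALS:
        return True
    for src, lex, surf, dst in ARCS:
        if src != state:
            continue
        if lex and (not lexical or lexical[0] != lex):
            continue
        if surf and (not surface or surface[0] != surf):
            continue
        rest_lex = lexical[1:] if lex else lexical
        rest_surf = surface[1:] if surf else surface
        if accepts(rest_lex, rest_surf, dst):
            return True
    return False

def parse(surface, state="q0", out=()):
    """Parser: list every lexical symbol sequence paired with the surface string."""
    results = []
    if not surface and state in FINALS:
        results.append(list(out))
    for src, lex, surf, dst in ARCS:
        if src != state:
            continue
        if surf and not surface.startswith(surf):
            continue
        results += parse(surface[len(surf):], dst, out + ((lex,) if lex else ()))
    return results

print(parse("lies"))
print(accepts(["l", "i", "e", "+V", "+3SG"], "lies"))
```

Parsing "lies" yields two analyses, one through the noun branch and one through the verb branch, which is the ambiguity between lie +N +PL and lie +V +3SG.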

39 Introductions
– Your Name
– Previous experience in NLP?
– Why are you interested in NLP?
– Are you thinking of NLP as your main research area? If not, what else do you want to specialize in….
– Anything else…………

40 Next Time
– Finish FST and morphological analysis
– Porter Stemmer
– Read Chp. 3 up to 3.10 excluded (def. of FST: understand the one on slides) (3.4.1 optional)
– Assignment-1 will be out today (due Sept 21)

