1/11/2016CPSC503 Winter 20101 CPSC 503 Computational Linguistics Lecture 2 Giuseppe Carenini.

Slides:



Advertisements
Similar presentations
CS Morphological Parsing CS Parsing Taking a surface input and analyzing its components and underlying structure Morphological parsing:
Advertisements

Natural Language Processing Lecture 3—9/3/2013 Jim Martin.
Computational Morphology. Morphology S.Ananiadou2 Outline What is morphology? –Word structure –Types of morphological operation – Levels of affixation.
1 Morphology September 2009 Lecture #4. 2 What is Morphology? The study of how words are composed of morphemes (the smallest meaning-bearing units of.
Morphological Analysis Chapter 3. Morphology Morpheme = "minimal meaning-bearing unit in a language" Morphology handles the formation of words by using.
1 Morphology September 4, 2012 Lecture #3. 2 What is Morphology? The study of how words are composed of morphemes (the smallest meaning-bearing units.
Finite-State Transducers Shallow Processing Techniques for NLP Ling570 October 10, 2011.
Statistical NLP: Lecture 3
6/2/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 2 Giuseppe Carenini.
6/9/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 11 Giuseppe Carenini.
CS4705 Natural Language Processing.  Regular Expressions  Finite State Automata ◦ Determinism v. non-determinism ◦ (Weighted) Finite State Transducers.
6/10/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 3 Giuseppe Carenini.
LIN3022 Natural Language Processing Lecture 3 Albert Gatt 1LIN3022 Natural Language Processing.
Midterm Review CS4705 Natural Language Processing.
Computational language: week 9 Finish finite state machines FSA’s for modelling word structure Declarative language models knowledge representation and.
Computational Language Finite State Machines and Regular Expressions.
1 Morphological analysis LING 570 Fei Xia Week 4: 10/15/07 TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A A.
Learning Bit by Bit Class 3 – Stemming and Tokenization.
تمرين شماره 1 درس NLP سيلابس درس NLP در دانشگاه هاي ديگر ___________________________ راحله مکي استاد درس: دکتر عبدالله زاده پاييز 85.
CS 4705 Morphology: Words and their Parts CS 4705 Julia Hirschberg.
1 CSC 594 Topics in AI – Applied Natural Language Processing Fall 2009/ Some Basic Concepts: Morphology.
CPSC 503 Computational Linguistics
Finite State Transducers The machine model we will study for morphological parsing is called the finite state transducer (FST) An FST has two tapes –input.
CS 4705 Morphology: Words and their Parts CS 4705 Julia Hirschberg.
Introduction to English Morphology Finite State Transducers
Morphology and Finite-State Transducers. Why this chapter? Hunting for singular or plural of the word ‘woodchunks’ was easy, isn’t it? Lets consider words.
9/8/20151 Natural Language Processing Lecture Notes 1.
Morphology: Words and their Parts CS 4705 Slides adapted from Jurafsky, Martin Hirschberg and Dorr.
Lecture 3, 7/27/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 4 28 July 2005.
October 2006Advanced Topics in NLP1 CSA3050: NLP Algorithms Finite State Transducers for Morphological Parsing.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2007 Lecture4 1 August 2007.
Morphological Recognition We take each sub-lexicon of each stem class and we expand each arc (e.g. the reg-noun arc) with all the morphemes that make up.
Introduction Morphology is the study of the way words are built from smaller units: morphemes un-believe-able-ly Two broad classes of morphemes: stems.
Computational Linguistics Yoad Winter *General overview *Examples: Transducers; Stanford Parser; Google Translate; Word-Sense Disambiguation * Finite State.
10/8/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 2 Giuseppe Carenini.
Session 11 Morphology and Finite State Transducers Introduction to Speech Natural and Language Processing (KOM422 ) Credits: 3(3-0)
10/12/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 10 Giuseppe Carenini.
10/24/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 6 Giuseppe Carenini.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2007 Lecture 3 27 July 2007.
Morphological Analysis Chapter 3. Morphology Morpheme = "minimal meaning-bearing unit in a language" Morphology handles the formation of words by using.
10/30/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 7 Giuseppe Carenini.
CS 4705 Lecture 3 Morphology. What is morphology? The study of how words are composed of morphemes (the smallest meaning-bearing units of a language)
October 2005CSA3180 NLP1 CSA3180 Natural Language Processing Introduction and Course Overview.
CSA3050: Natural Language Algorithms Finite State Devices.
CPSC 503 Computational Linguistics
CPSC 422, Lecture 27Slide 1 Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 27 Nov, 16, 2015.
October 2004CSA3050 NLP Algorithms1 CSA3050: Natural Language Algorithms Morphological Parsing.
2/29/2016CPSC503 Winter CPSC 503 Computational Linguistics Lecture 5 Giuseppe Carenini.
Two Level Morphology Alexander Fraser & Liane Guillou CIS, Ludwig-Maximilians-Universität München Computational Morphology.
CIS, Ludwig-Maximilians-Universität München Computational Morphology
CPSC 503 Computational Linguistics
Lecture 7 Summary Survey of English morphology
Speech and Language Processing
CPSC 503 Computational Linguistics
CPSC 503 Computational Linguistics
Natural Language Processing
CSCI 5832 Natural Language Processing
Speech and Language Processing
CSCI 5832 Natural Language Processing
LING 138/238 SYMBSYS 138 Intro to Computer Speech and Language Processing Dan Jurafsky 11/24/2018 LING 138/238 Autumn 2004.
By Mugdha Bapat Under the guidance of Prof. Pushpak Bhattacharyya
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 26
CS4705 Natural Language Processing
CPSC 503 Computational Linguistics
CPSC 503 Computational Linguistics
CPSC 503 Computational Linguistics
CPSC 503 Computational Linguistics
CPSC 503 Computational Linguistics
Morphological Parsing
Presentation transcript:

1/11/2016CPSC503 Winter CPSC 503 Computational Linguistics Lecture 2 Giuseppe Carenini

1/11/2016CPSC503 Winter Today Sep 14 Subscribe to mailing list cpsc503 (majordomo) Questionnaire Brief check of some background knowledge (& annotated corpora) English Morphology FSA and Morphology Start: Finite State Transducers (FST) and Morphological Parsing/Gen.

1/11/2016CPSC503 Winter Finite state machines Regular Expressions & Finite State Automata 6.7 Finite State Transducers 2.0 Hidden-Markov Models 4.2 Basic Probability, Bayesian Statistics and Information Theory Conditional Probability Programming 7.2Java Bayesian Networks Python Entropy Dynamic Programming Machine Learning5.7 Supervised Classification (e.g., Decision Trees)Search Algorithms Unsupervised Learning (e.g., clustering) Linguistics Richer Formalisms Context-Free Grammar 4.3 First-Order Logics 5.4

1/11/2016CPSC503 Winter Today Sep 14 Brief check of some background knowledge English Morphology FSA and Morphology Start: Finite State Transducers (FST) and Morphological Parsing/Gen.

1/11/2016CPSC503 Winter Knowledge-Formalisms Map (including probabilistic formalisms) Logical formalisms (First-Order Logics, Prob. Logics) Rule systems (and prob. versions) (e.g., (Prob.) Context-Free Grammars) State Machines (and prob. versions) (Finite State Automata,Finite State Transducers, Markov Models) Morphology Syntax Pragmatics Discourse and Dialogue Semantics AI planners (MDP Markov Decision Processes)

1/11/2016CPSC503 Winter Next Two Lectures State Machines (no prob.) Finite State Automata (and Regular Expressions) Finite State Transducers (English) Morphology Logical formalisms (First-Order Logics) Rule systems (and prob. version) (e.g., (Prob.) Context-Free Grammars) Syntax Pragmatics Discourse and Dialogue Semantics AI planners

1/11/2016CPSC503 Winter ?? baaa !\ baba !\

1/11/2016CPSC503 Winter ?? /CPSC50[34]/ /^([Ff]rom\b|[Ss]ubject\b|[Dd]ate\b)/ /[0-9]+(\.[0-9]+){3}/

1/11/2016CPSC503 Winter Fundamental Relations FSA Regular Expressions Many Linguistic Phenomena model implement (generate and recognize) describe

1/11/2016CPSC503 Winter Second Usage of RegExp: Text Searching/Editing Find me all instances of the determiner “the” in an English text. –To count them –To substitute them with something else You try: /the/ /[tT]he//\bthe\b/ /\b[tT]he\b/ The other cop went to the bank but there were no people there. s/\b([tT]he|[Aa]n?)\b/DET/

Annotated Corpora Example The CoNLL corpora provide chunk structures, which are encoded as flat trees. The CoNLL 2000 Corpus includes ***phrasal chunks*** The CoNLL 2002 Corpus includes ***named entity chunks***. owto/corpus.html 1/11/2016CPSC503 Winter

1/11/2016CPSC503 Winter Next Two Lectures State Machines (no prob.) Finite State Automata (and Regular Expressions) Finite State Transducers (English) Morphology Logical formalisms (First-Order Logics) Rule systems (and prob. version) (e.g., (Prob.) Context-Free Grammars) Syntax Pragmatics Discourse and Dialogue Semantics AI planners

1/11/2016CPSC503 Winter English Morphology We can usefully divide morphemes into two classes –Stems: The core meaning bearing units –Affixes: Bits and pieces that adhere to stems to change their meanings and grammatical functions Def. The study of how words are formed from minimal meaning-bearing units (morphemes) Examples: unhappily, ……………

1/11/2016CPSC503 Winter Word Classes For now word classes: nouns, verbs, adjectives and adverbs. We’ll go into the gory details in Ch 5 Word class determines to a large degree the way that stems and affixes combine

1/11/2016CPSC503 Winter English Morphology We can also divide morphology up into two broad classes –Inflectional –Derivational

1/11/2016CPSC503 Winter Inflectional Morphology The resulting word: –Has the same word class as the original –Serves a grammatical/semantic purpose different from the original

1/11/2016CPSC503 Winter Nouns, Verbs and Adjectives (English) Nouns are simple (not really) –Markers for plural and possessive Verbs are only slightly more complex –Markers appropriate to the tense of the verb and to the person Adjectives –Markers for comparative and superlative

1/11/2016CPSC503 Winter Regulars and Irregulars Some words misbehave (refuse to follow the rules) –Mouse/mice, goose/geese, ox/oxen –Go/went, fly/flew Regulars… –Walk, walks, walking, walked, walked Irregulars –Eat, eats, eating, ate, eaten –Catch, catches, catching, caught, caught –Cut, cuts, cutting, cut, cut

1/11/2016CPSC503 Winter Derivational Morphology Derivational morphology is the messy stuff that no one ever taught you. –Changes of word class –Less Productive ( -ant V -> N only with V of Latin origin!)

1/11/2016CPSC503 Winter Derivational Examples Verb/Adj to Noun -ationcomputerizecomputerization -eeappointappointee -erkillkiller -nessfuzzyfuzziness

1/11/2016CPSC503 Winter Derivational Examples Noun/Verb to Adj -alComputationComputational -ableEmbraceEmbraceable -lessClueClueless

1/11/2016CPSC503 Winter Compute Many paths are possible… Start with compute –Computer -> computerize -> computerization –Computation -> computational –Computer -> computerize -> computerizable –Compute -> computee

1/11/2016CPSC503 Winter Summary State Machines (no prob.) Finite State Automata (and Regular Expressions) Finite State Transducers (English) Morphology Logical formalisms (First-Order Logics) Rule systems (and prob. version) (e.g., (Prob.) Context-Free Grammars) Syntax Pragmatics Discourse and Dialogue Semantics AI planners

1/11/2016CPSC503 Winter FSAs and Morphology GOAL1: recognize whether a string is an English word PLAN: 1.First we’ll capture the morphotactics (the rules governing the ordering of affixes in a language) 2.Then we’ll add in the actual stems

1/11/2016CPSC503 Winter FSA for Portion of Noun Inflectional Morphology

1/11/2016CPSC503 Winter Adding the Stems But it does not express that: Reg nouns ending in –s, -z, -sh, -ch, -x -> es (kiss, waltz, bush, rich, box) Reg nouns ending –y preceded by a consonant change the –y to -i

1/11/2016CPSC503 Winter Small Fragment of V and N Derivational Morphology [noun i ] eg. hospital [adj al ] eg. formal [adj ous ] eg. arduous [verb j ] eg. speculate [verb k ] eg. conserve

1/11/2016CPSC503 Winter GOAL2: Morphological Parsing/Generation (vs. Recognition) Recognition is usually not quite what we need. –Usually given a word we need to find: the stem and its class and morphological features (parsing) –Or we have a stem and its class and morphological features and we want to produce the word (production/generation) Examples (parsing) –From “ cats” to “ cat +N +PL” –From “lies” to ……

1/11/2016CPSC503 Winter Computational problems in Morphology Recognition: recognize whether a string is an English word (FSA) Parsing/Generation: word stem, class, lexical features …. lies lie +N +PL lie +V +3SG Stemming: word stem …. e.g.,

1/11/2016CPSC503 Winter Finite State Transducers FSA cannot help…. The simple story –Add another tape –Add extra symbols to the transitions –On one tape we read “ cats ”, on the other we write “ cat +N +PL ”

1/11/2016CPSC503 Winter FSTs generationparsing

1/11/2016CPSC503 Winter (Simplified) FST formal definition (you can skip unless you want to work on FST) Q: a finite set of states I,O: input and an output alphabets (which may include ε) Σ: a finite alphabet of complex symbols i:o, i  I and o  O Q 0: the start state F: a set of accept/final states (F  Q) A transition relation δ that maps QxΣ to 2 Q

1/11/2016CPSC503 Winter FST can be used as… Translators: input one string from I, output another from O (or vice versa) Recognizers: input a string from IxO Generator: output a string from IxO

1/11/2016CPSC503 Winter Simple Example Transitions (as a translator): c:c means read a c on one tape and write a c on the other (or vice versa) +N:ε means read a +N symbol on one tape and write nothing on the other (or vice versa) +PL:s means read +PL and write an s (or vice versa) c:ca:at:t +N:ε +PL:s +SG: ε

Examples (as a translator) cats +N +SG cat lexical surface generation parsing c:ca:at:t +N:ε +PL:s +SG: ε 1/11/ CPSC503 Winter 2010

1/11/2016CPSC503 Winter Slightly More complex Example Transitions (as a translator): l:l means read an l on one tape and write an l on the other (or vice versa) +N:ε means read a +N symbol on one tape and write nothing on the other (or vice versa) +PL:s means read +PL and write an s (or vice versa) … +3SG:s l:li:ie:e +N:ε +PL:s +V:ε q1q1 q0q0 q2q2 q3q3 q4q4 q5q5 q6q6 q7q7

Examples (as a translator) lies +V+3SGlie lexical surface generation parsing +3SG:s l:li:ie:e +N:ε +PL:s +V:ε q1q1 q0q0 q2q2 q3q3 q4q4 q5q5 q6q6 q7q7 1/11/ CPSC503 Winter 2010

Examples (as a recognizer and a generator) lies +V+3SGlie lexical surface +3SG:s l:li:ie:e +N:ε +PL:s +V:ε q1q1 q0q0 q2q2 q3q3 q4q4 q5q5 q6q6 q7q7 1/11/201638CPSC503 Winter 2010

1/11/2016CPSC 503 – Winter Introductions Your Name Previous experience in NLP? Why are you interested in NLP? Are you thinking of NLP as your main research area? If not, what else do you want to specialize in…. Anything else…………

1/11/2016CPSC503 Winter Next Time Finish FST and morphological analysis Porter Stemmer Read Chp. 3 up to 3.10 excluded (def. of FST: understand the one on slides) (3.4.1 optional) Assignment-1 will be out today (due Sept21)