CS60057 Speech & Natural Language Processing, Autumn 2005. Lecture 3, 27 July 2005.



2 The Description of Language
Language = Words and Rules, that is, Dictionary (vocabulary) + Grammar
Dictionary: the set of words defined in the language; open (dynamic)
  - Traditional: paper based
  - Electronic: machine-readable dictionaries; can be obtained from paper-based ones
Grammar: the set of rules which describe what is allowable in a language
Classic grammars
  - meant for humans who know the language
  - definitions and rules are mainly supported by examples
  - no (or almost no) formal description tools; cannot be programmed
Explicit grammars (CFG, Dependency Grammars, Link Grammars, ...)
  - formal description
  - can be programmed and tested on data (texts)

3 Levels of (Formal) Description
6 basic levels (more or less explicitly present in most theories), listed from highest to lowest:
  - and beyond (pragmatics/logic/...)
  - meaning (semantics)
  - (surface) syntax
  - morphology
  - phonology
  - phonetics/orthography
Each level has an input and an output representation
  - the output from one level is the input to the next (upper) level
  - sometimes levels might be skipped (merged) or split

4 Phonetics/Orthography
Input: acoustic signal (phonetics) / text (orthography)
Output: phonetic alphabet (phonetics) / text (orthography)
Deals with:
  - Phonetics:
    - consonant and vowel (and other) formation in the vocal tract
    - classification of consonants, vowels, ... in relation to frequencies, shape and position of the tongue and various muscles
    - intonation
  - Orthography: normalization, punctuation, etc.

5 Phonology
Input: sequence of phones/sounds (in a phonetic alphabet), or “normalized” text (sequence of (surface) letters in one language’s alphabet) [NB: phones vs. phonemes]
Output: sequence of phonemes (~ (lexical) letters; in an abstract alphabet)
Deals with:
  - the relation between sounds and phonemes (units which might have some function on the upper level)
  - e.g.: [u] ~ oo (as in book), [æ] ~ a (cat); i ~ y (flies)

6 Morphology
Input: sequence of phonemes (~ (lexical) letters)
Output: sequence of pairs (lemma, (morphological) tag)
Deals with:
  - composition of phonemes into word forms and their underlying lemmas (lexical units) + morphological categories (inflection, derivation, compounding)
  - e.g. quotations ~ quote/V + -ation (der. V->N) + NNS

7 (Surface) Syntax
Input: sequence of pairs (lemma, (morphological) tag)
Output: sentence structure (tree) with annotated nodes (all lemmas, (morphosyntactic) tags, functions), of various forms
Deals with:
  - the relation between lemmas and morphological categories and the sentence structure
  - uses syntactic categories such as Subject, Verb, Object, ...
  - e.g.: I/PP1 see/VB a/DT dog/NN ~ ((I/sg)SB ((see/pres)V (a/ind dog/sg)OBJ)VP)S

8 Meaning (semantics)
Input: sentence structure (tree) with annotated nodes (lemmas, (morphosyntactic) tags, surface functions)
Output: sentence structure (tree) with annotated nodes (semantic lemmas, (morphosyntactic) tags, deep functions)
Deals with:
  - the relation between categories such as “Subject”, “Object” and (deep) categories such as “Agent”, “Effect”; adds other categories
  - e.g. ((I)SB ((was seen)V (by Tom)OBJ)VP)S ~ (I/Sg/Pat/t (see/Perf/Pred/t) Tom/Sg/Ag/f)

9 ... and Beyond
Input: sentence structure (tree) with annotated nodes (autosemantic lemmas, (morphosyntactic) tags, deep functions)
Output: logical form, which can be evaluated (true/false)
Deals with:
  - assignment of objects from the real world to the nodes of the sentence structure
  - e.g.: (I/Sg/Pat/t (see/Perf/Pred/t) Tom/Sg/Ag/f) ~ see(Mark-Twain[SSN:...], Tom-Sawyer[SSN:...]) [Time: bef 99/9/27/14:15] [Place: 39°19’40”N 76°37’10”W]

10 Morphology
Morphology is the study of the ways that words are built up from smaller meaningful units called morphemes (morph = shape, logos = word).
We can usefully divide morphemes into two classes:
  - Stems: the core meaning-bearing units
  - Affixes: bits and pieces that adhere to stems to change their meanings and grammatical functions
    - Prefixes: un-, anti-, etc.
    - Suffixes: -ity, -ation, etc.
    - Infixes are inserted inside the stem. Tagalog: um + hingi → humingi
    - Circumfixes precede and follow the stem
English does not stack many affixes, but Turkish can have words with a lot of suffixes. Languages such as Turkish, which tend to string affixes together, are called agglutinative languages.
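The affix types above can be sketched as simple string operations. This is a minimal illustration, not a general morphological model: the function names are hypothetical, and the placement rule for the Tagalog infix (insert -um- before the first vowel of the stem) is an assumption added here, since the slide gives only the single example hingi.

```python
VOWELS = set("aeiou")

def add_prefix(prefix: str, stem: str) -> str:
    """Prefixation: the affix precedes the stem (un- + do)."""
    return prefix + stem

def add_suffix(stem: str, suffix: str) -> str:
    """Suffixation: the affix follows the stem (friend + -ly)."""
    return stem + suffix

def add_um_infix(stem: str) -> str:
    """Infixation: insert -um- before the first vowel of the stem.

    The placement rule is an assumption for this sketch.
    """
    for i, ch in enumerate(stem):
        if ch in VOWELS:
            return stem[:i] + "um" + stem[i:]
    # No vowel found: fall back to plain prefixing (assumption).
    return "um" + stem

print(add_prefix("un", "do"))
print(add_suffix("friend", "ly"))
print(add_um_infix("hingi"))
```

Prefixes and suffixes only concatenate at an edge, while an infix needs to look inside the stem, which is why it gets its own rule.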

11 Surface and Lexical Forms
The surface level of a word represents the actual spelling of that word.
  - geliyorum, eats, cats, kitabım
The lexical level of a word represents a simple concatenation of the morphemes making up that word.
  - gel +PROG +1SG
  - eat +AOR
  - cat +PLU
  - kitap +P1SG
Morphological processors try to find correspondences between the lexical and surface forms of words.
  - Morphological recognition/analysis: surface to lexical
  - Morphological generation/synthesis: lexical to surface
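The two directions (analysis and generation) can be illustrated with a toy table built from the four example words on this slide. A real processor would compute the correspondence with rules rather than store it; the table and the function names `analyze` and `generate` are hypothetical and only show that the two tasks are inverses of each other.

```python
# Lexical form (morpheme sequence) -> surface form, from the slide's examples.
LEXICAL_TO_SURFACE = {
    ("gel", "+PROG", "+1SG"): "geliyorum",
    ("eat", "+AOR"): "eats",
    ("cat", "+PLU"): "cats",
    ("kitap", "+P1SG"): "kitabım",
}

# Invert the table once to support the opposite direction.
SURFACE_TO_LEXICAL = {surface: lexical
                      for lexical, surface in LEXICAL_TO_SURFACE.items()}

def generate(lexical: tuple) -> str:
    """Morphological generation/synthesis: lexical to surface."""
    return LEXICAL_TO_SURFACE[lexical]

def analyze(surface: str) -> tuple:
    """Morphological recognition/analysis: surface to lexical."""
    return SURFACE_TO_LEXICAL[surface]

print(analyze("kitabım"))
print(generate(("gel", "+PROG", "+1SG")))
```

Note that the surface form kitabım shows a stem change (kitap becomes kitab-) that the lexical form hides, which is exactly the kind of correspondence a morphological processor has to capture.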

12 Morphology: Morphemes & Order
Handles what is an isolated form in written text
Grouping of phonemes into morphemes
  - sequence deliverables → deliver, able and s (3 units)
Morpheme combination
  - certain combinations/sequencings are possible, others are not:
    - deliver+able+s, but not able+deliver+s; noun+s, but not noun+ing
  - typically fixed (in any given language)
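Ordering constraints of this kind can be checked with a small finite-state acceptor over morpheme classes. The sketch below is an assumption-laden toy: the class names, the state names, the noun stem "cat", and the choice to let deliver+able continue like a noun are all invented for illustration.

```python
# Map each known morpheme to a class (toy inventory, hypothetical).
MORPHEME_CLASS = {
    "deliver": "VERB_STEM",
    "cat": "NOUN_STEM",
    "able": "ABLE",
    "s": "PLURAL",
    "ing": "ING",
}

# (current state, morpheme class) -> next state
TRANSITIONS = {
    ("START", "VERB_STEM"): "VERB",
    ("START", "NOUN_STEM"): "NOUN",
    ("VERB", "ABLE"): "NOUN",   # toy simplification: deliver+able takes -s
    ("VERB", "ING"): "END",
    ("NOUN", "PLURAL"): "END",
}
ACCEPTING = {"VERB", "NOUN", "END"}

def is_valid_sequence(morphemes):
    """Return True if the morpheme sequence follows an allowed order."""
    state = "START"
    for morpheme in morphemes:
        cls = MORPHEME_CLASS.get(morpheme)
        state = TRANSITIONS.get((state, cls))
        if state is None:        # no transition: ordering is not allowed
            return False
    return state in ACCEPTING

print(is_valid_sequence(["deliver", "able", "s"]))
print(is_valid_sequence(["cat", "ing"]))
```

A sequence that starts with an affix is rejected at the first step because no transition leaves the start state on an affix class.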

13 Inflectional & Derivational Morphology
We can also divide morphology up into two broad classes:
  - Inflectional
  - Derivational
Inflectional morphology concerns the combination of stems and affixes where the resulting word
  - has the same word class as the original
  - serves a grammatical/semantic purpose different from the original
After combination with an inflectional morpheme, the meaning and class of the actual stem usually do not change.
  - eat / eats, pencil / pencils
After combination with a derivational morpheme, the meaning and the class of the actual stem usually change.
  - compute / computer, do / undo, friend / friendly
  - uygar / uygarlaş, kapı / kapıcı
Irregular changes may happen with derivational affixes.

14 Morphological Parsing
Morphological parsing finds the lexical form of a word from its surface form.
  - cats -- cat +N +PLU
  - cat -- cat +N +SG
  - goose -- goose +N +SG or goose +V
  - geese -- goose +N +PLU
  - gooses -- goose +V +3SG
  - catch -- catch +V
  - caught -- catch +V +PAST or catch +V +PP
A given word can have more than one lexical-level representation (ambiguity).
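A minimal sketch of such a parser, covering only the words listed on this slide: it combines a stem lexicon, a table of irregular forms, and one suffix rule for -s. The data structures and the name `parse` are hypothetical; the point is that the result is a list, because a surface form may have several analyses.

```python
from typing import Dict, List

# Stem lexicon: stem -> possible word classes (toy data from the slide).
STEMS: Dict[str, List[str]] = {
    "cat": ["N"],
    "goose": ["N", "V"],
    "catch": ["V"],
}

# Irregular surface forms are listed whole.
IRREGULAR_FORMS: Dict[str, List[str]] = {
    "geese": ["goose +N +PLU"],
    "caught": ["catch +V +PAST", "catch +V +PP"],
}

# Nouns whose plural is irregular must not take the regular -s plural.
NOUNS_WITH_IRREGULAR_PLURAL = {"goose"}

def parse(surface: str) -> List[str]:
    """Return every lexical analysis of a surface form (possibly none)."""
    if surface in IRREGULAR_FORMS:
        return list(IRREGULAR_FORMS[surface])
    analyses = []
    # Bare stem: singular noun or uninflected verb.
    for pos in STEMS.get(surface, []):
        analyses.append(f"{surface} +N +SG" if pos == "N" else f"{surface} +V")
    # Stem + s: plural noun or third-person singular verb.
    if surface.endswith("s"):
        stem = surface[:-1]
        for pos in STEMS.get(stem, []):
            if pos == "N" and stem not in NOUNS_WITH_IRREGULAR_PLURAL:
                analyses.append(f"{stem} +N +PLU")
            elif pos == "V":
                analyses.append(f"{stem} +V +3SG")
    return analyses

for word in ["cats", "goose", "gooses", "caught"]:
    print(word, "--", " or ".join(parse(word)))
```

The sketch ignores spelling changes at morpheme boundaries, so it would not handle a form such as catches.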

15 Morphological Analysis
Analyzing words into their linguistic components (morphemes). Morphemes are the smallest meaningful units of language.
  - cars → car +PLU
  - giving → give +PROG
  - AsachhilAma → AsA +PROG +PAST +1st (I/we was/were coming)
Ambiguity: more than one alternative
  - flies → fly VERB +3SG, or fly NOUN +PLU
  - mAtAla kare

16 Fly + s → flys → flies (y → i rule)
Duckling
Go-getter → get + er
Doer → do + er
Beer → ?
What knowledge do we need? How do we represent it? How do we compute with it?

17 Knowledge needed
Knowledge of stems or roots
  - Duck is a possible root, not duckl
  - We need a dictionary (lexicon)
Only some endings go on some words
  - Do + er: ok
  - Be + er: not ok
In addition, spelling change rules that adjust the surface form
  - Get + er – double the t – getter
  - Fox + s – insert e – foxes
  - Fly + s – insert e – flys – y to i – flies
  - Chase + ed – drop e – chased
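The four spelling-change rules can be sketched as ordered rewrite checks applied when a suffix is attached. This is a rough approximation under stated assumptions: the regular expressions are simplified guesses at the conditions (for example, consonant doubling here ignores stress, so it would wrongly double in a word like visit + er), and the function name `attach` is hypothetical.

```python
import re

VOWELS = "aeiou"

def attach(stem: str, suffix: str) -> str:
    """Join stem and suffix, applying toy English spelling rules."""
    if not suffix:
        return stem
    # Fly + s: consonant + y becomes ie before -s (flies).
    if suffix == "s" and re.search(r"[^aeiou]y$", stem):
        return stem[:-1] + "ies"
    # Fox + s: insert e after a sibilant ending (foxes).
    if suffix == "s" and re.search(r"(s|x|z|ch|sh)$", stem):
        return stem + "es"
    # Chase + ed: drop stem-final e before a vowel-initial suffix (chased).
    if stem.endswith("e") and suffix[0] in VOWELS:
        return stem[:-1] + suffix
    # Get + er: double the final consonant of a consonant-vowel-consonant
    # ending before a vowel-initial suffix (getter).
    if suffix[0] in VOWELS and re.search(r"[^aeiou][aeiou][^aeiouwxy]$", stem):
        return stem + stem[-1] + suffix
    # Default: plain concatenation (Do + er).
    return stem + suffix

for stem, suffix in [("get", "er"), ("fox", "s"), ("fly", "s"),
                     ("chase", "ed"), ("do", "er")]:
    print(f"{stem} + {suffix} -> {attach(stem, suffix)}")
```

The rules only adjust spelling. Deciding that Be + er is not a valid combination is a separate job for the lexicon and the ordering constraints.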

18 Put all this in a big dictionary (lexicon)
  - Turkish: approx. $600 \times 10^{6}$ forms
  - Finnish: $10^{7}$
  - Hindi, Bengali, Telugu, Tamil?
Besides, novel forms can always be constructed
  - Anti-missile
  - Anti-anti-missile
  - Anti-anti-anti-missile
  - ...
Compounding of words: Sanskrit, German
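The anti-missile series shows why a finite list of full forms cannot be complete, while a rule can cover every member of the series. A minimal sketch, assuming a one-word lexicon and a single productive prefix (both hypothetical, as is the function name `analyze`):

```python
from typing import List, Optional, Tuple

LEXICON = {"missile"}     # toy stem lexicon
PREFIXES = ("anti-",)     # productive prefixes that may repeat

def analyze(word: str) -> Optional[Tuple[List[str], str]]:
    """Strip any number of known prefixes, then look the rest up.

    Returns (prefixes, stem), or None if the remainder is not in the lexicon.
    """
    prefixes: List[str] = []
    rest = word
    while True:
        match = next((p for p in PREFIXES if rest.startswith(p)), None)
        if match is None:
            break
        prefixes.append(match)
        rest = rest[len(match):]
    return (prefixes, rest) if rest in LEXICON else None

print(analyze("anti-missile"))
print(analyze("anti-anti-anti-missile"))
```

The lexicon stays at one entry no matter how many times the prefix is stacked, whereas a full-form dictionary would need a new entry for each form.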

19 Morphology: From Morphemes to Lemmas & Categories
Lemma: lexical unit, “pointer” to the lexicon
  - typically represented as the “base form”, or “dictionary headword”
  - possibly indexed when ambiguous/polysemous: state_1 (verb), state_2 (state-of-the-art), state_3 (government)
  - built from one or more morphemes (“root”, “stem”, “root+derivation”, ...)
Categories: non-lexical
  - small number of possible values (< 100, often < 5-10)

20 Morphology Level: The Mapping
Formally: $A^{+} \to 2^{(L, C_1, C_2, \ldots, C_n)}$
  - $A$ is the alphabet of phonemes ($A^{+}$ denotes any non-empty sequence of phonemes)
  - $L$ is the set of possible lemmas, uniquely identified
  - $C_i$ are morphological categories, such as:
    - grammatical number, gender, case
    - person, tense, negation, degree of comparison, voice, aspect, ...
    - tone, politeness, ...
    - part of speech (not quite a morphological category, but ...)
  - $A$, $L$ and $C_i$ are obviously language-dependent
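As a worked instance of this mapping, take the word caught from the parsing examples on slide 14. The sketch assumes only two categories, with $C_1$ as part of speech and $C_2$ as verb form, and it reads the exponent as the set of tuples in $L \times C_1 \times C_2$. Both the choice of categories and that reading are assumptions made here for illustration.

```latex
\[
  \textit{caught} \;\mapsto\;
  \{\, (\textit{catch}, \mathrm{V}, \mathrm{PAST}),\;
       (\textit{catch}, \mathrm{V}, \mathrm{PP}) \,\}
  \;\subseteq\; L \times C_1 \times C_2
\]
```

The value is a set, not a single tuple, which is how the mapping accommodates ambiguity: an unambiguous form maps to a one-element set.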

21 Morphological Analysis (cont.)
Relatively simple for English, but for many Indian languages it may be more difficult.
Examples: inflectional and derivational morphology.
Common tools: finite-state transducers
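To give a feel for the tool named above, here is a minimal sketch of a deterministic finite-state transducer that maps a lexical symbol sequence to a surface string. It is hand-built for the single stem cat and is not how production toolkits represent transducers; the state names, the transition table and the function `transduce` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (state, input symbol) -> (next state, output string)
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("q0", "c"): ("q1", "c"),
    ("q1", "a"): ("q2", "a"),
    ("q2", "t"): ("q3", "t"),
    ("q3", "+N"): ("q4", ""),      # the category tag has no surface output
    ("q4", "+SG"): ("q5", ""),     # singular: nothing is added
    ("q4", "+PLU"): ("q5", "s"),   # plural: emit the suffix -s
}
FINAL_STATES = {"q5"}

def transduce(symbols: List[str]) -> Optional[str]:
    """Run the transducer; return the output string or None if rejected."""
    state = "q0"
    output: List[str] = []
    for symbol in symbols:
        step = TRANSITIONS.get((state, symbol))
        if step is None:           # no transition for this input symbol
            return None
        state, piece = step
        output.append(piece)
    return "".join(output) if state in FINAL_STATES else None

print(transduce(["c", "a", "t", "+N", "+PLU"]))
print(transduce(["c", "a", "t", "+N", "+SG"]))
```

This sketch runs in the generation direction only. Swapping the input and output of each transition gives the analysis direction, although the inverted machine is generally nondeterministic and needs a search over alternatives.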