CS 4705 Morphology: Words and their Parts

Basic Uses of Morphology
The study of how words are composed from smaller, meaning-bearing units (morphemes)
Applications:
–Spelling correction: referece → reference
–Hyphenation algorithms: refer-ence
–Part-of-speech analysis: googler
–Text-to-speech: grapheme-to-phoneme conversion, e.g. hothouse (is th /T/ or /D/? Neither: the word is hot + house)

–Speech recognition: phoneme-to-grapheme conversion
–Amusing poetry and artificial languages in standardized tests: ‘Twas brillig and the slithy toves… / Muggles moogled migwiches

What is a word?
In formal languages, words are arbitrary strings
In natural languages, words are made up of meaningful subunits called morphemes
–Allows for productivity: googled, texted
–Morphemes are abstract concepts denoting entities or relationships in the world: roots + syntactic or grammatical elements
–Realizations of morphemes are morphs: door realizes door; take and took both realize take

Allomorphs are classes of related morphs that realize a given morpheme
–Allomorphs of s include en, men, es in English
–Take and took are allomorphs of take
–Sum: Morpheme [s] is realized by an allomorph class that includes the related morphs {en, men, es}
–Syntactic or grammatical morphemes can convey many things
–In Italian, they mark nouns for gender and number:

          Singular    Plural
  Masc    pomodoro    pomodori
  Fem     cipolla     cipolle

–pomodor-, cipoll-: stems, which may or may not occur on their own as words
–Stem may not occur as a word: derivative/deriv
–Base form (lemma) occurs as a word: derivative/derive
–Sometimes the same: cars has stem ‘car’ and base form (lemma) ‘car’ too
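The following minimal Python sketch illustrates the Italian example. It covers only the two regular noun classes shown (-o/-i and -a/-e). `ENDINGS` and `inflect_noun` are illustrative names and do not come from the slides. Each (gender, number) pair selects one ending, which is attached to a bound stem such as pomodor- or cipoll-:

```python
# Endings for the two regular Italian noun classes in the slide's example.
# Each (gender, number) combination is realized by a single ending.
ENDINGS = {
    ("masc", "sg"): "o",
    ("masc", "pl"): "i",
    ("fem", "sg"): "a",
    ("fem", "pl"): "e",
}


def inflect_noun(stem: str, gender: str, number: str) -> str:
    """Attach the ending for (gender, number) to a bound stem such as 'pomodor'."""
    try:
        return stem + ENDINGS[(gender, number)]
    except KeyError:
        raise ValueError(f"unsupported features: {gender}, {number}") from None


for stem, gender in [("pomodor", "masc"), ("cipoll", "fem")]:
    print(stem + "-", inflect_noun(stem, gender, "sg"), inflect_noun(stem, gender, "pl"))
```

Nouns outside these two classes need their own ending table. For example, fiore/fiori takes singular -e and plural -i.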

What useful information does morphology give us?
Different things in different languages
–Spanish: hablo, hablaré / English: I speak, I will speak
–English: book, books / Japanese: hon, hon
Languages differ in how they encode morphological information
–Isolating languages (e.g. Cantonese) have no affixes: each word usually has 1 morpheme
–Agglutinative languages (e.g. Finnish, Turkish) build words by adding prefixes and suffixes to a stem (like beads on a string), with each feature realized by a single affix, e.g. Finnish

epäjärjestelmällistyttämättömyydellänsäkäänköhän ‘Wonder if he can also... with his capability of not causing things to be unsystematic’
–Inflectional (fusional) languages (e.g. English) merge different features into a single affix (e.g. -s in likes indicates person, number, and tense), and the same feature can be realized by different affixes
–Polysynthetic languages (e.g. Inuit languages) express much of their syntax in their morphology, incorporating a verb’s arguments into the verb, e.g. Western Greenlandic:
  Aliikusersuillammassuaanerartassagaluarpaalli.
  aliiku-sersu-i-llammas-sua-a-nerar-ta-ssa-galuar-paal-li
  entertainment-provide-SEMITRANS-one.good.at-COP-say.that-REP-FUT-sure.but-3.PL.SUBJ/3SG.OBJ-but
  ‘However, they will say that he is a great entertainer, but...’
–So different languages may require very different morphological analyzers

Morphology Can Help Define Word Classes
AKA morphological classes, parts-of-speech
Closed vs. open (function vs. content) class words
–Closed: pronoun, preposition, conjunction, determiner,…
–Open: noun, verb, adverb, adjective,…
Identifying word classes is useful for almost any task in NLP, from translation to speech recognition to topic detection…very basic semantics

(English) Inflectional Morphology
Word stem + grammatical morpheme → different forms of the same word
–Usually produces a word of the same class
–Usually serves a syntactic or grammatical function (e.g. agreement): like → likes or liked; bird → birds
Nominal morphology
–Plural forms: s or es
–Irregular forms (goose/geese)

–Mass vs. count nouns (fish/fish(es), or s?)
–Possessives (cat’s, cats’)
Verbal inflection
–Main verbs (sleep, like, fear) are relatively regular: -s, -ing, -ed
–And productive: -ed attaches to new verbs (instant-messaged, faxed, homered)
–But some are not: eat/ate/eaten, catch/caught/caught
–Primary (be, have, do) and modal verbs (can, will, must) are often irregular and not productive
»Be: am/is/are/were/was/been/being
–Irregular verbs are few (~250) but occur frequently
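The verbal inflection slide can be sketched as regular suffixation rules plus an exception lookup for irregular verbs. This is a simplified illustration, not a complete analyzer. The three-entry `IRREGULAR_PAST` table is hypothetical. Consonant doubling (stop → stopped) is not handled, and primary verbs such as be would need a full-paradigm lookup.

```python
# Hypothetical exception table of irregular (past, past participle) pairs.
# Full coverage would need the complete list of roughly 250 irregular verbs.
IRREGULAR_PAST = {
    "eat": ("ate", "eaten"),
    "catch": ("caught", "caught"),
    "sleep": ("slept", "slept"),
}
VOWELS = "aeiou"


def _ends_in_consonant_y(verb: str) -> bool:
    return len(verb) > 1 and verb.endswith("y") and verb[-2] not in VOWELS


def third_singular(verb: str) -> str:
    if verb.endswith(("s", "x", "z", "ch", "sh")):
        return verb + "es"           # fax -> faxes
    if _ends_in_consonant_y(verb):
        return verb[:-1] + "ies"     # carry -> carries
    return verb + "s"                # like -> likes


def present_participle(verb: str) -> str:
    if verb.endswith("ie"):
        return verb[:-2] + "ying"    # die -> dying
    if verb.endswith("e") and not verb.endswith("ee"):
        return verb[:-1] + "ing"     # like -> liking
    return verb + "ing"              # fear -> fearing


def past_forms(verb: str) -> tuple[str, str]:
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]  # the lookup takes priority over the rules
    if verb.endswith("e"):
        form = verb + "d"            # like -> liked
    elif _ends_in_consonant_y(verb):
        form = verb[:-1] + "ied"     # carry -> carried
    else:
        form = verb + "ed"           # fax -> faxed, homer -> homered
    return form, form


def inflect_verb(verb: str) -> dict[str, str]:
    past, participle = past_forms(verb)
    return {
        "3sg": third_singular(verb),
        "ing": present_participle(verb),
        "past": past,
        "past_participle": participle,
    }
```

For example, `inflect_verb("instant-message")` applies the regular rules to a new verb, while `inflect_verb("eat")` falls through to the exception table for its past forms.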

Particles occur in only one form in English:
–Prepositions: to, from
–Adverbs: happily, quickly
–Conjunctions: but, and
–Articles: the, a, an
–Japanese?
So English inflectional morphology is fairly easy to model, with some special cases.
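The nominal side (plural -s/-es plus irregulars such as goose/geese) can be sketched in the same way and under the same assumptions. The exception list is a small hypothetical sample, and the sketch ignores mass nouns and possessives.

```python
# Hypothetical irregular-plural list; a real lexicon would be much longer.
IRREGULAR_PLURALS = {"goose": "geese", "man": "men", "ox": "oxen", "child": "children"}
VOWELS = "aeiou"


def pluralize(noun: str) -> str:
    """Return the plural of an English count noun: regular rules plus exceptions."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"           # box -> boxes, church -> churches
    if len(noun) > 1 and noun.endswith("y") and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"     # pony -> ponies (but day -> days)
    return noun + "s"                # bird -> birds
```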

Derivational Morphology
Word stem + syntactic/grammatical morpheme → new words
–Usually produces a word of a different class
–Incomplete process: derivational morphs cannot be applied to just any member of a class
Verbs → nouns
–-ize verbs → -ation nouns
–generalize, realize → generalization, realization
–synthesize, but no synthesization

Verbs, nouns → adjectives
–embrace, pity → embraceable, pitiable
–care, wit → careless, witless
Adjective → adverb
–happy → happily
The process is selective in unpredictable ways
–Less productive: nerveless/*evidence-less, malleable/*sleep-able, rar-ity/*rareness
–Meanings of derived terms are harder to predict by rule: clueless, careless, nerveless, sleepless
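One way to illustrate the incompleteness of derivation is to pair simple spelling rules with an explicit blocking list. In this hypothetical sketch, the rules cover only the suffixes on these slides, and `BLOCKED` holds some of the starred forms. Gaps like *synthesization cannot be predicted from spelling, so they have to be listed.

```python
from typing import Optional

VOWELS = "aeiou"

# Hypothetical blocking list: combinations the rules would generate but that
# speakers do not use (starred forms on the slides).
BLOCKED = {("synthesize", "-ation"), ("evidence", "-less"), ("sleep", "-able")}


def _ends_in_consonant_y(word: str) -> bool:
    return len(word) > 1 and word.endswith("y") and word[-2] not in VOWELS


def derive(word: str, suffix: str) -> Optional[str]:
    """Apply one derivational suffix; return None if the form is blocked."""
    if (word, suffix) in BLOCKED:
        return None
    if suffix == "-ation":                       # verb -> noun
        if not word.endswith("ize"):
            raise ValueError("this sketch only handles -ize verbs")
        return word[:-1] + "ation"               # generalize -> generalization
    if suffix == "-able":                        # verb/noun -> adjective
        if _ends_in_consonant_y(word):
            return word[:-1] + "iable"           # pity -> pitiable
        return word + "able"                     # embrace -> embraceable
    if suffix == "-less":                        # noun -> adjective
        return word + "less"                     # care -> careless
    if suffix == "-ly":                          # adjective -> adverb
        if _ends_in_consonant_y(word):
            return word[:-1] + "ily"             # happy -> happily
        return word + "ly"
    raise ValueError(f"unknown suffix {suffix!r}")
```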

Derivation can be applied recursively:
–hospital → hospitalize → hospitalization → prehospitalization → …
–Morphological analysis identifies the structure built by concatenative processes as well as the morphemes: [pre[[[hospital]ize]ation]]
–But there are bracketing paradoxes: unhappier
  [un[happier]]: not happier
  [[unhappy]er]: more unhappy
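Recording each affixation step as it is applied makes the bracketing concrete. This sketch assumes two simplified spelling rules at the morpheme boundary: drop a final e before a vowel-initial suffix, and change consonant + y to i. `apply_affixes` is a hypothetical helper. Applying the same affixes in different orders reproduces the bracketing paradox, giving the same surface string with a different structure.

```python
VOWELS = "aeiou"


def apply_affixes(stem: str, affixes: list[tuple[str, str]]) -> tuple[str, str]:
    """Apply (kind, affix) steps in order; return (surface form, bracketing)."""
    surface, bracket = stem, f"[{stem}]"
    for kind, affix in affixes:
        if kind == "prefix":
            surface = affix + surface
            bracket = f"[{affix}{bracket}]"
        elif kind == "suffix":
            # Simplified spelling rules at the morpheme boundary.
            if surface.endswith("e") and affix[0] in VOWELS:
                surface = surface[:-1]               # hospitalize + ation
            elif (len(surface) > 1 and surface.endswith("y")
                  and surface[-2] not in VOWELS and not affix.startswith("i")):
                surface = surface[:-1] + "i"         # happy + er -> happier
            surface = surface + affix
            bracket = f"[{bracket}{affix}]"
        else:
            raise ValueError(f"unknown affix kind {kind!r}")
    return surface, bracket
```

For example, `apply_affixes("hospital", [("suffix", "ize"), ("suffix", "ation"), ("prefix", "pre")])` gives prehospitalization with the bracketing [pre[[[hospital]ize]ation]]. Adding un- and -er to happy in either order yields unhappier both times, but with two different bracketings.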

Compounding
Two base forms join to form a new word
–Bedtime, Wienerschnitzel, Rotwein
–Careful? Compound or derivation?
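The sketch below is a naive compound splitter. It assumes a toy lexicon of free-standing words (the `LEXICON` contents are illustrative), tries every split point, and keeps the splits where both halves are words. Because -ful is a bound suffix rather than a free word, careful is not split, which is one rough way to separate compounding from derivation:

```python
# Hypothetical toy lexicon of free-standing words (English and German).
LEXICON = {"bed", "time", "care", "rot", "wein", "wiener", "schnitzel"}


def split_compound(word: str, lexicon: set[str] = LEXICON) -> list[tuple[str, str]]:
    """Return every split of `word` into two lexicon words (binary compounds only)."""
    w = word.lower()
    return [
        (w[:i], w[i:])
        for i in range(1, len(w))
        if w[:i] in lexicon and w[i:] in lexicon
    ]
```

Real compound splitters must also handle linking elements, such as the -s- in many German compounds, and compounds with more than two parts.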

Affixes can be attached to stems in different ways
–Prefixation: immaterial
–Suffixation (more common across languages than prefixation): trying
–Circumfixation, which combines prefixation and suffixation: gesagt

–Infixation: English absobl**dylutely; in Bontoc, -um- turns adjectives and nouns into verbs (kilad ‘red’ → kumilad ‘to be red’)
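The four attachment types can be written as small string operations. The helper names in this sketch are hypothetical. Placing the infix before the first vowel is a simplifying assumption: it reproduces kilad → kumilad but is not a full account of Bontoc infixation. The circumfix example assumes sag- as the stem of German sagen ‘say’.

```python
VOWELS = "aeiou"


def prefixation(prefix: str, stem: str) -> str:
    return prefix + stem                   # im + material


def suffixation(stem: str, suffix: str) -> str:
    return stem + suffix                   # try + ing


def circumfixation(stem: str, before: str, after: str) -> str:
    return before + stem + after           # ge + sag + t


def infixation(stem: str, infix: str) -> str:
    """Insert the infix before the stem's first vowel (assumed Bontoc-style -um-)."""
    for i, ch in enumerate(stem):
        if ch in VOWELS:
            return stem[:i] + infix + stem[i:]
    raise ValueError(f"no vowel in {stem!r}")
```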

Concatenative vs. Non-concatenative Morphology
Semitic root-and-pattern morphology
–Root (2-4 consonants) conveys basic semantics (e.g. Arabic /ktb/)
–Vowel pattern conveys voice and aspect
–Derivational template (binyan) identifies word class

Vowel pattern (active vs. passive) for root /ktb/:

  Template    Active      Passive     Meaning
  CVCVC       katab       kutib       write
  CVCCVC      kattab      kuttib      cause to write
  CVVCVC      ka:tab      ku:tib      correspond
  tVCVVCVC    taka:tab    tuku:tib    write each other
  nCVVCVC     nka:tab     nku:tib     subscribe
  CtVCVC      ktatab      ktutib      write
  stVCCVC     staktab     stuktib     dictate
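The /ktb/ forms can be regenerated by interleaving the root consonants, the template, and a vowel pattern. In this Python sketch the template notation follows the slide: C is a consonant slot, V a vowel slot, and VV a long vowel written with ':'. Two details are assumptions generalized from these seven rows only. First, the middle radical geminates when the template has an extra C slot (CVCCVC). Second, the passive pattern puts u in every vowel slot except the last, which gets i. `interdigitate` is a hypothetical name.

```python
def interdigitate(root: str, template: str, voice: str) -> str:
    """Fill a CV template with root consonants and a voice-specific vowel melody.

    Template symbols: C = root consonant slot, V = short vowel, VV = long vowel
    (written 'a:' as on the slide); any other letter (t, n, s) is literal affix
    material.
    """
    # 1. Split the template into slots, grouping VV into one long-vowel slot.
    slots = []
    i = 0
    while i < len(template):
        if template[i] == "V" and template[i + 1 : i + 2] == "V":
            slots.append("VV")
            i += 2
        else:
            slots.append(template[i])
            i += 1

    # 2. Assign root consonants to C slots. Assumption: a template with one
    #    more C slot than the root has radicals geminates the middle radical.
    n_c = slots.count("C")
    if n_c == len(root):
        consonants = list(root)
    elif n_c == len(root) + 1:
        mid = len(root) // 2
        consonants = list(root[: mid + 1] + root[mid:])  # ktb -> k, t, t, b
    else:
        raise ValueError(f"cannot fit root {root!r} into {template!r}")

    # 3. Vowel melody (assumed from the table): active = 'a' everywhere;
    #    passive = 'u' in every vowel slot except the last, which is 'i'.
    n_v = sum(slot in ("V", "VV") for slot in slots)
    if voice == "active":
        vowels = ["a"] * n_v
    elif voice == "passive":
        vowels = ["u"] * (n_v - 1) + ["i"]
    else:
        raise ValueError(f"unknown voice {voice!r}")

    # 4. Interleave consonants, vowels, and literal affix material.
    consonant_iter, vowel_iter = iter(consonants), iter(vowels)
    out = []
    for slot in slots:
        if slot == "C":
            out.append(next(consonant_iter))
        elif slot == "V":
            out.append(next(vowel_iter))
        elif slot == "VV":
            out.append(next(vowel_iter) + ":")
        else:
            out.append(slot)
    return "".join(out)
```

For example, `interdigitate("ktb", "tVCVVCVC", "passive")` builds tuku:tib. The root supplies the consonants, and the template and voice determine everything else.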

Morphotactics
What are the ‘rules’ for constructing a word in a given language?
–Pseudo-intellectual vs. *intellectual-pseudo
–Rational-ize vs. *ize-rational
–Cretin-ous vs. *cretin-ly vs. *cretin-acious
Possible ‘rules’
–Suffixes are suffixes and prefixes are prefixes
–Certain affixes attach to certain types of stems (nouns, verbs, etc.)
–Certain stems can/cannot take certain affixes

Semantics: In English, un- cannot attach to adjectives that already have a negative connotation:
–Unhappy vs. *unsad
–Unhealthy vs. *unsick
–Unclean vs. *undirty
Phonology: In English, -er cannot attach to words of more than two syllables:
–great, greater
–happy, happier
–competent, *competenter
–elegant, *eleganter
–unruly, ?unrulier
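Both constraints can be checked mechanically, as in this sketch. The negative-adjective list is hypothetical. Syllables are estimated by counting runs of vowel letters, which is only a heuristic (it miscounts words with a silent final e, for example). Under this heuristic unruly counts as three syllables, so the sketch rejects ?unrulier outright, whereas the slide only marks it as questionable.

```python
import re
from typing import Optional

# Hypothetical list of adjectives with a negative connotation.
NEGATIVE_ADJECTIVES = {"sad", "sick", "dirty"}
VOWELS = "aeiou"


def count_syllables(word: str) -> int:
    """Crude estimate: count runs of vowel letters (y included)."""
    return len(re.findall(r"[aeiouy]+", word.lower()))


def un_allowed(adjective: str) -> bool:
    """Semantic constraint: un- does not attach to negative adjectives."""
    return adjective not in NEGATIVE_ADJECTIVES


def comparative(adjective: str) -> Optional[str]:
    """Phonological constraint: -er only attaches to words of at most two syllables."""
    if count_syllables(adjective) > 2:
        return None                              # *competenter, *eleganter
    if len(adjective) > 1 and adjective.endswith("y") and adjective[-2] not in VOWELS:
        return adjective[:-1] + "ier"            # happy -> happier
    if adjective.endswith("e"):
        return adjective + "r"                   # nice -> nicer
    return adjective + "er"                      # great -> greater
```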

Morphological Parsing
These regularities enable us to create software that parses words into their component parts
–Known words and new ones (e.g. pneumonoultramicroscopicsilicovolcanoconiosis, Columbianize, Columbianization)
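The following is a brute-force sketch of such a parser. Given a toy lexicon of prefixes, stems, and suffix morphs (all hypothetical), it returns every segmentation of a word into an optional prefix, a stem, and a sequence of suffixes. Listing both ize and iz as morphs of -ize is one way to handle allomorphy here. Practical analyzers usually encode the lexicon and spelling rules as finite-state transducers rather than searching like this.

```python
# Hypothetical toy lexicon. Surface morphs map to the morpheme they realize,
# so both "ize" and the allomorph "iz" (before -ation) realize -ize.
PREFIXES = {"pre": "pre-"}
STEMS = ("hospital", "columbian", "general", "real")
SUFFIX_MORPHS = {"ize": "-ize", "iz": "-ize", "ation": "-ation", "s": "-s"}


def _parse_suffixes(rest: str) -> list[list[str]]:
    """All ways to cover `rest` with a sequence of suffix morphs."""
    if not rest:
        return [[]]
    parses = []
    for morph, morpheme in SUFFIX_MORPHS.items():
        if rest.startswith(morph):
            for tail in _parse_suffixes(rest[len(morph):]):
                # Assumed morphotactic constraint: the "iz" allomorph must be
                # followed by another suffix (hospitaliz-ation, not *hospitaliz).
                if morph == "iz" and not tail:
                    continue
                parses.append([morpheme] + tail)
    return parses


def parse(word: str) -> list[list[str]]:
    """Return every segmentation of `word` as optional prefix + stem + suffixes."""
    word = word.lower()
    results = []
    for surface, prefix in [("", None), *PREFIXES.items()]:
        if not word.startswith(surface):
            continue
        rest = word[len(surface):]
        for stem in STEMS:
            if rest.startswith(stem):
                for suffixes in _parse_suffixes(rest[len(stem):]):
                    results.append(([prefix] if prefix else []) + [stem] + suffixes)
    return results
```

For example, `parse("prehospitalization")` returns one analysis, pre- + hospital + -ize + -ation. A word the lexicon cannot cover, such as synthesization, returns an empty list.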

Morphological Representations: Evidence from Human Performance
Hypotheses:
–Full listing hypothesis: whole words are listed
–Minimum redundancy hypothesis: only morphemes are listed
Experimental evidence:
–Priming experiments (does seeing/hearing one word facilitate recognition of another?) support neither hypothesis
–Regularly inflected forms (e.g. cars) prime their stem (car), but derived forms do not (e.g. management, manage)

–But spoken derived words can prime stems if they are semantically close (e.g. government/govern but not department/depart)
Speech errors suggest affixes must be represented separately in the mental lexicon
–‘easy enoughly’ for ‘easily enough’

Summing Up
Different languages have different morphological systems
–If we can discover how to decode such a system, we can identify useful information about the word class and the semantic meaning of a word
–Morphological regularities provide the basis for building (automatic) morphological analyzers
Next time: Read Ch
–HW1 will be assigned (check the course syllabus and courseworks)

Announcements
–HW1 will now be due 9/25/07
–WICS lunch tomorrow at noon in the CS Lounge, 452 MUDD (rsvp to