Multiword Expressions: A Pain in the Neck for NLP Emad Soliman Mohamed Nawfal Department of Linguistics.



Words jargon grudge hours in computer linguistics. Without a solution to this problem, the area was radically comfort and does not stop hunger.

NO, this is not how I speak English. This is the Google translation of the Arabic sentence العبارات الاصطلاحية غصة في حلق اللغويات الحاسوبية, وبدون حل هذه المشكلة حلا جذريا فإن المجال لا يسمن ولا يغني من جوع. This should have been translated as: Multiword expressions are a pain in the neck for Computational Linguistics. Without a radical solution, the field is simply useless.

So, what's going on?
● Multiword expressions ---> words jargon
● Pain in the neck ---> grudges hours
● Useless ---> radically comfort and does not stop hunger
● The literal meaning of the Arabic for the last expression is: The field will neither fatten you nor even drive away your hunger.
● Everyone is invited to think about what the problems are.

What are MWEs and why are they important?
● MWEs are expressions whose meanings cross word boundaries: the English idiom “kick the bucket” means to die. It has nothing to do with kicking or with buckets. MWEs pose a threat to the principle of compositionality.
● In the lexical database WordNet 1.7, 41% of the entries are multiword.
● This is still an underestimate, as specialized domain vocabulary overwhelmingly consists of MWEs.

What kinds of problems might be involved?
● If MWEs are treated by general compositional methods of linguistic analysis, two problems arise: overgeneration and idiomaticity.
● Overgeneration: The system will correctly generate telephone booth or telephone box, but might also generate perfectly compositional but unacceptable examples such as telephone cabinet, telephone closet, etc.
● The idiomaticity problem: How can the system predict that an expression like kick the bucket, which appears to conform to the grammar of English VPs, has a meaning unrelated to the meanings of kick, the, and bucket?
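The overgeneration problem can be sketched in a few lines. This is only an illustration under stated assumptions: the word lists and the ATTESTED set are a hypothetical toy vocabulary, not data from any real grammar or corpus.

```python
from itertools import product

# Hypothetical toy vocabulary (an assumption made for illustration only).
MODIFIERS = ["telephone"]
HEADS = ["booth", "box", "cabinet", "closet"]

# Hypothetical set of compounds that speakers actually use.
ATTESTED = {("telephone", "booth"), ("telephone", "box")}


def generate_compounds(modifiers, heads):
    """Combine every modifier with every head, as a purely compositional rule would."""
    return list(product(modifiers, heads))


def overgenerated(candidates, attested):
    """Return the candidates that the rule licenses but that are not attested."""
    return [pair for pair in candidates if pair not in attested]


candidates = generate_compounds(MODIFIERS, HEADS)
bad = overgenerated(candidates, ATTESTED)
print([" ".join(pair) for pair in bad])
```

The compositional rule has no way to tell the two groups apart; only the extra list of attested forms does, which is why MWEs need some lexical treatment of their own.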

Is there a way out?
● Statistical as well as linguistic models are being explored.
● The LinGO project at Stanford University employs a linguistic technique within the HPSG formalism.
● The entry for an expression such as kick the bucket or part of speech, in which one word inflects, looks like this:
  part_of_speech_1 := intr_noun_l &
    [ STEM < "part", "of", "speech" >,
      INFL-POS "1",
      SEMANTICS [KEY part_of_speech_rel ]].
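One way to read such an entry is sketched below. It assumes that STEM holds the words of the expression and that INFL-POS gives the 1-based position of the word that inflects. The class name, field names, and the inflect method are hypothetical and are not part of the LinGO grammar.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FixedExpression:
    """Hypothetical stand-in for a words-with-spaces lexical entry."""
    stem: Tuple[str, ...]   # the words of the expression, in order
    infl_pos: int           # 1-based position of the word that inflects (assumed reading)
    key: str                # name of the semantic relation

    def inflect(self, inflected_form: str) -> str:
        """Replace only the inflecting word and leave the rest of the expression fixed."""
        words = list(self.stem)
        words[self.infl_pos - 1] = inflected_form
        return " ".join(words)


part_of_speech = FixedExpression(("part", "of", "speech"), 1, "part_of_speech_rel")
kick_the_bucket = FixedExpression(("kick", "the", "bucket"), 1, "kick_the_bucket_rel")

plural = part_of_speech.inflect("parts")
past = kick_the_bucket.inflect("kicked")
print(plural)
print(past)
```

The point of the design is that the expression is stored as a single unit with a single meaning, while inflection is still allowed at exactly one position.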

Is there a way out?
● Sag suggests that the linguistic rules should be used in combination with frequency information about both semantic relations and construction rules, insofar as they contribute to semantic interpretation.
● Sag also mentions a potentially viable approach (by Johnson et al. (1999)) to developing probabilistic grammars based on feature structures, e.g. Head-driven Phrase Structure Grammar and Lexical Functional Grammar.
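As a rough illustration of what frequency information can add, the sketch below scores adjacent word pairs by pointwise mutual information (PMI). PMI is a swapped-in stand-in chosen for brevity; it is not the method Sag or Johnson et al. propose, and the toy corpus is invented.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return a dict mapping each adjacent word pair to its PMI in the token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = len(tokens) - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bigrams
        p_w1 = unigrams[w1] / n_unigrams
        p_w2 = unigrams[w2] / n_unigrams
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Invented toy corpus (an assumption made for illustration only).
corpus = "the telephone booth is near the telephone box and the red box".split()
pmi = bigram_pmi(corpus)

# Print the pairs from most to least strongly associated.
for pair, score in sorted(pmi.items(), key=lambda item: -item[1]):
    print(" ".join(pair), round(score, 2))
```

On a corpus this small the numbers mean little; the sketch only shows how a frequency-based score could be combined with rule-based entries to rank candidate expressions.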

What else?
● Some researchers are using techniques from bioinformatics to solve the idiom and collocation problems.
● Instead of finding amino acids, the algorithms can be used to find related words and phrases, even when they are separated by other linguistic units.
● This is especially useful because bioinformatics algorithms take mutations into account.
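The slides do not say which bioinformatics algorithm is meant. As one possible reading, the sketch below applies Smith-Waterman local alignment to word tokens instead of amino acids. The scoring values and the prefix test used as a crude stand-in for a "mutation" (an inflected form) are assumptions.

```python
def token_score(a, b):
    """Score a pair of tokens: exact match, near match (a 'mutation'), or mismatch."""
    if a == b:
        return 2
    # Crude stand-in for inflection: one token is a prefix of the other.
    if a.startswith(b) or b.startswith(a):
        return 1
    return -1


def local_align(pattern, text, gap=-1):
    """Smith-Waterman local alignment over token lists.

    Returns the best score and the list of aligned (pattern, text) token pairs.
    """
    rows, cols = len(pattern) + 1, len(text) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + token_score(pattern[i - 1], text[j - 1])
            up = score[i - 1][j] + gap      # skip a pattern token
            left = score[i][j - 1] + gap    # skip a text token
            score[i][j] = max(0, diag, up, left)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell to recover the aligned tokens.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + token_score(pattern[i - 1], text[j - 1]):
            pairs.append((pattern[i - 1], text[j - 1]))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs


idiom = ["kick", "the", "bucket"]
sentence = "he kicked the proverbial bucket yesterday".split()
best_score, aligned = local_align(idiom, sentence)
print(best_score, aligned)
```

Here the idiom is still found although kick appears as kicked and proverbial intervenes between the and bucket; the gap penalty lowers the score without blocking the match.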

Finally
● Thank you for listening.
● This presentation is based mainly on information from:
● ml ml
● th574/Projects/AndyNLP.pdf
● translate.google.com