Outline What is a collocation? Automatic approaches 1: frequency-based methods Automatic approaches 2: ruling out the null hypothesis, t-test Automatic.

Slides:



Advertisements
Similar presentations
Sort [noun] type, kind. What sort of bird is that? [adverb] a little, more or less: she’s sort of strange. [verb] arrange things. Can you sort these words.
Advertisements

STUDY SUPPORT SKILLS PARTS OF SPEECH: ADJECTIVES.
Cognitive Linguistics Croft & Cruse 9
1 Analysing and teaching meaning Analysing and teaching meaning SSIS Lazio - Lesson 2 prof. Hugo Bowles January 2007.
Natural Language Processing COLLOCATIONS Updated 16/11/2005.
Outline What is a collocation?
January 12, Statistical NLP: Lecture 2 Introduction to Statistical NLP.
1 Words and the Lexicon September 10th 2009 Lecture #3.
1 Collocation and translation MA Literary Translation- Lesson 2 prof. Hugo Bowles February
MODULE 2 Meaning and discourse in English
Fall 2001 EE669: Natural Language Processing 1 Lecture 5: Collocations (Chapter 5 of Manning and Schutze) Wen-Hsiang Lu ( 盧文祥 ) Department of Computer.
Language, Mind, and Brain by Ewa Dabrowska Chapter 2: Language processing: speed and flexibility.
Specialized Dictionaries I Dictionaries of Collocations
Slide 1 Analyzing Patterns of Missing Data While SPSS contains a rich set of procedures for analyzing patterns of missing data, they are not included in.
Collocations 09/23/2004 Reading: Chap 5, Manning & Schutze (note: this chapter is available online from the book’s page
Measuring Social Life Ch. 5, pp
Lesson 16 Lexical semantics 2 (collocation)
Albert Gatt LIN 3098 Corpus Linguistics. In this lecture Some more on corpora and grammar Construction Grammar as a theoretical framework Collostructional.
Albert Gatt Corpora and Statistical Methods – Part 2.
GRAMMAR APPROACH By: Katherine Marzán Concepción EDUC 413 Prof. Evelyn Lugo.
1 INFINITIVE LAY SENGHOR LAY SENGHOR.
Language and Thought.
Statistical Natural Language Processing Diana Trandabăț
Developing Student Vocabulary: Fun Ways to Learn Words Katie Bain
1 Introduction to Natural Language Processing ( ) Words and the Company They Keep AI-lab
Natural Language Processing Spring 2007 V. “Juggy” Jagannathan.
Vocabulary connections
1 COMP791A: Statistical Language Processing Collocations Chap. 5.
10/04/1999 JHU CS /Jan Hajic 1 *Introduction to Natural Language Processing ( ) Words and the Company They Keep Dr. Jan Hajič CS Dept., Johns.
Vocabulary connections:multi- word items in English.
Albert Gatt Corpora and Statistical Methods. In this lecture Corpora and Statistical Methods We have considered distributions of words and lexical variation.
Wireless and Mobile Computing Transmission Fundamentals Lecture 2.
Dr. Monira Al-Mohizea MORPHOLOGY & SYNTAX WEEK 12.
1 Natural Language Processing (3b) Zhao Hai 赵海 Department of Computer Science and Engineering Shanghai Jiao Tong University
1 Natural Language Processing (5) Zhao Hai 赵海 Department of Computer Science and Engineering Shanghai Jiao Tong University
1 Statistical NLP: Lecture 7 Collocations (Ch 5).
CSNB143 – Discrete Structure Topic 11 – Language.
Katrin Erk Vector space models of word meaning. Geometric interpretation of lists of feature/value pairs In cognitive science: representation of a concept.
Vocabulary connections: multi- word items in English Orietta Gutiérrez Herrera.
Lecture # 11.  Language made of signs  Linguistic sign has two parts – Signifier & Signified  That which signifies (the word) – Signifier  That which.
1 Natural Language Processing (3a) Zhao Hai 赵海 Department of Computer Science and Engineering Shanghai Jiao Tong University
인공지능 연구실 황명진 FSNLP Introduction. 2 The beginning Linguistic science 의 4 부분 –Cognitive side of how human acquire, produce, and understand.
Collocation 발표자 : 이도관. Contents 1.Introduction 2.Frequency 3.Mean & Variance 4.Hypothesis Testing 5.Mutual Information.
1 Statistical NLP: Lecture 7 Collocations. 2 Introduction 4 Collocations are characterized by limited compositionality. 4 Large overlap between the concepts.
Lexicology 4 – Semantics (cont.) Katarína Veselá 2008.
Structural Levels of Language Lecture 1. Ferdinand de Saussure  "Language is a system sui generis “ = a system where everything holds together  The.
MULTI-WORD ITEMS IN ENGLISH COLLOCATIONS RESTRICTED COLLOCATION WHERE CERTAIN WORDS OCCUR ALMOST ENTIRELY IN THE CO-TEXT OF ONE OR TWO OTHER WORDS IS A.
Collocations and Terminology Vasileios Hatzivassiloglou University of Texas at Dallas.
WORDS The term word is much more difficult to define in a technical sense, and like many other linguistic terms, there are often arguments about what exactly.
Communicative and Academic English for the EFL Professional.
COLLOCATIONS He Zhongjun Outline Introduction Approaches to find collocations Frequency Mean and Variance Hypothesis test Mutual information.
By: Felipe Ortiz Bryanna Leal.  Subjects  Predicates  Objects  Modifiers  Phrases  Clauses.
Levels of Linguistic Analysis
Phrasal verbs.
Learning Extraction Patterns for Subjective Expressions 2007/10/09 DataMining Lab 안민영.
SIMS 296a-4 Text Data Mining Marti Hearst UC Berkeley SIMS.
An Introduction to Semantic Parts of Speech Rajat Kumar Mohanty rkm[AT]cse[DOT]iitb[DOT]ac[DOT]in Centre for Indian Language Technology Department of Computer.
COGS Bilge Say1 Using Corpora for Language Research COGS 523-Lecture 8 Collocations.
2. The standards of textuality: cohesion Traditional approach to the study of lannguage: sentence as conventional object of study Structuralism (Bloofield,
King Faisal University جامعة الملك فيصل Deanship of E-Learning and Distance Education عمادة التعلم الإلكتروني والتعليم عن بعد [ ] 1 King Faisal University.
Collocations David Guy Brizan Speech and Language Processing Seminar 26 th October, 2006.
LINGUA INGLESE 1 modulo A/B Introduction to English Linguistics prof. Hugo Bowles Lesson 15 Collocation.
Statistical NLP: Lecture 7
Collocation
Parts of Speech Copy all text that is highlighted in RED in your notebook like we started in class. All other information may be copied or just reviewed.
Vocabulary connections: multi-word items in English
Many slides from Rada Mihalcea (Michigan), Paul Tarau (U.North Texas)
How Do We Translate? Methods of Translation The Process of Translation.
How to use a dictionary effectively
Levels of Linguistic Analysis
Presentation transcript:

Collocations

Outline What is a collocation? Automatic approaches 1: frequency-based methods Automatic approaches 2: ruling out the null hypothesis, t-test Automatic approaches 3: chi-square and mutual information

What is a Collocation? A COLLOCATION is an expression consisting of two or more words that correspond to some conventional way of saying things. The words together can mean more than their sum of parts (The Times of India, disk drive) Previous examples: hot dog, mother in law Examples of collocations noun phrases like strong tea and weapons of mass destruction phrasal verbs like to make up, and other phrases like the rich and powerful. Valid or invalid? a stiff breeze but not a stiff wind (while either a strong breeze or a strong wind is okay). broad daylight (but not bright daylight or narrow darkness).

Criteria for Collocations Typical criteria for collocations: non-compositionality non-substitutability non-modifiability. Collocations usually cannot be translated into other languages word by word. A phrase can be a collocation even if it is not consecutive (as in the example knock . . . door).

Non-Compositionality A phrase is compositional if the meaning can be predicted from the meaning of the parts. E.g. new companies A phrase is non-compositional if the meaning cannot be predicted from the meaning of the parts E.g. hot dog Collocations are not necessarily fully compositional in that there is usually an element of meaning added to the combination. Eg. strong tea. Idioms are the most extreme examples of non-compositionality. Eg. to hear it through the grapevine.

Non-Substitutability We cannot substitute near-synonyms for the components of a collocation. For example We can’t say yellow wine instead of white wine even though yellow is as good a description of the color of white wine as white is (it is kind of a yellowish white). Many collocations cannot be freely modified with additional lexical material or through grammatical transformations (Non-modifiability). E.g. white wine, but not whiter wine mother in law, but not mother in laws

Linguistic Subclasses of Collocations Light verbs: Verbs with little semantic content like make, take and do. E.g. make lunch, take easy, Verb particle constructions E.g. to go down Proper nouns E.g. Bill Clinton Terminological expressions refer to concepts and objects in technical domains. E.g. Hydraulic oil filter

Principal Approaches to Finding Collocations How to automatically identify collocations in text? Simplest method: Selection of collocations by frequency Selection based on mean and variance of the distance between focal word and collocating word Hypothesis testing Mutual information

Outline What is a collocation? Automatic approaches 1: frequency-based methods Automatic approaches 2: ruling out the null hypothesis, t-test Automatic approaches 3: chi-square and mutual information

Frequency Find collocations by counting the number of occurrences. Need also to define a maximum size window Usually results in a lot of function word pairs that need to be filtered out. Fix: pass the candidate phrases through a part of-speech filter which only lets through those patterns that are likely to be “phrases”. (Justesen and Katz, 1995)

Collocational Window Many collocations occur at variable distances. A collocational window needs to be defined to locate these. Frequency based approach can’t be used. she knocked on his door they knocked at the door 100 women knocked on Donaldson’s door a man knocked on the metal front door