Vector Semantics Introduction.

Presentation transcript:

Vector Semantics Introduction

Why vector models of meaning? Computing the similarity between words:
- “fast” is similar to “rapid”
- “tall” is similar to “height”
Example application, question answering:
Q: “How tall is Mt. Everest?”
Candidate A: “The official height of Mount Everest is 29,029 feet”

Word similarity for plagiarism detection

Word similarity for historical linguistics: semantic change over time
(Sagi, Kaufmann & Clark 2013; Kulkarni, Al-Rfou, Perozzi & Skiena 2015)

Problems with thesaurus-based meaning
- We don’t have a thesaurus for every language
- We can’t have a thesaurus for every year; for historical linguistics, we need to compare word meanings in year t to year t+1
- Thesauri have problems with recall: many words and phrases are missing
- Thesauri work less well for verbs and adjectives

Distributional models of meaning = vector-space models of meaning = vector semantics
Intuitions:
- Zellig Harris (1954): “oculist and eye-doctor … occur in almost the same environments”; “If A and B have almost identical environments we say that they are synonyms.”
- Firth (1957): “You shall know a word by the company it keeps!”

Intuition of distributional word similarity
Nida’s example: suppose I asked you, what is tesgüino?
- A bottle of tesgüino is on the table.
- Everybody likes tesgüino.
- Tesgüino makes you drunk.
- We make tesgüino out of corn.
From the context words, humans can guess that tesgüino means an alcoholic beverage like beer.
Intuition for the algorithm: two words are similar if they have similar word contexts, as the sketch below illustrates.
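A minimal sketch of this intuition in plain Python. The toy sentences echo the example above; the window size of 4 and the comparison word “beer” are illustrative assumptions, not part of the original slides:

    from collections import Counter

    corpus = [
        "a bottle of tesgüino is on the table",
        "everybody likes tesgüino",
        "tesgüino makes you drunk",
        "we make tesgüino out of corn",
        "a bottle of beer is on the table",
        "everybody likes beer",
    ]

    def context_counts(target, sentences, window=4):
        """Count the words occurring within +/- `window` positions of `target`."""
        counts = Counter()
        for sent in sentences:
            tokens = sent.split()
            for i, tok in enumerate(tokens):
                if tok == target:
                    lo, hi = max(0, i - window), i + window + 1
                    counts.update(t for t in tokens[lo:hi] if t != target)
        return counts

    # Overlapping context words ("bottle", "likes", "table", ...) suggest that
    # tesgüino and beer occur in similar environments.
    print(context_counts("tesgüino", corpus) & context_counts("beer", corpus))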

Four kinds of vector models
Sparse vector representations:
1. Mutual-information-weighted word co-occurrence matrices
Dense vector representations:
2. Singular value decomposition (and Latent Semantic Analysis)
3. Neural-network-inspired models (skip-grams, CBOW)
4. Brown clusters

Shared intuition
Model the meaning of a word by “embedding” it in a vector space.
The meaning of a word is a vector of numbers; vector models are also called “embeddings”.
Contrast: in many computational linguistic applications, word meaning is represented by a vocabulary index (“word number 545”).
Old philosophy joke:
Q: What’s the meaning of life?
A: LIFE′

Vector Semantics: Words and co-occurrence vectors

Co-occurrence matrices
We represent how often a word occurs in a document:
- Term-document matrix
Or how often a word occurs with another word:
- Term-term matrix (also called a word-word co-occurrence matrix or word-context matrix)

Term-document matrix
Each cell: the count of word w in document d.
Each document is a count vector in ℕ^|V|: a column of the matrix.
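A minimal sketch of building such a matrix in plain Python; the four short “documents” and their vocabulary are illustrative placeholders, not the corpus used in the original slides:

    from collections import Counter

    docs = {
        "d1": "battle soldier fool clown",
        "d2": "battle battle soldier",
        "d3": "fool clown clown wit",
        "d4": "wit wit fool",
    }

    vocab = sorted({w for text in docs.values() for w in text.split()})

    # Rows are words, columns are documents; cell [w][d] = count of w in d.
    matrix = {w: {d: Counter(text.split())[w] for d, text in docs.items()}
              for w in vocab}

    for w in vocab:
        print(w, [matrix[w][d] for d in docs])  # each column is a document vector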

Similarity in term-document matrices
Two documents are similar if their vectors are similar, for example as measured by cosine similarity (see the sketch below).
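One common choice of similarity measure is the cosine of the angle between the count vectors. A sketch with toy column vectors (the counts are made up for illustration):

    import math

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    # Toy counts of four terms (e.g. battle, soldier, fool, clown) in three documents.
    doc_a = [1, 2, 0, 0]
    doc_b = [2, 1, 0, 1]
    doc_c = [0, 0, 3, 2]

    print(cosine(doc_a, doc_b))  # high: overlapping vocabulary
    print(cosine(doc_a, doc_c))  # low: little vocabulary overlap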

The words in a term-document matrix
Each word is a count vector in ℕ^D (D = number of documents): a row of the matrix.

The words in a term-document matrix
Two words are similar if their vectors (rows) are similar.

The word-word or word-context matrix
Instead of entire documents, use smaller contexts:
- a paragraph
- a window of ±4 words
A word is now defined by a vector of counts over context words.
Instead of each vector being of length D, each vector is now of length |V|, and the word-word matrix is |V| × |V|, as sketched below.
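A minimal sketch of building a word-word co-occurrence matrix with a symmetric ±4-word window; the two toy sentences are placeholders, not the sample contexts from the slides:

    from collections import Counter, defaultdict

    corpus = [
        "sugar a sliced lemon a tablespoonful of apricot jam a pinch of salt",
        "their enjoyment of a pineapple and another fruit whose taste she likened",
    ]

    WINDOW = 4
    cooc = defaultdict(Counter)
    for line in corpus:
        tokens = line.split()
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - WINDOW), i + WINDOW + 1
            for c in tokens[lo:i] + tokens[i + 1:hi]:
                cooc[w][c] += 1

    # Each word's vector is its row: counts over all |V| context words.
    vocab = sorted(set(cooc))
    print([cooc["apricot"][c] for c in vocab])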

Word-word matrix: sample contexts of ±7 words (example table omitted).

Word-word matrix
The example shows only a 4×6 fragment, but the real matrix is 50,000 × 50,000, so it is very sparse: most values are 0. That’s OK, since there are many efficient algorithms for sparse matrices (see the sketch below).
The size of the window depends on your goals:
- The shorter the window, the more syntactic the representation (±1-3 words: very syntactic)
- The longer the window, the more semantic the representation (±4-10 words: more semantic)
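Because most cells of such a |V| × |V| matrix are zero, it can be stored sparsely. A sketch using scipy.sparse, one possible choice that is not prescribed by the slides; the tiny count dictionary is a stand-in for real data:

    import numpy as np
    from scipy.sparse import dok_matrix

    cooc = {"apricot": {"jam": 2, "pinch": 1},
            "pineapple": {"jam": 1},
            "digital": {"data": 3, "computer": 2}}
    vocab = sorted(set(cooc) | {c for ctx in cooc.values() for c in ctx})
    idx = {w: i for i, w in enumerate(vocab)}

    M = dok_matrix((len(vocab), len(vocab)), dtype=np.int64)
    for w, ctx in cooc.items():
        for c, n in ctx.items():
            M[idx[w], idx[c]] = n
    M = M.tocsr()  # compressed sparse rows: efficient row slicing for word vectors

    print(M.nnz, "non-zero cells out of", len(vocab) ** 2)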

Two kinds of co-occurrence between two words (Schütze and Pedersen, 1993):
- First-order co-occurrence (syntagmatic association): the words are typically near each other. “wrote” is a first-order associate of “book” or “poem”.
- Second-order co-occurrence (paradigmatic association): the words have similar neighbors. “wrote” is a second-order associate of words like “said” or “remarked”.
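A sketch that distinguishes the two kinds of association; the small count dictionary and the example words are illustrative assumptions only:

    import math
    from collections import Counter

    cooc = {
        "wrote":    Counter({"book": 5, "poem": 3, "she": 4}),
        "said":     Counter({"she": 6, "he": 5, "that": 4}),
        "remarked": Counter({"she": 3, "he": 2, "that": 2}),
    }
    vocab = sorted(set(cooc) | {c for ctx in cooc.values() for c in ctx})

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    def first_order(w1, w2):
        """Syntagmatic association: did the words occur near each other?"""
        return cooc[w1][w2] > 0

    def second_order(w1, w2):
        """Paradigmatic association: do the words have similar neighbors?"""
        u = [cooc[w1][c] for c in vocab]
        v = [cooc[w2][c] for c in vocab]
        return cosine(u, v)

    print(first_order("wrote", "book"))      # True: direct neighbors
    print(second_order("said", "remarked"))  # high: similar context vectors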