Distributed Representations of Words and Phrases and their Compositionality
Presenter: Haotian Xu

Roadmap
- Overview
- The Skip-gram Model with Different Objective Functions
- Subsampling of Frequent Words
- Learning Phrases

CNN for Text Classification

Word2vec: Google's Word Embedding Approach
What is word2vec? Word2vec turns text into a numerical form that neural networks can process: its input is a text corpus and its output is a set of feature vectors, one for each word in that corpus.
Assumption of word2vec: word2vec assumes that words which tend to appear in similar contexts should have similar embeddings (high cosine similarity).
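
As a concrete illustration (not part of the original slides), here is a minimal sketch of training a skip-gram word2vec model with the gensim library; the toy corpus and the parameter values are illustrative assumptions:

```python
from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences (purely illustrative).
sentences = [
    ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"],
    ["paris", "is", "the", "capital", "of", "france"],
]

# sg=1 selects the skip-gram architecture; negative=5 enables negative sampling.
model = Word2Vec(sentences, vector_size=100, window=5, sg=1, negative=5, min_count=1)

vector = model.wv["fox"]                      # the learned embedding for "fox"
print(model.wv.similarity("fox", "dog"))      # cosine similarity between two words
```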

Skip-gram Model An efficient method for learning high-quality vector representations of words from large amounts of unstructured text data.

Skip-gram Model Objective: to maximize the average log probability of the surrounding context words given each center word w_t, where p(w_{t+j} | w_t) is defined by the softmax (see the formulas below).
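
The corresponding formulas, reproduced from the Mikolov et al. paper because the slide's equation images are not part of the transcript:

```latex
% Average log probability maximized by the skip-gram model
% (T training words, context window of size c):
\frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\; j \neq 0} \log p(w_{t+j} \mid w_t)

% Softmax definition of p(w_O | w_I), where v_w and v'_w are the "input" and
% "output" vector representations of word w, and W is the vocabulary size:
p(w_O \mid w_I) = \frac{\exp\!\left({v'_{w_O}}^{\top} v_{w_I}\right)}
                       {\sum_{w=1}^{W} \exp\!\left({v'_{w}}^{\top} v_{w_I}\right)}
```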

Skip-gram Model Computationally Efficient Approximations Hierarchical Softmax
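
Hierarchical softmax approximates the full softmax with a binary tree over the vocabulary, so evaluating one probability costs O(log W) instead of O(W). The paper's formulation is reproduced here because the slide's equation is not in the transcript:

```latex
% n(w, j) is the j-th node on the path from the root to word w, L(w) is the path length,
% ch(n) is an arbitrary fixed child of node n, and [[x]] is 1 if x is true and -1 otherwise:
p(w \mid w_I) = \prod_{j=1}^{L(w)-1}
  \sigma\!\left( [\![\, n(w, j+1) = \mathrm{ch}(n(w, j)) \,]\!] \cdot {v'_{n(w,j)}}^{\top} v_{w_I} \right)
```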

Skip-gram Model Computationally Efficient Approximations Negative Sampling
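
Negative sampling (NEG) instead trains the model to distinguish the observed context word w_O from k words drawn from a noise distribution P_n(w) (the paper uses the unigram distribution raised to the 3/4 power). The per-pair objective is reproduced here because the slide's equation is not in the transcript:

```latex
% NEG objective for a training pair (w_I, w_O) with k negative samples:
\log \sigma\!\left({v'_{w_O}}^{\top} v_{w_I}\right)
  + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}
      \left[ \log \sigma\!\left(-{v'_{w_i}}^{\top} v_{w_I}\right) \right]
```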

Subsampling of Frequent Words In very large corpora, the most frequent words (e.g., "in", "the", and "a") can easily occur hundreds of millions of times. Such words usually carry less informational value than rare words.
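
To counter this imbalance, the paper discards each occurrence of a word with a probability that grows with the word's frequency; the formula is reproduced here because the slide's equation is not in the transcript:

```latex
% Each occurrence of word w_i is discarded with probability P(w_i), where f(w_i) is the
% word's frequency in the corpus and t is a chosen threshold (around 10^-5 in the paper):
P(w_i) = 1 - \sqrt{\frac{t}{f(w_i)}}
```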

Vector Representations of Words Vec("Paris") - Vec("France") ≈ Vec("Berlin") - Vec("Germany")

Vector Representations of Words: the Analogical Reasoning Task
- Semantic analogies: "Germany" : "Berlin" :: "France" : ?
- Syntactic analogies: "quick" : "quickly" :: "slow" : ?
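
A minimal sketch of answering such analogy questions with pretrained vectors via the gensim library (not part of the original slides); the model file name is a placeholder assumption, and most_similar performs the vector-arithmetic query vec("Berlin") - vec("Germany") + vec("France"):

```python
from gensim.models import KeyedVectors

# Load pretrained word2vec vectors; the file name is a placeholder assumption.
wv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

# "Germany" : "Berlin" :: "France" : ?  is answered by  vec("Berlin") - vec("Germany") + vec("France")
print(wv.most_similar(positive=["Berlin", "France"], negative=["Germany"], topn=1))
# With well-trained vectors, "Paris" is expected to rank first.
```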

Vector Representations of Words The word and phrase representations learned by the Skip-gram model exhibit a linear structure that makes it possible to perform precise analogical reasoning using simple vector arithmetic

Learning Phrases Candidate phrases are found by scoring bigrams in the training data and treating high-scoring bigrams as single tokens; running 2-4 passes with decreasing score thresholds allows phrases of more than two words to form.
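
The bigram score used in the paper is reproduced here because the slide's equation is not in the transcript; bigrams whose score exceeds a chosen threshold are merged into single phrase tokens:

```latex
% delta is a discounting coefficient that prevents forming phrases from very infrequent words:
\mathrm{score}(w_i, w_j) = \frac{\mathrm{count}(w_i w_j) - \delta}{\mathrm{count}(w_i)\,\mathrm{count}(w_j)}
```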

Learning Phrases

Additive Compositionality The Skip-gram representations exhibit another kind of linear structure that makes it possible to meaningfully combine words by an element-wise addition of their vector representations
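
A minimal sketch of this element-wise addition with gensim (not part of the original slides; the model file name is a placeholder assumption). The paper's examples include combinations such as vec("Vietnam") + vec("capital") lying close to vec("Hanoi"):

```python
from gensim.models import KeyedVectors

# Load pretrained vectors; the file name is a placeholder assumption.
wv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

# Element-wise addition of two word vectors, followed by a nearest-neighbour lookup.
combined = wv["Vietnam"] + wv["capital"]
print(wv.similar_by_vector(combined, topn=5))
# With well-trained vectors, "Hanoi" is expected among the nearest neighbours.
```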

Comparison to Published Word Representations

Any Questions?