Learning Word Representations from Scarce and Noisy Data with Embedding Sub-spaces
Authors: Ramon F. Astudillo, Silvio Amir, Wang Ling, Mario Silva, Isabel Trancoso
By: Aadil Hayat (13002)
Outline
1 Introduction
2 Theory
3 Results
1 Introduction
Unsupervised word embedding for scarce and noisy data
Abstract
A technique to adapt unsupervised word embeddings to specific applications when only small and noisy labeled datasets are available. Current methods use pre-trained embeddings to initialize model parameters and then use the labeled data to tailor them for the intended task, but this approach is prone to overfitting when training is performed with scarce and noisy data. To overcome this issue, the supervised data are used here to find an embedding sub-space that fits the task complexity. All word representations are adapted through a projection into this task-specific sub-space. This approach was used in the SemEval 2015 Twitter sentiment analysis challenge, attaining state-of-the-art results.
2 Theory
Unsupervised Structured Skip-Gram
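The embeddings used here come from the structured skip-gram model (Ling et al., 2015), a variant of skip-gram that keeps a separate output matrix for each relative context position, so the learned representations become sensitive to word order. Below is a minimal negative-sampling sketch of that idea; the vocabulary size, dimensions, and hyper-parameters are illustrative assumptions, not the paper's settings.

```python
import numpy as np

# Minimal structured skip-gram sketch: unlike vanilla skip-gram,
# each relative context position p gets its own output matrix C[p].
V, e, win = 1000, 50, 2                       # vocab size, embed dim, window (assumed)
rng = np.random.default_rng(0)
E = rng.normal(0, 0.1, (V, e))                # input (center-word) embeddings
C = rng.normal(0, 0.1, (2 * win, V, e))       # one output matrix per position

def train_pair(center, context, pos, lr=0.025, k=5):
    """One negative-sampling update for (center, context) observed at
    relative position pos (0 .. 2*win - 1)."""
    v = E[center]
    # positive sample plus k random negatives
    targets = np.concatenate(([context], rng.integers(0, V, k)))
    labels = np.zeros(k + 1)
    labels[0] = 1.0
    out = C[pos, targets]                     # (k+1, e) position-specific rows
    p = 1.0 / (1.0 + np.exp(-(out @ v)))      # sigmoid scores
    g = p - labels                            # logistic-loss gradient
    E[center] -= lr * (g @ out)               # update center embedding
    C[pos, targets] -= lr * np.outer(g, v)    # update position-specific outputs
```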
Adapting Embeddings with Sub-space Projections
Word embeddings are a useful unsupervised technique for obtaining initial model parameters or features prior to supervised learning. These models can then be retrained using the available labeled data. Embeddings provide a compact, real-valued representation of each word in a vocabulary, yet the total number of parameters in the model can still be rather high. Very often only a small amount of supervised data is available, which can lead to severe overfitting. Even if regularization is used to reduce overfitting, only a reduced subset of words will actually be present in the labeled dataset, and words not seen during training will never get their embeddings updated. In the following slides, a simple solution to this problem is explained.
Embedding Sub-space
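The core idea: instead of updating the full pre-trained embedding matrix E of size e x v, the labeled data is used to learn only a small projection S of size s x e, with s much smaller than e. Every word representation is adapted as S·E, so the trainable embedding parameters drop from e·v to s·e, and because S is shared across the whole vocabulary, even words absent from the labeled set are adapted. A minimal sketch follows; the dimensions and the embedding file name are assumptions for illustration.

```python
import numpy as np

# Frozen pre-trained embeddings E (e x v) plus a small trainable
# projection S (s x e). Sizes and file name are illustrative.
e, v, s = 600, 100_000, 10
E = np.load("pretrained_embeddings.npy")   # shape (e, v), never updated
S = np.random.normal(0.0, 0.1, (s, e))     # the only adapted parameters

def adapted(word_idx):
    # Words absent from the labeled data are still adapted,
    # because the projection S is shared across the vocabulary.
    return S @ E[:, word_idx]              # (s,) sub-space representation
```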
Non-Linear Sub-space Embedding Model
The concept of an embedding sub-space can be applied to log-linear classifiers or to deep learning architectures that use embeddings. The NLSE can be interpreted as a simple feed-forward neural network with a single hidden layer that utilizes the embedding sub-space approach.
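A sketch of an NLSE-style forward pass: each word's frozen embedding is projected into the sub-space, passed through a non-linearity, pooled over the message, and fed to a softmax classifier. The sigmoid non-linearity and sum pooling below are assumptions of this sketch, not guaranteed details of the paper's model.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    ez = np.exp(z)
    return ez / ez.sum()

def nlse_forward(word_ids, E, S, Y):
    """NLSE-style forward pass (sketch).
    E: (e, v) frozen embeddings, S: (s, e) sub-space projection,
    Y: (num_classes, s) output layer. Only S and Y are trained."""
    H = 1.0 / (1.0 + np.exp(-(S @ E[:, word_ids])))  # (s, n) hidden units
    return softmax(Y @ H.sum(axis=1))                # class probabilities
```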
3 Results
Twitter Sentiment Analysis
[Figure: Average F-measure on the SemEval test sets as a function of embedding sub-space size s; sub-space size 0 denotes the log-linear baseline.]
[Table: Comparison of the two baselines with the two variations.]
[Table: Performance of state-of-the-art systems for Twitter sentiment prediction.]
Thank You