Word-Embedding-Based Mapping
Raymond ZHAO Wenlong (Updated on 15/08/2018)
Word embeddings
- Vector space models represent each word as a low-dimensional, fixed-size vector
- They try to capture word relations via inner products (see the sketch below)
- They can group semantically similar words and encode rich linguistic patterns (e.g. word2vec (Mikolov et al., 2013) or GloVe (Pennington et al., 2014))
- To apply such a vector model to a sentence or document, one must select an appropriate composition function
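A minimal sketch of how inner products (here, cosine similarity) measure word relatedness. The 4-dimensional vectors are made up for illustration; real word2vec/GloVe embeddings are typically 100-300 dimensional.

```python
import numpy as np

# Toy embedding table; real vectors would be loaded from word2vec or GloVe
embeddings = {
    "laptop":   np.array([0.8, 0.1, 0.3, 0.0]),
    "notebook": np.array([0.7, 0.2, 0.4, 0.1]),
    "banana":   np.array([0.0, 0.9, 0.1, 0.6]),
}

def cosine(u, v):
    # Normalized inner product between two word vectors
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(embeddings["laptop"], embeddings["notebook"]))  # semantically close: high
print(cosine(embeddings["laptop"], embeddings["banana"]))    # unrelated: low
```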
A typical NN model
- A composition function g + a classifier on the final representation
- A composition function is a mathematical process for combining multiple word vectors into a single vector
- Unordered functions: treat input texts as bags of word embeddings
- Syntactic functions: take word order and sentence structure into account, e.g. CNN/RNN, or g depends on a parse tree of the input sequence
- Syntactic functions require more training time on huge datasets (e.g. an RNN over a parse tree must first compute the syntactic parse)

A deep unordered model
- Apply a composition function g to the sequence of word embeddings V_w
- The output is a vector z that serves as input to a logistic regression function (a minimal sketch follows below)
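A minimal NumPy sketch of the deep unordered pipeline described above, assuming g is plain averaging: g is applied to the word embeddings V_w, and the resulting vector z feeds a softmax (multiclass logistic regression) layer. The weights W and b are random placeholders standing in for learned parameters.

```python
import numpy as np

def compose_average(word_vectors):
    # g: average the embeddings of all words in the sequence
    return np.mean(word_vectors, axis=0)

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

embedding_dim, num_classes = 300, 5
V_w = np.random.randn(12, embedding_dim)         # 12 words in the input text (placeholder)
W = np.random.randn(num_classes, embedding_dim)  # classifier weights (placeholder)
b = np.zeros(num_classes)

z = compose_average(V_w)       # sentence/document representation
probs = softmax(W @ z + b)     # class probabilities from logistic regression
print(probs)
```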
SWEM model
- A deep unordered model, by Duke University (ACL 2018); source code is on GitHub
- Obtains near state-of-the-art accuracies on sentence- and document-level tasks
Paper's results: document-level classification
- Datasets: Yahoo! Answers and AG News
- The SWEM model exhibits stronger performance than both LSTM and CNN compositional architectures
- Marries the speed of unordered functions with the accuracy of syntactic functions
- Computationally efficient, with fewer parameters
Paper's results: sentence-level tasks
- SWEM yields inferior accuracies
- Sentences contain approximately 20 words on average
Simple word-embedding model (SWEM) variants
- SWEM-aver: takes the information of every word in the sequence into account via averaging (addition)
- SWEM-max: max pooling extracts the most salient features (the information of key words)
- SWEM-concat: concatenates the SWEM-aver and SWEM-max representations
- SWEM-hier: SWEM-aver on each local window, then global max pooling over the windows, n-gram-like (all four pooling variants are sketched below)
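A sketch of the four SWEM pooling variants on a (num_words x dim) embedding matrix, following the descriptions above. The window size of 5 for SWEM-hier is an assumption; it is a hyperparameter.

```python
import numpy as np

def swem_aver(V):
    return V.mean(axis=0)                  # average pooling over words

def swem_max(V):
    return V.max(axis=0)                   # max pooling: most salient features

def swem_concat(V):
    return np.concatenate([swem_aver(V), swem_max(V)])

def swem_hier(V, window=5):
    # Average over each local window (n-gram-like), then global max-pooling
    n = V.shape[0]
    local_avgs = [V[i:i + window].mean(axis=0) for i in range(max(n - window + 1, 1))]
    return np.max(np.stack(local_avgs), axis=0)

V = np.random.randn(20, 300)               # 20 words, 300-dim GloVe-style vectors (placeholder)
print(swem_concat(V).shape)                # (600,)
print(swem_hier(V).shape)                  # (300,)
```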
The experiments: SWEM-aver using Keras
- Current baseline model, implemented in Keras/TensorFlow, for multi-class classification
- Uses our Amazon review texts (830k texts, 19.8k unique tokens)
- Uses pre-trained GloVe word embeddings (trained on a 1B-token dataset)
- Classifier: multiclass logistic regression, with activation='sigmoid' and loss='categorical_crossentropy' (a rough Keras sketch follows below)
- Current accuracy on CPU classification: 0.6106
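A rough Keras sketch of the baseline described above: pre-trained GloVe embeddings, SWEM-aver via global average pooling, and a single dense layer as the multiclass classifier. The vocabulary size, sequence length, class count, and the GloVe matrix loading are placeholders, not the exact experiment code.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, GlobalAveragePooling1D, Dense
from tensorflow.keras.initializers import Constant

vocab_size, embedding_dim, max_len, num_classes = 19800, 300, 100, 4
# Placeholder: in the real experiment this matrix is filled with GloVe vectors
glove_matrix = np.random.randn(vocab_size, embedding_dim)

model = Sequential([
    Input(shape=(max_len,)),
    Embedding(vocab_size, embedding_dim,
              embeddings_initializer=Constant(glove_matrix),
              trainable=False),                # frozen pre-trained embeddings
    GlobalAveragePooling1D(),                  # SWEM-aver composition
    Dense(num_classes, activation='sigmoid'),  # classifier settings as in the slides
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```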
The experiments: SWEM-aver algorithm using Keras
- Current experiments on the RAM, Screen Size, Hard Disk and Graphics Coprocessor configurators
The experiments: SWEM-max algorithm
- Current experiments on the RAM, Screen Size, Hard Disk and Graphics Coprocessor configurators
The experiments: SWEM-concat algorithm
- Concatenates SWEM-aver and SWEM-max together (see the Keras sketch below)
- Removing punctuation gives a slight improvement
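A hedged sketch of SWEM-concat in the Keras functional API: average pooling and max pooling over the same embedded sequence, concatenated before the classifier. Sizes are placeholders, as in the earlier baseline sketch.

```python
from tensorflow.keras.models import Model
from tensorflow.keras.layers import (Input, Embedding, GlobalAveragePooling1D,
                                     GlobalMaxPooling1D, Concatenate, Dense)

vocab_size, embedding_dim, max_len, num_classes = 19800, 300, 100, 4

tokens = Input(shape=(max_len,))
embedded = Embedding(vocab_size, embedding_dim)(tokens)
aver = GlobalAveragePooling1D()(embedded)    # SWEM-aver branch
mx = GlobalMaxPooling1D()(embedded)          # SWEM-max branch
features = Concatenate()([aver, mx])         # SWEM-concat representation
outputs = Dense(num_classes, activation='softmax')(features)

model = Model(tokens, outputs)
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```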
The experiments: to do
- Try the SWEM-hier algorithm
- Try SVM/CRF classifiers (currently using multiclass logistic regression)
- Try a topic model for short texts
Thanks
Thanks to Dr. Wong