Cross-lingual Models of Word Embeddings: An Empirical Comparison


1 Cross-lingual Models of Word Embeddings: An Empirical Comparison
Shyam Upadhyay, Manaal Faruqui, Chris Dyer, Dan Roth

2 Cross-lingual Embeddings
Vectors in L1 (French) and vectors in L2 (English) in a shared space, with translation pairs such as (monde, world), (loi, law), (marché, market), (vie, life), (enfants, children), (guerre, war), (pays, country), (argent, money), (énergie, energy), (paix, peace).
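To make the picture concrete, here is a minimal sketch (not from the talk; fr_vecs and en_vecs are hypothetical word-to-vector dictionaries) of how a shared space lets a word's cross-lingual nearest neighbors recover its translation:

```python
# Minimal sketch: once French and English vectors live in a shared space,
# a word's nearest cross-lingual neighbor should be its translation.
# fr_vecs / en_vecs are hypothetical dicts mapping word -> numpy vector.
import numpy as np

def nearest_english(word, fr_vecs, en_vecs, k=3):
    """Return the k English words whose vectors are closest (by cosine) to a French word."""
    q = fr_vecs[word]
    q = q / np.linalg.norm(q)
    scored = []
    for w, v in en_vecs.items():
        scored.append((float(q @ (v / np.linalg.norm(v))), w))
    return [w for _, w in sorted(scored, reverse=True)[:k]]

# e.g. nearest_english("monde", fr_vecs, en_vecs) should rank "world" near the top
# if the cross-lingual training worked.
```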

3 Works on Cross-Lingual Embeddings
(2012) Klementiev et al., COLING. (2013) Zou et al., EMNLP; Mikolov et al., arXiv. (2014) Hermann et al., ACL; Faruqui et al., EACL; Kocisky et al., ACL; Chandar et al., NIPS. (2015) Camacho-Collados et al., ACL; Lu et al., NAACL; Luong et al., NAACL; Gouws et al., ICML; Ishiwatari et al., CoNLL; Shi et al., Guo et al., Gardner et al., Coulmance et al., EMNLP; Vulic et al., ACL. (2016) Already a few papers in NAACL and ACL. Which approach is best suited for my task?

4 Overview of the Talk
General Schema, Forms of Cross-lingual Supervision, General Algorithm, Comparison Setup, Results, Conclusion

5 General Schema
An optional initial embedding W for L1 and an optional initial embedding V for L2 are fed, together with cross-lingual supervision linking L1 and L2, into a cross-lingual word vector model, which outputs vectors in L1 and vectors in L2.

6 Forms of Cross-lingual Supervision
In decreasing order of supervision cost: word + sentence alignment (BiSkip, Luong et al. 2015), sentence alignment (BiCVM, Hermann et al. 2014), word alignment (BiCCA, Faruqui et al. 2014), and document alignment (BiVCD, Vulic et al. 2015). Example signals: word pairs such as (I, je), (love, aime), (you, t'); parallel sentences such as "Je t'aime" / "I love you"; comparable documents such as "Bonjour! Je t'aime" / "Hello! How are you? I love you".

7 General Algorithm
All cross-lingual embedding models can be viewed as instances of a general algorithm that optimizes an objective containing monolingual terms A and B, one for each language, and a cross-lingual interaction term C that couples the embeddings across languages; the output is the cross-lingually trained vectors for L1 and L2. The four models described earlier reduce to different choices of A, B, and C. For example, for BiSkip, A and B are the monolingual skip-gram objectives in each language, and C is the sum of two cross-lingual skip-gram terms D12 and D21, one for each direction. We omit the details of the choices of A, B, and C for the other models due to time constraints.
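A compact way to write the objective sketched above (the exact weighting and regularization vary by model; this is just the schematic form described on the slide):

\min_{W, V} \; A(W) + B(V) + C(W, V), \qquad C_{\text{BiSkip}}(W, V) = D_{12}(W, V) + D_{21}(W, V),

where W and V are the embedding matrices for L1 and L2, A and B are the monolingual objectives, and C is the cross-lingual coupling term.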

8 Comparison Setup
We compare on four tasks: mono-lingual word similarity, cross-lingual dictionary induction, cross-lingual document classification, and cross-lingual dependency parsing. The first two are intrinsic evaluations; the last two are extrinsic. All models were trained on parallel data for 4 languages (DE, FR, SV, ZH), and parameters were selected by picking the setting that did best on average across all tasks. Through this comparison, we aim to show which model is suited to which task.

9 Mono-lingual Word Similarity
Are English embeddings obtained from cross-lingual training better than those obtained from monolingual training? Evaluated using SimLex, a standard word similarity dataset, and QVEC, an intrinsic embedding evaluation shown to correlate with linguistic information.
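A minimal sketch of how such a word-similarity evaluation is typically scored (the exact protocol and variable names here are assumptions, not the talk's scripts): rank word pairs by cosine similarity of their vectors and report the Spearman correlation with human ratings.

```python
# Sketch of the standard word-similarity protocol: cosine similarity per pair,
# then Spearman correlation against gold human ratings.
import numpy as np
from scipy.stats import spearmanr

def word_similarity_eval(pairs, human_scores, vecs):
    """pairs: list of (w1, w2); human_scores: gold ratings; vecs: word -> vector."""
    model_scores = []
    for w1, w2 in pairs:
        v1, v2 = vecs[w1], vecs[w2]
        model_scores.append(float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))))
    return spearmanr(model_scores, human_scores).correlation
```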

10 Mono-lingual Word Similarity
(Results plotted against decreasing cost of supervision.) Word similarity performance does not correlate with performance in later downstream applications.

11 Cross-lingual Dictionary Induction
For a word in English, find its top-10 nearest neighbors in the foreign language and evaluate them against the possible translations in a gold dictionary. Gold dictionaries were induced using aligned synsets from the Multilingual WordNet (Bond and Foster, 2013). Example: for "past", retrieved neighbors include baisse, passé, accepter, ...; gold entries include (white, blanc), (past, passé), (watch, garde), (school, école).
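A minimal sketch of the induction metric described above (variable names and the top-k scoring rule are illustrative assumptions): for each English word, retrieve its top-10 foreign neighbors by cosine similarity in the shared space and count the word as correct if any neighbor appears among its gold translations.

```python
# Sketch of dictionary-induction scoring: top-k cosine neighbors vs. a gold dictionary.
import numpy as np

def dict_induction_accuracy(en_vecs, fr_vecs, gold, k=10):
    """gold: dict mapping an English word -> set of acceptable French translations."""
    fr_words = list(fr_vecs)
    fr_mat = np.stack([fr_vecs[w] / np.linalg.norm(fr_vecs[w]) for w in fr_words])
    correct = 0
    for en_word, translations in gold.items():
        q = en_vecs[en_word]
        scores = fr_mat @ (q / np.linalg.norm(q))
        top_k = [fr_words[i] for i in np.argsort(-scores)[:k]]
        correct += any(w in translations for w in top_k)
    return correct / len(gold)
```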

12 Results (plotted against decreasing cost of supervision). Performance improves with the cost of supervision, with gaps of more than 10 points between some models.

13 Cross-lingual Document Classification
Train a document classification model on labeled documents in L1, represented with vectors in L1; then test the classifier on documents in L2, represented with vectors in L2 (training on L1, testing on L2).
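A minimal sketch of this transfer setup (the averaged bag-of-vectors document representation and the logistic-regression classifier are assumptions for illustration, not necessarily the talk's exact configuration):

```python
# Sketch: train on labeled L1 documents, test directly on L2 documents,
# relying on the shared embedding space to make the features comparable.
import numpy as np
from sklearn.linear_model import LogisticRegression

def doc_vector(tokens, vecs, dim):
    """Average the vectors of in-vocabulary tokens; zero vector if none are known."""
    hits = [vecs[t] for t in tokens if t in vecs]
    return np.mean(hits, axis=0) if hits else np.zeros(dim)

def train_l1_test_l2(l1_docs, l1_labels, l2_docs, l2_labels, l1_vecs, l2_vecs, dim):
    clf = LogisticRegression(max_iter=1000)
    clf.fit([doc_vector(d, l1_vecs, dim) for d in l1_docs], l1_labels)
    preds = clf.predict([doc_vector(d, l2_vecs, dim) for d in l2_docs])
    return float(np.mean(preds == np.asarray(l2_labels)))
```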

14 Results (plotted against decreasing cost of supervision). When transferring semantic knowledge across languages, sentence + word alignment information is superior to sentence or word alignment alone.

15 Cross-Lingual Dependency Parsing
Train the parser on a treebank in L1, using vectors in L1; then test it on a treebank in L2, using vectors in L2 (training on L1, testing on L2). We also consider the case where L1 = L2.

16 Results (models trained with word-level alignment highlighted). When transferring syntactic knowledge across languages, using word alignments for training the embeddings is crucial.

17 Conclusion
We compared 4 representative models on several tasks and provided insight into the relation between the type of application and the required form of supervision. For semantic tasks, supervision with word-level and sentence-level alignment is (almost) always superior to word-level or sentence-level alignment alone. For syntactic tasks, supervision with word alignment is crucial to performance. Our experimental setup is modular and easy to use. Vectors and scripts are available at github.com/shyamupa/biling-survey

