Slide 1: Co-Training for Cross-Lingual Sentiment Classification
Xiaojun Wan (萬小軍), Associate Professor, Peking University
ACL 2009
Slide 2: Research Gap
Opinion mining has drawn much attention recently
– Sentiment classification (POS, NEG, NEU)
– Subjectivity analysis (subjective, objective)
Annotated corpora are the most important resource for training, but most existing corpora are in English; corpora for other languages, including Chinese, are rare.
Slide 3: Related Work
Pilot studies on cross-lingual subjectivity classification:
Mihalcea et al., ACL 2007
– Bilingual lexicon and a manually translated parallel corpus
Banea et al., EMNLP 2008
– English annotation tool + MT to build a Romanian annotation tool
– Little loss compared to human translation, suggesting MT is a viable approach
Slide 4: Problem Definition
Perform cross-lingual sentiment classification (either positive or negative)
Source language: English; target language: Chinese
Leverage:
– 8000 labeled English product reviews
– 1000 unlabeled Chinese product reviews
– Machine translation (MT)
Derive: a sentiment classification tool for Chinese product reviews
Slide 5: Framework
[Figure: overall framework, consisting of a training phase and a classification phase]
Slide 6: Training Phase (1) – Machine Translation
[Figure: the machine translation step]
Slide 7: Two Views
[Figure: each review has a Chinese view and an English view]
Slide 8: Training Phase (2) – The Co-Training Approach
[Figure: the co-training approach, English view shown]
Slide 9: Label the Unlabeled Data (English)
An English classifier (SVM) labels the unlabeled English-view reviews (E_en); the top p positive and top n negative most confident reviews are selected.
Slide 10: Label the Unlabeled Data (Chinese)
A Chinese classifier (SVM) labels the unlabeled Chinese-view reviews (E_cn); the top p positive and top n negative most confident reviews are selected.
Slide 11: Remove from Unlabeled Data
The union of the reviews selected by the two classifiers (top p positive and top n negative from E_en and from E_cn) is added to the labeled data and removed from the unlabeled data, and both classifiers are trained again. This finishes one iteration.
Slide 12: Setting
#Iterations = 40, p = n = 5
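The training loop on slides 8–12 can be sketched as follows. This is a minimal sketch, not the paper's implementation: the SVM is replaced by a toy centroid-style scorer, and `featurize`, `train`, and `score` are hypothetical stand-ins. Only the selection logic mirrors the slides (top p positive and top n negative per view, added to both labeled sets, removed from the unlabeled pool each iteration).

```python
# Sketch of the co-training loop (slides 8-12). Assumptions:
# - each unlabeled review exists in two views (English text, Chinese text via MT);
# - `train`/`score` are toy stand-ins for the SVM classifiers in the paper.

from collections import Counter

def featurize(text):
    """Hypothetical bag-of-words features."""
    return Counter(text.lower().split())

def train(labeled):
    """Toy 'classifier': per-class word-count totals (stands in for an SVM)."""
    pos, neg = Counter(), Counter()
    for feats, label in labeled:
        (pos if label == 1 else neg).update(feats)
    return pos, neg

def score(model, feats):
    """Signed confidence; positive means the POS class."""
    pos, neg = model
    return sum(c * (pos[w] - neg[w]) for w, c in feats.items())

def co_train(labeled_en, labeled_cn, unlabeled, iterations=40, p=5, n=5):
    # `unlabeled` is a list of (english_feats, chinese_feats) pairs.
    unlabeled = list(unlabeled)
    for _ in range(iterations):
        if not unlabeled:
            break
        m_en, m_cn = train(labeled_en), train(labeled_cn)
        picked = set()
        for model, view in ((m_en, 0), (m_cn, 1)):
            ranked = sorted(range(len(unlabeled)),
                            key=lambda i: score(model, unlabeled[i][view]))
            picked.update((i, 1) for i in ranked[-p:])  # top p positive
            picked.update((i, 0) for i in ranked[:n])   # top n negative
        # Add each picked review to BOTH labeled sets (both views) ...
        for i, label in picked:
            en, cn = unlabeled[i]
            labeled_en.append((en, label))
            labeled_cn.append((cn, label))
        # ... then remove the picked reviews from the unlabeled pool.
        for i in sorted({i for i, _ in picked}, reverse=True):
            del unlabeled[i]
    return train(labeled_en), train(labeled_cn)
```

With the paper's setting this would run 40 iterations with p = n = 5, so up to 10 reviews per view leave the unlabeled pool each round.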
Slide 13: Classification Phase
A test review is classified by both the Chinese classifier and the English classifier; the two scores, each in [-1, 1], are averaged.
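The averaging on slide 13 reduces to a few lines; a minimal sketch, assuming both classifier scores have already been mapped into [-1, 1], and with the tie-breaking of a zero average to positive as a hypothetical choice (the slide does not specify it):

```python
def classify(score_en, score_cn):
    """Average the English-view and Chinese-view scores (each in [-1, 1]);
    a non-negative average is labeled positive, a negative one negative."""
    avg = (score_en + score_cn) / 2.0
    return "positive" if avg >= 0 else "negative"
```

For example, a review the English classifier finds strongly positive (0.8) and the Chinese classifier mildly negative (-0.2) averages to 0.3 and is labeled positive.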
Slide 14: Experiment Setting (Training)
8000 Amazon product reviews: 4000 positive, 4000 negative; books, DVDs, and electronics
1000 unlabeled product reviews from www.it168.com: MP3 players, mobile phones, and digital cameras (DC)
Slide 15: Experiment Setting (Testing)
886 Chinese product reviews from www.it168.com
– 451 positive, 435 negative
– Disjoint from the unlabeled training data (outside testing)
Slide 16: Baselines
SVM: uses only the labeled data
TSVM (Transductive SVM; Joachims, 1999): uses both labeled and unlabeled data
Slide 17: SVM Baselines
[Figure: SVM(EN) and SVM(CN)]
Slide 18: SVM Baselines
[Figure: SVM(ENCN1)]
Slide 19: SVM Baselines
[Figure: SVM(ENCN2), averaging the two classifiers]
Slide 20: TSVM Baselines
[Figure: TSVM(EN) and TSVM(CN)]
Slide 21: TSVM Baselines
[Figure: TSVM(ENCN1)]
Slide 22: TSVM Baselines
[Figure: TSVM(ENCN2), averaging the two classifiers]
Slide 23: Result – Method Comparison (1)
[Figure: method comparison]
Slide 24: Result – Method Comparison (2)
[Figure: performance on each side – SVM(EN), TSVM(EN), CoTrain(EN)]
Slide 25: Result – Method Comparison (3)
Accuracy (English view): SVM(EN) 0.738, TSVM(EN) 0.769, CoTrain(EN) 0.790
Accuracy (Chinese view): SVM(CN) 0.771, TSVM(CN) 0.767, CoTrain(CN) 0.775
Co-training makes better use of the unlabeled Chinese reviews than TSVM.
Slide 26: Result – Iteration Number
Co-training outperforms TSVM(ENCN2) after about 20 iterations.
Slide 27: Result – Balance of (p, n)
Unbalanced selection of positive and negative examples hurts performance badly.
Slide 28: Conclusion & Comment
A co-training approach for cross-lingual sentiment classification
Future work:
– Translated text and natural text have different feature distributions
– Use a domain adaptation algorithm (e.g., structural correspondence learning) to link them
Slide 29: Comment
Leverage word (phrase) alignment in the translated text