Slide 1: Co-Training for Cross-Lingual Sentiment Classification
Xiaojun Wan (萬小軍), Associate Professor, Peking University
ACL 2009
Slide 2: Research Gap
Opinion mining has drawn much attention recently
– Sentiment classification (POS, NEG, NEU)
– Subjectivity analysis (subjective, objective)
Annotated corpora are the most important resource for training, but most existing corpora are in English; corpora for other languages, including Chinese, are rare.
Slide 3: Related Work
Pilot studies on cross-lingual subjectivity classification:
Mihalcea et al., ACL 2007
– Bilingual lexicon and a manually translated parallel corpus
Banea et al., EMNLP 2008
– English annotation tool + MT to build a Romanian annotation tool
– Little loss compared to human translation, suggesting MT is a viable approach
Slide 4: Problem Definition
Perform cross-lingual sentiment classification (either positive or negative)
Source language: English; target language: Chinese
Leverage:
– 8000 labeled English product reviews
– 1000 unlabeled Chinese product reviews
– Machine translation (MT)
Derive: a sentiment classification tool for Chinese product reviews
Slide 5: Framework
[Figure: overall framework, consisting of a training phase and a classification phase]
Slide 6: Training Phase (1) – Machine Translation
[Figure: the machine translation step]
Slide 7: Two Views
[Figure: each review has a Chinese view and an English view]
Slide 8: Training Phase (2) – The Co-Training Approach
[Figure: the co-training approach, English view shown]
Slide 9: Label the Unlabeled Data (English)
An English classifier (SVM) labels the unlabeled English-view reviews (E_en); the top p positive and top n negative most confident reviews are selected.
Slide 10: Label the Unlabeled Data (Chinese)
A Chinese classifier (SVM) labels the unlabeled Chinese-view reviews (E_cn); the top p positive and top n negative most confident reviews are selected.
Slide 11: Remove from Unlabeled Data
The union of the reviews selected by the two classifiers (top p positive and top n negative from E_en and from E_cn) is added to the labeled data and removed from the unlabeled data, and both classifiers are trained again. This finishes one iteration.
Slide 12: Setting
#Iterations = 40, p = n = 5
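The training loop on slides 8–12 can be sketched as follows. This is a minimal sketch, not the paper's implementation: the SVM is replaced by a toy centroid-style scorer, and `featurize`, `train`, and `score` are hypothetical stand-ins. Only the selection logic mirrors the slides (top p positive and top n negative per view, added to both labeled sets, removed from the unlabeled pool each iteration).

```python
# Sketch of the co-training loop (slides 8-12). Assumptions:
# - each unlabeled review exists in two views (English text, Chinese text via MT);
# - `train`/`score` are toy stand-ins for the SVM classifiers in the paper.

from collections import Counter

def featurize(text):
    """Hypothetical bag-of-words features."""
    return Counter(text.lower().split())

def train(labeled):
    """Toy 'classifier': per-class word-count totals (stands in for an SVM)."""
    pos, neg = Counter(), Counter()
    for feats, label in labeled:
        (pos if label == 1 else neg).update(feats)
    return pos, neg

def score(model, feats):
    """Signed confidence; positive means the POS class."""
    pos, neg = model
    return sum(c * (pos[w] - neg[w]) for w, c in feats.items())

def co_train(labeled_en, labeled_cn, unlabeled, iterations=40, p=5, n=5):
    # `unlabeled` is a list of (english_feats, chinese_feats) pairs.
    unlabeled = list(unlabeled)
    for _ in range(iterations):
        if not unlabeled:
            break
        m_en, m_cn = train(labeled_en), train(labeled_cn)
        picked = set()
        for model, view in ((m_en, 0), (m_cn, 1)):
            ranked = sorted(range(len(unlabeled)),
                            key=lambda i: score(model, unlabeled[i][view]))
            picked.update((i, 1) for i in ranked[-p:])  # top p positive
            picked.update((i, 0) for i in ranked[:n])   # top n negative
        # Add each picked review to BOTH labeled sets (both views) ...
        for i, label in picked:
            en, cn = unlabeled[i]
            labeled_en.append((en, label))
            labeled_cn.append((cn, label))
        # ... then remove the picked reviews from the unlabeled pool.
        for i in sorted({i for i, _ in picked}, reverse=True):
            del unlabeled[i]
    return train(labeled_en), train(labeled_cn)
```

With the paper's setting this would run 40 iterations with p = n = 5, so up to 10 reviews per view leave the unlabeled pool each round.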
Slide 13: Classification Phase
A test review is classified by both the Chinese classifier and the English classifier; the two scores, each in [-1, 1], are averaged.
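The averaging on slide 13 reduces to a few lines; a minimal sketch, assuming both classifier scores have already been mapped into [-1, 1], and with the tie-breaking of a zero average to positive as a hypothetical choice (the slide does not specify it):

```python
def classify(score_en, score_cn):
    """Average the English-view and Chinese-view scores (each in [-1, 1]);
    a non-negative average is labeled positive, a negative one negative."""
    avg = (score_en + score_cn) / 2.0
    return "positive" if avg >= 0 else "negative"
```

For example, a review the English classifier finds strongly positive (0.8) and the Chinese classifier mildly negative (-0.2) averages to 0.3 and is labeled positive.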
Slide 14: Experiment Setting (Training)
8000 Amazon product reviews: 4000 positive, 4000 negative; books, DVDs, and electronics
1000 unlabeled product reviews from www.it168.com: MP3 players, mobile phones, and digital cameras (DC)
Slide 15: Experiment Setting (Testing)
886 Chinese product reviews from www.it168.com
– 451 positive, 435 negative
– Disjoint from the unlabeled training data (outside testing)
Slide 16: Baselines
SVM: uses only the labeled data
TSVM (Transductive SVM; Joachims, 1999): uses both labeled and unlabeled data
Slide 17: SVM Baselines
[Figure: SVM(EN) and SVM(CN)]
Slide 18: SVM Baselines
[Figure: SVM(ENCN1)]
Slide 19: SVM Baselines
[Figure: SVM(ENCN2), averaging the two classifiers]
Slide 20: TSVM Baselines
[Figure: TSVM(EN) and TSVM(CN)]
Slide 21: TSVM Baselines
[Figure: TSVM(ENCN1)]
Slide 22: TSVM Baselines
[Figure: TSVM(ENCN2), averaging the two classifiers]
Slide 23: Result – Method Comparison (1)
[Figure: method comparison]
Slide 24: Result – Method Comparison (2)
[Figure: performance on each side – SVM(EN), TSVM(EN), CoTrain(EN)]
Slide 25: Result – Method Comparison (3)
Accuracy (English view): SVM(EN) 0.738, TSVM(EN) 0.769, CoTrain(EN) 0.790
Accuracy (Chinese view): SVM(CN) 0.771, TSVM(CN) 0.767, CoTrain(CN) 0.775
Co-training makes better use of the unlabeled Chinese reviews than TSVM.
Slide 26: Result – Iteration Number
Co-training outperforms TSVM(ENCN2) after about 20 iterations.
Slide 27: Result – Balance of (p, n)
Unbalanced selection of positive and negative examples hurts performance badly.
Slide 28: Conclusion & Comment
A co-training approach for cross-lingual sentiment classification
Future work:
– Translated text and natural text have different feature distributions
– Use a domain adaptation algorithm (e.g., structural correspondence learning) to link them
Slide 29: Comment
Leverage word (phrase) alignment in the translated text