1
A MIXED MODEL FOR CROSS LINGUAL OPINION ANALYSIS Lin Gui, Ruifeng Xu, Jun Xu, Li Yuan, Yuanlin Yao, Jiyun Zhou, Shuwei Wang, Qiaoyun Qiu, Ricky Cheung
2
Content
- Introduction
- Our approach
- The evaluation
- Future work
3
Introduction
- Opinion analysis has become a focus topic in natural language processing research
- Rule-based vs. machine learning methods
- The bottleneck of machine learning methods
- The cross-lingual method for opinion analysis
- Related work
4
Our approach
- Cross-lingual self-training
- Cross-lingual co-training
- The mixed model
5
Cross-lingual self-training
6
[Diagram: cross-lingual self-training pipeline] The source-language annotated corpus is machine-translated (MT) into the target language, and the target-language annotated corpus is translated back; SVM classifiers are trained in each language, label the unlabeled corpus in that language, and the top-K most confident predictions are added back to the training data in each iteration.
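The self-training loop above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the slides use SVM classifiers with lexicon/unigram/bigram features, while the stand-in classifier here is a toy word-count scorer so the example stays self-contained; the top-K confidence-based selection matches the diagram.

```python
from collections import Counter

def train(docs, labels):
    """Toy stand-in for an SVM: per-class word counts (labels: 1=pos, 0=neg)."""
    pos, neg = Counter(), Counter()
    for doc, lab in zip(docs, labels):
        (pos if lab == 1 else neg).update(doc.split())
    return pos, neg

def score(model, doc):
    """Signed decision score: positive evidence minus negative evidence."""
    pos, neg = model
    words = doc.split()
    return sum(pos[w] for w in words) - sum(neg[w] for w in words)

def self_train(docs, labels, unlabeled, k=1, rounds=2):
    """Cross-lingual self-training, one language's side of the pipeline:
    retrain, label the unlabeled pool, move the top-k most confident
    predictions into the training set, and repeat."""
    docs, labels, pool = list(docs), list(labels), list(unlabeled)
    for _ in range(rounds):
        model = train(docs, labels)
        if not pool:
            break
        # rank unlabeled documents by confidence (|score|), take the top k
        pool.sort(key=lambda d: abs(score(model, d)), reverse=True)
        for doc in pool[:k]:
            docs.append(doc)
            labels.append(1 if score(model, doc) >= 0 else 0)
        pool = pool[k:]
    return train(docs, labels)
```

In the full pipeline this loop runs once per language, with MT providing the initial translated training data for the target-language side.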
7
Cross-lingual co-training
- Similar to cross-lingual self-training
- Difference: in the co-training model, the classification results for a sample in one language and for its translation in the other language are incorporated for classification in each iteration
8
[Diagram: cross-lingual co-training pipeline] The source-language and target-language annotated corpora each train an SVM classifier; the unlabeled corpora in the two languages are linked by MT, and in each iteration the top-K samples that both classifiers label most confidently are added to both training sets.
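The co-training difference can be sketched the same way. Again a toy word-count scorer stands in for the SVMs of the slides, and the MT step is assumed to have already produced the (source text, target translation) pairs; what the sketch does show faithfully is the key idea on slide 7: the decision scores of a sample and its translation are combined before labeling, and both views' training sets grow each iteration.

```python
from collections import Counter

def train(docs, labels):
    """Toy stand-in for an SVM: per-class word counts (labels: 1=pos, 0=neg)."""
    pos, neg = Counter(), Counter()
    for doc, lab in zip(docs, labels):
        (pos if lab == 1 else neg).update(doc.split())
    return pos, neg

def score(model, doc):
    """Signed decision score: positive evidence minus negative evidence."""
    pos, neg = model
    words = doc.split()
    return sum(pos[w] for w in words) - sum(neg[w] for w in words)

def co_train(src, src_y, tgt, tgt_y, pairs, k=1, rounds=2):
    """Cross-lingual co-training. `pairs` holds (source_text, translation)
    for each unlabeled sample; the two views' scores are summed, the top-k
    most confident pairs are labeled, and both training sets grow."""
    src, src_y, tgt, tgt_y = list(src), list(src_y), list(tgt), list(tgt_y)
    pairs = list(pairs)
    for _ in range(rounds):
        m_s, m_t = train(src, src_y), train(tgt, tgt_y)
        if not pairs:
            break
        def combined(p):
            # incorporate both languages' classification results
            return score(m_s, p[0]) + score(m_t, p[1])
        pairs.sort(key=lambda p: abs(combined(p)), reverse=True)
        for s_doc, t_doc in pairs[:k]:
            lab = 1 if combined((s_doc, t_doc)) >= 0 else 0
            src.append(s_doc); src_y.append(lab)
            tgt.append(t_doc); tgt_y.append(lab)
        pairs = pairs[k:]
    return train(src, src_y), train(tgt, tgt_y)
```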
9
The mixed model
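The slide content for the mixed model did not survive extraction; the conclusion describes it only as a weight-based combination. Under that assumption, a plausible minimal sketch is a weighted sum of the two subsystems' decision scores, where `alpha` is a hypothetical weight (the actual weighting scheme is not given on the slides):

```python
def mixed_score(self_score, co_score, alpha=0.5):
    """Weighted combination of the self-training and co-training
    subsystems' decision scores. `alpha` is a hypothetical weight;
    the slides do not specify how the weights were chosen."""
    return alpha * self_score + (1 - alpha) * co_score

def mixed_predict(self_score, co_score, alpha=0.5):
    """Final polarity decision from the combined score (1=pos, 0=neg)."""
    return 1 if mixed_score(self_score, co_score, alpha) >= 0 else 0
```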
10
The evaluation
- The dataset consists of reviews in the DVD, Book, and Music categories
- The training data for each category contains 4,000 English annotated documents and 40 Chinese annotated documents
- The Chinese raw corpus contains 17,814 DVD documents, 47,071 Book documents, and 29,677 Music documents
- The test data for each category contains 4,000 Chinese documents
11
The baseline performance
- Uses only the 40 Chinese annotated documents and the machine translation of the 4,000 English annotated documents
- Classifier: SVM-light
- Features: lexicon / unigram / bigram
- MT system: Baidu Fanyi
- Segmentation tool: ICTCLAS
- Lexicon resource: WordNet-Affect
12
The baseline performance

Category   Accuracy
DVD        0.7373
Book       0.7215
Music      0.7423
Overall    0.7337
13
The evaluation of self-training
14
The evaluation of co-training
15
The evaluation of the mixed model
16
The evaluation result

Team           DVD      Music    Book     Accuracy
BISTU          0.6473   0.6605   0.5980   0.6353
HLT-Hitsz      0.7773   0.7513   0.7850   0.7712
THUIR-SENTI    0.7390   0.7325   0.7423   0.7379
SJTUGSLIU      0.7720   0.7453   0.7240   0.7471
LEO_WHU        0.7833   0.7595   0.7700   0.7709
Our Approach   0.7965   0.7830   0.7870   0.7889
17
Conclusion & future work
- The weight-based mixed model achieves the best performance on the NLP&CC 2013 CLOA bakeoff dataset
- The transfer learning process does not satisfy the independent and identically distributed (i.i.d.) hypothesis
18
Thanks