1
CAS-IA System Description
Jinhua Du
CNGL
July 23, 2008
2
Outline
Hardware in IA
Pre-process & Data
MT System Configuration for Evaluation
Achievements
Conclusions
3
Hardware
Machines
Type        Operating System  Number  CPU              RAM
Desktop PC  Windows 2003      9       Pentium 4, 3.0G  2.0G
Server      Linux (Ubuntu)    1       Xeon 2.0G×4      16.0G
Parallel Computing
–Condor
–Grid Computing Module developed by ASR group
4
Pre-process & Data
Pre-processing
–encoding conversion & filtering
–punctuation and number conversion (full-width -> half-width, etc.)
–case conversion (only the initial letter of the first word), abbreviation processing
–Chinese word segmentation (ICT or IA tool), English tokenization
Data for NIST
–Parallel: 3.4M (up to 10M if the UN corpus is added)
–Monolingual: 3.4M + 9.6M (Gigaword 1 & 2) + 1.4M (Gigaword 3) = 14.4M
Data for IWSLT
–Parallel: BTEC (20K or 40K); LDC
–Monolingual: BTEC; Gigaword
–Data filter: keep only highly correlated data; very important for the spoken-language evaluation (more good data, better performance)
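The full-width to half-width conversion step can be illustrated with a minimal sketch. The slides do not say how CAS-IA implemented it, so the function name `to_half_width` and the exact character ranges handled here are assumptions: the sketch maps the Unicode full-width ASCII variants (U+FF01 to U+FF5E) and the ideographic space (U+3000) to their ASCII counterparts and leaves everything else, including Chinese characters and the ideographic full stop, untouched.

```python
def to_half_width(text: str) -> str:
    """Map full-width ASCII variants and the ideographic space to half-width.

    Hypothetical helper for illustration only; not the tool used in the slides.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:
            # Ideographic (full-width) space becomes a plain ASCII space.
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:
            # Full-width '!' .. '~' sit at a fixed offset from ASCII '!' .. '~'.
            out.append(chr(code - 0xFEE0))
        else:
            # Chinese characters and other symbols are left unchanged.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(to_half_width("ＮＩＳＴ　２００８，中文"))
```

A real pipeline would probably also decide what to do with CJK-specific punctuation such as the ideographic full stop, which this sketch deliberately leaves alone.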
5
System Configuration
Modules
–Pre-processing
–Alignment Post-preprocessing & Models Generation
–Decoding & MER Training
–System Combination & Post-Processing
6
Achievements (zh-en)
The 3rd MT Symposia in China (rank 3)
–Limited (830K pairs)
–Unlimited (3M pairs)
7
Achievements (zh-en)
NIST MT Eval. 2008
System                 BLEU-4  IBM BLEU
Primary (combination)  0.2407  0.2310
HPB                    0.2403  0.2279
STTB                   0.2286  0.2169
PB                     0.2000  0.1935
8
Achievements (zh-en)
IWSLT2008
–More systems to be combined
  2 PB systems developed by CASIA
  Moses
  SAMT (CMU)
  Hierarchical PB
  BTG-based system (Xiong)
–Better performance
(bleu+meteor)/2  bleu   meteor  System    (bleu+meteor)/2  bleu   meteor
59.09            49.80  68.37   tch.CRR   58.72            50.55  66.89
58.26            48.44  68.08   nlpr.CRR  58.12            49.39  66.85
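The combined score in the table is the plain average of BLEU and METEOR. As a small sketch, the snippet below recomputes that average for the four (bleu, meteor) pairs reported on the slide and compares it with the listed value. The function name `combined_score` and the "left"/"right" labels are only illustrative, since the slide does not name the two column groups.

```python
def combined_score(bleu: float, meteor: float) -> float:
    """Average of BLEU and METEOR, i.e. (bleu + meteor) / 2."""
    return (bleu + meteor) / 2.0


# (label, bleu, meteor, combined score as reported on the slide)
# "left"/"right" only mark the column group; the slide gives no names for them.
REPORTED = [
    ("tch.CRR left", 49.80, 68.37, 59.09),
    ("tch.CRR right", 50.55, 66.89, 58.72),
    ("nlpr.CRR left", 48.44, 68.08, 58.26),
    ("nlpr.CRR right", 49.39, 66.85, 58.12),
]

if __name__ == "__main__":
    for label, bleu, meteor, reported in REPORTED:
        value = combined_score(bleu, meteor)
        # The slide rounds to two decimals, so allow a small tolerance.
        status = "ok" if abs(value - reported) < 0.01 else "mismatch"
        print(f"{label}: computed {value:.3f}, reported {reported:.2f} ({status})")
```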
9
Conclusions
More and better data gives better performance
System combination is very helpful for improving performance
Evaluation is different from theoretical research: empirical methods and tricks are usually more effective
For a better rank, prepare in advance and build a temporary team for the evaluation
Evaluation is a horrible thing for students: more time, more energy and no paper (a joke, but true)
Develop systems for application purposes
10
Thanks