CAS-IA System Description Jinhua Du CNGL July 23, 2008.


1 CAS-IA System Description Jinhua Du CNGL July 23, 2008

2 Outline
Hardware in IA
Pre-process & Data
MT System Configuration for Evaluation
Achievements
Conclusions

3 Hardware
Machines
  Type        Operating System   Number   CPU               RAM
  Desktop PC  Windows 2003       9        Pentium 4, 3.0G   2.0G
  Server      Linux (Ubuntu)     1        Xeon 2.0G×4       16.0G
Parallel Computing
  –Condor
  –Grid Computing Module developed by ASR group

4 Pre-process & Data
Pre-processing
  –encoding conversion & filtering
  –punctuation and number conversion (full-width -> half-width, etc.)
  –case conversion (only the first letter of the first word), abbreviation processing
  –Chinese word segmentation (ICT or IA tool), English tokenization
Data for NIST
  –Parallel: 3.4M (up to 10M if the UN corpus is added)
  –Monolingual: 3.4M + 9.6M (Gigaword 1 & 2) + 1.4M (Gigaword 3) = 14.4M
Data for IWSLT
  –Parallel: BTEC (20K or 40K); LDC
  –Monolingual: BTEC; Gigaword
  –Data filter: keep only data highly correlated with the task domain; this is very important for spoken-language evaluation (more and better data gives better performance)
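The punctuation/number conversion and case conversion steps listed on this slide can be sketched as follows. This is a minimal illustration, not the CAS-IA tool: the function names `to_half_width` and `lowercase_initial` are hypothetical, the mapping covers only the Unicode full-width ASCII block and the ideographic space, and the slide does not say in which direction the case of the first letter is converted (lowercasing is assumed here).

```python
def to_half_width(text: str) -> str:
    """Map full-width ASCII variants (U+FF01..U+FF5E) and the ideographic
    space (U+3000) to their half-width ASCII equivalents.
    Other characters (e.g. Chinese text) are left unchanged."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:              # ideographic space -> ASCII space
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:  # full-width '!'..'~' -> ASCII '!'..'~'
            out.append(chr(code - 0xFEE0))
        else:
            out.append(ch)
    return "".join(out)


def lowercase_initial(sentence: str) -> str:
    """Lowercase only the first character of the sentence (assumed direction);
    the rest of the sentence is untouched."""
    return sentence[:1].lower() + sentence[1:]


if __name__ == "__main__":
    print(to_half_width("ＡＢＣ１２３，"))
    print(lowercase_initial("The meeting starts on Monday"))
```

Lowercasing only the sentence-initial letter keeps the case of proper nouns inside the sentence, which a blanket lowercase would lose.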

5 System Configuration
Modules
  –Pre-processing
  –Alignment Post-processing & Model Generation
  –Decoding & MER Training
  –System Combination & Post-processing

6 Achievements (zh-en)
The 3rd MT Symposium in China (rank 3)
  –Limited (830K pairs)
  –Unlimited (3M pairs)

7 Achievements (zh-en)
NIST MT Eval. 2008
  System                  BLEU-4   IBM BLEU
  Primary (combination)   0.2407   0.2310
  HPB                     0.2403   0.2279
  STTB                    0.2286   0.2169
  PB                      0.2000   0.1935

8 Achievements (zh-en)
IWSLT2008
  –More systems to be combined
     2 PB systems developed by CASIA
     Moses
     SAMT (CMU)
     Hierarchical PB
     BTG-based system (Xiong)
  –Better performance
     (bleu+meteor)/2   bleu    meteor   System     (bleu+meteor)/2   bleu    meteor
     59.09             49.80   68.37    tch.CRR    58.72             50.55   66.89
     58.26             48.44   68.08    nlpr.CRR   58.12             49.39   66.85
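The (bleu+meteor)/2 figures on the IWSLT2008 slide are the arithmetic mean of the BLEU and METEOR scores. A minimal sketch that recomputes the mean from the reported BLEU and METEOR values and compares it with the reported figure; the function name `combined_score` and the row layout are illustrative assumptions, and only numbers from the slide are used:

```python
def combined_score(bleu: float, meteor: float) -> float:
    """Arithmetic mean of BLEU and METEOR, as in the (bleu+meteor)/2 column."""
    return (bleu + meteor) / 2


# (system label, reported mean, bleu, meteor), copied from the slide;
# each system appears twice because the slide shows two score groups.
rows = [
    ("tch.CRR", 59.09, 49.80, 68.37),
    ("tch.CRR", 58.72, 50.55, 66.89),
    ("nlpr.CRR", 58.26, 48.44, 68.08),
    ("nlpr.CRR", 58.12, 49.39, 66.85),
]

if __name__ == "__main__":
    for system, reported, bleu, meteor in rows:
        mean = combined_score(bleu, meteor)
        # Reported values are rounded to two decimals, so allow a 0.01 tolerance.
        status = "ok" if abs(mean - reported) < 0.01 else "mismatch"
        print(f"{system}: computed {mean:.3f}, reported {reported:.2f} ({status})")
```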

9 Conclusions
More and better data gives better performance
System combination is very helpful for improving performance
Evaluation is different from theoretical research: empirical methods and tricks are usually more effective
For a better rank, prepare in advance and build a temporary team for the evaluation
Evaluation is a horrible thing for students: more time, more energy and no paper (a joke, but true)
Develop systems for application purposes

10 Thanks

