1
Automatic Name Transliteration via OCR and NLP
Yu Cao, Tao Wang
2
Integration
3
Optical Character Recognition (OCR)
ICDAR 2011 dataset: characters embedded in natural scenes
histogram of oriented gradients (HOG) features
8x8 window sliding across the image at a step of 2
linear-kernel SVM
52 classes, i.e. capital and small letters
overall character-level accuracy: 74%
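The window scan and the gradient histogram on this slide can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names, the 9-bin unsigned histogram, and the toy 12x16 image are assumptions, and the sketch computes a single unnormalized orientation histogram per window rather than a full HOG descriptor with block normalization. In practice the histograms would be fed to a linear SVM.

```python
import math

def sliding_windows(height, width, win=8, step=2):
    """Yield the (top, left) corner of every win x win window that fits."""
    for top in range(0, height - win + 1, step):
        for left in range(0, width - win + 1, step):
            yield top, left

def crop(image, top, left, win=8):
    """Return the win x win patch of a row-major list-of-lists image."""
    return [row[left:left + win] for row in image[top:top + win]]

def orientation_histogram(patch, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    Assumption: central differences on interior pixels, 0-180 degree bins.
    """
    hist = [0.0] * bins
    for r in range(1, len(patch) - 1):
        for c in range(1, len(patch[0]) - 1):
            gx = patch[r][c + 1] - patch[r][c - 1]
            gy = patch[r + 1][c] - patch[r - 1][c]
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            index = min(int(angle / (180.0 / bins)), bins - 1)
            hist[index] += math.hypot(gx, gy)
    return hist

# Toy 12x16 "image" (hypothetical data): intensity grows with row and column.
image = [[r * 16 + c for c in range(16)] for r in range(12)]
corners = list(sliding_windows(12, 16))
patches = [crop(image, top, left) for top, left in corners]
features = [orientation_histogram(p) for p in patches]
```

With an 8x8 window and a step of 2, a 12x16 image yields 3 vertical by 5 horizontal positions, so `features` holds 15 histograms, one per window.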
4
Bayesian Correction
char-level bigram language model
char-level accuracy improved to 75.3%
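One common way to combine per-character classifier scores with a character bigram prior is Viterbi decoding over the product of the two. The sketch below assumes that formulation; the slide does not say how the correction was actually implemented. The function name, the `"^"` start symbol, the probability floor, and all toy probabilities are hypothetical.

```python
import math

def viterbi_correct(ocr_probs, bigram, alphabet, floor=1e-6):
    """Return the most probable string under OCR scores times a bigram prior.

    ocr_probs: list of dicts, ocr_probs[i][ch] ~ P(glyph i | ch)
    bigram:    dict, bigram[(prev, ch)] ~ P(ch | prev); "^" marks the start
    """
    def lp(p):
        # Floor missing or zero probabilities so the log stays finite.
        return math.log(max(p, floor))

    # best[ch] = (log score of the best path ending in ch, that path)
    best = {ch: (lp(bigram.get(("^", ch), 0)) + lp(ocr_probs[0].get(ch, 0)), ch)
            for ch in alphabet}
    for probs in ocr_probs[1:]:
        new_best = {}
        for ch in alphabet:
            score, path = max(
                (best[prev][0] + lp(bigram.get((prev, ch), 0)), best[prev][1])
                for prev in alphabet)
            new_best[ch] = (score + lp(probs.get(ch, 0)), path + ch)
        best = new_best
    return max(best.values())[1]

# Toy example: the classifier slightly prefers "b" over "h" in the middle.
alphabet = "tbhe"
ocr_probs = [{"t": 0.9}, {"b": 0.5, "h": 0.4}, {"e": 0.9}]
bigram = {("^", "t"): 0.5, ("t", "h"): 0.5, ("t", "b"): 0.01,
          ("h", "e"): 0.5, ("b", "e"): 0.2}

greedy = "".join(max(p, key=p.get) for p in ocr_probs)
corrected = viterbi_correct(ocr_probs, bigram, alphabet)
```

Here the per-character argmax reads "tbe", while the bigram prior makes "the" the higher-scoring sequence, which is the kind of error such a correction step is meant to catch.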
5
Named Entity Recognition (NER)
essentially two types of labels, “PERSON” and “NONPERSON”
MUC-7 corpora
maximum entropy Markov model
set of features: “CUR_WORD”, “PREV_LABEL”, “MID_INITIAL”, “IN_DICT”, “IN_NAME_DATABASE”, “NEXT_WORD”
F1 score of 77.5% (precision 76.9%, recall 78.1%)
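A feature function for such a tagger might look like the sketch below. The feature names follow the slide (with spaces normalized to underscores), but their definitions here are guesses: the slide does not define them, so the regular expression for a middle initial, the lowercase dictionary lookup, the `"</s>"` end marker, and the toy word lists are all assumptions.

```python
import re

def extract_features(tokens, i, prev_label, dictionary, name_db):
    """Build the feature dict for token i given the previous predicted label.

    Assumed meanings: MID_INITIAL = a single capital letter plus a period;
    IN_DICT = lowercase form is in a common-word list; IN_NAME_DATABASE =
    surface form is in a list of known names.
    """
    cur = tokens[i]
    nxt = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return {
        "CUR_WORD": cur,
        "PREV_LABEL": prev_label,
        "MID_INITIAL": bool(re.fullmatch(r"[A-Z]\.", cur)),
        "IN_DICT": cur.lower() in dictionary,
        "IN_NAME_DATABASE": cur in name_db,
        "NEXT_WORD": nxt,
    }

# Hypothetical resources and sentence, for illustration only.
dictionary = {"spoke", "the", "yesterday"}
name_db = {"John", "Kennedy"}
tokens = ["John", "F.", "Kennedy", "spoke"]

feats_initial = extract_features(tokens, 1, "PERSON", dictionary, name_db)
feats_last = extract_features(tokens, 3, "PERSON", dictionary, name_db)
```

Because "PREV_LABEL" is a feature, decoding proceeds left to right (or with Viterbi search), with each prediction conditioning the next, which is what distinguishes a maximum entropy Markov model from an independent per-token classifier.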
6
Transliteration
character-level translation model
training data: 4,256 English–Chinese name pairs obtained online
trigram Chinese language model
alignment model: IBM Models 1, 3, 4
human evaluation: 120 English names obtained by NER for testing
acceptance score: 100 ± 2 / 120
example: Francisco → 弗朗西斯科
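To illustrate the alignment step, here is a minimal sketch of IBM Model 1 style EM at the character level, estimating t(e | c), the probability that Chinese character c generates English letter e. It covers only Model 1, omits the NULL word, and leaves out Models 3 and 4 and the trigram language model. The function name and the three toy name pairs are assumptions for illustration and are not drawn from the authors' 4,256 training pairs.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=10):
    """Estimate t[(e, c)] = P(English letter e | Chinese character c) by EM."""
    t = defaultdict(lambda: 1.0)  # uniform start; any positive constant works
    for _ in range(iterations):
        count = defaultdict(float)  # expected (e, c) alignment counts
        total = defaultdict(float)  # expected counts per Chinese character
        for english, chinese in pairs:
            for e in english:
                # E-step: share this letter among the name's Chinese characters
                z = sum(t[(e, c)] for c in chinese)
                for c in chinese:
                    frac = t[(e, c)] / z
                    count[(e, c)] += frac
                    total[c] += frac
        # M-step: renormalize so that sum over e of t(e | c) equals 1
        t = {(e, c): count[(e, c)] / total[c] for (e, c) in count}
    return t

# Toy pairs, hypothetical and far smaller than a real training set.
toy_pairs = [("anna", "安娜"), ("lina", "丽娜"), ("lisa", "丽莎")]
t = train_ibm1(toy_pairs)
```

Even on three pairs, the letter "s" ends up more strongly tied to 莎 than to 丽, because 丽 also has to explain the letters of "lina". In a noisy-channel setup, these translation probabilities would be combined with the Chinese language model to score candidate outputs.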