Recognizing Location Names from Chinese Texts
严德美
October 9, 2017
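The recognition pipeline has three steps:
1. Segment the text into words and assign part-of-speech (POS) tags.
2. Break the segmented words into characters and assign each character its features.
3. Identify the location names using M3 Net (a max-margin Markov network).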
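Below is a minimal sketch of breaking segmented words into characters. It assumes segmentation and POS tagging have already been done by some segmenter, which the slides do not specify. Each character inherits its word's POS tag plus its position in the word: B for the first character, E for the last, and S for a single-character word, matching the "n-B n-E n-S" tags in the feature-extraction example. The "M" tag for word-internal characters is a hypothetical choice, because the slides show no word longer than two characters.

```python
def words_to_char_tokens(tagged_words):
    """Split (word, pos) pairs into per-character (char, 'pos-X') tokens."""
    tokens = []
    for word, pos in tagged_words:
        if len(word) == 1:
            # Single-character word.
            tokens.append((word, f"{pos}-S"))
            continue
        for k, ch in enumerate(word):
            if k == 0:
                suffix = "B"  # first character of the word
            elif k == len(word) - 1:
                suffix = "E"  # last character of the word
            else:
                suffix = "M"  # hypothetical tag for word-internal characters
            tokens.append((ch, f"{pos}-{suffix}"))
    return tokens


if __name__ == "__main__":
    # 大连 / 市 / 人民, already segmented and tagged as nouns (n).
    tagged = [("大连", "n"), ("市", "n"), ("人民", "n")]
    print(words_to_char_tokens(tagged))
    # [('大', 'n-B'), ('连', 'n-E'), ('市', 'n-S'), ('人', 'n-B'), ('民', 'n-E')]
```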
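IOB2 representation for chunks (here, location names). Each token (character) receives one of three tags:
I: the current token is inside a chunk.
O: the current token is outside any chunk.
B: the current token is the beginning of a chunk.
The tag of the i-th token is its label y_i.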
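Below is a minimal sketch of IOB2 encoding and decoding. It assumes location names are the only chunk type, so the tags are plain B/I/O. It also assumes location names are given as hypothetical [start, end) character offsets. Under IOB2, every chunk starts with B, even when it directly follows another chunk. The decoder leniently treats a stray I after O as the start of a new chunk.

```python
def spans_to_iob2(n_chars, spans):
    """Label n_chars tokens with IOB2 tags given [start, end) chunk spans."""
    tags = ["O"] * n_chars
    for start, end in spans:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def iob2_to_spans(tags):
    """Recover [start, end) chunk spans from an IOB2 tag sequence."""
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new chunk begins; close any chunk still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


if __name__ == "__main__":
    # "大连市人民": the location name 大连市 covers characters 0..2.
    tags = spans_to_iob2(5, [(0, 3)])
    print(tags)                 # ['B', 'I', 'I', 'O', 'O']
    print(iob2_to_spans(tags))  # [(0, 3)]
```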
Feature extraction for Chinese location names. A table of characteristic characters of location names is set up in advance. It contains characters such as “市” (city), “省” (province), and “县” (county).
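Below is a minimal sketch of the characteristic-table lookup. Only 市, 省 and 县 come from the slides. The full table is not given, so `LOCATION_CHAR_TABLE` is a hypothetical stand-in. Each character is marked Y or N, which mirrors the LC row of the feature-extraction example.

```python
# Hypothetical stand-in for the characteristic table; only these three
# characters are named in the slides.
LOCATION_CHAR_TABLE = {"市", "省", "县"}


def lc_features(chars, table=LOCATION_CHAR_TABLE):
    """Return 'Y' for characters found in the table, 'N' otherwise."""
    return ["Y" if ch in table else "N" for ch in chars]


if __name__ == "__main__":
    print(lc_features("大连市人民"))  # ['N', 'N', 'Y', 'N', 'N']
```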
POS tags
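An example of feature extraction for the string 大连市人民 (Da Lian Shi Ren Min), with the current character 市 (Shi) at position 0. Y = yes, N = no. LC marks characters found in the location-name characteristic table.

Position    -2     -1     0      +1     +2
Character   Da     Lian   Shi    Ren    Min
POS tag     n-B    n-E    n-S    n-B    n-E
LC          N      N      Y      N      N
BeforeLoc   Y      N      N      N      N
BehindLoc   N      N      N      Y      N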
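The table lists each attribute at positions -2 to +2 around the current character. Below is a minimal sketch of turning such a window into feature strings for a learner. Several details are assumptions: the attribute names, the "attr[offset]=value" string format, and the `<PAD>` value for positions past the sentence edge. The BeforeLoc and BehindLoc values are copied from the table rather than computed, because the slides do not define how they are derived.

```python
# Rows of the feature-extraction example, copied from the table.
CHARS = "大连市人民"  # Da Lian Shi Ren Min
POS = ["n-B", "n-E", "n-S", "n-B", "n-E"]
LC = "NNYNN"
BEFORE_LOC = "YNNNN"
BEHIND_LOC = "NNNYN"

# One dict of attributes per character (attribute names are hypothetical).
ATTRS = ("char", "pos", "lc", "before_loc", "behind_loc")
TOKENS = [
    dict(zip(ATTRS, row))
    for row in zip(CHARS, POS, LC, BEFORE_LOC, BEHIND_LOC)
]


def window_features(tokens, i, size=2, attrs=ATTRS):
    """Feature strings 'attr[offset]=value' for positions i-size .. i+size."""
    feats = []
    for offset in range(-size, size + 1):
        j = i + offset
        for name in attrs:
            # Positions outside the sentence get a padding value.
            value = tokens[j][name] if 0 <= j < len(tokens) else "<PAD>"
            feats.append(f"{name}[{offset:+d}]={value}")
    return feats


if __name__ == "__main__":
    # Features for 市 (index 2), the character at position 0 in the table.
    for feat in window_features(TOKENS, 2):
        print(feat)
```

A fixed window like this lets the classifier see, for example, that the character one step to the right of 市 has BehindLoc = Y.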
Two kinds of Markov networks:
The feature function of the first network is
The feature function of the second network is
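The slides name two kinds of Markov networks, but the formulas for their feature functions are not preserved. For orientation only, here is how a chain-structured Markov network commonly scores a label sequence y for an input x. The score is a weighted sum of node terms (features of x at position i paired with label y_i) and edge terms (label pairs y_{i-1}, y_i). Max-margin Markov network training learns the weights (not shown here). For a chain, the best-scoring y can be found with Viterbi decoding. The sketch below uses hypothetical hand-set scores. It also adds a hypothetical penalty on the transition O -> I, since IOB2 requires every chunk to begin with B. It is not the authors' feature function.

```python
LABELS = ["B", "I", "O"]

# Hypothetical transition scores: neutral, except O -> I, which IOB2 forbids.
EDGE_SCORES = {(a, b): 0.0 for a in LABELS for b in LABELS}
EDGE_SCORES[("O", "I")] = -100.0


def viterbi(node_scores, edge_scores=EDGE_SCORES, labels=LABELS):
    """Return the label sequence maximising the sum of node and edge scores.

    node_scores[i][y]: score of label y at position i (weights . node features).
    edge_scores[(y_prev, y)]: score of the transition y_prev -> y.
    """
    if not node_scores:
        return []
    n = len(node_scores)
    best = [{y: node_scores[0][y] for y in labels}]  # best score ending in y
    back = [{}]                                      # backpointers
    for i in range(1, n):
        best.append({})
        back.append({})
        for y in labels:
            prev, score = max(
                ((yp, best[i - 1][yp] + edge_scores[(yp, y)]) for yp in labels),
                key=lambda pair: pair[1],
            )
            best[i][y] = score + node_scores[i][y]
            back[i][y] = prev
    # Follow backpointers from the best final label.
    path = [max(labels, key=lambda y: best[n - 1][y])]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical node scores for 大 连 市 人 民.
    node_scores = [
        {"B": 2.0, "I": 0.0, "O": 1.0},
        {"B": 0.0, "I": 1.5, "O": 1.0},
        {"B": 0.0, "I": 2.0, "O": 0.5},
        {"B": 0.5, "I": 0.0, "O": 2.0},
        {"B": 0.0, "I": 0.5, "O": 2.0},
    ]
    print(viterbi(node_scores))  # ['B', 'I', 'I', 'O', 'O']
```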
Experiment and results
Training corpus: 1 million characters (18,522 location names)
Testing corpus: 223 thousand characters (3,658 location names)
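The slides do not state which metrics were used or what scores were obtained. Location-name recognition is commonly scored by precision, recall and F1 over whole names. The sketch below makes one assumption: a predicted name counts as correct only if its [start, end) span matches a gold span exactly.

```python
def span_prf(gold_spans, pred_spans):
    """Exact-match precision, recall and F1 over (start, end) spans."""
    gold, pred = set(gold_spans), set(pred_spans)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = [(0, 3), (10, 12)]
    pred = [(0, 3), (5, 7)]
    print(span_prf(gold, pred))  # (0.5, 0.5, 0.5)
```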
Q&A Thank you!