Perceptron Learning for Chinese Word Segmentation
Yaoyong Li, Chuanjiang Miao, Kalina Bontcheva, Hamish Cunningham
Department of Computer Science, University of Sheffield
Outline
Perceptron learning for Chinese word segmentation (CWS)
Different feature sets
Open task
Result analysis
Character Based CWS
Check every character to see which of the following four categories it belongs to: beginning, middle or end character of a multi-character word, or a single-character word.
This converts CWS into four binary classification problems (see the sketch below).
The training datasets are large, so a fast learning algorithm is needed.
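A minimal sketch of the four-category labelling, assuming gold-standard word boundaries; the tag names B/M/E/S are illustrative, not the paper's notation:

```python
def position_labels(words):
    """Map a segmented sentence to per-character categories:
    B = beginning, M = middle, E = end of a multi-character word,
    S = single-character word."""
    labels = []
    for w in words:
        if len(w) == 1:
            labels.append("S")
        else:
            labels.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return labels

# e.g. 海运 | 业 | 雄踞 | 全球 | 之 | 首
print(position_labels(["海运", "业", "雄踞", "全球", "之", "首"]))
# -> ['B', 'E', 'S', 'B', 'E', 'B', 'E', 'S', 'S']
```

Each of the four categories then gets its own binary classifier (category vs. rest).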
Perceptron Algorithm
Simple, fast and effective.
On-line or batch learning: checks the training instances one by one.
Binary classification.
In application, a character is assigned the class whose classifier has the maximal output (see the sketch below).
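A minimal sketch of the binary Perceptron and the maximal-output combination described above, assuming vector-valued features; all names are illustrative, not the authors' implementation:

```python
import numpy as np

def train_perceptron(X, y, epochs=5):
    """Binary Perceptron: X is an (n, d) feature matrix, y holds labels
    in {-1, +1}; update on every misclassified instance."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            if y_i * (x_i @ w + b) <= 0:  # wrong side -> additive update
                w += y_i * x_i
                b += y_i
    return w, b

def classify(x, models, classes=("B", "M", "E", "S")):
    """Assign the class whose classifier has the maximal output."""
    scores = [x @ w + b for (w, b) in models]
    return classes[int(np.argmax(scores))]
```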
Uneven Margins Perceptron
A Perceptron with margin has better generalisation capability than the original Perceptron.
Uneven margins let the Perceptron handle imbalanced data better.
The uneven margins Perceptron is as simple and efficient as the original Perceptron.
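A sketch of the uneven-margins update, following the published PAUM idea: positive and negative examples must clear different margins before an update stops being triggered. The parameter values here are illustrative, not those used in the paper:

```python
import numpy as np

def train_paum(X, y, tau_pos=1.0, tau_neg=0.1, epochs=5):
    """Uneven-margins Perceptron sketch: an instance triggers an update
    whenever it falls inside its own class's margin (tau_pos or tau_neg)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            tau = tau_pos if y_i > 0 else tau_neg
            if y_i * (x_i @ w + b) <= tau:
                w += y_i * x_i
                b += y_i
    return w, b
```

Setting tau_pos larger than tau_neg pushes the decision boundary away from the rarer positive class, which is how uneven margins help on imbalanced data; with tau_pos = tau_neg = 0 this reduces to the original Perceptron.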
Results for four classifications
F1 (%) of the four classifiers, using 4-fold CV on the training set of each of the four corpora:

       beginning  middle  end    single  combination
as     95.64      90.07   95.47  95.27   95.5
cityu  96.64      90.06   96.43  95.14   95.1
msr    96.36      89.79   96.00  94.99   94.9
pku    96.09      89.99   96.18  94.12
Comparison of PAUM with SVM
Averaged F1 (%) and computation time on three subsets and the whole data of the cityu corpus, by 4-fold CV:

Training size  100           1000          5000           53019
PAUM           73.55 (4s)    78.00 (14s)   88.08 (92s)    95.13 (1.03h)
SVM            75.50 (3.8m)  79.15 (1.1h)  88.78 (13.7h)  ------
Features
The features of the character c0 come from the 5 neighbouring characters {c-2, c-1, c0, c1, c2}:
1-order features: {c-2, c-1, c0, c1, c2}
2-order features: {c-2c-1, c-1c0, c0c1, c1c2, c-1c1}
Example, with Α and Ω marking the sentence boundaries: Α 海运 业 雄踞全球之首 Ω ("the shipping industry ranks first in the world")
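A sketch of extracting these features at one character position; the padding symbols follow the slide's Α/Ω convention, while the feature-string encoding is illustrative:

```python
def features(chars, i, pad_left="Α", pad_right="Ω"):
    """1-order and 2-order features of character c0 = chars[i], drawn
    from the window {c-2, c-1, c0, c1, c2}; encoding is illustrative."""
    def c(k):  # character at offset k, padded at the sentence boundaries
        j = i + k
        return pad_left if j < 0 else pad_right if j >= len(chars) else chars[j]

    feats = [f"c{k}={c(k)}" for k in range(-2, 3)]              # 1-order
    for a, b in [(-2, -1), (-1, 0), (0, 1), (1, 2), (-1, 1)]:   # 2-order
        feats.append(f"c{a}c{b}={c(a)}{c(b)}")
    return feats

print(features(list("海运业雄踞全球之首"), 0))
```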
Different feature sets
Different kernels correspond to different feature sets:
linear kernel: 1-order features only
quadratic kernel: all 1- and 2-order features
semi-quadratic kernel: all 1-order features and some 2-order features, as shown on the previous slide

F1 (%):
       linear  quadratic  semi-quadratic
cityu  81.30   94.78      95.13
msr    79.80              94.92
pku    82.33   94.80      95.05
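For intuition on why a kernel corresponds to a feature set, here is a small self-contained check (my illustration, not from the paper) that a homogeneous quadratic kernel computes the same inner product as an explicit map to all pairwise feature products; the inhomogeneous version (x·z + 1)² additionally brings in the 1-order features:

```python
import numpy as np
from itertools import combinations_with_replacement

def phi2(x):
    """Explicit 2-order feature map: all pairwise products x_i * x_j,
    weighted so that phi2(x) . phi2(z) == (x . z) ** 2."""
    pairs = combinations_with_replacement(range(len(x)), 2)
    return np.array([(2.0 if i != j else 1.0) ** 0.5 * x[i] * x[j]
                     for i, j in pairs])

rng = np.random.default_rng(0)
x, z = rng.random(5), rng.random(5)
print(np.isclose((x @ z) ** 2, phi2(x) @ phi2(z)))  # True
```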
Open task
Replace some special text with a symbol in order to achieve better generalisation (sketched below):
Replace every piece of English text with the symbol E
Replace every Arabic number with the symbol N
This yields smaller training data and less computation time.
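A minimal sketch of the replacement step; the regular expressions are my assumption, since the slide does not specify the exact patterns:

```python
import re

def normalise(text):
    """Open-task normalisation sketch: collapse English text and Arabic
    numbers into single symbols (patterns are illustrative)."""
    text = re.sub(r"[A-Za-z]+", "E", text)            # English text -> E
    text = re.sub(r"[0-9]+(?:\.[0-9]+)?", "N", text)  # Arabic numbers -> N
    return text

print(normalise("GDP增长8.5个百分点"))  # -> E增长N个百分点
```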
Experimental results for open task
Comparison between the close and open tasks using 4-fold CV on the training sets of the four corpora: F1 (%) and computation time.

       Only text      Text with E    Text with E & N
as     95.53 (8.88h)  95.65 (7.66h)  95.78 (7.07h)
cityu  95.13 (1.03h)  95.25 (0.86h)  95.25 (0.82h)
msr    94.92 (2.62h)  94.98 (1.69h)  95.00 (1.62h)
pku    95.05 (0.70h)  95.08 (0.63h)  95.15 (0.60h)
Official Results
F1 (%) from the official results:

       Close task  Open task
as     94.4        94.8
cityu  93.6        93.6
msr    95.6        95.4
pku    92.7        93.8
Analysis
Comparison of our official results with the best official results and with 4-fold CV on the training set, for the Close Task: F1 (%)

       Ours (official)  Best  4-fold training  Unknown character rate (%)
as     94.4             95.2  95.5             0.484
cityu  93.6             94.3  95.1             0.924
msr    95.6             96.4  94.9             0.034
pku    92.7             95.0                   0.215
Analysis (2)
Comparison of our official test results with those from 4-fold CV on the training set, for the Open Task: F1 (%)

       Test  4-fold training  Unknown character rate (%)
as     94.8  95.78            0.042
cityu  93.6  95.25            0.86
msr    95.4  95.00            0.031
pku    93.8  95.15            0.119
Conclusions
A simple and fast learning algorithm for CWS.
The results are encouraging.
Future work: better ways to handle unknown characters; more features.