Using Uneven Margins SVM and Perceptron for IE
Yaoyong Li, Kalina Bontcheva, Hamish Cunningham
Presented by Niraj Aswani
Department of Computer Science, University of Sheffield
{yaoyong,kalina,hamish,niraj}@dcs.shef.ac.uk
http://gate.ac.uk/ http://nlp.shef.ac.uk/
Outline
  Imbalanced classification problem in Information Extraction (IE)
  Uneven margins SVM and Perceptron
  Experimental results
2(21)
Information Extraction (IE)
  IE is about extracting information about pre-specified types of events, entities or relationships from text such as newswire articles or Web pages
    Named entity recognition is a basic task for IE
    Some IE tasks can be regarded as filling in slots in an information template
  IE can be useful in many applications
    Information gathering in a variety of domains
    Automatic annotation of web pages for the Semantic Web
    Knowledge management
3(21)
Machine Learning (ML) for IE
  ML has been widely used in IE and achieves state-of-the-art results for some tasks
    Rule learning, such as Rapier, BWI, (LP)2
    Statistical learning, such as Maximum Entropy, HMM, SVM, Perceptron
  Classifier-based framework for IE
    Convert the recognition of information entities into (often binary) classification problems
4(21)
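The conversion from entity recognition to binary classification can be illustrated with a small sketch. It assumes one common scheme, in which each entity type gets one classifier for the first token of an entity and one for the last token; the slide does not say which scheme is used, and the function name, span format and example sentence are hypothetical.

```python
def spans_to_binary_labels(tokens, spans, entity_type):
    """Turn annotated spans into two binary labellings for one entity type.

    spans is a list of (first_token_index, last_token_index, type) triples
    with inclusive indices. Returns (start_labels, end_labels), each a list
    of +1/-1 with one entry per token.
    """
    start = [-1] * len(tokens)
    end = [-1] * len(tokens)
    for first, last, kind in spans:
        if kind == entity_type:
            start[first] = 1   # positive example for the "start" classifier
            end[last] = 1      # positive example for the "end" classifier
    return start, end


# Hypothetical annotated sentence.
tokens = ["Kalina", "Bontcheva", "works", "in", "Sheffield", "."]
spans = [(0, 1, "PER"), (4, 4, "LOC")]

per_start, per_end = spans_to_binary_labels(tokens, spans, "PER")
positives = per_start.count(1)
negatives = per_start.count(-1)
```

Even in this six-token sentence the "PER start" problem has one positive example and five negative ones, which is the kind of imbalance the following slides address.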
Imbalanced Classification Problem
  Imbalanced data: positive examples are vastly outnumbered by negative ones
  Most learning algorithms do not perform very well on imbalanced data
  The classification problems for IE usually have imbalanced data, particularly for small training sets
5(21)
How to deal with imbalanced data
  Transform the problem
    Under-sample negative instances
    Over-sample positive instances
    Divide the problem into several sub-problems with less imbalanced data
  Modify the learning algorithm for imbalanced data
    We adapt the SVM and Perceptron for IE
6(21)
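The two resampling options listed above can be sketched as follows. This is a minimal illustration, not the approach taken in this work (which modifies the learning algorithm instead); the function names, the `ratio` parameter and the toy data are assumptions made for the example.

```python
import random


def undersample_negatives(examples, ratio=1.0, seed=0):
    """Keep all positives and at most ratio * (#positives) random negatives."""
    rng = random.Random(seed)
    pos = [e for e in examples if e[1] == 1]
    neg = [e for e in examples if e[1] == -1]
    keep = min(len(neg), int(ratio * len(pos)))
    return pos + rng.sample(neg, keep)


def oversample_positives(examples, seed=0):
    """Duplicate random positives until both classes have the same size."""
    rng = random.Random(seed)
    pos = [e for e in examples if e[1] == 1]
    neg = [e for e in examples if e[1] == -1]
    if not pos:
        return list(examples)
    extra = [rng.choice(pos) for _ in range(max(0, len(neg) - len(pos)))]
    return pos + extra + neg


# Toy data: 2 positive and 18 negative (feature_vector, label) pairs.
data = [((i,), 1) for i in range(2)] + [((i,), -1) for i in range(2, 20)]
under = undersample_negatives(data)
over = oversample_positives(data)
```

Under-sampling discards negative examples, while over-sampling repeats positive ones; both change the training set rather than the learner.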
SVM and Perceptron for IE
  SVM and Perceptron are two popular learning algorithms for IE
    SVM has achieved state-of-the-art results for many classification problems, including IE
    Perceptron is a simple, effective and fast learning algorithm
    Two variants of Perceptron, voted Perceptron and Winnow, have been successfully used in IE
  We adapt the SVM and Perceptron for imbalanced data in IE
7(21)
The SVM Classifier
  The classification hyperplane has the same margin to the negative and the positive training examples
8(21)
Uneven Margins SVM -- for imbalanced data 9(21)
Uneven Margins SVM
  Introduce an uneven margins parameter τ into the SVM (see Li and Shawe-Taylor, 2003)
  τ is the ratio of the negative margin to the positive margin, and can be used for adjusting the margins
10(21)
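As a sketch of where τ enters the optimisation, the soft-margin SVM can be written with the negative-class constraint scaled by τ. The symbols w, b, ξ_i, C and the m training pairs (x_i, y_i) do not appear on the slide and are assumed here in their usual SVM meaning; the cited paper should be consulted for the exact formulation.

```latex
\begin{aligned}
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\quad
  & \langle \mathbf{w},\mathbf{w}\rangle + C\sum_{i=1}^{m}\xi_i \\
\text{subject to}\quad
  & \langle \mathbf{w},\mathbf{x}_i\rangle + \xi_i + b \ge 1
  && \text{if } y_i = +1, \\
  & \langle \mathbf{w},\mathbf{x}_i\rangle - \xi_i + b \le -\tau
  && \text{if } y_i = -1, \\
  & \xi_i \ge 0, && i = 1,\dots,m.
\end{aligned}
```

With τ = 1 this reduces to the standard SVM. With τ < 1 the negative examples are only required to lie beyond a smaller margin than the positive ones, which moves the hyperplane away from the scarce positive examples.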
Uneven Margins Perceptron
  Perceptron: a simple and fast linear learning algorithm
  PAUM (Perceptron Algorithm with Uneven Margins): introduces two margin parameters, τ+ and τ-, into the Perceptron (see Li et al., 2002)
  PAUM's performance was comparable to that of the SVM on document classification
11(21)
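A minimal sketch of a Perceptron with two margin parameters is given below. The slide only states that τ+ and τ- are introduced; the update rule, the learning rate `eta`, the bias update scaled by the squared data radius, the epoch limit and the toy data are assumptions based on the usual presentation of the algorithm, not details taken from the slide.

```python
def train_paum(examples, tau_pos, tau_neg, eta=1.0, max_epochs=100):
    """Train a linear classifier with separate margins for each class.

    examples is a list of (feature_vector, label) pairs, label in {+1, -1}.
    An example triggers an update whenever its signed score does not exceed
    the margin required for its class (tau_pos or tau_neg).
    """
    dim = len(examples[0][0])
    w = [0.0] * dim
    b = 0.0
    # Squared radius of the data, used to scale the bias update.
    r_sq = max(sum(v * v for v in x) for x, _ in examples)
    for _ in range(max_epochs):
        updated = False
        for x, y in examples:
            margin = tau_pos if y == 1 else tau_neg
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= margin:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y * r_sq
                updated = True
        if not updated:  # every example already satisfies its margin
            break
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the sign of the linear score."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Toy imbalanced data: one positive example, three negative ones.
data = [((1.0, 1.0), 1), ((-1.0, 0.0), -1), ((0.0, -1.0), -1), ((-1.0, -1.0), -1)]
w, b = train_paum(data, tau_pos=2.0, tau_neg=1.0)
predictions = [predict(w, b, x) for x, _ in data]
```

Setting both margins to 0 gives the standard Perceptron and setting them to the same positive value gives a margin Perceptron; a larger `tau_pos` than `tau_neg` asks for more room around the rare positive examples.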
Three Experimental Datasets
  English part of the CoNLL-2003 shared task data
    Most recent evaluation results on named entity recognition
  The Jobs corpus for template filling
    300 software-related job posts
    17 slots encoding job details, such as title, salary, recruiter
  Call For Papers (CFP) corpus for the Pascal challenge
    The latest results on template filling
    600 annotated posts for workshop calls for papers
    11 slots such as workshop name, date, location, homepage
12(21)
Results on CoNLL-2003 Corpus

                         System             Overall F1 (%)
  Our systems            SVMUM              86.30
                         Standard SVM       85.05
                         PAUM               84.36
  Participating systems  Best result        88.76 (±0.7)
                         Another SVM        84.67 (±1.0)
                         Voted Perceptron   84.30 (±0.9)

  Overall performance of our three systems, compared with three related participating systems
13(21)
Results on Jobs Corpus

               SVMUM         PAUM          (LP)2   Rapier   DCs
  MA F1 (%)    80.8 (±1.0)   81.6 (±1.1)   77.2    76.0     57.9

  Macro-averaged (MA) F1 of our two systems and three other systems, evaluated on all 17 slots of the Jobs dataset
  PAUM obtained even better results than SVMUM
14(21)
Results on CFP Corpus

                          SVMUM   PAUM   Best result
  Micro-averaged F1 (%)   61.1    64.3   73.4

  Our SVM and PAUM systems were respectively in the fourth and fifth position among the 20 participating systems
  Our SVM and PAUM systems performed better than all other SVM-based participating systems
15(21)
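The Jobs results are macro-averaged and the CFP results are micro-averaged, so the two figures are not computed in the same way. The sketch below shows the difference under the usual definitions; the slot names and the (true positive, false positive, false negative) counts are invented for illustration and are not taken from either corpus.

```python
def f1(tp, fp, fn):
    """F1 from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(counts):
    """Average the per-slot F1 scores, so every slot weighs the same."""
    return sum(f1(*c) for c in counts.values()) / len(counts)


def micro_f1(counts):
    """Pool the counts over all slots first, so frequent slots dominate."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


# Hypothetical counts: a frequent, easy slot and a rare, hard one.
counts = {"name": (90, 10, 10), "date": (1, 1, 3)}
macro = macro_f1(counts)
micro = micro_f1(counts)
```

With these made-up counts the macro average is pulled down by the rare slot while the micro average stays close to the score of the frequent slot, which is why the averaging method should be stated alongside any F1 figure.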
Effects of Uneven Margins for SVM

  τ          1.0    0.8    0.6    0.4    0.2
  CoNLL-03   85.0   86.0   86.2   85.9   81.6
  Jobs       79.0   79.9   81.0   80.8

  The uneven margins SVM (τ<1) performed significantly better than the standard SVM (τ=1)
  The results were not very sensitive to the value of τ
16(21)
Uneven Margins for Perceptron

  (τ+, τ-)   (0,0)   (1,1)   (50,1)
  CoNLL-03   83.5    83.9    84.4
  Jobs       74.1    78.8    81.6

  (0,0) is the standard Perceptron, (1,1) the margin Perceptron and (50,1) PAUM
  The margin Perceptron obtained better results than the Perceptron, and PAUM performed better than the other two
  Differences on Jobs were bigger than those on the CoNLL data, as the Jobs data is more imbalanced than the CoNLL data
17(21)
Small Training Data
  Manually annotating training data is a time-consuming process
  In many applications we have to use a small training set
  A small training set for IE is more imbalanced than a larger one
18(21)
Small Training Datasets for CoNLL-03

  Data size   10     20     30     40     50
  τ = 0.4     60.6   66.4   70.4   72.2   72.8
  τ = 1.0     46.2   58.6   65.2   68.3   68.6

  Comparison of the uneven margins SVM with the standard SVM on small training sets of the CoNLL-03 corpus
  The gain of SVMUM over the standard SVM was largest on the smallest training sets
19(21)
Small Training Datasets for Jobs data

  Data size   10     20     30     40     50
  τ = 0.4     51.6   60.9   65.7   68.6   71.1
  τ = 1.0     47.1   56.5   61.4   65.4   68.1

  Comparison of the uneven margins SVM with the standard SVM on small training sets of the Jobs corpus
  The behaviour on the Jobs data is similar to that on the CoNLL-2003 data
20(21)
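The effect of training set size can be read off the two small-training-set slides by subtracting the τ = 1.0 row from the τ = 0.4 row. The short computation below uses only the figures reported on those slides; the variable names are chosen for the example.

```python
sizes = [10, 20, 30, 40, 50]

# Scores reported on the two small-training-set slides.
conll_uneven = [60.6, 66.4, 70.4, 72.2, 72.8]     # τ = 0.4
conll_standard = [46.2, 58.6, 65.2, 68.3, 68.6]   # τ = 1.0
jobs_uneven = [51.6, 60.9, 65.7, 68.6, 71.1]      # τ = 0.4
jobs_standard = [47.1, 56.5, 61.4, 65.4, 68.1]    # τ = 1.0


def gains(uneven, standard):
    """Difference in points between the uneven and the standard SVM."""
    return [round(u - s, 1) for u, s in zip(uneven, standard)]


conll_gains = gains(conll_uneven, conll_standard)  # [14.4, 7.8, 5.2, 3.9, 4.2]
jobs_gains = gains(jobs_uneven, jobs_standard)     # [4.5, 4.4, 4.3, 3.2, 3.0]

for size, c, j in zip(sizes, conll_gains, jobs_gains):
    print(f"size {size}: CoNLL-03 +{c}, Jobs +{j}")
```

On both corpora the largest gain is at size 10 and the gains shrink as the training set grows, although the decrease on CoNLL-03 is not strictly monotonic (3.9 at size 40, 4.2 at size 50).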
Conclusions
  The uneven margins parameter was indeed helpful to the SVM for IE, especially for small training sets
  PAUM performed well for IE
  Future research: apply the uneven margins SVM and PAUM to other NLP learning tasks, as those tasks often lead to imbalanced data as well
21(21)
Thanks! 22(21)