Perceptron Learning for Chinese Word Segmentation


Perceptron Learning for Chinese Word Segmentation
Yaoyong Li, Chuanjiang Miao, Kalina Bontcheva, Hamish Cunningham
Department of Computer Science, University of Sheffield
{yaoyong,kalina,hamish}@dcs.shef.ac.uk
http://gate.ac.uk/ http://nlp.shef.ac.uk/

Outline
- Perceptron learning for Chinese word segmentation (CWS)
- Different feature sets
- Open task
- Result analysis

Character-Based CWS
- Check every character to see which of the following four categories it belongs to: the beginning, end, or middle character of a multi-character word, or a single-character word (sketched below).
- This converts CWS into four binary classification problems.
- The training dataset is large, so a fast algorithm is needed.
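As an illustration, here is a minimal sketch of this labelling scheme, assuming whitespace-separated gold segmentation as in the SIGHAN bakeoff data format (the label names B/M/E/S are our shorthand for the four categories):

```python
def char_labels(segmented_sentence):
    """Map each character of a pre-segmented sentence to one of the four
    categories: beginning ('B'), middle ('M'), or end ('E') of a
    multi-character word, or a single-character word ('S')."""
    labels = []
    for word in segmented_sentence.split():
        if len(word) == 1:
            labels.append((word, 'S'))
        else:
            labels.append((word[0], 'B'))
            labels.extend((ch, 'M') for ch in word[1:-1])
            labels.append((word[-1], 'E'))
    return labels

# Assumed segmentation of the example sentence used later in the talk:
print(char_labels("海运 业 雄踞 全球 之 首"))
# [('海', 'B'), ('运', 'E'), ('业', 'S'), ('雄', 'B'), ('踞', 'E'), ...]
```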

Perceptron Algorithm
- Simple, fast, and effective.
- On-line or batch learning: training instances are checked one by one.
- Binary classification.
- At application time, a character is assigned the class whose classifier has the maximal output (see the sketch below).
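A minimal sketch of the classic perceptron update and the one-vs-rest decision rule just described, using sparse feature dicts; the function names are illustrative, not the authors' code:

```python
from collections import defaultdict

def train_perceptron(examples, epochs=5, eta=1.0):
    """Classic perceptron for one binary classifier. `examples` is a
    list of (features, y) pairs with y in {+1, -1} and features a
    sparse dict {name: value}. Online: instances are checked one by one."""
    w, b = defaultdict(float), 0.0
    for _ in range(epochs):
        for feats, y in examples:
            score = b + sum(w[f] * v for f, v in feats.items())
            if y * score <= 0:                  # mistake: update
                for f, v in feats.items():
                    w[f] += eta * y * v
                b += eta * y
    return dict(w), b

def classify(models, feats):
    """One-vs-rest decision: assign the class whose classifier has the
    maximal output. `models` maps class -> (w, b)."""
    def score(w, b):
        return b + sum(w.get(f, 0.0) * v for f, v in feats.items())
    return max(models, key=lambda c: score(*models[c]))
```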

Uneven Margins Perceptron
- The perceptron with margin has better generalisation capability than the original perceptron.
- Uneven margins let the perceptron handle imbalanced data better (sketched below).
- The uneven margins perceptron (PAUM) is as simple and efficient as the standard perceptron.
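A sketch of the uneven-margins idea: an update fires not only on a mistake but whenever an example's functional margin falls below a class-specific threshold. The exact parameterisation of PAUM in the original paper may differ; this is the general shape, assuming the same sparse-dict conventions as above:

```python
from collections import defaultdict

def train_paum(examples, tau_pos=1.0, tau_neg=0.1, epochs=5, eta=1.0):
    """Perceptron with uneven margins: positives must score above
    tau_pos, negatives below -tau_neg. A larger positive margin
    (tau_pos > tau_neg) pushes the rare positive examples further from
    the decision boundary, which helps on imbalanced data."""
    w, b = defaultdict(float), 0.0
    for _ in range(epochs):
        for feats, y in examples:
            score = b + sum(w[f] * v for f, v in feats.items())
            margin = tau_pos if y > 0 else tau_neg
            if y * score <= margin:             # inside the uneven margin
                for f, v in feats.items():
                    w[f] += eta * y * v
                b += eta * y
    return dict(w), b
```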

Results for the Four Classifiers
F1 (%) of the four binary classifiers, using 4-fold CV on the training set of each of the four corpora:

Corpus   beginning  middle  end    single  combination
as       95.64      90.07   95.47  95.27   95.5
cityu    96.64      90.06   96.43  95.14   95.1
msr      96.36      89.79   96.00  94.99   94.9
pku      96.09      89.99   96.18  94.12

Comparison of PAUM with SVM
Averaged F1 (%) and computation time on three subsets and the whole training data of the cityu corpus, by 4-fold CV:

Training size  100           1000          5000           53019
PAUM           73.55 (4s)    78.00 (14s)   88.08 (92s)    95.13 (1.03h)
SVM            75.50 (3.8m)  79.15 (1.1h)  88.78 (13.7h)  --

Features
The features of the character c0 are drawn from a window of five neighbouring characters, {c-2, c-1, c0, c1, c2} (sketched below):
- 1-order features: {c-2, c-1, c0, c1, c2}
- 2-order features: {c-2c-1, c-1c0, c0c1, c1c2, c-1c1}
Example sentence, with boundary markers Α and Ω: Α海运 业 雄踞全球之首Ω
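A sketch of this feature extraction; the feature-name format and the padding symbol standing in for the Α/Ω boundary markers are our assumptions:

```python
def extract_features(chars, i, pad='#'):
    """1-order and 2-order features for the character at position i,
    drawn from the window {c-2, c-1, c0, c1, c2}; positions falling
    outside the sentence are padded with a boundary symbol."""
    def c(j):
        k = i + j
        return chars[k] if 0 <= k < len(chars) else pad
    feats = {}
    for j in (-2, -1, 0, 1, 2):                          # 1-order features
        feats['c%+d=%s' % (j, c(j))] = 1.0
    for j, k in ((-2, -1), (-1, 0), (0, 1), (1, 2), (-1, 1)):
        feats['c%+dc%+d=%s%s' % (j, k, c(j), c(k))] = 1.0  # 2-order features
    return feats

print(extract_features(list("海运业雄踞全球之首"), 0))
```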

Different Feature Sets
Different kernels correspond to different feature sets:
- a linear kernel amounts to using the 1-order features only;
- a quadratic kernel: all 1- and 2-order features;
- a semi-quadratic kernel: all 1-order features and some 2-order features, as shown on the previous slide.

F1 (%):
Corpus   linear  quadratic  semi-quadratic
cityu    81.30   94.78      95.13
msr      79.80              94.92
pku      82.33   94.80      95.05
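The kernel view can be made concrete: a kernel perceptron only needs dot products between examples, so with sparse binary feature dicts the degree-2 polynomial kernel below implicitly supplies the pairwise feature conjunctions without ever constructing them. A sketch under those assumptions:

```python
def dot(x, z):
    """Dot product of two sparse feature dicts."""
    if len(x) > len(z):
        x, z = z, x
    return sum(v * z.get(f, 0.0) for f, v in x.items())

def linear_kernel(x, z):
    return dot(x, z)              # spans the 1-order features only

def quadratic_kernel(x, z):
    # (x.z + 1)^2 expands to all pairwise feature products plus the
    # original features (up to constant scaling), i.e. it implicitly
    # spans all 1- and 2-order feature conjunctions.
    return (dot(x, z) + 1.0) ** 2
```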

Open Task
Replace certain special text with a symbol in order to achieve better generalisation (sketched below):
- replace every piece of English text with the symbol E;
- replace every Arabic number with the symbol N.
This yields smaller training data and less computation time.
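A minimal sketch of this normalisation; the exact character classes the authors matched are not specified on the slide, so the regexes below are an assumption:

```python
import re

def normalise(text):
    """Replace each run of English letters with 'E' and each run of
    Arabic digits with 'N' before segmentation (assumed character
    classes, not taken from the paper)."""
    text = re.sub(r'[A-Za-z]+', 'E', text)
    text = re.sub(r'[0-9]+', 'N', text)
    return text

print(normalise("微软Windows 2000系统"))   # -> 微软E N系统
```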

Experimental Results for the Open Task
Comparison between the close and open tasks using 4-fold CV on the training sets of the four corpora: F1 (%) and computation time.

Corpus   Only text      Text with E    Text with E & N
as       95.53 (8.88h)  95.65 (7.66h)  95.78 (7.07h)
cityu    95.13 (1.03h)  95.25 (0.86h)  95.25 (0.82h)
msr      94.92 (2.62h)  94.98 (1.69h)  95.00 (1.62h)
pku      95.05 (0.70h)  95.08 (0.63h)  95.15 (0.60h)

Official Results
F1 (%) from the official results:

Corpus   Close task  Open task
as       94.4        94.8
cityu    93.6        93.6
msr      95.6        95.4
pku      92.7        93.8

Analysis
Comparison of our official results with the best official results and with 4-fold CV on the training set, for the close task: F1 (%).

Corpus   Ours (official)  Official best  4-fold training  Unknown character rate (%)
as       94.4             95.2           95.5             0.484
cityu    93.6             94.3           95.1             0.924
msr      95.6             96.4           94.9             0.034
pku      92.7                            95.0             0.215

Analysis (2)
Comparison of test results with 4-fold CV on the training set, for the open task: F1 (%).

Corpus   Test  4-fold training  Unknown character rate (%)
as       94.8  95.78            0.042
cityu    93.6  95.25            0.86
msr      95.4  95.00            0.031
pku      93.8  95.15            0.119

Conclusions
- A simple and fast learning algorithm for CWS.
- The results are encouraging.
- Future work: a better way to deal with unknown characters; more features.