Using Uneven Margins SVM and Perceptron for IE


Yaoyong Li, Kalina Bontcheva, Hamish Cunningham
Presented by Niraj Aswani
Department of Computer Science, University of Sheffield
{yaoyong,kalina,hamish,niraj}@dcs.shef.ac.uk
http://gate.ac.uk/
http://nlp.shef.ac.uk/

Outline
- Imbalanced classification problem in Information Extraction (IE)
- Uneven margins SVM and Perceptron
- Experimental results

Information Extraction (IE)
- IE is about extracting information about pre-specified types of events, entities or relationships from text such as newswire articles or Web pages.
- Named entity recognition is a basic task for IE.
- Some IE tasks can be regarded as filling in slots in an information template.
- IE can be useful in many applications:
  - Information gathering in a variety of domains
  - Automatic annotation of web pages for the Semantic Web
  - Knowledge management

Machine Learning (ML) for IE
- ML has been widely used in IE and has achieved state-of-the-art results for some tasks:
  - Rule learning, such as Rapier, BWI and (LP)2
  - Statistical learning, such as Maximum Entropy, HMM, SVM and Perceptron
- Classifier-based framework for IE:
  - Convert the recognition of information entities into (often binary) classification problems.
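To make the classifier-based framework concrete, here is a minimal sketch of one common way (assumed here, not taken from the slides) to turn annotated entity spans into binary problems: one classifier decides whether a token starts an entity, another whether it ends one. The function name `start_end_labels` and the toy sentence are hypothetical. Even on one short sentence the positive labels are a small minority, which is the imbalance discussed next.

```python
from typing import List, Tuple


def start_end_labels(num_tokens: int,
                     spans: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Turn entity spans (inclusive token indices) into two binary label lists.

    starts[i] == 1 if token i begins an entity, ends[i] == 1 if it ends one.
    Every other token becomes a negative example for both classifiers.
    """
    starts = [0] * num_tokens
    ends = [0] * num_tokens
    for first, last in spans:
        starts[first] = 1
        ends[last] = 1
    return starts, ends


def positive_fraction(labels: List[int]) -> float:
    """Share of positive examples in a binary label list."""
    return sum(labels) / len(labels)


# Hypothetical toy sentence with two entities: "Yaoyong Li" and "University of Sheffield".
tokens = "Yaoyong Li works at the University of Sheffield .".split()
starts, ends = start_end_labels(len(tokens), [(0, 1), (5, 7)])
print(starts, ends, positive_fraction(starts))
```

On real newswire, where most tokens are not part of any entity, the positive fraction is far lower than in this toy example.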

Imbalanced Classification Problem
- Imbalanced data: positive examples are vastly outnumbered by negative ones.
- Most learning algorithms do not perform very well on imbalanced data.
- The classification problems for IE usually have imbalanced data, particularly when the training set is small.

How to Deal with Imbalanced Data
- Transform the problem:
  - Under-sample negative instances
  - Over-sample positive instances
  - Divide the problem into several sub-problems with less imbalanced data
- Modify the learning algorithm for imbalanced data:
  - We adapt the SVM and Perceptron for IE
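The two resampling options in the "transform the problem" branch can be sketched as follows. This is a generic illustration, not the approach taken in this work (which modifies the learner instead); the function names, the `ratio` and `copies` parameters and the fixed seed are all assumptions made for the example.

```python
import random


def undersample_negatives(examples, ratio, seed=0):
    """Keep every positive example and a random fraction `ratio` of negatives."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    # rng.random() is only drawn for negatives because `or` short-circuits.
    return [(x, y) for x, y in examples if y == 1 or rng.random() < ratio]


def oversample_positives(examples, copies):
    """Repeat each positive example `copies` times; negatives are kept once."""
    out = []
    for x, y in examples:
        out.extend([(x, y)] * (copies if y == 1 else 1))
    return out


# Hypothetical data: 2 positives, 18 negatives.
data = [([i], 1 if i < 2 else -1) for i in range(20)]
print(len(oversample_positives(data, 5)), len(undersample_negatives(data, 0.25)))
```

Under-sampling throws away information about the negative class, while over-sampling only duplicates the few positives it already has; this trade-off is one motivation for changing the learning algorithm instead.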

SVM and Perceptron for IE
- SVM and Perceptron are two popular learning algorithms for IE.
- SVM has achieved state-of-the-art results for many classification problems, including IE.
- Perceptron is a simple, effective and fast learning algorithm.
  - Two variants of the Perceptron, the voted Perceptron and Winnow, have been successfully used in IE.
- We adapt the SVM and Perceptron for imbalanced data in IE.

The SVM Classifier
- The classification hyperplane has the same margin to the negative and the positive training examples.
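For reference, the textbook soft-margin SVM can be written as below; the slide itself gives no formula. Positive examples are asked to score at least 1 and negative examples at most −1, so the geometric margin on each side is the same, 1/‖w‖.

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \; \langle \mathbf{w},\mathbf{w}\rangle + C\sum_{i=1}^{m}\xi_i
\quad \text{s.t.} \quad
\begin{cases}
\langle \mathbf{w},\mathbf{x}_i\rangle + b \ge 1 - \xi_i & \text{if } y_i = +1,\\
\langle \mathbf{w},\mathbf{x}_i\rangle + b \le -1 + \xi_i & \text{if } y_i = -1,\\
\xi_i \ge 0, & i = 1,\dots,m.
\end{cases}
```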

Uneven Margins SVM for Imbalanced Data

Uneven Margins SVM
- Introduce an uneven margins parameter τ into the SVM (see Li and Shawe-Taylor, 2003).
- τ is the ratio of the negative margin to the positive margin, and can be used to adjust the margins.
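With an uneven margins parameter τ, positives are still asked to score at least 1 but negatives only at most −τ. One property that follows directly from these constraints: any hyperplane (w, b) that meets the standard constraints (scores ≥ 1 and ≤ −1) can be mapped to one that meets the uneven ones by w' = ((1+τ)/2)·w and b' = ((1+τ)/2)·b + (1−τ)/2. The sketch below only demonstrates this algebraic mapping on hand-picked numbers; it is not the training procedure of the presented systems, and the function names are hypothetical.

```python
from typing import List, Tuple


def to_uneven_margins(w: List[float], b: float, tau: float) -> Tuple[List[float], float]:
    """Map a hyperplane with margins +1/-1 to one with margins +1/-tau.

    A score s = <w, x> + b becomes scale * s + shift, so s = 1 maps to 1
    and s = -1 maps to -tau.
    """
    scale = (1.0 + tau) / 2.0
    shift = (1.0 - tau) / 2.0
    return [scale * wi for wi in w], scale * b + shift


def score(w: List[float], b: float, x: List[float]) -> float:
    """Linear decision value <w, x> + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Hypothetical hyperplane whose closest positive/negative points score exactly +1/-1.
w, b = [1.0, 0.0], 0.0
w2, b2 = to_uneven_margins(w, b, tau=0.4)
print(score(w2, b2, [1.0, 0.0]), score(w2, b2, [-1.0, 0.0]))  # about 1.0 and -0.4
```

For τ < 1 the new decision boundary corresponds to a slightly negative score under the original hyperplane, so borderline examples on the negative side are now labelled positive; this is how a smaller negative margin can help recall on the rare positive class.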

Uneven Margins Perceptron
- Perceptron: a simple and fast linear learning algorithm.
- PAUM (Perceptron Algorithm with Uneven Margins): introduces two margin parameters, τ+ and τ-, into the Perceptron (see Li et al., 2002).
- PAUM's performance was comparable to that of the SVM on document classification.
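A minimal sketch of the PAUM idea, as we understand it from the description above: an example triggers an update whenever its functional margin y(⟨w,x⟩+b) does not exceed the margin required for its class, τ+ for positives and τ- for negatives. Scaling the bias step by R² (the largest squared example norm) follows the usual margin-Perceptron form and is an assumption here, as are the function name `paum`, the learning rate `eta` and the `epochs` cap.

```python
from typing import List, Tuple

Example = Tuple[List[float], int]  # (feature vector, label in {+1, -1})


def paum(examples: List[Example], tau_pos: float, tau_neg: float,
         eta: float = 1.0, epochs: int = 100) -> Tuple[List[float], float]:
    """Perceptron with separate margin requirements for each class (sketch)."""
    w = [0.0] * len(examples[0][0])
    b = 0.0
    r2 = max(sum(v * v for v in x) for x, _ in examples)  # R^2 for the bias step
    for _ in range(epochs):
        mistakes = 0
        for x, y in examples:
            tau = tau_pos if y == 1 else tau_neg
            s = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * s <= tau:  # margin for this class not yet achieved
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y * r2
                mistakes += 1
        if mistakes == 0:  # every example meets its class-specific margin
            break
    return w, b


# Hypothetical, linearly separable toy data.
data = [([2.0, 1.0], 1), ([3.0, 2.0], 1), ([-2.0, -1.0], -1), ([-3.0, -1.0], -1)]
w, b = paum(data, tau_pos=1.0, tau_neg=0.5)
print(w, b)
```

Setting τ+ = τ- = 0 gives the ordinary Perceptron. Choosing τ+ much larger than τ- asks for a bigger functional margin on the positive side, which tends to move the boundary towards the negatives and so favours recall on the rare positive class.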

Three Experimental Datasets
- English part of the CoNLL-2003 shared task data
  - The most recent evaluation results on named entity recognition
- The Jobs corpus for template filling
  - 300 software-related job posts
  - 17 slots encoding job details, such as title, salary and recruiter
- Call for Papers (CFP) corpus for the Pascal challenge
  - The latest results on template filling
  - 600 annotated workshop call-for-papers posts
  - 11 slots, such as workshop name, date, location and homepage

Results on CoNLL-2003 Corpus

                          System              Overall F1 (%)
Our systems               SVMUM               86.30
                          Standard SVM        85.05
                          PAUM                84.36
Participating systems     Best result         88.76 (±0.7)
                          Another SVM         84.67 (±1.0)
                          Voted Perceptron    84.30 (±0.9)

Overall performance of our three systems, compared with three related participating systems.

Results on Jobs Corpus

System    Macro-averaged F1 (%)
SVMUM     80.8 (±1.0)
PAUM      81.6 (±1.1)
(LP)2     77.2
Rapier    76.0
DCs       57.9

- Macro-averaged F1 of our two systems and three other systems, evaluated on all 17 slots of the Jobs dataset.
- PAUM obtained even better results than SVMUM.

Results on CFP Corpus

System         Micro-averaged F1 (%)
SVMUM          61.1
PAUM           64.3
Best result    73.4

- Our SVM and PAUM systems were in fourth and fifth position, respectively, among the 20 participating systems.
- Our SVM and PAUM systems performed better than all other SVM-based participating systems.
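The Jobs results are macro-averaged and the CFP results micro-averaged, which weight slots differently. A small sketch of the two averages from per-slot counts (true positives, false positives, false negatives); the slot counts are invented for illustration. It uses F1 = 2PR/(P+R) = 2TP/(2TP+FP+FN).

```python
from typing import List, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), written directly in terms of counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(per_slot: List[Counts]) -> float:
    """Average of per-slot F1: every slot counts equally."""
    return sum(f1(*c) for c in per_slot) / len(per_slot)


def micro_f1(per_slot: List[Counts]) -> float:
    """F1 of the pooled counts: frequent slots dominate."""
    tp = sum(c[0] for c in per_slot)
    fp = sum(c[1] for c in per_slot)
    fn = sum(c[2] for c in per_slot)
    return f1(tp, fp, fn)


# Hypothetical slots: one frequent and easy, one rare and hard.
slots = [(90, 10, 10), (1, 4, 5)]
print(round(macro_f1(slots), 3), round(micro_f1(slots), 3))
```

Because macro-averaging gives a rare slot the same weight as a frequent one, poor performance on infrequent slots lowers the macro score much more than the micro score.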

Effects of Uneven Margins for SVM

τ           1.0    0.8    0.6    0.4    0.2
CoNLL-03    85.0   86.0   86.2   85.9   81.6
Jobs        79.0   79.9   81.0   80.8   n/a

- The uneven margins SVM (τ < 1) performed significantly better than the standard SVM (τ = 1) for τ between 0.4 and 0.8.
- Within that range, the results were not very sensitive to the value of τ.

Uneven Margins for Perceptron

(τ+, τ-)    (0,0)   (1,1)   (50,1)
CoNLL-03    83.5    83.9    84.4
Jobs        74.1    78.8    81.6

- (0,0) is the standard Perceptron, (1,1) the margin Perceptron with equal margins, and (50,1) PAUM.
- The margin Perceptron obtained better results than the standard Perceptron, and PAUM performed better than the other two.
- The differences were bigger on Jobs than on CoNLL-03, as the Jobs data is more imbalanced.

Small Training Data
- Manually annotating training data is a time-consuming process.
- In many applications we have to use a small training set.
- Small training sets for IE are more imbalanced than larger ones.

Small Training Datasets for CoNLL-03

Data size   10     20     30     40     50
τ = 0.4     60.6   66.4   70.4   72.2   72.8
τ = 1.0     46.2   58.6   65.2   68.3   68.6

- Comparison of the uneven margins SVM (τ = 0.4) with the standard SVM (τ = 1.0) on small training sets of the CoNLL-03 corpus.
- The smaller the training set, the bigger the advantage of SVMUM over the standard SVM.

Small Training Datasets for Jobs Data

Data size   10     20     30     40     50
τ = 0.4     51.6   60.9   65.7   68.6   71.1
τ = 1.0     47.1   56.5   61.4   65.4   68.1

- Comparison of the uneven margins SVM (τ = 0.4) with the standard SVM (τ = 1.0) on small training sets of the Jobs corpus.
- The behaviour on the Jobs data is similar to that on the CoNLL-2003 data.

Conclusions
- The uneven margins parameter was indeed helpful to the SVM for IE, especially with small training data.
- PAUM performed well for IE.
- Future research: apply the uneven margins SVM and PAUM to other NLP learning tasks, as those tasks often involve imbalanced data as well.

Thanks!