Support Vector Machines and Kernel Methods for Co-Reference Resolution
2007 Summer Workshop on Human Language Technology
Center for Language and Speech Processing, Johns Hopkins University
Baltimore, August 22, 2007
Alessandro Moschitti and Xiaofeng Yang

Outline
- Motivations
- Support Vector Machines
- Kernel Methods
  - Polynomial Kernel
  - Sequence Kernels
  - Tree Kernels
- Kernels for the Co-reference Problem
  - An effective syntactic structure
  - Mention context via word sequences
- Experiments
- Conclusions

Motivations
- Intra-/cross-document coreference resolution requires the definition of complex features, e.g. syntactic/semantic structures
- For pronoun resolution:
  - Preference factors: Subject, Object, First-Mention, Definite NP
  - Constraint factors: C-commanding, ...
- For non-pronouns:
  - Predicative structure, appositive structure

Motivations (2)
- How do we represent such structures in the learning algorithm?
- How do we combine different features, and how do we select the relevant ones?
- Kernel methods allow us to:
  - represent structures in terms of their substructures (high-dimensional feature spaces)
  - define implicit and abstract feature spaces
- Support Vector Machines "select" the relevant features
- Automatic feature engineering as a side effect

Support Vector Machines
[figure: two classes of points plotted on Var 1 vs. Var 2, separated by a maximum-margin hyperplane]
- The margin is equal to 2 / ||w||
- We need to solve: min ½||w||² subject to yᵢ(w · xᵢ + b) ≥ 1 for every training example (xᵢ, yᵢ)

SVM Classification Function and the Kernel Trick
- From the primal form: f(x) = w · x + b
- To the dual form: f(x) = Σᵢ αᵢ yᵢ (xᵢ · x) + b, i = 1, ..., l, where l is the number of training examples
- The kernel trick: replace the dot product with a kernel function, f(x) = Σᵢ αᵢ yᵢ K(xᵢ, x) + b, so the mapping into the feature space never has to be computed explicitly
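
A minimal sketch (not from the slides) of the dual form in practice: scikit-learn's SVC exposes exactly the quantities in the formula above, with dual_coef_ holding αᵢyᵢ for the support vectors and intercept_ holding b.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randn(40, 2)
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)  # labels in {-1, +1}

clf = SVC(kernel="linear", C=1.0).fit(X, y)

x_new = np.array([0.5, -0.2])
# f(x) = sum_i alpha_i * y_i * K(x_i, x) + b, summing only over support vectors
f = clf.dual_coef_[0] @ (clf.support_vectors_ @ x_new) + clf.intercept_[0]
assert np.isclose(f, clf.decision_function([x_new])[0])
```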

Flat Features (Linear Kernel)
- Documents in Information Retrieval are represented as word vectors
- The dot product counts the number of features the two vectors have in common
- This provides a sort of similarity
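
As a toy illustration of this slide (my sketch, not the authors' code), the linear kernel over bag-of-words vectors is just a dot product of word counts:

```python
from collections import Counter

def linear_kernel(doc1, doc2):
    """Dot product of bag-of-words count vectors."""
    c1, c2 = Counter(doc1.lower().split()), Counter(doc2.lower().split())
    return sum(c1[w] * c2[w] for w in c1)

print(linear_kernel("stock market news", "downtown stock market"))  # -> 2
```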

Feature Conjunction (Polynomial Kernel)
- The initial vectors are mapped into a higher-dimensional space:
  ⟨Stock, Market, Downtown⟩ → ⟨Stock, Market, Downtown, Stock∧Market, Downtown∧Market, Stock∧Downtown⟩
- We can efficiently compute the scalar product in that space as K(x, z) = (x · z + 1)²
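
A small sketch checking this numerically (assuming the standard degree-2 polynomial kernel): the dot product in the explicit conjunction space equals (x · z + 1)², so the conjunction features never have to be built at classification time.

```python
import itertools
import numpy as np

def phi(x):
    """Explicit degree-2 feature map (bias, singletons, squares, pairs)."""
    feats = [1.0]
    feats += [np.sqrt(2) * xi for xi in x]
    feats += [xi * xi for xi in x]
    feats += [np.sqrt(2) * x[i] * x[j]
              for i, j in itertools.combinations(range(len(x)), 2)]
    return np.array(feats)

x = np.array([1.0, 0.0, 2.0])  # e.g. counts for Stock, Market, Downtown
z = np.array([2.0, 1.0, 1.0])
assert np.isclose(phi(x) @ phi(z), (x @ z + 1) ** 2)
```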

String Kernel
- Given two strings, the number of matches between their substrings is evaluated
- E.g., Bank and Rank:
  B, a, n, k, Ba, Ban, Bank, Bk, an, ank, nk, ...
  R, a, n, k, Ra, Ran, Rank, Rk, an, ank, nk, ...
- String kernels can be applied over sentences and whole texts
- The feature space is huge, but there are efficient algorithms
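
A brute-force sketch of the idea, for contiguous substrings only (real string kernels also count gapped subsequences such as "Bk" above, and use dynamic programming instead of enumeration):

```python
def substring_kernel(s, t):
    """Count pairs of matching (contiguous) substrings of s and t."""
    subs_s = [s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)]
    subs_t = [t[i:j] for i in range(len(t)) for j in range(i + 1, len(t) + 1)]
    return sum(1 for a in subs_s for b in subs_t if a == b)

print(substring_kernel("Bank", "Rank"))  # shared: a, n, k, an, nk, ank -> 6
```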

Word Sequence Kernel
- String kernels where the symbols are words
- E.g., "so Bill Gates says that" →
  - Bill Gates says that
  - Gates says that
  - Bill says that
  - so Gates says that
  - so says that
  - ...
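
Enumerating that feature space for the example (a sketch; practical sequence kernels weight each subsequence by a gap-decay factor rather than enumerating them):

```python
from itertools import combinations

words = "so Bill Gates says that".split()
for k in (2, 3):
    for idx in combinations(range(len(words)), k):
        print(" ".join(words[i] for i in idx))
# includes "Bill Gates says", "so Gates says", "so says that", ...
```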

A Tree Kernel [Collins and Duffy, 2002]
[figure: the parse tree of "delivers a talk" (VP → V NP, NP → D N) together with some of its subset-tree fragments, e.g. the tree minus "talk", minus "a talk", the bare VP → V NP skeleton, ...]

The overall SST fragment set
[figure: the complete set of subset-tree (SST) fragments for the example tree]

Explicit Kernel Space
- A tree T is mapped to a vector φ(T) whose components count the occurrences of each possible fragment
- Given another vector, the dot product φ(T₁) · φ(T₂) counts the number of common substructures

Implicit Representation [Collins and Duffy, ACL 2002]
- K(T₁, T₂) = Σ_{n₁ ∈ T₁} Σ_{n₂ ∈ T₂} Δ(n₁, n₂), where Δ(n₁, n₂) counts the common fragments rooted at nodes n₁ and n₂
- Δ can be evaluated recursively over node pairs in O(n²)
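
A compact sketch of that recursion (my implementation of the published idea, with decay factor λ; trees are nested tuples whose children are either all leaves or all subtrees):

```python
def production(node):
    """Grammar production at a node: (label, child labels)."""
    return (node[0], tuple(c if isinstance(c, str) else c[0] for c in node[1:]))

def delta(n1, n2, lam=1.0):
    """Number of common fragments rooted at n1 and n2 (Collins & Duffy)."""
    if production(n1) != production(n2):
        return 0.0
    if all(isinstance(c, str) for c in n1[1:]):  # matching pre-terminals
        return lam
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        score *= 1.0 + delta(c1, c2, lam)
    return score

def nodes(tree):
    yield tree
    for child in tree[1:]:
        if not isinstance(child, str):
            yield from nodes(child)

def tree_kernel(t1, t2, lam=1.0):
    return sum(delta(a, b, lam) for a in nodes(t1) for b in nodes(t2))

vp = ("VP", ("V", "delivers"), ("NP", ("D", "a"), ("N", "talk")))
print(tree_kernel(vp, vp))  # 17 common fragments for the tree sketched above
```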

Kernels for the Co-reference Problem: Syntactic Information
- Syntactic knowledge is important:
  - For pronoun resolution: Subject, Object, First-Mention, Definite NP, C-commanding, ...
  - For non-pronouns: predicative structure, appositive structure, ...
- Source of syntactic knowledge: the parse tree
- How can we utilize such knowledge?

Previous Work on Syntactic Knowledge
- Define a set of syntactic features extracted from parse trees:
  - whether a candidate is a subject NP
  - whether a candidate is an object NP
  - whether a candidate c-commands the anaphor
  - ...
- Limitations:
  - The feature set must be designed manually, by linguistic intuition
  - Completeness and effectiveness are not guaranteed

Incorporating Structured Syntactic Knowledge: Main Idea
- Use the parse tree directly as a feature
- Employ a tree kernel to compare the similarity of the tree features in two instances
- Learn an SVM classifier
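
One way to realize this pipeline (a sketch with hypothetical variable names, not the workshop's actual code) is scikit-learn's precomputed-kernel interface, which accepts a Gram matrix built from any kernel, e.g. the tree_kernel sketched earlier:

```python
import numpy as np
from sklearn.svm import SVC

def gram(rows, cols, kernel):
    """Gram matrix K[i, j] = kernel(rows[i], cols[j])."""
    return np.array([[kernel(a, b) for b in cols] for a in rows])

# train_trees, train_labels, test_trees are hypothetical placeholders:
# K_train = gram(train_trees, train_trees, tree_kernel)
# clf = SVC(kernel="precomputed").fit(K_train, train_labels)
# K_test = gram(test_trees, train_trees, tree_kernel)
# predictions = clf.predict(K_test)
```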

Syntactic Tree Feature
- Subtree that covers both the anaphor and the antecedent candidate
  → captures the syntactic relations between anaphor and candidate (subject, object, c-commanding, predicative structure)
- Include the nodes on the path between anaphor and candidate, as well as their first-level children
- E.g., "the man in the room saw him" → inst("the man", "him")

Context Sequence Feature
- A word sequence representing the mention expression and its context
- Create a sequence for a mention, e.g. for "Bill Gates" in
  "Even so, Bill Gates says that he just doesn't understand our infatuation with thin client versions of Word"
  the sequence is (so)(,)(Bill)(Gates)(says)(that)
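
A minimal sketch of extracting such a sequence (the two-token window on each side is my assumption, chosen to reproduce the slide's example):

```python
def context_sequence(tokens, start, end, window=2):
    """Tokens of the mention [start:end] plus `window` tokens on each side."""
    return tokens[max(0, start - window):min(len(tokens), end + window)]

tokens = "Even so , Bill Gates says that he just does n't understand".split()
print(context_sequence(tokens, start=3, end=5))  # mention: "Bill Gates"
# -> ['so', ',', 'Bill', 'Gates', 'says', 'that']
```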

Composite Kernel
- Different kernels for different features:
  - Polynomial kernel: for the baseline flat features
  - Tree kernel: for syntax trees
  - Sequence kernel: for word sequences
- A composite kernel combines all kinds of features:
  CompositeK = TreeK × PolyK + PolyK + SequenceK
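
Over precomputed Gram matrices the composite kernel is a one-liner (a sketch; I assume the product is elementwise, which keeps the result a valid kernel, since sums and products of kernels are kernels):

```python
def composite_kernel(K_tree, K_poly, K_seq):
    """CompositeK = TreeK * PolyK + PolyK + SequenceK, elementwise on
    numpy Gram matrices of identical shape."""
    return K_tree * K_poly + K_poly + K_seq
```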

Results for Pronoun Resolution
[table: Recall / Precision / F-measure on MUC-6 and ACE-02-BNews, comparing "all attribute-value features" against "Syntactic Tree + Word Sequence"; the numeric cells are not recoverable from the transcript]

Results for Overall Coreference Resolution Using SVMs
[table: Recall / Precision / F-measure on MUC-6 and ACE-02-BNews for "BaseFeature SVMs", "BaseFeature + Syntax Tree", "BaseFeature + Syntax Tree + Word Sequences", and "All Sources of Knowledge"; the numeric cells are not recoverable from the transcript]

Conclusions
- SVMs and kernel methods are powerful tools for designing intra-/cross-document coreference systems
- SVMs allow us to better exploit:
  - attribute-value features
  - syntactic structures
  - word-sequence context
- The results show a noticeable improvement over the baseline

Thank you