Multi-Task Transfer Learning for Weakly-Supervised Relation Extraction. Jing Jiang, Singapore Management University. ACL-IJCNLP 2009.

Presentation transcript:

Aug 5, 2009 ACL-IJCNLP
Relation Extraction
Task definition: to label the semantic relation between a pair of entities in a sentence (fragment)
Example: …[leader arg-1] of a minority [government arg-2]…
Candidate labels: PHYS, PER-SOC, EMP-ORG, NIL
– PHYS: Physical
– PER-SOC: Personal / Social
– EMP-ORG: Employment / Membership / Subsidiary

Supervised Learning
Current solution: supervised machine learning (e.g. [Zhou et al. 2005], [Bunescu & Mooney 2005], [Zhang et al. 2006])
Training data is needed for each relation type
Example: …[leader arg-1] of a minority [government arg-2]…
– arg-1 word: leader
– arg-2 type: ORG
– dependency: arg-1 of arg-2
Candidate labels: EMP-ORG, PHYS, PER-SOC, NIL

Challenge in Practice
New relation type (in a new domain): no training data or a few seed instances
In this work, we study weakly-supervised relation extraction
– A few seed instances of the target relation type
– Many instances of other auxiliary relation types
– Additional human knowledge about the target relation type
Main idea: auxiliary relation types can help!

Syntactic Similarity across Relation Types
– …[leader arg-1] of a minority [government arg-2]… (EMP-ORG)
– the youngest [son arg-1] of ex-director [Suharto arg-2] (PER-SOC)
– the [Socialist People’s Party arg-1] of [Montenegro arg-2] (GPE-AFF)
Features of the first example: arg-1 word: leader; arg-2 type: ORG; dependency: arg-1 of arg-2

Syntactic Similarity
Syntactic Pattern | Relation Instance | Relation Type (Subtype)
arg-2 arg-1 | Arab leaders | OTHER-AFF (Ethnic)
arg-2 arg-1 | his father | PER-SOC (Family)
arg-2 arg-1 | South Jakarta Prosecution Office | GPE-AFF (Based-in)
arg-1 [verb] arg-2 | Yemen [sent] planes to Baghdad | ART (User-or-Owner)
arg-1 [verb] arg-2 | His wife [had] three young children | PER-SOC (Family)
arg-1 [verb] arg-2 | Jody Scheckter [paced] Ferrari to both victories | EMP-ORG (Employ-Staff)

Problem Formulation based on Transfer Learning
Domain adaptation and transfer learning (e.g. [Blitzer et al. 2006], [Daumé III 2007])
Our goal: transfer across relation types (diagram labels: PER-SOC, EMP-ORG)
We apply our previous framework ([Jiang & Zhai 2007b])
– Similar in spirit to [Evgeniou & Pontil 2004] and [Daumé III 2007]

Review of Relation Extraction Basics
Linear classifier
Example: …[leader arg-1] of a minority [government arg-2]…
Diagram: a feature vector (entries such as arg-2 type: ORG, arg-2 type: PER, dependency: arg-1 of arg-2) is combined with the weight vector of the linear classifier to predict the label EMP-ORG
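The linear-classifier review above can be illustrated with a minimal sketch. It assumes sparse binary features and one weight vector per label, which the slide does not spell out; the feature names follow the slide, but the weight values and the helper names `score` and `predict` are made up for illustration.

```python
# Binary features that fire for "[leader] of a minority [government]".
features = {"arg-1 word: leader", "arg-2 type: ORG", "dependency: arg-1 of arg-2"}

# Illustrative weights only: these numbers are invented, not learned values.
weights = {
    "EMP-ORG": {"arg-2 type: ORG": 1.5, "dependency: arg-1 of arg-2": 0.5},
    "PER-SOC": {"arg-2 type: PER": 1.5, "dependency: arg-1 of arg-2": 0.5},
    "NIL": {},
}

def score(features, w):
    """Dot product of a binary feature vector with a sparse weight vector."""
    return sum(w.get(f, 0.0) for f in features)

def predict(features, weights):
    """Return the label whose weight vector gives the highest score."""
    return max(weights, key=lambda label: score(features, weights[label]))

prediction = predict(features, weights)
```

With these made-up weights, the shared dependency feature contributes equally to EMP-ORG and PER-SOC, and the entity type feature decides between them.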

General vs. Specific Features
Assumption: some features are commonly useful for different relation types, while other features are specific to individual relation types
(Equation not preserved in the transcript; its annotations are listed below)
– weight vector for the target type
– weight vector for the k-th auxiliary type
– common weight vector in a lower, H-dimensional space
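The equation on this slide did not survive, so the following is only one plausible reading of its annotations: each relation type has its own weight vector, plus a contribution from a common vector that lives in an H-dimensional space and is mapped back to the full feature space. The names `mu`, `nu`, `proj`, and `compose_weights` are assumptions (the λ subscripts on the later "Effect of λ" slide suggest symbols μ and ν, but the exact form is not confirmed by the transcript).

```python
def compose_weights(mu, proj, nu):
    """Return w = mu + proj^T nu.

    mu   : type-specific weights, length F (assumed symbol)
    proj : H x F matrix given as a list of rows (assumed mapping)
    nu   : common weights in the H-dimensional space (assumed symbol)
    """
    n_features = len(mu)
    shared = [
        sum(proj[h][j] * nu[h] for h in range(len(nu)))
        for j in range(n_features)
    ]
    return [m + s for m, s in zip(mu, shared)]

# Toy setting: F = 4 features, H = 2. The general subspace covers features 0 and 1.
proj = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]
nu = [0.8, -0.3]                   # shared by all relation types
mu_target = [0.0, 0.0, 1.2, 0.0]   # specific to the target type
mu_aux = [0.0, 0.0, 0.0, 0.7]      # specific to one auxiliary type

w_target = compose_weights(mu_target, proj, nu)
w_aux = compose_weights(mu_aux, proj, nu)
```

In this toy setting both types end up with the same weights on the two general features and differ only on their own specific features.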

Learning Framework
(Objective not preserved in the transcript; its annotated terms are listed below)
– loss function on the target seed instances
– loss function on the auxiliary training instances
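Since the objective itself is missing, here is a hedged sketch of what a joint objective with these two loss terms could look like. Everything beyond the two annotated terms is an assumption: the logistic loss, the squared-norm regularizers (suggested only by the three λ values on the "Effect of λ" slide), the simplification that the common vector acts directly on a chosen set of feature indices, and all function and parameter names.

```python
import math

def log_loss(w, instances):
    """Sum of logistic losses; each instance is (feature_vector, label in {-1, +1})."""
    total = 0.0
    for x, y in instances:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        total += math.log(1.0 + math.exp(-margin))
    return total

def sq_norm(v):
    return sum(vi * vi for vi in v)

def full_weights(mu, nu, general_idx):
    """Add the common weights nu to mu at the indices of the general features."""
    w = list(mu)
    for h, j in enumerate(general_idx):
        w[j] += nu[h]
    return w

def objective(mu_t, mu_aux, nu, general_idx, seeds, aux_sets,
              lam_mu_t, lam_mu_k, lam_nu):
    """Loss on target seeds + loss on each auxiliary type + regularizers."""
    value = log_loss(full_weights(mu_t, nu, general_idx), seeds)
    value += lam_mu_t * sq_norm(mu_t)
    for mu_k, data_k in zip(mu_aux, aux_sets):
        value += log_loss(full_weights(mu_k, nu, general_idx), data_k)
        value += lam_mu_k * sq_norm(mu_k)
    return value + lam_nu * sq_norm(nu)

# Toy check: one target seed, one auxiliary instance, all weights zero.
toy_value = objective(
    mu_t=[0.0, 0.0], mu_aux=[[0.0, 0.0]], nu=[0.0], general_idx=[0],
    seeds=[([1.0, 0.0], 1)], aux_sets=[[([1.0, 1.0], -1)]],
    lam_mu_t=1.0, lam_mu_k=1e4, lam_nu=1.0,
)
```

Under this formulation, a large penalty on the auxiliary type-specific weights would push the auxiliary types to explain their data through the common vector, which is what lets the target type benefit from them.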

General Features
Which subset of features should be captured by the common weight vector (in a lower, H-dimensional space)?

Feature Separation
Automatic separation within the learning framework (see [Jiang & Zhai 2007b])
Human guidance
– Argument word features: features that contain the head word of an argument, e.g. arg-1 word: sister
– Entity type features: features that contain the entity type (subtype) of an argument, e.g. arg-2 type: ORG
Combined
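The human-guided separation above can be sketched as a simple rule over feature names. This assumes features are encoded as "name: value" strings as on the slides, and that the two listed kinds (argument word and entity type features) are the ones treated as type-specific; the function names are hypothetical.

```python
# Feature-name prefixes assumed to mark type-specific features.
TYPE_SPECIFIC_NAMES = {"arg-1 word", "arg-2 word", "arg-1 type", "arg-2 type"}

def is_type_specific(feature):
    """True for argument word features and entity type features."""
    name, _, _ = feature.partition(":")
    return name.strip() in TYPE_SPECIFIC_NAMES

def general_candidates(features):
    """Features left over as candidates for the common weight vector."""
    return [f for f in features if not is_type_specific(f)]

example = ["arg-1 word: sister", "arg-2 type: ORG", "dependency: arg-1 of arg-2"]
candidates = general_candidates(example)
```

Only the dependency feature remains a candidate general feature here, matching the intuition that syntactic patterns are shared across relation types.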

Imposing Entity Type Constraint
Fix the possible entity types for the arguments of the target relation type
In the end, filter out the relation instances that do not satisfy the constraint
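A minimal sketch of this post-filter follows. It assumes that "filtering out" means relabeling a violating prediction as NIL, and the allowed type pairs shown for EMP-ORG are invented for illustration; neither detail comes from the slides.

```python
def apply_entity_type_constraint(predictions, allowed_pairs, target, nil_label="NIL"):
    """Relabel predictions of `target` whose argument types violate the constraint."""
    kept = []
    for inst in predictions:
        pair = (inst["arg1_type"], inst["arg2_type"])
        if inst["label"] == target and pair not in allowed_pairs:
            inst = {**inst, "label": nil_label}  # copy, do not mutate the input
        kept.append(inst)
    return kept

# Hypothetical constraint for EMP-ORG: (arg-1 type, arg-2 type) pairs.
allowed = {("PER", "ORG"), ("PER", "GPE")}
predictions = [
    {"arg1_type": "PER", "arg2_type": "ORG", "label": "EMP-ORG"},
    {"arg1_type": "ORG", "arg2_type": "PER", "label": "EMP-ORG"},
]
filtered = apply_entity_type_constraint(predictions, allowed, target="EMP-ORG")
```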

Experiment Setup
ACE 2004, 7 relation types
– 6 types → auxiliary types
– 1 type → target type
5-fold cross validation
# seed instances: 10
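The setup above can be sketched as follows: one relation type is held out as the target, a small number of its instances are sampled as seeds, and the remaining six types serve as auxiliary data. The type list uses the seven ACE 2004 top-level relation types (DISC is not named on the slides), and `make_task` plus the toy data are hypothetical; the 5-fold cross validation is not modeled here.

```python
import random

# ACE 2004 top-level relation types (assumed inventory; see note above).
RELATION_TYPES = ["PHYS", "PER-SOC", "EMP-ORG", "ART", "OTHER-AFF", "GPE-AFF", "DISC"]

def make_task(instances_by_type, target, n_seeds=10, seed=0):
    """Return (seed instances of the target type, auxiliary data by type)."""
    rng = random.Random(seed)
    seeds = rng.sample(instances_by_type[target], n_seeds)
    auxiliary = {t: insts for t, insts in instances_by_type.items() if t != target}
    return seeds, auxiliary

# Toy data: 20 placeholder instances per type.
toy = {t: [f"{t}-{i}" for i in range(20)] for t in RELATION_TYPES}
seeds, auxiliary = make_task(toy, target="EMP-ORG")
```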

Methods Compared
– BL: train on seed instances only
– BL-A: train on seed and auxiliary training instances together w/o feature separation
– TL-auto: transfer learning w/ automatic feature separation
– TL-guide: transfer learning w/ human-guided feature separation
– TL-comb: automatic feature separation combined with human guidance
– TL-NE: TL-comb + entity type constraint

Comparison
Table: precision (P), recall (R), and F-measure (F) of BL, BL-A, TL-auto, TL-guide, TL-comb, and TL-NE for each target type (Physical, Personal/Social, Employment/Membership/Subsidiary) and their average. The values are not preserved in the transcript.
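For reference, the P, R, and F columns can be computed as in the sketch below. It assumes F is the balanced F1 measure and that scores are computed for the target relation type against all other labels; the slides do not state either detail, and the toy labels are invented.

```python
def precision_recall_f1(gold, predicted, target):
    """Precision, recall, and F1 for one target label."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == target and p == target)
    n_pred = sum(1 for p in predicted if p == target)
    n_gold = sum(1 for g in gold if g == target)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

gold = ["EMP-ORG", "EMP-ORG", "NIL", "NIL"]
predicted = ["EMP-ORG", "NIL", "EMP-ORG", "NIL"]
p, r, f = precision_recall_f1(gold, predicted, target="EMP-ORG")
```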

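Effect of $\lambda_\mu^T$
Table: P, R, and F of TL-comb for different values of $\lambda_\mu^T$, with $\lambda_\mu^k = 10^4$ and $\lambda_\nu = 1$. The table values are not preserved in the transcript.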

Number of Seed Instances

Sensitivity of H

Conclusions
– We proposed to apply a multi-task transfer learning framework to the weakly-supervised relation extraction problem.
– We defined two kinds of type-specific features.
– Our experiments show that automatic feature separation combined with human guidance and an entity type constraint can significantly outperform the baselines.

Thank You! Questions?

Related Work
– [Zhou et al. 2008]: Different way of modeling commonality among relation types.
– [Banko & Etzioni 2008]: Open-domain relation extraction. No target relation type.
– [Xu et al. 2008]: Rule-based adaptation. Same type.

Hypothesized Type-Specific Features