An Integrated Approach for Relation Extraction from Wikipedia Texts. Yulan Yan, Yutaka Matsuo, Mitsuru Ishizuka. The University of Tokyo. WWW 2009.

Presentation transcript:

2 Abstract
Relation extraction from Wikipedia texts
  A novel distance function
  A linear clustering algorithm
Wikipedia texts
  High-quality texts
  Heavily cross-linked articles
  Sentence -> dependency tree
Web texts
  Frequency information
  Relation terms
  Sentence -> surface pattern
Experiments on two different domains
  American chief executives
  Companies

3 Problem definition
Relation extraction between the article's entitled concept (ec) and one of its related concepts (rc)
There is a salient semantic relation r between p and p' l(p)

4 Problem definition
Concept pairs
  (Eric E. Schmidt, Google)
  (Eric E. Schmidt, Compiler)
  (Eric E. Schmidt, Atherton, California)
  …
  (Bill Gates, Microsoft)
  …
Concept pairs -> Clustering -> Evaluation

5 Overview of the Approach
Text preprocessor
  Concept pair collection
  Sentence filtering
Web context collector
  A set of ranked relational terms
  A set of surface patterns
Dependency pattern modeling
  Linguistic information
Linear clustering algorithm
  Local clustering
  Global clustering

6 1. Text Preprocessor - Relation Candidate Generation
Wikipedia article texts are processed to get relation candidates and their corresponding sentences.
All hyperlinked concepts in the article are taken as related concepts, which may share a semantic relationship with the entitled concept.
  Together with the entitled concept, they form the concept pairs.
A linguistic parser is applied to split the article text into sentences for the dependency pattern modeling module.
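
The candidate generation step above can be sketched roughly as follows. This is a minimal illustration, assuming MediaWiki-style [[Target|label]] link markup and a naive regex sentence splitter standing in for the linguistic parser; the function names and the sample article text are hypothetical, not taken from the slides.

```python
import re

# Matches [[Target]], [[Target#Section]] and [[Target|label]] links
# (assumed MediaWiki-style markup).
LINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|([^\]]*))?\]\]")


def split_sentences(text):
    """Naive stand-in for the linguistic parser's sentence splitter."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def relation_candidates(entitled_concept, wikitext):
    """Return (ec, rc, sentence) triples, one per hyperlinked concept."""
    candidates = []
    for sentence in split_sentences(wikitext):
        # Plain-text version of the sentence: links replaced by their label.
        plain = LINK_RE.sub(lambda m: m.group(2) or m.group(1), sentence)
        for match in LINK_RE.finditer(sentence):
            rc = match.group(1).strip()
            if rc != entitled_concept:
                candidates.append((entitled_concept, rc, plain))
    return candidates


article = ("Bill Gates co-founded [[Microsoft]]. "
           "He was born in [[Seattle|Seattle, Washington]].")
pairs = relation_candidates("Bill Gates", article)
for ec, rc, sentence in pairs:
    print((ec, rc), "<-", sentence)
```

Each triple keeps the sentence alongside the concept pair so that a later step can parse it into a dependency structure.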

7 2. Web Context Collection
Querying with a concept pair
Hypothesis: the web contains key terms and patterns that provide clues to the relation the concept pair holds
Two kinds of relational information
  A set of ranked relational terms, used as keywords
  A set of surface patterns

8 2. Web Context Collection - Relational Term Ranking (1/2)
Collect relational terms as indicators for each concept pair
  Verbs and nouns, such as "CEO" and "founder"
Entropy-based feature ranking algorithm (Chen et al., 2005, IJCNLP)
After the ranking
  A relational term list T_cp is ordered by term rank
  A keyword k_cp is selected as a term that appears both in the term list T_cp and in the corresponding Wikipedia sentence

9 Entropy-based Feature Ranking
J. Chen, D. Ji, C.L. Tan, and Z. Niu. Unsupervised Feature Selection for Relation Extraction. In Proceedings of IJCNLP 2005.
Local context vectors of co-occurrences of entity pair E_1 and E_2: P = {p_1, p_2, …, p_N}
The words occurring in P: W = {w_1, w_2, …, w_M}
Goal: select a subset of important features from W
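
The slide does not spell out the ranking criterion. As a rough sketch, assuming the similarity-based entropy often used for unsupervised feature ranking (pairwise similarity S = exp(-alpha * distance), with alpha chosen so that the mean distance maps to 0.5), a word can be scored by the entropy of the context vectors once that word is removed: dropping a word that structures the data raises the entropy. The function names and the toy vectors below are illustrative assumptions, not the authors' implementation.

```python
import math
from itertools import combinations


def similarity_entropy(points):
    """Entropy of pairwise similarities; lower means more cluster structure."""
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    mean = sum(dists) / len(dists)
    if mean == 0:
        return 0.0
    alpha = math.log(2) / mean  # similarity is 0.5 at the mean distance
    total = 0.0
    for d in dists:
        s = math.exp(-alpha * d)
        if 0.0 < s < 1.0:  # terms with s == 1 contribute zero entropy
            total -= s * math.log(s) + (1 - s) * math.log(1 - s)
    return total


def rank_features(points, names):
    """Rank features by the entropy of the data after removing each one."""
    scores = {}
    for k, name in enumerate(names):
        reduced = [p[:k] + p[k + 1:] for p in points]
        scores[name] = similarity_entropy(reduced)
    # Higher entropy after removal means the feature carried more structure.
    return sorted(names, key=lambda n: scores[n], reverse=True)


# Toy context vectors P over two words W = {w1, w2} (made-up values):
# w1 separates the points into two tight clusters, w2 is spread evenly.
P = [(0.0, 0.0), (0.1, 6.6), (10.0, 3.3), (10.1, 10.0)]
ranking = rank_features(P, ["w1", "w2"])
print(ranking)
```

In this toy case w1 is ranked first, because removing it leaves only the evenly spread w2 values and the cluster structure disappears.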

10 2. Web Context Collection - Surface Pattern Generation (2/2)
Content words (CWs)
  ec (entitled concept), rc (related concept), keyword k_cp
Function words
Bag of words: used to look for verbs, nouns, and coordinating conjunctions
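
One way to picture a surface pattern built from these ingredients is sketched below. It assumes a pattern keeps the two concepts as placeholders, keeps the keyword and a small set of function words, and collapses everything else into a wildcard; the word list, the placeholder names and the wildcard convention are hypothetical choices, not details given on the slide.

```python
# Small illustrative function-word list (an assumption, not from the slides).
FUNCTION_WORDS = {"a", "an", "the", "of", "in", "at", "is", "was",
                  "and", "or", "by", "for", "as"}


def surface_pattern(snippet, ec, rc, keyword):
    """Turn a text snippet into a pattern over EC, RC, keyword and function words."""
    text = snippet.replace(ec, "EC").replace(rc, "RC")
    tokens = text.replace(",", " ").replace(".", " ").split()
    pattern = []
    for tok in tokens:
        if tok in ("EC", "RC"):
            pattern.append(tok)
        elif tok.lower() == keyword.lower():
            pattern.append(tok.lower())
        elif tok.lower() in FUNCTION_WORDS:
            pattern.append(tok.lower())
        elif not pattern or pattern[-1] != "*":
            pattern.append("*")  # collapse runs of other words into one wildcard
    return " ".join(pattern)


print(surface_pattern("Bill Gates is the founder of Microsoft.",
                      "Bill Gates", "Microsoft", "founder"))
```

Collapsing unlisted words into a single wildcard keeps patterns from different snippets comparable even when the wording between the concepts varies.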

11 3. Dependency Pattern Modeling
Dependency patterns for relation clustering
Selected sentences: each contains the entitled concept and one of the related concepts
Sentences are parsed into dependency structures
  R. Bunescu and R. Mooney. A Shortest Path Dependency Kernel for Relation Extraction. In Proceedings of HLT/EMNLP.
  M. Zhang, J. Zhang, J. Su, and G. Zhou. A Composite Kernel to Extract Relations between Entities with both Flat and Structured Features. In Proceedings of ACL-2006.
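
The slide cites the shortest-path dependency kernel, so a plausible building block is the shortest path between the two concepts in a sentence's dependency tree. The sketch below assumes the parse is already available as (head, dependent) edges; the edge list is hand-written for illustration rather than produced by a real parser, and the function name is hypothetical.

```python
from collections import defaultdict, deque


def shortest_dependency_path(edges, source, target):
    """Breadth-first search over the dependency tree, treated as undirected."""
    graph = defaultdict(set)
    for head, dep in edges:
        graph[head].add(dep)
        graph[dep].add(head)
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None  # the two words are not connected


# Hand-written dependency edges for "Gates founded Microsoft in 1975".
edges = [("founded", "Gates"), ("founded", "Microsoft"),
         ("founded", "in"), ("in", "1975")]
path = shortest_dependency_path(edges, "Gates", "Microsoft")
print(path)
```

Because a dependency parse is a tree, the path between two words is unique, so the result does not depend on the order in which neighbours are visited.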

12 4. Linear Clustering Algorithm - Distance Function & Centroid Selection (1/2)
All concept pairs are grouped by their keywords t_cp
Let G = {G_1, G_2, …, G_n}, where the concept pairs in each G_i = {cp_i1, cp_i2, …} share the same keyword t_cp
A centroid c_i is selected for each group G_i
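
The grouping step can be sketched directly; the centroid rule is not given on the slide, so the sketch below assumes a medoid (the concept pair with the smallest total distance to the rest of its group) and a simple Jaccard distance over pattern tokens as a placeholder for the authors' distance function. The record layout and all sample values are hypothetical.

```python
from collections import defaultdict


def group_by_keyword(concept_pairs):
    """Group concept pairs that share the same keyword."""
    groups = defaultdict(list)
    for cp in concept_pairs:
        groups[cp["keyword"]].append(cp)
    return dict(groups)


def jaccard_distance(a, b):
    """Placeholder distance between two patterns (not the paper's function)."""
    a, b = set(a.split()), set(b.split())
    return 1.0 - len(a & b) / len(a | b)


def select_centroid(group, distance=jaccard_distance):
    """Pick the medoid: the pair closest, in total, to all others in its group."""
    return min(group, key=lambda c: sum(distance(c["pattern"], o["pattern"])
                                        for o in group))


concept_pairs = [
    {"pair": ("A", "X"), "keyword": "CEO", "pattern": "EC is the CEO of RC"},
    {"pair": ("B", "Y"), "keyword": "CEO", "pattern": "EC CEO of RC"},
    {"pair": ("C", "Z"), "keyword": "CEO", "pattern": "EC was named CEO at RC"},
    {"pair": ("D", "W"), "keyword": "founder", "pattern": "EC founder of RC"},
]
groups = group_by_keyword(concept_pairs)
centroids = {kw: select_centroid(g) for kw, g in groups.items()}
print({kw: c["pair"] for kw, c in centroids.items()})
```

Passing the distance as a parameter keeps the grouping and centroid logic unchanged if a different pattern distance is substituted.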

13 4. Linear Clustering Algorithm - Distance Function & Centroid Selection (2/2)
Cost function cost(sp_1i, sp_2j)
B. Rosenfeld and R. Feldman. URES: an Unsupervised Web Relation Extraction System. In Proceedings of COLING/ACL-2006.

14 4. Linear Clustering Algorithm - Local Dependency Pattern Clustering

15 4. Linear Clustering Algorithm - Local Dependency Pattern Clustering

16 4. Linear Clustering Algorithm - Global Surface Pattern Clustering

17 Experiments
Wikipedia dump of 03/12/2008
Two categories
  American chief executives: 526 articles, 7310 concept pairs; 1/3, 1/3 for D_l and D_g; 18 groups
  Companies: 434 articles, 4935 concept pairs; 1/3, 1/3 for D_l and D_g; 28 groups
Compared with B. Rosenfeld and R. Feldman. Clustering for Unsupervised Relation Identification. In Proceedings of CIKM.
  Surface feature

18 Experiments

21 Experiments

22 Conclusions
A novel distance function
A linear clustering algorithm
Combination of two kinds of patterns
  Dependency patterns
  Surface patterns
References
  J. Chen, D. Ji, C.L. Tan, and Z. Niu. Unsupervised Feature Selection for Relation Extraction. In Proceedings of IJCNLP 2005.
  R. Bunescu and R. Mooney. A Shortest Path Dependency Kernel for Relation Extraction. In Proceedings of HLT/EMNLP.
  M. Zhang, J. Zhang, J. Su, and G. Zhou. A Composite Kernel to Extract Relations between Entities with both Flat and Structured Features. In Proceedings of ACL-2006.
  B. Rosenfeld and R. Feldman. URES: an Unsupervised Web Relation Extraction System. In Proceedings of COLING/ACL-2006.