Learning Information Extraction Patterns Using WordNet Mark Stevenson and Mark A. Greenwood Natural Language Processing Group University of Sheffield,

Slides:



Advertisements
Similar presentations
Statistical modelling of MT output corpora for Information Extraction.
Advertisements

Ciro Cattuto, Dominik Benz, Andreas Hotho, Gerd Stumme Presented by Smitashree Choudhury.
Overview of the TAC2013 Knowledge Base Population Evaluation: English Slot Filling Mihai Surdeanu with a lot help from: Hoa Dang, Joe Ellis, Heng Ji, and.
Semantic News Recommendation Using WordNet and Bing Similarities 28th Symposium On Applied Computing 2013 (SAC 2013) March 21, 2013 Michel Capelle
1 Relational Learning of Pattern-Match Rules for Information Extraction Presentation by Tim Chartrand of A paper bypaper Mary Elaine Califf and Raymond.
The Impact of Task and Corpus on Event Extraction Systems Ralph Grishman New York University Malta, May 2010 NYU.
Automatically Acquiring a Linguistically Motivated Genic Interaction Extraction System Mark A. Greenwood Mark Stevenson Yikun Guo Henk Harkema Angus Roberts.
Sequence Clustering and Labeling for Unsupervised Query Intent Discovery Speaker: Po-Hsien Shih Advisor: Jia-Ling Koh Source: WSDM’12 Date: 1 November,
NYU ANLP-00 1 Automatic Discovery of Scenario-Level Patterns for Information Extraction Roman Yangarber Ralph Grishman Pasi Tapanainen Silja Huttunen.
Sentiment Analysis An Overview of Concepts and Selected Techniques.
Clustering… in General In vector space, clusters are vectors found within  of a cluster vector, with different techniques for determining the cluster.
Event Extraction: Learning from Corpora Prepared by Ralph Grishman Based on research and slides by Roman Yangarber NYU.
NLDB 2004 ORAKEL: A Natural Language Interface to an F-Logic Knowledge Base Philipp Cimiano Institute AIFB University of Karlsruhe NLDB 2004.
1 Noun Homograph Disambiguation Using Local Context in Large Text Corpora Marti A. Hearst Presented by: Heng Ji Mar. 29, 2004.
Designing clustering methods for ontology building: The Mo’K workbench Authors: Gilles Bisson, Claire Nédellec and Dolores Cañamero Presenter: Ovidiu Fortu.
Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang National Central University
Article by: Feiyu Xu, Daniela Kurz, Jakub Piskorski, Sven Schmeier Article Summary by Mark Vickers.
Learning syntactic patterns for automatic hypernym discovery Rion Snow, Daniel Jurafsky and Andrew Y. Ng Prepared by Ang Sun
Automatic Acquisition of Lexical Classes and Extraction Patterns for Information Extraction Kiyoshi Sudo Ph.D. Research Proposal New York University Committee:
Comments on Guillaume Pitel: “Using bilingual LSA for FrameNet annotation of French text from generic resources” Gerd Fliedner Computational Linguistics.
Automatically Constructing a Dictionary for Information Extraction Tasks Ellen Riloff Proceedings of the 11 th National Conference on Artificial Intelligence,
Artificial Intelligence Research Centre Program Systems Institute Russian Academy of Science Pereslavl-Zalessky Russia.
Longbiao Kang, Baotian Hu, Xiangping Wu, Qingcai Chen, and Yan He Intelligent Computing Research Center, School of Computer Science and Technology, Harbin.
Word Sense Disambiguation for Automatic Taxonomy Construction from Text-Based Web Corpora 12th International Conference on Web Information System Engineering.
Automatically Acquiring a Linguistically Motivated Genic Interaction Extraction System Mark A. Greenwood Mark Stevenson Yikun Guo Henk Harkema Angus Roberts.
Evaluating the Contribution of EuroWordNet and Word Sense Disambiguation to Cross-Language Information Retrieval Paul Clough 1 and Mark Stevenson 2 Department.
Empirical Methods in Information Extraction Claire Cardie Appeared in AI Magazine, 18:4, Summarized by Seong-Bae Park.
An Integrated Approach to Extracting Ontological Structures from Folksonomies Huairen Lin, Joseph Davis, Ying Zhou ESWC 2009 Hyewon Lim October 9 th, 2009.
Intelligent Database Systems Lab Presenter : BEI-YI JIANG Authors : UNIVERSIT´E CATHOLIQUE DE LOUVAIN, BELGIUM ASSOCIATION FOR COMPUTING MACHINERY.
Natural Language Processing Group Department of Computer Science University of Sheffield, UK Improving Semi-Supervised Acquisition of Relation Extraction.
A Semantic Approach to IE Pattern Induction Mark Stevenson and Mark Greenwood Natural Language Processing Group University of Sheffield, UK.
PAUL ALEXANDRU CHIRITA STEFANIA COSTACHE SIEGFRIED HANDSCHUH WOLFGANG NEJDL 1* L3S RESEARCH CENTER 2* NATIONAL UNIVERSITY OF IRELAND PROCEEDINGS OF THE.
Intelligent Database Systems Lab Presenter : WU, MIN-CONG Authors : Jorge Villalon and Rafael A. Calvo 2011, EST Concept Maps as Cognitive Visualizations.
Annotating Words using WordNet Semantic Glosses Julian Szymański Department of Computer Systems Architecture, Faculty of Electronics, Telecommunications.
SYMPOSIUM ON SEMANTICS IN SYSTEMS FOR TEXT PROCESSING September 22-24, Venice, Italy Combining Knowledge-based Methods and Supervised Learning for.
INTERESTING NUGGETS AND THEIR IMPACT ON DEFINITIONAL QUESTION ANSWERING Kian-Wei Kor, Tat-Seng Chua Department of Computer Science School of Computing.
Domain-Specific Iterative Readability Computation Jin Zhao 13/05/2011.
A Bootstrapping Method for Building Subjectivity Lexicons for Languages with Scarce Resources Author: Carmen Banea, Rada Mihalcea, Janyce Wiebe Source:
Automatic Image Annotation by Using Concept-Sensitive Salient Objects for Image Content Representation Jianping Fan, Yuli Gao, Hangzai Luo, Guangyou Xu.
Using a Named Entity Tagger to Generalise Surface Matching Text Patterns for Question Answering Mark A. Greenwood and Robert Gaizauskas Natural Language.
A Semantic Approach to IE Pattern Induction Mark Stevenson and Mark A. Greenwood Natural Language Processing Group University of Sheffield, UK.
Benchmarking ontology-based annotation tools for the Semantic Web Diana Maynard University of Sheffield, UK.
Semantics-Based News Recommendation with SF-IDF+ International Conference on Web Intelligence, Mining, and Semantics (WIMS 2013) June 13, 2013 Marnix Moerland.
LOGO Summarizing Conversations with Clue Words Giuseppe Carenini, Raymond T. Ng, Xiaodong Zhou (WWW ’07) Advisor : Dr. Koh Jia-Ling Speaker : Tu.
A Scalable Machine Learning Approach for Semi-Structured Named Entity Recognition Utku Irmak(Yahoo! Labs) Reiner Kraft(Yahoo! Inc.) WWW 2010(Information.
Comparing Information Extraction Pattern Models Mark Stevenson and Mark A. Greenwood Natural Language Processing Group University of Sheffield, UK.
August 17, 2005Question Answering Passage Retrieval Using Dependency Parsing 1/28 Question Answering Passage Retrieval Using Dependency Parsing Hang Cui.
1 Masters Thesis Presentation By Debotosh Dey AUTOMATIC CONSTRUCTION OF HASHTAGS HIERARCHIES UNIVERSITAT ROVIRA I VIRGILI Tarragona, June 2015 Supervised.
Finding frequent and interesting triples in text Janez Brank, Dunja Mladenić, Marko Grobelnik Jožef Stefan Institute, Ljubljana, Slovenia.
Cold Start Problem in Movie Recommendation JIANG CAIGAO, WANG WEIYAN Group 20.
Natural Language Interfaces to Ontologies Danica Damljanović
Multi-level Bootstrapping for Extracting Parallel Sentence from a Quasi-Comparable Corpus Pascale Fung and Percy Cheung Human Language Technology Center,
Information Extraction from Single and Multiple Sentences Mark Stevenson Department of Computer Science University of Sheffield, UK.
Probabilistic Text Structuring: Experiments with Sentence Ordering Mirella Lapata Department of Computer Science University of Sheffield, UK (ACL 2003)
Acquisition of Categorized Named Entities for Web Search Marius Pasca Google Inc. from Conference on Information and Knowledge Management (CIKM) ’04.
Finding document topics for improving topic segmentation Source: ACL2007 Authors: Olivier Ferret (18 route du Panorama, BP6) Reporter:Yong-Xiang Chen.
1 Measuring the Semantic Similarity of Texts Author : Courtney Corley and Rada Mihalcea Source : ACL-2005 Reporter : Yong-Xiang Chen.
FILTERED RANKING FOR BOOTSTRAPPING IN EVENT EXTRACTION Shasha Liao Ralph York University.
Semantics-Based News Recommendation International Conference on Web Intelligence, Mining, and Semantics (WIMS 2012) June 14, 2012 Michel Capelle
Event-Based Extractive Summarization E. Filatova and V. Hatzivassiloglou Department of Computer Science Columbia University (ACL 2004)
Multilingual Information Retrieval using GHSOM Hsin-Chang Yang Associate Professor Department of Information Management National University of Kaohsiung.
Pastra and Saggion, EACL 2003 Colouring Summaries BLEU Katerina Pastra and Horacio Saggion Department of Computer Science, Natural Language Processing.
Houses of Mirrors: Deeply Adaptive Designs for Machine Cognition Deborah Duong, Michael Ross.
Semantic search-based image annotation Petra Budíková, FI MU CEMI meeting, Plzeň,
Question Answering Passage Retrieval Using Dependency Relations (SIGIR 2005) (National University of Singapore) Hang Cui, Renxu Sun, Keya Li, Min-Yen Kan,
Multi-Class Sentiment Analysis with Clustering and Score Representation Yan Zhu.
Short Text Similarity with Word Embedding Date: 2016/03/28 Author: Tom Kenter, Maarten de Rijke Source: CIKM’15 Advisor: Jia-Ling Koh Speaker: Chih-Hsuan.
Deep Compositional Cross-modal Learning to Rank via Local-Global Alignment Xinyang Jiang, Fei Wu, Xi Li, Zhou Zhao, Weiming Lu, Siliang Tang, Yueting.
Exploring and Navigating: Tools for GermaNet
Text-based User-kNN: Measuring user similarity based on text reviews
Presentation transcript:

Learning Information Extraction Patterns Using WordNet Mark Stevenson and Mark A. Greenwood Natural Language Processing Group University of Sheffield, UK

The problem IE systems are time consuming to build using knowledge engineering approaches U. Mass report their system took 1,500 person- hours of expert labour to port to MUC-3 Learning methods can help reduce the burden –Supervised methods often require large amounts of training data –Weakly supervised approaches require less annotated data than supervised ones and less expert knowledge than knowledge engineering approaches

Learning Patterns Iterative Learning Algorithm 1.Begin with set of seed patterns which are known to be good extraction patterns 2.Compare every other pattern with the ones known to be good 3.Choose the highest scoring of these and add them to the set of good patterns 4.Stop if enough patterns have been learned, else goto 2. Seeds Candidates Rank Patterns

Semantic Approach Assumption: –Relevant patterns are ones with similar meanings to those already identified as useful Example: “The chairman resigned” “The chairman stood down” “The chairman quit” “Mr. Smith quit the job of chairman” Parallel with paraphrase identification problem

Patterns and Similarity Semantic patterns are SVO-tuples extracted from each clause in the sentence: chairman+resign Tuple fillers can be lexical items or semantic classes (eg. COMPANY, PERSON ) Patterns can be represented as vectors encoding the slot role and filler: chairman_subject, resign_verb Similarity between two patterns defined as follows:

Matrix Population Example matrix for patterns ceo+resigned and ceo+quit Matrix W is populated using semantic similarity metrics based on WordNet W ij = 0 for different roles or sim(w i, w j ) using a WordNet similarity measure Semantic classes are manually mapped onto an appropriate WordNet synset

Advantage ceo+resigned ceo+quit ceo_subject resign_verb quit_verb sim(ceo+resigned, ceo+quit) = 0.95 Adapted cosine metric allows synonymy and near-synonymy to be taken into account

Similarity Measures Define similarity in terms of length of path between WordNet synsets 1.Leacock and Chodrow 2.Wu and Palmer Define similarity using information content of nodes in WordNet hierarchy 3.Resnik 4.Lin 5.Jiang and Conrath

Algorithm Setup At each iteration –each candidate pattern is compared against the centroid of the set of currently accepted patterns –patterns with score within 95% of best pattern are accepted, up to a maximum of 4 Text pre-processed using GATE to tokenise, split into sentences and identify semantic classes Parsed using MINIPAR (adapted to deal with semantic classes marked in input) SVO tuples extracted from dependency tree

Evaluation MUC-6 “management succession” task COMPANY+appoint+PERSON COMPANY+elect+PERSON COMPANY+promote+PERSON COMPANY+name+PERSON PERSON+resign PERSON+quit PERSON+depart Seed Patterns

Example Learned Patterns COMPANY+hire+PERSON PERSON+hire+PERSON PERSON+succeed+PERSON PERSON+appoint+PERSON PERSON+name+POST PERSON+join+COMPANY PERSON+own+COMPANY COMPANY+aquire+COMPANY

Comparison Sentence filtering was used to evaluate the learned patterns Experiments carried out using five different WordNet similarity measures to populate W while learning patterns Compared with alternative approach –“Document centric” method described by Yangarber, Grishman, Tapanainen and Huttunen (2000) –Based on assumption that useful patterns will occur in similar documents to those which have already been identified as relevant

Evaluation Version of MUC-6 corpus in which sentences containing events were marked (Soderland, 1999) Evaluate how accurately generated pattern set can distinguish between “relevant” (event describing) and non-relevant sentences

Results

Comparison with Alternative Approach

Conclusion Use WordNet for a novel approach to weakly supervised pattern acquisition for Information Extraction Some WordNet similarity measures more suited to the task than others –Jiang and Conrath’s (1997) measure seems to give the best performance in this evaluation