Distant supervision for relation extraction without labeled data


Distant supervision for relation extraction without labeled data
Mike Mintz, Steven Bills, Rion Snow, Dan Jurafsky
Stanford University, Stanford, CA 94305

Outline
- Motivation
- Introduction
- Proposed Strategy
- Results
- Conclusion
- References

Motivation
Most relation extraction models in the Automatic Content Extraction (ACE) tradition are based on supervised learning of relations from small, hand-labeled corpora. The goal of the paper is a technique that does not depend on labeled corpora at all, avoiding the domain dependence of ACE-style algorithms and allowing the use of very large corpora.

Introduction
Freebase, a large semantic database containing thousands of relations, is used to provide distant supervision. A distant-supervision algorithm typically:
- may have access to a small amount of labeled data;
- has access to a large pool of unlabeled data;
- uses a labeling operator that samples data from the large pool and assigns labels, which are expected to be noisy;
- combines the labeled training data and the new, noisily labeled data to produce a final model.
A sketch of the labeling operator follows below.
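As a rough illustration of the noisy labeling operator (this is not the authors' code; the knowledge-base triples, sentence format, and function name are assumptions made for the example), the sketch below matches entity pairs found in a sentence against known relation instances and emits noisy training examples:

```python
# Minimal sketch of a distant-supervision labeling operator.
# KB triples, sentence representation, and helper names are
# illustrative assumptions, not the paper's actual code.
from itertools import permutations

# Knowledge base: (entity1, entity2) -> relation name
KB = {
    ("Edwin Hubble", "Marshfield"): "/people/person/place_of_birth",
    ("Jack Wilshere", "England"): "/people/person/nationality",
}

def label_sentence(sentence_text, entities):
    """Return noisy (relation, e1, e2, sentence) examples for every
    entity pair in the sentence that matches a KB relation instance."""
    examples = []
    for e1, e2 in permutations(entities, 2):
        relation = KB.get((e1, e2))
        if relation is not None:
            # Noisy assumption: the sentence expresses the relation
            # simply because it mentions both entities.
            examples.append((relation, e1, e2, sentence_text))
    return examples

print(label_sentence(
    "Astronomer Edwin Hubble was born in Marshfield, Missouri.",
    ["Edwin Hubble", "Marshfield", "Missouri"],
))
```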

Previous work falls into three categories:
- Supervised approaches with access to a small labeled dataset: lexical, semantic, and syntactic features are extracted, and supervised classifiers label the relation mention holding between a given pair of entities in a test sentence. These approaches have significant drawbacks: labeled data is expensive to produce, and the resulting models tend to be biased toward the text domain of the training corpus.
- Purely unsupervised techniques: strings of words between entities are extracted from a large corpus, then clustered and simplified to produce relation strings. With such large data, the resulting clusters are not easy to map onto the relations needed for a particular knowledge base.
- Bootstrapping algorithms: these systems begin with a set of hand-tagged seed instances, then alternately learn extraction rules from the seeds and new seeds from the rules.

Proposed Strategy
- Freebase Database
- Architecture
- Feature Extraction: Lexical, Syntactic, Named Entity Features, Feature Conjunction

Freebase Database: 'Relation' refers to an ordered, binary relation between entities; individual ordered pairs in these relations are called 'relation instances'. For example, in the 'person-nationality' relation, <Jack Wilshere, England> and <Olivier Giroud, France> are instances. All relations and instances are taken from Freebase, a freely available online database of structured semantic data. Its major sources include Wikipedia, NNDB (biographical information), MusicBrainz (music), and the SEC (financial and corporate data).

Architecture: The distant supervision approach uses Freebase to provide a training set of relations and the entity pairs that participate in those relations. First, all entities in the sentences are identified with an entity tagger that labels persons, locations, and organizations (the Stanford Named Entity Recognizer). If a sentence contains two entities that together form an instance of a Freebase relation, features are extracted from that sentence and added to the feature vector for that relation. Because any individual sentence may give an incorrect cue, the algorithm trains a multiclass logistic regression classifier, learning weights for each noisy feature. In training, the features for identical (relation, entity1, entity2) tuples from different sentences are combined, creating a richer feature vector; a sketch of this training step is shown below.
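A minimal sketch of that training step, assuming scikit-learn is available; the feature strings, relation labels, and entity pairs below are toy placeholders, not the paper's actual features:

```python
# Sketch: features for the same (entity1, entity2) pair are pooled
# across sentences into one feature vector, then a multiclass logistic
# regression is trained over relation labels.
from collections import defaultdict
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Noisy per-sentence observations: (entity_pair, relation_label, feature string)
observations = [
    (("Edwin Hubble", "Marshfield"), "place_of_birth", "MID:was/VBD born/VBN in/IN"),
    (("Edwin Hubble", "Marshfield"), "place_of_birth", "DEP:nsubjpass-born-prep_in"),
    (("Jack Wilshere", "England"), "nationality", "MID:plays/VBZ for/IN"),
]

# Pool feature counts per entity pair, yielding a richer vector per tuple.
pair_features = defaultdict(lambda: defaultdict(int))
pair_labels = {}
for pair, label, feat in observations:
    pair_features[pair][feat] += 1
    pair_labels[pair] = label

pairs = list(pair_features)
vec = DictVectorizer()
X = vec.fit_transform([pair_features[p] for p in pairs])
y = [pair_labels[p] for p in pairs]

clf = LogisticRegression(max_iter=1000).fit(X, y)  # multiclass by default
print(clf.predict(X))
```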

Feature Extraction
Lexical features describe specific words between and surrounding the two entities in the sentence in which they appear:
- the sequence of words between the two entities;
- the part-of-speech tags of these words;
- a flag indicating which entity came first in the sentence;
- a window of k words to the left of entity 1 and their part-of-speech tags;
- a window of k words to the right of entity 2 and their part-of-speech tags.
A sketch of how these attributes combine into a feature appears after this slide.

Syntactic features are based on syntax; to generate them, each sentence is parsed with the broad-coverage dependency parser MINIPAR. The features are:
- the dependency path between the two entities;
- for each entity, one 'window' node that is not part of the dependency path (a window node is a node connected to one of the two entities but not on the dependency path). One conjunctive feature is generated for each pair of left and right window nodes, as well as features which omit one or both of them.

Named entity tag features are the named entity tags of the two entities, produced by the Stanford four-class named entity tagger, which labels each word with one of {person, location, organization, miscellaneous, none}.

Feature conjunction: instead of using the above features independently in the classifier, conjunctive features are used. Each feature consists of the conjunction of several attributes of the sentence, plus the named entity tags, and two features match only if all of their conjuncts match exactly. Because a very large amount of data is used, even these complex features appear multiple times, allowing high-precision features to work as intended.
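A sketch of how the lexical attributes might be assembled into a single conjunctive feature string; the token/POS input format, span convention, and feature naming are assumptions made for illustration, not the paper's exact encoding:

```python
# Sketch of lexical feature construction as one conjunctive string:
# words and POS tags between the entities, entity order, and k-word
# windows on either side of the pair.

def lexical_feature(tokens, pos_tags, e1_span, e2_span, k=2):
    """tokens/pos_tags are parallel lists; e1_span and e2_span are
    (start, end) index pairs with entity 1 appearing before entity 2."""
    between = range(e1_span[1], e2_span[0])
    middle = " ".join(f"{tokens[i]}/{pos_tags[i]}" for i in between)
    left = " ".join(f"{tokens[i]}/{pos_tags[i]}"
                    for i in range(max(0, e1_span[0] - k), e1_span[0]))
    right = " ".join(f"{tokens[i]}/{pos_tags[i]}"
                     for i in range(e2_span[1], min(len(tokens), e2_span[1] + k)))
    # All attributes are conjoined into one feature; it only matches
    # another feature when every conjunct matches exactly.
    return f"LEFT[{left}] E1 MID[{middle}] E2 RIGHT[{right}] ORDER=e1_first"

tokens = ["Astronomer", "Edwin", "Hubble", "was", "born", "in", "Marshfield", ",", "Missouri"]
pos    = ["NN", "NNP", "NNP", "VBD", "VBN", "IN", "NNP", ",", "NNP"]
print(lexical_feature(tokens, pos, e1_span=(1, 3), e2_span=(6, 7)))
```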

[Slide shows the features extracted for the sentence 'Astronomer Edwin Hubble was born in Marshfield, Missouri', together with an example dependency parse of that sentence.]

Results
Preprocessing: each sentence is parsed with the dependency parser MINIPAR to produce a dependency graph, and consecutive words with the same entity tag are 'chunked'. For example, 'Edwin/PERSON Hubble/PERSON' becomes 'Edwin Hubble/PERSON'. Chunks must be contiguous so that the dependency graph is not disturbed. A small sketch of this chunking step follows below.

Training and testing:
- Held-out evaluation: 900,000 relation instances are used for training and 900,000 are held out; 800,000 Wikipedia articles are used for training and 400,000 for testing.
- Human evaluation: all 1.8 million relation instances are used for training, while the train/test split over Wikipedia articles stays the same. For these experiments, only instances that did not appear in the training phase are extracted.
- Negative training data: 1% of entity pairs that do not appear in Freebase are randomly sampled, and features are extracted for them. Some of these pairs may be wrongly omitted from Freebase, but these false negatives were observed to have very little effect on the classifier.
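A minimal sketch of the chunking step, assuming the tagger output is a list of (word, tag) pairs with 'O' for non-entity tokens (the input format is an assumption, not the paper's code):

```python
# Sketch of 'chunking': consecutive tokens sharing the same entity tag
# are merged into one token, e.g. Edwin/PERSON Hubble/PERSON ->
# "Edwin Hubble"/PERSON.

def chunk_entities(tagged_tokens):
    """tagged_tokens: list of (word, tag) pairs; tag is 'O' for non-entities."""
    chunks = []
    for word, tag in tagged_tokens:
        if chunks and tag != "O" and chunks[-1][1] == tag:
            # Extend the current chunk with the contiguous same-tag word.
            chunks[-1] = (chunks[-1][0] + " " + word, tag)
        else:
            chunks.append((word, tag))
    return chunks

print(chunk_entities([("Astronomer", "O"), ("Edwin", "PERSON"),
                      ("Hubble", "PERSON"), ("was", "O"), ("born", "O"),
                      ("in", "O"), ("Marshfield", "LOCATION")]))
```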

Evaluations
Once relation instances are extracted from the test data, they are ranked by confidence score and used to generate a list of the n most likely new relation instances.
- Held-out evaluation: part of the Freebase relation data is held out during training, and newly discovered relation instances are automatically compared against this held-out data. At most recall levels, the combination of syntactic and lexical features offers a substantial improvement in precision over either feature set on its own.
- Human evaluation: human annotators look at each positively labeled entity pair and mark whether the relation indeed holds between the participants. For each of the 10 relations that appeared most frequently in the test data (according to the classifier), samples from the first 100 and 1000 instances of that relation generated in each experiment were sent to Mechanical Turk for evaluation. At a recall of 100 instances, the combination of lexical and syntactic features has the best performance for a majority of the relations, while at a recall level of 1000 instances the results are mixed; no feature set strongly outperforms the others across all relations.
Both evaluations allow the precision of the system to be calculated for the best n instances, as sketched below.
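A sketch of the held-out precision-at-n computation; the extracted instances, confidence scores, and held-out set below are fabricated placeholders used only to show the calculation:

```python
# Sketch of held-out evaluation: rank extracted instances by classifier
# confidence and compute precision among the top n against the relation
# instances withheld from training.

def precision_at_n(ranked_instances, held_out, n):
    """ranked_instances: [(confidence, (relation, e1, e2)), ...];
    held_out: set of (relation, e1, e2) triples withheld from training."""
    top = sorted(ranked_instances, key=lambda x: x[0], reverse=True)[:n]
    correct = sum(1 for _, inst in top if inst in held_out)
    return correct / max(1, len(top))

ranked = [
    (0.95, ("nationality", "Jack Wilshere", "England")),
    (0.80, ("place_of_birth", "Edwin Hubble", "Paris")),      # incorrect extraction
    (0.60, ("place_of_birth", "Edwin Hubble", "Marshfield")),
]
held_out = {("nationality", "Jack Wilshere", "England"),
            ("place_of_birth", "Edwin Hubble", "Marshfield")}
print(precision_at_n(ranked, held_out, n=2))  # -> 0.5
```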

Conclusion
The results show that the distant supervision algorithm is able to extract high-precision patterns for a reasonably large number of relations. Syntactic features consistently outperform lexical features for the director-film and writer-film relations. These two relations are particularly ambiguous, suggesting that syntactic features help tease apart difficult relations: they can more easily abstract away from the syntactic modifiers that make up the extraneous parts of the word strings between entities. Syntactic features are thus genuinely useful in distantly supervised information extraction, and the benefit of syntax is greatest where the individual patterns are particularly ambiguous and where the entities are nearby in the dependency structure but distant in terms of words.

References
- Eugene Agichtein and Luis Gravano. 2000. Snowball: Extracting relations from large plain-text collections. In Proceedings of the 5th ACM International Conference on Digital Libraries.
- Michele Banko, Michael J. Cafarella, Stephen Soderland, Matthew Broadhead, and Oren Etzioni. 2007. Open information extraction from the web. In Manuela M. Veloso, editor, IJCAI-07, pages 2670-2676.
- Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD '08, pages 1247-1250, New York, NY. ACM.
- Benjamin Rozenfeld and Ronen Feldman. 2008. Self-supervised relation extraction from the web. Knowledge and Information Systems, 17(1):17-33.

Thanks And Think Positively