Inductive Approaches to the Detection and Classification of Semantic Relation Mentions. Depth Report Examination Presentation. Gabor Melli, August 27, 2007.



Overview Introduction ( ~ 5 mins.) Task Description ( ~ 5 mins.) Predictive Features ( ~ 10 mins.) Inductive Algorithms ( ~ 10 mins.) Benchmark Tasks ( ~ 5 mins.) Research Directions ( ~ 5 mins.)

Simple examples of the “shallow” semantics sought:
“E. coli is a bacteria.” → R_TypeOf(E. coli, bacteria)
“An organism has proteins.” → R_PartOf(proteins, organism)
“IBM is based in Armonk, NY.” → R_HeadquarterLocation(IBM, Armonk, NY)

Motivations
Information Retrieval
–Researchers could retrieve scientific papers based on relations. E.g. “all papers that report localization experiments on V. cholera’s outer membrane proteins”
–Judges could retrieve legal cases. E.g. “all Supreme Court cases involving third party liability claims”
Information Fusion
–Researchers could populate a database with the semantic relations in research articles. E.g. SubcellularLocalization(Organism, Protein, Location)
–Activists could save resources when compiling statistics from newspaper reports.
Document Summarization, Question Answering, …

State-of-the-Art
The current focus is to automatically induce predictive patterns/classifiers.
–These can be applied to a new domain more quickly than an engineered solution.
Performance is approaching human levels of competency.
–F-measure: 76% on the ACE-2004 benchmark task (Zhou et al, 2007); 75% on a protein/gene interaction task (Fundel et al, 2007); 72% on the SemEval-2007 task (Beamer et al, 2007).
–Though under simplified conditions: binary relations, within a single sentence, with perfectly classified entity mentions.

Shallow semantic analysis is challenging
Many ways to say the same thing:
–O is based in L.; L-based O …; Headquartered in L, O …; From its L headquarters, O …
Many relations to disambiguate among.

Next Section Introduction Task Description Predictive Features Inductive Algorithms Benchmark Tasks Research Directions

Task Description
Documents, Tokens, Sentences
Entity Mentions: Detected and Classified
Semantic Relation Cases and Mentions
Performance Metrics
Comparison with the Information Extraction Task
What name for the task?
General Pipelined Process
Subtask: Relation Case Generation
Subtask: Relation Case Labeling
Naïve Baseline Algorithms

Document, Tokens, Sentences

Entity Mentions are pre-Detected (and pre-Classified)

Semantic Relations
A relation with a fixed set of two or more arguments: R_i(Arg_1, …, Arg_a) → {TRUE, FALSE}
Examples:
–TypeOf(E.coli, Bacteria) → TRUE
–OrgLocation(IBM, Jupiter) → FALSE
–SCL(V.cholerae, TcpC, Extracellular) → TRUE

Semantic Relation Cases
A relation case is some permutation of distinct entity mentions within the document.
D_1: “E.coli_1 is a bacteria_2. As with all bacteria_3, E.coli_4 has a cytoplasm_5”
C(R_i, D_1, E_1, E_2); C(R_i, D_1, E_2, E_1); … C(R_j, D_1, E_4, E_3, E_5); C(R_j, D_1, E_3, E_4, E_5)
Notation: e – number of entity mentions; a_max – maximum number of arguments; c – number of relation cases.
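The enumeration above can be sketched as follows; the mention labels and the helper name are illustrative, not from the survey:

```python
from itertools import permutations

def relation_cases(entity_mentions, arity):
    """Enumerate candidate relation cases: every ordered selection
    of `arity` distinct entity mentions from the document."""
    return list(permutations(entity_mentions, arity))

# D1: "E.coli_1 is a bacteria_2. As with all bacteria_3, E.coli_4 has a cytoplasm_5"
mentions = ["E1", "E2", "E3", "E4", "E5"]
binary_cases = relation_cases(mentions, 2)   # e=5, a=2 -> 5*4 = 20 cases
ternary_cases = relation_cases(mentions, 3)  # e=5, a=3 -> 5*4*3 = 60 cases
print(len(binary_cases), len(ternary_cases))  # 20 60
```

The count grows as the falling factorial of e, which is why later slides restrict cases to intra-sentential, type-compatible combinations.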

Semantic Relation Detection vs. Classification
C(R, D_i, E_j, …, E_k) → ?
Relation Detection → {True, False}: predict whether this is a true mention of some semantic relation.
Relation Classification → {1, 2, …, r}: predict the semantic relation R_j associated with a relation mention.

Test and Training Sets
Training set: C(R_1, D_1, E_1, E_2) → F; C(R_1, D_1, E_1, E_3) → T; …; C(R_r, D_d, E_2, E_3, E_5) → F; C(R_r, D_d, E_3, E_4, E_5) → F
Test set: C(R_?, D_d+1, E_1, E_2) → ?; …; C(R_?, D_d+k, E_x, …, E_y) → ?

Performance Metrics
Precision (P): probability that a test case predicted to have the label True is a true positive (tp).
Recall (R): probability that a True test case will be predicted True (tp).
F-measure (F1): harmonic mean of the Precision and Recall estimates.
Accuracy: proportion of predictions with correct labels, True or False.
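A minimal sketch of these four metrics, computed from hypothetical prediction-outcome counts:

```python
def prf1(tp, fp, fn, tn):
    """Precision, Recall, F1, and Accuracy from the four outcome counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

p, r, f1, acc = prf1(tp=30, fp=10, fn=20, tn=40)
print(round(p, 2), round(r, 2), round(f1, 2), round(acc, 2))  # 0.75 0.6 0.67 0.7
```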

Pipelined Process Framework

Next Section Introduction Task Description Predictive Features Inductive Algorithms Benchmark Tasks Research Directions

Predictive Feature Categories
1.Token-based
2.Entity Mention Argument-based
3.Chunking-based
4.Shallow Phrase-Structure Parse Tree-based
5.Phrase-Structure Parse Tree-based
6.Dependency Parse Tree-based
7.Semantic Role Label-based

Vector of Feature Information “ Protein1 is a Location1 lipoprotein required for Location2 biogenesis.”

Token-based Features “Protein1 is a Location1...” Token Distance –2 intervening tokens Token Sequence(s) –Unigrams –Bigrams

Token-based Features (cont.) Stemmed Word Sequences –“banks  bank” –“scheduling  schedule” Disambiguated Word-Sense (WordNet) –“bank”  river’s edge; financial inst.; row of objects Token Part-of-Speech Role Sequences
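A toy illustration of the token n-gram and stemmed-word features above; the crude suffix-stripper is a stand-in for a real stemmer such as Porter's, so its output only approximates the slide's "scheduling → schedule" example:

```python
def ngrams(tokens, n):
    """Token n-gram sequences (unigrams for n=1, bigrams for n=2)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def crude_stem(word):
    """A toy suffix-stripping stemmer; a real system would use e.g. Porter's."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: len(word) - len(suffix)]
    return word

tokens = "Protein1 is a Location1 lipoprotein".lower().split()
print(ngrams(tokens, 2))  # four bigram features for this token sequence
print([crude_stem(w) for w in ["banks", "scheduling"]])  # ['bank', 'schedul']
```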

Entity Mention-based Features
Entity Mention Tokens
–IBM → 1 token; Tierra del Fuego → 3 tokens; …
Entity Mention’s Semantic Type
–Semantic Class: Organization; Location
–Subclass: Company, University, Charity; Country, Province, Region, City

Entity Mention Features (cont.)
Entity Mention Type
–Name → John Doe, E. coli, periplasm, …
–Nominal → the president, the country, …
–Pronominal → he, she, they, it, …
Entity Mention’s Ontology Id
–secreted; extracellular → GO
–E. coli; Escherichia coli → 571 (NCBI tax_id)

Phrase-Structure Parse Tree

Shortest-Path Enclosed Tree Loss of context?

Two types of subtrees proposed
Elementary subtrees and general subtrees.
Both approaches lead to an exponential number of subtree features!

Now we have a populated feature space

Next Section Introduction Task Description Predictive Features Inductive Algorithms Benchmark Tasks Research Directions

Inductive Approaches Available Supervised Algorithms –Requires a training set Semi-supervised Algorithms –Also accepts an unlabeled set Unsupervised Algorithms –Does not use a training set Most solutions restrict themselves to the task of detecting and classifying binary relation cases that are intra-sentential.

Supervised Algorithms Discriminative model –Feature-based (state of the art) E.g. k-Nearest Neighbor, Logistic Regression, … –Kernel-based (state of the art) E.g. Support Vector Machine Generative model –E.g. Probabilistic Context Free Grammars, and Hidden Markov Models

Feature-based Algorithms
Kambhatla, 2004
–Early proposal to use a broad set of features.
Liu et al, 2007
–Proposed the use of features previously found to be predictive for the task of Semantic Role Labeling.
Jiang and Zhai, 2007
–Used bigram and trigram PS parse tree subtree features (and dependency parse tree subtrees).
–Adding trigram-based features produced only marginal improvement in performance, suggesting that higher-order subtrees would likewise add little.

Kernel-based Induction
Zelenko et al, 2003; Culotta and Sorensen, 2004; Bunescu and Mooney, 2005; Zhao and Grishman, 2005; Zhang et al, 2006. Require a kernel function, K(C_1, C_2) → [0, ∞], that maps any two feature vectors to a similarity score from within some transformed space. If the kernel is symmetric and positive definite then comparisons between vectors can often be performed efficiently in a high-dimensional space. If cases are separable in that space then the kernel attains the benefit of the high-dimensional space without explicitly generating the feature space.

Kernel by Zhang et al, 2006
Applies the Convolution Tree Kernel proposed in (Collins and Duffy, 2001; Haussler, 1999), which counts the number of common subtrees:
K_c(T_1, T_2) = Σ_{n_1 ∈ N_1} Σ_{n_2 ∈ N_2} Δ(n_1, n_2)
–N_j is the set of parent nodes in tree T_j
–Δ(n_1, n_2) evaluates the common subtrees rooted at n_1 and n_2

The kernel is computed recursively in O(|N_1| × |N_2|):
–Δ(n_1, n_2) = 0 if the productions at n_1 and n_2 differ
–Δ(n_1, n_2) = 1 × λ if n_1 and n_2 are POS (pre-terminal) nodes
–Otherwise, Δ(n_1, n_2) = λ ∏_{k=1}^{#ch(n_1)} (1 + Δ(ch(n_1, k), ch(n_2, k)))
where #ch(n_i) is the number of children of node n_i, ch(n, k) is the k-th child of node n, and λ (0 < λ < 1) is a decay factor.

Generative Model Approaches
The earliest approaches (Leek, 1997; Miller et al, 1998). Instead of directly estimating model parameters for the conditional probability P(Y | X):
–Estimate model parameters for P(X | Y) and P(Y) from the training set.
–Then apply Bayes’ rule to decide which label has the highest posterior probability.
If the model fits the data then the resulting likelihood-ratio estimate is known to be optimal.

Two Approaches Surveyed Probabilistic Context Free Grammars –Miller et al, 1998; Miller et al, 2000 Hidden Markov Models –Leek, 1997 –McCallum et al, 2000 –Ray and Craven, 2001; Skounakis, Craven, and Ray, 2003

PCFG-based Model

Miller et al, 1998/2000
From the augmented representation, learn a PCFG based on these trees. Infer the maximum likelihood estimates of the probabilities from the frequencies in the training corpus, along with an interpolated adjustment of lower-order estimates to handle the (increased) challenge of data sparsity. Parses of test cases that contain the semantic labels are predicted to be relation mentions.

Semi-Supervised Approaches
(Brin, 1998; Agichtein and Gravano, 2000)
–Use token-based features.
–Apply resampling with replacement.
–Assume that relations in the training set are redundantly present and restated in the test set.
(Shi et al, 2007)
–Uses the (Miller et al, 1998/2000) approach.
–Uses a naïve baseline to convert unlabelled cases into true training cases.

Snowball’s Bootstrapping (Xia, 2006)

Unsupervised Use of Lexico-Syntactic Patterns
Suggested initially by (Hearst, 1992). Applied to relation detection by (Pantel et al, 2004; Etzioni et al, 2005).
Sample patterns (with hypernym Y and hyponyms X_i):
–Y such as X_1, …, X_n
–Y like X_1 and X_2
–X is a Y
–Y, including X
Suited for the detection of TypeOf() subsumption relations over large corpora.
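A rough sketch of pattern-based TypeOf() detection; the capitalized-word stand-in for a noun phrase and the example sentence are illustrative only (a real system would match chunker output, not raw regexes):

```python
import re

# Crude NP placeholder: one capitalized word. A real system would use an NP chunker.
NP = r"[A-Z][a-z]+"
PATTERNS = [
    (re.compile(rf"({NP}) such as ({NP})"), "such_as"),
    (re.compile(rf"({NP}) is a ({NP})"), "is_a"),
    (re.compile(rf"({NP}), including ({NP})"), "including"),
]

def typeof_mentions(text):
    """Return (hyponym, hypernym) pairs matched by any pattern."""
    found = []
    for regex, name in PATTERNS:
        for m in regex.finditer(text):
            if name == "is_a":          # "X is a Y": hyponym comes first
                found.append((m.group(1), m.group(2)))
            else:                       # "Y such as X", "Y, including X"
                found.append((m.group(2), m.group(1)))
    return found

print(typeof_mentions("Bacteria such as Ecoli are common. Ecoli is a Bacterium."))
# [('Ecoli', 'Bacteria'), ('Ecoli', 'Bacterium')]
```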

Next Section Introduction Task Description Predictive Features Inductive Algorithms Benchmark Tasks Research Directions

Benchmark Tasks
Message Understanding Conference (MUC)
–DARPA, (1989 – 1997), Newswire
–TR task: Location_Of(ORG, LOC); Employee_Of(PER, ORG); and Product_Of(ARTIFACT, ORG)
Automatic Content Extraction (ACE)
–NIST, (2002 – …), Newswire
–Relation Mention Detection: ~5 major, ~24 minor relations
–Physical(E_1, E_2); Social(Person_x, Person_y); Employ(Org, Person); …
Protein Localization Relation Extraction
–SFU, (2006 – …)
–SubcellularLocation(Organism, Protein, Location)

Message Understanding Conference 1997 Miller et al, 1998

ACE-2003

Prokaryote Protein Localization Relation Extraction (PPLRE) Task

Next Section Introduction Task Description Predictive Features Inductive Algorithms Benchmark Tasks Research Directions

1.Additional Features/Knowledge
2.Inter-sentential Relation Cases
3.Relations with More than Two Arguments
4.Grounding Entity Mentions to an Ontology
5.Qualifying the Certainty of a Relation Case

Additional Features/Knowledge
Expose additional features that can identify the more esoteric ways of expressing a relation.
Features from outside of the “shortest-path”:
–Challenge: past open-ended attempts have reduced performance (Jiang and Zhai, 2007).
–(Zhou et al, 2007) add heuristics for five common situations.
Use domain-specific background knowledge.
–E.g. Gram-positive bacteria (such as M. tuberculosis) do not have a periplasm; therefore do not predict periplasm as a location.

Inter-sentential Relation Cases
Challenge: current approaches focus on syntactic features, which cannot be extended beyond the sentence boundary.
–Idea: apply Centering Theory (Hirano et al, 2007).
–Idea: create a text graph and apply graph mining.
Challenge: a significant increase in the proportion of false relation cases.
–Idea: a threshold on the number of pairings any one entity mention can take.

Relations with > Two Arguments Idea: decompose the problem into a set of ( n – 1 ) binary relations and then join relation cases that share an entity mention ( Shi et al, 2007; Liu et al, 2007 ). –How to pick the ‘shared’ entity mention? –How much information is lost? Idea: Create a unified feature vector with features associated with each entity mention pair.

Shortened Reference List
ACE Project, ( ).
E. Agichtein and L. Gravano. (2000). Snowball: Extracting Relations from Large Plain-Text Collections. In Proc. of DL-2000.
D. E. Appelt, J. R. Hobbs, J. Bear, D. J. Israel and M. Tyson. (1993). FASTUS: A Finite-state Processor for Information Extraction from Real-world Text. In Proc. of IJCAI-1993.
B. Beamer, S. Bhat, B. Chee, A. Fister, A. Rozovskaya and R. Girju. (2007). UIUC: A Knowledge-rich Approach to Identifying Semantic Relations between Nominals. In Proc. of the Fourth International Workshop on Semantic Evaluations (SemEval-2007).
R. Bunescu and R. J. Mooney. (2005). A Shortest Path Dependency Kernel for Relation Extraction. In Proc. of HLT/EMNLP-2005.
C. Cardie. (1997). Empirical Methods in Information Extraction. AI Magazine, 18(4).
M. Craven and J. Kumlien. (1999). Constructing Biological Knowledge-bases by Extracting Information from Text Sources. In Proc. of the International Conference on Intelligent Systems for Molecular Biology.
A. Culotta and J. S. Sorensen. (2004). Dependency Tree Kernels for Relation Extraction. In Proc. of ACL-2004.
O. Etzioni, M. Cafarella, D. Downey, A. Popescu, T. Shaked, S. Soderland, D. S. Weld and A. Yates. (2005). Unsupervised Named-Entity Extraction from the Web: An Experimental Study. Artificial Intelligence, 165(1).
K. Fundel, R. Kuffner and R. Zimmer. (2007). RelEx -- Relation Extraction Using Dependency Parse Trees. Bioinformatics, 23(3).
R. Grishman and B. Sundheim. (1996). Message Understanding Conference - 6: A Brief History. In Proc. of COLING-1996.
S. M. Harabagiu, C. A. Bejan and P. Morarescu. (2005). Shallow Semantics for Relation Extraction. In Proc. of IJCAI-2005.
T. Hasegawa, S. Sekine and R. Grishman. (2004). Discovering Relations among Named Entities from Large Corpora. In Proc. of ACL-2004.
J. Jiang and C. Zhai. (2007). A Systematic Exploration of the Feature Space for Relation Extraction. In Proc. of NAACL/HLT-2007.
N. Kambhatla. (2004). Combining Lexical, Syntactic, and Semantic Features with Maximum Entropy Models for Extracting Relations. In Proc. of ACL-2004.
T. R. Leek. (1997). Information Extraction Using Hidden Markov Models. M.Sc. Thesis, University of California, San Diego.
Y. Liu, Z. Shi and A. Sarkar. (2007). Exploiting Rich Syntactic Information for Relation Extraction from Biomedical Articles. In Proc. of NAACL/HLT-2007.
S. Miller, H. Fox, L. Ramshaw and R. Weischedel. (2000). A Novel Use of Statistical Parsing to Extract Information from Text. In Proc. of NAACL-2000.
S. Ray and M. Craven. (2001). Representing Sentence Structure in Hidden Markov Models for Information Extraction. In Proc. of IJCAI-2001.
D. Roth and W. Yih. (2002). Probabilistic Reasoning for Entity & Relation Recognition. In Proc. of COLING-2002.
Z. Shi. (2007). Ph.D. Thesis. Forthcoming.
Z. Shi, A. Sarkar and F. Popowich. (2007). Simultaneous Identification of Biomedical Named-Entity and Functional Relation Using Statistical Parsing Techniques. In Proc. of NAACL/HLT-2007.
M. Skounakis, M. Craven and S. Ray. (2003). Hierarchical Hidden Markov Models for Information Extraction. In Proc. of IJCAI-2003.
F. M. Suchanek, G. Ifrim and G. Weikum. (2006). Combining Linguistic and Statistical Analysis to Extract Relations from Web Documents. In Proc. of KDD-2006.
D. Zelenko, C. Aone and A. Richardella. (2003). Kernel Methods for Relation Extraction. Journal of Machine Learning Research, Vol. 3.
M. Zhang, J. Su, D. Wang, G. Zhou and C. Lim. (2005). Discovering Relations between Named Entities from a Large Raw Corpus Using Tree Similarity-based Clustering. In Proc. of IJCNLP-2005.
M. Zhang, J. Zhang and J. Su. (2006). Exploring Syntactic Features for Relation Extraction Using a Convolution Tree Kernel. In Proc. of HLT-2006.
S. Zhao and R. Grishman. (2005). Extracting Relations with Integrated Information Using Kernel Methods. In Proc. of ACL-2005.
G. Zhou, M. Zhang, D. Ji and Q. Zhu. (2007). Tree Kernel-Based Relation Extraction with Context-Sensitive Structured Parse Tree Information. In Proc. of ACL-2007.

The End

Backup Slides for Questions

Entity Mentions pre-Detected

Typed Semantic Relations require that the semantic relation’s arguments also be associated with a semantic class. For example, argument A_1 may be associated with the semantic class ORGANIZATION.

Information Extraction vs. Relation Detection and Classification
Some of the surveyed algorithms, such as (Brin, 1998; Miller et al, 1998; Agichtein and Gravano, 2000; Suchanek et al, 2006), are presented in the literature as information extraction algorithms, not as relation detection and classification algorithms. They are included in the survey nonetheless because they can naturally be applied to the task of relation detection and classification. This situation is to be expected because the identification of relation mentions can be a natural preprocessing step to information extraction (ACE, ).
IE = “populate a relational database table” (or fill in the slots of a template), where each record represents an instance of an entity or semantic relation in the domain.

Information Extraction Example corpus:

Information Extraction detects duplicate relation cases Relation Detection and Classification Information Extraction

What Task Name?
Relation Extraction: Culotta and Sorensen, 2004; Harabagiu et al, 2005; Bunescu and Mooney, 2005; Zhang et al, 2005 and 2006; Jiang and Zhai, 2007; Xu et al, 2007; and Zhou et al, 2007.
Relation Mention Detection (RMD): ACE Project, 2002 – …
Semantic Relation Identification: Beamer et al, 2007.
Semantic Relation Classification: Girju et al.
Relation Detection: Zhao and Grishman, 2005.
Relation Discovery: Hasegawa et al, 2004.
Relation Recognition: Roth and Yih, 2002.

Relation Case Generation
Input (D, R): a text document D and a set of semantic relations R with a arguments.
Output (C): a set of unlabelled semantic relation cases.
Method:
–Identify all e entity mentions E_i in D.
–Create every combination of a entity mentions from the e mentions in the document (without replacement).
–For intra-sentential semantic relation detection and classification tasks, limit the entity mentions to be from the same sentence.
–For typed semantic relation detection and classification tasks, limit the combinations to those where the semantic class of each entity mention E_i matches the semantic class of its corresponding relation argument A_i.
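The generation procedure, including the intra-sentential and typed-task restrictions, might be sketched as follows; the (id, class, sentence) mention representation and the helper name are assumptions for illustration:

```python
from itertools import permutations

def generate_cases(mentions, relation_args, intra_sentential=True):
    """Unlabelled relation cases for one document.

    `mentions`: list of (mention_id, semantic_class, sentence_no) triples.
    `relation_args`: tuple of semantic classes the relation's arguments expect.
    """
    arity = len(relation_args)
    cases = []
    for combo in permutations(mentions, arity):
        if intra_sentential and len({m[2] for m in combo}) > 1:
            continue  # all arguments must come from the same sentence
        if any(m[1] != cls for m, cls in zip(combo, relation_args)):
            continue  # typed task: mention class must match the argument class
        cases.append(tuple(m[0] for m in combo))
    return cases

mentions = [("E1", "ORG", 0), ("E2", "LOC", 0), ("E3", "ORG", 1)]
print(generate_cases(mentions, ("ORG", "LOC")))  # [('E1', 'E2')]
```

Note how the two filters prune the raw permutation space: of the six ordered pairs, only one is both intra-sentential and type-compatible.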

Relation Case Labeling

Naïve Baseline Algorithms
Predict True: always predicts “True” regardless of the contents of the relation case.
–Attains the maximum Recall of any algorithm on the task.
–Attains the maximum F1 of any naïve algorithm.
–The most commonly used naïve baseline.
Predict Majority: predicts the most prevalent class label in the training set.
–Maximizes accuracy.
–Degenerates to a “Predict False” algorithm.
Predict (Biased) Random: randomly predicts “True” with probability matching the distribution of “True” cases in the testing dataset, “False” otherwise.
–Trades off some Precision and Recall for additional Accuracy.
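A minimal sketch of these baselines; for the biased-random variant, the "True" rate is estimated here from the training labels (a practical stand-in, since the test distribution is not available at prediction time):

```python
import random

def predict_true(cases):
    """Always predict True: maximum Recall."""
    return [True] * len(cases)

def predict_majority(train_labels, cases):
    """Predict the most prevalent training label: maximum Accuracy."""
    majority = max(set(train_labels), key=train_labels.count)
    return [majority] * len(cases)

def predict_biased_random(train_labels, cases, seed=0):
    """Predict True with probability matching the observed True rate."""
    rng = random.Random(seed)
    p_true = train_labels.count(True) / len(train_labels)
    return [rng.random() < p_true for _ in cases]

train = [True, False, False, False]
cases = ["c1", "c2", "c3"]
print(predict_true(cases))             # [True, True, True]
print(predict_majority(train, cases))  # [False, False, False]
```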

Prediction Outcome Labels true positive ( tp ) –predicted to have the label True and whose label is indeed True. false positive ( fp ) –predicted to have the label True but whose label is instead False. true negative ( tn ) –predicted to have the label False and whose label is indeed False. false negative ( fn ) –predicted to have the label False and whose label is instead True.

Shallow Parse Tree

Chunking-based Features
A shallow syntactic analysis of a sentence that is fast and somewhat domain-robust (Abney, 1989).
Within a Phrase (Ch.Phr)
–Flag whether the two entity mentions are inside the same noun phrase, verb phrase, or prepositional phrase.
“Extracellular TcpQ is required for TCP biogenesis.” → [NP Extracellular TcpQ] [VP is required] [PP for] [NP TCP biogenesis].

Shallow Parse Tree Features

Subsequences within the SPS.LCS
Inform the classifier about the subsequences. Two versions (Zelenko et al, 2003):
–Contiguous: based on all the subtrees with n edges.
–Non-contiguous (sparse): based on subtrees that allow gaps.

Dependency Parse Features (Dep)

Semantic Role Labeling

Overlap with SRL Structures
Features extracted from a sentence’s semantic role labeling (Harabagiu et al, 2005).
The predicate argument number associated with the entity mention (A0, A1, A2, …).
–E.g. is the entity mention associated with role A1?
The verb associated with the argument (e.g. be, require).
–E.g. the verb “be” is associated with the entity mention E_i.

One Classifier or Many?
Which is better:
–One classifier for both detection and classification, or at least two?
–If at least two, then one multi-class classifier over all relations, or many binary classifiers (one per relation)?
Current empirical evidence suggests:
–One classifier for detection.
–One classifier per relation for classification.

Miller et al’s example of semantic annotation

Hidden Markov Model-based
One of the first statistical approaches applied to the task. Akin to using a stochastic version of the finite state automata successfully used in the FASTUS system.
Efficient algorithms exist for:
–learning the model’s parameters from word sequences
–computing a sequence’s probability given the model
–finding the highest-probability path through the model’s states.
The challenge has been to include more features in the models:
–(McCallum et al, 2000) include capitalization, formatting, and POS.
–(Ray and Craven, 2001) added shallow-parse-tree features.
–(Skounakis et al, 2003) use hierarchical HMMs to represent syntax.

(Brin, 1998 and Agichtein and Gravano, 2000)
DIPRE and Snowball use resampling with replacement. Snowball:
1.Uses NER and classification to better restrict the relation cases considered.
2.Uses word unigrams instead of the single feature of the word sequence.
3.Uses a discriminative algorithm.
4.Stops iterating based on a threshold on the Precision.
Advantages: the word-based patterns that make up its classifier can be inspected by a domain expert. E.g. {, }.
Challenges:
–more than six thresholds need to be manually set.
–experimental evidence does not support its bootstrapping approach.

Hidden Markov Model-based (cont.)
Train two HMM models:
–A model (λ) from the positive cases.
–A null model (λ̄) from the negative ones.
Given a test case sequence S, the probabilities P(λ | S) and P(λ̄ | S) are computed. Once the log-odds of the prior probability of the relation sequence is calculated, each test case’s label is decided based on the log of the ratio of the two probabilities.
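A simplified sketch of this decision rule; the order-0 "model" below stands in for a real HMM's forward-algorithm score, and the example probabilities are invented:

```python
import math

def sequence_loglik(tokens, token_probs):
    """Log-likelihood of a token sequence under a degenerate (order-0) model;
    a stand-in for the forward-algorithm score of a real HMM."""
    return sum(math.log(token_probs.get(t, 1e-6)) for t in tokens)

def label(tokens, pos_model, neg_model, log_prior_odds=0.0):
    """True iff the posterior log-odds favor the positive-trained model:
    log P(S | model+) - log P(S | model-) + prior log-odds > 0."""
    log_odds = (sequence_loglik(tokens, pos_model)
                - sequence_loglik(tokens, neg_model)
                + log_prior_odds)
    return log_odds > 0

pos = {"secreted": 0.4, "protein": 0.4}   # invented token probabilities
neg = {"secreted": 0.01, "protein": 0.2}
print(label(["secreted", "protein"], pos, neg))  # True
```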

Hasegawa et al, 2004
Detect and classify all relation cases that require the same two argument types. E.g. R(PERSON, GEO-POLITICAL ENTITY):
–CitizenOf(), PresidentOf(), EnemyOf()
Approach:
–Use hierarchical clustering with a cosine similarity function.
–Clusters correspond to cases of the same relation.
–Each cluster can be described by a small set of words that frequently appear in the cluster.

Global Inference Approaches
An alternative to the pipelined approach: globally model all of the decisions in order to capture the mutual influences that exist among downstream decisions. An opportunity exists to repair incorrectly labeled entity mentions. For example, a typed relation detection algorithm could predict that an entity mention currently labeled “GENE” is likely incorrect because a relation case it participates in requires that the argument be a “PROTEIN”.
(Roth and Yih, 2002) propose to use the dependencies between relation and entity mentions to repair their labels.
–First, induce separate classifiers for entity detection and classification and for relation detection and classification. Any state-of-the-art supervised algorithm presented above could be used.
–Next, perform global inference based on the conditional distributions of the two classifiers.
(Miller et al, 2000) and (Shi et al, 2007) also perform global inference and report repairing a noticeable number of mislabeled entity mentions.

ACE

Grounding Entity Mentions to an Ontology
This step is essential for information extraction, but can be cumbersome and difficult to automate. E.g. a biologist would likely require the protein sequence (e.g. MKQSTIALAL…), not the protein name (e.g. “alkaline phosphatase”). The sequence can be found in a master database such as Swiss-Prot, but at least two organisms have proteins with the same name.
Idea: use the relation information to disambiguate between ontology entries.

Qualifying the Certainty of a Relation Case
It would be useful to qualify the certainty that can be assigned to a relation mention. E.g. in the news domain, distinguish relation mentions based on first-hand information from those based on hearsay.
Idea: add an additional label to each relation case that qualifies the certainty of the statement. E.g. in the PPLRE task, label cases with: “directly validated”, “indirectly validated”, “hypothesized”, and “assumed”.