Inductive Approaches to the Detection and Classification of Semantic Relation Mentions
Depth Report Examination Presentation
Gabor Melli
August 27,
Overview Introduction (~5 mins.) Task Description (~5 mins.) Predictive Features (~10 mins.) Inductive Algorithms (~10 mins.) Benchmark Tasks (~5 mins.) Research Directions (~5 mins.)
Simple examples of the “shallow” semantics sought –“E. coli is a bacteria.” → R_TypeOf(E. coli, bacteria) –“An organism has proteins.” → R_PartOf(proteins, organism) –“IBM is based in Armonk, NY.” → R_HeadquarterLocation(IBM, Armonk, NY)
Motivations Information Retrieval –Researchers could retrieve scientific papers based on relations. E.g. “all papers that report localization experiments on V. cholerae’s outer membrane proteins” –Judges could retrieve legal cases. E.g. “all Supreme Court cases involving third party liability claims” Information Fusion –Researchers could populate a database with the semantic relations found in research articles. E.g. SubcellularLocalization(Organism, Protein, Location) –Activists could save resources when compiling statistics from newspaper reports. Document Summarization, Question Answering, …
State-of-the-Art Current focus is to automatically induce predictive patterns/classifiers. –Can be more quickly applied to a new domain than an engineered solution. Human levels of competency are within reach. –F-measure: 76% on the ACE-2004 benchmark task (Zhou et al, 2007); 75% on a protein/gene interaction task (Fundel et al, 2007); 72% on the SemEval-2007 task (Beamer et al, 2007). –Though under simplified conditions: binary relations, within a single sentence, with perfectly classified entity mentions.
Shallow semantic analysis is challenging Many ways to say the same thing: –O is based in L.; L-based O …; Headquartered in L, O …; From its L headquarters, O … Many relations to disambiguate among.
Next Section Introduction Task Description Predictive Features Inductive Algorithms Benchmark Tasks Research Directions
Task Description Documents, Tokens, Sentences Entity Mentions: Detected and Classified Semantic Relation Cases and Mentions Performance Metrics Comparison with Information Extraction Task What name for the task? General Pipelined Process Subtask: Relation Case Generation Subtask: Relation Case Labeling Naïve Baseline Algorithms
Documents, Tokens, Sentences
Entity Mentions are pre-Detected (and pre-Classified)
Semantic Relations A relation with a fixed set of two or more arguments. R_i(Arg_1, …, Arg_a) → {TRUE, FALSE} Examples: –TypeOf(E. coli, Bacteria) → TRUE –OrgLocation(IBM, Jupiter) → FALSE –SCL(V. cholerae, TcpC, Extracellular) → TRUE
Semantic Relation Cases Some permutation of distinct entity mentions within the document. D_1: “E.coli_1 is a bacteria_2. As with all bacteria_3, E.coli_4 has a cytoplasm_5” C(R_i, D_1, E_1, E_2) C(R_i, D_1, E_2, E_1) … C(R_j, D_1, E_4, E_3, E_5) C(R_j, D_1, E_3, E_4, E_5) e = number of entity mentions; a_max = maximum number of arguments; c = number of relation cases
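A back-of-the-envelope count, not from the original slides: if the a arguments are ordered and drawn without replacement from the e mentions, each relation contributes

```latex
c \;=\; \frac{e!}{(e-a)!} \;=\; e\,(e-1)\cdots(e-a+1)
```

candidate cases, so c grows as a degree-a_max polynomial in e.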
Semantic Relation Detection vs. Classification C(R, D_i, E_j, …, E_k) = ? Relation Detection → {True, False}: predict whether this is a true mention of some semantic relation. Relation Classification → {1, 2, …, r}: predict the semantic relation R_j associated with a relation mention.
Test and Training Sets Training set: C(R_1, D_1, E_1, E_2) = F; C(R_1, D_1, E_1, E_3) = T; … C(R_r, D_d, E_2, E_3, E_5) = F; C(R_r, D_d, E_3, E_4, E_5) = F. Test set: C(R_?, D_{d+1}, E_1, E_2) = ?; … C(R_?, D_{d+k}, E_x, …, E_y) = ?
Performance Metrics Precision (P): probability that a test case predicted to have the label True is a true positive (tp). Recall (R): probability that a True test case will be predicted True. F-measure (F1): harmonic mean of the Precision and Recall estimates. Accuracy: proportion of predictions with the correct label, True or False.
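Spelled out in terms of the prediction outcome counts (tp, fp, tn, fn; defined in the backup slides):

```latex
P = \frac{tp}{tp+fp},\qquad
R = \frac{tp}{tp+fn},\qquad
F_1 = \frac{2PR}{P+R},\qquad
\mathrm{Accuracy} = \frac{tp+tn}{tp+fp+tn+fn}
```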
Pipelined Process Framework
Next Section Introduction Task Description Predictive Features Inductive Algorithms Benchmark Tasks Research Directions
Predictive Feature Categories 1. Token-based 2. Entity Mention Argument-based 3. Chunking-based 4. Shallow Phrase-Structure Parse Tree-based 5. Phrase-Structure Parse Tree-based 6. Dependency Parse Tree-based 7. Semantic Role Label-based
Vector of Feature Information “ Protein1 is a Location1 lipoprotein required for Location2 biogenesis.”
Token-based Features “Protein1 is a Location1...” Token Distance –2 intervening tokens Token Sequence(s) –Unigrams –Bigrams
Token-based Features (cont.) Stemmed Word Sequences –“banks” → “bank” –“scheduling” → “schedule” Disambiguated Word-Sense (WordNet) –“bank” → river’s edge; financial inst.; row of objects Token Part-of-Speech Role Sequences (see the sketch below)
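A minimal sketch of extracting the token-based features from the last two slides (assumes whitespace tokenization and that NLTK is installed; a real system would use a full NLP pipeline):

```python
# Illustrative sketch, not from the survey: token-based features for one
# relation case whose two argument mentions sit at token positions i < j.
from nltk.stem import PorterStemmer

def token_features(tokens, i, j):
    between = tokens[i + 1 : j]            # tokens between the mentions
    stemmer = PorterStemmer()
    return {
        "token_distance": len(between),    # number of intervening tokens
        "unigrams": set(between),
        "bigrams": {(a, b) for a, b in zip(between, between[1:])},
        "stems": [stemmer.stem(t.lower()) for t in between],
    }

tokens = "Protein1 is a Location1 lipoprotein".split()
print(token_features(tokens, 0, 3))  # token_distance == 2, as on the slide
```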
Entity Mention-based Features Entity Mention Tokens –IBM_1, Tierra del Fuego_3, … Entity Mention’s Semantic Type –Semantic Class: Organization; Location –Subclass: Organization → Company; University; Charity. Location → Country; Province; Region; City
Entity Mention Features (cont.) Entity Mention Type –Name: John Doe, E. coli, periplasm, … –Nominal: the president, the country, … –Pronominal: he, she, they, it, … Entity Mention’s Ontology Id –secreted; extracellular → GO –E. coli; Escherichia coli → 571 (NCBI tax_id)
Phrase-Structure Parse Tree
Shortest-Path Enclosed Tree Loss of context?
Two types of subtrees proposed: elementary subtrees and general subtrees. Both approaches lead to an exponential number of subtree features!
Now we have a populated feature space
Next Section Introduction Task Description Predictive Features Inductive Algorithms Benchmark Tasks Research Directions
Inductive Approaches Available Supervised Algorithms –Require a training set Semi-supervised Algorithms –Also accept an unlabeled set Unsupervised Algorithms –Do not use a training set Most solutions restrict themselves to the task of detecting and classifying binary relation cases that are intra-sentential.
Supervised Algorithms Discriminative model –Feature-based (state of the art) E.g. k-Nearest Neighbor, Logistic Regression, … –Kernel-based (state of the art) E.g. Support Vector Machine Generative model –E.g. Probabilistic Context Free Grammars and Hidden Markov Models
Feature-based Algorithms Kambhatla, 2004 –Early proposal to use a broad set of features. Liu et al, 2007 –Proposed the use of features previously found to be predictive for the task of Semantic Role Labeling. Jiang and Zhai, 2007 –Used bigram and trigram PS parse tree subtree features (and dependency parse tree subtrees). –Adding trigram-based features produced marginal improvement in performance; therefore only marginal improvement is likely from adding higher-order subtrees.
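A minimal feature-based pipeline in the spirit of this line of work (a sketch with invented toy features; logistic regression is the binary case of the maximum entropy models Kambhatla used):

```python
# Sketch: flatten each relation case to a feature dict, vectorize, and
# fit a discriminative classifier. The feature names are hypothetical.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

train_cases = [
    {"token_distance": 2, "bigram=is_a": 1, "arg1_class=PROT": 1},
    {"token_distance": 9, "bigram=said_the": 1, "arg1_class=PER": 1},
]
train_labels = [True, False]  # relation detection labels

vec = DictVectorizer()
X = vec.fit_transform(train_cases)
clf = LogisticRegression().fit(X, train_labels)

test_case = {"token_distance": 3, "bigram=is_a": 1, "arg1_class=PROT": 1}
print(clf.predict(vec.transform([test_case])))  # e.g. [ True]
```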
Kernel-based Induction Zelenko et al, 2003; Culotta and Sorensen, 2004; Bunescu and Mooney, 2005; Zhao and Grishman, 2005; Zhang et al, 2006. Require a kernel function, K(C_1, C_2) → [0, ∞], that maps any two feature vectors to a similarity score within some transformed space. If the kernel is symmetric and positive definite, then comparison between vectors can often be performed efficiently in a high-dimensional space. If cases are separable in that space, then the kernel attains the benefit of the high-dimensional space without explicitly generating the feature space.
Kernel by Zhang et al, 2006 Applies the Convolution Tree Kernel proposed in (Collins and Duffy, 2001; Haussler, 1999). Number of common subtrees: K_C(T_1, T_2) = Σ_{n_1 ∈ N_1} Σ_{n_2 ∈ N_2} Δ(n_1, n_2) –N_j is the set of parent nodes in tree T_j –Δ(n_1, n_2) evaluates the common subtrees rooted at n_1 and n_2
Kernel computed recursively in O(|N_1| · |N_2|) –Δ(n_1, n_2) = 0 if the productions at n_1 and n_2 differ –Δ(n_1, n_2) = 1 if n_1 and n_2 are POS nodes –Otherwise, Δ(n_1, n_2) = λ ∏_{k=1}^{#ch(n_1)} (1 + Δ(ch(n_1, k), ch(n_2, k))), where #ch(n_i) is the number of children of node n_i, ch(n, k) is the k-th child of node n, and λ (0 < λ < 1) is a decay factor
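A compact Python rendering of this recursion (a sketch, not the authors' code; trees are nested (label, children) tuples and LAMBDA is an illustrative decay value):

```python
# Convolution tree kernel sketch following the slide's recursion.
LAMBDA = 0.4  # decay factor, 0 < lambda < 1 (illustrative value)

def production(n):
    label, children = n
    return (label, tuple(c[0] if isinstance(c, tuple) else c for c in children))

def is_pos(n):
    # pre-terminal (POS) node: its single child is a word (plain string)
    return len(n[1]) == 1 and isinstance(n[1][0], str)

def delta(n1, n2):
    if production(n1) != production(n2):
        return 0.0
    if is_pos(n1):
        return 1.0                       # matching POS nodes contribute 1
    prod = LAMBDA                        # decayed product over children
    for c1, c2 in zip(n1[1], n2[1]):
        prod *= 1.0 + delta(c1, c2)
    return prod

def tree_kernel(t1, t2):
    # K_C(T1, T2): sum delta over all pairs of nodes
    def nodes(t):
        yield t
        for c in t[1]:
            if isinstance(c, tuple):
                yield from nodes(c)
    return sum(delta(a, b) for a in nodes(t1) for b in nodes(t2))

t = ("NP", [("DT", ["a"]), ("NN", ["bacteria"])])
print(tree_kernel(t, t))  # kernel of a small tree with itself: 3.6
```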
Generative Model Approaches Earliest approach (Leek 1997; Miller 1998). Instead of directly estimating model parameters for the conditional probability P(Y | X): estimate model parameters for P(X | Y) and P(Y) from the training set, then apply Bayes' rule to decide which label has the highest posterior probability. If the model fits the data, then the resulting likelihood ratio estimate is known to be optimal.
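The Bayes step spelled out (the standard identity, not specific to any one surveyed paper):

```latex
\hat{y} \;=\; \arg\max_{y} P(y \mid x)
        \;=\; \arg\max_{y} \frac{P(x \mid y)\,P(y)}{P(x)}
        \;=\; \arg\max_{y} P(x \mid y)\,P(y)
```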
Two Approaches Surveyed Probabilistic Context Free Grammars –Miller et al, 1998; Miller et al, 2000 Hidden Markov Models –Leek, 1997 –McCallum et al, 2000 –Ray and Craven, 2001; Skounakis, Craven, and Ray, 2003
PCFG-based Model
Miller et al, 1998/2000 From the augmented representation, learn a PCFG based on these trees. Infer the maximum likelihood estimates of the probabilities from the frequencies in the training corpus, along with an interpolated adjustment of lower-order estimates to handle the (increased) challenge of data sparsity. Parses of test cases that contain the semantic labels are predicted to be relation mentions.
Semi-Supervised Approaches (Brin, 1998; Agichtein and Gravano, 2000) –Use token-based features –Apply resampling with replacement –Assume that relations in the training set are redundantly present and restated in the test set. (Shi et al, 2007) –Uses the (Miller et al, 1998/2000) approach. –Uses a naïve baseline to convert unlabelled cases to true training cases.
Snowball’s Bootstrapping (Xia, 2006)
Unsupervised Use of Lexico-Syntactic Patterns Suggested initially by (Hearst, 1992). Applied to relation detection by (Pantel et al, 2004; Etzioni et al, 2005). Sample patterns (X a class, Y its instances): –X such as Y1, …, Yn –X like Y and Z –Y is a X –X, including Y Suited for the detection of TypeOf() subsumption relations over large corpora.
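A toy regex matcher for these patterns (a hypothetical simplification: noun phrases approximated by single words; a real system would match over chunked NPs):

```python
import re

# Each pattern yields (hyponym Y, hypernym X) pairs for TypeOf(Y, X).
PATTERNS = [
    (re.compile(r"(\w+) such as (\w+)"),    lambda m: (m.group(2), m.group(1))),
    (re.compile(r"(\w+) is a (\w+)"),       lambda m: (m.group(1), m.group(2))),
    (re.compile(r"(\w+), including (\w+)"), lambda m: (m.group(2), m.group(1))),
]

def type_of_mentions(text):
    for regex, extract in PATTERNS:
        for m in regex.finditer(text):
            yield ("TypeOf",) + extract(m)

print(list(type_of_mentions("organisms such as bacteria")))
# [('TypeOf', 'bacteria', 'organisms')]
```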
Next Section Introduction Task Description Predictive Features Inductive Algorithms Benchmark Tasks Research Directions
Benchmark Tasks Message Understanding Conference (MUC) –DARPA, (1989 – 1997), Newswire –Template Relation (TR) task: Location_Of(ORG, LOC); Employee_of(PER, ORG); and Product_Of(ARTIFACT, ORG) Automatic Content Extraction (ACE) –NIST, (2002 – …), Newswire –Relation Mention Detection: ~5 major, ~24 minor relation types –Physical(E_1, E_2); Social(Person_x, Person_y); Employ(Org, Person); … Protein Localization Relation Extraction –SFU, (2006 – …) –SubcellularLocation(Organism, Protein, Location)
Message Understanding Conference 1997 Miller et al, 1998
ACE-2003
Prokaryote Protein Localization Relation Extraction (PPLRE) Task
Next Section Introduction Task Description Predictive Features Inductive Algorithms Benchmark Tasks Research Directions
1. Additional Features/Knowledge 2. Inter-sentential Relation Cases 3. Relations with More than Two Arguments 4. Grounding Entity Mentions to an Ontology 5. Qualifying the Certainty of a Relation Case
Additional Features/Knowledge Expose additional features that can identify the more esoteric ways of expressing a relation. Features from outside of the “shortest-path”. –Challenge: past open-ended attempts have reduced performance (Jiang and Zhai, 2007) –(Zhou et al, 2007) add heuristics for five common situations. Use domain-specific background knowledge. –E.g. Gram-positive bacteria (such as M. tuberculosis) do not have a periplasm, therefore do not predict periplasm.
Inter-sentential Relation Cases Challenge: current approaches focus on syntactic features, which cannot be extended beyond the sentence boundary. –Idea: apply Centering Theory (Hirano et al, 2007) –Idea: create a text graph and apply graph mining. Challenge: a significant increase in the proportion of false relation cases. –Idea: a threshold on the number of pairings any one entity mention can take.
Relations with > Two Arguments Idea: decompose the problem into a set of ( n – 1 ) binary relations and then join relation cases that share an entity mention ( Shi et al, 2007; Liu et al, 2007 ). –How to pick the ‘shared’ entity mention? –How much information is lost? Idea: Create a unified feature vector with features associated with each entity mention pair.
Shortened Reference List
ACE Project. ( ).
E. Agichtein and L. Gravano. (2000). Snowball: Extracting Relations from Large Plain-Text Collections. In Proc. of DL-2000.
D. E. Appelt, J. R. Hobbs, J. Bear, D. J. Israel, and M. Tyson. (1993). FASTUS: A Finite-state Processor for Information Extraction from Real-world Text. In Proc. of IJCAI-1993.
B. Beamer, S. Bhat, B. Chee, A. Fister, A. Rozovskaya, and R. Girju. (2007). UIUC: A Knowledge-rich Approach to Identifying Semantic Relations between Nominals. In Proc. of the Fourth International Workshop on Semantic Evaluations (SemEval-2007).
R. Bunescu and R. J. Mooney. (2005). A Shortest Path Dependency Kernel for Relation Extraction. In Proc. of HLT/EMNLP-2005.
C. Cardie. (1997). Empirical Methods in Information Extraction. AI Magazine, 18(4).
M. Craven and J. Kumlien. (1999). Constructing Biological Knowledge-bases by Extracting Information from Text Sources. In Proc. of the International Conference on Intelligent Systems for Molecular Biology.
A. Culotta and J. S. Sorensen. (2004). Dependency Tree Kernels for Relation Extraction. In Proc. of ACL-2004.
O. Etzioni, M. Cafarella, D. Downey, A. Popescu, T. Shaked, S. Soderland, D. S. Weld, and A. Yates. (2005). Unsupervised Named-Entity Extraction from the Web: An Experimental Study. Artificial Intelligence, 165(1).
K. Fundel, R. Kuffner, and R. Zimmer. (2007). RelEx--Relation Extraction Using Dependency Parse Trees. Bioinformatics, 23(3).
R. Grishman and B. Sundheim. (1996). Message Understanding Conference - 6: A Brief History. In Proc. of COLING-1996.
S. M. Harabagiu, C. A. Bejan, and P. Morarescu. (2005). Shallow Semantics for Relation Extraction. In Proc. of IJCAI-2005.
T. Hasegawa, S. Sekine, and R. Grishman. (2004). Discovering Relations among Named Entities from Large Corpora. In Proc. of ACL-2004.
J. Jiang and C. Zhai. (2007). A Systematic Exploration of the Feature Space for Relation Extraction. In Proc. of NAACL/HLT-2007.
N. Kambhatla. (2004). Combining Lexical, Syntactic, and Semantic Features with Maximum Entropy Models for Extracting Relations. In Proc. of ACL-2004.
T. R. Leek. (1997). Information Extraction Using Hidden Markov Models. M.Sc. Thesis, University of California, San Diego.
Y. Liu, Z. Shi, and A. Sarkar. (2007). Exploiting Rich Syntactic Information for Relation Extraction from Biomedical Articles. In Proc. of NAACL/HLT-2007.
S. Miller, H. Fox, L. Ramshaw, and R. Weischedel. (2000). A Novel Use of Statistical Parsing to Extract Information from Text. In Proc. of NAACL-2000.
S. Ray and M. Craven. (2001). Representing Sentence Structure in Hidden Markov Models for Information Extraction. In Proc. of IJCAI-2001.
D. Roth and W. Yih. (2002). Probabilistic Reasoning for Entity & Relation Recognition. In Proc. of COLING-2002.
Z. Shi. (2007). Ph.D. Thesis. Forthcoming.
Z. Shi, A. Sarkar, and F. Popowich. (2007). Simultaneous Identification of Biomedical Named-Entity and Functional Relation Using Statistical Parsing Techniques. In Proc. of NAACL/HLT-2007.
M. Skounakis, M. Craven, and S. Ray. (2003). Hierarchical Hidden Markov Models for Information Extraction. In Proc. of IJCAI-2003.
F. M. Suchanek, G. Ifrim, and G. Weikum. (2006). Combining Linguistic and Statistical Analysis to Extract Relations from Web Documents. In Proc. of KDD-2006.
D. Zelenko, C. Aone, and A. Richardella. (2003). Kernel Methods for Relation Extraction. Journal of Machine Learning Research, Vol. 3.
M. Zhang, J. Su, D. Wang, G. Zhou, and C. Lim. (2005). Discovering Relations between Named Entities from a Large Raw Corpus Using Tree Similarity-based Clustering. In Proc. of IJCNLP-2005.
M. Zhang, J. Zhang, and J. Su. (2006). Exploring Syntactic Features for Relation Extraction Using a Convolution Tree Kernel. In Proc. of HLT-2006.
S. Zhao and R. Grishman. (2005). Extracting Relations with Integrated Information Using Kernel Methods. In Proc. of ACL-2005.
G. Zhou, M. Zhang, D. Ji, and Q. Zhu. (2007). Tree Kernel-Based Relation Extraction with Context-Sensitive Structured Parse Tree Information. In Proc. of ACL-2007.
The End
Backup Slides for Questions
Entity Mentions pre-Detected
Typed Semantic Relations require that the semantic relation’s arguments also be associated with a semantic class. For example, argument A_{1,2} may be associated with the semantic class ORGANIZATION.
Information Extraction vs. Relation Detection and Classification Some of the surveyed algorithms, such as (Brin, 1998; Miller et al, 1998; Agichtein and Gravano, 2000; Suchanek et al, 2006), are presented in the literature as information extraction algorithms, not as relation detection and classification algorithms. They are included in the survey nonetheless because they can naturally be applied to the task of relation detection and classification. This situation is to be expected because the identification of relation mentions can be a natural preprocessing step to information extraction (ACE, ). IE = “populate a relational database table” (or fill in the slots of a template), where each record represents an instance of an entity or semantic relation in the domain.
Information Extraction Example corpus:
Information Extraction detects duplicate relation cases Relation Detection and Classification Information Extraction
What Task Name? Relation Extraction: Culotta and Sorensen, 2004; Harabagiu et al, 2005; Bunescu and Mooney, 2005; Zhang et al, 2005 and 2006; Jiang and Zhai, 2007; Xu et al, 2007; and Zhou et al, 2007. Relation Mention Detection (RMD): ACE Project, 2002 –. Semantic Relation Identification: Beamer et al, 2007. Semantic Relation Classification: Girju et al, 2007. Relation Detection: Zhao and Grishman, 2005. Relation Discovery: Hasegawa et al, 2004. Relation Recognition: Roth and Yih, 2002.
Relation Case Generation Input: (D, R): a text document D and a set of semantic relations R with a arguments. Output: (C): a set of unlabelled semantic relation cases. Method: Identify all e entity mentions E_i in D. Create every combination of a entity mentions from the e mentions in the document (without replacement). –For intra-sentential semantic relation detection and classification tasks, limit the entity mentions to be from the same sentence. –For typed semantic relation detection and classification tasks, limit the combinations to those where there is a match between the semantic classes of each of the entity mentions E_i and the semantic class of their corresponding relation argument A_i.
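A direct sketch of this generation subtask (the (id, sentence, class) mention tuples are my own simplification; ordered selections without replacement match the "permutation" phrasing of the earlier slide):

```python
from itertools import permutations

def generate_cases(mentions, relation, intra_sentential=True):
    """mentions: list of (mention_id, sentence_id, semantic_class).
    relation: (name, [arg_class_1, ..., arg_class_a])."""
    name, arg_classes = relation
    a = len(arg_classes)
    for combo in permutations(mentions, a):   # distinct mentions, order matters
        if intra_sentential and len({m[1] for m in combo}) > 1:
            continue                          # spans sentences: skip
        if any(m[2] != c for m, c in zip(combo, arg_classes)):
            continue                          # typed-task class mismatch
        yield (name,) + tuple(m[0] for m in combo)

mentions = [("E1", 0, "PROT"), ("E2", 0, "LOC"), ("E3", 1, "PROT")]
print(list(generate_cases(mentions, ("LocatedIn", ["PROT", "LOC"]))))
# [('LocatedIn', 'E1', 'E2')]
```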
Relation Case Labeling
Naïve Baseline Algorithms Predict True: always predicts “True” regardless of the contents of the relation case. –Attains the maximum Recall by any algorithm on the task. –Attains the maximum F1 by any naïve algorithm. –Most commonly used naïve baseline. Predict Majority: predicts the most prevalent class label in the training set. –Maximizes accuracy. –Degenerates to a “Predict False” algorithm. Predict (Biased) Random: randomly predicts “True” with probability matching the distribution of “True” cases in the testing dataset, “False” otherwise. –Trades off some Precision and Recall for additional Accuracy.
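The three baselines as toy Python one-liners (my own sketch; the biased-random rate is estimated from training labels as a practical stand-in for the test-set distribution named above):

```python
import random
from collections import Counter

def predict_true(test):             # maximizes Recall
    return [True] * len(test)

def predict_majority(train, test):  # maximizes Accuracy
    majority = Counter(train).most_common(1)[0][0]
    return [majority] * len(test)

def predict_biased_random(train, test):
    p_true = sum(train) / len(train)   # observed True-rate in training labels
    return [random.random() < p_true for _ in test]
```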
Prediction Outcome Labels true positive ( tp ) –predicted to have the label True and whose label is indeed True. false positive ( fp ) –predicted to have the label True but whose label is instead False. true negative ( tn ) –predicted to have the label False and whose label is indeed False. false negative ( fn ) –predicted to have the label False and whose label is instead True.
Shallow Parse Tree
Chunking-based Features A shallow syntactic analysis of a sentence that is fast and somewhat domain-robust (Abney, 1989). Within a Phrase (Ch.Phr) –Flag whether the two entity mentions are inside the same noun phrase, verb phrase, or prepositional phrase. “Extracellular TcpQ is required for TCP biogenesis.” [NP Extracellular TcpQ] [VP is required] [PP for] [NP TCP biogenesis].
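A minimal check of the Ch.Phr flag (illustrative; the chunk spans are hand-coded token offsets for the example sentence above):

```python
def same_chunk(chunks, pos1, pos2):
    # chunks: list of (tag, start, end) half-open token spans
    return any(start <= pos1 < end and start <= pos2 < end
               for _, start, end in chunks)

# "[NP Extracellular TcpQ] [VP is required] [PP for] [NP TCP biogenesis]"
chunks = [("NP", 0, 2), ("VP", 2, 4), ("PP", 4, 5), ("NP", 5, 7)]
print(same_chunk(chunks, 1, 5))  # TcpQ vs TCP: False (different NPs)
```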
Shallow Parse Tree Features
Subsequences within the SPS.LCS Inform the classifier about the subsequences Two versions (Zelenko et al, 2003) –Contiguous: based on all the subtrees with n edges. –Non-contiguous (sparse): based on subtrees that allow gaps.
Dependency Parse Features (Dep)
Semantic Role Labeling
Overlap with SRL Structures Features extracted from a sentence’s semantic role labeling (Harabagiu et al, 2005). The predicate argument number associated with the entity mention (A0, A1, A2, …). –E.g. Is an entity mention associated with role A1? The verb associated with the argument (e.g. be, require). –E.g. the verb “be” is associated with the entity mention E_i.
One Classifier or Many? Which is better: –One classifier for both detection and classification, or at least two? –If at least two, then one multi-class classifier across all relations, or many binary classifiers (one per relation)? –Current empirical evidence suggests: one classifier for detection; one classifier per relation for classification.
Miller et al’s example of semantic annotation
Hidden Markov Model-based One of the first statistical approaches applied to the task. Akin to using a stochastic version of the finite state automata successfully used in the FASTUS system. Efficient algorithms exist for: –learning the model’s parameters from word sequences –computing a sequence’s probability given the model –finding the highest-probability path through the model’s states. The challenge has been to include more features in the models. –(McCallum et al, 2000) include capitalization, formatting, and POS. –(Ray and Craven, 2001) added shallow-parse tree features. –(Skounakis et al, 2003) use hierarchical HMMs to represent syntax.
(Brin, 1998 and Agichtein and Gravano, 2000) DIPRE and Snowball use resampling with replacement. Snowball: 1. Uses NER and classification to better restrict the relation cases considered. 2. Uses word unigrams instead of the single feature of the whole word sequence. 3. Uses a discriminative algorithm. 4. Stops iterating based on a threshold on the Precision. Advantages: the word-based patterns that make up its classifier can be inspected by a domain expert. –E.g. {, }. Challenges: –more than six thresholds need to be manually set. –experimental evidence does not support its bootstrapping approach.
Hidden Markov Model-based (cont.) Train two HMM models: –A positive model (M+) from positive cases –A null model (M−) from negative ones. Given a test case sequence S, the probabilities P(M+ | S) and P(M− | S) are computed. Once the log-odds of the prior probability of the relation sequence is calculated, each test case's label is decided based on the log of the ratio of the two probabilities.
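With M+ and M− as placeholder names for the two models (the original symbols did not survive in this copy), the decision rule is the standard likelihood-ratio test:

```latex
\text{predict True} \iff
\log \frac{P(S \mid M^{+})}{P(S \mid M^{-})} \;>\; \theta,
\qquad \theta = \log\frac{P(\text{False})}{P(\text{True})}
```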
Hasegawa et al, 2004 Detect and classify all relation cases that require the same two argument types. E.g. R(PERSON, GEO-POLITICAL ENTITY) –CitizenOf(), PresidentOf(), EnemyOf() Approach: –Use hierarchical clustering and a cosine similarity function. Clusters correspond to cases of the same relation. A cluster can be described by a small set of words that frequently appear in the cluster.
Global Inference Approaches An alternative to the pipelined approach. Globally model all of the decisions in order to capture the mutual influences that exist among downstream decisions. An opportunity exists to repair incorrectly labeled entity mentions. For example, a typed relation detection algorithm could predict that an entity mention currently labeled “GENE” is likely incorrect because a relation case in which it participates requires that the argument be a “PROTEIN”. (Roth and Yih, 2002) propose using the dependencies between relation and entity mentions to repair their labels. –First, induce separate classifiers for entity detection and classification and for relation detection and classification. Any state-of-the-art supervised algorithm presented above could be used. –Next, perform global inference based on the conditional distributions of the two classifiers. (Miller et al, 2000) and (Shi et al, 2007) also perform global inference and report repairing a noticeable number of mislabeled entity mentions.
ACE
Grounding Entity Mentions to an Ontology This step is essential for information extraction, but can be cumbersome and difficult to automate. E.g. a biologist would likely require the protein sequence (e.g. MKQSTIALAL…), not the protein name (e.g. “alkaline phosphatase”). The sequence can be found in a master database such as Swiss-Prot, but at least two organisms have proteins with the same name. Idea: use the relation information to disambiguate between ontology entries.
Qualifying the Certainty of a Relation Case It would be useful to qualify the certainty that can be assigned to a relation mention. E.g. in the news domain, distinguish relation mentions based on first-hand information from those based on hearsay. Idea: add an additional label to each relation case that qualifies the certainty of the statement. E.g. in the PPLRE task, label cases with: “directly validated”, “indirectly validated”, “hypothesized”, and “assumed”.