
1 Discovering Relations among Named Entities from Large Corpora Takaaki Hasegawa*, Satoshi Sekine 1, Ralph Grishman 1 ACL 2004 * Cyberspace Laboratories, Nippon Telegraph and Telephone Corporation; 1 Dept. of Computer Science, New York University

2 Introduction Internet search engines cannot answer complicated questions such as “a list of recent mergers and acquisitions of companies” or “current leaders of nations from all over the world”. Information Extraction provides methods to extract information such as events and relations between entities, but it is domain dependent. The goal is to automatically discover useful relations among arbitrary entities in large text corpora.

3 Introduction Define a relation broadly as an affiliation, role, location, part-whole, social relationship and so on. Example of information that should be extracted: “George Bush (PERSON) was inaugurated as the president of the United States (GPE).” An unsupervised method needs neither richly annotated corpora nor initial seed instances for weakly supervised learning, since we cannot know the relations in advance. Only an NE tagger is needed, and recently developed NE taggers work quite well.

4 Prior Work Most approaches to the ACE RDC task involved supervised learning such as kernel methods, which need large annotated corpora. Some adopted a weakly supervised learning approach, but it is unclear how to choose the initial seeds and how many are needed.

5 Relation Discovery Overview Assume that pairs of entities occurring in similar contexts can be clustered, and that each pair in a cluster is an instance of the same relation. 1. Tag NEs in text corpora 2. Get co-occurring pairs of NEs and their contexts 3. Measure context similarities among pairs of NEs 4. Make clusters of pairs of NEs 5. Label each cluster of pairs of NEs. Run the NE tagger and collect all context words within a certain distance; if the context words of an A-B pair and a C-D pair are similar, the two pairs are placed into the same cluster (the same relation); in the running example, the relation is merger and acquisition.

6 Relation Discovery

7 NE Tagging Use the extended NE tagger (Sekine, 2001) to detect entities for useful relations. Collect the intervening words between two NEs for each co-occurrence. Two NEs are considered to co-occur if they appear within the same sentence and are separated by at most N intervening words. Different orders are considered different contexts; that is, e1…e2 and e2…e1 are collected as different contexts. Passive voice: collect the base forms of words as stemmed by a POS tagger, but verb past participles are distinguished from other verb forms. Less frequent pairs of NEs are eliminated by setting a frequency threshold.
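The co-occurrence collection step above can be sketched in Python on toy data. The pre-tagged sentence format (NEs as `(text, TYPE)` tuples mixed with plain tokens) is a hypothetical stand-in for the output of a real NE tagger, not the paper's actual data structures:

```python
# Sketch of context collection (slide 7): two NEs co-occur if they
# appear in the same sentence with at most N intervening words;
# e1...e2 and e2...e1 orders would be kept as distinct contexts.
# The pre-tagged sentence format is an illustrative assumption.

def collect_contexts(tagged_sentence, max_intervening=5):
    """tagged_sentence: list of tokens; NEs are (text, TYPE) tuples."""
    contexts = {}  # (e1, e2) -> list of intervening-word lists
    ne_positions = [i for i, tok in enumerate(tagged_sentence)
                    if isinstance(tok, tuple)]
    for a, i in enumerate(ne_positions):
        for j in ne_positions[a + 1:]:
            # keep only plain words between the two entity mentions
            between = [t for t in tagged_sentence[i + 1:j]
                       if not isinstance(t, tuple)]
            if len(between) <= max_intervening:
                e1, e2 = tagged_sentence[i], tagged_sentence[j]
                contexts.setdefault((e1, e2), []).append(between)
    return contexts

sent = [("IBM", "COM"), "agreed", "to", "acquire", ("Lotus", "COM"),
        "last", "week"]
ctx = collect_contexts(sent)
```

Here the single COM-COM pair is collected with the intervening words "agreed to acquire"; a real system would also stem the words and filter by the frequency threshold.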

8 Relation Discovery Calculate the similarity between the sets of contexts of NE pairs, using the vector space model and cosine similarity. Only compare NE pairs of the same types, e.g., one PERSON-GPE pair with another PERSON-GPE pair. Eliminate stop words, words in parallel expressions, and expressions peculiar to particular source documents. A context vector for each NE pair consists of the bag of words formed from all intervening words from all co-occurrences of the two NEs. To account for the different orders: if a word wi occurred L times in e1…e2 contexts and M times in e2…e1 contexts, the tf of wi is defined as L − M. If the norm |α| of a context vector is small due to the lack of context words, the similarity might be unreliable, so a threshold is defined to eliminate short context vectors.
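A minimal sketch of the direction-signed term frequency (tf = L − M) and the cosine similarity described above, on toy contexts; the example words are illustrative, not from the paper:

```python
# Context vectors with direction-signed tf (slide 8): a word counts
# +1 per occurrence in e1...e2 contexts and -1 per occurrence in
# e2...e1 contexts, so tf = L - M.
import math
from collections import Counter

def context_vector(forward_contexts, backward_contexts):
    tf = Counter()
    for words in forward_contexts:
        tf.update(words)        # e1...e2 occurrences count positively
    for words in backward_contexts:
        tf.subtract(words)      # e2...e1 occurrences count negatively
    return dict(tf)

def cosine(u, v):
    dot = sum(u.get(w, 0) * v.get(w, 0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    # a small norm would be caught by the vector-length threshold;
    # here we just guard against division by zero
    return dot / (nu * nv) if nu and nv else 0.0

a = context_vector([["acquire"], ["buy"]], [["unit", "of"]])
b = context_vector([["acquire"], ["acquire"]], [])
```

With these toy vectors, `cosine(a, b)` comes out to 0.5; a real system would additionally drop vectors whose norm falls below the threshold.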

9 Relation Discovery We can cluster the NE pairs based on the similarity among their context vectors. We do not know the number of clusters in advance, so we adopt hierarchical clustering, using complete linkage. Label the cluster with the most frequent word across all combinations of the NE pairs in the same cluster; the frequencies are normalized.
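Complete-linkage agglomerative clustering with a similarity cut-off can be sketched as follows. This is a naive O(n³) illustration with a toy similarity table, not the paper's implementation:

```python
# Complete-linkage agglomerative clustering (slide 9): the linkage
# similarity of two clusters is the MINIMUM pairwise similarity, so
# two clusters merge only while every cross-pair is above threshold.

def complete_linkage_clusters(items, sim, threshold):
    clusters = [[x] for x in items]
    while True:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                link = min(sim(a, b) for a in clusters[i]
                                     for b in clusters[j])
                if link > best:       # most similar mergeable pair
                    best, pair = link, (i, j)
        if pair is None:              # nothing left above threshold
            return clusters
        i, j = pair
        clusters[i] += clusters.pop(j)

# Toy similarity table: a, b, c are mutually similar; x is an outlier.
S = {frozenset("ab"): 0.9, frozenset("ac"): 0.8, frozenset("bc"): 0.85}
sim = lambda p, q: S.get(frozenset((p, q)), 0.1)
clusters = complete_linkage_clusters(list("abcx"), sim, 0.5)
```

With a threshold of 0.5 this yields two clusters, {a, b, c} and {x}; in the paper `sim` would be the cosine over context vectors of NE pairs of the same type.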

10 Experiments Experiment with one year of The New York Times (1995) as the corpus. Maximum context word length: 5 words. Frequency threshold: 30. Use the patterns “,.*,”, “and” and “or” for parallel expressions, and “) --” as peculiar to The New York Times. Stop words include symbols and words which occurred more than 100,000 times.

11 Experiments The data set was analyzed manually to identify the relations for two domains. PERSON-GPE: 177 distinct pairs, 38 classes (relations). COMPANY-COMPANY: 65 distinct pairs, 10 classes.

12 Evaluation Errors in NE tagging were eliminated so as to evaluate correctly. For each cluster, determine the relation R (major relation) of the cluster as the most frequently represented relation. NE pairs with relation R in a cluster whose major relation is R were counted as correct. Ncorrect is defined as the total # of correct pairs in all clusters; Nincorrect as the total # of incorrect pairs in all clusters; Nkey as the total # of pairs manually classified. Precision = Ncorrect / (Ncorrect + Nincorrect); recall = Ncorrect / Nkey.
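The cluster-level evaluation above can be sketched directly; each cluster is represented here just by the gold relation labels of its NE pairs, and the toy labels are illustrative:

```python
# Evaluation scheme of slide 12: a pair is correct if its relation
# equals its cluster's major (most frequent) relation; recall is
# taken against all manually classified pairs (n_key), including
# pairs that were never clustered.
from collections import Counter

def evaluate(clusters, n_key):
    n_correct = n_incorrect = 0
    for labels in clusters:
        major, count = Counter(labels).most_common(1)[0]
        n_correct += count
        n_incorrect += len(labels) - count
    precision = n_correct / (n_correct + n_incorrect)
    recall = n_correct / n_key
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f

# Two toy clusters over 5 manually classified pairs (one unclustered).
p, r, f = evaluate([["M&A", "M&A", "parent"], ["alliance"]], n_key=5)
```

Here precision is 3/4 and recall is 3/5, giving an F-measure of 2/3.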

13 Evaluation These values vary depending on the threshold of cosine similarity. The best F-measure was 82 in the PER-GPE domain and 77 in the COM-COM domain, both found near a cosine similarity threshold of 0. Generally it is difficult to determine the threshold in advance. [Graphs of precision (P), recall (R) and F-measure (F) versus the cosine similarity threshold for the two domains]

14 Evaluation We also investigated each cluster with the threshold set just above 0: 34 PER-GPE clusters and 15 COM-COM clusters, with F-measures of 80 and 75, very close to the best. The larger clusters for each domain, and the ratio of the # of pairs bearing the major relation to the total # of pairs, are shown.

15 Evaluation If two NE pairs in a cluster share a particular context word, they are considered to be linked (with respect to this word). The relative frequency of a word is the # of such links relative to the maximal possible number of links (N(N−1)/2 for a cluster of N pairs). If the relative frequency is 1.0, the word is shared by all NE pairs. The frequent common words can be regarded as suitable labels for the relations.
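The link-counting measure above is easy to sketch; each NE pair in the cluster is represented here simply by the set of its context words, and the toy words are illustrative:

```python
# Relative frequency of a context word (slide 15): two pairs are
# "linked" w.r.t. a word if both of their context-word sets contain
# it; the score is links / (N*(N-1)/2) for a cluster of N pairs.

def relative_frequency(word, pair_context_words):
    n = len(pair_context_words)
    has = [word in ws for ws in pair_context_words]
    links = sum(has[i] and has[j]
                for i in range(n) for j in range(i + 1, n))
    return links / (n * (n - 1) / 2)

pairs = [{"acquire", "buy"}, {"acquire", "buy"}, {"acquire", "unit"}]
full = relative_frequency("acquire", pairs)     # shared by all pairs
partial = relative_frequency("buy", pairs)      # shared by two of three
```

"acquire" links all three pairs (score 1.0) and would make a good cluster label, while "buy" yields only one of the three possible links (score 1/3).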

16 Discussion The performance was a little higher in the PER-GPE domain, perhaps because there were more NE pairs with high similarity. The COM-COM domain was more difficult to judge due to the similarity of relations; a pair of companies in an M&A relation might also subsequently appear in a parent relation. Asymmetric properties caused further difficulties in the COM-COM domain: in determining the similarity of A→B with C→D versus A→B with D→C, sometimes the wrong correspondence ends up being favored.

17 Discussion The main reason for undetected or mis-clustered NE pairs is the absence of common words in the pairs’ contexts which explicitly represent the particular relations. Mis-clustered NE pairs were clustered on accidental words. Outer context words may be helpful, but extending the context in this way has to be carefully evaluated.

18 Discussion We tried single linkage and average linkage as well; the best F-measure was obtained with complete linkage. The best threshold differs for single and average linkage. A best threshold just above 0 means that each pair in a cluster shares at least one word in common. Sometimes the less frequent pairs might be valuable, and one way to address this defect would be through bootstrapping.

19 Conclusion The key idea is to cluster pairs of NEs according to the similarity of the context words intervening between them. Experiments show that not only could the relations be detected with high recall and precision, but labels could also be provided automatically. We are planning to discover less frequent pairs of NEs by combining the method with bootstrapping.

