Natural Language Processing Group, Department of Computer Science, University of Sheffield, UK
Improving Semi-Supervised Acquisition of Relation Extraction Patterns
Mark A. Greenwood, Mark Stevenson
22/07/06 Information Extraction Beyond the Document
Introduction and Overview
Recently a number of semi-supervised approaches to acquiring Information Extraction (IE) patterns have been reported.
- Many of these approaches use iterative algorithms to learn new patterns from a small seed set.
- These approaches tend to be limited by their use of simplistic pattern representations, such as subject-verb-object (Yangarber et al., 2000).
Other approaches to IE have used pattern representations derived from dependency trees:
- Sudo et al. (2003) used patterns consisting of a path from a verb to any of its descendants (direct or indirect), known as the chain model.
- Bunescu and Mooney (2005) suggest the shortest path between the items being related.
Introduction and Overview
These more complex pattern models:
- are capable of representing more of the information present in text;
- require more complex methods of determining similarity between patterns, which limits their use.
We present a structural similarity measure inspired by kernel methods used in non-iterative learning algorithms (Culotta and Sorensen, 2004).
This allows us to use more complex pattern models while retaining the semi-supervised iterative approach to acquiring new extraction patterns.
Learning Extraction Patterns
Iterative Learning Algorithm
1. Begin with a set of seed patterns which are known to be good extraction patterns.
2. Compare every other pattern with the ones known to be good.
3. Choose the highest scoring of these and add them to the set of good patterns.
4. Stop if enough patterns have been learned, else repeat from step 2.
[Diagram: Seeds, Candidates, Rank, Patterns]
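The four steps above can be sketched as a short loop. This is only an illustrative sketch, not the authors' implementation: the function names `bootstrap` and `jaccard`, the parameters `per_iteration` and `max_patterns`, and the choice to score a candidate by its best match against the accepted set are all assumptions made for the example. Patterns are stood in for by sets of words.

```python
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")


def bootstrap(
    seeds: Sequence[P],
    candidates: Sequence[P],
    similarity: Callable[[P, P], float],
    per_iteration: int = 4,
    max_patterns: int = 20,
) -> List[P]:
    """Grow a set of accepted patterns from seeds (hypothetical helper)."""
    if not seeds:
        raise ValueError("at least one seed pattern is required")
    accepted = list(seeds)  # step 1: start from the seeds
    remaining = [c for c in candidates if c not in accepted]
    while remaining and len(accepted) < max_patterns:  # step 4: stopping test
        # Step 2: compare every remaining pattern with the accepted ones.
        # Assumption: a candidate's score is its best match in the accepted set.
        scored = [(max(similarity(c, a) for a in accepted), c) for c in remaining]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Step 3: accept the highest-scoring candidates.
        room = max_patterns - len(accepted)
        chosen = [c for _, c in scored[: min(per_iteration, room)]]
        accepted.extend(chosen)
        remaining = [c for c in remaining if c not in chosen]
    return accepted


def jaccard(a: frozenset, b: frozenset) -> float:
    """Toy similarity between two word sets, used only for this example."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


seeds = [frozenset({"company", "appoint", "person"})]
candidates = [
    frozenset({"stock", "rise"}),
    frozenset({"person", "resign"}),
    frozenset({"company", "name", "person"}),
]
learned = bootstrap(seeds, candidates, jaccard, per_iteration=1, max_patterns=2)
```

Any pattern model and similarity function can be plugged in through the `similarity` argument, which is the point of separating the algorithm from the pattern representation.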
Learning Extraction Patterns
For each IE task, such an algorithm requires:
- unannotated text from which to acquire patterns;
- a small set of representative seed patterns.
Independent of the IE task, this iterative algorithm requires:
- an extraction pattern model;
- a measure of how similar two patterns are to each other.
Extraction Patterns
The linked chain model (Greenwood et al., 2005) is used as the pattern representation.
Structural Similarity Measure
This similarity measure is inspired by the tree kernel proposed by Culotta and Sorensen (2004).
It compares patterns by following their structure from the root nodes through the patterns until they diverge too far to be considered similar.
Each node n in a pattern has three features:
- the word, n_word;
- the relation to its parent, n_reln;
- the part-of-speech (POS) tag, n_pos.
Nodes can be compared by examining these features and by the semantic similarity between words.
Structural Similarity Measure
A set of four functions F = {word, relation, pos, semantic} is used to compare nodes.
- The first three correspond to the node features of the same name and return 1 if the value of the feature is equal for the two nodes and 0 otherwise.
- For example, the pos function compares the values of the POS feature for nodes n1 and n2: it returns 1 if the two POS tags are equal and 0 otherwise.
- The semantic function returns a value between 0 and 1 to signify the semantic similarity of the lexical items represented by the two nodes. We compute this using the WordNet (Fellbaum, 1998) similarity function introduced by Lin (1998).
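A minimal sketch of the four comparison functions follows. The `Node` class, the function names and the tag and relation strings are hypothetical. The `semantic` function here is only a stand-in that returns 1.0 for identical words and 0.0 otherwise; the measure described above uses Lin's WordNet-based similarity instead, which is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Node:
    word: str  # lexical item at this node
    reln: str  # dependency relation to the parent node
    pos: str   # part-of-speech tag


def word(n1: Node, n2: Node) -> float:
    return 1.0 if n1.word == n2.word else 0.0


def relation(n1: Node, n2: Node) -> float:
    return 1.0 if n1.reln == n2.reln else 0.0


def pos(n1: Node, n2: Node) -> float:
    return 1.0 if n1.pos == n2.pos else 0.0


def semantic(n1: Node, n2: Node) -> float:
    # Stand-in only: a real system would return a graded value in [0, 1]
    # from a WordNet-based measure such as Lin's.
    return 1.0 if n1.word == n2.word else 0.0


# The function set F, keyed by name.
F: Dict[str, Callable[[Node, Node], float]] = {
    "word": word,
    "relation": relation,
    "pos": pos,
    "semantic": semantic,
}

appoint = Node(word="appoint", reln="root", pos="V")
elect = Node(word="elect", reln="root", pos="V")
scores = {name: f(appoint, elect) for name, f in F.items()}
```

With the stand-in `semantic` function, `appoint` and `elect` agree only on relation and POS tag; a graded semantic function would also give them partial credit for being related verbs.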
Structural Similarity Measure
The similarity of two nodes is zero if their POS tags are different; otherwise it is simply the sum of the scores provided by the four functions in F.
The similarity of a pair of linked chains l1 and l2 is defined recursively from the similarity of their root nodes r1 and r2 and the similarity of those nodes' children, where C_r denotes the set of children of node r.
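One way to write these two definitions down is sketched below. This is a reconstruction from the prose on this slide, not a formula copied from it. The symbol names $s$, $\mathrm{sim}$ and $\mathrm{sim}_c$ are labels introduced here, and the rule that chain similarity is zero when the roots score zero is an assumption based on the earlier remark that comparison stops once patterns diverge too far.

```latex
% Node similarity: zero when the POS tags differ, otherwise the sum over F.
s(n_1, n_2) =
  \begin{cases}
    0 & \text{if } \mathrm{pos}(n_1, n_2) = 0 \\
    \sum_{f \in F} f(n_1, n_2) & \text{otherwise}
  \end{cases}

% Chain similarity: root score plus a term comparing the roots' children.
\mathrm{sim}(l_1, l_2) =
  \begin{cases}
    0 & \text{if } s(r_1, r_2) = 0 \\
    s(r_1, r_2) + \mathrm{sim}_c(C_{r_1}, C_{r_2}) & \text{otherwise}
  \end{cases}
```

Here $\mathrm{sim}_c$ stands for the part of the measure that compares child nodes; its exact form is not fixed by this sketch.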
Structural Similarity Measure
The final part of the measure calculates the similarity between the child nodes of n1 and n2.
Because only the root nodes of the patterns have multiple children, in all but the first application this reduces to the similarity between the single child of n1 and the single child of n2.
As the maximum similarity between two nodes is 4, we normalise by dividing the score by 4 times the size (in nodes) of the larger pattern to remove length bias.
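Putting the pieces together, the whole measure might be sketched as follows. Several details are assumptions rather than facts from the slides: the class and function names, the exact-match stand-in for the semantic function, stopping the recursion when two nodes score zero, and comparing the children of two nodes pairwise and summing the results.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    word: str
    reln: str
    pos: str
    children: List["Node"] = field(default_factory=list)


def semantic(w1: str, w2: str) -> float:
    # Stand-in for a WordNet-based similarity in [0, 1].
    return 1.0 if w1 == w2 else 0.0


def node_score(n1: Node, n2: Node) -> float:
    """Zero if the POS tags differ, otherwise the sum of the four functions."""
    if n1.pos != n2.pos:
        return 0.0
    return (
        float(n1.word == n2.word)
        + float(n1.reln == n2.reln)
        + 1.0  # the pos function, which must be 1 at this point
        + semantic(n1.word, n2.word)
    )


def chain_score(n1: Node, n2: Node) -> float:
    """Unnormalised similarity, following both patterns down from n1 and n2."""
    score = node_score(n1, n2)
    if score == 0.0:
        return 0.0  # assumption: stop once the nodes are too dissimilar
    # Assumption: children are compared pairwise and the scores summed.
    return score + sum(
        chain_score(c1, c2) for c1 in n1.children for c2 in n2.children
    )


def size(n: Node) -> int:
    return 1 + sum(size(c) for c in n.children)


def structural_similarity(p1: Node, p2: Node) -> float:
    """Normalise by 4 times the size (in nodes) of the larger pattern."""
    return chain_score(p1, p2) / (4.0 * max(size(p1), size(p2)))


appoint = Node("appoint", "root", "V", [Node("PERSON", "obj", "N")])
elect = Node("elect", "root", "V", [Node("PERSON", "obj", "N")])
same = structural_similarity(appoint, appoint)
close = structural_similarity(appoint, elect)
```

In this toy case the two patterns differ only in the root verb, so the root contributes 2 of a possible 4 and the shared child contributes 4, giving 6 out of a maximum of 8.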
Experiments - Overview
We used the similarity measure in the iterative algorithm described earlier.
- The four highest-scoring patterns are accepted at each iteration,
- but only if their score is within 0.95 of that of the highest-scoring pattern.
We compare this approach with our previous approach (Stevenson and Greenwood, 2005), which is based on the vector space model and cosine similarity.
Three separate configurations:
- Cosine (SVO): uses the SVO model with the cosine similarity measure.
- Cosine (Linked Chains): same as above but uses linked chains.
- Structural (Linked Chains): uses linked chain patterns with the new structural similarity measure.
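The acceptance rule can be illustrated with a small helper. This is a sketch under one stated assumption: "within 0.95 of the highest scoring pattern" is read as a score of at least 0.95 times the best score. The function name, parameter names and pattern labels are hypothetical.

```python
from typing import Dict, List


def select_patterns(
    scores: Dict[str, float], top_n: int = 4, ratio: float = 0.95
) -> List[str]:
    """Return up to top_n patterns whose score is at least ratio * best score."""
    if not scores:
        return []
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_score = ranked[0][1]
    return [
        pattern for pattern, score in ranked[:top_n] if score >= ratio * best_score
    ]


candidate_scores = {"a": 0.80, "b": 0.79, "c": 0.70, "d": 0.60, "e": 0.50}
accepted = select_patterns(candidate_scores)
```

With these toy scores the cut-off is 0.76, so only two of the four top-ranked candidates are accepted. The threshold keeps an iteration from admitting weak patterns just to fill the quota of four.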
Experiments - IE Scenario
We use the data from the MUC-6 management succession task.
- We use a sentence-level version produced by Soderland (1999).
- This corpus contains four types of relation: Person-Person, Person-Post, Person-Organisation, and Post-Organisation.
At each iteration of the algorithm, related items recognised by the current set of acquired patterns are extracted and evaluated.
The texts have been previously annotated with named entities, and MINIPAR is used to produce the dependency analysis.
Experiments – Seed Patterns
COMPANY subj— appoint —obj PERSON
COMPANY subj— elect —obj PERSON
COMPANY subj— promote —obj PERSON
PERSON subj— resign
PERSON subj— depart
PERSON subj— quit
These seeds were chosen due to their use in previously reported work.
- No tuning of this set was performed.
- It should be noted that they do not contain the Person-Post or Post-Organisation relations.
Results and Analysis
- Measured by F-measure, the seed patterns alone achieve P=0.833, R=0.022.
- Cosine similarity performs poorly irrespective of the pattern model.
- Linked chains perform better than SVO under this similarity measure, which suggests the model is inherently superior.
- The best result is the combination of linked chains and the structural similarity measure, with P=0.434, R=0.265 after 190 iterations.
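For reference, an F-measure can be derived from the precision and recall figures quoted above. The sketch below assumes the balanced F-measure (F1, the harmonic mean of precision and recall); the slide does not state which weighting was used, so the resulting values are derived here rather than quoted from the source.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


seed_f = f_measure(0.833, 0.022)  # seed patterns only
best_f = f_measure(0.434, 0.265)  # structural measure with linked chains

print(f"seeds: {seed_f:.3f}")  # seeds: 0.043
print(f"best:  {best_f:.3f}")  # best:  0.329
```

The seed patterns are precise but find very few relations, so their F-measure is dominated by the low recall. The learned pattern set gives up precision for a much larger gain in recall.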
Results and Analysis
[Table: precision (P), recall (R) and F-measure (F) at each iteration (#) for Cosine (SVO), Cosine (Linked) and Structural (Linked)]
Conclusions
- The results show that semi-supervised approaches to IE pattern acquisition benefit from the use of more expressive extraction pattern models. Using linked chains resulted in better performance than using SVO, even when using the same similarity measure.
- Similarity measures (such as kernel methods) developed for supervised learning can be adapted and applied to semi-supervised approaches. Future work should look at other similarity functions used in supervised learning to see if they can also be adapted for use with semi-supervised approaches.
- The structural similarity measure introduced here outperforms a previously proposed method based on cosine similarity and a vector space representation.
Any Questions?
Bibliography
Razvan Bunescu and Raymond Mooney. 2005. A Shortest Path Dependency Kernel for Relation Extraction. In Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Vancouver, B.C.
Aron Culotta and Jeffrey Sorensen. 2004. Dependency Tree Kernels for Relation Extraction. In 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain.
Christiane Fellbaum, editor. 1998. WordNet: An Electronic Lexical Database and some of its Applications. MIT Press, Cambridge, MA.
Dekang Lin. 1998. An Information-Theoretic Definition of Similarity. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML-98), Madison, Wisconsin.
Dekang Lin. MINIPAR: A Minimalist Parser. In Maryland Linguistics Colloquium, University of Maryland, College Park.
Stephen Soderland. 1999. Learning Information Extraction Rules for Semi-Structured and Free Text. Machine Learning, 31(1-3).
Mark Stevenson and Mark A. Greenwood. 2005. A Semantic Approach to IE Pattern Induction. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, Ann Arbor, MI.
Kiyoshi Sudo, Satoshi Sekine, and Ralph Grishman. 2003. An Improved Extraction Pattern Representation Model for Automatic IE Pattern Acquisition. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL-03), Sapporo, Japan.
Roman Yangarber, Ralph Grishman, Pasi Tapanainen, and Silja Huttunen. 2000. Automatic Acquisition of Domain Knowledge for Information Extraction. In Proceedings of the 18th International Conference on Computational Linguistics (COLING 2000), Saarbrücken, Germany.