4. Relationship Extraction (Part 4 of Information Extraction, Sunita Sarawagi)
9/7/2012, CS 652, Peter Lindes

2 The Problem
Relate extracted entities
– Unstructured text is not partitioned into records
Various competitions:
– MUC
– ACE
– BioCreAtIvE II Protein-Protein Interaction

3 Groups of Relationships
ACE:
– located at, near, part, role, social
– for entities: person, organization, facility, location, and geo-political entity
Biomedical: gene-disease, protein-protein, subcellular regularizations
NAGA knowledge base: 26 relationships such as isA, bornInYear, establishedInYear, hasWonPrize, locatedIn, politicianOf, …

4 Three Problem Levels
First case:
– Entities are pre-identified in unstructured text
– Given a pair of entities, find the type of relationship between them
Second case:
– Given a relationship type r and an entity name e
– Extract the entities with which e has relationship r
Third case:
– Open-ended corpus, such as the web
– Given a relationship type r, find the entity pairs that have relationship r

5 Given Entity Pair, Find Relationship
R: the set of relationship types
R ∪ {other}: R plus a special member “other” for pairs with none of the relationships in R
x: a “snippet” of text (possibly a single sentence)
E1 and E2: two entities in x
Goal: identify the relationship between E1 and E2
Resources available:
– Surface tokens
– Part-of-speech tags
– Syntactic parse tree structure
– Dependency graph
Use these clues to classify (x, E1, E2) into one of R ∪ {other}
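The classification setup above can be made concrete with a small sketch. It assumes the snippet x is already tokenized and that E1 and E2 are given as token spans; `RelationInstance`, `word_sequence_features`, and the keyword-rule `classify` are hypothetical names, and the rule-based classifier is only a toy stand-in for a trained model. It shows how surface-token clues (the entity words and the words between them) feed a decision over R ∪ {other}. Requires Python 3.9 or later for the built-in generic annotations.

```python
from dataclasses import dataclass


@dataclass
class RelationInstance:
    """One classification instance (x, E1, E2)."""
    tokens: list[str]        # the snippet x, already tokenized
    e1: tuple[int, int]      # token span [start, end) of E1
    e2: tuple[int, int]      # token span [start, end) of E2


def label_set(relation_types: list[str]) -> list[str]:
    """R plus the special member "other"."""
    return list(relation_types) + ["other"]


def word_sequence_features(inst: RelationInstance) -> set[str]:
    """Surface-token clues: the words of each entity and the words between them."""
    (_, end_first), (start_second, _) = sorted([inst.e1, inst.e2])
    feats = {f"e1={w.lower()}" for w in inst.tokens[inst.e1[0]:inst.e1[1]]}
    feats |= {f"e2={w.lower()}" for w in inst.tokens[inst.e2[0]:inst.e2[1]]}
    feats |= {f"between={w.lower()}" for w in inst.tokens[end_first:start_second]}
    return feats


def classify(inst: RelationInstance, keyword_rules: dict[str, str],
             relation_types: list[str]) -> str:
    """Toy classifier (hypothetical): first rule whose keyword occurs between E1 and E2."""
    labels = label_set(relation_types)
    feats = word_sequence_features(inst)
    for keyword, relation in keyword_rules.items():
        if f"between={keyword}" in feats and relation in labels:
            return relation
    return "other"
```

For "Alice works for Acme Corp" with E1 = "Alice" and E2 = "Acme Corp", the between-words are "works" and "for", so a rule mapping "works" to the ACE type "role" yields "role"; with no matching rule the instance falls into "other". A real system would replace `classify` with a learned model over the same kind of features.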

6 Parse Tree

7 Dependency Graph

8 Methods to Extract Relationships
Feature-based methods
– String form, orthographic type, POS tag, etc.
– Features from the dependency graph
– Features from the word sequence
– Features from parse trees
Kernel-based methods
– A kernel function K(X, X’) captures the similarity between instances X and X’
– Used with a Support Vector Machine (SVM) classifier
Rule-based methods
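To illustrate how a dependency-graph feature and a kernel K(X, X’) fit together, here is a sketch in the style of the shortest-path dependency kernel of Bunescu and Mooney; it is one possible kernel, not necessarily the one the slide has in mind. It assumes the dependency graph is given as (head, dependent) token-index pairs and that each token has a POS tag; the function names are hypothetical. The kernel is zero for paths of different length and otherwise multiplies, position by position, the number of shared features.

```python
from collections import deque


def shortest_dependency_path(edges, start, goal):
    """BFS over the dependency graph, ignoring edge direction.

    edges: (head, dependent) token-index pairs. Returns [tok, arrow, tok, ...] where
    "->" means the step moves from a dependent up to its head and "<-" the reverse,
    or None if the two tokens are not connected.
    """
    adj = {}
    for head, dep in edges:
        adj.setdefault(dep, []).append((head, "->"))
        adj.setdefault(head, []).append((dep, "<-"))
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt, arrow in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = (node, arrow)
                queue.append(nxt)
    if goal not in prev:
        return None
    path, node = [goal], goal
    while prev[node] is not None:
        parent, arrow = prev[node]
        path = [parent, arrow] + path
        node = parent
    return path


def path_features(path, tokens, pos_tags):
    """Feature set per path position: {word, POS} for tokens, {arrow} for edges."""
    return [{item} if i % 2 else {tokens[item].lower(), pos_tags[item]}
            for i, item in enumerate(path)]


def shortest_path_kernel(fx, fy):
    """K(X, X') = product of shared-feature counts per position; 0 if lengths differ."""
    if len(fx) != len(fy):
        return 0
    k = 1
    for a, b in zip(fx, fy):
        k *= len(a & b)
    return k
```

For "protesters seized stations" and "troops raided churches" with identical parses, the paths share only the POS tags and arrows, so K = 1; a path compared with itself scores higher (8 here) because the words also match. A kernel like this can be passed to an SVM that accepts precomputed Gram matrices.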

9 Given Relationship, Find Entity Pairs
Given one or more relationship types
Find all their occurrences in a corpus
– Open document collection
– No labeled unstructured training data
– Instead, seed data is provided for each relationship type

10 Seed Data for Relationship Type r
The types of entities that are arguments of r
– Often specified at a high level, e.g., proper noun, common noun, numeric, etc.
– Types such as “Person” or “Company” require patterns to recognize them
A seed database S of entity pairs that have relationship r
– May include negative examples
A seed set of manually coded patterns
– Easy for generic relationships, e.g., hypernym or meronym (part-of)

11 3 Steps for Relationship Extraction
Start with the seed data above:
– A corpus D
– Relationship types r1, …, rk
– Entity types Tr1, Tr2 for each r
– A set S of examples (Ei1, Ei2, ri), 1 ≤ i ≤ N
1: Use S to learn extraction patterns M
2: Use a subset of the patterns to create candidate entity pairs
3: Validation: select a subset of the candidates based on statistical tests
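The three steps can be sketched end to end under heavy simplifying assumptions: a pattern is just the literal text between E1 and E2 in a sentence that contains a seed pair, entities are single capitalized words, and validation is a plain occurrence-count threshold. The function names, `top_k`, and `min_count` are hypothetical, and the company names in the example are fictional; real systems use far richer patterns and statistics.

```python
import re
from collections import Counter


def learn_patterns(corpus, seeds):
    """Step 1: the text between E1 and E2 in sentences containing a seed pair."""
    patterns = Counter()
    for sent in corpus:
        for e1, e2 in seeds:
            i, j = sent.find(e1), sent.find(e2)
            if i != -1 and j > i:
                middle = sent[i + len(e1):j].strip()
                if middle:
                    patterns[middle] += 1
    return patterns


def extract_candidates(corpus, patterns, top_k):
    """Step 2: apply the top_k most frequent patterns to propose entity pairs."""
    candidates = Counter()
    for middle, _ in patterns.most_common(top_k):
        # Crude entity recognizer (assumption): a single capitalized word.
        regex = re.compile(r"\b([A-Z]\w+) " + re.escape(middle) + r" ([A-Z]\w+)\b")
        for sent in corpus:
            for e1, e2 in regex.findall(sent):
                candidates[(e1, e2)] += 1
    return candidates


def validate(candidates, min_count):
    """Step 3: keep candidate pairs seen at least min_count times in the corpus."""
    return {pair for pair, n in candidates.items() if n >= min_count}
```

With the single seed ("Acme", "Initech") for “Acquired”, step 1 learns the pattern "acquired", step 2 proposes ("Globex", "Hooli") from two sentences, and step 3 with `min_count=2` keeps only that pair. Feeding validated pairs back into the seeds turns this into a bootstrapping loop.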

12 Example Data
Relationships: “IsPhDAdvisorOf”, “Acquired”
Entity types: “(Person, Person)”, “(Company, Company)”
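The example relationships and their argument types can be held in a small schema next to the seed database S. This sketch assumes an entity-type lookup table is already available (in practice it would come from a named-entity recognizer); `Seed`, `SCHEMA`, and `type_consistent` are hypothetical names, and the entity names are fictional.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Seed:
    """One seed example (E1, E2, r)."""
    e1: str
    e2: str
    relation: str
    positive: bool = True   # seed databases may also contain negative examples


# Argument types (Tr1, Tr2) for each relationship type r, from the example data.
SCHEMA = {
    "IsPhDAdvisorOf": ("Person", "Person"),
    "Acquired": ("Company", "Company"),
}


def type_consistent(seed, entity_types):
    """True if the seed's arguments have the entity types its relationship requires."""
    t1, t2 = SCHEMA[seed.relation]
    return entity_types.get(seed.e1) == t1 and entity_types.get(seed.e2) == t2
```

A seed such as ("Acme", "Bob", "Acquired") fails the check when Bob is typed as a Person, which is the same type filter applied to retrieved sentences in the pattern-learning step.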

13 Learn Patterns from Seed Triples
Assume each entity pair has only one relationship
– Thus each positive example for r is a negative example for every other type r’
1: Find sentences containing the entity pairs
– For (E1, E2, r), query for “E1 NEAR E2”
– Filter out sentences where E1, E2 don’t match types Tr1, Tr2
2: Filter the sentences for the relationship
3: Learn patterns from the sentences
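Step 1 and the one-relationship assumption can be sketched as follows. The “E1 NEAR E2” query is emulated with a token-distance window over an in-memory list of sentences rather than a search engine; whitespace tokenization, single-token entities, and the default `window=5` are assumptions, and all function names are hypothetical.

```python
def tokenize(sentence):
    """Whitespace tokenization with surrounding punctuation stripped (a simplification)."""
    return [t.strip(".,;:!?") for t in sentence.split()]


def near(tokens, e1, e2, window):
    """Emulate an "E1 NEAR E2" query: both entities occur within `window` tokens."""
    pos1 = [i for i, t in enumerate(tokens) if t == e1]
    pos2 = [i for i, t in enumerate(tokens) if t == e2]
    return any(abs(i - j) <= window for i in pos1 for j in pos2)


def training_examples(sentences, seeds, relation_types, entity_types, schema, window=5):
    """Turn seed triples (E1, E2, r) into (sentence, E1, E2, relation, is_positive)."""
    examples = []
    for e1, e2, r in seeds:
        # Filter out seeds whose arguments do not match (Tr1, Tr2).
        if (entity_types.get(e1), entity_types.get(e2)) != schema[r]:
            continue
        for sent in sentences:
            if not near(tokenize(sent), e1, e2, window):
                continue
            examples.append((sent, e1, e2, r, True))
            # One relationship per pair: the same sentence is negative for every r' != r.
            examples.extend((sent, e1, e2, other, False)
                            for other in relation_types if other != r)
    return examples
```

Every retained sentence yields one positive example for its seed relationship and one negative example for each other relationship type, which is how the assumption on this slide produces negative training data without manual labeling.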

14 Filtering Sentences
Example: [figure not preserved]
Banko: a simple heuristic based on the length of dependency links
– This heuristic fails for the example above
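A length-based dependency heuristic of the kind attributed to Banko can be sketched as: keep a sentence only if the two entities are connected by a chain of at most some fixed number of dependency links. The bound `max_links=4` is a hypothetical value, the graph is assumed to be given as (head, dependent) token-index pairs, and the function names are illustrative.

```python
from collections import deque


def dependency_distance(edges, a, b):
    """Number of dependency links on the shortest path between tokens a and b.

    Edge direction is ignored; returns None if the tokens are not connected.
    """
    adj = {}
    for head, dep in edges:
        adj.setdefault(head, set()).add(dep)
        adj.setdefault(dep, set()).add(head)
    dist = {a: 0}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return dist[node]
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


def keep_sentence(edges, e1, e2, max_links=4):
    """Keep the sentence if E1 and E2 are within max_links dependency links."""
    d = dependency_distance(edges, e1, e2)
    return d is not None and d <= max_links
```

The weakness the slide points out follows directly from this form: a sentence that does express the relationship through a long syntactic chain is discarded, because the filter looks only at path length and not at what lies on the path.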

15 Learn Patterns from Sentences
Formulate as a standard classification problem
Two practical problems:
– No guarantee that a sentence containing a positive pair actually expresses the relationship
Bunescu and Mooney: use an SVM
– Many sentences for each pair
Bunescu and Mooney: down-weight correlated terms
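The "many sentences for each pair" problem can be seen in term statistics: a pair that appears in many sentences contributes many correlated copies of the same terms. The sketch below is a simpler stand-in for the Bunescu and Mooney remedy, not their method: it gives each pair's sentences a combined weight of 1, so no single pair dominates the counts. The function names and the whitespace tokenization are assumptions.

```python
from collections import Counter, defaultdict


def per_pair_weights(examples):
    """examples: list of (pair, sentence). Each pair's sentences share a total weight of 1."""
    counts = Counter(pair for pair, _ in examples)
    return [1.0 / counts[pair] for pair, _ in examples]


def weighted_term_counts(examples):
    """Term frequencies where each sentence contributes its per-pair weight."""
    totals = defaultdict(float)
    for (pair, sentence), w in zip(examples, per_pair_weights(examples)):
        for term in set(sentence.lower().split()):
            totals[term] += w
    return dict(totals)
```

With two sentences for (A, B) and one for (C, D), the term "bought" receives weight 2.0 instead of 3, and a term seen only in one of the (A, B) sentences gets 0.5. The same weights could be passed as per-sample weights to a classifier that supports them.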

16 Extract Candidate Entity Pairs
Learned model M: (x, E1, E2) -> r
Simple method: sequential scan over D
– Look for entities of types Tr1 and Tr2, then apply M
Large, indexed corpus: retrieve only the relevant sentences
– Use keyword search, with pattern-based or keyword-based queries
– Agichtein and Gravano: an iterative solution
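For a large corpus, the keyword-search alternative to a sequential scan can be sketched with an in-memory inverted index: retrieve the sentences that contain the query keywords, then apply M only to those. The index, the AND-query semantics, and `toy_acquired_model` (a regex standing in for the learned model M) are all hypothetical simplifications.

```python
import re
from collections import defaultdict


def build_index(corpus):
    """Inverted index: lowercased token -> ids of sentences containing it."""
    index = defaultdict(set)
    for sid, sent in enumerate(corpus):
        for tok in sent.lower().split():
            index[tok.strip(".,;:!?")].add(sid)
    return index


def keyword_retrieve(index, keywords):
    """Ids of sentences containing every keyword (an AND query)."""
    postings = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*postings) if postings else set()


def toy_acquired_model(sentence):
    """Hypothetical stand-in for the learned model M, for the "Acquired" relation only."""
    return [(e1, e2, "Acquired")
            for e1, e2 in re.findall(r"\b([A-Z]\w+) acquired ([A-Z]\w+)\b", sentence)]


def extract_candidates(corpus, sentence_ids, model):
    """Apply M only to the retrieved sentences instead of scanning all of D."""
    candidates = []
    for sid in sorted(sentence_ids):
        candidates.extend(model(corpus[sid]))
    return candidates
```

Passing `range(len(corpus))` as `sentence_ids` gives the simple sequential scan; passing the result of `keyword_retrieve` gives the indexed variant, which saves work in proportion to how selective the keywords are.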

17 Validate Extracted Relationships
Extraction has high error rates
Validation based on corpus-wide statistics
– Probabilities based on counts of occurrences
– Extract only high-confidence relationships
Rare relationships:
– Use contextual patterns
– Alternative: correct entity boundary errors
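One common way to turn per-occurrence probabilities into a corpus-wide confidence is a noisy-or: a fact is wrong only if every occurrence supporting it is wrong. The sketch assumes each extraction already carries a probability p from the extractor and that occurrences are independent, which real corpora often violate; the function names and the `threshold=0.9` default are hypothetical.

```python
from collections import defaultdict
from math import prod


def aggregate_confidence(extractions):
    """extractions: (e1, e2, r, p) tuples, p = probability that one occurrence is correct.

    Noisy-or over all occurrences of the same fact: 1 - prod(1 - p_i).
    """
    by_fact = defaultdict(list)
    for e1, e2, r, p in extractions:
        by_fact[(e1, e2, r)].append(p)
    return {fact: 1.0 - prod(1.0 - p for p in ps) for fact, ps in by_fact.items()}


def validated(extractions, threshold=0.9):
    """Keep only the facts whose aggregated confidence reaches the threshold."""
    return {fact for fact, c in aggregate_confidence(extractions).items() if c >= threshold}
```

Two occurrences at 0.7 each combine to 1 - 0.3 × 0.3 = 0.91 and pass a 0.9 threshold, while a fact seen once at 0.6 does not. This also shows why rare relationships need the extra handling listed on the slide: a single occurrence can never accumulate support.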

18 Summary
Setting 1: entities already marked
– Feature-based and kernel-based methods
– Clues from word sequences, parse trees, and dependency graphs
– Training data with labeled relationships
Setting 2: open corpus, given relationship types
– No labeled unstructured data
– Seed database of (E1, E2, r) examples
– Bootstrapping from seed data
– Filter sentences based on relevance
Accuracy:
– 50%-70% on closed benchmark datasets
– Lots of special-case handling needed for the web

19 Further Readings
This part concentrated on binary relationships
Natural extension: records with multi-way relationships
Requires cross-sentence analysis:
– Co-reference resolution
– Discourse analysis
Much literature on this topic
Future research: discovering relevant relationship types

