
Extracting Semantic Concept Relations


1 Extracting Semantic Concept Relations from Wikipedia
Patrick Arnold, Erhard Rahm
Leipzig University, Germany
4th International Conference on Web Intelligence, Mining and Semantics

2 Extracting Semantic Concept Relations From Wikipedia
1. Introduction
Background knowledge is a crucial and effective strategy for schema/ontology matching:
- Dictionaries, thesauri, domain-specific ontologies
- Especially helpful where generic strategies (string-based, structural, instance-based, probabilistic, etc.) reach their limits
- Exploited by several approaches: S-Match, TaxoMap, ASMOV, ...
11/30/2018 Extracting Semantic Concept Relations From Wikipedia

3
1. Introduction

4
1. Introduction
Problems of existing background knowledge sources:
- Limited number of high-quality resources
- Limited scope: WordNet has 156,000 words (117,000 nouns)
- Currentness: the latest WordNet version dates from 2006
- Often focused on instance data rather than concept data, e.g., DBpedia, Freebase, YAGO

5
1. Introduction
Our contributions:
- Extract semantic concept relations from Wikipedia articles
- Store them in a repository (thesaurus)
- Exploit the repository as an additional background knowledge source for matching tasks
Benefits of Wikipedia:
- Very extensive (defines practically every common noun)
- Free access
- High text quality
- Up to date

6
1. Introduction
General idea:
- Find semantic patterns in the definition sentence
- Find the concepts that are linked by these patterns
- Build the semantic relations
Relations to determine: equals, is-a, has-a, part-of, refers-to
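The five relation types listed above can be captured in a small data model. The following Python sketch is illustrative only: `RelationType`, `ConceptRelation` and the `inverse` helper are hypothetical names, and treating has-a and part-of as mutual inverses is an assumption, not something the slides state.

```python
from dataclasses import dataclass
from enum import Enum


class RelationType(Enum):
    """The five relation types named on the slide."""
    EQUALS = "equals"
    IS_A = "is-a"
    HAS_A = "has-a"
    PART_OF = "part-of"
    REFERS_TO = "refers-to"


@dataclass(frozen=True)
class ConceptRelation:
    """One extracted triple, e.g. (stationery cabinet, is-a, steel cabinet)."""
    subject: str
    relation: RelationType
    obj: str

    def inverse(self) -> "ConceptRelation":
        # Hypothetical helper: has-a and part-of are flipped into each other,
        # equals is symmetric; the slides do not say whether inverses are stored.
        flip = {
            RelationType.HAS_A: RelationType.PART_OF,
            RelationType.PART_OF: RelationType.HAS_A,
            RelationType.EQUALS: RelationType.EQUALS,
        }
        if self.relation not in flip:
            raise ValueError(f"{self.relation.value} has no inverse in this sketch")
        return ConceptRelation(self.obj, flip[self.relation], self.subject)
```

Frozen dataclasses are hashable, so triples can be deduplicated in a set before they are stored.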

7
1. Introduction

8
Agenda
1. Introduction
2. Workflow Overview
3. Workflow Details
4. Evaluation
5. Conclusions and Next Steps

9
2. Workflow Overview
General workflow:
- Download the Wikipedia dump and extract all articles
- Process each article and extract the semantic relations
- Insert the relations into a repository (graph database)
Running example: stationery cabinet
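As a rough sketch of the first workflow step, a MediaWiki XML dump can be streamed page by page with Python's standard `xml.etree.ElementTree.iterparse`, so the very large file never has to fit in memory. The function name `iter_articles` is hypothetical, and keeping only namespace 0 (regular articles) is an assumption about what "all articles" means here.

```python
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Optional, Tuple


def _local_name(tag: str) -> str:
    # Export files put every element in a versioned XML namespace,
    # e.g. "{http://www.mediawiki.org/xml/export-0.10/}page"; keep only "page".
    return tag.rsplit("}", 1)[-1]


def iter_articles(dump: IO[bytes]) -> Iterator[Tuple[str, str]]:
    """Stream (title, wikitext) pairs for the main-namespace pages of a dump."""
    title: Optional[str] = None
    ns: Optional[str] = None
    text = ""
    for _, elem in ET.iterparse(dump, events=("end",)):
        name = _local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "ns":
            ns = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            # Assumption: "articles" means namespace 0; talk, user, etc. pages are skipped.
            if ns == "0" and title is not None:
                yield title, text
            title, ns, text = None, None, ""
            elem.clear()  # release most of the finished page's memory
```

A compressed dump could be passed as `bz2.open(path, "rb")`. The yielded wikitext still contains markup that has to be stripped before the first sentence is extracted.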

10
2. Workflow Overview
Step 1: Extract the first sentence of an article
A stationery cabinet (sometimes referred to as a stationery cupboard) is a large steel cabinet with shelves inside, used for storing a variety of items.
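Finding the first sentence can be approximated with a regular expression that cuts at the first sentence-final punctuation mark followed by a capitalized word or by the end of the text. This is a naive heuristic chosen for illustration (the slides do not say how sentences are split): it assumes wiki markup has already been removed, and abbreviations such as "e.g. Foo" would fool it.

```python
import re

# A sentence ends at ., ! or ? followed by whitespace plus an uppercase letter,
# or by the end of the text (naive: abbreviations are not handled).
_SENTENCE_END = re.compile(r"[.!?](?=\s+[A-Z]|\s*$)")


def first_sentence(text: str) -> str:
    """Return the first sentence of plain article text."""
    match = _SENTENCE_END.search(text)
    return text[: match.end()].strip() if match else text.strip()
```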

11
2. Workflow Overview
Step 2: Perform preprocessing (POS tagging, parenthesis removal, etc.)
A stationery cabinet (sometimes referred to as a stationery cupboard) is a large steel cabinet with shelves inside, used for storing a variety of items.
A_DT stationery_NN cabinet_NN, sometimes_NNS referred_VBD to_TO as_IN a_DT stationery_NN cupboard)_NN …
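The Step 3 slide shows the parenthetical rewritten as a comma-delimited clause, which suggests that "parenthesis removal" keeps the bracketed text rather than deleting it. Under that assumption, a minimal sketch (the function name `parentheses_to_commas` is hypothetical) reproduces the rewritten sentence; POS tagging itself would be left to an off-the-shelf tagger and is not shown.

```python
import re


def parentheses_to_commas(sentence: str) -> str:
    """Rewrite 'X (Y) Z' as 'X, Y, Z' so the parenthetical becomes a clause."""
    # Only innermost (non-nested) parentheses are handled.
    out = re.sub(r"\s*\(([^()]*)\)", r", \1,", sentence)
    # Tidy up doubled commas, e.g. "(bar), baz" -> ", bar,, baz" -> ", bar, baz".
    out = re.sub(r",\s*,", ",", out)
    # Drop a comma that ended up directly before closing punctuation.
    out = re.sub(r",\s*([.;:!?])", r"\1", out)
    return out
```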

12
2. Workflow Overview
Step 3: Detect the semantic relation patterns
A pattern puts two terms into a specific relation (e.g., is-a or part-of).
A stationery cabinet, sometimes referred to as a stationery cupboard, is a large steel cabinet with shelves inside, used for storing a variety of items.

13
2. Workflow Overview
Step 4: Split the sentence at the patterns (the patterns themselves are not part of the fragments)
A stationery cabinet, sometimes referred to as a stationery cupboard, is a large steel cabinet with shelves inside, used for storing a variety of items.
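Given the token positions of the detected patterns, the split itself is simple slicing. In this sketch (hypothetical function `split_at_patterns`), spans are half-open `(start, end)` token ranges assumed not to overlap; for the running example they are written by hand rather than produced by a real detector, and tokens come from a plain whitespace split.

```python
from typing import List, Sequence, Tuple


def split_at_patterns(tokens: Sequence[str],
                      spans: Sequence[Tuple[int, int]]) -> List[List[str]]:
    """Cut the token list at each pattern span; the pattern tokens are dropped."""
    fragments: List[List[str]] = []
    start = 0
    for p_start, p_end in sorted(spans):
        fragments.append(list(tokens[start:p_start]))
        start = p_end
    fragments.append(list(tokens[start:]))
    return fragments


sentence = ("A stationery cabinet, sometimes referred to as a stationery cupboard, "
            "is a large steel cabinet with shelves inside, used for storing a variety of items.")
tokens = sentence.split()
# "is a" occupies tokens 10-11 and "with" token 15 (indices counted by hand)
fragments = split_at_patterns(tokens, [(10, 12), (15, 16)])
print(fragments[1])  # ['large', 'steel', 'cabinet']
```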

14
2. Workflow Overview
Step 5: Find the concepts in each sentence fragment
A stationery cabinet, sometimes referred to as a stationery cupboard, is a large steel cabinet with shelves inside, used for storing a variety of items.
Figure labels: Subject Concepts, Object Concepts
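One simple way to find concepts in a POS-tagged fragment is to take maximal runs of consecutive nouns, which turns "large steel cabinet" into "steel cabinet" as in Step 6. The sketch is a heuristic, not the FSM the authors describe. It assumes Penn Treebank tags (NN, NNS, ... as in Step 2), that every noun run in the subject fragment is a separate subject term, and that an object fragment contributes only its first noun run. The tags in the test data are hand-assigned.

```python
from typing import List, Sequence, Tuple

Tagged = Sequence[Tuple[str, str]]  # (word, Penn Treebank tag)


def noun_runs(tokens: Tagged) -> List[str]:
    """Maximal runs of consecutive noun tokens (tags NN, NNS, NNP, NNPS)."""
    runs: List[str] = []
    current: List[str] = []
    for word, tag in tokens:
        if tag.startswith("NN"):
            current.append(word)
        elif current:
            runs.append(" ".join(current))
            current = []
    if current:
        runs.append(" ".join(current))
    return runs


def detect_concepts(fragment: Tagged, is_subject: bool) -> List[str]:
    # Assumption: all noun runs in the subject fragment are (synonymous) subjects,
    # while later nouns in an object fragment ("variety", "items") are treated
    # as additional, non-concept nouns and dropped.
    runs = noun_runs(fragment)
    return runs if is_subject else runs[:1]
```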

15
2. Workflow Overview
Step 6: Build the semantic relations and perform some post-processing (stemming, etc.)
Subjects: stationery cabinet, stationery cupboard
Pattern 1: IS A
1st-level objects: steel cabinet
Pattern 2: HAS A
2nd-level objects: shelf
Resulting relations (Subject | Relation | Object):
stationery cabinet | EQUAL | stationery cupboard
 | IS A | steel cabinet
stationery cupboard | HAS A | shelf
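The combination step can be sketched as a cross product: the first subject term is declared equal to the other subject terms, and every subject is linked to every object via the relation of the pattern. Two parts are assumptions made for this illustration: that 2nd-level objects attach to the subjects (suggested by the "stationery cupboard HAS A shelf" row), and the three-rule `singularize` function, a crude stand-in for real stemming or lemmatization. Both function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]


def singularize(phrase: str) -> str:
    """Crude stand-in for stemming: singularize only the head (last) word."""
    *modifiers, head = phrase.split()
    if head.endswith("ves"):
        head = head[:-3] + "f"      # shelves -> shelf
    elif head.endswith("ies"):
        head = head[:-3] + "y"      # bodies -> body
    elif head.endswith("s") and not head.endswith("ss"):
        head = head[:-1]            # cabinets -> cabinet
    return " ".join(modifiers + [head])


def build_relations(subjects: Sequence[str], rel1: str, objects1: Sequence[str],
                    rel2: Optional[str] = None,
                    objects2: Sequence[str] = ()) -> List[Triple]:
    subs = [singularize(s) for s in subjects]
    objs1 = [singularize(o) for o in objects1]
    objs2 = [singularize(o) for o in objects2]
    triples: List[Triple] = []
    # Several subject terms in one fragment are treated as synonyms of the first.
    for synonym in subs[1:]:
        triples.append((subs[0], "equal", synonym))
    for s in subs:
        triples.extend((s, rel1, o) for o in objs1)
        if rel2 is not None:
            # Assumption: 2nd-level objects are attached to the subjects.
            triples.extend((s, rel2, o) for o in objs2)
    return triples
```

For the running example, `build_relations(["stationery cabinet", "stationery cupboard"], "is-a", ["steel cabinet"], "has-a", ["shelves"])` yields five triples: one equal relation, two is-a relations and two has-a relations to "shelf".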

16
2. Workflow Overview
Step 7: Add the relations to the repository
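As a stand-in for the graph database, the following in-memory sketch stores each concept as a node with a set of labeled outgoing edges. It lower-cases concept names and ignores duplicate triples, so re-processing an article does not add edges twice. The class and method names are hypothetical and do not correspond to any particular graph database API.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set, Tuple


class ConceptRepository:
    """In-memory graph: concepts are nodes, relations are labeled edges."""

    def __init__(self) -> None:
        self._edges: DefaultDict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def add(self, triples: Iterable[Tuple[str, str, str]]) -> int:
        """Insert (subject, relation, object) triples; return how many were new."""
        added = 0
        for subj, rel, obj in triples:
            edge = (rel, obj.lower())
            edges = self._edges[subj.lower()]
            if edge not in edges:
                edges.add(edge)
                added += 1
        return added

    def related(self, concept: str, relation: str) -> Set[str]:
        """All objects linked to `concept` by `relation` (empty set if unknown)."""
        edges = self._edges.get(concept.lower(), set())
        return {obj for rel, obj in edges if rel == relation}
```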

17
Agenda
1. Introduction
2. Workflow Overview
3. Workflow Details
4. Evaluation
5. Conclusions and Next Steps

18 3. Workflow Details – Pattern Detection
Pattern detection using finite state machines (FSMs):
- Parse the sentence word by word
- Check whether the current word is an anchor term of the FSM
- If it is, use the FSM to extract the full pattern
- The pattern is determined once a final state is reached
The FSM is able to detect most is-a, has-a and part-of patterns, e.g.:
is a, is typically a, is one of several, is generally any form of, is used as a, is a variety of the many, is defined as a, as part of a, used in, within a, having a, with a, consisting of
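The word-by-word detection can be illustrated with a deterministic finite state machine shaped like a trie over the pattern phrases listed above: the start state's outgoing transitions are the anchor terms, and each complete phrase ends in a final state labeled with a relation. This is a simplified sketch, not the authors' FSM. It recognizes only the literal phrases, the grouping of phrases into is-a, part-of and has-a is inferred from their order on the slide, and a phrase such as "is a specific form of" would only be matched as "is a". When several final states are reached, the longest match wins.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Phrases from the slide; the relation labels are an assumption based on their order.
PATTERNS: Dict[str, str] = {
    "is a": "is-a", "is typically a": "is-a", "is one of several": "is-a",
    "is generally any form of": "is-a", "is used as a": "is-a",
    "is a variety of the many": "is-a", "is defined as a": "is-a",
    "as part of a": "part-of", "used in": "part-of", "within a": "part-of",
    "having a": "has-a", "with a": "has-a", "consisting of": "has-a",
}


def build_fsm(patterns: Dict[str, str]) -> Tuple[List[Dict[str, int]], Dict[int, str]]:
    """Word-level trie: each node is a state, each word a transition."""
    transitions: List[Dict[str, int]] = [{}]  # state 0 is the start state
    final: Dict[int, str] = {}
    for phrase, relation in patterns.items():
        state = 0
        for word in phrase.split():
            if word not in transitions[state]:
                transitions.append({})
                transitions[state][word] = len(transitions) - 1
            state = transitions[state][word]
        final[state] = relation
    return transitions, final


def detect_patterns(words: Sequence[str], transitions: List[Dict[str, int]],
                    final: Dict[int, str]) -> List[Tuple[int, int, str]]:
    """Return non-overlapping (start, end, relation) matches, longest match first."""
    matches: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(words):
        state, j = 0, i
        best: Optional[Tuple[int, int, str]] = None
        # words[i] is an anchor term if the start state has a transition for it.
        while j < len(words) and words[j].lower() in transitions[state]:
            state = transitions[state][words[j].lower()]
            j += 1
            if state in final:
                best = (i, j, final[state])
        if best is not None:
            matches.append(best)
            i = best[1]  # continue after the pattern
        else:
            i += 1
    return matches
```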

19 3. Workflow Details – Pattern Detection
Example: is a specific form of

20 3. Workflow Details – Concept Detection
Concept detection: similar approach, but a more complex FSM
- Detect multiple terms in a fragment
- Distinguish concept nouns from additional nouns:
  - expressions like "in the context of"
  - local information like "British English"
  - field references
Field references:
- Describe the domain of the article
- Suggest that the subject refers to this field
- Occur only occasionally
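Field references often take the form of a leading "In <field>, ..." clause, as in "In mathematics, a ring is ...". A minimal regex-based sketch that separates such a prefix (and the "in the context of" variant) from the rest of the definition might look like the following. The function name, the regular expression and the `LOCAL_QUALIFIERS` set are hypothetical; a detected field could then yield a refers-to relation between the subject and the field.

```python
import re
from typing import Optional, Tuple

# Hypothetical list of locale qualifiers that must not be mistaken for fields.
LOCAL_QUALIFIERS = {"british english", "american english"}

_FIELD_PREFIX = re.compile(r"^In (?:the context of )?([^,]+), (.*)$")


def split_field_reference(sentence: str) -> Tuple[Optional[str], str]:
    """Return (field, rest) for sentences like 'In mathematics, a ring is ...'."""
    match = _FIELD_PREFIX.match(sentence)
    if not match:
        return None, sentence
    field, rest = match.group(1).strip(), match.group(2)
    if field.lower() in LOCAL_QUALIFIERS:
        return None, rest  # local information: strip it, but report no field
    return field, rest
```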

21 3. Workflow Details – Concept Detection

22 3. Workflow Details – Concept Detection

23
Agenda
1. Introduction
2. Workflow Overview
3. Workflow Details
4. Evaluation
5. Conclusions and Next Steps

24
4. Evaluation
- Tested our approach on 4 manually generated benchmarks
- Each benchmark is a complete Wikipedia category or article list
- The benchmarks cover different domains

25
4. Evaluation
Not all articles are "parsable": some articles do not contain any semantic relation pattern.
Example: "Hutchinson's triad is named after Sir Jonathan Hutchinson."
In our evaluation we only consider the parsable articles.
Benchmark | Domain | Articles | Parsable articles
Furniture | General | 186 | 169
Infectious Diseases | Medical | 107 | 91
Optimization Algorithms | Comp. Science | 122 | 113
Vehicles | General | 94 | 91

26
4. Evaluation
Two tests:
- How many parsable articles could be fully processed (at least 1 semantic pattern detected, at least 1 subject and 1 object determined)?
- How many relations were detected, and how many of them were correct?

27
4. Evaluation
Number of processed articles:
- We can handle 74 – 96 % of the parsable articles in the benchmarks
- General domains do slightly better than specific domains
- The extracted patterns are mostly correct (precision: 96 – 100 %)
Benchmark | Parsable articles | Actually processed | Recall
Furniture | 169 | 148 | 88 %
Infectious Diseases | 91 | 80 | 88 %
Optimization Algorithms | 113 | 84 | 74 %
Vehicles | 91 | 87 | 96 %

28
4. Evaluation
Number of extracted relations:
- We can extract 64 – 76 % of all relations encoded in the articles
- 74 – 81 % of the extracted relations are correct
Benchmark | Contained relations | Correctly extracted | Falsely extracted | Recall | Precision
Furniture | 497 | 373 | 87 | 75 % | 81 %
Infectious Diseases | 323 | 206 | 67 | 64 % | 76 %
Optimization Algorithms | 182 | 137 | 49 | 75 % | 74 %
Vehicles | 413 | 280 | 66 | 68 % | 81 %
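The recall and precision columns follow from the counts if recall is read as correctly extracted / contained relations and precision as correctly extracted / (correctly + falsely extracted). This reading is inferred from the column headers; the sketch reproduces the Furniture row:

```python
def recall(correct: int, contained: int) -> float:
    """Fraction of the relations contained in the articles that were extracted correctly."""
    return correct / contained


def precision(correct: int, wrong: int) -> float:
    """Fraction of all extracted relations (correct + false) that are correct."""
    return correct / (correct + wrong)


# Furniture: 497 contained, 373 correctly and 87 falsely extracted relations
print(f"{recall(373, 497):.0%} recall, {precision(373, 87):.0%} precision")
# prints: 75% recall, 81% precision
```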

29
4. Evaluation
Some insights:
- We extracted 1.2 – 3.1 relations per parsable article (average: 2.1)
- Most articles contain 1 is-a pattern; some provide an additional has-a or part-of pattern
- Subsumption (is-a) relations occur most frequently
- The maximum yield of a single article was 28 relations

30 5. Conclusions and Next Steps
Wikipedia article processing is relatively successful:
- Definition sentences have a regular structure
- Precision is not 100 %, but acceptable for schema/ontology matching
- Allows extraction of a large amount of information (about 2 relations per article)
Next steps:
- Integrate the concept relations into the repository
- Exploit the repository for mapping enrichment and/or matching
- Include further sources in the repository: Wiktionary, existing benchmarks (mapping re-use), ...

31
Thank you

