Extracting Semantic Concept Relations


Extracting Semantic Concept Relations from Wikipedia
Patrick Arnold, Erhard Rahm
Leipzig University, Germany
4th International Conference on Web Intelligence, Mining and Semantics

1. Introduction
Background knowledge:
- Crucial and effective strategy for schema/ontology matching
  - Dictionaries, thesauri, domain-specific ontologies
- Especially helpful where generic strategies reach their limits
  - String-based, structural, instance-based, probabilistic, etc.
- Exploited by several approaches
  - S-Match, TaxoMap, ASMOV, ...
11/30/2018 Extracting Semantic Concept Relations From Wikipedia

1. Introduction
Problems of present background knowledge sources:
- Limited number of high-quality resources
- Limited scope
  - WordNet: 156,000 words (117,000 nouns)
- Currentness
  - WordNet: latest version from 2006
- Often focus on instance data, not on concept data
  - Like DBpedia, Freebase, YAGO

1. Introduction
Our contributions:
- Extract semantic concept relations from Wikipedia articles
- Store them in a repository (thesaurus)
- Exploit the repository as an additional background knowledge source for matching tasks
Benefits of Wikipedia:
- Very extensive (defines practically any common noun)
- Free access
- High text quality
- Up to date

1. Introduction
General idea:
- Find semantic patterns in the definition sentence
- Find the concepts that are linked by these patterns
- Build the semantic relations
Determine the following relations:
- equals
- is-a
- has-a
- part-of
- refers-to

Agenda
1. Introduction
2. Workflow Overview
3. Workflow Details
4. Evaluation
5. Conclusions and Next Steps

2. Workflow Overview
General workflow:
- Download the Wikipedia dump and extract all articles
- Process each article and extract the semantic relations
- Insert the relations into a repository (graph database)
Running example: Stationery cabinet

2. Workflow Overview
Step 1: Extract the first sentence of an article
  A stationery cabinet (sometimes referred to as a stationery cupboard) is a large steel cabinet with shelves inside, used for storing a variety of items.
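Step 1 can be illustrated with a minimal sketch. It assumes the article is already available as plain text; the function name `first_sentence` and the period-based heuristic are illustrative assumptions, not the authors' implementation. A real system would need a proper sentence splitter, since abbreviations would break this heuristic.

```python
import re


def first_sentence(article_text: str) -> str:
    """Return the text up to the first sentence-ending period (naive heuristic)."""
    text = " ".join(article_text.split())  # collapse newlines and repeated spaces
    # A period followed by whitespace and an uppercase letter, or by the end of the text.
    match = re.search(r"\.(?=\s+[A-Z]|\s*$)", text)
    return text if match is None else text[: match.end()]


# The first sentence is the running example from the slides;
# the second sentence is made up for illustration only.
article = (
    "A stationery cabinet (sometimes referred to as a stationery cupboard) "
    "is a large steel cabinet with shelves inside, used for storing a variety "
    "of items. It is usually found in offices."
)
print(first_sentence(article))
```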

2. Workflow Overview
Step 2: Perform some preprocessing
- POS tagging, parenthesis removal, etc.
  A stationery cabinet (sometimes referred to as a stationery cupboard) is a large steel cabinet with shelves inside, used for storing a variety of items.
  A_DT stationery_NN cabinet_NN, sometimes_NNS referred_VBD to_TO as_IN a_DT stationery_NN cupboard)_NN …
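The parenthesis handling in Step 2 might look like the sketch below. It assumes that a parenthesised insertion is rewritten as a comma-delimited one, which is how the running example appears in the later steps. The function name `parentheses_to_commas` is hypothetical, nested parentheses are not handled, and POS tagging is left out because it would need an external tagger.

```python
import re


def parentheses_to_commas(sentence: str) -> str:
    """Rewrite ' (inner)' as ', inner,' so the inner text stays in the sentence."""
    # Non-nested parentheses only; nested ones are left untouched by this sketch.
    rewritten = re.sub(r"\s*\(([^()]*)\)", r", \1,", sentence)
    # Drop the added comma when the parenthesis was directly followed by punctuation.
    return re.sub(r",(\s*[,.;:])", r"\1", rewritten)


sentence = (
    "A stationery cabinet (sometimes referred to as a stationery cupboard) "
    "is a large steel cabinet with shelves inside, used for storing a variety of items."
)
print(parentheses_to_commas(sentence))
```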

2. Workflow Overview
Step 3: Detect the semantic relation patterns
- A pattern puts two terms into a specific relation (is-a or part-of)
  A stationery cabinet, sometimes referred to as a stationery cupboard, is a large steel cabinet with shelves inside, used for storing a variety of items.

2. Workflow Overview
Step 4: Split the sentence at the patterns
- Patterns are not part of the fragments
  A stationery cabinet, sometimes referred to as a stationery cupboard, is a large steel cabinet with shelves inside, used for storing a variety of items.
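Steps 3 and 4 can be approximated with a regular expression instead of the finite state machines used in the talk. The sketch below is a simplification under stated assumptions: the pattern table `PATTERNS` is a tiny hypothetical sample, the mapping of each pattern to a relation type is an assumption, and variants such as "is an" are not covered.

```python
import re

# Hypothetical, tiny pattern table; the relation assigned to each pattern is an assumption.
PATTERNS = {
    "is a": "is-a",
    "with": "has-a",
    "having a": "has-a",
    "as part of a": "part-of",
}

# Longer patterns come first so that the longest variant wins.
_PATTERN_RE = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in sorted(PATTERNS, key=len, reverse=True)) + r")\b"
)


def split_at_patterns(sentence):
    """Return (fragments, relations); the patterns are not part of the fragments."""
    # re.split with one capturing group alternates between text and matched pattern.
    parts = _PATTERN_RE.split(sentence)
    fragments = [part.strip() for part in parts[0::2]]
    relations = [PATTERNS[part] for part in parts[1::2]]
    return fragments, relations


sentence = (
    "A stationery cabinet, sometimes referred to as a stationery cupboard, "
    "is a large steel cabinet with shelves inside, used for storing a variety of items."
)
fragments, relations = split_at_patterns(sentence)
print(relations)
for fragment in fragments:
    print(fragment)
```

For the running example this yields three fragments separated by an is-a pattern and a has-a pattern.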

2. Workflow Overview
Step 5: Find the concepts in each sentence fragment
  A stationery cabinet, sometimes referred to as a stationery cupboard, is a large steel cabinet with shelves inside, used for storing a variety of items.
- Subject concepts: stationery cabinet, stationery cupboard
- Object concepts: steel cabinet, shelves

2. Workflow Overview
Step 6: Build the semantic relations
- Perform some post-processing (stemming, etc.)
Subjects                                 Pattern 1  1st-level objects  Pattern 2  2nd-level objects
stationery cabinet, stationery cupboard  IS A       steel cabinet      HAS A      shelf
Subject              Relation  Object
stationery cabinet   EQUAL     stationery cupboard
stationery cabinet   IS A      steel cabinet
stationery cupboard  IS A      steel cabinet
stationery cabinet   HAS A     shelf
stationery cupboard  HAS A     shelf
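Step 6 could be sketched as follows. It assumes that all subjects of a sentence are equal to each other and that every subject is linked to every object of each level; the function name `build_relations` and the triple layout are hypothetical, and post-processing such as stemming ("shelves" to "shelf") is taken as already done.

```python
from itertools import combinations, product


def build_relations(subjects, levels):
    """Build (subject, relation, object) triples.

    subjects: concept terms found before the first pattern.
    levels:   list of (relation, objects) pairs in sentence order.
    """
    # Subjects that share one definition sentence are treated as synonyms.
    triples = [(a, "equal", b) for a, b in combinations(subjects, 2)]
    # Assumption: each subject is related to each object of every level.
    for relation, objects in levels:
        triples += [(s, relation, o) for s, o in product(subjects, objects)]
    return triples


triples = build_relations(
    ["stationery cabinet", "stationery cupboard"],
    [("is-a", ["steel cabinet"]), ("has-a", ["shelf"])],
)
for triple in triples:
    print(triple)
```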

2. Workflow Overview
Step 7: Add relations to the repository

3. Workflow Details – Pattern Detection
Pattern detection using finite state machines (FSMs):
- Parse the sentence word by word
- Check whether the current word is an anchor term of the FSM
- If the word is an anchor term, use the FSM to extract the full pattern
- A pattern is determined if a final state is reached
The FSM is able to determine most is-a, has-a and part-of patterns:
- is a
- is typically a
- is one of several
- is generally any form of
- is used as a
- is a variety of the many
- is defined as a
- as part of a
- used in
- within a
- having a
- with a
- consisting of
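The word-by-word procedure above can be sketched with a small transition table. The states, the table `TRANSITIONS` and the set `FINAL` are hypothetical and cover only a handful of is-a variants; the FSM used in the talk is not shown in the transcript and is certainly larger.

```python
# Hypothetical transition table: state -> {word: next state}.
TRANSITIONS = {
    "start": {"is": "is"},
    "is": {"a": "article", "an": "article", "typically": "adverb",
           "generally": "adverb", "defined": "defined"},
    "adverb": {"a": "article", "an": "article", "any": "article"},
    "defined": {"as": "as"},
    "as": {"a": "article", "an": "article"},
    "article": {"form": "kind", "variety": "kind"},
    "kind": {"of": "kind_of"},
}
FINAL = {"article", "kind_of"}      # states in which a pattern is complete
ANCHORS = set(TRANSITIONS["start"])  # words that can start a pattern


def match_pattern(words, start):
    """Run the FSM from words[start]; return the end index (exclusive) of the
    longest accepted pattern, or None if no final state is reached."""
    state, end = "start", None
    for i in range(start, len(words)):
        state = TRANSITIONS.get(state, {}).get(words[i].lower())
        if state is None:
            break
        if state in FINAL:
            end = i + 1
    return end


def find_patterns(sentence):
    """Scan the sentence word by word and collect all patterns the FSM accepts."""
    words = sentence.split()
    found, i = [], 0
    while i < len(words):
        end = match_pattern(words, i) if words[i].lower() in ANCHORS else None
        if end is None:
            i += 1
        else:
            found.append(" ".join(words[i:end]))
            i = end
    return found


print(find_patterns("A stationery cabinet is a large steel cabinet"))
print(find_patterns("X is generally any form of Y"))
```

Remembering the last final state that was reached lets the machine prefer the longest variant, so "is generally any form of" is returned as one pattern rather than being cut off at "any".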

3. Workflow Details – Pattern Detection
Example: is a specific form of

3. Workflow Details – Concept Detection
Concept detection:
- Similar approach
- More complex FSM
  - Detect multiple terms in a fragment
  - Distinguish concept nouns from additional nouns
    - Expressions like "in the context of"
    - Local information like "British English"
    - Field references
Field references:
- Describe the domain of the article
- Suggest that the subject refers to this field
- Occur only occasionally
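A much simpler stand-in for the concept-detection FSM is to group consecutive noun tokens of a POS-tagged fragment into candidate terms. The sketch below assumes tokens in `word_TAG` form with Penn Treebank style tags; the tags in the example are assigned by hand for illustration, and the function `noun_runs` is hypothetical. It does not distinguish concept nouns from additional nouns, which is the harder part that the talk attributes to the FSM.

```python
def noun_runs(tagged_tokens):
    """Group consecutive tokens whose tag starts with 'NN' into candidate terms."""
    runs, current = [], []
    for token in tagged_tokens:
        word, _, tag = token.rpartition("_")
        if tag.startswith("NN"):
            current.append(word)
        elif current:
            # A non-noun token closes the current run.
            runs.append(" ".join(current))
            current = []
    if current:
        runs.append(" ".join(current))
    return runs


# Hand-tagged fragments of the running example (tags are illustrative assumptions).
subject_fragment = (
    "A_DT stationery_NN cabinet_NN ,_, sometimes_RB referred_VBN to_TO "
    "as_IN a_DT stationery_NN cupboard_NN ,_,"
).split()
object_fragment = "large_JJ steel_NN cabinet_NN".split()

print(noun_runs(subject_fragment))
print(noun_runs(object_fragment))
```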

4. Evaluation
- Tested our approach on 4 manually generated benchmarks
- Each benchmark is a complete Wikipedia category or article list
- Tested our approach in different domains

4. Evaluation
- Not all articles are "parsable"
  - Some articles do not contain any semantic relation pattern
  - Example: Hutchinson's triad is named after Sir Jonathan Hutchinson.
- In our evaluation we only regard the parsable articles
Benchmark                Domain         Articles  Parsable articles
Furniture                General        186       169
Infectious Diseases      Medical        107       91
Optimization Algorithms  Comp. Science  122       113
Vehicles                 General        94        91

4. Evaluation
Two tests:
- How many parsable articles could be fully processed?
  - Detect at least 1 semantic pattern
  - Determine at least 1 subject and 1 object
- How many relations were detected? How many were correct?

4. Evaluation
Number of processed articles:
- We can handle 74 – 96 % of the parsable articles in the benchmarks
- General domains slightly better than specific domains
- Extracted patterns mostly correct (precision: 96 – 100 %)
Benchmark                Parsable articles  Actually processed  Recall
Furniture                169                148                 88 %
Infectious Diseases      91                 80                  88 %
Optimization Algorithms  113                84                  74 %
Vehicles                 91                 87                  96 %

4. Evaluation
Number of extracted relations:
- We can extract 64 – 76 % of all relations encoded in the articles
- 74 – 81 % of the extracted relations are correct
Benchmark                Contained relations  Correctly extracted  Falsely extracted  Recall  Precision
Furniture                497                  373                  87                 75 %    81 %
Infectious Diseases      323                  206                  67                 64 %    76 %
Optimization Algorithms  182                  137                  49                         74 %
Vehicles                 413                  280                  66                 68 %    81 %
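The recall and precision figures can be checked with a short calculation. The sketch assumes the usual definitions, recall = correct / contained and precision = correct / (correct + false), which the slides do not state explicitly; the function name is hypothetical. With the Furniture counts it reproduces the reported 75 % and 81 %. For other rows the same formulas may land a percentage point away from the reported figures, presumably due to rounding.

```python
def recall_precision(contained, correct, false):
    """Return (recall, precision) under the standard definitions (an assumption)."""
    recall = correct / contained             # share of contained relations that were found
    precision = correct / (correct + false)  # share of extracted relations that are right
    return recall, precision


# Furniture benchmark: 497 contained, 373 correctly and 87 falsely extracted relations.
recall, precision = recall_precision(contained=497, correct=373, false=87)
print(f"recall: {recall:.0%}, precision: {precision:.0%}")
```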

4. Evaluation
Some insights:
- We extracted 1.2 – 3.1 relations per parsable article
  - Average: 2.1
- Most articles contain 1 is-a pattern
  - Some provide an additional has-a or part-of pattern
- Subsumption relations occur most frequently
- Maximum outcome of a single article was 28 relations

5. Conclusions and Next Steps
- Wikipedia article processing relatively successful
  - Regular structure of definition sentences
  - Not 100 % precision, but acceptable for schema/ontology matching
- Allows extraction of a large amount of information
  - About 2 relations per article
Next steps:
- Integrate concept relations into the repository
- Exploit the repository in mapping enrichment and/or matching
- Include further sources in the repository
  - Wiktionary
  - Existing benchmarks (mapping re-use)
  - ...

Thank you