Published by Tracy Barber. Modified over 9 years ago.
Text Mining & NLP-based Algorithm to Populate an Ontology with A-Box Individuals and Object Properties
Alexandre Kouznetsov and Christopher J. O. Baker, University of New Brunswick, Saint John; joint work with Innovatia Inc.
April 13th, 2010

Motivation
Ontologies can play an important role in information systems, particularly in facilitating information retrieval and data integration. In this contribution we present a semi-automatic method for extracting information, specifically named entities and their relations, from texts and using it to populate a domain ontology. While previous work has proposed solutions for extracting named entities and populating them into the classes of an ontology, we focus on the problem of accurately extracting and populating multiple relations between the same named entities and representing them as distinct object properties between A-box individuals in an OWL-DL ontology.

Methodology
Ontology-based information retrieval applies Natural Language Processing (NLP) to link text segments, named entities and the relations between named entities to existing ontologies. Our algorithm leverages a customized gazetteer list, including lists of object property synonyms, and scores A-box property candidates with functions of the distance between co-occurring terms. Using ontology reasoning, we build confidence thresholds on the A-box property candidate scores. A-box properties are then predicted and populated based on these scores and thresholds.

Multiple Relations Problem
[Figure: T-box level: domain class Man, range class Woman, object properties hasSister and hasMother. A-box level: domain instance Samuel, range instance Mary; hasSister? hasMother?]

[Figure: Algorithm for multi-relation detection. Modules: NLP Processor, Confidence Level Processor, Axioms-to-Thresholds Converter, Co-occurrence Based Scores Generator, Decision Rule. Data: current T-box, current A-box, all related content, term representations, scores, factors, thresholds. Output: a yes/no answer to "Should the current property be populated?"]

[Figure: Semi-automatic ontology population pipeline. Source documents (XML); pre-processing; synonym lists; text segment processing and separation into sentences, tables and bullet lists; unpopulated ontology (OWL) and term list (Excel); ontology population of named entities, single relations and multi-relations; populated ontology; using the ontology for reasoning, visualizing, visual queries and connecting resources.]

Algorithm main modules
1) NLP processor: extracts the term(s) that represent each object property.
2) Confidence Level processor: converts settings into threshold factors.
3) Axioms-to-Thresholds converter: extracts A-box-related axioms and converts them into decision-boundary thresholds on property candidate scores.
4) Co-occurrence Based Scores generator: calculates scores from the normalized distances between domain, range and property terms.
5) Decision Rule: populates the properties whose scores exceed the thresholds.

Acknowledgment
We would like to thank Bradley Shoebottom for his help with Telecom knowledge engineering.

Delivery
A prototype of the semi-automatic ontology population pipeline is under testing on BioMed (Lipids) and Telecom (Innovatia/Nortel) ontologies.

Implementation tools
Java, OWLAPI, GATE/JAPE, PELLET
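The Co-occurrence Based Scores generator can be illustrated with a minimal Java sketch. The poster does not publish its distance function, so the details here are assumptions: sentences arrive already tokenized, each object property has a small synonym list standing in for the gazetteer, and a candidate's score in one sentence is 1 minus the mean token distance from the property term to the domain and range terms, normalized by sentence length. The best score over all sentences is kept. The class `CooccurrenceScorer`, its methods and the example sentences are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a distance-based co-occurrence score (assumed formula,
// not the authors' implementation).
final class CooccurrenceScorer {

    // Index of the first token matching (case-insensitively) any of the terms, or -1.
    static int indexOf(List<String> tokens, List<String> terms) {
        for (int i = 0; i < tokens.size(); i++) {
            for (String term : terms) {
                if (tokens.get(i).equalsIgnoreCase(term)) {
                    return i;
                }
            }
        }
        return -1;
    }

    // Score in [0, 1] for one sentence: 0 if any term is missing; otherwise
    // 1 minus the mean token distance from the property term to the domain and
    // range terms, normalized by the longest possible distance in the sentence.
    static double sentenceScore(List<String> tokens, String domain, String range,
                                List<String> propertySynonyms) {
        int d = indexOf(tokens, List.of(domain));
        int r = indexOf(tokens, List.of(range));
        int p = indexOf(tokens, propertySynonyms);
        if (d < 0 || r < 0 || p < 0 || tokens.size() < 2) {
            return 0.0;
        }
        double maxDistance = tokens.size() - 1;
        double meanDistance = (Math.abs(p - d) + Math.abs(p - r)) / 2.0;
        return 1.0 - meanDistance / maxDistance;
    }

    // Best score over all sentences for one (domain, property, range) candidate.
    static double score(List<List<String>> sentences, String domain, String range,
                        List<String> propertySynonyms) {
        double best = 0.0;
        for (List<String> sentence : sentences) {
            best = Math.max(best, sentenceScore(sentence, domain, range, propertySynonyms));
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical synonym lists for the poster's example properties.
        Map<String, List<String>> synonyms = Map.of(
                "hasSister", List.of("sister"),
                "hasMother", List.of("mother", "mom"));
        List<List<String>> sentences = List.of(
                List.of("Mary", "is", "the", "sister", "of", "Samuel"),
                List.of("Samuel", "visited", "Mary", "and", "his", "mother"));
        for (Map.Entry<String, List<String>> e : synonyms.entrySet()) {
            System.out.printf("%s: %.2f%n", e.getKey(),
                    score(sentences, "Samuel", "Mary", e.getValue()));
        }
    }
}
```

With these two toy sentences, `hasSister` scores 0.5 and `hasMother` about 0.2 for the pair (Samuel, Mary), so the assumed distance function favours the property whose term sits closer to both names.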
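The Confidence Level processor, Axioms-to-Thresholds converter and Decision Rule can likewise be sketched in plain Java. How axioms become thresholds is not described, so the rule below is an assumption: a base threshold is scaled by a confidence factor, every property whose score reaches it is accepted, and if two accepted properties are declared disjoint (plausible for `hasSister` and `hasMother` between the same two people, but not stated in the source) only the higher-scoring one is kept. Here the axioms are simply passed in as a set of property-name pairs rather than read from an OWL ontology. `ThresholdDecision`, `threshold`, `decide` and the scores used are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of threshold-based population; the confidence factor and
// the disjointness rule are assumptions, not the poster's published method.
final class ThresholdDecision {

    // Confidence level step (assumed form): scale a base threshold, capped at 1.0.
    static double threshold(double baseThreshold, double confidenceFactor) {
        return Math.min(1.0, baseThreshold * confidenceFactor);
    }

    // Decision rule for one (domain individual, range individual) pair.
    // Returns the properties to assert, sorted by name.
    static List<String> decide(Map<String, Double> scores, double threshold,
                               Set<Set<String>> disjointPairs) {
        // Keep every candidate whose score reaches the threshold.
        List<String> accepted = new ArrayList<>();
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            if (e.getValue() >= threshold) {
                accepted.add(e.getKey());
            }
        }
        // Assumed axiom handling: of two accepted properties declared disjoint,
        // drop the lower-scoring one; on a tie both are dropped as ambiguous.
        List<String> result = new ArrayList<>(accepted);
        for (String a : accepted) {
            for (String b : accepted) {
                if (!a.equals(b) && disjointPairs.contains(Set.of(a, b))
                        && scores.get(a) <= scores.get(b)) {
                    result.remove(a);
                }
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical scores for the pair (Samuel, Mary).
        Map<String, Double> scores = Map.of("hasSister", 0.5, "hasMother", 0.45);
        Set<Set<String>> disjoint = Set.of(Set.of("hasSister", "hasMother"));
        double t = threshold(0.4, 1.0);
        System.out.println(decide(scores, t, Set.of()));  // prints [hasMother, hasSister]
        System.out.println(decide(scores, t, disjoint));  // prints [hasSister]
    }
}
```

Without the disjointness pair both properties pass the 0.4 threshold; with it, only `hasSister` is asserted. A larger confidence factor raises the threshold and makes the rule more conservative.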