Aramis Rodríguez Blanco


Extraction of Linked Data from unstructured information applying NLP techniques and ontologies
Authors: Aramis Rodríguez Blanco, Alfredo Simón Cuevas, Wenny Hojas Mazo, Jose M. Perea Ortega

Introduction
Rise of smart applications such as semantic search. Popularity of web technologies. Web content is being reorganized as semantic data: from a Web focused on documents to a Web focused on interconnected data (Linked Data). Areas involved: search, analysis, knowledge engineering, integration, information retrieval.

Problem statement
Linked Data is extracted from: semi-structured sources, e.g. Wikipedia (DBpedia); structured sources, e.g. databases; unstructured sources, e.g. text. Issues with unstructured sources: approaches target English; information extraction limitations.

Proposed approach
Extraction of Linked Data from unstructured text in English and Spanish, based on the automatic extraction of a conceptualization of the text as a concept map, which is then transformed into the RDF data model. Diagram labels: Concept Map, Ontology.

Proposed approach phases
1. Pre-processing (NLP techniques).
2. Extraction of conceptualization (NLP techniques and ontologies).
3. Generation of RDF model (rules based on patterns).
Diagram labels: HTML, Text; Pre-processing (plain text extraction, language recognition, segmentation, POS tagging, morpho-syntactic analysis); Conceptualization Extraction (OWL ontology, concepts extraction, links extraction, normalization); RDF Data Model Generation (RDF triples).

Pre-processing
Text extraction from documents (pdf, txt, doc, html). Text segmentation (by sentences). Language recognition (English or Spanish). Syntactic analysis of text (using the FreeLing parser). Token segmentation (separating words). Morpho-syntactic analysis (lemma extraction and POS tagging). Shallow parsing (grouping tokens into noun groups, adjective groups, verb groups, etc.). Deep parsing (dependency tree).
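
The first pre-processing steps (sentence segmentation, token segmentation, language recognition) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides use FreeLing for the analysis, whereas the sketch below swaps in plain regular expressions and a stopword count, and the stopword lists and function names are hypothetical.

```python
import re

# Hypothetical, deliberately tiny stopword lists (not from the slides);
# just enough to tell English from Spanish in short texts.
STOPWORDS = {
    "en": {"the", "of", "and", "is", "a", "in", "to", "with"},
    "es": {"el", "la", "los", "de", "y", "es", "una", "en", "con", "por"},
}

def split_sentences(text):
    """Naive sentence segmentation: split after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Token segmentation: lowercase word tokens; accented letters stay intact."""
    return re.findall(r"\w+", sentence.lower())

def guess_language(text):
    """Language recognition by counting stopword hits per language."""
    tokens = tokenize(text)
    scores = {lang: sum(t in words for t in tokens)
              for lang, words in STOPWORDS.items()}
    return max(scores, key=scores.get)

if __name__ == "__main__":
    text = "Dolphins are smart animals. Humans and monkeys have similar characteristics."
    for sentence in split_sentences(text):
        print(guess_language(sentence), tokenize(sentence))
```

A real pipeline would replace all three functions with the parser's own segmentation and language identification; the sketch only shows where each step sits.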

Conceptualization extraction: concept extraction
Text mapping of terms represented as classes or instances in the OWL ontology. Use of predefined patterns applied to noun or adjective groups (patterns were defined from the DBpedia ontology).

Patterns     Examples
NN           processes
Z NN         Five files
NNP          Ernest Hemingway
JJ           impressive
JJ NN        Informatics system
JJ JJ NN     Operative logical processor
NN NNP       Chief Alfred
Z JJ NN      Three efficient methods

Legend: NN: common noun; NNP: proper noun; Z: number or numeral; JJ: adjective
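
The pattern table can be read as a set of POS-tag sequences to match against tagged tokens. The following is a minimal sketch, assuming the input is already a flat list of (word, tag) pairs; the slides apply the patterns to noun and adjective groups produced by the parser, so the flat list, the longest-match strategy, and the function name `extract_concepts` are assumptions made for illustration.

```python
# POS patterns taken from the table above; everything else is illustrative.
PATTERNS = [
    ("NN",), ("Z", "NN"), ("NNP",), ("JJ",),
    ("JJ", "NN"), ("JJ", "JJ", "NN"), ("NN", "NNP"), ("Z", "JJ", "NN"),
]

def extract_concepts(tagged):
    """Scan left to right and keep the longest pattern matching at each position.

    `tagged` is a list of (word, tag) pairs, e.g. [("Five", "Z"), ("files", "NN")].
    """
    longest_first = sorted(PATTERNS, key=len, reverse=True)
    concepts = []
    i = 0
    while i < len(tagged):
        for pattern in longest_first:
            window = tagged[i:i + len(pattern)]
            tags = tuple(tag for _, tag in window)
            if tags == pattern:
                concepts.append(" ".join(word for word, _ in window))
                i += len(pattern)  # skip past the matched tokens
                break
        else:
            i += 1  # no pattern starts here; move on
    return concepts

if __name__ == "__main__":
    sentence = [("Three", "Z"), ("efficient", "JJ"), ("methods", "NN"),
                ("process", "VB"), ("five", "Z"), ("files", "NN")]
    print(extract_concepts(sentence))
```

Preferring the longest match keeps "Three efficient methods" together instead of emitting "efficient" and "methods" as separate concepts.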

Conceptualization extraction: relations extraction (explicit)
Identification of explicit relations in the dependency tree. Identification of concept nodes and the linking phrase. Identification of different kinds of relations (mainly verbal and prepositional).

Patterns                                     Examples
SN0 'such as' {SN1, SN2, ..., (and|or)} SNn  ...animals such as dog and bird.
SN1 'is a kind of' SN0                       concept map is a kind of concept graph
SN1 'is a' SN0                               dolphins are smart animals
SN {,} 'has' {SN,}+{and|or} SN               ...humans and monkeys have similar characteristics

Legend: SN: noun group; +: concatenation of elements; |: disjunction of elements; { }: optional elements; ( ): group of elements
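
Two of the lexical patterns above ('such as' and 'is a kind of') can be approximated with string matching. This is a rough sketch, not the method from the slides: it works on raw text instead of noun groups in a dependency tree, it takes the last word before 'such as' as SN0, and the function names are hypothetical.

```python
import re

# Separators between enumerated items: commas and/or the words "and" / "or".
_ITEM_SEP = re.compile(r"\s*,\s*(?:(?:and|or)\s+)?|\s+(?:and|or)\s+")

def such_as_relations(sentence):
    """SN0 'such as' SN1, ..., SNn  ->  (SN0, 'such as', SNi) for each item."""
    text = sentence.strip().rstrip(".")
    if " such as " not in text:
        return []
    left, right = text.split(" such as ", 1)
    head = left.split()[-1]  # assumption: SN0 is the single word before 'such as'
    items = [item.strip() for item in _ITEM_SEP.split(right)]
    return [(head, "such as", item) for item in items if item]

def kind_of_relation(sentence):
    """SN1 'is a kind of' SN0  ->  [(SN1, 'is a kind of', SN0)] or []."""
    text = sentence.strip().rstrip(".")
    match = re.match(r"(.+?) is a kind of (.+)", text)
    if not match:
        return []
    return [(match.group(1), "is a kind of", match.group(2))]

def extract_relations(sentence):
    """Try each pattern in turn and collect the resulting propositions."""
    return kind_of_relation(sentence) + such_as_relations(sentence)

if __name__ == "__main__":
    print(extract_relations("animals such as dog and bird."))
    print(extract_relations("concept map is a kind of concept graph"))
```

Each result is a (source concept, linking phrase, target concept) proposition, which is the shape a concept map needs.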

Conceptualization extraction: relations extraction (implicit)
Identification of implicit relations through the relations in the OWL ontology. Identification of a taxonomic relation between concepts C0 and C1 if both are ontological classes. Identification of other semantic relations if C0 and C1 are related by an object property. Identification of class-instance relations.
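
The three kinds of implicit relation can be illustrated with a toy ontology held in dictionaries. This is a sketch under assumptions: the ontology content below is invented for the example, a taxonomic relation is reported only when one class is an ancestor of the other in the hierarchy, and a real implementation would query the OWL ontology instead.

```python
# Toy ontology (hypothetical data, not from the slides).
SUBCLASS_OF = {"dolphin": "mammal", "mammal": "animal"}   # class -> parent class
OBJECT_PROPERTIES = {("writer", "book"): "author of"}     # (domain, range) -> property
INSTANCE_OF = {"Ernest Hemingway": "writer"}              # instance -> class

def is_subclass(child, ancestor):
    """True if `ancestor` is reachable from `child` through SUBCLASS_OF."""
    seen = set()
    while child in SUBCLASS_OF and child not in seen:
        seen.add(child)  # guard against cycles in the hierarchy
        child = SUBCLASS_OF[child]
        if child == ancestor:
            return True
    return False

def implicit_relations(c0, c1):
    """Relations between two extracted concepts that the ontology implies."""
    relations = []
    if is_subclass(c0, c1):
        relations.append((c0, "subClassOf", c1))   # taxonomic relation
    if is_subclass(c1, c0):
        relations.append((c1, "subClassOf", c0))
    if (c0, c1) in OBJECT_PROPERTIES:
        relations.append((c0, OBJECT_PROPERTIES[(c0, c1)], c1))  # object property
    if INSTANCE_OF.get(c0) == c1:
        relations.append((c0, "type", c1))         # class-instance relation
    return relations

if __name__ == "__main__":
    print(implicit_relations("dolphin", "animal"))
    print(implicit_relations("writer", "book"))
    print(implicit_relations("Ernest Hemingway", "writer"))
```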

RDF model generation
Every relation is encoded as a triple in the RDF data model. The source concept is encoded as the subject, the target concept as the object, and the linking phrase as the predicate. URIs in the triples are built from URLs of the processed HTML file: the URL of the file itself, as well as those of the hyperlinks attached to terms represented as concepts.
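
The mapping from propositions to triples can be sketched as follows. Several things here are assumptions rather than the slides' method: the output is N-Triples (the slides show RDF/XML), URIs are formed as the document URL plus a fragment, the base URL `http://example.org/doc.html` is a placeholder, and the function names are hypothetical.

```python
from urllib.parse import quote

def concept_uri(label, base_url, hyperlinks=None):
    """Build a URI for a concept or linking phrase.

    If the term carried a hyperlink in the source HTML, reuse that URL;
    otherwise derive a fragment URI from the document's own URL.
    """
    if hyperlinks and label in hyperlinks:
        return hyperlinks[label]
    return f"{base_url}#{quote(label.replace(' ', '_'))}"

def to_ntriples(propositions, base_url, hyperlinks=None):
    """Encode (source, linking phrase, target) as subject, predicate, object."""
    lines = []
    for source, link, target in propositions:
        subject = concept_uri(source, base_url, hyperlinks)
        predicate = concept_uri(link, base_url)
        obj = concept_uri(target, base_url, hyperlinks)
        lines.append(f"<{subject}> <{predicate}> <{obj}> .")
    return lines

if __name__ == "__main__":
    props = [("concept map", "is a kind of", "concept graph")]
    for line in to_ntriples(props, "http://example.org/doc.html"):
        print(line)
```

Percent-encoding the label keeps accented Spanish lemmas valid inside a URI; an IRI-aware serializer could keep them unescaped instead.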

Results and discussion
Evaluation focused on the amount and quality of the information extracted from text (concepts and relations), with and without the ontology. Abstracts from the corpora DBPedia_ES and DBPedia_EN: http://wiki.dbpedia.org/Datasets and IA: http://azouaq.athabascua.ca/goldstandards.htm

Characterization of test collections (columns: DBpedia_ES, DBpedia_EN, IA):
Language: Spanish; English
Documents: 50; 6
Word average: 75.9; 75.2; 1111.83
Sentence average: 3.0; 3.4; 46.83

Results and discussion
The method was tested with and without the ontological model. DBPedia_EN and IA were tested with the available ontology. Evaluation was carried out by human evaluators, considering aspects such as: amount of information extracted, contribution of the ontology, and accuracy of the concepts and relations extracted. Correct concept: nouns or adjectives with important meaning in the text, names of entities, or meaningful phrases. Correct proposition: one that can be interpreted on its own (e.g., when both concepts are correct and the link is well defined).

Results and discussion
From the information provided by the evaluators, accuracy was calculated as the ratio between the correctly extracted items and the total extracted.

Aspects   DBpedia_ES   DBpedia_EN NoOnt   DBpedia_EN YesOnt   IA NoOnt   IA YesOnt
AC        7.88         10.06              10.2                112.4      120.2
AE        2.2          4.86               5.08                29.2       36.2
AR        6.10         8.38               8.48                103        114.8
PC        93.66        89.21              92.58               96.81      98.57
PR        53.92        51.26              53.97               66.67      83.10

Ontology-derived averages (values as given): ACOnt.: -; 3.72; 37.2. AROnt.: 8.6.

Legend: AC: Average amount of concepts; AE: Average amount of entities; AR: Average amount of relations; ACOnt.: Average amount of concepts obtained from the ontology; AROnt.: Average amount of relations identified in the ontology; PC: Precision in concept identification; PR: Precision in relation identification; NoOnt: not using the ontology; YesOnt: using the ontology.
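
The ratio described above is a plain precision computation. A minimal sketch follows; the function name and the counts used in the example are hypothetical and are not taken from the evaluation.

```python
def precision(correct, total):
    """Percentage of extracted items that evaluators judged correct."""
    if total == 0:
        raise ValueError("no items were extracted, precision is undefined")
    return 100.0 * correct / total

if __name__ == "__main__":
    # Hypothetical counts: 9 of 10 extracted concepts judged correct.
    print(precision(9, 10))
```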

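Use case
To illustrate the method, an abstract from DBpedia_ES is selected. The concept map is then built from it, and finally the RDF model.
Source text: "La bronquitis crónica es una enfermedad inflamatoria de los bronquios respiratorios asociada con exposición prolongada a irritantes respiratorios no específicos, incluyendo microorganismos y acompañado por hipersecreción de moco y ciertas alteraciones estructurales en el bronquio, tales como fibrosis, descamación celular, hiperplasia de la musculatura lisa, etc."
English translation: "Chronic bronchitis is an inflammatory disease of the respiratory bronchi associated with prolonged exposure to non-specific respiratory irritants, including microorganisms, and accompanied by mucus hypersecretion and certain structural alterations in the bronchus, such as fibrosis, cellular desquamation, hyperplasia of the smooth musculature, etc."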

Use case
Concept map created automatically from the previous sentence: (figure not reproduced in the transcript)

Use case
RDF model obtained automatically by the method:
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:daml="http://www.daml.org/2001/03/daml+oil#">
  <rdf:Description rdf:about="owl:hiperplasia_de_musculatura_liso">
    <rdfs:subClassOf rdf:resource="owl:alteración_estructural"/>
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Class"/>
  </rdf:Description>
  <rdf:Description rdf:about="owl:irritante_respiratorio_específico"/>
  <rdf:Description rdf:about="owl:bronquitis_crónico">
    <rdfs:subClassOf rdf:resource="owl:enfermedad_inflamatorio_de_bronquio_respiratorio"/>
  </rdf:Description>
  <rdf:Description rdf:about="owl:microorganismo">
    <rdfs:subClassOf rdf:resource="owl:irritante_respiratorio_específico"/>
  </rdf:Description>
  <rdf:Description rdf:about="owl:fibrosis"/>
  <rdf:Description rdf:about="owl:descamación_celular"/>
  <rdf:Description rdf:about="owl:enfermedad_inflamatorio_de_bronquio_respiratorio"/>
  <rdf:Description rdf:about="owl:alteración_estructural"/>
</rdf:RDF>

Conclusions
A method is presented to extract Linked Data from unstructured texts in English and Spanish; it is not domain dependent. NLP techniques for text processing and information extraction are combined, which allows more information to be obtained. Linguistic patterns are used to identify concepts and the semantic relations between them, together with dependency parsing, which increases text coverage. The extracted concepts, including entities, and the semantic links identified between them enrich the Linked Data.

Conclusions
Tests show promising results, with precision over 90% in concept identification and over 50% in relation identification. The benefits of using an ontological reference model are shown in the amount of information extracted and in the improvement in accuracy.