CORPORUM-OntoExtract Ontology Extraction Tool Author: Robert Engels Company: CognIT a.s.


Overview
1. On-To-Knowledge project
2. CORPORUM
3. CORPORUM-OntoExtract
4. Discussion
5. Conclusion

What is Knowledge Management?
Knowledge Management is the collection of processes that govern the creation, dissemination, and utilization of knowledge. --- Brian Newman, 1991

What is the On-To-Knowledge (OTK) project?
Goal: develop tools and methods that support knowledge management by relying on sharable and reusable knowledge ontologies.
The technical backbone of On-To-Knowledge is the use of ontologies for the various tasks of information integration and mediation.

What is the On-To-Knowledge (OTK) project?
- European project in the EU Information Society Technologies (IST) Program: EU-IST
- Duration: 2.5 years, January June 2002
- Total effort and cost: 26 person-years, 2.5+ M EUR
- Partners:
  1. CognIT a.s
  2. AIdministrator
  3. AIFB (University of Karlsruhe)
  4. BT Research
  5. Enersearch
  6. Swiss Life Information Systems Research Group

CognIT a.s
Established in Halden, Norway in employees - 3 with PhD
CORPORUM™
Develops technology for:
1. intelligent search by means of agents
2. text analysis and extraction
3. structuring and fusing data to build knowledge
4. knowledge bases and feedback of experience
5. data mining and text mining

On-To-Knowledge workbench
- CORPORUM-OntoExtract: extracts ontologies from unstructured documents and represents them in XML/RDF/OWL
- CORPORUM-OntoWrapper: extracts ontologies from structured documents and represents them in XML/RDF/OWL
- RDF-DB (Sesame)
- RDF-Ferret: interface between users and RQL
- OntoEdit (ontology editor)
- RQL engine: queries RDF-DB
- DAML+OIL: representation language

The OnToKnowledge system architecture

Introduction of CORPORUM
CORPORUM is a tool for information retrieval and extraction developed by CognIT a.s.
Features:
- crawls the internet and intranet, analyzing relevance and content
- maintains a knowledge base (RDF-DB)
- focuses on the content
- searches, cataloguing, summaries and extractions can be performed according to user interests
- founded on CognIT's Mimir technology

The overall CORPORUM architecture

Introduction of CORPORUM
The core technology, MIMIR, includes:
- Linguistic analysis at all levels, generating an ontology of interest to the user in RDF.
- Similarity analysis: obtains the documents that are most pertinent to a specific analyzed text.
(information retrieval and extraction)

“Classical” Natural Language processing decomposed.

Mimir architecture

Introduction of CORPORUM
Information distribution: a histogram showing where the desired content can be found in the document and to what degree it is pertinent.
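The slide does not say how Mimir computes this histogram. As a rough illustration only, the sketch below swaps in a much simpler technique: it splits a document into equal-sized segments and counts how often the terms of interest occur in each. The function name `relevance_histogram` and its parameters are hypothetical and are not part of CORPORUM.

```python
from typing import List


def relevance_histogram(text: str, terms: List[str], bins: int = 5) -> List[float]:
    """Return one score per document segment, scaled so the best segment is 1.0.

    Assumption: plain term counting stands in for Mimir's (undocumented) analysis.
    """
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    wanted = {t.lower() for t in terms}
    counts = [0] * bins
    for i, word in enumerate(words):
        if word in wanted:
            # Map the word position to the segment (bin) it falls into.
            counts[i * bins // len(words)] += 1
    peak = max(counts) or 1  # avoid division by zero when nothing matches
    return [c / peak for c in counts]
```

Each value indicates how strongly one segment of the document matches the desired content, which is the kind of information the histogram on the slide conveys.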

CORPORUM-OntoExtract:
- The web-based version of CORPORUM
- Uses the same architecture as CORPORUM
- Extracts ontologies from unstructured web pages
- Represents extracted ontologies in XML/RDF/OIL

CORPORUM-OntoExtract:
- CMOntoBuild: takes care of overall control of the system and co-ordinates all information flows
- CMWebHandler: responsible for collecting all (text) documents from a specific site
- CMCogLib: analyses texts, extracts information, and exports a variety of formats
- CMLexEn: language-dependent support module for CMCogLib
- CMWebInteract: communication component that takes care of all interaction of CORPORUM-OntoExtract with the RDF database. It is responsible for querying the RDF-DB as well as submitting final analysis results.
- DOMhandler: integrated in CMWebInteract; the OpenXML DOM handler interprets the results returned from the RDF server

CORPORUM-OntoExtract performs the following tasks:
1. CMOntoBuild is invoked by the user.
2. CMWebHandler is invoked by CMOntoBuild.
3. CMWebHandler retrieves the specified domain from the intranet/internet and returns it to CMOntoBuild.
4. CMOntoBuild passes texts to CMCogLib, which analyses, interprets and extracts information from these texts and returns a basic RDF representation to CMOntoBuild.
5. CMOntoBuild then analyses the generated RDF and queries the RDF ontology repository to try to find knowledge that can augment the previously generated RDF.
6. When all possible querying is done and the RDF is augmented, the final RDF ontology for a specific document is sent to the RDF server together with a reference to the original text.
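The control flow described above can be sketched in a few lines of Python. This is only an illustration of the order of the steps: the functions below are hypothetical stand-ins named after the roles of the modules, the "extraction" is a toy rule (capitalised words become concepts), and the RDF server is modelled as a plain list of triples. None of it reflects the actual CORPORUM code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class ExtractedOntology:
    source: str  # reference to the original text
    triples: List[Triple] = field(default_factory=list)


def web_handler(domain: str, site: Dict[str, str]) -> Dict[str, str]:
    """Stand-in for CMWebHandler: collect all texts that belong to the domain."""
    return {url: text for url, text in site.items() if url.startswith(domain)}


def cog_lib(url: str, text: str) -> ExtractedOntology:
    """Stand-in for CMCogLib: toy analysis that treats capitalised words as concepts."""
    words = {w.strip(".,") for w in text.split()}
    concepts = sorted(w for w in words if w[:1].isupper())
    return ExtractedOntology(url, [(url, "mentions", c) for c in concepts])


def augment(onto: ExtractedOntology, repository: List[Triple]) -> ExtractedOntology:
    """Query the repository for known facts about the extracted concepts."""
    concepts = {obj for _, _, obj in onto.triples}
    known = [t for t in repository if t[0] in concepts]
    onto.triples.extend(t for t in known if t not in onto.triples)
    return onto


def onto_build(domain: str, site: Dict[str, str],
               repository: List[Triple]) -> List[ExtractedOntology]:
    """Stand-in for CMOntoBuild: coordinate retrieval, analysis, augmentation, submission."""
    results = []
    for url, text in web_handler(domain, site).items():
        onto = augment(cog_lib(url, text), repository)
        # "Send" the final result to the RDF server, keeping the source reference.
        repository.extend(t for t in onto.triples if t not in repository)
        results.append(onto)
    return results
```

Keeping the coordinator (`onto_build`) separate from retrieval and analysis mirrors the module split on the previous slide, where CMOntoBuild controls the flow and the other components do the work.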

Client/Server based System Architecture of CORPORUM-OntoExtract

The overall CORPORUM architecture

CORPORUM-OntoExtract output:
- Namespace definitions
- Dublin Core based metadata
- Property definitions
- Ontology
- Facts/instances
- Cross-taxonomic relations
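To make the six parts of the output more concrete, the sketch below assembles a small RDF/XML document with Python's standard `xml.etree.ElementTree` module. The RDF, RDFS and Dublin Core namespace URIs are the standard ones; the `ex` namespace, the `relatedTo` property, the function name `build_output` and all sample data are assumptions made for illustration. The real OntoExtract output format is not shown in the slides and may well differ.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/onto#"  # hypothetical namespace for extracted terms

# 1. Namespace definitions
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("dc", DC), ("ex", EX)):
    ET.register_namespace(prefix, uri)


def build_output(doc_url, title, classes, facts, related):
    """Assemble an illustrative RDF/XML string covering the six output parts."""
    root = ET.Element(f"{{{RDF}}}RDF")
    # 2. Dublin Core based metadata about the analysed document
    meta = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": doc_url})
    ET.SubElement(meta, f"{{{DC}}}title").text = title
    # 3. Property definitions (a single hypothetical property)
    ET.SubElement(root, f"{{{RDF}}}Property", {f"{{{RDF}}}about": EX + "relatedTo"})
    # 4. Ontology: classes with an optional superclass
    for name, parent in classes.items():
        cls = ET.SubElement(root, f"{{{RDFS}}}Class", {f"{{{RDF}}}about": EX + name})
        if parent:
            ET.SubElement(cls, f"{{{RDFS}}}subClassOf", {f"{{{RDF}}}resource": EX + parent})
    # 5. Facts/instances: each instance is typed by its class element
    for instance, cls_name in facts.items():
        ET.SubElement(root, f"{{{EX}}}{cls_name}", {f"{{{RDF}}}about": EX + instance})
    # 6. Cross-taxonomic relations between concepts
    for a, b in related:
        desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": EX + a})
        ET.SubElement(desc, f"{{{EX}}}relatedTo", {f"{{{RDF}}}resource": EX + b})
    return ET.tostring(root, encoding="unicode")
```

Calling `build_output("http://example.org/doc1", "Sample", {"Tool": None, "Extractor": "Tool"}, {"OntoExtract": "Extractor"}, [("Extractor", "Tool")])` returns a single RDF/XML string with one section per bullet above.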

Discussion on use of CORPORUM technology in OntoExtract
Content in natural language vs. content in structure
- CORPORUM-OntoExtract can capture content without considering the layout and structure of the texts.
- In some cases, such as contracts and licenses, the structure of the texts has to be considered.
- CORPORUM-OntoWrapper covers such structured documents.

Discussion on use of CORPORUM technology in OntoExtract
Diversity of web pages (unknown intention)
- Documents on the web are diverse.
- It is difficult to analyze a text according to the intention of its writer.
- Combining CORPORUM-OntoExtract with CORPORUM-OntoWrapper might resolve some of these issues.

Discussion on use of CORPORUM technology in OntoExtract
Representational issues (A-box vs. T-box reasoning)
- TBox: consists of (class) concept inclusion (and/or equivalence) axioms, e.g., "C subsumes D".
- ABox: consists of individual/tuple membership axioms, e.g., "x is an instance of C" or "(x, y) is an instance of R".
- Most of the knowledge generated by CORPORUM-OntoExtract is TBox knowledge.
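The TBox/ABox distinction can be illustrated with a small Python sketch that sorts simple triples into the two boxes. The predicate names (`subClassOf`, `equivalentClass`, `type`) are simplified labels chosen for this example and are not OntoExtract vocabulary. Anything that is not a terminological axiom is treated as an assertion about individuals.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Terminological predicates: statements that relate classes to classes.
TBOX_PREDICATES = {"subClassOf", "equivalentClass"}


def split_boxes(triples: List[Triple]) -> Tuple[List[Triple], List[Triple]]:
    """Separate class-level axioms (TBox) from assertions about individuals (ABox)."""
    tbox = [t for t in triples if t[1] in TBOX_PREDICATES]
    abox = [t for t in triples if t[1] not in TBOX_PREDICATES]
    return tbox, abox


triples = [
    ("Dog", "subClassOf", "Animal"),  # TBox: Animal subsumes Dog
    ("rex", "type", "Dog"),           # ABox: rex is an instance of Dog
    ("rex", "ownedBy", "anna"),       # ABox: (rex, anna) is an instance of ownedBy
]
tbox, abox = split_boxes(triples)
```

Here `tbox` holds the single subsumption axiom and `abox` holds the two assertions about `rex`.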

Discussion on use of CORPORUM technology in OntoExtract
Domain specificity of extracted knowledge
- Since the ontologies are extracted from specified domains, the extracted information is expected to be restricted to those domains.
- Positive: many searches will also be rather domain specific, so knowledge about cross-taxonomic relations might come in very handy.
- Negative: one may want to build up domain-independent knowledge bases.

Conclusion
- CORPORUM helps the web become more semantic.
- Semantic-based technology.
- Enhances the usability of formal knowledge representations for end-users
- Decreases the initial effort of defining an ontology in new domains

Conclusion
- Dynamicity of the analysis, i.e. ease of use in dynamic environments
- Offers new ways of navigating knowledge bases and document sets by visualization of contents and by means of semantic-based, graphic structures
- Extraction of content-based metadata from documents, such as important concepts, semantic structures, etc.
- Ability to offer domain-specific information as related keywords

Comments
- Description is too general.
- No examples or details.
- Weak sentences.
- Complicated sentence structures.