
CORPORUM-OntoExtract Ontology Extraction Tool Author: Robert Engels Company: CognIT a.s.



1 CORPORUM-OntoExtract Ontology Extraction Tool Author: Robert Engels Company: CognIT a.s.

2 Overview
1. On-To-Knowledge project
2. CORPORUM
3. CORPORUM-OntoExtract
4. Discussion
5. Conclusion

3 What is Knowledge Management? Knowledge Management is the collection of processes that govern the creation, dissemination, and utilization of knowledge. (Brian Newman, 1991)

4 What is the On-To-Knowledge (OTK) project? Goal: to develop tools and methods that support knowledge management based on sharable and reusable ontologies. The technical backbone of On-To-Knowledge is the use of ontologies for the various tasks of information integration and mediation.

5 What is the On-To-Knowledge (OTK) project?
- European project in the EU Information Society Technologies (IST) Program: EU-IST-10132
- Duration: 2.5 years, January 2000 to June 2002
- Total effort and cost: 26 person-years, 2.5+ million EUR
- Partners:
  1. CognIT a.s.
  2. AIdministrator
  3. AIFB (University of Karlsruhe)
  4. BT Research
  5. Enersearch
  6. Swiss Life Information Systems Research Group

6 CognIT a.s.
- Established in Halden, Norway, in 1996
- 20 employees, 3 with a PhD
- CORPORUM™
- Develops technology for:
  1. intelligent search by means of agents
  2. text analysis and extraction
  3. structuring and fusing data to build knowledge
  4. knowledge bases and feedback of experience mining and text mining

7 On-To-Knowledge workbench
- CORPORUM-OntoExtract: extracts ontologies from unstructured documents and represents them in XML/RDF/OIL
- CORPORUM-OntoWrapper: extracts ontologies from structured documents and represents them in XML/RDF/OIL
- RDF-DB (Sesame)
- RDF-Ferret: interface between users and RQL
- OntoEdit (ontology editor)
- RQL engine: queries the RDF-DB
- DAML+OIL: representation language

8 The On-To-Knowledge system architecture

9 Introduction of CORPORUM
CORPORUM is a tool for information retrieval and extraction developed by CognIT a.s. and founded on CognIT's Mimir technology. Features:
- crawls the Internet and intranets, analyzing relevance and content
- maintains a knowledge base (RDF-DB)
- focuses on content
- searches, cataloguing, summaries and extractions can be performed according to user interests

10 The overall CORPORUM architecture

11 Introduction of CORPORUM
The core technology, Mimir, includes:
- Linguistic analysis through all levels, generating an ontology of the user's interests in RDF.
- Similarity analysis: obtaining the documents that are most pertinent to a specific analyzed text (information retrieval and extraction).
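The slides do not say how Mimir's similarity analysis works internally, and they describe Mimir's analysis as linguistic. Purely to illustrate the idea of ranking documents by pertinence to an analyzed text, the sketch below substitutes plain bag-of-words cosine similarity. The function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens: a crude stand-in for real linguistic analysis."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def most_pertinent(analyzed_text: str, documents: dict[str, str], k: int = 3) -> list[tuple[str, float]]:
    """Return the k documents most similar to the analyzed text, best first."""
    query = Counter(tokenize(analyzed_text))
    scores = [(name, cosine(query, Counter(tokenize(body)))) for name, body in documents.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical sample collection.
docs = {
    "ontology.txt": "ontology extraction from web pages using RDF",
    "energy.txt": "energy market prices and power trading",
    "km.txt": "knowledge management with shared ontologies",
}
print(most_pertinent("extracting an ontology from web pages", docs, k=2))
```

Note that plain word matching treats "ontology" and "ontologies" as unrelated; that gap is exactly what deeper linguistic analysis is meant to close.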

12 “Classical” natural language processing, decomposed

13 Mimir architecture

14 Introduction of CORPORUM: Information distribution
A histogram showing where the desired content can be found in the document and to what degree it is pertinent.
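The slides do not explain how CORPORUM computes this histogram. As an assumed stand-in, the sketch below splits a document into equal-sized stretches of words and measures, for each stretch, the share of words that belong to a set of topic terms. The topic-term set, the number of bins and the function names are all hypothetical.

```python
import re


def information_distribution(text: str, topic_terms: set[str], n_bins: int = 5) -> list[float]:
    """Split the text into n_bins equal word spans and return, per span,
    the fraction of words that are topic terms (0.0 to 1.0)."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return [0.0] * n_bins
    bins = []
    for i in range(n_bins):
        start = i * len(words) // n_bins
        end = (i + 1) * len(words) // n_bins
        span = words[start:end]
        hits = sum(1 for w in span if w in topic_terms)
        bins.append(hits / len(span) if span else 0.0)
    return bins


def render(bins: list[float], width: int = 20) -> str:
    """Draw one text bar per span, like a horizontal histogram."""
    return "\n".join(f"{i + 1:2d} {'#' * round(b * width)}" for i, b in enumerate(bins))


# Hypothetical document: off-topic text first, on-topic text later.
sample = "the weather was fine " * 3 + "ontology extraction builds an ontology " * 3
bins = information_distribution(sample, {"ontology", "extraction"}, n_bins=3)
print(render(bins))
```

The bars grow toward the end of the sample, which is the kind of "where is the relevant content" view the slide describes.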

15 CORPORUM-OntoExtract
- The web-based version of CORPORUM
- Uses the same architecture as CORPORUM
- Extracts ontologies from unstructured web pages
- Represents the extracted ontologies in XML/RDF/OIL

16 CORPORUM-OntoExtract components:
- CMOntoBuild: takes care of overall control of the system and coordinates all information flows
- CMWebHandler: responsible for collecting all (text) documents from a specific site
- CMCogLib: analyses texts, extracts information, and exports a variety of formats
- CMLexEn: language-dependent support module for CMCogLib
- CMWebInteract: communication component that handles all interaction of CORPORUM-OntoExtract with the RDF database; responsible for querying the RDF-DB and for submitting the final analysis results
- DOMhandler: integrated in CMWebInteract, the OpenXML DOM handler interprets the results returned from the RDF server

17 CORPORUM-OntoExtract performs the following tasks:
1. The user invokes CMOntoBuild.
2. CMOntoBuild invokes CMWebHandler.
3. CMWebHandler retrieves the specified domain from the intranet/Internet and returns it to CMOntoBuild.
4. CMOntoBuild passes the texts to CMCogLib, which analyses and interprets them, extracts information, and returns a basic RDF representation to CMOntoBuild.
5. CMOntoBuild analyses the generated RDF and queries the RDF ontology repository for knowledge that can augment it.
6. When all possible queries have been performed and the RDF has been augmented, the final RDF ontology for each document is sent to the RDF server together with a reference to the original text.
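The task sequence above can be sketched as a small control loop. This is a minimal sketch under heavy assumptions: the classes only borrow the component names from the slides, the "web" and the RDF server are in-memory dictionaries, and the toy analyser (one triple per capitalised word) stands in for CMCogLib's real linguistic analysis. None of the method names are CORPORUM's actual API.

```python
from dataclasses import dataclass, field

# A triple is (subject, predicate, object); plain strings keep the sketch simple.
Triple = tuple[str, str, str]


class CMWebHandler:
    """Hypothetical stand-in: returns the text documents of a domain."""

    def __init__(self, web: dict[str, dict[str, str]]) -> None:
        self.web = web  # domain -> {url: text}; an in-memory fake of the intra/internet

    def retrieve(self, domain: str) -> dict[str, str]:
        return self.web.get(domain, {})


class CMCogLib:
    """Hypothetical stand-in for the analyser. As a toy placeholder for real
    linguistic analysis, it emits one 'mentions' triple per capitalised word."""

    def analyse(self, url: str, text: str) -> list[Triple]:
        words = (w.strip(".,;:") for w in text.split())
        return [(url, "mentions", w) for w in words if w and w[0].isupper()]


@dataclass
class RdfRepository:
    """Hypothetical in-memory stand-in for the RDF server (RDF-DB)."""

    background: set[Triple] = field(default_factory=set)  # previously stored knowledge
    ontologies: dict[str, list[Triple]] = field(default_factory=dict)  # source URL -> final RDF

    def query(self, concept: str) -> list[Triple]:
        """Return stored triples whose subject is the given concept."""
        return sorted(t for t in self.background if t[0] == concept)

    def submit(self, source_url: str, triples: list[Triple]) -> None:
        """Store a document's final ontology, keyed by a reference to the original text."""
        self.ontologies[source_url] = triples


class CMOntoBuild:
    """Hypothetical controller coordinating the information flow."""

    def __init__(self, web_handler: CMWebHandler, coglib: CMCogLib, repository: RdfRepository) -> None:
        self.web_handler = web_handler
        self.coglib = coglib
        self.repository = repository

    def run(self, domain: str) -> None:
        # Ask the web handler for all texts of the specified domain.
        documents = self.web_handler.retrieve(domain)
        for url, text in documents.items():
            # Basic RDF representation from the text analyser.
            basic = self.coglib.analyse(url, text)
            # Augment it with whatever the repository already knows about each concept.
            augmented = list(basic)
            for _, _, concept in basic:
                augmented.extend(self.repository.query(concept))
            # Send the final RDF, keyed by the source URL, to the RDF server.
            self.repository.submit(url, augmented)


web = {"example.org": {"http://example.org/a": "Sesame stores RDF produced by OntoExtract."}}
repo = RdfRepository(background={("Sesame", "isA", "RDFRepository")})
CMOntoBuild(CMWebHandler(web), CMCogLib(), repo).run("example.org")  # the user invokes CMOntoBuild
for triple in repo.ontologies["http://example.org/a"]:
    print(triple)
```

Keeping CMOntoBuild as the only component that talks to all the others mirrors the slide's description of it as the coordinator of all information flows.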

18 Client/Server based System Architecture of CORPORUM-OntoExtract

19 The overall CORPORUM architecture

20 CORPORUM-OntoExtract output:
- Namespace definitions
- Dublin Core-based metadata
- Property definitions
- Ontology
- Facts/instances
- Cross-taxonomic relations
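The slides list only the categories of output, not its exact layout. The sketch below uses Python's standard `xml.etree.ElementTree` to build one small RDF/XML document containing each category. The RDF, RDFS and Dublin Core namespace URIs are the standard ones. The `onto` namespace, every resource name, and the `relatedTo` property used for the cross-taxonomic link are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DC = "http://purl.org/dc/elements/1.1/"
# Hypothetical namespace for the extracted ontology.
ONTO = "http://example.org/onto#"

# Namespace definitions: register prefixes so they appear as xmlns declarations.
for prefix, uri in {"rdf": RDF, "rdfs": RDFS, "dc": DC, "onto": ONTO}.items():
    ET.register_namespace(prefix, uri)


def q(ns: str, local: str) -> str:
    """Qualified name in ElementTree's {namespace}local notation."""
    return f"{{{ns}}}{local}"


def build_output(source_url: str) -> ET.Element:
    root = ET.Element(q(RDF, "RDF"))

    # Dublin Core metadata about the analysed document.
    doc = ET.SubElement(root, q(RDF, "Description"), {q(RDF, "about"): source_url})
    ET.SubElement(doc, q(DC, "title")).text = "Example page"
    ET.SubElement(doc, q(DC, "language")).text = "en"

    # Property definition with domain and range.
    prop = ET.SubElement(root, q(RDF, "Property"), {q(RDF, "about"): ONTO + "developedBy"})
    ET.SubElement(prop, q(RDFS, "domain"), {q(RDF, "resource"): ONTO + "Tool"})
    ET.SubElement(prop, q(RDFS, "range"), {q(RDF, "resource"): ONTO + "Company"})

    # Ontology: a class placed in a hierarchy.
    cls = ET.SubElement(root, q(RDFS, "Class"), {q(RDF, "about"): ONTO + "ExtractionTool"})
    ET.SubElement(cls, q(RDFS, "subClassOf"), {q(RDF, "resource"): ONTO + "Tool"})

    # Facts/instances: an individual of that class with a property value.
    inst = ET.SubElement(root, q(ONTO, "ExtractionTool"), {q(RDF, "about"): ONTO + "OntoExtract"})
    ET.SubElement(inst, q(ONTO, "developedBy"), {q(RDF, "resource"): ONTO + "CognIT"})

    # Cross-taxonomic relation: links concepts from two different hierarchies.
    rel = ET.SubElement(root, q(RDF, "Description"), {q(RDF, "about"): ONTO + "Ontology"})
    ET.SubElement(rel, q(ONTO, "relatedTo"), {q(RDF, "resource"): ONTO + "KnowledgeManagement"})
    return root


print(ET.tostring(build_output("http://example.org/page.html"), encoding="unicode"))
```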

21 Discussion on use of CORPORUM technology in OntoExtract: content in natural language vs. content in structure
- CORPORUM-OntoExtract can capture content without considering the layout and structure of the texts.
- In some cases, such as contracts and licenses, the structure of the texts has to be considered; this is the role of CORPORUM-OntoWrapper.

22 Diversity of web pages (unknown intention)
- Documents on the web are highly diverse.
- It is difficult to analyze a text according to the intention of its writer.
- Combining CORPORUM-OntoExtract with CORPORUM-OntoWrapper might resolve some of these issues.

23 Representational issues (ABox vs. TBox reasoning)
- TBox: consists of (class) concept inclusion axioms (and/or equivalence axioms), e.g., "C subsumes D".
- ABox: consists of individual and tuple membership axioms, e.g., "x is an instance of C" or "(x, y) is an instance of R".
Most of the knowledge generated by CORPORUM-OntoExtract is TBox knowledge.
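To make the TBox/ABox distinction concrete, the sketch below keeps a toy TBox of concept inclusion axioms separate from an ABox of concept and role membership assertions, and answers an ABox query by following subsumption in the TBox. The concept and individual names are illustrative assumptions, and the naive closure is not a full description-logic reasoner.

```python
# TBox: concept inclusion axioms (sub, sup), read as "sup subsumes sub".
TBOX: set[tuple[str, str]] = {
    ("ExtractionTool", "Tool"),
    ("Tool", "Artifact"),
}

# ABox: concept membership (individual, concept) and
# role membership (role, subject, object) assertions.
ABOX_CONCEPTS: set[tuple[str, str]] = {("OntoExtract", "ExtractionTool")}
ABOX_ROLES: set[tuple[str, str, str]] = {("developedBy", "OntoExtract", "CognIT")}


def subsumers(concept: str, tbox: set[tuple[str, str]]) -> set[str]:
    """All concepts that subsume `concept`, itself included (reflexive-transitive closure)."""
    result, frontier = {concept}, [concept]
    while frontier:
        current = frontier.pop()
        for sub, sup in tbox:
            if sub == current and sup not in result:
                result.add(sup)
                frontier.append(sup)
    return result


def is_instance(individual: str, concept: str,
                tbox: set[tuple[str, str]], abox_concepts: set[tuple[str, str]]) -> bool:
    """x is an instance of C if x is asserted to be in some D that C subsumes."""
    return any(concept in subsumers(d, tbox) for x, d in abox_concepts if x == individual)


print(is_instance("OntoExtract", "Artifact", TBOX, ABOX_CONCEPTS))
print(("developedBy", "OntoExtract", "CognIT") in ABOX_ROLES)
```

Because OntoExtract mostly generates TBox knowledge, most of its output would correspond to entries like those in `TBOX`, with comparatively few ABox assertions.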

24 Domain specificity of extracted knowledge
Since the ontologies are extracted from specified domains, the extracted information is expected to be restricted to these domains.
- Positive: many searches will also be rather domain-specific, so knowledge about cross-taxonomic relations may prove very useful.
- Negative: one may want to build up domain-independent knowledge bases.

25 Conclusion
CORPORUM helps the web become more semantic. It is a semantics-based technology that can:
- enhance the usability of formal knowledge representations for end-users
- decrease the initial effort of defining an ontology in a new domain

26 Conclusion (continued)
- Dynamicity of the analysis, i.e., ease of use in dynamic environments
- New ways of navigating knowledge bases and document sets by visualizing contents with semantics-based graphic structures
- Extraction of content-based metadata from documents, such as important concepts and semantic structures
- Ability to offer domain-specific information as related keywords

27 Comments
- The description is too general.
- No examples or details.
- Weak sentences.
- Complicated sentence structures.

