Ontology-Driven Data Preparation for Data Mining
Martin Zeman, KSI MFF UK
Martin Ralbovský, KIZI FIS VŠE



Possible usage of domain ontologies in the KDD process
  Knowledge discovery vs. knowledge storage
  Data understanding phase
    Knowledge from the ontology helps to comprehend the domain
  Task design phase
    Define meaningful tasks with the aid of the ontology
  Result interpretation phase
    How do the KDD results compare with the knowledge in the ontology?

Previous work
  Theoretically: high (methodology)
  Practically: low (manual experiments, no real software support)
  Main goal: software support for some of the ontology-based ideas
  Implementation platform: Ferda

How to load the ontology?
  1st problem: how to load the ontology?
  Ontology language: OWL 1.1
  Available software: OWL API
  Technical situation
    Ferda: .NET + ICE middleware
    OWL API: Java

How to load the ontology?
  [Architecture diagram. Labels: Ontology Module, OWL API, Java, Ontology Box, .NET, ICE, Box API]

Mapping
  2nd problem: how to connect the ontology and the database?
  Columns, table or database
  Classes and instances
  Mapping relation: 1:N, M:1, or M:N?
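The slide leaves the cardinality of the mapping open. As a minimal sketch, one option is to store the mapping as many-to-many, since that form also covers the 1:N and M:1 cases. The class name `OntologyMapping`, its methods, and the table and column names in the usage lines are all hypothetical and do not come from Ferda or from the slides.

```python
from collections import defaultdict


class OntologyMapping:
    """Hypothetical M:N mapping between ontology entities and database columns."""

    def __init__(self):
        # entity name -> set of (table, column) pairs
        self._columns_by_entity = defaultdict(set)
        # (table, column) pair -> set of entity names
        self._entities_by_column = defaultdict(set)

    def link(self, entity, table, column):
        """Record that an ontology entity corresponds to a database column."""
        key = (table, column)
        self._columns_by_entity[entity].add(key)
        self._entities_by_column[key].add(entity)

    def columns_for(self, entity):
        """All (table, column) pairs mapped to the given ontology entity."""
        return sorted(self._columns_by_entity.get(entity, ()))

    def entities_for(self, table, column):
        """All ontology entities mapped to the given column."""
        return sorted(self._entities_by_column.get((table, column), ()))


# Illustrative names only (assumed, not taken from the slides).
mapping = OntologyMapping()
mapping.link("BloodPressure", "patients", "systolic_bp")
mapping.link("BloodPressure", "patients", "diastolic_bp")
mapping.link("DiastolicBloodPressure", "patients", "diastolic_bp")
```

Looking up `mapping.columns_for("BloodPressure")` returns both columns, while `mapping.entities_for("patients", "diastolic_bp")` returns both entities, so the same structure answers the question in either direction.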

Creation of attributes
  Proper categorization of domains is a crucial step for successful KDD (not only in GUHA)
  Example: blood pressure above 140/90 mm Hg is considered hypertension
  Is categorization information available in the ontology?

Additional information
  Cardinality (nominal / ordinal / ordinal cyclic / cardinal)
  Maximum
  Minimum
  Domain dividing values
  Distinct values
  Saving information to the ontology
    Datatype properties
    Domain: metaclass owl:Class
  Advantages
    Inherent part of the domain
    Reusability
    Not restricted to KDD (GUHA)
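The five items of additional information can be pictured as one record per ontology class. The sketch below is only an in-memory illustration of that record; the names `Cardinality` and `AdditionalInfo`, the field names, and the numeric values are assumptions, and the slides store this information as OWL datatype properties rather than as Python objects.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Cardinality(Enum):
    """The four cardinality types listed on the slide."""
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    ORDINAL_CYCLIC = "ordinal cyclic"
    CARDINAL = "cardinal"


@dataclass
class AdditionalInfo:
    """Categorization hints for one ontology class (hypothetical layout)."""
    cardinality: Cardinality
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    domain_dividing_values: List[float] = field(default_factory=list)
    distinct_values: List[float] = field(default_factory=list)

    def __post_init__(self):
        # Keep dividing values sorted so they can serve as interval borders.
        self.domain_dividing_values = sorted(self.domain_dividing_values)
        # Reject an inverted range early.
        if (self.minimum is not None and self.maximum is not None
                and self.minimum > self.maximum):
            raise ValueError("minimum must not exceed maximum")


# Illustrative values only; they are not taken from the slides.
diastolic = AdditionalInfo(
    cardinality=Cardinality.CARDINAL,
    minimum=40,
    maximum=150,
    domain_dividing_values=[90, 60],
)
```

Keeping the hints in one record per class mirrors the reusability argument on the slide: any tool that reads the ontology can use the same record, not only a KDD tool.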

Diastolic blood pressure

Attribute creation algorithm
  IF (cardinality == nominal OR cardinality == ordinal cyclic)
    each value one category
    return
  ELSE IF (count of categories <= 5)
    each value one category
    return
  ELSE
    find the domain range (minimum, maximum)
    IF (exist domain dividing values)
      split according to domain dividing values
    IF (exist distinct values)
      create category for each distinct value
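The pseudocode can be sketched in Python as follows. This is one possible reading under stated assumptions, not the Ferda implementation: the function name `create_categories` and its parameters are hypothetical, "count of categories" is interpreted as the number of distinct values observed in the column, the range falls back to the observed minimum and maximum when the ontology gives none, and intervals are returned as plain (low, high) border pairs without deciding which ends are closed.

```python
from typing import List, Optional, Sequence


def create_categories(
    cardinality: str,
    values: Sequence,
    minimum: Optional[float] = None,
    maximum: Optional[float] = None,
    dividing_values: Sequence[float] = (),
    distinct_values: Sequence[float] = (),
    max_enumerated: int = 5,
) -> List:
    """Return categories as single values or (low, high) border pairs."""
    observed = sorted(set(values))

    # Nominal and ordinal cyclic domains: each value is one category.
    if cardinality in ("nominal", "ordinal cyclic"):
        return list(observed)

    # Assumption: "count of categories" means the number of distinct values.
    if len(observed) <= max_enumerated:
        return list(observed)

    # Find the domain range; fall back to observed data (assumption).
    low = minimum if minimum is not None else observed[0]
    high = maximum if maximum is not None else observed[-1]

    categories: List = []
    if dividing_values:
        # Split the range at every dividing value that lies strictly inside it.
        inner = sorted(v for v in dividing_values if low < v < high)
        borders = [low] + inner + [high]
        categories.extend(zip(borders, borders[1:]))
    if distinct_values:
        # Each distinct value gets its own category.
        categories.extend(distinct_values)
    return categories


readings = [55, 62, 70, 78, 85, 92, 101, 110]  # illustrative data only
intervals = create_categories(
    "cardinal", readings, minimum=40, maximum=150, dividing_values=[60, 90]
)
```

With these illustrative inputs, `intervals` holds three border pairs: (40, 60), (60, 90), and (90, 150). When neither dividing values nor distinct values are available, the sketch returns an empty list, because the pseudocode does not specify a fallback for that case.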

Identification of semantically related attributes
  Analytical question: “What is the relation between blood pressure levels and hypertension?”
  What are the attributes corresponding to blood pressure/hypertension?
  The "boxes asking for creation" mechanism can help
  Experiment

Conclusions
  Implemented support for:
    Mapping ontology and database concepts
    Semi-automatic creation of a proper categorization
    Identification of related attributes