Planning to Learn with a Knowledge Discovery Ontology Monika Žáková, Petr Křemen, Filip Železný (Czech Technical University, Prague) Nada Lavrač (Institute.

Presentation transcript:

Planning to Learn with a Knowledge Discovery Ontology Monika Žáková, Petr Křemen, Filip Železný (Czech Technical University, Prague) Nada Lavrač (Institute Jozef Stefan, Ljubljana)

Motivation
FP6 SEVENPRO project: “semantic engineering environment”
- integration of knowledge from various sources (e.g. different CAD software, ERP) by means of a layer of semantic annotations
- a significant part of engineering knowledge has a rich relational structure (CAD designs, documents, simulation models, ERP databases), so traditional ML techniques and tools are unsuitable
Goals:
- making implicit knowledge, contained e.g. in CAD designs, explicit for reuse, training and quality control
- developing a tool for relational data mining (RDM) capable of dealing with semantic annotations and producing results in a semantic format

Design Example

Example
Taxonomy in the CAD ontology, declared in background knowledge:
    subclass(prismSolFeature, solidExtrude).
    hasFeature(B, F1) :- hasFeature(B, F2), subclassTC(F1, F2).
Problem with subsumption:
    C = liner(P) :- hasBody(P, B), hasFeature(B, prismSolFeature).
    D = liner(P) :- hasBody(P, B), hasFeature(B, solidExtrude).
- subsumption does not hold between C and D, so clause D is not obtained by applying a specialization refinement operator to clause C
Our approach: extend the refinement operator with taxonomies on predicates and terms
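The subsumption problem on this slide can be illustrated with a small Python sketch. It is not the authors' implementation: it assumes that `subclass(prismSolFeature, solidExtrude)` reads "prismSolFeature is a subclass of solidExtrude", and it compares the two clause bodies literal by literal without searching for a variable substitution, which is enough here because C and D use the same variables. The function names are hypothetical.

```python
# Direct subclass pairs (subclass, superclass), taken from the slide.
SUBCLASS = {("prismSolFeature", "solidExtrude")}

def subclass_tc(pairs):
    """Reflexive-transitive closure of the direct subclass relation."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    terms = {t for pair in pairs for t in pair}
    return closure | {(t, t) for t in terms}

def literal_covers(general, specific, tc):
    """A literal covers another if predicates match and each argument is
    identical or a (transitive) superclass of the other literal's argument."""
    pred_g, args_g = general
    pred_s, args_s = specific
    return (pred_g == pred_s and len(args_g) == len(args_s)
            and all(g == s or (s, g) in tc for g, s in zip(args_g, args_s)))

def body_covers(general_body, specific_body, tc):
    """Every literal of the general body must cover some literal of the specific body."""
    return all(any(literal_covers(g, s, tc) for s in specific_body)
               for g in general_body)

C = [("hasBody", ("P", "B")), ("hasFeature", ("B", "prismSolFeature"))]
D = [("hasBody", ("P", "B")), ("hasFeature", ("B", "solidExtrude"))]

plain = body_covers(D, C, set())                      # no taxonomy: purely syntactic
with_taxonomy = body_covers(D, C, subclass_tc(SUBCLASS))
```

Without the taxonomy the two bodies are unrelated; once the closure of `subclass` is consulted, D is recognised as the more general clause, which is the relation the extended refinement operator is meant to exploit.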

Sorted Refinement
Downward $\Delta,\Sigma$-refinement
- extension of sorted refinement proposed by Frisch
- defined using 3 refinement rules:
  1. adding a literal to the conjunction
  2. replacing a sort in a literal $\mathit{pred}_1(x_1{:}\tau_1, \dots, x_n{:}\tau_n)$ with one of its direct subsorts, giving $\mathit{pred}_1(x_1{:}\tau_1', \dots, x_n{:}\tau_n)$
  3. replacing a literal $\mathit{pred}_1(x_1{:}\tau_1, \dots, x_n{:}\tau_n)$ with one of its direct subrelations $\mathit{pred}_2(x_1{:}\tau_1, \dots, x_n{:}\tau_n)$
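The three refinement rules can be sketched in Python as follows. This is a minimal illustration, not the operator used in the RDM core: the clause encoding, the `refine` function and the choice of starting sort `CADEntity` for the first argument are assumptions; the subsort and subrelation pairs are the ones listed in the sort theory of the deck.

```python
# sort -> direct subsorts, relation -> direct subrelations (from the deck's sort theory)
SUBSORT = {"CADEntity": ["CADPart", "CADAssembly"]}
SUBREL = {"hasSketch": ["hasCircularSketch"], "hasFeature": ["firstFeature"]}

# A literal is (predicate, ((variable, sort), ...)); a clause body is a tuple of literals.
def refine(body, candidate_literals):
    """Return all one-step downward refinements of a clause body."""
    refinements = []
    # Rule 1: add a literal to the conjunction.
    for literal in candidate_literals:
        if literal not in body:
            refinements.append(body + (literal,))
    for i, (pred, args) in enumerate(body):
        # Rule 2: replace one sort with one of its direct subsorts.
        for j, (var, sort) in enumerate(args):
            for sub in SUBSORT.get(sort, []):
                new_args = args[:j] + ((var, sub),) + args[j + 1:]
                refinements.append(body[:i] + ((pred, new_args),) + body[i + 1:])
        # Rule 3: replace the literal's predicate with a direct subrelation.
        for sub in SUBREL.get(pred, []):
            refinements.append(body[:i] + ((sub, args),) + body[i + 1:])
    return refinements

start = (("hasSketch", (("P", "CADEntity"), ("S", "Sketch"))),)
extra = [("hasLength", (("S", "Sketch"), ("L", "float")))]
result = refine(start, extra)
```

For the one-literal body above, the sketch yields four refinements: one added literal, two subsort replacements and one subrelation replacement.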

Feature Taxonomy
- information about the feature subsumption hierarchy is stored and passed to the propositional learner
- assume that features $f_1, \dots, f_n$ have been generated with corresponding conjunctive bodies $b_1, \dots, b_n$
- the elementary subsumption matrix $E$ of $n$ rows and $n$ columns is defined such that $E_{i,j} = 1$ whenever $b_i \in \rho_{\Delta,\Sigma}(b_j)$ and $E_{i,j} = 0$ otherwise
- the exclusion matrix $X$ of $n$ rows and $n$ columns is defined such that $X_{i,j} = 1$ whenever $i = j$ or $b_i \in \rho_{\Delta,\Sigma}(\rho_{\Delta,\Sigma}(\dots \rho_{\Delta,\Sigma}(b_j) \dots))$ and $X_{i,j} = 0$ otherwise
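A minimal sketch of how the two matrices might be computed, assuming the direct-refinement pairs are already known as a set of index pairs and that the nested application of the refinement operator in the definition of X means "one or more applications". The function name `subsumption_matrices` is hypothetical.

```python
def subsumption_matrices(n, direct):
    """direct: set of (i, j) meaning body b_i is a direct refinement of body b_j.

    Returns (E, X): E marks direct refinements; X marks i == j or any
    refinement reachable through one or more refinement steps.
    """
    E = [[1 if (i, j) in direct else 0 for j in range(n)] for i in range(n)]
    X = [row[:] for row in E]
    for i in range(n):
        X[i][i] = 1
    # Warshall-style transitive closure over the refinement relation.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if X[i][k] and X[k][j]:
                    X[i][j] = 1
    return E, X

# Three features in a chain: b_1 refines b_0, and b_2 refines b_1.
E, X = subsumption_matrices(3, {(1, 0), (2, 1)})
```

In this chain, `E[2][0]` stays 0 because b_2 is not a direct refinement of b_0, while `X[2][0]` becomes 1 through the closure.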

Propositional Rule Learning
2 propositional algorithms adapted to utilize matrices E, X:
1. Top-down deterministic algorithm
   - stems from the rule inducer of RSD
2. Stochastic local DNF algorithm
   - (Rückert 2003, Paes 2006)
   - search in the space of DNF formulas
   - refinement done by local non-deterministic DNF term changes
Using matrices E, X, the learner can:
- prevent the combination of a feature and its subsumee within the conjunction (both algorithms)
- specialize a conjunction by replacing a feature with its direct subsumee (top-down only)
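The two uses of the matrices can be sketched as below. This is an illustration under assumptions, not the adapted RSD or DNF learner: conjunctions are tuples of feature indices, `E[f][g] == 1` is read as "feature f is a direct subsumee of feature g", and the helper names are hypothetical.

```python
# Features 0..2 form a chain (1 refines 0, 2 refines 1); feature 3 is unrelated.
E = [[0, 0, 0, 0],
     [1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 0]]
X = [[1, 0, 0, 0],
     [1, 1, 0, 0],
     [1, 1, 1, 0],
     [0, 0, 0, 1]]

def can_add(conjunction, feature, X):
    """A feature may join a conjunction only if it is not excluded by any member
    (i.e. it is neither the same feature nor related to it by subsumption)."""
    return all(not X[feature][g] and not X[g][feature] for g in conjunction)

def specializations(conjunction, E):
    """Replace one feature of the conjunction with one of its direct subsumees."""
    result = []
    for pos, g in enumerate(conjunction):
        for f in range(len(E)):
            if E[f][g] and f not in conjunction:
                result.append(conjunction[:pos] + (f,) + conjunction[pos + 1:])
    return result

allowed = can_add((0,), 3, X)      # unrelated features may be combined
blocked = can_add((0,), 2, X)      # 2 is a (transitive) subsumee of 0
steps = specializations((0, 3), E)
```

The exclusion check corresponds to the first bullet (used by both learners), and `specializations` to the replacement step that the slide reserves for the top-down algorithm.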

RDM Core Overview
Diagram components: Feature construction; Features; Feature subsumption table; Subsumption and exclusion matrix; Propositional rule learning (adapted); Propositional learning (Weka, R); Background knowledge (Horn logic)
Predicate declarations:
    mode hasBody(+CADPart, -Body).
    mode hasMaterial(+CADPart, -Material).
    mode hasSketch(+CADPart, -Sketch).
    mode hasLength(+Sketch, -float).
Sort theory:
    subClassOf(CADPart, CADEntity).
    subClassOf(CADAssembly, CADEntity).
    …
    subPropertyOf(hasCircularSketch, hasSketch).
    subPropertyOf(firstFeature, hasFeature).
Examples:
    eItem(eItemT_BA1341).
    eItem(eItemT_BA1342).
    eItem(eItemT_BA1343).

RDM Manager
= tool developed for running the RDM tasks
Functionalities:
1. Obtaining relevant data by means of a SPARQL query to the semantic repository
2. Converting data from the semantic representation into a format acceptable by the DM algorithms (Prolog, arff, csv, etc.)
3. Propositionalization by generating first-order features
4. Enhanced propositional rule learning algorithms
5. Third-party propositional learning algorithms integrated by means of wrappers, e.g.
   - rule learner RIPPER (Cohen 1995)
   - association rules: Apriori
   - decision trees: J48 algorithm (for all of the above, the WEKA implementation is used)
   - clustering: distance-based PCA (implemented in R)
6. Storing information about DM processes and their results in a semantic representation

Knowledge Discovery Ontology
Foreseen queries that guided the design of the ontology
- User:
  - Give me all rule-based classifiers found for class C on dataset D with error estimate < 5%
  - Give me the rule-based algorithm with the shortest average runtime for datasets D, E and F
- Developer:
  - Give me all pairs of model classes with equivalent expressiveness for which no conversion program is available
  - Give me all parameter settings for experiments with dataset D and algorithm A, and their respective runtimes and accuracy results

Example Queries to the KD ontology
- Obvious idea: if the system knows all it can do, it can plan complex KD workflows
- Example: queries that a planning system poses to the ontology in order to generate a decision tree from a relational dataset through propositionalization
  - Give me a program that takes a classified relational dataset represented as Prolog facts and produces an arff file
  - Give me a program that takes an arff file and produces a decision tree
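The two example queries amount to filtering algorithm descriptions by what they consume and produce. The sketch below mimics that with an in-memory list instead of the OWL ontology and SPARQL-DL; the record fields, the kind labels and the `Propositionalizer` entry are hypothetical, while J48 and Apriori are algorithms named in the deck.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AlgorithmDescription:
    name: str
    input_kind: str
    input_format: str
    output_kind: str
    output_format: str

# Illustrative registry; a real system would query the KD ontology instead.
REGISTRY = [
    AlgorithmDescription("Propositionalizer", "ClassifiedRelationalDataset",
                         "Prolog", "Dataset", "ARFF"),
    AlgorithmDescription("J48", "Dataset", "ARFF", "DecisionTree", "model"),
    AlgorithmDescription("Apriori", "Dataset", "ARFF", "AssociationRules", "model"),
]

def find(registry, **constraints):
    """Return names of algorithms whose description matches every constraint."""
    return [a.name for a in registry
            if all(getattr(a, key) == value for key, value in constraints.items())]

to_arff = find(REGISTRY, input_kind="ClassifiedRelationalDataset",
               input_format="Prolog", output_format="ARFF")
to_tree = find(REGISTRY, input_format="ARFF", output_kind="DecisionTree")
```

Chaining the two answers gives the propositionalization-then-decision-tree workflow described on the slide.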

Motivation for Workflow Generation
- user:
  - RDM algorithms utilizing background knowledge, and relational learning through propositionalization and subsequent propositional learning, are quite complex
  - we want to hide as much of the complexity as possible from the user
- developer/data miner:
  - storing information about the whole process enables repeatability of experiments
  - individual components are developed by different people, who can focus on experimenting with parameters of some components and view the others as black boxes

Main Classes of KD Ontology
- main notions: Knowledge and Algorithm
- representation language: OWL-DL
  - densely interlinked knowledge structures, not just taxonomies
  - highly optimized reasoners available (Pellet, RacerPro, FaCT++, ...)

Knowledge
5 subclasses:
- Dataset
- LogicalKnowledge
- NonLogicalKnowledge
- Pattern = MiningResult
- multiple formats may be attached to each Knowledge class
  - each knowledge instance has a specified KnowledgeFormat
Class expressions shown on the slide:
    Knowledge and example some Example
    subclassOf Knowledge and hasExpressivity some Expressivity and hasFormat some KnowledgeFormat
    Knowledge and not LogicalKnowledge
    Knowledge and producedBy some AlgorithmExecution

Expressivity
[Figure: Expressivity hierarchy, shown in Protégé]

Algorithms
Algorithm
- a mapping from knowledge to knowledge
- not just induction: all executable elements, incl. preprocessing, ...
- definition of inputs, outputs and parameters
    Apriori subclassOf NamedAlgorithm
      and input some (Dataset
        and hasExpressivity only SingleRelationStructure
        and format only {ARFF, CSV})
      and output some (MiningResult and contains only AssociationRule)
      and minMetric some double
      and minSupport some double
      and numOfRules some positiveInteger

Algorithms (2)
- atomic (named) vs. composite (workflows)
- types of algorithms are modeled as classes, e.g. ClusteringAlgorithm
- each algorithm description is modeled as a subclass of class NamedAlgorithm (like Apriori above)
- instances of class AlgorithmExecution represent executions of algorithms
- thus, to access a particular algorithm, we need to pose a schema query to the OWL ontology, using SPARQL-DL

Pattern
- Result of a data mining algorithm
- Describes a mapping from knowledge to knowledge
- Defined as:
    Knowledge and producedBy some AlgorithmExecution
      subclassOf contains only (AtomicKnowledge and singleResultAnnotation some anySimpleType)
- Example: association rules
    AssociationRule subclassOf AtomicKnowledge
      and antecedent some And
      and consequent some And
      and confidence some double
      and support some double
    MiningResult
      and producedBy only AssociationRulesAlgorithmExecution
      and contains only AssociationRule

Anticipated Usage of the KD Ontology
- a specialization of the relevant OWL-S ontology parts, mainly the Process class
- during planning, inputs and outputs will be matched w.r.t. their format and expressivity to filter out invalid algorithm bindings
- beyond workflow generation:
  - management of the SoA knowledge in the KD domain
  - storing and managing KD workflow results, for example for meta-learning and experiment repeatability

Workflow Construction
Automatic workflow construction:
1. Converting a KD task described using classes from the KD ontology into a planning problem described in PDDL
2. Generating a plan using a planning algorithm
3. Storing the generated abstract workflow in the form of a semantic annotation
4. Instantiating the abstract workflow with specific algorithm configurations available in the KD ontology

Workflow-related Classes of KD Ontology
KD ontology extended with workflow-related classes:
- ProblemDescription, defined using properties
  - init, specifying the available input data and knowledge
  - goal, specifying the desired results
- Action, defined by
  - the Algorithm which is executed
  - startTime and duration
  - the immediately preceding Actions
- Workflow, currently a DAG of Actions with a link to the ProblemDescription from which it was generated
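Since a Workflow is a DAG of Actions linked by their immediately preceding Actions, a valid execution order is any topological order of that graph. The sketch below uses Python's standard `graphlib` module; the dictionary encoding is an assumption, and the three step names are borrowed from the propositionalization pipeline shown elsewhere in the deck rather than from an actual generated workflow.

```python
from graphlib import TopologicalSorter

# action -> set of immediately preceding actions (hypothetical workflow)
preceding = {
    "FeatureConstruction": set(),
    "FeatureEvaluation": {"FeatureConstruction"},
    "PropositionalLearning": {"FeatureEvaluation"},
}

# static_order() yields each action only after all of its predecessors.
execution_order = list(TopologicalSorter(preceding).static_order())
```

`TopologicalSorter` raises `graphlib.CycleError` if the preceding-action links contain a cycle, which would indicate that the stored structure is not a valid workflow DAG.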

Problem Description Example
- Example: generating relational association rules from a classified relational dataset with relational background knowledge expressed in OWL-DL
    RelationalAssociationRules subClassOf ProblemDescription
      and goal some (MiningResult and contains only AssociationRule)
      and init some (LogicalKnowledge
        and hasExpressivity some OWL-DL
        and hasFormat some {RDFXML})
      and init some (LogicalKnowledge
        and hasExpressivity some RelationalStructure
        and hasFormat some {RDFXML})
      and init some (ClassifiedInstanceSet and hasFormat some {RDFXML})

Conversion into a Planning Task Described in PDDL
- ontology classified using the FACT reasoner to generate an inferred hierarchy on algorithms, knowledge and patterns
- names generated for classes defined using OWL restrictions
- domain description in PDDL
  - generated by converting Algorithms into PDDL actions, with inputs specifying the preconditions and outputs specifying the effects
  - both inputs and outputs are currently restricted to conjunctions of OWL classes
- problem description in PDDL
  - generated in the same way from ProblemDescription

Algorithm Definition Example
Description in KD ontology (in DL formalism):
    Apriori subClassOf NamedAlgorithm
      and input some (Dataset
        and hasExpressivity only SingleRelationStructure
        and format only {ARFF})
      and output some (MiningResult and contains only AssociationRule)
      and minMetric some double
      and minSupport some double
      and numOfRules some positiveInteger
Description used for planning (in PDDL):
    (:action AprioriAlgorithm
      :parameters (?v0 - Dataset_SingleRelationStructure
                   ?v1 - ARFF
                   ?v2 - MiningResult_contains_AssociationRule)
      :precondition (and (available ?v0) (format ?v0 ?v1))
      :effect (and (available ?v2)))
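The rewriting of an algorithm description into a PDDL action can be sketched as a simple string generator. This is an assumption-laden illustration, not the converter used by the authors: the function `to_pddl_action` and its argument layout are hypothetical, and the type names are the generated class names shown on the slide.

```python
def to_pddl_action(name, inputs, outputs):
    """Render a PDDL action.

    inputs:  list of (knowledge_type, format_type) pairs -> preconditions
    outputs: list of knowledge types                     -> effects
    """
    params, preconditions, effects = [], [], []
    index = 0
    for knowledge_type, format_type in inputs:
        k, f = f"?v{index}", f"?v{index + 1}"
        index += 2
        params += [f"{k} - {knowledge_type}", f"{f} - {format_type}"]
        preconditions += [f"(available {k})", f"(format {k} {f})"]
    for knowledge_type in outputs:
        v = f"?v{index}"
        index += 1
        params.append(f"{v} - {knowledge_type}")
        effects.append(f"(available {v})")
    return (f"(:action {name}\n"
            f"  :parameters ({' '.join(params)})\n"
            f"  :precondition (and {' '.join(preconditions)})\n"
            f"  :effect (and {' '.join(effects)}))")

apriori = to_pddl_action(
    "AprioriAlgorithm",
    inputs=[("Dataset_SingleRelationStructure", "ARFF")],
    outputs=["MiningResult_contains_AssociationRule"],
)
```

Inputs become `available` and `format` preconditions and outputs become `available` effects, mirroring the inputs-to-preconditions and outputs-to-effects mapping described for the domain description.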

Planning Algorithm
- based on the Fast-Forward planning system (Hoffmann, 2001)
- enforced hill-climbing algorithm to perform forward state-space search
- goal distances estimated using relaxed GRAPHPLAN
  - i.e. ignoring the delete lists of the operators
- returns the discovered workflows with the lowest number of processing steps
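A compact sketch of enforced hill climbing with a delete-relaxation heuristic follows. It simplifies Fast-Forward in one labeled way: the heuristic counts relaxed reachability layers instead of extracting a relaxed plan and counting its actions. The action tuples and fact names are hypothetical.

```python
from collections import deque

def relaxed_distance(state, goal, actions):
    """Layers needed to reach the goal when delete lists are ignored; None if unreachable."""
    reached, layers = set(state), 0
    while not goal <= reached:
        new_facts = set()
        for _, pre, add, _ in actions:
            if pre <= reached:
                new_facts |= add - reached
        if not new_facts:
            return None
        reached |= new_facts
        layers += 1
    return layers

def enforced_hill_climbing(init, goal, actions):
    """Forward search: breadth-first from the current state until a state with a
    strictly better heuristic value is found, then commit to it and repeat."""
    state, plan = frozenset(init), []
    h = relaxed_distance(state, goal, actions)
    while h is not None and h > 0:
        queue, seen, found = deque([(state, [])]), {state}, None
        while queue and found is None:
            current, path = queue.popleft()
            for name, pre, add, delete in actions:
                if not pre <= current:
                    continue
                successor = frozenset((current - delete) | add)
                if successor in seen:
                    continue
                seen.add(successor)
                h_next = relaxed_distance(successor, goal, actions)
                if h_next is not None and h_next < h:
                    found = (successor, path + [name], h_next)
                    break
                queue.append((successor, path + [name]))
        if found is None:
            return None  # enforced hill climbing got stuck
        state, h = found[0], found[2]
        plan += found[1]
    return plan if h == 0 else None

# (name, preconditions, add list, delete list)
ACTIONS = [
    ("Propositionalize", {"RelationalDataset"}, {"ArffDataset"}, set()),
    ("J48", {"ArffDataset"}, {"DecisionTree"}, set()),
    ("Apriori", {"ArffDataset"}, {"AssociationRules"}, set()),
]
plan = enforced_hill_climbing({"RelationalDataset"}, {"DecisionTree"}, ACTIONS)
```

On this toy domain the search commits to propositionalization first and then to the decision-tree learner, skipping Apriori because it does not reduce the estimated distance to the goal.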

Generated Workflow for CAD Designs

RDM Manager implementation
Diagram components: RDM GUI; Semantic Server Agent; RDM Manager Tool; RDM Web Service; RDM Engine; Algorithm Implementation 1 … Algorithm Implementation n; RDM Ontology

RDM GUI

Related Work (planning to learn)
- Most relevant: NEXT System [Bernstein & Daenzer]
- (Our best understanding:)
  - Linear plans
  - Preprocessing-Induction-Postprocessing template
- We try for a template-free plan (DAG)
Diagram components: Multi-relational data; Feature construction (inductive); Feature evaluation (deductive); Propositionalized data; Propositional learning (inductive)

Related Work (DM workflows and DM assistants)
- workflows for DM
  - myGrid/Taverna, Triana, DataMiningGrid, Kepler, KnowledgeGrid, CAMLET, Pegasus, MiningMart
  - manual workflow composition, focus on workflow execution
  - focus on DM from relational databases
  - relevant efforts in formalization of DM processes
- DM assistants
  - MetaL, StatLog: classification of DM methods, metrics for comparing the methods, finding suitable methods for a given dataset

Related Work (DM ontologies)
- existing DM ontologies
  - ontologies for classical DM with 3 stages: induction, pre- and post-processing
  - focus on the hierarchy of DM algorithms and propositional dataset description
  - DAMON, from the KnowledgeGrid project [Cannataro & Comito]
  - DataMiningGrid application description schema [Stankovski et al.]
  - DM ontology for IDEA [Bernstein et al.]
  - myGrid ontology: for bioinformatics, includes biological domain concepts
- other work towards KD process formalization
  - CinQ and IQ projects (EU FP6)
  - Sašo Džeroski: Towards a General Framework for Data Mining

Related Work (Semantic Web Service Composition)
- essentially creating workflows based on a semantic description of the ingredients
- popular approach: convert the semantic description to PDDL and use suitably adapted planning techniques [Klusch et al.], [Liu et al.]
- we have adapted this approach for DM workflows using the KD ontology
- future work: individual DM algorithms as web services?

Open Issues
- Reactive planning / exploration
  - Currently planning towards a desired kind of result, not quality
- Conversion of knowledge
  - From more to less expressive
  - How can we constrain what should remain from the original information?
  - Can this be done at all without semantic meta-data?

Open Issues
- Tighter integration of the ontology with planning
  - Currently: simple rewriting of algorithm annotations into PDDL actions
  - Work in progress: the planner poses SPARQL queries to retrieve relevant actions
- Computational platform:
  - GRID or web services?