Krogel, Rawles, Železný, Flach, Lavrač, Wrobel: Comparative Evaluation of Approaches to Propositionalization

Presentation transcript:

1 Krogel, Rawles, Železný, Flach, Lavrač, Wrobel: Comparative Evaluation of Approaches to Propositionalization
Mark-A. Krogel, Otto-von-Guericke-Universität Magdeburg
Simon Rawles, University of Bristol
Filip Železný, Czech Technical University and University of Wisconsin, Madison
Peter A. Flach, University of Bristol
Nada Lavrač, Jožef Stefan Institute, Ljubljana
Stefan Wrobel, Friedrich-Wilhelms-Universität Bonn and Fraunhofer-Institut AiS

2 Introduction
o Propositionalization: largely automatic transformation of relational data into a single-table representation, followed by application of propositional learners
o In principle less powerful than searching the full first-order hypothesis space
o In practice often sufficient, efficient, and flexible
o Here: first comparative study using representatives of logic-oriented approaches (RSD, SINUS) and database-oriented approaches (RELAGGS)

3 Propositionalization
o An ILP learning task: given ground facts of a target predicate (examples) and clauses of background predicates, find a hypothesis that, together with the background theory, explains some properties of the examples
o Complete vs. partial approaches, general-purpose vs. special-purpose approaches
o Clauses are constructed from relational background knowledge and structural properties of individuals; calling these clauses for an individual produces its feature values

4 RSD
o Declarative bias similar to Progol/Aleph, e.g. :- modeb(3, hasCar(+train, -car)).
o Step 1: identification of all closed feature definitions (Prolog queries) corresponding to the declarations, e.g. hasCar(Train,Car), shape(Car,Shape), instantiate(Shape)
o Step 2: instantiation of variables plus feature filtering, e.g. hasCar(Train,Car), shape(Car,bucket)
o Step 3: creation of the propositionalized representation
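The three steps above can be illustrated with a minimal Python sketch. It is not RSD itself: the toy `trains` data, the helper `has_car_with_shape`, and the column names are assumptions made for illustration, and only the single feature template hasCar(Train,Car), shape(Car,Shape) is covered.

```python
# Hypothetical toy data: each train maps to its cars; each car has a shape.
trains = {
    "t1": [{"shape": "bucket"}, {"shape": "rectangle"}],
    "t2": [{"shape": "rectangle"}],
    "t3": [{"shape": "ellipse"}, {"shape": "bucket"}],
}


def has_car_with_shape(shape):
    """Boolean feature mirroring hasCar(Train, Car), shape(Car, <shape>)."""
    def feature(train_id):
        return any(car["shape"] == shape for car in trains[train_id])
    return feature


# Step 2 analogue: instantiate the shape variable with each observed constant.
shapes = sorted({car["shape"] for cars in trains.values() for car in cars})
features = {f"hasCar_shape_{s}": has_car_with_shape(s) for s in shapes}

# Step 3 analogue: one row per train, one Boolean column per feature.
table = {
    train_id: {name: f(train_id) for name, f in features.items()}
    for train_id in sorted(trains)
}

for train_id, row in table.items():
    print(train_id, row)
```

Each row of `table` is what a propositional learner would receive for one train.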

5 RSD: Constraints & Pruning
o Language
  o argument modes & types, predicate recall
  o max feature length & variable depth
  o undecomposability: f1 <> f2 & f3
o Evaluation
  o non-triviality: |cov(f)| < |Data|
  o relevance: |cov(f)| > min
  o uniqueness: if cov(f1) = cov(f2) then discard the longer
o Pruning
  o large subspaces identified containing only decomposable f
  o e.g. EW Trains: SearchTime -> +inf as MaxLength -> +inf
  o with pruning: SearchTime -> const as MaxLength -> +inf
  o if |cov(f)| < min then don't refine f
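The three evaluation constraints can be read as a filter over candidate features. The following Python sketch assumes each feature is described by its length and the set of examples it covers; the function name `filter_features` and the sample data are hypothetical and do not reflect RSD's actual implementation.

```python
def filter_features(coverage, n_examples, min_cov):
    """Apply non-triviality, relevance, and uniqueness checks.

    coverage maps a feature name to (length, frozenset of covered example ids).
    """
    kept = {}  # coverage set -> (name, length) of the shortest feature seen
    for name, (length, cov) in sorted(coverage.items()):
        if len(cov) >= n_examples:  # non-triviality: |cov(f)| < |Data|
            continue
        if len(cov) <= min_cov:     # relevance: |cov(f)| > min
            continue
        best = kept.get(cov)
        if best is None or length < best[1]:  # uniqueness: discard the longer
            kept[cov] = (name, length)
    return sorted(name for name, _ in kept.values())


coverage = {
    "f1": (2, frozenset({"t1", "t3"})),
    "f2": (3, frozenset({"t1", "t3"})),        # same coverage as f1, longer
    "f3": (1, frozenset({"t1", "t2", "t3"})),  # covers every example
    "f4": (2, frozenset()),                    # covers no example
}
selected = filter_features(coverage, n_examples=3, min_cov=0)
print(selected)
```

In this toy run only `f1` survives: `f2` duplicates its coverage with a longer definition, `f3` is trivial, and `f4` is irrelevant.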

6 SINUS: Overview
o Developed from LINUS and its feature generation extension
o A modular transformational ILP experimentation platform
o Automated type construction
o Feature reduction
o Invocation of learner and back-translation of induced theory to first-order form
o Data as flattened Prolog facts + data definition
o Declarative bias similar to 1BC, e.g.
  train 1 train cwa
  train2car 2 1:train *:#car * cwa
  cshape 2 car #shape * cwa

7 SINUS: Step by step
o Step 1: construction of instantiated feature definitions, e.g. f_aaaa(A) :- train(A), hasCar(A,B), shape(B,bucket).
  o Features are built recursively, left to right, taking current variable types and bindings into account
  o Constraints limit the maximum number of literals, variables, and values per type, and the nature of variable reuse
o Step 2: feature set reduction (REDUCE)
o Step 3: creation of the propositionalized representation
o After learning: transformation of the result into a first-order hypothesis

8 RELAGGS
o Declarative bias from foreign key relationships in the relational database schema
o After example identifier propagation to non-target relations:
  o Step 1: summarize each non-target relation by example id: avg, max, min, sum, stdev, range, quartiles for numeric data; counts of possible values for nominal attributes; plus some two-column aggregates
  o Step 2: creation of the propositionalized representation by concatenating aggregate function values to the target relation
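A minimal Python sketch of the aggregation idea follows. It is not RELAGGS: the `loans` and `transactions` relations, the column names, and the `aggregate` helper are assumptions for illustration, and only a few of the aggregate functions listed above are computed.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical target relation and one non-target relation keyed by loan_id.
loans = [{"loan_id": 1, "status": "ok"}, {"loan_id": 2, "status": "problematic"}]
transactions = [
    {"loan_id": 1, "amount": 100.0, "type": "credit"},
    {"loan_id": 1, "amount": 300.0, "type": "debit"},
    {"loan_id": 2, "amount": 50.0, "type": "debit"},
]


def aggregate(rows, key, numeric_col, nominal_col, nominal_values):
    """Step 1 analogue: summarize a non-target relation per example id."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    summary = {}
    for example_id, group in groups.items():
        nums = [r[numeric_col] for r in group]
        counts = Counter(r[nominal_col] for r in group)
        feats = {
            "count": len(group),
            f"{numeric_col}_avg": mean(nums),
            f"{numeric_col}_min": min(nums),
            f"{numeric_col}_max": max(nums),
            f"{numeric_col}_sum": sum(nums),
        }
        for value in nominal_values:  # count each possible nominal value
            feats[f"{nominal_col}_{value}_count"] = counts[value]
        summary[example_id] = feats
    return summary


summary = aggregate(transactions, "loan_id", "amount", "type", ["credit", "debit"])

# Step 2 analogue: concatenate the aggregates to the target relation.
table = [{**loan, **summary.get(loan["loan_id"], {})} for loan in loans]
for row in table:
    print(row)
```

Each resulting row describes one loan with a fixed number of columns, regardless of how many transactions it has.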

9 Learning Tasks
o Trains: 20 trains east- or west-bound?
o King-Rook-King: 1000 board states legal or not?
o Mutagenesis: 188 molecules mutagenic or not?
o PKDD Challenges 1999/2000: 682 loans problematic or not?
o KDD Cup 2001: 862 genes/proteins with certain function or not and with certain localization or not?
o Numbers of predicates/relations depend on modeling issues.

10 Procedure
o Starting point in most cases: Prolog representation of target predicate facts and background predicate definitions, with SQL scripts generated from those if necessary
o Manual construction of declarations, propagation of ids if necessary
o Application of RSD, SINUS, and RELAGGS to produce single-table representations of the relational input data, with different parameter settings to produce feature sets of different sizes
o Application of WEKA's J48 (10-fold stratified cross-validation) to those tables
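The stratified cross-validation used in the last step can be sketched as follows. This illustrates only how class-balanced folds might be formed; it is not WEKA's implementation, and the function name `stratified_folds` and the sample labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign example indices to k folds so each class is spread evenly."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the indices of this class round-robin across the folds.
        for i, idx in enumerate(indices):
            folds[(offset + i) % k].append(idx)
        offset += len(indices)
    return folds


# Toy label vector with two equally sized classes.
labels = ["east"] * 10 + ["west"] * 10
folds = stratified_folds(labels, k=10)
for test_fold in folds:
    train_idx = [i for fold in folds if fold is not test_fold for i in fold]
    # A learner would be trained on train_idx and evaluated on test_fold here.
    print(sorted(labels[i] for i in test_fold), len(train_idx))
```

With these labels every fold holds one east-bound and one west-bound example, so each test fold preserves the overall class ratio.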

11-16 Results: Accuracies (1) to (6) [figures not reproduced in the transcript]

17 Results: Runtimes
o Different platforms were used, hence times are only indicators

Task             RSD        SINUS        RELAGGS
Trains           < 1 sec    [...] min    < 1 sec
King-Rook-King   < 1 sec    2 - 6 min    n. a.
Mutagenesis      5 min      [...] min    30 sec
PKDD             [...] sec  2 - 30 min   30 sec
KDD01 fct        3 min      30 min       1 min
KDD01 loc        3 min      30 min       1 min

18 Discussion
o Results are not generally conclusive in favor of any approach: each is the winner on two tasks
o Aggregation is strong in some domains, where counting features are relevant (Trains) or many numeric attributes exist in the original data
o Differences between RSD and SINUS are mainly due to differences in constraining the language bias
o RELAGGS is the most efficient for many tasks; runtime differences between RSD and SINUS are possibly caused by pruning or by the Prolog systems used

19 Related Work
o LINUS/DINUS (Lavrač and Džeroski 1994)
o Stochastic propositionalization (Kramer et al. 1998)
o Bottom-up propositionalization (Kramer 2000)
o Lazy propositionalization (Alphonse and Rouveirol 2000)
o ...

20 Future Work and Conclusion
o General:
  o Completion of formal framework
  o Comparison to other ILP approaches such as Progol and Tilde
  o Extension of feature subset selection mechanisms
  o Experiments with other propositional learners such as SVMs
  o Combination of the features produced by the approaches here
o RSD: construction of first-order hypotheses
o SINUS: improvements of feature elimination, bias control
o RELAGGS: integration with dynamic relational databases
o Promising approaches with many questions left open!