Experiments with MRDTL – A Multi-relational Decision Tree Learning Algorithm
Hector Leiva, Anna Atramentov and Vasant Honavar* – Artificial Intelligence Laboratory, Iowa State University

Presentation transcript:

Experiments with MRDTL – A Multi-relational Decision Tree Learning Algorithm
Hector Leiva, Anna Atramentov and Vasant Honavar*
Artificial Intelligence Laboratory, Department of Computer Science and Graduate Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011, USA
* Support provided in part by National Science Foundation, Carver Foundation, and Pioneer Hi-Bred, Inc.

Motivation
- Importance of multi-relational learning: growth of data stored in MRDBs; techniques for learning from unstructured data often extract the data into an MRDB.
- Emerging frameworks for multi-relational learning: Blockeel's framework (ILP, 1998); Getoor's framework (first-order extensions of probabilistic models, 2001); Knobbe's framework (MRDM, 1999).
- Problem: no experimental results available.
- Goals: perform experiments and evaluate the performance of Knobbe's framework; understand the strengths and limits of the approach.

Multi-Relational Learning Literature
- Inductive Logic Programming
- First-order extensions of probabilistic models
- Multi-Relational Data Mining
- Propositionalization methods
- PRM extensions for cumulative learning, for learning and reasoning as agents interact with the world
- Approaches for mining data in the form of graphs
Blockeel, 1998; De Raedt, 1998; Knobbe et al., 1999; Friedman et al., 1999; Koller, 1999; Krogel and Wrobel, 2001; Getoor, 2001; Kersting et al., 2000; Pfeffer, 2000; Dzeroski and Lavrac, 2001; Dehaspe and De Raedt, 1997; Dzeroski et al., 2001; Jaeger, 1997; Karalic and Bratko, 1997; Holder and Cook, 2000; Gonzalez et al., 2000

Problem Formulation
Given: data stored in a relational database. Goal: build a decision tree for predicting a target attribute in the target table.
Example of a multi-relational database (schema and instances):

Department (ID, Specialization, #Students):
  d1  Math              1000
  d2  Physics            300
  d3  Computer Science   400

Staff (ID, Name, Department, Position, Salary):
  p1  Dale    d1  Professor          70-80k
  p2  Martin  d3  Postdoc            30-40k
  p3  Victor  d2  Visitor Scientist  40-50k
  p4  David   d3  Professor          80-100k

Graduate Student (ID, Name, GPA, #Publications, Advisor, Department):
  s1  John    2.0   4  p1  d3
  s2  Lisa    3.5  10  p4  d3
  s3  Michel  3.9   3  p4  d4
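For the later sketches it helps to have this example database in executable form. The following is only an assumed Python encoding of the three tables above (row dicts; names and layout chosen for illustration, not taken from the MRDTL code):

```python
# Assumed plain-Python encoding of the example database; Staff is the target table
# and Salary the target attribute used in the later slides.
department = [
    {"ID": "d1", "Specialization": "Math",             "#Students": 1000},
    {"ID": "d2", "Specialization": "Physics",          "#Students": 300},
    {"ID": "d3", "Specialization": "Computer Science", "#Students": 400},
]

staff = [
    {"ID": "p1", "Name": "Dale",   "Department": "d1", "Position": "Professor",         "Salary": "70-80k"},
    {"ID": "p2", "Name": "Martin", "Department": "d3", "Position": "Postdoc",           "Salary": "30-40k"},
    {"ID": "p3", "Name": "Victor", "Department": "d2", "Position": "Visitor Scientist", "Salary": "40-50k"},
    {"ID": "p4", "Name": "David",  "Department": "d3", "Position": "Professor",         "Salary": "80-100k"},
]

graduate_student = [
    {"ID": "s1", "Name": "John",   "GPA": 2.0, "#Publications": 4,  "Advisor": "p1", "Department": "d3"},
    {"ID": "s2", "Name": "Lisa",   "GPA": 3.5, "#Publications": 10, "Advisor": "p4", "Department": "d3"},
    {"ID": "s3", "Name": "Michel", "GPA": 3.9, "#Publications": 3,  "Advisor": "p4", "Department": "d4"},
]
```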

Propositional decision tree algorithm. Construction phase

  Tree_induction(D: data)
    A = optimal_attribute(D)
    if stopping_criterion(D)
      return leaf(D)
    else
      D_left := split(D, A)
      D_right := split_complement(D, A)
      child_left := Tree_induction(D_left)
      child_right := Tree_induction(D_right)
      return node(A, child_left, child_right)

Example training data:
  Day  Outlook   Temperature  Humidity  Wind    PlayTennis
  d1   Sunny     Hot          High      Weak    No
  d2   Sunny     Hot          High      Strong  No
  d3   Overcast  Hot          High      Weak    Yes
  d4   Overcast  Cold         Normal    Weak    No

The root {d1, d2, d3, d4} splits on Outlook: the "sunny" branch {d1, d2} becomes a No leaf, while the "not sunny" branch {d3, d4} splits again on Temperature, giving a Yes leaf for "hot" {d3} and a No leaf for "not hot" {d4}.
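A minimal runnable sketch of this construction phase (not the authors' implementation): rows are plain dicts, optimal_attribute is simplified to the binary equality test with the highest information gain, and the stopping criterion is "no test improves the gain"; all function names are illustrative.

```python
import math
from collections import Counter

def entropy(rows, target):
    """Entropy of the class label distribution in `rows`."""
    counts = Counter(r[target] for r in rows)
    n = len(rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def optimal_split(rows, target):
    """Best binary equality test (attribute, value) by information gain, or None."""
    base = entropy(rows, target)
    best, best_gain = None, 0.0
    for attr in rows[0]:
        if attr == target:
            continue
        for value in {r[attr] for r in rows}:
            left = [r for r in rows if r[attr] == value]
            right = [r for r in rows if r[attr] != value]
            if not left or not right:
                continue
            gain = base - (len(left) * entropy(left, target)
                           + len(right) * entropy(right, target)) / len(rows)
            if gain > best_gain:
                best, best_gain = (attr, value), gain
    return best

def tree_induction(rows, target):
    """Mirror of the slide's Tree_induction: leaf if no useful split, else a binary node."""
    split = optimal_split(rows, target)
    if split is None:                                   # stopping criterion
        return Counter(r[target] for r in rows).most_common(1)[0][0]
    attr, value = split
    left = tree_induction([r for r in rows if r[attr] == value], target)
    right = tree_induction([r for r in rows if r[attr] != value], target)
    return (attr, value, left, right)

# Example with the PlayTennis rows from the slide:
data = [
    {"Outlook": "Sunny",    "Temperature": "Hot",  "Humidity": "High",   "Wind": "Weak",   "PlayTennis": "No"},
    {"Outlook": "Sunny",    "Temperature": "Hot",  "Humidity": "High",   "Wind": "Strong", "PlayTennis": "No"},
    {"Outlook": "Overcast", "Temperature": "Hot",  "Humidity": "High",   "Wind": "Weak",   "PlayTennis": "Yes"},
    {"Outlook": "Overcast", "Temperature": "Cold", "Humidity": "Normal", "Wind": "Weak",   "PlayTennis": "No"},
]
print(tree_induction(data, "PlayTennis"))
```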

MR setting. Splitting data with Selection Graphs
In the multi-relational setting the data is split with pairs of complementary selection graphs rather than with single attribute tests. On the example database, selection graphs built over Staff, Grad.Student (with the condition GPA > 2.0) and Department, together with their complement selection graphs, partition the records of the target table Staff into disjoint subsets.

What is a selection graph?
- It corresponds to a subset of the instances from the target table.
- Nodes correspond to tables from the database.
- Edges correspond to associations between tables.
- Open edge = "have at least one"; closed edge = "have none of".
Example (used in the following slides): a graph over Staff, Grad.Student (with the condition GPA > 3.9) and Department (with the condition Specialization = math).
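One way to make selection graphs concrete is a small data structure in which nodes carry a table name plus attribute conditions and edges carry the join and an open/closed flag. This encoding (SGNode, SGEdge, SelectionGraph) is only an assumption for illustration, not the representation used in the MRDTL implementation:

```python
from dataclasses import dataclass, field

@dataclass
class SGNode:
    table: str                                       # table this node refers to, e.g. "Staff"
    conditions: list = field(default_factory=list)   # attribute conditions, e.g. [("GPA", ">", 3.9)]

@dataclass
class SGEdge:
    parent: SGNode
    child: SGNode
    join: tuple                                      # (parent key, child key), e.g. ("id", "Advisor")
    open: bool = True                                # open = "have at least one", closed = "have none of"

@dataclass
class SelectionGraph:
    root: SGNode                                     # always a node for the target table
    edges: list = field(default_factory=list)

# "Staff having at least one Grad.Student with GPA > 3.9":
staff_node = SGNode("Staff")
good_student = SGNode("Graduate_Student", [("GPA", ">", 3.9)])
sg = SelectionGraph(staff_node, [SGEdge(staff_node, good_student, ("id", "Advisor"), open=True)])
```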

Automatic translation of selection graphs into SQL queries

Staff with Position = Professor:
  select distinct T0.id
  from Staff T0
  where T0.Position = 'Professor'

Staff having at least one graduate student (open edge):
  select distinct T0.id
  from Staff T0, Graduate_Student T1
  where T0.id = T1.Advisor

Staff having no graduate students (closed edge):
  select distinct T0.id
  from Staff T0
  where T0.id not in (select T1.Advisor from Graduate_Student T1)

Staff having graduate students, but none with GPA > 3.9:
  select distinct T0.id
  from Staff T0, Graduate_Student T1
  where T0.id = T1.Advisor
    and T0.id not in (select T1.Advisor
                      from Graduate_Student T1
                      where T1.GPA > 3.9)

Generic query:
  select distinct T0.primary_key
  from table_list
  where join_list and condition_list
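Building on the assumed SGNode/SGEdge encoding above, a rough sketch of the translation into the generic query could look as follows. It handles only edges leaving the root, which is all the examples on this slide need, and it is not the paper's actual query generator:

```python
def to_sql(sg, primary_key="id"):
    """Emit 'select distinct T0.primary_key from table_list where join_list and condition_list'
    for a selection graph whose edges all start at the root node."""
    tables = [f"{sg.root.table} T0"]
    conditions = [f"T0.{a} {op} {v!r}" for a, op, v in sg.root.conditions]
    for i, e in enumerate(sg.edges, start=1):
        child_conds = [f"T{i}.{a} {op} {v!r}" for a, op, v in e.child.conditions]
        if e.open:                                   # open edge: plain join with the child table
            tables.append(f"{e.child.table} T{i}")
            conditions.append(f"T0.{e.join[0]} = T{i}.{e.join[1]}")
            conditions.extend(child_conds)
        else:                                        # closed edge: exclude matches via NOT IN
            sub = f"select T{i}.{e.join[1]} from {e.child.table} T{i}"
            if child_conds:
                sub += " where " + " and ".join(child_conds)
            conditions.append(f"T0.{e.join[0]} not in ({sub})")
    sql = f"select distinct T0.{primary_key} from " + ", ".join(tables)
    if conditions:
        sql += " where " + " and ".join(conditions)
    return sql

# to_sql(sg) for the graph sketched earlier gives roughly:
# select distinct T0.id from Staff T0, Graduate_Student T1
# where T0.id = T1.Advisor and T1.GPA > 3.9
```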

MR decision tree
- Each node contains a selection graph.
- Each child's selection graph is a supergraph of its parent's selection graph.

How to choose selection graphs in nodes?
Problem: there are too many supergraph selection graphs to choose from at each node.
Solution:
- start with an initial selection graph
- use a greedy heuristic to choose supergraph selection graphs: refinements
- use binary splits for simplicity
- for each refinement, derive its complement refinement
- choose the best refinement based on the information gain criterion
Problem: some potentially good refinements may give no immediate benefit.
Solution: look-ahead capability.
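The greedy choice described above can be sketched as follows. Here enumerate_refinements and selects are assumed stand-ins for the refinement generator and for applying a selection graph to an instance, and the scoring is plain information gain over the binary split induced by a refinement and its complement:

```python
import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def best_refinement(graph, labelled, enumerate_refinements, selects):
    """Greedy choice: score each (refinement, complement) pair by the information gain
    of the binary split it induces on the covered instances.
    `labelled` is a list of (instance, class) pairs."""
    base = entropy([y for _, y in labelled])
    best, best_gain = None, 0.0
    for refinement, complement in enumerate_refinements(graph):
        left = [(x, y) for x, y in labelled if selects(refinement, x)]
        right = [(x, y) for x, y in labelled if not selects(refinement, x)]
        if not left or not right:
            continue
        gain = base - (len(left) * entropy([y for _, y in left])
                       + len(right) * entropy([y for _, y in right])) / len(labelled)
        if gain > best_gain:
            best, best_gain = (refinement, complement, left, right), gain
    return best
```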

Refinements of a selection graph
- add a condition to a node – explores attribute information in the tables
- add a present edge and open a new node – explores relational properties between the tables
(Both are illustrated on the running example graph over Staff, Grad.Student (GPA > 3.9) and Department (Specialization = math); a code sketch of the two operators follows below.)
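On the assumed SGNode/SGEdge encoding, the two refinement operators might be sketched like this. Node lookup by table name is a simplification that only works while each table appears once in the graph, and the helper names are illustrative:

```python
import copy

def _node_for(graph, table):
    """Locate the node of `graph` that refers to `table` (root or a child node)."""
    if graph.root.table == table:
        return graph.root
    return next(e.child for e in graph.edges if e.child.table == table)

def add_condition(graph, table, condition):
    """Refinement 1: add an attribute condition, e.g. ("Position", "=", "Professor"),
    to the node for `table`; the complement refinement negates the condition."""
    refined = copy.deepcopy(graph)
    _node_for(refined, table).conditions.append(condition)
    return refined

def add_present_edge(graph, parent_table, child_table, join):
    """Refinement 2: add an open ("present") edge from the node for `parent_table`
    to a fresh node for `child_table`, exploring an association between tables."""
    refined = copy.deepcopy(graph)
    refined.edges.append(SGEdge(_node_for(refined, parent_table),
                                SGNode(child_table), join, open=True))
    return refined

# e.g. add_condition(sg, "Staff", ("Position", "=", "Professor"))
#      add_present_edge(sg, "Staff", "Department", ("Department", "ID"))
```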

Refinements of a selection graph: adding a condition to a node. Example: adding Position = Professor to the Staff node of the running graph; the complement refinement uses Position != Professor.

Refinements of a selection graph: adding a condition to a node. Example: adding GPA > 2.0 to a Grad.Student node; the complement refinement covers the remaining instances.

Refinements of a selection graph: adding a condition to a node. Example: adding #Students > 200 to the Department node; the complement refinement covers the remaining instances.

Refinements of a selection graph: adding a present edge and opening a new node. Note: this refinement alone has information gain = 0.

Further examples show the "add present edge and open node" refinement applied to other associations of the running graph (opening additional Grad.Student and Department nodes), each together with its complement refinement.

Look-ahead capability. Example: the refinement that adds an edge to the Department node of the running graph, and its complement.

Look-ahead capability. Example: a look-ahead refinement that adds the Department edge together with the condition #Students > 200 in a single step, and its complement.
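A hedged sketch of one possible look-ahead rule: when a refinement scores zero gain on its own (like adding only the Department edge), also score its one-step successors and credit the best of those gains to it. Here score and enumerate_refinements are assumed helpers, and the exact look-ahead rule in MRDTL may differ:

```python
def best_refinement_with_lookahead(graph, score, enumerate_refinements):
    """If a refinement gives zero gain by itself, evaluate its one-step successors
    and use the best successor gain as its score (simple one-level look-ahead)."""
    best, best_gain = None, 0.0
    for refinement, complement in enumerate_refinements(graph):
        gain = score(refinement)
        if gain == 0.0:                               # no immediate benefit: look one step ahead
            gain = max((score(r2) for r2, _ in enumerate_refinements(refinement)),
                       default=0.0)
        if gain > best_gain:
            best, best_gain = (refinement, complement), gain
    return best
```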

MR decision tree algorithm. Construction phase
For each non-leaf node:
- consider all possible refinements of the node's selection graph and their complements
- choose the best pair based on the information gain criterion
- create the children nodes.
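Putting the pieces together, the construction phase could be sketched as below, assuming best_refinement from the earlier sketch. Leaves keep both the majority class and their selection graph, which the classification phase needs; the stopping criteria (pure node, minimum size) are assumptions for illustration:

```python
from collections import Counter

def mrdtl_build(graph, labelled, enumerate_refinements, selects, min_size=2):
    """Construction sketch: every node stores a selection graph; children are built from
    the best refinement and its complement, both supergraphs of the parent's graph."""
    labels = [y for _, y in labelled]
    choice = None
    if len(set(labels)) > 1 and len(labelled) >= min_size:
        choice = best_refinement(graph, labelled, enumerate_refinements, selects)
    if choice is None:
        return ("leaf", Counter(labels).most_common(1)[0][0], graph)
    refinement, complement, left, right = choice
    return ("node", graph,
            mrdtl_build(refinement, left, enumerate_refinements, selects, min_size),
            mrdtl_build(complement, right, enumerate_refinements, selects, min_size))
```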

MR decision tree algorithm. Classification phase
For each leaf:
- apply the selection graph of the leaf to the test data
- classify the resulting instances with the classification of the leaf.
(In the example tree, the leaf selection graphs involve Grad.Student GPA > 3.9, Department Specialization = math or physics, and Position = Professor, and predict salary classes such as 80-100k.)
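A matching sketch of the classification phase, assuming leaves of the form produced by the construction sketch above and rows of the target table keyed by "ID" as in the earlier example encoding:

```python
def classify(leaves, test_rows, selects):
    """Each leaf carries its selection graph and a class label; a test row of the
    target table gets the label of the first leaf whose graph selects it."""
    predictions = {}
    for _, label, graph in leaves:                  # ("leaf", label, graph) tuples
        for row in test_rows:
            if selects(graph, row):
                predictions.setdefault(row["ID"], label)
    return predictions
```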

Experimental results. Mutagenesis
- The most widely used database in ILP. It describes molecules of certain nitroaromatic compounds.
- Goal: predict their mutagenic activity (the label attribute) – the ability to cause DNA to mutate. High mutagenic activity can cause cancer.
- The class distribution (numbers of active, inactive and total compounds) is reported separately for the regression-friendly and regression-unfriendly subsets.
- Levels of background knowledge: B0, B1, B2, B3, B4. They provide increasingly richer descriptions of the examples. Only the first three levels (B0, B1, B2) are used here.

Experimental results. Mutagenesis
Results of 10-fold cross-validation for the regression-friendly set compare accuracy (%) and running time (secs.) of Progol, FOIL, TILDE and MRDTL under background knowledge B0, B1 and B2, together with the size (number of nodes) of the MRDTL decision trees.

Experimental results. Mutagenesis
Results of leave-one-out cross-validation for the regression-unfriendly set:
  Background  Accuracy  Time       #Nodes
  B0          70%       0.6 secs.  1
  B1          81%       86 secs.   24
  B2          81%       60 secs.   22
Two recent approaches, (Sebag and Rouveirol, 1997) and (Kramer and De Raedt, 2001), using B3 achieved 93.6% and 94.7%, respectively, on the mutagenesis database.

Experimental results. KDD Cup 2001
- The database consists of a variety of details about the genes of one particular type of organism. Genes code for proteins, and these proteins tend to localize in various parts of cells and interact with one another in order to perform crucial functions.
- Task: prediction of gene/protein localization (15 possible values).
- Target table: Gene. Target attribute: Localization. 862 training genes, 381 test genes.
- Challenge: many attribute values are missing.
- Approach: use a special value to encode a missing value. Result: accuracy of 50%.
- Good techniques for filling in missing values are needed.

Experimental results. KDD Cup 2001
Approach: replace missing values by the most common value of the attribute for the class. Results:
- accuracy of around 85% with a decision tree of 367 nodes, with no limit on the number of times an association can be instantiated;
- accuracy of 80% when limiting the number of times an association can be instantiated;
- accuracy of around 75% when following associations only in the forward direction.
This shows that providing reasonable guesses for missing values can significantly enhance the performance of MRDTL on real-world data sets. In practice, however, the class labels of the test data are unknown, so this method cannot be applied directly.
Approach: an extension of the Naïve Bayes algorithm for relational data. Result: no improvement compared to the first approach.
Handling of missing values has to be incorporated into the decision tree algorithm.
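The class-conditional imputation described above is easy to sketch. This is an assumed, simplified version operating on row dicts, not the code used in the experiments:

```python
from collections import Counter, defaultdict

def fill_missing_with_class_mode(rows, target, missing=None):
    """Replace a missing attribute value by the most common value of that attribute
    among training rows with the same class label. (Applicable only to training data,
    since the class labels of test data are unknown.)"""
    mode = defaultdict(Counter)
    for r in rows:                                   # collect per-(class, attribute) value counts
        for attr, v in r.items():
            if attr != target and v != missing:
                mode[(r[target], attr)][v] += 1
    filled = []
    for r in rows:                                   # impute from the class-conditional mode
        fr = dict(r)
        for attr, v in r.items():
            if attr != target and v == missing and mode[(r[target], attr)]:
                fr[attr] = mode[(r[target], attr)].most_common(1)[0][0]
        filled.append(fr)
    return filled
```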

Experimental results. Adult database
- Information from the 1994 census; suitable for propositional learning: one table, 6 numerical attributes, 8 nominal attributes.
- Task: determine whether a person makes over 50k a year.
- The class distribution is given for the training, test and total sets (counts of >50k and <=50k records), with and without missing values.
- Result after removal of missing values and using the original train/test split: 82.2%. Filling missing values with the Naïve Bayes approach yields 83%. C4.5 result: 84.46%.

Summary
- The algorithm is a promising alternative to existing algorithms such as Progol, FOIL and TILDE.
- The running time is comparable with the best existing approaches.
- If equipped with principled approaches to handling missing values, it is an effective algorithm for learning from real-world relational data.
- The approach is an extension of propositional learning, and can also be applied successfully to propositional learning.
Questions:
- Why can't we split the data based on the value of an attribute in an arbitrary table right away?
- Is there a less restrictive and simpler way of representing the splits of data than selection graphs?
- The running time for computing the first nodes of the decision tree is much lower than for the rest of the nodes. Is this unavoidable? Can the same idea be implemented more efficiently?

Future work
- Incorporation of more sophisticated techniques for handling missing values.
- Incorporation of more sophisticated pruning techniques or complexity regularization.
- More extensive evaluation of MRDTL on real-world data sets.
- Development of ontology-guided multi-relational decision tree learning algorithms to generate classifiers at multiple levels of abstraction [Zhang et al., 2002].
- Development of variants of MRDTL for classification tasks where the classes are not disjoint, based on the recently developed propositional decision tree counterparts of such algorithms [Caragea et al., 2002].
- Development of variants of MRDTL that can learn from heterogeneous, distributed, autonomous data sources, based on recently developed techniques for distributed learning and ontology-based data integration.