NeSy-2006, ECAI-06 Workshop, 29 August 2006, Riva del Garda, Italy
Jim Prentzas & Ioannis Hatzilygeroudis
Construction of Neurules from Training Examples: A Thorough Investigation
University of Patras, Dept. of Computer Engineering & Informatics & TEI of Lamia, Dept. of Informatics & Computer Technology, GREECE

Outline
- Neurules: An overview
- Neurules: Syntax and semantics
- Production process - Splitting
- Neurules and Generalization
- Conclusions

Outline
- Neurules: An overview
- Neurules: Syntax and semantics
- Production process - Splitting
- Neurules and Generalization
- Conclusions

Neurules: An Overview (1)
- Neurules integrate symbolic (propositional) rules and the Adaline neural unit
- They give pre-eminence to the symbolic framework
- Neurules were initially designed as an improvement to propositional rules (as far as efficiency is concerned) and were produced from them
- To facilitate knowledge acquisition, a method for producing neurules from empirical data was specified

Neurules: An Overview (2)
- Preserve the naturalness and modularity of production rules to some (perhaps large) degree
- Reduce the size of the produced knowledge base
- Increase inference efficiency
- Allow for efficient and natural explanations

Outline
- Neurules: An overview
- Neurules: Syntax and semantics
- Production process - Splitting
- Neurules and Generalization
- Conclusions

Neurules: Syntax and Semantics (1)
- Ci: conditions (e.g. 'fever is high')
- D: conclusion (e.g. 'disease is inflammation')
- sf0: bias factor; sfi: significance factors

General form: (sf0) if C1 (sf1), C2 (sf2), ..., Cn (sfn) then D

Neurules: Syntax and Semantics (2)
[Figure: a neurule depicted as an Adaline unit with inputs C1, C2, ..., Cn weighted by sf1, sf2, ..., sfn, a bias sf0, a threshold activation function f(x) and output D]
- Ci ∈ {1, -1, 0}, corresponding to {true, false, unknown}
- D ∈ {1, -1}, corresponding to {success, failure}
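To make the syntax concrete, here is a minimal sketch of a neurule as a data structure in Python. The class and field names are illustrative (they are not from the paper), and the activation function f is assumed to be the standard sign/threshold function of the Adaline unit.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Neurule:
    """A neurule: an Adaline-like unit over symbolic conditions."""
    conclusion: str               # D, e.g. "disease is inflammation"
    bias: float                   # sf0
    factors: Dict[str, float]     # condition -> significance factor sfi

    def evaluate(self, inputs: Dict[str, int]) -> int:
        """Evaluate the neurule for `inputs`, which maps each condition to
        1 (true), -1 (false) or 0 (unknown). Returns 1 (success: the
        conclusion is drawn) or -1 (failure), assuming a sign activation."""
        x = self.bias + sum(sf * inputs.get(cond, 0)
                            for cond, sf in self.factors.items())
        return 1 if x >= 0 else -1
```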

Outline
- Neurules: An overview
- Neurules: Syntax and semantics
- Production process - Splitting
- Neurules and Generalization
- Conclusions

Initial Neurules Construction
1. Make one initial neurule for each possible conclusion, either intermediate or final (i.e. for each value of the intermediate and output attributes, according to dependency information).
2. The conditions of each initial neurule include all attributes that affect its conclusion, according to dependency information, and all of their values.
3. Set the bias and significance factors to some initial values (e.g. 0).
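A minimal sketch of this construction step, reusing the Neurule dataclass sketched above. It assumes dependency information is given as a mapping from each conclusion to the attributes that affect it and their possible values; the function name and the 'attribute is value' encoding of conditions are illustrative assumptions.

```python
from typing import Dict, List


def build_initial_neurules(dependency_info: Dict[str, Dict[str, List[str]]]) -> List[Neurule]:
    """Build one initial neurule per possible (intermediate or final) conclusion.

    Every (attribute, value) pair that affects a conclusion becomes a condition;
    the bias and all significance factors start at an initial value of 0.
    """
    neurules = []
    for conclusion, attributes in dependency_info.items():
        factors = {f"{attr} is {val}": 0.0
                   for attr, values in attributes.items()
                   for val in values}
        neurules.append(Neurule(conclusion=conclusion, bias=0.0, factors=factors))
    return neurules


# Hypothetical usage:
# build_initial_neurules({"disease is inflammation":
#                         {"fever": ["high", "low"],
#                          "blood-conc": ["slight", "moderate", "normal", "high"]}})
```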

Neurules Production Process
1. Use dependency information to construct the initial neurules.
2. For each initial neurule, create its training set from the available empirical data.
3. Train each initial neurule with its training set:
3.1 If the training succeeds, produce the resulting neurule.
3.2 If not, split the training set into two subsets of close examples and apply steps 3.1 and 3.2 recursively to each subset.
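A sketch of this recursive training/splitting loop, again building on the Neurule dataclass above. The LMS-style weight update and the interpretation of 'training succeeds' (every example of the training set classified correctly) are assumptions about the training step, not details taken from the slides; closeness_split is sketched further below.

```python
import copy
from typing import List, Tuple

Example = Tuple[List[int], int]   # ([v1, ..., vn], d) with d = 1 (success) or -1 (failure)


def train(neurule: Neurule, examples: List[Example],
          epochs: int = 100, lr: float = 0.1) -> bool:
    """Adaline (LMS) training; returns True if the trained neurule classifies
    every example of its training set correctly. Assumes the example vectors
    are ordered like the neurule's conditions."""
    conds = list(neurule.factors)
    for _ in range(epochs):
        for values, d in examples:
            x = neurule.bias + sum(neurule.factors[c] * v for c, v in zip(conds, values))
            err = d - x
            neurule.bias += lr * err
            for c, v in zip(conds, values):
                neurule.factors[c] += lr * err * v
    return all(neurule.evaluate(dict(zip(conds, values))) == d
               for values, d in examples)


def produce_neurules(initial_neurule: Neurule, examples: List[Example]) -> List[Neurule]:
    """Step 3: train a copy of the initial neurule; if training fails, split
    the training set into two subsets of close examples and recurse on each."""
    neurule = copy.deepcopy(initial_neurule)
    if train(neurule, examples):
        return [neurule]
    subset_a, subset_b = closeness_split(examples)   # sketched further below
    return (produce_neurules(initial_neurule, subset_a) +
            produce_neurules(initial_neurule, subset_b))
```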

Splitting Strategies (1)
DEFINITIONS
- training example: [v1 v2 ... vn d]
- success example: d = 1; failure example: d = -1
- closeness: the number of common vi between two success examples
- least closeness pair (LCP): a pair of success examples with the least closeness in a training set
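These definitions translate directly into code; a small sketch using the same ([v1, ..., vn], d) representation as above (the helper names are illustrative, not from the paper):

```python
from itertools import combinations
from typing import List, Tuple

Example = Tuple[List[int], int]   # ([v1, ..., vn], d)


def closeness(e1: Example, e2: Example) -> int:
    """Number of positions at which two success examples share the same value."""
    return sum(1 for a, b in zip(e1[0], e2[0]) if a == b)


def least_closeness_pairs(success: List[Example]) -> List[Tuple[Example, Example]]:
    """All pairs of success examples whose closeness equals the minimum
    closeness found in the training set (the LCPs)."""
    pairs = list(combinations(success, 2))
    min_c = min(closeness(a, b) for a, b in pairs)
    return [(a, b) for a, b in pairs if closeness(a, b) == min_c]
```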

Splitting Strategies (2)
REQUIREMENTS
1. Each subset contains all failure examples, to avoid misactivations.
2. Each subset contains at least one success example, to assure activation of the corresponding neurule.
3. The two subsets should not contain common success examples, to avoid activation of more than one neurule for the same data.
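The three requirements can be expressed as a simple check on a candidate split; a hedged sketch (the function name and the list-of-(values, d) representation are assumptions):

```python
def split_is_valid(original, subset_a, subset_b) -> bool:
    """Check the three splitting requirements on lists of (values, d) examples."""
    failures = [e for e in original if e[1] == -1]
    succ_a = {tuple(v) for v, d in subset_a if d == 1}
    succ_b = {tuple(v) for v, d in subset_b if d == 1}
    all_failures_kept = all(f in subset_a and f in subset_b for f in failures)
    at_least_one_success = bool(succ_a) and bool(succ_b)
    no_common_success = succ_a.isdisjoint(succ_b)
    return all_failures_kept and at_least_one_success and no_common_success
```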

Splitting Strategies (3)
STRATEGY: CLOSENESS-SPLIT
1. Find the LCPs of the training set S and choose one. Its elements are called pivots.
2. Create two subsets of S, each containing one of the pivots and the success examples of S that are closer to that pivot.
3. Insert all the failure examples of S into both subsets.
4. Train two copies of the initial neurule, one with each subset.
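A sketch of CLOSENESS-SPLIT under the same representation as above. The choice among LCPs is made at random here for simplicity (the selection heuristics appear on a later slide), and ties in closeness to the two pivots are broken arbitrarily, an assumption the slides do not spell out.

```python
import random
from typing import List, Tuple

Example = Tuple[List[int], int]


def closeness(e1: Example, e2: Example) -> int:
    return sum(1 for a, b in zip(e1[0], e2[0]) if a == b)


def closeness_split(examples: List[Example]) -> Tuple[List[Example], List[Example]]:
    """Split a training set into two subsets around a least-closeness pair."""
    success = [e for e in examples if e[1] == 1]
    failure = [e for e in examples if e[1] == -1]
    # 1. Find the LCPs and choose one; its elements become the pivots.
    pairs = [(a, b) for i, a in enumerate(success) for b in success[i + 1:]]
    min_c = min(closeness(a, b) for a, b in pairs)
    pivot_a, pivot_b = random.choice([p for p in pairs if closeness(*p) == min_c])
    # 2. Assign every other success example to the subset of its closer pivot.
    subset_a, subset_b = [pivot_a], [pivot_b]
    for e in success:
        if e is pivot_a or e is pivot_b:
            continue
        (subset_a if closeness(e, pivot_a) >= closeness(e, pivot_b) else subset_b).append(e)
    # 3. Insert all failure examples into both subsets.
    return subset_a + failure, subset_b + failure
```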

Splitting Strategies (4)
AN EXAMPLE: Training Set
[Table: example training set with five success examples P1-P5 over the conditions C1-C13 and the conclusion D; the cell values are not preserved in the transcript]

Splitting Strategies (5)
AN EXAMPLE: Splitting Tree
[Figure: splitting tree over the success examples P1-P5, where F denotes the set of failure examples]

Splitting Strategies (6)
AN EXAMPLE: A produced neurule

(-13.5) if venous-conc is slight (12.4),
    venous-conc is moderate (8.2),
    venous-conc is normal (8.0),
    venous-conc is high (1.2),
    blood-conc is moderate (11.6),
    blood-conc is slight (8.3),
    blood-conc is normal (4.4),
    blood-conc is high (1.6),
    arterial-conc is moderate (8.8),
    arterial-conc is slight (-5.7),
    cap-conc is moderate (8.4),
    cap-conc is slight (4.5),
    scan-conc is normal (8.4)
then disease is inflammation
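As a worked illustration of the semantics, with hypothetical inputs rather than a case from the paper: if only 'venous-conc is slight' and 'blood-conc is moderate' are true and every other condition is unknown, the weighted sum is -13.5 + 12.4 + 11.6 = 10.5 >= 0, so the neurule fires and 'disease is inflammation' is concluded. The same computation with the Neurule sketch given earlier (factors abbreviated):

```python
neurule = Neurule(
    conclusion="disease is inflammation",
    bias=-13.5,
    factors={"venous-conc is slight": 12.4,
             "venous-conc is moderate": 8.2,
             "blood-conc is moderate": 11.6},  # remaining factors omitted
)
# Hypothetical inputs: two conditions true (1), everything else unknown (0).
print(neurule.evaluate({"venous-conc is slight": 1, "blood-conc is moderate": 1}))  # -> 1
```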

Splitting Strategies (7)
STRATEGY: ALTERN-SPLIT1
1. If all success examples, or only failure examples, are misclassified, use the closeness-based split.
2. If some of the success examples and none of the failure examples are misclassified, split the training set into two subsets: one containing the correctly classified success examples and one containing the misclassified success examples. Add all failure examples to both subsets.
3. If some (not all) of the success examples and some or all of the failure examples are misclassified, split the training set into two subsets: one containing the correctly classified success examples and the other the misclassified success examples. Add all failure examples to both subsets.

Splitting Strategies (8)
STRATEGY: ALTERN-SPLIT2
1. If all success examples, or only failure examples, are misclassified, use the closeness-based split.
2. If some of the success examples and none of the failure examples are misclassified, split the training set into two subsets: one containing the correctly classified success examples and one containing the misclassified success examples. Add all failure examples to both subsets.
3. If some of the success and some of the failure examples are misclassified, use the closeness-based split.
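A hedged sketch of how the two alternative strategies dispatch on a failed training run. It assumes `misclassified` is obtained by re-evaluating the partially trained neurule on its training set, and it reuses `closeness_split` from the sketch above.

```python
def altern_split(examples, misclassified, variant=1):
    """Choose the split for ALTERN-SPLIT1 (variant=1) or ALTERN-SPLIT2 (variant=2).

    `examples` and `misclassified` are lists of (values, d) pairs;
    `misclassified` holds the examples the trained neurule gets wrong.
    """
    success = [e for e in examples if e[1] == 1]
    failure = [e for e in examples if e[1] == -1]
    mis_success = [e for e in misclassified if e[1] == 1]
    mis_failure = [e for e in misclassified if e[1] == -1]

    all_success_wrong = len(mis_success) == len(success)
    only_failures_wrong = (not mis_success) and mis_failure

    if all_success_wrong or only_failures_wrong:          # case 1 (both variants)
        return closeness_split(examples)
    if mis_success and not mis_failure:                    # case 2 (both variants)
        correct = [e for e in success if e not in mis_success]
        return correct + failure, mis_success + failure
    if variant == 1:                                       # case 3, ALTERN-SPLIT1
        correct = [e for e in success if e not in mis_success]
        return correct + failure, mis_success + failure
    return closeness_split(examples)                       # case 3, ALTERN-SPLIT2
```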

Splitting Strategies (9)
LCP SELECTION HEURISTICS
- Random Choice (RC): pick an LCP at random
- Best Distribution (BD): choose the LCP whose split distributes the elements of the other LCPs into different subsets
- Mean Closeness (MC): choose the LCP that creates the subsets with the greatest mean closeness
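A sketch of the RC and MC heuristics (BD is left out because it requires simulating the resulting split for every candidate LCP). The exact definition of 'mean closeness' used here, the mean of the pairwise closeness within each induced subset averaged over the two subsets, is an interpretation rather than a detail from the slides, and `closeness` is the helper sketched earlier.

```python
import random
from statistics import mean


def select_lcp_random(lcps):
    """Random Choice (RC): pick one least-closeness pair at random."""
    return random.choice(lcps)


def select_lcp_mean_closeness(lcps, success_examples):
    """Mean Closeness (MC): choose the LCP whose induced subsets have the
    greatest mean closeness, assuming each success example is assigned to the
    subset of its closer pivot (as in CLOSENESS-SPLIT)."""
    def mean_closeness_of(pivot_a, pivot_b):
        subsets = {id(pivot_a): [pivot_a], id(pivot_b): [pivot_b]}
        for e in success_examples:
            if e is pivot_a or e is pivot_b:
                continue
            pivot = pivot_a if closeness(e, pivot_a) >= closeness(e, pivot_b) else pivot_b
            subsets[id(pivot)].append(e)
        scores = [mean(closeness(a, b) for i, a in enumerate(s) for b in s[i + 1:])
                  for s in subsets.values() if len(s) > 1]
        return mean(scores) if scores else 0.0

    return max(lcps, key=lambda pair: mean_closeness_of(*pair))
```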

Experimental Results (1)
Comparing LCP Selection Heuristics
[Table: results for each dataset (Monks1_train (124), Monks2_train (169), Monks3_train (122), Tic-Tac-Toe (958), Car (1728), Nursery (12960)) under CLOSENESS-SPLIT, ALTERN-SPLIT1 and ALTERN-SPLIT2, each with the RC, MC and BD heuristics; the numeric entries are not preserved in the transcript]

Experimental Results (2)
Comparing Splitting Strategies
[Table: results for each dataset (Monks1_train (124), Monks2_train (169), Monks3_train (122), Tic-Tac-Toe (958), Car (1728), Nursery (12960)) under CLOSENESS-SPLIT, ALTERN-SPLIT1 and ALTERN-SPLIT2 with the RC, MC and BD heuristics; the numeric entries are not preserved in the transcript]

Experimental Results (3)
LCP Selection Heuristics
- BD performs better in most cases
- MC, although computationally the most expensive, is generally the worst
- RC, although the simplest, does quite well
Splitting Strategies
- CLOSENESS-SPLIT does better than the others
- ALTERN-SPLIT2 does better than ALTERN-SPLIT1
- The 'closeness' heuristic proves to be a good choice

Outline
- Neurules: An overview
- Neurules: Syntax and semantics
- Production process - Splitting
- Neurules and Generalization
- Conclusions

Neurules and Generalization
- Generalization is an important characteristic of NN-based systems
- Neurules had never been tested for their generalization capabilities, due to the way they were used
- We present here an investigation of their generalization capabilities in comparison with the Adaline unit and the BPNN (backpropagation neural network)
- We use the same data sets as for the comparison of the splitting strategies

Experimental Results (1)
Impact of LCP Selection Heuristics on Generalization

Dataset       RC        MC        BD
Monks1        100%      100%      100%
Monks2        96.30%    96.99%    97.92%
Monks3        92.36%    93.52%    96.06%
Tic-Tac-Toe   98.85%    97.50%    98.12%
Car           94.44%    94.56%    94.50%
Nursery       99.63%    99.53%    99.52%

Experimental Results (2)
Neurules Generalization vs Adaline and BPNN

Dataset       Adaline Unit   Neurules   BPNN
Monks1        67.82%         100%       -
Monks2        43.75%         97.92%     100%
Monks3        92.13%         96.06%     97.22%
Tic-Tac-Toe   61.90%         98.85%     98.23%
Car           78.93%         94.56%     95.72%
Nursery       82.26%         99.63%     -

(A dash marks a value not preserved in the transcript.)

Experimental Results (3)
LCP Selection Heuristics Impact
- None of RC, MC, BD has a clearly better impact, but RC and BD seem to do better than MC
Neurules
- Generalize considerably better than the Adaline unit itself
- Slightly worse than, but very close to, the BPNN
- Creating the BPNN is more time consuming than creating a neurule base

Outline
- Neurules: An overview
- Neurules: Syntax and semantics
- Production process - Splitting
- Neurules and Generalization
- Conclusions

Conclusions
- The 'closeness' heuristic used in the neurule production process proves to be quite effective
- The random choice (RC) selection heuristic does adequately well
- Neurules generalize quite well

Future Plans
- Compare 'closeness' with other known machine learning heuristics (e.g. distance-based heuristics)
- Use neurules for rule extraction