Rule extraction in neural networks. A survey. Krzysztof Mossakowski Faculty of Mathematics and Information Science Warsaw University of Technology.

Slides:



Advertisements
Similar presentations
Negative Selection Algorithms at GECCO /22/2005.
Advertisements

Slides from: Doug Gray, David Poole
DECISION TREES. Decision trees  One possible representation for hypotheses.
Evolutionary Neural Logic Networks for Breast Cancer Diagnosis A.Tsakonas 1, G. Dounias 2, E.Panourgias 3, G.Panagi 4 1 Aristotle University of Thessaloniki,
Data Mining Classification: Alternative Techniques
Intelligent Environments1 Computer Science and Engineering University of Texas at Arlington.
Combining Inductive and Analytical Learning Ch 12. in Machine Learning Tom M. Mitchell 고려대학교 자연어처리 연구실 한 경 수
Heterogeneous Forests of Decision Trees Krzysztof Grąbczewski & Włodzisław Duch Department of Informatics, Nicholas Copernicus University, Torun, Poland.
Machine Learning Neural Networks
1 Chapter 10 Introduction to Machine Learning. 2 Chapter 10 Contents (1) l Training l Rote Learning l Concept Learning l Hypotheses l General to Specific.
Prénom Nom Document Analysis: Artificial Neural Networks Prof. Rolf Ingold, University of Fribourg Master course, spring semester 2008.
1 Learning to Detect Objects in Images via a Sparse, Part-Based Representation S. Agarwal, A. Awan and D. Roth IEEE Transactions on Pattern Analysis and.
Biological Data Mining A comparison of Neural Network and Symbolic Techniques
1 MACHINE LEARNING TECHNIQUES IN IMAGE PROCESSING By Kaan Tariman M.S. in Computer Science CSCI 8810 Course Project.
Machine Learning Motivation for machine learning How to set up a problem How to design a learner Introduce one class of learners (ANN) –Perceptrons –Feed-forward.
1 Automated Feature Abstraction of the fMRI Signal using Neural Network Clustering Techniques Stefan Niculescu and Tom Mitchell Siemens Medical Solutions,
Classification.
WELCOME TO THE WORLD OF FUZZY SYSTEMS. DEFINITION Fuzzy logic is a superset of conventional (Boolean) logic that has been extended to handle the concept.
Chapter 5 Data mining : A Closer Look.
Statistical Natural Language Processing. What is NLP?  Natural Language Processing (NLP), or Computational Linguistics, is concerned with theoretical.
Walter Hop Web-shop Order Prediction Using Machine Learning Master’s Thesis Computational Economics.
Artificial Intelligence (AI) Addition to the lecture 11.
Data Mining Techniques
CHAPTER 12 ADVANCED INTELLIGENT SYSTEMS © 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition, Turban, Aronson, and Liang.
DATA MINING : CLASSIFICATION. Classification : Definition  Classification is a supervised learning.  Uses training sets which has correct answers (class.
INTRODUCTION TO MACHINE LEARNING. $1,000,000 Machine Learning  Learn models from data  Three main types of learning :  Supervised learning  Unsupervised.
Using Neural Networks in Database Mining Tino Jimenez CS157B MW 9-10:15 February 19, 2009.
An Introduction to Artificial Intelligence and Knowledge Engineering N. Kasabov, Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering,
 The most intelligent device - “Human Brain”.  The machine that revolutionized the whole world – “computer”.  Inefficiencies of the computer has lead.
Chapter 9 Neural Network.
Mestrado em Ciência de Computadores Mestrado Integrado em Engenharia de Redes e Sistemas Informáticos VC 14/15 – TP19 Neural Networks & SVMs Miguel Tavares.
NEURAL NETWORKS FOR DATA MINING
Introduction to machine learning and data mining 1 iCSC2014, Juan López González, University of Oviedo Introduction to machine learning Juan López González.
Data Mining: Classification & Predication Hosam Al-Samarraie, PhD. Centre for Instructional Technology & Multimedia Universiti Sains Malaysia.
1 Introduction to Neural Networks And Their Applications.
Classification Techniques: Bayesian Classification
Tony Jebara, Columbia University Advanced Machine Learning & Perception Instructor: Tony Jebara.
Fuzzy Systems Michael J. Watts
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition, Turban, Aronson, and Liang 12-1 Chapter 12 Advanced Intelligent Systems.
1 Chapter 10 Introduction to Machine Learning. 2 Chapter 10 Contents (1) l Training l Rote Learning l Concept Learning l Hypotheses l General to Specific.
Decision Trees Binary output – easily extendible to multiple output classes. Takes a set of attributes for a given situation or object and outputs a yes/no.
Classification (slides adapted from Rob Schapire) Eran Segal Weizmann Institute.
Neural Networks Demystified by Louise Francis Francis Analytics and Actuarial Data Mining, Inc.
Data Mining and Decision Support
Computational Intelligence: Methods and Applications Lecture 33 Decision Tables & Information Theory Włodzisław Duch Dept. of Informatics, UMK Google:
Application of Data Mining Techniques on Survey Data using R and Weka
Each neuron has a threshold value Each neuron has weighted inputs from other neurons The input signals form a weighted sum If the activation level exceeds.
Artificial Neural Networks (ANN). Artificial Neural Networks First proposed in 1940s as an attempt to simulate the human brain’s cognitive learning processes.
Pattern Recognition. What is Pattern Recognition? Pattern recognition is a sub-topic of machine learning. PR is the science that concerns the description.
1 Azhari, Dr Computer Science UGM. Human brain is a densely interconnected network of approximately neurons, each connected to, on average, 10 4.
語音訊號處理之初步實驗 NTU Speech Lab 指導教授: 李琳山 助教: 熊信寬
Machine Learning Artificial Neural Networks MPλ ∀ Stergiou Theodoros 1.
Network Management Lecture 13. MACHINE LEARNING TECHNIQUES 2 Dr. Atiq Ahmed Université de Balouchistan.
Chapter 11 – Neural Nets © Galit Shmueli and Peter Bruce 2010 Data Mining for Business Intelligence Shmueli, Patel & Bruce.
129 Feed-Forward Artificial Neural Networks AMIA 2003, Machine Learning Tutorial Constantin F. Aliferis & Ioannis Tsamardinos Discovery Systems Laboratory.
Data Mining: Concepts and Techniques1 Prediction Prediction vs. classification Classification predicts categorical class label Prediction predicts continuous-valued.
A Presentation on Adaptive Neuro-Fuzzy Inference System using Particle Swarm Optimization and it’s Application By Sumanta Kundu (En.R.No.
Combining Models Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya.
Big data classification using neural network
Machine Learning with Spark MLlib
Fuzzy Systems Michael J. Watts
School of Computer Science & Engineering
Table 1. Advantages and Disadvantages of Traditional DM/ML Methods
Chapter 12 Advanced Intelligent Systems
Intelligent Systems and
MACHINE LEARNING TECHNIQUES IN IMAGE PROCESSING
MACHINE LEARNING TECHNIQUES IN IMAGE PROCESSING
Nearest Neighbors CSC 576: Data Mining.
A task of induction to find patterns
Memory-Based Learning Instance-Based Learning K-Nearest Neighbor
Presentation transcript:

Rule extraction in neural networks. A survey. Krzysztof Mossakowski Faculty of Mathematics and Information Science Warsaw University of Technology

Krzysztof Mossakowski – Computational Intelligence Seminar –  Black boxes  Rule extraction  Neural networks for rule extraction  Sample problems  Bibliography

Krzysztof Mossakowski – Computational Intelligence Seminar – BLACK BOXES

Krzysztof Mossakowski – Computational Intelligence Seminar – Black-Box Models  Aims of many data analysis’s methods (pattern recognition, neural networks, evolutionary computation and related): building predictive data models adapting internal parameters of the data models to account for the known (training) data samples allowing for predictions to be made on the unknown (test) data samples

Krzysztof Mossakowski – Computational Intelligence Seminar – Dangers  Using a large number of numerical parameters to achieve high accuracy overfitting the data many irrelevant attributes may contribute to the final solution

Krzysztof Mossakowski – Computational Intelligence Seminar – Drawbacks  Combining predictive models with a priori knowledge about the problem is difficult  No systematic reasoning  No explanations of recommendations  No way to control and test the model in the areas of the future  Unacceptable risk in safety-critical domains (medical, industrial)

Krzysztof Mossakowski – Computational Intelligence Seminar – Reasoning with Logical Rules  More acceptable to human users  Comprehensible, provides explanations  May be validated by human inspection  Increases confidence in the system

Krzysztof Mossakowski – Computational Intelligence Seminar – Machine Learning  Explicit goal: the formulation of symbolic inductive methods methods that learn from examples  Discovering rules that could be expressed in natural language rules similar to those a human expert might create

Krzysztof Mossakowski – Computational Intelligence Seminar – Neural Networks as Black Boxes  Perform mysterious functions  Represent data in an incomprehensible way  Two issues: 1.understanding what neural networks really do 2.using neural networks to extract logical rules describing the data.

Krzysztof Mossakowski – Computational Intelligence Seminar – Techniques for Feretting Out Information from Trained ANN  Sensitivity analysis  Neural Network Visualization  Rule Extraction

Krzysztof Mossakowski – Computational Intelligence Seminar – Sensitivity Analysis  Probe ANN with test inputs, and record the outputs  Determining the impact or effect of an input variable on the output hold the other inputs to some fixed value (e.g. mean or median value), vary only the input while monitoring the change in outputs

Krzysztof Mossakowski – Computational Intelligence Seminar – Automated Sensitivity Analysis  For backpropagation ANN: keep track of the error terms computed during the back propagation step measure of the degree to which each input contributes to the output error  the largest error  the largest impact the relative contribution of each input to the output errors can be computed by acumulating errors over time and normalizing them

Krzysztof Mossakowski – Computational Intelligence Seminar – Neural Network Visualization  Using power of human brain to see and recognize patterns in two- and three-dimensional data

Krzysztof Mossakowski – Computational Intelligence Seminar – Visualization Samples

Krzysztof Mossakowski – Computational Intelligence Seminar – ... ... ...  ...  23KA23KA23KA23KA weight of connection from input neuron representing Ace of Hearts to the last hidden neuron weight of connection from the first hidden neuron to the output neuron

Krzysztof Mossakowski – Computational Intelligence Seminar – RULE EXTRACTION

Krzysztof Mossakowski – Computational Intelligence Seminar – Propositional Logic Rules  Standard crisp (boolean) propositional rules:  Fuzzy version is a mapping from X space to the space of fuzzy class labels  Crisp logic rules should give precise yes or no answers

Krzysztof Mossakowski – Computational Intelligence Seminar – Condition Part of Logic Rule  Defined by a conjuction of logical predicate functions  Usually predicate functions are tests on a single attribute if feature k has values that belong to a subset (for discrete features) or to an interval or (fuzzy) subsets for attribute K

Krzysztof Mossakowski – Computational Intelligence Seminar – Decision Borders (a) - general clusters (b) - fuzzy rules (c) - rough rules (d) - crisp logical rules source: Duch et.al, Computational Intelligence Methods..., 2004

Krzysztof Mossakowski – Computational Intelligence Seminar – Linguistic Variables  Attempts to verbalize knowledge require symbolic inputs (called linguistic variables)  Two types of linguistic variables: context-independent - identical in all regions of the feature space context-dependent - may be different in each rule

Krzysztof Mossakowski – Computational Intelligence Seminar – Decision Trees  Fast and easy to use  Hierarchical rules that they generate have somewhat limited power source: Duch et.al, Computational Intelligence Methods..., 2004

Krzysztof Mossakowski – Computational Intelligence Seminar – NEURAL NETWORKS FOR RULE EXTRACTION

Krzysztof Mossakowski – Computational Intelligence Seminar – Neural Rule Extraction Methods  Neural networks are regarded commonly as black boxes but can be used to provide simple and accurate sets of logical rules  Many neural algorithms extract logical rules directly from data have been devised

Krzysztof Mossakowski – Computational Intelligence Seminar – Categorizing Rule-Extraction Techniques  Expressive power of extracted rules  Translucency of the technique  Specialized network training schemes  Quality of extracted rules  Algorithmic complexity  The treatment of linguistic variables

Krzysztof Mossakowski – Computational Intelligence Seminar – Expressive Power of Extracted Rules  Types of extracted rules: crisp logic rules fuzzy logic rules first-order logic form of rules - rules with quantifiers and variables

Krzysztof Mossakowski – Computational Intelligence Seminar – Translucency  The relationship between the extracted rules and the internal architecture of the trained ANN  Categories: decompositional (local methods) pedagogical (global methods) eclectic

Krzysztof Mossakowski – Computational Intelligence Seminar – Translucency - Decompositional Approach  To extract rules at the level of each individual hidden and output unit within the trained ANN some form of analysis of the weight vector and associated bias of each unit rules with antecedents and consequents expressed in terms which are local to the unit a process of aggregation is required

Krzysztof Mossakowski – Computational Intelligence Seminar – Translucency - Pedagogical Approach  The trained ANN viewed as a black box  Finding rules that map inputs directly into outputs  Such techniques typically are used in conjunction with a symbolic learning algorithm use the trained ANN to generate examples for the training algorithm

Krzysztof Mossakowski – Computational Intelligence Seminar – Specialized network training schemes  If specialized ANN training regime is required  It provides some measure of the "portability" of the rule extraction technique across various ANN architectures  Underlaying ANN can be modifief or left intact by the rule extraction process

Krzysztof Mossakowski – Computational Intelligence Seminar – Quality of extracted rules  Criteria: accuracy - if can correctly classify a set of previously unseen examples fidelity - if extracted rules can mimic the behaviour of the ANN consistency - if generated rules will produce the same classification of unseen examples comprehensibility - size of the rules set and number of antecendents per rule

Krzysztof Mossakowski – Computational Intelligence Seminar – Algorithmic complexity  Important especially for decompositional approaches to rule extraction usually the basic process of searching for subsets of rules at the level of each (hidden and output) unit in the trained ANN is exponential in the number of inputs to the node

Krzysztof Mossakowski – Computational Intelligence Seminar – The Treatment of Linguistic Variables  Types of variables which limit usage of techniques: binary variables discretized inputs continuous variables that are converted to linguistic variables automatically

Krzysztof Mossakowski – Computational Intelligence Seminar – Techniques Reviews  Andrews et.al, A survey and critique..., techniques described in detail  Tickle et.al, The truth will come to light..., more techiques added  Jacobsson, Rule extraction from recurrent..., 2005, techniques for recurrent neural networks

Krzysztof Mossakowski – Computational Intelligence Seminar – SAMPLE PROBLEMS

Krzysztof Mossakowski – Computational Intelligence Seminar – Wisconsin Breast Cancer  Data details: 699 cases 9 attributes f1-f9 (1-10 integer values) two classes: 458 benign (65.5%) 241 malignant (34.5%). for 16 instances one attribute is missing source:

Krzysztof Mossakowski – Computational Intelligence Seminar – Wisconsin Breast Cancer - results  Single rule: IF f2 = [1,2] then benign else malignant 646 correct (92.42%), 53 errors  5 rules for malignant: R1:f1<9 & f4<4 & f6<2 & f7<5 R2:f1<10 & f3<4 & f4<4 & f6<3 R3:f1<7 & f3<9 & f4<3 & f6=[4,9] & f7<4 R4:f1=[3,4] & f3<9 & f4<10 & f6<6 & f7<8 R5:f1<6 & f3<3 & f7<8 ELSE: benign 692 correct (99%), 7 errors source:

Krzysztof Mossakowski – Computational Intelligence Seminar – The MONKs Problems  Robots are described by six diferent attributes: x1: head_shape  round square octagon x2: body_shape  round square octagon x3: is_smiling  yes no x4: holding  sword balloon flag x5: jacket_color  red yellow green blue x6: has_tie  yes no source: ftp://ftp.funet.fi/pub/sci/neural/neuroprose/thrun.comparison.ps.Z

Krzysztof Mossakowski – Computational Intelligence Seminar – The MONKs Problemscont.  Binary classification task  Each problem is given by a logical description of a class  Only a subset of all 432 possible robots with its classification is given source: ftp://ftp.funet.fi/pub/sci/neural/neuroprose/thrun.comparison.ps.Z

Krzysztof Mossakowski – Computational Intelligence Seminar – The MONKs Problemscont.  M1: (head_shape = body_shape) or (jacket_color = red) 124 randomly selected training samples  M2: exactly two of the six attributes have their first value 169 randomly selected training samples  M3: (jacket_color is green and holding a sword) or (jacket_color is not blue and body shape is not octagon) 122 randomly selected training samples with 5% misclassifications (noise in the training set)

Krzysztof Mossakowski – Computational Intelligence Seminar – M1, M2, M3 – best results  C-MLP2LN algorithm (100% accuracy): M1: 4 rules + 2 exception, 14 atomic formulae M2: 16 rules and 8 exceptions, 132 atomic formulae M3: 33 atomic formulae source:

Krzysztof Mossakowski – Computational Intelligence Seminar – BIBLIOGRAPHY

Krzysztof Mossakowski – Computational Intelligence Seminar – References  Duch, W., Setiono, R., Zurada, J.M., Computational Intelligence Methods for Rule-Based Data Understanding, Proceedings of the IEEE, 2004, vol. 92, Issue 5, pp

Krzysztof Mossakowski – Computational Intelligence Seminar – Surveys  R. Andrews, J. Diederich, and A. B. Tickle, A survey and critique of techniques for extracting rules from trained artificial neural networks, Knowl.- Based Syst., vol. 8, pp. 373–389, 1995  A. B. Tickle, R. Andrews, M. Golea, and J. Diederich, The truth will come to light: Directions and challenges in extracting the knowledge embedded within trained artificial neural networks, IEEE Trans. Neural Networks, vol. 9, pp. 1057–1068, Nov

Krzysztof Mossakowski – Computational Intelligence Seminar – Surveyscont.  I. Taha, J. Ghosh, Symbolic interpretation of artifcial neural networks, Knowledge and Data Engineering vol. 11, pp , 1999  H. Jacobsson, Rule extraction from recurrent neural networks: A Taxonomy and Review, 2005 [citeseer]

Krzysztof Mossakowski – Computational Intelligence Seminar – Problems  S.B. Thrun et al., The MONK’s problems: a performance comparison of different learning algorithms, Carnegie Mellon University, CMU-CS (December 1991)  (prof. Włodzisław Duch)