Rule Extraction From Trained Neural Networks Brian Hudson University of Portsmouth, UK.



Artificial Neural Networks
Advantages: high accuracy; robust to noisy data.
Disadvantage: lack of comprehensibility.

Trepan
A method for extracting a decision tree from a trained artificial neural network (Craven, 1996). The tree is built by expanding nodes in a best-first manner, producing an unbalanced tree. The splitting tests at the nodes are m-of-n tests, e.g. 2-of-{x1, ¬x2, x3}, where the xi are Boolean conditions. The network is used as an oracle to answer queries during the learning process.

Splitting Tests
Start with a set of candidate tests:
- binary tests on each value for nominal features
- binary tests on thresholds for real-valued features
Find the optimal splitting test by beam search, initializing the beam with the candidate test that maximizes the information gain.
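The information gain used to rank candidate tests is the standard entropy reduction from a binary split. A self-contained sketch (toy data, not the EBI dataset):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(instances, labels, test):
    """Entropy reduction from splitting (instances, labels) with a Boolean test."""
    left = [y for x, y in zip(instances, labels) if test(x)]
    right = [y for x, y in zip(instances, labels) if not test(x)]
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder

# Hypothetical nominal feature with a binary test "value == 'A'":
xs = ["A", "A", "B", "B"]
ys = [1, 1, 0, 0]
print(information_gain(xs, ys, lambda v: v == "A"))  # perfect split -> 1.0
```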

Splitting Tests
To each m-of-n test in the beam, and each candidate test, apply two operators:
- m-of-(n+1), e.g. 2-of-{x1, x2} => 2-of-{x1, x2, x3}
- (m+1)-of-(n+1), e.g. 2-of-{x1, x2} => 3-of-{x1, x2, x3}
Admit new tests to the beam only if they increase the information gain and differ significantly (chi-squared test) from the existing tests.
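The two operators can be sketched as successor generation over (m, conditions) pairs; conditions are plain strings here for illustration, and the information-gain and chi-squared admission checks are left out:

```python
def expand_m_of_n(m, conds, candidate_tests):
    """Generate successor tests of an m-of-n test via the two operators:
    m-of-(n+1) and (m+1)-of-(n+1), one pair per unused candidate condition."""
    successors = []
    for c in candidate_tests:
        if c in conds:
            continue  # only add conditions not already in the test
        successors.append((m, conds + [c]))      # m-of-(n+1)
        successors.append((m + 1, conds + [c]))  # (m+1)-of-(n+1)
    return successors

# The slide's example: expanding 2-of-{x1, x2} with candidate x3.
succ = expand_m_of_n(2, ["x1", "x2"], ["x1", "x2", "x3"])
print(succ)  # [(2, ['x1', 'x2', 'x3']), (3, ['x1', 'x2', 'x3'])]
```

In Trepan proper, each successor would then be scored and admitted to the beam only if it improves the gain and is significantly different from tests already there.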

Data Modelling
The amount of training data reaching each node decreases with the depth of the tree. Trepan therefore creates new training cases by sampling the distributions of the training data:
- empirical distributions for nominal inputs
- kernel density estimates for continuous inputs
The oracle (i.e. the neural network) is applied to the new training cases to assign output values.
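A minimal sketch of the sampling-plus-oracle step, assuming nominal features only (Trepan uses kernel density estimates for continuous features, omitted here) and a hypothetical stand-in oracle in place of a trained network:

```python
import random

def draw_instance(training_column_values):
    """Sample one synthetic instance, drawing each nominal feature
    independently from its empirical distribution in the training data."""
    return {name: random.choice(values)
            for name, values in training_column_values.items()}

def label_with_oracle(oracle, instances):
    """Label synthetic instances by querying the trained network (the
    oracle) rather than using the original training labels."""
    return [oracle(inst) for inst in instances]

# Hypothetical two-position sequence data and a toy oracle:
columns = {"p1": ["A", "A", "G"], "p2": ["G", "T", "T"]}
oracle = lambda inst: "positive" if inst["p1"] == "A" else "negative"
synthetic = [draw_instance(columns) for _ in range(5)]
labels = label_with_oracle(oracle, synthetic)
```

Because the oracle can label arbitrarily many synthetic cases, deep nodes are not starved of data the way they are in a tree induced directly from the training set.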

Application to Bioinformatics Prediction of Splice Junction sites in Eukaryotic DNA

Splice Junction Sites

Consensus Sequences
Donor: C/G A G | G T A/G A G T
Acceptor: C/T C/T C/T C/T C/T C/T C/T C/T C/T C/T A G | G

EBI Dataset
Clean dataset generated at the EBI (Thanaraj, 1999).
Donors: training set 567 positive, 943 negative; test set 229 positive, 373 negative.
Acceptors: training set 637 positive, 468 negative; test set 273 positive, 213 negative.

Results

TREPAN Donor Tree
Root test: 3-of-{-2=A, -1=G, +3=A, +4=A, +5=G}
Yes: Positive 869:74
No: Negative 43:533
(Donor consensus: C/G A G | G T A/G A G T)

C5 Donor Tree (extract)
p5=G
  p3=C or p3=T => NEGATIVE
  p3=A
    p2=G => POSITIVE
    p2=A
      p4=A or p4=G => POSITIVE
      p4=C or p4=T => NEGATIVE
    p2=C
      p4=A => POSITIVE
      else => NEGATIVE
    p2=T
      p6=A or p6=G => NEGATIVE
      p6=C or p6=T => POSITIVE
  p3=G
    p4=T => NEGATIVE
    p4=C
      p6=T => POSITIVE
      else => NEGATIVE

Trepan Acceptor Tree (diagram)
Root test: 1-of-{-3=G, -5=G}; further tests {-3=A} and 2-of-{+1!=G, -5=G}; leaves labelled POSITIVE/NEGATIVE.
(Acceptor consensus: C/T … C/T A G | G)

Application to Chemoinformatics
1. Learning general rules
2. Conformational analysis
3. A QSAR dataset

Oprea Dataset
137 diverse compounds; classification: 62 leads, 75 drugs.
14 descriptors (from Cerius-2): MW, MR, AlogP; Ndonor, Nacceptor, Nrotbond; number of Lipinski violations.
T.I. Oprea, A.M. Davis, S.J. Teague & P.D. Leeson, "Is there a difference between Leads and Drugs? A Historical Perspective", J. Chem. Inf. Comput. Sci., 41 (2001).
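The "number of Lipinski violations" descriptor counts how many of the Rule-of-5 thresholds a compound exceeds. A sketch using the standard thresholds (MW > 500, logP > 5, H-bond donors > 5, H-bond acceptors > 10); the argument names are illustrative, not the Cerius-2 descriptor names:

```python
def lipinski_violations(mw, alogp, n_donor, n_acceptor):
    """Count Rule-of-5 (Lipinski) violations for one compound.
    A compound violates a rule when it exceeds that rule's threshold."""
    return sum([
        mw > 500,         # molecular weight over 500 Da
        alogp > 5,        # calculated logP over 5
        n_donor > 5,      # more than 5 H-bond donors
        n_acceptor > 10,  # more than 10 H-bond acceptors
    ])

print(lipinski_violations(380, 3.2, 2, 4))  # drug-like -> 0
print(lipinski_violations(620, 6.1, 2, 4))  # MW and logP exceeded -> 2
```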

C5 tree
MW <= 380 [Mode: lead]
  Rule of 5 Violations = 0 [Mode: lead]
    Hbond acceptor <= 2 [Mode: lead] => lead
    Hbond acceptor > 2 [Mode: drug] => drug
  Rule of 5 Violations > 0 [Mode: lead] => lead
MW > 380 [Mode: drug] => drug

Trepan Oprea Tree (diagram)
Root test: 1-of-{MW<296, MR<85}; second test MW<454.
Leaves: Lead 52:3, Unclassified 12:49, Drug 1:20.

Conformational Analysis
300 conformations from a 5 ns MD simulation of rosiglitazone, classified by the length of the long axis into:
Extended: distance > 10 Å
Folded: distance < 10 Å
Descriptors: 8 torsion angles. In-house data.
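The extended/folded labelling above reduces to a single threshold rule, sketched here with an illustrative function name:

```python
def classify_conformation(long_axis_angstrom):
    """Label a conformation by the length of its long axis,
    using the 10 Angstrom threshold described above."""
    return "extended" if long_axis_angstrom > 10.0 else "folded"

print(classify_conformation(12.3))  # -> extended
print(classify_conformation(8.0))   # -> folded
```

The learning task is then to predict this label from the 8 torsion angles, which is what the C5 and Trepan trees below do.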

Rosiglitazone
Agonist of the PPAR-gamma nuclear receptor, which regulates HDL/LDL and triglycerides. Active ingredient of Avandia, a treatment for Type II diabetes.

Distances

C5 tree
T5 <= 269 [Mode: extended]
  T5 <= 52 [Mode: extended]
    T7 <= 185 [Mode: extended] => extended
    T7 > 185 [Mode: folded]
      T6 <= 75 [Mode: folded] => folded
      T6 > 75 [Mode: extended]
        T5 <= 41 [Mode: folded]
          T8 <= 249 [Mode: folded] => folded
          T8 > 249 [Mode: extended] => extended
        T5 > 41 [Mode: extended] => extended
  T5 > 52 [Mode: extended]
    T6 <= 73 [Mode: extended]
      T8 <= 242 [Mode: extended]
        T5 <= 7 [Mode: extended]
          T8 <= 22 [Mode: extended] => extended
          T8 > 22 [Mode: folded] => folded
        T5 > 7 [Mode: extended] => extended
      T8 > 242 [Mode: extended] => extended
    T6 > 73 [Mode: extended] => extended
T5 > 269 [Mode: folded] => folded

Trepan Conformation Tree (diagram)
Root test: T5 < 180
Internal test: 2-of-{T7 172}
Leaves: Extended 133:0, Unclassified 2:5, Folded 0:161

Ferreira Dataset
A "typical" QSAR dataset: 48 HIV-1 protease inhibitors, activity as pIC50 (low: pIC50 < 8.0; high: pIC50 > 8.0). Descriptors are mostly topological.
R. Kiralj and M.M.C. Ferreira, "A priori Molecular Descriptors in QSAR: a case of HIV-1 protease inhibitors I. The Chemometric Approach", J. Mol. Graph. Modell., 21 (2003).

Original Results
PLS model: activity determined by X9, X11, X10, X13.
R² = 0.91, Q² = 0.85, Ncomps = 3.

C5 tree
X11 <= 2.5 [Mode: low]
  X13 <= 16.7 [Mode: low] => low
  X13 > 16.7 [Mode: high] => high
X11 > 2.5 [Mode: high] => high

Trepan Ferreira Tree (diagram)
Root test: 1-of-{X13<16.1, X9<3.4}; further tests X1<552 and X6<0.04.
Leaves: High 1:24, Low 17:1, Low 4:1, High 0:1.

Accuracy

Conclusions
Reasonable accuracy. Comprehensible rules.

Acknowledgements
David Whitley, Tony Browne, Martyn Ford. BBSRC grant reference BIO/12005.