Biological Data Mining: A Comparison of Neural Network and Symbolic Techniques


People
Centre for Molecular Design, University of Portsmouth:
–Professor Martyn Ford
–Dr David Whitley
–Dr Shuang Cang (Mar – Sept 2000)
–Dr Abul Azad (Jan )
Dr Antony Browne, London Guildhall University
Professor Philip Picton, University College Northampton

1. Objectives
The project aims:
–to develop and validate techniques for extracting explicit information from bioinformatic data
–to express this information as logical rules and decision trees
–to apply these new procedures to a range of scientific problems related to bioinformatics and cheminformatics

2. Methods for Extracting Information
Artificial Neural Networks
–good predictive accuracy
–hard to decipher; often regarded as ‘black boxes’
Decision Trees
–symbolic rules are easier to interpret
–more likely to reveal relationships in the data
–allow the behaviour of individual cases to be explained

3. Extracting Decision Trees
The Trepan procedure (Craven, 1996) extracts a decision tree from a neural network and a set of training cases by recursively partitioning the input space. The tree is built in a best-first manner, expanding it at the nodes where there is the greatest potential for increasing the fidelity of the tree to the network.
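The best-first expansion order can be sketched as a priority queue keyed on a node's potential to improve fidelity; a common scoring, and the one assumed here, is reach(n) × (1 − fidelity(n)). This is an illustrative sketch, not Craven's implementation:

```python
import heapq

def best_first_order(leaves):
    """Return leaves in the order a Trepan-style extractor would expand them.

    Each leaf is (name, reach, fidelity): `reach` is the fraction of training
    instances that arrive at the node, `fidelity` the fraction of those on
    which the tree already agrees with the network.  The node with the
    largest reach * (1 - fidelity) offers the greatest potential fidelity
    gain, so it is expanded first.
    """
    heap = [(-reach * (1.0 - fid), name) for name, reach, fid in leaves]
    heapq.heapify(heap)
    order = []
    while heap:
        _, name = heapq.heappop(heap)  # most promising node first
        order.append(name)
    return order
```

A node reached by few cases, or one already faithful to the network, is scored low and expanded late, which is what keeps the extracted tree compact.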

4. Splitting Tests
The splitting tests at the nodes are m-of-n expressions, e.g. 2-of-{x₁, ¬x₂, x₃}, where the xᵢ are Boolean conditions.
Start with a set of candidate tests:
–binary tests on each value for nominal features
–binary tests on thresholds for real-valued features
Use a beam search with a beam width of two. Initialize the beam with the candidate test that maximizes the information gain.
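An m-of-n test and the information-gain criterion used to seed the beam can be written compactly for binary features; the encoding of a literal as (feature index, polarity) is an assumption of this sketch:

```python
from math import log2

def _entropy(ys):
    """Binary entropy of a list of 0/1 labels."""
    n = len(ys)
    if n == 0:
        return 0.0
    p = sum(ys) / n
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))

def m_of_n(m, literals, x):
    """True if at least m of the n literals hold for instance x.

    Each literal is (index, polarity); polarity False means the negated
    condition, so 2-of-{x1, not-x2, x3} is m_of_n(2, [(0, True), (1, False), (2, True)], x).
    """
    return sum(bool(x[i]) == pol for i, pol in literals) >= m

def info_gain(m, literals, X, y):
    """Information gain of splitting the labelled instances (X, y) on the test."""
    left = [yi for xi, yi in zip(X, y) if m_of_n(m, literals, xi)]
    right = [yi for xi, yi in zip(X, y) if not m_of_n(m, literals, xi)]
    n = len(y)
    return _entropy(y) - (len(left) / n) * _entropy(left) - (len(right) / n) * _entropy(right)
```

Initializing the beam then amounts to scoring every candidate 1-of-1 test with `info_gain` and keeping the best.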

5. Splitting Tests (II)
To each m-of-n test in the beam and each candidate test, apply two operators:
–m-of-(n+1), e.g. 2-of-{x₁, x₂} => 2-of-{x₁, x₂, x₃}
–(m+1)-of-(n+1), e.g. 2-of-{x₁, x₂} => 3-of-{x₁, x₂, x₃}
Admit new tests to the beam if they increase the information gain and are significantly different (chi-squared test) from existing tests.
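The two operators are a small generator over unused candidate literals; the chi-squared admission check is left out of this sketch and only noted in a comment:

```python
def expand(m, literals, candidates):
    """Apply the two beam-search operators to one m-of-n test.

    Yields (m, literals+1) for the m-of-(n+1) operator and
    (m+1, literals+1) for the (m+1)-of-(n+1) operator, one pair per
    candidate literal not already in the test.  A full implementation
    would then admit a new test only if it increases information gain
    and differs significantly (chi-squared) from tests already in the beam.
    """
    used = set(literals)
    for lit in candidates:
        if lit in used:
            continue
        yield (m, literals + [lit])      # m-of-(n+1)
        yield (m + 1, literals + [lit])  # (m+1)-of-(n+1)
```

With a beam width of two, each round expands both beam members this way and keeps the two best admitted tests.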

6. Example: Substance P Binding to NK1 Receptors
Substance P is a neuropeptide with amino acid sequence H-Arg-Pro-Lys-Pro-Gln-Gln-Phe-Phe-Gly-Leu-Met-NH₂.
Wang et al. (1993) used the multipin technique to synthesize the 512 = 2⁹ stereoisomers generated by systematic replacement of L- by D-amino acids at 9 positions, and measured their binding potencies to central NK1 receptors.
The objective was to identify the positions at which stereochemistry affects binding strength.
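The compound library maps naturally onto 9-bit vectors; a minimal sketch, with the 0 = L / 1 = D encoding being an assumption of this illustration:

```python
from itertools import product

# Encode each analogue as a 9-tuple over {0, 1}: 0 = L-amino acid retained,
# 1 = D-amino acid substituted at that position (assumed encoding).
# Systematic replacement at 9 positions gives 2**9 = 512 stereoisomers.
isomers = list(product((0, 1), repeat=9))
```

Each tuple is then an input vector for the 9-input networks described on the next slide, with the measured binding potency as the target.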

7. Application of Trepan
A series of networks with 9:9:1 architectures was trained, using 90% of the data as the training set. For each network a decision tree was grown using Trepan.
The positions identified agree with the FIRM (Formal Inference-based Recursive Modelling) analysis of Young and Hawkins (1999).
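A 9:9:1 architecture means 9 inputs, one hidden layer of 9 units, and a single output. The forward pass below illustrates that shape only; the sigmoid activations and the weight layout are assumptions, and training is omitted:

```python
import math

def mlp_9_9_1(weights, x):
    """Forward pass of a 9:9:1 network: 9 inputs, 9 sigmoid hidden units,
    1 sigmoid output.  `weights` = (W1, b1, w2, b2), where W1 is a 9x9
    matrix (rows = hidden units), b1 the hidden biases, w2 the output
    weights, and b2 the output bias.
    """
    W1, b1, w2, b2 = weights
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    # Hidden layer: one sigmoid unit per row of W1.
    h = [sig(sum(wi * xi for wi, xi in zip(row, x)) + bi)
         for row, bi in zip(W1, b1)]
    # Single sigmoid output unit.
    return sig(sum(wi * hi for wi, hi in zip(w2, h)) + b2)
```

Once trained, a network in this form is exactly the oracle Trepan queries when labelling instances during tree extraction.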

8. A Typical Trepan Tree

9. Future Work
Complete the implementation of the Trepan algorithm:
–model the distribution of the input data and generate from this a set of query instances that are classified using the network and used as additional training cases during extraction of the tree.
Extend the algorithm to enable the extraction of regression trees.
Provide a Bayesian formulation for the decision tree extraction algorithm.
Compare the performance of these algorithms with existing symbolic data mining techniques (ID3/C5).
Apply Trepan to ligand-receptor binding problems.
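The query-instance step above can be sketched for binary inputs by modelling each feature's marginal frequency and sampling from it; Trepan proper uses richer, per-node distributions, so this is a deliberately simplified illustration with assumed names:

```python
import random

def query_instances(X, oracle, n, seed=0):
    """Generate n query instances and label them with the network.

    Simplified sketch: model each binary input by its marginal frequency
    in the training data X, sample new instances from that model, and
    label each with `oracle` (the trained network).  The labelled pairs
    can then supplement the training cases during tree extraction.
    """
    rng = random.Random(seed)
    p = [sum(col) / len(X) for col in zip(*X)]  # per-feature P(x_i = 1)
    samples = [[int(rng.random() < pi) for pi in p] for _ in range(n)]
    return [(s, oracle(s)) for s in samples]
```

Because the instances are labelled by the network rather than drawn from the original data, extraction can continue to split nodes even where real training cases are sparse.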