Biological Data Mining: A comparison of Neural Network and Symbolic Techniques

Grantholder
Professor Martyn Ford, Centre for Molecular Design, University of Portsmouth

Collaborators
– Dr Anthony Browne, School of Computing, Information Systems and Mathematics, London Guildhall University
– Professor Philip Picton, School of Technology and Design, University College Northampton
– Dr David Whitley, Centre for Molecular Design, University of Portsmouth

Objectives
The project aims:
– to develop and validate techniques for extracting explicit information from bioinformatic data
– to express this information as logical rules and decision trees
– to apply these new procedures to a range of scientific problems related to bioinformatics and cheminformatics

Extracting information
Artificial neural networks (ANNs) can be used to identify the non-linear relationships that underlie bioinformatic data, but:
– trained ANNs do not lead to a concise and explicit model
– specifying the underlying structure is therefore difficult
– as a result, ANNs are often regarded as ‘black boxes’

Data Mining and Neural Networks
Standard data mining algorithms (such as ID3 or C5) already exist, so why use an ANN? Doing so is worthwhile if the rules extracted from the ANN:
– give a better fit to the data with the same number of rules (i.e. explain the data more accurately);
– give the same fit to the data with fewer rules (i.e. explain the data more comprehensibly); or
– give both a better fit to the data and use fewer rules (i.e. explain the data more comprehensibly and more accurately).
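The three criteria above can be written as a single comparison. The following is only a minimal sketch: the function name `is_advantageous`, its parameters, and the use of accuracy as the measure of "fit" are assumptions made for illustration and are not taken from the slides.

```python
def is_advantageous(acc_extracted, n_extracted, acc_baseline, n_baseline, tol=1e-9):
    """Return True if ANN-extracted rules beat a baseline rule set (e.g. from ID3/C5).

    acc_* : fit to the data, here assumed to be classification accuracy in [0, 1]
    n_*   : number of rules in each rule set
    tol   : tolerance used to treat two accuracies as "the same fit"
    """
    better_fit = acc_extracted > acc_baseline + tol
    same_fit = abs(acc_extracted - acc_baseline) <= tol
    fewer_rules = n_extracted < n_baseline
    same_size = n_extracted == n_baseline

    return (
        (better_fit and same_size)       # more accurate, equally comprehensible
        or (same_fit and fewer_rules)    # equally accurate, more comprehensible
        or (better_fit and fewer_rules)  # more accurate and more comprehensible
    )
```

A rule set that is more accurate but also larger, or smaller but less accurate, satisfies none of the three criteria, so the function returns False for it.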

Extracting Decision Trees
The TREPAN procedure (Craven, 1996):
– extracts decision trees from ANNs
– performs better than the symbolic learning algorithms ID3 and C5
– the current implementation is restricted to a particular network architecture, but
– the underlying algorithm is independent of network architecture

Trepan
– Builds a decision tree representing the function the ANN has learnt by recursively partitioning the input space.
– Draws query instances by taking into account the distribution of instances in the problem domain.
– For real-valued features, uses kernel density estimates to generate a model of the underlying data that is used to select instances for presentation to the network.
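The query-generation idea can be sketched for a single real-valued feature as follows. This is an illustrative simplification, not the TREPAN implementation: the Gaussian kernel, the fixed bandwidth, and the names `sample_from_kde`, `toy_network` and `label_queries` are all assumptions, and `toy_network` is a hypothetical stand-in for a trained ANN.

```python
import random


def sample_from_kde(training_values, bandwidth, n_samples, rng):
    """Draw samples from a Gaussian kernel density estimate of one feature.

    Sampling from a Gaussian KDE amounts to picking a training value
    uniformly at random and adding Gaussian noise whose standard
    deviation is the kernel bandwidth.
    """
    return [
        rng.gauss(rng.choice(training_values), bandwidth)
        for _ in range(n_samples)
    ]


def toy_network(x):
    """Hypothetical stand-in for a trained ANN: maps an input to a class label."""
    return 1 if x > 0.5 else 0


def label_queries(queries, oracle):
    """Present each query instance to the network and record its answer."""
    return [(x, oracle(x)) for x in queries]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
training_values = [0.1, 0.2, 0.4, 0.6, 0.9]
queries = sample_from_kde(training_values, bandwidth=0.05, n_samples=200, rng=rng)
labelled = label_queries(queries, toy_network)
```

The labelled queries supplement the original training data, so the tree is fitted to the network's behaviour rather than only to the limited set of observed examples.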

Trepan
Builds the decision tree in a best-first manner:
– as each node is added, the fidelity of the decision tree to the ANN is maximised
– this is done by examining the significance of the distributions at consecutive levels of the tree (Kolmogorov-Smirnov test for real-valued features, chi-squared test for discrete ones)
Allows the user to control the size of the final tree by selecting appropriate stopping criteria.
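Two quantities mentioned above can be made concrete: fidelity, taken here to mean the fraction of instances on which the tree and the network agree, and the two-sample Kolmogorov-Smirnov statistic for a real-valued feature. The sketch below is a hedged illustration using only the standard library. The function names are assumptions, and it computes the statistic only, not the significance level or the way TREPAN applies it.

```python
from bisect import bisect_right


def fidelity(tree_predict, network_predict, instances):
    """Fraction of instances on which the tree gives the same label as the network."""
    agree = sum(1 for x in instances if tree_predict(x) == network_predict(x))
    return agree / len(instances)


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    This is the largest absolute difference between the empirical
    cumulative distribution functions of the two samples.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    max_gap = 0.0
    # The empirical CDFs only change at observed values, so checking
    # those points is sufficient.
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap
```

A statistic near 0 indicates that the two samples of a feature are distributed similarly, while a value near 1 indicates that they barely overlap.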

Aims
– Implement the TREPAN algorithm in a portable format, independent of network architecture.
– Extend the algorithm to enable the extraction of regression trees.
– Provide a Bayesian formulation for the decision tree extraction algorithm.
– Compare the performance of these algorithms with existing symbolic data mining techniques (ID3/C5).

Aims
Apply the extracted decision trees:
– to searches of bioinformatic databases
   – protein databases
   – genomic databases
– to searches of cheminformatic databases
   – chemical libraries
   – natural product databases
– to investigate ligand/receptor binding
– to quantify molecular similarity/diversity
– to identify new leads and optimise properties

Case study: ligand interaction with GPCRs
– 28 GPCRs
– a number of putative interaction sites
– 3 principal properties of amino acids (AAs)
– MLR results for 2 ligands