Neural Computing Group, Department of Computing, University of Surrey. Matthew Casey, 21st April 2004.

2 Neural Computing Group
Novel neural network architectures
– Multi-net systems: ensemble and modular approaches
– Combining GAs with NNs: representation
Theoretical underpinnings
– Multi-net systems: extending traditional techniques to define architecture and algorithm, and to explore properties

3 Neural Computing Group
Cognitive modelling
– Extrapolation: generalising to patterns not found in the training data
– Numeric and language abilities: simulating child abilities and exploring biological aspects of neural networks
Applications
– Classification; clustering; prediction; information retrieval; bioinformatics

4 Multi-net Systems
Learning and collaboration in multi-net systems
– Single-net versus multi-net systems
– In-situ learning
Experimental results
– Parallel combination of networks: ensemble
– Sequential combination of networks: modular
Formalising multi-net systems

5 Multi-net Systems
Biological motivation for neural networks:
– Hebb's neurophysiological postulate [1]
– Learning across cell assemblies: neural integration
– Functional specialism: analogy to multi-net systems
Theoretical motivation:
– Generalisation improvements with multi-net systems
– Ensemble and modular
Learning in collaboration with modularisation

6 Single-net Systems
Systems of one or more artificial neurons combined together in a single network
– Parallel distributed processing [2] systems: (multi-layer) perceptron systems
– Unsupervised learning: Kohonen's SOM [3]

7 Single-nets as Multi-nets?
[Figure: XOR solved by a combination of linear decision boundaries; inputs x1 and x2, output y; true = 1, false = -1]
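To make the idea concrete (this example is mine, not from the slides), here is a minimal sketch of XOR computed by combining two fixed linear decision boundaries with a third linear unit, using the ±1 encoding above; the weights are hand-chosen for illustration.

```python
import numpy as np

def linear_unit(w, b):
    """A single linear threshold unit: sign(w . x + b), output +1 or -1."""
    return lambda x: float(np.sign(np.dot(w, x) + b))

# Two hand-chosen linear decision boundaries over +/-1 inputs:
or_like = linear_unit(np.array([1.0, 1.0]), 1.0)      # -1 only for (-1, -1)
nand_like = linear_unit(np.array([-1.0, -1.0]), 1.0)  # -1 only for (+1, +1)

# A third linear unit combines the two boundaries (effectively an AND of them).
combine = linear_unit(np.array([1.0, 1.0]), -1.0)

def xor(x):
    return combine(np.array([or_like(x), nand_like(x)]))

for a in (-1.0, 1.0):
    for c in (-1.0, 1.0):
        print((a, c), "->", xor(np.array([a, c])))
# Only the mixed inputs map to +1 (true); (+1,+1) and (-1,-1) map to -1 (false).
```

Neither boundary separates XOR on its own; it is the combination of the two linear units that does, which is the sense in which a multi-layer single-net already behaves like a combination of simpler components.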

8 From Single-nets to Multi-nets
Multi-net systems appear to be a development of the parallel processing paradigm
Can multi-net systems improve generalisation?
– Modularisation with simpler networks?
– Limited theoretical and empirical evidence
Generalisation:
– Balance prior knowledge and training
– VC Dimension [4,5,6]
– Bias/variance dilemma [7]
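For reference, the bias/variance dilemma [7] cited above is usually stated for squared error as below (standard textbook form, not reproduced from the slide), where f_D denotes the network obtained from training set D:

```latex
% Squared-error bias/variance decomposition (Geman et al. [7]); f_D is the
% network trained on data set D, and the irreducible noise term is omitted.
\mathbb{E}_D\!\left[\big(f_D(x) - \mathbb{E}[y \mid x]\big)^2\right]
  = \underbrace{\big(\mathbb{E}_D[f_D(x)] - \mathbb{E}[y \mid x]\big)^2}_{\text{bias}^2}
  \;+\; \underbrace{\mathbb{E}_D\!\left[\big(f_D(x) - \mathbb{E}_D[f_D(x)]\big)^2\right]}_{\text{variance}}
```

Averaging over several networks chiefly targets the variance term, which is one standard motivation for ensemble combination.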

9 Multi-net Systems: Ensemble or Modular?
Ensemble systems:
– Parallel combination
– Each network performs the same task
– Simple ensemble
– AdaBoost [8]
Modular systems:
– Each network performs a different (sub-)task
– Mixture-of-experts [9] (top-down, parallel, competitive)
– Min-max [10] (bottom-up, static, parallel/sequential)
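As a rough sketch of the two combination styles named above (the gate scores stand in for the output of a trained gating network [9], and the numbers are made up for illustration):

```python
import numpy as np

def simple_ensemble(member_outputs):
    """Simple ensemble: each network performs the same task; the combined
    output is the unweighted mean of the member outputs."""
    return np.mean(member_outputs, axis=0)

def mixture_of_experts(expert_outputs, gate_scores):
    """Mixture-of-experts style combination: expert outputs are blended by
    softmax-normalised gate scores. In a real ME system the gate scores come
    from a gating network trained alongside the experts; here they are fixed
    placeholders for illustration."""
    g = np.exp(gate_scores - np.max(gate_scores))
    g = g / g.sum()
    return np.sum(g[:, None] * np.asarray(expert_outputs), axis=0)

outputs = [np.array([0.9, 0.1]), np.array([0.6, 0.4]), np.array([0.2, 0.8])]
print(simple_ensemble(outputs))                                  # equal weighting
print(mixture_of_experts(outputs, np.array([2.0, 0.5, -1.0])))   # gated weighting
```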

10 Categorising Multi-net Systems
Sharkey's [11,12] combination strategies:
– Parallel: co-operative or competitive; top-down or bottom-up; static or dynamic
– Sequential
– Supervisory
Component networks may be [13]:
– Pre-trained (independent)
– Incrementally trained (iterative)
– In-situ trained (simultaneous)

11 Multi-net Systems
Categorisation schemes appear not to support the generalisation of multi-net system properties beyond specific examples
– Ensemble: bias, variance and diversity [14]
– Modular: bias and variance
– What about measures such as the VC Dimension?
Some use of in-situ learning
– ME and HME [15]
– Negative correlation learning [16]

12 So?
Multi-net systems appear to offer a way in which generalisation performance and learning speed can be improved:
– Yet limited theoretical and empirical evidence
– Focus on parallel systems
Limited use of in-situ learning despite motivation
– Existing results show improvement
– Can the approach be generalised?
No general framework for multi-net systems
– Difficult to generalise properties from categorisation

13 Ongoing Research
Explore the (potential) benefit of multi-net systems
– Can we combine 'simple' networks to solve 'complex' problems: 'superordinate' systems with faster learning?
Can learning improve generalisation?
– Parallel: in-situ learning in the simple ensemble
– Sequential: combining networks with in-situ learning
– Does in-situ learning provide improved generalisation?
Can we formally define multi-net systems?
– A method to describe the architecture and learning algorithm for a general multi-net system

14 In-situ Learning
Simultaneous training of neural networks within a combined system
– Existing techniques focus more on pre-training
Systems being explored:
– Simple learning ensemble (SLE)
– Sequential learning modules (SLM)
Classification:
– MONK's problems (MONK 1, 2, 3) [17]
– Wisconsin Breast Cancer Database (WBCD) [18]

15 Simple Learning Ensemble
(Pre-trained) ensembles:
– Train each network individually
– Parallel combination of network outputs: mean output
– Pre-training: how can we understand or control the combined performance of the ensemble?
– Incremental: AdaBoost [8]
– In-situ: negative correlation [16]
In-situ learning:
– Train in-situ and assess combined performance during training using early stopping
– Generalisation-loss early stopping criterion [19] (see the sketch below)
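The generalisation-loss criterion [19] can be written down compactly. The sketch below is mine, with an illustrative threshold; in the SLE it would be applied to the validation error of the combined ensemble output during in-situ training.

```python
def generalisation_loss(val_errors):
    """Prechelt's generalisation loss: GL(t) = 100 * (E_va(t) / E_opt(t) - 1),
    where E_va(t) is the current validation error and E_opt(t) is the lowest
    validation error observed so far [19]."""
    e_opt = min(val_errors)
    return 100.0 * (val_errors[-1] / e_opt - 1.0)

def should_stop(val_errors, alpha=5.0):
    """Stop training once GL exceeds a chosen threshold alpha.
    The value 5.0 is illustrative, not taken from the talk."""
    return generalisation_loss(val_errors) > alpha

# Example: validation error of the combined ensemble output after each epoch.
history = [0.40, 0.31, 0.25, 0.24, 0.26, 0.29]
print(generalisation_loss(history), should_stop(history))
```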

16 Sequential Learning Modules
Sequential systems:
– Can a combination of (simpler) networks give good generalisation and learning speed?
– Typically pre-trained and for specific processing
Sequential in-situ learning:
– How can in-situ learning be achieved with sequential networks: target error/output?
– Use unsupervised networks
– Last network has target output and hence can be supervised
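A minimal sketch of the sequential in-situ idea, assuming a winner-takes-all competitive layer as the unsupervised module and a delta-rule output layer as the final supervised module (module sizes, learning rates and the competitive update are my assumptions, not the configuration used in the experiments):

```python
import numpy as np

rng = np.random.default_rng(0)

def train_slm(X, y, n_units=4, epochs=50, lr_unsup=0.1, lr_sup=0.1):
    """Sequential in-situ training sketch: an unsupervised winner-takes-all
    module feeds a supervised linear output module, and both are updated on
    every presented pattern rather than being pre-trained separately."""
    n_features = X.shape[1]
    centres = rng.normal(size=(n_units, n_features))    # first module: competitive units
    w = rng.normal(scale=0.1, size=n_units)              # last module: output weights
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            k = np.argmin(np.linalg.norm(centres - x, axis=1))
            centres[k] += lr_unsup * (x - centres[k])     # unsupervised update: no target needed
            h = np.zeros(n_units)
            h[k] = 1.0                                    # code passed forward to the next module
            error = target - (h @ w + b)
            w += lr_sup * error * h                       # supervised delta-rule update
            b += lr_sup * error
    return centres, w, b

def predict(model, x):
    centres, w, b = model
    k = np.argmin(np.linalg.norm(centres - x, axis=1))
    return w[k] + b
```

The point is that only the last module needs a target: the first module adapts from the data alone while both are updated in-situ on every pattern.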

17 Systems
– 100 trials of each system, each with different random initial weights
– Generalisation assessed using test data to give mean response
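A sketch of this evaluation protocol (build_and_train and evaluate are hypothetical callables standing in for the actual systems and the test-set scoring):

```python
import statistics

def mean_response(build_and_train, evaluate, n_trials=100):
    """Train n_trials systems, each from different random initial weights,
    and report the mean (and spread of) test-set generalisation.
    build_and_train(seed) and evaluate(system) are hypothetical callables
    supplied by the experiment; they are not part of the original material."""
    scores = [evaluate(build_and_train(seed)) for seed in range(n_trials)]
    return statistics.mean(scores), statistics.stdev(scores)
```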

18 Experimental Results

19 Comparison
MONK's problems: comparison of maximum responses [17]
– MONK 1: 100%; SLE 98.4%; SLM 84.7%
– MONK 2: 100%; SLE 74.5%; SLM 81.0%
– MONK 3: 97.2%; SLE 83.1%; SLM 87.5%
– Small spread of values, especially SLE
WBCD: comparison of mean responses [20]
– AdaBoost 97.6%; SLE 88.47%; SLM 97.29%

20 SLE Validation Error: MONK 1

21 SLE Epochs: MONK 1

22 In-situ Learning
In-situ learning in multi-net systems:
– Encouraging results from non-optimal systems
– Comparison with existing single-net and multi-net systems (with and without early stopping)
– Computational effort?
Ensemble systems:
– Effect of training times on bias, variance and diversity?
Sequential systems:
– Encouraging empirical results: theoretical results?
– Automatic classification of unsupervised clusters

23 The Next Step?
Empirical results are encouraging
– But what is the theoretical basis of multi-net systems?
– Are multi-net systems any better than monolithic solutions, and if so, which configurations are better?
Early work: a formal framework for multi-net systems

24 Multi-net System Framework
Previous work:
– Framework for the co-operation of learning algorithms [21]
– Stochastic model [22]
– Importance Sampled Learning Ensembles (ISLE) [23]
– Focus on supervised learning and specific architectures
Jordan and Xu's (1995) definition of HME [24]:
– Generalisation of HME
– Abstraction of architecture from algorithm
– Theoretical results for convergence
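For context, the two-level HME combination that Jordan and Xu's definition generalises has the standard form below (not copied from the slide), with expert outputs blended by softmax-normalised gating outputs:

```latex
% Standard two-level HME [15,24]: expert outputs mu_ij are blended by gating
% networks whose outputs xi are softmax-normalised at each level of the tree.
\mu(x) \;=\; \sum_{i} g_i(x) \sum_{j} g_{j \mid i}(x)\, \mu_{ij}(x),
\qquad
g_i(x) = \frac{e^{\xi_i(x)}}{\sum_k e^{\xi_k(x)}},
\qquad
g_{j \mid i}(x) = \frac{e^{\xi_{ij}(x)}}{\sum_k e^{\xi_{ik}(x)}}
```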

25 Multi-net System Framework
Propose a modification to Jordan and Xu's definition of HME to provide a generalised multi-net system framework
– HME combines the output of the expert networks through a weighting generated by a gating network
– Replace the weighting by the (optional) operation of a network
– Can be used for parallel, sequential and supervisory systems

26 Example: HME

27 Example: Framework

28 Definition
A multi-net system consists of an ordered tree of depth r defined by its nodes, with the root of the tree associated with the output, such that:
[formal conditions given on the slide are not captured in this transcript]
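As a purely illustrative reading of the "ordered tree of networks" idea (the class name, the use of concatenation and the parallel-only data flow are my assumptions, not the slide's formal conditions), a node might be sketched as:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional
import numpy as np

@dataclass
class Node:
    """One node of an ordered tree of networks. A leaf applies its own network;
    a non-leaf combines its children's outputs, optionally through a further
    network, generalising HME's fixed gate-weighted sum."""
    net: Optional[Callable[[np.ndarray], np.ndarray]] = None
    children: List["Node"] = field(default_factory=list)

    def output(self, x: np.ndarray) -> np.ndarray:
        if not self.children:
            return self.net(x)                                   # leaf: a component network
        combined = np.concatenate([child.output(x) for child in self.children])
        return self.net(combined) if self.net else combined      # optional combining network

# A depth-2 example: two leaf networks combined by a root combining network.
leaf_a = Node(net=lambda x: np.tanh(x[:1]))
leaf_b = Node(net=lambda x: np.tanh(x[1:]))
root = Node(net=lambda h: np.array([h.sum()]), children=[leaf_a, leaf_b])
print(root.output(np.array([0.2, -0.4])))
```

Sequential and supervisory arrangements would route data between nodes differently; the sketch only shows the parallel case, where the root optionally applies its own combining network.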

29 Multi-net System Framework
The learning algorithm operates by modifying the state of the system, as defined by the parameters associated with each node
Includes:
– Pre-training
– In-situ training
– Incremental training (through pre- or in-situ training)

30 Future Work
In-situ learning:
– Look further at in-situ learning: is it a valid approach?
– SLE: does learning promote diversity?
– SLM: expand and explore limitations
Framework:
– Explore properties such as bias, variance, diversity and the VC Dimension
– Relate the framework to existing systems and explore their properties

31 Questions?

32 References
1. Hebb, D.O. (1949). The Organization of Behavior: A Neuropsychological Theory. New York: John Wiley & Sons.
2. McClelland, J.L. & Rumelhart, D.E. (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 2: Psychological and Biological Models. Cambridge, MA: A Bradford Book, MIT Press.
3. Kohonen, T. (1982). Self-Organized Formation of Topologically Correct Feature Maps. Biological Cybernetics, vol. 43.
4. Vapnik, V.N. & Chervonenkis, A.Ya. (1971). On the Uniform Convergence of Relative Frequencies of Events to their Probabilities. Theory of Probability and Its Applications, vol. XVI(2).
5. Baum, E.B. & Haussler, D. (1989). What Size Net Gives Valid Generalisation? Neural Computation, vol. 1(1).

33 References
6. Koiran, P. & Sontag, E.D. (1997). Neural Networks With Quadratic VC Dimension. Journal of Computer and System Sciences, vol. 54(1).
7. Geman, S., Bienenstock, E. & Doursat, R. (1992). Neural Networks and the Bias/Variance Dilemma. Neural Computation, vol. 4(1).
8. Freund, Y. & Schapire, R.E. (1996). Experiments with a New Boosting Algorithm. Machine Learning: Proceedings of the 13th International Conference. Morgan Kaufmann.
9. Jacobs, R.A., Jordan, M.I. & Barto, A.G. (1991). Task Decomposition through Competition in a Modular Connectionist Architecture: The What and Where Vision Tasks. Cognitive Science, vol. 15.
10. Lu, B. & Ito, M. (1999). Task Decomposition and Module Combination Based on Class Relations: A Modular Neural Network for Pattern Classification. IEEE Transactions on Neural Networks, vol. 10(5).
11. Sharkey, A.J.C. (1999). Multi-Net Systems. In Sharkey, A.J.C. (Ed), Combining Artificial Neural Nets: Ensemble and Modular Multi-Net Systems. London: Springer-Verlag.

34 References
12. Sharkey, A.J.C. (2002). Types of Multinet System. In Roli, F. & Kittler, J. (Ed), Proceedings of the Third International Workshop on Multiple Classifier Systems (MCS 2002). Berlin, Heidelberg, New York: Springer-Verlag.
13. Liu, Y., Yao, X., Zhao, Q. & Higuchi, T. (2002). An Experimental Comparison of Neural Network Ensemble Learning Methods on Decision Boundaries. Proceedings of the 2002 International Joint Conference on Neural Networks (IJCNN'02), vol. 1. Los Alamitos, CA: IEEE Computer Society Press.
14. Kuncheva, L.I. (2002). Switching Between Selection and Fusion in Combining Classifiers: An Experiment. IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 32(2).
15. Jordan, M.I. & Jacobs, R.A. (1994). Hierarchical Mixtures of Experts and the EM Algorithm. Neural Computation, vol. 6(2).
16. Liu, Y. & Yao, X. (1999a). Ensemble Learning via Negative Correlation. Neural Networks, vol. 12(10).

35 References
17. Thrun, S.B., Bala, J., Bloedorn, E., Bratko, I., Cestnik, B., Cheng, J., De Jong, K., Dzeroski, S., Fahlman, S.E., Fisher, D., Hamann, R., Kaufman, K., Keller, S., Kononenko, I., Kreuziger, J., Michalski, R.S., Mitchell, T., Pachowicz, P., Reich, Y., Vafaie, H., van de Welde, W., Wenzel, W., Wnek, J. & Zhang, J. (1991). The MONK's Problems: A Performance Comparison of Different Learning Algorithms. Technical Report CMU-CS. Pittsburgh, PA: Carnegie-Mellon University, Computer Science Department.
18. Wolberg, W.H. & Mangasarian, O.L. (1990). Multisurface Method of Pattern Separation for Medical Diagnosis Applied to Breast Cytology. Proceedings of the National Academy of Sciences, USA, vol. 87(23).
19. Prechelt, L. (1996). Early Stopping - But When? In Orr, G.B. & Müller, K.-R. (Ed), Neural Networks: Tricks of the Trade, vol. 1524. Berlin, Heidelberg, New York: Springer-Verlag.
20. Drucker, H. (1999). Boosting Using Neural Networks. In Sharkey, A.J.C. (Ed), Combining Artificial Neural Nets: Ensemble and Modular Multi-Net Systems. London: Springer-Verlag.

36 References
21. Bottou, L. & Gallinari, P. (1991). A Framework for the Cooperation of Learning Algorithms. In Lippmann, R.P., Moody, J.E. & Touretzky, D.S. (Ed), Advances in Neural Information Processing Systems, vol. 3.
22. Amari, S.-I. (1995). Information Geometry of the EM and em Algorithms for Neural Networks. Neural Networks, vol. 8(9).
23. Friedman, J.H. & Popescu, B. (2003). Importance Sampling: An Alternative View of Ensemble Learning. Presented at the 4th International Workshop on Multiple Classifier Systems (MCS 2003). Guildford, UK.
24. Jordan, M.I. & Xu, L. (1995). Convergence Results for the EM Approach to Mixtures of Experts Architectures. Neural Networks, vol. 8.