
1 Neural Computing Group Department of Computing University of Surrey Matthew Casey 21st April 2004 http://www.computing.surrey.ac.uk/personal/st/M.Casey/

2 Neural Computing Group Novel neural network architectures –Multi-net systems: ensemble and modular approaches –Combining genetic algorithms (GAs) with neural networks (NNs): representation Theoretical underpinnings –Multi-net systems: extending traditional techniques to define architecture, algorithm and explore properties

3 Neural Computing Group Cognitive modelling –Extrapolation: generalising to patterns not found in the training data –Numeric and language abilities: simulating child abilities and exploring biological aspects of neural networks Applications –Classification; clustering; prediction; information retrieval; bioinformatics

4 Multi-net Systems Learning and collaboration in multi-net systems –Single-net versus multi-net systems –In-situ learning Experimental results –Parallel combination of networks: ensemble –Sequential combination of networks: modular Formalising multi-net systems

5 Multi-net Systems Biological motivation for neural networks: –Hebb’s neurophysiological postulate 1 –Learning across cell assemblies: neural integration –Functional specialism: analogy to multi-net systems Theoretical motivation –Generalisation improvements with multi-net systems –Ensemble and modular Learning in collaboration with modularisation

6 Single-net Systems Systems of one or more artificial neurons combined in a single network –Parallel distributed processing 2 systems: (multi-layer) perceptron systems –Unsupervised learning: Kohonen’s SOM 3

7 Single-nets as Multi-nets? [Figure: XOR solved by a combination of linear decision boundaries; inputs x1 and x2, output y, with true (1) and false (-1) regions]
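To make the slide’s point concrete, here is a minimal sketch (not from the presentation) of XOR realised by combining linear decision boundaries: two threshold units each compute one boundary (OR and NAND), and a third unit combines them. The weights are chosen by hand purely for illustration.

```python
import numpy as np

def threshold(x):
    """Sign activation: maps net input to +1 (true) or -1 (false)."""
    return np.where(x >= 0, 1, -1)

# Two hidden linear threshold units, each realising one decision boundary:
# h1 fires for x1 OR x2, h2 fires for NOT (x1 AND x2).
W_hidden = np.array([[1.0, 1.0],     # h1: x1 + x2 - 0.5 >= 0  -> OR
                     [-1.0, -1.0]])  # h2: -x1 - x2 + 1.5 >= 0 -> NAND
b_hidden = np.array([-0.5, 1.5])

# Output unit ANDs the two boundaries: XOR = OR AND NAND.
w_out = np.array([1.0, 1.0])
b_out = -1.5

def xor_net(x):
    h = threshold(W_hidden @ x + b_hidden)  # combine the two boundaries
    return threshold(w_out @ h + b_out)

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor_net(np.array(x, dtype=float)))  # -1, +1, +1, -1
```

Neither hidden unit solves XOR alone; only their combination yields the non-linearly-separable function, which is the sense in which a single multi-layer network already behaves like a multi-net of simpler parts.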

8 From Single-nets to Multi-nets Multi-net systems appear to be a development of the parallel processing paradigm Can multi-net systems improve generalisation? –Modularisation with simpler networks? –Limited theoretical and empirical evidence Generalisation: –Balance prior knowledge and training –VC Dimension 4,5,6 –Bias/variance dilemma 7
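For reference, the bias/variance dilemma 7 concerns the standard decomposition of expected squared error; the second line is a textbook result rather than a claim from the slide, and shows why averaging can help: for N members with equal variance and uncorrelated errors, the variance term of the ensemble mean falls as 1/N.

```latex
\mathbb{E}\big[(\hat{f}(x)-y)^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)]-\mathbb{E}[y\mid x]\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x)-\mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \sigma^2_{\text{noise}}
\qquad
\operatorname{Var}\!\Big(\tfrac{1}{N}\textstyle\sum_{i=1}^{N}\hat{f}_i(x)\Big)
  = \tfrac{\sigma_f^2}{N}\ \text{(uncorrelated members)}
```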

9 Multi-net Systems: Ensemble or Modular? Ensemble systems: –Parallel combination –Each network performs the same task –Simple ensemble –AdaBoost 8 Modular systems: –Each network performs a different (sub-)task –Mixture-of-experts 9 (top-down parallel competitive) –Min-max 10 (bottom-up static parallel/sequential)
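The distinction can be sketched in a few lines of code (illustrative only; the linear “experts” and random weights are assumptions, not any system from the talk): an ensemble averages networks trained on the same task, while a mixture-of-experts 9 uses a gating network to produce input-dependent weights over experts.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Illustrative shapes only: three linear 'experts' and a linear gate
# over a 4-dimensional input; all weights random.
experts = [rng.normal(size=4) for _ in range(3)]
gate = rng.normal(size=(3, 4))

def simple_ensemble(x):
    """Ensemble: every member performs the same task; combine by the mean."""
    return float(np.mean([w @ x for w in experts]))

def mixture_of_experts(x):
    """Modular: a gating network weights each expert's output per input."""
    g = softmax(gate @ x)  # input-dependent weighting over the experts
    return float(sum(gi * (w @ x) for gi, w in zip(g, experts)))

x = rng.normal(size=4)
print(simple_ensemble(x), mixture_of_experts(x))
```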

10 Categorising Multi-net Systems Sharkey’s 11,12 combination strategies: –Parallel: co-operative or competitive; top-down or bottom-up; static or dynamic –Sequential –Supervisory Component networks may be 13 : –Pre-trained (independent) –Incrementally trained (iterative) –In-situ trained (simultaneous)

11 Multi-net Systems Categorisation schemes appear not to support the generalisation of multi-net system properties beyond specific examples –Ensemble: bias, variance and diversity 14 –Modular: bias and variance –What about measures such as the VC Dimension? Some use of in-situ learning –ME and HME 15 –Negative correlation learning 16

12 So? Multi-net systems appear to offer a way in which generalisation performance and learning speed can be improved: –Yet limited theoretical and empirical evidence –Focus on parallel systems Limited use of in-situ learning despite motivation –Existing results show improvement –Can the approach be generalised? No general framework for multi-net systems –Difficult to generalise properties from categorisation

13 Ongoing Research Explore (potential) benefit of multi-net systems –Can we combine ‘simple’ networks to solve ‘complex’ problems: ‘superordinate’ systems with faster learning? Can learning improve generalisation? –Parallel: in-situ learning in the simple ensemble –Sequential: combining networks with in-situ learning –Does in-situ learning provide improved generalisation? Can we formally define multi-net systems? –A method to describe the architecture and learning algorithm for a general multi-net system

14 In-situ Learning Simultaneous training of neural networks within a combined system –Existing techniques focus more on pre-training Systems being explored: –Simple learning ensemble (SLE) –Sequential learning modules (SLM) Classification –MONK’s problems (MONK 1, 2, 3) 17 –Wisconsin Breast Cancer Database (WBCD) 18

15 Simple Learning Ensemble (Pre-trained) ensembles: –Train each network individually –Parallel combination of network outputs: mean output –Pre-training: how can we understand or control the combined performance of the ensemble? –Incremental: AdaBoost 8 –In-situ: negative correlation 16 In-situ learning: –Train in-situ and assess combined performance during training using early stopping –Generalisation loss early stopping criterion 19
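A minimal sketch of the SLE training loop described above, assuming member networks expose hypothetical train_epoch() and predict() methods. Early stopping uses Prechelt’s generalisation loss criterion 19, GL(t) = 100 (E_va(t)/E_opt(t) - 1), applied to the combined output rather than to the individual members.

```python
import numpy as np

def ensemble_predict(networks, X):
    """Parallel combination of member outputs: the mean output."""
    return np.mean([net.predict(X) for net in networks], axis=0)

def train_sle(networks, X_tr, y_tr, X_va, y_va, alpha=5.0, max_epochs=1000):
    best_err = np.inf
    for epoch in range(max_epochs):
        for net in networks:            # in-situ: all members train together
            net.train_epoch(X_tr, y_tr)
        # Early stopping is assessed on the *combined* output, so the
        # ensemble's generalisation is monitored and controlled directly.
        va_err = np.mean((ensemble_predict(networks, X_va) - y_va) ** 2)
        best_err = min(best_err, va_err)            # E_opt so far
        gl = 100.0 * (va_err / best_err - 1.0)      # generalisation loss
        if gl > alpha:                              # stop once GL > threshold
            break
    return networks
```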

16 Sequential Learning Modules Sequential systems: –Can a combination of (simpler) networks give good generalisation and learning speed? –Typically pre-trained and for specific processing Sequential in-situ learning: –How can in-situ learning be achieved with sequential networks: target error/output? –Use unsupervised networks –Last network has target output and hence can be supervised
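A minimal sketch of the sequential in-situ scheme, assuming the arrangement described above: an unsupervised competitive layer (a stand-in for, e.g., a SOM) learns a clustering of the input while a final supervised module learns a mapping from the cluster code to the target; sizes, learning rates and the toy target are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
K, D = 4, 2                        # clusters, input dimension
centres = rng.normal(size=(K, D))  # unsupervised module: prototype vectors
w_out = np.zeros(K)                # supervised module: linear readout

def step(x, y, lr_u=0.05, lr_s=0.1):
    # Unsupervised update: winner-takes-all prototype moves toward x.
    k = np.argmin(((centres - x) ** 2).sum(axis=1))
    centres[k] += lr_u * (x - centres[k])
    # One-hot code from the first module feeds the final module.
    h = np.zeros(K)
    h[k] = 1.0
    # Supervised update: only the last module has a target output.
    err = y - w_out @ h
    w_out[:] += lr_s * err * h
    return err

# Both modules adapt in-situ on each pattern presentation:
for _ in range(200):
    x = rng.normal(size=D)
    y = float(x[0] > 0)            # toy target for illustration
    step(x, y)
```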

17 Systems 100 trials of each system, each with different random initial weights Generalisation assessed using test data to give mean response

18 Experimental Results

19 Comparison MONK’s problems –Comparison of maximum responses 17 –MONK 1: 100%; SLE 98.4%; SLM 84.7% –MONK 2: 100%; SLE 74.5%; SLM 81.0% –MONK 3: 97.2%; SLE 83.1%; SLM 87.5% –Small spread of values, especially SLE WBCD –Comparison of mean responses 20 –AdaBoost 97.6%; SLE 88.47%; SLM 97.29%

20 SLE Validation Error: MONK 1

21 SLE Epochs: MONK 1

22 In-situ Learning In-situ learning in multi-net systems: –Encouraging results from non-optimal systems –Comparison with existing single-net and multi-net systems (with and without early stopping) –Computational effort? Ensemble systems: –Effect of training times on bias, variance and diversity? Sequential systems: –Encouraging empirical results: theoretical results? –Automatic classification of unsupervised clusters

23 The Next Step? Empirical results are encouraging –But, what is the theoretical basis of multi-net systems? –Are multi-net systems any better than monolithic solutions, and if so, which configurations are better? Early work: a formal framework for multi-net systems

24 Multi-net System Framework Previous work: –Framework for the co-operation of learning algorithms 21 –Stochastic model 22 –Importance Sampled Learning Ensembles (ISLE) 23 –Focus on supervised learning and specific architectures Jordan and Xu’s (1995) definition of HME 24 : –Generalisation of HME –Abstraction of architecture from algorithm –Theoretical results for convergence

25 Multi-net System Framework Propose a modification to Jordan and Xu’s definition of HME to provide a generalised multi-net system framework –HME combines the output of the expert networks through a weighting generated by a gating network –Replace the weighting by the (optional) operation of a network –Can be used for parallel, sequential and supervisory systems
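A minimal sketch of the proposed generalisation (the Node class and combiner interface are illustrative assumptions, not the framework’s notation): each node of the ordered tree either is a leaf network or combines its children’s outputs, optionally by applying a further network. With a gating network as the combiner this recovers HME; with a fixed mean it recovers a simple ensemble; chaining single-child nodes gives sequential systems.

```python
import numpy as np

class Node:
    """One node of the ordered tree: a leaf network or a combining node."""
    def __init__(self, children=None, net=None, combiner=None):
        self.children = children or []  # empty for a leaf
        self.net = net                  # leaf network: x -> output
        self.combiner = combiner        # combining operation (optionally a net)

    def output(self, x):
        if not self.children:                      # leaf: apply the network
            return self.net(x)
        ys = [c.output(x) for c in self.children]  # evaluate the subtrees
        return self.combiner(x, ys)                # combine the child outputs

# Simple ensemble as a special case: the root averages its children.
leaves = [Node(net=lambda x, w=w: w * x) for w in (0.9, 1.0, 1.1)]
root = Node(children=leaves, combiner=lambda x, ys: float(np.mean(ys)))
print(root.output(2.0))  # mean of the member outputs: 2.0
```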

26 Example: HME

27 Example: Framework

28 Definition A multi-net system consists of the ordered tree of depth r defined by the nodes, with the root of the tree associated with the output, such that: [defining conditions given as equations on the slide]

29 Multi-net System Framework Learning algorithm operates by modifying the state of the system as defined by the parameters associated with each node Includes: –Pre-training –In-situ training –Incremental training (through pre- or in-situ training)

30 Future Work In-situ learning: –Look further at in-situ learning: is it a valid approach? –SLE: does learning promote diversity? –SLM: expand and explore limitations Framework: –Explore properties such as bias, variance, diversity and VC Dimension –Relate framework to existing systems and explore their properties

31 Questions?

32 References
1. Hebb, D.O. (1949). The Organization of Behavior: A Neuropsychological Theory. New York: John Wiley & Sons.
2. McClelland, J.L. & Rumelhart, D.E. (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 2: Psychological and Biological Models. Cambridge, MA: A Bradford Book, MIT Press.
3. Kohonen, T. (1982). Self-Organized Formation of Topologically Correct Feature Maps. Biological Cybernetics, vol. 43, pp. 59-69.
4. Vapnik, V.N. & Chervonenkis, A.Ya. (1971). On the Uniform Convergence of Relative Frequencies of Events to their Probabilities. Theory of Probability and Its Applications, vol. XVI(2), pp. 264-280.
5. Baum, E.B. & Haussler, D. (1989). What Size Net Gives Valid Generalization? Neural Computation, vol. 1(1), pp. 151-160.

33 References
6. Koiran, P. & Sontag, E.D. (1997). Neural Networks With Quadratic VC Dimension. Journal of Computer and System Sciences, vol. 54(1), pp. 190-198.
7. Geman, S., Bienenstock, E. & Doursat, R. (1992). Neural Networks and the Bias/Variance Dilemma. Neural Computation, vol. 4(1), pp. 1-58.
8. Freund, Y. & Schapire, R.E. (1996). Experiments with a New Boosting Algorithm. Machine Learning: Proceedings of the 13th International Conference, pp. 148-156. Morgan Kaufmann.
9. Jacobs, R.A., Jordan, M.I. & Barto, A.G. (1991). Task Decomposition through Competition in a Modular Connectionist Architecture: The What and Where Vision Tasks. Cognitive Science, vol. 15, pp. 219-250.
10. Lu, B. & Ito, M. (1999). Task Decomposition and Module Combination Based on Class Relations: A Modular Neural Network for Pattern Classification. IEEE Transactions on Neural Networks, vol. 10(5), pp. 1244-1256.
11. Sharkey, A.J.C. (1999). Multi-Net Systems. In Sharkey, A.J.C. (Ed), Combining Artificial Neural Nets: Ensemble and Modular Multi-Net Systems, pp. 1-30. London: Springer-Verlag.

34 References
12. Sharkey, A.J.C. (2002). Types of Multinet System. In Roli, F. & Kittler, J. (Eds), Proceedings of the Third International Workshop on Multiple Classifier Systems (MCS 2002), pp. 108-117. Berlin, Heidelberg, New York: Springer-Verlag.
13. Liu, Y., Yao, X., Zhao, Q. & Higuchi, T. (2002). An Experimental Comparison of Neural Network Ensemble Learning Methods on Decision Boundaries. Proceedings of the 2002 International Joint Conference on Neural Networks (IJCNN'02), vol. 1, pp. 221-226. Los Alamitos, CA: IEEE Computer Society Press.
14. Kuncheva, L.I. (2002). Switching Between Selection and Fusion in Combining Classifiers: An Experiment. IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 32(2), pp. 146-156.
15. Jordan, M.I. & Jacobs, R.A. (1994). Hierarchical Mixtures of Experts and the EM Algorithm. Neural Computation, vol. 6(2), pp. 181-214.
16. Liu, Y. & Yao, X. (1999). Ensemble Learning via Negative Correlation. Neural Networks, vol. 12(10), pp. 1399-1404.

35 References
17. Thrun, S.B., Bala, J., Bloedorn, E., Bratko, I., Cestnik, B., Cheng, J., De Jong, K., Dzeroski, S., Fahlman, S.E., Fisher, D., Hamann, R., Kaufman, K., Keller, S., Kononenko, I., Kreuziger, J., Michalski, R.S., Mitchell, T., Pachowicz, P., Reich, Y., Vafaie, H., van de Welde, W., Wenzel, W., Wnek, J. & Zhang, J. (1991). The MONK's Problems: A Performance Comparison of Different Learning Algorithms. Technical Report CMU-CS-91-197. Pittsburgh, PA: Carnegie Mellon University, Computer Science Department.
18. Wolberg, W.H. & Mangasarian, O.L. (1990). Multisurface Method of Pattern Separation for Medical Diagnosis Applied to Breast Cytology. Proceedings of the National Academy of Sciences, USA, vol. 87(23), pp. 9193-9196.
19. Prechelt, L. (1996). Early Stopping - But When? In Orr, G.B. & Müller, K.-R. (Eds), Neural Networks: Tricks of the Trade, LNCS vol. 1524, pp. 55-69. Berlin, Heidelberg, New York: Springer-Verlag.
20. Drucker, H. (1999). Boosting Using Neural Networks. In Sharkey, A.J.C. (Ed), Combining Artificial Neural Nets: Ensemble and Modular Multi-Net Systems, pp. 51-78. London: Springer-Verlag.

36 References
21. Bottou, L. & Gallinari, P. (1991). A Framework for the Cooperation of Learning Algorithms. In Lippmann, R.P., Moody, J.E. & Touretzky, D.S. (Eds), Advances in Neural Information Processing Systems, vol. 3, pp. 781-788.
22. Amari, S.-I. (1995). Information Geometry of the EM and em Algorithms for Neural Networks. Neural Networks, vol. 8(9), pp. 1379-1408.
23. Friedman, J.H. & Popescu, B. (2003). Importance Sampling: An Alternative View of Ensemble Learning. Presented at the 4th International Workshop on Multiple Classifier Systems (MCS 2003), Guildford, UK.
24. Jordan, M.I. & Xu, L. (1995). Convergence Results for the EM Approach to Mixtures of Experts Architectures. Neural Networks, vol. 8, pp. 1409-1431.

