Integrated Learning in Multi-net Systems. Neural Computing Group, Department of Computing, University of Surrey. Matthew Casey, 6th February 2004

1 Integrated Learning in Multi-net Systems
Neural Computing Group, Department of Computing, University of Surrey
Matthew Casey, 6th February 2004

2 Introduction
In-situ learning in multi-net systems
Classification
– Parallel combination of networks: ensemble
– Sequential combination of networks: modular
Simulation
– Parallel and sequential combination of networks
– Quantification and addition
Formal framework and algorithm for multi-net systems

3 Introduction
Learning:
– “process which leads to the modification of behaviour” [1]
Biological motivation
– Hebb’s neurophysiological postulate [2]
– Learning across cell assemblies: neural integration
– Functional specialism: analogy to multi-net systems
Theoretical motivation
– Generalisation improvements with multi-net systems
– Ensemble and modular
Learning in collaboration with modularisation

4 Single-net Systems
Systems of one or more artificial neurons combined together in a single network
– Parallel distributed processing [3]
– Learning to generalise
Learning algorithms
– Supervised: delta [4,5,6], backpropagation [7,8,9]
– Unsupervised: Hebbian [2], SOM [10]

5 Single-nets as Multi-nets?
[Figure: XOR as a combination of linear decision boundaries; network with inputs x1 and x2 and output y; classes True (1) and False (-1)]
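The idea on this slide, that XOR can be formed by combining linear decision boundaries, can be sketched in a few lines. This is a minimal illustration only: the weights, thresholds and function names below are assumptions chosen for clarity, not values recovered from the slide's figure. Two threshold units each implement one linear boundary, and a third unit combines them.

```python
def step(v):
    """Threshold unit with bipolar output: 1 (True) or -1 (False)."""
    return 1 if v > 0 else -1


def xor_net(x1, x2):
    """XOR for bipolar inputs, built from two linear decision boundaries."""
    h_or = step(x1 + x2 + 1)       # -1 only when both inputs are -1
    h_nand = step(-x1 - x2 + 1)    # -1 only when both inputs are 1
    return step(h_or + h_nand - 1)  # 1 only when both hidden units are 1


if __name__ == "__main__":
    for x1 in (-1, 1):
        for x2 in (-1, 1):
            print(x1, x2, xor_net(x1, x2))
```

Each hidden unit alone is a single linear boundary and cannot separate XOR; only their combination does, which is the sense in which a single multi-layer network already resembles a small multi-net system.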

6 From Single-nets to Multi-nets
Multi-net systems appear to be a development of the parallel processing paradigm
Can multi-net systems improve generalisation?
– Modularisation with simpler networks?
– Limited theoretical and empirical evidence
Generalisation:
– Balance prior knowledge and training
– VC Dimension [11,12,13]
– Bias/variance dilemma [14]

7 Multi-net Systems: Ensemble or Modular?
Ensemble systems:
– Parallel combination
– Each network performs the same task
– Simple ensemble
– AdaBoost [15]
Modular systems:
– Each network performs a different (sub-)task
– Mixture-of-experts [16] (top-down parallel competitive)
– Min-max [17] (bottom-up static parallel/sequential)

8 Categorising Multi-net Systems
Sharkey’s [18,19] combination strategies:
– Parallel: co-operative or competitive; top-down or bottom-up; static or dynamic
– Sequential
– Supervisory
Component networks may be [20]:
– Pre-trained (independent)
– Incrementally trained (iterative)
– In-situ trained (simultaneous)

9 Multi-net Systems
Categorisation schemes appear not to support the generalisation of multi-net system properties beyond specific examples
– Ensemble: bias, variance and diversity [21]
– Modular: bias and variance
– What about measures such as the VC Dimension?
Some use of in-situ learning
– ME and HME [22]
– Negative correlation learning [23]

10 Research
Multi-net systems seem to offer a way in which generalisation performance and learning speed can be improved:
– Yet limited theoretical and empirical evidence
– Focus on parallel systems
Limited use of in-situ learning despite motivation
– Existing results show improvement
– Can the approach be generalised?
No general framework for multi-net systems
– Difficult to generalise properties from categorisation

11 Research
Explore in-situ learning in multi-net systems:
– Parallel: in-situ learning in the simple ensemble
– Sequential: combining networks with in-situ learning
– Does in-situ learning provide improved generalisation?
– Can we combine ‘simple’ networks to solve ‘complex’ problems: ‘superordinate’ systems with faster learning?
Propose a formal framework for multi-net systems
– A method to describe the architecture and learning algorithm for a general multi-net system

12 Multi-net System Framework
Previous work:
– Framework for the co-operation of learning algorithms [24]
– Stochastic model [25]
– Importance Sampled Learning Ensembles (ISLE) [26]
– Focus on supervised learning and specific architectures
Jordan and Xu’s (1995) definition of HME [27]:
– Generalisation of HME
– Abstraction of architecture from algorithm
– Theoretical results for convergence

13 Multi-net System Framework
Propose a modification to Jordan and Xu’s definition of HME to provide a generalised multi-net system framework
– HME combines the output of the expert networks through a weighting generated by a gating network
– Replace the weighting by the (optional) operation of a network
– Can be used for parallel, sequential and supervisory systems
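The gating step described on this slide can be illustrated with a small sketch. It assumes linear experts and a softmax gating network, which is the usual mixture-of-experts arrangement but is not spelled out in the transcript; the function names and the (weights, bias) representation are hypothetical.

```python
import math


def softmax(scores):
    """Normalise scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Output of one linear unit."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def mixture_output(x, experts, gate):
    """Combine expert outputs using weights produced by a gating network.

    experts and gate are lists of (weights, bias) pairs, one gate unit
    per expert (an assumed, simplified representation).
    """
    expert_outputs = [linear(w, b, x) for w, b in experts]
    gate_weights = softmax([linear(w, b, x) for w, b in gate])
    combined = sum(g * y for g, y in zip(gate_weights, expert_outputs))
    return combined, gate_weights


if __name__ == "__main__":
    experts = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0)]
    gate = [([0.0, 0.0], 0.0), ([0.0, 0.0], 0.0)]  # indifferent gate
    print(mixture_output([2.0, 4.0], experts, gate))
```

With an indifferent gate the result is simply the mean of the expert outputs; the proposed framework replaces this fixed weighted sum with the optional operation of a network.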

14 Example: HME

15 Example: Framework

16 Definition
A multi-net system consists of the ordered tree of depth r defined by the nodes, with the root of the tree associated with the output, such that:

17 Multi-net System Framework
Learning algorithm operates by modifying the state of the system as defined by associated with each node
Includes:
– Pre-training
– In-situ training
– Incremental training (through pre- or in-situ training)
Examples demonstrate how the framework can be used to describe existing types of multi-net system
– However, it does not rely upon categorisation schemes
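One way to picture a tree-based description of a multi-net system is the sketch below. It is not the formal definition from the slides, whose conditions did not survive in this transcript. The `Node` class, the choice to concatenate child outputs in order, and the toy networks are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Vector = List[float]


@dataclass
class Node:
    """A tree node with an optional network applied to its inputs."""
    network: Optional[Callable[[Vector], Vector]] = None
    children: List["Node"] = field(default_factory=list)

    def output(self, x: Vector) -> Vector:
        if self.children:
            # Assumption: child outputs are concatenated in child order.
            inputs = [v for child in self.children for v in child.output(x)]
        else:
            inputs = list(x)  # leaves see the system input directly
        return self.network(inputs) if self.network else inputs


# Toy stand-ins for trained networks (hypothetical).
def double(v: Vector) -> Vector:
    return [2 * a for a in v]


def add_one(v: Vector) -> Vector:
    return [a + 1 for a in v]


def mean(v: Vector) -> Vector:
    return [sum(v) / len(v)]


# Parallel (ensemble-like): the root averages two component outputs.
parallel = Node(network=mean, children=[Node(double), Node(add_one)])
# Sequential: the root network operates on the output of its only child.
sequential = Node(network=add_one, children=[Node(double)])

if __name__ == "__main__":
    print(parallel.output([3.0]), sequential.output([3.0]))
```

The same node type covers both arrangements, which is the appeal of describing architectures as a tree rather than through a categorisation scheme.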

18 In-situ Learning
Evaluation of in-situ learning in multi-net systems
Explore parallel and sequential in-situ learning with definitions using the proposed framework
– Simple learning ensemble (SLE)
– Sequential learning modules (SLM)
Benchmark classification tasks [28]:
– XOR
– MONK’s problems (MONK 1, 2, 3) [29]
– Wisconsin Breast Cancer Database (WBCD) [30]
– Proben1 Thyroid (thyroid1 data set) [31]

19 Simple Learning Ensemble
Ensembles:
– Train each network individually
– Parallel combination of network outputs: mean output
– Pre-training: how can we understand or control the combined performance of the ensemble?
– Incremental: AdaBoost [15]
– In-situ: negative correlation [23]
In-situ learning:
– Train in-situ and assess combined performance during training using early stopping
– Generalisation loss early stopping criterion [32]
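The generalisation loss criterion named on this slide is, in Prechelt's formulation, GL(t) = 100 * (E_va(t) / E_opt(t) - 1), where E_va(t) is the validation error at epoch t and E_opt(t) is the lowest validation error seen up to t. The sketch below applies it to a list of validation errors, which for an SLE would be measured on the combined (mean) ensemble output. The threshold `alpha=5.0`, the function names and the example error values are illustrative assumptions, not settings reported in the slides.

```python
def ensemble_mean(outputs):
    """Combine component network outputs by taking their mean."""
    return sum(outputs) / len(outputs)


def generalisation_loss(val_errors):
    """GL in percent at the latest epoch; errors must be positive."""
    e_opt = min(val_errors)
    return 100.0 * (val_errors[-1] / e_opt - 1.0)


def stopping_epoch(val_errors, alpha=5.0):
    """First 1-based epoch whose GL exceeds alpha, or None if never."""
    for t in range(1, len(val_errors) + 1):
        if generalisation_loss(val_errors[:t]) > alpha:
            return t
    return None


if __name__ == "__main__":
    # Made-up validation errors of the combined ensemble output.
    errors = [0.50, 0.40, 0.35, 0.36, 0.40]
    print(stopping_epoch(errors))
```

Because the criterion is applied to the combined output rather than to each network separately, individual networks may keep training past the point where they would have been stopped on their own.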

20 Benchmark Results
Compare SLE results with:
– Simple ensemble (SE): all networks pre-trained using early stopping
– Single-net: MLP with backpropagation, with and without early stopping
For SLE and SE:
– From 2 to 20 MLP networks
– 100 trials per configuration: mean response

21 Benchmark Results
XOR
– SLE and SE equivalent training (no early stopping)
– SLE uses fewer epochs to give equivalent responses
MONK’s problems/WBCD/Thyroid
– SLE improves upon SE, SE improves upon single-net
– SLE trains for longer: combined performance
– Adding more networks gives better generalisation
– More networks, more achieved desired performance
MONK 1/2
– SLE improves upon non-early stopping single-net

22 Sequential Learning Modules
Sequential systems:
– Can a combination of (simpler) networks give good generalisation and learning speed?
– Typically pre-trained and for specific processing
Sequential in-situ learning:
– How can in-situ learning be achieved with sequential networks: target error/output?
– Use unsupervised networks
– Last network has target output and hence can be supervised

23 Benchmark Results
Compare SLM results with:
– Single-net: single layer network with delta learning
– Single-net: MLP with backpropagation, without early stopping
Networks:
– Combine a SOM with a single layer network with delta learning
– Varying map sizes of SOM used to see effect on classification performance
– 100 trials per configuration: mean response
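The combination described here, a SOM feeding a single layer network trained with the delta rule, might be sketched as follows. This is a rough illustration under stated assumptions and not the SLM implementation behind the reported results. The one-hot winner coding passed to the output unit, the decay schedules, the learning rates and all function names are assumptions. Both networks are updated on every pattern, which is one reading of 'in-situ' training for a sequential system.

```python
import math
import random


def best_matching_unit(som, x):
    """Index of the SOM unit whose weight vector is closest to x."""
    return min(range(len(som)),
               key=lambda i: sum((w - xi) ** 2 for w, xi in zip(som[i], x)))


def train_slm(data, rows=3, cols=3, epochs=100, seed=0):
    """Train a SOM and a delta-rule output unit together, pattern by pattern."""
    rng = random.Random(seed)
    dim = len(data[0][0])
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    som = [[rng.random() for _ in range(dim)] for _ in grid]
    out_w = [0.0] * len(grid)  # one output weight per map unit
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs  # shrinks from 1 towards 0
        som_rate, radius = 0.5 * decay, 0.1 + 1.5 * decay
        for x, target in data:
            win = best_matching_unit(som, x)
            # Unsupervised step: pull the winner and its neighbours towards x.
            for i, (r, c) in enumerate(grid):
                d2 = (r - grid[win][0]) ** 2 + (c - grid[win][1]) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))
                som[i] = [w + som_rate * h * (xi - w)
                          for w, xi in zip(som[i], x)]
            # Supervised step: with a one-hot winner code, the delta rule
            # only changes the weight attached to the winning unit.
            out_w[win] += 0.5 * (target - out_w[win])
    return som, out_w


def predict(som, out_w, x):
    """Classify x as 1 or -1 from the output weight of its winning unit."""
    return 1 if out_w[best_matching_unit(som, x)] > 0 else -1


if __name__ == "__main__":
    xor = [([0.0, 0.0], -1), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], -1)]
    som, out_w = train_slm(xor)
    for x, target in xor:
        print(x, target, predict(som, out_w, x))
```

Only the final network sees a target; the SOM learns without one, which is how the sequential arrangement sidesteps the question of what the target output for an intermediate network should be.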

24 Benchmark Results
XOR
– Cannot be solved by SOM or single layer network
– Can be solved by SLM with 3x3 map
– Faster learning times than MLP with backpropagation
MONK’s problems/WBCD
– SLM can learn classification of training examples
– Generalisation: improves upon SE with early stopping
MONK 2/3
– SLM improves upon MLP without early stopping
– MONK 3 improves upon SLE

25 Benchmark Results
MONK 1/WBCD
– Generalisation: similar, but slightly worse
Thyroid
– Poor learning of training examples
– Can SOM perform sufficient pre-processing for single layer network?
Results seem to depend upon problem type and map size

26 In-situ Learning
In-situ learning in multi-net systems:
– Can give better training and generalisation performance
– Comparison with SE (early stopping) and single-net systems (with and without early stopping)
– Computational effort?
Ensemble systems:
– Effect of training times on bias, variance and diversity?
Sequential systems:
– Encouraging empirical results: theoretical results?
– Automatic classification of unsupervised clusters

27 In-situ Learning

28 In-situ Learning

29 In-situ Learning and Simulation
Biological motivation for in-situ learning
– Hebb’s ‘neural integration’
– Functional specialism in cognitive systems
Use in-situ learning in modular multi-net systems to simulate numerical abilities:
– Quantification: subitization and counting
– Addition: fact retrieval and ‘count all’
– Combine ME and SLM to allow abilities to ‘compete’

30 Strategy Learning System

31 Multi-net Simulation of Quantification
Single-net and multi-net systems
– Trained on scenes consisting of ‘objects’
– Logarithmic probability model
Subitization SOM:
– Ordering of numbers with compressive scheme
– Limit: training data, object frequency and map size
Counting MLP with backpropagation:
– Correct responses to training data
– Conventional, stable non-conventional and non-stable

32 Multi-net Simulation of Quantification
MNQ:
– Subitization: SOM (pre-trained)
– Single layer network with delta learning
– Counting: MLP with backpropagation
Successfully learnt to quantify
– To subitize or count
– Decision based upon input
Subitization limit attributable to interaction of modules
– Lower numbers subitized, higher numbers counted

33 Multi-net Simulation of Addition
Single-net and multi-net systems
– Trained on scenes consisting of two sets of ‘objects’
– Equal probability model
Fact retrieval SOM:
– Each addend associated with a map axis
– Some overlap of problems
– Commutative information used
‘Count all’ MLP with backpropagation:
– Correct responses to training data
– Some relationship to observed human errors

34 Multi-net Simulation of Addition
MNA:
– Fact retrieval: SOM
– Single layer network with delta learning
– ‘Count all’: MLP with backpropagation
Successfully learnt to add
– To count or from facts
– Decision based upon input
Predominant use of ‘count all’, rather than facts
– Does demonstrate change in strategy

35 In-situ Learning and Simulation
In-situ learning in SLS:
– Simulate developmental progression
– At least as capable as equivalent monolithic solutions
– Competition between supervised and unsupervised learning paradigms
In-situ learning and simulation of the interaction between multiple abilities:
– Interaction between different abilities: simulating psychological phenomena
– Integrated learning

36 Contribution
Multi-net systems:
– Seem to offer empirical and theoretical improvement to generalisation
– Properties under-explored
Framework for multi-net systems
– Generalisation of multi-net systems beyond categorisation schemes
– Foundation to explore multi-net properties
In-situ learning
– Biological and theoretical motivation

37 Contribution
Compared different multi-net techniques and the use of in-situ learning
Demonstrated that in-situ learning can lead to improved training and generalisation
SLE
– Assessment of combined performance
SLM
– Combining ‘simple’ networks to solve ‘complex’ tasks
– ‘Superordinate’?
– Also shows automatic classification using SOM

38 Contribution
Integrated learning
– Simulations using modular parallel and sequential networks
– In-situ learning used to explore the interaction of modules during learning
– Demonstrated simulation of abilities and observed psychological phenomena

39 Future Work
Framework:
– Modify learning algorithm: recursive using tree
– Explore properties such as bias, variance, diversity and VC Dimension
In-situ learning:
– Further comparison
– SLE: does learning promote diversity?
– SLM: expand and explore limitations
Simulations:
– Explore further the effect of integrated learning

40 Questions?

41 References
1. Simpson, J.A. & Weiner, E.S.C. (Ed) (1989). Oxford English Dictionary, 2nd ed. Oxford, UK: Clarendon Press.
2. Hebb, D.O. (1949). The Organization of Behavior: A Neuropsychological Theory. New York: John Wiley & Sons.
3. McClelland, J.L. & Rumelhart, D.E. (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 2: Psychological and Biological Models. Cambridge, MA.: A Bradford Book, MIT Press.
4. Rosenblatt, F. (1958). The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychological Review, vol. 65(6), pp. 386-408.
5. Widrow, B. & Hoff, M.E.Jr. (1960). Adaptive Switching Circuits. IRE WESCON Convention Record, pp. 96-104.
6. Minsky, M.L. & Papert, S. (1988). Perceptrons: An Introduction to Computational Geometry, Expanded Ed. Cambridge, MA.: MIT Press.
7. Werbos, P.J. (1974). Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Unpublished doctoral thesis. Cambridge, MA.: Harvard University.

42 References
8. Rumelhart, D.E., Hinton, G.E. & Williams, R.J. (1986). Learning Internal Representations by Error Propagation. In Rumelhart, D.E. & McClelland, J.L. (Ed), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations, pp. 318-362. Cambridge, MA.: MIT Press.
9. Elman, J.L. (1990). Finding Structure in Time. Cognitive Science, vol. 14, pp. 179-211.
10. Kohonen, T. (1982). Self-Organized Formation of Topologically Correct Feature Maps. Biological Cybernetics, vol. 43, pp. 59-69.
11. Vapnik, V.N. & Chervonenkis, A.Ya. (1971). On the Uniform Convergence of Relative Frequencies of Events to their Probabilities. Theory of Probability and Its Applications, vol. XVI(2), pp. 264-280.
12. Baum, E.B. & Haussler, D. (1989). What Size Net Gives Valid Generalisation? Neural Computation, vol. 1(1), pp. 151-160.
13. Koiran, P. & Sontag, E.D. (1997). Neural Networks With Quadratic VC Dimension. Journal of Computer and System Sciences, vol. 54(1), pp. 190-198.
14. Geman, S., Bienenstock, E. & Doursat, R. (1992). Neural Networks and the Bias/Variance Dilemma. Neural Computation, vol. 4(1), pp. 1-58.

43 References
15. Freund, Y. & Schapire, R.E. (1996). Experiments with a New Boosting Algorithm. Machine Learning: Proceedings of the 13th International Conference, pp. 148-156. Morgan Kaufmann.
16. Jacobs, R.A., Jordan, M.I. & Barto, A.G. (1991). Task Decomposition through Competition in a Modular Connectionist Architecture: The What and Where Vision Tasks. Cognitive Science, vol. 15, pp. 219-250.
17. Lu, B. & Ito, M. (1999). Task Decomposition and Module Combination Based on Class Relations: A Modular Neural Network for Pattern Classification. IEEE Transactions on Neural Networks, vol. 10(5), pp. 1244-1256.
18. Sharkey, A.J.C. (1999). Multi-Net Systems. In Sharkey, A.J.C. (Ed), Combining Artificial Neural Nets: Ensemble and Modular Multi-Net Systems, pp. 1-30. London: Springer-Verlag.
19. Sharkey, A.J.C. (2002). Types of Multinet System. In Roli, F. & Kittler, J. (Ed), Proceedings of the Third International Workshop on Multiple Classifier Systems (MCS 2002), pp. 108-117. Berlin, Heidelberg, New York: Springer-Verlag.
20. Liu, Y., Yao, X., Zhao, Q. & Higuchi, T. (2002). An Experimental Comparison of Neural Network Ensemble Learning Methods on Decision Boundaries. Proceedings of the 2002 International Joint Conference on Neural Networks (IJCNN'02), vol. 1, pp. 221-226. Los Alamitos, CA.: IEEE Computer Society Press.

44 References
21. Kuncheva, L.I. (2002). Switching Between Selection and Fusion in Combining Classifiers: An Experiment. IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 32(2), pp. 146-156.
22. Jordan, M.I. & Jacobs, R.A. (1994). Hierarchical Mixtures of Experts and the EM Algorithm. Neural Computation, vol. 6(2), pp. 181-214.
23. Liu, Y. & Yao, X. (1999a). Ensemble Learning via Negative Correlation. Neural Networks, vol. 12(10), pp. 1399-1404.
24. Bottou, L. & Gallinari, P. (1991). A Framework for the Cooperation of Learning Algorithms. In Lippmann, R.P., Moody, J.E. & Touretzky, D.S. (Ed), Advances in Neural Information Processing Systems, vol. 3, pp. 781-788.
25. Amari, S.-I. (1995). Information Geometry of the EM and em Algorithms for Neural Networks. Neural Networks, vol. 8(9), pp. 1379-1408.
26. Friedman, J.H. & Popescu, B. (2003). Importance Sampling: An Alternative View of Ensemble Learning. Presented at the 4th International Workshop on Multiple Classifier Systems (MCS 2003). Guildford, UK.
27. Jordan, M.I. & Xu, L. (1995). Convergence Results for the EM Approach to Mixtures of Experts Architectures. Neural Networks, vol. 8, pp. 1409-1431.

45 References
28. Blake, C.L. & Merz, C.J. (1998). UCI Repository of Machine Learning Databases. Irvine, CA.: University of California, Irvine, Department of Information and Computer Sciences.
29. Thrun, S.B., Bala, J., Bloedorn, E., Bratko, I., Cestnik, B., Cheng, J., De Jong, K., Dzeroski, S., Fahlman, S.E., Fisher, D., Hamann, R., Kaufman, K., Keller, S., Kononenko, I., Kreuziger, J., Michalski, R.S., Mitchell, T., Pachowicz, P., Reich, Y., Vafaie, H., van de Welde, W., Wenzel, W., Wnek, J. & Zhang, J. (1991). The MONK's Problems: A Performance Comparison of Different Learning Algorithms. Technical Report CMU-CS-91-197. Pittsburgh, PA.: Carnegie-Mellon University, Computer Science Department.
30. Wolberg, W.H. & Mangasarian, O.L. (1990). Multisurface Method of Pattern Separation for Medical Diagnosis Applied to Breast Cytology. Proceedings of the National Academy of Sciences, USA, vol. 87(23), pp. 9193-9196.
31. Prechelt, L. (1994). Proben1: A Set of Neural Network Benchmark Problems and Benchmarking Rules. Technical Report 21/94. Karlsruhe, Germany: University of Karlsruhe.
32. Prechelt, L. (1996). Early Stopping - But When? In Orr, G.B. & Müller, K-R. (Ed), Neural Networks: Tricks of the Trade, 1524, pp. 55-69. Berlin, Heidelberg, New York: Springer-Verlag.
