What is modularity good for? Michael S. C. Thomas, Neil A. Forrester, Fiona M. Richardson

Developmental Neurocognition Lab, Birkbeck College, University of London, UK

Abstract
Modular systems have been proposed as efficient architectures for high-level cognition. However, such architectures are rarely implemented as developmental systems. Taking the example of the English past tense, we find that pre-specified modularity is less efficient for learning than using the same computational resources in different ways (to produce emergent or redundant systems).

Modularity is good for: situations in which computational components drive separate outputs and the information required by each output is independent.
Modularity is bad for: situations in which components receive information from a common input and have to drive a common output. The problem is that the modules try to drive the common output in different ways, and the competition between them must be resolved; co-operative processing is more efficient.

Introduction
Modularity was initially proposed as an efficient architecture for low-level perceptual processing (Fodor, 1983). It was later extended as a principle that might apply to high-level cognition, with the architecture shaped by natural selection (Pinker, 1997). How do modular systems learn their abilities? Table 1 shows some simple architectures with different modular commitments.

Calabretta et al. (2003) trained a network with a common visual input to output 'what' and 'where' information about objects presented on the retina. They found that modular processing channels were the optimal architecture for learning (Table 1, #7 was better than #5): 'what' and 'where' information is independent, and modularity prevents interference.

Pinker (1991) proposed that modularity would aid language development. For example, in the English past tense there is a duality between regular verbs (talk-talked), together with generalisation of the rule to novel forms (wug-wugged), and exceptions (hit-hit, drink-drank, go-went). When children learn the past tense, they show intermittent over-application of the rule to exceptions (e.g., *drinked). Pinker argued for a modular architecture with a rule-learning mechanism and an exception-learning mechanism; over-application errors arise as the child learns to co-ordinate the two mechanisms. However, the model has never been implemented.

We explored the developmental trajectory of a modular approach to past tense acquisition (Table 1, #3) and contrasted it with non-modular ways of using the same initial computational resources. Does the modular solution show the predicted advantage?

Table 1. Architectures with different modular commitments.

Method
We used the phonological specification of the past tense problem from Plunkett and Marchman (1991). A 2-layer connectionist network served as the optimised rule-learning device (Pinker's rule-learning device is not specified in sufficient detail to implement), and a 3-layer connectionist network served as the optimal device for learning arbitrary associations.

Figure 1. Computational resources used to learn the past tense problem (input, hidden, and output layers). A selection mechanism (S) allows components to learn domain-specific mappings; a competition mechanism (C) determines which mechanism drives the output (cf. Pinker's 'blocking' device). The modular solution uses S+C, the emergent solution uses neither, and the redundant solution uses only C.

The same resources can be used in three ways (a minimal code sketch follows this list):

- Pre-specified modularity: the 2-layer network is trained only on regular verbs, the 3-layer network only on exceptions, and the strongest signal drives the output.
- Emergent specialisation: the 2-layer and 3-layer networks both adapt to reduce error at a common output; the networks demonstrate partial emergent specialisation of function (regulars and the rule to the 2-layer network, exceptions to the 3-layer network).
- Redundant: the 2-layer and 3-layer networks are both trained separately on the whole past tense problem, and the strongest signal drives the output.
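The sketch below is illustrative only and is not the authors' simulation code. The class and function names (TwoLayerNet, ThreeLayerNet, compete, train_epoch, respond), the sigmoid/delta-rule learning details, the signal-strength measure used by the competition mechanism, and the averaging of outputs in the emergent regime are all assumptions made for the sketch; the original simulations used the Plunkett & Marchman (1991) phonological representation and the authors' own parameter settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TwoLayerNet:
    """Direct input-to-output mapping (no hidden layer): the rule mechanism."""
    def __init__(self, n_in, n_out, lr=0.5):
        self.W = rng.normal(0.0, 0.1, (n_out, n_in))
        self.b = np.zeros(n_out)
        self.lr = lr

    def forward(self, x):
        return sigmoid(self.W @ x + self.b)

    def backward(self, x, y, err):
        # Delta rule for sigmoid output units, given an output error signal.
        d = err * y * (1.0 - y)
        self.W -= self.lr * np.outer(d, x)
        self.b -= self.lr * d

class ThreeLayerNet:
    """Backpropagation network with a hidden layer: the exception mechanism."""
    def __init__(self, n_in, n_hid, n_out, lr=0.5):
        self.W1 = rng.normal(0.0, 0.1, (n_hid, n_in)); self.b1 = np.zeros(n_hid)
        self.W2 = rng.normal(0.0, 0.1, (n_out, n_hid)); self.b2 = np.zeros(n_out)
        self.lr = lr

    def forward(self, x):
        self.h = sigmoid(self.W1 @ x + self.b1)   # cache hidden activations
        return sigmoid(self.W2 @ self.h + self.b2)

    def backward(self, x, y, err):
        # Backpropagate an output error signal through the hidden layer.
        d2 = err * y * (1.0 - y)
        d1 = (self.W2.T @ d2) * self.h * (1.0 - self.h)
        self.W2 -= self.lr * np.outer(d2, self.h); self.b2 -= self.lr * d2
        self.W1 -= self.lr * np.outer(d1, x);      self.b1 -= self.lr * d1

def strength(y):
    """Illustrative signal-strength measure: mean distance from rest (0.5)."""
    return float(np.abs(y - 0.5).mean())

def compete(y_rule, y_exc, boost=1.0):
    """Competition mechanism (C): the stronger (optionally boosted) signal
    drives the shared output (cf. Pinker's 'blocking' device)."""
    return y_exc if boost * strength(y_exc) > strength(y_rule) else y_rule

def train_epoch(rule_net, exc_net, corpus, regime):
    """corpus: list of (input_vector, target_vector, is_regular) triples."""
    for x, t, is_regular in corpus:
        y_r, y_e = rule_net.forward(x), exc_net.forward(x)
        if regime == "modular":
            # Selection (S): each module is trained only on its own domain;
            # competition (C) later decides which module drives the output.
            net, y = (rule_net, y_r) if is_regular else (exc_net, y_e)
            net.backward(x, y, y - t)
        elif regime == "emergent":
            # Neither S nor C: both modules co-operatively drive a shared
            # output and both adapt to reduce its error, so any division of
            # labour emerges rather than being pre-specified.
            y = 0.5 * (y_r + y_e)
            rule_net.backward(x, y_r, y - t)
            exc_net.backward(x, y_e, y - t)
        elif regime == "redundant":
            # Only C: each module is trained separately on the whole
            # problem; the stronger signal drives the output at test.
            rule_net.backward(x, y_r, y_r - t)
            exc_net.backward(x, y_e, y_e - t)

def respond(rule_net, exc_net, x, regime, boost=1.0):
    """The system's response to a single verb, given the chosen architecture."""
    y_r, y_e = rule_net.forward(x), exc_net.forward(x)
    if regime == "emergent":
        return 0.5 * (y_r + y_e)          # co-operative shared output
    return compete(y_r, y_e, boost)       # modular / redundant: winner drives output
```

The design choice to expose an explicit `boost` factor in `compete` is there so the same sketch can express the Figure 4 manipulation, in which the exception mechanism's signal is amplified before the competition is resolved.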
Results
All architectures exhibited a phase of interference errors (e.g., *drinked; Figure 2), so these errors were not solely diagnostic of the modular solution.

Was the modular solution best? No: it was worse than both the emergent and redundant solutions, and indeed failed to learn the exception verbs to ceiling (Figure 3). The modular solution struggled to resolve the competition between the different output responses of the two modules. Because the regular mechanism was learning a simpler function, it produced a stronger response than the exception mechanism and generally overrode it.

Note 1: Results were sensitive to the level of hidden unit resources in the (3-layer) exception mechanism; results are shown for both low and high resource levels.
Note 2: Pinker (1999) proposed a Revised Dual Mechanism model, in which the regular mechanism learns only regulars but the exception mechanism attempts all verbs; results are also shown for this architecture.

Figure 2. Interference errors (hit-ed, drink-ed, go-ed) under low and high exception-mechanism resource levels.
Figure 3. Developmental trajectories for training set items (talked, hit, drank, went) and novel items (wugged, gktted, frinked, frank).

The modular solution showed fast rule learning and strong generalisation of the rule. But when the signal from the exception mechanism was boosted to allow exceptions to drive the output, so that these verbs could be learned to ceiling, the advantage on rule learning was lost. No level of exception signal boosting gave the modular solution an advantage over the emergent or redundant architectures (a toy version of this boosting sweep is sketched after the references below).

Figure 4. Developmental trajectories while boosting the signal strength from the exception mechanism (biasing factor x1 to x1000).

Discussion & Conclusions
The modular solution was the least efficient use of the computational resources. How general is this finding? It suggests that, in adaptive systems, co-operative use of resources to drive outputs is better than competitive use when the mechanisms receive the same input. The modular solution may be superior when a common input must drive separate outputs and the two output tasks rely on independent information present in the input.

Acknowledgements
This research was supported by a UK MRC CE Grant to Michael Thomas.

References
1. Calabretta, R., Di Ferdinando, A., Wagner, G. P., & Parisi, D. (2003). What does it take to evolve behaviorally complex organisms? Biosystems, 69.
2. Fodor, J. A. (1983). The modularity of mind. CUP.
3. Pinker, S. (1991). Rules of language. Science, 253.
4. Pinker, S. (1997). How the mind works. Allen Lane.
5. Pinker, S. (1999). Words and rules. London: Weidenfeld & Nicolson.
6. Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron. Cognition, 38.
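For completeness, here is a hedged toy sketch of the Figure 4 boosting manipulation, reusing the illustrative TwoLayerNet, ThreeLayerNet, train_epoch and respond definitions from the earlier sketch (assumed to be in the same module). The corpus below is a random placeholder rather than the Plunkett & Marchman phonological set, and the layer sizes, epoch count, and scoring rule are arbitrary assumptions; the point is only to show the shape of the biasing-factor sweep, not to reproduce the reported trajectories.

```python
import numpy as np

rng = np.random.default_rng(1)
N_IN, N_HID, N_OUT = 18, 20, 18   # placeholder layer sizes, not taken from the poster

def toy_corpus(n_regular=40, n_exception=10):
    """Random stand-in for the past tense training set: (input, target, is_regular)."""
    corpus = []
    for i in range(n_regular + n_exception):
        x = (rng.random(N_IN) > 0.5).astype(float)
        t = (rng.random(N_OUT) > 0.5).astype(float)
        corpus.append((x, t, i < n_regular))
    return corpus

corpus = toy_corpus()
rule_net = TwoLayerNet(N_IN, N_OUT)            # from the earlier sketch
exc_net = ThreeLayerNet(N_IN, N_HID, N_OUT)    # from the earlier sketch
for epoch in range(200):
    train_epoch(rule_net, exc_net, corpus, regime="modular")

# Sweep the biasing factor applied to the exception mechanism's output signal.
for boost in (1, 10, 100, 1000):
    correct = sum(
        np.array_equal(np.round(respond(rule_net, exc_net, x, "modular", boost)), t)
        for x, t, _ in corpus
    )
    print(f"boost x{boost}: {correct}/{len(corpus)} training items produced correctly")
```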