DO LOCAL MODIFICATION RULES ALLOW EFFICIENT LEARNING ABOUT DISTRIBUTED REPRESENTATIONS?
A. R. Gardner-Medwin, Dept. of Physiology, University College London, London WC1E 6BT, UK


THE PRINCIPLE OF LOCAL COMPUTABILITY

Neural computations must satisfy a constraint: each modifiable parameter is a function only of the history of locally available information. Realistic models must be capable of expression in this form, without functions of global patterns. In synaptic terms, modification can depend only on the history of pre- and post-synaptic membrane and cytoplasm states and of local extracellular modulators.

DISTRIBUTED REPRESENTATIONS

Events (significant for learning and inference) correspond to spatial patterns - sets of active cells - and are not usually the sole trigger features for individual cells. Distributed representations can be compact (representing many distinct events on few cells), but.....

THE OVERLAP PROBLEM

Each cell and synapse is involved in the experience of many events, causing interference and inevitable errors in learning and statistics for individual events.

THESE CONSIDERATIONS SEEM TO ENFORCE THIS BROAD STRATEGY (sketched in code below):
- Measure statistics for individual cells and synapses
- Combine them for all cells or synapses active in an event
- Correct for the average interference due to overlap
- Tolerate the variance introduced

Questions:
- How can we quantify the variance that is tolerable? (the concept of efficiency)
- What are the constraints on the number of events that can be learned about on a given number of cells?
- What are the implications for the nature of distributed representations and trigger features, for efficient learning?
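A minimal sketch of this broad strategy in Python/NumPy, assuming the simulation sizes used later on the poster (100 cells, 200 events of 10 active cells each, Poisson mean 4). The helper name `estimate_count` and the exact form of the overlap correction are illustrative choices, not taken from the poster:

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_events, n_active = 100, 200, 10   # sizes from the poster's simulation

# Random distributed representations: each event activates n_active of n_cells.
events = np.zeros((n_events, n_cells))
for row in events:
    row[rng.choice(n_cells, size=n_active, replace=False)] = 1.0

true_counts = rng.poisson(4, size=n_events)  # each event occurs a Poisson number of times

# Local statistic: each cell accumulates only its own total usage.
usage = true_counts @ events                 # usage[j] = summed activity of cell j

def estimate_count(event, usage, total):
    """Combine the local statistics of the event's cells, then subtract the
    average interference expected from overlap with the other events."""
    raw = event @ usage                      # summed usage of the event's active cells
    mean_overlap = n_active**2 / n_cells     # expected cells shared by two random events
    return (raw - total * mean_overlap) / (n_active - mean_overlap)

total = true_counts.sum()
estimates = np.array([estimate_count(e, usage, total) for e in events])
print("correlation with true counts:", np.corrcoef(true_counts, estimates)[0, 1])
```

The estimates track the true counts but carry extra variance from overlap interference; quantifying how much of that variance is tolerable is exactly the efficiency question posed above.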

Counting Events - a challenge for distributed representations

With H. Barlow (2001, Neural Computation 13) we examined the problem of counting events. We argue that unless a representation can sustain efficient counting of events, it cannot sustain efficient learning. But neural systems do not need to count precisely.

Why count?
- Biological success is about prediction
- Prediction is about conditional probabilities
- Probabilities come from frequencies
- Frequencies require counting

NB, typically the actual count N of a recurring event scatters around its expected value: N = λ ± √λ. What matters is the estimate of λ, not the value of N, so there is little point in counting much more accurately than ±√λ (roughly, ±√N).

Statistical efficiency of an estimation (Fisher):

Efficiency = (data needed for a given reliability with a perfect counting algorithm) / (data needed when there is added variance)

50% efficiency means it takes twice as long to achieve reliable inferences.

Counting a Poisson process (mean λ) with added variance V:

Efficiency = 100% × λ / (λ + V)

Estimating the count of randomly scattered X's: when forced to subitise (i.e. to estimate a count without sequential matching to linguistic symbols, fingers, etc.), human counting efficiency falls with large numbers, but is typically >50% for counts in the spatial domain.
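A short numerical check of the efficiency formula (a sketch; the values of λ and V are arbitrary choices): add extraneous variance V to an ideal Poisson counter and compare the theoretical efficiency λ/(λ+V) with the ratio of variances, which governs how much more data the noisy counter needs for the same reliability.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, V, n = 16.0, 8.0, 100_000    # Poisson mean, added variance, trials

ideal = rng.poisson(lam, n)                     # perfect counter: variance = lam
noisy = ideal + rng.normal(0.0, np.sqrt(V), n)  # estimator with added variance V

print("theory:   ", lam / (lam + V))            # = 2/3
print("empirical:", ideal.var() / noisy.var())  # ~ 2/3: needs ~1.5x the data
```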

Models & simulation examples of performance based on usage of (A) cells, (B) pre- & post-synaptic cell pairings, and (C) triplets of cells

A) Projection model: sums weights proportional to the usage of cells active in a test event.
B) Support model: measures auto-associative support for activity in a test event, using imposed inhibition. Synapse weights are proportional to the amount of paired pre- and post-synaptic activity.
C) Triplet support: as B, but with synapses that strengthen with co-occurrence of activity in 3 cells - pre- and post-synaptic plus an adjacent presynaptic axon. Under test conditions (recall), active synapses sum their strengths, but only when adjacent pairs are active, not single synapses.

Simulation results with the 3 models:
- 200 distinct events were represented by random selections of 10 binary active cells on a network of 100 cells.
- Events occurred differing numbers of times (Poisson distributed, mean = 4).
- Estimates of the number of occurrences, derived from activation on test presentation of each event, were plotted against the true numbers.
- For (C), cells had 20 afferent synapses from each other cell, shuffled so that neighbours were never identical, but otherwise random.

Counting efficiencies: A) 10%, B) 65%, C) 91%.

Distributed representations can support high efficiency, but they must employ very many cells compared with a compact representation for the counted events.
- NB 100 cells would allow each event to have a direct representation, giving 100% counting efficiency.
- 7 cells suffice for a compact representation distinguishing 100 events (2^7 = 128 > 100).
- α (the activity ratio) is the fraction of cells active in each event.
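A sketch of the pairwise support model (B), simplified: the poster imposes inhibition during recall, whereas here the average interference is subtracted explicitly, in the same spirit as the earlier sketch. The symbol `mu2` (the expected pairwise interference per stored event) and the explicit correction are my notation and assumptions, not the poster's:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, a, lam = 100, 200, 10, 4.0     # cells, events, active cells, Poisson mean

events = np.zeros((m, n))
for row in events:
    row[rng.choice(n, size=a, replace=False)] = 1.0
counts = rng.poisson(lam, size=m)

# Hebbian pair statistic: synapse (j,k) accumulates paired pre/post activity.
W = events.T @ (events * counts[:, None])    # W[j,k] = sum_i counts[i]*e[i,j]*e[i,k]
np.fill_diagonal(W, 0.0)                     # support uses distinct cell pairs only

# Expected interference per stored event: E[O*(O-1)] for hypergeometric overlap O
# between two random a-of-n patterns.
mu2 = a * a * (a - 1) * (a - 1) / (n * (n - 1))

total = counts.sum()
est = np.array([(e @ W @ e - total * mu2) / (a * (a - 1) - mu2) for e in events])

V = np.var(est - counts)                     # variance added by overlap interference
print("counting efficiency ~", lam / (lam + V))
```

Because pairwise statistics discriminate the target event from overlapping events more sharply than single-cell usage, the added variance V is smaller and the efficiency higher than for the projection model, in line with the poster's 65% vs. 10%.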

Conclusions - Distributed vs. Direct representations

- Distributed representations can be compact (requiring few cells), but do not permit 100% counting efficiency.
- Distributed reps do permit high efficiencies (>>50%), but only if they are highly redundant (with cell numbers comparable to or greater than direct reps).
- Distributed reps allow novel events to be counted without prior provision for special encoding, even when interest in a type of event emerges only after the experience. In this context distributed reps have a huge advantage over direct reps.
- Distributed reps permit the boosting of counting efficiencies for rare or significant events by variation of activity ratios, through mechanisms such as attention and habituation.
- Direct reps allow simultaneous events to be processed in parallel; distributed reps require that simultaneous events be processed serially, as in a stream of conscious awareness.

How can better representations improve efficiency?

1. Use more cells & trigger features than there are events of interest (see the sketch after this list).
E.g. counting 100 equiprobable events (original table columns: active cells, total cells, efficiency of projection model A, efficiency of support model B, number of distinct reps): support-model efficiencies rose through 45%, 63%, 93% and 98% as the numbers of cells and of distinct reps (up to ~10^8) increased, with direct reps listed for comparison.
- NB distributed reps handle almost any 100 of the many distinct events.
- Direct reps must be set up in advance, so they are restricted to familiar events.

2. Vary the number of active cells between events
Rare events tend to have poor counting efficiency because of overwhelming interference from overlap with common events.
- Increase the number of cells active in rare (and important) events, e.g. through alerting & orienting reactions and selective attention.
- Reduce the number of cells active in common events through habituation and adaptation.

3. Choose trigger features appropriately
Trigger features (i.e. the conditions that lead to activity of individual cells) should ideally be such that:
- there is minimum overlap between the representations of different events, especially ones that have different significance in relation to learning;
- the subset of events in which a feature is active tends to have similar associations and significance in relation to learning.
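A sketch of point 1, reusing the projection-model estimator from the first code block: hold 100 events of 10 active cells fixed and grow the total number of cells. The cell counts tried and the helper name `projection_efficiency` are illustrative, so the exact efficiencies will differ from the poster's table, but the trend (efficiency rising with redundancy) is the point being made.

```python
import numpy as np

def projection_efficiency(n_cells, n_events=100, a=10, lam=4.0, seed=0):
    """Counting efficiency of the projection model as total cells increase."""
    rng = np.random.default_rng(seed)
    events = np.zeros((n_events, n_cells))
    for row in events:
        row[rng.choice(n_cells, size=a, replace=False)] = 1.0
    counts = rng.poisson(lam, size=n_events)
    usage = counts @ events                      # local per-cell usage statistic
    mean_overlap = a * a / n_cells               # expected overlap between two events
    est = (events @ usage - counts.sum() * mean_overlap) / (a - mean_overlap)
    return lam / (lam + np.var(est - counts))    # Efficiency = lam / (lam + V)

for n in (100, 300, 1000, 3000):
    print(n, "cells ->", round(projection_efficiency(n), 2))  # rises toward 1.0
```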