CSC2535: Computation in Neural Networks Lecture 13: Representing things with neurons Geoffrey Hinton

Localist representations
The simplest way to represent things with neural networks is to dedicate one neuron to each thing.
- Easy to understand.
- Easy to code by hand.
  - Often used to represent inputs to a net.
- Easy to learn.
  - This is what mixture models do: each cluster corresponds to one neuron.
- Easy to associate with other representations or responses.
But localist models are very inefficient whenever the data has componential structure.

Examples of componential structure
"Big, yellow Volkswagen"
- Do we have a neuron for this combination?
- Is the BYV neuron set aside in advance, or is it created on the fly?
- How is it related to the neurons for "big" and "yellow" and "Volkswagen"?
Consider a visual scene:
- It contains many different objects.
- Each object has many properties, like shape, color, size, and motion.
- Objects have spatial relationships to each other.

Using simultaneity to bind things together
[Figure: a pool of shape neurons and a pool of color neurons]
- Represent conjunctions by activating all the constituents at the same time.
- This doesn’t require connections between the constituents.
- But what if we want to represent "yellow triangle" and "blue circle" at the same time?
- Maybe this explains the serial nature of consciousness. And maybe it doesn’t!

Using space to bind things together
- Conventional computers can bind things together by putting them into neighboring memory locations.
- This works nicely in vision: surfaces are generally opaque, so we only get to see one thing at each location in the visual field.
- If we use topographic maps for different properties, we can assume that properties at the same location belong to the same thing.

The definition of “distributed representation”
- Each neuron must represent something, so it’s a local representation of whatever this something is.
- “Distributed representation” means a many-to-many relationship between two types of representation (such as concepts and neurons):
  - Each concept is represented by many neurons.
  - Each neuron participates in the representation of many concepts.

Coarse coding
- Using one neuron per entity is inefficient. An efficient code would have each neuron active half the time.
  - This might be inefficient for other purposes (like associating responses with representations).
- Can we get accurate representations by using lots of inaccurate neurons?
  - If we can, it would be very robust against hardware failure.
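
To see the inefficiency concretely (a standard counting argument, not from the slide itself): with n binary neurons, a localist code can distinguish only n things, while a code in which about half the neurons are active at a time can distinguish roughly

    \[ \binom{n}{n/2} \;\approx\; \frac{2^{n}}{\sqrt{\pi n / 2}} \quad \text{things;} \qquad \text{e.g. } n = 30 \;\Rightarrow\; \binom{30}{15} \approx 1.6 \times 10^{8} \text{ patterns versus } 30. \]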

Coarse coding
- Use three overlapping arrays of large cells to get the effect of an array of fine cells.
- If a point falls in a fine cell, code it by activating 3 coarse cells, one from each array.
- This is more efficient than using a neuron for each fine cell:
  - It loses by needing 3 arrays.
  - It wins by a factor of 3x3 per array, because each coarse cell covers 9 fine cells.
  - Overall it wins by a factor of 3.
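
A toy version of the three-overlapping-arrays idea in 2-D (a sketch; the grid size and the offsets of the three coarse arrays are illustrative assumptions):

    # Three coarse arrays of 3x3 cells, each shifted by one fine cell.
    COARSE = 3
    OFFSETS = (0, 1, 2)

    def encode(x, y):
        """Activate one coarse cell per array for the fine point (x, y)."""
        return [((x + o) // COARSE, (y + o) // COARSE) for o in OFFSETS]

    # Neighbouring fine points get different triples of active coarse cells,
    # so fine position is recovered from coarse, overlapping receptive fields.
    print(encode(4, 7))     # [(1, 2), (1, 2), (2, 3)]
    print(encode(5, 7))     # [(1, 2), (2, 2), (2, 3)]  -- one coarse cell changed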

How efficient is coarse coding?
- The efficiency depends on the dimensionality of the space being represented.
- In one dimension, coarse coding does not help.
- In 2-D, the saving in neurons is proportional to the ratio of the coarse radius to the fine radius.
- In k dimensions, by increasing the radius by a factor of R we can keep the same accuracy as with fine fields and get a saving of roughly R^(k-1) in the number of neurons (no saving for k = 1, a saving of R for k = 2, matching the cases above).
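
As a worked instance of that saving (using the R^(k-1) form, which is consistent with the 1-D and 2-D cases listed above):

    \[ \text{saving} \approx R^{k-1}, \qquad \text{e.g. } k = 3,\ R = 10 \;\Rightarrow\; \text{saving} \approx 10^{2} = 100. \]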

Coarse regions and fine regions use the same surface
- Each binary neuron defines a boundary between the k-dimensional points that activate it and the points that don’t.
- To get lots of small regions we need a lot of boundary.
[Figure: fine fields versus coarse fields; the annotation relates the saving in neurons without loss of accuracy to the ratio of radii of the fine and coarse fields, up to a constant.]

Limitations of coarse coding
- It achieves accuracy at the cost of resolution:
  - Accuracy is defined by how much a point must be moved before the representation changes.
  - Resolution is defined by how close points can be and still be distinguished in the representation.
  - Representations can overlap and still be decoded if we allow integer activities of more than 1.
- It makes it difficult to associate very different responses with similar points, because their representations overlap.
  - This is useful for generalization.
- The boundary effects dominate when the fields are very big.

Coarse coding in the visual system
- As we get further from the retina, the receptive fields of neurons get bigger and bigger and require more complicated patterns.
- Most neuroscientists interpret this as neurons exhibiting invariance.
- But it’s also just what would be needed if neurons wanted to achieve high accuracy for properties like position, orientation, and size.
- High accuracy is needed to decide whether the parts of an object are in the right spatial relationship to each other.

Representing relational structure
“George loves Peace”
- How can a proposition be represented as a distributed pattern of activity?
- How are neurons representing different propositions related to each other and to the terms in the proposition?
- We need to represent the role of each term in the proposition.

A way to represent structures
[Figure: a set of role slots (agent, action, object, beneficiary) with candidate fillers such as George, Tony, Peace, War, Love, Hate, Give, Eat, Fish, Chips, Worms]

The recursion problem
“Jacques was annoyed that Tony helped George”
- One proposition can be part of another proposition. How can we do this with neurons?
- One possibility is to use “reduced descriptions”: in addition to having a full representation as a pattern distributed over a large number of neurons, an entity may have a much more compact representation that can be part of a larger entity.
  - It’s a bit like pointers: we have the full representation for the object of attention and reduced representations for its constituents.
- This theory requires mechanisms for compressing full representations into reduced ones and for expanding reduced descriptions into full ones.

Representing associations as vectors
- In most neural networks, objects and associations between objects are represented differently:
  - Objects are represented by distributed patterns of activity.
  - Associations between objects are represented by distributed sets of weights.
- We would like associations between objects to also be objects, so we represent associations by patterns of activity: an association is a vector, just like an object.

Circular convolution
- Circular convolution is a way of creating a new vector, t, that represents the association of the vectors c and x:
  - t is the same length as c or x.
  - t is a compressed version of the outer product of c and x; each element is a scalar product of c with a shifted copy of x: t_j = sum_k c_k * x_{(j-k) mod n}.
  - t can be computed quickly, in O(n log n) time, using the FFT.
- Circular correlation is a way of using c as a cue to approximately recover x from t. It is a different way of compressing the outer product.
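
A minimal sketch of these two operations in Python/NumPy (the vector length, random seed, and helper names are assumptions; the FFT route matches the O(n log n) claim above):

    import numpy as np

    n = 512
    rng = np.random.default_rng(0)

    def rand_vec():
        # i.i.d. elements with mean 0 and variance 1/n (see the decoding
        # constraints on a later slide), so the expected length is about 1.
        return rng.normal(0.0, 1.0 / np.sqrt(n), n)

    def cconv(c, x):
        # Circular convolution: t_j = sum_k c_k * x_{(j - k) mod n}.
        return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

    def ccorr(c, t):
        # Circular correlation: use c as a cue to approximately recover x from t.
        return np.real(np.fft.ifft(np.conj(np.fft.fft(c)) * np.fft.fft(t)))

    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    c, x = rand_vec(), rand_vec()
    t = cconv(c, x)                  # the association; same length as c and x
    x_hat = ccorr(c, t)              # noisy reconstruction of x
    print(cos(x_hat, x), cos(x_hat, rand_vec()))   # first is large (~0.7), second near 0

The decoded vector is only an approximation of x, which is why the later slides introduce a clean-up memory.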

A picture of circular convolution
[Figure: the outer product of c and x, with circular convolution formed by compressing along one set of diagonals; circular correlation is compression along the other diagonals.]

Constraints required for decoding
- Circular correlation only decodes circular convolution if the elements of each vector are distributed in the right way:
  - They must be independently distributed.
  - They must have mean 0.
  - They must have variance of 1/n, i.e. they must have expected length of 1.
- Obviously, vectors cannot have independent features when they encode meaningful stuff, so the decoding will be imperfect.
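
A one-line check of the length claim: with n independent elements of mean 0 and variance 1/n,

    \[ \mathbb{E}\big[\lVert x \rVert^{2}\big] \;=\; \sum_{i=1}^{n} \mathbb{E}\big[x_i^{2}\big] \;=\; n \cdot \tfrac{1}{n} \;=\; 1, \]

so the expected squared length is 1 and the vectors have length close to 1.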

Storage capacity of convolution memories
- The memory only contains n numbers, so it cannot store even one association of two n-component vectors accurately.
  - This does not matter if the vectors are big and we use a clean-up memory.
- Multiple associations are stored by just adding the vectors together.
  - The sum of two vectors is remarkably close to both of them, compared with its distance from other vectors.
  - When we try to decode one of the associations, the others just create extra random noise. This makes it even more important to have a clean-up memory.
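
A small numerical check of the claim that the sum of two random vectors stays close to both of them (a sketch; the vector length n = 512 and the Gaussian sampling are assumptions consistent with the decoding constraints above):

    import numpy as np

    n = 512
    rng = np.random.default_rng(0)
    cos = lambda u, v: u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

    a = rng.normal(0.0, 1.0 / np.sqrt(n), n)     # mean 0, variance 1/n
    b = rng.normal(0.0, 1.0 / np.sqrt(n), n)
    others = rng.normal(0.0, 1.0 / np.sqrt(n), (1000, n))

    s = a + b                                    # superpose the two vectors
    print(cos(s, a), cos(s, b))                  # each around 0.7

    # The most similar of 1000 unrelated random vectors is still far smaller (~0.15).
    norms = np.linalg.norm(others, axis=1) * np.linalg.norm(s)
    print(np.abs(others @ s / norms).max())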

The clean-up memory
- Every atomic vector and every association is stored in the clean-up memory.
- The memory can take a degraded vector and return the closest stored vector, plus a goodness of fit.
- It needs to be a matrix memory (or something similar) that can store many different vectors accurately.
- Each time a cue is used to decode a representation, the clean-up memory is used to clean up the very degraded output of the circular correlation operation.
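
A minimal sketch of such a clean-up memory (the class name, the cosine matching rule, and the stored item names are illustrative assumptions; the slide only requires that a degraded vector maps to the closest stored vector plus a goodness of fit):

    import numpy as np

    class CleanUpMemory:
        """Store vectors as rows of a matrix; return the stored vector closest
        (by cosine similarity) to a degraded cue, with a goodness-of-fit score."""

        def __init__(self):
            self.names, self.rows = [], []

        def store(self, name, v):
            self.names.append(name)
            self.rows.append(v / np.linalg.norm(v))

        def clean_up(self, noisy):
            M = np.stack(self.rows)                       # the matrix memory
            scores = M @ (noisy / np.linalg.norm(noisy))  # cosine with every stored vector
            best = int(np.argmax(scores))
            return self.names[best], self.rows[best], float(scores[best])

    # Usage: store a few atomic vectors, then clean up a noisy version of one.
    n, rng = 512, np.random.default_rng(1)
    mem = CleanUpMemory()
    for name in ["george", "tony", "peace", "war"]:
        mem.store(name, rng.normal(0.0, 1.0 / np.sqrt(n), n))
    degraded = mem.rows[2] + rng.normal(0.0, 0.8 / np.sqrt(n), n)
    name, vec, fit = mem.clean_up(degraded)
    print(name, round(fit, 2))   # 'peace', with a goodness of fit well below 1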

Representing structures
- A structure is a label plus a set of roles (like a verb and its cases).
- The vectors representing similar roles in different structures can be similar.
- We can implement all this in a very literal way!
[Figure: a particular proposition built from a structure label plus role/filler pairs combined by circular convolution]
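
A sketch of this recipe for the proposition “George loves Peace”, binding each role to its filler with circular convolution and superimposing the results (the helper definitions repeat the earlier sketch; the particular role and filler names come from the preceding slides, the rest is illustrative):

    import numpy as np

    n, rng = 512, np.random.default_rng(2)
    vec   = lambda: rng.normal(0.0, 1.0 / np.sqrt(n), n)
    cconv = lambda c, x: np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))
    ccorr = lambda c, t: np.real(np.fft.ifft(np.conj(np.fft.fft(c)) * np.fft.fft(t)))
    cos   = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # Atomic vectors for the structure label, the roles, and the fillers.
    symbols = {name: vec() for name in
               ["loves", "agent", "object", "george", "peace", "tony", "war"]}

    # "George loves Peace" as a single activity vector:
    #   structure label + agent (*) george + object (*) peace, with (*) circular convolution.
    prop = (symbols["loves"]
            + cconv(symbols["agent"],  symbols["george"])
            + cconv(symbols["object"], symbols["peace"]))

    # Decode "who is the agent?" by correlating with the agent role, then
    # cleaning up against the stored atomic vectors.
    noisy_agent = ccorr(symbols["agent"], prop)
    print(max(symbols, key=lambda s: cos(noisy_agent, symbols[s])))   # 'george'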

Representing sequences using chunking
- Consider the representation of “abcdefgh”:
  - First create chunks for the sub-sequences.
  - Then add the chunks together.
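
One way to make this concrete with the same circular-convolution machinery (a sketch: the use of position cues inside each chunk, and again for the chunks themselves, is an assumed design, not spelled out on the slide):

    import numpy as np

    n, rng = 512, np.random.default_rng(3)
    vec   = lambda: rng.normal(0.0, 1.0 / np.sqrt(n), n)
    cconv = lambda c, x: np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

    letters   = {ch: vec() for ch in "abcdefgh"}
    positions = [vec() for _ in range(4)]      # shared position cues p1..p4

    def chunk(items):
        # A chunk binds each item to its within-chunk position and superimposes them.
        return sum(cconv(p, s) for p, s in zip(positions, items))

    # First create chunks for the sub-sequences ...
    abcd = chunk([letters[ch] for ch in "abcd"])
    efgh = chunk([letters[ch] for ch in "efgh"])

    # ... then add the chunks together, here tagged by their own positions
    # so that the order of the chunks can also be recovered.
    abcdefgh = cconv(positions[0], abcd) + cconv(positions[1], efgh)

Decoding runs in the opposite direction: correlate with a chunk-level position cue and clean up to get a chunk, then correlate with a within-chunk position cue and clean up again to get a letter.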