Associative Memory: “Remembering” as associating something with sensory cues (text, a picture, or anything else); modeling the process of memorization.

Presentation transcript:

Associative Memory
What is “remembering”?
– Associating something with sensory cues
– Cues can be text, a picture, or anything else
– Modeling the process of memorization
– The minimum requirements of a content-addressable memory

Associative Memory
– Able to store multiple independent patterns
– Able to recall stored patterns with reasonable accuracy
– Should be able to recall from partial or noisy input patterns
– Other expectations: speed and similarity to biological memory

Associative Memory

Modeling Memory
– Analogy with physical systems that flow towards locally stable points
– Example: a ball in a bowl
– Think of the system as “remembering” the bottom of the bowl
– The ball’s initial position plays the role of our sensory cue

Modeling Memory
– A surface with several stable points corresponds to storing several patterns
– Depending on the initial position (the sensory cue), the ball ends up in one of the stable points (the stored patterns)
– The stable point closest to the initial position is the one reached

Modeling Memory
Two important points:
– Stores a set of patterns and recalls the one closest to the initial position
– A minimum-energy state is reached
Properties of the network:
– Described by state vectors
– Has a set of stable states
– The network evolves towards the stable state closest to its initial state, with its energy decreasing along the way

Hopfield Networks
– A recurrent neural network that can function as an associative memory
– Each node is connected to all other nodes (but not to itself)
– Each node behaves as both an input and an output node
– The network’s current state is defined by the outputs of its nodes

Hopfield Networks
– Nodes are not self-connected: w_ii = 0
– Connection strengths are symmetric: w_ji = w_ij

Classification and Search Engines
– A classification engine receives streams of packets as its input. It continuously applies a set of application-specific sorting rules and policies to the packets, compiling a series of new parallel packet streams organized into queues.
– For classification, the network processor must consult a memory bank, a lookup table, or even a database where the rules are stored.
– Search engines are used to consult such a lookup table or database, based on the rules and policies, for correct classification.
– Search engines are mostly based on associative memory, which is also known as CAM.

What is CAM? Content Addressable Memory is a special kind of memory.
Read operation in a traditional memory:
– The input is the address of the content we are interested in.
– The output is the content stored at that address.
In a CAM it is the reverse:
– The input is content associated with something stored in the memory.
– The output is the location where the associated content is stored.
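To make the contrast concrete, here is a minimal Python sketch of the two access modes, using a dictionary as a stand-in for the memory array (the addresses and stored words are made up for illustration):

```python
# RAM-style access: address -> content.  CAM-style access: content -> address.
memory = {0x00: "alpha", 0x01: "beta", 0x02: "gamma"}   # address -> stored word

def ram_read(address):
    """Traditional read: give an address, get back the content stored there."""
    return memory[address]

def cam_search(word):
    """CAM-style search: give content, get back the address where it is stored,
    or None if no stored word matches."""
    for address, stored in memory.items():
        if stored == word:
            return address
    return None

print(ram_read(0x01))        # -> "beta"
print(cam_search("gamma"))   # -> 2 (i.e. address 0x02)
print(cam_search("delta"))   # -> None (a miss)
```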

CAM for Routing Table Implementation
– A CAM can be used as a search engine when we want to find matching contents in a database or table.
– Example: a routing table.

Simplified CAM Block Diagram
– The input to the system is the search word.
– The search word is broadcast on the search lines.
– A match line indicates whether there was a match between the search word and a stored word.
– An encoder outputs the match location.
– If there are multiple matches, a priority encoder selects the first match.
– A hit signal indicates whether any match was found.
– Search words are long, ranging from 36 to 144 bits.
– Table sizes range from a few hundred to 32K entries.
– Address space: 7 to 15 bits.
Source: K. Pagiamtzis, A. Sheikholeslami, “Content-Addressable Memory (CAM) Circuits and Architectures: A Tutorial and Survey,” IEEE J. of Solid-State Circuits, March 2006.
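A small illustrative sketch of the match-line to address path described above; the function name and the convention that lower addresses have higher priority are assumptions, not details taken from the cited survey:

```python
def priority_encode(match_lines):
    """Given the match-line vector (one bit per stored word, 1 = match),
    return (hit, address of the highest-priority matching entry).
    Lower addresses are given higher priority here."""
    for address, matched in enumerate(match_lines):
        if matched:
            return True, address
    return False, None   # hit deasserted: no stored word matched

# Entries 2 and 5 both match; the priority encoder reports entry 2.
hit, addr = priority_encode([0, 0, 1, 0, 0, 1, 0, 0])
print(hit, addr)   # -> True 2
```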

CAM Memory Size
– Largest available: around 18 Mbit on a single chip.
– Rule of thumb: the largest CAM chip is about half the size of the largest available SRAM chip; a typical CAM cell consists of two SRAM cells.
– Size has grown at an exponential rate.
Source: K. Pagiamtzis, A. Sheikholeslami, “Content-Addressable Memory (CAM) Circuits and Architectures: A Tutorial and Survey,” IEEE J. of Solid-State Circuits, March 2006.

CAM Basics
– The search-data word is loaded into the search-data register.
– All match lines are precharged high (a temporary match state).
– Search-line drivers broadcast the search word onto the differential search lines.
– Each CAM core cell compares its stored bit against the bit on the corresponding search lines.
– Match lines of words with at least one mismatching bit discharge to ground.
Source: K. Pagiamtzis, A. Sheikholeslami, “Content-Addressable Memory (CAM) Circuits and Architectures: A Tutorial and Survey,” IEEE J. of Solid-State Circuits, March 2006.
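A behavioral sketch of that search cycle in Python (a software model of the behavior, not of the circuit); its match-line output is exactly what the priority encoder sketched earlier would consume:

```python
def cam_search_cycle(stored_words, search_word):
    """Model one CAM search cycle at the bit level.
    stored_words: list of equal-length bit strings such as "1011".
    Returns the match-line vector: 1 = line stayed precharged, 0 = discharged."""
    match_lines = [1] * len(stored_words)          # precharge all match lines high
    for row, word in enumerate(stored_words):
        for stored_bit, search_bit in zip(word, search_word):
            if stored_bit != search_bit:           # any mismatching cell...
                match_lines[row] = 0               # ...discharges that match line
                break
    return match_lines

print(cam_search_cycle(["1011", "0110", "1011"], "1011"))   # -> [1, 0, 1]
```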

Types of CAMs
Binary CAM (BCAM) stores only 0s and 1s.
– Applications: MAC table consultation, Layer 2 security-related VPN segregation.
Ternary CAM (TCAM) stores 0s, 1s, and don’t-cares.
– Applications: wherever wildcards are needed, such as Layer 3 and 4 classification for QoS and CoS purposes, and IP routing (longest-prefix matching).
Available sizes: 1 Mb, 2 Mb, 4.7 Mb, 9.4 Mb, and 18.8 Mb. CAM entries are structured as multiples of 36 bits rather than 32 bits.
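To illustrate the don’t-care bits, here is a toy TCAM-style longest-prefix match in Python; the 8-bit keys, prefixes, and port names are hypothetical:

```python
def tcam_match(pattern, key):
    """A pattern is a bit string in which 'x' marks a don't-care bit."""
    return all(p == 'x' or p == k for p, k in zip(pattern, key))

def longest_prefix_match(table, key):
    """table: list of (pattern, next_hop) pairs, assumed ordered so that longer
    (more specific) prefixes come first, as a real TCAM would be programmed."""
    for pattern, next_hop in table:
        if tcam_match(pattern, key):
            return next_hop
    return None

routing_table = [
    ("110011xx", "port A"),   # most specific prefix, listed first
    ("1100xxxx", "port B"),
    ("xxxxxxxx", "default"),  # catch-all entry
]
print(longest_prefix_match(routing_table, "11001101"))   # -> "port A"
print(longest_prefix_match(routing_table, "11000001"))   # -> "port B"
```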

CAM Advantages
– They associate the input (comparand) with their memory contents in one clock cycle.
– They are configurable in multiple search-data width and depth formats, which allows searches to be conducted in parallel.
– CAMs can be cascaded to increase the size of the lookup tables they can store.
– New entries can be added to the table, so the device can learn what it did not know before.
– They are one of the appropriate solutions for higher speeds.

CAM Disadvantages
– They cost several hundred dollars per CAM, even in large quantities.
– They occupy a relatively large footprint on a card.
– They consume excessive power.
– Generic system-engineering problems: interfacing with the network processor, and handling simultaneous table-update and lookup requests.

CAM Structure
– The comparand bus is 72 bits wide and bidirectional.
– The result bus is an output.
– The command bus enables instructions to be loaded into the CAM.
– The CAM has 8 configurable banks of memory.
– The NPU issues a command to the CAM; the CAM then performs an exact match or uses wildcard characters to extract the relevant information.
– There are two sets of mask registers inside the CAM.

CAM Structure (2)
– There are global mask registers, which can mask out specific bits, as well as a mask register in each memory location.
– The search result can be a single output (the highest-priority match) or a burst of successive results.
– The output port is 24 bytes wide.
– Flag and control signals specify the status of the memory banks and also enable multiple chips to be cascaded.

CAM Features
CAM cascading:
– Up to 8 devices can be cascaded without a performance penalty in search time (72 bits x 512K).
– Up to 32 devices can be cascaded with some performance degradation (72 bits x 2M).
Terminology:
– Initializing the CAM: writing the table into the memory.
– Learning: updating specific table entries.
– Writing a search key to the CAM: a search operation.
Handling wider keys: most CAMs support 72-bit keys and can support wider keys in native hardware. Shorter keys can be handled more efficiently at the system level.

CAM Latency
– The clock rate is between 66 and 133 MHz; the clock speed determines the maximum search capacity.
– Factors affecting search performance: key size and table size.
– For the system designer, what matters is the total latency to retrieve data from the SRAM connected to the CAM.
– Pipelining and multi-threaded resource-allocation techniques can ease the CAM speed requirements.
Source: IDT

Associative-Memory Networks
Input: a pattern (often noisy or corrupted).
Output: the corresponding stored pattern (complete and relatively noise-free).
Process:
1. Load the input pattern onto a core group of highly interconnected neurons.
2. Run the core neurons until they reach a steady state.
3. Read the output off the states of the core neurons.

Associative Network Types
1. Auto-associative: X = Y. Recognizes noisy versions of a stored pattern.
2. Hetero-associative, bidirectional: X <> Y. Iterative correction of both input and output. (BAM = Bidirectional Associative Memory)

Associative Network Types (2)
3. Hetero-associative, input correcting: X <> Y. The input clique is auto-associative, so it repairs input patterns.
4. Hetero-associative, output correcting: X <> Y. The output clique is auto-associative, so it repairs output patterns.

Hebb’s Rule
Connection weights ~ correlations. “When one cell repeatedly assists in firing another, the axon of the first cell develops synaptic knobs (or enlarges them if they already exist) in contact with the soma of the second cell.” (Hebb, 1949)
In an associative neural net, if we compare two pattern components (e.g. pixels) within many patterns and find that they are frequently in:
a) the same state, then the arc weight between their NN nodes should be positive;
b) different states, then the arc weight between their NN nodes should be negative.
Matrix memory: the weights must store the average correlations between all pattern components across all patterns. A net presented with a partial pattern can then use the correlations to recreate the entire pattern.

Correlated Field Components
– Each component is a small portion of the pattern field (e.g. a pixel).
– In the associative neural network, each node represents one field component.
– For every pair of components, their values are compared in each of several patterns.
– The weight w_ab on the arc between the NN nodes for components a and b is set to (approximately) their average correlation.

Quantifying Hebb’s Rule
Compare two nodes to calculate a weight change that reflects their state correlation:
– Auto-association: Δw_{i,j} = x_i * x_j
– Hetero-association: Δw_{i,o} = x_i * y_o   (i = input component, o = output component)
* When the two components are the same (different), increase (decrease) the weight.
Hebbian principle: if all the input patterns are known prior to retrieval time, then initialize the weights so that, ideally, they record the average correlations across all P patterns (Weights = Average Correlations):
– Auto-association: w_{i,j} = (1/P) * sum_k x_{k,i} * x_{k,j}
– Hetero-association: w_{i,o} = (1/P) * sum_k x_{k,i} * y_{k,o}
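A small sketch of the auto-associative form of this rule in Python, assuming bipolar (+1/-1) components; the three 4-component patterns are hypothetical:

```python
# Hebbian weight initialization as average correlations (auto-association).
# Same state -> contribution +1/P to the weight; different states -> -1/P.
patterns = [
    [+1, -1, +1, -1],
    [+1, +1, -1, -1],
    [-1, -1, +1, +1],
]
P = len(patterns)
n = len(patterns[0])

w = [[0.0] * n for _ in range(n)]   # w[i][j] = (1/P) * sum_k x_k,i * x_k,j
for x in patterns:
    for i in range(n):
        for j in range(n):
            if i != j:              # no self-connections
                w[i][j] += x[i] * x[j] / P

print(w[0][2])   # average correlation between components 0 and 2 (here -1/3)
```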

Matrix Representation
Let X = the matrix of input patterns, where each ROW is a pattern, so x_{k,i} = the ith bit of the kth pattern. Let Y = the matrix of output patterns, where each ROW is a pattern, so y_{k,j} = the jth bit of the kth pattern.
Then the average correlation between input bit i and output bit j across all patterns is the dot product of column i of X with column j of Y, scaled by 1/P:
w_{i,j} = (1/P) (x_{1,i} y_{1,j} + x_{2,i} y_{2,j} + … + x_{P,i} y_{P,j})
To calculate all the weights at once:
– Hetero-association: W = X^T Y
– Auto-association: W = X^T X
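The same computation in NumPy (the patterns are hypothetical; zeroing the diagonal anticipates the Hopfield convention w_ii = 0 used on the following slides):

```python
import numpy as np

# Rows of X are bipolar (+1/-1) input patterns; auto-association uses Y = X.
X = np.array([
    [ 1, -1,  1, -1],
    [ 1,  1, -1, -1],
    [-1, -1,  1,  1],
])
P = X.shape[0]

W = X.T @ X / P            # W = (1/P) * X^T X, the average correlations
np.fill_diagonal(W, 0)     # no self-connections
print(W)
```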

Auto-Associative Memory
– One node per pattern unit.
– Fully connected: a clique.
– Weights = average correlations, across all patterns, of the corresponding units.
(Figure: 1. auto-associative patterns to remember; 2. distributed storage of all patterns; 3. retrieval. Node value legend: dark blue with x => +1, dark red without x => -1, light green => 0.)

Hetero-Associative Memory
– One node per pattern unit, for both X and Y.
– Full inter-layer connection.
– Weights = average correlations, across all patterns, of the corresponding units.
(Figure: 1. hetero-associative patterns (pairs) to remember; 2. distributed storage of all patterns; 3. retrieval.)

Hopfield Networks
– An auto-association network.
– Fully connected (a clique) with symmetric weights.
– The state of a node is a function of its inputs.
– Weight values are based on the Hebbian principle.
– Performance: must iterate a bit to converge on a pattern, but generally requires much less computation than back-propagation networks.
Discrete node update rule: x_k <- sign( sum_j w_{kj} x_j + I_k ), where I_k is node k’s input value.
(Figure: an input pattern and the output reached after many iterations.)
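A compact sketch of such a network under the assumptions above (Hebbian average-correlation weights, sign update rule, bipolar states). It is illustrative only: it omits the external input term I_k during recall and is not the specific 4-node example worked through on the next slides.

```python
import numpy as np

class HopfieldNet:
    def __init__(self, patterns):
        """patterns: array of shape (P, N) with entries +1/-1."""
        X = np.asarray(patterns, dtype=float)
        self.W = X.T @ X / len(X)          # average correlations (Hebbian init)
        np.fill_diagonal(self.W, 0.0)      # w_ii = 0: no self-connections

    def recall(self, probe, sweeps=10, rng=None):
        """Asynchronous recall: repeatedly update one randomly chosen node."""
        rng = rng or np.random.default_rng(0)
        x = np.asarray(probe, dtype=float).copy()
        for _ in range(sweeps * len(x)):
            k = rng.integers(len(x))
            s = self.W[k] @ x
            if s != 0:                     # leave the node unchanged when the sum is 0
                x[k] = 1.0 if s > 0 else -1.0
        return x

# Hypothetical usage: store two 6-node patterns, then recall from a noisy probe.
net = HopfieldNet([[ 1,  1,  1, -1, -1, -1],
                   [-1, -1,  1,  1,  1, -1]])
print(net.recall([1, 1, -1, -1, -1, -1]))   # probe = first pattern with one bit flipped
```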

Hopfield Network Example
1. Patterns to remember: p1, p2, p3.
2. Hebbian weight initialization: the weights are the average correlations across the 3 patterns, giving values of -1/3 or +1/3.
3. Build the network.
4. Enter a test pattern.
(Figure: the three patterns, the per-pattern correlation matrices, the averaged weight matrix W, the built network, and the test pattern.)

Hopfield Network Example (2)
5. Synchronous iteration (update all nodes at once): each node sums the weighted values from the input layer and applies the discrete output rule sign(sum). The network settles into a stable state equal to the stored pattern p1.
(Figure: table of weighted inputs and outputs for nodes 1-4.)

Using Matrices
Goal: set the weights such that an input vector V_i yields itself when multiplied by the weight matrix W.
Let the rows of X be V_1, V_2, …, V_p, where p = the number of input vectors (i.e., patterns). Then Y = X, and the Hebbian weight calculation is W = X^T Y = X^T X.
The common index in the product is the pattern number, so each entry of X^T X is a correlation sum, e.g. for p = 3 patterns:
w_{2,4} = w_{4,2} = x_{1,2} x_{1,4} + x_{2,2} x_{2,4} + x_{3,2} x_{3,4}

Matrices (2)
The upper and lower triangles of the product matrix hold the 6 distinct weights, with w_{i,j} = w_{j,i}.
Scale the weights by dividing by p (i.e., averaging). Picton (ANN book) subtracts p from each instead. Either method is fine, as long as we apply the appropriate thresholds to the output values. This produces the same weights as in the non-matrix description.
Testing: multiplying a test input by the weight matrix, scaling* by p = 3, and using 0 as a threshold gives (2/3 2/3 2/3 -2/3) => (1 1 1 -1).
* For illustrative purposes, it is easier to scale by p at the end instead of scaling the entire weight matrix W prior to testing.

Hopfield Network Example (3)
4b. Enter another test pattern. 5b. Synchronous iteration.
The input pattern is stable, but it is not one of the original patterns. Attractors in node-state space can be whole patterns, parts of patterns, or other combinations: spurious outputs.
(Figure: table of weighted inputs and outputs for nodes 1-4.)

Hopfield Network Example (4)
4c. Enter another test pattern. 5c. Asynchronous iteration (one randomly chosen node at a time): update node 3, then node 4, then node 2. The resulting state is stable and spurious.
Asynchronous updating is central to Hopfield’s (1982) original model.
(Figure: network states after each update.)

Hopfield Network Example (5)
4d. Enter another test pattern. 5d. Asynchronous iteration: update node 3, then node 4, then node 2. The network settles into the stable stored pattern p3.
(Figure: network states after each update.)

Hopfield Network Example (6)
4e. Enter the same test pattern. 5e. Asynchronous iteration, but in a different order: update node 2 first, then node 3 or 4 (no change). The resulting state is stable and spurious, showing that the outcome depends on the update order.
(Figure: network states after each update.)

Associative Retrieval = Search
– Back-propagation: search in the space of weight vectors to minimize output error.
– Associative memory retrieval: search in the space of node values to minimize conflicts between (a) node-value pairs and the average correlations (weights), and (b) node values and their initial values.
– Input patterns are local (sometimes global) minima, but many spurious patterns are also minima.
– There is high dependence upon the initial pattern and, if updating is asynchronous, upon the update sequence.
(Figure: patterns p1, p2, p3.)

Energy Function
Basic idea: the energy of the associative memory should be low when pairs of node values mirror the average correlations (i.e. the weights) on the arcs that connect them, and when the current node values equal their initial values (from the test pattern):
E = -(1/2) sum_k sum_j w_{kj} x_k x_j - sum_k I_k x_k
– When a pair matches its correlation, w_{kj} x_j x_k > 0, lowering E.
– When a current value matches its input value, I_k x_k > 0, also lowering E.
Gradient descent: a little math shows that asynchronous updates using the discrete rule x_k <- sign( sum_j w_{kj} x_j + I_k ) yield a gradient-descent search along the energy landscape for the E defined above.
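A sketch of this energy in Python, using the reconstructed form of E above (the weights and vectors are random illustrative values). The loop checks the gradient-descent claim: a single asynchronous update by the sign rule never increases E.

```python
import numpy as np

def energy(W, x, I):
    """E = -1/2 * sum_kj w_kj x_k x_j - sum_k I_k x_k."""
    return -0.5 * (x @ W @ x) - I @ x

def async_update(W, x, I, k):
    """Apply the discrete rule to node k: x_k <- sign(sum_j w_kj x_j + I_k)."""
    s = W[k] @ x + I[k]
    x = x.copy()
    if s != 0:
        x[k] = 1.0 if s > 0 else -1.0
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
W = (A + A.T) / 2                      # symmetric weights...
np.fill_diagonal(W, 0.0)               # ...with zero diagonal, as required
I = rng.choice([-1.0, 1.0], size=5)    # "initial values" from the test pattern
x = rng.choice([-1.0, 1.0], size=5)    # current node states

for k in range(len(x)):
    x_next = async_update(W, x, I, k)
    assert energy(W, x_next, I) <= energy(W, x, I) + 1e-12   # E never increases
    x = x_next
```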

Storage Capacity of Hopfield Networks
Capacity = the relationship between the number of patterns that can be stored and retrieved without error and the size of the network, measured as (# patterns / # nodes) or (# patterns / # weights).
Definition of 100% correct retrieval: when any of the stored patterns is entered completely (no noise), that same pattern is returned by the network, i.e. the pattern is a stable attractor.
A detailed proof shows that a Hopfield network of N nodes can achieve 100% correct retrieval on P patterns if P < N / (4 ln N).
(Table in the figure: maximum P for various N.)
In general, as more patterns are added to a network, the average correlations become less likely to match the correlations in any particular pattern, so the likelihood of retrieval error increases. => The key to perfect recall is selective ignorance!!
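For concreteness, the bound is easy to tabulate (these values of N are arbitrary examples, not the entries of the slide's missing table):

```python
import math

# Maximum P for 100% correct retrieval: P < N / (4 ln N).
for N in (100, 1000, 10000):
    print(N, N / (4 * math.log(N)))
# N = 1000 gives roughly 36 patterns, so capacity grows much more slowly than N.
```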