Hybrid computing using a neural network with dynamic external memory

1 Hybrid computing using a neural network with dynamic external memory
Alex Graves, Greg Wayne et al., Nature, 2016. Presented by Youngnam Kim.

2 Outline
This paper proposes an improved version of the Neural Turing Machine, called the Differentiable Neural Computer (DNC). The three main differences are:
- dynamic memory allocation
- improved location-based addressing via temporal memory linkage
- the agent can learn how much to write
This presentation covers:
- Neural Turing Machines, briefly
- what differs between NTMs and DNCs
- experimental results

3 Neural Turing Machines (Alex Graves et al., 2014)
NTMs imitate Turing machines with a memory network that has:
- an external memory $M_t \in \mathbb{R}^{N \times d}$, where $N$ is the number of memory locations and $d$ is the memory vector dimension
- read and write heads; the interaction with memory must be differentiable
- a controller that learns what and where to read and write; generally RNNs are used

4 Neural Turing Machines – read and write
To be differentiable, we use attention: read and write everywhere, each location to a different extent.
Reading: $\sum_i w_t(i) = 1$, $0 \le w_t(i) \le 1$, and $r_t \leftarrow \sum_i w_t(i)\, M_t(i)$.
Writing: $M_t(i) \leftarrow M_{t-1}(i) \odot \big(\mathbf{1} - w_t(i)\, e_t\big) + w_t(i)\, a_t$,
where $e_t \in \mathbb{R}^w$ is an erase vector and $a_t \in \mathbb{R}^w$ is an add vector.
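As a concrete illustration, here is a minimal numpy sketch of the read and write equations above; the variable names and toy sizes are assumptions for the example, not part of the paper.

    import numpy as np

    N, W = 8, 4                       # memory locations, word width (toy sizes)
    M = np.random.randn(N, W)         # memory matrix M_{t-1}
    w = np.full(N, 1.0 / N)           # attention weighting, non-negative, sums to 1
    e = np.random.rand(W)             # erase vector e_t in [0, 1]^W
    a = np.random.randn(W)            # add vector a_t

    r = w @ M                                        # read:  r_t = sum_i w(i) M(i)
    M = M * (1 - np.outer(w, e)) + np.outer(w, a)    # write: erase, then add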

5 Neural Turing Machines – addressing
Addressing: how to produce the weightings for the read and write operations. There are two mechanisms: content-based addressing and location-based addressing.
Content-based addressing: the controller produces a key vector $k_t$ and a key strength $\beta_t \ge 1$; the content-based weighting $w_t^c$ is
$w_t^c(i) = \frac{\exp\{\beta_t\, S(k_t, M_t(i))\}}{\sum_j \exp\{\beta_t\, S(k_t, M_t(j))\}}$, with $S(u, v) = \frac{u \cdot v}{\|u\|\,\|v\|}$,
where $S$ is a similarity function, generally cosine similarity. The DNC uses the same content-based addressing as NTMs.
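A minimal sketch of this content-based weighting in numpy; the function name and the small epsilon for numerical stability are my own additions.

    import numpy as np

    def content_weighting(M, k, beta, eps=1e-8):
        # cosine similarity between the key and every memory row
        sim = M @ k / (np.linalg.norm(M, axis=1) * np.linalg.norm(k) + eps)
        z = np.exp(beta * (sim - sim.max()))          # numerically stable softmax
        return z / z.sum()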

6 Neural Turing Machines – addressing
Location-based addressing (different from DNCs). In NTMs we interpolate the content weighting $w_t^c$ with the previous weighting $w_{t-1}$ before shifting:
$w_t^g \leftarrow g_t\, w_t^c + (1 - g_t)\, w_{t-1}$,
where the interpolation gate $g_t$ is a scalar in the range $(0, 1)$. After interpolation, the weighting is shifted by the shift distribution $s_t$ (a circular convolution):
$w_t(i) = \sum_{j=0}^{N-1} w_t^g(j)\, s_t(i - j)$.
To avoid leakage and dispersion of the weighting, a sharpening parameter $\gamma_t \ge 1$ is applied:
$w_t(i) \leftarrow \frac{w_t(i)^{\gamma_t}}{\sum_j w_t(j)^{\gamma_t}}$.
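The interpolation, circular shift, and sharpening steps can be sketched as follows (numpy; the argument names are illustrative, and `s` is assumed to be a length-N shift distribution).

    import numpy as np

    def location_weighting(w_prev, w_c, g, s, gamma):
        w_g = g * w_c + (1 - g) * w_prev              # interpolation, g in (0, 1)
        N = len(w_g)
        # circular convolution with the shift distribution s
        w = np.array([sum(w_g[j] * s[(i - j) % N] for j in range(N))
                      for i in range(N)])
        w = w ** gamma                                # sharpening, gamma >= 1
        return w / w.sum()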

7 Neural Turing Machines – addressing
An example of shift weightings. Disadvantage: we can only iterate over adjacent memory locations.

8 Differentiable Neural Computers – architecture

9 Differentiable Neural Computers – write operation
Dynamic memory allocation: the agent learns whether the locations it has read from should be freed. To do this, each read head produces a free gate $f_t^i$, and an allocation weighting $a_t \in [0,1]^N$ is computed from the memory usage; when the usage $u_t[i]$ is close to 0, the $i$-th memory location is free.
$\psi_t = \prod_{i=1}^{R} \big(\mathbf{1} - f_t^i\, w_{t-1}^{r,i}\big)$
$u_t = \big(u_{t-1} + w_{t-1}^w - u_{t-1} \odot w_{t-1}^w\big) \odot \psi_t$
$a_t[\phi_t[j]] = \big(1 - u_t[\phi_t[j]]\big) \prod_{i=1}^{j-1} u_t[\phi_t[i]]$
Here $\psi_t$ is the retention vector, $f_t^i$ is the free gate of read head $i$, $w_{t-1}^{r,i}$ is the read weighting of head $i$ at the previous time step, $w_{t-1}^w$ is the previous write weighting, and $\phi_t$ is the free list: the memory indices sorted in ascending order of usage. The usage update accounts for overwriting by the last write; the factor $(1 - u_t[\cdot])$ measures how free a location is, and the running product over the free list forces allocation toward the most free locations.
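A sketch of this allocation mechanism in numpy, under the same assumptions; `f` is the vector of R free gates, `w_r_prev` the R x N previous read weightings, and the function name is illustrative.

    import numpy as np

    def allocation_weighting(u_prev, w_w_prev, w_r_prev, f):
        psi = np.prod(1 - f[:, None] * w_r_prev, axis=0)     # retention vector
        u = (u_prev + w_w_prev - u_prev * w_w_prev) * psi    # usage vector
        phi = np.argsort(u)                                  # free list: ascending usage
        a = np.zeros_like(u)
        prod = 1.0
        for j in phi:                                        # most free location first
            a[j] = (1 - u[j]) * prod
            prod *= u[j]
        return a, u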

10 Differentiable Neural Computers – write operation
The write weighting interpolates the content weighting $c_t^w$ and the allocation weighting $a_t$:
$w_t^w = g_t^w \big[\, g_t^a\, a_t + (1 - g_t^a)\, c_t^w \,\big]$
where $g_t^w$ is the write gate and $g_t^a$ is the allocation gate.
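The gating itself is a one-liner; this sketch assumes numpy arrays `a` (allocation weighting) and `c_w` (content weighting) and scalar gates `g_w`, `g_a` in [0, 1] produced by the controller.

    def write_weighting(a, c_w, g_w, g_a):
        # w^w_t = g^w_t [ g^a_t a_t + (1 - g^a_t) c^w_t ]
        return g_w * (g_a * a + (1 - g_a) * c_w)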

11 Differentiable Neural Computers – write operation
[Figure] Copy task: 10 sequences of length 5, with a memory of size 10.

12 Differentiable Neural Computers – read operation
Temporal memory linkage: after each write operation, we can store information about the order in which data were written. Let the linkage matrix be $L_t \in [0,1]^{N \times N}$, where $L_t[i,j]$ represents the degree to which location $i$ was written to just after location $j$.
$p_t = \big(1 - \sum_i w_t^w[i]\big)\, p_{t-1} + w_t^w, \quad p_0 = \mathbf{0}$
$L_t[i,j] = \big(1 - w_t^w[i] - w_t^w[j]\big)\, L_{t-1}[i,j] + w_t^w[i]\, p_{t-1}[j]$
$L_0[i,j] = 0 \;\; \forall i,j, \qquad L_t[i,i] = 0 \;\; \forall i$
Here $p_t$ is the precedence weighting: $p_{t-1}[j]$ is the degree to which the latest write attended to location $j$, and the leading factor $(1 - \sum_i w_t^w[i])$ goes to 0 when the write is not null. In the link update, when $w_t^w[i]$ or $w_t^w[j]$ is close to 1, the old links between $j$ and $i$ are cut and replaced.
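A sketch of the precedence and link-matrix updates in numpy, following the equations above directly (this dense O(N^2) version; names are illustrative).

    import numpy as np

    def update_linkage(L_prev, p_prev, w_w):
        # L_t[i,j] = (1 - w^w[i] - w^w[j]) L_{t-1}[i,j] + w^w[i] p_{t-1}[j]
        L = (1 - w_w[:, None] - w_w[None, :]) * L_prev + np.outer(w_w, p_prev)
        np.fill_diagonal(L, 0.0)                 # no self-links: L_t[i,i] = 0
        p = (1 - w_w.sum()) * p_prev + w_w       # precedence weighting p_t
        return L, p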

13 Differentiable Neural Computers – read operation
Temporal memory linkage: the agent can choose which direction to read. The forward weighting $f_t^i$ and backward weighting $b_t^i$ of read head $i$ are
$f_t^i = L_t\, w_{t-1}^{r,i}, \qquad b_t^i = L_t^{\top}\, w_{t-1}^{r,i}$.

14 Differentiable Neural Computers – read operation
Read mode: each read head chooses which mode to read with, using $\pi_t^i \in [0,1]^3$. The resulting read weighting of read head $i$ is
$w_t^{r,i} = \pi_t^i[1]\, b_t^i + \pi_t^i[2]\, c_t^{r,i} + \pi_t^i[3]\, f_t^i$.
We can then iterate over written sequences forwards and backwards regardless of their actual memory locations.
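Putting the linkage to use, one read head's weighting can be sketched as follows (numpy; `pi` is the 3-way read-mode distribution in the order backward, content, forward, and the other names are illustrative).

    import numpy as np

    def read_weighting(L, w_r_prev, c_r, pi):
        f = L @ w_r_prev          # forward weighting  f_t^i
        b = L.T @ w_r_prev        # backward weighting b_t^i
        return pi[0] * b + pi[1] * c_r + pi[2] * f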

15 Differentiable Neural Computers – controller
The DNC uses a deep (multi-layer) LSTM as its controller. $x_t$ is the input and $r_{t-1}^i$ is the read vector of read head $i$ from the previous time step; the outputs are $v_t$ and $\xi_t$, where $\xi_t$ is the interface vector that parameterises the read and write heads.
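A very rough sketch of how the controller is wired; `lstm_step`, `W_y`, and `W_xi` are hypothetical placeholders, and the only point is that the LSTM sees the task input concatenated with the previous read vectors and emits both an output vector and an interface vector.

    import numpy as np

    def controller_step(x_t, reads_prev, state, lstm_step, W_y, W_xi):
        # input = task input concatenated with last step's read vectors
        inp = np.concatenate([x_t] + list(reads_prev))
        h, state = lstm_step(inp, state)      # deep LSTM forward pass (placeholder)
        v_t  = W_y  @ h                       # output vector v_t
        xi_t = W_xi @ h                       # interface vector xi_t for the memory heads
        return v_t, xi_t, state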

16 Differentiable Neural Computers – experiments
- bAbI question answering: a dataset consisting of 20 types of reasoning, with 10,000 training examples and 1,000 test examples
- Graph tasks: training on inference, shortest-path, and traversal queries over randomly generated graphs; testing on the London Underground map and a family tree
- Mini-SHRDLU: moving blocks to satisfy given constraints; trained with reinforcement learning

17 Differentiable Neural Computers – bAbI
Example: 'Mary journeyed to the kitchen. Mary moved to the bathroom. John went back to the hallway. John picked up the milk there. Q: what is John carrying?' The answer is 'milk'. There is a lexicon of 159 unique words, one-hot vector encoding is used, and the DNC acts as a classifier here.
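The encoding is the standard one-hot scheme; a minimal sketch, assuming a hypothetical `lexicon` list containing the 159 words.

    import numpy as np

    def one_hot(word, lexicon):
        vec = np.zeros(len(lexicon))          # one dimension per word in the lexicon
        vec[lexicon.index(word)] = 1.0
        return vec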

18 Differentiable Neural Computers – bAbI

19 Differentiable Neural Computers – Graph task
- labels 0-999
- (1) regress the optimal policy; (2) 10 time steps of planning
- 0-9: direct; 10-410: relation (not given as input)
- check that the DNC remembers the graph

20 Differentiable Neural Computers – Graph task
Decoding analysis with a logistic regressor: the input is the write vector and the target is the input triple at that time step (see the sketch below).
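A sketch of such a decoding analysis using scikit-learn; the arrays here are random stand-ins for write vectors logged from a trained DNC and the corresponding labels, so only the shape of the analysis is shown.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical stand-ins: write vectors logged from a trained DNC and the
    # label (e.g. one element of the input triple) at the same time step.
    write_vecs = np.random.randn(200, 64)
    labels = np.random.randint(0, 10, size=200)

    decoder = LogisticRegression(max_iter=1000)
    decoder.fit(write_vecs, labels)
    print("decoding accuracy:", decoder.score(write_vecs, labels))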

21 Differentiable Neural Computers – Graph task

22 Differentiable Neural Computers – Graph task

23 Differentiable Neural Computers – Graph task

24 Differentiable Neural Computers – extra experiments
A DNC trained with a memory of 256 locations on the traversal task; reported is the fraction of completed traversals over 100 traversal tasks, where each step is a (source node, edge, destination node) triple.

25 Differentiable Neural Computers – Mini-SHRDLU
Reward: the number of satisfied constraints. Penalty: taking an invalid action. Decoding analysis with a logistic regressor: the input is the average memory-contents vector and the target is the first 5 actions taken by the agent (input dimension ×9; 7 actions).

26 Differentiable Neural Computers – Mini-SHRDLU
Perfect = all constraints satisfied in the minimal number of moves. Success = all constraints satisfied, regardless of the number of moves. Incomplete = failed to satisfy all constraints.

27 Differentiable Neural Computers – conclusion
Reasoning about and representing complex data structures is important. DNCs can handle the variability of tasks while maintaining domain regularities: the controller learns the domain regularities and writes the task variability into memory. A future direction is a model that can handle new tasks without adapting its parameters.

