1
Graph Attention Networks
Authors: Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, Yoshua Bengio
2
Outline
Goal
Challenges
Method
Discussion (advantages)
Experiment
Future work (limitations)
Summary
3
Goal
Build a model for node classification of graph-structured data
Compute the hidden representation of each node by attending over its neighbors
4
Challenges
Limitations of existing work:
- Depend on the specific graph structure, so models cannot be applied directly to unseen graphs
- Sample a fixed-size neighborhood of each node (e.g., GraphSAGE)
Contributions of the proposed model:
- Generalizes to completely unseen graphs
- Applies to graph nodes with different degrees
5
Method: graph attention layer
Input: a set of node features h = {h_1, …, h_N}, h_i ∈ R^F
Output: a set of new node features h′ = {h′_1, …, h′_N}, h′_i ∈ R^{F′}, incorporating neighborhood information
Attention coefficients: e_ij = a(W h_i, W h_j) indicates the importance of node j's features to node i
6
Attention coefficients
W is a weight matrix applying a shared linear transformation; a is a shared attentional mechanism
For node i, attention is computed over the first-order neighbors of i (including i itself)
After applying the LeakyReLU activation and softmax normalization, the coefficients are
α_ij = exp(LeakyReLU(aᵀ[W h_i ∥ W h_j])) / Σ_{k∈N_i} exp(LeakyReLU(aᵀ[W h_i ∥ W h_k]))
where ∥ is the concatenation operation
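As a concrete illustration, here is a minimal NumPy sketch of this coefficient computation for one node. It is not the authors' implementation; `neighbors` is an assumed adjacency-list structure in which each node's list includes the node itself.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    # LeakyReLU with the paper's negative slope of 0.2
    return np.where(x > 0, x, slope * x)

def attention_coefficients(h, W, a, neighbors, i):
    """Normalized attention coefficients alpha_ij of node i over its
    first-order neighborhood N_i (assumed to include i itself)."""
    Wh = h @ W.T                                          # shared linear transform, shape (N, F')
    scores = np.array([a @ np.concatenate([Wh[i], Wh[j]]) # a^T [W h_i || W h_j]
                       for j in neighbors[i]])
    e = leaky_relu(scores)                                # unnormalized coefficients e_ij
    alpha = np.exp(e - e.max())                           # softmax over the neighborhood
    return alpha / alpha.sum()
```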
7
Multi-head attention
For a single attention head, the output feature of node i is
h′_i = σ(Σ_{j∈N_i} α_ij W h_j)
For K independent attention heads, the per-head outputs are concatenated:
h′_i = ∥_{k=1}^{K} σ(Σ_{j∈N_i} α_ij^k W^k h_j)
For the final layer, concatenation is replaced with averaging:
h′_i = σ((1/K) Σ_{k=1}^{K} Σ_{j∈N_i} α_ij^k W^k h_j)
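Continuing the sketch above (reusing `attention_coefficients` and the NumPy import), a multi-head layer might aggregate as follows; the `final` flag switches between the concatenation and averaging schemes:

```python
def gat_layer(h, Ws, a_s, neighbors, sigma, final=False):
    """One multi-head graph attention layer.
    Ws / a_s hold the per-head weight matrices W^k and attention vectors a^k."""
    heads = []
    for W, a in zip(Ws, a_s):
        Wh = h @ W.T
        out = np.zeros_like(Wh)
        for i in range(len(h)):
            alpha = attention_coefficients(h, W, a, neighbors, i)
            for coeff, j in zip(alpha, neighbors[i]):
                out[i] += coeff * Wh[j]                 # sum_j alpha_ij * W h_j
        heads.append(out)
    if final:
        # Final (prediction) layer: average the heads, then apply the nonlinearity
        return sigma(np.mean(heads, axis=0))
    # Hidden layer: apply the nonlinearity per head, then concatenate
    return np.concatenate([sigma(o) for o in heads], axis=1)
```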
8
Advantages
Computationally efficient (operations parallelize across edges and nodes)
Time complexity of a single head: O(|V|·F·F′ + |E|·F′)
Assigns different weights to nodes within the same neighborhood
Does not require the graph to be undirected
Operates on the entire neighborhood (no neighbor sampling)
Assumes no ordering of the neighboring nodes
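To make the complexity claim concrete, here is a back-of-the-envelope operation count for a single head; the graph dimensions are assumed Cora-scale values chosen purely for illustration, not figures from the paper:

```python
# Operation count for one attention head: O(|V|*F*F' + |E|*F').
# V, E, F_in are assumed Cora-scale numbers, used only for illustration.
V, E, F_in, F_out = 2708, 5429, 1433, 8
linear_ops = V * F_in * F_out      # W h_i for every node: ~31M multiply-adds
attention_ops = 2 * E * F_out      # a^T [W h_i || W h_j] per edge: ~87K multiply-adds
print(f"{linear_ops:,} vs {attention_ops:,}")
```

The shared linear transform dominates; the per-edge attention scores add comparatively little cost.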
9
Datasets for the experiment
Tested on four datasets: Cora, Citeseer, and Pubmed (transductive) and PPI (inductive)
10
Experiment setup and evaluation metrics
Transductive learning: a two-layer GAT model
- For Cora and Citeseer: first layer K = 8, F′ = 8, ELU (exponential linear unit); second layer (classifier) K = 1, F′ = #classes, softmax
- For Pubmed, the only change is K = 8 in the classification layer
- Metric: mean classification accuracy
Inductive learning: a three-layer GAT model
- First two layers: K = 4, F′ = 256, ELU (exponential linear unit)
- Third layer (classifier): K = 6, F′ = 121, logistic sigmoid
- Metric: micro-averaged F1
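For reference, the transductive (Cora/Citeseer) architecture can be sketched with PyTorch Geometric's GATConv layer. This is a hypothetical reimplementation under that library, not the authors' original code:

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv  # assumes PyTorch Geometric is available

class TransductiveGAT(torch.nn.Module):
    """Two-layer GAT matching the Cora/Citeseer setup above."""
    def __init__(self, in_feats, n_classes):
        super().__init__()
        # First layer: K = 8 heads with F' = 8 features each, outputs concatenated
        self.gat1 = GATConv(in_feats, 8, heads=8, concat=True)
        # Classification layer: K = 1 head, F' = number of classes
        self.gat2 = GATConv(8 * 8, n_classes, heads=1, concat=False)

    def forward(self, x, edge_index):
        x = F.elu(self.gat1(x, edge_index))                      # ELU after the first layer
        return F.log_softmax(self.gat2(x, edge_index), dim=-1)   # softmax classifier
```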
11
Experiment results (Mean classification accuracy)
12
Experiment results (Micro-average F1)
13
Future work
In practice, handle larger batch sizes, since the current implementation only supports sparse matrix multiplication for rank-2 tensors
Analyze the attention over neighboring nodes for model interpretability
Incorporate edge features
14
Summary
Graph attention networks, built from graph attention layers:
Deal with neighborhoods of different sizes
Do not need to know the entire graph structure upfront