1
Graph Attention Networks
Authors: Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, Yoshua Bengio
2
Outline
Goal
Challenges
Method
Discussion (advantages)
Experiment
Future work (limitations)
Summary
3
Goal
Build a model for node classification of graph-structured data
Compute the hidden representation of each node by attending over its neighbors
4
Challenges
Limitations of existing work:
- Depend on the specific graph structure, so models cannot be applied directly to unseen graphs
- Sample a fixed-size neighborhood of each node (e.g., GraphSAGE)
Contributions of the proposed model:
- Generalizes to completely unseen graphs
- Applies to graph nodes with different degrees
5
Method: graph attention layer
Input: a set of node features h = {h_1, …, h_N}, h_i ∈ R^F
Output: a set of new node features h′ = {h′_1, …, h′_N}, h′_i ∈ R^{F′}, incorporating neighborhood information
Attention coefficients: e_ij = a(W h_i, W h_j) indicates the importance of node j's features to node i
6
Attention coefficients
W is a weight matrix applying a shared linear transformation; a is a shared attentional mechanism
For node i, attention is computed over the first-order neighbors of i (including i itself)
After applying the LeakyReLU activation and softmax normalization, the coefficients are
α_ij = exp(LeakyReLU(aᵀ[W h_i ∥ W h_j])) / Σ_{k∈N_i} exp(LeakyReLU(aᵀ[W h_i ∥ W h_k]))
where ∥ is the concatenation operation
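As a concrete illustration, here is a minimal NumPy sketch of this coefficient computation for one node. It is not the authors' implementation; `neighbors` is an assumed adjacency-list structure in which each node's list includes the node itself.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    # LeakyReLU with the paper's negative slope of 0.2
    return np.where(x > 0, x, slope * x)

def attention_coefficients(h, W, a, neighbors, i):
    """Normalized attention coefficients alpha_ij of node i over its
    first-order neighborhood N_i (assumed to include i itself)."""
    Wh = h @ W.T                                          # shared linear transform, shape (N, F')
    scores = np.array([a @ np.concatenate([Wh[i], Wh[j]]) # a^T [W h_i || W h_j]
                       for j in neighbors[i]])
    e = leaky_relu(scores)                                # unnormalized coefficients e_ij
    alpha = np.exp(e - e.max())                           # softmax over the neighborhood
    return alpha / alpha.sum()
```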
7
Multi-head attention
For a single attention head, the output feature of node i is
h′_i = σ(Σ_{j∈N_i} α_ij W h_j)
For K independent attention heads, the per-head outputs are concatenated:
h′_i = ∥_{k=1}^{K} σ(Σ_{j∈N_i} α_ij^k W^k h_j)
For the final layer, concatenation is replaced with averaging:
h′_i = σ((1/K) Σ_{k=1}^{K} Σ_{j∈N_i} α_ij^k W^k h_j)
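Continuing the sketch above (reusing `attention_coefficients` and the NumPy import), a multi-head layer might aggregate as follows; the `final` flag switches between the concatenation and averaging schemes:

```python
def gat_layer(h, Ws, a_s, neighbors, sigma, final=False):
    """One multi-head graph attention layer.
    Ws / a_s hold the per-head weight matrices W^k and attention vectors a^k."""
    heads = []
    for W, a in zip(Ws, a_s):
        Wh = h @ W.T
        out = np.zeros_like(Wh)
        for i in range(len(h)):
            alpha = attention_coefficients(h, W, a, neighbors, i)
            for coeff, j in zip(alpha, neighbors[i]):
                out[i] += coeff * Wh[j]                 # sum_j alpha_ij * W h_j
        heads.append(out)
    if final:
        # Final (prediction) layer: average the heads, then apply the nonlinearity
        return sigma(np.mean(heads, axis=0))
    # Hidden layer: apply the nonlinearity per head, then concatenate
    return np.concatenate([sigma(o) for o in heads], axis=1)
```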
8
Advantages
Computationally efficient (operations parallelize across edges and nodes)
Time complexity of a single head: O(|V|·F·F′ + |E|·F′)
Assigns different weights to nodes within the same neighborhood
Does not require the graph to be undirected
Operates on the entire neighborhood (no neighbor sampling)
Assumes no ordering of the neighboring nodes
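To make the complexity claim concrete, here is a back-of-the-envelope operation count for a single head; the graph dimensions are assumed Cora-scale values chosen purely for illustration, not figures from the paper:

```python
# Operation count for one attention head: O(|V|*F*F' + |E|*F').
# V, E, F_in are assumed Cora-scale numbers, used only for illustration.
V, E, F_in, F_out = 2708, 5429, 1433, 8
linear_ops = V * F_in * F_out      # W h_i for every node: ~31M multiply-adds
attention_ops = 2 * E * F_out      # a^T [W h_i || W h_j] per edge: ~87K multiply-adds
print(f"{linear_ops:,} vs {attention_ops:,}")
```

The shared linear transform dominates; the per-edge attention scores add comparatively little cost.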
9
Datasets for the experiment
Tested on four datasets: Cora, Citeseer, and Pubmed (transductive) and PPI (inductive)
10
Experiment setup and evaluation metrics
Transductive learning: a two-layer GAT model
- For Cora and Citeseer: first layer K = 8, F′ = 8, ELU (exponential linear unit); second layer (classifier) K = 1, F′ = #classes, softmax
- For Pubmed, the only change is K = 8 in the classification layer
- Metric: mean classification accuracy
Inductive learning: a three-layer GAT model
- First two layers: K = 4, F′ = 256, ELU (exponential linear unit)
- Third layer (classifier): K = 6, F′ = 121, logistic sigmoid
- Metric: micro-averaged F1
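For reference, the transductive (Cora/Citeseer) architecture can be sketched with PyTorch Geometric's GATConv layer. This is a hypothetical reimplementation under that library, not the authors' original code:

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv  # assumes PyTorch Geometric is available

class TransductiveGAT(torch.nn.Module):
    """Two-layer GAT matching the Cora/Citeseer setup above."""
    def __init__(self, in_feats, n_classes):
        super().__init__()
        # First layer: K = 8 heads with F' = 8 features each, outputs concatenated
        self.gat1 = GATConv(in_feats, 8, heads=8, concat=True)
        # Classification layer: K = 1 head, F' = number of classes
        self.gat2 = GATConv(8 * 8, n_classes, heads=1, concat=False)

    def forward(self, x, edge_index):
        x = F.elu(self.gat1(x, edge_index))                      # ELU after the first layer
        return F.log_softmax(self.gat2(x, edge_index), dim=-1)   # softmax classifier
```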
11
Experiment results (Mean classification accuracy)
12
Experiment results (Micro-average F1)
13
Future work
In practice, handle larger batch sizes, since the current implementation only supports sparse matrix multiplication for rank-2 tensors
Analyze the attention over neighboring nodes for model interpretability
Incorporate edge features
14
Summary
Graph attention networks, built from graph attention layers:
Deal with neighborhoods of different sizes
Do not need to know the entire graph structure upfront