N1H111SM's Miniverse

Graph Convolutional Neural Networks

2020/02/25

Motivation

This paper introduces the now widely used GCN, obtained by taking a first-order approximation of ChebNet. This yields two things: first, a scalable framework; second, a computational complexity that is linear in the number of graph edges.

Prior work on semi-supervised node classification mostly added a graph-based regularization term $\mathcal L_{reg}$ to the loss function:

$$\mathcal L = \mathcal L_0 + \lambda \mathcal L_{reg}, \quad \text{with} \quad \mathcal L_{reg} = \sum_{i,j} A_{ij} \|f(X_i) - f(X_j)\|^2 = f(X)^\top \Delta f(X)$$

where $\Delta = D-A$ denotes the unnormalized graph Laplacian. The formulation of this equation relies on the assumption that connected nodes in the graph are likely to share the same label. This assumption, however, might restrict modeling capacity, as graph edges need not necessarily encode node similarity, but could contain additional information.

Architecture

Motivated by the above, this paper proposes a neural network architecture $f(X,A)$ that avoids the use of an explicit regularization term. Its layer-wise propagation rule is

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)$$

where $\tilde A = A + I_N$ is the adjacency matrix with added self-loops and $\tilde D$ is its degree matrix, so $\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$ is the normalized adjacency matrix. $H^{(l)}\in \mathbb R^{N\times D}$ is the matrix of node representations at layer $l$, and $W^{(l)} \in \mathbb R ^{D\times D}$ is the parameter matrix of layer $l$. Note that the parameter matrix is independent of the graph structure (its shape depends only on the feature dimension, not on the number of nodes), which is what gives GCN its scalability. The elegance of this equation is that it can be derived from spectral-based methods.
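The propagation rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation; the graph, features, and weights are toy placeholders, and ReLU is assumed as the nonlinearity $\sigma$:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: sigma(D~^{-1/2} A~ D~^{-1/2} H W),
    with A~ = A + I (self-loops) and ReLU as the nonlinearity."""
    A_tilde = A + np.eye(A.shape[0])            # add self-loops
    d = A_tilde.sum(axis=1)                     # degrees of A~
    D_inv_sqrt = np.diag(d ** -0.5)
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # symmetric normalization
    return np.maximum(A_hat @ H @ W, 0)         # ReLU

# Toy 3-node path graph 0-1-2 with one-hot node features.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.eye(3)                # one-hot features, D = 3
W = np.ones((3, 2))          # toy weight matrix mapping D=3 -> D'=2
Z = gcn_layer(A, H, W)
print(Z.shape)               # (3, 2)
```

Note that `W` never touches the number of nodes $N$; only `A_hat` does, which is the structural independence mentioned above.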

Derivation from ChebNet

ChebNet is a method that uses a $K$th-order approximation via Chebyshev polynomials of a matrix. In short, it approximates the filter $g_\theta$ in the graph Fourier transform:

$$g_{\theta^\prime} \star x \approx \sum_{k=0}^{K} \theta^\prime_k T_k(\tilde L)\, x, \quad \tilde L = \frac{2}{\lambda_{\max}} L - I_N$$

Note that this expression is now K-localized since it is a Kth-order polynomial in the Laplacian, i.e. it depends only on nodes that are at maximum K steps away from the central node (Kth-order neighborhood).

A $K$th-order approximation means that a node's representation is only influenced by nodes at most $K$ hops away. A key idea of this paper is that we do not need to build the $K$th-order approximation into a single filter; instead, we can restrict each filter to a first-order approximation, and stacking $K$ such layers still achieves the effect of a $K$th-order neighborhood. Further approximating $\lambda_{\max}\approx 2$ in ChebNet and constraining the two free parameters of the first-order (linear) approximation to be opposite, $\theta = \theta^ \prime_0 = -\theta^ \prime_1$, we arrive at the single-channel convolution underlying the GCN defined at the start:

$$g_\theta \star x \approx \theta\left(I_N + D^{-\frac{1}{2}} A D^{-\frac{1}{2}}\right) x$$
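The collapse from the first-order Chebyshev expansion to the single-parameter form can be checked numerically. The sketch below, on an assumed toy ring graph, uses $T_0(\tilde L) = I$, $T_1(\tilde L) = \tilde L$ with $\lambda_{\max} = 2$ and $\theta = \theta'_0 = -\theta'_1$:

```python
import numpy as np

# Numerically verify: with lambda_max ~= 2 and theta = theta0 = -theta1,
# theta0*T0(L~)x + theta1*T1(L~)x == theta*(I + D^{-1/2} A D^{-1/2})x.
N = 6
A = np.zeros((N, N))
for i in range(N):                                 # ring graph (toy example)
    A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1.0
D_is = np.diag(A.sum(1) ** -0.5)
A_norm = D_is @ A @ D_is
L = np.eye(N) - A_norm             # normalized Laplacian
L_tilde = L - np.eye(N)            # 2L/lambda_max - I with lambda_max = 2

rng = np.random.default_rng(0)
x = rng.standard_normal(N)
theta = 0.7
cheb = theta * x + (-theta) * (L_tilde @ x)   # T0(L~) = I, T1(L~) = L~
gcn = theta * (np.eye(N) + A_norm) @ x
print(np.allclose(cheb, gcn))      # True
```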

Note that the transformation matrix in the middle has all of its eigenvalues in the interval $[0,2]$, which can lead to numerical instability when the operator is applied repeatedly. This is fixed by the renormalization trick $I_{N}+D^{-\frac{1}{2}} A D^{-\frac{1}{2}} \rightarrow \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$.
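The effect of the renormalization trick on the spectrum can be seen directly. A small sketch, on an assumed ring graph where the spectra are easy to reason about, compares the eigenvalue ranges of the two operators:

```python
import numpy as np

N = 20
A = np.zeros((N, N))
for i in range(N):                                 # ring graph: every degree is 2
    A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1.0

d = A.sum(1)
D_inv_sqrt = np.diag(d ** -0.5)
M = np.eye(N) + D_inv_sqrt @ A @ D_inv_sqrt        # operator before the trick

A_t = A + np.eye(N)                                # A~ = A + I
d_t = A_t.sum(1)
Dt_inv_sqrt = np.diag(d_t ** -0.5)
M_hat = Dt_inv_sqrt @ A_t @ Dt_inv_sqrt            # renormalized operator

print(np.linalg.eigvalsh(M).min(), np.linalg.eigvalsh(M).max())
print(np.linalg.eigvalsh(M_hat).min(), np.linalg.eigvalsh(M_hat).max())
```

On this graph the first operator attains the full range $[0,2]$, while the renormalized one stays within $[-1,1]$, so repeated multiplication no longer inflates the signal.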

Generalizing this single-channel convolution to a signal $X\in \mathbb R^{N\times C}$ with $C$ input channels and $F$ filters gives the formula we now have:

$$Z = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} X \Theta$$

Its computational complexity is $O(|E|FC)$, since the matrix $\tilde A$ is a sparse matrix.
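In practice the linear-in-$|E|$ cost comes from storing $\hat A = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$ in a sparse format, so that $\hat A X$ only touches the nonzero entries. A sketch with SciPy sparse matrices, on an assumed random graph with toy dimensions $N{=}1000$, $C{=}16$, $F{=}8$:

```python
import numpy as np
from scipy import sparse

N, C, F = 1000, 16, 8
rng = np.random.default_rng(0)

# Random sparse symmetric adjacency with roughly 5 edges per node.
rows = rng.integers(0, N, size=5 * N)
cols = rng.integers(0, N, size=5 * N)
A = sparse.csr_matrix((np.ones(5 * N), (rows, cols)), shape=(N, N))
A = ((A + A.T) > 0).astype(float)

A_tilde = A + sparse.eye(N, format="csr")          # add self-loops
d = np.asarray(A_tilde.sum(axis=1)).ravel()
D_inv_sqrt = sparse.diags(d ** -0.5)
A_hat = (D_inv_sqrt @ A_tilde @ D_inv_sqrt).tocsr()

X = rng.standard_normal((N, C))
Theta = rng.standard_normal((C, F))
Z = A_hat @ (X @ Theta)    # sparse-dense product: cost proportional to nnz(A~)
print(Z.shape)             # (1000, 8)
```

Grouping the products as $\hat A (X\Theta)$ keeps the dense part at $O(NCF)$ and the sparse part at $O(|E|F)$.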

Semi-Supervised Node Classification

The forward model of the GCN for this problem is a simple two-layer network:

$$Z = \operatorname{softmax}\left(\hat{A}\, \operatorname{ReLU}\left(\hat{A} X W^{(0)}\right) W^{(1)}\right), \quad \hat{A} = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$$

It is trained with the following cross-entropy (XE) loss function over the labeled node set $\mathcal Y_L$:

$$\mathcal L = -\sum_{l \in \mathcal Y_L} \sum_{f=1}^{F} Y_{lf} \ln Z_{lf}$$
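The two-layer forward model and the masked cross-entropy loss fit in a short NumPy sketch. The graph, dimensions ($C{=}3$ input features, 4 hidden units, $F{=}2$ classes), and the choice of which nodes are labeled are all toy assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))   # numerically stable softmax
    return e / e.sum(axis=1, keepdims=True)

def two_layer_gcn(A_hat, X, W0, W1):
    """Z = softmax(A_hat ReLU(A_hat X W0) W1)."""
    H = np.maximum(A_hat @ X @ W0, 0)
    return softmax(A_hat @ H @ W1)

def masked_xent(Z, Y, labeled):
    """Cross-entropy summed over the labeled node set only."""
    return -np.sum(Y[labeled] * np.log(Z[labeled] + 1e-12))

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],                        # toy 4-node path graph
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_t = A + np.eye(4)
D_is = np.diag(A_t.sum(1) ** -0.5)
A_hat = D_is @ A_t @ D_is

X = rng.standard_normal((4, 3))
W0 = rng.standard_normal((3, 4)) * 0.1
W1 = rng.standard_normal((4, 2)) * 0.1
Z = two_layer_gcn(A_hat, X, W0, W1)

Y = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)
loss = masked_xent(Z, Y, labeled=[0, 1])   # only nodes 0 and 1 are labeled
print(Z.shape)                             # (4, 2); rows sum to 1
```

Unlabeled nodes still participate in the forward pass through $\hat A$; only the loss is restricted to $\mathcal Y_L$, which is what makes the setup semi-supervised.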


Limitations and Future Work

  • For very large and densely connected graph datasets, further approximations might be necessary.
  • Our framework currently does not naturally support edge features and is limited to undirected graphs (weighted or unweighted).
  • We implicitly assume locality (dependence on the Kth-order neighborhood for a GCN with K layers) and equal importance of self-connections vs. edges to neighboring nodes. It might be beneficial to introduce a trade-off parameter $\lambda $ in the definition of $\tilde A$: $\tilde A = A + \lambda I_N$.
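The proposed trade-off parameter only changes how the self-loops are weighted before normalization. A minimal sketch of this hypothetical generalization (the function name and the toy graph are assumptions; $\lambda = 1$ recovers the standard $\tilde A = A + I_N$):

```python
import numpy as np

def renormalize(A, lam=1.0):
    """D~^{-1/2} (A + lam*I) D~^{-1/2} with a self-loop weight lam;
    lam = 1 recovers the paper's renormalization trick."""
    A_t = A + lam * np.eye(A.shape[0])
    d = A_t.sum(1)
    D_is = np.diag(d ** -0.5)
    return D_is @ A_t @ D_is

A = np.array([[0., 1.], [1., 0.]])
print(renormalize(A, lam=1.0))   # standard GCN normalization
print(renormalize(A, lam=2.0))   # heavier weight on self-connections
```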