N1H111SM's Miniverse

2020/02/28 Share

Materials:

# Backgrounds

## Basics

History 早期的工作可以被归结为Recurrent GNNs (RecGNN)，节点在每个iteration和周围的节点传递信息直到节点的表示拟合，这种方法在计算上开销比较大，后来的研究者为了解决这一问题也有工作。CNN在视觉领域取得成功之后，研究者在GNN上重新定义了convolution操作并且有很多的工作围绕ConvGNN展开。在ConvGNN中可以分为两个小的子方向：spatial-based ConvGNNs以及spectral-based ConvGNNs。除此之外，还有很多工作关注在Graph AutoEncoders以及Spatial-Temporal GNNs，这些工作都是在前两个方向的基础上进行的。

GNN vs Graph Kernel 后者是以往解决graph classification的主流方法，这个方法通过核函数将图结构映射到一个低维的向量空间，然后利用ML方法例如SVM进行分类。和GNN不同的是，Graph Kernel方法里的核函数是人为定义的而不是可学习的。GNN则将Graph Representation直接进行端到端的训练学习。

## Training Frameworks

Semi-supervised learning for node-level classification. 在节点上进行的半监督分类问题。半监督意味着在一个图中有一些节点进行了标注，而另外一些节点没有标注。通常使用ConvGNN进行未标注节点的分类。

Supervised learning for graph-level classification. Graph-level的有监督分类。通常使用graph convolution + graph pooling/readout layer进行。graph pooling layer通常被用来进行高维信息的聚合，readout layer将每个节点的representation映射到graph-level的representation。

Unsupervised learning for graph embedding. 在没有标注的情况下学习图像的embedding。一种方法采用autoencoder；另一种方法采用negative sampling：采样一些互相没有连接的node pair作为负样本，有连接的node pair作为正样本。

# RecGNNs

RecGNNs是最早开始在图结构上采用神经网络的，可以说是GNNs方法的先驱。早期的工作因为计算条件的限制，只能在directed acyclic graphs上进行。

Scarselli et al.提出的GNN*算是图神经网络的开山之作，它采用了一种information diffusion mechanism - it updates nodes’ states by exchanging neighborhood information recurrently until a stable equilibrium is reached:

To ensure convergence, the recurrent function $f(\cdot)$ must be a contraction mapping, which shrinks the distance between two points after projecting them into a latent space. In the case of $f(\cdot)$ being a neural network, a penalty term has to be imposed on the Jacobian matrix of parameters.

Graph Echo State Network (GraphESN) 在GCN*的基础上进行了改进从而达到了更好的训练效率 - It implements a contractive state transition function to recurrently update node states until the global graph state reaches convergence. Afterward, the output layer is trained by taking the fixed node states as inputs. （该工作的引用不是很高）

Gated Graph Neural Network (GGNN) 采用了GRU作为recurrent function.

Stochastic Steady-state Embedding (SSE)对于更大的图上有比较好的表现。他对于节点hidden state的更新采用的是随机且异步(stochastic and asynchronous)的方式：它会sample一部分的节点进行更新，并且sample另一部分的节点进行梯度计算。为了让算法更加稳定，在进行hidden state的更新时还会采用smoothing的方法。

# ConvGNNs

## Spectral-based ConvGNNs

Spectral-based approaches define graph convolutions by introducing filters from the perspective of graph signal processing where the graph convolutional operation is interpreted as removing noises from graph signals.

Spectral Convolutional Neural Network (Spectral CNN) 假设了多层的可训练multi-channel filter $\mathbf{g}_{\theta}=\mathbf{\Theta}_{i, j}^{(k)}$. Spectral CNN的图卷积操作定义为：

where $k$ is the layer index, $\mathbf{H}^{(k-1)} \in \mathbf{R}^{n \times f_{k-1}}$ is the input graph signal, $\mathbf{H}^{(0)}=\mathbf{X}$, $f_{k-1}$ is the number of input channels and $f_k$ is the number of output channels, $\Theta_{i, j}^{(k)}$ is a diagonal matrix filled with learnable parameters.

Chebyshev Spectral CNN (ChebNet) 认为non-parametric filters有两点不足：(i) they are not localized in space; (ii) their learning complexity is in $O(n)$. 根据上一篇博客中的Lemma 1，我们可以通过高次$L_n$来得到具有局部连通性的filter. 所以我们可以如下设计一个多channel的filter函数：

Chebyshev Polynomials的递归定义如下：

As an improvement over Spectral CNN, the filters defined by ChebNet are localized in space, which means filters can extract local features independently of the graph size.

Graph Convolutional Network (GCN) 对ChebNet进行了一阶近似，通过固定$K=1$，$\lambda_\max=2$，我们能够将上式改写成：

To restrain the number of parameters and avoid over-fitting, GCN further assume $\theta =\theta_0=-\theta_1$, leading to the following definition of a graph convolution,

## Spatial-based ConvGNNs

Spatial-based approaches inherit ideas from RecGNNs to define graph convolutions by information propagation. The spatial graph convolutional operation essentially propagates node information along edges.

Neural Network for Graphs (NN4G) 是和GNN*同一时间提出来的模型。和RecGNNs不同的是，NN4G层之间不共享参数，除此之外，层与层之间还有residual connection。这和之前的GCN有很强的相似性，只是NN4G没有采用normalization的技术，这会导致层与层之间的表示在scale上有非常大的差别。从这里我们可以想到，一定要注意最简单的常识性问题。例如算法是否收敛，设计的模型是否会导致参数的爆炸，每一层的scale是多少，都要有一些意识。深度学习虽然像是在很多基本块中找合适的零件搭积木，但是关于模型设计的技术细节，还是需要多加注意。

Message Passing Neural Network (MPNN) outlines a general framework of spatial-based ConvGNNs. 这篇工作将图卷积看作是message passing的一种方式，节点信息通过边进行传递。message passing机制定义为以下：

Graph Isomorphism Network (GIN) finds that previous MPNN-based methods are incapable of distinguishing different graph structures based on the graph embedding they produced. 为了弥补这个缺点，GIN将中心节点的权重用一个可学习参数来代替：

GraphSage 不直接让所有的邻接节点进行message passing，因为一个节点可能有非常不同的邻接节点数，如果直接全部计算会有较大的计算开销。因此这篇工作采取了从一个节点的neighborhodd中采样固定数目节点的方法，其中$S_{\mathcal{N}(v)}$表示采样，其中的aggregation function $f_k$必须是一个顺序无关的函数例如mean, sum or max：

Graph Attention Network (GAT) 和之前的模型不同，它认为message passing中的不同节点应该有不同的权重。这权重也是通过训练学习得到的。GAT中的图卷积定义为：

Gated Attention Network (GAAN) introduces a self-attention mechanism which computes an additional attention score for each attention head. There are other graph attention models which might be of interest. However, they do not belong to the ConvGNN framework.

### Improvement in terms of training efficiency

Fast Learning with Graph Convolutional Network (Fast-GCN) 和GraphSage不同，它在每一层sample固定数目个node来计算图卷积，而不是对每个node sample固定数目个neighbor。It interprets graph convolutions as integral transforms of embedding functions of nodes under probability measures. Monte Carlo approximation and variance reduction techniques are employed to facilitate the training process.

Stochastic Training of Graph Convolutional Networks (StoGCN)

### Comparison between spectral and spatial models

• 首先，spectral-based models的效率不如spatial-based models。spectral-based model要么需要执行特征向量计算，要么需要同时处理整个图形。空间模型则更可扩展到大型图，因为它们通过信息传播直接在图域中执行卷积。可以在一批节点而不是整个图中执行计算。
• 依赖于图傅立叶基础的spectral-based model无法很好地推广到新图。因为图拉普拉斯矩阵因不同的图而不同，一旦产生变化需要进行costly的谱分解运算。spatial-based model在每个节点上局部执行图卷积，可以在不同位置和结构之间轻松共享权重。
• Third, spectral-based models are limited to operate on undirected graphs. Spatial-based models are more flexible to handle multi-source graph inputs such as edge inputs, directed graphs, signed graphs, and heterogeneous graphs, because these graph inputs can be incorporated into the aggregation function easily.

## Graph Pooling

Chebyshev Spectral CNN (ChebNet) 中提出了一种有趣的方法：他们会首先对图的节点进行重新排列。Input graphs are first coarsened into multiple levels by the Graclus. After coarsening, the nodes of the input graph and its coarsened version are rearranged into a balanced binary tree. Arbitrarily aggregating the balanced binary tree from bottom to top will arrange similar nodes together. Pooling such a rearranged signal is much more efficient than pooling the original. SortPooling通过另一种机制来对节点进行排序。

The aforementioned pooling methods mainly consider graph features and ignore the structural information of graphs. DiffPool 提出了一种可微的pooling方法。

## Discussion of Theoretical Aspects

Receptive Field. 指的是中心节点能够“看”到多远的邻接节点。对于GCN类似的方法，每多一层graph convolution，receptive field的半径就增大1. 所以对于一个连通图来说，至少存在一个最大层数，能够让任意一个节点作为中心节点看到全局的图结构信息。

VC Dimension. Vapnik–Chervonenkis (VC) dimension is a measure of the capacity (complexity, expressive power, richness, or flexibility) of a space of functions that can be learned by a statistical classification algorithm. It is defined as the cardinality of the largest set of points that the algorithm can shatter. 有研究者推导出GNN*如果使用sigmoid or tangent hyperbolic activation，则VC dimension为$O(p^4n^2)$，这意味着如果使用这些非线性函数，那么随着Model parameter number $p$还有节点数$n$的增加，模型的复杂度会快速升高。

Graph Isomorphism. 如果两个图topologically identical，那我们称这两个图是同构的。Graph Isomorphism Network (GIN) 工作证明如果一个GNN将两个图$G_1$和$G_2$分别映射到两个不同的embedding上，那么这两个图可以通过WL-test被识别成两个异构图。该工作同时证明了GCN以及GraphSage是没有办法辨别不同的图结构的。同时该工作更进一步证明了如果一个GNN的aggregation function以及readout function都是单射（injective），那么这个GNN在辨别两个异构图的能力上和WL-test是没有差别的。

Equivariance and Invariance. GNN在处理node-level task时必须是equivariant function，在处理graph-level task时必须是invariant function. 若用数学语言表达，假设$f(\mathbf{A}, \mathbf{X}) \in R^{n \times d}$ 是一个GNN函数，$\mathbf Q$为任意的置换矩阵。如果$f\left(\mathbf{Q} \mathbf{A} \mathbf{Q}^{T}, \mathbf{Q X}\right)=\mathbf{Q} f(\mathbf{A}, \mathbf{X})$，则为equivariant；如果$f\left(\mathbf{Q} \mathbf{A} \mathbf{Q}^{T}, \mathbf{Q X}\right)= f(\mathbf{A}, \mathbf{X})$，则为invariant。这也就是说，对于任意的节点顺序，GNN函数必须给出相对同节点的相同的结果。

# Spatial-Temporal GNNs

Graphs in many real-world applications are dynamic both in terms of graph structures and graph inputs. Spatial-temporal graph neural networks (STGNNs) occupy important positions in capturing the dynamicity of graphs. Methods under this category aim to model the dynamic node inputs while assuming interdependency between connected nodes. STGNNs capture spatial and temporal dependencies of a graph simultaneously. The task of STGNNs can be forecasting future node values or labels, or predicting spatial-temporal graph labels. STGNNs follow two directions, RNN-based methods and CNN-based methods.

RNN-based approaches suffer from time-consuming iterative propagation and gradient explosion/vanishing issues. As alternative solutions, CNN-based approaches tackle spatial-temporal graphs in a non-recursive manner with the advantages of parallel computing, stable gradients, and low memory requirements.

# Future Directions

Model Depth. CNN在视觉领域证实了网络层越多表现越好。然而研究工作显示GNN正相反。因为GNN中的节点表示不断地和周围进行传递，在理论上，足够多层的网络会让所有的节点表示拟合到一个single point. This raises the question of whether going deep is still a good strategy for learning graph data.

Salability Trade-off. The scalability of GNNs is gained at the price of corrupting graph completeness. Whether using sampling or clustering, a model will lose part of the graph information. 采样会导致一个节点可能错过一个非常重要的节点；而聚类则导致一个节点可能失去它独特的pattern。如何平衡算法的可扩展性和图完整性是未来的一大方向。

Heterogenity. 现在的GNN直接假设同质的图，但对于异质图来说，节点、边可能有不同的表示和输入（图像或文字）。

Dynamicity. Graphs are in nature dynamic in a way that nodes or edges may appear or disappear, and that node/edge inputs may change time by time. New graph convolutions are needed to adapt to the dynamicity of graphs. Although the dynamicity of graphs can be partly addressed by STGNNs, few of them consider how to perform graph convolutions in the case of dynamic spatial relations.