Weisfeiler Lehman Framework

This Weisfeiler Lehman framework operates on top of existing graph kernels and is inspired by the Weisfeiler-Lehman test of graph isomorphism [WL68]. The key idea of the Weisfeiler-Lehman algorithm is to replace the label of each vertex with a multiset label consisting of the original label of the vertex and the sorted set of labels of its neighbors. The resultant multiset is then compressed into a new, short label. This relabeling procedure is then repeated for \(h\) iterations. Note that this procedure is performed simultaneously on all input graphs. Therefore, two vertices from different graphs will get identical new labels if and only if they have identical multiset labels.

More formally, given a graph \(G=(V,E)\) endowed with a labeling function \(\ell=\ell_0\), the Weisfeiler-Lehman graph of \(G\) at height \(i\) is a graph \(G_i=(V,E)\) endowed with a labeling function \(\ell_i\) which has emerged after \(i\) iterations of the relabeling procedure described above.

The Weisfeiler-Lehman sequence up to height \(h\) of \(G\) consists of the Weisfeiler-Lehman graphs of \(G\) at heights from \(0\) to \(h\), \(\{ G_0,G_1,\ldots,G_h\}\).

Definition: Weisfeiler-Lehman Framework

Let \(k\) be any kernel for graphs, that we will call the base kernel. Then the Weisfeiler-Lehman kernel with \(h\) iterations with the base kernel \(k\) between two graphs \(G\) and \(G'\) is defined as

\begin{equation} k_{WL}(G,G') = k(G_0,G_0') + k(G_1,G_1') + \ldots + k(G_h,G_h') \end{equation}

where \(h\) is the number of Weisfeiler-Lehman iterations, and \(\{ G_0,G_1,\ldots,G_h\}\) and \(\{ G_0',G_1',\ldots,G_h'\}\) are the Weisfeiler-Lehman sequences of \(G\) and \(G'\) respectively.

From the above definition, it is clear that any graph kernel that takes into account discrete node labels can take advantage of the Weisfeiler-Lehman framework and compare graphs based on the whole Weisfeiler-Lehman sequence.

The general implementation of this framework can be found here:

WeisfeilerLehman([n_jobs, verbose, …])

Compute the Weisfeiler Lehman Kernel.

It should support all Kernel objects inside its parameter base_kernel (formulated in the correct way).

Weisfeiler-Lehman Subtree Kernel

The Weisfeiler-Lehman subtree kernel is a very popular algorithm, and is considered the state-of-the-art in graph classification. Let \(G\), \(G'\) be two graphs. Define \(\Sigma_i \subseteq \Sigma\) as the set of letters that occur as node labels at least once in \(G\) or \(G'\) at the end of the \(i^{th}\) iteration of the Weisfeiler-Lehman algorithm. Let \(\Sigma_0\) be the set of original node labels of \(G\) and \(G'\). Assume all \(\Sigma_i\) are pairwise disjoint. Without loss of generality, assume that every \(\Sigma_i = \{ \sigma_{i1},\ldots,\sigma_{i|\Sigma_i|} \}\) is ordered. Define a map \(c_i : \{ G,G' \} \times \Sigma_i \rightarrow \mathbb{N}\) such that \(c_i(G, \sigma_{ij})\) is the number of occurrences of the letter \(\sigma_{ij}\) in the graph \(G\).

The Weisfeiler-Lehman subtree kernel on two graphs \(G\) and \(G'\) with \(h\) iterations is defined as

\begin{equation} k(G,G') = \langle \phi(G),\phi(G') \rangle \end{equation}

where

\begin{equation} \phi(G) = (c_0(G,\sigma_{01}),\ldots,c_0(G,\sigma_{0|\Sigma_0|}),\ldots,c_h(G,\sigma_{h1}),\ldots,c_h(G,\sigma_{h|\Sigma_h|})) \end{equation}

and

\begin{equation} \phi(G') = (c_0(G',\sigma_{01}),\ldots,c_0(G',\sigma_{0|\Sigma_0|}),\ldots,c_h(G',\sigma_{h1}),\ldots,c_h(G',\sigma_{h|\Sigma_h|})) \end{equation}

It can be shown that the above definition is equivalent to comparing the number of shared subtrees between the two input graphs [SSVL+11]. It is interesting to note that the Weisfeiler-Lehman subtree kernel exhibits an attractive computational complexity since it can be computed in \(\mathcal{O}(hm)\) time.

Note

To create an instance of the above kernel use the Vertex Histogram Kernel as the base_kernel.

Bibliography

SSVL+11

Nino Shervashidze, Pascal Schweitzer, Erik Jan van Van Leeuwen, Kurt Mehlhorn, and Karsten M. Borgwardt. Weisfeiler-Lehman Graph Kernels. Journal of Machine Learning Research, 12:2539–2561, 2011.

WL68

Boris Weisfeiler and AA Lehman. A reduction of a graph to a canonical form and an algebra arising during this reduction. Nauchno-Technicheskaya Informatsia, 2(9):12–16, 1968.