Graph Hopper Kernel¶
Given two graphs, the GraphHopper kernel compares shortest paths between pairs of vertices from the two graphs [FKP+13]. The kernel takes into account both path lengths and the vertices encountered while ``hopping’’ along shortest paths. The kernel is equivalent to a weighted sum of node kernels.
Let \(G=(V,E)\) be a graph. The graph contains either discrete node labels or continuous node attributes. Let \(\ell : \mathcal{V} \rightarrow \mathcal{L}\) be a labeling function that assigns either discrete labels or continuous attributes to vertices. The kernel compares node labels/attributes using a kernel \(k_n\) (eg delta kernel in the case of node labels, and linear or gaussian kernel in the case of node attributes). Given two vertices \(v,u \in V\), a path \(\pi\) from \(v\) to \(u\) in \(G\) is defined as a sequence of vertices
where \(v_1 = v\), \(v_l = u\) and \((v_i, v_{i+1}) \in E\) for all \(i=1,\ldots,l-1\). Let \(\pi(i) = v_i\) denote the \(i^{th}\) vertex encountered when ``hopping’’ along the path. Denote by \(l(\pi)\) the weighted length of \(\pi\) and by \(|\pi|\) its discrete length, defined as the number of vertices in \(\pi\). The shortest path \(\pi_{ij}\) from \(v_i\) to \(v_j\) is defined in terms of weighted length. The diameter \(\delta(G)\) of \(G\) is the maximal number of nodes in a shortest path in \(G\), with respect to weighted path length.
The GraphHopper kernel is defined as a sum of path kernels \(k_p\) over the families \(P, P'\) of shortest paths in \(G,G'\)
The path kernel \(k_p(\pi, \pi')\) is a sum of node kernels \(k_n\) on vertices simultaneously encountered while simultaneously hopping along paths \(\pi\) and \(\pi'\) of equal discrete length, that is
The \(k(G,G')\) kernel can be decomposed into a weighted sum of node kernels
where \(w(v,v')\) counts the number of times \(v\) and \(v'\) appear at the same hop, or coordinate, \(i\) of shortest paths \(\pi,\pi'\) of equal discrete length \(|\pi| = |\pi'|\). We can decompose the weight \(w(v,v')\) as
where \(\mathbf{M_v}\) is a \(\delta \times \delta\) matrix whose entry \([\mathbf{M_v}]_{ij}\) counts how many times \(v\) appears at the \(i^{th}\) coordinate of a shortest path in \(G\) of discrete length \(j\), and \(\delta = \max(\delta(G), \delta(G'))\). The components of these matrices can be computed efficiently using recursive message-passing algorithms. The total complexity of computing \(k(G,G')\) is \(\mathcal{O}(n^2(m + \log n + d + \delta^2))\) where \(n\) is the number of vertices, \(m\) is the number of edges and \(d\) is the dimensionality of the node attributes (\(d=1\) in the case of discrete node labels).
The implementation of the neighborhood hash kernel can be found below
|
Graph Hopper Histogram kernel as found in [FKP+13]. |