.. _weisfeiler_lehman:

Weisfeiler-Lehman Optimal Assignment
====================================

The Weisfeiler-Lehman optimal assignment kernel capitalizes on the theory of valid assignment kernels to improve the performance of the Weisfeiler-Lehman subtree kernel :cite:`kriege2016valid`.
Before we delve into the details of the kernel, it is necessary to introduce the theory of valid optimal assignment kernels.

Definition: Valid Assignment Kernels
------------------------------------

Let :math:`\mathcal{X}` be a set, and :math:`[\mathcal{X}]^n` denote the set of all :math:`n`-element subsets of :math:`\mathcal{X}`.
Let also :math:`X,X' \in [\mathcal{X}]^n` for :math:`n \in \mathbb{N}`, and :math:`\mathfrak{B}(X,X')` denote the set of all bijections between :math:`X` and :math:`X'`.
The optimal assignment kernel on :math:`[\mathcal{X}]^n` is defined as

.. math::
    :nowrap:

    \begin{equation}
      K_\mathfrak{B}^k(X,X') = \max_{B \in \mathfrak{B}(X,X')} \sum_{(x,x') \in B} k(x,x')
    \end{equation}

where :math:`k` is a kernel between the elements of :math:`X` and :math:`X'`.
Kriege et al. showed that the above function :math:`K_\mathfrak{B}(\mathcal{X},\mathcal{X}')` is a valid kernel only if the base kernel :math:`k` is strong.
Strong kernels are equivalent to kernels obtained from a hierarchy defined on set :math:`\mathcal{X}`.
More specifically, let :math:`T` be a rooted tree such that the leaves of :math:`T` are the elements of :math:`\mathcal{X}`.
Let :math:`V(T)` be the set of vertices of :math:`T`.
Each inner vertex :math:`v` in :math:`T` corresponds to a subset of :math:`\mathcal{X}` comprising all leaves of the subtree rooted at :math:`v`.
Let :math:`w : V(T) \rightarrow \mathbb{R}_{\geq 0}` be a weight function such that :math:`w(v) \geq w(p(v))` for all :math:`v` in :math:`T` where :math:`p(v)` is the parent of vertex :math:`v`.
Then, the tuple :math:`(T,w)` defines a hierarchy.
Let :math:`LCA(u,v)` be the lowest common ancestor of vertices :math:`u` and :math:`v`, that is, the unique vertex with maximum depth that is an ancestor of both :math:`u` and :math:`v`.
Let :math:`H = (T,w)` be a hierarchy on :math:`\mathcal{X}`, then the function defined as :math:`k(x,y) = w(LCA(x,y))` for all :math:`x,y` in :math:`\mathcal{X}` is the kernel on :math:`\mathcal{X}` induced by :math:`H`.
Interestingly, strong kernels are equivalent to kernels obtained from a hierarchical partition of the domain of the kernel. 
Hence, by constructing a hierarchy on :math:`\mathcal{X}`, we can derive a strong kernel :math:`k` and ensure that the emerging assignment function is a valid kernel.

Based on the property that every strong kernel is induced by a hierarchy, we can derive explicit feature maps for strong kernels.
Let :math:`\omega : V(T) \rightarrow \mathbb{R}_{\geq 0}` be an additive weight function defined as :math:`\omega(v) = w(v) - w(p(v))` and :math:`\omega(r) = w(r)` for the root :math:`r`.
Note that the property of a hierarchy assures that the values of the :math:`\omega` function are nonnegative.
For :math:`v \in V(T)`, let :math:`P(v) \subseteq V(T)` denote the vertices on the path from :math:`v` to the root :math:`r`.
The strong kernel :math:`k` induced by the hierarchy :math:`H` can be defined using the mapping :math:`\phi : \mathcal{X} \rightarrow \mathbb{R}^n`, where :math:`n = |V(T)|` and the components indexed by :math:`v \in V(T)` are

.. math::
    :nowrap:

    \begin{equation}
      \phi(v) = \left\{
      \begin{array}{lr}
        \sqrt{\omega(u)} & \text{if }u \in P(v),\\
        \quad 0 & \text{otherwise}
      \end{array}
    \right.
    \end{equation}

The Figure below shows an example of a strong kernel, an associated hierarchy and the derived feature vectors.

.. figure:: ../_figures/optimal_assignment_example.png

    The matrix of a strong kernel on objects :math:`a,b` and :math:`c` (a) induced by the hierarchy (b) and the derived feature vectors (c). A vertex :math:`v` in (b) is annotated by its weights :math:`w(v);\omega(v)`.

Let :math:`H = (T,w)` be a hierarchy on :math:`\mathcal{X}`.
As mentioned above, the hierarchy :math:`H` induces a strong kernel :math:`k`.
Since :math:`k` is strong, the function :math:`K_\mathfrak{B}^k` is a valid kernel.
The kernel :math:`K_\mathfrak{B}^k` can be computed in linear time in the number of vertices :math:`n` of the tree :math:`T` using the histogram intersection kernel as follows

.. math::
    :nowrap:

    \begin{equation}
        K_\mathfrak{B}^k(X, X') = \sum_{i=1}^n \min\big(H_{X}(i),H_{X'}(i)\big)
    \end{equation}

which is known to be a valid kernel on :math:`\mathbb{R}^n` :cite:`barla2003histogram`.
Hence, the complexity of the proposed kernel depends on the size of the tree :math:`T`.
The following Figure illustrates the relation between the optimal assignment kernel employing a strong base kernel and the histogram intersection kernel.

.. figure:: ../_figures/optimal_assignment_histograms.png

    An assignment instance (a) for :math:`X,Y \in [\mathcal{X}]^5` and the derived histograms (b). The set :math:`X` contains three distinct vertices labeled :math:`a` and the set :math:`Y` two distinct vertices labeled :math:`b` and :math:`c`. Taking the multiplicities into account the histograms are obtained from the hierarchy of the base kernel :math:`k` depicted in the previous Figure.  The optimal assignment yields a value of :math:`K_\mathfrak{B}^k(X, Y)` :math:`= \sum_{i=1}^n \min\big(H_{X}(i),H_{Y}(i)\big)` :math:`= \min\{5,5\}` :math:`+ \min\{8,6\}` :math:`+ \min\{3,1\}` :math:`+ \min\{2,4\}` :math:`+ \min\{1s,2\}=15`.


Definition: Weisfeiler-Lehman Optimal Assignment
------------------------------------------------

We next present the Weisfeiler-Lehman optimal assignment kernel.
Let :math:`G=(V,E)` and :math:`G'=(V',E')` be two graphs.
The Weisfeiler-Lehman optimal assignment kernel is defined as

.. math::
    :nowrap:

    \begin{equation}
        k(G,G') = K_\mathfrak{B}^k(V,V')
    \end{equation}

where :math:`k` is the following base kernel

.. math::
    :nowrap:
    
    \begin{equation}
        k(v,v') = \sum_{i=0}^h \delta(\tau_i(v), \tau_i(v'))
    \end{equation}

where :math:`\tau_i(v)` is the label of node :math:`v` at the end of the :math:`i^{th}` iteration of the Weisfeiler-Lehman relabeling procedure.

The base kernel value reflects to what extent two vertices :math:`v` and :math:`v'` have a similar neighborhood.
It can be shown that the colour refinement process of the Weisfeiler-Lehman algorithm defines a hierarchy on the set of all vertices of the input graphs.
Specifically, the sequence :math:`(\tau_i)_{0\leq i \leq h}` gives rise to a family of nested subsets, which can naturally be represented by a hierarchy :math:`(T,w)`.
When assuming :math:`\omega(v) = 1` for all vertices :math:`v \in V(T)`, the hierarchy induces the kernel defined above.
Such a hierarchy for a graph on six vertices is illustrated in the following Figure.

.. figure:: ../_figures/wl_optimal_assignment.png

    A graph :math:`G` with uniform initial labels :math:`\tau_0` and refined labels :math:`\tau_i` for :math:`i \in \{1,\ldots,3\}` (a), and the associated hierarchy (b).


The above kernel is implemented below

.. currentmodule:: grakel

.. autosummary::

   WeisfeilerLehmanOptimalAssignment

Bibliography
------------
.. bibliography:: graph_kernels.bib
   :filter: docname in docnames