ODD-STh Kernel

The ODD-STh kernel is a kernel between labeled graphs. Its approach derives from the idea of utilizing tree-based kernels, i.e. kernels that take as input graphs that are trees. Such kernels are in general more computationally efficient as trees are constrained to interesting properties. The idea behind the ODD-STh kernel proposed in [DSMNS12], has to do with decomposition of two graph to ordered DAGs and adding the kernel values between all pairs of DAGs of the original graphs as:

\[\begin{split}K_{K_{DAG}}(G_{1}, G_{2}) = \sum_{\substack{D_{1} \in DD(G_{1}) \\ D_{2} \in DD(G_{2})}} K_{DAG}(D1, D2)\end{split}\]

where \(DD(G_{i})\) corresponds to a graph decomposition of this graph and \(K_{DAG}\) is a kernel between DAGs. As a DAG decomposition of each graph they considered the set of all directed BFS explorations starting from each node inside the graph, as follows in the picture:

../_images/odd_sth_1.png

A simple DAG decomposition of a single graph

Now in order to move from DAGs to trees each \(K_{DAG}\) kernel was calculated as the sum of tree kernel between derived trees between each of the two DAGs:

\[\begin{split}K_{DAG} = \sum_{\substack{v_{1} \in V(D_{1}) \\ v_{2} \in V(D_{2})}} C(root(T(v_{1})), root(T(v_{2})))\end{split}\]

where \(T()\) corresponds to the tree-visits on DAGs (which preserve an essence of textit{ordering} as found in ([DSMNS12], section 5.2). An example of such tree visits follows:

../_images/odd_sth_2.png

Ordered tree visits on a DAG decomposed from a graph

\(C()\) is a kernel between trees, where in our case it will be the Sub-Tree Kernel (as found in [VS02]).

For increasing the efficiency of this algorithm for the new set of DAG decomposition, known as ODD (Ordered Dag Decomposition), an aggregation of all the decomposition in a single DAG was proposed notated as \(BigDAG\). This method introduced in ([ADSMSM06], MinimalDAG: Figure 2, p. 3), aggregates nodes having same labels with frequencies if they correspond to the same path on each DAG, while conserves the existence of nodes that cannot be aggregated.

../_images/odd_sth_3.png

Construction of a \(BigDAG\) from two DAGs

Doing so allows as to replace the kernel computation:

\[\begin{split}K_{K_{DAG}}(G_{1}, G_{2}) = \sum_{\substack{D_{1} \in DD(G_{1}) \\ D_{2} \in DD(G_{2})}} K_{DAG}(D1, D2)\end{split}\]

with:

\[\begin{split}K_{BigDAG}(G_{1}, G_{2}) = \sum_{\substack{u_{1} \in V(BigDAG(G_{1}))\\ u_{2} \in V(BigDAG(G_{2})}} f_{u_{1}}f_{u_{2}}C(u_{1}, u_{2})\end{split}\]

where \(f_{u}\) is the frequency counter of the node \(u\) and \(C(u, v)\) is the number of matching proper subtrees from \(u\) and \(v\). An even more abstract idea they followed was to created a \(Big^{2}DAG\) where all the \(BigDAGs\) created from each graph, would be aggregated to a single one, in the same way as in trees, but instead of incrementing frequencies on common nodes a frequency vector of appended frequencies for each DAG, was constructed.

../_images/odd_sth_4.png

Construction of a \(Big^{2}DAG\) from two \(BigDAGs\)

In the final \(Big^{2}DAG\) graph, the computation of the kernel matrix is all about calculating the following formula:

\[K_{Big^{2}DAG}(G_{i}, G_{j}) = \sum_{u_{1}, u_{2} \in V(Big^{2}DAG)} F_{u_{1}}[i] * F_{u_{2}}[j] C(u_{1}, u_{2})\]

which is equivalent to:

\[K_{Big^{2}DAG}(G_{i}, G_{j}) = \sum_{u \in V(Big^{2}DAG)} F_{u}[i] * F_{u}[j] C(u, u)\]

because the subtree kernel will have a match only between identical subtrees, that is:

\[C(u_{1}, u_{2}) \not= 0 \leftrightarrow T(u_{1}) = T(u_{2})\]

Finally in order to construct the \(Big^{2}DAG\) each vertex would be represented by a tuple containing a unique hash (whose uniqueness has to do with the ordering) a frequency vector and a depth, which where utilized for calculating the kernel value. In order to restrict the size of the produced graphs a parameter \(h\) was introduced which restricts the maximum depth of the BFS exploration when doing the graph decomposition.

The Ordered Dag Decomposition - Sub-Tree \(h\) (ODD-STh) kernel can be found implemented below:

OddSth([n_jobs, normalize, verbose, h])

ODD-Sth kernel as proposed in [DSMNS12].

Note

Because the \(Big^{2}DAG\) graph should be preserved through consequent transformations, the cost of copying it may make fit_transform calculation between all the graphs of train and test faster than fitting on train graphs and transforming on test graphs.

Bibliography

ADSMSM06

Fabio Aiolli, Giovanni Da San Martino, Alessandro Sperduti, and Alessandro Moschitti. Fast On-line Kernel Learning for Trees. In Proceedings of the 6th International Conference on Data Mining, 787–791. 2006.

DSMNS12(1,2,3)

Giovanni Da San Martino, Nicolo Navarin, and Alessandro Sperduti. A Tree-Based Kernel for Graphs. In Proceedings of the 2012 SIAM International Conference on Data Mining. 2012.

VS02

S.V.N. Vishwanathan and Alexander J. Smola. Fast Kernels for String and Tree Matching. In Proceedings of the 15th International Conference on Neural Information Processing Systems, 585–592. 2002.