Core Kernel Framework

The core framework is a tool for increasing the expressive power of graph kernels [NMLV18]. The framework is not restricted to graph kernels, but can be applied to any graph comparison algorithm. It capitalizes on the \(k\)-core decomposition which is capable of uncovering topological and hierarchical properties of graphs. Specifically, the \(k\)-core decomposition is a powerful tool for network analysis and it is commonly used as a measure of importance and well connectedness for vertices in a broad spectrum of applications. The notion of \(k\)-core was first introduced by Seidman to study the cohesion of social networks [Sei83]. In recent years, the \(k\)-core decomposition has been established as a standard tool in many application domains such as in network visualization [AHDallAstaBV06], in protein function prediction [WA05] and in graph clustering [GMTV14].

Core Decomposition

Let \(G = (V,E)\) be an undirected and unweighted graph. Let \(n\) and \(m\) denote the number of vertices and number of edges, respectively. Given a subset of vertices \(S \subseteq V\), let \(E(S)\) be the set of edges that have both end-points in \(S\). Then, \(G'=(S,E(S))\) is the subgraph induced by \(S\). We use \(G' \subseteq G\) to denote that \(G'\) is a subgraph of \(G\). The degree of a vertex \(v \in S\), \(d_{G'}(v)\), is equal to the number of vertices that are adjacent to \(v\) in \(G'\). Let \(G\) be a graph and \(G'\) a subgraph of \(G\) induced by a set of vertices \(S\). Then, \(G'\) is defined to be a \(k\)-core of \(G\), denoted by \(C_k\), if it is a maximal subgraph of \(G\) in which all vertices have degree at least \(k\). Hence, if \(G'\) is a \(k\)-core of \(G\), then \(\forall v \in S\), \(d_{G'}(v) \geq k\). Each \(k\)-core is a unique subgraph of \(G\), and it is not necessarily connected. The core number \(c(v)\) of a vertex \(v\) is equal to the highest-order core that \(v\) belongs to. In other words, \(v\) has core number \(c(v) = k\), if it belongs to a \(k\)-core but not to a \((k+1)\)-core. The degeneracy \(\delta^*(G)\) of a graph \(G\) is defined as the maximum \(k\) for which graph \(G\) contains a non-empty \(k\)-core subgraph, \(\delta^*(G) = \max_{v \in V}c(v)\). Furthermore, assuming that \(\mathcal{C} = \{ C_0, C_1, \ldots, C_{\delta^*(G)} \}\) is the set of all \(k\)-cores, then \(\mathcal{C}\) forms a nested chain

\[C_{\delta^*(G)} \subseteq \ldots \subseteq C_1 \subseteq C_0 = G\]

Therefore, the \(k\)-core decomposition is a very useful tool for discovering the hierarchical structure of graphs. The \(k\)-core decomposition of a graph can be computed in \(\mathcal{O}(n+m)\) time cite{matula1983smallest,batagelj2011fast}. The underlying idea is that we can obtain the \(i\)-core of a graph if we recursively remove all vertices with degree less than \(i\) and their incident edges from the graph until no other vertex can be removed.

Core Kernels

The \(k\)-core decomposition builds a hierarchy of nested subgraphs, each having stronger connectedness properties compared to the previous ones. The core framework measures the similarity between the corresponding according to the hierarchy subgraphs and aggregates the results. Let \(G=(V,E)\) and \(G'=(V',E')\) be two graphs. Let also \(k\) be any kernel for graphs. Then, the core variant of the base kernel \(k\) is defined as

\[k_c(G, G') = k(C_0,C'_0) + k(C_1,C'_1) + \ldots + k(C_{\delta^*_{min}},C'_{\delta^*_{min}})\]

where \(\delta^*_{min}\) is the minimum of the degeneracies of the two graphs, and \(C_0,C_1,\ldots,C_{\delta^*_{min}}\) and \(C'_0,C'_1,\ldots,C'_{\delta^*_{min}}\) are the \(0\)-core, \(1\)-core,:math:ldots, \(\delta^*_{min}\)-core subgraphs of \(G\) and \(G'\), respectively. By decomposing graphs into subgraphs of increasing importance, the algorithm is capable of more accurately capturing their underlying structure.

The computational complexity of the core framework depends on the complexity of the base kernel and the degeneracy of the graphs under comparison. Given a pair of graphs \(G, G'\) and an algorithm \(A\) for comparing the two graphs, let \(\mathcal{O}_A\) be the time complexity of algorithm \(A\). Let also \(\delta^*_{min} = \min \big( \delta^*(G),\delta^*(G') \big)\) be the minimum of the degeneracies of the two graphs. Then, the complexity of computing the core variant of algorithm \(A\) is \(\mathcal{O}_{c}=\delta^*_{min}\mathcal{O}_A\). It is well-known that the degeneracy of a graph is upper bounded by the maximum of the degrees of its vertices and by the largest eigenvalue of its adjacency matrix \(\lambda_1\). Since in most real-world graphs it holds that \(\lambda_1 \ll n\), it also holds that \(\delta^*_{max} \ll n\), and hence, the time complexity added by the core framework is not very high.

The implementation of the core framework can be found below

CoreFramework([n_jobs, verbose, normalize, …])

The core kernel framework, as proposed in [NMLV18].

Bibliography

AHDallAstaBV06

J. Ignacio Alvarez-Hamelin, Luca Dall’Asta, Alain Barrat, and Alessandro Vespignani. Large scale networks fingerprinting and visualization using the k-core decomposition. Advances in Neural Information Processing Systems, 18:41–50, 2006.

GMTV14

Christos Giatsidis, Fragkiskos Malliaros, Dimitrios Thilikos, and Michalis Vazirgiannis. CORECLUSTER: A Degeneracy Based Graph Clustering Framework. In Proceedings of the 28th AAAI Conference on Artificial Intelligence, 44–50. 2014.

NMLV18(1,2)

Giannis Nikolentzos, Polykarpos Meladianos, Stratis Limnios, and Michalis Vazirgiannis. A Degeneracy Framework for Graph Similarity. In 27th International Joint Conference on Artificial Intelligence. 2018.

Sei83

Stephen B. Seidman. Network Structure and Minimum Degree. Social networks, 5(3):269–287, 1983.

WA05

Stefan Wuchty and Eivind Almaas. Peeling the yeast protein network. Proteomics, 5(2):444–449, 2005.