Creating your Own Kernel¶
As mentioned in the Core Concepts subsection, each kernel imported from the GraphKernel generic wrapper and found inside the grakel.kernels sub-package inherits the Kernel class found there.
In order to write a kernel that will be integrated into our package (see Contributing), we would like the user to inherit from that class. This may seem restrictive, but it is not, as we will try to demonstrate in the following sections. Note that although the kernel package as a whole is not final, it should always be treated as final when deploying a new kernel, because all kernels should share a unified common interface.
Overriding the Kernel class methods¶
To start, we will present the current public interface of the Kernel class and guide the user through writing a simple base kernel, such as the vertex-histogram-kernel.
The vertex-histogram-kernel, defined in [SB15] p. 4, section 2.3, is a simple kernel that calculates a label histogram for each graph, that is, it counts the number of occurrences of each label value; the kernel value between two graphs is then the sum of products of the frequencies of their common labels.
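Before touching the class, here is a minimal standalone sketch of what this kernel computes, using only the standard library (the label sets are arbitrary toy examples):
from collections import Counter

# Node labels of two toy graphs
labels_g1 = ['A', 'B', 'B']
labels_g2 = ['A', 'B', 'C']

f1, f2 = Counter(labels_g1), Counter(labels_g2)

# Sum over common labels of the products of their frequencies
k = sum(f1[l] * f2[l] for l in set(f1) & set(f2))
print(k)  # 1*1 + 2*1 = 3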
To design this kernel let’s first learn some things about the kernel class.
Which methods should I implement¶
Each kernel should have the following methods (used together as shown in the sketch below):
- __init__. This method should always be overridden.
- fit. Calculates features of the reference dataset.
- fit_transform. Calculates features and the kernel matrix on the reference dataset.
- transform. Calculates the kernel matrix between the fitted dataset and the transformed one.
- diagonal. Calculates the diagonal of the kernel matrix produced between all elements of transform.
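In a scikit-learn-like workflow, these methods are used together as follows (a sketch; MyKernel, train_graphs and test_graphs are placeholders for a finished kernel class and two graph iterables):
kernel = MyKernel(normalize=True)
K_train = kernel.fit_transform(train_graphs)  # features + kernel matrix on the reference dataset
K_test = kernel.transform(test_graphs)        # kernel matrix between transformed and fitted data
diag = kernel.diagonal()                      # self-similarities, used for normalization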
So let’s start with our example and define an __init__ method for our vertex-histogram-kernel.
Let’s import some library elements we will use
from warnings import warn
from collections import Counter
from collections.abc import Iterable
from grakel import Kernel, Graph
and let’s define the vertex-histogram-kernel
class VertexHistogram(Kernel):
    """Vertex Histogram kernel as found in :cite:`Sugiyama2015NIPS`

    Parameters
    ----------
    None.

    Attributes
    ----------
    None.

    """

    # Define the graph format that this kernel needs (if needed)
    # _graph_format = "auto" (default: "auto")

    def __init__(self,
                 n_jobs=None,
                 verbose=False,
                 normalize=False,
                 # kernel_param_1=kernel_param_1_default,
                 # ...
                 # kernel_param_n=kernel_param_n_default,
                 ):
        """Initialise a vertex histogram kernel."""
        # Add new parameters to the set of valid ones, e.g.:
        # self._valid_parameters |= new_parameters
        super(VertexHistogram, self).__init__(n_jobs=n_jobs, verbose=verbose, normalize=normalize)

        # Get parameters and check the new ones
        # @for i=1 to num_new_parameters
        #   self.kernel_param_i = kernel_param_i

        # self.initialized_.update({
        #     param_needing_initialization_1 : False
        #     ...
        #     param_needing_initialization_m : False
        # })

    def initialize_(self):
        """Initialize all transformer arguments, needing initialization."""
        # If you want to implement parallelization yourself, here is your chance.
        # If there is a pairwise operation on the Kernel object, parallelization
        # is already implemented: just call the parent initialize_ to initialise
        # a joblib Parallel (if n_jobs is not None).
        super(VertexHistogram, self).initialize_()

        # for i=1 .. m
        #     if not self.initialized_["param_needing_initialization_i"]:
        #         # Apply checks (raise ValueError or TypeError accordingly)
        #         # calculate derived fields stored on self._derived_field_ia .. z
        #         self.initialized_["param_needing_initialization_i"] = True
        pass
As you can see, there are a few things to note. First, the verbose, normalize and n_jobs parameters should always be included in the kernel definition and passed as is to the parent class by calling its __init__ method. All __init__ parameters should be stored in attributes of exactly the same name, as this is important for the scikit-learn transformer Pipeline. Any parameter needing initialization must be registered in the initialized_ attribute, inside the initialize_ method, as an update of the parent object's dictionary. If the user decides to override fit or fit_transform, he/she must call self.initialize_() as the first command of any process inside the kernel. This allows valid parameter resetting through set_params, an important aspect of Pipeline consistency.
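For instance, storing parameters under their exact names is what makes the following scikit-learn-style interaction work (a sketch, assuming the VertexHistogram class above and the get_params/set_params methods inherited from scikit-learn's BaseEstimator):
vh_kernel = VertexHistogram(normalize=False)
print(vh_kernel.get_params(deep=False))
# {'n_jobs': None, 'normalize': False, 'verbose': False}
vh_kernel.set_params(normalize=True)  # works because __init__ args mirror attribute names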
Independently of this interface, the Kernel class offers two methods which, when overridden, allow a user to write his/her own kernel. These methods are parse_input and pairwise_operation.
The parse_input, pairwise_operation methods¶
The parse_input method produces all the features used by the kernel. By default, both fit and transform use parse_input to create features for each graph, producing a list whose elements correspond, in order, to the graphs of the input iterable (while ignoring empty elements). Single elements of those lists are then fed to the pairwise_operation method, which calculates a final kernel value.
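Schematically, the default flow is roughly the following (a simplified sketch of the behaviour described above, not the actual implementation; kernel, train_graphs and test_graphs are placeholders):
feats_ref = kernel.parse_input(train_graphs)   # done by fit; stored as kernel.X
feats_new = kernel.parse_input(test_graphs)    # done by transform
# transform then fills the kernel matrix pairwise
K = [[kernel.pairwise_operation(y, x) for x in feats_ref]
     for y in feats_new]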
Note
In order not to repeat the word kernel again and again, this word is omitted from the class name when defining a kernel.
So, to continue our example, we first define parse_input inside the VertexHistogram class:
    def parse_input(self, X):
        """Parse and check the given input for vertex kernel.

        Parameters
        ----------
        X : iterable
            For the input to pass the test, we must have:
            Each element must be an iterable with at most three features and at
            least one. The first that is obligatory is a valid graph structure
            (adjacency matrix or edge_dictionary) while the second is
            node_labels and the third edge_labels (that fit the given graph
            format).

        Returns
        -------
        out : list
            List of frequency-histograms for each Graph.

        """
        if not isinstance(X, Iterable):
            raise TypeError('input must be an iterable\n')
        else:
            out = list()
            for (i, x) in enumerate(iter(X)):
                is_iter = isinstance(x, Iterable)
                if is_iter:
                    x = list(x)
                if is_iter and len(x) in [0, 2, 3]:
                    if len(x) == 0:
                        warn('Ignoring empty element on index: ' + str(i))
                        continue
                    else:
                        # Our element is an iterable of at least 2 elements
                        labels = x[1]
                elif type(x) is Graph:
                    # get labels in any existing format
                    labels = x.get_labels(purpose="any")
                else:
                    raise TypeError('each element of X must be either a '
                                    'graph object or a list with at least '
                                    'a graph like object and node labels '
                                    'dict \n')

                # Append frequencies for the current Graph
                out.append(Counter(labels.values()))

            if len(out) == 0:
                raise ValueError('parsed input is empty')
            return out
The procedure of reading the input is fairly standard, but it must be specified for each kernel according to its minimum acceptable input, that is, the fewest elements with which the kernel can be computed. After reading the labels, we compute frequencies with a Counter over the label values and append them to a list, for all the non-empty elements of the iterable. Finally, on the list elements produced by parse_input, the pairwise operation that calculates the kernel value is
    def pairwise_operation(self, x, y):
        """Calculate sum of frequency products.

        Parameters
        ----------
        x, y : Counter
            Label-Frequency Counters as occur from `parse_input`.

        Returns
        -------
        kernel : number
            The kernel value.

        """
        return sum(x[k] * y[k] for k in x.keys())
where we compute \(\sum_{l \in L_{i} \cap L_{j}} f^{i}_{l} f^{j}_{l}\), using the property of a Counter object to return a 0 value for missing keys.
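As a quick illustration of that property:
from collections import Counter

x = Counter(['O', 'H', 'H'])
print(x['H'])  # 2
print(x['C'])  # 0: labels absent from a graph contribute nothing to the sum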
Now going back to the fascinating water-example
>>> H2O = Graph([[0, 1, 1], [1, 0, 0], [1, 0, 0]], {0: 'O', 1: 'H', 2: 'H'})
>>> H3O = Graph([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]], {0: 'O', 1: 'H', 2: 'H', 3:'H'})
>>> vh_kernel = VertexHistogram(normalize=True)
>>> vh_kernel.fit_transform([H2O])[0, 0]
1.0
>>> vh_kernel.transform([H3O])[0, 0]
0.9899494936611665
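We can verify this value by hand: \(k(H_2O, H_2O) = 1^2 + 2^2 = 5\), \(k(H_3O, H_3O) = 1^2 + 3^2 = 10\) and \(k(H_2O, H_3O) = 1 \cdot 1 + 2 \cdot 3 = 7\), so the normalized value is \(7 / \sqrt{5 \cdot 10} \approx 0.98995\).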
So this kernel works!
Note
The normalize argument uses diagonal, which also makes it a good way of checking whether that method is implemented correctly.
Why should I follow the list format on parse input?¶
You can avoid the list format of the parse_input result by overriding the _calculate_kernel_matrix method instead of pairwise_operation. This method must accept one argument whose default value is None: if it is None, the method should calculate the kernel matrix between the elements of self.X (where the output of parse_input is stored); otherwise it should calculate the matrix between the input Y (which also comes from parse_input) and self.X.
In doing so, the developer also needs to override the diagonal method, because it normally uses pairwise_operation. The diagonal method can access the last transformed input, as produced by parse_input, in the attribute self._Y.
But why not always use a pairwise operation?¶
There are cases where calculating the kernel matrix as a batch is much more computationally efficient. This can be demonstrated by utilizing libraries such as numpy.
Let’s go back to the vertex histogram kernel and do so. The definition of __init__ stands as above. We will redefine the parse_input method in order to utilize the numpy package.
Let’s start with some new import statements
import warnings

from collections import Counter
from collections.abc import Iterable
from sklearn.utils.validation import check_is_fitted
from sklearn.exceptions import NotFittedError
from grakel import Kernel, Graph
from numpy import zeros, einsum
and define the new parse_input
class VertexHistogram(Kernel):
    """Vertex Histogram kernel as found in :cite:`Sugiyama2015NIPS`

    Parameters
    ----------
    None.

    Attributes
    ----------
    None.

    """

    # Define the graph format that this kernel needs (if needed)
    # _graph_format = "auto" (default: "auto")

    def __init__(self,
                 n_jobs=None,
                 verbose=False,
                 normalize=False,
                 # kernel_param_1=kernel_param_1_default,
                 # ...
                 # kernel_param_n=kernel_param_n_default,
                 ):
        """Initialise a vertex histogram kernel."""
        # Add new parameters to the set of valid ones, e.g.:
        # self._valid_parameters |= new_parameters
        super(VertexHistogram, self).__init__(n_jobs=n_jobs, verbose=verbose, normalize=normalize)

        # Get parameters and check the new ones
        # @for i=1 to num_new_parameters
        #   self.kernel_param_i = kernel_param_i

        # self.initialized_.update({
        #     param_needing_initialization_1 : False
        #     ...
        #     param_needing_initialization_m : False
        # })

    def initialize_(self):
        """Initialize all transformer arguments, needing initialization."""
        if not self.initialized_["n_jobs"]:
            # n_jobs is not used in this kernel:
            # numpy utilises low-level parallelization for matrix operations
            if self.n_jobs is not None:
                warnings.warn('no implemented parallelization for VertexHistogram')
            self.initialized_["n_jobs"] = True

        # for i=1 .. m
        #     if not self.initialized_["param_needing_initialization_i"]:
        #         # Apply checks (raise ValueError or TypeError accordingly)
        #         # calculate derived fields stored on self._derived_field_ia .. z
        #         self.initialized_["param_needing_initialization_i"] = True

    def parse_input(self, X):
        """Parse and check the given input for vertex kernel.

        Parameters
        ----------
        X : iterable
            For the input to pass the test, we must have:
            Each element must be an iterable with at most three features and at
            least one. The first that is obligatory is a valid graph structure
            (adjacency matrix or edge_dictionary) while the second is
            node_labels and the third edge_labels (that fit the given graph
            format).

        Returns
        -------
        out : np.array, shape=(len(X), n_labels)
            A numpy array of frequency (cols) histograms for all Graphs (rows).

        """
        if not isinstance(X, Iterable):
            raise TypeError('input must be an iterable\n')
        else:
            rows, cols, data = list(), list(), list()
            if self._method_calling in [1, 2]:
                labels = dict()
                self._labels = labels
            elif self._method_calling == 3:
                labels = dict(self._labels)
            ni = 0
            for (i, x) in enumerate(iter(X)):
                is_iter = isinstance(x, Iterable)
                if is_iter:
                    x = list(x)
                if is_iter and len(x) in [0, 2, 3]:
                    if len(x) == 0:
                        warnings.warn('Ignoring empty element on index: ' + str(i))
                        continue
                    else:
                        # Our element is an iterable of at least 2 elements
                        L = x[1]
                elif type(x) is Graph:
                    # get labels in any existing format
                    L = x.get_labels(purpose="any")
                else:
                    raise TypeError('each element of X must be either a '
                                    'graph object or a list with at least '
                                    'a graph like object and node labels '
                                    'dict \n')

                # construct the data input for the numpy array
                for (label, frequency) in Counter(L.values()).items():
                    # for the row that corresponds to that graph
                    rows.append(ni)

                    # and to the value that this label is indexed
                    col_idx = labels.get(label, None)
                    if col_idx is None:
                        # if not indexed, add the new index (the next)
                        col_idx = len(labels)
                        labels[label] = col_idx

                    # designate the certain column information
                    cols.append(col_idx)

                    # as well as the frequency value to data
                    data.append(frequency)
                ni += 1

            if ni == 0:
                raise ValueError('parsed input is empty')

            # Initialise the feature matrix
            features = zeros(shape=(ni, len(labels)))
            features[rows, cols] = data
            return features
Let’s stop here and consider two points.
- What is self._method_calling? parse_input is a common method used by the three basic functions (fit, fit_transform and transform) to extract certain features from the input while parsing it. Although in most cases the procedure is identical, there are minor points where the execution of the algorithm should differ, depending on whether the information is intended for the fit method or not. self._method_calling is a class attribute that is set to 1 if a method is called from fit, to 2 if it is called from fit_transform and to 3 if it is called from transform. The ordinary fit_transform calls fit to parse the input, in which case self._method_calling will never be set to 2, but that value is included for completeness.
- Why do something different on fit and on transform? Generally, fit sets the reference dataset, that is, it holds information that acts as a reference for any further transformation. In our case, the labels appearing in the graphs of the fit dataset are enumerated in order to be placed in a feature matrix. The labels corresponding to the dataset of fit should occupy the first positions of the feature matrix, while any new labels met on transform are added coherently at new positions and then forgotten (because there is no need for them to be remembered).
Now we would like to define the _calculate_kernel_matrix method, calculating the kernel matrix as a simple dot product. Note that labels first met on transform occupy columns beyond those of the fitted features; this is why Y is sliced down to the first self.X.shape[1] columns before the product, since those extra labels cannot co-occur with any label of the reference dataset.
    def _calculate_kernel_matrix(self, Y=None):
        """Calculate the kernel matrix given a target_graph and a kernel.

        The matrix is calculated between all elements of Y on the rows and
        all elements of X on the columns.

        Parameters
        ----------
        Y : np.array, default=None
            The array between samples and features.

        Returns
        -------
        K : numpy array, shape = [n_targets, n_inputs]
            The kernel matrix: a calculation between all pairs of graphs
            between targets and inputs. If Y is None, targets and inputs
            are taken from self.X. Otherwise Y corresponds to targets
            and self.X to inputs.

        """
        if Y is None:
            K = self.X.dot(self.X.T)
        else:
            K = Y[:, :self.X.shape[1]].dot(self.X.T)
        return K
as well as the diagonal method (using the Einstein summation convention through numpy's einsum, a really fast way of computing the diagonal of the final matrix)
    def diagonal(self):
        """Calculate the kernel matrix diagonal of the fitted data.

        Parameters
        ----------
        None.

        Returns
        -------
        X_diag : np.array
            The diagonal of the kernel matrix between the fitted data.
            This consists of each element calculated with itself.

        """
        # Check if fit has been called
        check_is_fitted(self, ['X', '_Y'])
        try:
            check_is_fitted(self, ['_X_diag'])
        except NotFittedError:
            # Calculate diagonal of X
            self._X_diag = einsum('ij,ij->i', self.X, self.X)

        Y_diag = einsum('ij,ij->i', self._Y, self._Y)
        return self._X_diag, Y_diag
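As a sanity check, this batch implementation should reproduce the values of the earlier water example (assuming H2O and H3O as defined above):
>>> vh_kernel = VertexHistogram(normalize=True)
>>> vh_kernel.fit_transform([H2O])[0, 0]
1.0
>>> vh_kernel.transform([H3O])[0, 0]
0.9899494936611665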
Now let’s solve a classification problem on the “DD” dataset, following a standard procedure. First, import and download the dataset
>>> from grakel import GraphKernel, datasets
>>> DD = datasets.fetch_dataset("DD", verbose=False)
>>> DD_data = DD.data
>>> DD_classes = DD.target
afterwards, split into train/test in a pseudo-random way, holding out 10% for testing
>>> from sklearn.model_selection import train_test_split
>>> X, Y, y_train, y_test = train_test_split(DD_data, DD_classes, test_size=0.1, random_state=42)
Note
For understanding why 42 was chosen as a random_state, check this.
now let’s calculate the K_train and K_test kernel matrices as
>>> vh_kernel = VertexHistogram(normalize=True)
>>> K_train = vh_kernel.fit_transform(X)
>>> K_test = vh_kernel.transform(Y)
and finally classify, by searching for the best C (a parameter that expresses to the SVM how much you want to avoid misclassifying each training example)
>>> from sklearn.metrics import accuracy_score
>>> from sklearn.svm import SVC
>>> from numpy import arange
>>> C_grid = 10. ** arange(4,10,1) / len(DD.data)
>>> acc_score = 0
>>> for i in range(C_grid.shape[0]):
...     clf = SVC(kernel='precomputed', C=C_grid[i])
...     clf.fit(K_train, y_train)
...     y_pred = clf.predict(K_test)
...     acc_score = max(acc_score, round(accuracy_score(y_test, y_pred)*100, 2))
which produces an accuracy score close to the maximum of 78.2% documented in [KGW16]
>>> print(acc_score, "%")
76.27 %
What if I don’t want to follow this structure at all?¶
Although the basic methods (fit, fit_transform, transform, diagonal) are needed for the kernel pipeline, the kernel structure is not imposed, but proposed. The user can always write her/his own methods, as long as they are coherent with some design specifications needed for a valid kernel, mainly:
- Unified input support between all structures, expecting input in the standard way: accepting a Graph type object or an iterable of at least one element, where position 0 signifies the graph object, position 1 the node labels and position 2 the edge labels.
- Valid support of normalization is imperative.
- The diagonal method is used by nested graph kernels (such as Weisfeiler-Lehman) to be able to calculate a correct normalized output, so its validity is also imperative (see the sketch after this list). Its output should be supported both as a scalar and as a vector.
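Regarding the last point, the diagonal contract used by such nested kernels looks roughly like this (a sketch following the implementation above; vh_kernel, train_graphs and test_graphs are placeholders):
K_train = vh_kernel.fit_transform(train_graphs)
K_test = vh_kernel.transform(test_graphs)
X_diag, Y_diag = vh_kernel.diagonal()  # self-kernels of fitted and transformed data
# a nested kernel would normalize as:
# K_test_normalized[i, j] = K_test[i, j] / sqrt(Y_diag[i] * X_diag[j])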
Getting more advanced¶
Note
This documentation entry is a preamble to the Contributing section and will evolve much more as users contribute to this project.
What if you do not just want to write your own kernel, but want to fork the project in order to integrate an optimized version of it?
Wrapping C++ functions for the kernels to use¶
We have implemented a simple Cython extension for the use of C++ inside Python kernel implementations. This package can be found inside grakel/kernels/_c_functions, where the user can find the following files:
grakel/kernels/_c_functions
├── functions.cpp
├── functions.pxd
├── functions.pyx
├── include
│ └── functions.hpp
└── src
└── *.cpp
In order to add a new function, he/she should add its source file inside src, while declaring its definition inside include/functions.hpp. Afterwards, extern the function definition inside the library file functions.pxd by importing it from include/functions.hpp, and add it inside the functions.pyx file, where the essential I/O wrapping of the C++ function is done in Python. Finally, you must add the path of the new C++ source you wrote to the _c_functions Extension found in setup.py, by adding it to the list given to the sources argument.
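On the setup.py side, the change would look roughly as follows (a sketch; my_function.cpp is a hypothetical new source file and the remaining Extension arguments are indicative, not the project's exact configuration):
from setuptools import Extension

_c_functions = Extension(
    name="grakel.kernels._c_functions.functions",
    sources=["grakel/kernels/_c_functions/functions.pyx",
             "grakel/kernels/_c_functions/src/my_function.cpp"],  # the new source
    include_dirs=["grakel/kernels/_c_functions/include"],
    language="c++",
)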
Parallelization¶
A basic infrastructure for utilizing the parallelization possibilities of the kernel computation was created inside the kernel class.
The user defines n_jobs for each kernel; the indexes of the kernel matrix are linearized and split between the effective number of jobs, and the pairwise operation is applied to each split in parallel. We use the joblib library, as found inside sklearn.externals.
On certain frameworks (Weisfeiler-Lehman, Hadamard Code) the strategy for applying parallel kernel calculation is different: there the task is the calculation of a single kernel matrix from the base kernel. Parallelization is a feature we wanted to include in our library because it gives the user the opportunity to increase the efficiency of kernel matrix computation on large datasets, although in most cases the induced overhead does not seem worth it. The developer can either follow our road and call the initialization of the parent method, creating a joblib.Parallel object with a threading backend (stored in _parallel), and use it inside her/his code in any way he/she wants, or follow a different strategy that seems to have a computational advantage. In case someone wants to contribute to redesigning and extending the parallelization process, he/she can see how in Contributing.
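As an illustration of the first road, a kernel method could use the parent-created Parallel object roughly like this (a sketch; _pairs_to_values is a hypothetical helper, and self._parallel is assumed to be None when n_jobs is None):
from joblib import delayed

def _pairs_to_values(self, pairs):
    """Hypothetical helper: kernel values for a list of feature pairs."""
    if self._parallel is None:
        # n_jobs was None: fall back to sequential computation
        return [self.pairwise_operation(x, y) for (x, y) in pairs]
    # threading-backend joblib.Parallel created by the parent initialize_
    return self._parallel(delayed(self.pairwise_operation)(x, y)
                          for (x, y) in pairs)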
Bibliography¶
- KGW16
Nils M Kriege, Pierre-Louis Giscard, and Richard Wilson. On Valid Optimal Assignment Kernels and Applications to Graph Classification. In Advances in Neural Information Processing Systems, 1623–1631. 2016.
- SB15
Mahito Sugiyama and Karsten M. Borgwardt. Halting in Random Walk Kernels. In Advances in Neural Information Processing Systems, 1639–1647. 2015.