Creating your Own Kernel

As mentioned in the Core Concepts subsection, each kernel imported from the GraphKernel generic wrapper and found inside grakel.kernels sub-package, inherits the Kernel class found there. In order to write any kernel that will be integrated (see Contributing), in our package we would like the user to inherit that class. This may seem restricting but is not. We will try to demonstrate this in the following sections and we would like to mention the fact that the whole kernel package is not final, but must be always considered final when deploying a new package, because all the kernels should have a unified common interface.

Overriding the Kernel class methods

In order to start we will present the current interface of the Kernel class (public methods) and guide the user how he can write a simple base Kernel, such as the vertex-histogram-kernel.

The vertex-histogram-kernel, defined in [SB15] p.4 section 2.3 is a simple kernel, that calculates label histograms for each graph, that is: counts the number of occurrences for each label value and as a kernel between two graphs calculates the sum of products between frequencies of common occurrences.

To design this kernel let’s first learn some things about the kernel class.

Which methods should I implement

Each kernel should have the following methods:

  • __init__ method implemented. This method should be always overrided.

  • fit. Calculate features of the reference dataset.

  • fit_transform. Calculate features and the kernel matrix on the reference dataset.

  • transform. Calculate the kernel matrix between the fitted dataset and the transformed.

  • diagonal. Calculate the diagonal of the kernel matrix matrix produced between all elements of transform.

So let’s start with our example and define an __init__ method for our vertex-histogram-kernel.

Let’s import some library elements we will use

from warnings import warn
from collections import Counter, Iterable
from grakel import Kernel, Graph

and let’s define the vertex-histogram-kernel

class VertexHistogram(Kernel):
    """Vertex Histogram kernel as found in :cite:`Sugiyama2015NIPS`

    Parameters
    ----------
    None.

    Attributes
    ----------
    None.

    """

    # Define the graph format that this kernel needs (if needed)
    # _graph_format = "auto" (default: "auto")

    def __init__(self,
                 n_jobs=n_jobs,
                 verbose=False,
                 normalize=False,
                 # kernel_param_1=kernel_param_1_default,
                 # ...
                 # kernel_param_n=kernel_param_n_default,
                 ):
        """Initialise an `odd_sth` kernel."""

        # Add new parameters
        self._valid_parameters |= new_parameters

        super(VertexHistogram, self).__init__(n_jobs=n_jobs, verbose=verbose, normalize=normalize)

        # Get parameters and check the new ones
        # @for i=1 to num_new_parameters
        #   self.kernel_param_i = kernel_param_i

        # self.initialized_.update({
        #    param_needing_initialization_1 : False
        #             ...
        #    param_needing_initialization_m : False
        # })

    def initialize_(self):
        """Initialize all transformer arguments, needing initialization."""
        # If you want to implement a parallelization by your self here is your chance
        # If there is a pairwise operation on the Kernel object there is parallelization is implemented
        # Just run the initialise from father to initialise a joblib Parallel (if n_jobs is not None).
        super(VertexHistogram, self).initialize_()

        # for i=1 .. m
        #     if not self.initialized_["param_needing_initialization_i"]:
        #         # Apply checks (raise ValueError or TypeError accordingly)
        #         # calculate derived fields stored on self._derived_field_ia .. z
        #         self.initialized_["param_needing_initialization_i"] = True
        pass

As you see there a few things we should note. Firstly the verbose, normalize, n_jobs parameters should be always included in the kernel definition and passed as is, to the father method by calling his __init__ method. All __init__ parameters should be stored to class method parameters with attributes of the exactly same name, as this is important for the sci-kit learn transformer Pipeline and any needed initialization registered in initialized_ attribute must be registered inside initialize_ method as an update of the father object. If the user decides to overwrite fit or fit_transform he must call self.initialize_() as the first command of any process inside the kernel. This allows a valid parameter resetting through set_params, satisfying an important aspect for Pipeline consistency.

Irrelevant from the interface the Kernel class offers two methods, overriding which, a user can write his own kernel. These methods are parse_input and pairwise_operation.

The parse_input, pairwise_operation methods

The parse_input method actually produces all the features used from the kernel. The default procedure is that both fit and transform use parse_input to create features for each graph producing a list where each element corresponds to the index indicating that graph, from the corresponding iterable (while ignoring empty elements). Those lists are then used feeding single list elements to the pairwise_operation method, calculating a final kernel value.

Note

In order not to repeat again-again the word kernel, when defining a kernel class this word is omitted on the kernel name definition.

So to continue or example we first define parse_input inside the VertexHistogram class:

    def parse_input(self, X):
        """Parse and check the given input for vertex kernel.

        Parameters
        ----------
        X : iterable
            For the input to pass the test, we must have:
            Each element must be an iterable with at most three features and at
            least one. The first that is obligatory is a valid graph structure
            (adjacency matrix or edge_dictionary) while the second is
            node_labels and the third edge_labels (that fitting the given graph
            format).

        Returns
        -------
        out : list
            List of frequency-histogram for each Graph.

        """
        if not isinstance(X, Iterable):
            raise TypeError('input must be an iterable\n')
        else:
            out = list()
            for (i, x) in enumerate(iter(X)):
                is_iter = isinstance(x, Iterable)
                if is_iter:
                    x = list(x)
                if is_iter and len(x) in [0, 2, 3]:
                    if len(x) == 0:
                        warn('Ignoring empty element on index: '+str(i))
                        continue
                    else:
                        # Our element is an iterable of at least 2 elements
                        labels = x[1]
                elif type(x) is Graph:
                    # get labels in any existing format
                    labels = x.get_labels(purpose="any")
                else:
                    raise TypeError('each element of X must be either a ' +
                                     'graph object or a list with at least ' +
                                     'a graph like object and node labels ' +
                                     'dict \n')

                # Append frequencies for the current Graph
                out.append(Counter(labels.values()))

            if len(out) == 0:
                raise ValueError('parsed input is empty')
            return out

The procedure of reading the input is really standard, but must be specified for each kernel according to its minimum acceptable input, that is the least elements with which it can be computed. After reading the labels, we calculate frequencies with a Counter on the label values, while adding them to a list for all the non empty element of the iterable. Finally on those list elements that are produced from parse_input the pairwise operation that calculates the kernel value is

    def pairwise_operation(self, x, y):
        """Calculate sum of frequency products.

        Parameters
        ----------
        x, y : Counter
            Label-Frequency Counters as occur from `parse_input`.

        Returns
        -------
        kernel : number
            The kernel value.

        """
        return sum(x[k]*y[k] for k in x.keys())

where we count the \(\sum_{l \in L_{i} \cap L_{j}} f^{i}_{l}*f^{j}_{l}\), using the property of a Counter object, returning a 0 value on empty occurrences.

Now going back to the fascinating water-example

>>> H2O = Graph([[0, 1, 1], [1, 0, 0], [1, 0, 0]], {0: 'O', 1: 'H', 2: 'H'})
>>> H3O = Graph([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]], {0: 'O', 1: 'H', 2: 'H', 3:'H'})
>>> vh_kernel = vertex_histogram(normalize=True)
>>> vh_kernel.fit_transform([H2O])[0, 0]
1.0
>>> vh_kernel.transform([H3O])[0, 0]
0.9899494936611665

So this kernel works!

Note

The normalize argument uses diagonal, which makes it a good way of checking if this method is implemented also (correctly).

Why should I follow the list format on parse input?

You can still avoid changing the list format result on parse_input, by instead of overriding pairwise operation, doing so with overriding _calculate_kernel_matrix method. This method must be able to receive one element whose default value is None, where if it is, the method should calculate the kernel matrix between elements of self.X (where the output of parse_input is stored) and otherwise calculate between the input of Y (which also comes from parse_input) and self.X. By doing so, the developer needs to override the diagonal method, because it normally uses pairwise_operation. The diagonal method can also see the last transformed input as resulted from :code:`parse_input` inside the attribute self._Y.

But why not always use a pairwise operation?

There are cases where the matrix calculation as a batch is much more computationally efficient. This can be demonstrated, by utilizing libraries such as numpy.

Let’s go back to the vertex histogram kernel and do so. The class definition of __init__ stands as above. We will redefine the parse_input method, in order for utilizing the numpy package.

Let’s start with some new import statements

from collections import Counter, Iterable
from sklearn.utils.validation import check_is_fitted
from grakel import Kernel, Graph
from numpy import zeros, einsum

and define the new parse_input

class VertexHistogram(Kernel):
    """Vertex Histogram kernel as found in :cite:`Sugiyama2015NIPS`

    Parameters
    ----------
    None.

    Attributes
    ----------
    None.

    """

    # Define the graph format that this kernel needs (if needed)
    # _graph_format = "auto" (default: "auto")

    def __init__(self,
                 n_jobs=n_jobs,
                 verbose=False,
                 normalize=False,
                 # kernel_param_1=kernel_param_1_default,
                 # ...
                 # kernel_param_n=kernel_param_n_default,
                 ):
        """Initialise an `odd_sth` kernel."""

        # Add new parameters
        self._valid_parameters |= new_parameters

        super(VertexHistogram, self).__init__(n_jobs=n_jobs, verbose=verbose, normalize=normalize)

        # Get parameters and check the new ones
        # @for i=1 to num_new_parameters
        #   self.kernel_param_i = kernel_param_i

        # self.initialized_.update({
        #    param_needing_initialization_1 : False
        #             ...
        #    param_needing_initialization_m : False
        # })

    def initialize_(self):
        """Initialize all transformer arguments, needing initialization."""
        if not self.initialized_["n_jobs"]:
            # n_jobs is not used in this kernel
            # numpy utilises low-level parallelization for calculating matrix operations
            if self.n_jobs is not None:
                warnings.warn('no implemented parallelization for VertexHistogram')
            self.initialized_["n_jobs"] = True

        # for i=1 .. m
        #     if not self.initialized_["param_needing_initialization_i"]:
        #         # Apply checks (raise ValueError or TypeError accordingly)
        #         # calculate derived fields stored on self._derived_field_ia .. z
        #         self.initialized_["param_needing_initialization_i"] = True


    def parse_input(self, X):
        """Parse and check the given input for vertex kernel.

        Parameters
        ----------
        X : iterable
            For the input to pass the test, we must have:
            Each element must be an iterable with at most three features and at
            least one. The first that is obligatory is a valid graph structure
            (adjacency matrix or edge_dictionary) while the second is
            node_labels and the third edge_labels (that fitting the given graph
            format).

        Returns
        -------
        out : np.array, shape=(len(X), n_labels)
            A numpy array for frequency (cols) histograms for all Graphs (rows).

        """
        if not isinstance(X, Iterable):
            raise TypeError('input must be an iterable\n')
        else:
            rows, cols, data = list(), list(), list()
            if self._method_calling in [1, 2]:
                labels = dict()
                self._labels = labels
            elif self._method_calling == 3:
                labels = dict(self._labels)
            ni = 0
            for (i, x) in enumerate(iter(X)):
                is_iter = isinstance(x, Iterable)
                if is_iter:
                    x = list(x)
                if is_iter and len(x) in [0, 2, 3]:
                    if len(x) == 0:
                        warnings.warn('Ignoring empty element' +
                                      ' on index: '+str(i))
                        continue
                    else:
                        # Our element is an iterable of at least 2 elements
                        L = x[1]
                elif type(x) is Graph:
                    # get labels in any existing format
                    L = x.get_labels(purpose="any")
                else:
                    raise TypeError('each element of X must be either a ' +
                                     'graph object or a list with at least ' +
                                     'a graph like object and node labels ' +
                                     'dict \n')

                # construct the data input for the numpy array
                for (label, frequency) in Counter(L.values()).items():
                    # for the row that corresponds to that graph
                    rows.append(ni)

                    # and to the value that this label is indexed
                    col_idx = labels.get(label, None)
                    if col_idx is None:
                        # if not indexed, add the new index (the next)
                        col_idx = len(labels)
                        labels[label] = col_idx

                    # designate the certain column information
                    cols.append(col_idx)

                    # as well as the frequency value to data
                    data.append(frequency)
                ni += 1

            # Initialise the feature matrix
            features = zeros(shape=(ni, len(labels)))
            features[rows, cols] = data

            if ni == 0:
                raise ValueError('parsed input is empty')
            return features

Let’s stop here at two points.

  • What is self._method_calling? The parse input is a global method used by three basic functions (fit, fit_transform and transform) to extract certain features, from the input while parsing it. Although in most cases the procedure is unique there are minor points where the execution of the algorithm should differentiate, according to if the information is intended for the fit method or not. self._method_calling is a class attribute that should be initialized in 1 if any method is called from fit, to 2 if it is called from fit_transform and to 3 if it is called from transform. The ordinary fit_transform, calls fit to parse the input in which case self._method_calling will never be set to 2, but it is added in favor of completeness.

  • Why are you doing something else on fit and on transform? Generally fit stands for setting the reference dataset, that is, it holds information that act as reference for any further transformation. In our case the labels, appearing in all graphs are enumerated in order to be added on a feature matrix. The labels corresponding to the dataset of fit should occupy the first position of the feature matrix, where any new labels of the matrix should be added on new positions coherently and forgotten (because there is no need to be remembered).

Now we would like to define the _calculate_kernel_matrix method, calculating simple the kernel matrix as a result of a simple dot product

    def _calculate_kernel_matrix(self, Y=None):
        """Calculate the kernel matrix given a target_graph and a kernel.

        Each a matrix is calculated between all elements of Y on the rows and
        all elements of X on the columns.

        Parameters
        ----------
        Y : np.array, default=None
            The array between samples and features.

        Returns
        -------
        K : numpy array, shape = [n_targets, n_inputs]
            The kernel matrix: a calculation between all pairs of graphs
            between targets and inputs. If Y is None targets and inputs
            are the taken from self.X. Otherwise Y corresponds to targets
            and self.X to inputs.

        """
        if Y is None:
            K = self.X.dot(self.X.T)
        else:
            K = Y[:, :self.X.shape[1]].dot(self.X.T)
        return K

as well as the diagonal method (using Einstein summation convention which is a really fast method for speeding up the diagonal calculation of the final matrix, as found here)

    def diagonal(self):
        """Calculate the kernel matrix diagonal of the fitted data.

        Parameters
        ----------
        None.

        Returns
        -------
        X_diag : np.array
            The diagonal of the kernel matrix, of the fitted. This consists
            of each element calculated with itself.


        """
        # Check is fit had been called
        check_is_fitted(self, ['X', '_Y'])
        try:
            check_is_fitted(self, ['_X_diag'])
        except NotFittedError:
            # Calculate diagonal of X
            self._X_diag = einsum('ij,ij->i', self.X, self.X)


        Y_diag = einsum('ij,ij->i', self._Y, self._Y)

        return self._X_diag, Y_diag

Now let’s solve a classification problem on the “DD” dataset, by following a standard procedure. First import and download the dataset

>>> from grakel import GraphKernel, datasets
>>> DD = datasets.fetch_dataset("DD", verbose=False)
>>> DD_data = DD.data
>>> DD_classes = DD.target

afterwards split train/test in a pseudo-random way holding

>>> from sklearn.model_selection import train_test_split
>>> X, Y, y_train, y_test = train_test_split(DD_data, DD_classes, test_size=0.1, random_state=42)

Note

For understanding why 42, was chosen as a random_state, check this.

now let’s calculate the K_train and K_test kernels as

>>> vh_kernel = vertex_histogram(normalize=True)
>>> K_train = vh_kernel.fit_transform(X)
>>> K_test = vh_kernel.transform(Y)

and finally classify, by using/finding the best C (that is a parameter emphasizing to the SVM, how much you want to avoid misclassifying each training example)

>>> from sklearn.metrics import accuracy_score
>>> from sklearn.svm import SVC
>>> from numpy import arange

>>> C_grid = 10. ** arange(4,10,1) / len(DD.data)

>>> acc_score = 0
>>> for i in range(C_grid.shape[0]):
>>>     clf = SVC(kernel='precomputed', C = C_grid[i])
>>>     clf.fit(K_train, y_train)
>>>     y_pred = clf.predict(K_test)
>>>     acc_score = max(acc_score, round(accuracy_score(y_test, y_pred)*100, 2))

which produces an accuracy score close to the maximum 78.2% documented on [KGW16]

>>> print(acc_score, "%")
76,27 %

What if I don’t want to follow, this structure at all?

Although the basic methods (fit, fit_transform, transform, diagonal) are needed for the kernel pipeline, the kernel structure is not imposed, but proposed. The user can always write her/his methods, as long as they are coherent with some design specifications needed for a valid kernel, mainly:

  • Unified input support between all structures, while expecting input in a relevant way, that is accepting a Graph type-object or an iterable of at least one element, which position 0 should signify the graph Object, position 1 should signify the node labels and position 2 the edge labels.

  • Support of normalization in a valid way is imperative.

  • The method diagonal is used in order for nested-graph-kernels (such as Weisfeiler-Lehman) to be able to calculate a correct normalized output, so it is also imperative, to be valid. Its output should be supported both as a scalar and as a vector.

Getting more advanced

Note

This documentation entry is a preamble to the Contributing section and will evolve much more as users contribute to this project.

What if you do not want just to write your kernel, but fork the project in order to integrate an optimized version of your new kernel.

Wrapping C++ functions for the kernels to use

We have implemented a simple Cython extension for the use of C++ inside python kernel implementation. This package can be found inside grakel/kernels/_c_functions. There the user can find the following files.

grakel/kernels/_c_functions
                    ├── functions.cpp
                    ├── functions.pxd
                    ├── functions.pyx
                    ├── include
                    │      └── functions.hpp
                    └── src
                         └── *.cpp

In order to add a new function, what he/she should do is to add its source file inside src while including/implementing its definition found inside the include/functions.hpp. Afterwards by externing the function definition inside the library file functions.pxd, by importing it from include/functions.hpp you should add it inside the functions.pyx, file where the essential IO wraping of the C++ function is done inside Python. Finally you must add the address of new C++ you wrote on the _c_functions Extension found on setup.py, by adding it to the list given to the sources argument.

Parallelization

A basic infrastracture for utilizing parallelization possibilities of the kernel computation, was created inside the kernel class. The user define n_jobs for each kernel and the indexes of the kernel matrix are linearized and splitted to the effective number of jobs on which the pairwise operation is applied in parallel. We use the joblib library, as found inside the sklearn.externals. On certain frameworks (Weisfeiler Lehman, Hadamard Code) the strategy for applying parallel kernel calculation is different, where the task is the single kernel calculation of a kernel matrix from the base kernel. Parallelization is a feature we wanted to include in our library because it can give the opportunity to the user to increase the efficiency of kernel matrix computation on large dataset, although the induced overhead in the most cases seems not to worth it. The developer can either follow our road and call the initialization of the father method creating a joblib.Parallel object with a threading backend api (stored in _parallel) and use it inside her/his code anyway he wants or follow a different strategy that seems to have a computational advantage. In case someone wants to contribute in redisigning and extending the parallelization proccess he/she can see how in Contributing.

Bibliography

KGW16

Nils M Kriege, Pierre-Louis Giscard, and Richard Wilson. On Valid Optimal Assignment Kernels and Applications to Graph Classification. In Advances in Neural Information Processing Systems, 1623–1631. 2016.

SB15

Mahito Sugiyama and Karsten M. Borgwardt. Halting in Random Walk Kernels. In Advances in Neural Information Processing Systems, 1639–1647. 2015.