Loss function that computes the Kullback–Leibler divergence. Like CrossEntropy, it measures the error of representing one probability distribution (the true one) by another (the predicted one).

It is used in classification tasks.

The error function formula is:

KL(P || Q) = \int\limits_{R^d} p(x)\log{\frac{p(x)}{q(x)}}dx


P, Q - probability distributions of continuous random variables in R^d;
KL(P || Q) - Kullback-Leibler divergence for distributions P and Q;
p(x), q(x) - distribution densities of P and Q respectively.
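For a classifier, P and Q are probability vectors over a finite set of classes rather than densities on R^d, and the integral becomes a sum: KL(P || Q) = \sum_i p_i\log{\frac{p_i}{q_i}}. The pure-Python sketch below evaluates that discrete form with the natural logarithm. The function name `kl_divergence` is hypothetical and not part of PuzzleLib. It also shows that the divergence is zero for identical distributions and is not symmetric.

```python
import math


def kl_divergence(p, q):
    """Discrete KL(P || Q) = sum_i p_i * log(p_i / q_i), natural log.

    Terms with p_i == 0 contribute 0 by convention (0 * log 0 = 0).
    If q_i == 0 where p_i > 0, the divergence is infinite.
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            if q_i <= 0.0:
                return math.inf
            total += p_i * math.log(p_i / q_i)
    return total


p = [0.5, 0.5]
q = [0.25, 0.75]
print(kl_divergence(p, q))  # about 0.1438
print(kl_divergence(q, p))  # about 0.1308: KL is not symmetric
print(kl_divergence(p, p))  # 0.0 for identical distributions
```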

Connection with entropy and cross entropy:

KL(P || Q) = \int p(x)\log{\frac{p(x)}{q(x)}}dx = \int p(x)\log{p(x)}dx - \int p(x)\log{q(x)}dx = H(p, q) - H(p)


H(p) - entropy of the distribution P, H(p) = -\int p(x)\log{p(x)}dx;
H(p, q) - cross entropy of the distributions P and Q, H(p, q) = -\int p(x)\log{q(x)}dx.
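The identity KL(P || Q) = H(p, q) - H(p) can be checked numerically on discrete distributions. The sketch below uses pure Python; the helper names `entropy`, `cross_entropy` and `kl_divergence` are hypothetical illustrations, not PuzzleLib functions, and all logarithms are natural.

```python
import math


def entropy(p):
    # H(p) = -sum_i p_i * log(p_i), skipping zero-probability terms
    return -sum(p_i * math.log(p_i) for p_i in p if p_i > 0.0)


def cross_entropy(p, q):
    # H(p, q) = -sum_i p_i * log(q_i); assumes q_i > 0 wherever p_i > 0
    return -sum(p_i * math.log(q_i) for p_i, q_i in zip(p, q) if p_i > 0.0)


def kl_divergence(p, q):
    # KL(P || Q) = sum_i p_i * log(p_i / q_i)
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0.0)


p = [0.1, 0.6, 0.3]
q = [0.2, 0.5, 0.3]

# Both lines print the same value (up to floating-point rounding)
print(kl_divergence(p, q))
print(cross_entropy(p, q) - entropy(p))
```

Since H(p) does not depend on q, the gradient of KL(P || Q) with respect to the predicted distribution is the same as that of the cross entropy H(p, q); the two losses differ only by a constant offset.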


def __init__(self, maxlabels=None, normTarget=False):


Parameter | Allowed types | Description | Default
maxlabels | int | Index of the last possible class | None
normTarget | bool | Whether to normalize the target distribution | False


maxlabels - used for additional validation of loaded target labels: if the target labels contain values larger than the value passed in this argument, an error is raised;

normTarget - when this flag is set, the target tensor is normalized with the softmax function. That is, if the target tensor arrives in "raw" form with values x_i \in \mathbb{R}, then with the flag set x_i \in [0, 1] and \sum_{i=0}^N x_i = 1.
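The softmax normalization described for normTarget can be sketched for a single target row in pure Python. This is an illustration of the math only, not PuzzleLib's GPU implementation. That the library applies it row-wise, i.e. independently for each sample along the batch dimension, is an assumption. The `softmax` helper is hypothetical.

```python
import math


def softmax(row):
    # Subtracting the maximum avoids overflow in exp(); the result is unchanged
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


raw_target = [2.0, -1.0, 0.5, 3.0]  # arbitrary real values, x_i in R
norm_target = softmax(raw_target)
print(norm_target)       # every entry lies in [0, 1]
print(sum(norm_target))  # 1.0 up to floating-point rounding
```

Softmax preserves the ordering of the raw values, so the largest raw target still receives the largest probability.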


Necessary imports:

import numpy as np
from PuzzleLib.Backend import gpuarray
from PuzzleLib.Cost import KLDivergence


gpuarray is required to place the tensors in GPU memory.

Synthetic target and prediction tensors:

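# Predictions: raw scores for a batch of 10 samples over 10 classes
scores = gpuarray.to_gpu(np.random.randn(10, 10).astype(np.float32))
# Targets: raw (unnormalized) values, hence normTarget=True below
labels = gpuarray.to_gpu(np.random.randn(10, 10).astype(np.float32))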


Please remember that the first dimension of target and prediction tensors is the size of the batch.

Initializing the error function:

div = KLDivergence(normTarget=True)

Calculating the error and the gradient on the batch:

error, grad = div(scores, labels)
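To show what an (error, gradient) pair for this loss can look like, here is a pure-Python sketch under explicitly assumed conventions, which this page does not confirm for PuzzleLib: predictions are raw scores turned into probabilities by a row-wise softmax, each target row already sums to 1, and the loss is averaged over the batch (first) dimension. Under those assumptions the gradient of KL(target || softmax(scores)) with respect to each score is q_i - t_i, divided by the batch size. The function `kl_loss_and_grad` is hypothetical.

```python
import math


def softmax(row):
    # Numerically stable softmax over one sample
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def kl_loss_and_grad(scores, target):
    """Batch-mean KL(target || softmax(scores)) and its gradient w.r.t. scores.

    ASSUMPTIONS (not confirmed for PuzzleLib): row-wise softmax on scores,
    target rows sum to 1, loss averaged over the first (batch) dimension.
    """
    batch = len(scores)
    loss = 0.0
    grad = []
    for s_row, t_row in zip(scores, target):
        q = softmax(s_row)
        loss += sum(t * math.log(t / q_i) for t, q_i in zip(t_row, q) if t > 0.0)
        # dKL/ds_i = q_i - t_i when sum(t_row) == 1; scaled by 1/batch for the mean
        grad.append([(q_i - t) / batch for q_i, t in zip(q, t_row)])
    return loss / batch, grad


scores = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0]]
target = [softmax([0.3, 0.1, -0.2]), softmax([2.0, 0.0, 1.0])]
error, grad = kl_loss_and_grad(scores, target)
print(error)
print(grad)
```

Because both q and the target sum to 1, each row of this gradient sums to zero, which makes a convenient sanity check.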