# KLDivergence¶

## Description¶

The loss function that calculates the Kullback–Leibler divergence. Like CrossEntropy, it measures the error of approximating one probability density (the real one) by another (the predicted one).

It is used in classification tasks.

The error function formula is:

KL(P || Q) = \int\limits_{R^d} p(x)\log{\frac{p(x)}{q(x)}}dx

where

$P, Q$ - continuous random variables in the $R^d$ space;
$KL(P || Q)$ - Kullback-Leibler divergence for distributions P and Q;
$p(x), q(x)$ - distribution densities of $P$ and $Q$ respectively.

Connection with entropy and cross entropy:

KL(P || Q) = \int p(x)\log{\frac{p(x)}{q(x)}}dx = \int p(x)\log{p(x)}dx - \int p(x)\log{q(x)}dx = H(p, q) - H(p)

where

$H(p)$ - entropy of the distribution $P$;
$H(p, q)$ - cross entropy of the distributions $P$ and $Q$.
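For discrete distributions, the identity above can be checked numerically. A minimal NumPy sketch (the distributions `p` and `q` are arbitrary illustrative values):

```python
import numpy as np

# Two discrete probability distributions over 4 outcomes
p = np.array([0.1, 0.4, 0.3, 0.2])
q = np.array([0.25, 0.25, 0.25, 0.25])

# Direct KL divergence: sum over p * log(p / q)
kl = np.sum(p * np.log(p / q))

# Entropy H(p) = -sum p log p; cross entropy H(p, q) = -sum p log q
entropy = -np.sum(p * np.log(p))
cross_entropy = -np.sum(p * np.log(q))

# KL(P || Q) = H(p, q) - H(p)
assert np.isclose(kl, cross_entropy - entropy)
```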

## Initializing¶

def __init__(self, maxlabels=None, normTarget=False):


Parameters

| Parameter | Allowed types | Description | Default |
|---|---|---|---|
| maxlabels | int | Index of the last possible class | None |
| normTarget | bool | Whether to normalize the target distribution | False |

Explanations

maxlabels - needed for additional verification when working with loaded target labels: if the target labels contain values larger than the value passed in this argument, the class will raise an error;

normTarget - when this flag is set, the values of the target tensor will be normalized by the softmax function. That is, if the target tensor is received in "raw" form with values $x_i\in{R}$, then with the flag set: $x_i\in[0, 1]$, $\sum_{i=0}^N x_i = 1$.
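The effect of normTarget can be illustrated with a plain softmax in NumPy. This is a sketch of the math only, not the library's internal code:

```python
import numpy as np

def softmax(x):
    # Subtract the row-wise max for numerical stability
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

# "Raw" target values, arbitrary reals
raw = np.array([[-1.0, 0.0, 2.5], [0.3, 0.3, 0.3]], dtype=np.float32)
target = softmax(raw)

# Each row now lies in [0, 1] and sums to 1
```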

## Examples¶

Necessary imports:

import numpy as np
from PuzzleLib.Backend import gpuarray
from PuzzleLib.Cost import KLDivergence


Info

gpuarray is required to properly place the tensor on the GPU

Synthetic target and prediction tensors:

scores = gpuarray.to_gpu(np.random.randn(10, 10).astype(np.float32))
labels = gpuarray.to_gpu(np.random.randn(10, 10).astype(np.float32))


Important

Please remember that the first dimension of target and prediction tensors is the size of the batch.

Initializing the error function:

div = KLDivergence(normTarget=True)


Calculating the error and the gradient on the batch:

error, grad = div(scores, labels)
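The GPU result can be cross-checked against a plain NumPy computation of the batch-averaged KL divergence. The sketch below illustrates the math (softmax applied to both tensors, matching the normTarget=True semantics for the labels); it is an assumption that the library uses exactly this reduction and is not guaranteed to match its output bit-for-bit:

```python
import numpy as np

def softmax(x):
    # Row-wise softmax with max subtraction for stability
    e = np.exp(x - np.max(x, axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

scores = np.random.randn(10, 10).astype(np.float32)
labels = np.random.randn(10, 10).astype(np.float32)

q = softmax(scores)  # predicted distribution per sample
p = softmax(labels)  # target distribution (normTarget=True semantics)

# Per-sample KL divergence, averaged over the batch
kl_per_sample = np.sum(p * (np.log(p) - np.log(q)), axis=1)
error = kl_per_sample.mean()
```

Since both `p` and `q` are valid probability distributions, `error` is always non-negative.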