# Restoration of a nonlinear function by a neural network with activation¶

Take a look on GitHub

### Neural network with nonlinearity: theory¶

As we have shown in the previous tutorial, it is impossible to restore a nonlinear function with a neural network not containing nonlinearity. We will have to use the activation of neurons discussed in the Restoration of linear function by a single neuron to fix this:

Figure 1. Network diagram with one hidden layer and activation

where

$x_1 .. x_m$ - network input;

$z_1 .. z_n$ - hidden layer neurons;

$h_1 .. h_n$ - activation of the respective neurons of the hidden layer via the $f$ function;

$y$ - network output;

$w_{ij}^{(l)}$ - weight of appropriate connections between neurons;

$l$ - layer number, $i$ - neuron on $l + 1$ layer, $j$ - neuron on $l$ layer;

$b_i^{(l)}$ - bias on the respective layers.

We will take the network from the previous tutorial with three inputs and three neurons on the hidden layer. Then the only difference is activation after the hidden layer:

y = \sum_{j=1}^{n} w_{1j}^{(2)}h_j + b_1^{(2)} = W^{(2)}h + b^{(2)} = w_{11}^{(2)}h_1 + w_{12}^{(2)}h_2 + w_{13}^{(2)}h_3 + b_{1}^{(2)}

where $h_j = f(z_j)$.

The gradient calculation for the parameter will change accordingly

$$\frac{\partial{J}}{\partial{w_{11}^{(1)}}} = \frac{\partial{J}}{\partial{y}} \frac{\partial{y}}{\partial{h_1}} \frac{\partial{h_1}}{\partial{z_1}} \frac{\partial{z_1}}{\partial{w_{11}^{(1)}}}$$

We take ReLU as an activation function, where:

$$\begin{matrix} \ f'(x) & = & \left\{ \begin{matrix} 0, & \mbox{if } x < 0 \\ 1, & \mbox{if } x \geq 0 \end{matrix} \right. \end{matrix}$$

### Neural network with nonlinearity: code¶

Start of the script:

import numpy as np
import matplotlib.pyplot as plt

def show(x, y, pred=None, title=None):
plt.plot(x, y, linewidth=5, color="black", antialiased=True, label="True values")

if pred is not None:
plt.plot(x, pred, linewidth=2, color="orange", antialiased=True, label="Predicted values")
if title is not None:
plt.title(f'{title}')
plt.legend()
plt.show()

def showSubplots(x, y, *args, title=None):
fig = plt.figure(1, figsize=[10.4, 4.8])

for i, arg in enumerate(args):
i += 1

ax.plot(x, y, linewidth=5, color="black", antialiased=True, label="True values")

yp = arg["y"]
name = arg["name"]
color = arg["color"]

ax.plot(x, yp, linewidth=2, color=color, antialiased=True, label="{}".format(name))

ax.legend()
if title is not None:
fig.suptitle(f'{title}')
fig.show()


Take classes Linear and Net from tutorial Restoring a linear function by a neural network:

class Linear:
def __init__(self, insize, outsize, name=None):
self.w = np.random.randn(insize, outsize)
self.b = np.zeros((outsize, ))

self.inData = None
self.data = None

self.name = name

def __call__(self, data):
return self.forward(data)

def forward(self, data):
self.inData = data
self.data = np.dot(data, self.w) + self.b

return self.data

return delta

def update(self, lr=0.1):
self.w -= np.dot(self.inData.T, self.grad) * lr


class Error:
@staticmethod
def value(true, pred):
return 0.5 * np.mean((true - pred) ** 2)

@staticmethod
c = 1 / np.prod(true.shape)
return -(true - pred) * c

class Net:
def __init__(self):
self.layers = []

def __call__(self, data):
return self.predict(data)

def __getitem__(self, item):
return self.layers[item]

def append(self, layer):
self.layers.append(layer)
for layer in self.layers[::-1]:

def update(self, lr):
for layer in self.layers:
layer.update(lr)

def predict(self, data):
for layer in self.layers:
data = layer.forward(data)

return data

def optimize(self, data, target, lr):
prediction = self(data)

#print("Simple net error {}".format(Error.value(target, prediction)))

self.update(lr)

X = np.linspace(-3, 3, 1024, dtype=np.float32).reshape(-1, 1)

def func(x):
from math import sin
return 2 * sin(x) + 5

f = np.vectorize(func)
Y = f(X)


A new class - Activation is added. It implements ReLU activation:

class Activation:
def __init__(self, name="relu"):
self.name = name

self.inData = None
self.data = None

def forward(self, data):
self.inData = data

self.data = data * (data > 0)

return self.data

def update(self, lr):
pass


### Network training¶

We introduce a new trick for a nonlinear function - data mixing. We used to form a batch from values that follow each other in a sequence, but now a batch can contain points from different ends of the function definition area:

def trainNet(size, steps=1000, batchsize=10, learnRate=1e-2):
np.random.seed(1234)

net = Net()
net.append(Linear(insize=1, outsize=size, name="layer_1"))
net.append(Activation(name="act_layer"))
net.append(Linear(insize=size, outsize=1, name="layer_2"))

predictedBT = net(X)

XC = X.copy()
perm = np.random.permutation(XC.shape[0])
XC = XC[perm, :]

for i in range(steps):
idx = np.random.randint(0, 1000 - batchsize)
x = XC[idx:idx + batchsize]
y = f(x).astype(np.float32)

net.optimize(x, y, learnRate)

predictedAT = net(X)

showSubplots(
X,
Y,
{
"y": predictedBT,
"name": "Net results before training",
"color": "orange"
},
{
"y": predictedAT,
"name": "Net results after training",
"color": "orange"
}
)

trainNet(5, steps=1000, batchsize=100)


Figure 2. Comparison of network results (5 neurons in a hidden layer) before and after the training

As you can see, the network intends to describe our function, and each neuron in the hidden layer is responsible for its own section of the curve. It would be a logical step to try increasing the number of neurons:

trainNet(100, steps=1000, batchsize=100)


Figure 3. Comparison of network results (100 neurons in a hidden layer) before and after the training

Looks better, but still not accurate. We could increase the number of steps to improve the quality, but that would take a too long, so it is time to use some more powerful tools of the PuzzleLib library.

### Implementing the library tools¶

Please mind that this time we train the network not by steps (running a single batch), but by epochs (running the entire dataset), and we also pick a more advanced optimizer:

def trainNetPL(size, epochs, batchsize=10, learnRate=1e-2):
from PuzzleLib.Modules import Linear, Activation
from PuzzleLib.Modules.Activation import relu
from PuzzleLib.Containers import Sequential
from PuzzleLib.Optimizers import MomentumSGD
from PuzzleLib.Cost import MSE
from PuzzleLib.Handlers import Trainer
from PuzzleLib.Backend.gpuarray import to_gpu

np.random.seed(1234)

net = Sequential()
net.append(Linear(insize=1, outsize=size, initscheme="gaussian"))
net.append(Activation(activation=relu))
net.append(Linear(insize=size, outsize=1, initscheme="gaussian"))

predictedBT = net(to_gpu(X)).get()

cost = MSE()
optimizer = MomentumSGD(learnRate)
optimizer.setupOn(net)

trainer = Trainer(net, cost, optimizer, batchsize=batchsize)

show(X, Y, net(to_gpu(X)).get())

XC, YC = X.copy(), Y.copy()
perm = np.random.permutation(XC.shape[0])
XC = XC[perm, :]
YC = YC[perm, :]

for i in range(epochs):
trainer.trainFromHost(XC.astype(np.float32), YC.astype(np.float32), macroBatchSize=1000,
onMacroBatchFinish=lambda train: print("PL module error: %s" % train.cost.getMeanError()))

net.evalMode()

predictedAT = net(to_gpu(X)).get()

showSubplots(
X,
Y,
{
"y": predictedBT,
"name": "PL net results before training",
"color": "orange"
},
{
"y": predictedAT,
"name": "PL net results after training",
"color": "orange"
}
)


Figure 4. Comparison of PL network results before and after the training

## Conclusion¶

We can conclude that any framework for deep learning is generally a library for automatic differentiation of computational graphs, no magic involved here. We hope that by filling the gaps in understanding the work of neural networks we managed to help the reader get more confident, so you could perhaps even participate in the development of the PuzzleLib library by writing your own modules.