IT ist kurios!

An even simpler neural net as the simplest neural net


Googling for the simplest neural net always brings up the XOR example, with a hidden layer implemented with two neurons. A short but detailed explanation can be found in The Nature of Code – Chapter 10. Neural Networks, written by Daniel Shiffman. The training sets work like this: input (0,1) -> output (1), input (1,1) -> output (0), as in 1 XOR 1 = 0, and so on.
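As a minimal sketch, the four XOR training sets can be written down as plain C data. The names `xor_inputs`, `xor_outputs`, `NUM_XOR_SETS` and the checker `xor_ref` are hypothetical and not taken from Shiffman's or Becerra's code. The checker only confirms that each target equals the bitwise XOR of its two inputs.

```c
#include <assert.h>

/* XOR truth table: two inputs per row, one target output per row.
   Names are illustrative, not taken from any of the cited sources. */
#define NUM_XOR_SETS 4

static const double xor_inputs[NUM_XOR_SETS][2] = {
    {0.0, 0.0}, {0.0, 1.0}, {1.0, 0.0}, {1.0, 1.0}
};
static const double xor_outputs[NUM_XOR_SETS] = {0.0, 1.0, 1.0, 0.0};

/* Reference XOR on 0/1 inputs, used to cross-check the table above. */
static int xor_ref(int a, int b) {
    assert((a == 0 || a == 1) && (b == 0 || b == 1));
    return a ^ b;
}
```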

Searching for a “c simple neural net” may bring up, at the top, an article by Santiago Becerra on Towards Data Science: “Simple neural network implementation in C”. Santiago provides the code as a GitHub Gist (which is C++ rather than plain C). There are plenty of other articles about implementations of Rosenblatt's original description of the perceptron, for example in Python, Haskell, Ada, and x86 assembler.

The C++ code is straightforward and only addresses the XOR case, but it is easily adaptable to different layouts and input types. Most parts of the code are explained in detail in the article, even where the code differs. Only one issue can be spotted in the listing: there is no random seed, so the results are the same on every run. A simple initial srand(time(NULL)) before calling the init_weight function will provide different weights on every run.
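A minimal sketch of the fix, assuming `init_weight()` returns a uniform value in [0, 1]. The exact body in the gist is not reproduced here. The helpers `seed_rng` and `init_weights` are hypothetical. The C standard specifies that calling `rand()` without a prior `srand()` produces the same sequence as `srand(1)`, which is why unseeded runs repeat.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Uniform random weight in [0, 1]. Mirrors the idea of the gist's
   init_weight(); the exact body there is an assumption. */
double init_weight(void) {
    return (double)rand() / (double)RAND_MAX;
}

/* Seed rand() once at program start. Without this call, rand() behaves
   as if srand(1) had been called, so every run draws identical weights. */
void seed_rng(void) {
    srand((unsigned)time(NULL));
}

/* Fill n weights; seed_rng() is expected to have run before this. */
void init_weights(double *w, size_t n) {
    assert(w != NULL);
    for (size_t i = 0; i < n; i++)
        w[i] = init_weight();
}
```

Seeding is kept separate from drawing on purpose. Calling `srand(time(NULL))` repeatedly within the same second would restart the same sequence.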

Q: Is the original description of a perceptron really the simplest neural net?

A: No (and Yes).

The XOR example of a perceptron expects two inputs and produces one output. As usual, the first (and only) hidden layer is implemented with two neurons (nodes). To adapt the example code to different types of input, the hidden layer can be changed to have more, or even fewer, nodes. A use case with more input attributes could be sensor readings of pressure and temperature together with a timestamp.

But it is also possible to think of scenarios with only a single input value. For simplicity, an example can be implemented where the output is the sine of the input. With one input and one hidden node, the training kernel then looks like this:

for (int n=0; n < EPOCHES; n++) {
  for (int i=0; i<NUMTRAININGSSETS; i++) {
    // Starting training (forward pass)
    hiddenLayer = sigmoid(hiddenLayerBias + (training_inputs[i] * hiddenWeights));
    outputLayer = sigmoid(outputLayerBias + (hiddenLayer * outputWeights));
    // Starting backpropagation (dSigmoid takes the sigmoid's output)
    double deltaOutput = (training_outputs[i] - outputLayer) * dSigmoid(outputLayer);
    double deltaHidden = (deltaOutput * outputWeights) * dSigmoid(hiddenLayer);
    outputLayerBias += deltaOutput * lr;
    outputWeights += hiddenLayer * deltaOutput * lr;
    hiddenLayerBias += deltaHidden * lr;
    hiddenWeights += training_inputs[i] * deltaHidden * lr;
  }
}

In this example the activation function is a sigmoid. Because the input dimension is reduced to one, all inner loops can be removed, and the flow becomes much more readable.

With a training set of only ten inputs (from 0.1 to 1.0) and 10,000 epochs, the prediction error ranges from 0.1% to 13%. That is always far better than the expected 50%.
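The post does not say how the percentages were measured. One plausible reading, assumed here, is the relative error per training input: |prediction − sin(x)| / |sin(x)|. The helpers `rel_error_pct` and `rel_error_range` are hypothetical and would report the smallest and largest value over the ten inputs.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Relative error of one prediction, in percent. Assumes target != 0,
   which holds for sin(x) with x in 0.1 .. 1.0. */
static double rel_error_pct(double pred, double target) {
    assert(target != 0.0);
    return fabs(pred - target) / fabs(target) * 100.0;
}

/* Smallest and largest relative error over n samples. */
static void rel_error_range(const double *pred, const double *target, size_t n,
                            double *min_pct, double *max_pct) {
    assert(pred && target && min_pct && max_pct && n > 0);
    *min_pct = *max_pct = rel_error_pct(pred[0], target[0]);
    for (size_t i = 1; i < n; i++) {
        double e = rel_error_pct(pred[i], target[i]);
        if (e < *min_pct) *min_pct = e;
        if (e > *max_pct) *max_pct = e;
    }
}
```

Under this measure, errors are largest where sin(x) is small. The same absolute deviation weighs ten times more at x = 0.1 than near x = 1.0.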

If this works, the question is whether a one-dimensional hidden layer makes this an even simpler (the simplest) neural network, or no neural network at all. This is certainly only a philosophical question, as the definition of a generic network cannot be proven per se.

A potential use case for such a neural net could be replacing lookup tables (LUTs) in embedded devices. LUTs are often used to provide functions like sine on systems that do not implement them in hardware. The simple net implementation would only need to store the four network values (two weights and two biases). However, each evaluation is certainly significantly more expensive than a LUT lookup.

Q: What other implications come from this proof of concept?

A: Don’t know, but …

As this implementation was part of an effort to find a generic way to parallelise NN kernel code on accelerator hardware such as GPUs and FPGAs, one interesting question is whether the whole net could be viewed as stacked pipelines. The forward pass and the backpropagation would then be the only points where the stacks depend on one another. Such an architecture could, first of all, speed up training massively. But, probably more important, it could also make training more predictable and give more insight into the state and behaviour of the net while it is being trained.

A second interesting idea: what could an architecture look like that reduces all nodes and layers of a net to black boxes, each realised with such a simple “net”? This would provide many more degrees of freedom for the layout design of a neural net, possibly even leading to a dynamic layout for the net itself (like place and route in hardware design).

Finally, a really silly idea would be to use such a “net” as an entropy source, where some or all weights are fed by a chain of similar networks.




Written by qrios

October 30th, 2022 at 1:57 pm

Posted in code,ml
