I would never have gone this deep into how neural networks work without my research advisor. From our first meeting, she encouraged me to understand how they work and how to implement them in practice. I was amazed by what I learned about how neural networks actually work, and so was the friend I explained this concept to. It is a fairly simple concept, but there is so much distraction around it. What makes this blog interesting is that it includes code and visualizations, so you can see what a neural network does during the forward pass.
You might already be familiar with the following cool-looking image of a neural network.
You may also know this image of a single neuron, called a perceptron.
These diagrams of a neural network and a single neuron are easy to look at, but they make it hard to understand what is happening under the hood.
This is the core concept, and understanding it builds a strong base for further deep learning concepts 🔥.
So, what do neural networks do?
Well, it's very simple: a neural network fits a squiggle to data, and by the end of this blog you will see how. It can fit a squiggle to the simplest possible dataset as well as to complex real-world datasets.
Let's walk through an example dataset to see how a neural network works on it. Suppose we have a drug that was designed to treat an illness.
We gave the drug to 3 different groups of people at 3 different dosages: low, medium, and high.
Here 0 means not effective and 1 means effective. Low and high dosages are not effective, while medium dosages are effective.
Now, based on the collected dataset, we want to predict whether a future dosage will be effective.
Neural Networks
Neural networks consist of nodes and connections between nodes.
Parameters: The numbers on the connections are parameter values, called weights and biases, that were estimated when the neural network was trained on the dataset.
Backpropagation: Initially, the neural network's parameter values are unknown (not yet optimized for the data). They are then optimized with a method called backpropagation, usually paired with gradient descent, so that the network fits a squiggle to the data.
NOTE: For this blog, assume the parameters have already been optimized for this specific dataset, meaning backpropagation has already been applied.
Activation Functions
As you can see, there are two curved graphs. These curved or bent lines are the building blocks for fitting a squiggle to data. These identical curves can be reshaped by the parameter values and added together to get a squiggle that fits the data. This curved or bent line is called an activation function.
There are many common activation functions to choose from, such as ReLU, SoftPlus, Tanh, and Sigmoid. When building a neural network, you decide which activation function (or functions) to use.
In this blog, I use the SoftPlus activation function.
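For comparison, the four activation functions named above can be written as plain Python functions. This is a minimal sketch; the function names are my own and are not taken from any particular library.

```python
import math

def relu(x: float) -> float:
    # ReLU: 0 for negative inputs, the input itself otherwise
    return max(0.0, x)

def softplus(x: float) -> float:
    # SoftPlus: log(1 + e^x), where math.log is the natural logarithm
    return math.log(1.0 + math.exp(x))

def tanh(x: float) -> float:
    # Hyperbolic tangent: squashes inputs into (-1, 1)
    return math.tanh(x)

def sigmoid(x: float) -> float:
    # Sigmoid: squashes inputs into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

for x in [-2.0, 0.0, 2.0]:
    print(f"x={x:+.1f}  relu={relu(x):.3f}  softplus={softplus(x):.3f}  "
          f"tanh={tanh(x):.3f}  sigmoid={sigmoid(x):.3f}")
```

At x = 0, SoftPlus gives log 2 ≈ 0.693 while ReLU gives 0; SoftPlus behaves like a smooth version of ReLU. Note that this naive SoftPlus raises `OverflowError` for very large inputs (above roughly 709), which is not a problem for the values in this blog.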
Figure 5 shows a simple neural network with a single input node, a single output node, and only two nodes between them.
More complex neural networks can have more than one input node, more than one output node, and several layers of nodes between the input and output nodes, called hidden layers. The network in figure 1 has two hidden layers.
NOTE: When building a neural network, you have to decide how many hidden layers you want and how many nodes each layer contains.
A practical approach is to make a guess, see how well it performs, and add more layers or nodes if needed.
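To see how the number of hidden layers and nodes fits into the same computation, here is a minimal sketch of a general forward pass in plain Python. The layer representation (weight rows plus biases), the function names, and the weights of the `deeper` network are hypothetical and chosen only for illustration; only the `figure5` numbers come from this blog. As in figure 5, hidden layers apply SoftPlus and the output layer is a plain weighted sum plus bias.

```python
import math
from typing import List, Tuple

# Hypothetical representation: each layer is (weights, biases), where
# weights[j][i] connects input i to node j of that layer.
Layer = Tuple[List[List[float]], List[float]]

def softplus(x: float) -> float:
    return math.log(1.0 + math.exp(x))

def forward(layers: List[Layer], x: List[float]) -> List[float]:
    """Run inputs through every layer; hidden layers use SoftPlus, the last layer is linear."""
    for index, (weights, biases) in enumerate(layers):
        # Weighted sum of the previous layer's outputs plus a bias, for every node
        z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
        is_output_layer = index == len(layers) - 1
        x = z if is_output_layer else [softplus(v) for v in z]
    return x

# Figure 5: 1 input node -> 1 hidden layer with 2 nodes -> 1 output node
figure5 = [
    ([[-34.4], [-2.52]], [2.14, 1.29]),  # w1, w2 and b1, b2
    ([[-1.30, 2.28]], [-0.58]),          # w3, w4 and b3
]

# Made-up, untrained network with two hidden layers of 3 nodes each
deeper = [
    ([[0.5], [-1.0], [2.0]], [0.0, 0.1, -0.2]),
    ([[1.0, 0.0, -1.0], [0.3, 0.3, 0.3], [-0.5, 1.0, 0.5]], [0.0, 0.0, 0.0]),
    ([[1.0, -1.0, 0.5]], [0.0]),
]

print(forward(figure5, [0.5]))
print(forward(deeper, [0.5]))
```

Adding a hidden layer or a node only means adding a `(weights, biases)` entry or a row; the forward pass itself does not change.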
Working of Neural Network
Now let's feed inputs to the neural network shown in figure 5 and see what each operation does, how the activation function works, and what we get at the end.
For this example, 0 represents a low dosage and 1 a high dosage. We will use the same parameter values shown in figure 5, which are already optimized for this data. I encourage you to look at figure 5 and the activation function figure alongside the code.
inputs = [0.0, 0.1, 0.4, 0.5, 0.9, 1.0]   # dosages (x-axis values)
outputs = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]  # effectiveness: 0 = not effective, 1 = effective (y-axis values)
Top Node
Starting with the top node, take the first input and plug it into the equation: multiply the input by weight w1 (-34.4) and then add bias b1 (2.14). The result is the x-axis value for the top activation function.
# Equation (input * w1)+b1 = x-axis value for top activation function
(0.0*-34.4)+2.14
# Output
2.14
Applying this equation to all the inputs gives the following x-axis values for the top activation function.
# x-axis values, i.e. the inputs to the top activation function
[2.14, -1.2999999999999998, -11.62, -15.059999999999999, -28.82, -32.26]
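The calculation above is shown only for the first input. A list comprehension is one way to produce all six x-axis values at once; this is a minimal sketch, and the names `w1`, `b1` and `x_top` are my own labels for the figure 5 values.

```python
# Dosages from the dataset above
inputs = [0.0, 0.1, 0.4, 0.5, 0.9, 1.0]

# Top-node parameters taken from figure 5
w1, b1 = -34.4, 2.14

# (input * w1) + b1 for every dosage: the x-axis values of the top activation function
x_top = [(x * w1) + b1 for x in inputs]
print(x_top)
```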
As mentioned, I use the SoftPlus activation function here.
Its equation is f(x) = log(1 + e^x), where log is the natural logarithm.
Feeding these values to the SoftPlus activation function gives the corresponding y-axis coordinates as output.
# e.g. for the 1st x-axis coordinate
from math import e, log
log(1 + e**2.14)
# output
2.2512325998949305  # corresponding y-axis coordinate, i.e. the output of the activation function
Applying the SoftPlus activation function to all the x-axis values we generated gives the following y-axis output values.
[2.2512325998949305,
0.24100845383299216,
8.984546677413703e-06,
2.8808791465998513e-07,
3.044231533521716e-13,
9.76996261670133e-15]
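The list above can be reproduced with a small SoftPlus function applied to the six x-axis values of the top node. This is a minimal sketch; the names `softplus`, `x_top` and `y_top` are my own, while `math.log1p` and `math.exp` come from Python's standard library.

```python
import math

def softplus(x: float) -> float:
    """SoftPlus: f(x) = log(1 + e^x), using the natural logarithm."""
    # math.log1p(y) computes log(1 + y) accurately when y is tiny,
    # which matters here because e^x is very small for large negative x.
    return math.log1p(math.exp(x))

# x-axis values of the top activation function, copied from above
x_top = [2.14, -1.2999999999999998, -11.62, -15.059999999999999, -28.82, -32.26]

y_top = [softplus(x) for x in x_top]
print(y_top)
```

Because `math.log1p` is used instead of `log(1 + e**x)`, the tiniest outputs (around 1e-13 and 1e-15) may differ slightly from the values listed above, while the larger values agree.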
Plotting these activation function outputs gives the following image.
Next, we scale these outputs by multiplying each value by weight w3 (-1.30), which gives the scaled values:
scaled_top = [i * -1.30 for i in y_top]  # y_top: the SoftPlus outputs listed above
# output
[-2.92660237986341,
-0.3133109899828898,
-1.1679910680637815e-05,
-3.745142890579807e-07,
-3.9575009935782306e-13,
-1.270095140171173e-14]
Plotting these scaled values gives the orange squiggle.
Bottom Node
Now let's feed the inputs to the bottom node. Take the first input and plug it into the bottom node of the neural network shown in figure 5: multiply the input by weight w2 (-2.52) and add bias b2 (1.29) to get the x-axis value for the bottom activation function.
The equation is (input * w2) + b2 = x-axis value for the bottom activation function.
# Applying to 1st input value
(0.0*-2.52)+1.29
# Output
1.29
Applying this equation to all the inputs gives the following x-axis values for the bottom activation function.
# x-axis values, i.e. the inputs to the bottom activation function
[1.29, 1.038, 0.28200000000000003, 0.030000000000000027, -0.9780000000000002, -1.23]
Feeding these values to the SoftPlus activation function gives the y-axis coordinates for the bottom node.
# e.g. for the 1st x-axis coordinate
from math import e, log
log(1 + e**1.29)
# output
1.533158534955108  # corresponding y-axis coordinate, i.e. the output of the activation function
In the same way, applying the SoftPlus activation function to all the x-axis values gives the following y-axis values.
[1.533158534955108, 1.3411830334097958, 0.8440549162893263, 0.7082596763414484, 0.31922613976831005, 0.25641783303708743]
Plotting these activation function outputs gives the green squiggle in the following image.
Next, we scale these outputs by multiplying each value by weight w4 (2.28) to get the scaled values:
scaled_bottom = [i * 2.28 for i in y_bottom]  # y_bottom: the SoftPlus outputs listed above
# output
[3.495601459697646,
3.0578973161743344,
1.9244452091396638,
1.6148320620585022,
0.7278355986717469,
0.5846326593245593]
Plotting these scaled values gives the red squiggle.
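Both nodes follow the same three steps (weighted input plus bias, SoftPlus, scale by the outgoing weight), so one helper can reproduce the orange and red squiggle values. This is a minimal sketch; the name `hidden_node` is hypothetical, and the numbers passed in are the figure 5 parameters.

```python
import math

def softplus(x: float) -> float:
    # log(1 + e^x), written the same way as in the examples above
    return math.log(1.0 + math.exp(x))

def hidden_node(dosage: float, w_in: float, b_in: float, w_out: float) -> float:
    """One hidden node of figure 5: (dosage * w_in) + b_in -> SoftPlus -> scale by w_out."""
    return w_out * softplus((dosage * w_in) + b_in)

inputs = [0.0, 0.1, 0.4, 0.5, 0.9, 1.0]

scaled_top = [hidden_node(x, -34.4, 2.14, -1.30) for x in inputs]    # orange squiggle (w1, b1, w3)
scaled_bottom = [hidden_node(x, -2.52, 1.29, 2.28) for x in inputs]  # red squiggle (w2, b2, w4)
print(scaled_top)
print(scaled_bottom)
```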
Now two operations are left to get the final output.
First, sum the scaled values of the top and bottom nodes, i.e. add the orange and red squiggles together:
sum_two = [top + bottom for top, bottom in zip(scaled_top, scaled_bottom)]
# output
[0.5689990798342359,
2.7445863261914445,
1.9244335292289831,
1.614831687544213,
0.7278355986713511,
0.5846326593245467]
Plotting these values gives the purple squiggle shown in the image below.
However, the purple squiggle sits too high, so we shift it down by adding bias b3 (-0.58). The result is our final squiggle that fits the data 🔥.
final = [s - 0.58 for s in sum_two]  # add bias b3 (-0.58)
# output
[-0.011000920165764039,
2.1645863261914444,
1.3444335292289833,
1.0348316875442132,
0.1478355986713511,
0.004632659324546706]
After adding bias b3 to the sum, you get the final blue squiggle 🔥 shown in the figure below.
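Putting all the steps together, the whole forward pass of the figure 5 network fits in one short function. This is a minimal sketch; the name `predict` is my own, and the parameter values are the ones from figure 5.

```python
import math

def softplus(x: float) -> float:
    return math.log(1.0 + math.exp(x))

def predict(dosage: float) -> float:
    """Forward pass of the figure 5 network with its (already optimized) parameters."""
    w1, b1, w2, b2 = -34.4, 2.14, -2.52, 1.29  # input -> hidden nodes
    w3, w4, b3 = -1.30, 2.28, -0.58            # hidden nodes -> output
    top = w3 * softplus(dosage * w1 + b1)      # orange squiggle
    bottom = w4 * softplus(dosage * w2 + b2)   # red squiggle
    return top + bottom + b3                   # purple squiggle shifted by b3: the blue squiggle

inputs = [0.0, 0.1, 0.4, 0.5, 0.9, 1.0]
final = [predict(x) for x in inputs]
print(final)
```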
Now let's feed test data to the neural network and see what value we get on the squiggle. Giving 0.5 (a medium dosage) as input and applying all the operations above, we get 1.0348316875442132 as output.
Plotting this output value on the final squiggle shows that it is close to 1, meaning the dosage is effective 🔥.
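To turn the squiggle's output into a yes/no answer, one option is to compare it against a cut-off. This is a minimal sketch; the 0.5 threshold and the helper names `predict` and `is_effective` are my own assumptions, not part of the network in figure 5.

```python
import math

def softplus(x: float) -> float:
    return math.log(1.0 + math.exp(x))

def predict(dosage: float) -> float:
    # Figure 5 network: two SoftPlus nodes, scaled by w3/w4, summed, plus bias b3
    top = -1.30 * softplus(dosage * -34.4 + 2.14)
    bottom = 2.28 * softplus(dosage * -2.52 + 1.29)
    return top + bottom - 0.58

def is_effective(dosage: float, threshold: float = 0.5) -> bool:
    # Assumption: the 0.5 cut-off is my own choice, not something learned by the network
    return predict(dosage) >= threshold

for dosage in [0.0, 0.5, 1.0]:  # low, medium, high dosage
    print(dosage, round(predict(dosage), 4), is_effective(dosage))
```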
Conclusion
- The weights and biases (the parameters) are initially unknown and are optimized using backpropagation.
- These parameters slice, flip, and stretch the activation function into new shapes.
- Combining these intermediate squiggles gives the final squiggle.
- Neural networks are big, fancy squiggle-fitting machines that can fit a squiggle to any data.
So this is how a neural network works: it fits a squiggle to the data and uses it to predict the output.
Reference
♥ Thanks to Josh Starmer for making this concept really simple and interesting: https://youtu.be/CqOfi41LfDw?si=AtCnAADIY2AwgrSz