
The neural network model
A neural network model is similar to the preceding logistic regression model; the only difference is the addition of hidden layers between the input and output layers. Let's consider a single-hidden-layer neural network for classification to understand the process, as shown in the following diagram:

Here, Layer 0 is the input layer, Layer 1 is the hidden layer, and Layer 2 is the output layer. This is also known as a two-layer neural network, owing to the fact that when we count the number of layers in a neural network, we don't consider the input layer as the first layer. Thus, the input layer is considered Layer 0, and the successive layers get the notation Layer 1, Layer 2, and so on.
Now, a basic question comes to mind: why are the layers between the input and output layers termed hidden layers?
This is because the values of the nodes in the hidden layers are not present in the training set. As we have seen, two calculations happen at every node. These are:
Aggregation of the input signals from the previous layer
Subjecting the aggregated signal to an activation function to create deeper inner representations, which in turn are the values of the corresponding hidden nodes (a minimal sketch of both steps follows this list)
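To make these two steps concrete, here is a minimal sketch of a single node's computation in NumPy, assuming a sigmoid activation; all the values and variable names are illustrative, not from the text:

```python
import numpy as np

def sigmoid(z):
    # Squashes the aggregated signal into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values: three input signals, their weights, and a bias
x = np.array([0.5, -1.2, 3.0])   # input signals from the previous layer
w = np.array([0.4, 0.3, -0.2])   # weights into this node
b = 0.1                          # bias term

z = np.dot(w, x) + b             # step 1: aggregation of the input signals
a = sigmoid(z)                   # step 2: activation -> the node's value
print(a)
```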
Referring to the preceding diagram, we have three input features, $x_1$, $x_2$, and $x_3$. The node showing the value 1 is regarded as the bias unit. Each layer, except the output, generally has a bias unit. Bias units can be regarded as an intercept term and play an important role in shifting the activation function left or right. Remember, the number of hidden layers and the number of nodes in them are hyperparameters that we define at the start. Here, we have defined the number of hidden layers to be one and the number of hidden nodes to be three, $a_1$, $a_2$, and $a_3$. Thus, we can say we have three input units, three hidden units, and three output units ($\hat{y}_1$, $\hat{y}_2$, and $\hat{y}_3$, since we have one out of three classes to predict). This will give us the shape of the weights and biases associated with the layers. For example, Layer 0 has three units and Layer 1 has three units. The shape of the weight matrix and bias vector associated with Layer $i$ is given by:

Shape of $W^i$ = (number of units in Layer $i$) $\times$ (number of units in Layer $i+1$)

Shape of $b^i$ = $1 \times$ (number of units in Layer $i+1$)

Therefore, the shapes of $W$ and $b$ are as follows:
$W^0$ will be $3 \times 3$ and $b^0$ will be $1 \times 3$
$W^1$ will be $3 \times 3$ and $b^1$ will be $1 \times 3$
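As a minimal sketch, these shapes could be set up in NumPy as follows (the small random initialization is an assumption on my part; the section only fixes the shapes):

```python
import numpy as np

n_input, n_hidden, n_output = 3, 3, 3

# The weight matrix from Layer i has shape (units in Layer i, units in Layer i+1);
# the bias vector has shape (1, units in Layer i+1)
W0 = np.random.randn(n_input, n_hidden) * 0.01   # 3 x 3
b0 = np.zeros((1, n_hidden))                     # 1 x 3
W1 = np.random.randn(n_hidden, n_output) * 0.01  # 3 x 3
b1 = np.zeros((1, n_output))                     # 1 x 3
```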
Now, let's understand the following notation:
$W^i_{ad}$: Here, it refers to the value of the weight connecting node $a$ in Layer $i$ to node $d$ in Layer $i+1$
$b^i_d$: Here, it refers to the value of the bias connecting the bias unit in Layer $i$ to node $d$ in Layer $i+1$
Therefore, the nodes in the hidden layer can be calculated in the following way:

$$a_1 = f(W^0_{11} x_1 + W^0_{21} x_2 + W^0_{31} x_3 + b^0_1)$$
$$a_2 = f(W^0_{12} x_1 + W^0_{22} x_2 + W^0_{32} x_3 + b^0_2)$$
$$a_3 = f(W^0_{13} x_1 + W^0_{23} x_2 + W^0_{33} x_3 + b^0_3)$$

Here, the function $f$ refers to the activation function. Remember logistic regression, where we used sigmoid and softmax as the activation functions for binary and multi-class logistic regression, respectively.
Similarly, we can calculate the output units, like so:

$$\hat{y}_d = f(W^1_{1d} a_1 + W^1_{2d} a_2 + W^1_{3d} a_3 + b^1_d), \quad d = 1, 2, 3$$

where $f$ here is the softmax activation, applied jointly across the three aggregated output signals, since we are predicting one out of three classes.
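Putting the two layers together, the forward pass can be sketched in vectorized form as follows. This assumes a sigmoid hidden layer and a softmax output, matching the discussion above; the parameter names mirror the hypothetical W0, b0, W1, b1 from the earlier snippet:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtract the row-wise max for numerical stability, then normalize
    e = np.exp(z - np.max(z, axis=1, keepdims=True))
    return e / np.sum(e, axis=1, keepdims=True)

def forward(X, W0, b0, W1, b1):
    # Layer 0 -> Layer 1: aggregate the inputs, then apply sigmoid
    A1 = sigmoid(np.dot(X, W0) + b0)       # shape (m, 3)
    # Layer 1 -> Layer 2: aggregate, then apply softmax over the classes
    Y_hat = softmax(np.dot(A1, W1) + b1)   # shape (m, 3)
    return A1, Y_hat
```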
This brings us to the end of the forward propagation process. Our next task is to train the neural network (that is, to learn the weight and bias parameters) through backpropagation.
Let the actual output classes be $y_1$, $y_2$, and $y_3$.
Recalling the cost function section in logistic regression, we used cross entropy to formulate our cost function. Thus, the cost function is defined by:

$$J(W, b) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{c=1}^{C} y^{(i)}_c \log \hat{y}^{(i)}_c$$

where $C = 3$ is the number of classes and $m$ is the number of examples.
Since this is a classification problem, for each example the output will have only one output class as 1 and the rest will be zero. For example, for an example $i$ belonging to, say, the second class, it would be:

$$y^{(i)}_1 = 0, \quad y^{(i)}_2 = 1, \quad y^{(i)}_3 = 0$$

Thus, only the true-class term survives in the inner sum, and the cost function reduces to:

$$J(W, b) = -\frac{1}{m} \sum_{i=1}^{m} \log \hat{y}^{(i)}_{c_i}$$

where $c_i$ is the actual class of example $i$.
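As a quick sketch of how this cost could be computed from one-hot labels (continuing the hypothetical NumPy setup; the epsilon inside the log is a numerical-safety detail added here, not something from the text):

```python
import numpy as np

def cross_entropy_cost(Y, Y_hat):
    # Y: (m, C) one-hot actual labels; Y_hat: (m, C) predicted probabilities.
    # The one-hot mask means only the true-class probability of each
    # example contributes to the sum.
    m = Y.shape[0]
    return -np.sum(Y * np.log(Y_hat + 1e-12)) / m  # epsilon guards against log(0)

# Illustrative one-hot labels for m = 2 examples and C = 3 classes
Y = np.array([[0, 1, 0],    # example 1 belongs to class 2
              [1, 0, 0]])   # example 2 belongs to class 1
```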
Now, our goal is to minimize the cost function with regard to $W$ and $b$. In order to train our given neural network, we first randomly initialize $W$ and $b$. Then we will try to optimize $J(W, b)$ through gradient descent, where we will update $W$ and $b$ accordingly at the learning rate $\alpha$, in the following manner:

$$W := W - \alpha \frac{\partial J(W, b)}{\partial W}$$
$$b := b - \alpha \frac{\partial J(W, b)}{\partial b}$$

After setting up this structure, we have to perform these optimization steps (of updating $W$ and $b$) repeatedly for numerous iterations to train our neural network.
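To tie everything together, here is a minimal end-to-end training sketch, reusing the sigmoid and softmax helpers from the forward-pass snippet above. The gradient expressions are the standard backpropagation formulas for a sigmoid hidden layer with a softmax output and cross-entropy cost; they are an assumption here, as the text above does not derive them:

```python
def train(X, Y, alpha=0.1, iterations=1000):
    # Randomly initialize the parameters with the shapes derived above
    W0 = np.random.randn(3, 3) * 0.01; b0 = np.zeros((1, 3))
    W1 = np.random.randn(3, 3) * 0.01; b1 = np.zeros((1, 3))
    m = X.shape[0]
    for _ in range(iterations):
        # Forward propagation
        A1 = sigmoid(np.dot(X, W0) + b0)
        Y_hat = softmax(np.dot(A1, W1) + b1)
        # Backpropagation of the cross-entropy cost
        dZ2 = (Y_hat - Y) / m                     # softmax + cross-entropy gradient
        dW1 = np.dot(A1.T, dZ2)
        db1 = dZ2.sum(axis=0, keepdims=True)
        dZ1 = np.dot(dZ2, W1.T) * A1 * (1 - A1)   # chain rule through sigmoid
        dW0 = np.dot(X.T, dZ1)
        db0 = dZ1.sum(axis=0, keepdims=True)
        # Gradient descent updates at learning rate alpha
        W1 -= alpha * dW1; b1 -= alpha * db1
        W0 -= alpha * dW0; b0 -= alpha * db0
    return W0, b0, W1, b1
```

Each iteration performs exactly the two phases described in this section: a forward pass to compute the predictions and a backward pass to update $W$ and $b$.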
This brings us to the end of the basics of neural networks, which form the basic building blocks of any neural network, shallow or deep. Our next frontier will be to understand some of the famous deep neural network architectures, such as recurrent neural networks (RNNs) and convolutional neural networks (CNNs). Apart from that, we will also have a look at benchmark deep neural network architectures such as AlexNet, VGG-net, and Inception.