Building a Neural Network from Scratch with Python & NumPy
08/27/2024
Summary
A beginner-friendly guide to neural networks: from individual neurons to a complete Python model that predicts house prices using NumPy.
Intro
AI is transforming our world, and neural networks are a core building block behind many of its recent breakthroughs. In this article I will break down how these systems work, from a single neuron to a complete network, and build a small model in Python from scratch.
A single neuron
If you have looked into AI before, you have probably seen an illustration like the one below. It shows a simple neural network, which we will inspect and implement today in Python. The neural network is the fundamental concept behind modern AI and worth studying in detail. Before we get to the code, let's look at the key ideas.
To understand how a neural network works, let's zoom in on a single node (neuron).
Imagine a motion detector in your garden that turns on the light when it detects motion and when it's dark outside. There is also a dial that lets you set how easily the light turns on. This motion detector is a useful analogy for a single neuron.
In a neural network, the detected motion and the surrounding light are the inputs (x1 and x2). These inputs are connected to the neuron through weights, which determine how much each input influences the result. The dial in our example represents the bias, an additional value that does not depend on the inputs. It shifts the weighted sum up or down and so controls how easily the neuron activates.
To calculate the neuron's output, we multiply each input (x1, x2) by its corresponding weight (w1, w2), sum these products, and add the bias. This gives a single number that represents the neuron's response. In early models such as the perceptron, a hard threshold turned this number into one of two states: the neuron either "fired" (1, light on) or stayed silent (0, light off). Modern networks instead pass the number through a smooth activation function. This produces a graded output, for example a value between 0 and 1 that expresses how strongly the neuron fires, and unlike a hard threshold it has a useful derivative for training. In our case, we use the sigmoid function, a common choice for such networks.
→ formula of the sigmoid function: $\sigma(z) = \frac{1}{1 + e^{-z}}$
→ final formula: $y = \sigma(w_1 x_1 + w_2 x_2 + b)$
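The formula translates directly into a few lines of Python. In this minimal sketch, the weights, the bias, and the encoding of the inputs are made-up values for the motion-detector analogy and are not taken from the illustration. x1 is the detected motion (0 or 1) and x2 is the ambient light (0 = dark, 1 = bright). The negative weight on x2 means that brightness suppresses the light.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def neuron(x1, x2, w1, w2, b):
    # Weighted sum of the inputs plus the bias, squashed by the sigmoid
    return sigmoid(w1 * x1 + w2 * x2 + b)

# Hypothetical parameters: motion excites, brightness suppresses
w1, w2, b = 4.0, -4.0, -2.0

print(round(neuron(1, 0, w1, w2, b), 2))  # motion at night -> 0.88
print(round(neuron(1, 1, w1, w2, b), 2))  # motion by day   -> 0.12
print(round(neuron(0, 0, w1, w2, b), 2))  # no motion       -> 0.12
```

Turning the "dial" up, for example to b = 0, makes the neuron fire more easily: with motion in daylight the output rises to σ(0) = 0.5.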
A neural network
Now that we have seen how a single neuron works, building a network becomes simple: we put multiple neurons together into layers and connect them so that the output of one layer becomes the input of the next.
Our simple neural network consists of three layers:
Input Layer: Receives the initial data (e.g., house size and age). The neurons here are placeholders for normalized numbers that represent the input.
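Hidden Layer(s): Combine the inputs through weighted sums and activation functions, producing intermediate values that the next layer builds on.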
Output Layer: Produces the final result (e.g., the house price).
Here is the data flow through the network, illustrated with an example:
Input values enter the network through the input layer (e.g., the normalized house size is 0.8 and the normalized age is 0.4, so one input neuron has the value 0.8 and the other 0.4).
Each hidden neuron receives all input values (e.g., 0.8 and 0.4).
Hidden neurons compute their outputs and pass them to the output layer.
The output layer calculates the final result based on the hidden layer outputs.
We will look at this more closely later and implement it in code. First, let's create a class with the structure of our neural network in Python:
```python
import numpy as np

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size

        # Initialize random weights; biases start at 1, as in the illustration
        # W1 and b1 sit between the input and hidden layer
        self.W1 = np.random.randn(self.input_size, self.hidden_size)
        self.b1 = np.ones((1, self.hidden_size))
        # W2 and b2 sit between the hidden and output layer
        self.W2 = np.random.randn(self.hidden_size, self.output_size)
        self.b2 = np.ones((1, self.output_size))

# Create the network
nn = NeuralNetwork(input_size=2, hidden_size=3, output_size=1)
```
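The shape of W1, (input_size, hidden_size), encodes the data flow described earlier. Column j holds the weights from every input to hidden neuron j, so each hidden neuron receives all input values. This small sketch uses the example inputs 0.8 and 0.4 together with random weights. It checks that an explicit loop over the hidden neurons gives the same pre-activation sums as a single matrix product.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([[0.8, 0.4]])        # one house: normalized size and age
W1 = rng.standard_normal((2, 3))  # 2 inputs -> 3 hidden neurons
b1 = np.ones((1, 3))              # biases start at 1, as in the class

# Explicit version: every hidden neuron j sums over all inputs i
z_loop = np.zeros((1, 3))
for j in range(3):
    z_loop[0, j] = sum(x[0, i] * W1[i, j] for i in range(2)) + b1[0, j]

# Vectorized version, as used by the network
z_matrix = np.dot(x, W1) + b1

print(np.allclose(z_loop, z_matrix))  # True
```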
Introduction to training
With the network structure in place, we can move on to training. The following overview describes a single training step. In practice, this step is repeated many times, over multiple batches of data and many epochs (full passes through the training data). It works as follows:
Initialization: We have a network initialized with random weights and biases.
Forward pass: We take the training data, which consists of inputs and their correct outputs. We feed the inputs into the network and compute the predicted outputs.
Cost function: We compare the predicted output to the correct output using a cost function, which quantifies the size of the error. Many cost functions exist; for our example we use Mean Squared Error (MSE).
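Backward pass: Starting from the cost, we go backwards through the network and compute the partial derivative of the cost with respect to every weight and bias. Together, these derivatives (the gradient) tell us how each parameter should change to reduce the error.
Update: Finally, we move every weight and bias a small step in the direction that lowers the cost, i.e. opposite to its gradient. The learning rate controls how large these steps are.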
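The five steps can be condensed into a few lines for the simplest possible model: a single linear neuron ŷ = w·x + b with no activation. In this sketch, the toy data (y = 2x), the starting values, and the learning rate are arbitrary illustrative choices. The function name train_step is hypothetical.

```python
import numpy as np

# Hypothetical toy data following y = 2x
X = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * X

def mse(y_pred, y_true):
    # Mean Squared Error: average of the squared differences
    return np.mean((y_pred - y_true) ** 2)

def train_step(w, b, learning_rate=0.05):
    y_pred = w * X + b                      # forward pass
    loss = mse(y_pred, y)                   # cost function
    dloss_dpred = 2 * (y_pred - y) / len(X)
    grad_w = np.sum(dloss_dpred * X)        # backward pass: dLoss/dw
    grad_b = np.sum(dloss_dpred)            # dLoss/db
    # Update: step against the gradient, scaled by the learning rate
    return w - learning_rate * grad_w, b - learning_rate * grad_b, loss

w, b = 0.5, 0.0                             # initialization
for step in range(500):
    w, b, loss = train_step(w, b)
print(w, b)
```

After the first step the loss drops from 7.875 to about 2.72, and after 500 steps w is close to 2 and b close to 0.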
Forward pass
As mentioned in the previous section, the forward pass takes the values from the provided data, feeds them into the network, and calculates the output. An example explains this best.
1. The situation: We have two input parameters: the size of a house (i1) and its age (i2). The goal is to train a neural network that predicts the price of the house (o1) based on these two parameters. For training, we use this small dataset:
2. Initialization: We initialize the network with random weights and biases. In the illustration, I used 1 as the bias for simplicity.
3. Normalization: Next, we normalize the data so that every feature has a comparable influence on the network. I will skip the details here; a quick search will turn up good explanations. For our example, the first input [100, 5] is normalized and rounded to -0.3 (i1) and -0.8 (i2).
4. Calculation: With these inputs and our weights and biases, we can compute an output.
a) We calculate the result for each hidden neuron using the sum from earlier:
b) Based on the hidden layer results, we calculate the output:
o1 = σ(h1·0.5 + h2·0.4 + h3·0.6 + 1)
o1 = σ(0.65·0.5 + 0.61·0.4 + 0.59·0.6 + 1)
o1 = σ(1.923) = 0.87
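You can check the arithmetic of step b) in plain Python. The hidden outputs (0.65, 0.61, 0.59), the output weights (0.5, 0.4, 0.6), and the bias of 1 are copied from the example above.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Values from the worked example
h = [0.65, 0.61, 0.59]      # hidden neuron outputs h1, h2, h3
w_out = [0.5, 0.4, 0.6]     # weights from the hidden neurons to o1
b_out = 1.0                 # output bias

z = sum(hi * wi for hi, wi in zip(h, w_out)) + b_out
o1 = sigmoid(z)
print(round(z, 3), round(o1, 2))  # 1.923 0.87
```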
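With the overview of the forward pass in place, let's code it. Note one difference from the illustration: the code leaves the output neuron without an activation function. This is a common choice for regression, because the predicted (normalized) price should not be squeezed into the range between 0 and 1: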
```python
import numpy as np

# Sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.W1 = np.random.randn(self.input_size, self.hidden_size)
        self.b1 = np.ones((1, self.hidden_size))
        self.W2 = np.random.randn(self.hidden_size, self.output_size)
        self.b2 = np.ones((1, self.output_size))

    def forward(self, X):
        # Compute the hidden layer output
        self.z1 = np.dot(X, self.W1) + self.b1
        self.a1 = sigmoid(self.z1)
        # Compute the output layer (no activation, since this is a regression task)
        self.z2 = np.dot(self.a1, self.W2) + self.b2
        self.a2 = self.z2
        return self.a2
```
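Because forward works with matrices, X can hold a whole batch of houses, one per row. The sketch below is a standalone copy of the forward computation with random weights. The first row reuses the normalized example input (-0.3, -0.8), and the other rows are made up. It prints the shape at each step. b1 and b2 each have a single row, which NumPy broadcasts across every row of the batch.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(42)
W1, b1 = rng.standard_normal((2, 3)), np.ones((1, 3))
W2, b2 = rng.standard_normal((3, 1)), np.ones((1, 1))

# Batch of 4 houses, each row = [normalized size, normalized age]
X = np.array([[-0.3, -0.8],
              [ 0.8,  0.4],
              [ 1.2, -0.5],
              [-1.7,  0.9]])

z1 = np.dot(X, W1) + b1   # (4, 2) @ (2, 3) -> (4, 3); b1 broadcasts over rows
a1 = sigmoid(z1)          # hidden activations, each between 0 and 1
z2 = np.dot(a1, W2) + b2  # (4, 3) @ (3, 1) -> (4, 1), one prediction per house
print(z1.shape, a1.shape, z2.shape)  # (4, 3) (4, 3) (4, 1)
```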
Backward pass
Once we have calculated the output for a given input, it will rarely match the target value on the first try. The backpropagation algorithm helps us improve this. It works backwards through the network, starting from the error, and calculates how much each weight and bias contributed to it. With this information, we can adjust the weights and biases step by step to reduce the error.
Backpropagation uses partial derivatives, chained layer by layer, to determine how each weight and bias affects the final error. The idea is the same as analyzing how w1, w2, and b influence the error of a single neuron, just repeated for every layer. The algorithm follows a few steps: a forward pass computes the output; the error is computed by comparing the output to the expected result; the error is propagated backwards through the network, with gradients calculated along the way; and the weights and biases are updated to reduce the error.
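For the single-neuron case, the chain rule is short enough to write out. Assume a half squared error $C = \frac{1}{2}(y - t)^2$ for one sample, where $t$ is the target value. The factor ½ only cancels the 2 that appears when differentiating. With $z = w_1 x_1 + w_2 x_2 + b$ and $y = \sigma(z)$ as defined earlier:

$$\frac{\partial C}{\partial w_i} = \frac{\partial C}{\partial y} \cdot \frac{\partial y}{\partial z} \cdot \frac{\partial z}{\partial w_i} = (y - t)\,\sigma(z)\,\bigl(1 - \sigma(z)\bigr)\,x_i$$

$$\frac{\partial C}{\partial b} = (y - t)\,\sigma(z)\,\bigl(1 - \sigma(z)\bigr)$$

Since $\sigma(z) = y$, the middle factor is simply $y(1 - y)$, so it can be computed from the neuron's output alone. This is why sigmoid_derivative below is applied to the activations rather than to $z$.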
The mathematical details of backpropagation are out of scope for this article. If you want a deeper understanding, 3Blue1Brown's YouTube video on the topic is a good starting point. To give you a practical sense of how it works, here is a simplified Python implementation of the backward pass:
```python
# Derivative of the sigmoid function, expressed through its output:
# if a = sigmoid(z), then sigmoid'(z) = a * (1 - a)
def sigmoid_derivative(x):
    return x * (1 - x)

class NeuralNetwork:
    # ... (previous code stays the same) ...

    def backward(self, X, y, output, learning_rate):
        m = X.shape[0]  # number of training samples

        # Error of the output layer (target minus prediction)
        self.error = y - output
        # The output layer is linear, so its delta is the error itself
        self.delta_output = self.error

        # Gradients for W2 and b2, averaged over the batch
        self.W2_grad = np.dot(self.a1.T, self.delta_output) / m
        self.b2_grad = np.sum(self.delta_output, axis=0, keepdims=True) / m

        # Error of the hidden layer, propagated back through W2
        self.error_hidden = np.dot(self.delta_output, self.W2.T)
        self.delta_hidden = self.error_hidden * sigmoid_derivative(self.a1)

        # Gradients for W1 and b1
        self.W1_grad = np.dot(X.T, self.delta_hidden) / m
        self.b1_grad = np.sum(self.delta_hidden, axis=0, keepdims=True) / m

        # Update weights and biases via gradient descent. Because the error is
        # defined as y - output, these "gradients" already point downhill,
        # so they are added rather than subtracted.
        self.W2 += learning_rate * self.W2_grad
        self.b2 += learning_rate * self.b2_grad
        self.W1 += learning_rate * self.W1_grad
        self.b1 += learning_rate * self.b1_grad
```
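A common way to confirm that formulas like these really are partial derivatives is a numerical gradient check. It nudges each weight slightly and measures how the cost changes. This sketch assumes the cost is half the mean squared error, which matches the code's omission of the factor 2. Because the article's code uses y - output, its W1_grad and W2_grad are the negative gradient. The hypothetical backprop function below flips that sign so both sides can be compared directly.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward(X, W1, b1, W2, b2):
    a1 = sigmoid(np.dot(X, W1) + b1)
    return a1, np.dot(a1, W2) + b2          # linear output, as in the article

def half_mse(X, y, W1, b1, W2, b2):
    _, output = forward(X, W1, b1, W2, b2)
    return 0.5 * np.mean((output - y) ** 2)

def backprop(X, y, W1, b1, W2, b2):
    # Same formulas as backward(), but returning the gradient itself
    m = X.shape[0]
    a1, output = forward(X, W1, b1, W2, b2)
    delta_output = output - y
    W2_grad = np.dot(a1.T, delta_output) / m
    delta_hidden = np.dot(delta_output, W2.T) * a1 * (1 - a1)
    W1_grad = np.dot(X.T, delta_hidden) / m
    return W1_grad, W2_grad

def numerical_grad(f, W, eps=1e-6):
    # Central differences: nudge one entry of W at a time, then restore it
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        original = W[idx]
        W[idx] = original + eps
        f_plus = f()
        W[idx] = original - eps
        f_minus = f()
        W[idx] = original
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
X, y = rng.standard_normal((5, 2)), rng.standard_normal((5, 1))
W1, b1 = rng.standard_normal((2, 3)), np.ones((1, 3))
W2, b2 = rng.standard_normal((3, 1)), np.ones((1, 1))

def cost():
    return half_mse(X, y, W1, b1, W2, b2)

W1_grad, W2_grad = backprop(X, y, W1, b1, W2, b2)
print(np.allclose(W1_grad, numerical_grad(cost, W1), atol=1e-6))  # True
print(np.allclose(W2_grad, numerical_grad(cost, W2), atol=1e-6))  # True
```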
Training and testing
After defining the network architecture, we train and test it on a dataset of house prices. The process involves several steps:
Initializing the network: We create a NeuralNetwork object with 2 input neurons, 4 hidden neurons, and 1 output neuron.
Dataset: We define our dataset, where X represents the input features (size and age) and y represents the target values (house prices in thousand euros).
Normalization: We normalize the data so that all features are on a similar scale, which helps the network learn more effectively.
Training: The training runs for 2,000 epochs with a learning rate of 0.05. In each epoch, we perform a forward pass followed by a backward pass to update the weights and biases. We compute the mean squared error (MSE) as our loss function and print it every 100 epochs to monitor progress.
Prediction function: We define a predict function that takes a house's size and age, normalizes them with the same parameters as the training data, runs them through the network, and denormalizes the output to get the predicted price in thousand euros.
Testing: We test the trained network by predicting prices for a few hypothetical houses.
The example shows the full workflow of training a small neural network for a regression task and puts the backpropagation algorithm from earlier into practice.
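Putting the pieces together, the workflow listed above can be sketched as a single runnable script. It trains a 2-4-1 network for 2,000 epochs with a learning rate of 0.05, prints the MSE every 100 epochs, and defines a predict function that normalizes its inputs and denormalizes the result. This sketch rests on several assumptions. The six houses are made-up data, not the dataset behind the chart below. Normalization is z-score standardization. The seed is arbitrary. The printed numbers will therefore differ from the chart.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(a):
    # Expects the sigmoid's output a = sigmoid(z)
    return a * (1 - a)

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        self.W1 = np.random.randn(input_size, hidden_size)
        self.b1 = np.ones((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size)
        self.b2 = np.ones((1, output_size))

    def forward(self, X):
        self.a1 = sigmoid(np.dot(X, self.W1) + self.b1)
        return np.dot(self.a1, self.W2) + self.b2  # linear output (regression)

    def backward(self, X, y, output, learning_rate):
        m = X.shape[0]
        delta_output = y - output  # points downhill, hence += below
        delta_hidden = np.dot(delta_output, self.W2.T) * sigmoid_derivative(self.a1)
        self.W2 += learning_rate * np.dot(self.a1.T, delta_output) / m
        self.b2 += learning_rate * np.sum(delta_output, axis=0, keepdims=True) / m
        self.W1 += learning_rate * np.dot(X.T, delta_hidden) / m
        self.b1 += learning_rate * np.sum(delta_hidden, axis=0, keepdims=True) / m

np.random.seed(0)  # arbitrary seed for reproducible initial weights

# Hypothetical data: [size in m², age in years] -> price in thousand euros
X = np.array([[70, 30], [95, 12], [120, 8], [150, 20], [180, 3], [210, 15]], dtype=float)
y = np.array([[210], [330], [420], [450], [620], [640]], dtype=float)

# Z-score normalization (an assumption; other scaling methods work too)
X_mean, X_std = X.mean(axis=0), X.std(axis=0)
y_mean, y_std = y.mean(axis=0), y.std(axis=0)
X_norm = (X - X_mean) / X_std
y_norm = (y - y_mean) / y_std

nn = NeuralNetwork(input_size=2, hidden_size=4, output_size=1)
losses = []
for epoch in range(2000):
    output = nn.forward(X_norm)
    loss = np.mean((y_norm - output) ** 2)  # MSE on normalized prices
    losses.append(loss)
    nn.backward(X_norm, y_norm, output, learning_rate=0.05)
    if epoch % 100 == 0:
        print(f"Epoch {epoch}: loss {loss:.4f}")

def predict(size, age):
    # Normalize with the training statistics, run the network, denormalize
    x = (np.array([[size, age]], dtype=float) - X_mean) / X_std
    return float(nn.forward(x)[0, 0] * y_std[0] + y_mean[0])

print(f"130 m², 10 years: {predict(130, 10):.0f} thousand euros")
```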
The chart below shows how the model improved at predicting house prices during training. In the beginning, the loss is high. As the network sees more epochs, the loss drops quickly and then flattens out, a typical learning curve.
The model picked up the relationship between size, age, and price well enough to produce reasonable predictions on the training data. The low final loss is a good sign, though a proper evaluation would require a separate test set with more samples. There is also a risk that the model partly memorized the training data, so a more robust setup would include more data and techniques like a train/validation split or regularization.
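One way to implement the suggested train/validation split is to shuffle the row indices and hold out a fraction of them. This minimal sketch uses a hypothetical function train_val_split with an illustrative 20% validation fraction.

```python
import numpy as np

def train_val_split(X, y, val_fraction=0.2, seed=0):
    # Shuffle row indices so the split does not depend on the data order
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = max(1, int(round(len(X) * val_fraction)))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return X[train_idx], X[val_idx], y[train_idx], y[val_idx]

# Usage with made-up data: 10 samples, 2 features each
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10, dtype=float).reshape(10, 1)
X_train, X_val, y_train, y_val = train_val_split(X, y)
print(X_train.shape, X_val.shape)  # (8, 2) (2, 2)
```

In a real setup, compute the normalization statistics (mean and standard deviation) on the training part only, then apply them to the validation part. This keeps information from the validation data out of training.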
Conclusion
This article introduced the fundamentals of neural networks and walked through a small implementation in Python with NumPy. Once you build the mechanics yourself (weights, biases, the forward pass, backpropagation), they become much less abstract. From here, good next steps are trying different activation functions, adding more hidden layers, or moving to a framework like PyTorch.