```
import torch
from torch import nn
import numpy as np
```

How does one divide up loss functions? One natural split is between regression losses and classification losses.


# Notes

I wanted to start with a few notes on loss functions in PyTorch.

## Requires_grad

Loss functions exist to help compute the gradients for trainable parameters. We want to find how the loss changes with each parameter, that is, $\frac{\partial \text{loss}}{\partial x}$ for each parameter $x$.

So what you will typically see is this:

```
y_pred = torch.randn(3, 5, requires_grad=True)
y_true = torch.randn(3, 5)
mae_loss = nn.L1Loss()
output = mae_loss(y_pred, y_true)
print(output)
output.backward()
```

```
tensor(1.1468, grad_fn=<L1LossBackward>)
```
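As a quick aside (recreating the tensors above), after `backward()` the gradient of the loss with respect to `y_pred` lands in `y_pred.grad`. For a mean-reduced L1 loss, each entry is just the sign of the error divided by the number of elements:

```
import torch
from torch import nn

y_pred = torch.randn(3, 5, requires_grad=True)
y_true = torch.randn(3, 5)

output = nn.L1Loss()(y_pred, y_true)
output.backward()

# gradient of mean(|y_pred - y_true|) is sign(y_pred - y_true) / 15
print(y_pred.grad)
```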

For our purposes, we're just going to stop at computing the loss value, so we can skip `requires_grad` for some of the demos.

### Numpy

Note that you can't directly put numpy arrays into a loss function. PyTorch losses rely on being able to call a `.size()` method; on a numpy array, `.size` is a plain integer attribute rather than a method, so the call fails.

```
np.random.randn(3,5)
```

```
array([[-0.12188827,  0.51116533, -0.23330246, -0.26176837,  0.35833209],
       [ 0.17633687,  0.10065291, -0.44017884, -1.42619676, -0.95035462],
       [-2.09363552,  0.85312672, -1.69011215,  0.03842335,  0.86324746]])
```

```
y_pred = np.random.randn(3,5)
y_true = np.random.randn(3,5)
```

```
loss = nn.L1Loss()
```

```
try:
    output = loss(y_pred, y_true)
except TypeError:
    pass
```

If you have a numpy array, you'll need to convert it to a PyTorch tensor first.

```
y_pred_np = np.array([0.1, 0.3, 0.5, 0.9])
y_pred_np
```

```
array([0.1, 0.3, 0.5, 0.9])
```

```
y_true_np = np.array([0, 1, 1, 1])
```

```
y_pred = torch.from_numpy(y_pred_np)
y_pred
```

```
tensor([0.1000, 0.3000, 0.5000, 0.9000], dtype=torch.float64)
```

```
y_true = torch.from_numpy(y_true_np)
y_true
```

```
tensor([0, 1, 1, 1], dtype=torch.int32)
```

```
loss(y_pred, y_true)
```

```
tensor(0.3500, dtype=torch.float64)
```
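One caveat worth knowing about `torch.from_numpy`: the resulting tensor shares memory with the source array, so mutating one mutates the other. `torch.tensor(...)` makes an independent copy:

```
import numpy as np
import torch

arr = np.array([0.1, 0.3, 0.5, 0.9])
shared = torch.from_numpy(arr)  # shares memory with arr
copied = torch.tensor(arr)      # independent copy

arr[0] = 99.0
print(shared[0])  # reflects the change
print(copied[0])  # still 0.1
```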

OK, let’s get to the loss functions.

# Loss Functions

## Inputs

Your inputs are going to look different based on the task. For a regression task, you'll generally have `y_pred` and `y_true` tensors that are the same size. But for image classification tasks, you'll have a prediction probability associated with each class, so your `y_pred` tensor will have an extra dimension of size `N`, where `N` is the number of possible classes.
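As a quick sketch of that shape difference (using a hypothetical batch of 4 examples and 3 classes):

```
import torch

batch_size, n_classes = 4, 3

# regression: prediction and target have the same shape
y_pred_reg = torch.randn(batch_size)
y_true_reg = torch.randn(batch_size)

# classification: one score per class, so y_pred gains a dimension
y_pred_cls = torch.randn(batch_size, n_classes)
y_true_cls = torch.randint(0, n_classes, (batch_size,))

print(y_pred_reg.shape, y_true_reg.shape)  # torch.Size([4]) torch.Size([4])
print(y_pred_cls.shape, y_true_cls.shape)  # torch.Size([4, 3]) torch.Size([4])
```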

### Regression

For regression, the most common losses are L1 (mean absolute error) and L2 (mean squared error). Let's take a look at them.

#### L1

```
y_pred = torch.tensor([1, 2.5, 4, 0.5])
```

```
y_pred
```

```
tensor([1.0000, 2.5000, 4.0000, 0.5000])
```

```
y_true = torch.tensor([2, 2.5, 2, 1])
```

```
y_true
```

```
tensor([2.0000, 2.5000, 2.0000, 1.0000])
```

```
mae_loss = nn.L1Loss()
```

```
mae_loss(y_pred, y_true)
```

```
tensor(0.8750)
```
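To confirm what `L1Loss` is doing, we can compute the mean absolute error by hand on the same tensors:

```
import torch

y_pred = torch.tensor([1, 2.5, 4, 0.5])
y_true = torch.tensor([2, 2.5, 2, 1])

# mean of |1-2|, |2.5-2.5|, |4-2|, |0.5-1| = mean(1, 0, 2, 0.5)
manual_mae = torch.mean(torch.abs(y_pred - y_true))
print(manual_mae)  # tensor(0.8750)
```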

#### L2

The same can be done with L2 loss.

```
mse_loss = nn.MSELoss()
```

```
mse_loss(y_pred, y_true)
```

```
tensor(1.3125)
```
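And again by hand for L2, which squares the errors before averaging:

```
import torch

y_pred = torch.tensor([1, 2.5, 4, 0.5])
y_true = torch.tensor([2, 2.5, 2, 1])

# mean of 1, 0, 4, 0.25
manual_mse = torch.mean((y_pred - y_true) ** 2)
print(manual_mse)  # tensor(1.3125)
```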

### Image Classification

Let’s say we’re predicting a batch of four items that are each one of three classes (0-2)

```
y_true = torch.IntTensor(np.array([1,2,0,1]))
```

```
y_true
```

```
tensor([1, 2, 0, 1], dtype=torch.int32)
```

```
y_true.shape
```

```
torch.Size([4])
```

But your predictions will have an extra dimension in them. So they might look like

```
y_pred = torch.FloatTensor(np.array([[0.1, 0.5, 0.4],
                                     [0.1, 0.2, 0.7],
                                     [0.3, 0.25, 0.55],
                                     [0.3, 0.4, 0.1]]))
```

```
y_pred
```

```
tensor([[0.1000, 0.5000, 0.4000],
        [0.1000, 0.2000, 0.7000],
        [0.3000, 0.2500, 0.5500],
        [0.3000, 0.4000, 0.1000]])
```

To get the actual class predicted, we take the argmax along the class dimension:

```
y_pred_labels = torch.argmax(y_pred, dim=1)
```

```
y_pred_labels
```

```
tensor([1, 2, 2, 1])
```
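With predicted labels in hand, one common follow-up (not a loss, just a sanity check) is accuracy against `y_true`:

```
import torch

y_true = torch.tensor([1, 2, 0, 1])
y_pred_labels = torch.tensor([1, 2, 2, 1])

# fraction of positions where prediction matches the label
accuracy = (y_pred_labels == y_true).float().mean()
print(accuracy)  # tensor(0.7500)
```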

#### Negative Log-Likelihood Loss

NLL (negative log-likelihood) loss is used in multiclass classification problems. It expects log-probabilities as input, so we apply a log softmax to the raw scores first:

```
nll = nn.NLLLoss()
```

```
y_pred = torch.tensor([[0.5153, 0.7051, 0.4947, 0.3446, 0.5288],
                       [0.3464, 0.2458, 0.8569, 0.4821, 0.3244],
                       [0.4474, 0.6615, 0.0062, 0.6603, 0.2461]])
```

```
y_true = torch.tensor([1, 0, 4])
```

```
softmax = nn.LogSoftmax(dim=1)
```

```
nll(softmax(y_pred), y_true)
```

```
tensor(1.6554)
```

Note that it's important to remember the log softmax. If we forget it, we won't get an error; we'll just get the wrong answer.

```
nll(y_pred, y_true)
```

```
tensor(-0.4325)
```
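One way to avoid forgetting the softmax is `nn.CrossEntropyLoss`, which combines `LogSoftmax` and `NLLLoss` into a single module and takes raw scores directly. On the same tensors it reproduces the result above:

```
import torch
from torch import nn

y_pred = torch.tensor([[0.5153, 0.7051, 0.4947, 0.3446, 0.5288],
                       [0.3464, 0.2458, 0.8569, 0.4821, 0.3244],
                       [0.4474, 0.6615, 0.0062, 0.6603, 0.2461]])
y_true = torch.tensor([1, 0, 4])

# CrossEntropyLoss = LogSoftmax + NLLLoss in one step
ce = nn.CrossEntropyLoss()
print(ce(y_pred, y_true))  # tensor(1.6554)
```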
