import torch
from torch import nn
import numpy as np

How does one divide up loss functions? A natural split is between regression losses and classification losses.

Notes

I wanted to start with a few notes on loss functions in PyTorch.

Requires_grad

Loss functions exist to help compute the gradients for trainable parameters. We want to find how the loss changes with each parameter, that is, \(\frac{\partial loss}{\partial x}\) for each parameter \(x\).

So what you will see is something like this:

y_pred = torch.randn(3, 5, requires_grad=True)
y_true = torch.randn(3, 5)

mae_loss = nn.L1Loss()
output = mae_loss(y_pred, y_true)
print(output)
tensor(1.1468, grad_fn=<L1LossBackward>)
output.backward()
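
If we want to see what backward() computed, we can inspect y_pred.grad (a quick sketch: for L1 loss with mean reduction over these 15 elements, each gradient entry works out to \(\pm\frac{1}{15}\)):

# each entry is sign(y_pred - y_true) / 15, roughly +/- 0.0667
y_pred.grad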

For our purposes, we’re just going to stop at computing the loss value, so we can skip requires_grad in some of the demos.

Numpy

Note that you can’t directly put numpy arrays into a loss function. PyTorch losses call a .size() method on their inputs, and numpy’s .size is an integer attribute rather than a method, so this fails with a TypeError.

np.random.randn(3,5)
array([[-0.12188827,  0.51116533, -0.23330246, -0.26176837,  0.35833209],
       [ 0.17633687,  0.10065291, -0.44017884, -1.42619676, -0.95035462],
       [-2.09363552,  0.85312672, -1.69011215,  0.03842335,  0.86324746]])
y_pred = np.random.randn(3,5)
y_true = np.random.randn(3,5)
loss = nn.L1Loss()
try:
    output = loss(y_pred, y_true)
except TypeError as e:
    print(e)  # show why the numpy input failed
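
To see why directly: .size on a numpy array is an integer attribute, so trying to call it as a method fails.

a = np.random.randn(3, 5)
a.size        # 15 -- an attribute holding the element count, not a method
# a.size()    # raises TypeError, since an int isn't callable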

If you have a numpy array, you’ll need to convert it to a PyTorch tensor first, e.g. with torch.from_numpy.

y_pred_np = np.array([0.1, 0.3, 0.5, 0.9])
y_pred_np
array([0.1, 0.3, 0.5, 0.9])
y_true_np = np.array([0, 1, 1, 1])
y_pred = torch.from_numpy(y_pred_np)
y_pred
tensor([0.1000, 0.3000, 0.5000, 0.9000], dtype=torch.float64)
y_true = torch.from_numpy(y_true_np)
y_true
tensor([0, 1, 1, 1], dtype=torch.int32)
loss(y_pred, y_true)
tensor(0.3500, dtype=torch.float64)
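
As an aside, if you need the result back as a plain Python number, .item() unwraps the zero-dimensional tensor:

loss(y_pred, y_true).item()   # a plain float of about 0.35, rather than a tensor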

OK, let’s get to the loss functions.

Loss Functions

Inputs

Your inputs are going to look different based on the task. For a regression task, you’ll generally have y_pred and y_true tensors that are the same size. But for image classification tasks, you’ll have a prediction probability associated with each class, so your y_pred tensor will have an extra dimension of size N where N is the number of possible classes.
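
As a rough sketch of the shapes involved (hypothetical tensors, purely for illustration):

# regression: prediction and target share the same shape
y_pred = torch.randn(4)             # 4 predicted values
y_true = torch.randn(4)             # 4 target values

# classification with N=3 classes: one score per class per item
y_pred = torch.randn(4, 3)          # shape (batch, num_classes)
y_true = torch.randint(0, 3, (4,))  # shape (batch,) of class indices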

Regression

For regression, the most common losses are L1 (mean absolute error) and L2 (mean squared error). Let’s take a look at them.

L1

y_pred = torch.tensor([1, 2.5, 4, 0.5])
y_pred
tensor([1.0000, 2.5000, 4.0000, 0.5000])
y_true = torch.tensor([2, 2.5, 2, 1])
y_true
tensor([2.0000, 2.5000, 2.0000, 1.0000])
mae_loss = nn.L1Loss()
mae_loss(y_pred, y_true)
tensor(0.8750)
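
To see where 0.8750 comes from, here’s the mean absolute error computed by hand:

\(\frac{|1-2| + |2.5-2.5| + |4-2| + |0.5-1|}{4} = \frac{1 + 0 + 2 + 0.5}{4} = 0.875\)
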
L2

The same can be done with L2 loss.

mse_loss = nn.MSELoss()
mse_loss(y_pred, y_true)
tensor(1.3125)
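
And the mean squared error by hand:

\(\frac{(1-2)^2 + (2.5-2.5)^2 + (4-2)^2 + (0.5-1)^2}{4} = \frac{1 + 0 + 4 + 0.25}{4} = 1.3125\)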

Image Classification

Let’s say we’re predicting a batch of four items, each belonging to one of three classes (0-2).

y_true = torch.IntTensor(np.array([1,2,0,1]))
y_true
tensor([1, 2, 0, 1], dtype=torch.int32)
y_true.shape
torch.Size([4])

But your predictions will have an extra dimension in them. So they might look like

y_pred = torch.FloatTensor(np.array([[0.1, 0.5, 0.4],
                                     [0.1, 0.2, 0.7],
                                     [0.3, 0.25, 0.55],
                                     [0.3, 0.4, 0.1]]))
y_pred
tensor([[0.1000, 0.5000, 0.4000],
        [0.1000, 0.2000, 0.7000],
        [0.3000, 0.2500, 0.5500],
        [0.3000, 0.4000, 0.1000]])

To get the actual class predicted for each item, we take the argmax along the class dimension.

y_pred_labels = torch.argmax(y_pred, dim=1)
y_pred_labels
tensor([1, 2, 2, 1])
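
One quick thing these labels let us do (a small aside) is check the fraction of predictions that match the targets:

# elementwise match against y_true, then the mean gives the accuracy
(y_pred_labels == y_true).float().mean()
tensor(0.7500)
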
Negative Log Likelihood Loss

NLL is used in multiclass classification problems. It expects log-probabilities as its input rather than raw scores, so we run the predictions through a LogSoftmax first.

nll = nn.NLLLoss()
y_pred = torch.tensor([[0.5153, 0.7051, 0.4947, 0.3446, 0.5288],
                       [0.3464, 0.2458, 0.8569, 0.4821, 0.3244],
                       [0.4474, 0.6615, 0.0062, 0.6603, 0.2461]])
y_true = torch.tensor([1, 0, 4])
softmax = nn.LogSoftmax(dim=1)
nll(softmax(y_pred), y_true)
tensor(1.6554)
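
In practice you’ll often reach for nn.CrossEntropyLoss instead, which combines LogSoftmax and NLLLoss in a single step, so it takes the raw scores directly and should give the same value:

ce_loss = nn.CrossEntropyLoss()
ce_loss(y_pred, y_true)
tensor(1.6554)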

Note that it’s important that we remember to apply the LogSoftmax first. If we don’t, we won’t get an error; we’ll just silently get the wrong answer.

nll(y_pred, y_true)
tensor(-0.4325)