In this blog post, I will discuss how to use loss functions in PyTorch. I will cover how they work for regression and classification tasks, how to work with NumPy arrays, and what shapes and data types PyTorch loss functions expect, and I will demonstrate several common losses.

```
import numpy as np
import torch
from torch import nn
```


# Working with PyTorch

I wanted to start with a few notes on loss functions in PyTorch. We’ll talk about things that people switching from TensorFlow should be aware of.

## `requires_grad`

It’s important to understand the role of `requires_grad` in loss functions and gradient computation. Our goal is to find how the loss changes with respect to each parameter, i.e., \(\frac{\partial \text{loss}}{\partial x}\) for each parameter \(x\).

`requires_grad` is a flag you can set on a tensor to indicate that gradients should be computed for it during the backward pass. When you create a tensor with `requires_grad=True`, PyTorch records the operations performed on it, and when you call `backward()`, it computes the gradients and accumulates them in the tensor’s `.grad` attribute.

Here’s an example demonstrating the use of `requires_grad`:

```
y_pred = torch.randn(3, 5, requires_grad=True)
y_true = torch.randn(3, 5)
mae_loss = nn.L1Loss()
output = mae_loss(y_pred, y_true)
print(output)
# Compute gradients
output.backward()
print("Gradients:", y_pred.grad)
```

```
tensor(1.6688, grad_fn=<L1LossBackward>)
Gradients: tensor([[ 0.0667, -0.0667, -0.0667,  0.0667, -0.0667],
        [-0.0667, -0.0667, -0.0667, -0.0667, -0.0667],
        [-0.0667,  0.0667, -0.0667, -0.0667,  0.0667]])
```

In this example, we create an input tensor `y_pred` with `requires_grad=True` and a target tensor `y_true`. We compute the L1 loss between the two tensors and call the `backward()` method on the output tensor to compute the gradients for `y_pred`.
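The gradient values above have a simple explanation. With the default `reduction='mean'`, the L1 loss averages \(|y_{pred} - y_{true}|\) over all 15 elements, so each gradient entry is the sign of the difference divided by 15, which is ±0.0667. The sketch below checks this; the fixed seed is my addition for reproducibility, so the random values will differ from the output above.

```python
import torch
from torch import nn

torch.manual_seed(0)  # fixed seed (an addition) so the run is reproducible
y_pred = torch.randn(3, 5, requires_grad=True)  # gradients will be tracked
y_true = torch.randn(3, 5)                      # plain tensor, no gradients

output = nn.L1Loss()(y_pred, y_true)
output.backward()

# d(mean |y_pred - y_true|) / d(y_pred) = sign(y_pred - y_true) / number of elements
expected = torch.sign(y_pred.detach() - y_true) / y_pred.numel()
print(torch.allclose(y_pred.grad, expected))  # True
print(y_true.grad)  # None, because y_true was created without requires_grad
```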

In this post, I’ll focus on calculating loss values, so I’ll skip the `requires_grad` flag and the `backward()` method. If we were training a model, though, we would need gradients for optimization. In that case, the tensors that need gradients are the model’s parameters, which PyTorch creates with `requires_grad=True` by default.
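A minimal sketch of one training step shows where those gradients end up. The model, data, and learning rate here are hypothetical choices for illustration only; note that we never set `requires_grad` ourselves.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(5, 1)   # hypothetical tiny regression model
print(all(p.requires_grad for p in model.parameters()))  # True by default

x = torch.randn(8, 5)     # made-up batch: 8 samples, 5 features
y = torch.randn(8, 1)     # made-up targets with the same shape as the output
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

weight_before = model.weight.detach().clone()
optimizer.zero_grad()          # clear any gradients left from a previous step
loss = loss_fn(model(x), y)    # the loss is connected to the parameters through the model
loss.backward()                # fills model.weight.grad and model.bias.grad
optimizer.step()               # updates the parameters in place
print(loss.item())
```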

## NumPy Arrays

PyTorch works primarily with tensors, but it interoperates easily with NumPy arrays. You can convert a NumPy array to a PyTorch tensor with `torch.from_numpy()` and convert a tensor back to a NumPy array with the `.numpy()` method. Data types matter too: PyTorch expects input tensors to be floating point and, for classification tasks, target tensors to be of type long (`torch.int64`).

However, you can’t pass NumPy arrays directly to a loss function. PyTorch losses call tensor methods such as `.size()` on their arguments; a NumPy array has a `size` attribute (an integer) rather than a `size()` method, so the call raises a `TypeError`.

```
y_pred_np = np.random.randn(3, 5)
y_true_np = np.random.randn(3, 5)
# Using a PyTorch loss function directly with numpy arrays raises an error
loss = nn.L1Loss()
try:
    output = loss(y_pred_np, y_true_np)
except TypeError:
    print("TypeError: PyTorch loss functions expect tensors, not numpy arrays")
# Convert the numpy arrays to PyTorch tensors (the float64 dtype is preserved)
y_pred = torch.from_numpy(y_pred_np)
y_true = torch.from_numpy(y_true_np)
# Now we can calculate the loss
output = loss(y_pred, y_true)
print("L1 Loss:", output.item())
```

```
TypeError: PyTorch loss functions expect tensors, not numpy arrays
L1 Loss: 1.0592096183606834
```
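A related detail: `torch.from_numpy()` keeps NumPy’s dtype (and shares memory with the array), so the default `float64` arrays above became `float64` tensors. That works for `nn.L1Loss`, but PyTorch layers use `float32` by default, and classification targets must be long class indices. A small sketch of the usual conversions, using made-up scores and labels:

```python
import numpy as np
import torch
from torch import nn

scores_np = np.random.randn(4, 3)    # made-up float64 scores: 4 samples, 3 classes
labels_np = np.array([1, 2, 0, 1])   # made-up class labels (int32 or int64 depending on platform)

scores = torch.from_numpy(scores_np).float()  # float64 -> float32
labels = torch.from_numpy(labels_np).long()   # -> int64, as class-index targets require

log_probs = torch.log_softmax(scores, dim=1)  # NLLLoss expects log-probabilities
loss = nn.NLLLoss()(log_probs, labels)
print(scores.dtype, labels.dtype)  # torch.float32 torch.int64
print(loss.item())                 # .item() returns a plain Python float
```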

OK, let’s get to the loss functions.

# Loss Functions

## Inputs

Your inputs will look different depending on the task. For a regression task, you’ll generally have `y_pred` and `y_true` tensors of the same size. For classification tasks such as image classification, however, the model produces a score (such as a probability) for each class, so your `y_pred` tensor has an extra dimension of size `C`, where `C` is the number of possible classes. For a batch of `N` items, `y_pred` has shape `(N, C)`, while `y_true` holds one class index per item and has shape `(N,)`.
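Matching shapes matter more than it might seem in regression. A common pitfall, sketched below with made-up values, is a model that outputs shape `(N, 1)` compared against targets of shape `(N,)`: `nn.MSELoss` broadcasts the pair to `(N, N)` and only issues a warning, so even perfect predictions produce a nonzero loss.

```python
import warnings
import torch
from torch import nn

y_pred = torch.tensor([[1.0], [2.0], [3.0]])   # shape (3, 1), e.g. a model with one output unit
y_true = torch.tensor([1.0, 2.0, 3.0])         # shape (3,)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    bad = nn.MSELoss()(y_pred, y_true)          # broadcasts to (3, 3) and warns
good = nn.MSELoss()(y_pred.squeeze(1), y_true)  # shapes match: (3,) vs (3,)

print(bad.item(), good.item())  # bad is 12/9 despite perfect predictions; good is 0.0
print(len(caught) > 0)          # True: PyTorch warned about the size mismatch
```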

### Regression

For regression, the most common losses are L1 (mean absolute error, `nn.L1Loss`) and L2 (mean squared error, `nn.MSELoss`). Let’s take a look at them.

#### L1 and L2 Loss

```
input_tensor = torch.tensor([1, 2.5, 4, 0.5])
target_tensor = torch.tensor([2, 2.5, 2, 1])
l1_loss = nn.L1Loss()
l1_output = l1_loss(input_tensor, target_tensor)
print("L1 (Mean Absolute Error) Loss:", l1_output.item())
l2_loss = nn.MSELoss()
l2_output = l2_loss(input_tensor, target_tensor)
print("L2 (Mean Squared Error) Loss:", l2_output.item())
```

```
L1 (Mean Absolute Error) Loss: 0.875
L2 (Mean Squared Error) Loss: 1.3125
```
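Both numbers are easy to reproduce by hand. With the default `reduction='mean'`, `nn.L1Loss` computes \(\frac{1}{n}\sum_i |x_i - y_i|\) and `nn.MSELoss` computes \(\frac{1}{n}\sum_i (x_i - y_i)^2\); passing `reduction='sum'` returns the totals instead. A quick check with the same tensors:

```python
import torch
from torch import nn

input_tensor = torch.tensor([1, 2.5, 4, 0.5])
target_tensor = torch.tensor([2, 2.5, 2, 1])

diff = input_tensor - target_tensor    # [-1.0, 0.0, 2.0, -0.5]
l1_manual = diff.abs().mean()          # (1 + 0 + 2 + 0.5) / 4 = 0.875
l2_manual = (diff ** 2).mean()         # (1 + 0 + 4 + 0.25) / 4 = 1.3125
print(l1_manual.item(), l2_manual.item())  # 0.875 1.3125

# reduction="sum" skips the division by the number of elements
l1_sum = nn.L1Loss(reduction="sum")(input_tensor, target_tensor)
l2_sum = nn.MSELoss(reduction="sum")(input_tensor, target_tensor)
print(l1_sum.item(), l2_sum.item())    # 3.5 5.25
```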

### Image Classification

Let’s say we’re predicting a batch of four items, each belonging to one of three classes (0-2). The true labels might look like this:

```
y_true = torch.tensor([1, 2, 0, 1])  # class indices; dtype defaults to torch.int64 (long)
```

```
y_true
```

```
tensor([1, 2, 0, 1])
```

```
y_true.shape
```

```
torch.Size([4])
```

Your predictions, however, have an extra dimension: one score per class. So they might look like this:

```
y_pred = torch.tensor([[0.1, 0.5, 0.4],
                       [0.1, 0.2, 0.7],
                       [0.3, 0.25, 0.55],
                       [0.3, 0.4, 0.1]])
```

```
y_pred
```

```
tensor([[0.1000, 0.5000, 0.4000],
        [0.1000, 0.2000, 0.7000],
        [0.3000, 0.2500, 0.5500],
        [0.3000, 0.4000, 0.1000]])
```

To get the predicted class for each item, we take the argmax along the class dimension (`dim=1`):

```
y_pred_labels = torch.argmax(y_pred, dim=1)
```

```
y_pred_labels
```

```
tensor([1, 2, 2, 1])
```
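Taking the argmax over `dim=1` collapses the class dimension, so the predicted labels have the same shape as `y_true` and can be compared element-wise. A short sketch that re-creates both tensors so it runs on its own (with `y_true` built as a default `int64` tensor):

```python
import torch

y_true = torch.tensor([1, 2, 0, 1])                # shape (4,): one class index per item
y_pred = torch.tensor([[0.1, 0.5, 0.4],
                       [0.1, 0.2, 0.7],
                       [0.3, 0.25, 0.55],
                       [0.3, 0.4, 0.1]])           # shape (4, 3): one score per class

y_pred_labels = y_pred.argmax(dim=1)               # shape (4,)
print(y_pred.shape, y_pred_labels.shape)           # torch.Size([4, 3]) torch.Size([4])
matches = y_pred_labels == y_true                  # element-wise comparison
print(matches)  # only the third item differs: predicted 2, true label 0
```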

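#### Negative Log-Likelihood Loss

Negative log-likelihood (NLL) loss is used in multiclass classification problems. `nn.NLLLoss` does not apply a softmax itself: it expects log-probabilities as input, so we first turn the raw scores into probabilities with a softmax and then take the log.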

```
input_tensor = torch.randn(5, 3)
target_tensor = torch.tensor([0, 1, 2, 1, 0], dtype=torch.long)
# Softmax to convert input tensor to probabilities
softmax = nn.Softmax(dim=1)
input_prob = softmax(input_tensor)
# Negative Log Loss (Negative Log Likelihood)
nll_loss = nn.NLLLoss()
nll_output = nll_loss(torch.log(input_prob), target_tensor)
print("Negative Log Loss:", nll_output.item())
```

```
Negative Log Loss: 0.952965259552002
```
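Under the hood, `nn.NLLLoss` (with its default `reduction='mean'`) picks out the log-probability of the true class for each sample, negates it, and averages over the batch: \(\ell = -\frac{1}{N}\sum_{n=1}^{N} \log p_{n, y_n}\), where \(y_n\) is the true class of sample \(n\). The sketch below computes this by hand and compares it with the built-in loss; the seed is an addition for reproducibility, so the value will differ from the output above.

```python
import torch
from torch import nn

torch.manual_seed(0)  # added for reproducibility
input_tensor = torch.randn(5, 3)                               # raw scores: 5 samples, 3 classes
target_tensor = torch.tensor([0, 1, 2, 1, 0], dtype=torch.long)

log_probs = torch.log_softmax(input_tensor, dim=1)            # log p[n, c]
# For each row n, select the entry in column target_tensor[n]
picked = log_probs[torch.arange(len(target_tensor)), target_tensor]
manual = -picked.mean()

builtin = nn.NLLLoss()(log_probs, target_tensor)
print(torch.allclose(manual, builtin))  # True
```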

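You can also use `nn.LogSoftmax`, which applies the softmax and the log in a single step. Here it is with a batch of three items and five classes: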

```
y_pred = torch.tensor([[0.5153, 0.7051, 0.4947, 0.3446, 0.5288],
                       [0.3464, 0.2458, 0.8569, 0.4821, 0.3244],
                       [0.4474, 0.6615, 0.0062, 0.6603, 0.2461]])
```

```
y_true = torch.tensor([1, 0, 4])
```

```
log_softmax = nn.LogSoftmax(dim=1)
```

```
nll_loss(log_softmax(y_pred), y_true)
```

```
tensor(1.6554)
```
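For moderate scores like these, `nn.LogSoftmax` gives the same result as taking `torch.log` of `nn.Softmax`, but it computes the log-probabilities in one numerically stable step. With extreme scores, a separate softmax can underflow to exactly 0, and its log becomes `-inf`. A sketch reusing the tensors above, plus one made-up extreme example:

```python
import torch
from torch import nn

y_pred = torch.tensor([[0.5153, 0.7051, 0.4947, 0.3446, 0.5288],
                       [0.3464, 0.2458, 0.8569, 0.4821, 0.3244],
                       [0.4474, 0.6615, 0.0062, 0.6603, 0.2461]])
y_true = torch.tensor([1, 0, 4])
nll_loss = nn.NLLLoss()

via_log_softmax = nll_loss(nn.LogSoftmax(dim=1)(y_pred), y_true)
via_softmax_log = nll_loss(torch.log(nn.Softmax(dim=1)(y_pred)), y_true)
print(torch.allclose(via_log_softmax, via_softmax_log))  # True for these moderate scores

extreme = torch.tensor([[0.0, 200.0]])              # made-up scores with a huge gap
print(torch.log(torch.softmax(extreme, dim=1)))     # first entry is -inf: softmax underflowed to 0
print(torch.log_softmax(extreme, dim=1))            # first entry stays finite, about -200
```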

It’s important to remember the softmax step (here, `LogSoftmax`). If we skip it, PyTorch won’t raise an error; we’ll just get the wrong answer:

```
nll_loss(y_pred, y_true)
```

```
tensor(-0.4325)
```
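The negative value is a useful red flag. `nn.NLLLoss` never normalizes its input; it only selects and negates the entries at the target indices, so here it returned \(-(0.7051 + 0.3464 + 0.2461)/3\). Genuine log-probabilities are never positive, so a correct NLL loss is never negative. One possible guard is sketched below as a hypothetical helper (`checked_nll` is not part of PyTorch): it checks that each row’s `logsumexp` is close to 0, which holds exactly when the exponentiated row sums to 1.

```python
import torch
from torch import nn

y_pred = torch.tensor([[0.5153, 0.7051, 0.4947, 0.3446, 0.5288],
                       [0.3464, 0.2458, 0.8569, 0.4821, 0.3244],
                       [0.4474, 0.6615, 0.0062, 0.6603, 0.2461]])
y_true = torch.tensor([1, 0, 4])

# NLLLoss just indexes and negates: this equals nn.NLLLoss()(y_pred, y_true)
picked = y_pred[torch.arange(len(y_true)), y_true]   # 0.7051, 0.3464, 0.2461
print(-picked.mean())

def checked_nll(log_probs, target, atol=1e-4):
    """Hypothetical helper: reject inputs whose rows are not log-probabilities."""
    row_totals = torch.logsumexp(log_probs, dim=1)   # 0 for every valid row
    if not torch.allclose(row_totals, torch.zeros_like(row_totals), atol=atol):
        raise ValueError("NLLLoss input is not log-probabilities; apply LogSoftmax first")
    return nn.functional.nll_loss(log_probs, target)

# Passing y_pred directly would raise ValueError; the log-softmax output passes the check
print(checked_nll(torch.log_softmax(y_pred, dim=1), y_true))
```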