```
import torch
from torch import nn
import numpy as np
```

How does one divide up loss functions? One natural split is between regression losses and classification losses.


# Notes

I wanted to start with a few notes on loss functions in PyTorch.

## Requires_grad

Loss functions exist to help compute the gradients for trainable parameters. We want to find how the loss changes with each parameter, that is, $\frac{\partial \text{loss}}{\partial x}$ for each parameter $x$.

So what you will typically see is this:

```
y_pred = torch.randn(3, 5, requires_grad=True)
y_true = torch.randn(3, 5)
mae_loss = nn.L1Loss()
output = mae_loss(y_pred, y_true)
print(output)
output.backward()
```

```
tensor(1.1468, grad_fn=<L1LossBackward>)
```
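As a quick aside (recreating the tensors above), after `backward()` the gradient of the loss with respect to `y_pred` lands in `y_pred.grad`. For a mean-reduced L1 loss, each entry is just the sign of the error divided by the number of elements:

```
import torch
from torch import nn

y_pred = torch.randn(3, 5, requires_grad=True)
y_true = torch.randn(3, 5)

output = nn.L1Loss()(y_pred, y_true)
output.backward()

# gradient of mean(|y_pred - y_true|) is sign(y_pred - y_true) / 15
print(y_pred.grad)
```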

For our purposes, we're just going to stop at computing the loss value, so we can skip `requires_grad` for some of the demos.

### Numpy

Note that you can't directly put numpy arrays into a loss function. PyTorch losses rely on being able to call a `.size()` method; on a numpy array, `.size` is a plain integer attribute rather than a method, so the call fails.

```
np.random.randn(3,5)
```

```
array([[-0.12188827,  0.51116533, -0.23330246, -0.26176837,  0.35833209],
       [ 0.17633687,  0.10065291, -0.44017884, -1.42619676, -0.95035462],
       [-2.09363552,  0.85312672, -1.69011215,  0.03842335,  0.86324746]])
```

```
y_pred = np.random.randn(3,5)
y_true = np.random.randn(3,5)
```

```
loss = nn.L1Loss()
```

```
try:
    output = loss(y_pred, y_true)
except TypeError:
    pass
```

If you have a numpy array, you'll need to convert it to a PyTorch tensor first.

```
y_pred_np = np.array([0.1, 0.3, 0.5, 0.9])
y_pred_np
```

```
array([0.1, 0.3, 0.5, 0.9])
```

```
y_true_np = np.array([0, 1, 1, 1])
```

```
y_pred = torch.from_numpy(y_pred_np)
y_pred
```

```
tensor([0.1000, 0.3000, 0.5000, 0.9000], dtype=torch.float64)
```

```
y_true = torch.from_numpy(y_true_np)
y_true
```

```
tensor([0, 1, 1, 1], dtype=torch.int32)
```

```
loss(y_pred, y_true)
```

```
tensor(0.3500, dtype=torch.float64)
```
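One caveat worth knowing about `torch.from_numpy`: the resulting tensor shares memory with the source array, so mutating one mutates the other. `torch.tensor(...)` makes an independent copy:

```
import numpy as np
import torch

arr = np.array([0.1, 0.3, 0.5, 0.9])
shared = torch.from_numpy(arr)  # shares memory with arr
copied = torch.tensor(arr)      # independent copy

arr[0] = 99.0
print(shared[0])  # reflects the change
print(copied[0])  # still 0.1
```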

OK, let’s get to the loss functions.

# Loss Functions

## Inputs

Your inputs are going to look different based on the task. For a regression task, you'll generally have `y_pred` and `y_true` tensors that are the same size. But for image classification tasks, you'll have a prediction probability associated with each class, so your `y_pred` tensor will have an extra dimension of size `N`, where `N` is the number of possible classes.
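As a quick sketch of that shape difference (using a hypothetical batch of 4 examples and 3 classes):

```
import torch

batch_size, n_classes = 4, 3

# regression: prediction and target have the same shape
y_pred_reg = torch.randn(batch_size)
y_true_reg = torch.randn(batch_size)

# classification: one score per class, so y_pred gains a dimension
y_pred_cls = torch.randn(batch_size, n_classes)
y_true_cls = torch.randint(0, n_classes, (batch_size,))

print(y_pred_reg.shape, y_true_reg.shape)  # torch.Size([4]) torch.Size([4])
print(y_pred_cls.shape, y_true_cls.shape)  # torch.Size([4, 3]) torch.Size([4])
```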

### Regression

For regression, the most common losses are L1 (mean absolute error) and L2 (mean squared error). Let's take a look at them.

#### L1

```
y_pred = torch.tensor([1, 2.5, 4, 0.5])
```

```
y_pred
```

```
tensor([1.0000, 2.5000, 4.0000, 0.5000])
```

```
y_true = torch.tensor([2, 2.5, 2, 1])
```

```
y_true
```

```
tensor([2.0000, 2.5000, 2.0000, 1.0000])
```

```
mae_loss = nn.L1Loss()
```

```
mae_loss(y_pred, y_true)
```

```
tensor(0.8750)
```
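To confirm what `L1Loss` is doing, we can compute the mean absolute error by hand on the same tensors:

```
import torch

y_pred = torch.tensor([1, 2.5, 4, 0.5])
y_true = torch.tensor([2, 2.5, 2, 1])

# mean of |1-2|, |2.5-2.5|, |4-2|, |0.5-1| = mean(1, 0, 2, 0.5)
manual_mae = torch.mean(torch.abs(y_pred - y_true))
print(manual_mae)  # tensor(0.8750)
```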

#### L2

The same can be done with L2 loss.

```
mse_loss = nn.MSELoss()
```

```
mse_loss(y_pred, y_true)
```

```
tensor(1.3125)
```
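And again by hand for L2, which squares the errors before averaging:

```
import torch

y_pred = torch.tensor([1, 2.5, 4, 0.5])
y_true = torch.tensor([2, 2.5, 2, 1])

# mean of 1, 0, 4, 0.25
manual_mse = torch.mean((y_pred - y_true) ** 2)
print(manual_mse)  # tensor(1.3125)
```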

### Image Classification

Let’s say we’re predicting a batch of four items that are each one of three classes (0-2)

```
y_true = torch.IntTensor(np.array([1,2,0,1]))
```

```
y_true
```

```
tensor([1, 2, 0, 1], dtype=torch.int32)
```

```
y_true.shape
```

```
torch.Size([4])
```

But your predictions will have an extra dimension in them. So they might look like

```
y_pred = torch.FloatTensor(np.array([[0.1, 0.5, 0.4],
                                     [0.1, 0.2, 0.7],
                                     [0.3, 0.25, 0.55],
                                     [0.3, 0.4, 0.1]]))
```

```
y_pred
```

```
tensor([[0.1000, 0.5000, 0.4000],
        [0.1000, 0.2000, 0.7000],
        [0.3000, 0.2500, 0.5500],
        [0.3000, 0.4000, 0.1000]])
```

To get the actual class predicted, we take the argmax along the class dimension:

```
y_pred_labels = torch.argmax(y_pred, dim=1)
```

```
y_pred_labels
```

```
tensor([1, 2, 2, 1])
```
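With predicted labels in hand, one common follow-up (not a loss, just a sanity check) is accuracy against `y_true`:

```
import torch

y_true = torch.tensor([1, 2, 0, 1])
y_pred_labels = torch.tensor([1, 2, 2, 1])

# fraction of positions where prediction matches the label
accuracy = (y_pred_labels == y_true).float().mean()
print(accuracy)  # tensor(0.7500)
```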

#### Negative Log-Likelihood Loss

NLL (negative log-likelihood) loss is used in multiclass classification problems. It expects log-probabilities as input, so we apply a log softmax to the raw scores first:

```
nll = nn.NLLLoss()
```

```
y_pred = torch.tensor([[0.5153, 0.7051, 0.4947, 0.3446, 0.5288],
                       [0.3464, 0.2458, 0.8569, 0.4821, 0.3244],
                       [0.4474, 0.6615, 0.0062, 0.6603, 0.2461]])
```

```
y_true = torch.tensor([1, 0, 4])
```

```
softmax = nn.LogSoftmax(dim=1)
```

```
nll(softmax(y_pred), y_true)
```

```
tensor(1.6554)
```

Note that it's important to remember the log softmax. If we forget it, we won't get an error; we'll just get the wrong answer.

```
nll(y_pred, y_true)
```

```
tensor(-0.4325)
```
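One way to avoid forgetting the softmax is `nn.CrossEntropyLoss`, which combines `LogSoftmax` and `NLLLoss` into a single module and takes raw scores directly. On the same tensors it reproduces the result above:

```
import torch
from torch import nn

y_pred = torch.tensor([[0.5153, 0.7051, 0.4947, 0.3446, 0.5288],
                       [0.3464, 0.2458, 0.8569, 0.4821, 0.3244],
                       [0.4474, 0.6615, 0.0062, 0.6603, 0.2461]])
y_true = torch.tensor([1, 0, 4])

# CrossEntropyLoss = LogSoftmax + NLLLoss in one step
ce = nn.CrossEntropyLoss()
print(ce(y_pred, y_true))  # tensor(1.6554)
```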
