DEV Community

Super Kai (Kazuya Ito)
Loss functions in PyTorch

*Memos:

  • My post explains layers in PyTorch.
  • My post explains activation functions in PyTorch.
  • My post explains optimizers in PyTorch.

A loss function is a function which measures the losses (differences) between a model's predictions and true values (train or test data), usually as their mean (average), to optimize a model during training or to evaluate how good a model is during testing. *A loss function is also called a Cost Function or an Error Function.

Popular loss functions are shown below:

(1) L1 Loss:

  • computes the mean (average) of the absolute losses (differences) between a model's predictions and true values (train and test data).
  • 's formula: $\frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$, where $y_i$ is a true value, $\hat{y}_i$ is a prediction and $n$ is the number of values.
  • is used for a regression model.
  • is also called Mean Absolute Error (MAE).
  • is L1Loss() in PyTorch.
  • 's pros:
    • It's less sensitive to outliers and anomalies.
    • The losses are easy to compare because they are only made absolute, so their range is not big.
  • 's cons:
    • It's not differentiable where a loss (difference) is exactly zero.
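
As a minimal sketch, L1Loss() can be checked against a hand-written mean of absolute differences. The tensor values here are made up for illustration and are not from the post:

```python
import torch
from torch import nn

# Illustrative values only (an assumption, not data from the post).
y_pred = torch.tensor([2.5, 0.0, 2.0, 8.0])
y_true = torch.tensor([3.0, -0.5, 2.0, 7.0])

l1loss = nn.L1Loss()  # reduction='mean' by default
loss = l1loss(y_pred, y_true)

# The same value by hand: the mean of the absolute differences.
manual = (y_pred - y_true).abs().mean()

print(loss)    # tensor(0.5000)
print(manual)  # tensor(0.5000)
```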

(2) L2 Loss:

  • computes the mean (average) of the squared losses (differences) between a model's predictions and true values (train and test data).
  • 's formula: $\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, where $y_i$ is a true value, $\hat{y}_i$ is a prediction and $n$ is the number of values.
  • is used for a regression model.
  • is also called Mean Squared Error (MSE).
  • is MSELoss() in PyTorch.
  • 's pros:
    • It's differentiable everywhere.
  • 's cons:
    • It's sensitive to outliers and anomalies.
    • The losses are harder to compare because they are squared, so their range is big.
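
The sensitivity to outliers can be seen in a small sketch that compares MSELoss() with L1Loss() on the same data. The values are made up for illustration, with the last prediction acting as an outlier:

```python
import torch
from torch import nn

# Illustrative values only (an assumption); the last prediction is an outlier.
y_pred = torch.tensor([1.0, 2.0, 3.0, 14.0])
y_true = torch.tensor([1.0, 2.0, 3.0, 4.0])

mse = nn.MSELoss()(y_pred, y_true)  # mean of the squared differences
mae = nn.L1Loss()(y_pred, y_true)   # mean of the absolute differences

# The same MSE value by hand.
manual_mse = ((y_pred - y_true) ** 2).mean()

print(mse)  # tensor(25.)
print(mae)  # tensor(2.5000)
```

The single difference of 10 contributes 100 to the squared sum but only 10 to the absolute sum, so L2 Loss ends up ten times larger than L1 Loss here.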

(3) Huber Loss:

  • computes something similar to either L2 Loss or L1 Loss, depending on whether the absolute loss (difference) between a model's prediction and a true value (train and test data) is smaller than the delta which you set. *Memos:
    • delta is 1.0 by default.
    • Be careful, the computation is not exactly the same as L1 Loss or L2 Loss according to the formulas below.
  • 's formula. *The 1st one is the L2 Loss-like one and the 2nd one is the L1 Loss-like one: $\frac{1}{2}(y_i - \hat{y}_i)^2$ if $|y_i - \hat{y}_i| < \delta$, otherwise $\delta\,(|y_i - \hat{y}_i| - \frac{1}{2}\delta)$.
  • is used for a regression model.
  • is HuberLoss() in PyTorch.
  • with delta of 1.0 is the same as Smooth L1 Loss with its default beta of 1.0, which is SmoothL1Loss() in PyTorch.
  • 's pros:
    • It's less sensitive to outliers and anomalies.
    • It's differentiable everywhere.
    • The losses are easier to compare than with L2 Loss because only small losses are squared, so their range is smaller than with L2 Loss.
  • 's cons:
    • It needs more computation than L1 Loss and L2 Loss because its formula is more complex.
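
A minimal sketch of the two branches, assuming made-up values where one difference is smaller than delta and one is larger. It also checks the claim that HuberLoss() with delta of 1.0 matches SmoothL1Loss() with its default beta:

```python
import torch
from torch import nn

# Illustrative values only (an assumption, not data from the post).
y_pred = torch.tensor([0.5, 3.0])
y_true = torch.tensor([0.0, 0.0])
delta = 1.0

huber = nn.HuberLoss(delta=delta)(y_pred, y_true)
smooth_l1 = nn.SmoothL1Loss()(y_pred, y_true)  # beta=1.0 by default

# By hand: squared branch below delta, linear branch otherwise, then the mean.
abs_err = (y_pred - y_true).abs()
manual = torch.where(
    abs_err < delta,
    0.5 * abs_err ** 2,
    delta * (abs_err - 0.5 * delta),
).mean()

print(huber)      # tensor(1.3125)
print(smooth_l1)  # tensor(1.3125)
```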

(4) BCE(Binary Cross Entropy) Loss:

  • computes the mean (average) of the losses (differences) between a model's binary predictions (probabilities between 0 and 1) and true binary values (train and test data).
  • 's formula: $-\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$, where $y_i$ is a true value (0 or 1), $\hat{y}_i$ is a predicted probability and $n$ is the number of values.
  • is used for Binary Classification. *Binary Classification is the technology to classify data into two classes.
  • is also called Log (Logarithmic) Loss.
  • is BCELoss() in PyTorch.
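
A minimal sketch of BCELoss() checked against the formula by hand, with made-up values. BCELoss() expects probabilities, so for raw scores (logits) the sketch also uses BCEWithLogitsLoss(), which applies the sigmoid itself:

```python
import torch
from torch import nn

# Illustrative values only (an assumption). BCELoss() expects probabilities in [0, 1].
y_prob = torch.tensor([0.9, 0.2, 0.7])
y_true = torch.tensor([1.0, 0.0, 1.0])

loss = nn.BCELoss()(y_prob, y_true)

# The same value by hand.
manual = -(y_true * torch.log(y_prob)
           + (1 - y_true) * torch.log(1 - y_prob)).mean()

print(loss)  # tensor(0.2284)

# With raw scores (logits), BCEWithLogitsLoss() applies the sigmoid internally,
# so these two losses should agree up to floating-point error.
logits = torch.tensor([2.0, -1.0, 0.5])
loss_from_logits = nn.BCEWithLogitsLoss()(logits, y_true)
same_loss = nn.BCELoss()(torch.sigmoid(logits), y_true)
```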

(5) Cross Entropy Loss:

  • computes the mean (average) of the losses (differences) between a model's predictions and true values (train and test data). *A loss is 0 or more and has no upper bound.
  • 's formula: $-\sum_{c=1}^{C} y_c \log \hat{y}_c$ for one sample, where $y_c$ is the true probability of class $c$, $\hat{y}_c$ is the predicted probability of class $c$ and $C$ is the number of classes.
  • is used for Multiclass Classification and Computer Vision. *Memos:
    • Multiclass Classification is the technology to classify data into multiple classes.
    • Computer Vision is the technology which enables a computer to understand objects.
  • is CrossEntropyLoss() in PyTorch.
  • 's code from scratch in PyTorch:

```python
import torch

y_pred = torch.tensor([7.4, 2.8, -0.6])
y_true = torch.tensor([3.9, -5.1, 9.3])

def cross_entropy(y_pred, y_true):
    return -torch.sum(y_true * torch.log(y_pred))
print(cross_entropy(y_pred.softmax(dim=0), y_true.softmax(dim=0)))
# tensor(7.9744)

y_pred = torch.tensor([[7.4, 2.8, -0.6], [1.3, 0.0, 4.2]])
y_true = torch.tensor([[3.9, -5.1, 9.3], [-5.3, 7.2, -8.4]])

print(cross_entropy(y_pred.softmax(dim=1), y_true.softmax(dim=1)))
# tensor(12.2420)
```

  • 's code with mean from scratch in PyTorch:

```python
import torch

y_pred = torch.tensor([7.4, 2.8, -0.6])
y_true = torch.tensor([3.9, -5.1, 9.3])

def cross_entropy(y_pred, y_true):
    # Sum over the classes of each sample, then take the mean over the samples.
    # (Dividing by y_pred.ndim only works when the batch size equals the number of dimensions.)
    return (-torch.sum(y_true * torch.log(y_pred), dim=-1)).mean()
print(cross_entropy(y_pred.softmax(dim=0), y_true.softmax(dim=0)))
# tensor(7.9744)

y_pred = torch.tensor([[7.4, 2.8, -0.6], [1.3, 0.0, 4.2]])
y_true = torch.tensor([[3.9, -5.1, 9.3], [-5.3, 7.2, -8.4]])

print(cross_entropy(y_pred.softmax(dim=1), y_true.softmax(dim=1)))
# tensor(6.1210)
```
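
For comparison, here is a minimal sketch of the same batch with the built-in CrossEntropyLoss(). It assumes PyTorch 1.10 or later, where the target may be class probabilities. CrossEntropyLoss() applies log-softmax to its input itself, so y_pred is passed as raw values without softmax():

```python
import torch
from torch import nn

# Same values as the 2D example in the from-scratch code.
y_pred = torch.tensor([[7.4, 2.8, -0.6], [1.3, 0.0, 4.2]])
y_true = torch.tensor([[3.9, -5.1, 9.3], [-5.3, 7.2, -8.4]])

# Only the target is turned into probabilities; the input stays as raw logits.
# Probability targets are an assumption about the installed PyTorch (1.10+).
target_probs = y_true.softmax(dim=1)

loss_mean = nn.CrossEntropyLoss()(y_pred, target_probs)  # reduction='mean'
loss_sum = nn.CrossEntropyLoss(reduction='sum')(y_pred, target_probs)

print(loss_mean)  # tensor(6.1210)
print(loss_sum)   # tensor(12.2420)

# More commonly the target is a tensor of class indices (dtype long).
target_idx = torch.tensor([0, 2])
loss_idx = nn.CrossEntropyLoss()(y_pred, target_idx)
```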
