
Super Kai (Kazuya Ito)


Activation functions in PyTorch (2)



(1) Leaky ReLU(Leaky Rectified Linear Unit):

  • is an improved version of ReLU that can mitigate the Dying ReLU Problem.
  • can convert an input value (x) to an output value between ax and x. *Memos:
    • If x < 0, the output is ax, while if 0 <= x, the output is x.
    • a is 0.01 by default (negative_slope=0.01 in PyTorch).
  • 's formula is y = max(ax, x).
  • is also called LReLU.
  • is LeakyReLU() in PyTorch. *A usage example is shown below.
  • is used in:
    • GANs (Generative Adversarial Networks).
  • 's pros:
    • It mitigates the Vanishing Gradient Problem.
    • It mitigates the Dying ReLU Problem. *0 is still produced for the input value 0, so the Dying ReLU Problem is not completely avoided.
  • 's cons:
    • It's non-differentiable at x = 0.
  • 's graph in Desmos:

[Graph of Leaky ReLU in Desmos]
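A minimal usage sketch of LeakyReLU() in PyTorch, assuming an arbitrary example input tensor:

```python
import torch
from torch import nn

# Leaky ReLU with the default slope a = 0.01 for negative inputs.
leaky_relu = nn.LeakyReLU()  # same as nn.LeakyReLU(negative_slope=0.01)

x = torch.tensor([-3.0, -1.0, 0.0, 1.0, 3.0])
print(leaky_relu(x))
# tensor([-0.0300, -0.0100,  0.0000,  1.0000,  3.0000])

# A larger slope keeps more of the negative signal:
print(nn.LeakyReLU(negative_slope=0.1)(x))
# tensor([-0.3000, -0.1000,  0.0000,  1.0000,  3.0000])
```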

(2) PReLU(Parametric Rectified Linear Unit):

  • is an improved version of Leaky ReLU with one or more learnable parameters which are adjusted during training to improve a model's accuracy and convergence.
  • can convert an input value (x) to an output value between ax and x. *Memos:
    • If x < 0, the output is ax, while if 0 <= x, the output is x.
    • a is 0.25 by default. *a is the initial value of the learnable parameter(s).
  • 's formula is y = max(ax, x).
  • is PReLU() in PyTorch. *A usage example is shown below.
  • is used in:
    • SRGAN (Super-Resolution Generative Adversarial Network). *SRGAN is a type of GAN (Generative Adversarial Network).
  • 's pros:
    • It mitigates the Vanishing Gradient Problem.
    • It mitigates the Dying ReLU Problem. *0 is still produced for the input value 0, so the Dying ReLU Problem is not completely avoided.
  • 's cons:
    • It's non-differentiable at x = 0. *The gradient of PReLU doesn't exist at x = 0, and backpropagation needs to differentiate the function to compute a gradient.
  • 's graph in Desmos:

[Graph of PReLU in Desmos]
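A minimal usage sketch of PReLU() in PyTorch, assuming an arbitrary example input tensor; the 4-channel variant is only there to show the num_parameters option:

```python
import torch
from torch import nn

# PReLU with a single learnable parameter a, initialized to 0.25.
prelu = nn.PReLU()  # same as nn.PReLU(num_parameters=1, init=0.25)

x = torch.tensor([-3.0, -1.0, 0.0, 1.0, 3.0])
print(prelu(x))
# tensor([-0.7500, -0.2500,  0.0000,  1.0000,  3.0000], grad_fn=...)

# One learnable parameter per channel, e.g. for 4-channel feature maps:
prelu_per_channel = nn.PReLU(num_parameters=4, init=0.25)
print(prelu_per_channel.weight)
# 4 learnable values, all starting at 0.25, updated during training.
```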

(3) FReLU(Flexible Rectified Linear Unit):

  • is an improved version of ReLU with one or more learnable bias parameters which are adjusted during training to improve a model's accuracy and convergence.
  • 's formula is y = ReLU(x) + b. *b represents the one or more learnable bias parameters.
  • is also called Funnel Activation.
  • isn't in PyTorch, so you can use frelu.pytorch or FunnelAct_Pytorch. *A minimal sketch is shown below.
  • 's pros:
    • It mitigates the Vanishing Gradient Problem.
    • It avoids the Dying ReLU Problem when b < 0 or 0 < b.
  • 's cons:
    • It causes the Dying ReLU Problem when b = 0.
    • It can cause the Exploding Gradient Problem when b gets larger and larger.
    • It's non-differentiable at the kink (x = 0). *The gradient of FReLU doesn't exist at the kink, and backpropagation needs to differentiate the function to compute a gradient.
  • 's graph in Desmos:

[Graph of FReLU in Desmos]
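Since FReLU isn't in PyTorch, here is a minimal sketch that only implements the formula y = ReLU(x) + b with a learnable bias; it illustrates the idea rather than the actual code in frelu.pytorch or FunnelAct_Pytorch, and the FReLU class name and its num_parameters/init arguments are hypothetical:

```python
import torch
from torch import nn

class FReLU(nn.Module):
    # Minimal sketch of y = ReLU(x) + b, where b is one or more learnable
    # bias parameters adjusted during training.
    def __init__(self, num_parameters=1, init=0.0):
        super().__init__()
        self.bias = nn.Parameter(torch.full((num_parameters,), float(init)))

    def forward(self, x):
        return torch.relu(x) + self.bias

frelu = FReLU(init=-0.5)  # b < 0, so the output is not stuck at 0 for x <= 0
x = torch.tensor([-3.0, -1.0, 0.0, 1.0, 3.0])
print(frelu(x))
# tensor([-0.5000, -0.5000, -0.5000,  0.5000,  2.5000], grad_fn=...)
```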
