*Memos:
- My post explains ReLU() and LeakyReLU().
- My post explains PReLU() and ELU().
- My post explains Step function, Identity and ReLU.
- My post explains ELU, SELU and CELU.
- My post explains GELU, Mish, SiLU and Softplus.
- My post explains Tanh, Softsign, Sigmoid and Softmax.
- My post explains Vanishing Gradient Problem, Exploding Gradient Problem and Dying ReLU Problem.
- My post explains layers in PyTorch.
- My post explains loss functions in PyTorch.
- My post explains optimizers in PyTorch.
(1) Leaky ReLU(Leaky Rectified Linear Unit):
- is an improved ReLU that can mitigate Dying ReLU Problem.
- can convert an input value(x) to the output value between ax and x. *Memos:
- If x < 0, then ax, while if 0 <= x, then x.
- a is 0.01 by default.
- 's formula is y = max(ax, x).
- is also called LReLU.
- is LeakyReLU() in PyTorch. *A minimal usage sketch is shown after this list.
- is used in:
- GAN.
- 's pros:
- It mitigates Vanishing Gradient Problem.
- It mitigates Dying ReLU Problem. *0 is still produced for the input value 0 so Dying ReLU Problem is not completely avoided.
- 's cons:
- It's non-differentiable at x = 0.
- 's graph in Desmos:
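The sketch below is a minimal example of LeakyReLU() in PyTorch; negative_slope corresponds to a above and is 0.01 by default, and the tensor values are arbitrary.

```python
import torch
from torch import nn

# Leaky ReLU with the default slope a = 0.01 for negative inputs.
leaky_relu = nn.LeakyReLU()  # same as nn.LeakyReLU(negative_slope=0.01)

x = torch.tensor([-3.0, -1.0, 0.0, 1.0, 3.0])
print(leaky_relu(x))
# tensor([-0.0300, -0.0100,  0.0000,  1.0000,  3.0000])

# A larger slope keeps more of the negative signal.
print(nn.LeakyReLU(negative_slope=0.2)(x))
# tensor([-0.6000, -0.2000,  0.0000,  1.0000,  3.0000])
```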
(2) PReLU(Parametric Rectified Linear Unit):
- is an improved Leaky ReLU, having 0 or more learnable parameters which change(adjust) during training to improve a model's accuracy and convergence.
- can convert an input value(x) to the output value between ax and x. *Memos:
- If x < 0, then ax, while if 0 <= x, then x.
- a is 0.25 by default. *a is the initial value for 0 or more learnable parameters.
- 's formula is y = max(ax, x).
- is PReLU() in PyTorch. *A minimal usage sketch is shown after this list.
- is used in:
- SRGAN(Super-Resolution Generative Adversarial Network). *SRGAN is a type of GAN(Generative Adversarial Network).
- 's pros:
- It mitigates Vanishing Gradient Problem.
- It mitigates Dying ReLU Problem. *0 is still produced for the input value 0 so Dying ReLU Problem is not completely avoided.
- 's cons:
- It's non-differentiable at x = 0. *The gradient for PReLU doesn't exist at x = 0 during Backpropagation, which uses differentiation to calculate and get a gradient.
- 's graph in Desmos:
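The sketch below is a minimal example of PReLU() in PyTorch; init corresponds to a above and is 0.25 by default, num_parameters sets how many learnable a values are created (e.g. one per channel), and the tensor values are arbitrary.

```python
import torch
from torch import nn

# PReLU with one learnable parameter a, initialized to 0.25.
prelu = nn.PReLU()  # same as nn.PReLU(num_parameters=1, init=0.25)
print(list(prelu.parameters()))  # one learnable tensor: [0.25]

x = torch.tensor([-3.0, -1.0, 0.0, 1.0, 3.0])
print(prelu(x))
# tensor([-0.7500, -0.2500,  0.0000,  1.0000,  3.0000], grad_fn=...)

# One learnable a per channel, e.g. for a 3-channel feature map.
prelu_per_channel = nn.PReLU(num_parameters=3, init=0.25)
feature_map = torch.randn(1, 3, 4, 4)  # (batch, channels, height, width)
print(prelu_per_channel(feature_map).shape)  # torch.Size([1, 3, 4, 4])
```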
(3) FReLU(Flexible Rectified Linear Unit):
- is an improved ReLU, having 0 or more learnable bias parameters which change(adjust) during training to improve a model's accuracy and convergence.
- 's formula is y = ReLU(x) + b. *b is the initial value for 0 or more learnable bias parameters.
- is also called Funnel Activation.
- isn't in PyTorch, so you can use frelu.pytorch or FunnelAct_Pytorch. *A minimal sketch of the formula is shown after this list.
- 's pros:
- It mitigates Vanishing Gradient Problem.
- It avoids Dying ReLU Problem when b < 0 or 0 < b.
- 's cons:
- It causes Dying ReLU Problem when b = 0.
- It causes Exploding Gradient Problem when b becomes greater and greater than 0.
- It's non-differentiable at the angle(x = 0). *The gradient for FReLU doesn't exist at the angle during Backpropagation, which uses differentiation to calculate and get a gradient.
- 's graph in Desmos:
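Since FReLU isn't built into PyTorch, the sketch below is a minimal, hypothetical implementation of the y = ReLU(x) + b formula above with one learnable bias per channel; it only illustrates the formula, it is not the code from frelu.pytorch or FunnelAct_Pytorch, and the initial value of b is an arbitrary choice.

```python
import torch
from torch import nn

class FReLU(nn.Module):
    """Flexible ReLU sketch: y = ReLU(x) + b with one learnable bias b per channel."""
    def __init__(self, num_channels: int, init: float = -1.0):
        super().__init__()
        # One learnable bias per channel; the initial value is an arbitrary
        # choice for this sketch (b = 0 would reproduce plain ReLU at the start).
        self.bias = nn.Parameter(torch.full((num_channels,), init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Broadcast the per-channel bias over (batch, channels, height, width).
        return torch.relu(x) + self.bias.view(1, -1, 1, 1)

frelu = FReLU(num_channels=3)
x = torch.randn(2, 3, 4, 4)
print(frelu(x).shape)  # torch.Size([2, 3, 4, 4])
```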