*Memos:
- My post explains ReLU() and LeakyReLU().
- My post explains PReLU() and ELU().
- My post explains Step function, Identity and ReLU.
- My post explains ELU, SELU and CELU.
- My post explains GELU, Mish, SiLU and Softplus.
- My post explains Tanh, Softsign, Sigmoid and Softmax.
- My post explains Vanishing Gradient Problem, Exploding Gradient Problem and Dying ReLU Problem.
- My post explains layers in PyTorch.
- My post explains loss functions in PyTorch.
- My post explains optimizers in PyTorch.
(1) Leaky ReLU(Leaky Rectified Linear Unit):
- is an improved ReLU that can mitigate Dying ReLU Problem.
- can convert an input value(x) to the output value between ax and x. *Memos:
- If x < 0, then ax, while if 0 <= x, then x.
- a is 0.01 by default.
- 's formula is y = max(ax, x).
- is also called LReLU.
- is LeakyReLU() in PyTorch. *A minimal usage sketch is shown after this list.
- is used in:
- GAN.
- 's pros:
- It mitigates Vanishing Gradient Problem.
- It mitigates Dying ReLU Problem. *0 is still produced for the input value 0 so Dying ReLU Problem is not completely avoided.
- 's cons:
- It's non-differentiable at x = 0.
- 's graph in Desmos:
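The sketch below is a minimal example of LeakyReLU() in PyTorch; negative_slope corresponds to a above and is 0.01 by default, and the tensor values are arbitrary.

```python
import torch
from torch import nn

# Leaky ReLU with the default slope a = 0.01 for negative inputs.
leaky_relu = nn.LeakyReLU()  # same as nn.LeakyReLU(negative_slope=0.01)

x = torch.tensor([-3.0, -1.0, 0.0, 1.0, 3.0])
print(leaky_relu(x))
# tensor([-0.0300, -0.0100,  0.0000,  1.0000,  3.0000])

# A larger slope keeps more of the negative signal.
print(nn.LeakyReLU(negative_slope=0.2)(x))
# tensor([-0.6000, -0.2000,  0.0000,  1.0000,  3.0000])
```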
(2) PReLU(Parametric Rectified Linear Unit):
- is an improved Leaky ReLU, having 0 or more learnable parameters which change(adjust) during training to improve a model's accuracy and convergence.
- can convert an input value(x) to the output value between ax and x. *Memos:
- If x < 0, then ax, while if 0 <= x, then x.
- a is 0.25 by default. *a is the initial value for 0 or more learnable parameters.
- 's formula is y = max(ax, x).
- is PReLU() in PyTorch. *A minimal usage sketch is shown after this list.
- is used in:
- SRGAN(Super-Resolution Generative Adversarial Network). *SRGAN is a type of GAN(Generative Adversarial Network).
- 's pros:
- It mitigates Vanishing Gradient Problem.
- It mitigates Dying ReLU Problem. *0 is still produced for the input value 0 so Dying ReLU Problem is not completely avoided.
- 's cons:
- It's non-differentiable at x = 0. *The gradient for PReLU doesn't exist at x = 0 during Backpropagation, which uses differentiation to calculate and get a gradient.
- 's graph in Desmos:
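The sketch below is a minimal example of PReLU() in PyTorch; init corresponds to a above and is 0.25 by default, num_parameters sets how many learnable a values are created (e.g. one per channel), and the tensor values are arbitrary.

```python
import torch
from torch import nn

# PReLU with one learnable parameter a, initialized to 0.25.
prelu = nn.PReLU()  # same as nn.PReLU(num_parameters=1, init=0.25)
print(list(prelu.parameters()))  # one learnable tensor: [0.25]

x = torch.tensor([-3.0, -1.0, 0.0, 1.0, 3.0])
print(prelu(x))
# tensor([-0.7500, -0.2500,  0.0000,  1.0000,  3.0000], grad_fn=...)

# One learnable a per channel, e.g. for a 3-channel feature map.
prelu_per_channel = nn.PReLU(num_parameters=3, init=0.25)
feature_map = torch.randn(1, 3, 4, 4)  # (batch, channels, height, width)
print(prelu_per_channel(feature_map).shape)  # torch.Size([1, 3, 4, 4])
```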
(3) FReLU(Flexible Rectified Linear Unit):
- is an improved ReLU, having 0 or more learnable bias parameters which change(adjust) during training to improve a model's accuracy and convergence.
- 's formula is y = ReLU(x) + b. *b is the initial value for 0 or more learnable bias parameters.
- is also called Funnel Activation.
- isn't in PyTorch, so you can use frelu.pytorch or FunnelAct_Pytorch. *A minimal sketch of the formula is shown after this list.
- 's pros:
- It mitigates Vanishing Gradient Problem.
- It avoids Dying ReLU Problem when b < 0 or 0 < b.
- 's cons:
- It causes Dying ReLU Problem when b = 0.
- It causes Exploding Gradient Problem when b becomes greater and greater than 0.
- It's non-differentiable at the angle(x = 0). *The gradient for FReLU doesn't exist at the angle during Backpropagation, which uses differentiation to calculate and get a gradient.
- 's graph in Desmos:
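Since FReLU isn't built into PyTorch, the sketch below is a minimal, hypothetical implementation of the y = ReLU(x) + b formula above with one learnable bias per channel; it only illustrates the formula, it is not the code from frelu.pytorch or FunnelAct_Pytorch, and the initial value of b is an arbitrary choice.

```python
import torch
from torch import nn

class FReLU(nn.Module):
    """Flexible ReLU sketch: y = ReLU(x) + b with one learnable bias b per channel."""
    def __init__(self, num_channels: int, init: float = -1.0):
        super().__init__()
        # One learnable bias per channel; the initial value is an arbitrary
        # choice for this sketch (b = 0 would reproduce plain ReLU at the start).
        self.bias = nn.Parameter(torch.full((num_channels,), init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Broadcast the per-channel bias over (batch, channels, height, width).
        return torch.relu(x) + self.bias.view(1, -1, 1, 1)

frelu = FReLU(num_channels=3)
x = torch.randn(2, 3, 4, 4)
print(frelu(x).shape)  # torch.Size([2, 3, 4, 4])
```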