*Memos:
- My post explains heaviside() and Identity().
- My post explains ReLU() and LeakyReLU().
- My post explains Leaky ReLU, PReLU and FReLU.
- My post explains ELU, SELU and CELU.
- My post explains GELU, Mish, SiLU and Softplus.
- My post explains Tanh, Softsign, Sigmoid and Softmax.
- My post explains Vanishing Gradient Problem, Exploding Gradient Problem and Dying ReLU Problem.
- My post explains layers in PyTorch.
- My post explains loss functions in PyTorch.
- My post explains optimizers in PyTorch.
An activation function is the function or layer which enables a neural network to learn complex(non-linear) relationships by transforming the output of the previous layer. *Without activation functions, a neural network can only learn linear relationships.
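For example, here is a minimal sketch of where an activation function sits in a PyTorch model (the layer sizes, the sample input and the choice of ReLU() are just illustrative):

```python
import torch
from torch import nn

# Two Linear layers with a non-linear activation between them.
# Without the activation, the stack would collapse into a single linear mapping.
model = nn.Sequential(
    nn.Linear(in_features=4, out_features=8),
    nn.ReLU(),  # transforms the previous layer's output non-linearly
    nn.Linear(in_features=8, out_features=1),
)

x = torch.randn(2, 4)  # a batch of 2 samples with 4 features each
print(model(x).shape)  # torch.Size([2, 1])
```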
(1) Step function:
- can convert an input value(x) to 0 or 1. *If x < 0, then 0 while if x >= 0, then 1.
- is also called Binary step function, Unit step function, Binary threshold function, Threshold function, Heaviside step function or Heaviside function.
- is heaviside() in PyTorch. *A code example is shown after this list.
- 's pros:
- It's simple, only expressing the two values 0 and 1.
- It avoids Exploding Gradient Problem.
- 's cons:
- It's rarely used in Deep Learning because it has more cons than other activation functions.
- It can only express the two values 0 and 1, so the created model has bad accuracy, predicting inaccurately. *The activation functions which can express a wider range of values can create a model with good accuracy, predicting accurately.
- It causes Dying ReLU Problem.
- It's non-differentiable at x = 0. *The gradient of the step function doesn't exist at x = 0 during Backpropagation, which uses differentiation to calculate gradients.
- 's graph in Desmos:
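A minimal sketch of the step function with heaviside() (the sample tensor and the printed result are just illustrative):

```python
import torch

x = torch.tensor([-3.0, -1.0, 0.0, 1.0, 3.0])

# heaviside(input, values) returns 0 where x < 0, 1 where x > 0
# and `values` where x == 0 (here 1.0, so x >= 0 gives 1).
y = torch.heaviside(x, torch.tensor(1.0))

print(y)  # tensor([0., 0., 1., 1., 1.])
```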
(2) Identity:
- can just return the same value as an input value(x) without any conversion.
- 's formula is y = x.
- is also called Linear function.
- is Identity() in PyTorch. *A code example is shown after this list.
- 's pros:
- It's simple, just returning the same value as an input value.
- 's cons:
- It cannot express non-linear relationships because it returns the input unchanged, so a network using only Identity can only learn linear relationships.
- 's graph in Desmos:
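A minimal sketch of Identity() (the sample tensor is just illustrative):

```python
import torch
from torch import nn

identity = nn.Identity()

x = torch.tensor([-3.0, -1.0, 0.0, 1.0, 3.0])

# Identity() returns the input tensor unchanged (y = x).
y = identity(x)

print(y)  # tensor([-3., -1., 0., 1., 3.])
```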
(3) ReLU(Rectified Linear Unit):
- can convert an input value(x) to the output value between 0 and x. *If x < 0, then 0 while if 0 <= x, then x.
- 's formula is y = max(0, x).
- is ReLU() in PyTorch. *A code example is shown after this list.
- is used in:
- Binary Classification Model.
- Multi-Class Classification Model.
- CNN(Convolutional Neural Network).
- RNN(Recurrent Neural Network). *RNN in PyTorch.
- Transformer. *Transformer() in PyTorch.
- NLP(Natural Language Processing) based on RNN.
- GAN(Generative Adversarial Network).
- 's pros:
- It mitigates Vanishing Gradient Problem.
- 's cons:
- It causes Dying ReLU Problem.
- It's non-differentiable at x = 0.
- 's graph in Desmos:
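A minimal sketch of ReLU() (the sample tensor is just illustrative):

```python
import torch
from torch import nn

relu = nn.ReLU()

x = torch.tensor([-3.0, -1.0, 0.0, 1.0, 3.0])

# ReLU() applies y = max(0, x) element-wise.
y = relu(x)

print(y)  # tensor([0., 0., 0., 1., 3.])
```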