*Memos:
- My post explains Tanh() and Softsign().
- My post explains Sigmoid() and Softmax().
- My post explains Step function, Identity and ReLU.
- My post explains Leaky ReLU, PReLU and FReLU.
- My post explains ELU, SELU and CELU.
- My post explains GELU, Mish, SiLU and Softplus.
- My post explains Vanishing Gradient Problem, Exploding Gradient Problem and Dying ReLU Problem.
- My post explains layers in PyTorch.
- My post explains loss functions in PyTorch.
- My post explains optimizers in PyTorch.
(1) Tanh:
- can convert an input value (x) to an output value between -1 and 1. *-1 and 1 are exclusive.
- 's formula is y = (e^x - e^-x) / (e^x + e^-x).
- is also called Hyperbolic Tangent Function.
- is Tanh() in PyTorch.
- is used in:
- RNN.
- LSTM.
- GRU.
- 's pros:
- It normalizes input values.
- The convergence is stable.
- It mitigates Exploding Gradient Problem.
- It mitigates Dying ReLU Problem. *0 is still produced for the input value 0 so Dying ReLU Problem is not completely avoided.
- 's cons:
- It causes Vanishing Gradient Problem.
- It's computationally expensive because of the exponential and division operations.
- 's graph in Desmos:
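- 's code from scratch in PyTorch. *A minimal sketch following the formula above; the built-in Tanh() should produce the same values:
import torch
def tanh(input):
    # y = (e^x - e^-x) / (e^x + e^-x)
    e_p = torch.exp(input)
    e_n = torch.exp(-input)
    return (e_p - e_n) / (e_p + e_n)
my_tensor = torch.tensor([8., -3., 0., 1.])
print(tanh(my_tensor))
# ≈ tensor([ 1.0000, -0.9951,  0.0000,  0.7616])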
(2) Softsign:
- can convert an input value (x) to an output value between -1 and 1. *-1 and 1 are exclusive.
- 's formula is y = x / (1 + |x|).
- is Softsign() in PyTorch.
- 's pros:
- It normalizes input values.
- The convergence is stable.
- It mitigates Exploding Gradient Problem.
- It mitigates Dying ReLU Problem. *0 is still produced for the input value 0 so Dying ReLU Problem is not completely avoided.
- 's cons:
- It causes Vanishing Gradient Problem.
- 's graph in Desmos:
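- 's code from scratch in PyTorch. *A minimal sketch following the formula above; the built-in Softsign() should produce the same values:
import torch
def softsign(input):
    # y = x / (1 + |x|)
    return input / (1 + torch.abs(input))
my_tensor = torch.tensor([8., -3., 0., 1.])
print(softsign(my_tensor))
# ≈ tensor([ 0.8889, -0.7500,  0.0000,  0.5000])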
(3) Sigmoid:
- can convert an input value (x) to an output value between 0 and 1. *0 and 1 are exclusive.
- 's formula is y = 1 / (1 + e^-x).
- is Sigmoid() in PyTorch.
- is used in:
- Binary Classification Model.
- Logistic Regression.
- LSTM.
- GRU.
- GAN.
- 's pros:
- It normalizes input values.
- The convergence is stable.
- It mitigates Exploding Gradient Problem.
- It avoids Dying ReLU Problem.
- 's cons:
- It causes Vanishing Gradient Problem.
- It's computationally expensive because of the exponential operation.
- 's graph in Desmos:
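- 's code from scratch in PyTorch. *A minimal sketch following the formula above; the built-in Sigmoid() should produce the same values:
import torch
def sigmoid(input):
    # y = 1 / (1 + e^-x)
    return 1 / (1 + torch.exp(-input))
my_tensor = torch.tensor([8., -3., 0., 1.])
print(sigmoid(my_tensor))
# ≈ tensor([0.9997, 0.0474, 0.5000, 0.7311])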
(4) Softmax:
- can convert input values (xs) to output values between 0 and 1 each, whose sum is 1 (100%). *Memos:
- *0 and 1 are exclusive.
- If the input values are [5, 4, -1], the output values are [0.730, 0.268, 0.002], which is 0.730 (73%) + 0.268 (26.8%) + 0.002 (0.2%) = 1 (100%). *Sometimes, approximately 100%.
- 's formula is y_i = e^(x_i) / Σ_j e^(x_j).
- is Softmax() in PyTorch.
- is used in:
- Multi-Class Classification Model.
- 's pros:
- It normalizes input values.
- The convergence is stable.
- It mitigates Exploding Gradient Problem.
- It avoids Dying ReLU Problem.
- 's cons:
- It causes Vanishing Gradient Problem.
- It's computationally expensive because of the exponential, summation and division operations.
- 's graph in Desmos:
- 's code from scratch in PyTorch. *dim=0 must be set for sum(), otherwise different values are returned for a different D (dimensional) tensor.
import torch
def softmax(input):
    e_i = torch.exp(input)
    return e_i / e_i.sum(dim=0)
    # ↑↑↑↑↑ dim=0 must be set.
my_tensor = torch.tensor([8., -3., 0., 1.])
print(softmax(my_tensor))
# tensor([9.9874e-01, 1.6681e-05, 3.3504e-04, 9.1073e-04])
my_tensor = torch.tensor([[8., -3.], [0., 1.]])
print(softmax(my_tensor))
# tensor([[9.9966e-01, 1.7986e-02],
# [3.3535e-04, 9.8201e-01]])
my_tensor = torch.tensor([[[8.], [-3.]], [[0.], [1.]]])
print(softmax(my_tensor))
# tensor([[[9.9966e-01],
# [1.7986e-02]],
# [[3.3535e-04],
# [9.8201e-01]]])
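*For comparison, a quick illustrative check that the built-in Softmax(dim=0) matches the [5, 4, -1] example above:
import torch
from torch import nn
softmax_builtin = nn.Softmax(dim=0) # dim=0 normalizes along the first dimension.
print(softmax_builtin(torch.tensor([5., 4., -1.])))
# ≈ tensor([0.7298, 0.2685, 0.0018]), i.e. roughly the 0.730, 0.268 and 0.002 above.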