Asked By – alvas
From the Udacity’s deep learning class, the softmax of y_i is simply the exponential divided by the sum of exponential of the whole Y vector:
S(y_i) is the softmax function of
e is the exponential and
j is the no. of columns in the input vector Y.
I’ve tried the following:
import numpy as np def softmax(x): """Compute softmax values for each sets of scores in x.""" e_x = np.exp(x - np.max(x)) return e_x / e_x.sum() scores = [3.0, 1.0, 0.2] print(softmax(scores))
[ 0.8360188 0.11314284 0.05083836]
But the suggested solution was:
def softmax(x): """Compute softmax values for each sets of scores in x.""" return np.exp(x) / np.sum(np.exp(x), axis=0)
which produces the same output as the first implementation, even though the first implementation explicitly takes the difference of each column and the max and then divides by the sum.
Can someone show mathematically why? Is one correct and the other one wrong?
Are the implementation similar in terms of code and time complexity? Which is more efficient?
Now we will see solution for issue: How to implement the Softmax function in Python
They’re both correct, but yours is preferred from the point of view of numerical stability.
You start with
e ^ (x - max(x)) / sum(e^(x - max(x))
By using the fact that a^(b – c) = (a^b)/(a^c) we have
= e ^ x / (e ^ max(x) * sum(e ^ x / e ^ max(x))) = e ^ x / sum(e ^ x)
Which is what the other answer says. You could replace max(x) with any variable and it would cancel out.