Fix Python – How to implement the Softmax function in Python

Question

Asked By – alvas

From the Udacity’s deep learning class, the softmax of y_i is simply the exponential divided by the sum of exponential of the whole Y vector:

enter image description here

Where S(y_i) is the softmax function of y_i and e is the exponential and j is the no. of columns in the input vector Y.

I’ve tried the following:

import numpy as np

def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

scores = [3.0, 1.0, 0.2]
print(softmax(scores))

which returns:

[ 0.8360188   0.11314284  0.05083836]

But the suggested solution was:

def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)

which produces the same output as the first implementation, even though the first implementation explicitly takes the difference of each column and the max and then divides by the sum.

Can someone show mathematically why? Is one correct and the other one wrong?

Are the implementation similar in terms of code and time complexity? Which is more efficient?

Now we will see solution for issue: How to implement the Softmax function in Python


Answer

They’re both correct, but yours is preferred from the point of view of numerical stability.

You start with

e ^ (x - max(x)) / sum(e^(x - max(x))

By using the fact that a^(b – c) = (a^b)/(a^c) we have

= e ^ x / (e ^ max(x) * sum(e ^ x / e ^ max(x)))

= e ^ x / sum(e ^ x)

Which is what the other answer says. You could replace max(x) with any variable and it would cancel out.

This question is answered By – Trevor Merrifield

This answer is collected from stackoverflow and reviewed by FixPython community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0