## Question

Asked By – alvas

From Udacity's deep learning class, the softmax of `y_i` is simply the exponential divided by the sum of the exponentials of the whole Y vector:

$$S(y_i) = \frac{e^{y_i}}{\sum_j e^{y_j}}$$

where `S(y_i)` is the softmax function of `y_i`, `e` is the exponential, and `j` runs over the columns of the input vector Y.

I’ve tried the following:

```
import numpy as np


def softmax(x):
    """Compute softmax values for each set of scores in x."""
    # Shift by the maximum before exponentiating.
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()


scores = [3.0, 1.0, 0.2]
print(softmax(scores))
```

which returns:

```
[ 0.8360188 0.11314284 0.05083836]
```

But the suggested solution was:

```
def softmax(x):
    """Compute softmax values for each set of scores in x."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)
```

which produces the **same output as the first implementation**, even though the first implementation explicitly takes the difference of each column and the max and then divides by the sum.

**Can someone show mathematically why? Is one correct and the other one wrong?**

**Are the implementations similar in terms of code and time complexity? Which is more efficient?**


## Answer

They’re both correct, but yours is preferred from the point of view of numerical stability.

You start with

```
e ^ (x - max(x)) / sum(e ^ (x - max(x)))
```

By using the fact that a^(b - c) = (a^b)/(a^c) we have

```
= e ^ x / (e ^ max(x) * sum(e ^ x / e ^ max(x)))
= e ^ x / sum(e ^ x)
```

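Which is what the suggested solution computes. You could replace max(x) with any constant and it would cancel out in the same way. Subtracting max(x) is the useful choice because it makes every exponent at most 0, so the exponential cannot overflow.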
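The cancellation can also be checked numerically. The following is a minimal sketch, not part of the original answer: the function names `softmax_naive` and `softmax_shifted` and the `large` test vector are made up for illustration. It compares the two implementations on the scores from the question and then on inputs large enough to overflow `np.exp`.

```python
import numpy as np


def softmax_naive(x):
    """Softmax without shifting, as in the suggested solution."""
    x = np.asarray(x, dtype=float)
    return np.exp(x) / np.sum(np.exp(x), axis=0)


def softmax_shifted(x):
    """Softmax after subtracting max(x), as in the question."""
    x = np.asarray(x, dtype=float)
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()


scores = [3.0, 1.0, 0.2]
# For moderate inputs the two agree up to floating-point rounding.
print(np.allclose(softmax_naive(scores), softmax_shifted(scores)))

# Hypothetical large inputs: np.exp(1000.0) overflows to inf,
# and inf / inf evaluates to nan in the unshifted version.
large = [1000.0, 1001.0, 1002.0]
with np.errstate(over="ignore", invalid="ignore"):
    print(softmax_naive(large))
# The shifted version only exponentiates values <= 0, so it stays finite.
print(softmax_shifted(large))
```

Both versions do the same amount of work up to one extra pass for `np.max`, so the shifted form costs little and avoids the overflow case.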

This question is answered By – Trevor Merrifield

**This answer is collected from Stack Overflow and reviewed by FixPython community admins, and is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.**