A Simple Explanation of the Softmax Function

What Softmax is, how it's used, and how to implement it in Python.

Softmax turns arbitrary real values into probabilities, which are often useful in Machine Learning. The math behind it is pretty simple: given some numbers,

  1. Raise e (the mathematical constant) to the power of each of those numbers.
  2. Sum up all the exponentials (powers of $e$). This result is the denominator.
  3. Use each number’s exponential as its numerator.
  4. $\text{Probability} = \frac{\text{Numerator}}{\text{Denominator}}$.

Written more fancily, Softmax performs the following transform on $n$ numbers $x_1 \ldots x_n$:

$$s(x_i) = \frac{e^{x_i}}{\sum_{j=1}^n e^{x_j}}$$

The outputs of the Softmax transform are always in the range $[0, 1]$ and add up to 1. Hence, they form a probability distribution.
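
To see why they always sum to 1, note that every output shares the same denominator, so summing the outputs just adds up the numerators:

$$\sum_{i=1}^n s(x_i) = \frac{\sum_{i=1}^n e^{x_i}}{\sum_{j=1}^n e^{x_j}} = 1$$

And since each $e^{x_i}$ is positive and no larger than that sum, each individual output lands in $[0, 1]$.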

A Simple Example

Say we have the numbers -1, 0, 3, and 5. First, we calculate the denominator:

$$\begin{aligned} \text{Denominator} &= e^{-1} + e^0 + e^3 + e^5 \\ &= \boxed{169.87} \end{aligned}$$

Then, we can calculate the numerators and probabilities:

| $x$ | Numerator ($e^x$) | Probability ($\frac{e^x}{169.87}$) |
| --- | --- | --- |
| -1 | 0.368 | 0.002 |
| 0 | 1 | 0.006 |
| 3 | 20.09 | 0.118 |
| 5 | 148.41 | 0.874 |

The bigger the $x$, the higher its probability. Also, notice that the probabilities all add up to 1, as mentioned before.

Implementing Softmax in Python

Using numpy makes this super easy:

```python
import numpy as np

def softmax(xs):
    return np.exp(xs) / sum(np.exp(xs))

xs = np.array([-1, 0, 3, 5])
print(softmax(xs)) # [0.0021657  0.00588697 0.11824302 0.87370431]
```

`np.exp()` raises $e$ to the power of each element in the input array.

Note: more advanced users will probably want to implement this using the LogSumExp trick to avoid numerical underflow/overflow problems.
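
As a rough sketch of that idea: a common form of the trick is to subtract the maximum input before exponentiating. The shift cancels between the numerator and denominator, so the result is identical, but the largest exponent becomes $e^0 = 1$, so nothing overflows. (The `stable_softmax` name below is just for illustration.)

```python
import numpy as np

def stable_softmax(xs):
    # Shift inputs so the largest is 0; np.exp() then can't overflow,
    # and the shift cancels out between numerator and denominator.
    exps = np.exp(xs - np.max(xs))
    return exps / np.sum(exps)

xs = np.array([1000, 1001, 1002]) # the naive softmax would overflow here
print(stable_softmax(xs)) # [0.09003057 0.24472847 0.66524096]
```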

Why is Softmax useful?

Imagine building a Neural Network to answer the question: Is this picture of a dog or a cat?

A common design for this neural network would have it output 2 real numbers, one representing dog and the other cat, and apply Softmax on these values. For example, let’s say the network outputs $[-1, 2]$:

| Animal | $x$ | $e^x$ | Probability |
| --- | --- | --- | --- |
| Dog | -1 | 0.368 | 0.047 |
| Cat | 2 | 7.39 | 0.953 |

This means our network is 95.3% confident that the picture is of a cat. Softmax lets us answer classification questions with probabilities, which are more useful than simpler answers (e.g. binary yes/no).
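
We can reproduce these numbers with the `softmax()` function from earlier (the `[-1, 2]` here is just the made-up example output above, not a real trained network):

```python
import numpy as np

def softmax(xs):
    return np.exp(xs) / sum(np.exp(xs))

outputs = np.array([-1, 2]) # [dog, cat]
print(softmax(outputs)) # [0.04742587 0.95257413] -> 4.7% dog, 95.3% cat
```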
