A Simple Explanation of the Softmax Function

What Softmax is, how it's used, and how to implement it in Python.

Softmax turns arbitrary real values into probabilities, which are often useful in Machine Learning. The math behind it is pretty simple: given some numbers,

  1. Raise e (the mathematical constant) to the power of each of those numbers.
  2. Sum up all the exponentials (powers of e). This result is the denominator.
  3. Use each number’s exponential as its numerator.
  4. $\text{Probability} = \frac{\text{Numerator}}{\text{Denominator}}$.

Written more fancily, Softmax performs the following transform on $n$ numbers $x_1 \ldots x_n$:

s(x_i) = \frac{e^{x_i}}{\sum_{j=1}^n e^{x_j}}

The outputs of the Softmax transform are always in the range $[0, 1]$ and add up to 1. Hence, they form a probability distribution.
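Both properties follow directly from the formula: every $e^{x_i}$ is positive, so each output is positive and no larger than the denominator, and summing all the outputs makes the numerator equal the denominator:

\sum_{i=1}^n s(x_i) = \frac{\sum_{i=1}^n e^{x_i}}{\sum_{j=1}^n e^{x_j}} = 1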

A Simple Example

Say we have the numbers -1, 0, 3, and 5. First, we calculate the denominator:

\begin{aligned} \text{Denominator} &= e^{-1} + e^0 + e^3 + e^5 \\ &= \boxed{169.87} \end{aligned}

Then, we can calculate the numerators and probabilities:

| $x$ | Numerator ($e^x$) | Probability ($\frac{e^x}{169.87}$) |
| --- | --- | --- |
| -1 | 0.368 | 0.002 |
| 0 | 1 | 0.006 |
| 3 | 20.09 | 0.118 |
| 5 | 148.41 | 0.874 |

The bigger the $x$, the higher its probability. Also, notice that the probabilities all add up to 1, as mentioned before.
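You can verify the numbers in the table above with a few lines of plain Python (no libraries needed):

```python
import math

xs = [-1, 0, 3, 5]
denom = sum(math.exp(x) for x in xs)        # the shared denominator
probs = [math.exp(x) / denom for x in xs]   # one probability per input

print(round(denom, 2))               # 169.87
print([round(p, 3) for p in probs])  # [0.002, 0.006, 0.118, 0.874]
```

The probabilities sum to 1 exactly in math, and to within floating-point error in code.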

Implementing Softmax in Python

Using numpy makes this super easy:

```python
import numpy as np

def softmax(xs):
    exps = np.exp(xs)  # e raised to the power of each element
    return exps / np.sum(exps)

xs = np.array([-1, 0, 3, 5])
print(softmax(xs))  # [0.0021657  0.00588697 0.11824302 0.87370431]
```

np.exp() raises e to the power of each element in the input array.
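One practical caveat: for large inputs, np.exp() can overflow to infinity. A common fix (not shown in the original snippet, but standard practice) is to subtract the maximum input first; this shifts every exponent to be at most 0 and leaves the result mathematically unchanged, since the shift cancels between numerator and denominator:

```python
import numpy as np

def stable_softmax(xs):
    # Subtracting the max avoids overflow; softmax(xs - c) == softmax(xs)
    # for any constant c, because e^{-c} cancels in the ratio.
    exps = np.exp(xs - np.max(xs))
    return exps / np.sum(exps)

big = np.array([1000.0, 1001.0, 1002.0])
print(stable_softmax(big))  # [0.09003057 0.24472847 0.66524096]
```

A naive `np.exp(1000)` would overflow, but the stable version returns the same result as softmax on [0, 1, 2].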

Why is Softmax useful?

Imagine building a Neural Network to answer the question: Is this picture of a dog or a cat?

A common design for this neural network would have it output 2 real numbers, one representing dog and the other cat, and apply Softmax on these values. For example, let's say the network outputs $[-1, 2]$:

| Animal | $x$ | $e^x$ | Probability |
| --- | --- | --- | --- |
| Dog | -1 | 0.368 | 0.047 |
| Cat | 2 | 7.39 | 0.953 |

This means our network is 95.3% confident that the picture is of a cat. Softmax lets us answer classification questions with probabilities, which are more useful than simpler answers (e.g. binary yes/no).
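In code, this classification step is just Softmax followed by picking the highest-probability class. The label names and raw outputs below are the hypothetical values from the example:

```python
import numpy as np

def softmax(xs):
    return np.exp(xs) / np.sum(np.exp(xs))

# Hypothetical raw network outputs ("logits") for [dog, cat]
logits = np.array([-1.0, 2.0])
probs = softmax(logits)

labels = ["dog", "cat"]
print(probs)                     # [0.04742587 0.95257413]
print(labels[np.argmax(probs)])  # cat
```

Unlike a bare yes/no answer, the probabilities also tell you how confident the network is.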


This blog is open-source on Github.
