A Simple Explanation of the Softmax Function

What Softmax is, how it's used, and how to implement it in Python.

Softmax turns arbitrary real values into probabilities, which are often useful in Machine Learning. The math behind it is pretty simple: given some numbers,

  1. Raise e (the mathematical constant) to the power of each of those numbers.
  2. Sum up all the exponentials (powers of $e$). This result is the denominator.
  3. Use each number’s exponential as its numerator.
  4. $\text{Probability} = \frac{\text{Numerator}}{\text{Denominator}}$.

Written more fancily, Softmax performs the following transform on $n$ numbers $x_1 \ldots x_n$:

$$s(x_i) = \frac{e^{x_i}}{\sum_{j=1}^n e^{x_j}}$$

The outputs of the Softmax transform are always in the range $[0, 1]$ and add up to 1. Hence, they form a probability distribution.
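
To see why they always sum to 1, note that every output shares the same denominator, so summing the outputs just adds up the numerators:

$$\sum_{i=1}^n s(x_i) = \frac{\sum_{i=1}^n e^{x_i}}{\sum_{j=1}^n e^{x_j}} = 1$$

And since each $e^{x_i}$ is positive and no larger than that sum, each individual output lands in $[0, 1]$.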

A Simple Example

Say we have the numbers -1, 0, 3, and 5. First, we calculate the denominator:

$$\begin{aligned} \text{Denominator} &= e^{-1} + e^0 + e^3 + e^5 \\ &= \boxed{169.87} \end{aligned}$$

Then, we can calculate the numerators and probabilities:

| $x$ | Numerator ($e^x$) | Probability ($\frac{e^x}{169.87}$) |
| --- | --- | --- |
| -1 | 0.368 | 0.002 |
| 0 | 1 | 0.006 |
| 3 | 20.09 | 0.118 |
| 5 | 148.41 | 0.874 |

The bigger the $x$, the higher its probability. Also, notice that the probabilities all add up to 1, as mentioned before.

Implementing Softmax in Python

Using numpy makes this super easy:

```python
import numpy as np

def softmax(xs):
    return np.exp(xs) / sum(np.exp(xs))

xs = np.array([-1, 0, 3, 5])
print(softmax(xs)) # [0.0021657  0.00588697 0.11824302 0.87370431]
```

`np.exp()` raises $e$ to the power of each element in the input array.

Note: more advanced users will probably want to implement this using the LogSumExp trick to avoid numerical underflow/overflow problems.
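
As a rough sketch of that idea: a common form of the trick is to subtract the maximum input before exponentiating. The shift cancels between the numerator and denominator, so the result is identical, but the largest exponent becomes $e^0 = 1$, so nothing overflows. (The `stable_softmax` name below is just for illustration.)

```python
import numpy as np

def stable_softmax(xs):
    # Shift inputs so the largest is 0; np.exp() then can't overflow,
    # and the shift cancels out between numerator and denominator.
    exps = np.exp(xs - np.max(xs))
    return exps / np.sum(exps)

xs = np.array([1000, 1001, 1002]) # the naive softmax would overflow here
print(stable_softmax(xs)) # [0.09003057 0.24472847 0.66524096]
```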

Why is Softmax useful?

Imagine building a Neural Network to answer the question: Is this picture of a dog or a cat?

A common design for this neural network would have it output 2 real numbers, one representing dog and the other cat, and apply Softmax on these values. For example, let’s say the network outputs $[-1, 2]$:

| Animal | $x$ | $e^x$ | Probability |
| --- | --- | --- | --- |
| Dog | -1 | 0.368 | 0.047 |
| Cat | 2 | 7.39 | 0.953 |

This means our network is 95.3% confident that the picture is of a cat. Softmax lets us answer classification questions with probabilities, which are more useful than simpler answers (e.g. binary yes/no).
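
We can reproduce these numbers with the `softmax()` function from earlier (the `[-1, 2]` here is just the made-up example output above, not a real trained network):

```python
import numpy as np

def softmax(xs):
    return np.exp(xs) / sum(np.exp(xs))

outputs = np.array([-1, 2]) # [dog, cat]
print(softmax(outputs)) # [0.04742587 0.95257413] -> 4.7% dog, 95.3% cat
```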
