# A Simple Explanation of the Softmax Function

## What Softmax is, how it's used, and how to implement it in Python.

Softmax turns arbitrary real values into probabilities, which are often useful in Machine Learning. The math behind it is pretty simple: given some numbers,

1. Raise e (the mathematical constant) to the power of each of those numbers.
2. Sum up all the exponentials (powers of $e$). This result is the denominator.
3. Use each number’s exponential as its numerator.
4. $\text{Probability} = \frac{\text{Numerator}}{\text{Denominator}}$.

Written more fancily, Softmax performs the following transform on $n$ numbers $x_1 \ldots x_n$:

$s(x_i) = \frac{e^{x_i}}{\sum_{j=1}^n e^{x_j}}$

The outputs of the Softmax transform are always in the range $[0, 1]$ and add up to 1. Hence, they form a probability distribution.

## A Simple Example

Say we have the numbers -1, 0, 3, and 5. First, we calculate the denominator:

$$
\begin{aligned}
\text{Denominator} &= e^{-1} + e^0 + e^3 + e^5 \\
&= \boxed{169.87}
\end{aligned}
$$

Then, we can calculate the numerators and probabilities:

| $x$ | Numerator ($e^x$) | Probability ($\frac{e^x}{169.87}$) |
| --- | --- | --- |
| -1 | 0.368 | 0.002 |
| 0 | 1 | 0.006 |
| 3 | 20.09 | 0.118 |
| 5 | 148.41 | 0.874 |

The bigger the $x$, the higher its probability. Also, notice that the probabilities all add up to 1, as mentioned before.
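As a quick sanity check, we can reproduce the table above with plain Python (no libraries needed beyond the standard `math` module):

```python
import math

xs = [-1, 0, 3, 5]

# Step 1 & 2: exponentiate each number and sum to get the denominator.
exps = [math.exp(x) for x in xs]
denom = sum(exps)  # ≈ 169.87

# Step 3 & 4: each exponential divided by the denominator is a probability.
probs = [e / denom for e in exps]
print([round(p, 3) for p in probs])  # [0.002, 0.006, 0.118, 0.874]
print(sum(probs))  # 1.0
```

The probabilities match the table and sum to exactly 1, as expected.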

## Implementing Softmax in Python

Using numpy makes this super easy:

```python
import numpy as np

def softmax(xs):
    return np.exp(xs) / np.sum(np.exp(xs))

xs = np.array([-1, 0, 3, 5])
print(softmax(xs))  # [0.0021657  0.00588697 0.11824302 0.87370431]
```

Note for more advanced users: you'll probably want to implement this using the LogSumExp trick to avoid overflow/underflow problems.
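For large inputs, `np.exp` overflows (e.g. `np.exp(1000)` is `inf`). A common, numerically stable variant subtracts the maximum input before exponentiating; the shift cancels out between numerator and denominator, so the result is mathematically identical. Here's a minimal sketch (the function name `stable_softmax` is my own, not a NumPy API):

```python
import numpy as np

def stable_softmax(xs):
    # Shifting by the max makes every exponent <= 0, so np.exp can't
    # overflow. The shift cancels in the numerator and denominator,
    # so the output is unchanged.
    shifted = xs - np.max(xs)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

xs = np.array([1000.0, 1001.0, 1002.0])  # naive softmax overflows here
print(stable_softmax(xs))  # ≈ [0.090, 0.245, 0.665]
```

Note that the result equals the softmax of `[0, 1, 2]`: softmax only cares about the *differences* between inputs, not their absolute values.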

## Why is Softmax useful?

Imagine building a Neural Network to answer the question: Is this picture of a dog or a cat?

A common design for this neural network would have it output 2 real numbers, one representing dog and the other cat, and apply Softmax on these values. For example, let’s say the network outputs $[-1, 2]$:

| Animal | $x$ | $e^x$ | Probability |
| --- | --- | --- | --- |
| Dog | -1 | 0.368 | 0.047 |
| Cat | 2 | 7.39 | 0.953 |

This means our network is 95.3% confident that the picture is of a cat. Softmax lets us answer classification questions with probabilities, which are more useful than simpler answers (e.g. binary yes/no).
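We can reproduce the classifier's probabilities with the same softmax function from earlier (the `logits` variable name is just a common convention for raw network outputs, not something from the example network itself):

```python
import numpy as np

def softmax(xs):
    return np.exp(xs) / np.sum(np.exp(xs))

# Raw network outputs: one score for "dog", one for "cat".
logits = np.array([-1, 2])
dog_prob, cat_prob = softmax(logits)
print(round(float(cat_prob), 3))  # 0.953
```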
