# One-Hot Encoding, Explained

## A simple guide on the what, why, and how of One-Hot Encoding.

One-Hot Encoding takes a single integer and produces a vector in which the element at that integer’s index is 1 and all other elements are 0. For example, with four possible values, the integer 1 becomes $[0, 1, 0, 0]$.
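To make the definition concrete, here is a minimal pure-Python sketch. The helper name `one_hot` and its `num_classes` parameter are illustrative choices, not part of any library.

```python
def one_hot(index, num_classes):
    """Return a list of length num_classes with a 1 at position index and 0 elsewhere."""
    if not 0 <= index < num_classes:
        raise ValueError("index must be in the range [0, num_classes)")
    vector = [0] * num_classes
    vector[index] = 1
    return vector


print(one_hot(1, 4))  # [0, 1, 0, 0]
```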

For example, imagine we’re working with categorical data, where only a limited number of colors are possible: red, green, or blue. One way we could represent this numerically is by assigning each color a number:

| Color | Value |
| --- | --- |
| Red | 0 |
| Green | 1 |
| Blue | 2 |

This is known as integer encoding. For Machine Learning, this encoding can be problematic: it imposes an order and a notion of distance on categories that have neither. In this example, we’re essentially saying “green” is the average of “red” and “blue”, which can lead a model to learn relationships that don’t exist in the data.

It’s often more useful to use the one-hot encoding instead:

| Color | Integer Encoding | One-Hot Encoding |
| --- | --- | --- |
| Red | 0 | $[1, 0, 0]$ |
| Green | 1 | $[0, 1, 0]$ |
| Blue | 2 | $[0, 0, 1]$ |

This is much more useful to pass into something like a neural network, because each category gets its own dimension and no ordering between categories is implied.
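One way to see the difference is to compare distances between the encoded colors. The sketch below uses only the standard library and the two encodings from the table; the dictionary names are made up for illustration.

```python
from itertools import combinations
from math import dist

# Encodings copied from the table above.
integer_encoding = {"red": [0], "green": [1], "blue": [2]}
one_hot_encoding = {"red": [1, 0, 0], "green": [0, 1, 0], "blue": [0, 0, 1]}

# Compare the Euclidean distance between every pair of colors.
for a, b in combinations(integer_encoding, 2):
    d_int = dist(integer_encoding[a], integer_encoding[b])
    d_hot = dist(one_hot_encoding[a], one_hot_encoding[b])
    print(f"{a}-{b}: integer={d_int:.3f}, one-hot={d_hot:.3f}")
# red-green: integer=1.000, one-hot=1.414
# red-blue: integer=2.000, one-hot=1.414
# green-blue: integer=1.000, one-hot=1.414
```

With integer encoding, red and blue end up twice as far apart as red and green. With one-hot encoding, every pair is equally distant, so no color sits “between” the others.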

## One-Hot Encoding in Python

Below are several different ways to implement one-hot encoding in Python.

### scikit-learn

Using scikit-learn’s OneHotEncoder:

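```python
from sklearn.preprocessing import OneHotEncoder

# scikit-learn >= 1.2 uses `sparse_output`; older versions used `sparse`.
encoder = OneHotEncoder(sparse_output=False)
# Categories are sorted alphabetically (blue, green, red), so red maps to the last column.
print(encoder.fit_transform([['red'], ['green'], ['blue']]))
'''
[[0. 0. 1.]
 [0. 1. 0.]
 [1. 0. 0.]]
'''
```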

### Keras

Using Keras’s to_categorical:

```python
from keras.utils import to_categorical

print(to_categorical([0, 1, 2]))
'''
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
'''
```

### NumPy

Using NumPy:

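```python
import numpy as np

arr = [2, 1, 0]
# Number of classes = largest label + 1 (avoid shadowing the built-in `max`).
num_classes = np.max(arr) + 1
# Row i of the identity matrix is the one-hot vector for label i.
print(np.eye(num_classes)[arr])
'''
[[0. 0. 1.]
 [0. 1. 0.]
 [1. 0. 0.]]
'''
```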
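The NumPy example starts from integers. If the data is still strings, one option (a sketch, not the only way) is to let `np.unique` produce the integer codes first; `np.argmax` then recovers the labels from the one-hot rows. The variable names here are illustrative.

```python
import numpy as np

colors = np.array(["red", "green", "blue", "green"])

# categories holds the sorted unique labels; codes holds each color's index in categories.
categories, codes = np.unique(colors, return_inverse=True)
one_hot = np.eye(len(categories))[codes]

print(categories)  # ['blue' 'green' 'red']
print(codes)       # [2 1 0 1]
print(one_hot)

# Decoding: the position of the 1 in each row is the integer code.
decoded = categories[np.argmax(one_hot, axis=1)]
print(decoded)     # ['red' 'green' 'blue' 'green']
```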
