One-Hot Encoding, Explained

A simple guide on the what, why, and how of One-Hot Encoding.

One-Hot Encoding takes a single integer and produces a vector where a single element is 1 and all other elements are 0, like [0, 1, 0, 0].
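The definition translates almost directly into code. Here is a minimal pure-Python sketch. The function name `one_hot` is our own and does not come from any library:

```python
def one_hot(index: int, num_classes: int) -> list[int]:
    """Return a list of length num_classes with a 1 at `index` and 0 everywhere else."""
    if not 0 <= index < num_classes:
        raise ValueError(f"index {index} is out of range for {num_classes} classes")
    vector = [0] * num_classes  # start with all zeros
    vector[index] = 1           # switch on the single "hot" position
    return vector

print(one_hot(1, 4))  # [0, 1, 0, 0]
```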

For example, imagine we’re working with categorical data, where only a limited number of colors are possible: red, green, or blue. One way we could represent this numerically is by assigning each color a number:

Color | Value
------|------
Red   | 0
Green | 1
Blue  | 2

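This is known as integer encoding. For machine learning, this encoding can be problematic because it imposes an order on the colors. In this example we’re essentially telling the model that “green” is the average of “red” and “blue” ((0 + 2) / 2 = 1), and that red is closer to green than to blue. A model can pick up on these relationships even though they mean nothing for colors, which can lead to unexpected results.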
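To make the problem concrete, here is the arithmetic a model could implicitly pick up from the integer codes in the table above. This is just a small illustration using a plain dictionary:

```python
# Integer encoding from the table above.
integer_code = {"red": 0, "green": 1, "blue": 2}

# The average of red and blue lands exactly on green's code...
midpoint = (integer_code["red"] + integer_code["blue"]) / 2
print(midpoint == integer_code["green"])  # True

# ...and red looks "closer" to green than to blue, purely because of the numbers we picked.
print(abs(integer_code["red"] - integer_code["green"]))  # 1
print(abs(integer_code["red"] - integer_code["blue"]))   # 2
```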

It’s often more useful to use the one-hot encoding instead:

Color | Integer Encoding | One-Hot Encoding
------|------------------|-----------------
Red   | 0                | [1, 0, 0]
Green | 1                | [0, 1, 0]
Blue  | 2                | [0, 0, 1]

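Each color now gets its own dimension, every pair of colors is the same distance apart, and no color sits numerically “between” two others. That makes one-hot vectors a much better input for something like a neural network.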
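A quick NumPy check of both properties. The first loop shows that distinct one-hot vectors are all the same Euclidean distance apart. The second part shows that multiplying a one-hot vector by a weight matrix, as a dense layer does, simply selects one row. The matrix `W` is a made-up example, not a trained weight:

```python
import numpy as np

one_hot = {
    "red":   np.array([1, 0, 0]),
    "green": np.array([0, 1, 0]),
    "blue":  np.array([0, 0, 1]),
}

# Every pair of distinct colors is sqrt(2) apart, so no color sits "between" two others.
pairs = [("red", "green"), ("red", "blue"), ("green", "blue")]
for a, b in pairs:
    print(a, b, np.linalg.norm(one_hot[a] - one_hot[b]))

# Hypothetical 3x2 weight matrix: one row per color.
W = np.array([[0.1, 0.2],
              [0.3, 0.4],
              [0.5, 0.6]])

# A one-hot input picks out a single row, so each color effectively gets its own weights.
print(one_hot["green"] @ W)  # [0.3 0.4]
```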

One-Hot Encoding in Python

Below are several different ways to implement one-hot encoding in Python.

scikit-learn

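Using scikit-learn’s OneHotEncoder. By default it learns the categories from the data and sorts them, here alphabetically (blue, green, red), so “red” comes out as [0, 0, 1] rather than [1, 0, 0] as in the table above: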

from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder(sparse_output=False)  # scikit-learn >= 1.2; older versions use sparse=False
print(encoder.fit_transform([['red'], ['green'], ['blue']]))
'''
[[0. 0. 1.]
 [0. 1. 0.]
 [1. 0. 0.]]
 '''
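You may prefer to keep the order from the table (red, green, blue). OneHotEncoder also accepts an explicit `categories` argument for this, e.g. `OneHotEncoder(categories=[['red', 'green', 'blue']], sparse_output=False)`. Below is the same idea without any dependencies, as a rough sketch. `one_hot_strings` is a hypothetical helper, not part of scikit-learn:

```python
def one_hot_strings(values, categories):
    """Hypothetical helper: one-hot encode `values` using a fixed category order."""
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # raises KeyError for a category not in `categories`
        rows.append(row)
    return rows

print(one_hot_strings(["red", "green", "blue"], categories=["red", "green", "blue"]))
# [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Passing `categories=sorted(...)` instead reproduces scikit-learn’s default ordering shown above.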

Keras

Using Keras’s to_categorical, which expects integer class labels (such as the integer encoding above) rather than strings:

from keras.utils import to_categorical

print(to_categorical([0, 1, 2]))
'''
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
 '''

NumPy

Using NumPy, by indexing into an identity matrix (np.eye), whose row i is exactly the one-hot vector for i:

import numpy as np

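arr = [2, 1, 0]
num_classes = np.max(arr) + 1  # number of categories; avoids shadowing the built-in max()
print(np.eye(num_classes)[arr])  # pick row i of the identity matrix for each label i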
'''
[[0. 0. 1.]
 [0. 1. 0.]
 [1. 0. 0.]]
'''
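You often need to go the other way as well, for example to turn one-hot rows (or a classifier’s per-class scores) back into integer labels. Each one-hot row has its single 1 at the original index, so `np.argmax` along each row recovers it. Here is a minimal NumPy sketch:

```python
import numpy as np

labels = [2, 1, 0]
num_classes = max(labels) + 1
encoded = np.eye(num_classes)[labels]  # one-hot rows, as in the example above

# argmax along each row returns the position of the 1, i.e. the original label.
decoded = np.argmax(encoded, axis=1)
print(decoded)  # [2 1 0]
```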

I write about ML, Web Dev, and more topics. Subscribe to get new posts by email!

This blog is open-source on Github.