One-Hot Encoding, Explained

A simple guide on the what, why, and how of One-Hot Encoding.

March 15, 2020 | UPDATED May 11, 2020

One-Hot Encoding takes a single integer and produces a vector where a single element is 1 and all other elements are 0, like $[0, 1, 0, 0]$ .
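
To make that definition concrete, here is a minimal pure-Python sketch. The function name `one_hot` and its `num_classes` parameter are hypothetical, chosen only for illustration.

```python
def one_hot(index, num_classes):
    """Return a list of length num_classes with a 1 at position index."""
    if not 0 <= index < num_classes:
        raise ValueError("index must be in [0, num_classes)")
    vector = [0] * num_classes
    vector[index] = 1
    return vector

print(one_hot(1, 4))  # [0, 1, 0, 0]
```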

For example, imagine we’re working with categorical data, where only a limited number of colors are possible: red, green, or blue. One way we could represent this numerically is by assigning each color a number:

Color	Value
Red	0
Green	1
Blue	2

This is known as integer encoding. For Machine Learning, this encoding can be problematic because it implies an ordering and spacing between categories that doesn't really exist. In this example, we’re essentially saying “green” is the average of “red” and “blue”, which can lead to weird, unexpected outcomes.

It’s often more useful to use the one-hot encoding instead:

Color	Integer Encoding	One-Hot Encoding
Red	0	$[1, 0, 0]$
Green	1	$[0, 1, 0]$
Blue	2	$[0, 0, 1]$

This is much more useful to pass into something like a neural network: each color gets its own element, so no color is numerically “between” or “greater than” another.
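
One way to see the difference is to compare the distances between encoded colors. The sketch below is illustrative only: the two dictionaries mirror the table above, and the helper `pairwise_distances` is a made-up name for this example.

```python
import math
from itertools import combinations

integer_encoding = {"red": [0], "green": [1], "blue": [2]}
one_hot_encoding = {"red": [1, 0, 0], "green": [0, 1, 0], "blue": [0, 0, 1]}

def pairwise_distances(encoding):
    """Euclidean distance between every pair of encoded colors."""
    return {
        (a, b): math.dist(encoding[a], encoding[b])
        for a, b in combinations(encoding, 2)
    }

integer_distances = pairwise_distances(integer_encoding)
one_hot_distances = pairwise_distances(one_hot_encoding)

print(integer_distances)  # red is twice as far from blue as from green
print(one_hot_distances)  # every pair is the same distance apart
```

With integer encoding, red and blue end up twice as far apart as red and green, purely because of the numbers we happened to assign. With one-hot encoding, every pair of colors is equally far apart.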

One-Hot Encoding in Python

Below are several different ways to implement one-hot encoding in Python.

scikit-learn

Using scikit-learn’s OneHotEncoder:

from sklearn.preprocessing import OneHotEncoder

# scikit-learn 1.2 renamed `sparse` to `sparse_output`; older versions need sparse=False
encoder = OneHotEncoder(sparse_output=False)
print(encoder.fit_transform([['red'], ['green'], ['blue']]))
'''
[[0. 0. 1.]
 [0. 1. 0.]
 [1. 0. 0.]]
'''
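
Notice that the rows above look reversed compared to the table earlier. When OneHotEncoder determines categories automatically, it sorts them, so the columns here correspond to blue, green, red. A small sketch of how we might inspect that ordering and map the vectors back to labels, using the encoder's `categories_` attribute and `inverse_transform` method (the variable names `encoded` and `decoded` are just for this example):

```python
from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder()
# The default output is a SciPy sparse matrix; .toarray() makes it dense.
encoded = encoder.fit_transform([['red'], ['green'], ['blue']]).toarray()

# One array of categories per input column, in the column order of the encoding.
print(encoder.categories_)

# Map the one-hot rows back to the original labels.
decoded = encoder.inverse_transform(encoded)
print(decoded.ravel().tolist())  # ['red', 'green', 'blue']
```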

Keras

Using Keras’s to_categorical:

from keras.utils import to_categorical

print(to_categorical([0, 1, 2]))
'''
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
'''

NumPy

Using NumPy:

import numpy as np

arr = [2, 1, 0]
# Avoid shadowing the built-in max(); assumes labels are integers 0..N-1
num_classes = np.max(arr) + 1
print(np.eye(num_classes)[arr])
'''
[[0. 0. 1.]
 [0. 1. 0.]
 [1. 0. 0.]]
'''
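
One caveat with inferring the vector length from the data: if the largest label happens to be missing from `arr`, the vectors come out too short. A hedged variation is to state the number of classes explicitly. The sketch below also shows one way to go back from one-hot vectors to integers with `np.argmax`; the names `labels`, `one_hot`, and `recovered` are just for this example.

```python
import numpy as np

labels = np.array([2, 1, 0])
num_classes = 3  # stated explicitly rather than inferred from the data

# Row i of the identity matrix is the one-hot vector for label i.
one_hot = np.eye(num_classes)[labels]

# Each row has exactly one 1, so argmax along axis 1 recovers the label.
recovered = np.argmax(one_hot, axis=1)
print(recovered)  # [2 1 0]
```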

Victor Zhou
