# One-Hot Encoding, Explained

## A simple guide on the what, why, and how of One-Hot Encoding.

| UPDATED

One-Hot Encoding takes a single integer and produces a vector where **a single element is 1** and **all other elements are 0**, like $[0, 1, 0, 0]$.

For example, imagine we’re working with **categorical data**, where only a limited number of colors are possible: red, green, or blue. One way we could represent this numerically is by assigning each color a number:

Color | Value |
---|---|

Red | 0 |

Green | 1 |

Blue | 2 |

This is known as **integer encoding**. For Machine Learning, this encoding can be problematic - in this example, we’re essentially saying “green” is the *average* of “red” and “blue”, which can lead to weird unexpected outcomes.

It’s often more useful to use the **one-hot encoding** instead:

Color | Integer Encoding | One-Hot Encoding |
---|---|---|

Red | 0 | $[1, 0, 0]$ |

Green | 1 | $[0, 1, 0]$ |

Blue | 2 | $[0, 0, 1]$ |

This is much more useful to pass into something like a neural network.

## One-Hot Encoding in Python

Below are several different ways to implement one-hot encoding in Python.

### scikit-learn

Using scikit-learn’s OneHotEncoder:

```
from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder(sparse=False)
print(encoder.fit_transform([['red'], ['green'], ['blue']]))
'''
[[0. 0. 1.]
[0. 1. 0.]
[1. 0. 0.]]
'''
```

### Keras

Using Keras’s to_categorical:

```
from keras.utils import to_categorical
print(to_categorical([0, 1, 2]))
'''
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]]
'''
```

### NumPy

Using NumPy:

```
import numpy as np
arr = [2, 1, 0]
max = np.max(arr) + 1
print(np.eye(max)[arr])
'''
[[0. 0. 1.]
[0. 1. 0.]
[1. 0. 0.]]
'''
```