Building a Better Profanity Detection Library with scikit-learn

Why existing libraries are uninspiring and how I built a better one.

A few months ago, I needed a way to detect profanity in user-submitted text strings.

This shouldn’t be that hard, right?

I ended up building and releasing my own library for this purpose called profanity-check.

Of course, before I did that, I looked in the Python Package Index (PyPI) for any existing libraries that could do this for me. The only half-decent results for the search query “profanity” were profanity, better-profanity, profanityfilter, and profanity-filter.

Third-party libraries can sometimes be sketchy, though, so I did my due diligence on these 4 results.

profanity, better-profanity, and profanityfilter

After a quick dig through the profanity repository, I found a file named wordlist.txt:

NSFW

The entire profanity library is just a wrapper over this list of 32 words! profanity detects profanity simply by looking for one of these words.

To my dismay, better-profanity and profanityfilter both took the same approach.

This is bad because profanity detection libraries based on wordlists are extremely subjective. For example, better-profanity’s wordlist includes the word “suck.” Are you willing to say that any sentence containing the word “suck” is profane? Furthermore, any hard-coded list of bad words will inevitably be incomplete — do you think profanity’s 32 bad words are the only ones out there?
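To make the wordlist approach concrete, here's a minimal sketch of how such a check could work. The WORDLIST contents (mild placeholder words) and the contains_profanity name are made up for illustration and are not taken from any of these libraries.

```python
import re

# Hypothetical stand-in for a library's hard-coded wordlist (mild placeholder words).
WORDLIST = {"suck", "darn", "heck"}


def contains_profanity(text):
    """Return True if any token in `text` exactly matches a listed word."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return any(token in WORDLIST for token in tokens)


# Flagged even though the sentence is harmless.
print(contains_profanity("This vacuum cleaner really does suck up dust"))  # True
# Missed because "jerk" is not on the list.
print(contains_profanity("You are a total jerk"))  # False
```

Both failure modes show up right away: a harmless sentence gets flagged, and anything missing from the list slips through.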

xkcd 290

Fucking Blue Shells. source: xkcd

Having already ruled out 3 libraries, I put my hopes on the 4th and final one: profanity-filter.

profanity-filter

profanity-filter uses Machine Learning! Sweet!

Turns out, it’s really slow. Here’s a benchmark I ran in December 2018 comparing (1) profanity-filter, (2) my library profanity-check, and (3) profanity (the one with the list of 32 words):

A human could probably do this faster than profanity-filter can

I needed to be able to perform many predictions in real time, and profanity-filter was not even close to being fast enough. But hey, maybe this is a classic tradeoff of accuracy for speed, right?

Nope.

At least profanity-filter is not dead last this time

None of the libraries I’d found on PyPI met my needs, so I built my own.

Building profanity-check, Part 1: Data

I knew that I wanted profanity-check to base its classifications on data to avoid being subjective (read: to be able to say I used Machine Learning). I put together a combined dataset from two publicly available sources: a Twitter dataset and a Wikipedia dataset.

Each of these datasets contains text samples hand-labeled by humans through crowdsourcing sites like Figure Eight.

Here’s what my dataset ended up looking like:

Combined = Tweets + Wikipedia

The Twitter dataset has a column named class that’s 0 if the tweet contains hate speech, 1 if it contains offensive language, and 2 if it contains neither. I classified any tweet with a class of 2 as “Not Offensive” and all other tweets as “Offensive.”

The Wikipedia dataset has several binary columns (e.g. toxic or threat) that represent whether or not that text contains that type of toxicity. I classified any text that contained any of the types of toxicity as “Offensive” and all other texts as “Not Offensive.”
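The two labeling rules above can be sketched in plain Python. This is only an illustration under assumptions: the rows are invented, the function names are hypothetical, and TOXICITY_COLUMNS lists only the two columns named above (the real dataset has more).

```python
# Toxicity columns named in the text; treat this list as partial.
TOXICITY_COLUMNS = ["toxic", "threat"]


def label_tweet(row):
    """Twitter rule: class 2 means neither hate speech nor offensive language."""
    return 0 if row["class"] == 2 else 1


def label_wikipedia(row):
    """Wikipedia rule: offensive if any toxicity flag is set."""
    return int(any(row[col] == 1 for col in TOXICITY_COLUMNS))


# Invented example rows, not real data.
tweets = [
    {"text": "tweet a", "class": 0},
    {"text": "tweet b", "class": 2},
]
comments = [
    {"text": "comment a", "toxic": 0, "threat": 0},
    {"text": "comment b", "toxic": 1, "threat": 0},
]

# Merge both sources into one list with a shared binary label.
combined = (
    [{"text": r["text"], "is_offensive": label_tweet(r)} for r in tweets]
    + [{"text": r["text"], "is_offensive": label_wikipedia(r)} for r in comments]
)
print([r["is_offensive"] for r in combined])  # [1, 0, 0, 1]
```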

Building profanity-check, Part 2: Training

Now armed with a cleaned, combined dataset (which you can download here), I was ready to train the model!

I’m skipping over how I cleaned the dataset because, honestly, it’s pretty boring. If you’re interested in learning more about preprocessing text datasets, check out this article or this post.

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC
import joblib  # sklearn.externals.joblib has been removed from scikit-learn; use the standalone package

# Read in data
data = pd.read_csv('clean_data.csv')
texts = data['text'].astype(str)
y = data['is_offensive']

# Vectorize the text
vectorizer = CountVectorizer(stop_words='english', min_df=0.0001)
X = vectorizer.fit_transform(texts)

# Train the model
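model = LinearSVC(class_weight="balanced", dual=False, tol=1e-2, max_iter=100000)  # max_iter must be an int
# Pass the estimator positionally: the keyword was renamed from base_estimator to estimator
cclf = CalibratedClassifierCV(model)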
cclf.fit(X, y)

# Save the model
joblib.dump(vectorizer, 'vectorizer.joblib')
joblib.dump(cclf, 'model.joblib')

Are you also surprised the code is so short? Apparently scikit-learn does everything.

Two major steps are happening here: (1) vectorization and (2) training.

Vectorization: Bag of Words

I used scikit-learn’s CountVectorizer class, which basically turns any text string into a vector by counting how many times each given word appears. This is known as a Bag of Words (BOW) representation. For example, if the only words in the English language were “the,” “cat,” “sat,” and “hat,” a possible vectorization of the sentence “the cat sat in the hat” might be:

“the cat sat in the hat” -> [2, 1, 1, 1, 1]

The five counts correspond, in order, to the, cat, sat, hat, and ???, where ??? stands for any unknown word (for this sentence, “in”). Any sentence can be represented this way, as counts of the, cat, sat, hat, and ???!

A handy reference table for the next time you need to vectorize “cat cat cat cat cat”
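Here's a minimal sketch of that toy vectorization in plain Python. The VOCABULARY, UNKNOWN, and vectorize names are made up for illustration. This is not how CountVectorizer is implemented, just the same counting idea.

```python
from collections import Counter

# The toy four-word "English language" from the example, plus an unknown bucket.
VOCABULARY = ["the", "cat", "sat", "hat"]
UNKNOWN = "???"


def vectorize(sentence):
    """Count each vocabulary word; everything else goes in the ??? bucket."""
    counts = Counter(
        word if word in VOCABULARY else UNKNOWN
        for word in sentence.lower().split()
    )
    return [counts[word] for word in VOCABULARY + [UNKNOWN]]


print(vectorize("the cat sat in the hat"))  # [2, 1, 1, 1, 1]
print(vectorize("cat cat cat cat cat"))     # [0, 5, 0, 0, 0]
```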

Of course, there are far more words in the English language, so in the code above I use the fit_transform() method, which does 2 things:

  • Fit: learns a vocabulary by looking at all words that appear in the dataset.
  • Transform: turns each text string in the dataset into its vector form.
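The fit and transform steps can be seen separately on a tiny corpus. This sketch uses invented example sentences and CountVectorizer's default settings rather than the settings from the training code. One difference from the ??? bucket in the toy example: CountVectorizer simply drops words it did not see during fitting.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Invented two-sentence corpus for illustration.
train_texts = ["the cat sat", "the hat"]

vectorizer = CountVectorizer()
# Fit learns the vocabulary; transform turns each text into a row of counts.
X_train = vectorizer.fit_transform(train_texts)

# vocabulary_ maps each learned word to its column index.
vocabulary = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
print(vocabulary)  # ['cat', 'hat', 'sat', 'the']

# Transform alone reuses the learned vocabulary; "in" was never seen, so it is ignored.
X_new = vectorizer.transform(["the cat sat in the hat"])
print(X_new.toarray().tolist())  # [[1, 1, 1, 2]]
```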

Training: Linear SVM

The model I decided to use was a Linear Support Vector Machine (SVM), which is implemented by scikit-learn’s LinearSVC class. This post and this tutorial are good introductions if you don’t know what SVMs are.

The CalibratedClassifierCV in the code above exists as a wrapper to give me the predict_proba() method, which returns a probability for each class instead of just a classification. You can pretty much just ignore it if that last sentence made no sense to you, though.
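If you do want to see what the wrapper buys you, here's a small sketch on invented toy data (mild placeholder insults, nothing from the real dataset). The cv=3 setting is an assumption chosen only because the toy dataset is tiny.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Invented toy data: 1 = offensive, 0 = not offensive.
texts = [
    "you are stupid", "stupid idiot", "what a stupid idea",
    "shut up idiot", "you idiot", "so stupid",
    "you are nice", "have a nice day", "what a nice idea",
    "thank you friend", "hello friend", "so nice",
]
labels = [1] * 6 + [0] * 6

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

svm = LinearSVC(dual=False)
# A bare LinearSVC only offers predict() and decision_function().
print(hasattr(svm, "predict_proba"))  # False

# Wrapping it adds predict_proba(); cv=3 is used here only because the data is tiny.
calibrated = CalibratedClassifierCV(svm, cv=3)
calibrated.fit(X, labels)

probabilities = calibrated.predict_proba(vectorizer.transform(["you stupid idiot"]))
print(calibrated.classes_)  # column order of `probabilities`
print(probabilities)        # one row per input text, one column per class
```

Each row of probabilities sums to 1, so the second column can be read as the estimated probability that the text is offensive.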

Here’s one (simplified) way you could think about why the Linear SVM works: during the training process, the model learns which words are “bad” and how “bad” they are because those words appear more often in offensive texts. It’s as if the training process is picking out the “bad” words for me, which is much better than using a wordlist I write myself!
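One way to check that intuition is to look at the weights a linear SVM learns for each word. The sketch below trains on invented toy data (again with mild placeholder insults), so the exact numbers mean nothing. The point is only that words seen in offensive texts end up with positive weights.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Invented toy data: 1 = offensive, 0 = not offensive.
texts = [
    "you are stupid", "stupid idiot", "what a stupid idea", "shut up idiot",
    "you are nice", "have a nice day", "what a nice idea", "thank you friend",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
model = LinearSVC(dual=False).fit(X, labels)

# coef_[0][col] is the learned weight for the word stored in vocabulary column `col`.
weights = {word: model.coef_[0][col] for word, col in vectorizer.vocabulary_.items()}

# Print words from most "offensive" (largest weight) to least.
for word, weight in sorted(weights.items(), key=lambda item: item[1], reverse=True):
    print(f"{word:>8} {weight:+.3f}")
```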

A Linear SVM combines the best aspects of the other profanity detection libraries I found: it’s fast enough to run in real-time yet robust enough to handle many different kinds of profanity.

Caveats

That being said, profanity-check is far from perfect. Let me be clear: take predictions from profanity-check with a grain of salt because it makes mistakes. For example, it’s not good at picking up less common variants of profanities like “f4ck you” or “you b1tch” because they don’t appear often enough in the training data. You’ll never be able to detect all profanity (people will come up with new ways to evade filters), but profanity-check does a good job at finding most.

profanity-check

profanity-check is open source and available on PyPI! To use it, simply install it with pip:

$ pip install profanity-check

How could profanity-check be even better? Feel free to reach out or comment with any thoughts or suggestions!


This article was originally posted on Medium.
