In this article, I will show you how to build a toxic comment detector API using FastAPI.
For example, when the API receives the text "trash stuff", it responds that the comment is toxic, along with the degree to which it is toxic.
Table of Contents
- Introduction
- Get model and Vectorizer
- Set up FastAPI app
- Add model and vectorizer
- Test app
- Conclusion
Introduction
Toxicity is anything rude, disrespectful, or otherwise likely to make someone leave a discussion. We will use FastAPI to build the API.
Here is how the API works: when I enter a comment, for example "trash stuff", the API detects whether it is a toxic comment. From the prediction it returns, "trash stuff" is definitely a toxic comment.
Get model and Vectorizer
Download the trained model and the vectorizer (which transforms words into numerical vectors) from here:
https://www.kaggle.com/code/emmamichael101/toxicity-bias-logistic-regression-tfidfvectorizer/data?scriptVersionId=113780035
You can also get it from the project repo: https://github.com/emmakodes/fastapi-toxic-comment-detector
We are using a Logistic Regression model.
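For context, here is a minimal, illustrative sketch of how such a model and vectorizer pair is typically produced and pickled. The tiny dataset below is made up; the real training code lives in the Kaggle notebook linked above, but the file names match the ones the API will load later.
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny made-up dataset purely for illustration (1 = toxic, 0 = non-toxic)
comments = ["you are awesome", "thanks for sharing", "trash stuff", "you are an idiot"]
labels = [0, 0, 1, 1]

# Turn the raw text into TF-IDF feature vectors
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(comments)

# Fit a logistic regression classifier on those vectors
model = LogisticRegression()
model.fit(features, labels)

# Save both artifacts under the names the API expects
with open("Vectorize.pickle", "wb") as file:
    pickle.dump(vectorizer, file)
with open("Pickle_LR_Model.pkl", "wb") as file:
    pickle.dump(model, file)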
Set up FastAPI app
- Create a new folder and open it with your code editor.
- Run the following command in your terminal to create a new virtual environment called .venv:
python -m venv .venv
- Activate the virtual environment (the command below is for Windows; on macOS/Linux run source .venv/bin/activate):
.venv\Scripts\activate
- Run the following command to install FastAPI, pytest, and scikit-learn:
pip install "fastapi[all]" pytest scikit-learn
- Create a new directory called app. Inside the app directory, create __init__.py, main.py, schemas.py, test_main.py, and dependencies.py files, and a routers directory. Your directory should look like this:
.venv/
app/
    __init__.py
    dependencies.py
    main.py
    schemas.py
    test_main.py
    routers/
- main.py: add the following code to main.py:
from fastapi import FastAPI
from .routers import comments
app = FastAPI()
app.include_router(comments.router)
The above code sets up a FastAPI web application with some initial configurations. It creates an instance of the FastAPI class and includes a router that defines routes related to comments.
- schemas.py: add the following code to the schemas.py file:
from pydantic import BaseModel
class Comment(BaseModel):
    comments: str
The Comment model above is used to define the structure of comments, specifically that they should be represented as strings.
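As a quick illustration (run from the project root, or adapt the import), Pydantic turns a valid JSON body into a Comment instance and rejects anything that is not a string:
from app.schemas import Comment

comment = Comment(comments="trash stuff")
print(comment.comments)  # trash stuff

# A non-string value fails validation, which is what the 422 test later relies on:
# Comment(comments=20)  # raises a ValidationError ("Input should be a valid string")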
- dependencies.py: add the following code to the dependencies.py file:
import pickle
from . import schemas
def check_toxicity(comment: schemas.Comment):
    comment_dictionary = {}
    comment_dictionary['comment_key'] = [comment.comments]

    # load the vectorizer and vectorize the comment
    with open("./Vectorize.pickle", "rb") as file:
        vectorizer = pickle.load(file)
    testing = vectorizer.transform(comment_dictionary['comment_key'])

    # load the model
    with open('./Pickle_LR_Model.pkl', 'rb') as file:
        lr_model = pickle.load(file)

    # predict toxicity. predictions range from 0.0 (non-toxic) to 1.0 (toxic)
    prediction = lr_model.predict_proba(testing)[:, 1]
    prediction = float(prediction[0])

    if 0.9 <= prediction <= 1.0:
        response_list = ["toxic comment", prediction]
        return {"response": response_list}
    elif 0.0 <= prediction <= 0.1:
        response_list = ["non toxic comment", prediction]
        return {"response": response_list}
    else:
        response_list = ["Manually check this", prediction]
        return {"response": response_list}
The check_toxicity function receives a comment, loads the vectorizer to transform the comment into numerical values, and then passes those values to the model to predict the toxicity of the comment. The model returns a prediction value between 0.0 and 1.0.
A prediction of exactly 0.0 signifies a strongly non-toxic comment, while a prediction of 1.0 signifies a strongly toxic comment.
For the API, I decided to label predictions from 0.9 to 1.0 as toxic comments, since the model is quite sure in those cases, and predictions from 0.0 to 0.1 as non-toxic comments. Every prediction in between (above 0.1 and below 0.9) is flagged for manual review, since the model is neither confident the comment is toxic nor confident it is non-toxic.
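Read as a small standalone helper, the bucketing logic looks like this (just an illustration of the thresholds above, not part of the project files):
def label_prediction(prediction: float) -> str:
    # the model is confident the comment is toxic
    if 0.9 <= prediction <= 1.0:
        return "toxic comment"
    # the model is confident the comment is non-toxic
    if 0.0 <= prediction <= 0.1:
        return "non toxic comment"
    # anything in between is left for a human to review
    return "Manually check this"

print(label_prediction(0.96))  # toxic comment
print(label_prediction(0.05))  # non toxic comment
print(label_prediction(0.50))  # Manually check this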
- test_main.py: add the following code to the test_main.py file:
from fastapi.testclient import TestClient

from .main import app

client = TestClient(app)


def test_check_toxicity():
    # a request without a body should be rejected by validation
    response = client.post("/comments/")
    assert response.status_code == 422

    # a valid string comment should be accepted
    response = client.post(
        "/comments/",
        json={"comments": "string"},
    )
    assert response.status_code == 200


def test_check_toxicity_wrong_datatype():
    # a non-string comment should fail validation with a 422
    response = client.post(
        "/comments/",
        json={"comments": 20},
    )
    assert response.status_code == 422
    assert response.json() == {
        "detail": [
            {
                "type": "string_type",
                "loc": [
                    "body",
                    "comments"
                ],
                "msg": "Input should be a valid string",
                "input": 20,
                "url": "https://errors.pydantic.dev/2.3/v/string_type"
            }
        ]
    }
These test cases use the TestClient to simulate HTTP requests to the FastAPI application and then assert that the responses match the expected behavior. They help ensure that the endpoint for checking the toxicity of comments handles various scenarios correctly, including incorrect data types and missing data.
- In the routers directory, create an __init__.py file and a comments.py file. Inside the comments.py file, add the following code:
from typing import Annotated

from fastapi import APIRouter, Depends

from ..dependencies import check_toxicity

router = APIRouter(
    prefix="/comments",
    tags=["comments"],
)


@router.post('/')
async def check_toxic_comment(toxicity_check: Annotated[dict, Depends(check_toxicity)]):
    return toxicity_check
The code above defines a FastAPI router (router) that handles HTTP POST requests to "/comments/" by using the check_toxicity dependency to validate the incoming data and check its toxicity. The endpoint itself simply returns whether the comment is toxic or not. The use of annotations and dependencies allows for modular, reusable code and a clear separation of concerns.
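To make the dependency injection concrete: because check_toxicity declares a schemas.Comment parameter, FastAPI reads the JSON body, validates it into a Comment, calls check_toxicity, and passes its return value to the endpoint. For comparison only (this route is not part of the project), roughly the same behavior without Depends would look like this inside comments.py, with an extra import of the schemas module:
from .. import schemas

@router.post('/without-dependency')
async def check_toxic_comment_direct(comment: schemas.Comment):
    # FastAPI validates the JSON body into a Comment; we then call the
    # toxicity check directly instead of letting Depends do it for us
    return check_toxicity(comment)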
Add model and vectorizer
Add the downloaded model and vectorizer to the project root directory.
- Also, you can create a .gitignore file and add the following:
__pycache__/
.venv/
.pytest_cache
- Generate a requirements.txt file using the following command:
pip freeze > requirements.txt
The final project directory should look like this:
.venv/
app/
    __init__.py
    dependencies.py
    main.py
    routers/
        __init__.py
        comments.py
    schemas.py
    test_main.py
.gitignore
Pickle_LR_Model.pkl
requirements.txt
Vectorize.pickle
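With all the files in place, you can already run the automated tests from the project root:
python -m pytest
Both tests should pass. Note that the exact-match assertion on the validation error body includes a Pydantic version in its url field (2.3 here), so if a different Pydantic version is installed you may need to adjust that value.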
Test app
Now, you can test the project.
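Before sending any requests, start the development server. fastapi[all] installs Uvicorn, so from the project root run:
uvicorn app.main:app --reload
FastAPI's interactive docs will also be available at http://127.0.0.1:8000/docs if you prefer to try the endpoint there.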
- Open Postman.
- Add a new request.
- Change the method to POST.
- Enter the following URL: http://127.0.0.1:8000/comments/
- Select Body, raw, and JSON.
- Enter the following JSON comment, for example, and click Send:
{
    "comments": "trash stuff"
}
The API should respond with the following to show that "trash stuff" is a toxic comment:
{
    "response": [
        "toxic comment",
        0.9616073904446953
    ]
}
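If you prefer the command line to Postman, the same request can be sent with curl (on Windows cmd you may need to escape the quotes differently):
curl -X POST "http://127.0.0.1:8000/comments/" -H "Content-Type: application/json" -d '{"comments": "trash stuff"}'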
Conclusion
Basically, when you send a JSON comment, the FastAPI application takes the comment and calls the check_toxic_comment function, which uses the check_toxicity dependency to predict whether the comment is toxic and returns the result.
You can get the complete code here: https://github.com/emmakodes/fastapi-toxic-comment-detector
Top comments (2)
Why exactly is 'trash stuff' considered a toxic comment? Devoid of context, it's impossible to make that judgement.
This kind of detector should only really be used to flag comments that would then be passed to a human for review... but even a system like that would not be foolproof due to the fact that the 'toxicity' of any comment is subjective.
A reliable toxic comment detector would be a very difficult thing to build indeed. It would need to take into account: context, regional dialects/slang/idioms, sarcasm, jokes, etc.
Yes, @jonrandy, you are right. It makes sense to also have a human act as a second reviewer for this system so that it improves over time. We can think of it as a tool to help site owners know which comments to pay more attention to.