Liz Acosta

Build an AI-Powered Anomaly Detection Application for E-Commerce Analytics

#ai

Learn how AI can be used to prevent stricter buyer return policies

This cookbook recipe provides a hands-on example of using unsupervised machine learning to detect anomalies in e-commerce data. The application leverages algorithms like Isolation Forest to identify unusual sales, revenue, and traffic patterns. Then it prompts a large language model to generate an explanation for any detected anomaly and provide recommended actions.

A screenshot of the app

To see an animated gif demonstrating the app, click here.

How to use this cookbook recipe

This recipe is designed to accommodate many different learning styles. It is atomized into sections so you can choose your own adventure:

  • Recipe context: This provides the reason why this recipe was created and the problem that it solves.
  • Recipe: These are the step-by-step instructions for how to set up and run the sample application and explanations of any notable sections of code.
  • Repo and README: If you prefer, jump right to the repository and follow the README to get the application up and running.

Recipe context: AI could have prevented Amazon’s new stricter return policies

In the summer of 2024, Amazon hit shoppers with a stricter return policy. Gone now are the days of seemingly infinite return windows, lax acceptable return conditions, and the possibility of keeping an incorrect or duplicate item. The stricter return policy is an effort to combat the rise of fraudulent returns in the e-commerce industry.

According to the National Retail Federation, almost 14% of returns in 2023 amounted to fraud, resulting in $101 billion in losses for retailers. For Amazon in particular, these return scams had become organized and sophisticated. Scammers were taking advantage of Amazon’s lenient return policies by shipping back junk, or nothing at all, while racking up refunds.

With AI, however, companies can leverage the high volume of buyer data they already have to train machine learning models to detect anomalies in sales. In addition to indicating possible fraud, these anomalies can help identify sales opportunities or the efficacy of a particular campaign. Moreover, integrating a large language model and prompting it with detected anomalies can provide quick natural language summarizations of sales outliers as well as generate a recommended course of action.

In other words, AI-powered anomaly detection could have identified strange behaviors in sales data sooner rather than later. This would have enabled Amazon to mitigate the impact of fraudulent returns so the rest of us could still enjoy a little wiggle room when trying to get our return package to the post office.

Follow the recipe below to create an application that uses machine learning to detect anomalies in e-commerce data, offers a possible natural language explanation for each anomaly, and recommends actions to take.

Recipe

Prerequisites

Set up

  1. Clone the repo: git clone https://github.com/liz-acosta/ai-anomaly-detection.git
  2. Change directory to the project directory: cd ai-anomaly-detection
  3. Create the virtual environment: virtualenv venv
  4. Activate the virtual environment: source venv/bin/activate
  5. Install the dependencies: pip install -r requirements.txt

Machine learning workflow

For this recipe, we need to complete a machine learning workflow. A machine learning workflow is the process by which we collect and process data, and then train, evaluate, and deploy a model based on that data.

  1. Generate the sample e-commerce data and output it to a .csv file in the data/ directory: python3 -m utilities.generate_ecommerce_data
  2. Train the anomaly model and output it to a .pkl file in the data/ directory: python3 -m utilities.train_anomaly_model
  3. Detect anomalies and output them to a .csv file in the data/ directory: python3 -m utilities.detect_anomalies
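To make step 1 more concrete, here is a minimal sketch of what generating sample data with a few injected anomalies could look like. It is not the repo's actual generator: the function names, value ranges, and the "multiply sales by 4" spike are assumptions chosen for illustration. Only the column names (Date, Sales, Revenue, Traffic) come from the recipe itself.

```python
import csv
import io
import random
from datetime import date, timedelta


def generate_rows(n_days=90, n_anomalies=5, seed=42):
    """Return (rows, anomaly_days) of synthetic daily e-commerce metrics.

    Hypothetical helper: the ranges below are illustrative assumptions.
    """
    rng = random.Random(seed)  # Seeded so the output is reproducible
    start = date(2024, 1, 1)
    anomaly_days = set(rng.sample(range(n_days), n_anomalies))
    rows = []
    for i in range(n_days):
        traffic = rng.randint(900, 1100)                   # Daily visits
        sales = round(traffic * rng.uniform(0.04, 0.06))   # Roughly 5% conversion
        price = rng.uniform(18.0, 22.0)                    # Average order value
        if i in anomaly_days:
            sales *= 4  # Inject an obvious spike in orders
        rows.append({
            "Date": (start + timedelta(days=i)).isoformat(),
            "Sales": sales,
            "Revenue": round(sales * price, 2),
            "Traffic": traffic,
        })
    return rows, sorted(anomaly_days)


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["Date", "Sales", "Revenue", "Traffic"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows, anomaly_days = generate_rows()
print(rows_to_csv(rows).splitlines()[0])  # Header line of the CSV
print("Injected anomaly day indices:", anomaly_days)
```

Because the injected days are known in advance, a data set like this makes it easy to check later whether the trained model flags the days you expect.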

This recipe uses Scikit-learn to train a model on the e-commerce data. Scikit-learn is an open source machine learning library that supports supervised and unsupervised learning. In particular, the code uses the Isolation Forest algorithm, which isolates data points with random splits: unusual points tend to be separated from the rest in fewer splits, so they receive a more anomalous score. The parameters we provide control how the model is built and how many points it flags.

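from sklearn.ensemble import IsolationForest


def train_model(data):

    """Train an Isolation Forest model for anomaly detection."""

    # Isolation Forest parameters
    model = IsolationForest(
        n_estimators=100,     # Number of trees in the forest
        max_samples="auto",   # Samples drawn per tree: min(256, n_samples)
        contamination=0.07,   # Expected proportion of anomalies (about 7%)
        random_state=42       # Fixed seed for reproducible results
    )

    # Fit the model on the feature columns
    model.fit(data)

    return model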

💡 Adjust these parameters and repeat steps 2 and 3 to compare how different values affect how many anomalies are detected.
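As a rough illustration of that tip, the sketch below fits Isolation Forest with three different contamination values and counts how many rows are flagged each time. The data is randomly generated here as a stand-in for the Sales, Revenue, and Traffic columns, and the `count_flagged` helper is hypothetical; the repo's own scripts may structure this differently.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Assumed stand-in data: 480 "normal" days plus 20 extreme ones.
# Columns play the role of Sales, Revenue, and Traffic.
normal = rng.normal(loc=[50, 1000, 1000], scale=[5, 100, 50], size=(480, 3))
extreme = rng.normal(loc=[200, 4000, 1000], scale=[5, 100, 50], size=(20, 3))
data = np.vstack([normal, extreme])


def count_flagged(contamination):
    """Fit a model and return how many rows it labels as anomalies."""
    model = IsolationForest(
        n_estimators=100,
        max_samples="auto",
        contamination=contamination,
        random_state=42,
    )
    labels = model.fit_predict(data)  # -1 marks an anomaly, 1 marks a normal row
    return int((labels == -1).sum())


counts = {c: count_flagged(c) for c in (0.01, 0.07, 0.15)}
for contamination, flagged in counts.items():
    print(f"contamination={contamination}: {flagged} rows flagged")
```

Since contamination sets the score threshold, raising it should flag more rows from the same data. A value far above the true share of anomalies will start labeling ordinary days as suspicious.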

Run the application

This recipe uses Streamlit to create a web app that interactively visualizes the e-commerce data and anomalies, and lets users prompt OpenAI for an explanation of a particular anomaly. The OpenAI API gives us access to pre-trained AI models such as GPT-4 (the model used in the code below) and DALL-E.

  1. Add your OpenAI API key to the .template_env file
  2. Rename the file: mv .template_env .env
  3. Run the Streamlit app: streamlit run app.py
  4. The app should now be available at http://localhost:8501/
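If you are curious how a key stored in a .env file can reach the code, here is a minimal standard-library sketch. It is an assumption about the mechanism, not the repo's implementation: `load_env_file` and `get_api_key` are hypothetical helpers, and many projects use the python-dotenv package's `load_dotenv` function for this instead. Only the `OPENAI_API_KEY` name comes from the recipe's code.

```python
import os


def load_env_file(path):
    """Parse KEY=VALUE lines from a .env-style file into a dict.

    Hypothetical helper: skips blank lines, comments, and malformed lines.
    """
    values = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop surrounding whitespace and optional quotes around the value
            values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(path=".env"):
    """Prefer a real environment variable, then fall back to the file."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key and os.path.exists(path):
        key = load_env_file(path).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env before running the app.")
    return key
```

Failing early with a clear message is usually friendlier than letting the first API call fail inside the running app.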

Taking a closer look at the code, we can see where we call the OpenAI API with a prompt based on the user-specified anomaly. More specifically, the code sends the anomaly's values to one of OpenAI’s LLMs and returns the generated text explanation.

from openai import OpenAI

# OPENAI_API_KEY is assumed to be defined elsewhere in the module,
# for example loaded from the .env file created in the steps above.


def explain_anomaly_openai(row):
    """Use OpenAI to summarize the anomaly and recommend actions.
    Returns the generated text."""

    # Format the data row as a readable text
    prompt = f"""
    Analyze the following anomaly in e-commerce data and provide a summary along with suggested actions:
    - Date: {row['Date']}
    - Sales: {row['Sales']}
    - Revenue: {row['Revenue']}
    - Traffic: {row['Traffic']}

    The anomaly appears to be abnormal compared to typical patterns. Summarize why this might be happening and suggest actions to address or investigate it.
    """

    # Call the OpenAI API
    openai_client = OpenAI(api_key=OPENAI_API_KEY)

    response = openai_client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "system", "content": "You are a data analyst specialized in e-commerce."},
                  {"role": "user", "content": prompt}],
        temperature=0.7  # Higher values produce more varied text
    )

    # Return the text of the first generated choice
    return response.choices[0].message.content

💡 Adjust parameters such as the model, the system message, and the temperature, then rerun the app (step 3) to compare how different values affect the text that is generated.

💡 You can use this sample application as a template for other similar applications.

In summary

Congratulations! You now know how to build an AI-powered anomaly detection application tailored for e-commerce analytics. You have explored how unsupervised machine learning techniques, specifically the Isolation Forest algorithm, can identify unusual patterns in sales, revenue, and traffic data. You’ve also seen how integrating a large language model like OpenAI's GPT-4 lets you generate natural language explanations for detected anomalies and suggest actions.
