Hilda Ogamba
Exploring Machine Learning for Credit Card Fraud Detection

Today, I would like to share some key takeaways from a fascinating project I recently worked on with my team as part of the CIS 635: Knowledge Discovery and Data Mining course, supervised by Professor Zhuang. We tackled a problem that's not only technically challenging but also incredibly relevant: credit card fraud detection. Here's an informal breakdown of what we learned, explored, and achieved during this journey.


Why Fraud Detection?

Machine learning in finance has always intrigued me. With so much at stake in the real world—billions of dollars in losses from fraud annually—I wanted to see if we could design a system that could actually make a difference. When I started this project, I wondered how well our machine learning models could perform on such a critical problem in real-world finance. The rare nature of fraud in datasets posed an added challenge, making this a perfect opportunity to experiment with advanced techniques and evaluate their practical potential.


A Little Disclaimer

Before diving deeper, I should mention that I’m not really a data scientist. My background is more in software engineering and cybersecurity, but I love exploring new challenges. This project gave me a chance to dip my toes into data science concepts like model evaluation, handling class imbalance, and working with algorithms like Random Forest and XGBoost. It’s been a fun learning experience, and I’m excited to share what I’ve learned!


Our Workflow at a Glance

Our approach followed a robust pipeline:

  1. Data Preprocessing: We scaled numerical features (like transaction amounts) and handled the class imbalance using the Synthetic Minority Over-sampling Technique (SMOTE). This allowed us to generate synthetic samples for the minority (fraud) class, enabling more balanced model training.

Class distribution before and after SMOTE

  2. Model Selection: We tried out several machine learning algorithms, including:

    • Logistic Regression (a simple and interpretable baseline)
    • Random Forest
    • XGBoost (Extreme Gradient Boosting)
    • Neural Networks
    • Ensemble Learning (combining Random Forest and XGBoost)
  3. Evaluation Metrics: Accuracy, while commonly used, tends to be influenced by the majority class (non-fraudulent transactions), making it a less valuable metric in highly imbalanced datasets like ours. For a problem like fraud detection, metrics such as Precision, Recall, F1-score, and ROC-AUC are more critical than plain accuracy. After all, catching fraud is about finding a balance between false positives (flagging legitimate transactions as fraud) and false negatives (missing actual fraud).

  4. Tools Used: We ran all experiments and analyses on Google Colab, which made it easy to leverage GPU acceleration for training our models. Its collaborative environment also helped streamline our teamwork by allowing us to share and iterate on notebooks efficiently.
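To make the preprocessing steps concrete, here's a minimal sketch of scaling plus minority oversampling on synthetic data. Note that this is an illustration, not our actual pipeline: we used the SMOTE implementation from the imbalanced-learn library, which interpolates toward k-nearest neighbors, whereas the simplified version below interpolates between random pairs of minority samples.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the credit card data: ~1% "fraud" class.
X, y = make_classification(n_samples=5000, n_features=10,
                           weights=[0.99], random_state=42)

# Scale numerical features (e.g. transaction amount).
X = StandardScaler().fit_transform(X)

def smote_like_oversample(X, y, minority=1, seed=0):
    """Simplified SMOTE: synthesize minority samples by interpolating
    between random pairs of existing minority samples until the
    classes are balanced. (Real SMOTE uses k-nearest neighbors.)"""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_needed = (y != minority).sum() - (y == minority).sum()
    i = rng.integers(0, len(X_min), n_needed)
    j = rng.integers(0, len(X_min), n_needed)
    lam = rng.random((n_needed, 1))
    X_new = X_min[i] + lam * (X_min[j] - X_min[i])
    return (np.vstack([X, X_new]),
            np.concatenate([y, np.full(n_needed, minority)]))

X_bal, y_bal = smote_like_oversample(X, y)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_bal, y_bal)
```

In practice, oversampling should be applied only to the training split (never the test set), or the evaluation will be optimistically biased.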


Key Highlights from Our Results

  • Logistic Regression was a great starting point, achieving a ROC-AUC score of 0.9825. However, it struggled with precision, generating too many false positives.

Logistic Regression Precision-Recall and ROC Curve

  • Random Forest offered a balanced performance with a precision of 0.80 and a recall of 0.88, making it a robust choice for real-world deployment.

Random Forest Precision-Recall and ROC Curve

  • XGBoost, true to its reputation, slightly outperformed Random Forest with a ROC-AUC score of 0.9911, capturing complex patterns in the data.

XGBoost Precision-Recall and ROC Curve

  • Neural Networks excelled in recall (0.90) but suffered from a high false positive rate, making them less ideal for scenarios where precision matters.

Neural Network Precision-Recall and ROC Curve

  • Ensemble Learning was the star of the show, combining Random Forest and XGBoost for a ROC-AUC score of 0.9912 and delivering balanced precision (0.80) and recall (0.87).

Ensemble Voting Classifier Precision-Recall and ROC Curve
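For readers curious how an ensemble like this is wired up, here's a minimal sketch of a soft-voting classifier using scikit-learn. GradientBoostingClassifier stands in for XGBoost so the example runs on scikit-learn alone, and the data is synthetic, so the score it prints is illustrative rather than our reported result.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier,
                              GradientBoostingClassifier,
                              VotingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=3000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Soft voting averages the predicted class probabilities of both models.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_tr, y_tr)
proba = ensemble.predict_proba(X_te)[:, 1]
print(f"Ensemble ROC-AUC: {roc_auc_score(y_te, proba):.3f}")
```

Soft voting (averaging probabilities) generally works better than hard voting for imbalanced problems because it preserves each model's confidence rather than just its label.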


Comparison of Models

To better illustrate the performance of each model, here's a summary table comparing key metrics like precision, recall, F1-score, ROC-AUC, and AUPRC for each classifier:

| Model | Precision | Recall | F1 Score | ROC-AUC | AUPRC | Key Insights |
| --- | --- | --- | --- | --- | --- | --- |
| Logistic Regression | 0.14 | 0.93 | 0.24 | 0.9825 | 0.8086 | High recall but low precision, leading to many false positives. Suitable for initial screening. |
| Random Forest | 0.80 | 0.88 | 0.84 | 0.9900 | 0.8810 | Strong balance of precision and recall with high reliability. Robust for operational environments. |
| XGBoost | 0.78 | 0.86 | 0.82 | 0.9911 | 0.8858 | Slightly better ROC-AUC and AUPRC than Random Forest; effectively captures complex interactions. |
| Neural Network | 0.18 | 0.90 | 0.30 | 0.9684 | 0.8230 | High recall but poor precision, leading to frequent false positives. |
| Ensemble (Voting) | 0.80 | 0.87 | 0.83 | 0.9912 | 0.8812 | Combines strengths of Random Forest and XGBoost, achieving a balanced and robust performance. |

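All of the metrics in the table come straight from scikit-learn. Here's a small sketch with dummy labels and scores (the numbers are purely illustrative, not our results) showing how each one is computed:

```python
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             roc_auc_score, average_precision_score)

# Dummy ground truth and model outputs for illustration only.
y_true  = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred  = [0, 0, 0, 1, 0, 0, 1, 1, 1, 0]            # hard labels
y_score = [.1, .2, .1, .6, .3, .2, .9, .8, .7, .4]  # fraud probabilities

print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("Recall:   ", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("F1:       ", f1_score(y_true, y_pred))         # harmonic mean of the two
print("ROC-AUC:  ", roc_auc_score(y_true, y_score))   # uses scores, not labels
print("AUPRC:    ", average_precision_score(y_true, y_score))  # area under PR curve
```

Note that ROC-AUC and AUPRC are computed from the continuous scores, while precision, recall, and F1 depend on the chosen decision threshold, which is why the same model can look very different across the two groups of metrics.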


Challenges We Encountered

  1. Class Imbalance: Fraudulent transactions are rare, and even with SMOTE, striking the right balance without overfitting was tricky.
  2. Interpretable Features: The dataset's anonymized features (due to PCA transformation) limited our ability to interpret the underlying drivers of fraud.
  3. Computational Costs: Hyperparameter tuning across multiple models was resource-intensive. Although Google Colab helped a lot, some configurations took over 7 hours to run!

Why This Matters to Me

This project wasn’t just about fulfilling a course requirement—it was a chance to test how theoretical machine-learning models could hold up in the world of finance. Fraud detection isn't just a fascinating technical problem; it’s also deeply impactful. Testing this on real-world data gave me insights into the nuances of applying ML in high-stakes industries. It was thrilling to see how even basic techniques like Logistic Regression could provide value while advanced models like XGBoost and ensemble methods added an extra layer of sophistication.


Future Directions

Fraud patterns evolve, and our models need to keep up. Here's what we'd love to explore further:

  • Incorporating unsupervised learning methods for detecting novel fraud patterns.
  • Using Explainable AI (XAI) to make fraud predictions more transparent and trustworthy.
  • Enhancing the pipeline for real-time fraud detection by incorporating temporal and geospatial data.

Final Thoughts

This project reaffirmed my belief in the power of collaboration and the importance of domain-specific challenges in shaping technical solutions. Fraud detection is just one application of machine learning where technology can have a real-world impact. Thank you to my teammates, Lynn and Joyce, for all your contributions.

I’d love to hear your thoughts! Have you worked on similar problems or encountered challenges with imbalanced datasets? Let’s chat in the comments.

