Jonathan Harel for Fine

Posted on Aug 10, 2023 • Edited on Sep 13, 2023 • Originally published at fine.dev

Using AI Agents To Get Started Quickly On Kaggle

Discover how to overcome the repetitive setup process when working on Kaggle projects. This blog post guides you through building an AI agent to streamline PyTorch training pipelines. Learn to define steps, automate data preprocessing, and generate training loops effortlessly.

If you've spent time on Kaggle, you're likely familiar with the initial hurdles of setting up your first lines of code. The process often involves repetitive tasks, from loading and preprocessing the data to defining the model architecture, selecting loss functions, setting up optimizers, and running training loops.

While these steps are essential, they are largely boilerplate, yet they demand your attention every time you start a new project. This repetition costs time and effort that could otherwise go toward the creative core of your project: designing innovative models, exploring unique data insights, and pushing the boundaries of what AI can achieve.

Today, I'm excited to share my journey of creating an AI agent that helps me get started quickly on Kaggle competitions by building a PyTorch training pipeline for me. If you're new to the world of deep learning and data science, don't worry: I'll guide you through the process step by step. Let's dive in and embark on this exciting adventure together!

Step 1: Gathering Resources

Before we start coding, let's gather the tools we'll need for our journey:

Python and PyTorch: Make sure you have a Python 3 version supported by your PyTorch release installed (PyTorch 2.x requires Python 3.8 or higher). To install PyTorch, head over to the official website and choose the version compatible with your system.
Kaggle Account: If you don't have one, create an account on Kaggle. It's an incredible platform filled with datasets and competitions that you can explore.
Fine Account: Similarly, create an account on Fine if you don’t have one. Fine lets you build, deploy, and run agents quickly and easily.
Fine’s CLI tool: Install the CLI tool that allows the agents to operate. You can do so by running npm i @fine-dev/cli (this requires Node.js and npm).
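Before moving on, you can quickly confirm that the Python side of this setup is ready. The sketch below assumes you are targeting PyTorch 2.x, which needs Python 3.8 or newer. It reports whether the interpreter is new enough and whether `torch` can be imported. The helper name `check_environment` is hypothetical.

```python
import importlib.util
import sys

# Assumption: PyTorch 2.x, which requires Python 3.8 or newer.
MIN_PYTHON = (3, 8)


def check_environment(min_python=MIN_PYTHON):
    """Report whether the interpreter version and PyTorch install look ready."""
    # version_info[:2] is a plain (major, minor) tuple, so tuple comparison works.
    python_ok = sys.version_info[:2] >= min_python
    # find_spec returns None when the package is not importable.
    torch_installed = importlib.util.find_spec("torch") is not None
    return {"python_ok": python_ok, "torch_installed": torch_installed}


if __name__ == "__main__":
    status = check_environment()
    print(f"Python {sys.version.split()[0]} OK: {status['python_ok']}")
    print(f"PyTorch installed: {status['torch_installed']}")
```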

Step 2: Choose Your Dataset

For our first training pipeline, let's select a dataset from Kaggle. Choose something that interests you, whether it's images, text, or tabular data. Once you've found the perfect dataset, download it and unzip the files to a dedicated folder on your machine.
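Kaggle datasets typically download as a .zip archive. If you'd rather script the unzip step, Python's standard `zipfile` module is enough. The function name `extract_dataset` and the paths in the usage line below are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive_path, dest_dir):
    """Unzip a downloaded dataset archive into dest_dir and return its member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the dedicated folder if needed
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
        names = archive.namelist()
    return sorted(names)


# Hypothetical usage: extract_dataset("titanic.zip", "data/titanic")
```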

In this tutorial I will use the famous Titanic dataset.
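To give a sense of the preprocessing boilerplate involved, here is a minimal sketch that turns the Titanic train.csv into numeric feature rows using only the standard library. It assumes the standard Kaggle column names (PassengerId, Survived, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked). The feature selection, the median imputation for missing values, and the 0/1 encoding of Sex are illustrative choices, and `load_titanic` is a hypothetical helper name.

```python
import csv
from statistics import median

# Assumption: standard Titanic train.csv column names from Kaggle.
NUMERIC = ["Pclass", "Age", "SibSp", "Parch", "Fare"]


def _to_float(value):
    """Parse a CSV cell as float, returning None for blank or missing cells."""
    if value is None or not value.strip():
        return None
    return float(value)


def load_titanic(fileobj):
    """Turn Titanic CSV rows into (features, labels)."""
    rows = list(csv.DictReader(fileobj))
    # Median of each numeric column, used to fill gaps (Age is often blank).
    medians = {}
    for col in NUMERIC:
        present = [v for v in (_to_float(r[col]) for r in rows) if v is not None]
        medians[col] = median(present) if present else 0.0
    features, labels = [], []
    for r in rows:
        x = []
        for col in NUMERIC:
            v = _to_float(r[col])
            x.append(medians[col] if v is None else v)
        x.append(1.0 if r["Sex"] == "female" else 0.0)  # encode Sex as 0/1
        features.append(x)
        labels.append(int(r["Survived"]))
    return features, labels


# Usage:
# with open("train.csv", newline="") as f:
#     features, labels = load_titanic(f)
```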

Step 3: Building the Agent

It's time to introduce some automation magic into our process. Enter the agent: a trusty companion that will carry out our predefined tasks, sparing us the repetitive setup and execution steps.

In this step, I'll guide you through the process of building your agent using a workflow.yaml file. This file will serve as a roadmap for our agent, detailing the sequence of tasks it should perform. It's like providing your agent with a to-do list that it can follow diligently.

Let's get started:

Create a workflow file

In your project directory, create a new file named workflow.yaml. This is where we'll define the steps that our agent will execute.

Define steps

Next, inside workflow.yaml, give your agent an id, a name, and an identity, followed by its tasks as a list of steps. Each step has a name, an id, and a list of commands.
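The following hypothetical Python model makes that shape concrete and shows the steps running in order. It is not Fine's schema or runtime. The class names, field names, and the `execute` callback are assumptions that simply mirror the structure described above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model of an agent definition; NOT Fine's actual schema.


@dataclass
class Step:
    id: str
    name: str
    commands: List[str]


@dataclass
class Agent:
    id: str
    name: str
    identity: str
    steps: List[Step] = field(default_factory=list)

    def run(self, execute: Callable[[str, Dict[str, str]], str]) -> Dict[str, str]:
        """Run steps in order, like working through a to-do list.

        Each command receives the outputs collected so far; the result of a
        step's last command is stored under that step's id.
        """
        outputs: Dict[str, str] = {}
        for step in self.steps:
            for command in step.commands:
                outputs[step.id] = execute(command, outputs)
        return outputs
```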

Our workflow.yaml sets up an agent that locates a .csv file, reads its content, keeps the first row, and then generates a PyTorch training loop that predicts the most suitable target column in the CSV. The workflow also asks the agent to follow Python and data science best practices throughout the process.
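The first part of that workflow is easy to picture in plain Python. The sketch below finds a .csv file in a directory and keeps just the header and the first data row. This is one plausible reading of "keep the first row": it is enough to describe the columns without passing along the whole file. The function name `first_csv_preview` is hypothetical, and choosing the target column is still left to the agent.

```python
import csv
from pathlib import Path


def first_csv_preview(directory="."):
    """Find a .csv file in `directory` and return (path, header, first_row)."""
    # sorted() makes the choice deterministic when several CSVs exist.
    candidates = sorted(Path(directory).glob("*.csv"))
    if not candidates:
        raise FileNotFoundError(f"no .csv file found in {str(directory)!r}")
    path = candidates[0]
    with path.open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])     # column names
        first_row = next(reader, [])  # first data row, if any
    return path, header, first_row
```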

Step 4: Deploying the Agent

Now that we have our agent’s workflow defined, it’s time to deploy it to Fine. To do so, follow these steps:

Log in to Fine

First, we need to connect our local environment to our Fine account. Run the command:

$ fine-dev login

Deploy the agent

Run the following command from your project’s directory:

$ fine-dev deploy -p workflow.yaml

Step 5: Running the Agent

After building and deploying our agent, it’s time to take it for a spin! Before running it, we need to do two things:

Download your dataset of choice

I will be using the Titanic dataset, so I downloaded the train.csv file and put it in my project's directory.

Run the proxy

To allow the agent to operate, we need to set up the proxy. Run the following command from your project's directory:

$ fine-dev proxy

Run the agent

From the Fine web interface, open a notebook and press Ctrl+P to open the agent palette. If everything went well, you should find our PyTorch Starter Agent waiting for you in the list. Select it, and our agent will start running!

Step 6: Validate the Results

Once our agent has finished running, let’s see what it created for us.

In the file src/main.py, we can find the training code the agent generated.

Nice! That's a great starter for our Kaggle competition.
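For comparison, here is a minimal hand-written sketch of the same kind of PyTorch training loop for tabular data such as the Titanic features. It is not the agent's src/main.py. The small two-layer network, `BCEWithLogitsLoss`, Adam, the feature standardization, and the default hyperparameters are all assumptions chosen for illustration, and `train_classifier` is a hypothetical name.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_classifier(features, labels, epochs=20, lr=1e-2, batch_size=32, seed=0):
    """Train a small binary classifier.

    Returns (model, (mean, std), last-epoch mean loss).
    """
    if epochs < 1:
        raise ValueError("epochs must be at least 1")
    torch.manual_seed(seed)  # reproducible weight init and shuffling
    x = torch.tensor(features, dtype=torch.float32)
    y = torch.tensor(labels, dtype=torch.float32).unsqueeze(1)  # shape (N, 1)

    # Standardize columns so features like Fare and Age share a similar scale.
    mean, std = x.mean(dim=0), x.std(dim=0).clamp(min=1e-6)
    x = (x - mean) / std
    loader = DataLoader(TensorDataset(x, y), batch_size=batch_size, shuffle=True)

    model = nn.Sequential(nn.Linear(x.shape[1], 16), nn.ReLU(), nn.Linear(16, 1))
    loss_fn = nn.BCEWithLogitsLoss()  # takes raw logits and applies the sigmoid itself
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    for _ in range(epochs):
        model.train()
        total, count = 0.0, 0
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            optimizer.step()
            total += loss.item() * len(xb)  # weight by batch size for a true mean
            count += len(xb)
    return model, (mean, std), total / count
```

Here `features` can be any list of equal-length numeric rows, for example the Titanic columns after filling missing values and encoding Sex as 0/1. In a real pipeline, you would reuse the returned `mean` and `std` to normalize the competition's test.csv before calling the model. You would then apply `torch.sigmoid` to the logits to get survival probabilities.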

Step 7: Celebrate Your Achievement!

Congratulations! You've successfully set up your first training pipeline in PyTorch using data from Kaggle. This is just the beginning of your journey into the exciting world of AI and machine learning.

Remember, every great accomplishment starts with a single step. Keep learning, experimenting, and don't be afraid to ask questions. Happy coding and may your AI adventures be filled with curiosity and discovery! 🚀🧠
