DEV Community

Cover image for How I re-categories 14K products using plang programming language in 2 hours
Ingi
Ingi

Posted on

How I re-categories 14K products using plang programming language in 2 hours

This article is about a new programming language called plang (as in pseudo language). You can find it at plang.is

The plang language allows you to write your code in natural language, it shortens the development time considerably, and it works very well for people who are constantly writing scripts and need to work with data. I am the creator, so this is my experience with it.

While developing plang, I took a side project for a company that I know pretty well.

One issue is that they have 1-n relationship with products → categories, which makes sense for a website, because you want to display a product in multiple categories, but when you want to analyze the sales this creates bad data, as each product is calculated multiple times. The category needs to be 1-1 in the database.

This company has 14K products, so going through each product and giving it a category is going to take dozens of hours, multiple people and thousands of dollars.

I will show you how I did it in couple of hours for about $20 using plang.

This is the plan. Export all the product with current categories (lets call these website categories) to csv and ask the LLM to assign it a new main category from a predefined list of categories.

I defined a list of main categories with the employees at the company, this is how it looks (shorted)

# main categories #
id=100 - Health->Other
id=101 - Health->Vitamins
id=102 - Health->Exercise Equipment
id=103 - Health->Health Treatments
id=104 - Cosmetics->Skin
id=105 - Cosmetics->Hair
.....
# main categories #

Now I go to the database and select all the products with categories that they are assigned to. The sql is something like this

select p.name as product_name, group_concat(c2.name) as parent_categories, group_concat(c.name) as sub_categories from products p join products_categories pc on pc.product_id=p.id join categories c on …….

I execute this query and export the data to data.csv. I use DBeaver as my sql tool, they provide export function. Now I have all 14K products with name and parent and sub categories

Now lets go to the OpenAI playground, select the latest model and start to construct the system command.

Lets call it system.txt. We will use system.txt later in our plang code.

==== system.txt ===
I have products in db, each product is in what I call "website category".

The mapping in db is 1-n.

I am creating "main category" in the db and need to map products to a main category, this mapping must be 1-1.

You will be provided with list of products and list of their website categories.

Your job is to map a product to main category.

Use the name (this is product name), parent_categories and sub_categories to determine main category for each product

The main category is defined and can only be mapped to the defined main categories.

# main categories #

id=100 - Health->Other
id=101 - Health->Vitamins
id=102 - Health->Exercise Equipment
id=103 - Health->Health Treatments
id=104 - Cosmetics->Skin
id=105 - Cosmetics->Hair
..... I list all main categories, it is just shorted for this post

# main categories #

return csv in format; product_id;main_category_id
==== system.txt ===

In the user section, I paste in few of my product from data.csv and see if I am getting satisfying results.

After couple test with the system command, I am getting the results I want. Excellent. Lets move on to PLang.

This is the code I wrote in plang, you should be able to understand what it does by simply reading it.

=== MapCategory.goal ===

MapCategory

  • read file system.txt, write to %system%
  • read csv file data/data.csv, delimiter=; write to %products% / we must split up the products since LLM cannot handle 14K products / I decided to go with 200 products at a time
  • [code] split %products% into multiple list, each with 200 rows write to %splitProducts%
  • foreach %splitProducts%, call !Process item=%items%, position=%splitPos%

Process

  • [code] check if file 'data/%splitPos%.csv' exists, write to %exists%(bool)
  • if %exists% is false, then ProcessItem

ProcessItem

  • write out 'doing nr %splitPos% of %listCount%'
  • [llm] system: %system% user:%items% scheme: 'product_id;main_category_id' respones type: csv model:'gpt-4-0125-preview' write to %mapping%
  • write out 'Writing %splitPos% to file'
  • write %mapping% to file data/%splitPos%.csv, delimiter=; include header === MapCategory.goal ===

I run the code, and about 30 minutes later I have 70 csv files, each containing 200 products.

I now need to create the SQL statement to update the products in database, it should be something like this for each product

UPDATE products SET main_category_id=@categoryId WHERE id=@id

This is probably not the fastest way to update the database, but it doesn’t matter, I will run it only once.

So my next step is to create a new plang code, that reads all 70 csv files and creates this UPDATE statement

=== MapMain.goal ===
MapMain

  • get path of all .csv in data folder, write to %paths%
  • set var %sql% = ''
  • foreach %paths%, call !Process item=%path%
  • write %sql% to sql/update.sql

Process

  • read file %path%, write to %content%
  • [code] split on new line in %content%(string), skip first line, write to %lines%
  • foreach %lines%, call AppendSql item=%line%

AppendSql

  • if %line% is not empty
    • [code] split %line%(string) on ';', write to %columns%
    • append to var %sql%, "UPDATE products SET main_category_id=%columns[2]% WHERE id=%columns[1]%\n" === MapMain.goal ===

That is it, I now have mapped all 14K products to a main category, a task that would have been incredibly expensive to do manually, with higher error rate.

Total cost of about $20 + 1 hour developer work.

You can find the github repo here, https://github.com/ingig/MapCategories

If you are interested in learning how plang works, there is great tutorial that guides you through the first steps found in the plang Github repo, https://github.com/PLangHQ/plang/blob/main/Documentation/Todo_webservice.md

Top comments (0)