As part of expanding the scope of what developer relations means, I set out to find the shortcomings of AI models at writing code, specifically Google Apps Script.
## Generating Code
I put together a script to call multiple models with different prompts and a website to show the different outputs.
To simplify things, I used Vercel's `ai` package from NPM. I then iterated through the prompts and models, saving the outputs to files locally (a rough sketch of this loop follows the notes below).
There are some things to note here:
- Requesting an object with a particular schema using Zod.
- A 5000 token limit (started with 1000, but Claude was verbose).
- Temperature is `0`.
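Here is a rough sketch of that loop, assuming Vercel's AI SDK (`generateObject` from `ai` with provider packages like `@ai-sdk/openai`). The model ids, schema, and prompts below are illustrative stand-ins, not my exact setup:

```javascript
// Sketch only: iterate models × prompts, request a Zod-shaped object,
// and save each result locally. Model ids, schema, and prompts are
// illustrative, not the exact ones from the experiment.
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { anthropic } from '@ai-sdk/anthropic';
import { google } from '@ai-sdk/google';
import { z } from 'zod';
import { mkdir, writeFile } from 'node:fs/promises';

const schema = z.object({
  code: z.string().describe('Complete Google Apps Script code.'),
  explanation: z.string().describe('Short explanation of the approach.'),
});

const models = {
  'gpt-4o-mini': openai('gpt-4o-mini'),
  'claude-3-7-sonnet': anthropic('claude-3-7-sonnet-20250219'),
  'gemini-1.5-pro': google('gemini-1.5-pro'),
};

const prompts = [
  'Write an Apps Script function that fetches https://example.com with UrlFetchApp.',
  'Write an Apps Script Gmail add-on that shows a card for the open message.',
];

await mkdir('output', { recursive: true });

for (const [name, model] of Object.entries(models)) {
  for (const [i, prompt] of prompts.entries()) {
    const { object } = await generateObject({
      model,
      schema,
      prompt,
      temperature: 0, // deterministic-ish output for a fair comparison
      maxTokens: 5000, // started at 1000, but Claude was verbose
    });
    await writeFile(`output/${name}-${i}.json`, JSON.stringify(object, null, 2));
  }
}
```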
## Visualizing Results
I wanted a better way to see the differences side by side, so I created a web app using Svelte at https://apps-script-ai-testing.jpoehnelt.dev! (best viewed on desktop)
## Editorial Impressions
As for actual takeaways from this analysis, here are a few:
- None of the models used `CacheService` for memoization (a sketch of what that could look like follows this list).
- claude-3-7-sonnet is very verbose.
- Only gemini-1.5-pro tried to use an Editor add-on in Gmail (wrong); all the other models used `CardService` (correct).
- All of them failed to do a robust `UrlFetchApp` GET of `example.com`: they all let the non-200 response throw, then caught and re-raised the error, which usually hides the actual root cause (a more defensive version is sketched below).
- gpt-4o-mini decided to sleep with a while loop.
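For the `UrlFetchApp` takeaway, here is a minimal sketch of the more defensive pattern, assuming a plain GET of `example.com`. With `muteHttpExceptions: true`, a non-200 response doesn't throw, so the script can surface the real status and body instead of catching and re-raising a generic exception:

```javascript
// Sketch: robust GET with UrlFetchApp. muteHttpExceptions prevents non-200
// responses from throwing, so the root cause stays visible in the error.
function fetchExample() {
  const response = UrlFetchApp.fetch('https://example.com', {
    muteHttpExceptions: true,
  });
  const code = response.getResponseCode();
  if (code !== 200) {
    // Surface the actual status and body instead of a generic exception.
    throw new Error(`GET failed: ${code} ${response.getContentText()}`);
  }
  return response.getContentText();
}
```

And for the `CacheService` point, a sketch of what memoization could look like; the cache key and TTL here are illustrative (script cache values are capped at 100 KB and six hours):

```javascript
// Sketch: memoize the fetch with CacheService. Key and TTL are illustrative.
function getExampleCached() {
  const cache = CacheService.getScriptCache();
  const cached = cache.get('example-body');
  if (cached !== null) {
    return cached;
  }
  const body = fetchExample();
  cache.put('example-body', body, 21600); // TTL in seconds, 6 hour maximum
  return body;
}
```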
## Why do this?
As I mentioned earlier, I believe developer relations and the definition of a developer are evolving quickly. The term "vibe coding" captures this phenomenon by shifting the focus to intent instead of the act of writing code. Either way, as a member of a developer relations team, my goal is to reduce friction from intent to implementation.
## Next Steps

- Explore creating rules files for LLMs, e.g. `apps-script.(txt|md)` (a sketch follows below).
- Improve documentation based on my editorial takes, e.g. make sure the `UrlFetchApp` TypeScript types and reference docs are complete.
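As a sketch of what such a rules file might contain, drawing only on the takeaways above (the contents are purely illustrative):

```markdown
<!-- apps-script.md: illustrative rules for LLMs generating Apps Script -->
- Always pass `{ muteHttpExceptions: true }` to `UrlFetchApp.fetch` and check
  `getResponseCode()` explicitly; include the status and body in thrown errors.
- Use `CacheService` to memoize repeated or expensive fetches.
- Gmail add-ons use `CardService` (Workspace add-ons), not Editor add-ons.
- Never busy-wait in a loop; use `Utilities.sleep()` when a delay is needed.
```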