DEV Community

Designing APIs for humans: Object IDs

Paul Asjes on August 30, 2022

Choosing your ID type Regardless of what type of business you run, you very likely require a database to store important data like custo...

Joost Helberg • Sep 1 '22

Enumeration attacks don't exist when using row-level security. The prefixing is nice, though, and adds more than just human readability. UUIDs are problematic, though, as they don't index very well.
Thanks for suggesting prefixing, I may investigate that for future use; there is a lot to say about it.

Yordis Prieto • Jul 15 '23 • Edited

Hey, thank you so much for such insight. I am wondering about two things.

1. Do you save the IDs as strings in your databases following the format [object type]_[object id], or do you only save the ID part and add the prefixes at the application layer?
2. Any reason why you didn't follow URN? Without being dogmatic, just a simple format such as [object type]:[object id] rather than [object type]_[object id].

Paul Asjes • Jul 18 '23

We do store the IDs as strings including the prefix in the database. This helps immensely when we're doing things that don't include the application layer, like data analysis.
As TJ Mazeika mentioned, copy and pasting is easier with underscores than with colons :)

Yordis Prieto • Aug 5 '23

Do you do any optimization for those keys as primary keys?

TJ Mazeika • Jul 17 '23

Regarding 2, I'm going to assume that it's because the latter is easier to copy and paste. Try double clicking cus:123 vs. cus_123.

fillon • Sep 15 '22

Very good article. Prefixes should ease support and debugging when all your tables use UUIDs.

On the implementation part, how do you store the ids in a table (PK)?

Do you store the prefix_ in the table, or just the ID part and handle the prefix outside the database?

Paul Asjes • Sep 20 '22

It of course depends on your implementation and how you organise your database, but I'd just use the ID including prefix as the primary key. You could separate it into multiple columns, but that just introduces potential fail states where you accidentally use the randomised part without the prefix.
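
A minimal sketch of the single-column approach described above, using Python's built-in sqlite3 module. The table name, column names, and helper functions are hypothetical; the only point is that the full prefixed string is the primary key, so a lookup with the bare random part simply finds nothing.

```python
import sqlite3

# Hypothetical schema: the prefixed ID (e.g. "cus_abc123") is stored whole.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, email TEXT NOT NULL)")


def insert_customer(conn: sqlite3.Connection, customer_id: str, email: str) -> None:
    """Insert a customer, refusing IDs that lack the expected prefix."""
    if not customer_id.startswith("cus_"):
        raise ValueError(f"expected a cus_ ID, got {customer_id!r}")
    conn.execute(
        "INSERT INTO customers (id, email) VALUES (?, ?)", (customer_id, email)
    )


def get_customer(conn: sqlite3.Connection, customer_id: str):
    """Return (id, email) for the given prefixed ID, or None if absent."""
    return conn.execute(
        "SELECT id, email FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
```

Because the prefix lives in the stored value, ad hoc SQL and data analysis see the same IDs that the API exposes.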

David Mair Spiess • Sep 15 '22 • Edited

Very interesting, thank you for the insights!
How do you decide how long a generated ID should be?
I noticed that some Stripe IDs are longer than others.

For example
customers: cus_MNlbRsTWfvcJ01
payments: ch_3AhqJiJdgChykuGw0S2YVeil

Does this mean you estimate the probability of a collision separately for each resource?
Did you ever need to increase the ID length for a specific resource after some time?
How do you store this ID efficiently in your database? Do you use it as the primary key, or do you have a separate internal unique identifier?

Paul Asjes • Aug 9 '23

Excellent questions! There is some additional magic that goes into the generated part of the ID. Long story short, we use part of the ID for database sharding. Some resource IDs are indeed longer than others; this is mainly to avoid collisions for resources that we expect to have a lot more of.

We have in the past increased the length of IDs. One example that comes to mind is API keys, which we changed to be up to 255 characters in length.

We don't use the ID as the primary key. As mentioned before, we do some sharding magic with the exposed ID, so the internal ID is a little different.
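
To make the length-versus-collision trade-off concrete, here is a rough sketch based on the standard birthday approximation. It assumes the generated part is uniformly random over a 62-character alphabet (digits plus upper- and lower-case letters). That is an assumption for illustration only: as noted above, part of a real Stripe ID carries sharding information, so these numbers do not describe Stripe's actual scheme.

```python
import math


def collision_probability(num_ids: int, length: int, alphabet_size: int = 62) -> float:
    """Approximate chance that at least two of num_ids random IDs collide.

    Uses the birthday approximation p = 1 - exp(-n(n-1) / (2N)),
    where N = alphabet_size ** length is the number of possible IDs.
    """
    space = alphabet_size ** length
    exponent = -num_ids * (num_ids - 1) / (2 * space)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


if __name__ == "__main__":
    # 14 and 24 are the random-part lengths of the two example IDs in the question.
    for length in (14, 24):
        for num_ids in (10**6, 10**9, 10**12):
            p = collision_probability(num_ids, length)
            print(f"length={length:2d} ids={num_ids:.0e} p(collision)~{p:.3e}")
```

The probability grows roughly with the square of the number of IDs, which is why a resource expected to have far more rows may warrant a longer random part.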

Michael Fecher • Aug 2 '23

Very good question, I was asking myself the same when I read the article.

Let's try the tag to notify some moderators from Stripe to get their attention. :D

stripe

Dávid Szabó • Jul 24 '23

I'd be really interested in this, unfortunately, as far as I see Paul didn't answer your questions. @paulasjes I'm really hoping you have a few minutes to answer these questions. Thank you!

OmegaRogue • Sep 19 '22

Snowflakes are similar in that they don't collide, but they are completely numerical and the generation method involves the Unix timestamp, meaning that by sorting them in ascending order you still get the same benefits you get from using sequential integers.
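
For readers unfamiliar with Snowflake IDs, here is a minimal single-process sketch of the idea. It follows the commonly described 64-bit layout (41 bits of millisecond timestamp, 10 bits of worker ID, 12 bits of per-millisecond sequence). The custom epoch value and the class name are arbitrary assumptions, and the sketch is not thread-safe.

```python
import time
from typing import Callable

EPOCH_MS = 1_600_000_000_000  # arbitrary custom epoch (assumption)
WORKER_BITS = 10
SEQUENCE_BITS = 12
SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1


def _now_ms() -> int:
    return int(time.time() * 1000)


class SnowflakeGenerator:
    """Generates time-ordered integer IDs unique per (worker_id, process)."""

    def __init__(self, worker_id: int, clock: Callable[[], int] = _now_ms) -> None:
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock = clock
        self._last_ms = -1
        self._sequence = 0

    def next_id(self) -> int:
        now = self._clock()
        if now < self._last_ms:
            raise RuntimeError("clock moved backwards")
        if now == self._last_ms:
            self._sequence = (self._sequence + 1) & SEQUENCE_MASK
            if self._sequence == 0:
                # Sequence exhausted for this millisecond: wait for the next one.
                while now <= self._last_ms:
                    now = self._clock()
        else:
            self._sequence = 0
        self._last_ms = now
        timestamp = now - EPOCH_MS
        return (
            (timestamp << (WORKER_BITS + SEQUENCE_BITS))
            | (self.worker_id << SEQUENCE_BITS)
            | self._sequence
        )
```

Because the timestamp occupies the most significant bits, IDs from one generator sort in creation order, which is the property the comment above points to.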

Michael Fecher • Aug 2 '23

Another question regarding "exposure" of those IDs to REST APIs.
Officially, the underscore isn't supported in URLs (same as for the colon).
Why did you choose it anyway?
I'd rather go for a dash than for an underscore.

Paul Asjes • Aug 9 '23

Ease of use mainly, specifically for copy and pasting. Try double clicking on "pi-123" and "pi_123" to see the difference.

Ngonidzashe Mangudya • Sep 7 '22

Interesting 👌 What's the best way of choosing the prefix itself?

Paul Asjes • Sep 7 '22 • Edited

Here's my recommendation:

Plan your prefixes. Have an internal style guide for how to name objects. If you don't, you end up with inconsistent schemes. For example if you had a bank account object you could do:

ba_

or

bankacct_

Either is fine as long as you're consistent with all your objects.

Remember your audience. Whether the object is public or internal only, the intended audience is still an engineer. Your prefix should be obvious to anyone even if they don't have the necessary context. We made this mistake with PaymentIntents and SetupIntents:

pi_

and

seti_

(notice how they aren't consistent)

If we could go back and redo those we'd name them payint_ and setint_ respectively. Slightly longer prefixes make understanding them much easier. You might have heard of PaymentIntents but you might not connect the dots with pi_, but you likely will with payint_.
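
One way to back up a prefix style guide in code is a small registry that refuses duplicate prefixes and maps an ID back to its object type. This is a hypothetical sketch: the class and method names are made up, and the only prefixes used are ones mentioned in this thread.

```python
class PrefixRegistry:
    """Keeps every ID prefix unique and resolves IDs to object types."""

    def __init__(self) -> None:
        self._by_prefix: dict[str, str] = {}

    def register(self, prefix: str, object_type: str) -> None:
        """Claim a prefix for an object type; fail loudly if it is taken."""
        if prefix in self._by_prefix:
            raise ValueError(
                f"prefix {prefix!r} is already used by {self._by_prefix[prefix]!r}"
            )
        self._by_prefix[prefix] = object_type

    def object_type_of(self, object_id: str) -> str:
        """Return the object type for an ID such as 'pi_123'."""
        prefix, sep, rest = object_id.partition("_")
        if not sep or not rest or prefix not in self._by_prefix:
            raise ValueError(f"unrecognised ID: {object_id!r}")
        return self._by_prefix[prefix]


registry = PrefixRegistry()
registry.register("pi", "payment_intent")
registry.register("si", "subscription_item")
registry.register("seti", "setup_intent")
```

Registering every prefix in one place makes a clash like si_ visible at the moment a new object type is introduced.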

Ngonidzashe Mangudya • Dec 24 '22

Thank you.

Juan Esteban Garcia • Sep 17 '22

Paul, thanks for sharing this article - it really made me think a lot about the way our existing API is designed. I have one question for you and I'd appreciate your insights... how would you go about implementing this ID format in an existing API with hundreds of users? Any guidance would be highly appreciated.

THANK YOU.

Paul Asjes • Sep 20 '22

That's definitely tricky, but I'd just bite the bullet and start using prefixes for all new objects first. Downside is that you'd have a world where you have a mixture of both IDs, but hopefully over time the prefixed IDs would become dominant.

You could run a migration to add prefixes to older IDs, but you'd have to make sure that users of your API can still use the old IDs without a prefix to ensure backwards compatibility.
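
The backwards-compatibility idea above could be sketched as a small normalisation step at the API boundary. The function name is hypothetical, and the sketch assumes legacy IDs never contain an underscore, which would need checking against real data before relying on it.

```python
def normalize_id(raw_id: str, prefix: str) -> str:
    """Accept either a prefixed ID or a legacy unprefixed one.

    Returns the canonical prefixed form, e.g. 'cus_42' for both
    'cus_42' and the legacy '42'. Assumes legacy IDs contain no '_'.
    """
    head, sep, tail = raw_id.partition("_")
    if not sep:
        # No underscore at all: treat as a legacy ID and add the prefix.
        if not raw_id:
            raise ValueError("empty ID")
        return f"{prefix}_{raw_id}"
    if head != prefix or not tail:
        raise ValueError(f"expected a {prefix}_ ID, got {raw_id!r}")
    return raw_id
```

With this in place, existing integrations can keep sending old IDs while all responses and storage move to the prefixed form.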

Juan Esteban Garcia • Oct 16 '22

This is helpful, Paul. Thank you so much.

chris damour • Sep 21 '22 • Edited

Exposing an ID directly outside your app domain is fine for JSON-RPC.

RESTful practice is to always expose “ids” as URLs; then any client can fetch that resource and know what actions can be taken on it. If you change your HTTP services, you 301 the old href to the new one and the client updates all its references. It works beautifully, and it's so simple to make that leap. The primary reason clients want an ID is to plug it into some spot in another service's URL, but if you respond with that service's href in your initial response, they never need to do that plugging to begin with.

Then, if you're really worried about replay attacks or enumeration, your hypermedia controls' hrefs can use temporary URLs that only work for a window of time from a specific client (e.g. you append a signed JWT to them, with a client ID/IP present). The href is opaque to the honest client, and the nefarious client can't hack the HTTP request by just changing one part of it. And if the honest client takes too long, you 302 them to an auth challenge. Trust but validate with zero trust! But none of these things are options in JSON-RPC, which is what you seem to be working with… too bad.
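
A rough sketch of the temporary, client-bound URL idea from the comment above. To stay within the Python standard library it swaps the signed JWT for a plain HMAC-SHA256 signature over the path, client ID, and expiry; the query parameter names and function names are assumptions made for illustration.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit


def _signature(secret: bytes, path: str, client_id: str, expires_at: int) -> str:
    message = f"{path}\n{client_id}\n{expires_at}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(secret: bytes, path: str, client_id: str, expires_at: int) -> str:
    """Return path plus query parameters binding it to a client and expiry."""
    query = urlencode(
        {
            "client": client_id,
            "expires": expires_at,
            "sig": _signature(secret, path, client_id, expires_at),
        }
    )
    return f"{path}?{query}"


def verify_url(secret: bytes, url: str, client_id: str, now: int) -> bool:
    """Check that the URL is unmodified, unexpired, and issued to client_id."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires_at = int(params["expires"][0])
        claimed_client = params["client"][0]
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if claimed_client != client_id or now > expires_at:
        return False
    expected = _signature(secret, parts.path, claimed_client, expires_at)
    # Constant-time comparison avoids leaking how much of the signature matched.
    return hmac.compare_digest(sig.encode(), expected.encode())
```

Changing any part of the path, such as the ID, invalidates the signature, so tampering with one segment of the request no longer works.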

Mario DeSousa • Mar 30 '23

Great article Paul! These are great insights on the benefits of the IDs with a prefix.
I was wondering if you ever faced any challenges where the business team decides to rename a certain object... as an example, changing "customer" to "client". In this case the IDs that start with "cust_" would lose the meaning, and possibly cause confusion?

Has Stripe ever faced this? How was it resolved? Just change the IDs going forward and leave legacy IDs as they are?

Also, have you ever faced issues with changes to the prefix breaking code for your developers? For example, if their code expects an ID with "cust_" and suddenly starts receiving an ID with "cli_"... has this been a problem?

Paul Asjes • Aug 9 '23

It's tricky for sure. As a general rule once an object is named we don't ever rename it. We certainly would never start returning unexpected IDs without some sort of initial outreach to users. In your example, instead of changing "customer" to "client" we'd probably have those as two separate resources initially and deprecate the older one over time, whilst keeping the new resource backwards compatible with the older IDs. That way we don't ever unintentionally break anyone's integration.

Naming things is hard and we get it wrong sometimes too. One recent example I can think of is that we used pi_ for PaymentIntents. Later on we introduced SetupIntents, but couldn't use the prefix si_ as that was already being used for Subscription Items. We ended up in a world where we use pi_ and seti_, which is confusing as conceptually those are two similar objects.

We learned that when choosing prefixes it's better to lean on the more verbose side to be clear and to avoid future naming collisions. If we could go back and redo it I think we'd probably end up going with payint_ and setint_ for PaymentIntents and SetupIntents respectively.

timothyokooboh • Jul 17 '23

Awesome article! Thank you for the insights.

What about combining prefixes with uuid?
That is either [resource_type]_[uuid] or using the URN spec syntax [resource_type]:[uuid]

Paul Asjes • Jul 18 '23

You could certainly combine prefixes with UUIDs. We don't at Stripe because we also use database sharding. The IDs we generate have a shard key baked into them for faster lookup.

As for URN, we opted to use underscores rather than colons as it makes for easier copy and pasting (try double clicking on cus:123 versus cus_123).
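
To illustrate what "a shard key baked into the ID" could look like, here is a purely hypothetical layout: the prefix, then two base-62 characters encoding a shard number, then a random part. Stripe's real format is not described in this thread, so none of this should be read as their scheme.

```python
import secrets
import string

ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase  # 62 chars
SHARD_CHARS = 2
MAX_SHARDS = len(ALPHABET) ** SHARD_CHARS


def _encode_shard(shard: int) -> str:
    if not 0 <= shard < MAX_SHARDS:
        raise ValueError("shard out of range")
    high, low = divmod(shard, len(ALPHABET))
    return ALPHABET[high] + ALPHABET[low]


def new_id(prefix: str, shard: int, random_length: int = 14) -> str:
    """Build '<prefix>_<2-char shard><random part>' (hypothetical layout)."""
    random_part = "".join(secrets.choice(ALPHABET) for _ in range(random_length))
    return f"{prefix}_{_encode_shard(shard)}{random_part}"


def shard_of(object_id: str) -> int:
    """Recover the shard number from an ID without touching the database."""
    _, sep, body = object_id.partition("_")
    if not sep or len(body) <= SHARD_CHARS:
        raise ValueError(f"malformed ID: {object_id!r}")
    high = ALPHABET.index(body[0])
    low = ALPHABET.index(body[1])
    return high * len(ALPHABET) + low
```

The benefit is that a request router can pick the right database from the ID alone, before any query runs.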

Gaurav • Jun 14 • Edited

Hi @paulasjes, do you generate the IDs in the application layer in that case? Do you only use the prefixed object IDs as external IDs, or as actual primary key identifiers in the DB? I think for many use cases it is preferable to have DB-generated IDs (say via a DB generator function). For example, if you need to use INSERT INTO SELECT to copy rows but with new IDs, it would be inefficient to engage the application layer for such tasks simply because the ID generation depends on the application rather than the underlying database.

Also, are your prefixed IDs sortable? In many cases it is useful to have IDs as a secondary sort factor for deterministic sort and pagination.

I am guessing Stripe's prefixed object IDs are external resource IDs used in URLs but not DB primary keys for these two reasons. But I am more curious whether these are DB generated or application code generated. Please clarify.

And thank you for the wonderful write-up!
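
The point in the comment above about using the ID as a secondary sort factor can be sketched with keyset pagination in sqlite3. Random prefixed IDs are not time-ordered, but they are unique, which is enough to break ties between rows with the same timestamp. The table, columns, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE charges (id TEXT PRIMARY KEY, created_at INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO charges (id, created_at) VALUES (?, ?)",
    [("ch_c", 100), ("ch_a", 100), ("ch_b", 100), ("ch_d", 200), ("ch_e", 300)],
)


def fetch_page(conn: sqlite3.Connection, limit: int, after=None):
    """Return up to `limit` rows ordered by (created_at, id).

    `after` is the (created_at, id) pair of the last row already seen.
    """
    if after is None:
        cursor = conn.execute(
            "SELECT id, created_at FROM charges ORDER BY created_at, id LIMIT ?",
            (limit,),
        )
    else:
        created_at, last_id = after
        cursor = conn.execute(
            "SELECT id, created_at FROM charges "
            "WHERE created_at > ? OR (created_at = ? AND id > ?) "
            "ORDER BY created_at, id LIMIT ?",
            (created_at, created_at, last_id, limit),
        )
    return cursor.fetchall()
```

Without the ID tiebreaker, rows sharing a timestamp could be skipped or repeated across pages.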

timothyokooboh • Jul 18 '23

Awesome! Thanks for the reply.

Abhik Banerjee • Aug 13 '23

This is a very interesting read indeed! Learned something new. Hopefully will incorporate in our company's next project.