Paul Asjes for Stripe


Designing APIs for humans: Object IDs

Choosing your ID type

Regardless of what type of business you run, you very likely require a database to store important data like customer information or status of orders. Storing is just one part of it though; you also need a way to swiftly retrieve the data – which is where IDs come in.

Also known as a primary key, IDs are what you use to uniquely specify a row in a table. When designing your table, you want a system where your IDs are easy to generate, unique and human readable.

The simplest approach to IDs in a relational database is to use the row ID, which is an integer. Whenever you add a new row (i.e. a new customer is created), the ID is the next number in the sequence. This sounds like a nice idea since it makes IDs easy to discuss in conversation (“Order 56 is having problems, can you take a look?”), plus it takes no work to set up. In practice, however, this is a security nightmare waiting to happen. Sequential integer IDs leave you wide open to enumeration attacks: because each ID is just the previous one plus one, it becomes trivially easy for malicious actors to guess IDs they should never have known about.

For example, if I sign up to your service and discover that my user ID is “42”, then I could make an educated guess that a user with the ID of “41” exists. Armed with that knowledge, I might be able to obtain sensitive data on user “41” that I absolutely shouldn’t be allowed to see, for instance through an unsecured API endpoint like /api/customers/:id/. If the ID is something I can’t guess, then exploiting that endpoint becomes a lot harder.

Integer IDs also mean you are likely leaking some very sensitive information about your company, like its size and success, which anyone can infer from the number of customers and orders you have. After signing up and seeing that I’m only user number 42, I might doubt any claims you make about how big your operation is.

Instead you need to ensure that your IDs are unique and impossible to guess.

A much better candidate for IDs is the Universally Unique Identifier, or UUID. It’s a 128-bit value, usually written as 32 hexadecimal digits split into five groups by hyphens (and often stored as a string). Here’s an example of one:

4c4a82ed-a3e1-4c56-aa0a-26962ddd0425

It’s fast to generate and widely adopted, and collisions (a newly generated UUID matching one that was generated before, or one that will be generated in the future) are so vanishingly rare that it is considered one of the best ways to identify objects in systems where uniqueness is important.
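
To give a sense of how little work generation takes, here is a minimal sketch of a version 4 (random) UUID in plain PHP. The function name `uuidV4` is invented for this example; in a real project you would more likely use an established library or your database's built-in UUID support.

```php
<?php

// Hypothetical helper: builds a version 4 (random) UUID.
function uuidV4(): string
{
    // 16 cryptographically secure random bytes = 128 bits.
    $bytes = random_bytes(16);

    // Set the version field (high nibble of byte 6) to 4.
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    // Set the variant field (top two bits of byte 8) to binary 10.
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    // Format the 32 hex digits as 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

echo uuidV4(), PHP_EOL;
```

Every call returns a different 36-character string in the same shape as the example above.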

On the other hand, here’s a Stripe object ID:

pi_3LKQhvGUcADgqoEM3bh6pslE

Ever wondered why Stripe uses this format specifically? Let’s dive in and break down how and why Stripe IDs are structured the way they are.

Make it human readable



pi_3LKQhvGUcADgqoEM3bh6pslE
└─┘└──────────────────────┘
 └─ Prefix    └─ Randomly generated characters



You might have noticed that all Stripe Objects have a prefix at the beginning of the ID. The reason for this is quite simple: adding a prefix makes the ID human readable. Without knowing anything else about the ID we can immediately confirm that we’re talking about a PaymentIntent object here, thanks to the pi_ prefix.
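
If you want to experiment with this shape in your own system, a prefixed ID can be sketched in a few lines of PHP. This is only an illustration under my own assumptions, not how Stripe generates its IDs: the function name `generatePrefixedId`, the 62-character alphabet and the default length of 24 are all made up for the example.

```php
<?php

// Hypothetical generator: "<prefix>_" followed by random alphanumeric characters.
function generatePrefixedId(string $prefix, int $length = 24): string
{
    // Assumed alphabet: digits plus upper- and lower-case letters.
    $alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    $max = strlen($alphabet) - 1;

    $random = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int() draws from a cryptographically secure source.
        $random .= $alphabet[random_int(0, $max)];
    }

    return $prefix . '_' . $random;
}

echo generatePrefixedId('pi'), PHP_EOL;
echo generatePrefixedId('cus', 14), PHP_EOL;
```

The random part does the job of keeping IDs unique and hard to guess, while the prefix carries the type.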

When you create a PaymentIntent via the API, you actually create or reference several other objects, including the Customer (cus_), PaymentMethod (pm_) and Charge (ch_). With prefixes you can immediately differentiate all these different objects at just a glance:



$pi = $stripe->paymentIntents->create([
  'amount' => 1000,
  'currency' => 'usd',
  'customer' => 'cus_MJA953cFzEuO1z',
  'payment_method' => 'pm_1LaXpKGUcADgqoEMl0Cx0Ygg',
]);



This helps Stripe employees internally just as much as it helps developers integrating with Stripe. For example, here’s a code snippet I’ve seen before when asked to help debug an integration:



$pi = $stripe->paymentIntents->retrieve(
  $id,
  [],
  ['stripe_account' => 'cus_1KrJdMGUcADgqoEM']
);



The above snippet is trying to retrieve a PaymentIntent from a connected account, but you can spot the error at a glance: a Customer ID (cus_) is being used instead of an Account ID (acct_). Without prefixes this would be much harder to debug; if Stripe used UUIDs instead then we’d have to look up the ID (probably in the Stripe Dashboard) to find out what kind of object it is and whether it’s even valid.
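
The same check a human does by eye can also be done in code before a request is ever sent. Here is a rough sketch of such a guard; `assertIdPrefix` is a hypothetical helper written for this example and is not part of the Stripe PHP library.

```php
<?php

// Hypothetical guard: throws if $id does not start with "<expectedPrefix>_".
function assertIdPrefix(string $id, string $expectedPrefix): void
{
    $needle = $expectedPrefix . '_';

    if (strncmp($id, $needle, strlen($needle)) !== 0) {
        throw new InvalidArgumentException(
            sprintf('Expected an ID starting with "%s", got "%s".', $needle, $id)
        );
    }
}

try {
    // The Customer ID from the snippet above fails the Account ID check.
    assertIdPrefix('cus_1KrJdMGUcADgqoEM', 'acct');
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

A guard like this turns a confusing API error into an immediate, descriptive failure in your own code.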

At Stripe we’ve gone so far as to develop an internal browser extension to automatically look up Stripe Objects based on their ID. Because we can infer the object type by the prefix, triple clicking on an ID automatically opens up the relevant internal page, making debugging so much easier.

Polymorphic lookups

Speaking of inferring object types, this is especially relevant when designing APIs with backwards compatibility in mind.

When creating a PaymentIntent, you can optionally provide a payment_method parameter to indicate what type of payment instrument you’d like to use. You might not know that you can actually choose to provide a Source (src_) or Card (card_) ID instead of a PaymentMethod (pm_) ID here. PaymentMethods replaced Sources and Cards as the canonical way to represent a payment instrument within Stripe, yet for backwards compatibility reasons we still need to be able to support these older objects.



$pi = $stripe->paymentIntents->create([
  'amount' => 1000,
  'currency' => 'usd',
  // This could be a PaymentMethod, Card or Source ID
  'payment_method' => 'card_1LaRQ7GUcADgqoEMV11wEUxU',
]);



Without prefixes, we’d have no way of knowing what kind of object the ID represents, meaning we don’t know which table to query for the object data. Querying every single table to find one ID is extremely inefficient, so we need a better method. One way could be to require an additional “type” parameter:



$pi = $stripe->paymentIntents->create([
  'amount' => 1000,
  'currency' => 'usd',
  // Without prefixes, we'd have to supply a 'type'
  'payment_method' => [
    'type' => 'card',
    'id' => '1LaRQ7GUcADgqoEMV11wEUxU'
  ],
]);



This would work, but it complicates our API with no additional gain. Rather than payment_method being a simple string, it’s now a hash. Plus there’s no information here that can’t be combined into a single string. Whenever you use an ID, you’ll want to know what type of object it represents, so putting the type and the identifier into one value is a much better solution than requiring an additional “type” parameter.

With a prefix we can immediately infer whether the payment instrument is one of PaymentMethod, Source or Card and know which table to query despite these being completely different types of objects.
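
To make the lookup concrete, here is a minimal sketch of how a prefix could be mapped to the table to query. The function name `tableForId` and the table names are assumptions made for this example; they do not describe Stripe's actual schema.

```php
<?php

// Hypothetical lookup: picks the table to query based on the ID's prefix.
function tableForId(string $id): string
{
    // Assumed table names, for illustration only.
    $tableByPrefix = [
        'pm'   => 'payment_methods',
        'src'  => 'sources',
        'card' => 'cards',
    ];

    // Everything before the first underscore is treated as the prefix.
    $prefix = strstr($id, '_', true);

    if ($prefix === false || !array_key_exists($prefix, $tableByPrefix)) {
        throw new InvalidArgumentException("Unrecognised payment instrument ID: {$id}");
    }

    return $tableByPrefix[$prefix];
}

echo tableForId('card_1LaRQ7GUcADgqoEMV11wEUxU'), PHP_EOL; // cards
```

One string parameter is enough for the API, and the server still knows exactly where to look.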

Preventing human error

There are other less obvious benefits of prefixing, one being the ease of working with IDs when you can infer their type from the first few characters. For example, on the Stripe Discord server we use Discord’s AutoMod feature to automatically flag and block messages that contain a Stripe live secret API key, which starts with sk_live_. Leaking such a sensitive key could have drastic consequences for your business, so we take steps to avoid this happening in the environments that we control.

By having keys start with sk_live_, writing a regex to filter out accidental leaks is trivial:

Using Discord's AutoMod tool to prevent secret key leaks

This way we can prevent secret live API keys from leaking in our Discord, but allow the posting of test keys in the format sk_test_123 (although you should absolutely keep those secret as well).
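
The same idea carries over to any place where text passes through your own code, such as log lines or support tickets. The sketch below is not the AutoMod configuration; it is a PHP illustration of the regex approach, and it assumes that the part of the key after sk_live_ consists only of letters and digits. The function name `redactLiveKeys` is made up for this example.

```php
<?php

// Hypothetical filter: masks anything that looks like a live secret key.
// Assumes the part after "sk_live_" is made up of letters and digits only.
function redactLiveKeys(string $message): string
{
    return (string) preg_replace('/\bsk_live_[0-9A-Za-z]+/', '[redacted]', $message);
}

echo redactLiveKeys('Why does sk_live_abc123 not work?'), PHP_EOL;
// Test keys pass through untouched.
echo redactLiveKeys('Why does sk_test_123 not work?'), PHP_EOL;
```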

Speaking of API keys, the live and test prefixes are a built-in layer of protection to guard you against mixing up the two. For the especially security aware, you could go even further and set up checks to make sure you’re only using the key for the appropriate environment:



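// Guard for test environments: refuse to run with a live secret key.
// Fall back to an empty string so a missing variable doesn't raise a warning.
$key = $_ENV["STRIPE_SECRET_API_KEY"] ?? "";

// Live secret keys start with sk_live_, so anchor the match to the start.
if (preg_match("/^sk_live_/", $key)) {
  echo "Live key detected! Aborting!";
  return;
}

echo "Proceeding in test mode";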



Stripe has been using this prefixing technique since 2012, and as far as I know, we’re the first ones to implement it at scale. (Is this incorrect? Let me know in the comments below!) Before 2012, all Object IDs at Stripe looked more like traditional UUIDs. If you were an early Stripe adopter you might notice that your account ID still looks like this, without the prefix.

Edit: The IETF beat Stripe to the idea by a number of years with the URN spec. Are you using the URN format in your work? Let me know!

Designing APIs for humans

The anatomy of a Stripe ID is mostly influenced by our desire to design APIs for the human developers who need to integrate them. Computers generally don’t care about what an ID looks like, as long as it’s unique. The humans that develop using those IDs do care very much though, which is why we put a lot of effort into the developer experience of our API.

Hopefully this article has convinced you of the benefits of prefixing your IDs. If you’re curious about how to implement them effectively (and happen to be working in Ruby), Chris Oliver built a gem that makes adding this to your systems trivial.

About the author

Paul Asjes

Paul Asjes is a Developer Advocate at Stripe where he writes, codes and hosts a monthly Q&A series talking to developers. Outside of work he enjoys brewing beer, making biltong and losing to his son in Mario Kart.

Top comments (29)

Joost Helberg

Enumeration attacks don't exist when using row-level security. The prefixing is nice though and adds more than just human-readability. UUIDs are problematic though, as they don't index very well.
Thanks for suggesting prefixing, I may investigate that for future use; there is a lot to say about it.

Yordis Prieto • Edited

Hey, thank you so much for such insight. I am wondering about two things.

  1. Do you save the IDs as strings in your databases following the format [object type]_[object id], or do you only save the ID part and add the prefixes at the application layer?

  2. Any reason why you didn't follow URN? Without being dogmatic, just a simple format as [object type]:[object id] rather than [object type]_[object id].

Paul Asjes
  1. We do store the IDs as strings including the prefix in the database. This helps immensely when we're doing things that don't include the application layer, like data analysis.

  2. As TJ Mazeika mentioned, copy and pasting is easier with underscores than with colons :)

Yordis Prieto

Hey @paulasjes, I'm coming back to this after a while. After reading the whole article and the comments, I read dev.to/paulasjes/comment/28ecl, and I am a bit confused now.

You said in that comment: "We don't use the ID as the primary key, as mentioned before we do some sharding magic with the exposed ID so the internal ID is a little different."

But your comment here says: "We do store the IDs as strings, including the prefix in the database."

It seems to conflict with each other; I am wondering which one it is.

I appreciate any help you can provide.

Yordis Prieto

Do you do any optimization for those keys as primary keys?

TJ Mazeika

Regarding 2, I'm going to assume that it's because the latter is easier to copy and paste. Try double clicking cus:123 vs. cus_123.

David Mair Spiess • Edited

Very interesting, thank you for the insights!
How do you decide how long a generated ID should be?
I noticed that some Stripe IDs are longer than others.

For example
customers: cus_MNlbRsTWfvcJ01
payments: ch_3AhqJiJdgChykuGw0S2YVeil

Does this mean you estimate the probability of a collision separately for each resource?
Did you ever need to increase the ID length for a specific resource after some time?
How do you store this ID efficiently in your database? Do you use it as the primary key or do you have a separate internal unique identifier?

Paul Asjes

Excellent questions! There is some additional magic that goes into the generated part of the ID. Long story short, we use part of the ID for database sharding. Some resource IDs are indeed longer than others; this is mainly to avoid collisions for resources that we expect to have a lot more of.

We have in the past increased the length of IDs. One example that comes to mind is API keys, which we changed to be up to 255 characters in length.

We don't use the ID as the primary key. As mentioned before, we do some sharding magic with the exposed ID, so the internal ID is a little different.

Michael Fecher

Very good question, I was asking myself the same when I read the article.

Let's try the tag to notify some moderators from Stripe to get their attention. :D

stripe

Dávid Szabó

I'd be really interested in this, unfortunately, as far as I see Paul didn't answer your questions. @paulasjes I'm really hoping you have a few minutes to answer these questions. Thank you!

fillon

Very good article. It should ease support when all your tables use UUIDs and you're trying to debug.

On the implementation part, how do you store the IDs in a table (PK)?

Do you store the prefix_ in the table, or just the ID part, and handle the prefix outside the database?

Paul Asjes

It of course depends on your implementation and how you organise your database, but I'd just use the ID including prefix as the primary key. You could separate it into multiple columns, but that just introduces potential fail states where you accidentally use the randomised part without the prefix.

OmegaRogue

Snowflakes are similar in that they don't collide, but they are completely numerical and the generation method involves the Unix timestamp, meaning that by sorting them in ascending order you still get the same benefits you get from using sequential integers.

Michael Fecher

Another question regarding "exposure" of those IDs to REST APIs.
Officially, the underscore isn't supported in URLs (same as for colon).
Why did you choose it anyway?
I'd rather go for a dash than for underscore.

Paul Asjes

Ease of use mainly, specifically for copy and pasting. Try double clicking on "pi-123" and "pi_123" to see the difference.

Ngonidzashe Mangudya

Interesting 👌 What's the best way of choosing the prefix itself?

Paul Asjes • Edited

Here's my recommendation:

  1. Plan your prefixes. Have an internal style guide for how to name objects. If you don't, you end up with inconsistent schemes. For example if you had a bank account object you could do:

    ba_

    or

    bankacct_

    Either is fine as long as you're consistent with all your objects.

  2. Remember your audience. Whether the object is public or internal only, the intended audience is still an engineer. Your prefix should be obvious to anyone, even if they don't have the necessary context. We made this mistake with PaymentIntents and SetupIntents:

    pi_

    and

    seti_

    (notice how they aren't consistent)

    If we could go back and redo those we'd name them payint_ and setint_ respectively. Slightly longer prefixes make understanding them much easier. You might have heard of PaymentIntents but you might not connect the dots with pi_, but you likely will with payint_.

Ngonidzashe Mangudya

Thank you.

Juan Esteban Garcia

Paul, thanks for sharing this article - it really made me think a lot about the way our existing API is designed. I have one question for you and I'd appreciate your insights... how would you go about implementing this ID format in an existing API with hundreds of users? Any guidance would be highly appreciated.

THANK YOU.

Paul Asjes

That's definitely tricky, but I'd just bite the bullet and start using prefixes for all new objects first. Downside is that you'd have a world where you have a mixture of both IDs, but hopefully over time the prefixed IDs would become dominant.

You could run a migration to add prefixes to older IDs, but you'd have to make sure that users of your API can still use the old IDs without a prefix to ensure backwards compatibility.

Juan Esteban Garcia

This is helpful, Paul. Thank you so much.

chris damour • Edited

Exposing an id directly outside your app domain is fine for json-rpc.

RESTful practice is to always expose “ids” as URLs, so any client can fetch that resource and know what actions can be taken on it. If you change your HTTP services, you 301 the old href to the new one and the client updates all its references. It works beautifully, and it's so simple to make that leap. The primary reason clients want an ID is to plug it into some spot in another service's URL, but if you respond with that service's href in your initial response they never need to do that plugging to begin with.

Then if you're really worried about replay attacks or enumeration of your hypermedia controls, hrefs can use temporary URLs that only work for a window of time from a specific client (e.g. you append a signed JWT to them, with a client ID/IP present). The href is opaque to the honest client, and the nefarious client can't hack the HTTP request by just changing one part of it. And if the honest client takes too long, you 302 them to an auth challenge. Trust but validate with zero trust! But none of these things are options in JSON-RPC, which is what you seem to be working with… too bad

Mario DeSousa

Great article Paul! These are great insights on the benefits of the IDs with a prefix.
I was wondering if you ever faced any challenges where the business team decides to rename a certain object... as an example, changing "customer" to "client". In this case the IDs that start with "cust_" would lose the meaning, and possibly cause confusion?

Has Stripe ever faced this? How was it resolved? Just change the IDs going forward and leave legacy IDs as they are?

Also, have you ever faced issues with changes to the prefix breaking code for your developers? For example, if their code expects an ID with "cust_" and suddenly starts receiving an ID with "cli_"... has this been a problem?

Paul Asjes

It's tricky for sure. As a general rule once an object is named we don't ever rename it. We certainly would never start returning unexpected IDs without some sort of initial outreach to users. In your example, instead of changing "customer" to "client" we'd probably have those as two separate resources initially and deprecate the older one over time, whilst keeping the new resource backwards compatible with the older IDs. That way we don't ever unintentionally break anyone's integration.

Naming things is hard and we get it wrong sometimes too. One recent example I can think of is that we used pi_ for PaymentIntents. Later on we introduced SetupIntents, but couldn't use the prefix si_ as that was already being used for Subscription Items. We ended up in a world where we use pi_ and seti_, which is confusing as conceptually those are two similar objects.

We learned that when choosing prefixes it's better to lean on the more verbose side to be clear and to avoid future naming collisions. If we could go back and redo it I think we'd probably end up going with payint_ and setint_ for PaymentIntents and SetupIntents respectively.