DEV Community

Why you should never use an UUID as the primary key in SQL Databases

Frederik Van Lierde on December 09, 2023

Using a UUID (Universally Unique Identifier) as a primary key in SQL databases has both advantages and disadvantages. While UUIDs offer benefits in...
Collapse
 
vbilopav profile image
vbilopav • Edited

And how would you ensure pk uniqueness on a same entity in a distributed system?

Because you can't do it with int autoincrement.

And I don't see how declaring a primary key field as uuid default gen_random_uuid() instead of int autoincrement is any different in terms of extra complexity.

UUID exists for a reason and people use it for a reason. That doesn't mean it's good fit for every scenario. Samo goes for int autoincrement.

Collapse
 
bbkr profile image
Paweł bbkr Pabian • Edited

You can get unique autoincremented IDs in distribited systems. For example in MySQL you can configure autoincrement increment and offset. Using those first server will generate 1, 101, 201, 301, 401, etc (increment 100 + offset 1). Second server will generate 2, 102, 202, 302, 402, etc (increment 100 + offset 2). Very simple and effective. And you can migrate those rows between servers without risk of collision.

Collapse
 
mkoehrer profile image
mkoehrer

This will not work with anonymus replicates (for example notebooks running a local database - my old company did this with MS Server Replication -

The only thing that will work in this scenario are uuids

Collapse
 
mjaggard profile image
mjaggard

That is definitely not simple. You need to know (at least approximately) how many servers you'll have, you also presumably need to either number them somehow and then you need to get your next increment value before creating the object in some cases to be truly atomic from the application user's point of view.

Thread Thread
 
bbkr profile image
Paweł bbkr Pabian

Bigint unsigned with increment 1000 solves this issue entirely. Be realistic.

Yes, numbering servers requires discipline and misconfiguration can be fatal.

As for ability to generate ID before actual insert and never hit reordering penalty - same trick can be applied. Just use sequence generator.

I think this is great topic for more advanced post. OP barely scratched the surface, because in almost all relational databases there must be PK / UK for data internal ordering and row replication. Using UUIDs has nasty consequences in large scale. I'll write my own post soon, clearing few misconceptions around PK generators.

Collapse
 
csgeek profile image
csgeek

Why wouldn't you do what basically every distributed database does? Hash the pkey and split that across the n number of nodes available.

Your approach won't scale as soon as the are not keys used then you owned for.

Thread Thread
 
bbkr profile image
Paweł bbkr Pabian

Because databases like MySQL or PostgreSQL organize data by PK internally, so using UUID or any other non incremental sequence results in huuuuuuuge performance drop on inserts.

Collapse
 
mickmister profile image
Michael Kochell

Pretty clickbaity. "Why you should never" is a bit extreme. At least explain a horror story that has scarred you personally or something that caused you to write this. I use UUIDs at work and it has worked out fine for me.

Collapse
 
randreu28 profile image
Rubén Chiquin • Edited

There's a spanish chanel who is well spoken on this subject that made a video arguing the contrary.

youtu.be/wR5b0OhbUyw?si=JeJmicrHey...

It's main argument is the fact that id's never travel from the backend to the frontend.

An argument against not using them is security. Regular id's are ordered. If you were to have an endpoint like users/:Id, everyone could know every user id just by counting (user/1, user/2, user/3...). It could be solved with some Salt, but then you get more complexity..

They also provide multiple ways to aliviate some of the pains that are generated by using UUID.

Deserves a watch! Anyway good post

Collapse
 
palalet profile image
RadekHavelka

And lastly, altho the chance of collision / two uuids being the same is very small, it still exists ...

Collapse
 
vulwsztyn profile image
Artur Mostowski

Yeah, but you should worry more about spontaneous combustion

Collapse
 
karlatt profile image
karlatt

Do you know they say you have enough unique UUID to number all the atoms of the known universe ? saying the risk of collision is existing, is like saying you can win the lottery 10 times in a row , as the probability exists to , it should be a number like 0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001 chances on two billions
In fact, ANY probability can find a singularity where it can happen: I'll be happy if I win the lottery just once, and all the poor morons like me who spend money in a ticket would be happy too ... But the fact is that there are a lot more people who lose than people who win on each lottery session. So the probability you loose ten times in a row is much much bigger than my first assertion .
If you want to say something that gives structured information , you have to provide all the variables.

So how much probability do you have to have a collision against the probability of having a unique UUID ? This would be interesting and just show that the chance of collision is so small that it would be one of the biggest miracles of all times if it happens once

Collapse
 
palalet profile image
RadekHavelka

the problem isnt that the chance is LOW, the problem is that the chance IS. There is no guarantee that this will not happen if you generate two uuids, thats the point, but it is less significant than what reasons were mentioned in the article, from usability and logic point of view UUIDs makes no sense.

Collapse
 
jlous profile image
Joachim Lous

Can be viewed as a transient error, since a retry will fix it. But do make sure id generation is part of the retry.

Collapse
 
adaptive-shield-matrix profile image
Adaptive Shield Matrix • Edited

Currently one of the most popular uuid generators
github.com/paralleldrive/cuid2

Performance loss of uuid vs integer -> is/was about 6% in real world systems in one article I read (measure your own data) -> so nearly completely negletible

Readability and Debugging: -> if you have 7-9 character long integers -> readability is same as an uuid

Clustering -> adressed by cuid2
Fragmentation -> issue is negletable, since only a minor perf loss

Application Complexity -> completely wrong
uuid massively decrease Client-Server complexity, because the id can be generated by the client/browser without having to wait for and parse the server response. This reduction in architectural complexity is more worth than any perf loss suffered by the db.

Collapse
 
richardcrawshaw profile image
Richard Crawshaw

I've worked with a couple of SQL Server databases where GUIDs had been used for clustered primary keys. The performance in those areas truly sucked: a couple of orders of magnitude worse than where ints were used.

If a GUID was deemed necessary as an ID my first choice for a design would be to split that from the clustered key. Put the GUID in a non-clustered UNIQUE index and the int /bigint PK as an auto-incremented clustered index. Depending on the other data requirements I might split that from the rest of the data, putting the real data in one or more "extension" tables using the same PK and a 1:1 relationship.

Collapse
 
immotus profile image
immōtus

Frederik, what methodology did you use to draw these conclusions?

Without first describing the scientific method you used, all these are just unsubstantiated hypothesis (except for human readability and storage issues) and nothing else, really.

Collapse
 
m0n0x41d profile image
Ivan Zakutnii

100% agreed.

However, we are using UUID as the PKeys, but I suppose it is suitable for microservices, at least for the time being.

Someday, it might come back to haunt us... I am almost sure it will :D

Collapse
 
hewitt profile image
Neil

As long as these are comb/sequential ID's you will have no issues even in indexing but if they are not and are not sequential then you may. I have experience first hand in this.

Collapse
 
m0n0x41d profile image
Ivan Zakutnii

Sorry, I don't get it here.

Collapse
 
peledzohar profile image
Zohar Peled

I wouldn't say never, there are situations where it's actually better to use a UUID over simple 32 bit integers, however as with any other tool, you should know what you're doing if you want to get good results.

Collapse
 
fjones profile image
FJones

I would add: Locking behaviour on UUIDs can be surprising. When using UUIDs as a PK (or even unique key), partial locking can result in ranges being locked that are completely unrelated, due to the effective randomness of UUIDs. Deleting an entry from a long time ago can still result in a lock that may impact a fresh insert, for example.

Collapse
 
jlous profile image
Joachim Lous

On the upside, accessing a (usually) recent one doesn't lock al the other recent ones, where the most work is probably ongoing.

Collapse
 
todds profile image
Todd

The upside to uuids as primary keys far out way any down side on modern hardware. Us uuids guys won this debate 20 years ago. The biggest win from my perspective, knowing what the key of an object will be before you write it to the database simplifies the overall architecture.

Collapse
 
mattiasnordqvist profile image
Mattias Nordqvist

You can do this with numbers as well!

Collapse
 
srbhr profile image
Saurabh Rai

This reminded me of this video:

Collapse
 
tausiq2003 profile image
Tausiq Samantaray • Edited

You know what, you should write about solutions to this, instead of straight ratioing uuid. If a beginner reads this they will, yeah autoincrement is a good idea, but its not, cuz lets say if id is exposed in endpoints, yeah your DX(Developer Experience) is improved, but bad guys sitting behind you, their DX is also far better than you. They will simply sneak in and type "Hey, give me the /products/123, then /product/122 like that. So, autoincrement is a terrible idea. According to you, UUIDs are bad (but not for me). Politely asking suggest some solutions, if UUIDs and autoincrement are bad.
Thanks.

Collapse
 
frederik_vl profile image
Frederik Van Lierde

I never expose internal keys .
Why would you do that ?

Also I didn't say UUId are bad. I am using them all the time.
I just don't use then for primary keys and foreign keys

Collapse
 
anuragbhatt1805 profile image
Anurag Bhatt

I guess we can use UUID concept on mixed basis, Like for user Identification or payment id we can use uuid as it will be unique and hard to crack, where as for smaller concepts like product ID, customer ID, etc, we can use normal Smaller datatype

Collapse
 
ballzer0 profile image
Tommy

I've seen plenty of social media applications that did exactly this.

It was a bit awkward when I could manually increase the number in the URL and see strangers profile pictures.

I know you should have more security than that. But just having a number that increases makes your sites a hell of a lot easier to scrape

Collapse
 
williamlongii profile image
William Long II

Honestly think it's another case of know your requirements. Their are valid arguments for either usage ie size ints win no matter what basically a big int if your going nuts weighs in at 64 bits and a uuid is 128 bit number or 32 hex chars and 4 dashes if you storing it char method. Uuid is in my opinion and as others have mentioned much easier to get atomicity out of then ints especially in distributed systems. That being said we just rolled our own in most of our projects mixing 64 bit timestamp and 32-64 bit randomness so weigh in at uuid level or just under but get a sequentially generated first 64 bits taking inspiration from nosql style id's. To each their own just is a matter of what fits your needs. Nice article though definitely some.stuff to think on when making choices.

Collapse
 
0xf_0b1111 profile image
0xf

Insist on credible evidence to substantiate your claims, as your current discussion lacks empirical support, additionally, various statements seem baseless and lack credibility.
If you manage a substantial database and harbor concerns about UUIDs, contemplate replacing your UUID primary key with a Snowflake ID, which proves more effective than the conventional integer auto-increment.
en.m.wikipedia.org/wiki/Snowflake_ID

Note: avoid using chatgpt in future articles.

Collapse
 
mattiasnordqvist profile image
Mattias Nordqvist

I've been using snowflake IDs exclusively the last 2 years

Collapse
 
hewitt profile image
Neil

I used to think this when i was a junior developer. But this article is WRONG and should instead read ( why you should not use a non sequential or non-comb type UUID ). If you are going to use UUID/GUID's for Id fields then my recommendation is to use sequential or comb guids. We use these in our large scale SaaS solutions and they are much better in many ways than integers. They are performant ( comparable to integers for indexing ) and can be generated within the app layer or db layer and can be generated on distributed systems without clashes. Anyone reading this article should first read up on the above points because restricting yourself based on this article will really hinder your ability to build large-scale applications. integers are fine for small scale e.g a few million record tables but when you start to move to large scale and more importantly in browser record creation such as ticketing systems etc you WILL have to use UUIDs or some kind of string. At this point, Comb GUIDs become the go-to solution. @frederik_vl I urge you to amend your original article as it is factually incorrect and missleading.

Collapse
 
hewitt profile image
Neil

Further more to provide additional context to my above comment. There are some things people should be aware of regarding GUIDs that not everybody is.

  1. Only v4 guid's are cryptographically secure and can be used for ( security by obscurity URL's, such as when you use them for password resets, though this is somewhat debateable and is a business decision )
  2. Sequential/Comb GUIDs are by their nature guessable and should never be used externally for ( Security by obscurity reasons ).
  3. The size(4-byte) of GUID's are obviously larger than an integer field so if you live in 1988 then this may be a concern, otherwise it is a non-issue.

There are many articles that compare the overall performance of int's vs ( newsequentialId as it is in SQL Server or Comb GUID's) and i am not going to waste time linking through to them. What i would refer people to is the following repo that covers some of this and provides a way to generate sequential guids across different application layers ( COMB ).
github.com/richardtallent/RT.Comb

Collapse
 
neil_crow profile image
Neil Crow

"Never" is the wrong with for your heading.
Some of the behaviours that you describe are desirable, e.g clustering vs row distribution.
UUID's offer better security because the next one can't be guessed.
Also the debugging argument is pathetic, you should be using copy paste.

Collapse
 
vulwsztyn profile image
Artur Mostowski

The article is not only ai generated, but also wrong

Collapse
 
abrahamn profile image
Abraham

Suspiciously chatgpt styled indeed

Collapse
 
clipso profile image
Clipso

When comparing the collision probability of NanoID and UUID, here's what you need to know:

UUID (Universally Unique Identifier):

Uses a 128-bit random number, represented as 36 characters (hexadecimal digits and hyphens).
Collision probability is extremely low for most practical purposes. Estimates suggest a negligible chance of collision even with billions of generated IDs.
However, the collision probability increases with the number of generated IDs and depends on the specific version of UUID (e.g., UUIDv4 is considered more collision-resistant than UUIDv1).
Enter fullscreen mode Exit fullscreen mode

NanoID:

Uses a smaller alphabet (typically alphanumeric + symbols) and generates shorter IDs (around 21 characters on average).
Employs a cryptographically secure random number generator for higher randomness.
While the smaller alphabet theoretically increases the collision probability compared to UUID, the cryptographic randomness and careful algorithm design make collisions highly unlikely even with large datasets.
Enter fullscreen mode Exit fullscreen mode

Collision Comparison:

Both UUID and NanoID offer very low collision probability, appropriate for most real-world scenarios.
NanoID offers a shorter and possibly more "unique" appearance due to the larger alphabet.
If absolute certainty of uniqueness is paramount, UUIDv4 might be slightly safer due to its larger state space. However, even NanoID's estimated collision probability is practically negligible for most applications.
Enter fullscreen mode Exit fullscreen mode

Choosing the right ID:

Consider the trade-off between ID length and collision probability.
If conciseness is crucial, NanoID provides shorter IDs without a significant risk of collisions.
If absolute assurance of uniqueness is critical, UUIDv4 might be preferable.
For most cases, either technology is a valid choice.
Enter fullscreen mode Exit fullscreen mode

Ultimately, the decision depends on your specific project requirements and priorities.

I hope this explanation clarifies the collision differences between NanoID and UUID. Feel free to ask if you have any further questions!

Collapse
 
manchicken profile image
Mike Stemle

There are some deficiencies in these explanations, and I think some more testing is in order.

Collapse
 
jlous profile image
Joachim Lous • Edited

Mandating that keys originate in your persistence layer is a huge architecture decision, and as such should always be an informed one.
Sometimes warranted, I'm sure, but in lots of modern cases: just an egregious case of premature optimisation, introducing unnecessary dependencies, complexity, and limits on your design space.
The cost is real, so you better make sure the benefit is too.
Not every app is or should be db-centric.

Collapse
 
mehen_cto profile image
Steven Fisher | Mehen CTO

This doesn't solve all the problems listed above, but it definitely reduces the key size:

Base64 encode your UUID and you'll get a 22 character ID. That's significantly shorter than UUID by itself.

Note that you'll probably want to make it URL safe by replacing all "/" and "+" characters with something like "-" and "_".

Collapse
 
quyettranvu profile image
Trần Vũ Quyết

For me every programming termins exist for some reasons, of course with the unique and identified id for the schema in the database, it would be the best choice for choosing unique autoincremented IDs. But in some specific cases, it would be again best choice if starting with the use of uuid. For me I used to make the use of them for generating id of receipt cause it fits my wishes.

Collapse
 
ldrscke profile image
Christian Ledermann

Do you think that UUID7 may have an advantage in DB scenarios?

Collapse
 
frederik_vl profile image
Frederik Van Lierde

I don't think any non-number primary key can beat the number primary key
Example, the fragmentation is not solved with UUID7

Collapse
 
jlous profile image
Joachim Lous

everything is numbers

Thread Thread
 
nikunjbhatt profile image
Nikunj Bhatt

Right. We see Universally Unique Ids as characters and bytes but they are just numbers, 0s and 1s, for the database and machines; however, the old UUIDs weren't that simple. The newer generation of UUIDs are based on timestamp and so are sequential, and therefore they are not much significantly different than integer ids.

Collapse
 
wknak profile image
William Knak • Edited

I had a scenario and solve like this:

  1. Multiple Android devices with SQLite inserting new records, working mostly offline.
  2. SQLite tables, with related FKs (update/delete cascading)
  3. When generating a new ID on the device, I get the LOWEST number (yeah), and just to make that clear, I make sure it is less than NEGATIVE 1.000.000

This is my new inserted Id:
-1000001
Next one
-1000002, -1000003

So this records are pending to be “synchronized”.

To know which records need to be inserted, just select the negative IDs.

After successfully synchronizing them, the devices IDs are CASCADE updates to the IDs generated at the server, at a single transaction if possible. All tables / FKs are correctly updated.

Never got an issue related with duplicated at server side, nor device.

Did you ever use Update Cascade before? SQLite and many other databases have this feature.

I also achieve a good synch strategy for updating as well as deleting records with other approaches.

Collapse
 
insign profile image
Hélio oliveira • Edited

I not even think UUID as UID anymore, I prefer ULID since it breaks the disadvantages of UUID.

Collapse
 
robinbastiaan profile image
Robin Bastiaan

UUID's have there use and place, it just happens to be not as a primary key.

Collapse
 
tselmek profile image
Antoine Jésus

What do you think about NanoID?
It can generate unique IDs with a fixed size and a custom format depending on your needs

Collapse
 
gksk profile image
gks
Collapse
 
gksk profile image
gks

ULID??

Collapse
 
monichre profile image
Liam Ellis

This feels like low effort AI

Collapse
 
zeroows profile image
Abdulrhman A. AlKhodiry

I think a better solution is to use ULID. As it starts with timestamp.

Collapse
 
gregorygaines profile image
Gregory Gaines

There are multiple variations and iterations of UUID that solve the issues above if you looked for them.

So far this article seems all hypothetical, backed by hastily gathered information.

Collapse
 
momander profile image
Martin Omander

Interesting article. It would be helpful of you include measurements when you make claims about performance. Otherwise readers might think it's just your personal opinion.

Collapse
 
anantrp profile image
Anant Parmar

What are your thoughts on ULIDs, they address many of the concern you pointed out

Collapse
 
christian_go3 profile image
Christian GO

Interesting point, thanks for sharing!

Collapse
 
leovolpatto profile image
Leonardo Volpatto

UUID V7 and postgres. The better choice to avoid those "why you should never use uuid..."

Collapse
 
bop profile image
bop

I think CUID2 solves some of the problems we have here. But I think the best scenario is having both. One unique id for the entity and incremental for relations and etc..

Collapse
 
acip profile image
Ciprian Amariei

How Firebase generates unique distributed IDs firebase.blog/posts/2015/02/the-21...

Collapse
 
prithuadhikary profile image
Prithu Adhikary

Its difficult to agree to. Reason why its used is coz its difficult to guess. Integers are guessable. Autoincrement is a no no!

Collapse
 
prithuadhikary profile image
Prithu Adhikary

And often its not the index on primary keys that adds value. But ya, its the default one anyways.

Collapse
 
gpt14 profile image
GPT14

Great article! I'd have also included a couple of alternatives to UUID as well

Collapse
 
axoibc profile image
Bill

Isn’t this why KSUIDs were invented? It’s a sortable globally unique key.