DEV Community

Why you should never use an UUID as the primary key in SQL Databases

Frederik Van Lierde on December 09, 2023

Using a UUID (Universally Unique Identifier) as a primary key in SQL databases has both advantages and disadvantages. While UUIDs offer benefits in...

Read full post

vbilopav • Dec 9 '23 • Edited

And how would you ensure pk uniqueness on a same entity in a distributed system?

Because you can't do it with int autoincrement.

And I don't see how declaring a primary key field as uuid default gen_random_uuid() instead of int autoincrement is any different in terms of extra complexity.

UUID exists for a reason and people use it for a reason. That doesn't mean it's good fit for every scenario. Samo goes for int autoincrement.

Paweł bbkr Pabian • Dec 10 '23 • Edited

You can get unique autoincremented IDs in distribited systems. For example in MySQL you can configure autoincrement increment and offset. Using those first server will generate 1, 101, 201, 301, 401, etc (increment 100 + offset 1). Second server will generate 2, 102, 202, 302, 402, etc (increment 100 + offset 2). Very simple and effective. And you can migrate those rows between servers without risk of collision.

mkoehrer • Dec 11 '23

This will not work with anonymus replicates (for example notebooks running a local database - my old company did this with MS Server Replication -

The only thing that will work in this scenario are uuids

mjaggard • Dec 10 '23

That is definitely not simple. You need to know (at least approximately) how many servers you'll have, you also presumably need to either number them somehow and then you need to get your next increment value before creating the object in some cases to be truly atomic from the application user's point of view.

Paweł bbkr Pabian • Dec 10 '23

Bigint unsigned with increment 1000 solves this issue entirely. Be realistic.

Yes, numbering servers requires discipline and misconfiguration can be fatal.

As for ability to generate ID before actual insert and never hit reordering penalty - same trick can be applied. Just use sequence generator.

I think this is great topic for more advanced post. OP barely scratched the surface, because in almost all relational databases there must be PK / UK for data internal ordering and row replication. Using UUIDs has nasty consequences in large scale. I'll write my own post soon, clearing few misconceptions around PK generators.

csgeek • Dec 10 '23

Why wouldn't you do what basically every distributed database does? Hash the pkey and split that across the n number of nodes available.

Your approach won't scale as soon as the are not keys used then you owned for.

Paweł bbkr Pabian • Dec 10 '23

Because databases like MySQL or PostgreSQL organize data by PK internally, so using UUID or any other non incremental sequence results in huuuuuuuge performance drop on inserts.

Michael Kochell • Dec 10 '23

Pretty clickbaity. "Why you should never" is a bit extreme. At least explain a horror story that has scarred you personally or something that caused you to write this. I use UUIDs at work and it has worked out fine for me.

Rubén Chiquin • Dec 10 '23 • Edited

There's a spanish chanel who is well spoken on this subject that made a video arguing the contrary.

youtu.be/wR5b0OhbUyw?si=JeJmicrHey...

It's main argument is the fact that id's never travel from the backend to the frontend.

An argument against not using them is security. Regular id's are ordered. If you were to have an endpoint like users/:Id, everyone could know every user id just by counting (user/1, user/2, user/3...). It could be solved with some Salt, but then you get more complexity..

They also provide multiple ways to aliviate some of the pains that are generated by using UUID.

Deserves a watch! Anyway good post

RadekHavelka • Dec 9 '23

And lastly, altho the chance of collision / two uuids being the same is very small, it still exists ...

Artur Mostowski • Dec 10 '23

Yeah, but you should worry more about spontaneous combustion

karlatt • Dec 20 '23

Do you know they say you have enough unique UUID to number all the atoms of the known universe ? saying the risk of collision is existing, is like saying you can win the lottery 10 times in a row , as the probability exists to , it should be a number like 0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001 chances on two billions
In fact, ANY probability can find a singularity where it can happen: I'll be happy if I win the lottery just once, and all the poor morons like me who spend money in a ticket would be happy too ... But the fact is that there are a lot more people who lose than people who win on each lottery session. So the probability you loose ten times in a row is much much bigger than my first assertion .
If you want to say something that gives structured information , you have to provide all the variables.

So how much probability do you have to have a collision against the probability of having a unique UUID ? This would be interesting and just show that the chance of collision is so small that it would be one of the biggest miracles of all times if it happens once

RadekHavelka • Dec 21 '23

the problem isnt that the chance is LOW, the problem is that the chance IS. There is no guarantee that this will not happen if you generate two uuids, thats the point, but it is less significant than what reasons were mentioned in the article, from usability and logic point of view UUIDs makes no sense.

Joachim Lous • Dec 11 '23

Can be viewed as a transient error, since a retry will fix it. But do make sure id generation is part of the retry.

Adaptive Shield Matrix • Dec 11 '23 • Edited

Currently one of the most popular uuid generators
github.com/paralleldrive/cuid2

Performance loss of uuid vs integer -> is/was about 6% in real world systems in one article I read (measure your own data) -> so nearly completely negletible

Readability and Debugging: -> if you have 7-9 character long integers -> readability is same as an uuid

Clustering -> adressed by cuid2
Fragmentation -> issue is negletable, since only a minor perf loss

Application Complexity -> completely wrong
uuid massively decrease Client-Server complexity, because the id can be generated by the client/browser without having to wait for and parse the server response. This reduction in architectural complexity is more worth than any perf loss suffered by the db.

Richard Crawshaw • Dec 10 '23

I've worked with a couple of SQL Server databases where GUIDs had been used for clustered primary keys. The performance in those areas truly sucked: a couple of orders of magnitude worse than where ints were used.

If a GUID was deemed necessary as an ID my first choice for a design would be to split that from the clustered key. Put the GUID in a non-clustered UNIQUE index and the int /bigint PK as an auto-incremented clustered index. Depending on the other data requirements I might split that from the rest of the data, putting the real data in one or more "extension" tables using the same PK and a 1:1 relationship.

immōtus • Dec 11 '23

Frederik, what methodology did you use to draw these conclusions?

Without first describing the scientific method you used, all these are just unsubstantiated hypothesis (except for human readability and storage issues) and nothing else, really.

Ivan Zakutnii • Dec 10 '23

100% agreed.

However, we are using UUID as the PKeys, but I suppose it is suitable for microservices, at least for the time being.

Someday, it might come back to haunt us... I am almost sure it will :D

Neil • Dec 11 '23

As long as these are comb/sequential ID's you will have no issues even in indexing but if they are not and are not sequential then you may. I have experience first hand in this.

Ivan Zakutnii • Dec 14 '23

Sorry, I don't get it here.

Zohar Peled • Dec 10 '23

I wouldn't say never, there are situations where it's actually better to use a UUID over simple 32 bit integers, however as with any other tool, you should know what you're doing if you want to get good results.

FJones • Dec 10 '23

I would add: Locking behaviour on UUIDs can be surprising. When using UUIDs as a PK (or even unique key), partial locking can result in ranges being locked that are completely unrelated, due to the effective randomness of UUIDs. Deleting an entry from a long time ago can still result in a lock that may impact a fresh insert, for example.

Joachim Lous • Dec 11 '23

On the upside, accessing a (usually) recent one doesn't lock al the other recent ones, where the most work is probably ongoing.

Todd • Dec 11 '23

The upside to uuids as primary keys far out way any down side on modern hardware. Us uuids guys won this debate 20 years ago. The biggest win from my perspective, knowing what the key of an object will be before you write it to the database simplifies the overall architecture.

Mattias Nordqvist • Dec 23 '23

You can do this with numbers as well!

Saurabh Rai • Dec 10 '23

This reminded me of this video:

Tausiq Samantaray • Dec 13 '23 • Edited

You know what, you should write about solutions to this, instead of straight ratioing uuid. If a beginner reads this they will, yeah autoincrement is a good idea, but its not, cuz lets say if id is exposed in endpoints, yeah your DX(Developer Experience) is improved, but bad guys sitting behind you, their DX is also far better than you. They will simply sneak in and type "Hey, give me the /products/123, then /product/122 like that. So, autoincrement is a terrible idea. According to you, UUIDs are bad (but not for me). Politely asking suggest some solutions, if UUIDs and autoincrement are bad.
Thanks.

Frederik Van Lierde • Dec 13 '23

I never expose internal keys .
Why would you do that ?

Also I didn't say UUId are bad. I am using them all the time.
I just don't use then for primary keys and foreign keys

Anurag Bhatt • Dec 11 '23

I guess we can use UUID concept on mixed basis, Like for user Identification or payment id we can use uuid as it will be unique and hard to crack, where as for smaller concepts like product ID, customer ID, etc, we can use normal Smaller datatype

Tommy • Dec 11 '23

I've seen plenty of social media applications that did exactly this.

It was a bit awkward when I could manually increase the number in the URL and see strangers profile pictures.

I know you should have more security than that. But just having a number that increases makes your sites a hell of a lot easier to scrape

William Long II • Dec 11 '23

Honestly think it's another case of know your requirements. Their are valid arguments for either usage ie size ints win no matter what basically a big int if your going nuts weighs in at 64 bits and a uuid is 128 bit number or 32 hex chars and 4 dashes if you storing it char method. Uuid is in my opinion and as others have mentioned much easier to get atomicity out of then ints especially in distributed systems. That being said we just rolled our own in most of our projects mixing 64 bit timestamp and 32-64 bit randomness so weigh in at uuid level or just under but get a sequentially generated first 64 bits taking inspiration from nosql style id's. To each their own just is a matter of what fits your needs. Nice article though definitely some.stuff to think on when making choices.

0xf • Dec 11 '23

Insist on credible evidence to substantiate your claims, as your current discussion lacks empirical support, additionally, various statements seem baseless and lack credibility.
If you manage a substantial database and harbor concerns about UUIDs, contemplate replacing your UUID primary key with a Snowflake ID, which proves more effective than the conventional integer auto-increment.
en.m.wikipedia.org/wiki/Snowflake_ID

Note: avoid using chatgpt in future articles.

Mattias Nordqvist • Jan 13

I've been using snowflake IDs exclusively the last 2 years

Neil • Dec 11 '23

I used to think this when i was a junior developer. But this article is WRONG and should instead read ( why you should not use a non sequential or non-comb type UUID ). If you are going to use UUID/GUID's for Id fields then my recommendation is to use sequential or comb guids. We use these in our large scale SaaS solutions and they are much better in many ways than integers. They are performant ( comparable to integers for indexing ) and can be generated within the app layer or db layer and can be generated on distributed systems without clashes. Anyone reading this article should first read up on the above points because restricting yourself based on this article will really hinder your ability to build large-scale applications. integers are fine for small scale e.g a few million record tables but when you start to move to large scale and more importantly in browser record creation such as ticketing systems etc you WILL have to use UUIDs or some kind of string. At this point, Comb GUIDs become the go-to solution. @frederik_vl I urge you to amend your original article as it is factually incorrect and missleading.

Neil • Dec 11 '23

Further more to provide additional context to my above comment. There are some things people should be aware of regarding GUIDs that not everybody is.

Only v4 guid's are cryptographically secure and can be used for ( security by obscurity URL's, such as when you use them for password resets, though this is somewhat debateable and is a business decision )
Sequential/Comb GUIDs are by their nature guessable and should never be used externally for ( Security by obscurity reasons ).
The size(4-byte) of GUID's are obviously larger than an integer field so if you live in 1988 then this may be a concern, otherwise it is a non-issue.

There are many articles that compare the overall performance of int's vs ( newsequentialId as it is in SQL Server or Comb GUID's) and i am not going to waste time linking through to them. What i would refer people to is the following repo that covers some of this and provides a way to generate sequential guids across different application layers ( COMB ).
github.com/richardtallent/RT.Comb

Neil Crow • Dec 11 '23

"Never" is the wrong with for your heading.
Some of the behaviours that you describe are desirable, e.g clustering vs row distribution.
UUID's offer better security because the next one can't be guessed.
Also the debugging argument is pathetic, you should be using copy paste.

Artur Mostowski • Dec 10 '23

The article is not only ai generated, but also wrong

Abraham • Dec 11 '23

Suspiciously chatgpt styled indeed

Clipso • Dec 15 '23

When comparing the collision probability of NanoID and UUID, here's what you need to know:

UUID (Universally Unique Identifier):

Uses a 128-bit random number, represented as 36 characters (hexadecimal digits and hyphens).
Collision probability is extremely low for most practical purposes. Estimates suggest a negligible chance of collision even with billions of generated IDs.
However, the collision probability increases with the number of generated IDs and depends on the specific version of UUID (e.g., UUIDv4 is considered more collision-resistant than UUIDv1).

NanoID:

Uses a smaller alphabet (typically alphanumeric + symbols) and generates shorter IDs (around 21 characters on average).
Employs a cryptographically secure random number generator for higher randomness.
While the smaller alphabet theoretically increases the collision probability compared to UUID, the cryptographic randomness and careful algorithm design make collisions highly unlikely even with large datasets.

Collision Comparison:

Both UUID and NanoID offer very low collision probability, appropriate for most real-world scenarios.
NanoID offers a shorter and possibly more "unique" appearance due to the larger alphabet.
If absolute certainty of uniqueness is paramount, UUIDv4 might be slightly safer due to its larger state space. However, even NanoID's estimated collision probability is practically negligible for most applications.

Choosing the right ID:

Consider the trade-off between ID length and collision probability.
If conciseness is crucial, NanoID provides shorter IDs without a significant risk of collisions.
If absolute assurance of uniqueness is critical, UUIDv4 might be preferable.
For most cases, either technology is a valid choice.

Ultimately, the decision depends on your specific project requirements and priorities.

I hope this explanation clarifies the collision differences between NanoID and UUID. Feel free to ask if you have any further questions!

Mike Stemle • Dec 9 '23

There are some deficiencies in these explanations, and I think some more testing is in order.

Joachim Lous • Dec 11 '23 • Edited

Mandating that keys originate in your persistence layer is a huge architecture decision, and as such should always be an informed one.
Sometimes warranted, I'm sure, but in lots of modern cases: just an egregious case of premature optimisation, introducing unnecessary dependencies, complexity, and limits on your design space.
The cost is real, so you better make sure the benefit is too.
Not every app is or should be db-centric.

Steven Fisher | Mehen CTO • Jan 1

This doesn't solve all the problems listed above, but it definitely reduces the key size:

Base64 encode your UUID and you'll get a 22 character ID. That's significantly shorter than UUID by itself.

Note that you'll probably want to make it URL safe by replacing all "/" and "+" characters with something like "-" and "_".

Trần Vũ Quyết • Dec 18 '23

For me every programming termins exist for some reasons, of course with the unique and identified id for the schema in the database, it would be the best choice for choosing unique autoincremented IDs. But in some specific cases, it would be again best choice if starting with the use of uuid. For me I used to make the use of them for generating id of receipt cause it fits my wishes.

Christian Ledermann • Dec 9 '23

Do you think that UUID7 may have an advantage in DB scenarios?

Frederik Van Lierde • Dec 9 '23

I don't think any non-number primary key can beat the number primary key
Example, the fragmentation is not solved with UUID7

Joachim Lous • Dec 11 '23

everything is numbers

Nikunj Bhatt • Dec 12 '23

Right. We see Universally Unique Ids as characters and bytes but they are just numbers, 0s and 1s, for the database and machines; however, the old UUIDs weren't that simple. The newer generation of UUIDs are based on timestamp and so are sequential, and therefore they are not much significantly different than integer ids.

William Knak • Dec 11 '23 • Edited

I had a scenario and solve like this:

Multiple Android devices with SQLite inserting new records, working mostly offline.
SQLite tables, with related FKs (update/delete cascading)
When generating a new ID on the device, I get the LOWEST number (yeah), and just to make that clear, I make sure it is less than NEGATIVE 1.000.000

This is my new inserted Id:
-1000001
Next one
-1000002, -1000003

So this records are pending to be “synchronized”.

To know which records need to be inserted, just select the negative IDs.

After successfully synchronizing them, the devices IDs are CASCADE updates to the IDs generated at the server, at a single transaction if possible. All tables / FKs are correctly updated.

Never got an issue related with duplicated at server side, nor device.

Did you ever use Update Cascade before? SQLite and many other databases have this feature.

I also achieve a good synch strategy for updating as well as deleting records with other approaches.

Hélio oliveira • Dec 14 '23 • Edited

I not even think UUID as UID anymore, I prefer ULID since it breaks the disadvantages of UUID.

Robin Bastiaan • Jan 7

UUID's have there use and place, it just happens to be not as a primary key.

Antoine Jésus • Dec 16 '23

What do you think about NanoID?
It can generate unique IDs with a fixed size and a custom format depending on your needs

gks • Dec 16 '23

linkedin.com/pulse/ulid-way-genera...

gks • Dec 16 '23

ULID??

Liam Ellis • Dec 15 '23

This feels like low effort AI

Abdulrhman A. AlKhodiry • Dec 11 '23

I think a better solution is to use ULID. As it starts with timestamp.

Gregory Gaines • Dec 11 '23

There are multiple variations and iterations of UUID that solve the issues above if you looked for them.

So far this article seems all hypothetical, backed by hastily gathered information.

Martin Omander • Dec 11 '23

Interesting article. It would be helpful of you include measurements when you make claims about performance. Otherwise readers might think it's just your personal opinion.

Anant Parmar • Dec 11 '23

What are your thoughts on ULIDs, they address many of the concern you pointed out

Christian GO • Dec 10 '23

Interesting point, thanks for sharing!

Leonardo Volpatto • Dec 10 '23

UUID V7 and postgres. The better choice to avoid those "why you should never use uuid..."

bop • Dec 10 '23

I think CUID2 solves some of the problems we have here. But I think the best scenario is having both. One unique id for the entity and incremental for relations and etc..