Using a UUID (Universally Unique Identifier) as a primary key in SQL databases has both advantages and disadvantages. While UUIDs offer benefits in...
For further actions, you may consider blocking this person and/or reporting abuse
And how would you ensure pk uniqueness on a same entity in a distributed system?
Because you can't do it with int autoincrement.
And I don't see how declaring a primary key field as uuid default gen_random_uuid() instead of int autoincrement is any different in terms of extra complexity.
UUID exists for a reason and people use it for a reason. That doesn't mean it's good fit for every scenario. Samo goes for int autoincrement.
You can get unique autoincremented IDs in distribited systems. For example in MySQL you can configure autoincrement increment and offset. Using those first server will generate 1, 101, 201, 301, 401, etc (increment 100 + offset 1). Second server will generate 2, 102, 202, 302, 402, etc (increment 100 + offset 2). Very simple and effective. And you can migrate those rows between servers without risk of collision.
This will not work with anonymus replicates (for example notebooks running a local database - my old company did this with MS Server Replication -
The only thing that will work in this scenario are uuids
That is definitely not simple. You need to know (at least approximately) how many servers you'll have, you also presumably need to either number them somehow and then you need to get your next increment value before creating the object in some cases to be truly atomic from the application user's point of view.
Bigint unsigned with increment 1000 solves this issue entirely. Be realistic.
Yes, numbering servers requires discipline and misconfiguration can be fatal.
As for ability to generate ID before actual insert and never hit reordering penalty - same trick can be applied. Just use sequence generator.
I think this is great topic for more advanced post. OP barely scratched the surface, because in almost all relational databases there must be PK / UK for data internal ordering and row replication. Using UUIDs has nasty consequences in large scale. I'll write my own post soon, clearing few misconceptions around PK generators.
Why wouldn't you do what basically every distributed database does? Hash the pkey and split that across the n number of nodes available.
Your approach won't scale as soon as the are not keys used then you owned for.
Because databases like MySQL or PostgreSQL organize data by PK internally, so using UUID or any other non incremental sequence results in huuuuuuuge performance drop on inserts.
Pretty clickbaity. "Why you should never" is a bit extreme. At least explain a horror story that has scarred you personally or something that caused you to write this. I use UUIDs at work and it has worked out fine for me.
There's a spanish chanel who is well spoken on this subject that made a video arguing the contrary.
youtu.be/wR5b0OhbUyw?si=JeJmicrHey...
It's main argument is the fact that id's never travel from the backend to the frontend.
An argument against not using them is security. Regular id's are ordered. If you were to have an endpoint like users/:Id, everyone could know every user id just by counting (user/1, user/2, user/3...). It could be solved with some Salt, but then you get more complexity..
They also provide multiple ways to aliviate some of the pains that are generated by using UUID.
Deserves a watch! Anyway good post
And lastly, altho the chance of collision / two uuids being the same is very small, it still exists ...
Yeah, but you should worry more about spontaneous combustion
Do you know they say you have enough unique UUID to number all the atoms of the known universe ? saying the risk of collision is existing, is like saying you can win the lottery 10 times in a row , as the probability exists to , it should be a number like 0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001 chances on two billions
In fact, ANY probability can find a singularity where it can happen: I'll be happy if I win the lottery just once, and all the poor morons like me who spend money in a ticket would be happy too ... But the fact is that there are a lot more people who lose than people who win on each lottery session. So the probability you loose ten times in a row is much much bigger than my first assertion .
If you want to say something that gives structured information , you have to provide all the variables.
So how much probability do you have to have a collision against the probability of having a unique UUID ? This would be interesting and just show that the chance of collision is so small that it would be one of the biggest miracles of all times if it happens once
the problem isnt that the chance is LOW, the problem is that the chance IS. There is no guarantee that this will not happen if you generate two uuids, thats the point, but it is less significant than what reasons were mentioned in the article, from usability and logic point of view UUIDs makes no sense.
Can be viewed as a transient error, since a retry will fix it. But do make sure id generation is part of the retry.
Currently one of the most popular uuid generators
github.com/paralleldrive/cuid2
Performance loss of uuid vs integer -> is/was about 6% in real world systems in one article I read (measure your own data) -> so nearly completely negletible
Readability and Debugging: -> if you have 7-9 character long integers -> readability is same as an uuid
Clustering -> adressed by cuid2
Fragmentation -> issue is negletable, since only a minor perf loss
Application Complexity -> completely wrong
uuid massively decrease Client-Server complexity, because the id can be generated by the client/browser without having to wait for and parse the server response. This reduction in architectural complexity is more worth than any perf loss suffered by the db.
I've worked with a couple of SQL Server databases where GUIDs had been used for clustered primary keys. The performance in those areas truly sucked: a couple of orders of magnitude worse than where ints were used.
If a GUID was deemed necessary as an ID my first choice for a design would be to split that from the clustered key. Put the GUID in a non-clustered UNIQUE index and the int /bigint PK as an auto-incremented clustered index. Depending on the other data requirements I might split that from the rest of the data, putting the real data in one or more "extension" tables using the same PK and a 1:1 relationship.
Frederik, what methodology did you use to draw these conclusions?
Without first describing the scientific method you used, all these are just unsubstantiated hypothesis (except for human readability and storage issues) and nothing else, really.
100% agreed.
However, we are using UUID as the PKeys, but I suppose it is suitable for microservices, at least for the time being.
Someday, it might come back to haunt us... I am almost sure it will :D
As long as these are comb/sequential ID's you will have no issues even in indexing but if they are not and are not sequential then you may. I have experience first hand in this.
Sorry, I don't get it here.
I wouldn't say never, there are situations where it's actually better to use a UUID over simple 32 bit integers, however as with any other tool, you should know what you're doing if you want to get good results.
I would add: Locking behaviour on UUIDs can be surprising. When using UUIDs as a PK (or even unique key), partial locking can result in ranges being locked that are completely unrelated, due to the effective randomness of UUIDs. Deleting an entry from a long time ago can still result in a lock that may impact a fresh insert, for example.
On the upside, accessing a (usually) recent one doesn't lock al the other recent ones, where the most work is probably ongoing.
The upside to uuids as primary keys far out way any down side on modern hardware. Us uuids guys won this debate 20 years ago. The biggest win from my perspective, knowing what the key of an object will be before you write it to the database simplifies the overall architecture.
You can do this with numbers as well!
This reminded me of this video:
You know what, you should write about solutions to this, instead of straight ratioing uuid. If a beginner reads this they will, yeah autoincrement is a good idea, but its not, cuz lets say if id is exposed in endpoints, yeah your DX(Developer Experience) is improved, but bad guys sitting behind you, their DX is also far better than you. They will simply sneak in and type "Hey, give me the /products/123, then /product/122 like that. So, autoincrement is a terrible idea. According to you, UUIDs are bad (but not for me). Politely asking suggest some solutions, if UUIDs and autoincrement are bad.
Thanks.
I never expose internal keys .
Why would you do that ?
Also I didn't say UUId are bad. I am using them all the time.
I just don't use then for primary keys and foreign keys
I guess we can use UUID concept on mixed basis, Like for user Identification or payment id we can use uuid as it will be unique and hard to crack, where as for smaller concepts like product ID, customer ID, etc, we can use normal Smaller datatype
I've seen plenty of social media applications that did exactly this.
It was a bit awkward when I could manually increase the number in the URL and see strangers profile pictures.
I know you should have more security than that. But just having a number that increases makes your sites a hell of a lot easier to scrape
Honestly think it's another case of know your requirements. Their are valid arguments for either usage ie size ints win no matter what basically a big int if your going nuts weighs in at 64 bits and a uuid is 128 bit number or 32 hex chars and 4 dashes if you storing it char method. Uuid is in my opinion and as others have mentioned much easier to get atomicity out of then ints especially in distributed systems. That being said we just rolled our own in most of our projects mixing 64 bit timestamp and 32-64 bit randomness so weigh in at uuid level or just under but get a sequentially generated first 64 bits taking inspiration from nosql style id's. To each their own just is a matter of what fits your needs. Nice article though definitely some.stuff to think on when making choices.
Insist on credible evidence to substantiate your claims, as your current discussion lacks empirical support, additionally, various statements seem baseless and lack credibility.
If you manage a substantial database and harbor concerns about UUIDs, contemplate replacing your UUID primary key with a Snowflake ID, which proves more effective than the conventional integer auto-increment.
en.m.wikipedia.org/wiki/Snowflake_ID
Note: avoid using chatgpt in future articles.
I've been using snowflake IDs exclusively the last 2 years
I used to think this when i was a junior developer. But this article is WRONG and should instead read ( why you should not use a non sequential or non-comb type UUID ). If you are going to use UUID/GUID's for Id fields then my recommendation is to use sequential or comb guids. We use these in our large scale SaaS solutions and they are much better in many ways than integers. They are performant ( comparable to integers for indexing ) and can be generated within the app layer or db layer and can be generated on distributed systems without clashes. Anyone reading this article should first read up on the above points because restricting yourself based on this article will really hinder your ability to build large-scale applications. integers are fine for small scale e.g a few million record tables but when you start to move to large scale and more importantly in browser record creation such as ticketing systems etc you WILL have to use UUIDs or some kind of string. At this point, Comb GUIDs become the go-to solution. @frederik_vl I urge you to amend your original article as it is factually incorrect and missleading.
Further more to provide additional context to my above comment. There are some things people should be aware of regarding GUIDs that not everybody is.
There are many articles that compare the overall performance of int's vs ( newsequentialId as it is in SQL Server or Comb GUID's) and i am not going to waste time linking through to them. What i would refer people to is the following repo that covers some of this and provides a way to generate sequential guids across different application layers ( COMB ).
github.com/richardtallent/RT.Comb
"Never" is the wrong with for your heading.
Some of the behaviours that you describe are desirable, e.g clustering vs row distribution.
UUID's offer better security because the next one can't be guessed.
Also the debugging argument is pathetic, you should be using copy paste.
The article is not only ai generated, but also wrong
Suspiciously chatgpt styled indeed
When comparing the collision probability of NanoID and UUID, here's what you need to know:
UUID (Universally Unique Identifier):
NanoID:
Collision Comparison:
Choosing the right ID:
Ultimately, the decision depends on your specific project requirements and priorities.
I hope this explanation clarifies the collision differences between NanoID and UUID. Feel free to ask if you have any further questions!
There are some deficiencies in these explanations, and I think some more testing is in order.
Mandating that keys originate in your persistence layer is a huge architecture decision, and as such should always be an informed one.
Sometimes warranted, I'm sure, but in lots of modern cases: just an egregious case of premature optimisation, introducing unnecessary dependencies, complexity, and limits on your design space.
The cost is real, so you better make sure the benefit is too.
Not every app is or should be db-centric.
This doesn't solve all the problems listed above, but it definitely reduces the key size:
Base64 encode your UUID and you'll get a 22 character ID. That's significantly shorter than UUID by itself.
Note that you'll probably want to make it URL safe by replacing all "/" and "+" characters with something like "-" and "_".
For me every programming termins exist for some reasons, of course with the unique and identified id for the schema in the database, it would be the best choice for choosing unique autoincremented IDs. But in some specific cases, it would be again best choice if starting with the use of uuid. For me I used to make the use of them for generating id of receipt cause it fits my wishes.
Do you think that UUID7 may have an advantage in DB scenarios?
I don't think any non-number primary key can beat the number primary key
Example, the fragmentation is not solved with UUID7
everything is numbers
Right. We see Universally Unique Ids as characters and bytes but they are just numbers, 0s and 1s, for the database and machines; however, the old UUIDs weren't that simple. The newer generation of UUIDs are based on timestamp and so are sequential, and therefore they are not much significantly different than integer ids.
I had a scenario and solve like this:
This is my new inserted Id:
-1000001
Next one
-1000002, -1000003
So this records are pending to be “synchronized”.
To know which records need to be inserted, just select the negative IDs.
After successfully synchronizing them, the devices IDs are CASCADE updates to the IDs generated at the server, at a single transaction if possible. All tables / FKs are correctly updated.
Never got an issue related with duplicated at server side, nor device.
Did you ever use Update Cascade before? SQLite and many other databases have this feature.
I also achieve a good synch strategy for updating as well as deleting records with other approaches.
I not even think UUID as UID anymore, I prefer ULID since it breaks the disadvantages of UUID.
UUID's have there use and place, it just happens to be not as a primary key.
What do you think about NanoID?
It can generate unique IDs with a fixed size and a custom format depending on your needs
linkedin.com/pulse/ulid-way-genera...
ULID??
This feels like low effort AI
I think a better solution is to use ULID. As it starts with timestamp.
There are multiple variations and iterations of UUID that solve the issues above if you looked for them.
So far this article seems all hypothetical, backed by hastily gathered information.
Interesting article. It would be helpful of you include measurements when you make claims about performance. Otherwise readers might think it's just your personal opinion.
What are your thoughts on ULIDs, they address many of the concern you pointed out
Interesting point, thanks for sharing!
UUID V7 and postgres. The better choice to avoid those "why you should never use uuid..."
I think CUID2 solves some of the problems we have here. But I think the best scenario is having both. One unique id for the entity and incremental for relations and etc..
How Firebase generates unique distributed IDs firebase.blog/posts/2015/02/the-21...
Its difficult to agree to. Reason why its used is coz its difficult to guess. Integers are guessable. Autoincrement is a no no!
And often its not the index on primary keys that adds value. But ya, its the default one anyways.
Great article! I'd have also included a couple of alternatives to UUID as well
Isn’t this why KSUIDs were invented? It’s a sortable globally unique key.