Thomas Hansen for AINIRO.IO

Posted on Aug 18, 2022 • Edited on Jul 1 • Originally published at aista.com

DO NOT trust your frontend validators

#javascript #react #angular

Trusting frontend validation logic is like trusting a thief when he says he won't steal your wallet. Frontend validation is for convenience, to reduce HTTP requests, not for ensuring data quality. You can add TypeScript validators until your face turns red, and the moment some guy creates another frontend to consume your API, your validators are basically useless. Hyperlambda validators on the other side are executing on your server, which makes them much more valuable.

Data quality and Hyperlambda validators

I have worked with 50+ companies during my 25+ years as en enterprise software developer. Most of these companies struggled with poor data quality. Phone numbers would be written like; "John Doe", or "foo@bar.com". Manually going through 500,000 records to clean up garbage data is literally impossible. This reduces the data quality your employer has. Which again results in more trouble doing business. Which again leads to less profit. Which again leads to less salary for you. Data quality IS KING!

In the following video I am illustrating how to create server side validators with Hyperlambda, which is a much better alternative if you're to chose only one. If you want to follow the video hands on, you can register a Magic cloudlet here.

Adding server side validation ensures data quality, assuming all data goes in and out of your database through your backend API. This results in higher data quality over time, which again results in better business.

Only relying upon frontend validators created with for instance React or Angular, is asking for trouble. Very soon somebody will want to create another frontend client using for instance Swift or the Android SDK. As they do, they're going to bypass your validator logic. If they do, you will end up with garbage data in your database.

DO NOT trust frontend validators (alone!) - Because they're "mostly useless" from a data quality perspective

With Hyperlambda you've got validators for every imaginable purpose, ranging from email validators to regular expression validators. Don't trust users of your web API to supply you valid data, ensure it using validators.

Top comments (42)

Aaron Reese • Aug 18 '22

Don't do your validation in the API/middleware either! To be truly robust all constraints should be built into the database and APIs will call stored procedures for writes and stored procedures or views for reads. Whatever you do don't use the Repository pattern or ORMs for connecting to the database. At the end of the day the database is your one source of truth.
You should however carry out validation and give feedback at front end and API levels as sending data you know to be bad across the network is expensive, but ultimately when using a REST protocol you are unable to know if your data interaction is still valid until you try and hit the data store.

Thomas Hansen • Aug 19 '22

Don't do your validation in the API/middleware either! To be truly robust all constraints should be built into the database and APIs will call stored procedures

I love the idea of moving validation logic as close to my data storage as possible. However, I also don't like putting business logic directly into my database. Yes, it's an oxymoron, I know :D

But this is a matter of taste I guess. I see your point here, especially when you've got multiple APIs accessing the same database - However, I suspect it's difficult to prevent users from using raw insert and update statements anyways, which of course would bypass the stored procedure inserts and updates ...

However, I think this is a matter of taste tbh with you, and you're definitely "closer" to my personal opinion than the guys simply adding frontend RegEx validators to the mix ... ;)

Jack • Aug 19 '22

This is kinda my take too. I know the most robust way is to build validators and constraints directly into the database. But in reality, you should only need to validate data at its contact point.

Once I've validated the request payload, I (as the developer) should know that my data is "safe" and the only person who can screw it up is me 😅

danjelo • Aug 19 '22

I usually use constraints for at least PK/FK keys. I have gotten in serious mess a few times when there were none and data migrations and faulty logic put wrong ids as keys :)

As a side note, some ORM's such as EF Core have some nice code first functionality where validation in models are reflected in db as constraints.

Thomas Hansen • Aug 19 '22

As a side note, some ORM's such as EF Core have some nice code first functionality where validation in models are reflected in db as constraints

The problem I've got with EF is the disparity between the RDBMS and its "OOP circus". For instance, it's very tempting to just do myObject.Save(). This model of using a database increases bandwidth consumption (passing in whole object during updates for instance), it increases chatter towards DB, and it makes it harder to synchronise access, resulting in the need for "locking records" either logically, or physically somehow ...

danjelo • Aug 19 '22

Yes agree. Have to say I am not really a fan of ORM's in general for the OR impedance mismatch for one thing and its tendancies to generate hellish SQL :) Recently troubleshooted a slow EF Core query. Could not find the issue, likely some sort of "parameter sniffing" issue where the query plan was not used.

Aaron Reese • Aug 19 '22

@jack:

But in reality, you should only need to validate data at its contact point.

Getting a bit OT here, but I absolutely disagree. You are about to 'POST' a customer order. How do you know if between the time the customer started the order on the app/website and submitted it, that the finance team have not put the customer account on hold for non-payment. This can only be done on the back end. On a really busy system (e.g. Amazon on Black Friday) this order request may even go into a message queue and may not get processed for several minutes. By the time it gets loaded into the system, the stock may be gone or the account may be suspended.

Thomas Hansen • Aug 19 '22

These are problems 90 percent never faces …

Jack • Aug 19 '22

You've quoted me but without the italics which totally changes the tone of my statement 😆

I don't work at Amazon or anything close that kind of scale, and the chances of something going wrong between contact point and database is virtually (virtually) 0.

Thomas Hansen • Aug 19 '22

Hehe 😅

You wish I was sorry. Sorry, but I’m not 🤪😉

András Tóth • Aug 31 '22

However, I also don't like putting business logic directly into my database.
And why is that? Because the database is a really clunky coding experience.

I came to the conclusion that it is time for reimagining SQL. The language and connection must be modernized:

gain the ability to easily integrate with source control tools like git
modern programming language features: move away from thinking "it's a language to query the database" to have packages, code modules, unit testing/mocking capabilities

If this sounds ridiculous how does it sound to do n non-transactional rounds to the DB just because the team can only use ORMs and they don't know how to write the one action as one database transaction...

Thomas Hansen • Aug 31 '22

how does it sound to do n non-transactional rounds to the DB just because the team can only use ORMs

I've already covered ORMs ... ;)

Aaron Reese • Aug 19 '22

I also don't like putting business logic directly into my database

Why not? I can think of a few reasons but I would love to hear yours. To a certain extent I was being controversial with my original reply. Perhaps there should be a distinction between 'business logic' and 'data integrity'. Entering a telephone number and postal address in different countries doesn't break data integrity but it could be against business rules.
Ultimately someone/something has to be responsible for the validity of the data. If the data store is the one constant (you mentioned that [backend] users could do direct INSERT statements - well put the logic in a trigger.
In case you can't guess, I am a database guy. When the FE or API developers screw up the logic, guess who has to sort out the mess :)

Thomas Hansen • Aug 19 '22

Why not? I can think of a few reasons but I would love to hear yours

First of all I find it incredible hard to write validation logic in SQL. For instance, how do you validate an email address being valid in a stored procedure. I'm sure it can be done, I'm just not entirely sure if I want to see the code ... ;)

Perhaps there should be a distinction between 'business logic' and 'data integrity'

100% agree! Everything you can make the database take care of, you should make the database take care of, such as referential integrity, not null / versus null, field length, etc. However, in my video I illustrate a case where the validator semantically communicates that a field is not long enough back to the client. Validating things such as these in your stored procedure would be hard, and also probably result in an exception that it's impossible to return to the user because of security issues. Not to mention that the database is typically deployed on a different machine, possibly different network, as the backend API, resulting in one additional network request, resulting in that it's faster to validate in the API backend.

In case you can't guess, I am a database guy

Ahh, makes sense :)

By all means, apply as much data validation as you can in the database, I guess I just have a somewhat similar opinion to database validation as I do with frontend validation; "It's cool, nice to have, but don't exclusively do it" ... ;)

(For different reasons though)

guess who has to sort out the mess

I feel your pain ... :/

donny roufs • Aug 20 '22

At the end of the day the database is your one source of truth.

Uh no. That's far from true. Enterprise scaled applications are usually built around the domain meaning that the database is an implementation detail. Small apps that are usually aimed to be a POC or MVC might have their database as one source of truth but beyond that hell no.

András Tóth • Aug 31 '22 • Edited

Sorry to chime in so late, but there is a viewpoint not expressed here leading to the wrong conversations.
Who's responsibility is to keep data consistent or free of garbage? (i.e. invalid states)
Regarding if you say frontend, backend or database I have bad news for all of them.

If you think of the layers as an onion then...

The core would be the database (or distributed database, or databases if many services roll their own).
Then you can have n number of services that operate upon that. This means if you choose not to validate data written into your database now you have to do it n times!
Then you can have k number of frontends, plus other services and hackers with scripts that send in data. Obviously, as the article is also saying, at this point validation on the frontend is just UX ensuring a smooth user experience - i.e. when you input invalid data you do not need to wait for submitting the form to see the phone number format is wrong.

And then if you choose...

The database layer to ensure data validity/consistency you are going to face horrible source control options, inability to unit test solutions, hiring issues (sadly ORMs make hiring easier on the expense of good DB code), etc. Writing code to the DB is not a rich coding experience.
The service layer, then you need to figure out how to distribute your validator among many codebase, sometimes many languages! Good luck maintaining a validation library in Java, JS, Go, Rust etc. at the same time.

Everything sucks, just retire early and open a bar. 😐

Thomas Hansen • Aug 31 '22

Everything sucks, just retire early and open a bar

Respectfully, but if you look at your web API as a micro service, you can ensure all clients are using the same micro service to interact with the database. Creating such "bottle necks" is often very valuable, since it implies arguably the equivalent of "single source of truth" in regards to code able to modify data, and leads to the same nice place as "single source of truth" leads to related to data normalisation and similar constructs ...

András Tóth • Aug 31 '22

The point is, you are better off if you think about the bottlenecks than if you are not.

As a side note it horrifies me whenever I read "80-90% backend applications are simple CRUD applications, therefore they can be autogenerated from a document.". If your application is a simple CRUD you either don't have data consistency or an actual useful, sellable product.

Thomas Hansen • Aug 31 '22 • Edited

If your application is a simple CRUD you either don't have data consistency or an actual useful, sellable product

Define CRUD. Our "CRUD" generator allows you to apply.

reCAPTCHA values for individual verbs towards individual tables
Authorisation requirements for individual verbs towards individual tables
Row level security implying for instance users cannot see individual records that aren't their own "property"
Decide which rows are included in which CRUD verb endpoint
Automagically takes care of foreign keys, adding auto complete widgets in the frontend when you've got a foreign key, doing lookups into the referenced table
Publishing socket messages upon write invocations towards data
Implement caching
Log invocations
Add validators server side for individual fields
Etc, etc, etc ...

I'd say that covers about 80% to 90% of the stuff me and you typically do, assuming your background is enterprise software development ... ;)

... unless of course you're one of these guys always looking for an opportunity to make stuff more complex ... :/

an actual useful, sellable product

Psst, Microsoft Office Access ...?

Last time I checked it was selling pretty decent ...? ;)

Ricardo • Aug 20 '22

What? I don’t mean to be rude. But I disagree.
Owner of business logic is the application (C#, Java, Jose, whatever). Database is only the repository.
Validation belongs to businesses logic

Aaron Reese • Sep 1 '22

As Thomas pointed out, validation belongs in the bottleneck. So if ALL your data manipulation goes through ONE set of APIs you are safe to put the logic there. I proposed the database because it is (almost) always the ultimate bottleneck.
If your app offered different data storage solutions then the case for dB logic is diminished.
A number of references have been made to source code control of the database. For MSSQL there is a superb range of products from RedGate for SCC, migration, data compare, lineage, unit testing and a few other critical tools.

Thomas Hansen • Aug 21 '22

If you use your database as only the repository, you're missing out on a lot of features and safe guards. Of course, for a NoSQL guy what you're saying makes sense, simply because you've got no features allowing you to even validate your objects in your database. However, for the rest of us, the database and its schema can help us with a lot of things, creating guarantees for us, that prevents garbage data from entering our "repository".

JoelBonetR 🥇 • Aug 18 '22 • Edited

Phone numbers would be written like; "John Doe", or "foo@bar.com".

The dev throwing raw queries directly into the database:

😂😂😂😂

I agree 100% on the topic, front-end validations were always for convenience and there's not one but two reasons for adding validations in the client side as I already commented here:

JoelBonetR • Aug 17 • Edited on Aug 17

Yes sure!
The same way you can disable JS in your browser, send a request using Postman directly to the endpoint or anything else.

Client code is loaded and running inside third party machines, hence you can't rely in frontend validations in any app. You'll need to re-validate the whole in backend anyway.

Validations in the frontend have 2 purposes:

Give feedback to the user as soon as possible.
Avoid requests to your server if the data that the user is about to send is not valid.

So yes, it's usable in a real product. If you do that and submit wrong data, the backend will throw an error about that and we should be good 😂

Original post for reference.

So at the end, the benefits are better UX and saving costs and server load of the requests that will necessarily fail in the back-end validation.

Thomas Hansen • Aug 18 '22

Word!! ^_^

However, what I find difficult is the fact that when using frontend validation, the code duplicates. As I update code, I've got two places I need to touch, possibly two different roles on my team too, to ensure they're both applying the correct changes. However, I do (mostly) agree on that frontend validation is necessary - Just don't TRUST it ... :)

JoelBonetR 🥇 • Aug 18 '22

Yes sure! the same way that adding a column in the database (DATA) also demands some changes in the server (BACK-END) and in any client consuming this information (FRONT-END).

I hear a noise far away that says "Decouple your system building blooooocks!"
Wait I'm hearing something else "*slap* Dependencieeeees!"

Never mind, must have been the wind 😁

Thomas Hansen • Aug 18 '22

Hahahahaha :D

Well, there is a difference, because one is duplicating logic, while the other is not really duplicating things, but simply allowing for field to move back and forth - But I see your point ^_^

JoelBonetR 🥇 • Aug 18 '22

Well it's like in security at the end, you've different layers (data security, endpoint security, application security network Security...) and at the end you won't be trusting any 😅 and develop a contingency plan "just in case" everything fails.

If we going strict, you don't need to add validations in frontend "as is", just in the backend; But if you do, you earn the benefits (lower cost by lowering the requests and happier customers) so it's not doing job for nothing 😁

Thomas Hansen • Aug 18 '22

I agree, I guess I'm just spoiled with Hyperlambda HTTP requests never using more than 100ms before returning ... ;)

Christian Gröber • Aug 22 '22

I desperately hope this isn't news to anyone that's been in code for more than a month

Thomas Hansen • Aug 22 '22

Hahahaha - You wish ... ^_^

I've seen stuff like this in companies with thousands of employees, handling sensitive financial data, and the paradox is that these parts was the good parts of the codebase ... ;)

There's a reason why I'm writing about it ...

Christian Gröber • Aug 22 '22

Yikes! I was seriously shocked when I saw the title of your post, guess I had every reason to be...

What do you do where you get such insight?

Thomas Hansen • Aug 22 '22

I can't disclose actual employers for obvious reasons, but I've been an enterprise software developer for 25+ years, working for companies having software development as a secondary function. I've worked with FinTech, Health, Streaming, and everything really. The problem at such companies, at least those I've been working for, is that as long as the software works, it's good enough.

One project I was working for added a Guid to all HTTP requests as a query parameter. I couldn't understand why, before I started looking at its code. The Guid was the "authorisation mechanism", and was actually the primary key of the roles the user had to belong to in order to invoke some backend endpoint. This wasn't a small company either, it was a large company with 500+ employees, handling extremely sensitive data.

The above is just the tip of the iceberg. There's so much garbage code out there you wouldn't believe it if I told you.

Christian Gröber • Aug 22 '22

"Never touch a running system" is often taken way too literally.

I'm working with a major insurance company and when I look at how terrible their internal communication is, it makes me question how they're able to make any money. But then they're one of the most profitable companies in their field. That made me question all other companies as well, and has really been an eye opener. And what you're telling me only enforces this new understanding I have of how businesses work, thank you.

Completely unrelated, how bad 1-10 would you rate it if a system were to, say, use the hashed password as an API key? It feels extremely dirty, but I can't see how it could be of any security risk.

Thomas Hansen • Aug 22 '22 • Edited

Completely unrelated, how bad 1-10 would you rate it if a system were to, say, use the hashed password as an API key?

I wouldn't do it myself, but in theory, assuming you don't allow for authenticating somehow using the raw hash, it should not be any directly security risk. If I was to "fix" such a system, I'd double hash it, but it might be impossible due to integrations with other systems, where all systems needs to be updated, etc ...

Maybe creating "transition code" where the thing is double hashed, and both double hash and single hash is considered a "match", then log all single hashes, and slowly weed out single hashes over time. However, if this is your largest security issues, you're in a good place compared to some of the places I've been ... ;)

However, of course the correct way to create API keys is to create them dynamically, having them being completely unrelated of users and passwords, allowing admins to revoke them with time. That train has probably "left the station" for your current system ... :/

Edit - It depends upon how the password was hashed. I kind of assumed BlowFish and individual per record based salts. If this is not the case, the hash is almost as good as having raw access to password, due to rainbow dictionary brute force attacks ...

Christian Gröber • Aug 22 '22

Thank you for the elaborate response

The dev responsible keeps saying "in a future release", but more pressing issues keep on surfacing, so my guess is it won't be done until this potential risk gets abused

Thomas Hansen • Aug 22 '22 • Edited

I thought about your problem. The simple solution would just be to create lonbg lasting JWT tokens, assuming that's what you're using for plain auth. As in, creating JWT token lasting for 3 months or something. It doesn't provide eviction, but I suspect it'll be better than simply sending the hash ...

And, it wouldn't need to touch database, and the token will be valid if the user account changes pwd, etc, etc, etc. Not perfect, but way better ...

This would allow the token to "impersonate" a user account, removing all "special logic" required for API tokens ...

Ellis • Aug 21 '22

Agreed, plus... :)

Frontend & backend are both software and they are both apps, but they are two very different beasts, and should be judged and optimised differently:

Validation, indeed 👍
Strict type definitions and safety, for which reason I think typescript is much overhyped and taken far too religiously. It's just a tool, i use parts of it.

iway1 • Aug 22 '22

The solution is always to do both. Front end validation is a significant UX improvement, backend validation is the only way to ensure data integrity.

Maxi Contieri • Aug 19 '22

Objects should always valid

No matter if they come from the frontend. backedn api, imports etc
Business rules should be in the object itself

Aaron Reese • Sep 1 '22

Only if you are really specific about your objects. E.g. a customer object can't have an email property as a string, it would need to be a type of email object because email needs special validation and is used also on suppliers, users, prospects etc. Before long you have a horrible DX where you are constantly writing order.dispatch.shipaddress.zipcode, order.dispatch.shipaddress.country