Edit 2: Lots of insightful comments at the bottom, do give them a read, too, before going with any alternatives!
Edit 1: Added a new take on 'Optim...
First off, efficiency in data formats like these only matters when you're manipulating a lot of data. If you're using a gigabyte of JSON then you're probably doing something wrong.
As far as your optimisations are concerned, I disagree on a few points.
This example:
isn't a way of making JSON more efficient, it's a way of changing your schema. If you want to store relationships, or have a collection of `order` objects, then you use the "inefficient" hierarchy; if you need to have keys for the items (not really relevant in this example) then you use a keyed object; otherwise you use an array. My point is that these are things you will change depending on your schema, and they have little bearing on efficiency and none on JSON in particular. Rather, use keys which are consistent with the rest of your code.
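To illustrate the two shapes being discussed, here's a small sketch (the order data and helper are hypothetical) showing that the keyed object and the array carry the same information, just under a different schema:

```javascript
// Keyed object: the key doubles as an identifier for each order.
const keyed = {
  "order-1001": { item: "keyboard", qty: 1 },
  "order-1002": { item: "mouse", qty: 2 }
};

// Array: each entry now needs its own explicit id field.
const asArray = [
  { id: "order-1001", item: "keyboard", qty: 1 },
  { id: "order-1002", item: "mouse", qty: 2 }
];

// Converting between the two is mechanical, which is why this is a
// schema choice rather than a JSON "optimization".
const toArray = (obj) =>
  Object.entries(obj).map(([id, rest]) => ({ id, ...rest }));

console.log(JSON.stringify(toArray(keyed)) === JSON.stringify(asArray)); // true
```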
Don't do this. When you're loading data, unless you're going through another parsing stage, these abbreviations are going to directly correspond with objects with confusing names. If you wouldn't write it as a variable name in your clear, self-documenting code, don't use it as a key in a data structure.
You talk about JSON being language agnostic, and efficiency in things like parsing numerical data, but really we know we're talking about the performance over the web. Your native application storing its configuration in JSON isn't going to notice any of these performance changes. This post is more like, "improving performance of sending human-readable data structures over the wire".
True Ben, your point makes sense. I was considering that for some use-cases, where data may need to be serialized or deserialized frequently, changing the schema may reduce verbosity.
But, yeah, your points are valid overall.
Thanks, you've spared me some minutes writing the exact same comment.
Indeed. This hurts readability and seems unnecessary most of the time
Remember that premature optimization is the root of all evil
I think that's unfair - there's nothing particularly wrong with the article if it's pitched as saving bandwidth, and the author has clearly made an effort to make it readable and interesting.
saving bandwidth in 2023 =P is not 100% shitty tho
"Language Agnostic"!? It literally has JavaScript in its acronym... I think you mean it's supported by several languages. Almost all languages have a JSON converter. But it's not agnostic... let's take numbers for example: numbers in JS are just numbers, regardless of whether or not they have decimals... however, it's the converter's job to transform them into int/float/decimal or whatever numeric type the other language works with.
Numbers in JSON are symbolic representations of numeric data, whose type is not imposed by the format. It is up to the implementation to interpret, for example, numbers without a decimal separator as integers, others as double-precision 64-bit numbers just like JS, or as single-precision 32-bit numbers if the mantissa is short enough.
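A small JavaScript sketch of that point: the same JSON number text can be interpreted differently by each implementation, and JS's double-precision interpretation visibly loses integer precision past 2^53:

```javascript
// JSON carries numbers as decimal text; the parser decides the type.
// In JavaScript every JSON number becomes a 64-bit float, so integers
// beyond Number.MAX_SAFE_INTEGER (2^53 - 1) silently lose precision.
const parsed = JSON.parse('{"id": 9007199254740993}');
console.log(parsed.id); // 9007199254740992 — off by one

// Other consumers can choose differently: a Java or Go decoder may map
// the same text to a 64-bit integer and keep the exact value, and JS
// itself can keep it exact by routing the digits through BigInt.
const exact = BigInt("9007199254740993");
console.log(exact === 9007199254740993n); // true
```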
"'Language Agnostic' !? It literally has JavaScript in its acronym..."
You can use JSON in a Golang program, yeah? So it's language agnostic.
This is an interesting collection of notes and options. Thanks for the article!
Can you provide links, especially for the "Real-World Optimizations" section? I would appreciate being able to learn more about the experiences of these different companies and situations.
In the "Optimizing JSON Performance" section the example suggests using compression within Javascript for performance improvement. This should generally be avoided in favor of HTTP compression at the connection level. HTTP supports the same zlib, gzip, and brotli compression options but with potentially much more efficient implementations.
While protocol buffers and other binary options undoubtedly provide performance and capabilities that JSON doesn't, I think it undersells how much HTTP compression and HTTP/2 matter.
I did some small work optimizing JSON structures a decade ago when working in eCommerce to offset transfer size and traversal costs. While there are still some benefits to using columnar data (object of arrays) over the usual "Collection" (array of objects), a number of the concerns identified, like verbose keys, are essentially eliminated by compression if they are used in repetition.
HTTP/2 also cuts down overhead costs for requests, making it more efficient to request JSON – or any format – in smaller pieces and accumulate them on the client for improved responsiveness.
There are some minor formatting issues, and it is lacking in sources, but it provides a great base of information and suggestions.
You're right about the HTTP compression, Samuel. I've added your perspective in that section. As for the resources referenced in the examples, I've added links in those sections.
On point 4.a (avoid repetitive data), you might actually need to use the "inefficient" way, as you will most likely need an "id" type of field.
In the "inefficient" way, the property name can serve as an "id", but in the "efficient" way, you would need a new "id" property.
Most of these tips seem to be micro-optimizations, as the majority of the slowness comes down to the frontend framework you use, as well as the many third-party packages.
Being able to improve performance is nice,
but as a dev I care the most about the developer experience:
And missing for me is real examples, with open source code on
All the features listed for each serialization format sound completely identical to me.
Since most of these tools come from server environments it's even questionable if you can
Sending JSON compressed in Brotli or gzip format is a very good solution. One thing, though: if you run an application in production, the safest setup is to run it behind a proxy server (Nginx, Apache, Cloudflare, ...) configured to serve all text responses (JSON, HTML, XML, CSS, JS, ...) in compressed form (brotli, gzip). It's better to let the proxy compress the responses rather than your application; otherwise you'll use up important application resources that it may need for other processes.
On the other hand, although it may seem like a closed way of thinking, I have never been inclined towards JSON Schema. It has its reason for being, adding validation of values, but it has a serious disadvantage: the data type (among other things) used to validate each value has to accompany every key, producing an extremely heavy JSON. In essence, it should only be used when a form's response is sent to an API, so that the server can also validate what it receives. A schema would be terribly inefficient for a list response (for example, a list of products from a catalog) because of the burden of unnecessary extra data.
Speaking of catalog products, a while ago I was in charge of an e-commerce site that, instead of saving the shopping cart in the application, temporarily saved it in the browser's localStorage. When the buyer adds a product to the cart, the SKU and the requested quantity (["SKU89038", 5]) are accumulated in an array which is then written to localStorage.
When placing the purchase order, what is sent from the customer to the application is a list of SKUs + quantities. No further information is needed at that point, making the process quite efficient. And in the other direction, when displaying a list of products, only exactly the data that is needed is sent and no more.
While designing how I would return responses for product lists, I explored the idea of returning a JSON containing a list of arrays with only the values. The first member of that list would be all the keys of each column (for the benefit of the human programming the front end), in order to avoid adding extra bytes to the response. In the end it was an over-optimization and we ruled it out, returning the JSON with its traditional key-value pairs.
Thanks for the article. I learnt some things :)
Auth0 wasn't as impressed by Protocol Buffers as I was expecting them to be.
I'm kind of shocked that CBOR and the excellent github.com/fxamacker/cbor library aren't covered in this. You get JSON interop without the wackiness of protobuf serialization, and can be equally compact to protobufs with the `keyasint` and `toarray` struct tags when needed. On top of that, you can still utilize JSON Schema fairly easily; for example, huma.rocks/ supports JSON/CBOR (via client-driven content negotiation) out of the box and uses OpenAPI 3.1 & JSON Schema for validation.
Hey THANKS for the inspiration. I always wanted to try a Mongodb Atlas Cluster but I never considered the notion that BSON is much faster than JSON. So I gave it a whirl and found that manipulating data in Mongo is 90% faster according to my testing.
When inserting a record into a JSON array there is no append method, so we need to fetch the entire array, add the record, and then re-save the array, which overwrites the original structure. In my app this takes 3000ms to add a record. Over in Mongo we have the insertOne() method, which only takes 200ms. My collection has 14k records, so inserting via JSON is not practical. But for much smaller use cases, like dashboards, using JavaScript array methods on JSON can be practical.
I use AWS Lambda. In ~7 lines I can insert a record into Mongo with the following:
import { MongoClient } from "mongodb"; //a very small library
const client = new MongoClient(process.env.MONGODB_CONNECTION_STRING);
//connection string saved as key/pair in env file
export const handler = async (event) => {
const db = client.db("test"); //client.db() is synchronous, no await needed
const collection = db.collection("tracker3");
const myobj = {
"xyz": "value",
"other key": "123"
};
//const body = await collection.find().toArray(); //gets entire collection
const result = await collection.insertOne(myobj); //this is the insert command
return result;
//return body; //toArray puts the collection in a JSON array..which can then be exported to csv
};
To do the same with a JSON array I have a bigger process, also with a much bigger import:
const AWS = require('aws-sdk'); //a very large library
const fetch = require('node-fetch');
const s3 = new AWS.S3();
exports.handler = async (event) => {
const res = await fetch('xxxxx.s3.us-east-2.amazonaws.com/t...');
const json = await res.json();
// add a new record using array method push
json.push({
country: event.country2,
session: event.ses,
page_name: event.date2,
hit: event.hit2,
ip: event.ip2,
time_in: event.time2,
time_out: event.time3,
event_name: event.city2
});
const params = {
Bucket: 'xxxxxx',
Key: 'tracker2.json',
Body: JSON.stringify(json), //pass fetch result into body with update from push
ContentType: 'application/json',
};
const s3Response = await s3.upload(params).promise();
return s3Response;
};
That's pretty time-saving, imo. Did you try any other more efficient options?
how about gRPC?
An efficient option for getting smaller payloads. Have you tried this one? Also, this is something I wrote some time back about gRPC: dev.to/nikl/building-production-gr...
Super interesting article, certainly something to think about!
This may also be of interest: dev.to/samchon/i-made-express-fast...
You have good details in here, but my biggest problem is that the title is clickbait. You're talking about transfer protocols and schema requirements. JSON is a display format; obviously the fastest way to transfer data is not as an unmodified string. Your alternatives, while maybe not exactly JSON, even look like JSON or use JSON to create the encodings. I think you're going to confuse new developers with the title. The last thing I'd want people to do is over-engineer simple problems and stop using JSON by simply internalizing "JSON is slow". Instead of sounding like an attack on JSON, I would have highlighted the reasons people augment JSON or use different tools beyond just speed.
"JSON is a display format". Hmm. Well, it can certainly be displayed, and it was designed to be human-readable, but... at its origin, JSON was meant to be a quick and effective way of passing around objects in JavaScript — in those very remote times when JavaScript was still thought of as merely a "cool" way to do some visual manipulation of HTML in the browser, and we only had interpreted JavaScript in any case...
I concede that it's a "data interchange format" and not merely for display. But my point is that this article doesn't seem to acknowledge that JSON is an essential part of building a web application and is not a bottleneck for 99% of applications. A bad database query will take far longer than the time to send and receive JSON. While that may seem obvious to you, it can cause unnecessary confusion for new developers. Honestly, when I first read this article I was only thinking about web development. But after reading it again, it doesn't mention web development specifically. If you're sending data from a native mobile app to a rust backend, then using protocol buffers is a good choice. However, if you're using JavaScript (the most popular programming language in the world) you can't even instantiate a protocol buffer without using JSON. I just wanted the title to be changed from something like "JSON is Slower" to "Sending JSON over a network is too slow for large scale applications". Clickbait titles and headings could proliferate protocol buffers into every app making the world worse with unneeded complexity. If you have a JS front end and a JS backend, prematurely optimizing network calls to remove JSON would be insanity. If the thought of JSON bothers you, don't even touch web development, your soul will be crushed. The web is horribly slow in so many ways, and it's certainly not from kilobytes of JSON, trust me. New developers need this nuance. I rest my case.
Very helpful notes... thanks ✌️
Appreciate the insights - lots to think about in here
When transferring large JSON data between server and client, I implement it like this (the example is not in a particular language; consider it an algorithm):
On the server side:
^ `queryResult` is not a scalar data type; its datatype is language specific.
^ `getFields()` returns the names of the fields in the query's result as an array, like the following:
fields = ['id', 'name', 'email_address']
^ `getRecordsOnlyInArray()` returns the rows as plain arrays, instead of objects, without field names. It will be an array of arrays, like the following:
data = [ [1, 'Nikunj', 'nikunj@example.com'],
[2, 'John', 'john@example.com'],
[3, 'Martin', 'martin@example.com'] ]
A JSON object is sent, after converting it to a string, with two properties: fields and data.
On the client side:
The string sent from the server is received, converted to objects/arrays according to JSON, and assigned to the variable `fieldsAndRows`.
As the variable name `fieldsAndRows` is longer than usual, and contains two separate properties for us to access, assign each property to two separate variables. (Here it is hoped that the new variables are just references to the two related properties of `fieldsAndRows`. If the language does not support such referencing and instead creates duplicate arrays for fields and rows, then use a shorter name, like `fr = stringToJson(getData())`, instead of `fieldsAndRows`.)
Now create an object having the fields' names as its property names, and assign the index number of each field to its respective property using a loop:
Now these can be used as below to access the value of a particular field of a particular row:
BTW, to reduce load on the server, most of the time it is better to convert/manipulate data on the client side.
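The scheme described above can be sketched in JavaScript (the payload shape and the sample data follow the comment; variable names are illustrative):

```javascript
// What the server would send, already parsed from the JSON string:
const fieldsAndRows = {
  fields: ["id", "name", "email_address"],
  data: [
    [1, "Nikunj", "nikunj@example.com"],
    [2, "John", "john@example.com"],
    [3, "Martin", "martin@example.com"]
  ]
};

// Short references to the two properties (arrays are referenced, not copied).
const fields = fieldsAndRows.fields;
const rows = fieldsAndRows.data;

// Build the field-name -> column-index map with a loop:
const col = {};
for (let i = 0; i < fields.length; i++) {
  col[fields[i]] = i;
}

// Access the value of a particular field of a particular row:
console.log(rows[1][col.name]); // "John"
console.log(rows[2][col.email_address]); // "martin@example.com"
```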
Informative article with great instructive comments. I will try Avro; right now I'm trying this: github.com/matteobertozzi/yajbe-da...
https://www.youtube.com/watch?v=ilbsLXa7uT8&t=110s
Hi Nik,
You mentioned 'JSON Schema', but the last time I looked there was no standardised approach to this. Has that changed? If so, could you please provide a reference?
Regards, Tracy
Hmm... I'm not sure about this. You're not making things more efficient as much as you're changing the schema and losing the relationships of the key/value pairs.
If you're experiencing inefficiencies due to JSON, you're probably in need of a database — or you're just not organizing your JSON files efficiently.
This is one of the best posts I have ever come across on this platform since I joined. The quality and the topic are both top notch. Excellent. I don't have the words to emphasize enough what an ideal post this is.
I use the following combo: GraphQL (or similar) + protobuf (or similar) + server-side cache (when possible) + client-side cache (with optimistic updates) + HTTP compression.
Any feedback about this stack?
There is no good support for protobuf in client-side JS. Compilers and libraries have a lot of bugs and interpret protobuf-to-JSON conversion in their own way, and if you want to switch libraries or switch back to JSON you have to change a lot of stuff all over the app 😔
It's also not compatible with most S3 implementations.
what is "S3" ?
Hi, can I repost your article on my website (uhtred.dev)!?
You will be mentioned as the author with link references.
Sure. Just use the canonical.
Wonderful 🤩!
I'll need some information.
The information I need is just to display in your author profile on my website. See my profile for an example (uhtred.dev/authors/@uhtred.dev).
I will use your public information here on dev.to. But you can reply here with the information or send me in away that is comfortable for you.
Will share over email. What's your email?
social@uhtred.dev
An alternative to JSON is HTML - htmx.org/
Amazing comparison, we also recently made a similar benchmark, but also compared http and grpc - packagemain.tech/p/protobuf-grpc-v...
Insightful.
Here is a web based all in one tool for developers, coderkit.dev.
I'm curious whether anyone has experience with the alternatives on the client side, I mean in the browser.