Helloo peeps, I hope everyone's holidays went well and that we all stayed healthy, positive and ready to change the world (and ourselves) for the better this winter 🙌❄. Today we will be exploring three key word-jumbling processes that differ in definition, method and intention. Hope you enjoy, and as always I would appreciate a nudge if there's anything that needs correction!
Encoding
Encoding, or output encoding as the OWASP website says it's commonly called, is essentially the transformation of data from one format into another specific format that is understood by the target you wish to send it to.
It's a sort of translation between languages. A quick Google search for "Types of encoding in programming" shows that there are many, some more common than others, like URL encoding, HTML encoding and Base64 encoding. You can also count the transformation of human-readable code into 1's and 0's as encoding, because now the machine understands what we're trying to do with that code.
The purpose of this, as nicely put in the "Alice and Bob Learn Application Security" book, is not to protect the data, but to change the format of the data so it can be used wherever you want it to be used. From a security standpoint, characters translated for that specific interpreter also no longer have any power to influence it, because it reads them as plain data rather than as instructions.
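To make the "translation" idea a bit more concrete, here is a minimal sketch using only Python's standard library. The sample text is made up by me, and the point is simply that each encoding changes the format and that anyone can turn it back without a key.

```python
import base64
from urllib.parse import quote, unquote

message = "hello world & friends"  # hypothetical sample input

# Base64: bytes become ASCII-safe text (handy for things like JSON fields)
b64 = base64.b64encode(message.encode("utf-8")).decode("ascii")

# URL encoding: characters with a special meaning in URLs become %XX
url_safe = quote(message, safe="")

# Binary: each byte written out as eight 1's and 0's
bits = " ".join(f"{byte:08b}" for byte in "Hi".encode("ascii"))

print(b64)
print(url_safe)  # hello%20world%20%26%20friends
print(bits)      # 01001000 01101001

# No key or secret is needed to go back: anyone can decode it
decoded_b64 = base64.b64decode(b64).decode("utf-8")
decoded_url = unquote(url_safe)
print(decoded_b64 == message, decoded_url == message)
```

Notice that nothing here is hidden from anyone, which is exactly why encoding on its own is not a way to protect data.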
In keeping with the security theme, encoding usually goes hand in hand with another popular term called "escaping". It is a way to make sure a string entered by a user is treated only as a string, so it can't be interpreted as executable code. An example of this is when we use the backslash (\) in front of quotes, or in sequences like \n for line breaks.
Example of attack and how this technique defends against it
- A web app that uses user input in the URL as parameters
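- An attacker can abuse this feature to slip in a redirect to a malicious site that looks identical to the real one
- Victim is fooled, and evil wins
- Defence: if the app encodes that input before using it, the input is handled as plain text instead of something the browser acts on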
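Here is a rough sketch of what that defence could look like, using Python's standard library. The attacker input and the "q" parameter name are hypothetical, and a real web framework would normally do this encoding for you in its templates.

```python
import html
from urllib.parse import urlencode

# Hypothetical attacker-controlled input that tries to inject a redirect
user_input = '"><script>location="https://evil.example"</script>'

# HTML encoding: <, >, & and quotes become harmless entities
safe_html = html.escape(user_input)

# URL encoding: the value stays inside its own query parameter
safe_query = urlencode({"q": user_input})

print(safe_html)
print(safe_query)
```

Encoding keeps the input as data. Checking where a redirect actually points is a separate validation step that this sketch doesn't cover.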
Encryption
While encoding is about making data as easy to understand as possible, encryption is about making it as hard to understand as possible. This is because the point of encryption is to protect the data (the Confidentiality part of the CIA triad).
Encryption is actually part of a larger concept called "Cryptography", often described as the science of obfuscating data so that it only has meaning to someone who holds the key to clarify it. It's a process that's been used for thousands of years (the Caesar cipher is the example Google gives).
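Since the Caesar cipher came up, here is a small sketch of it. The function name and the shift of 3 are my own choices for illustration. It's far too weak for real use, but it shows the idea of a key (the shift) that both jumbles and un-jumbles the text.

```python
import string

ALPHABET = string.ascii_lowercase

def caesar(text: str, shift: int) -> str:
    """Shift each letter by `shift` places, wrapping around the alphabet."""
    result = []
    for ch in text:
        if ch.lower() in ALPHABET:
            idx = (ALPHABET.index(ch.lower()) + shift) % 26
            new = ALPHABET[idx]
            result.append(new.upper() if ch.isupper() else new)
        else:
            result.append(ch)  # leave spaces and punctuation alone
    return "".join(result)

secret = caesar("Hello World", 3)  # the shift (3) acts as the key
print(secret)                      # Khoor Zruog
print(caesar(secret, -3))          # Hello World
```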
Side note: Can we count certain emojis as a sort of encoding 🔥? Because anyone who doesn't know the context won't understand the other meaning behind it. I mean, the ancient Egyptians had emojis for days and it took us a while to decrypt it 👀.
Encryption is the first part of the process, where we make the information no longer understandable. It's a two-way process so it can be "Decrypted" as well (made understandable again). It's true that encoding is technically two way as well, but the difference, at least from my perspective, is that encryption involves secret keys to get the actual value from the jumbled up words.
There are two types of encryption (ways to make the jumbled up words). One is generally seen as more secure than the other, so we'll go through them:
- Symmetric encryption
- Asymmetric encryption
Symmetric encryption
From the first word "Symmetric" we can tell that there is an equal "something" on both sides, and that thing is the key used to encrypt and decrypt the information. It's known as the shared key or secret key, and if someone were to get hold of it, they would be able to both encrypt and decrypt valuable information. A con is that it's seen as the less secure of the two types, since the same key has to be shared with everyone who needs it, but a pro is that it's cheap to execute and doesn't take much compute power.
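To see the "same key on both sides" idea in action, here is a toy sketch that XORs a message with a random key of the same length. This is my own illustration and not a recommendation. Real systems use vetted algorithms such as AES through a maintained library.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the matching byte of key."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

plaintext = b"meet me at noon"                    # hypothetical message
shared_key = secrets.token_bytes(len(plaintext))  # both sides must hold this

ciphertext = xor_bytes(plaintext, shared_key)  # encrypt
recovered = xor_bytes(ciphertext, shared_key)  # decrypt with the SAME key

print(ciphertext.hex())
print(recovered)
```

Whoever holds shared_key can run both steps, which is the weak spot mentioned above: the key has to reach the other side safely somehow.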
Asymmetric encryption
In contrast to the previous type of encryption, this one has an inequality between encryption and decryption. And, you guessed it, the keys are different for each action. As Google simply puts it: "One is a public key shared among all parties for encryption. Anyone with the public key can then send an encrypted message, but only the holders of the second, private key can decrypt the message." This is seen as the more secure option, but it takes more compute power because of the size of the keys (being stupidly large) and the maths done with them. It makes sense, then, for Google to suggest that "asymmetric encryption is often not suited for large packets of data."
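Here is a tiny textbook-RSA sketch to show the two different keys at work. The numbers are deliberately small and picked by me for illustration. Real keys are enormous, and real libraries add padding and other safeguards that are left out here.

```python
# Toy numbers only: real keys are built from very large primes
p, q = 61, 53
n = p * q                # 3233, part of both keys
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent
d = pow(e, -1, phi)      # private exponent (modular inverse, Python 3.8+)

public_key = (e, n)      # safe to hand out to everyone
private_key = (d, n)     # stays with the receiver only

message = 65                       # a number smaller than n
ciphertext = pow(message, e, n)    # anyone with the public key can do this
decrypted = pow(ciphertext, d, n)  # only the private key undoes it

print(ciphertext)
print(decrypted)  # 65
```

Even in this toy version you can see the cost: every message means raising numbers to big powers, which is why asymmetric encryption is usually kept for small pieces of data.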
Hashing
Going back to comparing the two-way system of encryption and encoding, hashing on the other hand is one way. This means that once it's jumbled up, that baby is gone and will not be understood again. You might think "that's a bit extreme" at first, and I thought that too. This was until I took myself outside of the box where the hashed values live, to see what the intention is.
First, though, let's take a step back and look at some concrete ways to define what hashing is.
TLDR: It's basically a unique fingerprint of the data that's created using that very same data, via an algorithm
- A hashing algorithm transforms blocks of data that a file consists of into shorter values of fixed length. In other words, a hash value is basically a summary of what is in that file. - crypto templar
- Hashing is the process of converting data — text, numbers, files, or anything, really — into a fixed-length string of letters and numbers. Data is converted into these fixed-length strings, or hash values, by using a special algorithm called a hash function. - codecademy
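Both definitions can be checked quickly with Python's built-in hashlib. This is a minimal sketch using SHA-256. The helper name and the inputs are my own.

```python
import hashlib

def sha256_hex(data: str) -> str:
    """Return the SHA-256 hash of a string as hex text."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()

short = sha256_hex("hi")
long_ = sha256_hex("hi" * 10_000)  # a much bigger input
tweaked = sha256_hex("hi!")        # one extra character

print(len(short), len(long_))      # 64 64 (fixed length, whatever the input size)
print(short == sha256_hex("hi"))   # True (same input, same fingerprint)
print(short == tweaked)            # False (tiny change, different fingerprint)
```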
So what is the intention? Well, it seems from the get-go that jumbling up this data may mean we want to hide it. However, in "Alice and Bob Learn Application Security", Tanya Janca claps back at that idea, saying that "data is not the value; proving your identity is". This makes sense: since it's a one-way jumbling process, someone who got hold of the hash wouldn't be able to use it anywhere. They can't take it back to the application and try to enter it as a password, because the system will hash that hash and compare the result with the hashed password in its system (too many hashes, say it all again but quickly 🙄🧻💩).
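The "hash of a hash" situation is easier to follow in code. Below is a rough sketch of a login check. The function names, the salt and the choice of PBKDF2 with 100,000 rounds are my assumptions for illustration, not something taken from the book.

```python
import hashlib
import hmac
import secrets

def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a hash from the password (PBKDF2 with SHA-256, assumed settings)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

# Registration: the system stores the salt and the hash, never the password
salt = secrets.token_bytes(16)
stored_hash = hash_password("correct horse", salt)

def login(attempt: str) -> bool:
    """Hash whatever was typed in and compare it with the stored hash."""
    return hmac.compare_digest(hash_password(attempt, salt), stored_hash)

print(login("correct horse"))    # True
print(login("wrong guess"))      # False
print(login(stored_hash.hex()))  # False: a stolen hash gets hashed again
```

The last line is the scenario from above: typing the stolen hash into the password box doesn't work, because the system hashes it once more before comparing.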
The security issue of hashing and its variations
Let's go into that scenario and pretend that someone gets the hashed version of that password, something like dfli458tv5oyo84ht4ti45ut45. You would think that the damage is minimized by jumbling up the password, so they would find it near impossible to find the real value. Well, we'd be kind of wrong but not all the way.
To explain this further, we have to acknowledge the number of hashing algorithms out there, starting with one of the earlier and probably best-known ones, MD5. Created in the early 90s, its aim was just as mentioned: jumble up words so that they're no longer usable, and it was used in a lot of systems for years. That lasted until technology and computing minds improved and exposed some pretty serious flaws in it. I found the list of vulnerabilities easier to understand in this link, and I will list a couple of them here anyway:
- Collision Vulnerabilities: Two different values can produce the same hash 😱, and with MD5 such pairs can be found in practice. Now imagine a bot that keeps generating values to match a hash, and it can run locally until it finds one.
- Preimage Attacks: The idea is to work backwards from a hash to an input that produces it. It can apparently be reverse engineered, but when I did a shallow search to see if that is possible I only got ideas for collision attacks (the first vulnerability).
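The "bot that runs locally until it finds a match" idea can be sketched in a few lines. To be clear, this is plain guessing against an unsalted MD5 hash and not a true collision or preimage attack. The word list and the leaked password are made up by me.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Return the MD5 hash of a string as hex text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Stand-in for a hash taken from a breach (hypothetical password)
leaked_hash = md5_hex("sunshine")

# Hypothetical list of common passwords the bot will try
wordlist = ["123456", "password", "qwerty", "sunshine", "letmein"]

def guess_password(target_hash, candidates):
    """Hash each candidate and return the first one that matches, if any."""
    for candidate in candidates:
        if md5_hex(candidate) == target_hash:
            return candidate
    return None

cracked = guess_password(leaked_hash, wordlist)
print(cracked)  # sunshine
```

The hash was never reversed. The bot simply hashed guesses until one matched, which is why a fast algorithm like MD5 and a weak password make a bad combination.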
Since MD5, however, more secure and now widely used hashing algorithms have come along to last us for the moment. Some include SHA-256 and SHA-3. These alternatives provide better security against the attacks mentioned. But, oh boy 🤦‍♂️, we will eventually have to worry about quantum computing, which is so powerful that there is a chance some of today's strong hashing algorithms will be pushed aside. Apparently some of the SHA family of algos are quantum safe 🙃 against collisions and preimage attacks, and it's a long way from now so I can relax a little for the next n+ years lol.
Conclusion
So we learned a few things today about the different ways we jumble words up for security reasons. We learnt that encoding is not actually made to protect data, but to make it more readable to its target (i.e. humans understand words, machines understand 1's and 0's). Unlike encoding, encryption values the data and does not want it to be understood by outsiders. So it uses different (secretive) styles of keys to jumble the data, so that only a party with the right key can unlock it and get what they need. The last is hashing, which is one-way and doesn't care who sees the jumbled up value. We learnt that there are some vulnerable hashing algorithms out there, but just as current ones are being exposed every day, newer and stronger ones pop up at a similar rate too.
I hope you enjoyed and learned from this post, I know I learnt a lot from it myself! Thank you for sticking around, and as always please comment to let me know of any good or bad, I wanna be better!
See you in the next post 👋