Introduction
In my previous post, I implemented base64 encoding, but it only works for ASCII characters. In this post, we will enhance our base64_encode function to make it support the whole planet =)).
What did we do wrong?
Let's have a look at RFC4648 and find out what we've been missing the whole time.
The encoding process represents 24-bit groups of input bits as output
strings of 4 encoded characters. Proceeding from left to right, a
24-bit input group is formed by concatenating 3 8-bit input groups.
These 24 bits are then treated as 4 concatenated 6-bit groups, each
of which is translated into a single character in the base 64
alphabet.
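To make the quoted grouping concrete, here is a minimal sketch of encoding one full 24-bit group. The helper name encode_group is made up for illustration and is not the code from the previous post; the alphabet is the standard one from RFC 4648.

```rust
/// The standard base 64 alphabet from RFC 4648.
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encodes exactly one full group of three bytes into four characters.
/// `encode_group` is a hypothetical helper used only for this illustration.
fn encode_group(group: [u8; 3]) -> [char; 4] {
    // Concatenate the three 8-bit groups into a single 24-bit value.
    let n = (u32::from(group[0]) << 16) | (u32::from(group[1]) << 8) | u32::from(group[2]);
    // Read the 24 bits as four 6-bit indices, left to right,
    // and look each one up in the alphabet.
    [18, 12, 6, 0].map(|shift| ALPHABET[((n >> shift) & 0x3F) as usize] as char)
}
```

For instance, the three bytes of "Man" come out as the four characters T, W, F, u. Note that the input here is bytes, not characters, which matters for what follows.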
And here is a look at our code from the previous post:
for char in input.chars() {
...
}
See the "3 8-bit input groups" part of the quote? That's where we went wrong. The encoding works on 8-bit groups (bytes), but our loop walks over chars, and we cannot guarantee that a character fits in 1 byte. In UTF-8, only ASCII characters are encoded as a single byte; every other Unicode character takes 2 to 4 bytes. For example, 'pile of poo' 💩 is encoded as 4 bytes (F0 9F 92 A9).
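A quick way to see the difference between characters and bytes is to count both for the same string. The two helpers below (char_and_byte_counts and utf8_hex) are hypothetical names I made up for this sketch; they only use the standard chars, bytes and len methods on str.

```rust
/// Returns (number of chars, number of UTF-8 bytes) for a string.
/// Hypothetical helper, for illustration only.
fn char_and_byte_counts(s: &str) -> (usize, usize) {
    // `len` on a str is the length in bytes, not in characters.
    (s.chars().count(), s.len())
}

/// Formats every UTF-8 byte of the string as two uppercase hex digits.
/// Hypothetical helper, for illustration only.
fn utf8_hex(s: &str) -> String {
    s.bytes()
        .map(|b| format!("{:02X}", b))
        .collect::<Vec<_>>()
        .join(" ")
}
```

With these, char_and_byte_counts("a") gives (1, 1), while char_and_byte_counts("💩") gives (1, 4), and utf8_hex("💩") gives "F0 9F 92 A9". One char, four bytes.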
What is the fix?
It's easy to fix: we just need to modify the code to loop over each byte in the str instead of each char. Luckily for us, Rust has a built-in method, bytes, which creates an Iterator over the bytes of the str. The good thing about Iterators is that they are lazy and won't do anything until they are consumed, so we can still keep our memory usage minimal.
for byte in input.bytes() {
    ...
}
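Since the loop body is elided here, below is one possible way the whole function could look when built around input.bytes(). This is a sketch under my own assumptions (the buffer and pending variables and the padding logic are mine), not necessarily the exact code from the previous post. It assumes base64_encode takes a &str and returns a String.

```rust
/// The standard base 64 alphabet from RFC 4648.
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Looks up the alphabet character for the lowest 6 bits of `bits`.
fn sextet(bits: u32) -> char {
    ALPHABET[(bits & 0x3F) as usize] as char
}

/// Encodes the UTF-8 bytes of `input` as base64 with `=` padding.
fn base64_encode(input: &str) -> String {
    let mut output = String::with_capacity((input.len() + 2) / 3 * 4);
    let mut buffer: u32 = 0; // collects up to 24 bits
    let mut pending = 0; // how many bytes are currently in `buffer`

    // Loop over bytes, not chars, so multi-byte characters are handled.
    for byte in input.bytes() {
        buffer = (buffer << 8) | u32::from(byte);
        pending += 1;
        if pending == 3 {
            // A full 24-bit group: emit four 6-bit groups, left to right.
            for shift in [18, 12, 6, 0] {
                output.push(sextet(buffer >> shift));
            }
            buffer = 0;
            pending = 0;
        }
    }

    // Handle a final group of 1 or 2 bytes: pad with zero bits on the
    // right up to a multiple of 6, then pad the output with '='.
    match pending {
        1 => {
            let n = buffer << 4; // 8 bits -> 12 bits
            output.push(sextet(n >> 6));
            output.push(sextet(n));
            output.push_str("==");
        }
        2 => {
            let n = buffer << 2; // 16 bits -> 18 bits
            output.push(sextet(n >> 12));
            output.push(sextet(n >> 6));
            output.push(sextet(n));
            output.push('=');
        }
        _ => {}
    }
    output
}
```

Because the bytes are pulled one at a time from the iterator, the only extra memory is the small buffer and the output String itself.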
Add more testing
A pile of poo 💩 should now be encoded correctly, which gives us a quick check that multi-byte characters work.
assert_eq!("8J+SqQ==", base64_encode("💩"));