Implement base64 encoding using Rust - [Part 2] Handle unicode characters

#rust #programming

Introduction

In my previous post, I implemented base64 encoding, but it only works for ASCII characters. In this post, we will enhance our base64_encode function to make it support the whole planet =)).

What did we do wrong?

Let's have a look at RFC4648 and find out what we've been missing the whole time.

The encoding process represents 24-bit groups of input bits as output
strings of 4 encoded characters. Proceeding from left to right, a
24-bit input group is formed by concatenating 3 8-bit input groups.
These 24 bits are then treated as 4 concatenated 6-bit groups, each
of which is translated into a single character in the base 64
alphabet.
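To make the quote concrete, here is a small sketch of how three 8-bit groups become four 6-bit groups. The helper name split_group is made up for illustration and is not part of the code from the previous post.

```rust
/// Splits one 24-bit input group (three bytes) into four 6-bit values.
/// `split_group` is a hypothetical helper used only for illustration.
fn split_group(bytes: [u8; 3]) -> [u8; 4] {
    // Concatenate the three 8-bit groups into a single 24-bit number.
    let group = (u32::from(bytes[0]) << 16)
        | (u32::from(bytes[1]) << 8)
        | u32::from(bytes[2]);

    // Read the 24 bits back as four 6-bit groups, left to right.
    [
        ((group >> 18) & 0x3F) as u8,
        ((group >> 12) & 0x3F) as u8,
        ((group >> 6) & 0x3F) as u8,
        (group & 0x3F) as u8,
    ]
}
```

Each of the four values is an index between 0 and 63 into the base 64 alphabet. For the bytes of "Man" (4D 61 6E) the indexes are 19, 22, 5 and 46, which map to "TWFu".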

And here is the loop from the previous post:

    for char in input.chars() {
        ...
    }

See the "8-bit input groups" in the quote? That's where we went wrong. The RFC works on bytes, but our loop iterated over characters, and a character is not guaranteed to fit in 1 byte. Only ASCII characters are encoded as a single byte in UTF-8; other Unicode characters take two to four bytes. For example, the 'pile of poo' 💩 is encoded as 4 bytes (F0 9F 92 A9).
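The difference between characters and bytes is easy to check in Rust itself. The two helpers below are hypothetical names used only for this illustration; they rely on the standard str methods chars, len and bytes.

```rust
/// Returns (number of chars, number of UTF-8 bytes) for a string.
/// Hypothetical helper, for illustration only.
fn char_and_byte_counts(s: &str) -> (usize, usize) {
    // `chars()` yields Unicode scalar values, `len()` is the length in bytes.
    (s.chars().count(), s.len())
}

/// Collects the UTF-8 bytes of a string into a Vec.
/// Hypothetical helper, for illustration only.
fn utf8_bytes(s: &str) -> Vec<u8> {
    s.bytes().collect()
}
```

For "a" the counts are (1, 1), while for "💩" they are (1, 4), and utf8_bytes("💩") returns the four bytes F0 9F 92 A9.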

What is the fix?

The fix is easy: we just need to loop over each byte of the str instead of each character. Luckily, Rust has a built-in method, bytes, which creates an Iterator over the bytes of the str. The good thing about iterators is that they are lazy and do nothing until they are consumed, so we can still keep our memory usage minimal.

    for byte in input.bytes() {
        ...
    }
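Since the loop body is elided here, below is one possible complete version built around input.bytes(). It is a minimal sketch and not necessarily the code from the previous post: the ALPHABET constant, the buffer and pending variables, and the padding logic are my own assumptions, written to follow the standard alphabet and padding rules of RFC 4648.

```rust
// Standard base 64 alphabet from RFC 4648 (the constant name is an assumption).
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

fn base64_encode(input: &str) -> String {
    let mut output = String::with_capacity((input.len() + 2) / 3 * 4);
    let mut buffer: u32 = 0; // accumulates up to 24 bits
    let mut pending = 0; // number of bytes currently held in `buffer`

    for byte in input.bytes() {
        buffer = (buffer << 8) | u32::from(byte);
        pending += 1;
        if pending == 3 {
            // A full 24-bit group: emit four 6-bit groups.
            for shift in [18, 12, 6, 0] {
                output.push(ALPHABET[((buffer >> shift) & 0x3F) as usize] as char);
            }
            buffer = 0;
            pending = 0;
        }
    }

    match pending {
        1 => {
            // 8 bits left: pad with zero bits to 12, emit 2 characters and "==".
            let bits = buffer << 4;
            output.push(ALPHABET[((bits >> 6) & 0x3F) as usize] as char);
            output.push(ALPHABET[(bits & 0x3F) as usize] as char);
            output.push_str("==");
        }
        2 => {
            // 16 bits left: pad with zero bits to 18, emit 3 characters and "=".
            let bits = buffer << 2;
            output.push(ALPHABET[((bits >> 12) & 0x3F) as usize] as char);
            output.push(ALPHABET[((bits >> 6) & 0x3F) as usize] as char);
            output.push(ALPHABET[(bits & 0x3F) as usize] as char);
            output.push('=');
        }
        _ => {} // input length was a multiple of 3, nothing to pad
    }

    output
}
```

Because the loop only ever holds three bytes at a time, it keeps the lazy, low-memory behaviour of the bytes iterator. With this sketch, base64_encode("Man") gives "TWFu" and base64_encode("M") gives "TQ==".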

Add more testing

To check the fix, let's assert that the pile of poo 💩 is encoded correctly.

    assert_eq!("8J+SqQ==", base64_encode("💩"));
