Pierre Jambet

Posted on Aug 17, 2020 • Edited on Sep 2, 2020 • Originally published at redis.pjam.me

Rebuilding Redis in Ruby - Chapter 5 - Redis Protocol Compatibility

#redis #ruby

What we'll cover

By the end of this chapter RedisServer will speak the Redis Protocol, RESP v2. Doing this will allow any client that was written to communicate with the real Redis to also communicate with our own server, provided that the commands it uses are within the small subset we implemented.

One such client is the redis-cli utility that ships with Redis.

RESP v2 has been the protocol used by Redis since version 2.0, to quote the documentation:

1.2 already supported it, but Redis 2.0 was the first version to talk only this protocol)

As of version 6.0, RESP v2 is still the default protocol and is what we'll implement in this chapter.

RESP3

RESP v2 is the default version, but not the latest one. RESP3 was released in 2018. It improves many aspects of RESP v2, such as adding new types for maps (often called dictionaries) and a lot more. The spec is on GitHub and explains the background behind it in detail.
RESP3 is supported as of Redis 6.0, as indicated in the release notes:

Redis now supports a new protocol called RESP3, which returns more semantical replies: new clients using this protocol can understand just from the reply what type to return to the calling program.

The HELLO command can be used to switch the connection to a different protocol version. As we can see below, only two versions are currently supported, 2 & 3. We can also see the new map type in action: hello 2 returned an array with 14 items, representing 7 key/value pairs, whereas hello 3 used the new map type to return a map with 7 key/value pairs.

127.0.0.1:6379> hello 2
 1) "server"
 2) "redis"
 3) "version"
 4) "6.0.6"
 5) "proto"
 6) (integer) 2
 7) "id"
 8) (integer) 6
 9) "mode"
10) "standalone"
11) "role"
12) "master"
13) "modules"
14) (empty array)

127.0.0.1:6379> hello 3
1# "server" => "redis"
2# "version" => "6.0.6"
3# "proto" => (integer) 3
4# "id" => (integer) 6
5# "mode" => "standalone"
6# "role" => "master"
7# "modules" => (empty array)

127.0.0.1:6379> hello 1
(error) NOPROTO unsupported protocol version

127.0.0.1:6379> hello 4
(error) NOPROTO unsupported protocol version

Support for the HELLO command and RESP3 might be added in a later chapter but it's not currently on the roadmap of this online book.

Back to RESP v2

The official specification goes into detail about the protocol and is still reasonably short and approachable, so feel free to read it. Here are the main elements that will drive the changes to our server.

The 5 data types

RESP v2 defines five data types:

Simple Strings
Errors
Integers
Bulk Strings
Arrays

The type of a serialized RESP value is determined by its first byte:

Simple Strings start with +
Errors start with -
Integers start with :
Bulk Strings start with $
Arrays start with *
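
As a minimal sketch, this first-byte lookup can be written as a hash in Ruby. RESP_TYPES and resp_type are names made up for this illustration, they are not part of the server we're building:

```ruby
# Hypothetical helper, for illustration only: maps the first byte of a
# serialized RESP v2 value to the name of its type.
RESP_TYPES = {
  '+' => :simple_string,
  '-' => :error,
  ':' => :integer,
  '$' => :bulk_string,
  '*' => :array,
}.freeze

def resp_type(serialized)
  # fetch yields the missing key to the block when the byte is not a known type
  RESP_TYPES.fetch(serialized[0]) do |byte|
    raise ArgumentError, "Unknown RESP type byte: #{ byte.inspect }"
  end
end

resp_type("+OK\r\n")  # => :simple_string
resp_type(":988\r\n") # => :integer
resp_type("$-1\r\n")  # => :bulk_string
```

Anything that does not start with one of these five bytes, such as "(nil)", raises an ArgumentError in this sketch.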

The data that follows the type byte depends on the type. Let's look at each of them one by one.

Simple Strings

A Simple String cannot contain a new line. One of its main use cases is to return OK back to the client. The full format of a Simple String is: a + character, followed directly by the content of the string, followed by a carriage return (often written as CR or \r) and a line feed (often written as LF or \n).

This is why Simple Strings cannot contain multiple lines: a newline would create confusion given that it is also used as a delimiter.

The "OK" string, here shown in its JSON form, returned by the SET command upon success is therefore serialized as +OK\r\n.

redis-cli does the work of detecting the type of the response and only shows us the actual string, OK, as we can see in the example below:

127.0.0.1:6379> SET 1 2
OK

Using nc, we can see what the full response sent back from Redis is:

> nc -v localhost 6379
SET 1 2
+OK

nc does not explicitly display invisible characters such as CR & LF, so it is hard to know for sure that they were returned, besides the newline printed after +OK. The hexdump command is useful here, as it allows us to see all the bytes:

echo "SET 1 2" | nc localhost 6379 | hexdump -C
# ...
00000000  2b 4f 4b 0d 0a                                    |+OK..|
00000005

The interesting part is the middle one, 2b 4f 4b 0d 0a. These are the 5 bytes returned by Redis. The part to the right, between pipe characters (|), is their ASCII representation. We can see five characters there: + is the ASCII representation of 2b, O is for 4f, K is for 4b, and the last two bytes do not have a visual representation so each of them is displayed as a dot (.).

2b is the hex notation of 43 ('2b'.to_i(16) in irb), and 43 maps to + in the ASCII table. 4f is the equivalent of 79, which maps to the capital letter O, and 4b is the equivalent of 75, which maps to the capital letter K.

0d is the equivalent of the number 13, and the carriage return character (CR), and finally, 0a is 10, the line feed character (LF).
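
If you'd rather not go through an ASCII table by hand, the same conversions can be done from Ruby. This is only a small sketch, and the variable names are arbitrary:

```ruby
response = "+OK\r\n"

# Each byte as a two-digit hexadecimal number, like the middle column of hexdump -C
hex_bytes = response.bytes.map { |byte| byte.to_s(16).rjust(2, '0') }
puts hex_bytes.join(' ') # prints 2b 4f 4b 0d 0a

# And the other way around, from the hex notation back to the number and the character
puts '2b'.to_i(16)     # prints 43
puts '2b'.to_i(16).chr # prints +
puts '4b'.hex.chr      # prints K
```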

Redis follows the Redis Protocol, that's a good start!

Errors

Errors are very similar to Simple Strings, and they also cannot contain new line characters. The main difference is that clients should treat them as errors instead of successful results. In languages with exceptions, a client library might decide to throw an exception when receiving an error from Redis. This is what the official Ruby library does.

Similarly to Simple Strings, errors end with a carriage return and a line feed, let's see it in action:

❯ echo "GET 1 2" | nc localhost 6379 | hexdump -C
00000000  2d 45 52 52 20 77 72 6f  6e 67 20 6e 75 6d 62 65  |-ERR wrong numbe|
00000010  72 20 6f 66 20 61 72 67  75 6d 65 6e 74 73 20 66  |r of arguments f|
00000020  6f 72 20 27 67 65 74 27  20 63 6f 6d 6d 61 6e 64  |or 'get' command|
00000030  0d 0a                                             |..|
00000032

There are more bytes here. They represent the string "ERR wrong number of arguments for 'get' command", and we can see that the response starts with the 2d byte. Looking at the ASCII table, we can see that 45, the numeric equivalent of 2d, maps to -, so far so good.

And finally, the response ends with 0d0a, respectively CR & LF.

Integers

Integers have a similar representation to Simple Strings and errors. The actual integer comes after the : character and is followed by the CR & LF characters.

The TTL and PTTL commands are examples of commands that reply with an integer.

The key key-with-ttl was set with the command: SET key-with-ttl value EX 1000.

> echo "TTL key-with-ttl" | nc localhost 6379 | hexdump -C
# ...
00000000  3a 39 38 38 0d 0a                                 |:988..|
00000006

The key not-a-key does not exist.

> echo "TTL not-a-key" | nc localhost 6379 | hexdump -C
# ...
00000000  3a 2d 32 0d 0a                                    |:-2..|
00000005

The key key-without-ttl was set with the command: SET key-without-ttl value.

> echo "TTL key-without-ttl" | nc localhost 6379 | hexdump -C
# ...
00000000  3a 2d 31 0d 0a                                    |:-1..|
00000005

All of these responses start with the 3a byte, which is equivalent to 58, aka :. In the two cases where the response is a negative value, -2 for a non existent key and -1 for an existing key without a ttl, the next byte is 2d, equivalent to 45, aka -.

The rest of the data, before the 0d & 0a bytes, is the actual integer data, in ASCII format, 31 is the hex equivalent to 49, which is the character 1, 32 is the hex equivalent to 50, which is the character 2. 39 & 38 are respectively the hex equivalent to 57 & 56, the characters 9 & 8.

A ruby client parsing this data would extract the string between : and \r\n and call to_i on it: '988'.to_i == 988.
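
Here is a minimal sketch of what this extraction could look like. parse_resp_integer is a hypothetical helper written for this example, it is not taken from an actual client library:

```ruby
# Hypothetical helper: extracts the integer from a serialized RESP Integer
def parse_resp_integer(serialized)
  # A colon, an optional minus sign, one or more digits, and the final CRLF
  match = serialized.match(/\A:(-?\d+)\r\n\z/)
  raise ArgumentError, "Not a RESP integer: #{ serialized.inspect }" if match.nil?

  match[1].to_i
end

parse_resp_integer(":988\r\n") # => 988
parse_resp_integer(":-2\r\n")  # => -2
parse_resp_integer(":-1\r\n")  # => -1
```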

Bulk Strings

In order to work for any strings, Bulk Strings need to first declare their length, and only then the actual data. This lets the receiver know how many bytes to expect, instead of reading anything until it finds CRLF, the way it does for a Simple String.

The length of the string is sent directly after the dollar sign, and is delimited by CRLF, the following is the actual string data, and another CRLF to end the string.

The RESP Bulk String representation of the JSON string "GET" is: $3\r\nGET\r\n.

Interestingly, it seems like Redis does not care that much about the final CRLF. As long as it finds two characters there, it assumes it's the end of the Bulk String and tries to process what comes after.

In the following example, we first send the command GET a to Redis over port 6379, as an array of Bulk Strings, followed by the non-existent command NOT A COMMAND. The response first contains $-1\r\n, a null Bulk String described at the end of this section, followed by the error.

irb(main):001:0> require 'socket'
=> true
irb(main):002:0> socket = TCPSocket.new 'localhost', 6379
irb(main):004:0> socket.write("*2\r\n$3\r\nGET\r\n$1\r\na\r\n*1\r\n$13\r\nNOT A COMMAND\r\n")
=> 35
irb(main):005:0> socket.read_nonblock(1024, exception: false)
=> "$-1\r\n-ERR unknown command `NOT`, with args beginning with: `A`, `COMMAND`, \r\n"

The following is handled identically by Redis, despite the fact that the Bulk String is not terminated by CRLF. We can see that Redis ignored the b and c characters and proceeded with the following command, the non-existent NOT A COMMAND. I am assuming that the code in charge of reading client input first reads the length, then grabs that many bytes and jumps by two characters, regardless of what these characters are.

irb(main):027:0> socket.write("*2\r\n$3\r\nGET\r\n$1\r\nabc*1\r\n$13\r\nNOT A COMMAND\r\n")
=> 35
irb(main):030:0> socket.read_nonblock(1024, exception: false)
=> "$-1\r\n-ERR unknown command `NOT`, with args beginning with: `A`, `COMMAND`, \r\n"

There's a special value for Bulk Strings, the null Bulk String. It is commonly returned when a Bulk String would otherwise be expected, but there was no value to return. This happens in many cases, such as when there are no values for the key passed to the GET command. RESP represents it as a string with a length of -1: $-1\r\n.
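
To make the format concrete, here is a small sketch of a Bulk String serializer. to_bulk_string is a hypothetical name used for this example only, and the sketch uses nil to stand for the null Bulk String. It relies on String#bytesize because the declared length is a number of bytes:

```ruby
# Hypothetical helper: serializes a Ruby string, or nil, as a RESP Bulk String
def to_bulk_string(string)
  # nil stands for "no value" and becomes the null Bulk String
  return "$-1\r\n" if string.nil?

  "$#{ string.bytesize }\r\n#{ string }\r\n"
end

to_bulk_string('GET')      # => "$3\r\nGET\r\n"
to_bulk_string("foo\nbar") # => "$7\r\nfoo\nbar\r\n"
to_bulk_string('')         # => "$0\r\n\r\n"
to_bulk_string(nil)        # => "$-1\r\n"
```

Note that the empty string and the null Bulk String are two different values, with two different representations.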

Arrays

Arrays can contain values of any types, including other nested arrays. Similarly to Bulk Strings, arrays must first declare their lengths, followed by CRLF, and all items come afterwards, in their regular serialized form. The following is a JSON representation of an arbitrary array:

[ 1, "a-string", [ "another-string-in-a-nested-array" ], "a-string-with\r\n-newlines" ]

The following is the RESP representation of the same array:

*4\r\n:1\r\n$8\r\na-string\r\n*1\r\n$32\r\nanother-string-in-a-nested-array\r\n$24\r\na-string-with\r\n-newlines\r\n

We can include newlines and indentation for the sake of readability:

*4\r\n
  :1\r\n
  $8\r\na-string\r\n
  *1\r\n
    $32\r\nanother-string-in-a-nested-array\r\n
  $24\r\na-string-with\r\n-newlines\r\n

RESP has a special notation for the NULL array: *-1\r\n. The existence of two different NULL values, one for Bulk Strings and one for Arrays, is confusing and is one of the many things RESP3 changes. RESP3 has a single null value.
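
The nested array example above can be reproduced with a short recursive method. This is only a sketch: to_resp is a hypothetical name, and it makes the arbitrary choices of serializing every Ruby string as a Bulk String and nil as the null Bulk String:

```ruby
# Hypothetical helper: serializes integers, strings, nil and (nested) arrays to RESP v2
def to_resp(value)
  case value
  when Integer then ":#{ value }\r\n"
  when String  then "$#{ value.bytesize }\r\n#{ value }\r\n"
  when Array   then "*#{ value.length }\r\n#{ value.map { |item| to_resp(item) }.join }"
  when nil     then "$-1\r\n"
  else raise ArgumentError, "Cannot serialize #{ value.class }"
  end
end

array = [ 1, 'a-string', [ 'another-string-in-a-nested-array' ], "a-string-with\r\n-newlines" ]
puts to_resp(array).inspect
```

The array branch only writes the number of items and then delegates to to_resp for each of them, which is what makes nesting work without any special handling.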

Requests & Responses

As we saw in a previous example, requests are sent as arrays of Bulk Strings. The command GET a-key should be sent as *2\r\n$3\r\nGET\r\n$5\r\na-key\r\n, or in plain English: "An array of length 2, where the first string is of length 3 and is GET and the second string is of length 5 and is a-key".

We can illustrate this by sending this string with the TCPSocket class in ruby:

irb(main):001:0> require 'socket'
=> true
irb(main):002:0> socket = TCPSocket.new 'localhost', 6379
irb(main):003:0> socket.write "*2\r\n$3\r\nGET\r\n$5\r\na-key\r\n"
=> 24
irb(main):004:0> socket.read_nonblock 1024
=> "$-1\r\n"
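
Writing these strings by hand gets tedious quickly. As a sketch, a client could build them with a small method like the one below. encode_command is a hypothetical name, not the API of an existing library:

```ruby
# Hypothetical helper: encodes a command and its arguments as a RESP array of Bulk Strings
def encode_command(*parts)
  bulk_strings = parts.map do |part|
    string = part.to_s # so that integers such as 1 can be passed directly
    "$#{ string.bytesize }\r\n#{ string }\r\n"
  end
  "*#{ parts.length }\r\n#{ bulk_strings.join }"
end

encode_command('GET', 'a-key') # => "*2\r\n$3\r\nGET\r\n$5\r\na-key\r\n"
encode_command('SET', 1, 2)    # => "*3\r\n$3\r\nSET\r\n$1\r\n1\r\n$1\r\n2\r\n"
```

The result can be passed as is to socket.write.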

Inline Protocol

RESP's main mode of operation follows the request/response model described above. It also supports a simpler alternative, called "Inline Commands", which is useful for manual tests or interactions with a server. This is similar to how we've used nc in this book so far.

Anything that does not start with a * character (the first character of an array, the format Redis expects for a command) is treated as an inline command. Redis will read everything until a newline is detected and attempt to parse that as a command. This is essentially what we've been doing so far when implementing the RedisServer class.

Let's try this quickly with nc:

> nc localhost 6379
# ...
SET 1 2
+OK
GET 1
$1
2

The reason RESP's main mode of operation is more complicated is that inline commands are severely limited. It is impossible to store a key or a value that contains the carriage return and line feed characters since they're used as delimiters, even though Redis does support any string as a key or a value, as seen in the following example:

> redis-cli
127.0.0.1:6379> SET a-key "foo\nbar"
OK
127.0.0.1:6379> GET a-key
"foo\nbar"

Let's double check with nc to see what Redis stored:

> nc localhost 6379
# ...
GET a-key
$7
foo
bar

We could also use hexdump to triple check:

> echo "GET a-key" | nc localhost 6379 | hexdump -C
# ...
00000000  24 37 0d 0a 66 6f 6f 0a  62 61 72 0d 0a           |$7..foo.bar..|
0000000d

We can see the 0a byte between o/6f & b/62.

Without inline commands sending test commands would be excruciating:

> nc -c localhost 6379
*2
$3
GET
$1
a
$1
1

Note that we're using the -c flag, which tells nc to send CRLF characters when we press the return key, instead of the default of LF. As we've seen above, RESP expects CRLF delimiters for arrays.

Pub/Sub

Redis supports a Publish/Subscribe messaging paradigm, with the SUBSCRIBE, UNSUBSCRIBE & PUBLISH commands, documented on the Pub/Sub page of the official documentation.

These commands have a significant impact on how data flows between clients and servers, and given that we have not yet added support for pub/sub, we will ignore its impact on our implementation of the Redis Protocol for now. Future chapters will add support for pub/sub and will follow the RESP specification.

Pipelining

RESP clients can send multiple requests at once and the RESP server will write multiple responses back. This is called pipelining. The only constraint is that commands must be processed in the order they were received, so that clients can associate each response with its request.

The following is an example of sending two commands at once and then reading the two responses, in Ruby:

irb(main):001:0> require 'socket'
=> true
irb(main):002:0> socket = TCPSocket.new 'localhost', 6379
irb(main):003:0> socket.write "SET 1 2\r\nGET 1\r\n"
=> 16
irb(main):004:0> socket.read_nonblock 1024
=> "+OK\r\n$1\r\n2\r\n"

We first wrote the string "SET 1 2\r\nGET 1\r\n", which represents the command SET 1 2 and the command GET 1 in the inline format.

The response we get from the server is a string containing the two responses, first the Simple String +OK\r\n, followed by the Bulk String $1\r\n2\r\n.
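
A client has to split this single string back into individual responses, in order. The following is a rough sketch of how that could be done with StringScanner. read_replies is a hypothetical helper, it only supports Simple Strings, Errors, Integers and Bulk Strings, and it assumes that the string contains complete replies:

```ruby
require 'strscan'

# Hypothetical helper: splits a string containing several RESP replies into Ruby values.
# Arrays are not supported, and incomplete input is treated as an error.
def read_replies(raw)
  scanner = StringScanner.new(raw)
  replies = []
  until scanner.eos?
    type = scanner.getch
    line = scanner.scan_until(/\r\n/)
    raise ArgumentError, 'Incomplete reply' if line.nil?

    line = line.chomp("\r\n")
    case type
    when '+' then replies << line
    when '-' then replies << RuntimeError.new(line)
    when ':' then replies << Integer(line)
    when '$'
      length = Integer(line)
      if length == -1
        replies << nil # the null Bulk String
      else
        replies << scanner.peek(length)
        scanner.pos += length + 2 # skip the data and the trailing CRLF
      end
    else
      raise ArgumentError, "Unsupported type byte: #{ type.inspect }"
    end
  end
  replies
end

read_replies("+OK\r\n$1\r\n2\r\n") # => ["OK", "2"]
```

Since responses come back in the same order as the requests, the first element is the response to SET 1 2 and the second one is the response to GET 1.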

Making our Server speak RESP

As far as I know there is no official test suite that we could run our server against to validate that it correctly follows RESP. What we can do instead is rely on redis-cli as a way to test the RESP implementation of our server. Let's see what happens when we try it with the current server. First let's start the server from Chapter 4:

DEBUG=t ruby -r"./server" -e "RedisServer.new"

and in another shell, let's open redis-cli on port 2000:

> redis-cli -p 2000

You should see the following in the server logs:

D, [2020-08-12T16:11:42.461645 #91271] DEBUG -- : Received command: *1
D, [2020-08-12T16:11:42.461688 #91271] DEBUG -- : Response: (error) ERR unknown command `*1`, with args beginning with:
D, [2020-08-12T16:11:42.461925 #91271] DEBUG -- : Received command: $7
D, [2020-08-12T16:11:42.461960 #91271] DEBUG -- : Response: (error) ERR unknown command `$7`, with args beginning with:
D, [2020-08-12T16:11:42.462005 #91271] DEBUG -- : Received command: COMMAND
D, [2020-08-12T16:11:42.462036 #91271] DEBUG -- : Response: (error) ERR unknown command `COMMAND`, with args beginning with:

The server received the string "*1\r\n$7\r\nCOMMAND\r\n", which is the RESP representation of the string "COMMAND" in a single item array, [ "COMMAND" ] in JSON.

The COMMAND command is useful when running Redis in a cluster. Given that we have not yet implemented cluster capabilities, going into detail about the COMMAND command is a little bit out of scope. In short, the COMMAND command provides meta information about each command, such as the positions of the keys. This is useful because in cluster mode, clients have to route requests to the different nodes in the cluster. It is common for a command to have the key as the second element, the one coming directly after the command itself. This happens to be the case for all the commands we've implemented so far. But some commands have different semantics. For instance MSET can contain multiple keys, so clients need to know where the keys are in the command. While rare, some commands have the first key at a different index. This is the case for the OBJECT command.

Back to redis-cli running against our Redis server, if you then try to send a command, GET 1 for instance, redis-cli will crash after printing the following error:

Error: Protocol error, got "(" as reply type byte

This is because our server writes the string (nil) when it does not find an entry for the given key. (nil) is what redis-cli displays when it receives a null Bulk String, as we can see in the following example. We first send the GET 1 command with nc and then with redis-cli, and observe the response in each case:

❯ nc -c localhost 6379
GET 1
$-1
# ...
> redis-cli
127.0.0.1:6379> GET 1
(nil)

Our server must send the null Bulk String, $-1\r\n, to follow RESP. This is what redis-cli tells us before stopping, it expected a "type byte", one of +, -, :, $ or *, but instead got (.

In order to use redis-cli against our own server, we should implement the COMMAND command, since it sends it directly after starting. We also need to change how we process client input, to parse RESP arrays of Bulk Strings. We also need to support inline commands. Finally, we also need to update the responses we write back, and serialize responses following RESP.

Let's get to it!

Parsing Client Input

Modules & Namespaces

Most of the changes will take place in server.rb. As the codebase started to grow, I thought it would be easier to start using Ruby modules, so I nested the Server class under the BYORedis namespace. This will allow us to create other classes & modules under the BYORedis namespace as well. All the other classes have been updated to be under the same namespace, e.g. ExpireHelper is now BYORedis::ExpireHelper. BYO stands for Build Your Own. I'm purposefully not using Redis as it is already used by the popular redis gem. We're not using both at the same time in the same project for now, so it wouldn't really have been a problem. But say that you would like to use the redis gem to communicate with the server we're building: using different names prevents any kind of unexpected errors.

# expire_helper.rb
module BYORedis
  module ExpireHelper

    def self.check_if_expired(data_store, expires, key)
      # ...
    end
  end
end

listing 5.1: Nesting ExpireHelper under the BYORedis module

Storing partial client buffer

As of the previous chapter we never stored the client input. We would read from the socket when IO.select would tell us there is something to read, read until the end of the line, and process the result as a command.

It turns out that this approach is a bit too aggressive. Clients should be able to send a single command in two parts, there's no reason to treat that as an error.

In order to do this, we are going to create a Client struct to hold the client socket as well as a string containing all the pending input we have not processed yet:

# server.rb
Client = Struct.new(:socket, :buffer) do
  def initialize(socket)
    self.socket = socket
    self.buffer = ''
  end
end

listing 5.2: The new Client class

We need to adapt process_poll_events to use this new class instead of the raw socket coming as a result of TCPServer#accept:

# server.rb
def process_poll_events(sockets)
  sockets.each do |socket|
    begin
      if socket.is_a?(TCPServer)
        @clients << Client.new(@server.accept)
      elsif socket.is_a?(TCPSocket)
        client = @clients.find { |client| client.socket == socket }
        client_command_with_args = socket.read_nonblock(1024, exception: false)
        if client_command_with_args.nil?
          @clients.delete(client)
          socket.close
        elsif client_command_with_args == :wait_readable
          # ...
        else
          # We now need to parse the input as a RESP array
          # ...
        end
      else
        # ...
      end
    rescue Errno::ECONNRESET
      @clients.delete_if { |client| client.socket == socket }
    end
  end
end

listing 5.3: Updated handling of socket in server.rb

Parsing commands as RESP Arrays

More things need to change in process_poll_events. We first append the result from read_nonblock to client.buffer, which will allow us to continue appending until we accumulate enough to read a whole command. We then delegate the processing of client.buffer to a different method, split_commands:

# server.rb
def process_poll_events(sockets)
  sockets.each do |socket|
    begin
      # ...
      elsif socket.is_a?(TCPSocket)
        # ...
        else
          client.buffer += client_command_with_args
          split_commands(client.buffer) do |command_parts|
            response = handle_client_command(command_parts)
            @logger.debug "Response: #{ response.class } / #{ response.inspect }"
            @logger.debug "Writing: '#{ response.serialize.inspect }'"
            socket.write response.serialize
          end
        end
      else
        # ...
      end
      # ...
    end
  end
end

def split_commands(client_buffer)
  @logger.debug "Full result from read: '#{ client_buffer.inspect }'"

  scanner = StringScanner.new(client_buffer.dup)
  until scanner.eos?
    # scanner.charpos is relative to the whole input, so remember where this command starts
    command_start = scanner.charpos
    if scanner.peek(1) == '*'
      yield parse_as_resp_array(scanner)
    else
      yield parse_as_inline_command(scanner)
    end
    # Only remove the command we just processed from the client buffer
    client_buffer.slice!(0, scanner.charpos - command_start)
  end
end
#...

listing 5.4 Updated handling of client input in server.rb

split_commands is in charge of splitting the client input into multiple commands, which is necessary to support pipelining. As a reminder, since we're adding support for pipelining, we have to assume that the content of client.buffer might contain more than one command, and if so, we want to process them all in the order we received them, and write the responses back, in the same order.

It also handles the two different versions of commands, inline, or "regular", as RESP Arrays. We use the StringScanner class, which is really convenient to process data from a string, from left to right. We call String#dup on the argument to StringScanner to make sure that the StringScanner gets its own instance. As we iterate through client.buffer, every time we find a whole command, we want to remove it from the client input. We do this with client_buffer.slice!(0, scanner.charpos - command_start). scanner.charpos is relative to the whole input the scanner was created with, whereas client_buffer shrinks every time we remove a command from it, so we subtract the position at which the command started to only remove the command we just processed. If client_buffer contains two commands, e.g. GET a\r\nGET b\r\n, once we processed GET a, we want to remove the first 7 characters from the string: GET a\r\n, so that we never attempt to process them again. Note that we only do this after yielding, meaning that we only ever treat a command as done after we successfully wrote to the socket.

We first peek at the first character, if it is *, the following should be a RESP array, and we process it as such. Otherwise, we assume that we're dealing with an inline command. Each branch delegates to a method handling the parsing of the string.

The yield approach allows us to process each parsed command one by one, once parsed, we yield it, and it is handled by the handle_client_command method, which has barely changed from the previous chapter.
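
To see why keeping the unprocessed input around matters, here is a simplified, standalone sketch of the same buffering idea, limited to inline commands. extract_inline_commands is a hypothetical method written for this illustration, it is not part of server.rb:

```ruby
require 'strscan'

# Hypothetical, simplified version of the buffering logic, limited to inline commands.
# Returns the complete commands found in buffer and removes them from it,
# leaving any trailing partial command in place for the next read.
def extract_inline_commands(buffer)
  commands = []
  scanner = StringScanner.new(buffer.dup)
  # scan_until returns nil, without moving the cursor, when there is no CRLF left
  while (line = scanner.scan_until(/\r\n/))
    commands << line.split
  end
  buffer.slice!(0, scanner.charpos)
  commands
end

buffer = +''
buffer << "GET a\r\nGE"                  # the second command was only partially received
p extract_inline_commands(buffer)        # prints [["GET", "a"]]
p buffer                                 # prints "GE"
buffer << "T b\r\n"                      # the rest of the second command arrives
p extract_inline_commands(buffer)        # prints [["GET", "b"]]
p buffer                                 # prints ""
```

The partial GE stays in the buffer after the first call, and becomes a complete command once the rest of the input arrives.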

Let's look at the parse_as_resp_array & parse_as_inline_command methods:

def parse_as_inline_command(scanner)
  command = scanner.scan_until(/(\r\n|\r|\n)+/)
  raise IncompleteCommand if command.nil?

  command.split.map(&:strip)
end

def parse_as_resp_array(scanner)
  unless scanner.getch == '*'
    raise 'Unexpectedly attempted to parse a non array as an array'
  end

  expected_length = scanner.scan_until(/\r\n/)
  raise IncompleteCommand if expected_length.nil?

  expected_length = parse_integer(expected_length, 'invalid multibulk length')
  command_parts = []

  expected_length.times do
    raise IncompleteCommand if scanner.eos?

    parsed_value = parse_as_resp_bulk_string(scanner)
    raise IncompleteCommand if parsed_value.nil?

    command_parts << parsed_value
  end

  command_parts
end

def parse_integer(integer_str, error_message)
  begin
    value = Integer(integer_str)
    if value < 0
      raise ProtocolError, "ERR Protocol error: #{ error_message }"
    else
      value
    end
  rescue ArgumentError
    raise ProtocolError, "ERR Protocol error: #{ error_message }"
  end
end

listing 5.5 Parsing RESP Arrays in server.rb

parse_as_inline_command starts by calling StringScanner#scan_until, with /(\r\n|\r|\n)+/. scan_until keeps iterating through the string until it encounters something that matches its argument. In our case it will keep going through the client input until it finds a line ending, either CRLF, CR or LF. If it doesn't find a match, it returns nil. We're not even trying to process the string in this case: it is incomplete, so we raise IncompleteCommand, leave the input in the buffer, and reattempt later on, the next time we read from this client.

If the string returned is not nil, it contains the string, and in this case, we do what we used to: we split it on spaces, and return it as an array of string parts, e.g. GET 1\r\n would be returned as [ 'GET', '1' ].

parse_as_resp_array is more complicated. As a sanity check, we test again that the first character is indeed *. getch also moves the internal cursor of StringScanner, moving it to the first character of the expected length. Using scan_until we extract all the characters until the first CRLF characters in the client input.

If nil is returned, this means that we reached the end of the string without encountering CR & LF, and instead of treating this as a client error, we raise an IncompleteCommand error, to give the client a chance to write the missing parts of the command later on.

expected_length will contain a string composed of the characters before CRLF & the CRLF characters. For instance, if the scanner was created with the string $3\r\nabc\r\n (the Bulk String representation of the string "abc"), expected_length would be equal to "3\r\n" once the type character has been consumed. The Ruby String#to_i is not strict enough here. It returns 0 in a lot of cases where we'd want an error instead, such as "abc".to_i == 0. We instead use the Kernel#Integer method, which raises an ArgumentError exception with invalid strings. We catch ArgumentError and raise a ProtocolError instead.
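
The difference between the two is easy to observe in isolation. In the sketch below, strict_integer is a hypothetical helper that only exists to show the exception without stopping the script:

```ruby
# String#to_i never fails, it returns 0 or whatever leading digits it could find
p '3'.to_i    # prints 3
p 'abc'.to_i  # prints 0
p '3abc'.to_i # prints 3

# Kernel#Integer is strict, but still accepts surrounding whitespace such as CRLF
p Integer("3\r\n") # prints 3

# Hypothetical helper, only here to show the exception without crashing the script
def strict_integer(string)
  Integer(string)
rescue ArgumentError => e
  "ArgumentError: #{ e.message }"
end

puts strict_integer('abc')
puts strict_integer('3abc')
```

The fact that Integer tolerates the trailing CRLF is what allows us to pass the result of scan_until to it directly, without stripping the delimiter first.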

In the next step we iterate as many times as the value of expected_length with expected_length.times. We start each iteration by checking if we reached the end of the string with eos?. If we did, then instead of returning a protocol error, we raise an IncompleteCommand exception. This gives a chance to the client to send the remaining elements of the array later on.

As mentioned above, a request to Redis is always an array of Bulk Strings, so we attempt to parse all the elements as strings, by calling parse_as_resp_bulk_string with the same scanner instance. Before looking at the method, let's see how the two new exceptions IncompleteCommand & ProtocolError are defined and handled.

IncompleteCommand & ProtocolError are custom exceptions defined at the top of the file:

# server.rb
IncompleteCommand = Class.new(StandardError)
ProtocolError = Class.new(StandardError) do
  def serialize
    RESPError.new(message).serialize
  end
end

listing 5.6 The new exceptions in server.rb

RESPError is defined in resp_types.rb:

# resp_types.rb
module BYORedis
  RESPError = Struct.new(:message) do
    def serialize
      "-#{ message }\r\n"
    end
  end
  # ...
end

listing 5.7 The new RESPError class

They are handled in the begin/rescue block in process_poll_events:

# server.rb
begin
  # ...
rescue Errno::ECONNRESET
  @clients.delete_if { |client| client.socket == socket }
rescue IncompleteCommand
  # Not clearing the buffer or anything
  next
rescue ProtocolError => e
  socket.write e.serialize
  socket.close
  @clients.delete(client)
end

listing 5.8 Handling the new exceptions in server.rb

We don't write anything back when encountering an IncompleteCommand exception, we assume that the client has not finished sending the command. On the other hand, for ProtocolError, we write an error back to the client, following the format of a RESP error and we disconnect the client. This is what Redis does too.

Back to parse_as_resp_bulk_string:

# server.rb
def parse_as_resp_bulk_string(scanner)
  type_char = scanner.getch
  unless type_char == '$'
    raise ProtocolError, "ERR Protocol error: expected '$', got '#{ type_char }'"
  end

  expected_length = scanner.scan_until(/\r\n/)
  raise IncompleteCommand if expected_length.nil?

  expected_length = parse_integer(expected_length, 'invalid bulk length')
  bulk_string = scanner.rest.slice(0, expected_length)

  raise IncompleteCommand if bulk_string.nil? || bulk_string.length != expected_length

  scanner.pos += bulk_string.bytesize + 2
  bulk_string
end

listing 5.9 Parsing Bulk Strings

The first step is calling StringScanner#getch, it moves the internal cursor of the scanner by one character and returns it. If the first character is $, we received a Bulk String as expected. Anything else is an error.

Redis accepts empty strings. While it may be unusual, it is possible for a Redis key to be an empty string, and a value can also be an empty string, so a length of 0 is valid. If the expected length is negative, parse_integer raises a ProtocolError.

The next step is extracting the actual string. StringScanner maintains an internal cursor of the progress through the string. At this point this cursor is right after CRLF, where the string content starts. StringScanner#rest returns the string from this cursor until the end, and using slice, we extract only the number of characters indicated by expected_length.

If the result of this operation is nil or shorter than the expected length, we don't want to treat it as an error yet, since it is possible for the clients to write the missing elements of the command, so we raise an IncompleteCommand, in the hope that the client will send the missing parts later on.

The final step is to advance the cursor position in the StringScanner instance. We do this with the StringScanner#pos= method. Notice how we use the bytesize method and add two to it. We use bytesize instead of length to handle characters that span over multiple bytes, such as CJK characters, accented characters, emojis and many others. Let's look at the difference in irb:

irb(main):045:1* def print_length_and_bytesize(str)
irb(main):046:1*   puts str.length
irb(main):047:1*   puts str.bytesize
irb(main):048:0> end
=> :print_length_and_bytesize
irb(main):049:0> print_length_and_bytesize('a')
1
1
=> nil
irb(main):050:0> print_length_and_bytesize('é')
1
2
=> nil
irb(main):051:0> print_length_and_bytesize('你')
1
3
=> nil
irb(main):058:0> print_length_and_bytesize('😬')
1
4
=> nil

As we can see, all of these strings return 1 for length, but different values, respectively 1, 2, 3 & 4, for bytesize. Going into details about UTF-8 encoding is out of scope, but the main takeaway from this is that what we consider to be a single character might span over multiple bytes.

If a client had sent 你 as a Bulk String, we'd expect it to pass the length as 3, and therefore we need to advance the cursor by 3 in the StringScanner instance. We also add two to account for the trailing CRLF characters. Note that, like Redis, we do not actually check that these two characters are indeed CR & LF, we just skip over them.
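
The following standalone sketch walks through this cursor arithmetic for a UTF-8 string containing two Bulk Strings. It is not the code from listing 5.9: it extracts the data with String#byteslice, which counts in bytes, and the variable names are arbitrary:

```ruby
require 'strscan'

# "\u4F60" is the character 你, which takes 3 bytes in UTF-8
payload = "$3\r\n\u4F60\r\n$1\r\na\r\n"
scanner = StringScanner.new(payload)

scanner.getch                                # consumes the leading '$'
length = Integer(scanner.scan_until(/\r\n/)) # 3, the declared length in bytes
value = scanner.rest.byteslice(0, length)    # the 3 bytes of data

p value.length   # prints 1
p value.bytesize # prints 3

# Advance by the number of bytes, plus two for the trailing CRLF
scanner.pos += value.bytesize + 2
p scanner.rest   # prints "$1\r\na\r\n"
```

Advancing by value.length + 2 instead, that is by 3 bytes, would leave the cursor in the middle of the data, and the next Bulk String would not be found where we expect it.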

Updating the command responses

The commands we've implemented so far, GET, SET, TTL & PTTL do not return data that follows the format defined in RESP. GET needs to return Bulk Strings, SET returns the Simple String OK or the null Bulk String if it didn't set the value and the last two, TTL & PTTL, return integers. We will first create new classes to wrap the process of serializing strings and integers to their matching RESP format:

# resp_types.rb
module BYORedis
  # ...
  RESPInteger = Struct.new(:underlying_integer) do
    def serialize
      ":#{ underlying_integer }\r\n"
    end

    def to_i
      underlying_integer.to_i
    end
  end

  RESPSimpleString = Struct.new(:underlying_string) do
    def serialize
      "+#{ underlying_string }\r\n"
    end
  end

  OKSimpleStringInstance = Object.new.tap do |obj|
    OK_SIMPLE_STRING = "+OK\r\n".freeze
    def obj.serialize
      OK_SIMPLE_STRING
    end
  end

  RESPBulkString = Struct.new(:underlying_string) do
    def serialize
      "$#{ underlying_string.bytesize }\r\n#{ underlying_string }\r\n"
    end
  end

  NullBulkStringInstance = Object.new.tap do |obj|
    NULL_BULK_STRING = "$-1\r\n".freeze
    def obj.serialize
      NULL_BULK_STRING
    end
  end

  RESPArray = Struct.new(:underlying_array) do
    def serialize
      serialized_items = underlying_array.map do |item|
        case item
        when RESPSimpleString, RESPBulkString
          item.serialize
        when String
          RESPBulkString.new(item).serialize
        when Integer
          RESPInteger.new(item).serialize
        when Array
          RESPArray.new(item).serialize
        end
      end
      "*#{ underlying_array.length }\r\n#{ serialized_items.join }"
    end
  end
  NullArrayInstance = Object.new.tap do |obj|
    NULL_ARRAY = "*-1\r\n".freeze
    def obj.serialize
      NULL_ARRAY
    end
  end
end

listing 5.10 The new RESP types

RESPArray is not strictly required at the moment since none of the commands we've implemented so far return array responses, but the COMMAND command, which we'll implement below, returns an array, so it'll be useful there.

We could have chosen a few different options to represent the null array and the null Bulk String, such as adding the logic in the serialize methods of RESPArray & RESPBulkString. I instead decided to create two globally available instances that implement the same interface, the serialize method. This allows the code in server.rb to always call serialize on the result it gets from calling the call method. On the other hand, in the *Command classes, it forces us to explicitly handle these null cases, which I find preferable to passing nil values around.
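A small sketch of what this buys us, using trimmed-down copies of the types above. The BulkString and NullBulkString names, as well as the lookup and reply_for helpers, are made up for this example and are not part of the chapter's code:

```ruby
# Trimmed-down copies of the chapter's types, renamed for this sketch
BulkString = Struct.new(:underlying_string) do
  def serialize
    "$#{ underlying_string.bytesize }\r\n#{ underlying_string }\r\n"
  end
end

NullBulkString = Object.new.tap do |obj|
  def obj.serialize
    "$-1\r\n"
  end
end

# Hypothetical command logic: the nil case is handled explicitly, once, here
def lookup(store, key)
  value = store[key]
  value.nil? ? NullBulkString : BulkString.new(value)
end

# Hypothetical server-side step: no nil check, every result responds to serialize
def reply_for(store, key)
  lookup(store, key).serialize
end

store = { 'name' => 'pierre' }
reply_for(store, 'name')    # => "$6\r\npierre\r\n"
reply_for(store, 'missing') # => "$-1\r\n"
```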

We use the String#freeze method to prevent accidental modifications of the values at runtime. Ruby will throw an exception if you attempt to do so:

irb(main):001:0> require_relative './server'
=> true
irb(main):002:0> BYORedis::NULL_BULK_STRING
=> "$-1\r\n"
irb(main):003:0> BYORedis::NULL_BULK_STRING << "a"
Traceback (most recent call last):
        4: from /Users/pierre/.rbenv/versions/2.7.1/bin/irb:23:in `<main>'
        3: from /Users/pierre/.rbenv/versions/2.7.1/bin/irb:23:in `load'
        2: from /Users/pierre/.rbenv/versions/2.7.1/lib/ruby/gems/2.7.0/gems/irb-1.2.3/exe/irb:11:in `<top (required)>'
        1: from (irb):3
FrozenError (can't modify frozen String: "$-1\r\n")

That said, do note that "constants" in Ruby aren't really "constants": it is possible to reassign the value at runtime, and Ruby will only print a warning:

irb(main):004:0> BYORedis::NULL_BULK_STRING = "something else"
(irb):4: warning: already initialized constant BYORedis::NULL_BULK_STRING
/Users/pierre/dev/redis-in-ruby/code/chapter-5/resp_types.rb:32: warning: previous definition of NULL_BULK_STRING was here
irb(main):005:0> BYORedis::NULL_BULK_STRING
=> "something else"

While it doesn't prevent all kinds of weird runtime issues, I do like the use of String#freeze to at least be explicit about the nature of the value, signifying that it is not supposed to be modified.

The OK Simple String is so common that I created a constant for it, OKSimpleStringInstance, so that it can be reused instead of having to allocate a new instance every time we need it. Only the SetCommand class uses it for now, but more commands will use it later, such as LSET, MSET and many others.

Let's start with GET:

# get_command.rb
module BYORedis
  class GetCommand

    # ...

    def call
      if @args.length != 1
        RESPError.new("ERR wrong number of arguments for 'GET' command")
      else
        key = @args[0]
        ExpireHelper.check_if_expired(@data_store, @expires, key)
        value = @data_store[key]
        if value.nil?
          NullBulkStringInstance
        else
          RESPBulkString.new(value)
        end
      end
    end
  end
end

listing 5.11 Updated response in GetCommand

Now that BYORedis::GetCommand has been updated, let's tackle SetCommand:

# set_command.rb
def call
  key, value = @args.shift(2)
  if key.nil? || value.nil?
    return RESPError.new("ERR wrong number of arguments for 'SET' command")
  end

  parse_result = parse_options

  existing_key = @data_store[key]

  if @options['presence'] # ...
    NullBulkStringInstance
  elsif @options['presence'] # ...
    NullBulkStringInstance
  else

    # ...

    OKSimpleStringInstance
  end

rescue ValidationError => e
  RESPError.new(e.message)
rescue SyntaxError => e
  RESPError.new(e.message)
end

listing 5.12 Updated response in SetCommand

The SET command has two possible outputs: either the null Bulk String if nothing was set, as a result of the NX or XX options, or the Simple String OK if the set was successful. This is where the special case instances NullBulkStringInstance & OKSimpleStringInstance come in handy. By returning them here, the code in server.rb can leverage duck typing and call the serialize method, but under the hood, the same strings will be used, BYORedis::OK_SIMPLE_STRING & BYORedis::NULL_BULK_STRING. This is a very small optimization, but given how common it is to call the SET command, it is interesting to think about things like that to prevent unnecessary work on the server.
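We can observe the difference with Object#equal?, which compares object identity rather than content. The sketch below uses renamed, trimmed-down copies of the types from listing 5.10 (SimpleString, OK_STRING and OkInstance are names made up for this example):

```ruby
SimpleString = Struct.new(:underlying_string) do
  def serialize
    "+#{ underlying_string }\r\n" # interpolation builds a new String on every call
  end
end

OK_STRING = "+OK\r\n".freeze
OkInstance = Object.new.tap do |obj|
  def obj.serialize
    OK_STRING # always the same frozen String
  end
end

first = SimpleString.new('OK').serialize
second = SimpleString.new('OK').serialize
first == second      # => true, same content
first.equal?(second) # => false, two different objects

OkInstance.serialize.equal?(OkInstance.serialize) # => true, no new allocation
```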

And finally we need to update TtlCommand and PttlCommand:

# pttl_command.rb
def call
  if @args.length != 1
    RESPError.new("ERR wrong number of arguments for 'PTTL' command")
  else
    key = @args[0]
    ExpireHelper.check_if_expired(@data_store, @expires, key)
    key_exists = @data_store.include? key
    value = if key_exists
              ttl = @expires[key]
              if ttl
                (ttl - (Time.now.to_f * 1000)).round
              else
                -1
              end
            else
              -2
            end
    RESPInteger.new(value)
  end
end

# ttl_command.rb
def call
  if @args.length != 1
    RESPError.new("ERR wrong number of arguments for 'TTL' command")
  else
    pttl_command = PttlCommand.new(@data_store, @expires, @args)
    result = pttl_command.call.to_i
    if result > 0
      RESPInteger.new((result / 1000.0).round)
    else
      RESPInteger.new(result)
    end
  end
end

listing 5.13 Updated response in PttlCommand & TtlCommand
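TtlCommand reuses the result of PttlCommand and only converts positive values, since -1 and -2 are status codes rather than durations. The conversion on its own can be sketched as follows, where ttl_from_pttl is a hypothetical helper name used only for this example:

```ruby
# Converts a PTTL result (milliseconds) to a TTL result (seconds).
# Non-positive values are passed through untouched.
def ttl_from_pttl(pttl_ms)
  if pttl_ms > 0
    (pttl_ms / 1000.0).round
  else
    pttl_ms
  end
end

ttl_from_pttl(5016) # => 5
ttl_from_pttl(5600) # => 6
ttl_from_pttl(-1)   # => -1, the key exists but has no expiration
ttl_from_pttl(-2)   # => -2, the key does not exist
```

Dividing by 1000.0 instead of 1000 matters here: integer division would truncate instead of rounding to the nearest second.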

Case insensitivity

It is not explicitly mentioned in the RESP v2 documentation, but Redis treats commands and options as case insensitive. The following examples are all valid: get 1, GeT 1, set key value EX 1 nx.

In order to apply the same handling logic, we changed the keys in the COMMANDS constant to be lower case, and we always lower case the client input when attempting to find a handler for the command:

# server.rb
COMMANDS = {
  'command' => CommandCommand,
  'get' => GetCommand,
  'set' => SetCommand,
  'ttl' => TtlCommand,
  'pttl' => PttlCommand,
}
# ...

def handle_client_command(command_parts)
  @logger.debug "Received command: #{ command_parts }"
  command_str = command_parts[0]
  args = command_parts[1..-1]

  command_class = COMMANDS[command_str.downcase]

  # ...
end

listing 5.14 Updates for case insensitivity in BYORedis::Server
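Stripped of everything else, the lookup boils down to the sketch below. HANDLERS and find_handler are placeholder names, and symbols stand in for the command classes so that the example runs on its own:

```ruby
# Keys are stored in lower case, symbols stand in for the *Command classes
HANDLERS = {
  'get' => :get_handler,
  'set' => :set_handler,
}.freeze

def find_handler(command_parts)
  HANDLERS[command_parts[0].downcase]
end

find_handler([ 'GeT', '1' ])    # => :get_handler
find_handler([ 'get', '1' ])    # => :get_handler
find_handler([ 'DEL', 'name' ]) # => nil, which the server can turn into an unknown command error
```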

We also need to update the BYORedis::SetCommand class to handle options regardless of the case chosen by clients:

# set_command.rb
# ...
OPTIONS = {
  'ex' => CommandOptionWithValue.new(
    'expire',
    ->(value) { validate_integer(value) * 1000 },
  ),
  'px' => CommandOptionWithValue.new(
    'expire',
    ->(value) { validate_integer(value) },
  ),
  'keepttl' => CommandOption.new('expire'),
  'nx' => CommandOption.new('presence'),
  'xx' => CommandOption.new('presence'),
}
#...
def parse_options
  while @args.any?
    option = @args.shift
    option_detail = OPTIONS[option.downcase]
    # ...
  end
end
#...

listing 5.15 Updates for case insensitivity in SetCommand
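The same idea applied to options can be sketched with a plain Hash. This is a simplified illustration and not the SetCommand implementation: SET_OPTIONS and parse_set_options are made-up names, Integer() replaces validate_integer, KEEPTTL is left out, and conflicting options such as NX together with XX are not rejected.

```ruby
# option name (lower case) => [ category, optional value converter ]
SET_OPTIONS = {
  'ex' => [ :expire, ->(value) { Integer(value) * 1000 } ],
  'px' => [ :expire, ->(value) { Integer(value) } ],
  'nx' => [ :presence, nil ],
  'xx' => [ :presence, nil ],
}.freeze

def parse_set_options(args)
  args = args.dup # leave the caller's array untouched
  options = {}
  until args.empty?
    name = args.shift.downcase
    category, converter = SET_OPTIONS.fetch(name) do
      raise ArgumentError, 'ERR syntax error'
    end
    # Options with a converter consume the next argument as their value
    options[category] = converter ? converter.call(args.shift) : name
  end
  options
end

parse_set_options(%w[EX 1 nx]) # => {:expire=>1000, :presence=>"nx"}
parse_set_options(%w[px 500])  # => {:expire=>500}
```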

The `COMMAND` command

In order to implement COMMAND, we added a describe method to each of the *Command classes, so that the CommandCommand class can iterate over all these classes and call .describe on them, and then serialize the result to a RESP array:

# command_command.rb
module BYORedis
  class CommandCommand

    def initialize(_data_store, _expires, _args)
    end

    def call
      RESPArray.new(Server::COMMANDS.map { |_, command_class| command_class.describe } )
    end

    def self.describe
      [
        'command',
        -1, # arity
        # command flags
        [ 'random', 'loading', 'stale' ].map { |s| RESPSimpleString.new(s) },
        0, # position of first key in argument list
        0, # position of last key in argument list
        0, # step count for locating repeating keys
        # acl categories: https://github.com/antirez/redis/blob/6.0/src/server.c#L161-L166
        [ '@slow', '@connection' ].map { |s| RESPSimpleString.new(s) },
      ]
    end
  end
end

listing 5.16 The new CommandCommand class

# get_command.rb

def self.describe
  [
    'get',
    2, # arity
    # command flags
    [ 'readonly', 'fast' ].map { |s| RESPSimpleString.new(s) },
    1, # position of first key in argument list
    1, # position of last key in argument list
    1, # step count for locating repeating keys
    # acl categories: https://github.com/antirez/redis/blob/6.0/src/server.c#L161-L166
    [ '@read', '@string', '@fast' ].map { |s| RESPSimpleString.new(s) },
  ]
end

# pttl_command.rb

def self.describe
  [
    'pttl',
    2, # arity
    # command flags
    [ 'readonly', 'random', 'fast' ].map { |s| RESPSimpleString.new(s) },
    1, # position of first key in argument list
    1, # position of last key in argument list
    1, # step count for locating repeating keys
    # acl categories: https://github.com/antirez/redis/blob/6.0/src/server.c#L161-L166
    [ '@keyspace', '@read', '@fast' ].map { |s| RESPSimpleString.new(s) },
  ]
end

# set_command.rb

def self.describe
  [
    'set',
    -3, # arity
    # command flags
    [ 'write', 'denyoom' ].map { |s| RESPSimpleString.new(s) },
    1, # position of first key in argument list
    1, # position of last key in argument list
    1, # step count for locating repeating keys
    # acl categories: https://github.com/antirez/redis/blob/6.0/src/server.c#L161-L166
    [ '@write', '@string', '@slow' ].map { |s| RESPSimpleString.new(s) },
  ]
end

# ttl_command.rb

def self.describe
  [
    'ttl',
    2, # arity
    # command flags
    [ 'readonly', 'random', 'fast' ].map { |s| RESPSimpleString.new(s) },
    1, # position of first key in argument list
    1, # position of last key in argument list
    1, # step count for locating repeating keys
    # acl categories: https://github.com/antirez/redis/blob/6.0/src/server.c#L161-L166
    [ '@keyspace', '@read', '@fast' ].map { |s| RESPSimpleString.new(s) },
  ]
end

listing 5.17 Updates for the COMMAND command in SetCommand, GetCommand, TtlCommand & PttlCommand
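To get a feel for what ends up on the wire, here is a compact recursive serializer applied to a shortened version of the GET description. It is a simplification of RESPArray#serialize: to_resp is a hypothetical helper, and every Ruby String becomes a Bulk String, whereas the describe methods above wrap flags in RESPSimpleString.

```ruby
# Simplified stand-in for RESPArray#serialize: Strings, Integers and nested Arrays only
def to_resp(item)
  case item
  when String
    "$#{ item.bytesize }\r\n#{ item }\r\n"
  when Integer
    ":#{ item }\r\n"
  when Array
    "*#{ item.length }\r\n#{ item.map { |nested| to_resp(nested) }.join }"
  else
    raise ArgumentError, "Unsupported type: #{ item.class }"
  end
end

to_resp([ 'get', 2, [ 'readonly', 'fast' ] ])
# => "*3\r\n$3\r\nget\r\n:2\r\n*2\r\n$8\r\nreadonly\r\n$4\r\nfast\r\n"
```

The nested array is serialized in place, with its own `*2` header, which is how redis-cli knows to render the flags as a sub-list.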

test.rb & test_helper.rb

Testing the BYORedis::Server class is becoming more and more complicated. In order to keep things clean, I moved a lot of the helper methods to the test_helper.rb file, so that test.rb only contains the actual tests.

The assert_command_results helper has been updated to handle the RESP format. For the sake of simplicity, it assumes that the commands and expected results it receives are not serialized, and serializes them for us. This allows us to write simpler assertions such as:

assert_command_results [
  [ 'SET 1 3 NX EX 1', '+OK' ],
  [ 'GET 1', '3' ],
  [ 'SET 1 3 XX keepttl', '+OK' ],
]

and assert_command_results will serialize the commands as RESP Arrays for us.
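As an illustration, the conversion from a plain command string to its RESP form can be sketched as below. The to_wire name is made up for this example, and the serialization is inlined instead of going through BYORedis::RESPArray so that the snippet runs on its own:

```ruby
# Commands that already look like a RESP array are sent as-is,
# everything else is split on whitespace and serialized as an array of Bulk Strings
def to_wire(command)
  return command if command.start_with?('*')

  parts = command.split
  "*#{ parts.length }\r\n" + parts.map { |part| "$#{ part.bytesize }\r\n#{ part }\r\n" }.join
end

to_wire('GET 1')                # => "*2\r\n$3\r\nGET\r\n$1\r\n1\r\n"
to_wire("*1\r\n$7\r\nCOMMAND\r\n") # returned unchanged
```

Since the command is split on whitespace, a value containing a space cannot be expressed in the short form and has to be written as a raw RESP array.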

I also added a new assertion helper, assert_multipart_command_results. It allows a little bit more flexibility around expectations for commands sent through multiple write calls. Instead of being a single command like in assert_command_results, the first element of the pair is itself an array of strings, each of them representing a sequence of characters that will be sent to the server. This is handy to test pipelining as well as edge cases with regard to RESP.

# test_helper.rb
# The argument is an array of arrays of the form
# [
#   [ [ "COMMAND-PART-I", "COMMAND-PART-II", ... ], "EXPECTED_RESULT" ],
#   ...
# ]
def assert_multipart_command_results(multipart_command_result_pairs)
  with_server do |server_socket|
    multipart_command_result_pairs.each do |command, expected_result|
      command.each do |command_part|
        server_socket.write command_part
        # Sleep for one millisecond to give the server a chance to read
        # the first partial command
        sleep 0.001
      end

      response = read_response(server_socket)

      if response.length < expected_result.length
        # If the response we got is shorter, maybe we need to give the server a bit more time
        # to finish processing everything we wrote, so give it another shot
        sleep 0.1
        response += read_response(server_socket)
      end

      assert_response(expected_result, response)
    end
  end
end

def assert_command_results(command_result_pairs)
  with_server do |server_socket|
    command_result_pairs.each do |command, expected_result|
      if command.is_a?(String) && command.start_with?('sleep')
        sleep command.split[1].to_f
        next
      end
      command_string = if command.start_with?('*')
                         command
                       else
                         BYORedis::RESPArray.new(command.split).serialize
                       end
      server_socket.write command_string

      response = read_response(server_socket)

      assert_response(expected_result, response)
    end
  end
end

def assert_response(expected_result, response)
  assertion_match = expected_result&.match(/(\d+)\+\/-(\d+)/)
  if assertion_match
    response_match = response.match(/\A:(\d+)\r\n\z/)
    assert response_match[0]
    assert_in_delta assertion_match[1].to_i, response_match[1].to_i, assertion_match[2].to_i
  else
    if expected_result && !%w(+ - : $ *).include?(expected_result[0])
      # Convert to a Bulk String unless it already starts with a RESP type prefix:
      # Simple String (+), error (-), integer (:), Bulk String ($) or array (*)
      expected_result = BYORedis::RESPBulkString.new(expected_result).serialize
    end

    if expected_result && !expected_result.end_with?("\r\n")
      expected_result += "\r\n"
    end

    if expected_result.nil?
      assert_nil response
    else
      assert_equal expected_result, response
    end
  end
end

def read_response(server_socket)
  response = ''
  loop do
    select_res = IO.select([ server_socket ], [], [], 0.1)
    last_response = server_socket.read_nonblock(1024, exception: false)
    if last_response == :wait_readable || last_response.nil? || select_res.nil?
      response = nil
      break
    else
      response += last_response
      break if last_response.length < 1024
    end
  end
  response&.force_encoding('utf-8')
rescue Errno::ECONNRESET
  response&.force_encoding('utf-8')
end

def to_query(*command_parts)
  [ BYORedis::RESPArray.new(command_parts).serialize ]
end

listing 5.18 The new test helpers in test_helper.rb
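The way assert_response expands a short expectation into its serialized form can be isolated in a few lines. The sketch below mirrors that part of the helper, leaving out the `+/-` delta matching; normalize_expectation is a name made up for this example:

```ruby
# Expands a short expectation into the exact bytes we expect from the server
def normalize_expectation(expected)
  return nil if expected.nil?

  # No RESP type prefix: treat the value as the content of a Bulk String
  unless %w[+ - : $ *].include?(expected[0])
    expected = "$#{ expected.bytesize }\r\n#{ expected }\r\n"
  end

  # Let tests omit the trailing CRLF
  expected += "\r\n" unless expected.end_with?("\r\n")
  expected
end

normalize_expectation('3')   # => "$1\r\n3\r\n"
normalize_expectation('+OK') # => "+OK\r\n"
normalize_expectation(':-2') # => ":-2\r\n"
normalize_expectation(nil)   # => nil, the test expects no response at all
```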

Conclusion

We can now use redis-cli, with redis-cli -p 2000, to interact with our Redis server:

> redis-cli -p 2000
127.0.0.1:2000> COMMAND
1) 1) "command"
   2) (integer) -1
   3) 1) random
      2) loading
      3) stale
   4) (integer) 0
   5) (integer) 0
   6) (integer) 0
   7) 1) @slow
      2) @connection
2) 1) "get"
   2) (integer) 2
   3) 1) readonly
      2) fast
   4) (integer) 1
   5) (integer) 1
   6) (integer) 1
   7) 1) @read
      2) @string
      3) @fast
3) 1) "set"
   2) (integer) -3
   3) 1) write
      2) denyoom
   4) (integer) 1
   5) (integer) 1
   6) (integer) 1
   7) 1) @write
      2) @string
      3) @slow
4) 1) "ttl"
   2) (integer) 2
   3) 1) readonly
      2) random
      3) fast
   4) (integer) 1
   5) (integer) 1
   6) (integer) 1
   7) 1) @keyspace
      2) @read
      3) @fast
5) 1) "pttl"
   2) (integer) 2
   3) 1) readonly
      2) random
      3) fast
   4) (integer) 1
   5) (integer) 1
   6) (integer) 1
   7) 1) @keyspace
      2) @read
      3) @fast
127.0.0.1:2000> GET a-key
(nil)
127.0.0.1:2000> SET name pierre
OK
127.0.0.1:2000> GET name
"pierre"
127.0.0.1:2000> SET last-name J EX 10
OK
127.0.0.1:2000> TTL last-name
(integer) 6
127.0.0.1:2000> PTTL last-name
(integer) 5016
127.0.0.1:2000> PTTL last-name
(integer) 2432
127.0.0.1:2000> DEL name
(error) ERR unknown command `DEL`, with args beginning with: `name`,

All the commands we already implemented work as expected, and unimplemented commands such as DEL return an unknown command error. So far so good!

In the next chapter we'll write our own Hashing algorithm and ban the use of the Hash class in our code.

Code

As usual, the code is available on GitHub.
