Intro
This post is the third part of the "Learning Microservices with Go" series. I'm writing these posts as I'm learning the concepts. If you haven't checked out part 2 yet, here's the link for it: Part 2.
Link for Git Repo: Github
In part 2, we implemented service discovery for our movie app. It allowed us to create multiple instances of our services and run them without hard-coding their ports and addresses.
For this part, we'll be exploring serialization: the various serialization formats, how they compare, their implementation, and some simple benchmarks.
The basics of serialization
Serialization is the process of converting data into a format that allows us to store it, transfer it, and reconstruct it later.
Uses Of Serialization
There are two primary use cases for serialization:
- Transferring data between services, acting as the common language between them.
- Encoding and decoding data for storage, which allows storing complex data structures as strings or as bytes.
If you think about it, you've probably used one serialization format quite a bit already. We used it ourselves when communicating with the HTTP API endpoints. And that is... JSON.
JSON allowed us to transform the data when sending back responses from the endpoint and decode them on the other side.
In Go, we have struct tags that tell the JSON encoder how to transform our object into an output.
// Metadata defines the movie metadata.
type Metadata struct {
	ID          string `json:"id"`
	Title       string `json:"title"`
	Description string `json:"description"`
	Director    string `json:"director"`
}
Let's try encoding an instance of this Metadata structure.
Metadata{
	ID:          "1",
	Title:       "Saving Private Ryan",
	Description: "Movie about WW2",
	Director:    "Steven Spielberg",
}
The result after encoding will be:
{"id":"1","title":"Saving Private Ryan","description":"Movie about WW2","director":"Steven Spielberg"}
Some more use cases of serialization:
- Storing configuration: e.g. `tsconfig.json` is used to specify the TypeScript configuration in many projects.
- Storing records in a database: many databases use JSON to store arbitrary data. E.g. key-value databases require converting the value to byte arrays, and JSON is often used for this.
JSON
JSON is the most widely used serialization format.
- Most languages have tools/libraries/packages to handle encoding/decoding JSON.
- It has great browser support. All major browsers, including their developer tools, support working with JSON.
- It's human-readable, which makes it easier to use during development and debugging.
Limitations of JSON:
- Size: JSON is not a size-efficient format. Other formats produce smaller encoded output for the same data.
- Speed: Similarly, its encoding and decoding speed is not the fastest compared to some other serialization protocols.
Serialization Protocols
Some of the most popular formats (other than JSON) are:
- XML
- YAML
- Apache Thrift
- Protocol Buffers
Let's take a brief look at each of them and the differences between them.
XML
XML was one of the earliest serialization formats. It represents data as a tree of nodes. You'll find it similar to the structure of HTML. Our metadata struct would be encoded as:
<Metadata><ID>1</ID><Title>Saving Private Ryan</Title><Description>Movie about WW2</Description><Director>Steven Spielberg</Director></Metadata>
One downside of XML is the size of the output. You can see that it is longer than the JSON version.
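As a minimal sketch, Go's standard encoding/xml package can produce output like this. Note that without `xml` struct tags the encoder falls back to the Go field names, which is why the elements above are `ID`, `Title`, and so on:

package main

import (
	"encoding/xml"
	"fmt"
)

// Same shape as our Metadata struct; no xml tags,
// so element names default to the Go field names.
type Metadata struct {
	ID, Title, Description, Director string
}

func main() {
	b, err := xml.Marshal(Metadata{
		ID:          "1",
		Title:       "Saving Private Ryan",
		Description: "Movie about WW2",
		Director:    "Steven Spielberg",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}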
YAML
YAML is one of the most popular serialization formats. It's designed to be much more human-readable. Our metadata struct would be encoded as:
metadata:
  id: "1"
  title: Saving Private Ryan
  description: Movie about WW2
  director: Steven Spielberg
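The Go standard library doesn't ship a YAML encoder, so here's a minimal sketch assuming the third-party gopkg.in/yaml.v3 package (other YAML libraries work similarly):

package main

import (
	"fmt"

	"gopkg.in/yaml.v3"
)

type Metadata struct {
	ID          string `yaml:"id"`
	Title       string `yaml:"title"`
	Description string `yaml:"description"`
	Director    string `yaml:"director"`
}

func main() {
	// Wrapping the struct in a map reproduces the top-level
	// "metadata:" key from the example above.
	b, err := yaml.Marshal(map[string]Metadata{
		"metadata": {
			ID:          "1",
			Title:       "Saving Private Ryan",
			Description: "Movie about WW2",
			Director:    "Steven Spielberg",
		},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(string(b))
}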
The formats that we've discussed so far are primarily used for defining and serializing arbitrary types of data.
Some solutions are more focused on using the serialization format mainly for communication between services. For example:
Apache Thrift
Apache Thrift is a combination of a serialization format and a communication protocol. It was created at Facebook and later became open source.
Unlike JSON and XML, it requires you to define your structures in `.thrift` files.
struct Metadata {
	1: string id,
	2: string title,
	3: string description,
	4: string director
}
The Thrift code generator uses the `.thrift` file to generate the structures in your chosen programming language, along with the encoding and decoding logic.
Thrift has faster encoding and decoding than JSON and XML, and the serialized data can be 30-50% smaller.
Thrift has a few disadvantages: it lacks official documentation (most of what exists is unofficial), and the serialized data is not human-readable. As a result, adoption has been low, and there are more popular and efficient formats available.
Protocol Buffers
Protocol Buffers (or protobufs) is a serialization format created at Google more than 20 years ago and open-sourced later. It has benefits like:
- Small output size along with high serialization and deserialization speed.
- It allows defining data structures that can be used by both client and server code in multiple languages.
- It has official support from Google and widespread use in microservices.
Using Protocol Buffers
We'll use protocol buffers now and see how it compares to other formats.
- First, we'll install the protocol buffer compiler and the Go plugin for it:
Download the latest version from the Website and follow the Readme.
You can also use the apt repository (for Debian-based Linux). NOTE: The version of the compiler might be old if you install it using apt.
sudo apt update
sudo apt install protobuf-compiler
To generate Go code with the protobuf compiler, we'll install the Go plugin for protobuf.
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
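To verify both tools are installed and on your PATH, you can check their versions (assuming a reasonably recent protoc-gen-go, which supports the --version flag; exact version numbers will differ):

protoc --version
protoc-gen-go --version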
Now we have all the tools needed.
- Create an `api` directory in the root and add a `movie.proto` file under it with the following content.

File: /api/movie.proto
syntax = "proto3";

option go_package = "/gen";

message Metadata {
  string id = 1;
  string title = 2;
  string description = 3;
  string director = 4;
}

message MovieDetails {
  float rating = 1;
  Metadata metadata = 2;
}
In the code above, we first declare the protobuf syntax version we're using. Then we define the output path of the code that will be generated from this file.
The rest defines two structures, similar to the ones we created in the first part of the series: Link
Now we'll generate the Go code from this file. In the root dir, run the following command.
protoc -I=api --go_out=. movie.proto
If the command is successful, you'll see a `gen` directory at the root. The directory will have a file called `movie.pb.go`. It includes our structures and the code to serialize and deserialize them.
For the above command:
- `-I=api` specifies the input directory for the protobuf file.
- `--go_out=.` specifies the output directory (here, the current directory) for the generated Go code.
- `movie.proto` is the name of the protobuf file with the actual content.
Note: The protobuf file itself adds the generated code to the `/gen` package, so the final destination of the files is `/gen` instead of the root.
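As a quick sanity check, here's a minimal sketch of a serialize/deserialize round trip using the generated code (it assumes the module is named movieexample.com, as in this series' repo):

package main

import (
	"fmt"

	"google.golang.org/protobuf/proto"
	"movieexample.com/gen"
)

func main() {
	m := &gen.Metadata{
		Id:          "1",
		Title:       "Saving Private Ryan",
		Description: "Movie about WW2",
		Director:    "Steven Spielberg",
	}

	// Serialize the generated struct into the protobuf wire format.
	b, err := proto.Marshal(m)
	if err != nil {
		panic(err)
	}

	// Deserialize the bytes back into a new struct.
	var decoded gen.Metadata
	if err := proto.Unmarshal(b, &decoded); err != nil {
		panic(err)
	}
	fmt.Println(decoded.Title)
}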
Here's what we achieved: we created a `movie.proto` file that defines our data schema. It provides these benefits:
- Explicit schema definition: Our schema is decoupled from the code and is now independent.
- Code generation: Our schema is converted to code via code generation. We'll use this code to send data between services in the next part.
- Cross-language support: We can generate the code for any language. If a service uses another language, we can simply generate the structures for it, as shown below. Earlier this wasn't possible, as the structures were coupled to Go.
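For instance, generating Python bindings from the same schema is just a matter of swapping the output flag, using protoc's built-in `--python_out` generator:

protoc -I=api --python_out=. movie.proto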
Benchmarking
We'll do a quick benchmark comparing XML, JSON, and Protobufs on the size of the serialized data and the speed of encoding.
Encoding Size Benchmark
Add the following contents to a file at `/cmd/sizecompare/main.go`
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"

	"google.golang.org/protobuf/proto"

	"movieexample.com/gen"
	"movieexample.com/metadata/pkg/model"
)

// This is the model struct from the metadata package.
var metadata = &model.Metadata{
	ID:          "1",
	Title:       "Saving Private Ryan",
	Description: "Movie about WW2",
	Director:    "Steven Spielberg",
}

// This is the generated struct from the protobuf definition.
var genMetadata = &gen.Metadata{
	Id:          "1",
	Title:       "Saving Private Ryan",
	Description: "Movie about WW2",
	Director:    "Steven Spielberg",
}

func main() {
	jsonBytes, err := serializeToJSON(metadata)
	if err != nil {
		panic(err)
	}
	xmlBytes, err := serializeToXML(metadata)
	if err != nil {
		panic(err)
	}
	protoBytes, err := serializeToProto(genMetadata)
	if err != nil {
		panic(err)
	}
	fmt.Printf("JSON size:\t%dB\n", len(jsonBytes))
	fmt.Printf("XML size:\t%dB\n", len(xmlBytes))
	fmt.Printf("Proto size:\t%dB\n", len(protoBytes))
}
Now let's add the implementation of the functions used:
func serializeToJSON(m *model.Metadata) ([]byte, error) {
	return json.Marshal(m)
}

func serializeToXML(m *model.Metadata) ([]byte, error) {
	return xml.Marshal(m)
}

func serializeToProto(m *gen.Metadata) ([]byte, error) {
	return proto.Marshal(m)
}
Run `go mod tidy` to clean up the imports. Then run `go run main.go` inside the directory with this file.
And the results are....
We see that the XML encoding is about 40% bigger than the JSON one, and the protobuf output is around 40% smaller than JSON. This illustrates how switching to protobufs can reduce the amount of data being sent over the network.
Serialization speed benchmark
For this, we'll use the testing package from the Go standard library, which runs an automated performance test and measures how fast the operation is.
Create a file `main_test.go` in the same directory and add the following content to it.
package main

import (
	"testing"
)

// Benchmark functions need to be in a file ending with _test.go
// and have a function name starting with Benchmark.
func BenchmarkSerializeToJSON(b *testing.B) {
	for i := 0; i < b.N; i++ {
		serializeToJSON(metadata)
	}
}

func BenchmarkSerializeToXML(b *testing.B) {
	for i := 0; i < b.N; i++ {
		serializeToXML(metadata)
	}
}

func BenchmarkSerializeToProto(b *testing.B) {
	for i := 0; i < b.N; i++ {
		serializeToProto(genMetadata)
	}
}
Every function must run the target code b.N times. During benchmark execution, b.N is adjusted until the benchmark function lasts long enough to be timed reliably.
Now we'll run the benchmarks with: `go test -bench=.`
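If you also want to see memory allocations per operation, the testing tool supports the -benchmem flag:

go test -bench=. -benchmem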
And the results for the speed benchmark are....
The output means that, in the JSON case, the loop ran 2552851 times at 501.4 ns per iteration, and similarly for the others.
We see that JSON serialization was almost 2 times slower than Protobufs. XML serialization, however, was almost an order of magnitude slower than Protobufs.
NOTE: Be careful with these comparisons. The benchmark measured only the serialization operations, hence the big difference. In some applications, you may not be bound by serialization performance and could instead be doing other heavy work; in that case, you'll see a smaller difference in overall performance.
A good thing to keep in mind when looking at any benchmark is that everything is context-dependent.
Tadaaa! We've covered serialization and compared the different serialization protocols.
In the next part, we'll continue using protobufs and implement their use in our services.
Please do like the post if you found it helpful and learned something.
Check out my Twitter: manavkush
Link for Git Repo: Github
Link for Reference Book: Book
See you next post.