In some recent work I have been trying to generate Kotlin extensions to the standard Java code that is generated by the protocol buffer compiler, using the excellent kroto-plus plugin. For those who have not had the pleasure of working with the protocol buffer compiler, protoc, it can be a frustratingly opaque tool to work with. In the process of attempting to understand why compilation was not working as expected, I ended up learning a lot more about how the compiler works.
My problem arose when attempting to move my protocol buffer compilation out of the safe confines of my own project into a shared definition repository that my employer uses, so that I could generate client bindings and server stubs for multiple languages. In making the move, all of a sudden the expected Kotlin extension code was not being generated at all, with no error messages or warnings. The mistake turned out to be trivial, but getting to a position where I could identify the mistake was frustrating.
Invoking protoc
When working with a single language, there are many tools that wrap protoc, such that you never have to understand its interface. For my Kotlin project, the Gradle protobuf plugin follows the typical Gradle idiom: place proto files in a standard directory, specify what plugins to use, and voilà, you have generated code. However, in a polyglot environment, you often have to go a little deeper and at least understand how to invoke protoc directly.
An invocation can look something like:
protoc \
-I /opt/include \
-I /path/to/project/protos \
--java_out=/path/to/project/gen/java \
--grpc_out=/path/to/project/gen/java \
--plugin=protoc-gen-grpc=/usr/local/bin/protoc-gen-grpc-java \
--kroto_out=ConfigPath=/path/to/project/kroto.yml:/path/to/project/gen/java \
--plugin=protoc-gen-kroto=/usr/local/bin/protoc-gen-kroto-plus \
/path/to/project/protos/service.proto \
/path/to/project/protos/internals.proto \
...
Breaking this down:
- The -I flag specifies an "include" directory, where protobuf files that are imported can be found. There is no module system to speak of for protobuf compilation, so there are just some loose conventions around namespacing using package names that correspond to directory structures. The "well known" types like google.protobuf.Any will typically be sourced from some common include path - in the example above, /opt/include, which exists within a docker container I'm using. Multiple import paths can be specified, and imports are searched for in all the specified include directories, using relative paths. For those familiar with traditional C compilers, this is a very common pattern for compilation in the era before versioned package management.
- The --java_out= flag specifies two things: that we want to generate Java code, and where we want that generated code to go. Java code generation is a built-in feature of protoc, so this is all that's required in this case. The built-in generators for protoc are cpp, csharp, java, js, objc, php, python and ruby.
- The --grpc_out= flag similarly specifies that we want to generate "grpc", and where to generate to. But what does "grpc" mean here, given it is not a built-in generator type? By default, the compiler will look for a plugin on the PATH with the name protoc-gen-grpc.
- The --plugin=protoc-gen-grpc=... flag explicitly tells the compiler where to find the plugin executable for "grpc". In this case, we're pointing it to a version of protoc-gen-grpc-java. Effectively, we have aliased "grpc-java" to "grpc"; we could instead have specified a --grpc-java_out= flag without the explicit plugin reference, as long as protoc-gen-grpc-java could be found on the PATH.
- Observant readers will notice something slightly different about the flag specified for kroto: it embeds a parameter to be passed to the plugin. The compiler's awkward syntax for this is --gen_out=param:/gen/path. Only a single string parameter can be specified, but the full string between the first = and the : is treated as that parameter value. I have seen plugins use various conventions here to allow specifying multiple params, like key1=value1,key2=value2,...,keyN=valueN. Others instead use the parameter to point to an external configuration file, which is what the kroto-plus plugin does.
- Finally, a list of proto files to be parsed and sent to the generators is provided.
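Since the compiler passes through only that single opaque string, each plugin invents its own convention for structuring it. Here is a minimal sketch of how a plugin might split a key1=value1,key2=value2 style parameter; parse_parameter is a hypothetical helper of my own, not part of any real plugin:

```python
def parse_parameter(parameter: str) -> dict:
    """Split a protoc plugin parameter string of the form
    key1=value1,key2=value2 into a dict. Hypothetical helper for
    illustration; real plugins define their own conventions."""
    options = {}
    for pair in parameter.split(","):
        if not pair:
            continue  # tolerate empty segments, e.g. a trailing comma
        key, _, value = pair.partition("=")
        options[key] = value
    return options

print(parse_parameter("ConfigPath=/path/to/project/kroto.yml"))
# → {'ConfigPath': '/path/to/project/kroto.yml'}
```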
While you can use relative paths rather than absolute paths when invoking protoc, I have stumbled over problems with mixing relative paths and import directives. I find it helps to keep me sane to use absolute paths when working with protoc, so it can be very clearly determined where everything is coming from, irrespective of the current working directory.
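The search behavior described above - relative import paths tried against each include directory in turn - can be sketched in a few lines. This is my own illustration of the pattern, not protoc's actual implementation:

```python
from pathlib import Path

def resolve_import(import_path, include_dirs):
    """Find an imported .proto file by trying each include directory
    in order, roughly the way C compilers (and protoc) resolve
    relative include paths. Sketch for illustration only."""
    for include_dir in include_dirs:
        candidate = Path(include_dir) / import_path
        if candidate.is_file():
            return candidate
    return None  # protoc itself would report a missing-import error
```

Note that in this sketch the first matching directory wins, which is one reason the interaction between relative paths and include directories can surprise you: the same relative import can resolve differently depending on the order of -I flags.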
Docker containers like those provided by Namely try to help out in a polyglot environment by hiding some of the details of protoc and plugin invocation behind a more uniform contract. I recommend trying these out to see if they fit your needs before implementing your own solution, but I have found that a basic understanding of the protocol buffer compiler and plugins is essential to success.
What are plugins, really?
Protobuf compiler plugins are standalone executables that interpret CodeGeneratorRequest protobufs from stdin, and produce CodeGeneratorResponse protobufs to stdout. The main protobuf compiler executable produces these requests, embedding a set of FileDescriptorProto instances for the parsed proto files. The response from the plugins embeds instructions on source files to be generated and their contents.
Plugins can therefore be implemented using any technology that can serialize and deserialize protobufs. Some are implemented in C++, some in Java, some in Go. It's a very flexible system, if a rather opaque one from the user's perspective when attempting to diagnose a problem.
Intercepting plugin requests and responses
As plugins just need to be something that the protoc process can invoke and interact with using stdin and stdout, we can wrap virtually any plugin in a shell script to see what is being provided and returned, using tee:
#!/bin/sh
# Point protoc at this script via --plugin=... to capture the request
# and response on their way through the real plugin.
tee /tmp/input.pb.bin | /usr/local/bin/kroto-plus | tee /tmp/output.pb.bin
While these files are binary encoded protobufs, they are dominated by text content, as you will see if you open them in a text editor. However, the protoc binary can also decode binary protobufs to its "text proto" format. If we have a clone of the protobuf repo in ~/protobuf, we can run:
protoc --decode=google.protobuf.compiler.CodeGeneratorRequest \
-I ~/protobuf/src ~/protobuf/src/google/protobuf/compiler/plugin.proto \
< /tmp/input.pb.bin
This will output the text format of the proto to stdout, making the contents a little easier to read. Similarly, you can do this for the output of the plugin:
protoc --decode=google.protobuf.compiler.CodeGeneratorResponse \
-I ~/protobuf/src ~/protobuf/src/google/protobuf/compiler/plugin.proto \
< /tmp/output.pb.bin
How did this help me?
As mentioned earlier, when attempting to use the kroto-plus plugin manually, it was not producing any Kotlin output. This was strange, as it was producing Kotlin output just fine in my separate Gradle-based build environment.
I couldn't see what I was doing wrong: I was using the same version of protoc, the same plugins, and the same source files, though moved around to fit the location conventions in my docker build container. I scrutinized the paths and everything looked correct, but I missed one small detail.
The kroto-plus plugin, as mentioned earlier, requires a parameter of the form ConfigPath=/path/to/config. I had transcribed this incorrectly as ConfigFile=/path/to/config - that four character difference caused all of my problems. I had expected a mistake like this to cause an error, as I had seen errors emitted by the kroto-plus plugin before when pointed at an invalid path for the configuration. However, with an incorrect property name rather than an incorrect path, the plugin silently does nothing.
I was able to see the difference with the help of my little interception script: by recording the input to the plugin in both the working environment and the broken environment, and then performing a diff, the mistake became readily apparent:
> diff in_working.proto.txt in_broken.proto.txt
23c23
< parameter: "ConfigPath=/path/to/kroto-config.yml"
---
> parameter: "ConfigFile=/path/to/kroto-config.yml"
After making a dent in my desk with my face, the fix was trivial, and the expected Kotlin output finally emerged.
Conclusion
The protoc tool is mysterious and, in many respects, poorly documented. During my time at Google using the internal version of Bazel, all the details of correctly compiling protocol buffers to usable code were hidden under several layers of abstraction. Those of us now outside the Chocolate Factory are mostly left to fend for ourselves in figuring out how to use this complex tool, or must accept being disintermediated by other tools that may not do what we need.
The approach presented above can help diagnose more complex problems than just typos: through the ability to observe the full input and output to plugins, differences in compiler versions, input source, paths and annotations can be easily observed.
Over time, I believe we will build a community knowledge base and consistent patterns for the usage of protobufs and gRPC. Tools like buf show promise in this regard, and wrappers like Namely's docker containers can provide a good reference for using protoc where the documentation is lacking - take a look at their protoc wrapping script for a real world usage of protoc for polyglot builds.
I hope at least one person out there finds this useful. This is my first foray into public technical writing in a few years, and it feels good to share what I have learned beyond my immediate colleagues again. If you have any questions, feel free to contact me: iainmcgin-at-gmail-dot-com.