Sam Rose
Originally published at samwho.co.uk

Language Interoperability From the Ground Up

How does a function in Ruby call a function in C? How does a function in Kotlin call a function in Java?

It’s no secret that you can call functions in one programming language from another. Not only is it possible, it can be done in multiple ways. But how does it actually work?

This post is going to be a bottom-up, in-depth look at interoperability. Each section will build upon the previous until we have an understanding of the mechanisms that make language interoperability possible.

Who should read this post?

If you have been programming for a little while and have heard people talking about whether a library has “Java bindings” or “Ruby bindings.”

If you have been using a language like Kotlin for a while and called Java functions from Kotlin and wondered how on earth that works under the hood.

If you have a keen interest in the gory details of complicated systems, but want an explanation that doesn’t assume much and keeps the examples as simple and hands-on as possible.

Additionally, you will need to have the following programs installed on your computer and accessible from the command line if you would like to follow along with the examples: nm, gcc, nasm, gdb, javac, javap, kotlinc, kotlin, ruby. All of the examples in this post were written and run on an Arch Linux installation, so they’re not likely to work as shown here when run on a Mac or Windows machine.

Why is language interoperability important?

The majority of language interoperability involves higher level languages calling libraries written in lower level languages. A Java library that offers you the ability to call in to the native OpenSSL libraries, for example. Or a Rust library that provides a more idiomatic Rust API in to the cURL library written in C.

Duplication is bad. When you’ve written a complicated library in one language, reimplementing it in another is just one more thing to forget to maintain.

Additionally, it’s a good idea for young languages to be able to make the most of existing work. If you create a new language and include a mechanism for using existing native libraries, you can immediately draw upon decades of hard work.

The benefits of purity

Sometimes you might see a library advertised as a “pure Ruby” implementation, or a “pure Java” implementation of some other library. These libraries will be full reimplementations of their target technology.

For example, this is a pure Java implementation of LevelDB. This would have been a lot of work for the author, but the advantage is that people will be able to use LevelDB from Java without having to install the native LevelDB library on their system and package that up with their Java code for deployment.

While a pure reimplementation can be a lot of up-front effort and maintenance, it can be easier to use.

Why start at the bottom and work up?

The key to interoperability is finding common ground. With computers, the common ground is a set of standards and conventions in the lower levels of how programs work that allow languages to speak to each other. Understanding these conventions is key to understanding when languages can interoperate, and when they can’t.

Rock bottom: assembly language

mov eax, 10

That there is a line of “assembly code.” It is a single “instruction,” and it consists of a “mnemonic”, mov, and two “operands,” eax and 10.

This is the “move” instruction, and it instructs our CPU to move the value 10 in to the register eax. It is part of an instruction set called “x86,” which is used in 32-bit Intel and AMD CPUs[1].

What is a register?

Registers are small but fast bits of storage connected directly to your CPU. If you want to do something to some data, be it addition, subtraction, or some other operation, the data needs to first be loaded in to a register.

If you’re not used to writing code at this level, the concept of registers might seem silly. Why can’t we just store 10 in memory and operate on it there?

Because that isn’t physically possible. The CPU isn’t connected to memory directly. It is connected through a series of caches, and all memory access must go through the Memory Management Unit, or MMU. The MMU can’t process any operations. That happens in the Arithmetic Logic Unit, or ALU[2], which is also directly connected to the CPU and can only get at data if it is in a register.

The MMU and friends

This is not to scale. In reality, there is around a kilobyte of registers, a few hundred kilobytes of L1 cache, a megabyte or two of L2 cache, low tens of megabytes of L3 cache, and then memory is often tens of gigabytes.

Program flow

Doing mathematical operations is great and all, but for us to write code that does anything we need to be able to compare things and make decisions based on the outcome.

mov     eax, 10
sub     eax, 5
mov     ebx, 5
cmp     eax, ebx
je      equal

notequal:
jmp     exit

equal:
; do something here

exit:
; end of program

We start this snippet with some moves and a subtract (sub). We introduce a new register, ebx. Then we do a comparison with cmp. The cmp instruction compares two values: they can either both be registers, or one of them can be an “immediate” value, e.g. 10. The result of the comparison is stored in a special “flags” register that we don’t touch directly. Instead, we use a family of “jump” instructions that read the flags register and decide which instruction to run next.

The je instruction will jump to a point in our code, denoted by the “label” equal, if the result of the comparison was that both values are equal. Otherwise, the code falls through to the next assembly instruction, which will be whatever we decide to put below our notequal label. For now, we just do a jmp, which is an unconditional jump, to the end of our program at exit.

“Labels” are a new concept in this snippet. They aren’t instructions and the CPU doesn’t attempt to run them. Instead, they’re used to mark points in the code that can be jumped to, which is what we’ve used them for here. You can spot labels by looking for the trailing colon.
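If you’re more at home in C, the same control flow can be sketched with goto, which is about as close as C gets to labels and jumps. This is only an illustration: the function name branch_demo and the variable names are mine, and a real compiler is free to turn an if statement in to whatever jumps it likes.

```c
/* Mirrors the assembly snippet: compute 10 - 5, compare it with 5, and
 * branch with goto the way je and jmp do.
 * Returns 1 if the "equal" branch ran, 0 otherwise. */
int branch_demo(void) {
    int eax = 10;        /* mov eax, 10 */
    int ebx;
    int took_equal = 0;

    eax -= 5;            /* sub eax, 5 */
    ebx = 5;             /* mov ebx, 5 */
    if (eax == ebx)      /* compare, then je equal */
        goto equal;

    /* notequal: falls through to here when the values differ */
    goto done;           /* jmp exit */

equal:
    took_equal = 1;      /* "do something here" */

done:
    return took_equal;
}
```

Calling branch_demo() takes the equal branch, because 10 - 5 is 5.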

Accessing memory

So far so good, but we’re only touching registers at the moment. Eventually we will run out of space in registers and have to fall back to using memory. How does that work in assembly?

mov         eax, 0x7f00000
mov dword [eax], 123

The first bit should be familiar: we’re storing the hexadecimal value 0x7f00000 in to eax, but then we do something funky with the next instruction.

The square brackets around [eax] mean that we want to store a value in the memory address inside of eax. The dword keyword signifies that we want to store a 4-byte value (“double word”). We need the dword keyword in there because without it there’s no way to infer how large the value 123 should be in memory.

If you’re familiar with pointers in C, this is essentially how pointers work. You store the memory address you want to point to in a register and then access that memory by wrapping the register name in square brackets.

We can take this a little further:

mov             eax, 0x7f00000
mov dword [eax + 1], 123

This will store the value 123 at address 0x7f00001, one byte higher than before. The value of eax isn’t modified by doing this. It’s just letting us access the value at a register plus an offset. This is commonly seen in real-world assembly code, and we’ll see why later on.
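To connect this back to C pointers, here is a minimal sketch of the same “base plus offset” store. The helper names store_dword and load_dword are made up for this example, and I’m using a small local buffer rather than a hard-coded address like 0x7f00000, which your program probably isn’t allowed to write to.

```c
#include <stdint.h>
#include <string.h>

/* Roughly "mov dword [base + offset], value": copy a 4-byte value to an
 * address computed from a base pointer plus a byte offset. memcpy is used
 * because base + 1 may not be 4-byte aligned, which a plain C pointer
 * store is not guaranteed to tolerate. */
void store_dword(unsigned char *base, size_t offset, uint32_t value) {
    memcpy(base + offset, &value, sizeof value);
}

/* Roughly "mov reg, dword [base + offset]": read the 4 bytes back. */
uint32_t load_dword(const unsigned char *base, size_t offset) {
    uint32_t value;
    memcpy(&value, base + offset, sizeof value);
    return value;
}
```

Given an 8-byte buffer of zeroes, store_dword(buffer, 1, 123) writes bytes 1 to 4 and leaves byte 0 alone, just as the base pointer itself is left unmodified.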

The stack

Fundamentals out of the way, this is our first recognisable concept from higher level languages. The stack is where languages typically store short-lived variables, but how does it work?

When your program starts, the value of a special register called esp is set to a memory location that represents the top of the stack space you are allowed to access, which is itself near the top of your program’s “address space.”

You can examine this phenomenon by writing a simple C program and running it in a debugger:

int main() {
    return 42;
}

Compile with gcc and run using gdb, the “GNU Debugger”:

$ gcc main.c -o main -m32
$ gdb main
gdb> start
gdb> info register esp
esp 0xffffd468  0xffffd468

When we say start in the GDB prompt, we’re asking it to load our program in to memory, start running it, but pause it as soon as it starts. Then we inspect the contents of esp with info register esp.

A brief detour into memory organisation

Why is the stack near the top of the address space? What even is an address space?

When your program gets loaded in to memory on a 32-bit system, it is given its own address space that is 4GB in size: 0x00000000 to 0xffffffff. This happens no matter how much physical RAM your machine has installed in it. Mapping this “virtual” address space is another one of the jobs that the MMU performs, and the details of it are beyond the scope of this post.

Memory layout visualised

This simplified view of memory shows that our program is loaded somewhere fairly low down in the address space, our stack is up near the top and grows down, then we have this mysterious place called the “heap” that lives just above our program. It’s not important for language interoperability, but the heap stores long-lived data, for example things you’ve allocated using malloc() in C or new in C++.

This allows for a fairly efficient use of the available space. In reality, the stack is limited in size to a handful of megabytes and exceeding that will cause your program to crash. The heap can grow all the way up to the base of the stack but no further. If it ever attempts to overlap with the stack, your malloc() calls will start to fail, which causes many programs to crash.

Back to the stack

The x86 instruction set gives us some instructions for adding and removing values from the stack.

push    1
push    2
push    3
pop     eax
pop     ebx
pop     ecx

At the end of this sequence of instructions, the value of eax will be 3, ebx will be 2 and ecx will be 1. Don’t trust me? We can verify this for ourselves with a couple of small modifications.

global main
main:
        push    1
        push    2
        push    3
        pop     eax
        pop     ebx
        pop     ecx
        ret

Save this in to a file called stack.s and run the following commands:

$ nasm -f elf stack.s
$ gcc -m32 stack.o -o stack

If you don’t have nasm you’ll need to install it. It’s a type of program called an “assembler” and it can take assembly instructions and compile them down to machine code.

We now have an executable file in our working directory called “stack”. It’s a bona fide program you can run like any other program. The only modifications we had to make to it were giving it a main label, and making sure it correctly returns control back to the operating system with the ret instruction. We’ll explore ret in more detail later.

Running this program doesn’t really do anything. It will run, but exit silently[3]. To see what’s going on inside of it, we will once again need a debugger.

$ gdb stack
gdb> break *&main+9
gdb> run
gdb> info registers eax ebx ecx

The output of this sequence of commands should verify what I said earlier about the contents of those registers. gdb allows us to load up a program, run it until a certain point (break *&main+9 is us telling gdb to stop just before the ret instruction) and then examine the program state.

So what do push and pop actually do?

The push and pop instructions are shorthand and can be expanded to the following:

; push
sub     esp, 4
mov     [esp], operand

; pop
mov     operand, [esp]
add     esp, 4

All of which should be familiar to you from previous sections, and neatly demonstrates how the stack grows downwards and shrinks upwards as things are added and removed.
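If it helps to see that expansion outside of assembly, here is a toy model of it in C. Everything in it (the toy_stack struct, the 64-byte size, the function names) is invented for illustration, and there is deliberately no bounds checking, much like the real thing.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

/* A toy stack: memory stands in for RAM and esp is a byte offset in to it. */
struct toy_stack {
    unsigned char memory[STACK_BYTES];
    size_t esp; /* starts just past the highest address, i.e. an empty stack */
};

void toy_init(struct toy_stack *s) {
    memset(s->memory, 0, sizeof s->memory);
    s->esp = STACK_BYTES;
}

/* push: sub esp, 4 then mov [esp], value */
void toy_push(struct toy_stack *s, uint32_t value) {
    s->esp -= 4;
    memcpy(s->memory + s->esp, &value, 4);
}

/* pop: mov value, [esp] then add esp, 4 */
uint32_t toy_pop(struct toy_stack *s) {
    uint32_t value;
    memcpy(&value, s->memory + s->esp, 4);
    s->esp += 4;
    return value;
}
```

Pushing 1, 2 and 3 moves esp down by 12 bytes, and three pops hand the values back in the order 3, 2, 1 while esp climbs back to where it started.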

Functions in assembly

Language interoperability is, at its most fundamental, the ability to call a function written in one language from a different language. So how do we define and call functions in assembly?

With the knowledge we have so far, you might be tempted to think that function calls look like this:

main:
        jmp myfunction

myfunction:
        ; do things here

A label to denote where your function is in memory, and a jump instruction to call it. This approach has two critical problems: it doesn’t handle passing arguments to or returning values from the function and it doesn’t handle returning control to the caller when your function ends.

We could solve the first problem by putting arguments and return values on the stack:

main:
        push    1
        push    2
        jmp     add
        pop     eax

add:
        pop     eax
        pop     ebx
        add     eax, ebx
        push    eax

This would work really well if only our add function were able to jump back to the caller when it was finished. At the moment, when add ends, the program ends. In an ideal world it would return back to just after the jmp in main.

What if we saved where we were when we called the function and jumped back to that location when the function was finished?

The eip register holds the location of the currently executing instruction. Using this knowledge, we could do this:

main:
        push    1
        push    2
        push    eip
        jmp     add
        pop     eax

add:
        pop     edx ; store the return address for later
        pop     eax ; 2
        pop     ebx ; 1
        add     eax, ebx
        push    eax
        mov     eip, edx

We’re getting there. This approach has a couple of problems, though: we modify esp a lot more than we really have to, and x86 doesn’t let you mov things in to eip.

Here’s what a compiler would actually generate for our example above:

main:
        push    ebp
        mov     ebp, esp
        push    2
        push    1
        call    add
        add     esp, 8
        pop     ebp
        ret

add:
        push    ebp
        mov     ebp, esp
        mov     edx, dword [ebp + 8]
        mov     eax, dword [ebp + 12]
        add     eax, edx
        pop     ebp
        ret

This is a lot to take in, so let’s go through it line by line.

We’re introducing a new special register: ebp. This is the “base pointer” register, and its purpose is to act as a pointer to the top of the stack at the moment a function is called. Every function starts with saving the old value of the base pointer on the stack, and then moving the new top of the stack in to ebp.

Next we do the familiar pushing of arguments on to the stack. At least we got that right. Then we use an instruction we haven’t seen before called call. call can be expanded to the following:

; call
push    eip + 2
jmp     operand

With eip + 2 meaning the instruction after the jmp below it. It doesn’t matter what the value is, just think of it as pushing the address of the instruction after the call instruction on to the stack so we can refer to it later.

Then control is passed to add, which follows a similar pattern. The base pointer is saved, the stack pointer becomes the new base pointer, and then we get to see the base pointer in action.

mov     edx, dword [ebp + 8]
mov     eax, dword [ebp + 12]

This code is pulling the two arguments to add off of the stack and in to registers so that we can operate on them. But why are they 8 bytes and 12 bytes away?

Remember we pushed the arguments on to the stack, then call pushed the address to return to, and then add’s prologue pushed the old base pointer. This means that, starting from the address in ebp, the first 4 bytes are the saved base pointer, the next 4 bytes are the return address, and the 8 bytes after that are our arguments. To get to the first argument, you need to move 8 bytes up from the base pointer (because the stack grows downwards), and to get to the second argument you need to move 12 bytes up.
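The offsets are easier to believe when you can count the slots yourself. The sketch below fakes the stack with a small array of 4-byte slots and replays the pushes in order: two arguments, the return address, then the saved base pointer. The function name add_via_frame and the return address value are placeholders I’ve made up.

```c
#include <stdint.h>

/* Simulates the stack at the point where add has run its prologue.
 * Slots are 4 bytes wide, so byte offset N from ebp is slot N / 4. */
uint32_t add_via_frame(uint32_t first, uint32_t second) {
    uint32_t stack[8] = {0};
    uint32_t esp = 8;                   /* slot index; 8 means empty */

    stack[--esp] = second;              /* push 2 (second argument) */
    stack[--esp] = first;               /* push 1 (first argument) */
    stack[--esp] = 0x08048123u;         /* call: push a made-up return address */
    stack[--esp] = 0;                   /* push ebp (caller's base pointer) */
    uint32_t ebp = esp;                 /* mov ebp, esp */

    uint32_t edx = stack[ebp + 8 / 4];  /* mov edx, dword [ebp + 8] */
    uint32_t eax = stack[ebp + 12 / 4]; /* mov eax, dword [ebp + 12] */
    return eax + edx;                   /* add eax, edx; result left in eax */
}
```

With these pushes, slot ebp + 0 holds the saved base pointer and slot ebp + 1 (byte offset 4) holds the return address, which is why the arguments only start at byte offset 8.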

Stack frame

This has the added benefit of not requiring us to modify esp with every single push and pop instruction. It’s a small saving, but when you consider that this has to happen for every argument to every function called, it adds up.

When we’ve done what we need to do in our add function, we perform the steps we did at the start but in reverse order.

pop     ebp
ret

The ret instruction is special because it allows us to set the value of eip. It pops the return address that call pushed for us and jumps to it, returning control of the program to the calling function.

The same thing happens in main, except there’s a subtle difference. The add esp, 8 is necessary to “free” the arguments to add we pushed on to the stack. If we don’t do this, the pop ebp will not correctly restore the base pointer and we’ll likely try to refer to memory we never intended to, crashing our program.

Lastly, you’ll notice that add doesn’t push its result back on to the stack when it’s done. It leaves the result in eax. This is because it’s conventional for a function’s return value to be stored in eax.
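For reference, the C that this assembly roughly corresponds to might look like the sketch below. The name call_add is my stand-in for main so the snippet can sit alongside other code, and what your compiler actually emits will vary with its version and flags.

```c
/* With a 32-bit System V target, a and b arrive on the stack and the
 * result is handed back to the caller in eax. */
int add(int a, int b) {
    return a + b;
}

/* Plays the role of main in the assembly: pushes the arguments (the
 * compiler does this for us), calls add, and returns its result. */
int call_add(void) {
    return add(1, 2);
}
```

If you save this as add.c, running gcc -m32 -O0 -S -masm=intel add.c should produce an add.s file you can compare against the hand-written version, though expect extra directives and small differences.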

Conventions

We’ve just done the deepest possible dive on how function calls work in x86, now let’s put names to each of the things we have learnt.

Saving the old base pointer and copying the stack pointer in to it at the start of a function is called the function prologue.

Restoring the stack pointer and base pointer at the end of a function is called the function epilogue.

Those two concepts, along with returning values in eax and storing your function arguments on the stack, make up what’s called a calling convention, and calling conventions are part of a larger concept known as an application binary interface, or ABI. Specifically, all of what I have described so far is part of the System V ABI, which is used by almost all Unix and Unix-like operating systems.

Before we start calling functions written in one language from functions written in another, there’s one last thing we need to be aware of.

Object files

When you compile a C program, quite a lot of things happen under the hood. The most important concept to understand for language interoperability is “object files.”

If your program consists of 2 .c files, the first thing a compiler does is compile each of those .c files in to a .o file. There is typically a 1-to-1 mapping between .c files and .o files. The same is true of assembly, or .s files.

Object files, on Linux and lots of other Unix-like operating systems, are in a binary format called the “executable and linkable format,” or ELF for short. If you’re interested in the full range of data found inside of an ELF file, you can use the readelf -a <elffile> command to find out. The output is likely to be quite dizzying, though, and we’re only really interested in one of its features here.

Symbol tables

To explain symbol tables, let’s split out our assembly from earlier in to two .s files:

add.s:

add:
        push    ebp
        mov     ebp, esp
        mov     edx, dword [ebp + 8]
        mov     eax, dword [ebp + 12]
        add     eax, edx
        pop     ebp
        ret

main.s:

main:
    push    ebp
    mov     ebp, esp
    push    2
    push    1
    call    add
    add     esp, 8
    pop     ebp
    ret

Assemble both of the files:

$ nasm -f elf add.s
$ nasm -f elf main.s
main.s:6: error symbol `add' undefined

Whoops. Our assembler isn’t happy about us calling an undefined function. This is sticky, because we want to define that function elsewhere and call it in main.s, but it seems here like the assembler doesn’t allow that.

The problem is that there is no add symbol defined in this file. If we want to tell the assembler that we intend to find this symbol in another file, we need to say so. Add this line to the top of main.s:

extern add

And now it should assemble without complaint. Before we go further, have a look in your working directory. You should have two .o files: main.o and add.o. We can look at the contents of their symbol tables with a tool called nm:

$ nm add.o
00000000 t add

$ nm main.o
         U add
00000000 t main

The first column is the address of the symbol, the second column is the type of the symbol, and the third column is the name of the symbol. Notice that the symbol names match up with our label names. Also note that main.o has an add symbol of type U. U means “undefined”, which means we need to find a definition for it when we link these object files together later.

Both of our defined functions have a symbol type of t. This means that the symbol points to some code.

To create an executable out of these object files, we run:

$ gcc -m32 main.o add.o -o main

This will shout at you, claiming that it cannot find either main or add. What gives?

Unless we explicitly say so, the symbols in an object file cannot be used by other object files when the compiler links them together. To fix this, we need to add:

global add

To the top of add.s and:

global main

To the top of main.s. This allows the symbols to be linked and the result is that the compiler now takes our object files and creates an executable out of them without complaint.

$ nm main

This will produce a lot of output now, because the compiler has to link in a lot of administrative stuff, like the libc constructor and destructor handlers, __libc_csu_init and __libc_csu_fini respectively. Don’t worry about them, the important thing is that both main and add are defined and the program runs without complaint.
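C has the same local-versus-global distinction, it just spells it differently. The sketch below is a hypothetical symbols.c with one static function and one ordinary one. The function names are mine, and the nm letters in the comments are what I’d typically expect from an unoptimised build rather than a guarantee.

```c
/* static gives internal linkage: other object files cannot see this
 * symbol. nm typically lists it with a lowercase t, like an assembly
 * label that was never declared global. */
static int twice(int x) {
    return x * 2;
}

/* No static, so this has external linkage and other object files can
 * link against it. nm typically lists it with an uppercase T, like a
 * label declared global. */
int quadruple(int x) {
    return twice(twice(x));
}
```

Compiling with gcc -c symbols.c -o symbols.o and running nm symbols.o should show both symbols. With optimisation turned on the compiler may inline twice and drop its symbol entirely.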

Assemble and link diagram

Calling a C function from C++

Let’s go up a level and look at some C and C++.

main.cpp:

extern int inc(int i);

int main() {
    return inc(2);
}

The first line here is us telling the compiler to expect to find a function called inc in another file.

inc.c:

int inc(int i) {
    return i + 1;
}

Here’s the set of steps we need to follow to compile them both separately and then link them together:

$ gcc -c main.cpp -o main.o
$ gcc -c inc.c -o inc.o
$ gcc inc.o main.o

Unfortunately, this fails with a seemingly unfathomable error:

main.o: In function `main'
main.cpp:(.text+0xa): undefined reference to `inc(int)'
collect2: error: ld returned 1 exit status

But we supplied an object file with a definition of inc(int) in it, and we made sure to tell the compiler to expect to find a function called inc(int). Why can’t it find it?

Name mangling

Sadly C++ wasn’t able to provide all of the features it wanted to without diverging from the System V ABI a little bit. When you compile a function in C, that function gets given a symbol with the name you gave it so that others can call it by that name.

C++ does not do this by default. As well as the name you give it, C++ also tacks on information about the argument types of that function. If the function is in a class, information about the class is also included. It does this to allow you to overload the name of a function, so you can define multiple variations of a function that take different arguments. This is called name mangling.

As a result, when we compiled our main.cpp file, it was told to look for a function called _Z3inci instead of plain old inc. Our inc.c file provides a function called inc, and as such the two languages cannot interoperate without a little bit of help.

Fortunately, the problem is easily solved by adding 4 characters to our main.cpp:

extern "C" int inc(int i);

This addition of "C" tells the C++ compiler to look for a function with plain old C linkage, and this includes using the plain old C name of inc. Attempting to compile this code should now work as expected.
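In practice you rarely write the declaration by hand in every .cpp file. A common pattern is to put it in a header that both C and C++ code can include, guarded so that only a C++ compiler ever sees the extern "C" part. Here’s a minimal sketch with the header and the definition shown together. The file names inc.h and inc.c and the guard name INC_H are just my choices.

```c
/* inc.h: a header that both C and C++ code could include. */
#ifndef INC_H
#define INC_H

#ifdef __cplusplus
extern "C" { /* only a C++ compiler gets here; it turns off name mangling */
#endif

int inc(int i);

#ifdef __cplusplus
}
#endif

#endif /* INC_H */

/* inc.c: the definition, compiled as C and exported under the plain
 * name inc. */
int inc(int i) {
    return i + 1;
}
```

A C compiler never defines __cplusplus, so it sees nothing but the plain declaration, while a C++ file that includes the header gets the extern "C" wrapper for free.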

Calling a C++ function from C

This relationship works similarly in the other direction. If we want to write a function in C++ but expose it in a way that a C program could call it, we would need to use extern "C" on that function:

extern "C" int inc(int i) {
    return i + 1;
}

What about Java?

What we’ve discussed up until now is the fundamental basis for how language interoperability works between native languages, but what about a language that runs on a virtual machine, such as Java?

The process is a little more involved, but doable. The problem arises because Java runs on a thing called the Java Virtual Machine, or JVM, which acts as a layer of indirection between Java code and the machine on which it runs. Because of this, Java cannot link directly to native libraries in the same way that C and C++ can. We have to introduce a layer that translates between the native world and the JVM world.

Fortunately, the people behind Java gave this a lot of thought and they came up with the “Java Native Interface,” or JNI. It’s the accepted way to get Java code to talk to native code, and here’s how it works:

public class Interop {
  static {
    System.loadLibrary("inc");
  }

  public static native int inc(int i);

  public static void main(String... args) {
    System.out.println(inc(2));
  }
}

Notice the use of the native keyword. This tells Java that the implementation for this function is defined in some native code somewhere. The System.loadLibrary("inc") line will search Java’s library path for a library called libinc.so and, when it finds it, we will be able to use the Java function inc to call our native code!

But how do we do that?

Step 1: Generate the JNI header file from our code.

$ javac -h . Interop.java

The -h . tells javac to generate a file called Interop.h in the current directory. This will define the function we have to implement. The resulting file looks like this:

/* DO NOT EDIT THIS FILE - it is machine generated */
#include <jni.h>
/* Header for class Interop */

#ifndef _Included_Interop
#define _Included_Interop
#ifdef __cplusplus
extern "C" {
#endif
/*
 * Class: Interop
 * Method: inc
 * Signature: (I)I
 */
JNIEXPORT jint JNICALL Java_Interop_inc
  (JNIEnv *, jclass, jint);

#ifdef __cplusplus
}
#endif
#endif

This looks a lot scarier than it is. The lines containing _Included_Interop are just “header guards” that make sure we can’t accidentally include this file twice and the __cplusplus bit checks if we’re compiling as C++ code and, if we are, wraps everything in an extern "C" block, which you’ll remember from earlier in this post.

The rest is the definition of the JNI function we have to implement:

JNIEXPORT jint JNICALL Java_Interop_inc
  (JNIEnv *, jclass, jint);

It might not look like one, but this is indeed a function declaration. We implement it like so:

inc-jni.c:

#include <jni.h>
#include "Interop.h"

JNIEXPORT jint JNICALL Java_Interop_inc
  (JNIEnv* env, jclass class, jint i) {
    return (jint)(i + 1);
}

The jint cast is to make sure our integer is the size that Java is expecting it to be. The jni.h include is required for all of the Java-specific things we’re seeing, such as jint and jclass.
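The odd-looking function name isn’t arbitrary. The JVM finds native functions by symbol name, and the name is built from the prefix Java_, the class name and the method name. The sketch below is my simplified take on that rule in C: it handles package separators and underscores, but not the escapes for non-ASCII characters or the extra suffix used for overloaded native methods. The helper names are invented for this example.

```c
#include <string.h>

/* Appends s to out, replacing '.' or '/' with "_" and '_' with "_1".
 * Returns 0 on success, -1 if out (of capacity cap) is too small. */
static int append_mangled(char *out, size_t cap, size_t *len, const char *s) {
    for (; *s != '\0'; s++) {
        char single[2] = { *s, '\0' };
        const char *piece = single;
        if (*s == '.' || *s == '/')
            piece = "_";
        else if (*s == '_')
            piece = "_1";
        size_t n = strlen(piece);
        if (*len + n >= cap)
            return -1;
        memcpy(out + *len, piece, n + 1); /* copies the terminator too */
        *len += n;
    }
    return 0;
}

/* Builds the symbol name for a native method, e.g. class "Interop" and
 * method "inc". Returns 0 on success, -1 if the buffer is too small. */
int jni_symbol_name(char *out, size_t cap,
                    const char *class_name, const char *method) {
    static const char prefix[] = "Java_";
    size_t len = sizeof prefix - 1;
    if (cap <= len)
        return -1;
    memcpy(out, prefix, len + 1);
    if (append_mangled(out, cap, &len, class_name) != 0)
        return -1;
    if (len + 1 >= cap)
        return -1;
    out[len++] = '_';
    out[len] = '\0';
    return append_mangled(out, cap, &len, method);
}
```

For the class Interop and the method inc this produces Java_Interop_inc, which matches the name javac put in the generated header.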

To compile this we need to execute a pretty gnarly gcc call:

$ gcc -I"$JAVA_HOME/include" -I"$JAVA_HOME/include/linux" -fPIC -shared -o libinc.so inc-jni.c

The -I flags tell gcc where to find header files, which we need for the #include <jni.h> lines to work. The -fPIC -shared flags create a special type of object file called a “shared object.” What’s special about this type of object file is that it’s possible to, instead of compiling directly against it, load it in to your process at runtime. Shared object files are, by convention, named lib<something>.so.
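Loading at runtime isn’t magic reserved for the JVM. On Linux and other Unix-like systems, a C program can do it with dlopen and dlsym from dlfcn.h. Here’s a minimal sketch that looks up the C library’s abs function by name and calls it. The wrapper name call_abs_at_runtime is mine, and I’m assuming a dynamically linked program.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Loads a shared object at runtime, looks up the symbol "abs" in it and
 * calls it. Pass NULL as library to search the libraries already loaded
 * in to the process. Returns -1 if loading or lookup fails. */
int call_abs_at_runtime(const char *library, int value) {
    void *handle = dlopen(library, RTLD_LAZY);
    if (handle == NULL)
        return -1;

    /* dlsym searches by symbol name, the same names nm shows us. */
    int (*abs_fn)(int) = (int (*)(int))dlsym(handle, "abs");
    int result = (abs_fn != NULL) ? abs_fn(value) : -1;

    dlclose(handle);
    return result;
}
```

Calling call_abs_at_runtime(NULL, -5) should give you 5. On a glibc-based Linux system you could pass "libc.so.6" instead of NULL to load the library by name, and on older glibc versions you may need to add -ldl when linking.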

Now we can run the Java code like so:

$ java -Djava.library.path=. Interop
3

Voila! We successfully called our native inc implementation from Java! How cool is that?

What about Ruby?

Ruby has a similar approach to Java. It exposes an API called “Ruby native extensions” that you can hook in to in order to call native functions from Ruby. Given that we have explored Java’s way of doing this, and Ruby’s is not too dissimilar, I want to use Ruby to focus on a different and more convenient way of calling native code.

ffi

ffi stands for “foreign function interface,” and it’s a Ruby gem that allows us to call functions in existing shared object files with very little setup. First, we install the ffi gem:

$ gem install ffi

Then we need to compile our original inc.c file in to a shared object:

$ gcc -fPIC -shared -o libinc.so inc.c

And then we can write the following short snippet of Ruby code and call our inc function:

require 'ffi'

module Native
  extend FFI::Library
  ffi_lib './libinc.so'
  attach_function :inc, [:int], :int
end

puts Native.inc(2)

This method makes us jump through far fewer hoops than Java’s JNI does, and has the benefit of allowing us to call existing shared objects without modifying them. For example, you can call directly in to libc.so:

require 'ffi'

module Libc
  extend FFI::Library
  ffi_lib 'c'
  attach_function :puts, [:string], :int
end

Libc.puts "Hello, libc!"

Notice that we only had to specify 'c' as the library name. This is because Unix-like systems have standardised paths to find libraries in, often including /lib and /usr/lib. By default ffi will look for libraries with the naming scheme lib<name>.so. If you do ls /usr/lib/libc.so you should find a file exists at that path.

Why doesn’t Java have FFI?

It does! There’s a library called JNA that does the same job that Ruby’s FFI library does.

import com.sun.jna.Library;
import com.sun.jna.Native;

class JNA {
  public interface Libc extends Library {
    Libc INSTANCE = (Libc)Native.loadLibrary("c", Libc.class);
    public int puts(String s);
  }

  public static void main(String... args) {
    Libc.INSTANCE.puts("Hello, libc!");
  }
}

We have to download the JNA library, which you can do from here. Then we compile and run the Java code including the JNA library:

$ javac -classpath jna-4.1.0.jar JNA.java
$ java -classpath jna-4.1.0.jar:. JNA
Hello, libc!

Why would anyone use a language’s native interface if FFI is so much more convenient?

Lots of languages have some form of FFI library, and they’re very convenient for calling in to existing native libraries. The problem, though, is that it’s one-way communication. The library you’re calling in to can’t modify, for example, a Java object directly. It can’t call Java code. The only way to do that is to use Java’s JNI or Ruby’s native extensions, because they expose to you an API for doing that.

If you don’t need to have two-way communication between languages, though, and all you want to do is call existing native code, FFI is the way to go.

Hold up, so how does Kotlin call functions written in Java?

Kotlin is a relatively new language that runs on the JVM.

“Wait, doesn’t Java run on the JVM?”

That’s true! But the JVM isn’t just for Java. It’s a generic virtual machine that compilers can generate “machine code” for, just like native machines. The JVM machine code is more commonly referred to as “bytecode,” because every opcode is one byte in length. Yes, this limits the instruction set to 256 opcodes, far fewer than the 2,034 instructions you’ll find in x86.

In this respect, you could consider the JVM to be a “native” machine. That is the abstraction it aims to present to compilers and users. The only difference is that the native machine you’re running code on is being emulated by a piece of software called the JVM.

Let’s look at some example Java and Kotlin code.

Hello.java:

public class Hello {
  public static void world() {
    System.out.println("Hello, world!");
  }
}

Main.kt:

fun main(args: Array<String>) {
  Hello.world()
}

Let’s compile and run this code.

$ javac Hello.java
$ kotlinc -classpath . Main.kt
$ kotlin MainKt
Hello, world!

And now we disassemble the code to see what’s going on under the hood:

$ javap -c MainKt.class
Compiled from "Main.kt"
public final class MainKt {
  public static final void main(java.lang.String[]);
    Code:
      0: aload_0
      1: ldc #9 // String args
      3: invokestatic #15 // Method kotlin/jvm/internal/Intrinsics.checkParameterIsNotNull:(Ljava/lang/Object;Ljava/lang/String;)V
      6: invokestatic #21 // Method Hello.world:()V
      9: return
}

This looks a lot different to disassembled native code. The bit underneath the “Code:” heading is the actual bytecode that gets run. The numbers on the left hand side are the byte offsets in to the code, the next column is the “opcode” and anything after the opcode is an operand.

“But you said JVM bytecodes were one byte in length, why does the byte index on the left increment by more than 1 byte per instruction?”

The opcodes are one byte in length. The operands make up the rest of the space. For example, the invokestatic opcode takes two additional bytes of operands. Together those two bytes form a single 16-bit number, the #21 in the listing above, which identifies both the class being called and the method on that class.
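To make the offsets concrete, here is a minimal Java sketch that walks the bytes of main by hand. The byte values are an assumption: they are transcribed from the javap listing above using the opcode numbers in the JVM specification, not read from the real class file. The class name BytecodeWalk and its disassemble helper are made up for illustration, and only the four opcodes that appear in the listing are handled.

```java
import java.util.ArrayList;
import java.util.List;

class BytecodeWalk {
    // Hand-transcribed bytes for MainKt.main, matching the javap listing.
    static final int[] CODE = {
        0x2a,             // aload_0            (opcode only)
        0x12, 0x09,       // ldc #9             (opcode + 1 operand byte)
        0xb8, 0x00, 0x0f, // invokestatic #15   (opcode + 2 operand bytes)
        0xb8, 0x00, 0x15, // invokestatic #21   (opcode + 2 operand bytes)
        0xb1              // return             (opcode only)
    };

    // Returns one "offset: mnemonic operands" line per instruction.
    static List<String> disassemble(int[] code) {
        List<String> lines = new ArrayList<>();
        int pc = 0; // byte offset of the current instruction
        while (pc < code.length) {
            int opcode = code[pc];
            switch (opcode) {
                case 0x2a:
                    lines.add(pc + ": aload_0");
                    pc += 1;
                    break;
                case 0x12:
                    lines.add(pc + ": ldc #" + code[pc + 1]);
                    pc += 2;
                    break;
                case 0xb8: {
                    // The two operand bytes are combined into one 16-bit index.
                    int index = (code[pc + 1] << 8) | code[pc + 2];
                    lines.add(pc + ": invokestatic #" + index);
                    pc += 3;
                    break;
                }
                case 0xb1:
                    lines.add(pc + ": return");
                    pc += 1;
                    break;
                default:
                    throw new IllegalArgumentException("opcode not handled in this sketch: " + opcode);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        disassemble(CODE).forEach(System.out::println);
    }
}
```

The offsets it prints are 0, 1, 3, 6 and 9, the same as the left hand column of the javap output, because each instruction advances by one opcode byte plus however many operand bytes it carries.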

“How does the number 21 reference our Hello.world() method?”

A fundamental concept in JVM class files is the “constant pool.” It’s not output by default by javap, but we can ask for it:

$ javap -verbose MainKt.class
Classfile /tmp/MainKt.class
  Last modified May 7, 2018; size 735 bytes
  MD5 checksum f3ce23c2362512e852a5c91a1053c198
  Compiled from "Main.kt"
public final class MainKt
  minor version: 0
  major version: 50
  flags: ACC_PUBLIC, ACC_FINAL, ACC_SUPER
Constant pool:
  ...
  #16 = Utf8 Hello
  #17 = Class #16
  #18 = Utf8 world
  #19 = Utf8 ()V
  #20 = NameAndType #18:#19
  #21 = Methodref #17.#20
  ...

I’ve trimmed a lot of the output, but left the relevant entries in the constant pool. Entries 16, 18 and 19 are the raw UTF-8 strings “Hello”, “world” and “()V”, the last of which is a way of saying it’s a function that takes no arguments and returns void. The invokestatic opcode specifically takes operands that are indexes in to the constant pool, and entries in the constant pool can reference other entries in the constant pool. Entry 21 is a Methodref that points at 17, the class named by 16, and at 20, the name and type given by 18 and 19. Follow the chain and you arrive at Hello.world:()V.
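The chain of references can also be followed in code. The sketch below is a toy model and not a class file parser: the six entries are copied from the javap output above in to a plain map, and the names ConstantPoolSketch, POOL and resolve are invented for this example. A real constant pool has more tags than the four handled here.

```java
import java.util.HashMap;
import java.util.Map;

class ConstantPoolSketch {
    // Each entry is {tag, operands...}, copied from the javap -verbose output.
    static final Map<Integer, String[]> POOL = new HashMap<>();
    static {
        POOL.put(16, new String[] {"Utf8", "Hello"});
        POOL.put(17, new String[] {"Class", "16"});
        POOL.put(18, new String[] {"Utf8", "world"});
        POOL.put(19, new String[] {"Utf8", "()V"});
        POOL.put(20, new String[] {"NameAndType", "18", "19"});
        POOL.put(21, new String[] {"Methodref", "17", "20"});
    }

    // Follows references between entries until only raw strings remain.
    static String resolve(int index) {
        String[] entry = POOL.get(index);
        switch (entry[0]) {
            case "Utf8":
                return entry[1];
            case "Class":
                return resolve(Integer.parseInt(entry[1]));
            case "NameAndType":
                return resolve(Integer.parseInt(entry[1])) + ":" + resolve(Integer.parseInt(entry[2]));
            case "Methodref":
                return resolve(Integer.parseInt(entry[1])) + "." + resolve(Integer.parseInt(entry[2]));
            default:
                throw new IllegalArgumentException("tag not handled in this sketch: " + entry[0]);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(21)); // prints Hello.world:()V
    }
}
```

The result is the same string javap printed in the comment next to the second invokestatic instruction.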

If we try to compile Main.kt on its own, without Hello.class on the classpath, we get an error very similar to the one we got when we tried to link an object file without all of the symbol definitions it required:

$ kotlinc Main.kt
Main.kt:2:2: error: unresolved reference: Hello

So because these languages, Java and Kotlin, both run using the same ABI and on the same instruction set, we are able to use that ABI’s calling conventions to make function calls across language boundaries.
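Notice that nothing in a method’s identity (owning class, method name, descriptor) records which compiler produced the class. As a rough illustration, the sketch below uses the standard java.lang.invoke API to look up a static method from exactly those three pieces and call it. This is not literally what invokestatic does internally, just a way to see the same ingredients from Java code. DescriptorDemo, its world method and the callStatic helper are hypothetical stand-ins; world counts calls instead of printing so the result is easy to check.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class DescriptorDemo {
    static int calls = 0;

    // Stand-in for Hello.world(): no arguments, returns void.
    static void world() {
        calls++;
    }

    // The JVM descriptor for "takes nothing, returns void".
    static String voidNoArgsDescriptor() {
        return MethodType.methodType(void.class).toMethodDescriptorString();
    }

    // Finds a static ()V method by owner class and name, then calls it.
    static void callStatic(Class<?> owner, String name) {
        try {
            MethodType type = MethodType.methodType(void.class);
            MethodHandle handle = MethodHandles.lookup().findStatic(owner, name, type);
            handle.invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException("could not call " + name, t);
        }
    }

    public static void main(String[] args) {
        System.out.println(voidNoArgsDescriptor()); // prints ()V
        callStatic(DescriptorDemo.class, "world");
        System.out.println(calls);                  // prints 1
    }
}
```

The lookup would work the same way if the owning class had been compiled by kotlinc instead of javac, as long as it is on the classpath and accessible to the caller.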

Conclusion

We’ve come a long way. From raw, native assembly code up to calling functions between languages running on the same virtual machine.

With the knowledge you have built up of calling conventions, instruction sets, object files and FFI libraries, you should now be well equipped to explore how languages not mentioned in this post would call functions written in other languages.


  1. Don’t worry too much about these words if they don’t mean anything to you. They’re just terms you may see in the wild and, when you do, you’ll know that the things in this post are what they are referring to. 

  2. With exceptions including floating point calculations, which happen on the Floating Point Unit, or FPU. 

  3. It’s not 100% silent. Check the exit status of the program when it finishes running. Why do you think it exits with the status it does? 
