My adventure with Odin continues. This time I decided to try something unexpected - deep learning and AI! "But how?" you might ask. Fortunately, there is a well-known framework for running deep-learning models - I'm talking about ONNX Runtime (just ONNX below).
Prerequisites
- Familiarity with ONNX
- Familiarity with C
- Odin Overview page has been read
ONNX has a well-documented C API, which makes it easy to port to Odin. At least, I hoped it would be easy. But this is another story.
The plan:
- choose an ONNX sample to translate to Odin
- make binds to ONNX
- ???
- run the model on the GPU using Odin
ONNX Sample
I've found an ONNX example that suits my idea and will use it as a strong foundation for the project. But to make things more interesting, I'll add a few enhancements:
- listing all available providers
- checking if CUDA is available
- using CUDA (if available) for running a model
This example code shows the basic usage of ONNX:
- Make an OrtApi instance
- Initialize an OrtEnv variable
- Configure OrtSessionOptions and an OrtSession
- Run the session (all computations happen here)
- Process results and release resources
ONNX Bindings
Long story short: I've already made Odin bindings for ONNX, and I'll use those bindings in this project. I have to say that this post is not about bindings generation. Still, we'll compare the C and Odin versions of the code to get an idea of how C APIs are used from Odin.
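For context, here is a rough sketch of what such a binding can look like in Odin: a foreign import of the shared library plus a foreign block mirroring the C signatures. The library, struct, and function names below are hypothetical and only illustrate the shape; the real ONNX bindings are linked in the references.
package binding_sketch

import "core:c"

// Hypothetical shared library, for illustration only.
when ODIN_OS == .Linux do foreign import mylib "libmylib.so"

// Opaque C struct, only ever used through a pointer.
My_Handle :: struct {}

@(default_calling_convention="c")
foreign mylib {
	// C: int my_lib_open(my_handle_t **out);
	my_lib_open :: proc(out: ^^My_Handle) -> c.int ---
	// C: void my_lib_close(my_handle_t *h);
	my_lib_close :: proc(h: ^My_Handle) ---
}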
???
In this section, I'll describe only the key parts of the code. A link to the full code is available in the references.
Make OrtApi instance
In C, an OrtApi instance is initialized as follows:
#include <onnxruntime_c_api.h>
const OrtApi* g_ort = OrtGetApiBase()->GetApi(ORT_API_VERSION);
On the Odin side, the OrtApi structure is already defined in the bindings file, so we can simply initialize the OrtApi instance and even check whether it was initialized successfully:
g_ort: ^OrtApi
if g_ort = OrtGetApiBase().GetApi(ORT_API_VERSION); cast(rawptr)g_ort == nil {
fmt.eprintln(">>> OrtApi is nil")
os.exit(1)
}
Listing all available providers
To get a list of all available providers, in C we use:
int providers_count;
char **providers;
CheckStatus(g_ort->GetAvailableProviders(&providers, &providers_count));
printf(">>> Num Providers: %d\n", providers_count);
printf(">>> Providers:\n");
for (int i = 0; i < providers_count; ++i) {
printf(">>> %d) %s\n", i, providers[i]);
}
CheckStatus(g_ort->ReleaseAvailableProviders(providers, providers_count));
Here, you can see a regular pattern used by the ONNX C API:
- Declare variables.
- Initialize variables by reference in a function.
- Check the status of the operation.
- Release allocated resources (if any).
CheckStatus is just a helper function to check the status:
void CheckStatus(OrtStatus *status) {
if (status != NULL) {
const char *msg = g_ort->GetErrorMessage(status);
fprintf(stderr, "%s\n", msg);
g_ort->ReleaseStatus(status);
exit(1);
}
}
The equivalent code in Odin is as follows:
//// Get available providers:
providers_count: c.int
providers: [^]cstring
g_ort.GetAvailableProviders(cast(^^^c.char)(&providers), &providers_count)
defer g_ort.ReleaseAvailableProviders(providers, providers_count)
fmt.println(">>> Available providers:")
for i: c.int = 0; i < providers_count; i += 1 {
fmt.printfln("\t%d) %s", i, providers[i])
}
/*
0) TensorrtExecutionProvider
1) CUDAExecutionProvider
2) CPUExecutionProvider
*/
The most interesting part of the code snippets above (as well as in the whole project) is how Odin types are mapped to C types.
Mapping between C's int and Odin's c.int is quite straightforward. But what about the following?
char **providers;
OrtStatus* GetAvailableProviders(char*** out_ptr, int* provider_length);
This is quite tricky. On the Odin side, we could use providers: ^^c.char, but we shouldn't. That's because we use providers as a 1D array of C strings (null-terminated char arrays), not as a mere double pointer to char. To express the array nature of providers, we use multi-pointers, which support indexing in Odin. So, instead of providers: ^^c.char, we use providers: [^]cstring.
But how do we pass providers: [^]cstring to a function with the following signature?
GetAvailableProviders : proc(out_ptr: ^^^c.char, provider_length: ^c.int) -> OrtStatusPtr
To pass providers into GetAvailableProviders(), we have to cast it to ^^^c.char:
cast(^^^c.char)(&providers)
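To make the difference concrete, here is a tiny standalone sketch (my own example, not part of the ONNX code) showing that a multi-pointer can be indexed while a regular pointer cannot:
package multipointer_sketch

import "core:fmt"

main :: proc() {
	values := [3]cstring{"a", "b", "c"}

	many: [^]cstring = raw_data(values[:]) // multi-pointer: points to many items
	one: ^cstring = &values[0]             // regular pointer: points to exactly one item

	fmt.println(many[2]) // indexing works on multi-pointers: prints "c"
	fmt.println(one^)    // a regular pointer is only dereferenced: prints "a"
	// fmt.println(one[2]) // would not compile: ^cstring cannot be indexed
}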
Initialize OrtEnv variable
According to the C API, to initialize the OrtEnv we use the following code:
OrtEnv *env;
CheckStatus(g_ort->CreateEnv(ORT_LOGGING_LEVEL_WARNING, "test", &env));
On the Odin side, the equivalent code is quite similar:
env: ^OrtEnv
status: OrtStatusPtr = g_ort.CreateEnv(OrtLoggingLevel.ORT_LOGGING_LEVEL_WARNING, "test", &env)
CheckStatus(g_ort, status)
defer g_ort.ReleaseEnv(env)
CheckStatus() is as follows:
CheckStatus :: proc(ort: ^OrtApi, status: OrtStatusPtr) {
if status != nil {
msg: cstring = ort.GetErrorMessage(status)
fmt.eprintln(msg)
ort.ReleaseStatus(status)
os.exit(1)
}
}
Configure OrtSessionOptions and OrtSession
OrtSessionOptions
To initialize OrtSessionOptions, we use the OrtApi instance:
OrtSessionOptions *session_options;
CheckStatus(g_ort->CreateSessionOptions(&session_options));
g_ort->SetIntraOpNumThreads(session_options, 1);
g_ort->SetSessionGraphOptimizationLevel(session_options, ORT_ENABLE_BASIC);
Compare it with Odin's code:
session_options: ^OrtSessionOptions
status = g_ort.CreateSessionOptions(&session_options)
CheckStatus(g_ort, status)
defer g_ort.ReleaseSessionOptions(session_options)
status = g_ort.SetIntraOpNumThreads(session_options, 1)
CheckStatus(g_ort, status)
status = g_ort.SetSessionGraphOptimizationLevel(
session_options,
GraphOptimizationLevel.ORT_ENABLE_BASIC,
)
CheckStatus(g_ort, status)
Enable CUDA if available
After OrtSessionOptions is initialized, we can check whether CUDA is available and configure ONNX to use CUDA as the provider. Here I show only the Odin code because, as you've already seen, the difference between Odin and C is minimal.
Let's find out if CUDA is available:
is_cuda_available: bool
for i: c.int = 0; i < providers_count; i += 1 {
if providers[i] == "CUDAExecutionProvider" {
is_cuda_available = true
break
}
}
fmt.printfln(">>> CUDA is available: %t", is_cuda_available)
And use it as the acceleration provider:
if is_cuda_available {
fmt.println(">>> Setting up CUDA...")
status = OrtSessionOptionsAppendExecutionProvider_CUDA(session_options, 0)
CheckStatus(g_ort, status)
}
OrtSession
OrtSession is initialized with OrtEnv, the model path, and OrtSessionOptions:
session: ^OrtSession
model_path :: "squeezenet1.0-8.onnx"
status = g_ort.CreateSession(env, model_path, session_options, &session)
CheckStatus(g_ort, status)
defer g_ort.ReleaseSession(session)
Run session
So, we've made all the necessary preparations to configure OrtSession. We're almost ready to run the model on the GPU.
To run the model, we have to:
- specify the model's input and output node names
- allocate input tensor
- run session
- get computation results
Specify model's input and output node names
An ONNX model is a computational graph where nodes represent operations and edges represent the data flow.
We have to specify which node we'd like to put data in and which node we'd like to get the results from. The specification is done by name.
Without diving deep into details, I'll postulate that the input node name is data_0, the output node name is softmaxout_1, and the input node dimensions are 1x3x224x224. There are a few ways to get this information:
- read model specification or source code
- use handy visualizers, for example, https://netron.app/
- use the ONNX API to get information about the model at runtime (see the sketch below)
A screenshot of the SqueezeNet model properties on netron.app shows exactly these names and dimensions.
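For completeness, here is a rough sketch of how the third option could look with the same bindings. The C API provides GetAllocatorWithDefaultOptions, SessionGetInputCount, and SessionGetInputName; I assume the Odin bindings mirror those signatures one-to-one, so treat the exact parameter types below as assumptions.
// Hedged sketch: query the input node names at runtime instead of hard-coding them.
allocator: ^OrtAllocator
status = g_ort.GetAllocatorWithDefaultOptions(&allocator)
CheckStatus(g_ort, status)

input_count: c.size_t
status = g_ort.SessionGetInputCount(session, &input_count)
CheckStatus(g_ort, status)

for i: c.size_t = 0; i < input_count; i += 1 {
	name: cstring
	status = g_ort.SessionGetInputName(session, i, allocator, cast(^^c.char)&name)
	CheckStatus(g_ort, status)
	fmt.printfln(">>> Input node %d: %s", i, name)
	// the returned name should be released through the allocator once it's no longer needed
}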
The syntax of the node name declaration is straightforward, so I'll show only output_node_names in C (to be more precise, in C++, as that's how the ONNX example is implemented):
std::vector<const char *> output_node_names = {"softmaxout_1"};
On the Odin side, input node name, output node name, and dimensions are initialized as follows:
input_node_dims := make([dynamic]c.int64_t)
defer delete(input_node_dims)
append(&input_node_dims, 1, 3, 224, 224)
input_node_names := make([dynamic]cstring)
defer delete(input_node_names)
append(&input_node_names, "data_0")
output_node_names := make([dynamic]cstring)
defer delete(output_node_names)
append(&output_node_names, "softmaxout_1")
Allocate input tensor
In a real-world scenario, we'd pass images to the model and get predictions. But for the sake of the demo, we'll pass dummy data to the model.
SqueezeNet is a model for image classification, trained on the ImageNet dataset. Its input images have a fixed size of 224x224x3 pixels (224 pixels high, 224 pixels wide, and 3 channels), so we have to allocate and populate a vector of 224*224*3 elements.
size_t input_tensor_size = 224 * 224 * 3;
std::vector<float> input_tensor_values(input_tensor_size);
// initialize input data with values in [0.0, 1.0] (dummy data)
for (size_t i = 0; i < input_tensor_size; i++) {
input_tensor_values[i] = (float)i / (input_tensor_size + 1);
}
// create input tensor object from data values
OrtMemoryInfo *memory_info;
CheckStatus(g_ort->CreateCpuMemoryInfo(OrtArenaAllocator, OrtMemTypeDefault, &memory_info));
OrtValue *input_tensor = NULL;
CheckStatus(g_ort->CreateTensorWithDataAsOrtValue(
memory_info, input_tensor_values.data(),
input_tensor_size * sizeof(float), input_node_dims.data(), 4,
ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT, &input_tensor));
int is_tensor;
CheckStatus(g_ort->IsTensor(input_tensor, &is_tensor));
assert(is_tensor);
g_ort->ReleaseMemoryInfo(memory_info);
In the C code above (again, it's C++, but who cares), we allocate the input_tensor_values vector of floats with 224*224*3 elements and populate it with dummy data. Then we "transfer" the data to the ONNX world by calling CreateTensorWithDataAsOrtValue() and allocating input_tensor. As the final step, we check that input_tensor is a reference to a real tensor, and we're good to go.
Here is the same thing in Odin:
input_tensor_size: c.size_t = 224 * 224 * 3
input_tensor_values := make([dynamic]c.float, input_tensor_size)
defer delete(input_tensor_values)
// initialize input data with values in [0.0, 1.0] (dummy data)
for i: c.size_t = 0; i < input_tensor_size; i += 1 {
input_tensor_values[i] = cast(c.float)i / (cast(c.float)input_tensor_size + 1)
}
// create input tensor object from data values
memory_info: ^OrtMemoryInfo
status = g_ort.CreateCpuMemoryInfo(
OrtAllocatorType.OrtArenaAllocator,
OrtMemType.OrtMemTypeDefault,
&memory_info,
)
CheckStatus(g_ort, status)
defer g_ort.ReleaseMemoryInfo(memory_info)
input_tensor: ^OrtValue
status = g_ort.CreateTensorWithDataAsOrtValue(
memory_info,
cast(rawptr)raw_data(input_tensor_values),
input_tensor_size * size_of(c.float),
cast(^c.int64_t)raw_data(input_node_dims),
len(input_node_dims),
ONNXTensorElementDataType.ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT,
&input_tensor,
)
CheckStatus(g_ort, status)
defer g_ort.ReleaseValue(input_tensor)
is_tensor: c.int
status = g_ort.IsTensor(input_tensor, &is_tensor)
CheckStatus(g_ort, status)
assert(is_tensor == 1, "input_tensor not a tensor")
Pay attention to how we initialize dynamic arrays in Odin and how we get a pointer to their backing data. To get a pointer to the inner data of a dynamic array, we use Odin's built-in raw_data() procedure. We then cast the result with cast(rawptr), which is equivalent to casting to void* in C.
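Here is a minimal standalone sketch (again, my own example rather than project code) of that raw_data() + cast(rawptr) pattern:
package rawdata_sketch

import "core:c"
import "core:fmt"

main :: proc() {
	xs := make([dynamic]c.float)
	defer delete(xs)
	append(&xs, 1.0, 2.0, 3.0)

	ptr: [^]c.float = raw_data(xs)    // multi-pointer to the dynamic array's backing memory
	as_void: rawptr = cast(rawptr)ptr // what a C API expecting void* would receive

	fmt.println(ptr[1])         // prints 2.000
	fmt.println(as_void != nil) // prints true
}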
Run session
By now, we've prepared a session and allocated the input tensor, as well as the names and dimensions of the input and output nodes. It was a long, tedious, but necessary preparation.
To store inference results, we use an OrtValue pointer. After the computations are done, we check whether that pointer is a valid tensor. Here is a C sample:
OrtValue *output_tensor = NULL;
CheckStatus(g_ort->Run(session, NULL, input_node_names.data(),
(const OrtValue *const *)&input_tensor, 1,
output_node_names.data(), 1, &output_tensor));
CheckStatus(g_ort->IsTensor(output_tensor, &is_tensor));
assert(is_tensor);
At this point, the Odin code should be clear and readable; I hope no extra explanation is required:
output_tensor: ^OrtValue
run_options: ^OrtRunOptions
status = g_ort.Run(
session,
run_options,
raw_data(input_node_names),
&input_tensor,
len(input_node_names),
raw_data(output_node_names),
len(output_node_names),
&output_tensor,
)
defer g_ort.ReleaseValue(output_tensor)
CheckStatus(g_ort, status)
status = g_ort.IsTensor(output_tensor, &is_tensor)
CheckStatus(g_ort, status)
assert(is_tensor == 1, "output_tensor not a tensor")
Get computation results
The time has come to bring the computation results back from the ONNX world into ours. For that purpose, the GetTensorMutableData() function is used.
The result of the computation is a float array of length 1000. "Why 1000?" you may ask. Because there are 1000 image classes in the ImageNet dataset: the model's output is a vector of probabilities that the image (dummy data in our case) depicts each particular class.
Here is a C sample:
float* floatarr;
CheckStatus(g_ort->GetTensorMutableData(output_tensor, (void**)&floatarr));
assert(std::abs(floatarr[0] - 0.000045) < 1e-6);
// score the model, and print scores for first 5 classes
for (int i = 0; i < 5; i++)
printf("Score for class [%d] = %f\n", i, floatarr[i]);
You already know that for array pointers we use multi-pointers in Odin:
floatarr: [^]c.float
status = g_ort.GetTensorMutableData(output_tensor, cast(^rawptr)&floatarr)
CheckStatus(g_ort, status)
assert(abs(floatarr[0] - 0.000045) < 1e-6, "computation failed")
for i := 0; i < 5; i += 1 {
fmt.printfln(">>> Score for class [%d] = %.6f", i, floatarr[i])
}
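As a small extra that is not part of the original sample, a sketch of picking the top class from those 1000 scores could look like this (with dummy input the winning class is, of course, meaningless):
// Hypothetical follow-up, not in the original ONNX sample: find the index of
// the highest-scoring ImageNet class among the 1000 output values.
best_class := 0
best_score := floatarr[0]
for i := 1; i < 1000; i += 1 {
	if floatarr[i] > best_score {
		best_score = floatarr[i]
		best_class = i
	}
}
fmt.printfln(">>> Best class: %d (score %.6f)", best_class, best_score)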
Run the model on the GPU using Odin
The full code described in the post is available on GitHub: https://github.com/yevhen-k/onnx-odin-squeezenet-inference-demo
To run the code, you have to:
- Use Linux
- Have ONNX Runtime on your machine (in the /thirdparty/onnxruntime folder in this example)
- Have a GPU with CUDA (optional)
Prepare, compile, and run the code:
- Clone the repo
git clone https://github.com/yevhen-k/onnx-odin-squeezenet-inference-demo.git
cd onnx-odin-squeezenet-inference-demo
- Get the SqueezeNet model (squeezenet1.0-8.onnx)
curl https://github.com/onnx/models/raw/main/validated/vision/classification/squeezenet/model/squeezenet1.0-8.onnx -Lso squeezenet1.0-8.onnx
- Edit onnxbinding.odin if necessary to adjust the package name or the foreign import of libonnxruntime.so
// ...
package onnx_bindings
// ...
when ODIN_OS == .Linux do foreign import onnx "/thirdparty/onnxruntime/lib/libonnxruntime.so"
// ...
- Build
cd ..
odin build onnx-odin-squeezenet-inference-demo -extra-linker-flags:"-Wl,-rpath=/thirdparty/onnxruntime/lib/" -out:onnx-odin-squeezenet-inference-demo/odin_onnx_example
- Run
cd onnx-odin-squeezenet-inference-demo && ./odin_onnx_example
Special Thanks
Special thanks to the Odin community for answering my questions and helping me understand the mechanisms of passing data between C and Odin.
References
- Inference of SqueezeNet.onnx model on CUDA with Odin: https://github.com/yevhen-k/onnx-odin-squeezenet-inference-demo/
- Odin bindings to the ONNX Runtime (Linux): https://github.com/yevhen-k/onnx-odin-bindings
- ONNX SqueezeNet example: https://github.com/microsoft/onnxruntime/blob/v1.4.0/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests.Capi/C_Api_Sample.cpp
- ONNX C API: https://github.com/microsoft/onnxruntime/blob/v1.17.3/include/onnxruntime/core/session/onnxruntime_c_api.h
- SqueezeNet: https://paperswithcode.com/method/squeezenet
- Model viewer: https://netron.app/