Aaron Langford

What's the Fastest Way to Convert a Tensor to an Image File?

When serving generative image models in production, the tensor representation of an image must be converted to a common image format such as PNG, JPEG, or WebP. However, this conversion can be costly, and anyone who cares about fast inference needs to know the fastest way to get a file back to their users.

When choosing how to deliver these image files, there is a trade-off between inference speed, image quality, and the number of bytes in the file.

In this post, I'll lay out some options to choose from and share some benchmarks. All of my benchmarks were taken on a c6a.4xlarge EC2 instance. I use a single 702x1248 image for every benchmark:

An AI generated image of the next big thing in EDM

See the appendix for a list of the package versions I'm using in my benchmarks.

Library Options:

  1. Python Imaging Library (PIL)
  2. Torch Vision
  3. OpenCV

Formats:

  1. PNG
  2. WebP

I won't consider lossy formats like JPEG because I think diffusion model services shouldn't risk any quality degradation from lossy image compression.

PNG Benchmarks

Here's a sample of the code I used for the Python Imaging Library (PIL). I assume the reader can figure out how to modify it to produce all the results I report in the table that follows the code.

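import io
import time
from PIL import Image
import torchvision.transforms.functional as F

path = "/home/user/00000.png"
pil_image = Image.open(path)
pil_image = pil_image.convert("RGB")

# Float tensor in [0, 1] with shape (C, H, W); stands in for model output.
image_tensor = F.to_tensor(pil_image)

def pil_png(out):
    # The tensor-to-PIL conversion happens here, so its cost is part of the timing.
    pil_image: Image.Image = F.to_pil_image(image_tensor)
    pil_image.save(out, format="PNG", compress_level=4)

t0 = time.time()
for i in range(100):
    out = io.BytesIO()
    pil_png(out)
# Stop the clock before printing so the prints are not included in the timing.
elapsed = time.time() - t0

print(f"Bytes in file: {len(out.getvalue())}")
print(f"Average time: {elapsed / 100}")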

Results for the Python Imaging Library:

| Options | Time (s) | File Size (bytes) |
| --- | --- | --- |
| optimize=True | 2.153 | 1066223 |
| optimize=False | 0.368 | 1098439 |
| compress_level=0 | 0.057 | 2766507 |
| compress_level=1 | 0.085 | 1273428 |
| compress_level=4 | 0.15 | 1114614 |
| compress_level=9 | 2.13 | 1114614 |
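
The per-option runs behind the table above can be driven by a single loop. The following is a minimal sketch, not the exact script used for these benchmarks: it assumes only Pillow is installed, uses a small synthetic image in place of the 702x1248 test image, and the helper names `make_test_image` and `sweep` are hypothetical. Timings from it will not match the table.

```python
import io
import time

from PIL import Image


def make_test_image(width: int = 256, height: int = 144) -> Image.Image:
    """Build a synthetic blocky RGB image so the sketch needs no file on disk."""
    pixels = bytes(
        channel
        for y in range(height)
        for x in range(width)
        for channel in ((x // 8 * 8) % 256, (y // 8 * 8) % 256, 128)
    )
    return Image.frombytes("RGB", (width, height), pixels)


def sweep(image, fmt, option_sets, runs=5):
    """Return (options, average seconds, byte count) for each option set."""
    results = []
    for options in option_sets:
        start = time.perf_counter()
        for _ in range(runs):
            out = io.BytesIO()
            image.save(out, format=fmt, **options)
        elapsed = (time.perf_counter() - start) / runs
        results.append((options, elapsed, len(out.getvalue())))
    return results


# The same save options that appear in the table above.
png_options = [
    {"optimize": True},
    {"compress_level": 0},
    {"compress_level": 1},
    {"compress_level": 4},
    {"compress_level": 9},
]

results = sweep(make_test_image(), "PNG", png_options)
for options, seconds, size in results:
    print(f"{options}: {seconds:.4f} s, {size} bytes")
```

Passing a different format string and different option dictionaries should let the same loop cover the other formats Pillow can write.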

I also tested Torch Vision, which has a PNG encoder. Code for Torch Vision:

import time
from PIL import Image
import torch
import torchvision.transforms.functional as F
import torchvision.io

path = "/home/user/00000.png"
pil_image = Image.open(path)
pil_image = pil_image.convert("RGB")

# encode_png expects a uint8 tensor of shape (C, H, W), so scale [0, 1] floats to [0, 255].
image_tensor = F.to_tensor(pil_image) * 255.0
image_tensor = image_tensor.to(torch.uint8)

def tv_png():
    # Returns a 1-D uint8 tensor holding the encoded PNG bytes.
    return torchvision.io.encode_png(image_tensor, compression_level=1)

t0 = time.time()
for i in range(100):
    val = tv_png()
# Stop the clock before printing so the prints are not included in the timing.
elapsed = time.time() - t0

print(f"Bytes in file: {len(val)}")
print(f"Average time: {elapsed / 100}")

Results for Torch Vision:

| Options | Time (s) | File Size (bytes) |
| --- | --- | --- |
| compression_level=0 | 0.036 | 2770032 |
| compression_level=1 | 0.071 | 1272572 |
| compression_level=4 | 0.117 | 1116922 |
| compression_level=9 | 1.535 | 1069100 |

The third library I considered was OpenCV. Here's the code I used to test that option:

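import time
import cv2
from PIL import Image
import torch
import torchvision.transforms.functional as F

path = "/home/user/00000.png"
pil_image = Image.open(path)
pil_image = pil_image.convert("RGB")

image_tensor = F.to_tensor(pil_image) * 255.0
image_tensor = image_tensor.to(torch.uint8)
# OpenCV works on NumPy arrays in (H, W, C) layout rather than (C, H, W).
image_tensor = image_tensor.numpy()
image_tensor = image_tensor.transpose((1, 2, 0))
# Note: OpenCV interprets channels as BGR; convert with
# cv2.cvtColor(image_tensor, cv2.COLOR_RGB2BGR) if the colors of the saved file matter.

def cv_png():
    result, buffer = cv2.imencode('.png', image_tensor, [cv2.IMWRITE_PNG_COMPRESSION, 0])
    return buffer

t0 = time.time()
for i in range(100):
    out = cv_png()
# Stop the clock before printing so the prints are not included in the timing.
elapsed = time.time() - t0

print(f"Bytes in file: {len(out.tobytes())}")
print(f"Average time: {elapsed / 100}")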

Results for OpenCV:

| Options | Time (s) | File Size (bytes) |
| --- | --- | --- |
| cv2.IMWRITE_PNG_COMPRESSION, 0 | 0.035 | 2770047 |
| cv2.IMWRITE_PNG_COMPRESSION, 1 | 0.063 | 1272600 |
| cv2.IMWRITE_PNG_COMPRESSION, 4 | 0.093 | 1186740 |
| cv2.IMWRITE_PNG_COMPRESSION, 9 | 2.088 | 1107430 |
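
The three libraries land on nearly identical file sizes at levels 0 and 1, which is consistent with the 0-9 setting being the zlib (DEFLATE) level applied to the PNG pixel stream. As a rough illustration of what that knob does, here is a toy PNG writer using only the Python standard library. It is a sketch under assumptions, not how PIL, Torch Vision, or OpenCV are implemented: `encode_png_rgb` and `_chunk` are hypothetical names, the image is synthetic, and it always uses scanline filter 0, whereas real encoders may choose filters per row, so their sizes will differ.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(tag: bytes, data: bytes) -> bytes:
    """Frame one PNG chunk: length, type, data, then CRC over type + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def encode_png_rgb(pixels: bytes, width: int, height: int, level: int) -> bytes:
    """Encode 8-bit RGB pixels (row-major, 3 bytes per pixel) as a PNG."""
    stride = width * 3
    if len(pixels) != stride * height:
        raise ValueError("pixel buffer does not match width * height * 3")
    # Every scanline starts with a filter-type byte; 0 means "no filter".
    scanlines = b"".join(
        b"\x00" + pixels[y * stride:(y + 1) * stride] for y in range(height)
    )
    # IHDR: width, height, bit depth 8, color type 2 (RGB), then
    # compression method, filter method, and interlace method, all 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        # The compression level only changes how hard zlib works on this payload.
        + _chunk(b"IDAT", zlib.compress(scanlines, level))
        + _chunk(b"IEND", b"")
    )


# Synthetic blocky image so the sketch needs no input file.
width, height = 256, 144
pixels = bytes(
    channel
    for y in range(height)
    for x in range(width)
    for channel in ((x // 8 * 8) % 256, (y // 8 * 8) % 256, 128)
)

sizes = {
    level: len(encode_png_rgb(pixels, width, height, level))
    for level in (0, 1, 4, 9)
}
for level, size in sizes.items():
    print(f"level {level}: {size} bytes")
```

Level 0 stores the pixel stream uncompressed, which is why it is both the fastest and the largest option in the tables above.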

WebP Benchmarks

WebP is an image format developed by Google that supports both lossless and lossy compression. It was designed to compress more efficiently than JPEG and PNG. Like GIF, it also supports animation, but with better compression than GIF.

The downside of WebP is that it may not be supported in older browsers and devices. There are also a few more levers in the WebP encoding spec, so it may take a bit more time to find the right settings for you. You should explore the list of those options before reviewing the results.

I will stick to lossless encoding for WebP, for the same reason I skipped JPEG above.

Here's my code for benchmarking WebP with the Python Imaging Library (PIL):

import io
import time
from PIL import Image
import torch
import torchvision.transforms.functional as F

path = "/home/user/00000.png"
pil_image = Image.open(path)
pil_image = pil_image.convert("RGB")

# uint8 tensor of shape (C, H, W) with values in [0, 255].
image_tensor = F.to_tensor(pil_image) * 255.0
image_tensor = image_tensor.to(torch.uint8)

def pil_webp(out):
    pil_image: Image.Image = F.to_pil_image(image_tensor)
    # lossless=True keeps pixels exact; quality and method trade speed for size.
    pil_image.save(out, format="WebP", lossless=True, quality=0, method=0)

t0 = time.time()
for i in range(100):
    out = io.BytesIO()
    pil_webp(out)
# Stop the clock before printing so the prints are not included in the timing.
elapsed = time.time() - t0

print(f"Bytes in file: {len(out.getvalue())}")
print(f"Average time: {elapsed / 100}")

Results for PIL:

| Options | Time (s) | File Size (bytes) |
| --- | --- | --- |
| quality=0, method=0 | 0.047 | 1046120 |
| quality=0, method=3 | 0.218 | 814500 |
| quality=0, method=6 | 0.270 | 808084 |
| quality=50, method=0 | 0.080 | 1046734 |
| quality=50, method=3 | 0.324 | 811578 |
| quality=50, method=6 | 0.397 | 804762 |
| quality=100, method=0 | 0.304 | 1033038 |
| quality=100, method=3 | 0.745 | 809758 |
| quality=100, method=6 | 7.203 | 791030 |

As of this post, Torch Vision does not support writing images using WebP.

While OpenCV does support WebP, the only way to get lossless encoding is to set the quality level (cv2.IMWRITE_WEBP_QUALITY) higher than 100. For the curious, I got the following results at quality level 101:

  • Average time (s): 0.462
  • Bytes in file: 811340

Conclusion

Generally, OpenCV was the fastest encoder in these PNG benchmarks. Because OpenCV exposes so few WebP encoding flags, OpenCV with PNG at compression level 0 or 1 seems like a great way to go.

If the number of bytes is really important for your use case, WebP provides better lossless compression than PNG, saving hundreds of kilobytes or even a few megabytes depending on the configuration.
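
One way to turn the tables in this post into a decision is to fix a latency budget and take the smallest file that fits within it. The sketch below does that with the numbers reported above, copied as-is. The `Result` type and the `smallest_within` helper are hypothetical names, and the numbers describe only this one image on this one machine, so treat the output as a starting point for your own measurements.

```python
from typing import NamedTuple, Optional


class Result(NamedTuple):
    name: str
    seconds: float
    size_bytes: int


# Numbers copied from the result tables in this post.
RESULTS = [
    Result("PIL PNG optimize=True", 2.153, 1066223),
    Result("PIL PNG optimize=False", 0.368, 1098439),
    Result("PIL PNG compress_level=0", 0.057, 2766507),
    Result("PIL PNG compress_level=1", 0.085, 1273428),
    Result("PIL PNG compress_level=4", 0.15, 1114614),
    Result("PIL PNG compress_level=9", 2.13, 1114614),
    Result("Torch Vision PNG compression_level=0", 0.036, 2770032),
    Result("Torch Vision PNG compression_level=1", 0.071, 1272572),
    Result("Torch Vision PNG compression_level=4", 0.117, 1116922),
    Result("Torch Vision PNG compression_level=9", 1.535, 1069100),
    Result("OpenCV PNG compression 0", 0.035, 2770047),
    Result("OpenCV PNG compression 1", 0.063, 1272600),
    Result("OpenCV PNG compression 4", 0.093, 1186740),
    Result("OpenCV PNG compression 9", 2.088, 1107430),
    Result("PIL WebP quality=0 method=0", 0.047, 1046120),
    Result("PIL WebP quality=0 method=3", 0.218, 814500),
    Result("PIL WebP quality=0 method=6", 0.270, 808084),
    Result("PIL WebP quality=50 method=0", 0.080, 1046734),
    Result("PIL WebP quality=50 method=3", 0.324, 811578),
    Result("PIL WebP quality=50 method=6", 0.397, 804762),
    Result("PIL WebP quality=100 method=0", 0.304, 1033038),
    Result("PIL WebP quality=100 method=3", 0.745, 809758),
    Result("PIL WebP quality=100 method=6", 7.203, 791030),
    Result("OpenCV WebP quality=101", 0.462, 811340),
]


def smallest_within(budget_seconds: float, results=RESULTS) -> Optional[Result]:
    """Return the smallest-file result whose encode time fits the budget."""
    candidates = [r for r in results if r.seconds <= budget_seconds]
    return min(candidates, key=lambda r: r.size_bytes, default=None)


for budget in (0.04, 0.1, 0.5):
    print(f"budget {budget} s -> {smallest_within(budget)}")
```

Which option comes out on top depends entirely on the budget you pick, so it is worth rerunning the benchmarks on images and hardware that match your own service before committing to one.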
