While working on my personal projects, I have had to deal with Go's image package a number of times. By extension, I have also had to deal with Go's color package.
In the past, I struggled with alpha-premultiplied colors, which are the default in Go, and with their representation. In this post I hope to shed some light on the matter and to show some micro-optimization techniques for implementing your own Color type.
Quick Theory
Before diving into the details, we should start with the basics. Colors are often represented by four components - R, G, B and A - representing the amounts of Red, Green, Blue and Alpha (transparency/translucency) that a color is made up of.
In practice, the A is sometimes omitted when one is dealing with fully opaque colors. Also, this is not the only format out there (e.g. there is CMYK) but I will stick to RGBA in this blog post.
The most common machine representation of RGBA is probably R8G8B8A8 (RGBA8 for short). This means that there are 8 bits dedicated to each component, so each component can range from 0 to 255. The benefit is that this can represent a sufficient number of unique colors while using little memory. In fact, a single color fits nicely in a single uint32 variable.
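To make that last point concrete, here is a minimal sketch of packing an RGBA8 color into a uint32. The helper names packRGBA8 and unpackRGBA8, as well as the byte order (R in the highest byte), are my own assumptions for this example and are not something Go's packages define.

```go
package main

import "fmt"

// packRGBA8 stores four 8 bit components in one uint32.
// The layout (R in the highest byte, A in the lowest) is an
// arbitrary choice made for this sketch.
func packRGBA8(r, g, b, a uint8) uint32 {
	return uint32(r)<<24 | uint32(g)<<16 | uint32(b)<<8 | uint32(a)
}

// unpackRGBA8 reverses packRGBA8.
func unpackRGBA8(p uint32) (r, g, b, a uint8) {
	return uint8(p >> 24), uint8(p >> 16), uint8(p >> 8), uint8(p)
}

func main() {
	p := packRGBA8(255, 128, 0, 64)
	fmt.Printf("%#x\n", p)      // prints 0xff800040
	fmt.Println(unpackRGBA8(p)) // prints 255 128 0 64
}
```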
Other common options are R16G16B16A16 (RGBA16 for short), where each component gets 16 bits (values range from 0 to 65535), and R32FG32FB32FA32F (RGBA32F for short), where each component is a floating point number (values range from 0.0 to 1.0, though there are exceptions), allowing for easy transformations without losing much precision, at the cost of a higher memory footprint.
Alpha-premultiplied colors
In my personal experience, I have always needed my colors to be in non-alpha-premultiplied form so I cannot pretend to fully understand the benefits of alpha-premultiplied colors. I know that there are certain algorithms which work better/faster with alpha-premultiplied colors and I believe that Go's draw package uses such algorithms internally.
Regardless, by default Go returns colors in alpha-premultiplied form, so one needs to know how such colors are constructed in order to use Go's color or image packages correctly.
So what are alpha-premultiplied colors? These are colors where the r, g, and b components have been multiplied by a.
...and as much as this explanation is correct, it is also very wrong. It makes an assumption that might not be obvious to the reader unless they have deep knowledge of colors. The Go documentation makes the same assumption.
If we were to take an r component from an RGBA8 color that has a value of 128 (roughly half intensity) and were to multiply it with an a component that has a value of 128 (roughly half transparency), we would get a new r value of 0 (no intensity), whereas we would have expected 64 (quarter intensity). The problem is that the product, 128 x 128 = 16384, overflows the 8 bits we have available, and its lowest 8 bits happen to be all zeroes.
In fact, what is usually understood is that the multiplication occurs in floating point space. In our example above, it would mean that we convert the 128 values to roughly 0.5 (half intensity), then we multiply them with each other (i.e. 0.5 x 0.5), resulting in roughly 0.25. Lastly, we convert the float back to uint8, resulting in 64.
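The two approaches above can be sketched as follows. The function names premultiplyNaive and premultiplyFloat are made up for this illustration, and the float version is only one straightforward way to do the conversion.

```go
package main

import "fmt"

// premultiplyNaive multiplies the raw 8 bit values; the result wraps
// around because the product does not fit in a uint8.
func premultiplyNaive(c, a uint8) uint8 {
	return c * a
}

// premultiplyFloat converts both values to the [0.0, 1.0] range,
// multiplies them there and converts the result back to 8 bits.
func premultiplyFloat(c, a uint8) uint8 {
	cf := float32(c) / 255.0
	af := float32(a) / 255.0
	return uint8(cf * af * 255.0)
}

func main() {
	fmt.Println(premultiplyNaive(128, 128)) // prints 0
	fmt.Println(premultiplyFloat(128, 128)) // prints 64
}
```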
Go's built-in color types
We should have a look at some of the most popular Go color types and see how they match to the formats we discussed previously.
The most popular color type is RGBA. It represents an alpha-premultiplied RGBA8 color.
Another popular color type is NRGBA. It represents a non-alpha-premultiplied RGBA8 color.
Derived from these type names, we also have the 16 bit component types.
The first one is RGBA64, which represents an alpha-premultiplied RGBA16 color.
Note: Go takes a different approach to color naming when compared with OpenGL, for example. In Go's case, the 64 in the type name indicates that the whole struct uses 64 bits (4 x 16 bits) and not that each component is 64 bits in size.
And the second one is NRGBA64, which represents a non-alpha-premultiplied RGBA16 color.
The RGBA function
Understanding the built-in color types is good but what unifies all colors in Go is the Color interface. Each color is expected to implement that interface. As such, if you were to ever implement your own color representation, you would need to implement that interface.
The only method that the Color interface requires is the RGBA one.
RGBA() (r, g, b, a uint32)
And even if you were to never implement your own Color, you are more than likely to use the RGBA method. As such, it is important to understand how it relates to the colors.
This particular function, for me personally, has been the source of a lot of confusion in the past. Let's have a look at the official documentation.
RGBA returns the alpha-premultiplied red, green, blue and alpha values for the color. Each value ranges within [0, 0xffff], but is represented by a uint32 so that multiplying by a blend factor up to 0xffff will not overflow. An alpha-premultiplied color component c has been scaled by alpha (a), so has valid values 0 <= c <= a.
There are two important things here that are easy to get wrong.
The first one is that colors returned from the RGBA function are in the alpha-premultiplied R16G16B16A16 format. However, the components are returned inside 32 bit variables. The idea is that if one were to multiply two R16G16B16A16 colors with each other, the values would overflow the 16 bit representation, whereas 32 bits per component would handle such an operation.
Still, the important point to remember here is that each component is 16 bit, even if placed inside a 32 bit variable, and that the color components are alpha-premultiplied. It is sufficient to check the official source code for the RGBA64 type (recall that it represents an alpha-premultiplied R16G16B16A16 color).
type RGBA64 struct {
	R, G, B, A uint16
}

func (c RGBA64) RGBA() (r, g, b, a uint32) {
	return uint32(c.R), uint32(c.G), uint32(c.B), uint32(c.A)
}
As can be seen, the 16 bit components are just cast to uint32 but their values remain in the [0, 65535] range. Also, there is no transformation done, since the color is already alpha-premultiplied.
The second important thing to note when reading the documentation is the part stating that the color component c "has been scaled by alpha (a)". What is meant here by "scaled by alpha" is that the alpha is treated as a fraction in the [0.0, 1.0] range (as if converted to a float) and only then multiplied with each color component. Recall the explanation in the Alpha-premultiplied colors section.
Implementing the Color interface
So, now that we have covered the fundamental aspects of colors in Go, let's try and implement our own color type. We will try to implement the non-alpha-premultiplied RGBA8 type.
Initial implementation
Following the principles from what's been discussed so far, we end up with the following implementation.
type CustomColor struct {
	R uint8
	G uint8
	B uint8
	A uint8
}

func (c CustomColor) RGBA() (uint32, uint32, uint32, uint32) {
	// convert components to float32 in the range [0.0, 1.0]
	r32f := float32(c.R) / float32(255.0)
	g32f := float32(c.G) / float32(255.0)
	b32f := float32(c.B) / float32(255.0)
	a32f := float32(c.A) / float32(255.0)
	// perform alpha-premultiplication
	r32f = r32f * a32f
	g32f = g32f * a32f
	b32f = b32f * a32f
	// convert back to 16 bit
	r16 := uint16(r32f * 65535.0)
	g16 := uint16(g32f * 65535.0)
	b16 := uint16(b32f * 65535.0)
	a16 := uint16(a32f * 65535.0)
	// return as 32 bit (without upscaling)
	return uint32(r16), uint32(g16), uint32(b16), uint32(a16)
}
If we test this implementation we will see that it works correctly. The question now is how well it compares to the official implementation.
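One possible way to test it, sketched below, is to compare the output against color.NRGBA from the standard library for every combination of a color component and alpha. The CustomColor type repeats the float-based implementation in condensed form, and the helper maxDiff is my own name. Since float rounding is involved, I would not assume the two agree bit for bit, so the sketch reports the largest difference instead of asserting equality.

```go
package main

import (
	"fmt"
	"image/color"
)

// CustomColor repeats the float-based implementation in condensed form.
type CustomColor struct {
	R, G, B, A uint8
}

func (c CustomColor) RGBA() (uint32, uint32, uint32, uint32) {
	a := float32(c.A) / 255.0
	r := float32(c.R) / 255.0 * a
	g := float32(c.G) / 255.0 * a
	b := float32(c.B) / 255.0 * a
	return uint32(uint16(r * 65535.0)), uint32(uint16(g * 65535.0)),
		uint32(uint16(b * 65535.0)), uint32(uint16(a * 65535.0))
}

// maxDiff returns the largest absolute difference between the red
// component of CustomColor and color.NRGBA over all (R, A) pairs.
func maxDiff() uint32 {
	var worst uint32
	for r := 0; r < 256; r++ {
		for a := 0; a < 256; a++ {
			got, _, _, _ := CustomColor{R: uint8(r), A: uint8(a)}.RGBA()
			want, _, _, _ := color.NRGBA{R: uint8(r), A: uint8(a)}.RGBA()
			d := got - want
			if want > got {
				d = want - got
			}
			if d > worst {
				worst = d
			}
		}
	}
	return worst
}

func main() {
	fmt.Println("largest difference:", maxDiff())
}
```

A difference of at most one step on the 16 bit scale would be explained by float rounding followed by truncation.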
Official implementation
As mentioned already, the color.NRGBA type in Go implements a non-alpha-premultiplied RGBA8 color, exactly what we implemented above. Let us have a look at the official source code and compare how well we did.
func (c NRGBA) RGBA() (r, g, b, a uint32) {
	r = uint32(c.R)
	r |= r << 8
	r *= uint32(c.A)
	r /= 0xff
	g = uint32(c.G)
	g |= g << 8
	g *= uint32(c.A)
	g /= 0xff
	b = uint32(c.B)
	b |= b << 8
	b *= uint32(c.A)
	b /= 0xff
	a = uint32(c.A)
	a |= a << 8
	return
}
There is clearly a major difference between the two implementations. How is it that the official implementation avoids the usage of float32 values and what are all the bit-wise operations doing?
Iterative optimization
Let us try to optimize our implementation one step at a time until we get to something that resembles the official implementation. Going on that journey should expand our knowledge of micro-optimizations and give us pointers on how to optimize the Color implementations we might write in the future.
First, we should realize that we don't need to go through the uint16 type but can instead directly cast to uint32, as long as we ensure that the final values don't exceed 65535. And since our float representations are in the [0.0, 1.0] range and we multiply them by 65535.0, this should be ensured. We end up with the following code.
func (c CustomColor) RGBA() (uint32, uint32, uint32, uint32) {
	// convert components to float32 in the range [0.0, 1.0]
	r32f := float32(c.R) / float32(255.0)
	g32f := float32(c.G) / float32(255.0)
	b32f := float32(c.B) / float32(255.0)
	a32f := float32(c.A) / float32(255.0)
	// perform alpha-premultiplication
	r32f = r32f * a32f
	g32f = g32f * a32f
	b32f = b32f * a32f
	// convert back to 16 bit, stored inside 32 bit variables
	r16 := uint32(r32f * 65535.0)
	g16 := uint32(g32f * 65535.0)
	b16 := uint32(b32f * 65535.0)
	a16 := uint32(a32f * 65535.0)
	return r16, g16, b16, a16
}
Ok, this is cleaner, but is not much of an actual performance improvement.
Looking more at the code, one thing we may ask ourselves is whether we really need to convert the color components to the [0.0, 1.0] range. In fact, as long as a component is of type float32 and the alpha component is in the [0.0, 1.0] range, we could do the alpha-premultiplication within the [0.0, 255.0] range, and then just scale the result up to the [0.0, 65535.0] range. This will remove one division per component, meaning a total of three float divisions. We end up with the following code.
func (c CustomColor) RGBA() (uint32, uint32, uint32, uint32) {
	// convert alpha to the range [0.0, 1.0]
	a32f := float32(c.A) / float32(255.0)
	// perform alpha-premultiplication in the range [0.0, 255.0]
	r32f := float32(c.R) * a32f
	g32f := float32(c.G) * a32f
	b32f := float32(c.B) * a32f
	// convert to the [0.0, 65535.0] range
	r16 := uint32(r32f * 257.0)
	g16 := uint32(g32f * 257.0)
	b16 := uint32(b32f * 257.0)
	a16 := uint32(a32f * 65535.0)
	return r16, g16, b16, a16
}
Note: We multiply the r, g and b components by 257 since 255 x 257 = 65535, where 255 is the maximum value in the [0.0, 255.0] range. In essence, we ensure a correct mapping from the [0.0, 255.0] range to the [0.0, 65535.0] range.
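A tiny sketch of that mapping, with a helper named upscale that I introduce just for this example:

```go
package main

import "fmt"

// upscale maps an 8 bit value to the 16 bit range by multiplying by 257.
func upscale(v uint8) uint32 {
	return uint32(v) * 257
}

func main() {
	fmt.Println(upscale(0))   // prints 0
	fmt.Println(upscale(128)) // prints 32896
	fmt.Println(upscale(255)) // prints 65535
	// multiplying by 256 instead would leave the top of the range unreachable
	fmt.Println(255 * 256) // prints 65280
}
```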
Since we are multiplying each color component by 257 and it is an integer constant, we could perform that multiplication while the color component is still in its integer representation, converting three float multiplications to three integer ones.
func (c CustomColor) RGBA() (uint32, uint32, uint32, uint32) {
	// convert alpha to the range [0.0, 1.0]
	a32f := float32(c.A) / float32(255.0)
	// upscale to [0, 65535] as integers, then perform alpha-premultiplication
	r32f := float32(uint32(c.R)*257) * a32f
	g32f := float32(uint32(c.G)*257) * a32f
	b32f := float32(uint32(c.B)*257) * a32f
	// truncate back to integers (alpha still needs scaling to [0.0, 65535.0])
	r16 := uint32(r32f)
	g16 := uint32(g32f)
	b16 := uint32(b32f)
	a16 := uint32(a32f * 65535.0)
	return r16, g16, b16, a16
}
Note: We cast the color components to uint32 before performing the multiplications, as otherwise the multiplication would overflow the uint8 size that each color component has.
Let's rework the code a bit further by moving the color-component-to-alpha multiplication to the end of the code.
func (c CustomColor) RGBA() (uint32, uint32, uint32, uint32) {
	// convert alpha to the range [0.0, 1.0]
	a32f := float32(c.A) / float32(255.0)
	// perform integer upscaling of the color components
	upscaledR := uint32(c.R) * 257
	upscaledG := uint32(c.G) * 257
	upscaledB := uint32(c.B) * 257
	// perform alpha-premultiplication and truncate back to integers
	r16 := uint32(float32(upscaledR) * a32f)
	g16 := uint32(float32(upscaledG) * a32f)
	b16 := uint32(float32(upscaledB) * a32f)
	a16 := uint32(a32f * 65535.0)
	return r16, g16, b16, a16
}
Something else we should realize is that we can split the color component multiplication from uint32(c) * 257 to uint32(c) * 256 + uint32(c). Why would we want to do this? Instead of a single multiplication, don't we now have both a multiplication and an addition?
We have to know a bit of base-2 theory to spot the potential performance improvement here. Multiplication by 256 can also be expressed as a bit-shift to the left by 8 positions. A single bit-shift to the left is equal to a multiplication by 2. Doing 8 bit-shifts to the left is equal to doing eight multiplications by 2, or 2^8, which is 256. The critical thing here is that in general bit-shifts are cheaper than arbitrary multiplications or divisions.
Side note: Doing a bit-shift to the right is equal to the reverse - a division by 2.
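The relationship between shifts and multiplications is easy to check with a few lines of code. The function shiftEqualsMultiply below is a hypothetical helper written only for this check.

```go
package main

import "fmt"

// shiftEqualsMultiply reports whether v<<8 equals v*256 and
// (v<<8)+v equals v*257 for every 8 bit value v.
func shiftEqualsMultiply() bool {
	for i := 0; i < 256; i++ {
		v := uint32(i)
		if v<<8 != v*256 || (v<<8)+v != v*257 {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(uint32(3) << 8)        // prints 768, i.e. 3 * 256
	fmt.Println(uint32(768) >> 8)      // prints 3, i.e. 768 / 256
	fmt.Println(shiftEqualsMultiply()) // prints true
}
```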
This allows us to rewrite the code as follows.
func (c CustomColor) RGBA() (uint32, uint32, uint32, uint32) {
	// convert alpha to the range [0.0, 1.0]
	a32f := float32(c.A) / float32(255.0)
	// store the 8 bit values inside 32 bit variables so that we can
	// perform multiplications without overflowing
	upscaledR := uint32(c.R)
	upscaledG := uint32(c.G)
	upscaledB := uint32(c.B)
	// perform integer upscaling of the color components
	upscaledR = upscaledR + (upscaledR << 8)
	upscaledG = upscaledG + (upscaledG << 8)
	upscaledB = upscaledB + (upscaledB << 8)
	// perform alpha-premultiplication and truncate back to integers
	r16 := uint32(float32(upscaledR) * a32f)
	g16 := uint32(float32(upscaledG) * a32f)
	b16 := uint32(float32(upscaledB) * a32f)
	a16 := uint32(a32f * 65535.0)
	return r16, g16, b16, a16
}
By now you should start to see a resemblance to the official source code.
Another thing to realize is that adding an 8 bit number to any number whose lowest 8 bits are zeroes is the same as just replacing those lowest 8 bits.
For example, if we were to have a number A that has ????????00000000 as its 16 bit representation (where each ? can be either a 0 or 1) and were to add a number B that has 10110110 as its 8 bit representation, then the sum of the two would be ????????10110110. This is basically a bit-wise | (or) operation.
In our code above, we multiply each component by 256 by doing 8 bit-shifts to the left. This means that the 8 lowest bits are zero, so we can use bit-wise | to perform the addition. In general, bit-wise | is at least as cheap as an arbitrary addition, since it never has to propagate a carry.
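Here is a short sketch that checks this equivalence for every 8 bit value and also shows a case where it does not hold. The helper orEqualsAdd is a name I picked for this example.

```go
package main

import "fmt"

// orEqualsAdd reports whether (v<<8)|v equals (v<<8)+v for every
// 8 bit value v. It holds because the low 8 bits of v<<8 are all
// zero, so the addition never produces a carry.
func orEqualsAdd() bool {
	for i := 0; i < 256; i++ {
		v := uint32(i)
		if (v<<8)|v != (v<<8)+v {
			return false
		}
	}
	return true
}

func main() {
	v := uint32(0xb6)             // 10110110
	fmt.Printf("%016b\n", v<<8|v) // prints 1011011010110110
	fmt.Println(orEqualsAdd())    // prints true
	// when the operands share set bits, | and + differ
	fmt.Println(3|1, 3+1) // prints 3 4
}
```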
We get the following code.
func (c CustomColor) RGBA() (uint32, uint32, uint32, uint32) {
	// convert alpha to the range [0.0, 1.0]
	a32f := float32(c.A) / float32(255.0)
	// store the 8 bit values inside 32 bit variables so that we can
	// perform multiplications without overflowing
	r := uint32(c.R)
	g := uint32(c.G)
	b := uint32(c.B)
	// perform integer upscaling of the color components
	r = r | (r << 8)
	g = g | (g << 8)
	b = b | (b << 8)
	// perform alpha-premultiplication and truncate back to integers
	r16 := uint32(float32(r) * a32f)
	g16 := uint32(float32(g) * a32f)
	b16 := uint32(float32(b) * a32f)
	a16 := uint32(a32f * 65535.0)
	return r16, g16, b16, a16
}
Let's look at the alpha component for a bit. We do the following steps to evaluate it.
// convert alpha to the range [0.0, 1.0]
a32f := float32(c.A) / float32(255.0)
// convert to the [0.0, 65535.0] range
a16 := uint32(a32f * 65535.0)
We convert the alpha down to the [0.0, 1.0] range, only so that we can scale it back up to [0.0, 65535.0]. In essence, we divide it by 255.0 and afterwards multiply it by 65535.0. Well, (v / 255.0) * 65535.0 equals (v * 65535.0) / 255.0, which equals v * 257.0. We see the familiar multiplication by 257 again. Let's apply the same logic as we did with the color components.
func (c CustomColor) RGBA() (uint32, uint32, uint32, uint32) {
	// convert alpha to the range [0.0, 1.0]
	a32f := float32(c.A) / float32(255.0)
	// store the 8 bit values inside 32 bit variables so that we can
	// perform multiplications without overflowing
	r := uint32(c.R)
	g := uint32(c.G)
	b := uint32(c.B)
	a := uint32(c.A)
	// perform integer upscaling of the components
	r = r | (r << 8)
	g = g | (g << 8)
	b = b | (b << 8)
	a = a | (a << 8)
	// perform alpha-premultiplication and truncate back to integers
	r16 := uint32(float32(r) * a32f)
	g16 := uint32(float32(g) * a32f)
	b16 := uint32(float32(b) * a32f)
	return r16, g16, b16, a
}
Note: We still need the a32f variable that holds the alpha in the [0.0, 1.0] range, since we are still using it for the alpha-premultiplication of the color components. Is there something we could do about this as well?
In the code above, we are using the following equation for color alpha-premultiplication.
c_alpha_premultiplied = uint32(float32(c) * (float32(a) / 255.0))
If we move the brackets a bit, we can get to the following, which yields the same output.
c_alpha_premultiplied = uint32((float32(c) * float32(a)) / 255.0)
In the above equation, we can multiply c and a before casting them to float32, since they are integer values.
c_alpha_premultiplied = uint32(float32(c * a) / 255.0)
Now that we have rearranged the equation in this simplified form, we can notice that we don't need to use a floating point division. Since we are casting the end result to an integer (uint32 in this case), we are losing any digits after the decimal point, hence we might as well use integer division, which has that same behavior.
c_alpha_premultiplied = uint32(c * a) / 255
Note: This is only possible and valid if c * a fits inside uint32. In our case c is an upscaled value that occupies 16 bits and a (as used here; prior to upscaling) occupies 8 bits. This means that in total a maximum of 24 bits would be needed and since we have 32 to work with, we should be fine.
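Both claims, the size bound and the equivalence of the float and integer forms, can be checked by brute force. In this sketch the names premulFloat, premulInt and formsAgree are hypothetical helpers of mine; they mirror the last two equations above.

```go
package main

import "fmt"

// premulFloat follows the float32 form of the equation.
func premulFloat(c, a uint32) uint32 {
	return uint32(float32(c*a) / 255.0)
}

// premulInt follows the integer form of the equation.
func premulInt(c, a uint32) uint32 {
	return (c * a) / 255
}

// formsAgree checks both forms for every upscaled 8 bit component c
// and every 8 bit alpha a.
func formsAgree() bool {
	for i := uint32(0); i < 256; i++ {
		c := i | i<<8
		for a := uint32(0); a < 256; a++ {
			if premulFloat(c, a) != premulInt(c, a) {
				return false
			}
		}
	}
	return true
}

func main() {
	// the largest possible product needs 24 bits, well within uint32
	largest := uint32(65535) * 255
	fmt.Println(largest, largest < 1<<24) // prints 16711425 true
	fmt.Println(formsAgree())             // prints true
}
```

The products stay below 2^24, which is also the range in which float32 represents integers exactly, so the cast to float32 in premulFloat loses nothing.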
Let's apply the above reasoning to our code.
func (c CustomColor) RGBA() (uint32, uint32, uint32, uint32) {
	// store the 8 bit values inside 32 bit variables so that we can
	// perform multiplications without overflowing
	r := uint32(c.R)
	g := uint32(c.G)
	b := uint32(c.B)
	a := uint32(c.A)
	// perform color alpha-premultiplication of the color components
	r = r | (r << 8)
	r = (r * a) / 255
	g = g | (g << 8)
	g = (g * a) / 255
	b = b | (b << 8)
	b = (b * a) / 255
	// do the alpha integer upscaling last
	a = a | (a << 8)
	return r, g, b, a
}
Lastly, we need to recall the following Go short-hand expressions.
c = c | (c << 8)
// is the same as
c |= (c << 8)
c = (c * a) / 255
// is the same as
c *= a
c /= 255
// which is the same as
c *= a
c /= 0xff
Using these and a bit of rearranging, we get to the following code.
func (c CustomColor) RGBA() (uint32, uint32, uint32, uint32) {
	r := uint32(c.R)
	g := uint32(c.G)
	b := uint32(c.B)
	a := uint32(c.A)
	r |= (r << 8)
	r *= a
	r /= 0xff
	g |= (g << 8)
	g *= a
	g /= 0xff
	b |= (b << 8)
	b *= a
	b /= 0xff
	a |= (a << 8)
	return r, g, b, a
}
Except for a few syntactic differences, we have done it - we have simplified and optimized our Color implementation of a non-alpha-premultiplied RGBA8 color to match the official one.
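As a final sanity check, the optimized implementation can be compared against color.NRGBA for every combination of a component value and alpha. This is a sketch of how I would do it; the helper matchesNRGBA is my own name, and for brevity it uses the same value for R, G and B.

```go
package main

import (
	"fmt"
	"image/color"
)

type CustomColor struct {
	R, G, B, A uint8
}

// CustomColor satisfies the color.Color interface.
var _ color.Color = CustomColor{}

func (c CustomColor) RGBA() (uint32, uint32, uint32, uint32) {
	r := uint32(c.R)
	g := uint32(c.G)
	b := uint32(c.B)
	a := uint32(c.A)
	r |= r << 8
	r *= a
	r /= 0xff
	g |= g << 8
	g *= a
	g /= 0xff
	b |= b << 8
	b *= a
	b /= 0xff
	a |= a << 8
	return r, g, b, a
}

// matchesNRGBA compares CustomColor with color.NRGBA for every
// combination of a component value and alpha.
func matchesNRGBA() bool {
	for v := 0; v < 256; v++ {
		for al := 0; al < 256; al++ {
			c := CustomColor{R: uint8(v), G: uint8(v), B: uint8(v), A: uint8(al)}
			n := color.NRGBA{R: c.R, G: c.G, B: c.B, A: c.A}
			r1, g1, b1, a1 := c.RGBA()
			r2, g2, b2, a2 := n.RGBA()
			if r1 != r2 || g1 != g2 || b1 != b2 || a1 != a2 {
				return false
			}
		}
	}
	return true
}

func main() {
	fmt.Println(matchesNRGBA()) // prints true
}
```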
Summary
I realize that not everyone has to deal with Go's image or color packages. Even fewer people would ever need to implement their own Color type.
Still, I am hoping that this managed to shed some light on alpha-premultiplied colors and the most popular Go color formats to those readers that will need to use them.
Also, while micro-optimizations are usually the last line of defense when dealing with performance problems, in situations where a function is expected to be called a million times within a short duration of time (as is the case with RGBA), micro-optimizations and knowing how to apply them can be critical.