Cover image by Paul Gorbould on Flickr.
Tagged unions, discriminated unions, disjoint unions, variants, variant records, or sum types: different names, similar concept. But what is it all about, and how do tagged unions differ from regular ones?
Untagged Unions
If you are coming from a statically typed language like C, you probably already know about unions: a basic way to store data of different types in the same memory space. They are sometimes also called untagged unions.
An example in C could look like this:
#include <stdio.h>
#include <string.h>

union MyUnion {
    int number;
    char text[20];
};

int main(void) {
    union MyUnion x;
    x.number = 2;
    printf("x.number: %d\n", x.number);
    /* Writing text overwrites the memory that held number. */
    strcpy(x.text, "Hello, world!");
    printf("x.text: %s\n", x.text);
    return 0;
}
The size of x in memory will be at least the size of the largest member of MyUnion. It looks a bit like a struct, but if you write a value to one field, it overwrites the memory of the other fields. The basic idea is to save space; it also makes languages like C a tiny bit more dynamic, because one variable can now store different types.
As you can probably imagine, this can also be used to store different types of structs in one memory space.
The problem with unions is that the type-checker doesn't care what you are doing.
If you declare an int x, the type-checker will complain if you try to put a string inside of it.
If you declare a union MyUnion x, the type-checker won't keep track of which field you stored last, since that depends on runtime behavior, so you have to check inside your program logic whether it's okay to access x.number or x.text.
How is this related to JavaScript?
Well, in JavaScript, you can't type your variables, which allows you to store anything in them.
let x = 2;
console.log("Number:", x);
x = "Hello, world!";
console.log("Text", x);
This can be rather convenient, because if your data structure changes, you can still put it inside the same variables without caring about the types.
The problems arise when your data structures get a bit more complex.
let x = {
  httpMethod: "GET",
  path: "/users",
  queryParams: { id: 10 }
};
console.log("ID:", x.queryParams.id);
x = {
  httpMethod: "POST",
  path: "/users",
  body: { name: "Jane" }
};
console.log("Name:", x.body.name);
As you can see, a GET request comes with a queryParams field and a POST request comes with a body field. The path is the same, but some parts differ.
You can use the httpMethod field to check what it is, but you have to do it yourself. If you get this wrong, you could end up accessing x.body.id in a GET request and everything blows up, because x.body is undefined.
If you have used JavaScript for a while, you have probably noticed that basically all data is an untagged union. Most of the time you just store one type of data in a variable, but more often than not you end up pushing around objects that are kind of the same but differ in some fields, like the request example above.
Tagged Unions
So what's the idea behind tagged unions?
They let you define the differences between the cases of your unions with the help of a static type system.
What does this mean?
As I explained with the request example, you often have a bunch of different data types that arrive in one variable, for example as a function argument. They are either basically the same but vary in a few fields, or they are entirely different. If you want to be sure you don't access data that isn't there, and to prevent the infamous "is undefined" errors, you have to check inside the program code at runtime.
Such a check could look like this:
function handle(request) {
  if (request.httpMethod === "GET") console.log(request.queryParams.id);
}
You could also check the queryParams object directly, but nobody forces you to do so. This is completely in your hands and could fail one day in production.
Languages with tagged unions in their type-system allow you to make this check at compile time. Reason is such a language.
An example of a request type could look like this:
type body = {name: string};
type queryParams = {id: string};
type httpMethod = GET(queryParams) | POST(body);
type request = {
path: string,
httpMethod: httpMethod
};
Now the data is encapsulated inside a tagged union (called a variant in Reason), which is the httpMethod type defined above.
If the content of httpMethod is GET, you don't even get access to a body, which could have (and often has) an entirely different structure from queryParams.
An example of its usage could look like this:
let handleRequest = (req: request) =>
  switch (req.httpMethod) {
  | GET(query) => Js.log("GET " ++ req.path ++ " ID:" ++ query.id)
  | POST(body) => Js.log("POST " ++ req.path ++ " Name:" ++ body.name)
  };
What does this do? It types the req argument as request. Since req.httpMethod is a variant (= tagged union), we can use switch to do things for the different types in that variant.
Many languages that have tagged unions even force you to do things for every possibility. This seems strange at first, but it can help later. If someone changes that tagged union, which can be defined somewhere else in the code, the type-checker will tell you that you need to do something for the new type in that union. This could be forgotten if done manually.
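The same kind of exhaustiveness check can be sketched in TypeScript. This is only an illustration, not the Reason mechanism itself: the type HttpMethod, the tag field kind, and the helpers assertNever and describeMethod are names made up for this example.

```typescript
// Hypothetical shapes, modeled on the request example in the article.
type HttpMethod =
  | { kind: "GET"; queryParams: { id: string } }
  | { kind: "POST"; body: { name: string } };

// Helper that only type-checks when `value` has been narrowed to `never`,
// i.e. when every variant was handled before reaching it.
function assertNever(value: never): never {
  throw new Error("Unhandled variant: " + JSON.stringify(value));
}

function describeMethod(method: HttpMethod): string {
  switch (method.kind) {
    case "GET":
      return "GET ID:" + method.queryParams.id;
    case "POST":
      return "POST Name:" + method.body.name;
    default:
      // If a new variant (say "DELETE") is added to HttpMethod and not
      // handled above, `method` is no longer `never` and this line
      // stops compiling.
      return assertNever(method);
  }
}
```

The point of the design is that forgetting a case shows up as a type error at the switch, rather than as an undefined access at runtime.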
Conclusion
Tagged unions are a nice way to store different data-types inside of one variable without losing track of their structure. This allows code to be written more like in a dynamically typed language while giving it more safety in the long run.
Reason is such a language. It tries to make concepts like tagged unions, which it calls variants, accessible to JavaScript developers by delivering them with a familiar syntax.
TypeScript has tagged unions too, if you aren't into that whole FP thingy.
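As a rough sketch of how the request example might look in TypeScript, assuming a hypothetical ApiRequest type whose httpMethod string literal serves as the tag (the function name handleRequest and the field layout are choices made for this illustration):

```typescript
// Hypothetical request type; the `httpMethod` string literal acts as the tag.
type ApiRequest =
  | { httpMethod: "GET"; path: string; queryParams: { id: string } }
  | { httpMethod: "POST"; path: string; body: { name: string } };

function handleRequest(req: ApiRequest): string {
  if (req.httpMethod === "GET") {
    // Narrowed to the GET variant: accessing `req.body` here
    // would be rejected by the compiler.
    return "GET " + req.path + " ID:" + req.queryParams.id;
  }
  // Only the POST variant remains, so `req.body` is known to exist.
  return "POST " + req.path + " Name:" + req.body.name;
}

console.log(handleRequest({ httpMethod: "GET", path: "/users", queryParams: { id: "10" } }));
console.log(handleRequest({ httpMethod: "POST", path: "/users", body: { name: "Jane" } }));
```

Unlike the plain JavaScript version, checking the tag is what unlocks the fields, so the check cannot be skipped by accident.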
Top comments (7)
Nice article :)
Tagged unions are the number one thing I miss in most of the 'big' programming languages. Being able to represent the state of something in a way the compiler can verify for you is so nice!
Same here.
I was never a friend of statically typed languages. This feature is the first time I see real value!
Yeah, I think the things that finally sold me on it were Scott Wlaschin's 'Designing With Types' blog posts and Richard Feldman's 'Making Impossible States Impossible' talk. Both well worth a read/watch if you've not already seen them :)
Yes, same here. One of the best resources about that topic, I think!
Nice article. The Rust language also has great support for tagged unions with its enum construct. It also enforces (at compile time) that all possibilities are catered for.
Enforcing safety as a formality in a language is a fantastic feature. It prevents the laziness that we often resort to when dealing with less strict languages.
TypeScript has untagged sum types, but it's kind of a pain to handle the cases because you have to write the runtime checks manually:
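A minimal sketch of such manual checks (not the commenter's original snippet; the types QueryRequest, BodyRequest, and AnyRequest and the functions isQueryRequest and summarize are assumptions made for illustration):

```typescript
// Hypothetical shapes without a shared tag field.
type QueryRequest = { path: string; queryParams: { id: string } };
type BodyRequest = { path: string; body: { name: string } };
type AnyRequest = QueryRequest | BodyRequest;

// Hand-written type guard: the check is ordinary runtime code
// that has to be written and maintained for each case.
function isQueryRequest(req: AnyRequest): req is QueryRequest {
  return "queryParams" in req;
}

function summarize(req: AnyRequest): string {
  if (isQueryRequest(req)) {
    return req.path + " ID:" + req.queryParams.id;
  }
  // The guard returned false, so only BodyRequest is left.
  return req.path + " Name:" + req.body.name;
}
```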
Oh, and the checks would be different if the cases were primitive types or function-constructed values.
IMO tagged unions are much quicker to handle precisely because the tags (i.e. data constructors) are first-class language entities.
You can "simplify" the type checking part by making the method part of your type definition and setting its type to the string literal:
Inside of an if/switch-case that checks method, TypeScript only allows access to the proper fields. In case you cover all cases in a switch, inside default, the type of the variable will be never.
TypeScript calls this "Discriminated Unions":
typescriptlang.org/docs/handbook/a...
(There is no way to link directly to the section, you need to search for the term on the page.)