Hi! Long time coder, first time poster.
I'm making a programming language called Toy, which is intended to allow easy modding of video game logic by players. To this end, each value within Toy, be it a number, a string, a variable name, etc., is stored in a structure called "Toy_Literal".
Let's see if you can spot the problem here:
typedef struct Toy_Literal {
    Toy_LiteralType type;
    union {
        bool boolean;
        int integer;
        float number;
        Toy_RefString* stringPtr;
        Toy_LiteralArray* array;
        Toy_LiteralDictionary* dictionary;
        struct {
            void* bytecode;
            Toy_NativeFn native;
            Toy_HookFn hook;
            void* scope;
            int length;
        } function;
        struct {
            Toy_RefString* ptr;
            int hash;
        } identifier;
        struct {
            Toy_LiteralType typeOf;
            bool constant;
            struct Toy_Literal* subtypes;
            int capacity;
            int count;
        } type;
        struct {
            void* ptr;
            int tag;
        } opaque;
    } as;
} Toy_Literal;
This is a slightly adjusted version of the literal structure I used for the longest time. As the language progressed, I added new members whenever a feature needed them.
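For reference, the compiler can report where the bytes go in that struct. This is a sketch assuming a typical 64-bit target with 4-byte int and enum (for example x86-64 with GCC or Clang); the Toy types it references are replaced with stand-ins, and the two function-pointer signatures are made up, since only their sizes matter.

```c
#include <assert.h>   // static_assert (C11)
#include <stdbool.h>
#include <stddef.h>   // offsetof
#include <stdint.h>   // UINTPTR_MAX, UINT64_MAX

// Stand-ins, NOT Toy's real declarations: only their sizes matter here.
typedef enum { TOY_LITERAL_NULL, TOY_LITERAL_BOOLEAN, TOY_LITERAL_INTEGER } Toy_LiteralType;
typedef struct Toy_RefString Toy_RefString;
typedef struct Toy_LiteralArray Toy_LiteralArray;
typedef struct Toy_LiteralDictionary Toy_LiteralDictionary;
typedef int (*Toy_NativeFn)(void* interpreter, void* arguments);             // hypothetical signature
typedef int (*Toy_HookFn)(void* interpreter, void* identifier, void* alias); // hypothetical signature

// The original literal, as in the post.
typedef struct Toy_Literal {
    Toy_LiteralType type;
    union {
        bool boolean;
        int integer;
        float number;
        Toy_RefString* stringPtr;
        Toy_LiteralArray* array;
        Toy_LiteralDictionary* dictionary;
        struct {
            void* bytecode;
            Toy_NativeFn native;
            Toy_HookFn hook;
            void* scope;
            int length;
        } function;
        struct {
            Toy_RefString* ptr;
            int hash;
        } identifier;
        struct {
            Toy_LiteralType typeOf;
            bool constant;
            struct Toy_Literal* subtypes;
            int capacity;
            int count;
        } type;
        struct {
            void* ptr;
            int tag;
        } opaque;
    } as;
} Toy_Literal;

#if UINTPTR_MAX == UINT64_MAX  // the numbers below assume 8-byte pointers
static_assert(offsetof(Toy_Literal, as) == 8, "4-byte tag + 4 bytes of padding before the union");
static_assert(sizeof(((Toy_Literal*)0)->as.function) == 40, "8+8+8+8+4 = 36, padded to 40");
static_assert(sizeof(((Toy_Literal*)0)->as.type) == 24, "4+1, 3 bytes padding, then 8+4+4");
static_assert(sizeof(Toy_Literal) == 48, "8-byte tag slot + the 40-byte union");
#endif
```

A union is as large as its largest member, so even a literal holding a single bool carries all 40 bytes of the function struct, plus the 4 bytes of padding after type that keep the union 8-byte aligned.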
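The big problem is that this is 48 bytes in size on a 64-bit platform, with a lot of wasted space: a union is as large as its largest member, so every literal, even a lone boolean, pays for the 40-byte function struct, plus 4 bytes of padding after type. It's kind of obvious if you know ahead of time, but it was a real lightbulb moment when I realized I could shrink this down by 50%, thus speeding up the copious copying of literals throughout my lang's internals: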
typedef struct Toy_Literal {
    union {
        bool boolean; //1
        int integer; //4
        float number; //4
        struct {
            Toy_RefString* ptr; //8
            //string hash?
        } string; //8
        struct Toy_LiteralArray* array; //8
        struct Toy_LiteralDictionary* dictionary; //8
        struct {
            union {
                void* bytecode; //8
                Toy_NativeFn native; //8
                Toy_HookFn hook; //8
            } inner; //8
            struct Toy_Scope* scope; //8
        } function; //16
        struct { //for variable names
            Toy_RefString* ptr; //8
            int hash; //4
        } identifier; //16
        struct {
            struct Toy_Literal* subtypes; //8
            Toy_LiteralType typeOf; //4
            unsigned char capacity; //1
            unsigned char count; //1
            bool constant; //1
        } type; //16
        struct {
            void* ptr; //8
            int tag; //4
        } opaque; //16
    } as; //16
    Toy_LiteralType type; //4
    int bytecodeLength; //4 - shenanigans
} Toy_Literal;
By ordering the members from largest to smallest in each struct and union (and shrinking the type's capacity and count down to a single byte each), I let alignment pack the whole literal into just 24 bytes, with no padding left at the top level.
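The byte counts in those comments can be checked by the compiler instead of by hand. This is a sketch assuming a 64-bit target with 4-byte int and enum (for example x86-64 with GCC or Clang); the referenced Toy types are stand-ins and the function-pointer signatures are made up, since only their sizes matter.

```c
#include <assert.h>   // static_assert (C11)
#include <stdbool.h>
#include <stddef.h>   // offsetof
#include <stdint.h>   // UINTPTR_MAX, UINT64_MAX

// Stand-ins, NOT Toy's real declarations: only their sizes matter here.
typedef enum { TOY_LITERAL_NULL, TOY_LITERAL_BOOLEAN, TOY_LITERAL_INTEGER } Toy_LiteralType;
typedef struct Toy_RefString Toy_RefString;
typedef int (*Toy_NativeFn)(void* interpreter, void* arguments);             // hypothetical signature
typedef int (*Toy_HookFn)(void* interpreter, void* identifier, void* alias); // hypothetical signature

// The repacked literal, as in the post.
typedef struct Toy_Literal {
    union {
        bool boolean;                              // 1
        int integer;                               // 4
        float number;                              // 4
        struct { Toy_RefString* ptr; } string;     // 8
        struct Toy_LiteralArray* array;            // 8
        struct Toy_LiteralDictionary* dictionary;  // 8
        struct {
            union {
                void* bytecode;
                Toy_NativeFn native;
                Toy_HookFn hook;
            } inner;                               // 8
            struct Toy_Scope* scope;               // 8
        } function;                                // 16
        struct {
            Toy_RefString* ptr;                    // 8
            int hash;                              // 4
        } identifier;                              // 16 (4 bytes tail padding)
        struct {
            struct Toy_Literal* subtypes;          // 8
            Toy_LiteralType typeOf;                // 4
            unsigned char capacity;                // 1
            unsigned char count;                   // 1
            bool constant;                         // 1
        } type;                                    // 16 (1 byte tail padding)
        struct {
            void* ptr;                             // 8
            int tag;                               // 4
        } opaque;                                  // 16
    } as;                                          // 16
    Toy_LiteralType type;                          // 4
    int bytecodeLength;                            // 4
} Toy_Literal;

#if UINTPTR_MAX == UINT64_MAX  // the numbers below assume 8-byte pointers
static_assert(sizeof(((Toy_Literal*)0)->as.type) == 16, "8+4+1+1+1 = 15, padded to 16");
static_assert(sizeof(((Toy_Literal*)0)->as) == 16, "every union member fits in 16 bytes");
static_assert(offsetof(Toy_Literal, type) == 16, "tag sits directly after the union");
static_assert(offsetof(Toy_Literal, bytecodeLength) == 20, "length fills the last 4-byte slot");
static_assert(sizeof(Toy_Literal) == 24, "no tail padding");
#endif
```

A side benefit of asserting the layout: if a later change grows any union member past 16 bytes, the build fails instead of the literal silently growing back.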
I should also note a couple of quirks with the function type. Since a function can only be one kind (a Toy function represented by bytecode, a native C function, or a "hook" function used for libraries), I stuck all three pointers in a union.
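Also, the out-of-place member bytecodeLength represents the number of bytes used by Toy functions. This member is the only "wasteful" part, as it's used exclusively by bytecode functions. It sits outside the union on purpose: there it fills the 4 bytes that would otherwise be padding after type, whereas putting it inside function would grow that struct to 24 bytes and the whole literal to 32.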
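To make that trade-off concrete, here are two cut-down layouts containing only the function member. LengthInside and LengthOutside are hypothetical names, Toy_AnyFn stands in for the three function-pointer types, and the sizes assume a 64-bit target with 4-byte int and enum.

```c
#include <assert.h>  // static_assert (C11)
#include <stdint.h>  // UINTPTR_MAX, UINT64_MAX

typedef enum { TOY_LITERAL_NULL, TOY_LITERAL_FUNCTION } Toy_LiteralType; // stand-in
typedef void (*Toy_AnyFn)(void); // stand-in for Toy_NativeFn / Toy_HookFn

// Option A: the length lives inside the function member of the union.
typedef struct {
    union {
        struct {
            union { void* bytecode; Toy_AnyFn native; Toy_AnyFn hook; } inner; // 8
            void* scope;   // 8
            int length;    // 4, plus 4 bytes of tail padding
        } function;        // 24
    } as;                  // 24
    Toy_LiteralType type;  // 4, plus 4 bytes of tail padding
} LengthInside;            // 32

// Option B, as in the post: the length shares the last 8 bytes with the tag.
typedef struct {
    union {
        struct {
            union { void* bytecode; Toy_AnyFn native; Toy_AnyFn hook; } inner; // 8
            void* scope;   // 8
        } function;        // 16
    } as;                  // 16
    Toy_LiteralType type;  // 4
    int bytecodeLength;    // 4, reusing what would otherwise be padding
} LengthOutside;           // 24

#if UINTPTR_MAX == UINT64_MAX  // sizes assume 8-byte pointers
static_assert(sizeof(LengthInside) == 32, "length inside the union costs 8 bytes");
static_assert(sizeof(LengthOutside) == 24, "length outside the union is free");
#endif
```

Without bytecodeLength, Option B would still be 24 bytes (16 + 4 + 4 padding), which is why the field costs nothing where it is.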
This seems like it would've taken a lot of work to thread through my lang's internals, but it was surprisingly easy, since I had made it a habit of only interacting with literals via a big set of macros:
#define TOY_IS_NULL(value) ((value).type == TOY_LITERAL_NULL)
#define TOY_IS_BOOLEAN(value) ((value).type == TOY_LITERAL_BOOLEAN)
#define TOY_IS_INTEGER(value) ((value).type == TOY_LITERAL_INTEGER)
//etc.
So I only had to rework the macros and add a single new macro to access a bytecode function's length.
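The reason this works is that callers only ever ask "is this an integer?" or "give me the integer", so only the macro bodies know where each field lives. Below is a sketch of what such a set might look like against the new layout. Only the TOY_IS_NULL / TOY_IS_BOOLEAN / TOY_IS_INTEGER checks come from the post; TOY_IS_FUNCTION, the TOY_AS_* accessors, the TOY_TO_* constructors, and the TOY_LITERAL_FUNCTION tag are hypothetical names, and the struct is trimmed to the members they touch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>  // NULL

// Trimmed stand-in for the repacked literal: only the members used below.
typedef enum {
    TOY_LITERAL_NULL,
    TOY_LITERAL_BOOLEAN,
    TOY_LITERAL_INTEGER,
    TOY_LITERAL_FUNCTION, // hypothetical tag for bytecode functions
} Toy_LiteralType;

typedef struct Toy_Literal {
    union {
        bool boolean;
        int integer;
        struct {
            union { void* bytecode; } inner;
            void* scope;
        } function;
    } as;
    Toy_LiteralType type;
    int bytecodeLength;
} Toy_Literal;

// Type checks, as in the post.
#define TOY_IS_NULL(value)     ((value).type == TOY_LITERAL_NULL)
#define TOY_IS_BOOLEAN(value)  ((value).type == TOY_LITERAL_BOOLEAN)
#define TOY_IS_INTEGER(value)  ((value).type == TOY_LITERAL_INTEGER)
#define TOY_IS_FUNCTION(value) ((value).type == TOY_LITERAL_FUNCTION) // hypothetical

// Hypothetical accessors: the only code that knows where each field is stored.
#define TOY_AS_BOOLEAN(value)           ((value).as.boolean)
#define TOY_AS_INTEGER(value)           ((value).as.integer)
#define TOY_AS_FUNCTION_BYTECODE(value) ((value).as.function.inner.bytecode)
// bytecodeLength only means something for bytecode functions, so check the tag
// in debug builds (assert compiles away under NDEBUG). Evaluates value twice.
#define TOY_AS_FUNCTION_BYTECODE_LENGTH(value) \
    (assert(TOY_IS_FUNCTION(value)), (value).bytecodeLength)

// Hypothetical constructors built on C99 compound literals;
// members not named are zero-initialized.
#define TOY_TO_INTEGER_LITERAL(n) \
    ((Toy_Literal){ .as = { .integer = (n) }, .type = TOY_LITERAL_INTEGER })
#define TOY_TO_FUNCTION_LITERAL(code, len) \
    ((Toy_Literal){ .as = { .function = { .inner = { .bytecode = (code) }, .scope = NULL } }, \
                    .type = TOY_LITERAL_FUNCTION, .bytecodeLength = (len) })
```

If the layout changes again, only these macro bodies need to move; code like TOY_AS_INTEGER(lit) + 1 elsewhere in the interpreter stays untouched.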
When not tinkering with languages, I'm usually a game developer - come follow me on Twitter! Or hire me, either is good.
Top comments (1)
Why have function and type embedded inside Toy_Literal? If you have two Toy_Literal objects that refer to the same function or type, then all that information is duplicated. Why not define functions and types elsewhere (a distinct set for each) with all their relevant information, then have Toy_Literal contain only pointers to those?
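A minimal sketch of that suggestion for functions, assuming reference counting for lifetime management (the comment doesn't say how shared records would be freed). Toy_FunctionInfo, toy_make_function, toy_copy_literal, and toy_free_literal are hypothetical names, and the literal is trimmed to two kinds.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { TOY_LITERAL_NULL, TOY_LITERAL_INTEGER, TOY_LITERAL_FUNCTION } Toy_LiteralType; // stand-in

// Hypothetical shared record: one per function, however many literals refer to it.
typedef struct Toy_FunctionInfo {
    void* bytecode;
    void* scope;
    int bytecodeLength;
    int refCount;
} Toy_FunctionInfo;

typedef struct Toy_Literal {
    union {
        int integer;
        Toy_FunctionInfo* function; // one pointer instead of the whole record
    } as;
    Toy_LiteralType type;
} Toy_Literal;

// Allocate the shared record once; returns a null literal if allocation fails.
static Toy_Literal toy_make_function(void* bytecode, int length, void* scope) {
    Toy_FunctionInfo* info = malloc(sizeof *info);
    if (info == NULL) {
        return (Toy_Literal){ .type = TOY_LITERAL_NULL };
    }
    *info = (Toy_FunctionInfo){ .bytecode = bytecode, .scope = scope,
                                .bytecodeLength = length, .refCount = 1 };
    return (Toy_Literal){ .as = { .function = info }, .type = TOY_LITERAL_FUNCTION };
}

// Copying a literal copies the pointer and bumps the count; nothing else is duplicated.
static Toy_Literal toy_copy_literal(Toy_Literal literal) {
    if (literal.type == TOY_LITERAL_FUNCTION) {
        literal.as.function->refCount++;
    }
    return literal;
}

// Releasing the last reference frees the shared record.
static void toy_free_literal(Toy_Literal literal) {
    if (literal.type == TOY_LITERAL_FUNCTION) {
        assert(literal.as.function->refCount > 0);
        if (--literal.as.function->refCount == 0) {
            free(literal.as.function);
        }
    }
}
```

The win here is sharing rather than size: in the repacked layout the union would still be 16 bytes, because identifier and opaque need that much. The costs are an extra pointer hop on every call and explicit lifetime management for the shared records.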