You know the old saying: «There are only two hard things in programming: cache invalidation and naming things», or its tongue-in-cheek variant: «There are two hard problems in programming: cache invalidation, naming things, and off-by-1 errors». They are misleading, though, because naming things is way harder than the other two problems. Off-by-1 errors are annoying, but they can be solved just by being methodical. Cache invalidation is harder, but at least there is a lot of research around it. But if you are having trouble naming a variable, you are on your own.
Is there something we can do about this? Does it even make sense to think about doing something? I think it does.
Choosing the proper words to express an idea is not a new problem. I bet it has existed since writing was invented, and maybe even before that. And there are tools meant to help with this, like dictionaries, thesauruses and usage dictionaries. What exactly are usage dictionaries? Well, I just used one to check if it was OK to use "thesauruses". Here's what Garner's Modern English Usage has to say about it:
thesaurus (= a book or online resource that supplies synonyms) has long formed the standard plural thesauri. But since 1960, thesauruses has climbed in frequency, especially in AmE.
That's not a dictionary. Even if it has short definitions, it also —and mainly— offers commentary about the word: how it's used and how it should be used.
Should? Who is to say how words should be used? Language is organic, you can't stop it from changing, etc. I'm not going to get into the war between prescriptivists and descriptivists. Suffice it to say that I am on the side of (moderate) prescriptivists. Otherwise I wouldn't consult a usage dictionary in the first place. (If this subject interests you, go read David Foster Wallace's essay "Authority and American Usage"; if it doesn't, go read it anyway: Wallace can make any subject interesting.)
Garner has another usage dictionary, "Garner's Dictionary of Legal Usage", which is limited to the vocabulary and style of legal texts. It makes sense: lawyers have their own dialect, and how they use it is very important.
Sounds familiar?
We as programmers have a similar problem. Using a bad name won't affect a trial, but it will hurt maintainability in the long run. Bad names contribute to technical debt and are mentally taxing. How come, then, nobody has written a "Dictionary of Modern Programming Usage"?
Of course, this would get outdated. But so did usage dictionaries from the 1920s. Does that mean they weren't useful when they came out? And yes: programming languages change much faster than natural languages, but I would argue that the way we choose names for variables, functions or classes doesn't change as fast as available syntax features or libraries.
It's interesting to think about the implementation issues that arise from this idea, the first one being who would write it (or rather, how many people). Can this be a one-person job? Or do you need a group of people who come from different backgrounds? Or, going further, maybe this should be a crowdsourced effort, like a wiki?
Another question is how you define the scope of such a thing. This would require serious thought, but I think some things can be said for sure. First, you don't get into anything that is already in an English usage dictionary. Second, you don't talk about domain-specific things (like networking concepts, for example), since this would make the work impossible and, besides, it wouldn't make sense to talk about the usage of well-defined technical terms.
What is in scope then? I can think of some examples:
- What does qualified mean? How is it different from canonical?
- What's the difference between formatting, serializing, pretty-printing, etc.? (See this discussion.)
- When do you use get, compute, select, fetch?
- There are a lot of words for (what I call) maps: maps, dictionaries, associative arrays, hashes (a Perl/Ruby monstrosity). Can they be differentiated, or does it just depend on the programming language?
That last point takes us to the third question: should this hypothetical dictionary be specific to some programming language? I'm inclined to say no, since I feel most things would be language-agnostic. But some terms should include clarifications on their different usages among languages.
There's something that should make a project like this easier, or at least more interesting and valuable. The fourth edition of Garner's Dictionary takes advantage of a tool that wasn't available when its previous editions came out: the Google Ngram Viewer. You can imagine how useful this can be for a lexicographer.
Something similar could be used for a programming usage dictionary. For starters, we have GitHub, where we can search any term across millions of repositories. But more complex things could be done. You could use parsers to get the names of every defined function. How many of them start with compute? And then you could sample some of them, or the ones from the most famous projects, to get a better understanding of how they are used and why they are named that way. Again, this runs into the issue of which subset of the existing programming languages you should use. But you can start small, with two or three of the most prevalent ones, and work your way up from there.
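To make the parser idea a bit more concrete, here is a minimal sketch in Python. It assumes we only look at Python sources and snake_case names, and it uses the standard ast module to collect every defined function name and tally the first word of each one. The helper names (defined_function_names, leading_word, leading_word_counts) and the SAMPLE snippet are made up for illustration; a real survey would read files from actual repositories and would need one parser per language.

```python
import ast
from collections import Counter


def defined_function_names(source: str) -> list[str]:
    """Return the name of every function or method defined in `source`."""
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]


def leading_word(name: str) -> str:
    """Return the first snake_case word of a name, ignoring leading underscores."""
    return name.lstrip("_").split("_", 1)[0].lower()


def leading_word_counts(sources: list[str]) -> Counter:
    """Tally the leading word of every function defined across `sources`."""
    counts = Counter()
    for source in sources:
        counts.update(leading_word(n) for n in defined_function_names(source))
    return counts


# Hypothetical input: stands in for files pulled from real projects.
SAMPLE = '''
def compute_total(items): ...
def compute_tax(total): ...
def get_user(user_id): ...
async def fetch_profile(user_id): ...
class Cart:
    def _compute_discount(self): ...
'''

counts = leading_word_counts([SAMPLE])
print(counts["compute"])  # 3 of the 5 sample functions start with "compute"
```

The counts only tell you where to look. Deciding what compute, get and fetch actually mean in practice would still take reading the functions behind the names.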
This doesn't mean that the effort could be automated, not even close. A project like this would require a lot of reading and exploring. But open source and tools that analyze code could amplify the work of programming lexicographers and give more certainty to their judgements.
I'm not aware of anything like this existing, nor of anyone working on it. Maybe it wouldn't be as useful as it seems to me, or maybe the necessary effort is just too large. I don't know. At the very least, I think it would be worth a shot.
Originally published in my newsletter. You can subscribe here.