Part II: The Bugs
There's an old programming joke that goes something like:
There are only two hard problems in computer science: cache invalidation, naming things, and off-by-one errors.
I think we should add a third (fourth?) problem to that list: sorting things.
Sorting Things
There are lots of different ways to sort things in computer science. C.S. students learn about time and space complexity of these sorting algorithms, YouTubers make cool visualisations of them, and occasionally, a guy named Tim will invent a new one.
But there's one aspect of sorting algorithms that -- for me, at least -- seems completely impossible: remembering in which direction things are sorted.
If you say to a group of people: "okay, everyone, stand in a single-file line, ordered by height", the next question you might ask is "okay, but in which direction?" Who should stand at the front of the line? The shortest person or the tallest person?
In programming, we define comparison functions, which describe how to order whatever objects we're interested in.
Some comparison functions seem obvious. For example, in TypeScript, using the default string comparison...
const array: string[] = ["cherry", "apple", "banana"]
array.sort()
//...
...we would expect array to be sorted alphabetically, with apple as the first (0th) element of the sorted array
//...
console.log(array) // [ 'apple', 'banana', 'cherry' ]
console.log(array[0]) // apple
Note that <array>.sort() in JavaScript sorts the array "in place", so that the original, unsorted array no longer exists afterward. In some languages, and for some sorting algorithms, arrays are not sorted in place, and a new array will be returned. This new array should be assigned to a new variable.
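If the unsorted order still matters, one common approach in JavaScript is to copy the array and sort the copy. The following is a minimal sketch; the variable names are illustrative and not from the original code:

```typescript
const original: string[] = ["cherry", "apple", "banana"];

// Copy first, then sort the copy; `original` keeps its order.
const sortedCopy: string[] = [...original].sort();

console.log(original);   // [ 'cherry', 'apple', 'banana' ]
console.log(sortedCopy); // [ 'apple', 'banana', 'cherry' ]

// sort() returns a reference to the same array it sorted, not a new one.
const inPlace: string[] = original.sort();
console.log(inPlace === original); // true
```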
But often we will be working with objects more complex than strings, and we will need to define custom comparison functions. These are functions which take two elements of type T and return a number, and are used to sort arrays of type T:
type T = string
const newArray: T[] = ["cherry", "apple", "banana"]
function comparison(t1: T, t2: T): number {
return t1.charCodeAt(0) - t2.charCodeAt(0)
}
newArray.sort(comparison)
console.log(newArray) // ?
console.log(newArray[0]) // ?
Without reading the docs, will the console.log()s above give the same result as the earlier ones? How about something a bit simpler -- sorting an array of numbers:
type T = number
const newArray: T[] = [42, 2112, 19]
function comparison(t1: T, t2: T): number {
return t2 - t1
}
newArray.sort(comparison)
console.log(newArray) // ?
console.log(newArray[0]) // ?
Will the first element above be 19? Or 2112? Are you sure?
I understand the utility of sorting algorithms, and I understand the need for a ternary (greater than, less than, or equal) return value, and hence number as the return type instead of boolean, but comparison functions are just one of those things that I've always had to test every time. Sometimes in development, and sometimes in production.
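For reference, the rule JavaScript's sort follows is: a negative return value puts the first argument first, a positive value puts the second argument first, and zero treats the two as equal. Below is a small sketch of that rule, using named comparators so the intended direction is visible at the call site. The names ascending, descending, and values are made up for this example:

```typescript
// Negative result: a goes before b. Positive: b goes before a. Zero: treated as equal.
const ascending = (a: number, b: number): number => a - b;
const descending = (a: number, b: number): number => b - a;

const values: number[] = [42, 2112, 19];

// Sort copies so both orders can be compared side by side.
const lowToHigh: number[] = [...values].sort(ascending);
const highToLow: number[] = [...values].sort(descending);

console.log(lowToHigh); // [ 19, 42, 2112 ]
console.log(highToLow); // [ 2112, 42, 19 ]
```

So a comparator that returns t2 - t1, like the one in the number example, sorts from largest to smallest.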
So What Happened?
With what we learned above, you should now be able to see what went wrong with my initial code. The problem was here:
// get the blog post date from its git commit date
const gitLog = SlugFactory.git.log({ file: `blog/${slug.params.slug}.md` });
return gitLog.then(lines => {
const dates = lines.all.map(each => each.date);
// if blog post hasn't been committed yet, use current date
const date = dates[0] ?? new Date().toISOString();
return new FrontMatter(slug.params.slug, title, description, date, rawContent);
});
git log returns commits sorted by date, such that newer commits come first and older commits come afterward. So dates[0], above, is the newest commit returned from git log, and each blog post was being given a "publication" date of the most recent commit in which that post was modified.
When were these blog posts most recently modified? Well, all of them were modified in that same commit, because the point of the commit was to remove the date parameter from the front matter. Essentially, I was mixing up the lastUpdated date and the published date. One of these is the first element in the list (dates[0]) and one of them is the last element in the list (dates[dates.length-1]).
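To make the distinction concrete, here is a rough sketch of pulling both dates out of a newest-first list. The helper postDates, the PostDates interface, and the sample timestamps are hypothetical and only for illustration; the fallback to the current date mirrors the original snippet:

```typescript
// Hypothetical helper: `dates` is assumed to be ordered newest-first,
// the way the `dates` array in the snippet above is.
interface PostDates {
  published: string;   // oldest commit touching the file
  lastUpdated: string; // newest commit touching the file
}

function postDates(
  dates: string[],
  fallback: string = new Date().toISOString()
): PostDates {
  return {
    lastUpdated: dates[0] ?? fallback,
    // The last element is the oldest commit, i.e. when the post first appeared.
    published: dates[dates.length - 1] ?? fallback,
  };
}

// Made-up timestamps, newest first.
const exampleDates: string[] = [
  "2022-01-06T10:00:00Z",
  "2022-01-04T09:00:00Z",
  "2022-01-02T08:00:00Z",
];
const exampleResult: PostDates = postDates(exampleDates);
console.log(exampleResult.published);   // 2022-01-02T08:00:00Z
console.log(exampleResult.lastUpdated); // 2022-01-06T10:00:00Z
```

With an empty list (a post that hasn't been committed yet), both fields fall back to the supplied default.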
So like I said, there are four hard problems in computer science.
On To The Next One
With that fixed, we're off to the races, right?
Oh... well, that's not right.
Those two posts were both committed on January 2 (Hello, World! and Git Hooks), not on January 6. So why did they both have the wrong date?
That's right, it's another bug... Or is it?
Find out in the thrilling final installment of this debugging mystery!
Top comments (4)
If you need to rely on memorization: if compare(a, b) > 0, then b comes before a.
Knowing that:
t1 = "cherry", t2 = "apple": 'c' - 'a' > 0, i.e. "apple" before "cherry"
t1 = 42, t2 = 2112: t2 - t1 > 0, i.e. 2112 before 42
Is this painful?
Sure, hence
i.e. state the intent.
One reason to steer clear of inline anonymous functions.
I'm not convinced this has anything to do with sorting.
I could be wrong but I think there's some ambiguity when it comes to Simple Git's API (TypeScript or not). It mostly defers to the git log documentation (and Pretty Formats).
So given the amended code:
it seems to be more a misunderstanding of the data that is being returned by the API.
Also, given the way Simple Git depends on the Git documentation, the date property name is tricky.
Commit Formatting
If Simple Git follows the Git convention then date would be "Author date". The "commit date" is only available with the fuller format as CommitDate or committer date. (edit: it is the author date; the "commit date" requires a custom format option with Simple Git).
Viewing the Commit History
So date may actually be the "Author Date", which may be good enough for your purposes as it refers to the time of git commit. But the actual "commit date" can be altered by other operations.
So if you need sorting by "author date" then perhaps an explicit sort may be the "safer" route.
This seems to be more related to (mis)understanding the data being manipulated/relied on. If anything, it demonstrated again how "naming things is hard".
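The explicit sort suggested above might look something like the sketch below. It assumes each log entry has a date string that Date.parse can read (for example ISO 8601); the LogEntry interface and the oldestFirst helper are hypothetical names, not part of Simple Git's API:

```typescript
// Hypothetical shape: only the `date` field used in the post's code is assumed.
interface LogEntry {
  date: string;
}

// Returns a new array ordered oldest-first, regardless of the input order.
function oldestFirst(entries: LogEntry[]): LogEntry[] {
  return [...entries].sort(
    (a, b) => Date.parse(a.date) - Date.parse(b.date)
  );
}

// Made-up entries in arbitrary order.
const shuffled: LogEntry[] = [
  { date: "2022-01-04T09:00:00Z" },
  { date: "2022-01-06T10:00:00Z" },
  { date: "2022-01-02T08:00:00Z" },
];
const ordered: LogEntry[] = oldestFirst(shuffled);
console.log(ordered[0].date); // 2022-01-02T08:00:00Z
```

Sorting explicitly means the "published" date no longer depends on whatever order the log happens to come back in.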
There are definitely some quirks here re: author vs. committer, and the potential for commits to change due to cherry-picking, etc. Hopefully I don't make that much of a mess of the repo, though, as it's just me committing to it. But if I do, you can bet there'll be another blog post about what I learned from fixing it!
My biggest concern would be having to put the date back into the frontmatter to implement drafts (with a future date) - i.e. the oldest "author date" would no longer imply "publish date".
The way I'm doing drafts right now is prepending wip- to the filename, and .gitignore-ing blog/wip-*. That means that drafts aren't under version control. This is fine for me (for now) because usually I have zero or one drafts at any given time.
Bringing drafts into version control is an interesting problem. As you say, it would require basically throwing out the idea that first commit == published date. Maybe a drafts directory is the easiest way to go?
git log <filename> doesn't follow file renames by default (see --follow), so moving a file from drafts/ to blog/ could essentially be the "trigger" for publication.
Interesting stuff. Thanks for pointing this out!