Andrew (he/him)

Posted on Jan 23, 2022 • Edited on Mar 6, 2022 • Originally published at awwsmm.com

What's Wrong This Time? Part II: Electric Bugaloo

#typescript #javascript #beginners

Part II: The Bugs

There's an old programming joke that goes something like

There are only two hard problems in computer science: cache invalidation, naming things, and off-by-one errors.

I think we should add a third (fourth?) problem to that list: sorting things.

Sorting Things

There are lots of different ways to sort things in computer science. C.S. students learn about time and space complexity of these sorting algorithms, YouTubers make cool visualisations of them, and occasionally, a guy named Tim will invent a new one.

But there's one aspect of sorting algorithms that -- for me, at least -- seems completely impossible: remembering in which direction things are sorted.

If you say to a group of people: "okay, everyone, stand in a single-file line, ordered by height", the next question you might ask is "okay, but in which direction?" Who should stand at the front of the line? The shortest person or the tallest person?

In programming, we define comparison functions, which describe how to order whatever objects we're interested in.

Some comparison functions seem obvious. For example, in TypeScript, using the default string comparison...

const array: string[] = ["cherry", "apple", "banana"]
array.sort()
//...

...we would expect array to be sorted alphabetically, with apple as the first (0th) element of the sorted array

//...
console.log(array)    // [ 'apple', 'banana', 'cherry' ]
console.log(array[0]) // apple

Note that <array>.sort() in JavaScript sorts the array "in place", so that the original, unsorted array no longer exists afterward. In some languages, and for some sorting algorithms, arrays are not sorted in place, and a new array will be returned. This new array should be assigned to a new variable.
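To make the "in place" behaviour concrete, here's a minimal sketch. The variable names are made up for illustration, and copying with the spread operator is just one way to keep the unsorted array around.

```typescript
const original: string[] = ["cherry", "apple", "banana"]

// sort() reorders `original` itself and returns a reference to that same array
const returned = original.sort()
console.log(returned === original) // true
console.log(original)              // [ 'apple', 'banana', 'cherry' ]

// to keep the unsorted array, sort a shallow copy instead
const unsorted: string[] = ["cherry", "apple", "banana"]
const sortedCopy = [...unsorted].sort()
console.log(unsorted)   // [ 'cherry', 'apple', 'banana' ]
console.log(sortedCopy) // [ 'apple', 'banana', 'cherry' ]
```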

But often we will be working with objects more complex than strings, and we will need to define custom comparison functions. These are functions which take two elements of type T and return a number, and are used to sort arrays of type T:

type T = string

const newArray: T[] = ["cherry", "apple", "banana"]

function comparison(t1: T, t2: T): number {
  return t1.charCodeAt(0) - t2.charCodeAt(0)
}

newArray.sort(comparison)

console.log(newArray)    // ?
console.log(newArray[0]) // ?

Without reading the docs, will the console.log()s above give the same result as the earlier ones? How about something a bit simpler -- sorting an array of numbers:

type T = number

const newArray: T[] = [42, 2112, 19]

function comparison(t1: T, t2: T): number {
  return t2 - t1
}

newArray.sort(comparison)

console.log(newArray)    // ?
console.log(newArray[0]) // ?

Will the first element above be 19? Or 2112? Are you sure?

I understand the utility of sorting algorithms, and I understand the need for a ternary (greater than, less than, or equal) return value, and hence number as the return type instead of boolean, but comparison functions are just one of those things that I've always had to test every time. Sometimes in development, and sometimes in production.
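Going back to the line of people from earlier, here's a rough sketch of what that ternary return value means in practice. The Person type, the names, and the heights are all invented for this example; the only real rule it relies on is that a negative result puts the first argument earlier in the array, a positive result puts it later, and zero is a tie.

```typescript
interface Person {
  firstName: string
  heightCm: number
}

// negative: p1 sorts before p2; positive: p2 sorts before p1; zero: a tie
function byHeightAscending(p1: Person, p2: Person): number {
  return p1.heightCm - p2.heightCm
}

// swapping the arguments flips the direction
function byHeightDescending(p1: Person, p2: Person): number {
  return byHeightAscending(p2, p1)
}

const people: Person[] = [
  { firstName: "Ana", heightCm: 170 },
  { firstName: "Bo", heightCm: 155 },
  { firstName: "Cy", heightCm: 188 },
]

const shortestFirst = [...people].sort(byHeightAscending)
const tallestFirst = [...people].sort(byHeightDescending)

console.log(shortestFirst.map(p => p.firstName)) // [ 'Bo', 'Ana', 'Cy' ]
console.log(tallestFirst.map(p => p.firstName))  // [ 'Cy', 'Ana', 'Bo' ]
```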

So What Happened?

With what we learned above, you should now be able to see what went wrong with my initial code. The problem was here

    // get the blog post date from its git commit date
    const gitLog = SlugFactory.git.log({ file: `blog/${slug.params.slug}.md` });

    return gitLog.then(lines => {
      const dates = lines.all.map(each => each.date);

      // if blog post hasn't been committed yet, use current date
      const date = dates[0] ?? new Date().toISOString();

      return new FrontMatter(slug.params.slug, title, description, date, rawContent);
    });

git log returns commits sorted by date, such that newer commits come first and older commits come afterward. So dates[0], above, is the newest commit returned from git log, and each blog post was being given a "publication" date of the most recent commit in which that post was modified.

When were these blog posts most recently modified? Well, all of them were modified in that same commit, because the point of the commit was to remove the date parameter from the front matter. Essentially, I was mixing up the lastUpdated date and the published date. One of these is the first element in the list (dates[0]) and one of them is the last element in the list (dates[dates.length-1]).
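Here's a minimal sketch of keeping those two dates apart. This is not the code from my site: publishedAndUpdated is a hypothetical helper, the sample dates are made up, and it assumes the input is ordered newest-first, the way git log lists commits by default.

```typescript
interface PostDates {
  published: string
  lastUpdated: string
}

// hypothetical helper: `newestFirst` is assumed to be ordered like `git log` output
function publishedAndUpdated(newestFirst: string[]): PostDates | undefined {
  const newest = newestFirst[0]
  const oldest = newestFirst[newestFirst.length - 1]

  // no commits yet, so there is nothing to report
  if (newest === undefined || oldest === undefined) return undefined

  return { published: oldest, lastUpdated: newest }
}

// made-up commit dates, newest first
const sampleDates: string[] = [
  "2022-01-06T18:45:00Z",
  "2022-01-04T08:10:00Z",
  "2022-01-02T21:30:00Z",
]

const postDates = publishedAndUpdated(sampleDates)
console.log(postDates?.published)   // 2022-01-02T21:30:00Z
console.log(postDates?.lastUpdated) // 2022-01-06T18:45:00Z
```

Giving both values a name means nobody (including me) has to remember which end of the list is which.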

So like I said, there are four hard problems in computer science.

On To The Next One

With that fixed, we're off to the races, right?

Oh... well, that's not right.

Those two posts were both committed on January 2 (Hello, World! and Git Hooks), not on January 6. So why did they both have the wrong date?

That's right, it's another bug... Or is it?

Find out in the thrilling final installment of this debugging mystery!

Top comments (4)

peerreynders • Jan 24 '22 • Edited

but comparison functions are just one of those things that I've always had to test every time

If you need to rely on memorization:

compare(a,b) > 0 then b before a

knowing that

function comparison(t1: T, t2: T): number {
  return t1.charCodeAt(0) - t2.charCodeAt(0)
}

t1 = "cherry", t2 = "apple"
'c' - 'a' > 0 i.e. "apple" before "cherry"
ascending order

function comparison(t1: T, t2: T): number {
  return t2 - t1
}

note the "reversed" order in the operation
t1 = 42, t2 = 2112
so t2 - t1 > 0 i.e. 2112 before 42
descending order

Is this painful?

Sure, hence

function byFirstCuAscending(t1: T, t2: T): number {
  return t1.charCodeAt(0) - t2.charCodeAt(0)
}

newArray.sort(byFirstCuAscending)

function byNumberDescending(t1: T, t2: T): number {
  return t2 - t1
}

newArray.sort(byNumberDescending)

i.e. state the intent.
One reason to steer clear of inline anonymous functions.

So What Happened?

I'm not convinced this has anything to do with sorting.

I could be wrong but I think there's some ambiguity when it comes to Simple Git's API (TypeScript or not). It mostly defers to the git log documentation (and Pretty Formats).

So given the amended code:

    const postCommits: RichCommit[] = await env.git.log([ 'master', `blog/${slug}.md` ])
      .then(commits => commits.all.filter(commit => commit.message !== "Merge branch 'development'"));

it seems to be more a misunderstanding of the data that is being returned by the API.

Also given the way Simple Git depends on the Git documentation the date property name is tricky.

Commit Formatting

When =<format> part is omitted, it defaults to medium.

If Simple Git follows the Git convention then date would be "Author date". The "commit date" is only available with the fuller format as CommitDate or committer date.
(edit: it is the author date; the "commit date" requires a custom format option with Simple Git).

Viewing the Commit History

You may be wondering what the difference is between author and committer. The author is the person who originally wrote the work, whereas the committer is the person who last applied the work

So date may actually be the "Author Date" which may be good enough for your purposes as it refers to the time of git commit. But the actual "commit date" can be altered by other operations.

So if you need sorting by "author date" then perhaps an explicit sort may be the "safer" route.
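A rough sketch of what such an explicit sort could look like. DatedCommit is a hypothetical stand-in for whatever shape the log entries actually have, the hashes and dates are invented, and it assumes each date is an ISO 8601 string that Date.parse understands.

```typescript
// hypothetical shape; only the fields needed for sorting
interface DatedCommit {
  hash: string
  date: string // assumed to be an ISO 8601 timestamp
}

// oldest commit first, regardless of the order the log returned them in
function byDateAscending(c1: DatedCommit, c2: DatedCommit): number {
  return Date.parse(c1.date) - Date.parse(c2.date)
}

// made-up commits in no particular order
const commits: DatedCommit[] = [
  { hash: "aaa", date: "2022-01-06T10:00:00Z" },
  { hash: "bbb", date: "2022-01-02T09:00:00Z" },
  { hash: "ccc", date: "2022-01-04T12:30:00Z" },
]

// sort a copy so the original log order is left alone
const oldestFirst = [...commits].sort(byDateAscending)

console.log(oldestFirst.map(c => c.hash)) // [ 'bbb', 'ccc', 'aaa' ]
console.log(oldestFirst[0]?.date)         // 2022-01-02T09:00:00Z
```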

This seems to be more related to (mis)understanding the data being manipulated/relied on. If anything, it demonstrated again how "naming things is hard".

Andrew (he/him) • Jan 24 '22

There are definitely some quirks here re: author vs. committer, and the potential for commits to change due to cherry-picking, etc. Hopefully I don't make that much of a mess of the repo, though, as it's just me committing to it. But if I do, you can bet there'll be another blog post about what I learned from fixing it!

peerreynders • Jan 24 '22

My biggest concern would be having to put the date back into the frontmatter to implement drafts (with a future date) - i.e. the oldest "author date" would no longer imply "publish date".

Andrew (he/him) • Jan 24 '22 • Edited

The way I'm doing drafts right now is prepending wip- to the filename, and .gitignore-ing blog/wip-*. That means that drafts aren't under version control. This is fine for me (for now) because usually I have zero or one drafts at any given time.

Bringing drafts into version control is an interesting problem. As you say, it would require basically throwing out the idea that first commit == published date. Maybe a drafts directory is the easiest way to go? git log <filename> doesn't follow file renames by default (see --follow), so moving a file from drafts/ to blog/ could essentially be the "trigger" for publication.

Interesting stuff. Thanks for pointing this out!