TL;DR
Novu's team encountered a significant bug affecting date calculations in their CI/CD pipelines, hindering all deployments.
The issue arose from the date-fns library's addMonths and subMonths functions.
We fixed this by using addDays and subDays functions instead.
Novu: Open-source notification infrastructure
Just a quick background about us. Novu is an open-source notification infrastructure. We help manage all product notifications: In-App (the bell icon like you have in the Dev Community - Websockets), emails, SMS, and so on.
The Mindset
When working in software development, we're always prepared for bugs to crop up.
Sometimes they're small, easy to identify, and quick to fix.
Other times, they're like this year's candidate for our 'Bug Of The Year'.
This was a bug so elusive and mysterious that it had us rummaging through our pipelines, questioning our code-base, and coming face-to-face with the intricacies of date manipulation.
Problems, Different Problems, and More Problems
Our CI/CD pipelines were failing. Specifically, two failing tests were blocking ALL new deployments. It was time to put on our detective hats 🕵️.
We dove into our commit history using git bisect, but it offered us no insight. Git bisect took us back to commits that were over 6 months old, long before any of the recent changes that could have caused this. Was this bug created at the very beginning of Novu?
However, we did have a clue. Our failing unit tests showed us that we had incorrect date calculations.
Gathering the Clues 💡
Strangely, the difference was just one day.
import { addMonths, subMonths } from "date-fns";
const startDate = new Date("2023-08-31");
const oneMonthAhead = addMonths(startDate, 1);
const result = subMonths(oneMonthAhead, 1);
console.log(result); // Expected: 31st of August, Reality: 30th of August
We also found that this does not happen on 31st July.
const startDate = new Date("2023-07-31");
const oneMonthAhead = addMonths(startDate, 1);
const result = subMonths(oneMonthAhead, 1);
console.log(result); // Expected: 31st of July, Reality: 31st of July
But the bug shows up again January 31st.
const startDate = new Date("2023-01-31");
const oneMonthAhead = addMonths(startDate, 1);
const result = subMonths(oneMonthAhead, 1);
console.log(result); // Expected: 31st of January, Reality: 28th of January
So this bug only happens when we add 1 month to a date whose day does not exist in the following, shorter month, and then subtract 1 month to go back to the original month.
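The three results above can be reproduced without any library. The following is a minimal sketch, assuming month arithmetic that clamps the day to the length of the target month; `addMonthsClamped` and `roundTrip` are hypothetical helpers written for illustration, not the date-fns implementation, and they work in UTC at midnight only.

```typescript
// Hypothetical helper: month arithmetic that clamps the day to the target
// month's length. It imitates the behavior described above; it is not date-fns.
function addMonthsClamped(date: Date, amount: number): Date {
  const day = date.getUTCDate();
  // Day 0 of the month after the target month is the target month's last day.
  const endOfTargetMonth = new Date(
    Date.UTC(date.getUTCFullYear(), date.getUTCMonth() + amount + 1, 0)
  );
  if (day >= endOfTargetMonth.getUTCDate()) {
    // The starting day does not fit (or is exactly the last day): clamp.
    return endOfTargetMonth;
  }
  return new Date(
    Date.UTC(date.getUTCFullYear(), date.getUTCMonth() + amount, day)
  );
}

// Add one month, then subtract one month, and report the resulting ISO date.
function roundTrip(isoDate: string): string {
  const start = new Date(isoDate); // date-only ISO strings parse as UTC midnight
  const back = addMonthsClamped(addMonthsClamped(start, 1), -1);
  return back.toISOString().slice(0, 10);
}

const roundTripResults = ["2023-08-31", "2023-07-31", "2023-01-31"].map(roundTrip);
// roundTripResults: ["2023-08-30", "2023-07-31", "2023-01-28"]
```

Only the dates whose day is missing from the following month come back changed; 31st July survives because August also has 31 days.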
This is a sneaky one
So here is what we know so far:
- It would only show up on systems that perform this specific sequence of logic.
- The code would have to be run on one of the few dates that are affected.
- This effect is not documented anywhere in any of the libraries we use.
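To get a feel for how few dates are affected, here is a rough sketch that lists them for a given year. It assumes the clamping behavior described above, under which "add one month, then subtract one month" changes the date exactly when the starting day does not exist in the following month. `daysInMonth` and `affectedDates` are hypothetical names used only for this illustration.

```typescript
// Number of days in a month (month is 0-based; 12 rolls over into January
// of the following year).
function daysInMonth(year: number, month: number): number {
  return new Date(Date.UTC(year, month + 1, 0)).getUTCDate();
}

// Assumption: adding a month clamps the day to the target month's length.
// Under that rule, a date is affected when its day is larger than the
// number of days in the following month.
function affectedDates(year: number): string[] {
  const affected: string[] = [];
  for (let month = 0; month < 12; month++) {
    const nextLength = daysInMonth(year, month + 1);
    for (let day = nextLength + 1; day <= daysInMonth(year, month); day++) {
      affected.push(
        new Date(Date.UTC(year, month, day)).toISOString().slice(0, 10)
      );
    }
  }
  return affected;
}

const affectedIn2023 = affectedDates(2023);
// affectedIn2023:
// ["2023-01-29", "2023-01-30", "2023-01-31", "2023-03-31",
//  "2023-05-31", "2023-08-31", "2023-10-31"]
```

Under this assumption only seven days in 2023 would trigger the failure, which helps explain why the problem stayed hidden for so long.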
The worst part is that this bug can also show up in HR tools, finance tools, salary tools, and public government tools that rely on this package. Unfortunately, relying on it is still better than writing these functions ourselves.
It has been said many times that date-times are among the trickiest aspects of programming, and our current predicament served as a harsh reminder.
Why simple actions can lead to bad things
After finding this out, we had a 'Eureka!' moment.
Our CTO, Dima Grossman, then had the idea to try it on Raycast. Interestingly enough, it was happening in their product too.
We realized that the issue stemmed from being on the last day of the month, but what exactly was going awry?
The Culprit: date-fns
This popular utility library for date operations was at the heart of the problem, specifically its addMonths and subMonths functions.
The addMonths function, when adding a month to a day that does not exist in the following month (for example, the 31st before a 30-day month), takes you to the last day of the following month. Logical, right?
// source: https://github.com/date-fns/date-fns/blob/main/src/addMonths/index.ts
const daysInMonth = endOfDesiredMonth.getDate()
if (dayOfMonth >= daysInMonth) {
  // If we're already at the end of the month, then this is the correct date
  // and we're done.
  return endOfDesiredMonth
} else {
  // Otherwise, we now know that setting the original day-of-month value won't
  // cause an overflow, so set the desired day-of-month. Note that we can't
  // just set the date of `endOfDesiredMonth` because that object may have had
  // its time changed in the unusual case where where a DST transition was on
  // the last day of the month and its local time was in the hour skipped or
  // repeated next to a DST transition. So we use `date` instead which is
  // guaranteed to still have the original time.
  _date.setFullYear(
    endOfDesiredMonth.getFullYear(),
    endOfDesiredMonth.getMonth(),
    dayOfMonth
  )
  return _date
}
But the subMonths function, rather than having its own dedicated logic, simply reused addMonths with a negative number. DRY principles in action, but with an unintended consequence.
// source: https://github.com/date-fns/date-fns/blob/main/src/subMonths/index.ts
export default function subMonths<DateType extends Date>(
  date: DateType | number,
  amount: number
): DateType {
  return addMonths(date, -amount)
}
Here is exactly what caused our issue
Let's put it this way:
- For 28th February, add one month and then subtract one month, and you get 28th February. No problems there.
- But, for 31st August, add one month and then subtract one month, and you land on... 30th August. That's one day lost in date limbo!
The core of the issue was the way addMonths handles the end of the desired month.
When the starting day exists in the target month, the logic is sound and the operation can be reversed.
However, when the starting day is later than the target month's last day, the function clamps to that last day. The original day is discarded, so subtracting a month afterwards cannot restore it.
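Another way to look at it is that two different start dates can produce the same result, so no "subtract one month" can undo the addition for both. The sketch below illustrates this with plain calendar values; `CalendarDate`, `daysIn`, and `shiftMonths` are hypothetical names, and the clamping rule is an assumption modelled on the excerpt above rather than the library's actual code.

```typescript
// Plain calendar date; month is 1-based here to keep the example readable.
type CalendarDate = { year: number; month: number; day: number };

// Days in a 1-based month: day 0 of the next 0-based month is its last day.
function daysIn(year: number, month: number): number {
  return new Date(Date.UTC(year, month, 0)).getUTCDate();
}

// Hypothetical helper that mirrors the clamping rule: keep the day if the
// target month has it, otherwise fall back to the target month's last day.
function shiftMonths(date: CalendarDate, amount: number): CalendarDate {
  const monthIndex = date.year * 12 + (date.month - 1) + amount;
  const year = Math.floor(monthIndex / 12);
  const month = (monthIndex % 12) + 1;
  return { year, month, day: Math.min(date.day, daysIn(year, month)) };
}

const fromAug30 = shiftMonths({ year: 2023, month: 8, day: 30 }, 1);
const fromAug31 = shiftMonths({ year: 2023, month: 8, day: 31 }, 1);
// Both land on 30 September, so the original day is no longer recoverable.
const sameTarget = fromAug30.day === fromAug31.day; // true

// Going back from 30 September can only give one answer: 30 August.
const backFromSep30 = shiftMonths(fromAug31, -1); // { year: 2023, month: 8, day: 30 }
```

Because the forward step maps both 30th and 31st August to the same date, any system that needs the round trip to be exact has to keep the original date (or the intended day of the month) somewhere else.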
The Simple Fix
To ensure a consistent approach to date manipulation, we shifted from using addMonths and subMonths to addDays and subDays.
This provided a more granular and precise way to handle date calculations and, importantly, allowed us to sidestep the addMonths pitfall.
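The exact change is not shown here, so the following is only a sketch of why day-based arithmetic sidesteps the problem. It assumes a fixed 30-day window and uses hypothetical UTC stand-ins (`addDaysUtc`, `subDaysUtc`) rather than the date-fns functions themselves.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical UTC-based stand-ins for day arithmetic; working in UTC
// keeps daylight-saving shifts out of this sketch.
function addDaysUtc(date: Date, amount: number): Date {
  return new Date(date.getTime() + amount * MS_PER_DAY);
}

function subDaysUtc(date: Date, amount: number): Date {
  return addDaysUtc(date, -amount);
}

const windowStart = new Date("2023-08-31"); // parsed as UTC midnight
const thirtyDaysAhead = addDaysUtc(windowStart, 30); // 2023-09-30
const backAgain = subDaysUtc(thirtyDaysAhead, 30);

// Adding and subtracting the same number of days always returns the start.
const survivesRoundTrip = backAgain.getTime() === windowStart.getTime(); // true
```

The trade-off is that a fixed number of days is not the same thing as a calendar month, so this approach fits cases where an exact, reversible offset matters more than landing on the same day number.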
Lessons Learnt
This bug served as a strong lesson in a few key areas:
- Assumptions are Risky: Never assume that widely-used libraries are infallible. Even the most popular ones have their quirks.
- Tests are Gold: If not for our rigorous testing suite, this bug might have remained hidden, only to wreak havoc at the most inopportune moment.
- Dates are Tricky: They've always been, and will continue to be, a challenging aspect of software development. Always handle with care.
While this bug threw a wrench in our pipes, it also reinforced the importance of comprehensive tests and the need to continually question and challenge our assumptions.
Death of this Bug
In a world of code where dates and times form such a crucial part of our applications, bugs like these provide not just a hiccup, but a learning opportunity. The next time you find a weird issue in your application, dig deep. Who knows, you might just uncover the next 'Bug Of The Year'.
You can find the PRs and Issues here:
Top comments (44)
I'm sorry for the off-topic, but the joke just has to be made:
Two dates a year is a problem? I'm lucky if I even get that many...
🤣
Two key takeaways:
Never Assume: Even popular libraries can have hidden pitfalls. Always question, regardless of its reputation.
Testing is Crucial: Your robust tests highlighted a significant issue, emphasizing the need for thorough testing in software development.
Thanks for sharing this valuable lesson! Here's to more bug-free coding!
These are great takeaways, thank you for sharing!
I think the issue is one of definition rather than of coding around the end of the month. We write software that calculates rental periods. When the rent is monthly, the start date could fall on (say) the 31st or the 30th of the month. Advancing these through the year creates an ambiguity. When you get to (e.g.) February, both charges are adjusted to the 28th / 29th. What day should they be in March? They should be back to the 31st or 30th. You can't know this without storing an additional piece of information, in this case the regular billing day. However, when there is no need to support both methods, most people expect the last day of the month to be dominant: once the day is the last day of the month, it stays there on every subsequent month move. To easily support this dominant definition, you add one day to the date (for all calculations), then add / subtract the required number of months, then subtract one day. This makes the end of the month dominant. Remember that, if you advance a date monthly through the year using the dominant method, any date with a day greater than or equal to 28 will eventually end up on the 31st or last day of the month (without also knowing the regular billing day to make the distinction).
From my experience, I'd advise always using a fixed version of an npm package. The most common usage I've seen is a tilde. Using a tilde (~) gives you bug-fix releases. However, on CI/CD, npm may fetch a slightly newer version, and that may fail your build. You'll spend a whole day investigating the root cause. Even more confusion comes from the fact that package.json contains the same version, e.g., 3.4.0, in the repository and on your computer, while on the server 3.4.1 might actually be installed. Hence, fixed versions ensure that npm dependencies will always be on the same version.
It's fine to use dependency constraints like ^ and ~ as long as you lock your dependencies with a lock file and only use npm ci inside your pipeline instead of npm i, as this will always take the resolved versions from the lockfile. If any commit breaks the pipeline, you can just check whether the lock file was updated.
@niklaspor I don't recall exactly, but I think I had a past problem with package-lock.json. There were unresolvable conflicts. You can use the keywords "package lock json problem" to find out what others are struggling with.
One of the points the npm documentation makes is "install exactly the same dependencies". That's exactly what you can achieve without ^ and ~.
Personally, I have never found any real practical use for package-lock.json.
npm i / npm install will bump any package to the latest matching package version. npm ci or pnpm --frozen-lockfile will keep exactly the versions that were resolved by the last npm i that was executed.
Always use npm ci inside your pipelines, otherwise you risk getting different packages from your local installation, even if you don't use any ranges but just plain versions. Also, any package deeper in your dependency tree might specify a version range, which might lead to a newer resolved dependency if you execute npm i instead of npm ci. The same goes for pnpm and yarn.
I would even suggest using npm ci on your local machine when you're working in a larger team and don't want to update dependencies. Otherwise the code on your machine might differ from the code on your colleagues' machines.
stackoverflow.com/questions/524996...
support.deploybot.com/article/131-....
@niklaspor Let me put this into a different perspective. The most important thing here is to ask, what kind of problem is being solved here? So the problem is: how to keep npm package dependencies consistent? If using a fixed version works, any other solution will simply be unnecessary. And so far, the solution is working excellently.
As for CI/CD, it is organised in very different ways, and there is no single approach. So if npm ci solves specific problems for someone, then using this approach is the right solution.
Have you had a bug with dates before?
The question should be if you ever wrote something with dates with no bugs at all????
Two hardest things in programming: Cache invalidation and naming things.
I think we should add time, timezones and dates.
That is so true, even the best of us get tripped up by this.
Not quite dates, but arguably even more strange: Lua is a very small language, with a relatively simple model for handling dates, and sometimes you need to get a bit hacky to deal with time zones.
Well, luckily it just uses the C time/date functions, which are clearly defined, so you can at least rely on your hacks to work everywhe... hold up a second, you didn't consider windows.
Turns out microsoft's fuck-up of a C compiler knows better than the spec, so on any Lua version compiled using it, you cannot get the current time zone as a numeric offset, because the library function just returns some other short string instead.
This took way too long to debug, and when I figured out the cause was that microsoft simply does what microsoft wants, I felt like throwing my PC out the window.
I think most people feel the same way about microsoft.
and, by the way, I enjoyed reading this story!
The good old date time problem.
We have two popular problems in programming:
Now with this exposé, we have a third - date/time problems!
Thanks for writing this up, @cliftonz. Good lessons learned.
I wouldn't consider this a bug.
This works as I expect it to work.
I think your point is very true. While from our perspective it was a bug, I agree the actual logic is not a bug.
However, I do think that there should be a warning about this edge case as not everyone would be able to see it.
Doesn't this happen for March, May and October too? So 5 times a year rather than 2
The fact that the bug was happening 2 times a year brought me all the way here. Have you ever encountered a bug that only manifests, say, after 5 years under very specific scenarios? Damn, it was working all along, you say!!!
Date-math is non-trivial, month-math especially so.
What does "add one month" (or subtract) actually mean? In the case of this library, it's "jump to the same day number ahead/back". Adding/subtracting 30 would also be error-prone.
So your takeaways are spot on, though I'd also add this caveat:
When something isn't of a given fixed quantity, beware abstractions that treat them as if they were...!
Great Insight!