This post originally appeared on Masilotti.com.
I woke up yesterday eager to add pretty social shares to my side project, weTabletop. As I plugged a few URLs into the Twitter Card Validator a few started to 404. Oddly, the records seemed to be fine in my development database running a production dump from late last week.
Time to Sanity Check
Something was going on. Did I somehow delete a bunch of records from production?
To verify I remoted in to the Heroku Rails console and started debugging. Comparing table counts with those in dev taught me that the events
table was out of sync. By a lot.
I figured out which records were missing by comparing the IDs of all records in production to a normal database sequence (1, 2, 3, 4, etc.). I very rarely actually delete events, so any gaps in the pattern will show something wrong.
(irb)> ((1..Event.last.id).to_a - (Event.first.id..Event.last.id).to_a).count
=> 2526
I had somehow lost over 2500 records. ๐ญ
Backups? What Backups?
Ideally, this would be fixed by merging a database backup into the current dataset in production. I could have downloaded a dump from a week ago, found all the records, and uploaded those back to prod.
However, to keep Heroku expenses as low as possible I don't pay for the $50 database add-on, the cheapest one that supports automatic backups. So no automated backups for me. ๐ค
Luckily, my dataset is still very small (~30k records) so I can pull production into development every now and then. And my most recent dump was from before the random deletions started occurring! While not perfect, I figured I can write a script to manually export and import the records from development to production.
Export/Import Raw JSON
- Get the IDs of missing records (from before)
- Write each record to a file, as JSON
- Upload the file and an import script to production
- Parse each line and
#create!
a new record
Exporting the entire record to JSON (bypassing any custom serializers) ensures all the data is preserved. This includes the record's ID and timestamps, something you usually don't want to carry over when moving data.
This is one of the rare times you actually want that data. I needed the backups to look just like the records they were restoring. Event #412 should be marked as created on Feb 14, not Mar 18 (today).
The exporter, run in development
class EventJSONExporter
def export
ids = [1, 2, 3, ...].freeze # pasted in from before
File.open("db/events.json", "wb") do |file|
Event.where(id: ids).find_each do |event|
file.write event.to_json
file.write "\n"
end
end
end
end
The importer task, run in production
namespace :events do
desc "Import deleted events from JSON file."
task import_json: :environment do
saved_event_ids = Set.new
filename = Rails.root.join "db", "events.json"
File.open(filename).each_line do |line|
begin
event = Event.new(JSON.parse line)
event.save!
saved_event_ids << event.id
rescue JSON::ParserError
puts "Couldn't parse JSON: #{line}"
rescue StandardError => e
puts "Couldn't save event #{line["id"]}: #{e}"
end
end
puts "Imported #{saved_event_ids.count} events:"
puts saved_event_ids
end
end
Root Cause Analysis
None of this matters if the records continue to magically delete themselves. A rigorous seach for destroy
and delete
through the entire codebase lead me to a single culprit: the Google Calendar event importer.
class GoogleCalendarEventImporter
# ...
def create_or_update_event(google_calendar_event)
event = Event.find_or_initialize_by i_cal_uid: e.i_cal_uid
if google_calendar_event.cancelled? && event.persisted?
event.delete
end
# ...
end
end
Looks fairly innocent, right? If the Google Calendar event was cancelled then delete it from the database.
Turns out i_cal_uid
can be nil
when the event is cancelled. Only ~10% of all events are from Google Calendar, the other 90% never get an i_cal_uid
! This leads to #find_or_initialize_by
finding any of the other 90% of the events and deleting that one. And this code is run every time a synced calendar is updated โ a lot.
In their defense, Google does document that the iCalUID
can be blank for cancelled events. However, it was noted under the status
section, not where I expected it near the iCalUID
reference.
status: Deleted events are only guaranteed to have the id field populated.
Fixes and Looking Forward
The quick fix is to not delete the record when the Google Calendar event is cancelled. I made this change and deployed to ensure I didn't lose any more data.
def create_or_update_event(google_calendar_event)
if google_calendar_event.cancelled?
return # TODO: Remove cancelled events by e.id, not iCalID
end
# ...
end
But this event is still, well, cancelled. It shouldn't be shown to anyone. The code will need to additionally track the Google event ID for each record and only delete if there is a match.
What's the difference between
id
andi_cal_uid
? Every event has a uniqueid
per calender and repeating events all share the samei_cal_uid
.
How to Prevent This
Phew. In the end I restored all but 17 records; I'll have to manually re-create those myself. But still, I never want to have to do this again. Here are some ways this could have been avoided:
- A better understanding of the API contract with Google
- More aggressive alerting when destructive actions occur
- Better unit tests that handle when
i_cal_uid
isnil
At best this post helps someone recover from a similiar data loss. At worst it shows how easily a full-time developer with almost a decade of experience can make such a huge mistake!
Enjoy my humility but please spend the $50 for a database with a backup strategy. ๐
Top comments (4)
You could roll your own backup solution at very low cost by having the Heroku CLI buildpack installed, and running a scheduled task to perform the backup?
dev.to/databasesponge/running-hero...
Heroku is expensive but with some AWS knowledge, you can get this much cheaper.
Migrate your database using Database Migration Service (DMS) from Heroku to RDS, it will cost nothing to do so.
The smallest RDS instance (t2.micro) costs $14 USD / mo. So that comes with automatic backups. So that's already saving you ~40 USD / mo over Heroku.
AWS comes with Free Tier usage of 750 hours. There's something ~730 hours in a month so that means the first month free.
AWS has a service called Aurora Serverless which runs a MySQL database only when you need it, so if this app is a pet project or not really in heavy use (some traffic each day) you would be paying pennies per month and it's super highly available and durable and has automatic backups.
Thanks a lot for writing this. Itโs one of those situations where the more info that is out there the best. I could see this post coming in handy for some future desperate searcher.
Thank you! That's why I posted it. Even if it's me who is that future desperate searcher. ๐